
A script for generating multivariate and correlated binary data.


shz9/mvbin


Multivariate binary data in Python

This script generates multivariate and correlated binary data using the procedure outlined in

On the Generation of Correlated Artificial Binary Data
Friedrich Leisch, Andreas Weingessel, Kurt Hornik (1998)

and implemented in the R package bindata.
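As a rough illustration of the idea behind that procedure, the sketch below thresholds a latent multivariate normal: for each pair of variables it searches for the normal correlation whose bivariate CDF at the marginal quantiles reproduces the requested joint probability, then draws normal samples and cuts them at those quantiles. This is not the code in mvbin or bindata. The names `bvn_cdf`, `latent_corr` and `sample_binary` are hypothetical, and the trapezoid integration and bisection are simple stand-ins chosen to keep the example dependent on NumPy and the standard library only.

```python
from statistics import NormalDist

import numpy as np

_STD = NormalDist()  # standard normal, used for cdf and quantiles


def bvn_cdf(h, k, rho, n=2001):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho.

    Uses Phi2(h, k; rho) = Phi(h) * Phi(k) + integral_0^rho phi2(h, k; r) dr,
    with the integral approximated by the trapezoid rule.
    """
    r = np.linspace(0.0, rho, n)
    one_minus_r2 = 1.0 - r * r
    dens = np.exp(-(h * h - 2.0 * h * k * r + k * k) / (2.0 * one_minus_r2))
    dens /= 2.0 * np.pi * np.sqrt(one_minus_r2)
    integral = np.sum((dens[1:] + dens[:-1]) * np.diff(r)) / 2.0
    return _STD.cdf(h) * _STD.cdf(k) + integral


def latent_corr(p_i, p_j, p_ij, tol=1e-10):
    """Bisect for the normal correlation that yields joint probability p_ij."""
    h, k = _STD.inv_cdf(float(p_i)), _STD.inv_cdf(float(p_j))
    lo, hi = -0.999, 0.999  # stay away from the singular endpoints
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # bvn_cdf is increasing in rho, so move the bracket accordingly
        if bvn_cdf(h, k, mid) < p_ij:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_binary(joint_prob, size, rng):
    """Draw correlated binary rows by thresholding a latent normal vector."""
    joint_prob = np.asarray(joint_prob, dtype=float)
    p = np.diag(joint_prob)
    d = len(p)
    sigma = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            sigma[i, j] = sigma[j, i] = latent_corr(p[i], p[j], joint_prob[i, j])
    z = rng.multivariate_normal(np.zeros(d), sigma, size=size)
    thresholds = np.array([_STD.inv_cdf(float(x)) for x in p])
    return (z <= thresholds).astype(int)
```

The sketch assumes the requested joint probabilities are feasible and that the resulting latent correlation matrix is positive semi-definite; it does not check either condition.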


Replicating Example 1 in the original paper:

from mvbin import mvbin
import numpy as np

# Joint probability matrix: entry (i, j) is P(X_i = 1, X_j = 1),
# so the diagonal holds the marginal probabilities P(X_i = 1).
joint_prob = np.array([[0.2, 0.05, 0.15],
                       [0.05, 0.5, 0.45],
                       [0.15, 0.45, 0.8]])
p = np.diag(joint_prob)

# Population correlation matrix implied by joint_prob (for comparison only):
corr = np.array([[1., -0.25, -0.0625],
                 [-0.25, 1., 0.25],
                 [-0.0625, 0.250, 1.]])

# Draw 10,000 samples, reusing the marginals p computed above:
sample = mvbin(p=p,
               joint_prob=joint_prob,
               size=10000)

# Sample correlation:
print(np.corrcoef(sample, rowvar=False))

This gives a sample correlation matrix close to the population one, for example:

[[ 1.         -0.25164281 -0.06168207]
 [-0.25164281  1.          0.25074679]
 [-0.06168207  0.25074679  1.        ]]
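The population correlation matrix in the example follows directly from the joint probability matrix: for binary variables, the covariance is P(X_i = 1, X_j = 1) minus the product of the marginals, and each variance is p_i (1 - p_i). A minimal sketch of that calculation, using a hypothetical helper `binary_corr_from_joint` that is not part of mvbin:

```python
import numpy as np


def binary_corr_from_joint(joint_prob):
    """Correlation matrix of binary variables from their joint probabilities."""
    joint_prob = np.asarray(joint_prob, dtype=float)
    p = np.diag(joint_prob)            # marginals P(X_i = 1)
    cov = joint_prob - np.outer(p, p)  # Cov(X_i, X_j) = p_ij - p_i * p_j
    sd = np.sqrt(p * (1.0 - p))        # standard deviation of a Bernoulli(p_i)
    return cov / np.outer(sd, sd)


joint_prob = np.array([[0.2, 0.05, 0.15],
                       [0.05, 0.5, 0.45],
                       [0.15, 0.45, 0.8]])
print(binary_corr_from_joint(joint_prob))
```

For the joint probabilities of Example 1, this reproduces the `corr` matrix defined in the example, which is the target that the sample correlation should approach as `size` grows.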
