In [9]:
import numpy as np
from scipy import stats

In [10]:
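class BookSCM:
    """Toy SCM in which buying book A influences buying book B."""

    def __init__(self, random_seed=None):
        self.random_seed = random_seed
        self.u_0 = stats.uniform()  # exogenous noise for A, uniform on [0, 1]
        self.u_1 = stats.norm()     # exogenous noise for B, standard normal

    def sample(self, sample_size=100):
        """Samples from the SCM"""
        # Compare with None so that a seed of 0 is not silently ignored
        if self.random_seed is not None:
            np.random.seed(self.random_seed)
        u_0 = self.u_0.rvs(sample_size)
        u_1 = self.u_1.rvs(sample_size)
        a = u_0 > .61            # A depends only on its own noise
        b = (a + .5 * u_1) > .2  # B depends on A and on its own noise
        return a, b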

First, let’s instantiate our SCM and set the random seed to 45:

In [11]:
scm = BookSCM(random_seed=45)

Next, let’s draw 100 samples from it:

In [12]:
buy_book_a, buy_book_b = scm.sample(100)

Let’s check whether the shapes are as expected:

In [13]:
buy_book_a.shape, buy_book_b.shape

((100,), (100,))

## Now to answer the question

We have generated the data, and we’re now ready to answer the question that we posed at the beginning of this section: what is the probability that a person will buy book A, given that they bought book B?
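
In terms of the sampled data, this conditional probability can be estimated as a ratio of counts. A sketch of the estimator, where $a_i$ and $b_i$ denote the $i$-th entries of the two sampled boolean arrays and $n$ is the sample size (this notation is introduced here for illustration only):

$$
\hat{P}(A = 1 \mid B = 1) = \frac{\sum_{i=1}^{n} a_i \, b_i}{\sum_{i=1}^{n} b_i}
$$

The numerator counts the people who bought both books, and the denominator counts everyone who bought book B.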

In [19]:
proba_book_a_given_book_b = buy_book_a[buy_book_b].sum() / buy_book_b.sum()
print(f'Probability of buying book A given B: {proba_book_a_given_book_b:0.3f}')

Probability of buying book A given B: 0.638


As we can see, the estimated probability of buying book A, given that book B was bought, is 63.8%. This indicates a positive relationship between the two variables: if there were no association between them, the conditional probability would equal the marginal probability of buying book A, which in this SCM is P(u_0 > 0.61) = 39%. These results show that we can make meaningful predictions using observational data alone. This ability is the essence of most contemporary (supervised) machine learning models.
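
One way to sanity-check the empirical estimate is to derive the same quantities analytically from the structural equations. The sketch below is not part of the original notebook; the variable names are illustrative, and it assumes only the equations in `BookSCM.sample` (`a = u_0 > .61`, `b = (a + .5 * u_1) > .2`, with `u_0` uniform on [0, 1] and `u_1` standard normal).

```python
from scipy import stats

# Marginal probability of buying book A: P(u_0 > 0.61) for u_0 ~ Uniform(0, 1)
p_a = stats.uniform().sf(0.61)

# b is True when a + 0.5 * u_1 > 0.2, i.e. when u_1 > (0.2 - a) / 0.5
p_b_given_a = stats.norm().sf((0.2 - 1) / 0.5)      # case a = 1
p_b_given_not_a = stats.norm().sf((0.2 - 0) / 0.5)  # case a = 0

# Law of total probability, then Bayes' rule
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
p_a_given_b = p_a * p_b_given_a / p_b

print(f'P(A):     {p_a:0.3f}')
print(f'P(A | B): {p_a_given_b:0.3f}')
```

Under these assumptions, the analytical conditional probability comes out at roughly 0.64, close to the estimate from 100 samples and well above the marginal probability of buying book A.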