# Introduction to SciPy for A/B Testing

`SciPy` is a Python library for scientific and technical computing. Before you can use it, you must import it (or the parts of it you need), usually at the beginning of your Python script.

```python
from scipy.stats import chi2_contingency
```

We are only using `SciPy` for A/B testing, so we only need to import **chi2_contingency** from the `stats` module. This function performs a chi-square test of independence on a contingency table, and it is what we use to get the test's p-value. A call to **chi2_contingency** actually returns four values:
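- *chi2* = the chi-square test statistic (for a 2×2 table, SciPy applies Yates' continuity correction by default)
- *pvalue* = the p-value of the test
- *dof* = the degrees of freedom
- *expected* = the expected frequencies, computed from the marginal sums (row and column totals) of the table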
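To make the *expected* and *dof* outputs concrete, here is a minimal sketch using NumPy (installed alongside SciPy, but not imported elsewhere in this notebook) and the website counts from the example that follows. Each expected count is the cell's row total times its column total, divided by the grand total, and a table with r rows and c columns has (r - 1)(c - 1) degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are website versions A and B,
# columns are (purchased, not purchased)
observed = np.array([[183, 1483],
                     [316, 1350]])

row_totals = observed.sum(axis=1)   # visitors per website version
col_totals = observed.sum(axis=0)   # total purchases and non-purchases
grand_total = observed.sum()        # all visitors

# Expected count for each cell = row total * column total / grand total
expected_manual = np.outer(row_totals, col_totals) / grand_total

chi2, pvalue, dof, expected = chi2_contingency(observed)

print(expected_manual)
print(np.allclose(expected_manual, expected))  # True
print(dof)  # (2 - 1) * (2 - 1) = 1
```

Because both versions had the same number of visitors, both rows of the expected table are identical: under the null hypothesis, each version should convert at the same overall rate.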

For example, after analyzing the purchase results of visitors to website versions A and B, we get:

- Option A: purchased = 183, not purchased = 1483
- Option B: purchased = 316, not purchased = 1350

Assume our metric is the percentage of visitors who made a purchase on each website version, and that we predicted version B would produce a higher purchase percentage than version A. Here is our analysis of the experiment results:

```python
# Purchase rate = purchases / total visitors, for each website version
option_A = 183/(183+1483)
option_B = 316/(316+1350)
print(option_A)
print(option_B)
```

```
0.10984393757503001
0.18967587034813926
```


From the above, we see that version B had a higher percentage of purchases (about 19%, versus about 11% for version A). But are people really purchasing more because of the changed website, or could this difference be due to random chance?

We choose a significance level of 5% (α = 0.05, which corresponds to a 95% confidence level) and then run a chi-square test.

To run a chi-square test, we arrange the purchased/not purchased counts into a contingency table (one row per website version, one column per outcome) and pass it to **chi2_contingency( )**.
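The table must hold raw counts, not the percentages computed earlier. Putting versions in rows and outcomes in columns is a convention rather than a requirement. As a quick sketch (assuming NumPy, which is installed with SciPy), transposing the table describes the same data and gives the same p-value. The `_` names follow the usual Python convention for return values we do not need.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Raw counts (not percentages): rows are website versions A and B,
# columns are purchased and not purchased
table = np.array([[183, 1483],
                  [316, 1350]])

# Keep only the p-value from each call
_, p_rows_as_versions, _, _ = chi2_contingency(table)
# Swapping rows and columns leaves the test unchanged
_, p_transposed, _, _ = chi2_contingency(table.T)

print(np.isclose(p_rows_as_versions, p_transposed))  # True
```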

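```python
# Rows: website versions A and B; columns: purchased, not purchased
contingency = [[183, 1483], [316, 1350]]
```

```python
# chi2_contingency returns four values; unpack each into its own name
chi2, pvalue, dof, expected = chi2_contingency(contingency)
print(pvalue)
```

```
1.47008361057e-10
```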


Notice that we unpacked the four values returned by the **chi2_contingency( )** function into separate names, so we can refer to the one we want. In this case, we only need to print *pvalue*.

The *pvalue* (about 1.47 × 10⁻¹⁰, or 0.000000000147) IS LESS THAN 0.05, so we reject the null hypothesis that the purchase rate does not depend on the website version. The difference in purchase percentages is statistically significant at the 95% confidence level, and because version B had the higher purchase rate, we can choose website version B over version A.
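Where does this p-value come from? As a minimal sketch, the test statistic can be rebuilt by hand from the expected counts. For a 2×2 table (1 degree of freedom), `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which shrinks each |observed - expected| difference by 0.5 before squaring. The p-value is then the upper-tail probability of a chi-square distribution with 1 degree of freedom. The alias `chi2_dist` is a name chosen here so it does not clash with the `chi2` variable used above.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

observed = np.array([[183, 1483],
                     [316, 1350]])

# Expected counts from the marginal (row and column) totals
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Yates' continuity correction: reduce each |observed - expected| by 0.5
stat_yates = (((np.abs(observed - expected) - 0.5) ** 2) / expected).sum()
# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_yates = chi2_dist.sf(stat_yates, df=1)

stat_scipy, p_scipy, dof, _ = chi2_contingency(observed)
print(np.isclose(stat_yates, stat_scipy), np.isclose(p_yates, p_scipy))  # True True

# Turning the correction off gives a larger statistic and a smaller p-value
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)
print(p_plain < p_scipy)  # True
```

With samples this large the correction barely matters: both p-values are far below 0.05, so the conclusion is the same.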

# More A/B Testing

You work at a bank. Currently, the bank does not send its customers SMS text reminders about loan payments. You think that a smaller percentage of customers will default on their loans if they receive SMS text reminders than if they do not. You run an experiment comparing the two options and get the following results:

- *no_reminder*: defaulted = 1005, not defaulted = 3294
- *reminder*: defaulted = 896, not defaulted = 3403

Is a smaller percentage of customers defaulting after receiving SMS text reminders, at a 95% confidence level?

```python
# Default rate = defaults / total customers, for each group
no_reminder = 1005/(1005+3294)
reminder = 896/(896+3403)
print(no_reminder)
print(reminder)
```

```
0.23377529658060014
0.20842056292160968
```


From these results, you see that a smaller percentage of customers defaulted on loans after receiving SMS text reminders (about 21%, versus about 23% without reminders). But is this difference significant, or could it be due to chance?

You run a chi-square test with a significance level of 5% (a 95% confidence level).

```python
# Rows: no_reminder, reminder; columns: defaulted, not defaulted
contingency = [[1005, 3294], [896, 3403]]
chi2, pvalue, dof, expected = chi2_contingency(contingency)
print(pvalue)
```

```
0.00500565771251
```


*pvalue* = 0.005 IS LESS THAN 0.05, so the difference in default rates is statistically significant at the 95% confidence level, and the default rate was lower for customers who received reminders. Therefore, you can recommend that the bank begin sending all of its customers SMS text reminders regarding their loan payments.
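Both experiments follow the same steps, so they can be wrapped in a small function. The sketch below uses a hypothetical helper named `ab_test` (it is not part of SciPy) that returns both rates, the p-value, and whether the p-value is below the chosen significance level. Keep in mind that the chi-square test only checks whether the two rates differ, in either direction, so you still compare the returned rates to see which group did better.

```python
from scipy.stats import chi2_contingency

def ab_test(a_yes, a_no, b_yes, b_no, alpha=0.05):
    """Hypothetical helper: compare the 'yes' rates of groups A and B
    with a chi-square test of independence on their 2x2 table."""
    rate_a = a_yes / (a_yes + a_no)
    rate_b = b_yes / (b_yes + b_no)
    # Rows: groups A and B; columns: yes, no
    _, pvalue, _, _ = chi2_contingency([[a_yes, a_no], [b_yes, b_no]])
    # Significant means "the rates differ", not "B is better"
    return rate_a, rate_b, pvalue, pvalue < alpha

# The two experiments from this notebook
print(ab_test(183, 1483, 316, 1350))   # website: purchases for A vs B
print(ab_test(1005, 3294, 896, 3403))  # bank: defaults without vs with reminders
```

The same helper can be applied to any two-group, yes/no experiment, including the counts in the exercise below.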

### Exercise

You work at a social media company that earns revenue from the number of ad clicks its users make. Currently, there is no way to share videos on your platform. You think that your platform will get a higher percentage of ad clicks if you add the ability for users to share their videos. You run an experiment comparing the two options on users of your site and get the following results:

- *no_videos*: clicks = 100, no clicks = 300
- *videos*: clicks = 150, no clicks = 250

At a 5% significance level (a 95% confidence level), should the ability to share videos be added to your platform? Explain your analysis in the cell below.