# Download the data and prepare it for analysis

## Project description
We are analysts at a big online store. Together with the marketing department, we compiled a list of hypotheses that may help boost revenue.
We need to prioritize these hypotheses, launch an A/B test, and analyze the results.

## Import

In [222]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats as st
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

%matplotlib inline

## Load data

In [223]:
# try the local copy first, then fall back to the /datasets directory
try:
    hypotheses = pd.read_csv('hypotheses_us.csv', sep=';')
except FileNotFoundError:
    hypotheses = pd.read_csv('/datasets/hypotheses_us.csv', sep=';')

try:
    orders = pd.read_csv('orders_us.csv', sep=',')
except FileNotFoundError:
    orders = pd.read_csv('/datasets/orders_us.csv', sep=',')

try:
    visits = pd.read_csv('visits_us.csv', sep=',')
except FileNotFoundError:
    visits = pd.read_csv('/datasets/visits_us.csv', sep=',')

## Prepare data for analysis

### hypotheses

- Hypothesis — brief description of the hypothesis
- Reach — user reach, on a scale of one to ten
- Impact — impact on users, on a scale of one to ten
- Confidence — confidence in the hypothesis, on a scale of one to ten
- Effort — the resources required to test a hypothesis, on a scale of one to ten. The higher the Effort value, the more resource-intensive the test.

In [224]:
hypotheses

Unnamed: 0,Hypothesis,Reach,Impact,Confidence,Effort
0,Add two new channels for attracting traffic. T...,3,10,8,6
1,Launch your own delivery service. This will sh...,2,5,4,10
2,Add product recommendation blocks to the store...,8,3,7,3
3,Change the category structure. This will incre...,8,3,3,8
4,Change the background color on the main page. ...,3,1,1,1
5,Add a customer review page. This will increase...,3,2,2,3
6,Show banners with current offers and sales on ...,5,3,8,3
7,Add a subscription form to all the main pages....,10,7,8,5
8,Launch a promotion that gives users discounts ...,1,9,9,5


In [225]:
# convert column names to lower case
hypotheses.columns = hypotheses.columns.str.lower()
hypotheses.columns

Index(['hypothesis', 'reach', 'impact', 'confidence', 'effort'], dtype='object')

Print the full text of the hypothesis column:

In [226]:
# iterate over the column values directly instead of indexing by row number
for text in hypotheses['hypothesis']:
    print(text)

Add two new channels for attracting traffic. This will bring 30% more users
Launch your own delivery service. This will shorten delivery time
Add product recommendation blocks to the store's site. This will increase conversion and average purchase size
Change the category structure. This will increase conversion since users will find the products they want more quickly
Change the background color on the main page. This will increase user engagement
Add a customer review page. This will increase the number of orders
Show banners with current offers and sales on the main page. This will boost conversion
Add a subscription form to all the main pages. This will help you compile a mailing list
Launch a promotion that gives users discounts on their birthdays



In [227]:
hypotheses.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hypothesis  9 non-null      object
 1   reach       9 non-null      int64 
 2   impact      9 non-null      int64 
 3   confidence  9 non-null      int64 
 4   effort      9 non-null      int64 
dtypes: int64(4), object(1)
memory usage: 488.0+ bytes


The table looks fine: there are no missing values and the column types are appropriate.

### orders

- transactionId — order identifier
- visitorId — identifier of the user who placed the order
- date — date of the order
- revenue — revenue from the order
- group — the A/B test group that the user belongs to

In [228]:
orders.sample(3)

Unnamed: 0,transactionId,visitorId,date,revenue,group
622,2148936881,654930236,2019-08-29,300.6,B
568,4021746511,113298937,2019-08-09,425.6,B
225,2931845376,2954449915,2019-08-17,170.0,A


In [229]:
orders

Unnamed: 0,transactionId,visitorId,date,revenue,group
0,3667963787,3312258926,2019-08-15,30.4,B
1,2804400009,3642806036,2019-08-15,15.2,B
2,2961555356,4069496402,2019-08-15,10.2,A
3,3797467345,1196621759,2019-08-15,155.1,B
4,2282983706,2322279887,2019-08-15,40.5,B
...,...,...,...,...,...
1192,2662137336,3733762160,2019-08-14,100.8,B
1193,2203539145,370388673,2019-08-14,50.1,A
1194,1807773912,573423106,2019-08-14,165.3,A
1195,1947021204,1614305549,2019-08-14,5.5,A


In [230]:
# rename the camelCase columns to snake_case
orders.rename(columns={'transactionId': 'transaction_id', 'visitorId': 'visitor_id'}, inplace=True)

In [231]:
orders.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1197 entries, 0 to 1196
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   transaction_id  1197 non-null   int64  
 1   visitor_id      1197 non-null   int64  
 2   date            1197 non-null   object 
 3   revenue         1197 non-null   float64
 4   group           1197 non-null   object 
dtypes: float64(1), int64(2), object(2)
memory usage: 46.9+ KB


In [232]:
# convert the date column to datetime
orders['date'] = pd.to_datetime(orders['date'], format='%Y-%m-%d')

Check whether the transaction_id values are unique:

In [233]:
orders['transaction_id'].value_counts()

3667963787    1
4167421281    1
2875229767    1
504373597     1
2017680897    1
             ..
1853288890    1
2084190534    1
2956594374    1
3646798055    1
3936777065    1
Name: transaction_id, Length: 1197, dtype: int64

Good, every transaction_id is unique, as expected.
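The same check can be made explicit instead of reading the counts by eye. A minimal sketch on a small hypothetical frame (the `demo_orders` values are made up for illustration); `Series.is_unique` and `Series.duplicated` are standard pandas:

```python
import pandas as pd

# Hypothetical sample; the notebook itself would use orders['transaction_id'].
demo_orders = pd.DataFrame({
    'transaction_id': [101, 102, 103, 103],
    'visitor_id': [1, 2, 3, 3],
})

# True only when no identifier repeats
all_unique = demo_orders['transaction_id'].is_unique

# Every row whose identifier appears more than once
repeated = demo_orders[demo_orders['transaction_id'].duplicated(keep=False)]

print(all_unique)
print(repeated)
```

On the real table, `is_unique` returning `True` would confirm the conclusion above in a single line.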

Check whether there are users who were in both groups:

In [234]:
orders['visitor_id'].value_counts().head()

4256040402    11
2458001652    11
2378935119     9
2038680547     8
199603092      5
Name: visitor_id, dtype: int64
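Counting orders per visitor does not by itself show whether a visitor appears in both groups. One way to check that directly is sketched below on a hypothetical frame (`demo_orders` is made up; the column names follow the renamed `orders` table):

```python
import pandas as pd

# Hypothetical orders: visitor 1 ordered in both groups, visitor 3 twice in B
demo_orders = pd.DataFrame({
    'visitor_id': [1, 1, 2, 3, 3, 4],
    'group':      ['A', 'B', 'A', 'B', 'B', 'A'],
})

# Number of distinct groups in which each visitor placed orders
groups_per_visitor = demo_orders.groupby('visitor_id')['group'].nunique()

# Visitors seen in more than one group
mixed_visitors = groups_per_visitor[groups_per_visitor > 1].index.tolist()
print(mixed_visitors)
```

A visitor with several orders inside one group (visitor 3 here) is not flagged, which separates repeat buyers from genuinely mixed assignments.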

We will keep only the first entry for each user:

In [235]:
orders_min = orders.groupby(['visitor_id']).first().reset_index()
# orders_min.columns=['visitor_id','transaction_id','date']
orders_min

Unnamed: 0,visitor_id,transaction_id,date,revenue,group
0,5114589,900025958,2019-08-16,10.8,B
1,6958315,1887336629,2019-08-04,25.9,B
2,8300375,3679129301,2019-08-01,165.7,B
3,11685486,797272237,2019-08-23,100.4,A
4,39475350,437876380,2019-08-08,65.4,B
...,...,...,...,...,...
1026,4266935830,1811671147,2019-08-29,1220.2,A
1027,4278982564,51443164,2019-08-05,385.7,A
1028,4279090005,2724331495,2019-08-04,105.3,B
1029,4281247801,3796325351,2019-08-25,45.6,B


Let's verify the result:

In [236]:
orders_min['visitor_id'].value_counts().head()

5114589       1
2666467773    1
2785527055    1
2801638130    1
2803209166    1
Name: visitor_id, dtype: int64

Good. Now let's verify that removing the repeat orders didn't distort the distribution between groups A and B:

In [237]:
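# name the columns explicitly so the chart does not depend on the pandas version
df = orders_min['group'].value_counts().rename_axis('group').reset_index(name='orders')
fig = px.pie(df, values='orders', names='group')
fig.show()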



In [238]:
# same chart for the unfiltered table
df = orders['group'].value_counts().rename_axis('group').reset_index(name='orders')
fig = px.pie(df, values='orders', names='group')
fig.show()

The group shares look about the same before and after filtering.
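The pie charts can be complemented with numbers. A small sketch, assuming two hypothetical frames (`raw` and `dedup`, made up here) that stand in for the unfiltered and deduplicated tables, compares group shares with `value_counts(normalize=True)`:

```python
import pandas as pd

# Hypothetical stand-ins for the unfiltered and deduplicated order tables
raw = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'B', 'A', 'B', 'B']})
dedup = pd.DataFrame({'group': ['A', 'B', 'B', 'A', 'B']})

# Share of each group in both tables, aligned on the group label
shares = pd.DataFrame({
    'raw': raw['group'].value_counts(normalize=True),
    'dedup': dedup['group'].value_counts(normalize=True),
})

# Positive shift means the group's share grew after deduplication
shares['shift'] = shares['dedup'] - shares['raw']
print(shares)
```

A shift close to zero for both groups supports the visual impression that filtering did not skew the split.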

Now let's keep the unfiltered data under the name orders_raw and use the name orders for the filtered table:

In [239]:
orders_raw = orders
orders = orders_min

### visits

- date — date of the visits
- group — A/B test group
- visits — the number of visits on the specified date in the specified A/B test group

In [240]:
visits

Unnamed: 0,date,group,visits
0,2019-08-01,A,719
1,2019-08-02,A,619
2,2019-08-03,A,507
3,2019-08-04,A,717
4,2019-08-05,A,756
...,...,...,...
57,2019-08-27,B,720
58,2019-08-28,B,654
59,2019-08-29,B,531
60,2019-08-30,B,490


In [241]:
visits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    62 non-null     object
 1   group   62 non-null     object
 2   visits  62 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 1.6+ KB


In [242]:
# convert the date column to datetime
visits['date'] = pd.to_datetime(visits['date'], format='%Y-%m-%d')

# Part 1. Prioritizing Hypotheses

## Apply the ICE framework to prioritize hypotheses. Sort them in descending order of priority.

In [243]:
hypotheses['ICE'] = hypotheses['impact']*hypotheses['confidence']/hypotheses['effort']
print(hypotheses[['hypothesis', 'ICE']].sort_values(by='ICE', ascending=False))

                                          hypothesis        ICE
8  Launch a promotion that gives users discounts ...  16.200000
0  Add two new channels for attracting traffic. T...  13.333333
7  Add a subscription form to all the main pages....  11.200000
6  Show banners with current offers and sales on ...   8.000000
2  Add product recommendation blocks to the store...   7.000000
1  Launch your own delivery service. This will sh...   2.000000
5  Add a customer review page. This will increase...   1.333333
3  Change the category structure. This will incre...   1.125000
4  Change the background color on the main page. ...   1.000000


## Apply the RICE framework to prioritize hypotheses. Sort them in descending order of priority.

In [244]:
hypotheses['RICE'] = hypotheses['reach']*hypotheses['impact']*hypotheses['confidence']/hypotheses['effort']
print(hypotheses[['hypothesis', 'RICE']].sort_values(by='RICE', ascending=False))

                                          hypothesis   RICE
7  Add a subscription form to all the main pages....  112.0
2  Add product recommendation blocks to the store...   56.0
0  Add two new channels for attracting traffic. T...   40.0
6  Show banners with current offers and sales on ...   40.0
8  Launch a promotion that gives users discounts ...   16.2
3  Change the category structure. This will incre...    9.0
1  Launch your own delivery service. This will sh...    4.0
5  Add a customer review page. This will increase...    4.0
4  Change the background color on the main page. ...    3.0


Under ICE, hypothesis 8 leads, followed by hypothesis 0. Under RICE, the lead passes to hypothesis 7, followed by hypothesis 2. The reason is the reach parameter: RICE is the ICE score multiplied by reach, and reach is very high for hypotheses 7 and 2 (10 and 8), meaning they affect a large share of users. Hypothesis 8, by contrast, has the lowest reach (1) and drops from first to fifth place.
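The shift in ranking can be reproduced in a few lines. The sketch below copies the scores from the hypotheses table shown earlier and ranks each hypothesis under both frameworks; the `ICE_rank`, `RICE_rank`, and `rank_change` columns are names introduced here for illustration:

```python
import pandas as pd

# Scores copied from the hypotheses table (row index = hypothesis number)
scores = pd.DataFrame({
    'reach':      [3, 2, 8, 8, 3, 3, 5, 10, 1],
    'impact':     [10, 5, 3, 3, 1, 2, 3, 7, 9],
    'confidence': [8, 4, 7, 3, 1, 2, 8, 8, 9],
    'effort':     [6, 10, 3, 8, 1, 3, 3, 5, 5],
})

scores['ICE'] = scores['impact'] * scores['confidence'] / scores['effort']
scores['RICE'] = (scores['reach'] * scores['impact']
                  * scores['confidence'] / scores['effort'])

# Rank 1 = highest priority; tied scores share the better rank
scores['ICE_rank'] = scores['ICE'].rank(ascending=False, method='min').astype(int)
scores['RICE_rank'] = scores['RICE'].rank(ascending=False, method='min').astype(int)

# Negative values mean the hypothesis lost places when reach was included
scores['rank_change'] = scores['ICE_rank'] - scores['RICE_rank']
print(scores[['reach', 'ICE_rank', 'RICE_rank', 'rank_change']])
```

Sorting by `rank_change` makes it easy to see which hypotheses gain or lose the most once reach is taken into account.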

# Part 2. A/B Test Analysis

## Graph cumulative revenue by group. Make conclusions and conjectures.

In [245]:
orders.sample(3)

Unnamed: 0,visitor_id,transaction_id,date,revenue,group
111,447836349,416093091,2019-08-09,30.7,B
634,2634105098,1674284750,2019-08-11,265.4,B
432,1781647073,1876275626,2019-08-14,25.4,A


In [246]:
visits.sample(3)

Unnamed: 0,date,group,visits
16,2019-08-17,A,412
29,2019-08-30,A,490
17,2019-08-18,A,668


In [247]:
# building an array with unique paired date-group values
datesGroups = orders[['date','group']].drop_duplicates()

# getting aggregated cumulative daily data on orders 
ordersAggregated = datesGroups.apply(lambda x: orders[np.logical_and(orders['date'] <= x['date'], orders['group'] == x['group'])].agg({'date' : 'max', 'group' : 'max', 'transaction_id' : pd.Series.nunique, 'visitor_id' : pd.Series.nunique, 'revenue' : 'sum'}), axis=1).sort_values(by=['date','group'])

# getting aggregated cumulative daily data on visitors  
visitorsAggregated = datesGroups.apply(lambda x: visits[np.logical_and(visits['date'] <= x['date'], visits['group'] == x['group'])].agg({'date' : 'max', 'group' : 'max', 'visits' : 'sum'}), axis=1).sort_values(by=['date','group'])

# merging the two tables into one and giving its columns descriptive names
cumulativeData = ordersAggregated.merge(visitorsAggregated, left_on=['date', 'group'], right_on=['date', 'group'])
cumulativeData.columns = ['date', 'group', 'orders', 'buyers', 'revenue', 'visitors']

print(cumulativeData.head(5))

        date group  orders  buyers  revenue  visitors
0 2019-08-01     A      19      19   1950.0       719
1 2019-08-01     B      20      20   1484.7       713
2 2019-08-02     A      36      36   3148.1      1338
3 2019-08-02     B      42      42   4007.5      1294
4 2019-08-03     A      60      60   4963.3      1845


To plot the graphs, we need cumulative data. The cumulativeData DataFrame built above has the following columns:
- date
- group
- orders — the cumulative number of orders up to and including the specified date for the specified group
- buyers — the cumulative number of users who placed at least one order up to and including the specified date for the specified group
- revenue — the cumulative revenue up to and including the specified date for the specified group
- visitors — the cumulative number of visits to the online store up to and including the specified date for the specified group
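The row-wise `apply` used to build `cumulativeData` re-filters the whole table for every date-group pair. Running totals of orders, revenue, and visits can usually also be obtained with a daily aggregation followed by `groupby(...).cumsum()`. This is only a sketch on hypothetical data (`demo_orders` and `demo_visits` are made up); the `buyers` column is left out because cumulative unique counts cannot be summed day by day in general:

```python
import pandas as pd

# Hypothetical orders and visits for two days and two groups
demo_orders = pd.DataFrame({
    'date': pd.to_datetime(['2019-08-01', '2019-08-01', '2019-08-02',
                            '2019-08-01', '2019-08-02', '2019-08-02']),
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'transaction_id': [1, 2, 3, 4, 5, 6],
    'revenue': [10.0, 20.0, 5.0, 7.0, 3.0, 30.0],
})
demo_visits = pd.DataFrame({
    'date': pd.to_datetime(['2019-08-01', '2019-08-02'] * 2),
    'group': ['A', 'A', 'B', 'B'],
    'visits': [100, 120, 90, 110],
})

# Daily totals per group, joined with the daily visits
daily = (demo_orders.groupby(['group', 'date'], as_index=False)
         .agg(orders=('transaction_id', 'nunique'), revenue=('revenue', 'sum'))
         .merge(demo_visits, on=['group', 'date'])
         .sort_values(['group', 'date']))

# Running totals within each group, in date order
running = daily.groupby('group')[['orders', 'revenue', 'visits']].cumsum()
cumulative = pd.concat([daily[['date', 'group']], running], axis=1)
cumulative = cumulative.rename(columns={'visits': 'visitors'})
print(cumulative)
```

The inner merge assumes every date-group pair with orders also has a visits row, as is the case in this example.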

Let's plot cumulative revenue graphs by day and A/B test group:

In [248]:
# DataFrame with cumulative orders and cumulative revenue by day, group A
cumulativeRevenueA = cumulativeData[cumulativeData['group']=='A'][['date','revenue', 'orders']]

# DataFrame with cumulative orders and cumulative revenue by day, group B
cumulativeRevenueB = cumulativeData[cumulativeData['group']=='B'][['date','revenue', 'orders']]

fig = go.Figure()

# Plotting the group A revenue graph 
fig.add_trace(go.Scatter(x=cumulativeRevenueA['date'], y=cumulativeRevenueA['revenue'],
                    mode='lines',
                    name='A'))
# Plotting the group B revenue graph 
fig.add_trace(go.Scatter(x=cumulativeRevenueB['date'], y=cumulativeRevenueB['revenue'],
                    mode='lines',
                    name='B'))
fig.update_layout(title="cumulative revenue")
fig.show()

Revenue rises steadily throughout the whole test.
Between Aug 18 and Aug 19, group B's cumulative revenue jumps by more than 20,000, whereas the usual increase is about 2,000 a day. This is probably caused by a very expensive order in group B on that day.

## Graph cumulative average order size by group. Make conclusions and conjectures.

Let's plot average purchase size by group. We'll divide cumulative revenue by the cumulative number of orders:


In [249]:
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=cumulativeRevenueA['date'], y=cumulativeRevenueA['revenue']/cumulativeRevenueA['orders'],
                    mode='lines',
                    name='A'))
# Plotting the group B average purchase size graph
fig.add_trace(go.Scatter(
    x=cumulativeRevenueB['date'], y=cumulativeRevenueB['revenue']/cumulativeRevenueB['orders'],
                    mode='lines',
                    name='B'))
fig.update_layout(title="average purchase size by group")
fig.show()

The average purchase size fluctuates slightly at the start and becomes fairly steady after Aug 15. This changes on Aug 18, when the average purchase size of group B rises sharply. Group A shows no such change, and group B's average declines steadily after that day, which suggests a correction back toward the previous average.

## Graph the relative difference in cumulative average order size for group B compared with group A. Make conclusions and conjectures.

Let's plot a relative difference graph for the average purchase sizes. We'll add a dashed horizontal reference line at y = 0 with the add_hline() method:

In [250]:
# gathering the data into one DataFrame
mergedCumulativeRevenue = cumulativeRevenueA.merge(cumulativeRevenueB, left_on='date', right_on='date', how='left', suffixes=['A', 'B'])

fig_order_size = go.Figure()

# plotting a relative difference graph for the average purchase sizes
avgOrderA = mergedCumulativeRevenue['revenueA'] / mergedCumulativeRevenue['ordersA']
avgOrderB = mergedCumulativeRevenue['revenueB'] / mergedCumulativeRevenue['ordersB']
fig_order_size.add_trace(go.Scatter(
    x=mergedCumulativeRevenue['date'], y=avgOrderB / avgOrderA - 1,
                    mode='lines'))
# adding a dashed horizontal reference line at y=0
fig_order_size.add_hline(y=0, line_dash="dash")

fig_order_size.update_layout(title="relative difference")

fig_order_size.show()

The relative difference between the segments spikes at several points. Aug 19 stands out again as the point of the largest change. There was also a large difference between Aug 2 and Aug 10: it rose until Aug 6 and then declined to almost zero.
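For reference, the plotted quantity is the ratio of the two groups' cumulative average order sizes minus one. A tiny worked example with made-up cumulative totals (the `cum` frame is hypothetical; the column names mirror the merged table):

```python
import pandas as pd

# Hypothetical cumulative totals on two consecutive days
cum = pd.DataFrame({
    'revenueA': [100.0, 250.0], 'ordersA': [4, 10],
    'revenueB': [90.0, 330.0],  'ordersB': [3, 11],
})

avg_a = cum['revenueA'] / cum['ordersA']  # cumulative average order size, group A
avg_b = cum['revenueB'] / cum['ordersB']  # cumulative average order size, group B

# Above 0: group B's average order is larger; below 0: group A's is larger
rel_diff = avg_b / avg_a - 1
print(rel_diff.round(3).tolist())
```

In this example both days give a value of about 0.2, meaning group B's average order is roughly 20% larger than group A's.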

## Calculate each group's conversion rate as the ratio of orders to the number of visits for each day. Plot the daily conversion rates of the two groups and describe the difference. Draw conclusions and make conjectures.

We'll analyze the cumulative conversion graph in the same way:

In [251]:
# calculating cumulative conversion
cumulativeData['conversion'] = cumulativeData['orders']/cumulativeData['visitors']

# selecting data on group A 
cumulativeDataA = cumulativeData[cumulativeData['group']=='A']

# selecting data on group B
cumulativeDataB = cumulativeData[cumulativeData['group']=='B']

In [252]:
# plotting the graphs
fig_conversion = go.Figure()

fig_conversion.add_trace(go.Scatter(
    x=cumulativeRevenueA['date'], y=cumulativeDataA['conversion'],
                    mode='lines',
                    name='A'))
# Plotting the group B conversion graph
fig_conversion.add_trace(go.Scatter(
    x=cumulativeRevenueB['date'], y=cumulativeDataB['conversion'],
                    mode='lines',
                    name='B'))
fig_conversion.update_layout(title="cumulative conversion")
fig_conversion.show()


After Aug 4 the two conversion curves move almost as mirror images of each other. Group B has the higher conversion from that date on.

## Plot a scatter chart of the number of orders per user. Make conclusions and conjectures.

Let's find the number of orders per user and print the result:

In [253]:
orders.sample(2)

Unnamed: 0,visitor_id,transaction_id,date,revenue,group
526,2201480904,1483298733,2019-08-29,10.2,B
325,1367935136,4243941421,2019-08-29,550.9,A


In [254]:
ordersByUsers = (
    orders.drop(['group', 'revenue', 'date'], axis=1)
    .groupby('visitor_id', as_index=False)
    .agg({'transaction_id': pd.Series.nunique})
)

ordersByUsers.columns = ['visitor_id', 'orders']

print(ordersByUsers.sort_values(by='orders', ascending=False).head(10))

     visitor_id  orders
0       5114589       1
645  2666467773       1
679  2785527055       1
680  2801638130       1
681  2803209166       1
682  2805132315       1
683  2819072957       1
684  2819363575       1
685  2830871062       1
686  2837914161       1


In [255]:
x_values = pd.Series(range(0,len(ordersByUsers)))
fig = px.scatter(ordersByUsers, x=x_values, y=ordersByUsers['orders'])
fig.show()

No user has more than one order: we removed the repeat orders and kept only each user's first order, as intended.

## Calculate the 95th and 99th percentiles for the number of orders per user. Define the point at which a data point becomes an anomaly.

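Each user has exactly one order because we removed the rest, so the 95th and 99th percentiles both equal 1 and no user is anomalous by order count. We treat more than one order per user as bad practice in an A/B test.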

## Plot a scatter chart of order prices. Make conclusions and conjectures.

In [256]:
x_values = pd.Series(range(0,len(orders)))
fig = px.scatter(orders, x=x_values, y=orders['revenue'])
fig.show()

There is one extreme value: a single order worth about 20,000. This is probably the order from Aug 19. Let's verify:

In [257]:
orders.query('revenue > 19000')

Unnamed: 0,visitor_id,transaction_id,date,revenue,group
463,1920142716,590470918,2019-08-19,19920.4,B


Correct: the order was placed on Aug 19 in group B.

## Calculate the 95th and 99th percentiles of order prices. Define the point at which a data point becomes an anomaly.

In [258]:
np.percentile(orders['revenue'], [90, 95, 99])

array([285.  , 435.6 , 896.71])

No more than 5% of orders cost more than 435.6, and no more than 1% cost more than 896.71. Let's set the anomaly threshold at 400, which lies between the 90th and 95th percentiles.
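The way a percentile turns into a cut-off can be shown on a toy array. This is a sketch with made-up order values (`revenue` here is hypothetical, not the notebook's column), using the 95th percentile as the threshold:

```python
import numpy as np

# Hypothetical order values: mostly small, one very large
revenue = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 1000], dtype=float)

# np.percentile interpolates linearly between neighbouring values by default
p95, p99 = np.percentile(revenue, [95, 99])

# Orders above the chosen cut-off are treated as anomalies
threshold = p95
is_anomaly = revenue > threshold
print(p95, p99, revenue[is_anomaly])
```

Choosing a round number slightly below a percentile, as done above with 400, is a judgment call; the mask logic stays the same.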

## Find the statistical significance of the difference in conversion between the groups using the raw data. Make conclusions and conjectures.

Let's print the relative conversion gain for group B: group B conversion / group A conversion - 1. We'll round the value to three decimal places.

In [259]:
ordersByUsersA = orders_raw[orders_raw['group']=='A'].groupby(
    'visitor_id', as_index=False).agg({'transaction_id' : pd.Series.nunique})
ordersByUsersA.columns = ['visitor_id', 'orders']

ordersByUsersB = orders_raw[orders_raw['group']=='B'].groupby(
    'visitor_id', as_index=False).agg({'transaction_id' : pd.Series.nunique})
ordersByUsersB.columns = ['visitor_id', 'orders']

sampleA = pd.concat([ordersByUsersA['orders'],pd.Series(
    0, index=np.arange(
        visits[visits['group']=='A']['visits'].sum() - len(
            ordersByUsersA['orders'])), name='orders')],axis=0)

sampleB = pd.concat([ordersByUsersB['orders'],pd.Series(
    0, index=np.arange(
        visits[visits['group']=='B']['visits'].sum() - len(
            ordersByUsersB['orders'])), name='orders')],axis=0)

print("{0:.3f}".format(st.mannwhitneyu(sampleA, sampleB)[1]))

print("{0:.3f}".format(sampleB.mean()/sampleA.mean()-1)) 

0.017
0.138


Conclusion: based on the raw data, there is a statistically significant difference in conversion between groups A and B.
The first row of the output is the p-value, 0.017, which is smaller than 0.05, so we reject the null hypothesis that conversion is the same in both groups. The relative gain of group B is about 14% (the second row of the output).
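The sample construction used for this test pads the buyers' order counts with zeros, one for each visit that did not end in an order, so the sample mean equals the conversion rate. A minimal sketch with a hypothetical helper (`build_sample` and the input values are made up for illustration):

```python
import numpy as np
import pandas as pd
from scipy import stats as st

def build_sample(orders_per_buyer, total_visits):
    """Orders per visit: buyers keep their counts, all other visits get 0."""
    n_zeros = total_visits - len(orders_per_buyer)
    zeros = pd.Series(0, index=np.arange(n_zeros), name='orders')
    return pd.concat([orders_per_buyer, zeros], axis=0, ignore_index=True)

# Hypothetical order counts for the buyers of each group
buyers_a = pd.Series([1, 1, 2], name='orders')
buyers_b = pd.Series([1, 1, 1, 3, 1], name='orders')

sample_a = build_sample(buyers_a, total_visits=100)
sample_b = build_sample(buyers_b, total_visits=100)

# Stating the alternative explicitly avoids relying on a version-specific default
p_value = st.mannwhitneyu(sample_a, sample_b, alternative='two-sided').pvalue
relative_gain = sample_b.mean() / sample_a.mean() - 1
print("{0:.3f}".format(p_value))
print("{0:.3f}".format(relative_gain))
```

Older SciPy releases used a different default for `alternative`, so p-values from this sketch are not necessarily comparable with the ones printed in this notebook.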

## Find the statistical significance of the difference in average order size between the groups using the raw data. Make conclusions and conjectures.

To calculate the statistical significance of the difference in the segments' average order size, we'll pass the revenue data to the mannwhitneyu() test.
We'll also find the relative difference in average order size between the groups:

In [260]:
print('{0:.3f}'.format(st.mannwhitneyu(
    orders_raw[orders_raw['group']=='A']['revenue'], orders[orders['group']=='B']['revenue'])[1]))
print('{0:.3f}'.format(
    orders_raw[orders_raw['group']=='B']['revenue'].mean()/orders[orders['group']=='A']['revenue'].mean()-1))

0.960
0.228


Conclusion: based on the raw data, there is no statistically significant difference in average order size between groups A and B. The first row of the output is the p-value, 0.960, which is much greater than 0.05, so we can't reject the null hypothesis that the average order size is the same in both groups. The relative difference still shows group B about 23% higher (the second row of the output), which is likely driven by a few very large orders such as the 19,920.4 order in group B.
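Because the notebook keeps two order tables (`orders_raw` and the deduplicated `orders`), it is easy to mix them inside one comparison. A hypothetical helper (`compare_order_size` is not part of the original notebook) that takes a single frame keeps both groups on the same dataset:

```python
import pandas as pd
from scipy import stats as st

def compare_order_size(df):
    """Return (p-value, relative difference of B vs A) from one orders frame."""
    rev_a = df.loc[df['group'] == 'A', 'revenue']
    rev_b = df.loc[df['group'] == 'B', 'revenue']
    p_value = st.mannwhitneyu(rev_a, rev_b, alternative='two-sided').pvalue
    return p_value, rev_b.mean() / rev_a.mean() - 1

# Hypothetical orders; any frame with 'group' and 'revenue' columns would do
demo = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'revenue': [10.0, 20.0, 30.0, 15.0, 25.0, 50.0],
})

p_value, rel_diff = compare_order_size(demo)
print('{0:.3f}'.format(p_value))
print('{0:.3f}'.format(rel_diff))
```

Calling it once with the raw table and once with the filtered table would give two results that are each internally consistent.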

## Find the statistical significance of the difference in conversion between the groups using the filtered data. Make conclusions and conjectures.

As before, but now we first filter out the anomalous users.

In [261]:
usersWithExpensiveOrders = orders[orders['revenue'] > 400]['visitor_id'].sort_values()
# abnormalUsers = pd.concat(
#     [usersWithManyOrders, usersWithExpensiveOrders], axis = 0).drop_duplicates().sort_values()
print(usersWithExpensiveOrders.head(5))
print(usersWithExpensiveOrders.shape) 

19    113298937
26    148427295
29    154576532
47    204675465
86    358944393
Name: visitor_id, dtype: int64
(62,)


We have 62 anomalous users in total.
Let's try to find out how their actions affected the test results. We'll calculate the statistical significance of the differences in conversion between the groups, using filtered data. First we'll prepare samples of the number of orders per user for each test group:

In [262]:
sampleAFiltered = pd.concat(
    [ordersByUsersA[np.logical_not(
        ordersByUsersA['visitor_id'].isin(usersWithExpensiveOrders))]['orders'],pd.Series(
        0, index=np.arange(
            visits[visits['group']=='A']['visits'].sum() - len(
                ordersByUsersA['orders'])),name='orders')],axis=0)

sampleBFiltered = pd.concat(
    [ordersByUsersB[np.logical_not(
        ordersByUsersB['visitor_id'].isin(usersWithExpensiveOrders))]['orders'],pd.Series(
        0, index=np.arange(
            visits[visits['group']=='B']['visits'].sum() - len(
                ordersByUsersB['orders'])),name='orders')],axis=0)

Let's apply the Mann-Whitney test to the resulting samples:

In [263]:
print("{0:.3f}".format(st.mannwhitneyu(sampleAFiltered, sampleBFiltered)[1]))
print("{0:.3f}".format(sampleBFiltered.mean()/sampleAFiltered.mean()-1))

0.032
0.132


The conversion results barely changed. The p-value increased slightly, from 0.017 to 0.032, but it is still below the significance level of 0.05.

## Find the statistical significance of the difference in average order size between the groups using the filtered data. Make conclusions and conjectures.

In [264]:
print('{0:.3f}'.format(st.mannwhitneyu(
    orders[np.logical_and(
        orders['group']=='A',
        np.logical_not(orders['visitor_id'].isin(usersWithExpensiveOrders)))]['revenue'],
    orders[np.logical_and(
        orders['group']=='B',
        np.logical_not(orders['visitor_id'].isin(usersWithExpensiveOrders)))]['revenue'])[1]))

print('{0:.3f}'.format(
    orders[np.logical_and(orders['group']=='B',np.logical_not(orders['visitor_id'].isin(usersWithExpensiveOrders)))]['revenue'].mean()/
    orders[np.logical_and(
        orders['group']=='A',
        np.logical_not(orders['visitor_id'].isin(usersWithExpensiveOrders)))]['revenue'].mean() - 1))

0.707
-0.062


The p-value decreased from 0.960 on the raw data to 0.707, but it is still far greater than 0.05, so we again can't reject the null hypothesis that the average order size is the same in both groups. The relative difference changed sign: instead of a large relative gain, group B now shows a relative loss of about 6% (the second row of the output).

## Make a decision based on the test results. The possible decisions are: 1. Stop the test, consider one of the groups the leader. 2. Stop the test, conclude that there is no difference between the groups. 3. Continue the test.

Here are the facts:
- Both raw and filtered data demonstrate statistically significant differences in conversion between the groups.
- Neither raw nor filtered data revealed any statistically significant differences in average order size between the groups.

- The cumulative conversion graph shows that group A's conversion is lower and stays roughly 0.03 below group B's:

In [265]:
fig_conversion.show()

- We can't draw a conclusion about the average order size, because the difference between the groups is not statistically significant.

Based on these facts, we conclude that the test is partly successful and should not be stopped yet. We should continue it for another period and then check again whether there is a statistically significant difference in average order size between the groups.