# E-Commerce Case Study EDA notebook

## Data preparation

### Step 1
#### Importing the dataset

In [70]:
import pandas as pd

path = '../data/dummy_data_case_study.csv'
df = pd.read_csv(path)
df.head()

Unnamed: 0,Week Ending,asin,orderedRevenueAmount,orderedUnits,ASP,Category,subcategory,Marketing spend,Views
0,1/6/2024,99345,1902.66,95,20.028,a,aa,3741,111
1,1/6/2024,91686,224.15,12,18.679167,b,bb,2309,185
2,1/6/2024,90798,437.74,9,48.637778,c,cc,4781,253
3,1/6/2024,28305,4.95,1,4.95,d,dd,1643,1069
4,1/6/2024,52947,13.49,1,13.49,a,ee,3206,347


### Step 2
#### Preliminary data cleaning

In [71]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14717 entries, 0 to 14716
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Week Ending           14717 non-null  object 
 1   asin                  14717 non-null  int64  
 2   orderedRevenueAmount  14717 non-null  object 
 3   orderedUnits          14717 non-null  int64  
 4   ASP                   14717 non-null  float64
 5   Category              14717 non-null  object 
 6   subcategory           14717 non-null  object 
 7   Marketing spend       14717 non-null  int64  
 8   Views                 14717 non-null  int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 1.0+ MB


In [72]:
def clean_currency(x):
    return float(x.replace('$', '').replace(',', '').strip())
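The helper above assumes that every value is a string and fails on anything it cannot parse. As an alternative, here is a minimal vectorized sketch that turns unparseable values into NaN instead of raising. The function name `clean_currency_column` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def clean_currency_column(values: pd.Series) -> pd.Series:
    """Strip '$', thousands separators and whitespace, then convert to float.

    Values that still cannot be parsed become NaN instead of raising.
    """
    stripped = (
        values.astype(str)
        .str.replace(r"[$,]", "", regex=True)
        .str.strip()
    )
    return pd.to_numeric(stripped, errors="coerce")


# Made-up sample values, not taken from the dataset
sample = pd.Series(["$1,902.66", " 224.15 ", "n/a", 13.49])
cleaned = clean_currency_column(sample)
print(cleaned)
```

Counting the NaN values in the result is a quick way to see how many rows did not parse before continuing with the analysis.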


In [73]:
df['orderedRevenueAmount'] = df['orderedRevenueAmount'].apply(clean_currency)
df['orderedRevenueAmount'] = pd.to_numeric(df['orderedRevenueAmount'])
df['Week Ending'] = pd.to_datetime(df['Week Ending'])

# Create a new column calculating the difference between weekly revenue and Marketing Spend
df['revenueMinusMarketing'] = df['orderedRevenueAmount'] - df['Marketing spend']

df.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14717 entries, 0 to 14716
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   Week Ending            14717 non-null  datetime64[ns]
 1   asin                   14717 non-null  int64         
 2   orderedRevenueAmount   14717 non-null  float64       
 3   orderedUnits           14717 non-null  int64         
 4   ASP                    14717 non-null  float64       
 5   Category               14717 non-null  object        
 6   subcategory            14717 non-null  object        
 7   Marketing spend        14717 non-null  int64         
 8   Views                  14717 non-null  int64         
 9   revenueMinusMarketing  14717 non-null  float64       
dtypes: datetime64[ns](1), float64(3), int64(4), object(2)
memory usage: 1.1+ MB


Check if null values are present in the dataset

In [74]:
df.isnull().sum()

Week Ending              0
asin                     0
orderedRevenueAmount     0
orderedUnits             0
ASP                      0
Category                 0
subcategory              0
Marketing spend          0
Views                    0
revenueMinusMarketing    0
dtype: int64

## Exploration of data

### Step 3
#### Show products that were sold for only a few weeks, regardless of revenue minus marketing

In [75]:
prod_count= df.groupby('asin').size().sort_values(ascending=True).reset_index(name='asin_count')
prod_count

Unnamed: 0,asin,asin_count
0,111,1
1,24087,1
2,24531,1
3,26529,1
4,26862,1
...,...,...
1000,70596,25
1001,70263,25
1002,70152,25
1003,62937,25


In [76]:
# List the ASINs of products with fewer than 4 weeks of records
filtered_asin_list = prod_count[prod_count['asin_count'] < 4]['asin'].tolist()
len(filtered_asin_list)

234

Of the 1005 products, 234 were sold in fewer than 4 weeks. They could be discarded as outliers, or kept until records from future weeks provide more information about their sales.
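If the short-history products are set aside rather than deleted, the split could look like the sketch below. It uses a made-up toy frame. `MIN_WEEKS`, `established` and `short_history` are hypothetical names, and the threshold of 4 simply mirrors the one used above.

```python
import pandas as pd

MIN_WEEKS = 4  # same threshold as above; an assumption to revisit

# Toy data (made up): product 1 has 4 weekly records, product 2 has 2
toy = pd.DataFrame({
    "asin": [1, 1, 1, 1, 2, 2],
    "orderedUnits": [5, 3, 4, 6, 1, 2],
})

# Number of weekly records per product, aligned to every row
weeks_per_asin = toy["asin"].map(toy["asin"].value_counts())

established = toy[weeks_per_asin >= MIN_WEEKS]    # kept for the analysis
short_history = toy[weeks_per_asin < MIN_WEEKS]   # set aside, not deleted

print(established["asin"].unique(), short_history["asin"].unique())
```

Keeping `short_history` as a separate frame makes it easy to fold those products back in once more weeks of data arrive.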

In [77]:
# Display rows from the original dataframe where 'asin' is in the filtered list and 'revenueMinusMarketing' is positive
result_df = df[(df['asin'].isin(filtered_asin_list)) & (df['revenueMinusMarketing'] > 0)]
result_df

Unnamed: 0,Week Ending,asin,orderedRevenueAmount,orderedUnits,ASP,Category,subcategory,Marketing spend,Views,revenueMinusMarketing
14422,2024-06-22,111444,3799.62,38,99.99,c,cc,2601,988,1198.62


Only one product with low sales consistency had a positive revenue minus marketing. The corner case of low sales consistency combined with a positive revenue minus marketing is therefore almost absent from the data provided.

### Step 4 
#### Check for general sales consistency
How many products were sold for each number of weeks?

In [78]:
prod_weeks_count = prod_count.groupby('asin_count').size().reset_index(name='count')
prod_weeks_count

Unnamed: 0,asin_count,count
0,1,119
1,2,72
2,3,43
3,4,34
4,5,22
5,6,31
6,7,29
7,8,23
8,9,23
9,10,15


### Step 5
#### Exploring proportion of products with revenue higher than marketing spending

In [79]:
# 1. Count distinct products
print("There are {} different products in dataset".format(df['asin'].nunique()))


# 2. Filter records with a positive calculated difference
positive_diff_df = df[df['revenueMinusMarketing'] > 0]
print('From {} total records, {} of them have a revenue higher than the marketing expenses. That is {:.2f}% of the records.'
      .format(df.shape[0], positive_diff_df.shape[0], 100*positive_diff_df.shape[0]/df.shape[0]))

# 3. Count products with at least one week of positive revenue minus marketing
pos_count_group_df = positive_diff_df.groupby('asin').size().sort_values(ascending = False).reset_index(name='positive_weeks')
print('There are {} products with at least one week of positive revenue minus marketing.'.format(pos_count_group_df.shape[0]))

There are 1005 different products in dataset
From 14717 total records, 2308 of them have a revenue higher than the marketing expenses. That is 15.68% of the records.
There are 238 products with at least one week of positive revenue minus marketing.


### Step 5 (continued)
#### Checking sales consistency of products with revenue higher than marketing spending
How many products are sold regularly and how many are not?

In [80]:
pos_weeks_count_df = pos_count_group_df.groupby('positive_weeks').size().reset_index(name='asin_count')
pos_weeks_count_df = pos_weeks_count_df.sort_values(by='positive_weeks', ascending=False)
pos_weeks_count_df

Unnamed: 0,positive_weeks,asin_count
24,25,24
23,24,6
22,23,5
21,22,2
20,21,1
19,20,5
18,19,6
17,18,7
16,17,6
15,16,6


## Correlation Analysis and Solutions

### Step 6
#### Correlation
A correlation matrix based on Pearson's coefficient shows how strongly each pair of variables moves together. When the matrix is computed over the whole dataset rather than for a specific product, the correlations between variables are low. Only redundant correlations appear: orderedUnits correlates with orderedRevenueAmount and with revenueMinusMarketing, both of which depend on the number of units sold.
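As a reminder of what each cell of such a matrix holds, the sketch below computes Pearson's coefficient by hand and compares it with pandas' `Series.corr`. The toy values are made up. Revenue is built as units times price, which is why the two columns correlate almost perfectly. `pearson_r` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return float((x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum()))


# Made-up toy data: revenue is units * price, so the correlation is "redundant"
toy = pd.DataFrame({
    "orderedUnits": [1, 2, 4, 7, 9],
    "ASP": [10.0, 10.5, 9.5, 10.0, 10.2],
})
toy["orderedRevenueAmount"] = toy["orderedUnits"] * toy["ASP"]

manual = pearson_r(toy["orderedUnits"], toy["orderedRevenueAmount"])
builtin = toy["orderedUnits"].corr(toy["orderedRevenueAmount"])
print(manual, builtin)
```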

In [81]:
# numeric_only=True skips the date and text columns (needed in pandas 2.0+)
df.corr(numeric_only=True)


Unnamed: 0,asin,orderedRevenueAmount,orderedUnits,ASP,Marketing spend,Views,revenueMinusMarketing
asin,1.0,0.077135,-0.055297,0.253007,0.003271,0.012854,0.075009
orderedRevenueAmount,0.077135,1.0,0.639308,0.204622,0.005253,0.023773,0.979858
orderedUnits,-0.055297,0.639308,1.0,-0.052021,0.009937,0.009912,0.625117
ASP,0.253007,0.204622,-0.052021,1.0,0.008034,0.012124,0.199108
Marketing spend,0.003271,0.005253,0.009937,0.008034,1.0,0.001011,-0.194546
Views,0.012854,0.023773,0.009912,0.012124,0.001011,1.0,0.023117
revenueMinusMarketing,0.075009,0.979858,0.625117,0.199108,-0.194546,0.023117,1.0


However, the same behavior cannot be expected for every single product, so the correlation is also computed per product to identify those with a positive correlation between marketing spending and revenue.

A good business question is: how long does it take for marketing spending to have an effect on revenue? To explore this, two kinds of correlation are computed for each product:
- Marketing spending in the current week versus revenue
- Marketing spending in the previous week versus revenue

Filtering for records with a high correlation gives a better idea of which products can help improve the company's revenue. Two caveats apply from the start:
- The more records a product has, the more reliable its correlation estimate is. Products that were sold in only one week do not provide enough information to compute a correlation.
- Correlation does not imply causation. Each product should be analyzed individually with the help of a dashboard and business context.
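The two kinds of correlation can be illustrated on a toy frame before applying them to the real data. The sketch below uses made-up numbers and shortened column names (`spend`, `revenue`, `week`), which are assumptions for illustration only. Product 2 is built so that a spending spike shows up in revenue one week later.

```python
import pandas as pd

# Made-up data: product 1 reacts in the same week, product 2 one week later
toy = pd.DataFrame({
    "asin": [1, 1, 1, 1, 2, 2, 2, 2],
    "week": [1, 2, 3, 4, 1, 2, 3, 4],
    "spend": [10, 20, 30, 40, 5, 5, 50, 5],
    "revenue": [12, 18, 33, 41, 7, 6, 8, 60],
}).sort_values(["asin", "week"])

# Previous week's spend within each product; the first week has no predecessor
toy["prev_spend"] = toy.groupby("asin")["spend"].shift(1)

# Same-week and previous-week correlation with revenue, one row per product
rows = {
    asin: {
        "same_week": group["spend"].corr(group["revenue"]),
        "prev_week": group["prev_spend"].corr(group["revenue"]),
    }
    for asin, group in toy.groupby("asin")
}
lag_corr = pd.DataFrame.from_dict(rows, orient="index")
print(lag_corr)
```

Shifting inside `groupby` matters: a plain `shift(1)` on the whole column would leak the last week of one product into the first week of the next.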

In [82]:
# Sort the dataframe by asin and date
df = df.sort_values(by=['asin','Week Ending'])

# Create a new column 'prev_mkt_spend' containing the previous week's mkt_spend for each asin
df['prev_mkt_spend'] = df.groupby('asin')['Marketing spend'].shift(1)

# Function to calculate statistics for each group
def calculate_stats(group):
    correlation = group['Marketing spend'].corr(group['revenueMinusMarketing'])
    correlation1 = group['Marketing spend'].corr(group['orderedRevenueAmount'])
    prev_correlation = group['prev_mkt_spend'].corr(group['revenueMinusMarketing'])
    prev_correlation1 = group['prev_mkt_spend'].corr(group['orderedRevenueAmount'])
    record_count = group.shape[0]
    prod_final_revenue = group['revenueMinusMarketing'].sum()
    return pd.Series({'correlation': correlation, 
                      'correlation_rev': correlation1, 
                      'prev_correlation': prev_correlation, 
                      'prev_correlation_rev': prev_correlation1, 
                      'record_count': record_count, 
                      'prod_final_revenue': prod_final_revenue})                     
                                                
# Group by 'asin' and calculate the statistics for each group
corr_df = df.groupby('asin').apply(calculate_stats).reset_index()


In [83]:
corr_df

Unnamed: 0,asin,correlation,correlation_rev,prev_correlation,prev_correlation_rev,record_count,prod_final_revenue
0,111,,,,,1.0,-1841.81
1,222,-0.999946,-0.119407,-0.177853,0.177878,13.0,-44866.80
2,333,-0.999946,-0.170123,0.092769,0.129190,20.0,-57178.99
3,444,-1.000000,0.274008,-0.052087,0.363338,9.0,-26254.88
4,555,-1.000000,-1.000000,,,2.0,-8462.18
...,...,...,...,...,...,...,...
1000,111111,,,,,1.0,-2620.01
1001,111222,-1.000000,-1.000000,,,2.0,-5451.03
1002,111333,-1.000000,-1.000000,,,2.0,-6950.04
1003,111444,,,,,1.0,1198.62
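The NaN and ±1 entries in this table follow directly from the number of records. With one record no correlation can be computed, and with two records the points always lie on a straight line. A minimal sketch with made-up numbers follows. Passing `min_periods=2` is an optional choice made here so that pandas returns NaN for the single-record case without going through NumPy.

```python
import pandas as pd

# Made-up records for a product seen in one week and in two weeks
one_record = pd.DataFrame({"spend": [100.0], "revenue": [40.0]})
two_records = pd.DataFrame({"spend": [100.0, 120.0], "revenue": [40.0, 35.0]})

# One record: not enough data, the result is NaN
r_one = one_record["spend"].corr(one_record["revenue"], min_periods=2)

# Two records: any two distinct points are perfectly collinear, so r is +1 or -1
r_two = two_records["spend"].corr(two_records["revenue"])

print(r_one, r_two)
```

This is why the later filters combine a correlation threshold with a minimum `record_count`.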


In [87]:
# Filter to keep only those with a positive correlation
positive_corr_df = corr_df[corr_df['correlation'] > 0]
negative_corr_df = corr_df[corr_df['correlation'] < 0]

positive_corr_rev_df = corr_df[corr_df['correlation_rev'] > 0]
negative_corr_rev_df = corr_df[corr_df['correlation_rev'] < 0]

positive_prev_corr_df = corr_df[corr_df['prev_correlation'] > 0]
negative_prev_corr_df = corr_df[corr_df['prev_correlation'] < 0]

positive_prev_corr_rev_df = corr_df[corr_df['prev_correlation_rev'] > 0]
negative_prev_corr_rev_df = corr_df[corr_df['prev_correlation_rev'] < 0]

# Sort by record_count, then correlation, then prod_final_revenue (direction set by the ascending flags)
sorted_pos_corr_df = positive_corr_df.sort_values(by=['record_count', 'correlation', 'prod_final_revenue'], ascending=False)
sorted_neg_corr_df = negative_corr_df.sort_values(by=['record_count', 'correlation', 'prod_final_revenue'], ascending=[False, True, True])

# Get the list of unique 'asin'
unique_asin_list = sorted_pos_corr_df['asin'].tolist()
unique_asin_neg_list = sorted_neg_corr_df['asin'].tolist()

print("List of unique asin with positive correlation, sorted by correlation, number of records, and revenue:\n", unique_asin_list)
print("\nDetailed DataFrame:\n", sorted_pos_corr_df)

List of unique asin with positive correlation, sorted by correlation, number of records, and revenue:
 [86247, 29748, 93462, 54501, 66933, 108447, 52170]

Detailed DataFrame:
        asin  correlation  correlation_rev  prev_correlation  \
776   86247     0.157206         0.328908          0.211663   
267   29748     0.027518         0.106555          0.557867   
841   93462     0.163361         0.274782          0.070467   
490   54501     0.072753         0.360047          0.105783   
602   66933     0.165462         0.306551          0.023414   
976  108447     0.139755         0.404532          0.445227   
469   52170     1.000000         1.000000               NaN   

     prev_correlation_rev  record_count  prod_final_revenue  
776              0.217841          25.0           359503.91  
267              0.543456          25.0          1440553.01  
841              0.112623          24.0             8421.46  
490              0.098650          24.0            20158.65  
602      

### 1. Products that should receive more marketing budget according to current marketing spending correlation.
To identify products whose revenue could exceed their marketing expenses, a strong positive correlation and a high number of records are needed.

In [119]:
# Correlation Revenue - Marketing
sol1_df = corr_df[(corr_df['record_count']>10.0) & (corr_df['correlation']>0.15)]
sol1_df = sol1_df.sort_values(by=['record_count', 'correlation', 'prod_final_revenue'], ascending=False)

sol1_df[['asin','record_count','correlation', 'prod_final_revenue', 'correlation_rev','prev_correlation','prev_correlation_rev']]


Unnamed: 0,asin,record_count,correlation,prod_final_revenue,correlation_rev,prev_correlation,prev_correlation_rev
776,86247,25.0,0.157206,359503.91,0.328908,0.211663,0.217841
841,93462,24.0,0.163361,8421.46,0.274782,0.070467,0.112623
602,66933,18.0,0.165462,246245.33,0.306551,0.023414,0.065027
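Correlations of about 0.16 from at most 25 weekly records are noisy estimates. One common way to gauge the uncertainty is an approximate 95% confidence interval based on the Fisher z-transformation. The sketch below is not part of the original analysis. It assumes roughly normal data and independent weeks, and `pearson_confidence_interval` is a hypothetical helper name.

```python
import math


def pearson_confidence_interval(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation.

    r: observed correlation, strictly between -1 and 1
    n: number of paired records, must be greater than 3
    z_crit: normal quantile, 1.96 for a 95% interval
    """
    if n <= 3:
        raise ValueError("at least 4 records are needed")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)                  # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Values taken from the first row of the table above (asin 86247)
low, high = pearson_confidence_interval(0.157206, 25)
print(round(low, 2), round(high, 2))
```

For r = 0.157206 and 25 records the interval runs from roughly -0.25 to 0.52, so it includes zero. Under these assumptions the positive correlation alone is weak evidence, and the products listed still need an individual review.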


In [120]:
# Correlation with Revenue
sol1_rev_df = corr_df[(corr_df['record_count']>20.0) & (corr_df['correlation_rev']>0.4)]
sol1_rev_df = sol1_rev_df.sort_values(by=['record_count', 'correlation_rev', 'prod_final_revenue'], ascending=False)
sol1_rev_df[['asin','record_count','correlation_rev', 'prod_final_revenue', 'correlation', 'prev_correlation','prev_correlation_rev']]

Unnamed: 0,asin,record_count,correlation_rev,prod_final_revenue,correlation,prev_correlation,prev_correlation_rev
264,29415,25.0,0.532019,-68719.58,-0.994734,-0.071109,0.248996
547,60828,25.0,0.499876,-66234.68,-0.998326,-0.181816,0.051644
735,81696,25.0,0.436715,25707.65,-0.557778,0.370013,0.110827
842,93573,25.0,0.425415,-50190.94,-0.95367,0.045355,0.026076
510,56721,25.0,0.405993,17142.4,-0.233348,0.086558,0.274119
791,87912,24.0,0.462324,-67936.86,-0.999187,-0.131261,-0.03294
54,6105,21.0,0.413748,-55065.37,-0.999378,-0.120854,0.255102


### 2. Products that should receive more marketing budget according to previous week marketing spending correlation. 

It is important to consider that the effects of marketing spending may not be observed immediately. In other words, an increase in marketing spending may raise revenue not in the week the money was spent but in the following one. This happens, for example, for ASINs 29415, 60828 and 81696.
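The notebook looks at a lag of one week only. If the delay could be longer, the same idea extends to several lags. The sketch below is a hypothetical extension: `lagged_correlations`, its parameters and the toy numbers are assumptions. The toy data is built so that revenue follows spending with a delay of two weeks.

```python
import pandas as pd


def lagged_correlations(group: pd.DataFrame, spend_col: str, target_col: str,
                        max_lag: int = 3) -> pd.Series:
    """Correlation of target_col with spend_col shifted by 0..max_lag weeks.

    `group` must hold one product's records, sorted by week.
    """
    return pd.Series(
        {lag: group[spend_col].shift(lag).corr(group[target_col])
         for lag in range(max_lag + 1)},
        name="correlation",
    )


# Made-up weekly data: spending spikes are followed by revenue spikes two weeks later
toy = pd.DataFrame({
    "spend":   [10, 50, 10, 10, 60, 10, 10, 40, 10, 10],
    "revenue": [20, 21, 20, 48, 22, 20, 55, 21, 19, 41],
})

by_lag = lagged_correlations(toy, "spend", "revenue")
best_lag = by_lag.idxmax()
print(by_lag)
print("lag with the highest correlation:", best_lag)
```

Each extra lag drops one record from the calculation, so long lags on short histories quickly become unreliable.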

In [121]:
# Use Previous Marketing Spend correlation with Revenue-Mkt
prev_corr_df = corr_df[(corr_df['prev_correlation'] > 0.55) & (corr_df['record_count'] > 10)]

# Sort the filtered DataFrame
sol2_df = prev_corr_df.sort_values(by=['record_count', 'prev_correlation', 'prod_final_revenue'], ascending=[False, True, True])
sol2_df[['asin','record_count','prev_correlation','prod_final_revenue', 'correlation','correlation_rev', 'prev_correlation_rev']]

Unnamed: 0,asin,record_count,prev_correlation,prod_final_revenue,correlation,correlation_rev,prev_correlation_rev
267,29748,25.0,0.557867,1440553.01,0.027518,0.106555,0.543456
317,35298,25.0,0.602858,-60210.31,-0.99375,0.040451,0.217858
722,80253,22.0,0.594771,-65215.56,-0.972812,0.304531,0.088442
474,52725,22.0,0.602801,-62756.47,-0.998692,-0.227937,0.071092
137,15318,16.0,0.586829,-48850.66,-0.992577,-0.356882,0.446415
14,1665,16.0,0.605297,-42447.66,-0.999971,-0.476702,0.245567
726,80697,15.0,0.569483,-28096.75,-0.845241,-0.012963,0.255446
45,5106,14.0,0.584113,-30813.06,-0.999837,-0.011874,-0.198132
619,68820,13.0,0.564404,-25768.78,-0.935736,-0.424127,0.130246
626,69597,11.0,0.578272,-12678.68,-0.625255,-0.300338,0.50684


In [122]:
# Use Previous Marketing Spend correlation with Revenue
prev_corr_df = corr_df[(corr_df['prev_correlation_rev'] > 0.5) & (corr_df['record_count'] > 10)]

# Sort the filtered DataFrame
sol2_rev_df = prev_corr_df.sort_values(by=['record_count', 'prev_correlation_rev', 'prod_final_revenue'], ascending=[False, True, True])
sol2_rev_df[['asin','record_count','prev_correlation_rev','prod_final_revenue', 'correlation','correlation_rev', 'prev_correlation']]

Unnamed: 0,asin,record_count,prev_correlation_rev,prod_final_revenue,correlation,correlation_rev,prev_correlation
258,28749,25.0,0.517287,18143.61,-0.649748,0.109692,0.342022
262,29193,25.0,0.539806,-57805.48,-0.978231,0.010849,-0.073561
267,29748,25.0,0.543456,1440553.01,0.027518,0.106555,0.557867
296,32967,12.0,0.522306,-34755.6,-0.999949,-0.255497,0.042727
202,22533,12.0,0.592441,-31880.76,-0.999992,0.070579,-0.27475
353,39294,12.0,0.616279,-31095.57,-0.999957,-0.615678,0.310981
626,69597,11.0,0.50684,-12678.68,-0.625255,-0.300338,0.578272
439,48840,11.0,0.596258,-35208.08,-0.99905,0.143482,-0.271695


Strong values in the prev_correlation field point to products with the potential to generate positive cash flow for the company if their marketing investment increases. The product with ASIN 19314 is an example where revenue became higher than marketing spending some weeks after a considerable increase in marketing expenses.

### 3. Products that should receive less marketing budget
To identify products whose revenue might not grow after spending on marketing, a strong negative correlation is needed, as well as a sufficient number of records.

In [123]:
# Use Current Marketing Spend negative correlation with Revenue-Mkt
neg_corr_df = corr_df[(corr_df['correlation'] < -0.5) & (corr_df['record_count'] > 24)]
# Sort the filtered DataFrame
sol3_df = neg_corr_df.sort_values(by=['record_count', 'correlation', 'prod_final_revenue'], ascending=[False, True, True])
sol3_df[['asin','record_count','correlation', 'prod_final_revenue', 'correlation_rev','prev_correlation','prev_correlation_rev']]


Unnamed: 0,asin,record_count,correlation,prod_final_revenue,correlation_rev,prev_correlation,prev_correlation_rev
226,25197,25.0,-0.999875,-75303.92,0.322872,-0.172485,-0.161835
284,31635,25.0,-0.999646,-83559.01,-0.155555,0.075765,0.032824
832,92463,25.0,-0.999552,-78543.07,-0.178364,0.202915,0.300554
434,48285,25.0,-0.999503,-69306.09,0.069955,-0.255347,-0.051201
380,42291,25.0,-0.999502,-72726.84,0.050228,0.080903,-0.267883
...,...,...,...,...,...,...,...
397,44178,25.0,-0.552529,-7513.40,0.027889,0.147349,0.098192
325,36186,25.0,-0.548574,76774.17,0.012220,-0.144492,-0.149740
545,60606,25.0,-0.543634,109757.26,0.037259,0.268987,0.231733
399,44400,25.0,-0.509401,-15150.19,-0.241533,0.331508,0.223463
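Many values in the `correlation` column are close to -1 while `correlation_rev` stays near zero. One possible explanation, offered here as an interpretation rather than something the notebook states, is mechanical. Because revenueMinusMarketing subtracts the spend itself, a product whose revenue varies little compared with its spend has a difference that is almost a mirror image of the spend. The sketch below shows this with made-up random data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
weeks = 25

# Made-up data: spend varies by thousands, revenue is small and unrelated to spend
spend = pd.Series(rng.uniform(1500, 5000, size=weeks))
revenue = pd.Series(rng.uniform(0, 50, size=weeks))

corr_rev = spend.corr(revenue)           # spend vs revenue
corr_net = spend.corr(revenue - spend)   # spend vs revenue minus spend

print(round(corr_rev, 3), round(corr_net, 3))
```

If this applies, a correlation near -1 with revenue minus marketing mostly says that revenue is small relative to spend, which `prod_final_revenue` already shows. The `correlation_rev` column is then the more informative signal of whether spending moves revenue.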


In [125]:
# Use Previous Marketing Spend negative correlation with Revenue-Mkt
neg_corr_df = corr_df[(corr_df['prev_correlation'] < -0.45) & (corr_df['record_count'] > 20)]
# Sort the filtered DataFrame
sol3_prev_df = neg_corr_df.sort_values(by=['record_count', 'prev_correlation', 'prod_final_revenue'], ascending=[False, True, True])
sol3_prev_df[['asin','record_count','prev_correlation', 'prod_final_revenue', 'correlation','correlation_rev','prev_correlation_rev']]


Unnamed: 0,asin,record_count,prev_correlation,prod_final_revenue,correlation,correlation_rev,prev_correlation_rev
530,58941,25.0,-0.625513,-65890.48,-0.996696,-0.155056,-0.252421
936,104007,25.0,-0.611577,11755.58,-0.655359,-0.303922,-0.522042
888,98679,25.0,-0.499944,143054.64,-0.15544,-0.066884,-0.509917
849,94350,25.0,-0.489603,-35068.82,-0.868489,0.050565,-0.283023
173,19314,25.0,-0.473993,-8985.75,-0.824176,0.345637,0.155731
839,93240,25.0,-0.463743,-42758.37,-0.678787,-0.239269,-0.348532
684,76035,25.0,-0.462509,-48359.3,-0.86753,0.070073,-0.360241


In [128]:
print('ASINs to be studied because of a high correlation between marketing spending and revenue minus marketing:\n{}\nInvesting in marketing for these products could help increase total revenue\n'
      .format(sol1_df['asin'].values))
print('ASINs to be studied because of a high correlation between marketing spending and revenue:\n{}\nInvesting in marketing for these products could help increase total revenue'
      .format(sol1_rev_df['asin'].values))

ASINs to be studied because of a high correlation between marketing spending and revenue minus marketing:
[86247 93462 66933]
Investing in marketing for these products could help increase total revenue

ASINs to be studied because of a high correlation between marketing spending and revenue:
[29415 60828 81696 93573 56721 87912  6105]
Investing in marketing for these products could help increase total revenue


In [129]:
print('ASINs to be studied because of a high correlation between marketing spending of the previous week and revenue minus marketing:\n{}\nInvesting in marketing for these products could help increase total revenue\n'
      .format(sol2_df['asin'].values))
print('ASINs to be studied because of a high correlation between marketing spending of the previous week and revenue:\n{}\nInvesting in marketing for these products could help increase total revenue'
      .format(sol2_rev_df['asin'].values))

ASINs to be studied because of a high correlation between marketing spending of the previous week and revenue minus marketing:
[ 29748  35298  80253  52725  15318   1665  80697   5106  68820  69597
   4440 108336   7437]
Investing in marketing for these products could help increase total revenue

ASINs to be studied because of a high correlation between marketing spending of the previous week and revenue:
[28749 29193 29748 32967 22533 39294 69597 48840]
Investing in marketing for these products could help increase total revenue


In [131]:
print('ASINs to be studied because of a negative correlation between marketing spending and revenue minus marketing:\n{}\nInvesting in marketing for these products is not a good idea since a lot of resources have been invested unsuccessfully to increase revenue.\n'
      .format(sol3_df['asin'].values))
print('ASINs to be studied because of a negative correlation between marketing spending of the previous week and revenue minus marketing:\n{}\nInvesting in marketing for these products is not a good idea since a lot of resources have been invested unsuccessfully to increase revenue.'
      .format(sol3_prev_df['asin'].values))

ASINs to be studied because of a negative correlation between marketing spending and revenue minus marketing:
[ 25197  31635  92463  48285  42291  37407   2664  51060  91797  42402
  92685  40071  82473  38406  60828  87801  78255  47397  53280  91686
  92352  68043  67488  47730  77589  58941  99789  78366  19536  46176
  67155  55500 103341  58719  13764   6660  64047  80364  57498  29415
  44511  99012  56388 103452  55722   8991  35298  55056  58053  95349
  78033  60162  39627  56832  78921  83028  21201  79365  39849  35409
  36963  79143  83583  96459  86913  86802  45954  88023  87135  90798
  10878  79254  54279  75147  95904  55167  45066  62937  45510  38073
  97791  57942  84249  79587 103563  44622  59052  78144  82917  35964
 102786  37629  70263  42846  48174  41070  55944  79032  40515  40182
  71817  98568  85137  65157  73815  83472  29193  29526  87468  83694
 103674  64269  86691  46731  70596  44289  57276  99123  54168  41736
  81474  59496  81252 101565  71706  59385 100122  4