# Beat the Boombox?
## Prime versus Nonprime

If we can transform nonprime prices into prime prices, we can reduce the complexity of the problem.

The problem we want to reduce can be stated as follows:
at every point in time, the buybox can be won by an arbitrary number of nonprime listings,
by an arbitrary number of prime listings, or by any combination of both.

Let's find the transformation with a univariate analysis of the price.
We do that because it is a simple way to get closer to the truth.

So we do not distinguish by seller ID. Instead, we aggregate the buybox-winning listings by their mean price, split into prime and nonprime listings.

The result will be a correctly forward-filled time series.
You have to be careful with your `np.nan`, `0` and `None` values.
They all behave differently, and you have to convert between them several times.
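
As a small illustration of why the three placeholders need deliberate handling, the sketch below uses a made-up price series (not data from the listener database) and shows how pandas treats each of them in a mean and in a forward fill.

```python
import numpy as np
import pandas as pd

# made-up prices; not taken from the price_monitor table
with_nan = pd.Series([10.0, np.nan, 20.0])
with_zero = pd.Series([10.0, 0.0, 20.0])
with_none = pd.Series([10.0, None, 20.0])  # None becomes NaN in a float Series

mean_nan = with_nan.mean()    # NaN is skipped: (10 + 20) / 2
mean_zero = with_zero.mean()  # 0 counts as a real price: (10 + 0 + 20) / 3
mean_none = with_none.mean()  # behaves like the NaN case

# forward fill overwrites NaN but leaves 0 untouched
filled_nan = with_nan.ffill().tolist()
filled_zero = with_zero.ffill().tolist()

print(mean_nan, mean_zero, mean_none)
print(filled_nan, filled_zero)
```

This is why the notebook switches between `0` and `np.nan` depending on whether the next step is a mean or a forward fill.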

In [None]:
import sqlite3
import warnings

import altair as alt
import numpy as np
import pandas as pd
from mlrepricer import match, setup, helper
alt.data_transformers.enable('default', max_rows=1000000)

In [None]:
cnx = sqlite3.connect(f"{setup.configs['datafolder']}listenerdb.sqlite")
df = pd.read_sql_query("SELECT * FROM price_monitor", cnx)

In [None]:
# make two helper columns
# we are aiming for two rows for each asin at each point in time
df = df[df.isbuyboxwinner==1].copy()  # copy so the column assignments below do not hit a view
df['prime_price'] = np.where(df['isprime']==1, df['price'], np.nan)
df['nonprime_price'] = np.where(df['isprime']==0, df['price'], np.nan)

In [None]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = df.groupby(['asin', 'time_changed']).agg({'prime_price': np.nanmean, 'nonprime_price': np.nanmean}).sort_index()
    # fill NaN with 0 as a placeholder; it gets removed again later
    # very important: without it, the forward fill would carry an old prime price forward when no prime offer won the buybox
    result = result.fillna(0)

In [None]:
# this is wide-form data; I think it is more intuitive and easier to ffill
# note: asfreq needs a DatetimeIndex, so time_changed may first have to be converted with pd.to_datetime
crazy = result.reset_index().pivot(index='time_changed', columns='asin').ffill().asfreq('1min', method='ffill')
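
To see what the `0` placeholder buys us before the forward fill, here is a minimal sketch for a single made-up ASIN. The timestamps, prices and the names `toy`, `naive` and `marked` are illustrative assumptions, not part of the listener data.

```python
import numpy as np
import pandas as pd

# made-up buybox history for one ASIN; timestamps and prices are illustrative
toy = pd.DataFrame(
    {"prime_price": [10.0, np.nan, 12.0],
     "nonprime_price": [np.nan, 9.0, np.nan]},
    index=pd.to_datetime(["2018-05-09 23:00", "2018-05-09 23:02", "2018-05-09 23:04"]),
)

# without a placeholder, ffill drags the old prime price into 23:02,
# although only a nonprime listing held the buybox at that time
naive = toy.ffill().asfreq("1min", method="ffill")

# with the 0 placeholder, "no winner of this kind" survives the forward fill
# and is turned back into NaN afterwards
marked = toy.fillna(0).asfreq("1min", method="ffill").replace(0, np.nan)

print(naive)
print(marked)
```

In `naive` the prime price at 23:02 is still 10.0, while in `marked` it is NaN, which is the state we actually observed.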

In [None]:
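# optional: cache the long-form frame on disk (longform is built in the melt cell further down)
# note: to_msgpack was removed in pandas 1.0, so this cell needs an older pandas
longform.to_msgpack('/home/flo/asin_1min')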

In [None]:
longform.rename({'value': 'price'}, inplace=True, axis=1)

In [None]:
# optional: load the cached frame again (read_msgpack was removed in pandas 1.0)
df2 = pd.read_msgpack('/home/flo/asin_1min')

In [None]:
result = df2.unstack().reset_index()

In [None]:
# for altair, long-form data is preferred
longform = crazy.reset_index().melt('time_changed')
# check subsets of your data: are they plausible?
# longform[longform.time_changed=='2018-05-09 23:18:11.862'].dropna()

In [None]:
# we use the fact that the subsets have the same length and we can merge them this way
base = pd.DataFrame()
base['nonprime_price'] = longform[longform[None]=='nonprime_price']['value'].values
base['prime_price'] = longform[longform[None]=='prime_price']['value'].values
base['time_changed'] = longform[longform[None]=='prime_price']['time_changed'].values
# wanna have a look at a subset?
# base[base.time_changed=='2018-05-09 23:18:11.862'].dropna()

In [None]:
# the zeros are the placeholders we filled in before the forward fill; turn them back into NaN
base = base.replace(0, np.nan)
# we are only interested in those points in time where a prime and a nonprime listing share the buybox
base = base[base.nonprime_price.notna() & base.prime_price.notna()]

In [None]:
m = base
import statsmodels.api as sm

X = m['prime_price']
y = m['nonprime_price']
X = sm.add_constant(X)

model = sm.OLS(y, X).fit()
predictions = model.predict(X) # make the predictions by the model

# Print out the statistics
model.summary()
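
As a quick cross-check on the slope and intercept reported in the summary, a plain least-squares line can also be fitted with `np.polyfit`. This is a swapped-in technique, not the statsmodels fit above, and the points below are synthetic: they are generated to lie exactly on the line with the coefficients that are hard-coded for the regression plot further down.

```python
import numpy as np

# made-up points lying exactly on the line used for the plot (not real buybox data)
prime = np.arange(1.0, 22.0)
nonprime = 0.859 * prime - 0.5670

# a degree-1 polyfit returns the coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(prime, nonprime, 1)
print(round(slope, 3), round(intercept, 3))
```

On real data you would pass `base['prime_price']` and `base['nonprime_price']` instead and compare the two numbers with the `const` and `prime_price` rows of the summary.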

In [None]:
price = alt.Chart(base).mark_point().encode(
    y=alt.Y('nonprime_price'),
    x=alt.X('prime_price', scale=alt.Scale(zero=False)))

x = np.arange(22)
# put in the slope and intercept from your own statsmodels fit here
data = pd.DataFrame({'nonprime_price': x*0.859-0.5670,
                     'prime_price': x})

regr = alt.Chart(data).mark_line().encode(
    x='prime_price',
    y='nonprime_price'
)

In [None]:
regr + price
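
Once the fit is available, the transformation from the opening paragraph is just the regression line and its inverse. The sketch below hard-codes the slope 0.859 and intercept -0.5670 used for the plotted line; the function names are hypothetical, and the coefficients should be replaced by the values your own `model.summary()` reports.

```python
SLOPE = 0.859        # coefficient on prime_price used for the regression line above
INTERCEPT = -0.5670  # constant term used for the regression line above


def prime_to_nonprime(prime_price):
    """Map a prime price to the nonprime price on the fitted line."""
    return SLOPE * prime_price + INTERCEPT


def nonprime_to_prime(nonprime_price):
    """Invert the fitted line: map a nonprime price to its prime equivalent."""
    return (nonprime_price - INTERCEPT) / SLOPE


example = prime_to_nonprime(20.0)
roundtrip = nonprime_to_prime(example)
print(example, roundtrip)
```

With such a mapping, every nonprime listing can be expressed on the prime price scale, so both kinds of listings can be compared on one axis.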