# eBay 

The dataset, `ebay_ho2.parquet`, contains one row per item sold on eBay through the *Best Offer* platform.

The data come from https://www.nber.org/research/data/best-offer-sequential-bargaining and accompany the paper “Sequential Bargaining in the Field: Evidence from Millions of Online Interactions” by Backus, Blake, Larsen & Tadelis (2020), published in the *Quarterly Journal of Economics*.

In [None]:
import pandas as pd 
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt 
from statsmodels.distributions.empirical_distribution import ECDF
from scipy.optimize import minimize

In [None]:
dat = pd.read_parquet('ebay_ho2.parquet')

In [None]:
dat.shape

Variable labels in human-readable form:

In [None]:
var_labels = {'anon_item_id': 'Anonymized listing ID',
              'anon_leaf_categ_id': 'Anonymized leaf category ID, a finer categorization than meta category',
              'fdbk_pstv_start': "Seller's percent positive feedback score at the time of listing",
              'start_price_usd': 'Buy-It-Now price', 'photo_count': 'Number of photos in listing',
              'to_lst_cnt': 'Number of listings created by the seller dating back to 2008',
              'bo_lst_cnt': 'Number of Best Offer listings created by the seller dating back to 2008',
              'item_cndtn_id': 'Indicator for the new/used status of the item',
              'view_item_count': 'Number of times the item page was viewed',
              'wtchr_count': 'Number of users who selected the "add to watch list" option for this listing',
              'anon_product_id': 'Anonymized product ID, only available for items that can be linked to specific cataloged products',
              'count1': 'Number of listings used in creating ref_price1 for this observation',
              'ref_price1': 'Average price for sold fixed-price listings with the same listing title as this item, sold during the time frame of the data',
              'item_price': 'Final price (the Buy-It-Now price if the item sold through the Buy-It-Now option, or the final negotiated price if it sold through Best Offer)',
              'bo_ck_yn': 'Indicator for whether the item sold through Best Offer',
              'decline_price': 'Price chosen by seller, if the seller chooses to report one, below which any offer will be automatically rejected',
              'accept_price': 'Price chosen by seller, if the seller chooses to report one, above which any offer will be automatically accepted',
              'bin_rev': 'Indicator for whether the Buy-It-Now price was ever modified by the seller during the time the item was listed',
              'lstg_gen_type_id': 'Indicator for whether the item is a re-listing (i.e. an item that failed to sell before and was then re-listed by the seller)',
              'store': 'Indicator for whether the listing is part of an eBay store',
              'slr_us': 'Indicator for whether the seller is located in the US',
              'buyer_us': 'Indicator for whether the buyer is located in the US',
              'metacat': 'Product category (broad)',
              'item_condition': 'Indicator for the new/used status of the item (categorical)',
              'price2ref': 'Price relative to the average among identical products (ref_price1)',
              'price2start': 'Price relative to starting price (start_price_usd)'
             }

In [None]:
for v in dat.columns: 
    print(f'{v:<20}: {var_labels[v]}')

# Create variables

In [None]:
dat['price2ref']   = dat['item_price'] / dat['ref_price1']
dat['price2start'] = dat['item_price'] / dat['start_price_usd']

In [None]:
price_vars = ['item_price', 'price2ref', 'price2start']

In [None]:
price_labs = {'price2ref':'Price rel. to avg. for product', 'price2start':'Price rel. to starting price', 
              'item_price':'Price (USD)'}

# A few descriptives, just for curiosity

## Number of sales per category

In [None]:
ax=dat.metacat.value_counts().plot(kind='bar'); 
ax.set_ylabel('Number of product listings'); 

## Price distributions

In [None]:
I = dat.item_price <= 500 # just to remove a few outliers 
ax = dat.loc[I].item_price.hist(bins=100,edgecolor='white'); 
ax.set_xlabel(price_labs['item_price']); 

In [None]:
I = (dat.price2ref <= 2.0) & (dat.count1 >= 10) # if count1<10, then the reference price gets very noisy
ax = dat.loc[I].price2ref.hist(bins=100,edgecolor='white'); 
ax.set_xlabel(price_labs['price2ref']); 

In [None]:
I = (dat.price2start <= 1.05) 
ax = dat.loc[I].price2start.hist(bins=100,edgecolor='white'); 
ax.set_xlabel(price_labs['price2start']); 

# Compare prices for two specific categories

In [None]:
cats = ['Antiques',  'Cell Phones & Accessories']

fig,ax = plt.subplots(); 
for cat in cats: 
    I = (dat['price2start'] < 1.05) & (dat.metacat == cat)
    ax.hist(dat.loc[I, 'price2start'], bins=20, label=cat, alpha=0.6, density=True, edgecolor='white')

ax.legend(loc='best'); ax.set_xlabel(price_labs['price2start']); sns.despine();     

# Uniform valuations

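If valuations are drawn i.i.d. from the uniform distribution on $[0,1]$, then the order statistics are beta distributed. More precisely, with $X_{(k)}$ denoting the $k$-th smallest of $n$ draws,
$$ X_{(k)} \sim \mathcal{B}(k, n+1-k),$$ 
and specifically for the second-highest draw (the $(n-1)$-th order statistic), which is what the winner pays in a second-price auction,
$$ X_{(n-1)} \sim \mathcal{B}(n-1, 2). $$

This also means that the expected value is
$$ \mathbb{E}(X_{(k)}) =  \frac{k}{n+1}, $$
so that the expected second-highest valuation is $\mathbb{E}(X_{(n-1)}) = \frac{n-1}{n+1}$.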
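A quick Monte Carlo check of these formulas, as a sketch assuming $n = 10$ bidders and i.i.d. $U(0,1)$ valuations (both chosen only for illustration): the simulated mean and variance of the second-highest draw should be close to those of a $\mathcal{B}(n-1, 2)$ distribution, whose mean is $(n-1)/(n+1)$.

```python
import numpy as np

rng = np.random.default_rng(0)    # seeded generator for reproducibility
n, R = 10, 200_000                 # illustrative: 10 bidders, 200k simulated auctions

# each column is one auction with n i.i.d. U(0,1) valuations
u = rng.uniform(size=(n, R))
second_highest = np.sort(u, axis=0)[-2, :]   # (n-1)-th order statistic per column

# theoretical moments of Beta(n-1, 2)
a, b = n - 1, 2
beta_mean = a / (a + b)                          # equals (n-1)/(n+1)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))

print(second_highest.mean(), beta_mean)
print(second_highest.var(), beta_var)
```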

In [None]:
d = dat.groupby('metacat')[['price2start', 'view_item_count']].mean()
d

Continue on your own from here: write a function that takes `d.view_item_count` as input and returns a predicted price. Compare the predictions with the observed prices in `d.price2start`.
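One possible starting point, as a hypothetical sketch: if we assume that the number of views equals the number of bidders $n$, that valuations scaled by the starting price are i.i.d. $U(0,1)$, and that the sale price is the second-highest valuation, then plugging $k = n-1$ into the expectation formula above gives a predicted `price2start` of $(n-1)/(n+1)$. The function name `predict_price_uniform` is illustrative, and treating views as bidders is a strong modeling assumption, not something the data confirm.

```python
import pandas as pd

def predict_price_uniform(n_bidders):
    """Hypothetical predictor: expected 2nd-highest of n U(0,1) draws, (n-1)/(n+1).

    Assumes n_bidders (e.g. d.view_item_count) is the number of bidders and
    that valuations relative to the starting price are i.i.d. U(0,1).
    """
    return (n_bidders - 1) / (n_bidders + 1)

# usage on made-up inputs; with the notebook data one would call
# predict_price_uniform(d.view_item_count) and compare it with d.price2start
example = pd.Series([1, 9, 99], index=['a', 'b', 'c'])
pred = predict_price_uniform(example)
print(pred)
```

Because the formula is vectorized, it works on a scalar, a NumPy array, or a pandas Series; with a Series it keeps the category index, so the result lines up with `d.price2start`.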

# Part B

Now, we instead focus on the price variable, `price2ref`. 

## Subset the dataset

In [None]:
I = (dat.metacat == 'Cell Phones & Accessories') & (dat.price2ref < 2.0) & (dat.count1 >= 10)
d = dat[I].copy() # explicit copy, so later changes to d do not touch dat (and no SettingWithCopyWarning)

In [None]:
price_var = 'price2ref'

The overall distribution of this price variable in the subset:

In [None]:
d[price_var].hist(bins=100); 
plt.xlabel('Price relative to reference'); 

In [None]:
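n = 10       # number of bidders per auction (rows)
R = 100000   # number of simulated auctions (columns)

# chi-square distributed valuations (1 degree of freedom)
v = np.random.chisquare(1, size=(n,R))
# next, for each column r=1,...,R, find the price paid by the winner
# (the second-highest valuation in that column) and
# save an R-vector with the winning payments
# ... 
win_chi2 = XXX # fill in 

# log-normal valuations: exp of N(-0.5, 0.5^2) draws
w = np.exp(np.random.normal(-0.5, 0.5, size=(n,R)))
# ... and do the same here 
# ... 
win_lognorm = XXX # fill in 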
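One way to compute the winner's payment for every simulated auction at once, sketched on a small made-up matrix: sort each column along the bidder axis and take the second-to-last row. `np.partition` gives the same row without a full sort, which can be faster when $n$ is large. The toy matrix is illustrative only.

```python
import numpy as np

# toy example: n = 3 bidders (rows) in R = 4 auctions (columns)
vals = np.array([[0.2, 0.9, 0.5, 0.1],
                 [0.7, 0.3, 0.6, 0.4],
                 [0.4, 0.8, 0.2, 0.3]])

# option 1: full sort along the bidder axis; row -2 is the second-highest
pay_sort = np.sort(vals, axis=0)[-2, :]

# option 2: partial sort; only guarantees the element at position -2 is in its sorted place
pay_part = np.partition(vals, -2, axis=0)[-2, :]

print(pay_sort)   # [0.4 0.8 0.5 0.3]
```

Applied to an `(n, R)` array of valuations, the same expression returns an R-vector of winning payments.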

In [None]:
# if you called the variables "win_chi2" and "win_lognorm", 
# then this code creates a graph with the histograms together 
fig,ax = plt.subplots(); 
ax.hist(win_chi2,       alpha=0.5, density=True, bins=100, label='chi2');      # winning bids with chi squared distributed valuations
ax.hist(win_lognorm,    alpha=0.5, density=True, bins=100, label='lognormal'); # ... with log normals 
ax.hist(d['price2ref'], alpha=0.5, density=True, bins=100, label='Observed');  # the data we want to compare to 
ax.legend(loc='best'); sns.despine(); 

## Question B.2

In [None]:
# 1. set up the grid on which we will be evaluating the ECDF functions
plow = 0.0
phigh = 2.0
xx = np.linspace(plow, phigh, 50)

# 2. construct the empirical CDF function handles
ecdf_dat = ECDF(d['price2ref'])
ecdf_sim = ECDF(win_lognorm) # the variable from question B.1

# 3. show a plot together 
plt.plot(xx, ecdf_dat(xx), '-o', xx, ecdf_sim(xx), '-x'); 
plt.legend(['Data', 'Simulation']); 
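For intuition about what the ECDF handles return, the sketch below evaluates an empirical CDF on a grid with plain NumPy: at each grid point $t$ it is the share of observations less than or equal to $t$, which is what evaluating a statsmodels `ECDF` object at $t$ gives. The helper name `ecdf_on_grid` and the sample values are hypothetical.

```python
import numpy as np

def ecdf_on_grid(sample, grid):
    """Share of observations <= each grid point (hypothetical helper)."""
    s = np.sort(np.asarray(sample))
    # side='right' counts ties as "<= t", giving a right-continuous step function
    return np.searchsorted(s, grid, side='right') / s.size

sample = np.array([0.5, 1.0, 1.0, 1.5])
grid = np.array([0.0, 0.5, 1.0, 1.25, 2.0])
result = ecdf_on_grid(sample, grid)
print(result)
```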

Some scaffolding code to help you get started with the criterion function. 

In [None]:
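def crit(mu, sig): 
    """Distance between the data ECDF and the ECDF of simulated winning payments on the grid xx."""
    np.random.seed(1337) # always set the seed before drawing, so crit is a deterministic function of (mu, sig)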
    w = np.random.lognormal(mu, sig, size=(n,R))
    # ... 
    w2nd = XXX # fill in the code that computes the payment by the winner based on the simulated draws, w
    
    # this code snippet shows how to subtract two ECDF functions on a common grid
    ecdf_sim = ECDF(w2nd)
    diff = ecdf_dat(xx) - ecdf_sim(xx)
    mean_squared_residuals = XXX # fill in here 

    return mean_squared_residuals
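The criterion is a distance between two ECDFs on a common grid. A sketch of one such distance, the mean of the squared differences, is below; the function name `ecdf_msd` and the toy log-normal samples are hypothetical, and other distances (for example the maximum absolute difference, as in a Kolmogorov-Smirnov statistic) would also work.

```python
import numpy as np

def ecdf_msd(sample_a, sample_b, grid):
    """Mean squared difference between two empirical CDFs on a common grid."""
    def ecdf(sample):
        s = np.sort(np.asarray(sample))
        return np.searchsorted(s, grid, side='right') / s.size
    diff = ecdf(sample_a) - ecdf(sample_b)
    return np.mean(diff ** 2)

grid = np.linspace(0.0, 2.0, 50)        # same grid as xx above
rng = np.random.default_rng(1)
a = rng.lognormal(0.0, 0.5, size=5000)   # toy samples, not the eBay data
b = rng.lognormal(0.3, 0.5, size=5000)
print(ecdf_msd(a, a, grid), ecdf_msd(a, b, grid))
```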

In [None]:
# here is how we call the criterion function
crit(mu=-.5, sig=.5)

In [None]:
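# 1. set up anonymous criterion function in a form that minimize() accepts
#    (minimize passes x as a 1-element array, used here as mu; sig is held fixed)
crit_ = lambda x : crit(x, sig=0.5)

# 2. choose a starting value for mu and test that it works
x0 = np.array([-0.5])
crit_(x0)

# 3. call minimizer (Nelder-Mead is derivative-free, which suits a criterion built from step-function ECDFs)
res = minimize(crit_, x0, method='Nelder-Mead')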
res
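To see the estimation pipeline work end to end, the self-contained sketch below applies the same simulated minimum-distance idea to synthetic data where the answer is known: "observed" prices are second-highest draws from a log-normal with `mu_true = -0.3` (an arbitrary value for illustration), and Nelder-Mead recovers mu by matching ECDFs on the grid while sigma is held at 0.5. All names and sample sizes here are illustrative choices, smaller than in the notebook to keep it fast.

```python
import numpy as np
from scipy.optimize import minimize

n, R = 10, 20_000          # bidders per auction, simulated auctions
sig = 0.5                  # held fixed, as in crit_ above
mu_true = -0.3             # illustrative "true" value for the synthetic data
grid = np.linspace(0.0, 2.0, 50)

def second_highest(draws):
    # winner's payment in each column: the (n-1)-th order statistic
    return np.sort(draws, axis=0)[-2, :]

def ecdf_on_grid(sample, grid):
    s = np.sort(sample)
    return np.searchsorted(s, grid, side='right') / s.size

# synthetic "observed" prices, drawn with a different seed than the simulation
rng_data = np.random.default_rng(2024)
observed = second_highest(rng_data.lognormal(mu_true, sig, size=(n, R)))
ecdf_obs = ecdf_on_grid(observed, grid)

# fixed standard-normal draws (common random numbers): crit_toy is deterministic in mu.
# exp(mu + sig*z) is increasing in z, so the second-highest z can be found once.
z2 = second_highest(np.random.default_rng(1337).standard_normal(size=(n, R)))

def crit_toy(x):
    sim = np.exp(x[0] + sig * z2)          # simulated winning payments for mu = x[0]
    diff = ecdf_obs - ecdf_on_grid(sim, grid)
    return np.mean(diff ** 2)

res_toy = minimize(crit_toy, x0=np.array([-0.5]), method='Nelder-Mead')
print(res_toy.x[0])   # should land close to mu_true
```

Precomputing the order statistic of the normal draws is a design shortcut that only works because the transformation to valuations is monotone; the notebook's `crit` redraws and re-sorts on every call, which is simpler but slower.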