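Here we estimate the following conditional expectation function:
$$
\mathbb{E}[\mathit{valuation}_i \mid \mathit{ispolice}_i, \mathit{sellerfeedbackscore}_i] = \alpha + \beta_1\,\mathit{ispolice}_i + \beta_2\,\mathit{sellerfeedbackscore}_i
$$
using listings of Amazon tablets that received strictly between 3 and 12 bids. The seller feedback score enters as $\log(1 + \mathit{sellerfeedbackscore}_i)$, further processed by `transform_covariates`.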

In [1]:
import pandas as pd
import numpy as np
from scipy import optimize
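# estimate_mean is called in a later cell; it is assumed to live in main alongside estimate_median
from main import estimate_mean, estimate_median, get_loss_function, transform_covariates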

data = pd.read_csv("../../data/demeaned.csv")
df = data.groupby(["id", "ispolice", "sellerfeedbackscore", "bidcount", "apple", "amazon"])["bids"].apply(lambda x: x.values).reset_index()

In [3]:
amazon_bids = list(df[(df.amazon == 1) & (df.ispolice == 1)].bidcount.value_counts().index)

# .copy() makes `include` an independent DataFrame, so the column assignment below
# no longer triggers pandas' SettingWithCopyWarning
include = df[(df.bidcount > 3) & (df.bidcount < 12) & (df.amazon == 1) & (df.bidcount.isin(amazon_bids))].copy()

bids = list(include.bids)

# log(1 + score) keeps sellers with a feedback score of zero defined
logged_feedback = np.log(include.sellerfeedbackscore + 1)
logged_feedback = transform_covariates(logged_feedback, 100)
include["sellerfeedbackscore"] = logged_feedback

covariates = np.array(include[["ispolice", "sellerfeedbackscore"]])
covariates = [list(cov) for cov in covariates]


In [4]:
expected_upper, expected_lower = estimate_mean(bids, covariates, (0,9))

calculating values for covariate: [0.0, 6.411818267709897] (1/100)
total time elapsed: 0.00027485800000004446s


  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  return integrate.quad(self._mom_integ1, 0, 1, args=(m,)+args)[0]


calculating values for covariate: [0.0, 6.882437470997846] (2/100)
total time elapsed: 37.131329965s
calculating values for covariate: [0.0, 6.079765600547129] (3/100)
total time elapsed: 72.527327082s
calculating values for covariate: [0.0, 9.26416545867109] (4/100)
total time elapsed: 108.652126492s
calculating values for covariate: [0.0, 6.741700694652055] (5/100)
total time elapsed: 144.761459916s
calculating values for covariate: [0.0, 6.133398042996649] (6/100)
total time elapsed: 187.20196385399998s
calculating values for covariate: [0.0, 8.031710375322042] (7/100)
total time elapsed: 225.255064993s
calculating values for covariate: [0.0, 6.054436591076521] (8/100)
total time elapsed: 264.016385141s
calculating values for covariate: [0.0, 6.895682697747867] (9/100)
total time elapsed: 303.046652176s
calculating values for covariate: [0.0, 6.7464121285733745] (10/100)
total time elapsed: 344.217850157s
calculating values for covariate: [0.0, 7.4360278163518485] (11/100)
...

calculating values for covariate: [0.0, 6.5250296578434615] (81/100)
total time elapsed: 3648.961371444s
calculating values for covariate: [0.0, 4.787179145085087] (82/100)
total time elapsed: 3681.6215595000003s
calculating values for covariate: [0.0, 13.275246150920841] (83/100)
total time elapsed: 3715.29117841s
calculating values for covariate: [0.0, 6.173786103901937] (84/100)
total time elapsed: 3747.1520388540002s
calculating values for covariate: [0.0, 6.043802187577048] (85/100)
total time elapsed: 3779.746160864s
calculating values for covariate: [0.0, 5.016731314478937] (86/100)
total time elapsed: 3812.4726732010004s
calculating values for covariate: [0.0, 6.234410725718371] (87/100)
total time elapsed: 3845.246504481s


In [5]:
def loss_function(c):
    a, b1, b2 = c
    cef = lambda cov: a+b1*cov[0]+b2*cov[1]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

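# brute evaluates a 20-point grid per range (the default Ns), then polishes the best grid
# point with optimize.fmin (the default `finish`), so the result can fall outside `ranges`
b_hat = optimize.brute(loss_function, ranges=[(0,2), (-1,1), (-1,1)])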
# interval_lower = optimize.newton(lambda a_l, b_l: loss_function(a_l, b_l)-loss_function(b_hat)-10, b_hat-0.1)
# interval_upper = optimize.newton(lambda b_l, b_u: loss_function(b_l, b_u)-loss_function(b_hat)-10, b_hat+0.1)
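
`get_loss_function` is imported from `main` and its definition is not shown in this notebook. Purely as an illustration, one way to score a candidate CEF against upper and lower bounds is to penalize predictions that fall outside the interval `[lower, upper]`. The sketch below assumes one bound pair per covariate row; `interval_loss` is a hypothetical name, and the actual `get_loss_function` may compute something different.

```python
import numpy as np

def interval_loss(covariates, upper, lower, cef):
    """Hypothetical stand-in for get_loss_function (an assumption, not the code in main).

    Returns the sum of squared amounts by which cef(cov) leaves [lower, upper].
    """
    preds = np.array([cef(cov) for cov in covariates], dtype=float)
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    above = np.clip(preds - upper, 0.0, None)  # overshoot past the upper bound
    below = np.clip(lower - preds, 0.0, None)  # shortfall under the lower bound
    return float(np.sum(above ** 2 + below ** 2))

# Toy inputs, invented for illustration only
toy_covariates = [[0.0, 1.0], [1.0, 2.0]]
toy_upper = [1.0, 2.0]
toy_lower = [0.0, 1.0]

inside = interval_loss(toy_covariates, toy_upper, toy_lower, lambda cov: 0.5 * cov[1])
outside = interval_loss(toy_covariates, toy_upper, toy_lower, lambda cov: 3.0)
print(inside, outside)
```

A loss of this shape is zero for every CEF that stays inside the bounds, so the minimizer need not be unique; a grid search reports only one such point.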

In [6]:
b_hat

array([-0.08790365,  2.12122365,  0.12223487])
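
The estimate lies outside the search box passed to `optimize.brute`: the intercept is below 0 and the `ispolice` coefficient is above 1. This is expected, because `brute` by default hands its best grid point to `optimize.fmin`, which is unconstrained. A minimal sketch with a toy objective (`toy_loss` is invented for illustration and unrelated to the auction data) shows the difference between the grid-only and the polished result:

```python
import numpy as np
from scipy import optimize

def toy_loss(x):
    # Toy quadratic whose minimum (3, -2) lies outside the search box below
    return (x[0] - 3.0) ** 2 + (x[1] + 2.0) ** 2

ranges = [(0, 2), (-1, 1)]

# finish=None returns the best point on the grid itself
grid_only = optimize.brute(toy_loss, ranges=ranges, finish=None)

# the default finish=optimize.fmin refines that point without respecting `ranges`
polished = optimize.brute(toy_loss, ranges=ranges)

print("grid only:", grid_only)
print("polished: ", polished)
```

If the coefficients are meant to stay inside the box, passing `finish=None` keeps the answer on the grid, at the cost of grid-level precision.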

In [7]:
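# Same loss as before, with an added ispolice x sellerfeedbackscore interaction (b3);
# this redefinition replaces the earlier loss_function
def loss_function(c):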
    a, b1, b2, b3 = c
    cef = lambda cov: a+b1*cov[0]+b2*cov[1]+b3*cov[0]*cov[1]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

b_hat_2 = optimize.brute(loss_function, ranges=[(0,2), (-1,1), (-1,1), (-1,1)])
# interval_lower = optimize.newton(lambda a_l, b_l: loss_function(a_l, b_l)-loss_function(b_hat)-10, b_hat-0.1)
# interval_upper = optimize.newton(lambda b_l, b_u: loss_function(b_l, b_u)-loss_function(b_hat)-10, b_hat+0.1)

In [8]:
b_hat_2

array([-0.08793775, -0.30354177,  0.12223971,  0.23259789])
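
In the interaction specification the `ispolice` coefficient is no longer the whole effect: the difference in the fitted CEF between `ispolice = 1` and `ispolice = 0` is $\beta_1 + \beta_3 \cdot \mathit{sellerfeedbackscore}_i$. The sketch below evaluates that difference at a few covariate values copied from the progress log above, using the point estimates as printed. It assumes those logged values are the transformed feedback covariate; `ispolice_effect` is a helper introduced here for illustration, and no measure of uncertainty is attached to the numbers.

```python
import numpy as np

# Interaction-model point estimates copied from the b_hat_2 output
a, b1, b2, b3 = -0.08793775, -0.30354177, 0.12223971, 0.23259789

def ispolice_effect(feedback):
    """Fitted CEF at ispolice = 1 minus fitted CEF at ispolice = 0."""
    return b1 + b3 * np.asarray(feedback, dtype=float)

# Covariate values taken from the progress log (assumed to be the transformed feedback)
feedback_values = [4.787179145085087, 6.411818267709897, 9.26416545867109]
for x, eff in zip(feedback_values, ispolice_effect(feedback_values)):
    print(f"feedback = {x:.2f}: ispolice effect = {eff:.3f}")
```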