Here we estimate the following conditional expectation function:
$$
\mathbb{E}[\text{valuation}_i \mid \text{sellerfeedbackscore}_i]=\alpha+\beta_1\,\text{sellerfeedbackscore}_i,
$$
using listings of Amazon tablets that received strictly between 3 and 12 bids. In the code below, the covariate enters as $\log(\text{sellerfeedbackscore}_i + 1)$ rather than as the raw score.

In [1]:
import pandas as pd
import numpy as np
from scipy import optimize
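# transform_covariates is used in the next cell but was never imported;
# it is assumed here to live in main alongside the other helpers
from main import estimate_mean, get_loss_function, transform_covariates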

data = pd.read_csv("../../data/demeaned.csv")
df = data.groupby(["id", "ispolice", "sellerfeedbackscore", "bidcount", "apple", "amazon"])["bids"].apply(lambda x: x.values).reset_index()

In [2]:
amazon_bids = list(df[(df.amazon == 1) & (df.ispolice == 1)].bidcount.value_counts().index)
include = df[(df.bidcount > 3) & (df.bidcount < 12) & (df.amazon == 1) & (df.bidcount.isin(amazon_bids))]

bids = list(include.bids)
logged_feedback = np.log(include.sellerfeedbackscore+1)
# Wrap each transformed value as a one-element covariate vector
covariates = [[cov] for cov in transform_covariates(list(logged_feedback), 100)]

In [3]:
expected_upper, expected_lower = estimate_mean(bids, covariates, (0,9))

calculating values for covariate: [7.462789157412447] (1/100)
total time elapsed: 0.00023839700000038988s


  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  return integrate.quad(self._mom_integ1, 0, 1, args=(m,)+args)[0]


calculating values for covariate: [6.677083461247136] (2/100)
total time elapsed: 49.351677912s
calculating values for covariate: [6.401917196727186] (3/100)
total time elapsed: 93.170858684s
calculating values for covariate: [6.206575926724928] (4/100)
total time elapsed: 136.760382389s
calculating values for covariate: [6.473890696352274] (5/100)
total time elapsed: 182.453314713s
calculating values for covariate: [9.955277308666151] (6/100)
total time elapsed: 222.523888214s
calculating values for covariate: [6.230481447578482] (7/100)
total time elapsed: 264.717034309s
calculating values for covariate: [7.691200097522863] (8/100)
total time elapsed: 308.206211177s
calculating values for covariate: [5.706558089675363] (9/100)
total time elapsed: 348.22071537s
calculating values for covariate: [5.551688515712215] (10/100)
total time elapsed: 388.162539871s
calculating values for covariate: [6.685860947068359] (11/100)
total time elapsed: 428.450415503s
calculating values for covariat

  the requested tolerance from being achieved.  The error may be 
  underestimated.
  return integrate.quad(self._mom_integ1, 0, 1, args=(m,)+args)[0]


calculating values for covariate: [6.234410725718371] (36/100)
total time elapsed: 1465.387728362s
calculating values for covariate: [6.882437470997846] (37/100)
total time elapsed: 1506.656569032s
calculating values for covariate: [6.902742737158593] (38/100)
total time elapsed: 1546.5718076590001s
calculating values for covariate: [6.079765600547129] (39/100)
total time elapsed: 1588.542601957s
calculating values for covariate: [6.741700694652055] (40/100)
total time elapsed: 1628.252380762s
calculating values for covariate: [8.031710375322042] (41/100)
total time elapsed: 1668.668715049s
calculating values for covariate: [6.054436591076521] (42/100)
total time elapsed: 1709.476032456s
calculating values for covariate: [6.895682697747867] (43/100)
total time elapsed: 1749.411310814s
calculating values for covariate: [7.4360278163518485] (44/100)
total time elapsed: 1789.105117329s
calculating values for covariate: [6.1601245927962225] (45/100)
total time elapsed: 1831.1437432320001s
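
The `IntegrationWarning` text above suggests raising the subdivision limit or splitting the interval at the point of difficulty. The following is a minimal sketch of both options with `scipy.integrate.quad`, using a hypothetical integrand with a known singularity at 0.3. It is not the integrand used by `estimate_mean`: the warning here is raised inside SciPy's own `_mom_integ1`, so whether the same fix applies depends on how `estimate_mean` builds its distributions, which is not shown in this notebook.

```python
import numpy as np
from scipy import integrate

def integrand(x):
    # Hypothetical stand-in: integrable singularity at x = 0.3
    return 1.0 / np.sqrt(np.abs(x - 0.3))

# Closed-form value of the integral over [0, 1], used only for comparison
exact = 2.0 * (np.sqrt(0.3) + np.sqrt(0.7))

# points= tells quad where the difficulty sits so it splits the interval there;
# limit= raises the maximum number of subintervals above the default of 50
value, abs_err = integrate.quad(integrand, 0, 1, points=[0.3], limit=200)

print(f"estimate={value:.8f}, exact={exact:.8f}, reported error={abs_err:.2e}")
```

Passing `points` is usually the cheaper of the two options, since it lets the integrator treat the singularity as an endpoint instead of subdividing around it repeatedly.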


In [4]:
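def loss_function(c):
    # c = (intercept, slope) of the linear CEF
    a, b = c
    cef = lambda cov: a+b*cov[0]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

# Grid search over intercept in (0, 2) and slope in (-1, 1);
# despite its name, b_hat holds both estimates: [intercept, slope]
b_hat = optimize.brute(loss_function, ranges=[(0,2), (-1,1)])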
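# Unfinished interval search, kept commented out. Assumed intent: vary the slope only,
# holding the intercept at its estimate, until the loss exceeds its minimum by 10.
# interval_lower = optimize.newton(lambda b: loss_function((b_hat[0], b))-loss_function(b_hat)-10, b_hat[1]-0.1)
# interval_upper = optimize.newton(lambda b: loss_function((b_hat[0], b))-loss_function(b_hat)-10, b_hat[1]+0.1)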
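
To make the behaviour of `optimize.brute` concrete, here is a small sketch on a toy quadratic loss whose minimizer is known. The function `toy_loss` and its minimizer are hypothetical and unrelated to `get_loss_function`. By default `brute` evaluates the loss on a grid of `Ns=20` points per dimension and then polishes the best grid point with `optimize.fmin`, which is why the returned values need not lie on the grid.

```python
import numpy as np
from scipy import optimize

def toy_loss(c):
    # Hypothetical loss with a known minimum at (a, b) = (0.5, -0.1)
    a, b = c
    return (a - 0.5) ** 2 + (b + 0.1) ** 2

# Same search ranges as the notebook cell: intercept in (0, 2), slope in (-1, 1)
x_min = optimize.brute(toy_loss, ranges=[(0, 2), (-1, 1)], Ns=20)

print(x_min)  # array of length 2: [intercept, slope]
```

Passing `finish=None` would return the best grid point without the polishing step, which can be useful for checking how much the result depends on the local refinement.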

In [5]:
b_hat

array([ 5.34600882e-01, -5.17296347e-04])
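
The two entries of `b_hat` are the estimated intercept and slope, in that order. As a rough sketch, the fitted line can be evaluated at a given feedback score as follows. The helper `fitted_cef` is hypothetical, and the sketch assumes the covariate passed to the CEF is exactly `log(sellerfeedbackscore + 1)`, that is, that `transform_covariates` does not rescale it, which this notebook does not confirm.

```python
import numpy as np

# Estimates copied from the output above: [intercept, slope]
b_hat = np.array([5.34600882e-01, -5.17296347e-04])

def fitted_cef(feedback_score, coef=b_hat):
    """Evaluate alpha + beta_1 * log(score + 1) (hypothetical helper)."""
    a, b = coef
    return a + b * np.log1p(feedback_score)

scores = np.array([0, 100, 1000, 10000])
for score, value in zip(scores, fitted_cef(scores)):
    print(f"feedback score {score:>6}: fitted valuation {value:.6f}")
```

Because the slope is small relative to the intercept, the fitted valuation changes very little across this range of feedback scores.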