Here we estimate the following conditional expectation function:
$$
\mathbb{E}[valuation_i \mid log\_sellerfeedbackscore_i] = \alpha + \beta_1 \, log\_sellerfeedbackscore_i,
$$
using the entire sample without considering the number of bids received.

In [1]:
import pandas as pd
import numpy as np
from scipy import optimize
# estimate_mean is the function called below; estimate_median was imported but never used.
from main import estimate_mean, get_loss_function, transform_covariates

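data = pd.read_csv("../../data/demeaned.csv")
# Collapse to one row per group, collecting that group's bids into an array.
group_cols = ["id", "ispolice", "sellerfeedbackscore", "bidcount", "apple", "amazon"]
df = data.groupby(group_cols)["bids"].apply(lambda x: x.values).reset_index()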

In [2]:
# Keep only rows whose bid count also occurs among the ispolice == 1 rows.
compatible_bids = list(df[df.ispolice == 1].bidcount.value_counts().index)
include = df[df.bidcount.isin(compatible_bids)]

bids = list(include.bids)
# log(score + 1) stays defined for sellers with a feedback score of zero.
logged_feedback = np.log(include.sellerfeedbackscore + 1)

# One single-element covariate vector per value returned by transform_covariates.
covariates = [[cov] for cov in transform_covariates(list(logged_feedback), 100)]
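
The regressor is built as `np.log(sellerfeedbackscore + 1)`. A minimal sketch of that transform on made-up feedback scores follows; the `scores` array and the `covariates_demo` name are hypothetical and only illustrate the shape of the data, not what `transform_covariates` does internally.

```python
import numpy as np

# Hypothetical feedback scores, including a seller with no feedback yet.
scores = np.array([0, 9, 99, 2500])

# The transform used in the notebook: log(0 + 1) = 0, so zero scores stay valid.
logged = np.log(scores + 1)

# np.log1p computes the same quantity and is more accurate for tiny inputs.
logged_alt = np.log1p(scores)

# Wrap each value in a one-element list: one covariate vector per observation.
covariates_demo = [[float(v)] for v in logged]
print(covariates_demo)
```

Adding 1 before taking logs is what keeps sellers with a score of zero in the sample; a plain `np.log` would map them to negative infinity.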

In [3]:
expected_upper, expected_lower = estimate_mean(bids, covariates, (0,9))

calculating values for covariate: [3.3524442662096194] (1/100)
total time elapsed: 0.0003543439999997844s


  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  return integrate.quad(self._mom_integ1, 0, 1, args=(m,)+args)[0]


calculating values for covariate: [5.979787415449747] (2/100)
total time elapsed: 261.879225673s
calculating values for covariate: [2.533449658204007] (3/100)
total time elapsed: 598.129949333s
calculating values for covariate: [4.3320091121662205] (4/100)
total time elapsed: 931.7472917990001s
calculating values for covariate: [6.950685674976283] (5/100)
total time elapsed: 1355.953608753s
calculating values for covariate: [7.271272950468081] (6/100)
total time elapsed: 1726.670555096s
calculating values for covariate: [5.888120972371637] (7/100)
total time elapsed: 2040.032028277s
calculating values for covariate: [8.012520509835065] (8/100)
total time elapsed: 2311.508472213s
calculating values for covariate: [7.010766195476258] (9/100)
total time elapsed: 2539.482146984s
calculating values for covariate: [5.580993455214383] (10/100)
total time elapsed: 2751.910664247s
calculating values for covariate: [2.824449727261541] (11/100)
total time elapsed: 2959.781669776s
...

calculating values for covariate: [1.2277304454369522] (84/100)
total time elapsed: 18085.498270738s
calculating values for covariate: [6.8388384000963995] (85/100)
total time elapsed: 18276.866226263s
calculating values for covariate: [8.069807973999874] (86/100)
total time elapsed: 18464.479394241s
calculating values for covariate: [6.617297006879771] (87/100)
total time elapsed: 18652.983819182s
calculating values for covariate: [2.119673090808945] (88/100)
total time elapsed: 18838.281739951002s
calculating values for covariate: [6.081863851886263] (89/100)
total time elapsed: 19026.172972038s
calculating values for covariate: [7.320708614691379] (90/100)
total time elapsed: 19209.767776893s
calculating values for covariate: [7.710150199345256] (91/100)
total time elapsed: 19393.67400721s
calculating values for covariate: [4.525454164522782] (92/100)
total time elapsed: 19611.793688868s
calculating values for covariate: [1.7619265553871364] (93/100)
total time elapsed: 19827.966878

In [4]:
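def loss_function(c):
    # c = (a, b): intercept and slope of the linear CEF a + b * covariate.
    a, b = c
    cef = lambda cov: a+b*cov[0]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

# Grid search over a in [0, 2] and b in [-1, 1], followed by brute's default fmin polish.
# Despite its name, b_hat holds both estimates, in the order (a, b).
b_hat = optimize.brute(loss_function, ranges=[(0,2), (-1,1)])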
# interval_lower = optimize.newton(lambda a_l, b_l: loss_function(a_l, b_l)-loss_function(b_hat)-10, b_hat-0.1)
# interval_upper = optimize.newton(lambda b_l, b_u: loss_function(b_l, b_u)-loss_function(b_hat)-10, b_hat+0.1)
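
The internals of `get_loss_function` are not shown in this notebook. As a rough illustration of how a line can be fitted to interval-valued outcomes with `optimize.brute`, the sketch below assumes a loss that penalizes the squared distance by which the fitted line leaves the band between a lower and an upper bound. The toy data and the names `interval_loss` and `toy_hat` are hypothetical, and the actual loss in `main` may differ.

```python
import numpy as np
from scipy import optimize

# Toy data (hypothetical): bounds of width 0.2 around the line 0.6 + 0.04 * x.
x = np.linspace(1, 8, 15)
lower = 0.6 + 0.04 * x - 0.1
upper = 0.6 + 0.04 * x + 0.1

def interval_loss(c):
    """Mean squared distance by which the line a + b*x leaves [lower, upper]."""
    a, b = c
    pred = a + b * x
    below = np.maximum(lower - pred, 0.0)  # shortfall under the lower bound
    above = np.maximum(pred - upper, 0.0)  # excess over the upper bound
    return np.mean(below**2 + above**2)

# Same call pattern as in the notebook: 20 grid points per axis by default,
# then a Nelder-Mead polish (brute's default finish=optimize.fmin).
toy_hat = optimize.brute(interval_loss, ranges=[(0, 2), (-1, 1)])
print(toy_hat, interval_loss(toy_hat))
```

In this toy setup every line that stays inside the band has zero loss, so the minimizer is not unique and the point returned by `brute` is only one element of the set of minimizers. Under that assumption about the loss, a point estimate would be best read together with the set of parameter values whose loss is close to the minimum.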

In [5]:
b_hat

array([0.58750368, 0.0382627 ])
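
Given the ordering `a, b = c` in `loss_function`, the printed array can be read as an intercept of about 0.588 and a slope of about 0.038 on the logged feedback score. The sketch below evaluates the fitted line at a few made-up feedback scores; `fitted_cef` is a hypothetical helper, and it assumes the covariates passed to the estimator are on the same `log(score + 1)` scale as `logged_feedback`.

```python
import numpy as np

# Estimates copied from the output above, in the order (intercept, slope).
alpha_hat, beta_hat = 0.58750368, 0.0382627

def fitted_cef(seller_feedback_score):
    """Fitted conditional expectation at a raw (unlogged) feedback score."""
    return alpha_hat + beta_hat * np.log(seller_feedback_score + 1)

# Hypothetical feedback scores chosen only for illustration.
for score in (0, 100, 10_000):
    print(score, fitted_cef(score))
```

At a feedback score of zero the logged covariate is zero, so the fitted value equals the intercept. The valuation scale depends on how `demeaned.csv` was constructed, which this notebook does not show.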