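Here we estimate the following conditional expectation function (CEF):
$$
\mathbb{E}[\text{valuation}_i \mid \text{ispolice}_i, \text{sellerfeedbackscore}_i] = \alpha + \beta_1\,\text{ispolice}_i + \beta_2\,\text{sellerfeedbackscore}_i,
$$
pooling auctions regardless of the number of bids they received. In the code below, $\text{sellerfeedbackscore}_i$ enters as $\log(1 + \text{score})$, further passed through `transform_covariates`, and the sample is limited to auctions whose bid count also occurs among police auctions.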

In [1]:
import pandas as pd
import numpy as np
from scipy import optimize
from main import estimate_mean, get_loss_function, transform_covariates

# The CSV has one row per bid; collapse it to one row per auction, keeping the
# auction-level covariates and collecting that auction's bids in an array.
data = pd.read_csv("../../data/demeaned.csv")
df = (
    data.groupby(["id", "ispolice", "sellerfeedbackscore", "bidcount", "apple", "amazon"])["bids"]
    .apply(lambda x: x.values)
    .reset_index()
)
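To show what the `groupby` step produces, here is a toy sketch with invented bids. It assumes the CSV is in long format (one row per bid) and drops the `apple` and `amazon` columns for brevity. Every auction-level column used as a key must be constant within an auction; otherwise that auction is split across several rows.

```python
import pandas as pd

# Toy long-format data (invented values): one row per bid, with the
# auction-level columns repeated on every row of the same auction.
data = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "ispolice": [0, 0, 0, 1, 1],
    "sellerfeedbackscore": [10, 10, 10, 250, 250],
    "bidcount": [3, 3, 3, 2, 2],
    "bids": [0.8, 1.1, 1.3, 0.9, 1.0],
})

# Group on the auction id plus its auction-level covariates, collect each
# auction's bids into a NumPy array, and turn the group keys back into columns.
df = (
    data.groupby(["id", "ispolice", "sellerfeedbackscore", "bidcount"])["bids"]
    .apply(lambda x: x.values)
    .reset_index()
)
print(df)
```

By default `groupby` drops groups whose keys contain `NaN` (`dropna=True`), so an auction with a missing auction-level covariate would silently disappear from `df`.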

In [2]:
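# Keep only auctions whose bid count also occurs among police auctions.
# .copy() makes `include` an independent frame (avoids SettingWithCopyWarning).
compatible_bids = list(df[df.ispolice == 1].bidcount.value_counts().index)
include = df[df.bidcount.isin(compatible_bids)].copy()

bids = list(include.bids)

# log1p(x) computes log(x + 1); the result is then passed through transform_covariates.
logged_feedback = np.log1p(include.sellerfeedbackscore)
logged_feedback = transform_covariates(logged_feedback, 100)
include["sellerfeedbackscore"] = logged_feedback

# One [ispolice, transformed feedback] pair per auction.
covariates = include[["ispolice", "sellerfeedbackscore"]].to_numpy().tolist()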


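The `SettingWithCopyWarning` arises when a frame built by boolean indexing on `df` is then modified. A minimal sketch on invented data of the same bid-count filter with an explicit `.copy()`, a `[]` column assignment, and `np.log1p`, which computes log(1 + x) and is more accurate than `np.log(x + 1)` for small x:

```python
import numpy as np
import pandas as pd

# Toy auction-level frame (invented values) in the shape produced by the groupby step.
df = pd.DataFrame({
    "ispolice": [1, 1, 0, 0, 0],
    "bidcount": [2, 3, 2, 5, 3],
    "sellerfeedbackscore": [0, 99, 9, 999, 4],
})

# Bid counts that occur in at least one police auction.
compatible_bids = df.loc[df["ispolice"] == 1, "bidcount"].unique()

# .copy() gives an independent frame, so the assignment below modifies
# `include` only and never (possibly) a view of `df`.
include = df[df["bidcount"].isin(compatible_bids)].copy()
include["sellerfeedbackscore"] = np.log1p(include["sellerfeedbackscore"])

# List of [ispolice, log feedback] pairs as plain Python floats.
covariates = include[["ispolice", "sellerfeedbackscore"]].to_numpy().tolist()
print(covariates)
```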
In [3]:
expected_upper, expected_lower = estimate_mean(bids, covariates, (0,9))

calculating values for covariate: [0.0, 7.5643505225246495] (1/100)
total time elapsed: 0.0002913659999999041s
calculating values for covariate: [0.0, 5.66892376443991] (2/100)
total time elapsed: 285.29934601900004s
calculating values for covariate: [0.0, 4.206434182253519] (3/100)
total time elapsed: 620.708989173s
calculating values for covariate: [0.0, 9.436794237468936] (4/100)
total time elapsed: 979.1227334299999s
calculating values for covariate: [0.0, 5.620782532867282] (5/100)
total time elapsed: 1400.358966102s
calculating values for covariate: [1.0, 9.955277308666153] (6/100)
total time elapsed: 1766.9347300660002s
calculating values for covariate: [0.0, 4.63829374083503] (7/100)
total time elapsed: 2066.474696679s
calculating values for covariate: [0.0, 5.165054621323595] (8/100)
total time elapsed: 2318.256439163s
calculating values for covariate: [0.0, 9.484552478583174] (9/100)
total time elapsed: 2552.871486917s
calculating values for covariate: [0.0, 2.28815873655513] (10/100)
total time elapsed: 2777.637248435s
calculating values for covariate: [0.0, 6.775978638583625] (11/10
[... output truncated ...]
calculating values for covariate: [0.0, 9.69370293408745] (80/100)
total time elapsed: 17587.752047342s
calculating values for covariate: [0.0, 3.3524442662096194] (81/100)
total time elapsed: 17779.731492221003s
calculating values for covariate: [0.0, 4.707890119869979] (82/100)
total time elapsed: 17973.303141105s
calculating values for covariate: [0.0, 8.283085105665934] (83/100)
total time elapsed: 18165.125533059003s
calculating values for covariate: [0.0, 7.954475519512463] (84/100)
total time elapsed: 18358.65386994s
calculating values for covariate: [0.0, 5.888120972371637] (85/100)
total time elapsed: 18551.964618707003s
calculating values for covariate: [0.0, 7.010766195476258] (86/100)
total time elapsed: 18740.013911521s
calculating values for covariate: [0.0, 2.824449727261541] (87/100)
total time elapsed: 18926.639500032s
calculating values for covariate: [0.0, 4.588530380982284] (88/100)
total time elapsed: 19115.818785738s
calculating values for covariate: [0.0, 12.9276
[... output truncated ...]

In [4]:
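def loss_function(c):
    # c = (alpha, beta_1, beta_2); cov = [ispolice, transformed feedback score]
    a, b1, b2 = c
    cef = lambda cov: a + b1*cov[0] + b2*cov[1]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

# Brute-force grid search over alpha in (0, 2) and beta_1, beta_2 in (-1, 1)
b_hat = optimize.brute(loss_function, ranges=[(0,2), (-1,1), (-1,1)])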
# interval_lower = optimize.newton(lambda a_l, b_l: loss_function(a_l, b_l)-loss_function(b_hat)-10, b_hat-0.1)
# interval_upper = optimize.newton(lambda b_l, b_u: loss_function(b_l, b_u)-loss_function(b_hat)-10, b_hat+0.1)
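`get_loss_function` lives in `main` and is not shown here. As a hypothetical stand-in (not the author's implementation), one standard criterion for a CEF that is only bounded, lower ≤ E[y | x] ≤ upper, adds up the squared distance from the fitted value to the interval in each covariate cell, and is zero whenever the fit lies inside every interval. The sketch pairs it with a plain exhaustive grid search, which is the first stage of `optimize.brute` (by default `Ns=20` evenly spaced points per axis, endpoints included, after which the best grid point is polished with `optimize.fmin`). The toy bounds are generated from known parameters.

```python
import itertools


def interval_loss(covariates, upper, lower, cef):
    """Hypothetical stand-in for get_loss_function: squared distance from the
    CEF prediction to [lower, upper], summed over covariate cells."""
    total = 0.0
    for cov, u, l in zip(covariates, upper, lower):
        pred = cef(cov)
        total += max(l - pred, 0.0) ** 2 + max(pred - u, 0.0) ** 2
    return total


def grid_minimize(loss, ranges, num=21):
    """Exhaustive search over `num` evenly spaced points per axis (endpoints
    included); returns the best grid point and its loss."""
    axes = [[lo + (hi - lo) * k / (num - 1) for k in range(num)] for lo, hi in ranges]
    best_point, best_value = None, float("inf")
    for point in itertools.product(*axes):
        value = loss(point)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value


# Toy bounds built from known (alpha, beta_1, beta_2) = (0.7, 0.4, 0.1), with each
# cell's interval extending 0.05 on either side of the true CEF.
toy_covariates = [[p, x] for p in (0.0, 1.0) for x in (0.0, 2.0, 5.0, 9.0)]
centres = [0.7 + 0.4 * p + 0.1 * x for p, x in toy_covariates]
toy_upper = [c + 0.05 for c in centres]
toy_lower = [c - 0.05 for c in centres]


def toy_loss(c):
    a, b1, b2 = c
    cef = lambda cov: a + b1 * cov[0] + b2 * cov[1]
    return interval_loss(toy_covariates, toy_upper, toy_lower, cef)


# num=21 (rather than brute's default Ns=20) puts the true values on the grid.
toy_b_hat, toy_min = grid_minimize(toy_loss, [(0, 2), (-1, 1), (-1, 1)])
print(toy_b_hat, toy_min)
```

If `get_loss_function` has this form, the criterion can be flat at zero over a whole set of parameters, and any point of that set minimizes it; a single `b_hat` is then one member of an identified set rather than a unique estimate.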

In [6]:
b_hat

array([0.70028711, 0.3793633 , 0.0452895 ])
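The next cell adds an interaction between the police indicator and the feedback covariate. In the notation of the CEF above, the coefficients `(a, b1, b2, b3)` it searches over correspond to
$$
\mathbb{E}[\text{valuation}_i \mid \text{ispolice}_i, \text{sellerfeedbackscore}_i] = \alpha + \beta_1\,\text{ispolice}_i + \beta_2\,\text{sellerfeedbackscore}_i + \beta_3\,\text{ispolice}_i \cdot \text{sellerfeedbackscore}_i,
$$
so the implied difference between police and non-police sellers at a given feedback value is $\beta_1 + \beta_3\,\text{sellerfeedbackscore}_i$, and $\beta_1$ alone equals that difference only where the feedback covariate is zero.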

In [7]:
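def loss_function(c):
    # c = (alpha, beta_1, beta_2, beta_3); beta_3 multiplies ispolice * feedback
    a, b1, b2, b3 = c
    cef = lambda cov: a + b1*cov[0] + b2*cov[1] + b3*cov[0]*cov[1]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

# Brute-force grid search over alpha in (0, 2) and beta_1, beta_2, beta_3 in (-1, 1)
b_hat_2 = optimize.brute(loss_function, ranges=[(0,2), (-1,1), (-1,1), (-1,1)])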
# interval_lower = optimize.newton(lambda a_l, b_l: loss_function(a_l, b_l)-loss_function(b_hat)-10, b_hat-0.1)
# interval_upper = optimize.newton(lambda b_l, b_u: loss_function(b_l, b_u)-loss_function(b_hat)-10, b_hat+0.1)

In [8]:
b_hat_2

array([ 0.7002562 , -0.15997289,  0.04529115,  0.0528367 ])
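With the interaction, `b1` no longer summarizes the police effect on its own: the implied difference between police and non-police sellers is `b1 + b3 * feedback`. A small sketch that plugs in the printed point estimates; the feedback values are on the scale of the transformed covariate (log(1 + score) after `transform_covariates`) and are chosen for illustration only.

```python
# Point estimates (alpha, beta_1, beta_2, beta_3) copied from the b_hat_2 output above.
a, b1, b2, b3 = 0.7002562, -0.15997289, 0.04529115, 0.0528367


def police_effect(feedback):
    """Implied CEF difference between police and non-police sellers at a given
    value of the transformed feedback covariate: beta_1 + beta_3 * feedback."""
    return b1 + b3 * feedback


# Feedback value at which the implied police effect changes sign.
crossover = -b1 / b3

for x in (0.0, 5.0, 10.0):
    print(f"feedback = {x:4.1f}: police effect = {police_effect(x):+.4f}")
print(f"police effect is zero at feedback = {crossover:.3f}")
```

Under these point estimates the implied effect is negative below a transformed feedback value of about 3.03 and positive above it. The intercept and the feedback coefficient are essentially unchanged from the model without the interaction (about 0.700 and 0.045 in both fits).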