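We estimate the following linear conditional expectation function:
$$
\mathbb{E}[\mathit{valuation}_i \mid \mathit{ispolice}_i, \mathit{sellerfeedbackscore}_i] = \alpha + \beta_1 \mathit{ispolice}_i + \beta_2 \mathit{sellerfeedbackscore}_i
$$
on every auction in the sample whose number of bids is strictly between 3 and 12. In the code, the feedback score enters as $\log(1 + \mathit{sellerfeedbackscore}_i)$, which is then passed through `transform_covariates`.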
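
As a minimal sketch of what this specification computes, the snippet below evaluates a linear CEF on a few made-up covariate pairs. The helper name `linear_cef`, the coefficient values, and the covariate rows are hypothetical placeholders for illustration, not estimates from the data.

```python
import numpy as np

def linear_cef(covariates, alpha, beta_1, beta_2):
    """Evaluate alpha + beta_1 * ispolice + beta_2 * sellerfeedbackscore row by row."""
    covariates = np.asarray(covariates, dtype=float)
    return alpha + beta_1 * covariates[:, 0] + beta_2 * covariates[:, 1]

# Hypothetical rows of [ispolice, sellerfeedbackscore]; illustration only.
example_covariates = [[0.0, 2.0], [1.0, 2.0], [1.0, 5.0]]
predicted = linear_cef(example_covariates, alpha=0.1, beta_1=0.2, beta_2=0.05)
print(predicted)
```

The first two rows differ only in `ispolice`, so their predicted values differ by exactly `beta_1`.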

In [1]:
import pandas as pd
import numpy as np
from scipy import optimize
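# transform_covariates is called in the next cell but was never imported;
# it is assumed here to live in main alongside the other helpers.
from main import estimate_mean, get_loss_function, transform_covariates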

data = pd.read_csv("../../data/demeaned.csv")
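# Collapse to one row per auction, collecting that auction's bids into an array.
auction_keys = ["id", "ispolice", "sellerfeedbackscore", "bidcount", "apple", "amazon"]
df = data.groupby(auction_keys)["bids"].apply(lambda x: x.values).reset_index()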

In [2]:
# Keep auctions with strictly more than 3 and fewer than 12 bids.
# .copy() makes `include` an independent frame, so the column assignment
# below does not trigger pandas' SettingWithCopyWarning.
include = df[(df.bidcount > 3) & (df.bidcount < 12)].copy()

bids = list(include.bids)

# Log-transform the feedback score; the +1 keeps a score of zero finite.
logged_feedback = np.log(include.sellerfeedbackscore + 1)
logged_feedback = transform_covariates(logged_feedback, 100)
include["sellerfeedbackscore"] = logged_feedback

# One [ispolice, sellerfeedbackscore] pair per auction, as plain lists.
covariates = np.array(include[["ispolice", "sellerfeedbackscore"]])
covariates = [list(cov) for cov in covariates]
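
The filtering and log transform in this cell can be sketched on toy data as follows. The toy frame is hypothetical and the sketch skips `transform_covariates`, whose behavior is not shown in this notebook; it only illustrates the strict bid-count filter, the `log(1 + x)` transform, and why taking a `.copy()` before assigning a column avoids pandas' `SettingWithCopyWarning`.

```python
import numpy as np
import pandas as pd

# Hypothetical auction-level data; column names follow the notebook.
toy = pd.DataFrame({
    "bidcount": [2, 4, 7, 11, 12],
    "ispolice": [0, 1, 0, 0, 1],
    "sellerfeedbackscore": [0, 9, 99, 999, 5],
})

# Strict inequalities: auctions with 4 to 11 bids are kept, 3 and 12 are not.
subset = toy[(toy.bidcount > 3) & (toy.bidcount < 12)].copy()

# log1p(x) = log(1 + x), so a feedback score of zero maps to zero.
subset["sellerfeedbackscore"] = np.log1p(subset["sellerfeedbackscore"])

covariates_toy = subset[["ispolice", "sellerfeedbackscore"]].to_numpy().tolist()
print(covariates_toy)
```

Because `subset` is an explicit copy, the assignment changes only `subset`; the original `toy` frame keeps its raw feedback scores.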


In [3]:
expected_upper, expected_lower = estimate_mean(bids, covariates, (0,9))

calculating values for covariate: [0.0, 9.869815537797194] (1/100)
total time elapsed: 0.00020939699999988903s


  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  return integrate.quad(self._mom_integ1, 0, 1, args=(m,)+args)[0]


calculating values for covariate: [0.0, 3.4599452524466194] (2/100)
total time elapsed: 86.89877515s
calculating values for covariate: [0.0, 5.3228978543165555] (3/100)
total time elapsed: 175.726371385s
calculating values for covariate: [0.0, 5.439956999893489] (4/100)
total time elapsed: 257.99094324000004s
calculating values for covariate: [0.0, 1.4606755448912938] (5/100)
total time elapsed: 345.391337946s
calculating values for covariate: [0.0, 5.724391198805521] (6/100)
total time elapsed: 432.27053192s
calculating values for covariate: [0.0, 3.5636780469053466] (7/100)
total time elapsed: 526.769724695s
calculating values for covariate: [0.0, 2.440017827490736] (8/100)
total time elapsed: 611.7945327040001s
calculating values for covariate: [0.0, 6.926257377649698] (9/100)
total time elapsed: 701.071301622s
calculating values for covariate: [0.0, 4.257356013476545] (10/100)
total time elapsed: 786.871487624s
calculating values for covariate: [0.0, 9.326772384676856] (11/100)
tot

calculating values for covariate: [0.0, 6.164360431972145] (80/100)
total time elapsed: 6657.422606262s
calculating values for covariate: [0.0, 8.288236466386568] (81/100)
total time elapsed: 6750.041824473s
calculating values for covariate: [0.0, 6.3836558487392425] (82/100)
total time elapsed: 6844.996687732s
calculating values for covariate: [0.0, 7.307635641656483] (83/100)
total time elapsed: 6938.515677316001s
calculating values for covariate: [0.0, 6.882170651082656] (84/100)
total time elapsed: 7032.320352211001s
calculating values for covariate: [1.0, 10.394073463078007] (85/100)
total time elapsed: 7124.5859430010005s
calculating values for covariate: [0.0, 8.007850269063637] (86/100)
total time elapsed: 7251.496651836001s
calculating values for covariate: [0.0, 6.33767709149798] (87/100)
total time elapsed: 7345.743183941s
calculating values for covariate: [0.0, 6.07562674059974] (88/100)
total time elapsed: 7441.431616639s
calculating values for covariate: [0.0, 5.252218213

In [9]:
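def loss_function(c):
    # c = (alpha, beta_1, beta_2); cov = [ispolice, sellerfeedbackscore]
    a, b1, b2 = c
    cef = lambda cov: a + b1 * cov[0] + b2 * cov[1]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

# Grid search over (alpha, beta_1, beta_2); brute then refines the best grid
# point with its default finishing routine.
b_hat = optimize.brute(loss_function, ranges=[(0, 2), (-1, 1), (-1, 1)])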
# interval_lower = optimize.newton(lambda a_l, b_l: loss_function(a_l, b_l)-loss_function(b_hat)-10, b_hat-0.1)
# interval_upper = optimize.newton(lambda b_l, b_u: loss_function(b_l, b_u)-loss_function(b_hat)-10, b_hat+0.1)
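
To illustrate how `optimize.brute` behaves, here is a small sketch on a toy loss with a known minimum. The function `toy_loss` and the constant `TRUE_MINIMUM` are hypothetical stand-ins; the real loss comes from `get_loss_function`, which is not reproduced here.

```python
import numpy as np
from scipy import optimize

TRUE_MINIMUM = np.array([0.5, -0.25])  # hypothetical, chosen for the demo

def toy_loss(c):
    """Smooth bowl with its minimum at TRUE_MINIMUM (stand-in for the real loss)."""
    return float(np.sum((np.asarray(c) - TRUE_MINIMUM) ** 2))

# Each (low, high) pair is cut into Ns grid points; the loss is evaluated on
# the full grid and the best point is then refined by the default `finish`.
toy_hat = optimize.brute(toy_loss, ranges=[(0, 2), (-1, 1)], Ns=20)
print(toy_hat)
```

The number of loss evaluations on the grid grows as `Ns` raised to the number of parameters, so with the default `Ns=20` a three-parameter search evaluates 8000 grid points before the refinement step.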

In [48]:
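def fun(c):
    # Zero where the loss exceeds its value at b_hat by exactly 10.
    return loss_function(c) - loss_function(b_hat) - 10

# Search for points on that level set, starting above and below b_hat.
print(optimize.newton(fun, [b_hat[0] + 5, b_hat[1] + 5, b_hat[2] + 4]))
print(optimize.newton(fun, [b_hat[0] - 2, b_hat[1] - 2, b_hat[2] - 2]))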

[0.27967938 0.27855938 0.04257074]
[0.07791318 0.07614375 0.03040485]
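
The cell above looks for points where the loss exceeds its minimum by 10. One way to pose that search as a one-dimensional problem, sketched here under assumptions, is to hold all but one coefficient at the estimate and bracket the crossing with `optimize.brentq`. This is a different technique from the `optimize.newton` call in the notebook; `toy_loss`, `c_hat`, `excess_along`, and the bracket width of 1.0 are all hypothetical, and only the threshold of 10 is taken from the notebook.

```python
import numpy as np
from scipy import optimize

c_hat = np.array([0.2, 0.1])  # hypothetical minimizer of the toy loss

def toy_loss(c):
    """Stand-in loss: quadratic bowl with minimum value 0 at c_hat."""
    return float(100.0 * np.sum((np.asarray(c) - c_hat) ** 2))

def excess_along(index, value, threshold=10.0):
    """Loss minus (minimum + threshold) when coordinate `index` is set to `value`."""
    c = c_hat.copy()
    c[index] = value
    return toy_loss(c) - toy_loss(c_hat) - threshold

# Bracket the crossing on each side of the minimizer, then solve with brentq.
upper = optimize.brentq(lambda v: excess_along(0, v), c_hat[0], c_hat[0] + 1.0)
lower = optimize.brentq(lambda v: excess_along(0, v), c_hat[0] - 1.0, c_hat[0])
print(lower, upper)
```

`brentq` requires the function to change sign across the bracket, which holds here because the excess is negative at the minimizer and positive one unit away. With the real loss the bracket would need to be checked in the same way.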


In [16]:
loss_function([0.13918062, 0.13635494, 0.06331243])

50.94494942026856

In [5]:
b_hat

array([0.18091567, 0.17550602, 0.03566906])

In [7]:
def loss_function(c):
    # Same specification plus an ispolice x sellerfeedbackscore interaction (b3).
    a, b1, b2, b3 = c
    cef = lambda cov: a + b1 * cov[0] + b2 * cov[1] + b3 * cov[0] * cov[1]
    return get_loss_function(covariates, expected_upper, expected_lower, cef)

b_hat_2 = optimize.brute(loss_function, ranges=[(0, 2), (-1, 1), (-1, 1), (-1, 1)])

In [8]:
b_hat_2

array([ 0.18091552,  0.71680167,  0.03566808, -0.0552502 ])
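
With the interaction term, the difference in the fitted CEF between `ispolice = 1` and `ispolice = 0` is no longer a constant: subtracting the two cases leaves `b1 + b3 * sellerfeedbackscore`. The sketch below evaluates that expression using the `b_hat_2` values printed above. The helper `police_effect` and the grid of feedback values are hypothetical, and the sketch assumes the feedback value is on the same transformed scale that was used in estimation.

```python
import numpy as np

# Coefficients copied from the b_hat_2 output above: (a, b1, b2, b3).
a, b1, b2, b3 = 0.18091552, 0.71680167, 0.03566808, -0.0552502

def police_effect(feedback):
    """CEF difference between ispolice = 1 and ispolice = 0 at a given feedback value."""
    return b1 + b3 * np.asarray(feedback, dtype=float)

# Hypothetical grid of transformed feedback values, chosen for illustration.
feedback_grid = np.array([0.0, 5.0, 10.0])
print(police_effect(feedback_grid))
```

Since `b3` is negative in this fit, the implied difference shrinks as the transformed feedback value grows.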