In [None]:
version = "REPLACE_PACKAGE_VERSION"

# Experiment Design and Analysis
## School of Information, University of Michigan

## Week 2: 


## Assignment Overview
### The objective of this assignment is to:

- Apply theory of experiment design and knowledge of analysis techniques to real experiment data.


### The total score of this assignment will be 19 points


### Resources:
- StatsModels, scipy.stats, NumPy
    - We recommend using two Python libraries, [StatsModels](https://www.statsmodels.org/stable/index.html) and [scipy.stats](https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html), for data analysis. For this dataset, we'll be using [NumPy](https://numpy.org/devdocs/reference/index.html) as well.
    
- Optional Reading: [Holt C.A, & Laury S.K. Risk Aversion and Incentive Effects. (2002).](https://www.jstor.org/stable/3083270)  


### Datasets used in this assignment:
- Trust data [download csv file](assets/assignment2_data1.csv)
- First-Price Auction data [download csv file](assets/assignment2_data2.csv)
    - Source for dataset: [Chen, Y., et al. Sealed bid auctions with ambiguity: Theory and experiments. (2007).](https://www.sciencedirect.com/science/article/pii/S0022053107000178)

In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms
from scipy import stats
#you may or may not use all of the above libraries, and that is OK!

trust_data = pd.read_csv('assets/assignment2_data1.csv') #Trust Game data for this assignment
fpa_data = pd.read_csv('assets/assignment2_data2.csv') #First-price auction data for this assignment - this is the same dataset from last week

In [None]:
#uncomment the below line to view readme files for this dataset (includes explanation of variable names)
#!cat assets/assignment2_data1_readme.md
#!cat assets/assignment2_data2_readme.md

#uncomment the below line to view snippet of csv file
#trust_data.head()
#fpa_data.head()

## Part A (3 points)

1. In the Trust Game, subjects are grouped in pairs: one is assigned the role of investor and the other the role of recipient. Let's examine the correlation between the amounts the investors invest and the amounts the recipients return. Complete the function below to return the correlation coefficient.

**Round any calculations to the hundredth decimal. Do not use percentages.**
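
As a rough illustration of how `np.corrcoef` behaves, the sketch below uses a small made-up pair of arrays (the names `invested` and `returned` and their values are hypothetical, not taken from the Trust Game data). `np.corrcoef` returns a 2x2 correlation matrix, so the coefficient between the two inputs sits in an off-diagonal entry.

```python
import numpy as np

# Hypothetical toy data, not taken from the assignment dataset.
invested = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
returned = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# corrcoef returns a symmetric 2x2 matrix; the diagonal is always 1.
corr_matrix = np.corrcoef(invested, returned)

# The off-diagonal entry is the correlation between the two arrays.
r = round(corr_matrix[0, 1], 2)
print(r)
```

Because `returned` is an exact linear function of `invested` here, the coefficient is 1 after rounding; real data will generally give a value strictly between -1 and 1.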

In [None]:
def inv_rec_corrcoef(provided_data):
    """ Later in this problem set, you will be modeling OLS regressions on your data. For now, we'll calculate just
    the correlation coefficient using numpy. If needed, refer to the numpy documentation linked above.
    """
    ### BEGIN SOLUTION
    # Use the function argument rather than the global trust_data
    invest = provided_data[provided_data['player role'] == 'first']['decision']
    returner = provided_data[provided_data['player role'] == 'second']['decision']
    answer = round(np.corrcoef(invest, returner)[0][1],2)
    ### END SOLUTION
    return answer

Your function should return the correlation coefficient as a NumPy float (`np.float64`). Check that it does:

In [None]:
inv_rec_corrcoef(trust_data)

In [None]:
assert type(inv_rec_corrcoef(trust_data)) == np.float64

In [None]:
"""Part A #1: Checking value of correlation coefficient"""
# Hidden tests
### BEGIN HIDDEN TESTS
assert inv_rec_corrcoef(trust_data) == 0.82, "Part A #1 correlation coefficient value differs"
### END HIDDEN TESTS

## Part B (4 points)

For the first-price auction experiment, there are ten experimental sessions, with eight subjects per session. In this context, subjects are tasked with completing auction and lottery (Holt & Laury, 2002) tasks in two orders. In five of the ten sessions, subjects first complete a lottery task, followed by 30 rounds of auctions. In the other five sessions, subjects first complete 30 rounds of auctions, followed by a lottery task. At the end of each session, subjects complete a demographics survey. The dataset contains the first-period auction data for each treatment.


For this analysis, treat the order in which subjects complete the lottery task first and the auction task second (k1_8_lot_exp) as the control. The outcome variable we want to measure is the bid-value ratio (b/v).

1. Using differences-in-means, what is the average treatment effect for the first-price auction experiment? (4 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**
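
To make the difference-in-means idea concrete, here is a minimal sketch on a made-up table. The column names (`group`, `bid`, `value`, `ratio`) and group labels (`a`, `b`) are hypothetical and do not match the assignment data. The sign of the result depends on which group mean is subtracted from which, so check the order the assignment expects.

```python
import pandas as pd

# Hypothetical toy data, not taken from the assignment dataset.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "bid": [4.0, 6.0, 9.0, 6.0],
    "value": [5.0, 10.0, 10.0, 10.0],
})

# Outcome variable: bid divided by value, computed row by row.
toy["ratio"] = toy["bid"] / toy["value"]

# Mean outcome within each group, then the difference between the two means.
group_means = toy.groupby("group")["ratio"].mean()
toy_diff = round(group_means["b"] - group_means["a"], 2)
print(group_means)
print(toy_diff)
```

Group `a` has ratios 0.8 and 0.6 and group `b` has ratios 0.9 and 0.6, so the group means are 0.70 and 0.75 and their difference is 0.05.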

In [None]:
def ate_fpa_payoff(provided_data):
    """
    Write the function to manually check the differences in means of bid-value ratios across the different groups explained above.
    Tip: the easiest way to do this is to create a new dataframe column called 'bidval_ratio' in the provided data.
    Your function should output a dataframe with the following columns: 'lot_auc_mean', 'auc_lot_mean', 'diff in means'.
    """
    ### BEGIN SOLUTION
    ate_fpa_payoff_df = pd.DataFrame(columns=['lot_auc_mean','auc_lot_mean', 'diff in means'])
    provided_data['bidval_ratio'] = provided_data['b'] / provided_data['v']
    ate_fpa_payoff_df['lot_auc_mean'] = [round(provided_data[provided_data['treatment'] == 'k1_8_lot_exp']['bidval_ratio'].mean(),2)]
    ate_fpa_payoff_df['auc_lot_mean'] = [round(provided_data[provided_data['treatment'] == 'k1_8_exp_lot']['bidval_ratio'].mean(),2)]
    ate_fpa_payoff_df['diff in means'] = ate_fpa_payoff_df['lot_auc_mean']- ate_fpa_payoff_df['auc_lot_mean']
    ### END SOLUTION
    return ate_fpa_payoff_df

Your function should return a dataframe with the correct values and columns. Check that it does:

In [None]:
ate_fpa_payoff(fpa_data)

In [None]:
assert isinstance (ate_fpa_payoff(fpa_data), pd.core.frame.DataFrame), "checking your data is in a dataframe"

In [None]:
assert ate_fpa_payoff(fpa_data).columns[0] == 'lot_auc_mean', "checking df column names"
assert ate_fpa_payoff(fpa_data).columns[1] == 'auc_lot_mean', "checking df column names"
assert ate_fpa_payoff(fpa_data).columns[2] == 'diff in means', "checking df column names"

In [None]:
"Part B: lot_auc_mean value"
# Hidden tests
### BEGIN HIDDEN TESTS
assert next(iter(ate_fpa_payoff(fpa_data)["lot_auc_mean"])) == round(fpa_data[fpa_data['treatment'] == 'k1_8_lot_exp']['bidval_ratio'].mean(),2), "Part B #1 lot_auc_mean value differs"
### END HIDDEN TESTS

In [None]:
"Part B: auc_lot_mean value"
# Hidden tests
### BEGIN HIDDEN TESTS
assert next(iter(ate_fpa_payoff(fpa_data)['auc_lot_mean'])) == round(fpa_data[fpa_data['treatment'] == 'k1_8_exp_lot']['bidval_ratio'].mean(),2), "Part B #1 auc_lot_mean value differs"
### END HIDDEN TESTS

## Part C (10 points)

Continuing with the ```fpa_data``` dataset from last week, we would expect subjects to bid a certain fraction of their value in a first-price sealed-bid auction depending on their risk attitudes (e.g., risk neutral, risk averse). Let's explore what effect gender has on bid-value ratios when controlling for risk attitudes. This time, let's calculate this average treatment effect using an ordinary least-squares regression.

1. Using the ```fpa_data``` dataframe and an ordinary least-squares regression model, complete the ```ols_riskgender_on_bidvalue``` function to evaluate how subjects’ risk attitudes and gender (in the form of the _female_ variable) affect their bid/value ratio. For now, we'll just return a summary view of our model results. (2 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**

In [None]:
def ols_riskgender_on_bidvalue(provided_data):
    
    """
    The easiest way to evaluate how subjects' risk attitudes and gender affect their bid/value ratio is to run an OLS linear
    regression on your fpa_data dataframe. Use the statsmodels library to run an OLS linear regression, and return the summary
    view of your results.

    """
    X = provided_data[['female', 'ra']]
    ### BEGIN SOLUTION
    Y = provided_data['bv_ratio'] = provided_data['b'] / provided_data['v']
    X = sm.add_constant(X)
    model = sm.OLS(Y,X).fit()
    model_summary = model.summary()
    ### END SOLUTION
    return model_summary


Your function should return a summary view of your results. Check that it does:

In [None]:
print(ols_riskgender_on_bidvalue(fpa_data)) #we've wrapped this in a print statement to preserve the original statsmodels layout -- if you get a deprecation warning, that is fine.

In [None]:
assert isinstance(ols_riskgender_on_bidvalue(fpa_data), sm.iolib.summary.Summary), "checking your summary is output as a statsmodels Summary object"

2. Now, modify the ols_riskgender_on_bidvalue function to access the model's coefficients (parameters) and associated p-values, instead of printing out the entire summary view. For now, we won't worry about rounding. (1 point)
    

In [None]:
def ols_riskgender_on_bidvalue(provided_data):
    
    """
    The easiest way to evaluate how subjects' risk attitudes and gender affect their bid/value ratio is to run an OLS linear
    regression on your fpa_data dataframe. Use the statsmodels library to run an OLS linear regression, and this time return
    the coefficients and the p-values for your model.

    """
    X = provided_data[['female', 'ra']]
    # complete the function by assigning your Y, and fitting your model.
    ### BEGIN SOLUTION
    Y = provided_data['bv_ratio'] = provided_data['b'] / provided_data['v']
    X = sm.add_constant(X)
    model = sm.OLS(Y,X).fit()
    model_params = model.params
    pvals = model.pvalues
    ### END SOLUTION
    return model_params, pvals #we're returning a tuple of a series here -- pretty ugly, right?

Your function should return a raw tuple of your results in pandas Series form. Check that it does:

In [None]:
ols_riskgender_on_bidvalue(fpa_data)

In [None]:
"checking your return value is a tuple of type pandas series"
assert isinstance (ols_riskgender_on_bidvalue(fpa_data)[0], pd.core.series.Series)
assert isinstance (ols_riskgender_on_bidvalue(fpa_data)[1], pd.core.series.Series)

In [None]:
"checking the value of 'const' for both values"
# Hidden tests
### BEGIN HIDDEN TESTS
assert round(next(iter(ols_riskgender_on_bidvalue(fpa_data)[0])),2) == 1.06, "Part C #2 const in first part of tuple differs"
assert round(next(iter(ols_riskgender_on_bidvalue(fpa_data)[1])),2) == 0.07, "Part C #2 const in second part of tuple differs"
### END HIDDEN TESTS

3. Now, let's make our results more readable. Let's modify our function once again to this time create a dataframe that has the coefficients and p-values for the control variables and constant, **rounding to the nearest hundredth decimal**. (2 points)
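
As one possible way to picture the target layout, the sketch below combines two hypothetical Series (made-up values, shaped like the `params` and `pvalues` output of a fitted model) into a rounded three-column table. This is an alternative to filling the dataframe row by row, shown only for illustration; the function below uses its own structure.

```python
import pandas as pd

# Hypothetical coefficient and p-value Series (made-up numbers).
toy_params = pd.Series({"const": 1.2345, "x1": -0.5678})
toy_pvals = pd.Series({"const": 0.0004, "x1": 0.0449})

# Align the two Series on their shared index, round, and expose the index as a column.
toy_table = (
    pd.DataFrame({"coefficient": toy_params, "p-value": toy_pvals})
    .round(2)
    .rename_axis("variable")
    .reset_index()
)
print(toy_table)
```

The resulting columns are `variable`, `coefficient`, and `p-value`, with one row per model term.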
    

In [None]:
def ols_riskgender_on_bidvalue_df(provided_data):
    
    """
    This function should use the results of the ols_riskgender_on_bidvalue function above to output a dataframe
    that has the coefficients and p-values for the control variables and constant. 
    The dataframe should have the following columns: 'variable', 'coefficient', and 'p-value'
    
    """
    # define your parameters for your model and the p-values, then fill in the rest of the function below.
    
    ### BEGIN SOLUTION
    # Fit the model once and unpack the (coefficients, p-values) tuple
    model_params, pvals = ols_riskgender_on_bidvalue(provided_data)
    ### END SOLUTION
    ols_model_df = pd.DataFrame(columns=['variable','coefficient','p-value'])
    variables = ['const','ra','female']
    ols_model_df['variable'] = variables
    for variable in ols_model_df['variable']:
        ### BEGIN SOLUTION
        ols_model_df.loc[ols_model_df['variable']== variable,'coefficient'] = round(float(model_params[variable]),2)
        ols_model_df.loc[ols_model_df['variable']==variable,'p-value'] = round(float(pvals[variable]),2)
        ### END SOLUTION
    return ols_model_df

Your function should return a dataframe of your results. Check that it does:

In [None]:
ols_riskgender_on_bidvalue_df(fpa_data)

In [None]:
"""Part C: Check the dataframe outputs the correct p-values from OLS model"""
# Hidden tests
### BEGIN HIDDEN TESTS
assert ols_riskgender_on_bidvalue_df(fpa_data).iloc[2, 2] == 0.41, "Part C #3 female p-value differs"
assert ols_riskgender_on_bidvalue_df(fpa_data).iloc[1, 2] == 0.89, "Part C #3 risk aversion p-value differs"
### END HIDDEN TESTS

In [None]:
"""Part C: Check the dataframe outputs the correct coefficients from OLS model"""
# Hidden tests
### BEGIN HIDDEN TESTS
assert ols_riskgender_on_bidvalue_df(fpa_data).iloc[2, 1] == -0.21, "Part C #3 female coefficient differs"
assert ols_riskgender_on_bidvalue_df(fpa_data).iloc[1, 1] == -0.01, "Part C #3 risk aversion coefficient differs"
### END HIDDEN TESTS

4. If you remove the risk attitudes variable from the model, does it have a significant effect on how gender contributes to bid/value ratios? Complete the ```ols_gender_on_bidvalue_df``` function to assess this. Part of the function has already been completed for you. (3 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**
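
To build intuition for what dropping a control can do, here is a small sketch using plain NumPy least squares on a made-up, perfectly balanced design. The arrays `female`, `risk`, and `ratio` hold hypothetical values, not the assignment data. In this toy case the control is uncorrelated with the regressor of interest, so its removal leaves that coefficient unchanged; in real data the two are usually correlated and the coefficient can shift. Standard errors and p-values can change either way, which this sketch does not compute.

```python
import numpy as np

# Hypothetical balanced toy data (not from the assignment dataset).
female = np.array([0.0, 0.0, 1.0, 1.0])
risk = np.array([1.0, 2.0, 1.0, 2.0])
ratio = np.array([0.90, 0.80, 0.70, 0.75])
ones = np.ones_like(female)

# Design matrices with and without the control variable.
X_full = np.column_stack([ones, female, risk])
X_short = np.column_stack([ones, female])

# Least-squares coefficient estimates for each model.
beta_full, *_ = np.linalg.lstsq(X_full, ratio, rcond=None)
beta_short, *_ = np.linalg.lstsq(X_short, ratio, rcond=None)

female_full = beta_full[1]    # coefficient on female, controlling for risk
female_short = beta_short[1]  # coefficient on female, no control
print(round(female_full, 3), round(female_short, 3))
```

In the short model the coefficient equals the difference in group means, 0.725 - 0.85 = -0.125, and the balanced design gives the same value in the full model.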

In [None]:
def ols_gender_on_bidvalue_df(provided_data):
    
    """
    Complete the function that takes the provided data and creates an OLS model that determines the effect of 
    gender (using the female variable) on subjects' bid/value ratios. It should output (by filling in the missing values) 
    a dataframe that has the coefficients and p-values for the female variable and the intercept.
    
    """
    # assign your X and Y variables, and define your parameters and pvalues. Then, fill in the rest of the function below.
    ### BEGIN SOLUTION
    X = provided_data['female']
    Y = provided_data['bv_ratio'] = provided_data['b'] / provided_data['v']
    X = sm.add_constant(X)
    model = sm.OLS(Y,X).fit()
    model_params = model.params
    pvals = model.pvalues
    ### END SOLUTION
    ols_gender_model_df = pd.DataFrame(columns=['variable','coefficient','p-value'])
    variables = ['const','female']
    ols_gender_model_df['variable'] = variables

    for variable in ols_gender_model_df['variable']:
        ### BEGIN SOLUTION
        ols_gender_model_df.loc[ols_gender_model_df['variable']==variable,'coefficient'] = round(float(model_params[variable]),2)
        ols_gender_model_df.loc[ols_gender_model_df['variable']==variable,'p-value'] = round(float(pvals[variable]),2)
        ### END SOLUTION
    return ols_gender_model_df


Your function should return a dataframe with each of the variables and their completed coefficient and p-value for the OLS model. 

Check that it does:

In [None]:
ols_gender_on_bidvalue_df(fpa_data)

In [None]:
assert ols_gender_on_bidvalue_df(fpa_data).iloc[0, 2] == 0, "checking the const p-value"

In [None]:
"""Check that the dataframe outputs the correct values from the OLS model"""
# Hidden tests
### BEGIN HIDDEN TESTS
assert ols_gender_on_bidvalue_df(fpa_data).iloc[0, 1] == 0.98, "Part C const coefficient differs"
assert ols_gender_on_bidvalue_df(fpa_data).iloc[1, 1] == -0.21, "Part C female coefficient differs"
assert ols_gender_on_bidvalue_df(fpa_data).iloc[1, 2] == 0.4, "Part C female p-value differs"
### END HIDDEN TESTS