--- 
Project for the course in Microeconometrics | Summer 2020, M.Sc. Economics, Bonn University | [Hyein Jeong](https://github.com/huiren-j)
# Replication of Decarolis, Francesco (2014)  <a class = "tocSkip">
---

---
# 1. Introduction
---

Decarolis (2014) examines how switching to first price sealed bid auctions (FPAs) for public procurement affects contract performance, using data from Italy. FPAs are known as a procurement method with low cost and high transparency. The switch from average bid auctions (ABAs) to FPAs studied here took place within the sample period, January 2000 to June 2006. However, the author argues that people overlook a drawback of FPAs: high discounts at the awarding stage may be obtained at the cost of excessively favoring the bidders who are least likely to deliver what they have promised. Depending on the setup, this could mean a failure to respect the promised completion time, cost budget, or work quality, all of which I will consider as measures of ex post contract performance.

Despite the vast theoretical literature, little is known about the empirical relevance of this trade-off.
The main results confirm that the switch to FPAs substantially lowers the winning price but also worsens performance.
The results are obtained through a difference-in-differences identification strategy that is applied separately to two large public administrations (PAs) that switched to FPAs in 2003: the County of Turin and the Municipality of Turin.
- Municipality of Turin: FPAs increased the winning discount by approximately 13 percent of the reserve price, cost overruns by 6 percent of the reserve price, and the delay in the completion time by 28 percent of the contractual length of the job.
- County of Turin: FPAs generated a smaller increase in the winning discount, on the order of 8 percent of the reserve price, and an increase in extra cost and extra time that is one-third of the magnitude estimated for the Municipality of Turin and not statistically significant.

An interesting finding of this study is that the difference in the effects of FPAs across the two PAs is related to how they screen bids for their reliability.
PAs might exclude the highest discount if the offer is judged “too good to be true.”
Using the number of days between when bids are opened and when the contract is awarded as a measure of screening intensity, I observe that in both PAs the switch to FPAs is associated with an increase of this variable.
This finding suggests the presence of a second trade-off.
However, screening bids has two costs:
1. A direct cost: for example, the cost of the engineers assessing bid reliability.
2. An indirect cost: generated by the lessening of competition induced among bidders.

This paper appears to be the first to quantify the interplay between bid screening and FPAs.

The Final Part
I analyze a more recent wave of reforms that temporarily expanded the use of FPAs in Italy but were ultimately reversed.
The mandatory use of FPAs for PAs reduces the winning price but also significantly increases the screening cost.
Hence, the effectiveness of a switch toward FPAs appears to hinge crucially on a separate institution, the ex post bid screening.

This paper presents nontrivial policy implications, indicating that the benefits of adopting FPAs in procurement depend on the strength of the institutions ensuring that bids are enforceable.

# 1.1. Public Procurement System and Policy Changes

- A. The Italian Public Procurement System
    : With a few exceptions for military and strategic infrastructures, the procurement of contracts for the construction and maintenance of public works by all types of Italian Public Administrations (PAs) occurs broadly under the same regulation. Contracts for public works can typically be procured through auctions based exclusively on price. Between 2000 and 2006, these auctions accounted for 79 percent of all procurements held and 82 percent of the total value of all contracts procured. Among the auctions based only on price, two distinct mechanisms exist: first price auctions (FPAs) and average bid auctions (ABAs).
    - Detailed explanation of the auctions
        The process starts with a PA releasing a call for tenders that describes the contract characteristics, including the maximum price the PA is willing to pay (i.e., the reserve price) and the procedure used (FPA or ABA). Then every firm qualified to bid for public contracts can submit its sealed bid, consisting of a discount over the reserve price.
        -> FPA: the highest “responsible discount” wins.
        -> ABA: the highest discount is always eliminated, because the judgment of which discounts are not responsible is automated through an algorithm that discards discounts greater than a kind of trimmed mean.
- B. Timing of the Reforms
    : In the period between January 2000 and June 2006, the regulation required the use of ABAs for all contracts with a reserve price below (approximately) €5 million. European Union regulation mandates the use of FPAs for contracts at or above this value. In January 2003, after a case of collusion in ABAs became public, the Municipality of Turin ruled to replace ABAs with FPAs for all contracts. Two months later, the County of Turin adopted the same reform. Therefore, although the switch to FPAs is clearly not random, the fact that it occurred first in Turin and only years later in other similar PAs is due to causes unrelated to the effects of FPAs.
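
The two award rules described above can be contrasted in a short sketch. This is a deliberately stylized illustration, not the statutory algorithm: the function names, the 10 percent trimming share, and the use of a plain trimmed mean as the elimination threshold are assumptions made for the example. The FPA function also ignores the ex post check that the top discount is "responsible".

```python
def fpa_winner(discounts):
    """Stylized FPA: the highest discount wins (screening is ignored here)."""
    if not discounts:
        return None
    return max(range(len(discounts)), key=lambda i: discounts[i])


def aba_winner(discounts, trim_share=0.10):
    """Stylized ABA (assumed rule): discounts at or above a trimmed mean are
    eliminated, and the highest remaining discount wins."""
    if not discounts:
        return None
    n = len(discounts)
    k = int(n * trim_share)              # number of bids trimmed on each side
    ordered = sorted(discounts)
    kept = ordered[k:n - k] if n - 2 * k > 0 else ordered
    threshold = sum(kept) / len(kept)    # trimmed mean used as the cutoff
    eligible = [i for i, d in enumerate(discounts) if d < threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda i: discounts[i])


# Hypothetical discounts, in percent of the reserve price
bids = [5.0, 10.0, 12.0, 14.0, 30.0]
print("FPA winner:", bids[fpa_winner(bids)])
print("ABA winner:", bids[aba_winner(bids)])
```

With these hypothetical bids the very aggressive discount wins under the FPA rule but is eliminated under the stylized ABA rule, which is the contrast the reform exploits.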

# 2. Theory Overview

**2.A. The Trade-Off between the Winning Bid and Performance**


FPAs push bidders into intense competition. This results in high discounts from the reserve price, which is the government's maximum willingness to pay.
On the other hand, this highly competitive environment yields unreliable discount levels. Thus, at the performance stage, the government may receive low-quality performance from the winner as the price of the low bid.

**2.B. The Trade-Off between Performance and the Screening Cost**

To prevent low-quality performance, the government can introduce an ex post screening procedure. However, it then incurs screening costs in exchange for avoiding low-quality performance.

**2.C. The ABA and the Expected Effects of a Switch from ABA to FPA**

A switch from ABAs to FPAs should cause a decrease in the price at which the contract is awarded, a worsening of the measures of ex post performance, and an increase in bid screening.

# 3. Data

**3.A. Details about the Chosen Samples**

The Authority collects data on all contracts for public works, awarded between Jan 2000 and June 2006, with a reserve price above €150,000 procured by all PAs. However, I restrict my analysis to only the simplest types of public works (consisting mostly of roadwork construction and repair jobs), awarded through either ABAs or FPAs, having a reserve price between €300,000 and €5 million, auctioned off by either counties or municipalities located in five regions in the North (Piedmont, Lombardy, Veneto, Emilia, and Liguria).

These simple contracts are about a quarter of all the public works procured. Moreover, they are the most appropriate for the analysis of FPAs on winning bids because their reserve prices are comparable across PAs.

1. A key feature of the contracts studied in this paper is that the PA is not in full control of the reserve price.
2. The PA engineers evaluate the types and quantities of inputs needed to complete a project. The reserve price (RP) is then obtained by multiplying these input quantities by their prices and summing up the products.
3. However, input prices are not the current market prices but list prices set every year for the respective regions and used exclusively by PAs to calculate reserve prices.
4. The similarity of these prices across the chosen regions (Piedmont, Lombardy, Veneto, Emilia, and Liguria) helps the comparability of reserve prices.
5. Furthermore, at least in the case of simple roadwork jobs, it seems plausible to assume that there is not much discretion in the type and quantity of inputs to use. The technology of the work determines them. Since the geographical area of the chosen regions is rather homogeneous, similar roadwork jobs likely require the same types and quantities of inputs in all the PAs in the sample.
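
The reserve price rule in the list above (input quantities times regional list prices, summed over inputs) can be sketched as follows. The function name, the input names, and all figures are hypothetical and serve only to illustrate the arithmetic.

```python
def reserve_price(quantities, list_prices):
    """Sum of quantity * regional list price over all inputs of a project."""
    return sum(qty * list_prices[name] for name, qty in quantities.items())


# Hypothetical roadwork job: quantities assessed by the PA engineers
quantities = {"asphalt_tons": 100, "labor_hours": 200}
# Hypothetical regional list prices in euro per unit (not market prices)
list_prices = {"asphalt_tons": 80.0, "labor_hours": 30.0}

print(reserve_price(quantities, list_prices))
```

Because both the quantities and the list prices are largely outside the PA's discretion, two PAs facing similar jobs and similar list prices end up with comparable reserve prices.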

- The sample is separated into two periods: Jan 2000 to Dec 2002 and Jan 2003 to June 2006, with the reform in Turin marking the split
- Auctioneer: Municipality of Turin, County of Turin, Other PAs

**Table1. Descriptive Statistics**

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from linearmodels import PanelOLS
import statsmodels.api as sm
import econtools as econ
import econtools.metrics as mt
import math
from statsmodels.stats.outliers_influence import variance_inflation_factor

from auxiliary.prepare import *
from auxiliary.table2 import *
from auxiliary.table3 import *
from auxiliary.table4 import *
from auxiliary.table6 import *
from auxiliary.table7 import *
from auxiliary.extension import *
from auxiliary.table_formula import *

In [9]:
data = pd.read_stata('data/Authority.dta')
print(data.shape)

(16127, 31)


In [10]:
#table1
df_1 = prepare_data(data)
df_pre = presort_describe(df_1)
presort = table1_presort(df_pre)
presort.sample(9)

(16127, 31)
(8105, 31)
(3004, 31)


PA,1. Municipality of Turin,1. Municipality of Turin,1. Municipality of Turin,2. County of Turin,2. County of Turin,2. County of Turin,3. Other PAs,3. Other PAs,3. Other PAs
Variable,Mean,SD,N,Mean,SD,N,Mean,SD,N
Number of bidders,59.9091,26.8533,121,40.254,40.4921,63,37.5411,34.5337,1009
Population,900.608,2.73983e-12,121,2242.77,2.7504e-12,63,1023.99,1082.77,1009
Reserve price,919.072,776.756,121,914.095,805.409,63,868.83,710.547,1009
Days to award,146.894,40.9554,111,97.5909,42.6578,44,59.3789,39.8938,768
Winning discount,17.0717,5.04908,121,17.3237,5.89664,63,12.8304,6.16729,1009
Fiscal efficiency,0.750452,0.0347996,121,0.884224,0.0147748,63,0.8131,0.140129,1009
Extra time,47.1121,53.1709,75,62.8024,66.7607,47,63.2966,75.7257,711
Extra cost,5.78489,8.67399,83,6.86463,16.9805,45,5.29898,10.6573,672
Experience,523.0,0.0,121,416.0,0.0,63,186.403,90.9841,1009


In [12]:
#table1
df_post = postsort_describe(df_1)
postsort = table1_postsort(df_post)
postsort.sample(9)

PA,1. Municipality of Turin,1. Municipality of Turin,1. Municipality of Turin,2. County of Turin,2. County of Turin,2. County of Turin,3. Other PAs,3. Other PAs,3. Other PAs
Variable,Mean,SD,N,Mean,SD,N,Mean,SD,N
Fiscal efficiency,0.805913,0.0395676,156,0.866495,0.0323597,137,0.866218,0.0918481,930
Experience,523.0,0.0,156,416.0,0.0,137,171.413,75.2393,930
Number of bidders,7.61538,9.3391,156,12.7153,15.3239,137,46.9946,35.1868,930
Reserve price,1370.78,892.667,156,988.45,760.45,137,922.082,791.005,930
Winning discount,30.9739,9.83663,156,27.6648,7.24379,137,12.382,5.4445,930
Population,900.608,2.73727e-12,156,2242.78,0.0,137,388.392,245.926,930
Extra cost,13.9357,13.8791,79,6.6663,9.65511,62,7.99173,13.8449,665
Days to award,121.383,82.6208,94,101.62,49.6456,100,30.6847,34.3186,425
Extra time,56.0595,66.1695,92,79.6506,89.5052,87,53.7268,73.8185,697


In [13]:
#data for table2,3,5,6
df = basic_setting(data)
df.shape  # shape is an attribute, not a method

In [51]:
#table2_regression
table2_reg = table2_list(df)

#table2
main_table(table2_reg)



Panel,Statistic,Control(1),Control(2),Control(3),Control(4),Control(5),Control(6)
A,First Price Auction,13.10***,11.99***,13.32***,12.02***,13.71***,12.26***
A,Standard Error,(1.61),(1.32),(1.77),(1.47),(1.72),(1.38)
A,R$^2$,0.505,0.639,0.493,0.614,0.526,0.644
A,Observations,1262,1262,1275,1275,880,880
B,First Price Auction,7.20***,5.87***,5.14**,4.33**,7.25***,5.71**
B,Standard Error,(1.99),(2.05),(2.16),(2.13),(2.23),(2.20)
B,R$^2$,0.111,0.156,0.185,0.214,0.160,0.221
B,Observations,1092,1092,1049,1049,742,742
C,First Price Auction,25.23**,34.18***,19.36*,27.98***,27.73**,39.28***
C,Standard Error,(12.05),(12.13),(9.81),(10.14),(10.78),(12.00)


In [52]:
#table3
table3_reg = table3_list(df)
main_table(table3_reg)



Panel,Statistic,Control(1),Control(2),Control(3),Control(4),Control(5),Control(6)
A,First Price Auction,8.65***,8.69***,8.58***,8.29***,8.78***,8.66***
A,Standard Error,(1.15),(1.10),(1.31),(1.17),(1.33),(1.10)
A,R$^2$,0.419,0.579,0.544,0.671,0.553,0.675
A,Observations,1355,1355,653,653,567,567
B,First Price Auction,0.04,0.20,1.01,0.66,1.58,0.56
B,Standard Error,(3.29),(3.55),(3.19),(3.23),(3.18),(3.25)
B,R$^2$,0.120,0.149,0.155,0.188,0.156,0.193
B,Observations,1167,1167,517,517,454,454
C,First Price Auction,12.95,10.76,9.69,7.99,10.36,9.59
C,Standard Error,(17.84),(18.72),(16.12),(18.27),(17.65),(20.00)


**3.B. The Dependent Variables**

1. Winning Discount
2. Performance
    - Two proxies of performance are used: cost overruns and time delays
    - Cost overruns: the difference between the final payment and the winning bid as a percentage of the reserve price
    - Time delays: the difference between the actual and the contractual completion time as a percentage of the contractual time
    - These will be referred to as *Extra Cost* and *Extra Time*, respectively
3. Screening Cost
    - The difference in days between when the bids are opened by the awarding commission and when the PA announces the identity of the winner
    - Captures the indirect costs of slowing the procurement process and generating transaction costs
    - Referred to as *Days to Award*
    - Screening: the engineers of the PA review the bids and ask bidders to justify the prices they offered.
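
The three outcome definitions above translate directly into arithmetic. The sketch below uses hypothetical function names and invented figures; it is not taken from the `auxiliary` modules, whose variable construction is not shown here.

```python
from datetime import date


def extra_cost(final_payment, winning_bid, reserve_price):
    """Cost overrun as a percentage of the reserve price."""
    return 100.0 * (final_payment - winning_bid) / reserve_price


def extra_time(actual_days, contractual_days):
    """Time delay as a percentage of the contractual completion time."""
    return 100.0 * (actual_days - contractual_days) / contractual_days


def days_to_award(bids_opened, winner_announced):
    """Days between the opening of the bids and the announcement of the winner."""
    return (winner_announced - bids_opened).days


# Hypothetical contract: reserve price of 1,000,000 euro won at a 20 percent discount
example_cost = extra_cost(final_payment=880_000, winning_bid=800_000, reserve_price=1_000_000)
example_time = extra_time(actual_days=250, contractual_days=200)
example_wait = days_to_award(date(2003, 3, 1), date(2003, 4, 15))
print(example_cost, example_time, example_wait)
```

Expressing the two performance measures relative to the reserve price and the contractual time makes them comparable across contracts of different size and length.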

# 4. Empirical Analysis

**4.A. Empirical Strategy**

The methodology is analogous to a difference-in-differences (DD) regression exploiting the difference in the timing with which Turin adopted FPAs relative to other PAs. The only departures from a standard DD are due to some features of the pretreatment regime.

\begin{align*}
Y_{ist} = a_s + b_t + cX_{ist} + \beta FPA_{st} + \epsilon_{ist}
\end{align*}

- i = the auction
- s = the PA
- t = the year
- FPA_st = a dummy equal to one if PA s uses FPAs in year t
- beta = the coefficient of interest: the effect of FPAs on the dependent variable
- a_s = fixed effects for the PA
- b_t = fixed effects for time
- X = other covariates

- The two main challenges in interpreting beta  
(1) the treated PAs are not randomly assigned to the FPA but switch to it voluntarily  
(2) features of the pretreatment regime, which matter for both the construction of the DD estimator and for the interpretation of the estimates.

* Assumptions for defining the control group for the DD analysis  
(1) It was essentially random that the switch observed in 2003 occurred in Turin and not in another one of the PAs that would have abandoned the ABA had the central government not challenged the Turin reform.  
(2) We can reasonably infer which PAs would have switched together with Turin if so allowed.
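
As a rough illustration of the DD logic, the unconditional two-by-two comparison can be computed by hand from the winning-discount means printed in the descriptive statistics. The sketch assumes that the first descriptive table covers the period before the Turin reform and the second the period after it. It uses all "Other PAs" as the comparison group and includes no fixed effects or covariates, so it is not the regression estimate of beta.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Unconditional 2x2 difference-in-differences of group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Mean winning discount (percent of the reserve price) from the descriptive tables
municipality_pre, municipality_post = 17.0717, 30.9739
other_pre, other_post = 12.8304, 12.382

raw_dd = did_estimate(municipality_pre, municipality_post, other_pre, other_post)
print(round(raw_dd, 2))  # 14.35
```

The raw figure lies close to the range of the regression estimates for the Municipality of Turin, which suggests that the controls refine rather than drive the result for the winning discount.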

**4.B. Effects of the FPA on Price, Performance, and Bids Screening**

Tables 2 and 3: the DD estimates for the Municipality of Turin and the County of Turin

- Table 2, Municipality of Turin: four panels (the winning discount, the cost overrun, the time delay, the number of days taken to award the contract)  
Within each panel, the results of six regressions are reported
* The switch from ABAs to FPAs is associated with a large and statistically significant increase in the winning discount (12-14% of the RP)
* However, because of ex post renegotiation, the PA lost between one third and one half of its savings.

- Table 3, County of Turin

**4.C. Robustness Checks**

1) The presence of common time trends among the treated and control groups (Table 4)  
-> Addressed by augmenting the models of Tables 2 and 3 with PA-specific, time-varying variables.

In [53]:
df_4 = table4_setting(df)

In [73]:
#table4 new version
tab4_1 = table4_odd(df_4)
tab4_2 = table4_even(df_4)

table4_new(tab4_1, tab4_2)



Panel,Statistic,W.Discount(1),W.Discount(2),Extra Cost(3),Extra Cost(4),Extra Time(5),Extra Time(6),Days Award(7),Days Award(8)
A,First Price Auction,12.178***,6.136***,32.425**,5.323,5.987***,-0.821,30.511***,25.438
A,Standard Error,(1.329),(1.305),(12.670),(28.383),(1.937),(3.583),(11.212),(22.608)
A,R$^2$,0.639,0.651,0.122,0.135,0.156,0.170,0.568,0.597
A,Observations,1262,1262,1110,1110,1092,1092,777,777
B,First Price Auction,8.705***,7.564***,11.203,-1.422,0.294,-0.398,36.449***,26.305***
B,Standard Error,(1.086),(0.884),(18.748),(19.315),(3.587),(3.408),(8.347),(9.110)
B,R$^2$,0.579,0.592,0.126,0.140,0.149,0.174,0.410,0.466
B,Observations,1355,1355,1206,1206,1167,1167,817,817


2) The standard errors used to conduct inference about the effect of FPAs (Table 5)  
-> Criticism: when errors are autocorrelated, clustering at the PA-year level can produce statistical significance when significance is in fact absent. This requires assessing whether the estimate of beta remains significant once standard errors are clustered at the PA level.

In [None]:
#table5

3) The presence of a sample selection bias (Table 6)

In [8]:
#table6 setting
df_6 = table6_setting(df)

#table6 calculation
table6_reg = table6_list(df_6)

#table6
table6(table6_reg)

Unnamed: 0,authority_code,year,auth_anno,fpsb_auction,n_bidders,discount,reserve_price,work_category,complex_work,delay_ratio,...,OS24,OS26,OS28,OS30,OS32,OS33,OS34,post01pre05,post02pre04,complexity_dummy
15628,2.0,2006.0,17.0,0.0,46.0,13.730000,1.549260e+05,OG03,1.0,,...,0,0,0,0,0,0,0,0,0,0
1830,2.0,2003.0,15.0,0.0,30.0,17.209999,2.525000e+05,OS28,2.0,10.000000,...,0,0,1,0,0,0,0,1,1,1
14964,2.0,2008.0,19.0,0.0,68.0,21.049999,2.818509e+05,OG03,1.0,,...,0,0,0,0,0,0,0,0,0,0
1731,2.0,2003.0,15.0,0.0,25.0,14.400000,3.878000e+05,OG01,2.0,0.000000,...,0,0,0,0,0,0,0,1,1,1
15544,2.0,2008.0,19.0,0.0,23.0,22.799999,6.235000e+05,OG03,1.0,,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
580,3090272.0,2001.0,799.0,0.0,11.0,24.500000,2.078038e+06,OS23,2.0,-8.333333,...,0,0,0,0,0,0,0,1,0,1
15589,3090272.0,2006.0,804.0,1.0,19.0,34.770000,1.835400e+05,OG03,1.0,,...,0,0,0,0,0,0,0,0,0,0
15591,3090272.0,2006.0,804.0,1.0,8.0,43.000000,1.228448e+06,OG03,1.0,,...,0,0,0,0,0,0,0,0,0,0
544,3090272.0,2001.0,799.0,0.0,83.0,21.400000,5.681026e+05,OG03,1.0,15.730337,...,0,0,0,0,0,0,0,1,0,0


# 5. Discussion and Policy Implication

The previous findings suggest that reforms toward FPAs can be successful only if their design carefully accounts for the features of the institutions ensuring bid reliability, which include financial guarantees, ex ante prequalification, and ex post screening.  
The final cost of the project under FPAs in the Municipality of Turin and the County of Turin declines by approximately 8 percent of the original RP (€80,000 to €100,000).  
This amount of saving is high enough to compensate for the increased cost of bid screening and for the possible transaction costs associated with contract renegotiation.
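
A back-of-the-envelope check of this trade-off can be made with the Municipality of Turin figures quoted in the Introduction: a gain of about 13 percent of the reserve price in the winning discount and an increase of about 6 percent in cost overruns. The function names and the reserve price of €1,000,000 are hypothetical, and the calculation ignores screening and transaction costs, so it is only indicative and need not match the approximately 8 percent reported above.

```python
def net_saving(discount_gain, extra_cost_gain):
    """Net change in final cost savings, in percent of the reserve price."""
    return discount_gain - extra_cost_gain


def share_lost(discount_gain, extra_cost_gain):
    """Share of the awarding-stage saving lost through renegotiation."""
    return extra_cost_gain / discount_gain


discount_gain = 13.0    # percent of the reserve price (Municipality of Turin)
extra_cost_gain = 6.0   # percent of the reserve price (Municipality of Turin)

net = net_saving(discount_gain, extra_cost_gain)
lost = share_lost(discount_gain, extra_cost_gain)

hypothetical_reserve_price = 1_000_000  # euro, assumed for illustration
saving_in_euro = net / 100 * hypothetical_reserve_price
print(net, round(lost, 2), round(saving_in_euro))
```

The share lost falls between one third and one half, in line with the statement on ex post renegotiation in the discussion of Table 2.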

# 6. Conclusion 

In [17]:
data_copy = data 
data.shape

(16127, 93)

In [20]:
#table5
#keep the observation needed

df = data
df = df[((df['turin_co_sample']==1) | (df['turin_pr_sample']==1)) & ((df['post_experience']>=5)|(df['post_experience'].isnull()==True))  & ((df['pre_experience']>=5)|(df['pre_experience'].isnull()==True))& (df['missing']==0)]
df = df[(df['ctrl_pop_turin_co_sample']==1) | (df['ctrl_pop_turin_pr_sample']==1) | (df['ctrl_exp_turin_co_sample']==1) | (df['ctrl_exp_turin_pr_sample']==1) | (df['ctrl_pop_exp_turin_co_sample']==1) | (df['ctrl_pop_exp_turin_pr_sample']==1)]
df = df.reset_index()

#re-construct trend-pa
id_auth_remained = df['id_auth'].unique()
id_auth_remained_df = pd.DataFrame({'id_auth': [], 'group_num': []})
for i in range(len(id_auth_remained)):
    id_auth_remained_df.loc[i,'id_auth'] = id_auth_remained[i]
    id_auth_remained_df.loc[i,'group_num'] = i+1

for i in range(len(df)):
    for j in range(len(id_auth_remained_df)):
        if df.loc[i, 'id_auth'] == id_auth_remained_df.loc[j, 'id_auth']:
            df.loc[i, 'id_auth_remained'] = j+1
id_auth_remained_dum = pd.get_dummies(df['id_auth_remained']).rename(columns=lambda x: 'id_auth_remained' + str(x))
df = pd.concat([df, id_auth_remained_dum],axis = 1)

#start re-constructing trend-pa
for i in range(len(id_auth_remained_dum.columns)):
    df['trend_pa_remained_'+str(i+1)] = 0
    for j in range(len(df)):
        if df.loc[j, id_auth_remained_dum.columns[i]]==1 and df.loc[j, 'authority_code']!=3090272 and df.loc[j, 'authority_code']!=3070001:
            df.loc[j,'trend_pa_remained_'+str(i+1)] = 1
    #drop() returns a new frame, so assign the result back
    df = df.drop([id_auth_remained_dum.columns[i]],axis = 1)
    
c_outcomes=1
t = "turin_co_sample"
g = "ctrl_exp"
o = 'discount'
#outcomes =[ 'delay_ratio', 'overrun_ratio', 'days_to_award'] # ,
#treatment = ['turin_co_sample','turin_pr_sample']
i = 5
df1 = df

###First SPECIFICATION###
#setting for first specification df1
#work_category has no blanks or NaN values, so it is not filtered
df1 = df1[(df1[t]==1) & (df1[g +'_' + t]==1) & (df1['post_experience']>=i) & (df1['pre_experience']>=i)& (df1['post_experience'].isnull()==False) & (df1['pre_experience'].isnull()==False) & (df1['missing']==0) & (df1[o].isnull()==False) & (df1['fiscal_efficiency'].isnull()==False) & (df1['reserve_price'].isnull()==False)&(df1['municipality'].isnull()==False)]

df1 = df1.sort_values(by = 'authority_code', ascending = True)
df1 = df1.reset_index(drop = True) #reset after sorting so that loc[i-1] below is the previous row
#df1 value checked 1262

df1['ind'] = np.nan
for i in range(len(df1)):
    if i == 0:
        df1.loc[i, 'ind'] = 1
    else:
        if df1.loc[i, 'authority_code'] != df1.loc[i-1, 'authority_code']:
            df1.loc[i, 'ind'] = 1

#create dummies for administration-year pairs 
all_years = df1['year'].unique()
all_authorities = df1['authority_code'].unique()
auth_year_reg_col = []
for auth in all_authorities:
    for yr in all_years:
        df1['auth_year_' + str(auth)+'_' + str(yr)] = 0
        auth_year_reg_col.append('auth_year_' + str(auth)+'_' + str(yr))
        df1.loc[(df1['year']==yr) & (df1['authority_code']==auth), 'auth_year_' + str(auth)+'_' + str(yr) ] = 1

##regression for first stage
#create dummies for work category
all_categories = df1['work_category'].unique()
for cat in all_categories:
    df1['cat_'+cat] = 0
    df1.loc[df1['work_category']==cat, 'cat_'+cat] =1

### Regression first stage 
#setting
work_dum = pd.get_dummies(df1['work_category']).rename(columns=lambda x: 'work_dum_' + str(x))
year_dum = pd.get_dummies(df1['year']).rename(columns=lambda x: 'year_dum_' + str(x))
auth_dum = pd.get_dummies(df1['authority_code']).rename(columns=lambda x: 'auth_dum_' + str(x))
dum_df = pd.concat([work_dum, year_dum, auth_dum],axis = 1)
#after adding these dummies, fe_reg_1 runs into a singular matrix
df1 = pd.concat([df1,dum_df],axis = 1)

work_list = list(work_dum.columns)
year_list = list(year_dum.columns)
auth_list = list(auth_dum.columns)

reg_col = []
for i in work_list:
    reg_col.append(i)
for j in year_list:
    reg_col.append(j)
for k in auth_list:
    reg_col.append(k)

exog_var = ['fpsb_auction','reserve_price','municipality','fiscal_efficiency']
exog = exog_var + reg_col 
exog.remove('year_dum_2000.0')

exog.remove('work_dum_OG01')
exog.remove('auth_dum_3.0')
exog.remove('auth_dum_1708.0')

#reg_col for co_sample, discount,ctrl_exp, fe_reg_1&2
#the values are as follows
#exog = exog_var + reg_col

#1. reg
fe_reg_1 = mt.reg(df1, o, exog, cluster = 'auth_anno', addcons= True, check_colinear = True)
print(fe_reg_1)
#results obtained

#2. reg
fe_reg_2 = mt.reg(df1, o, exog, cluster = 'authority_code',addcons= True, check_colinear = True)
print(fe_reg_2)
#this one produced results too

#3. reg
reg_col = auth_year_reg_col
for cat in all_categories:
    reg_col.append('cat_'+cat)
exog_var = ['reserve_price','municipality','fiscal_efficiency']
exog = exog_var + reg_col

'''X = df1.loc[:,exog]
vif = calc_vif(X)

for i in range(len(vif)):
    if np.isnan(vif.loc[i, 'VIF']) == True:
        exog.remove(vif.loc[i, 'variables'])
        print(vif.loc[i,'variables'])
    elif vif.loc[i,'VIF'] > 10:
        exog.remove(vif.loc[i, 'variables'])
        print(vif.loc[i,'variables'])'''


exog.remove('auth_year_4.0_2000.0')
exog.remove('auth_year_6.0_2000.0') 
exog.remove('auth_year_16.0_2002.0')
exog.remove('auth_year_16.0_2003.0')
exog.remove('auth_year_16.0_2004.0')
exog.remove('auth_year_1246.0_2000.0')
exog.remove('cat_OS07')
exog.remove('fiscal_efficiency')

fe_reg_3 = mt.reg(df1, o, exog, cluster = 'auth_anno', addcons = True, check_colinear = True) #singular error
#print(fe_reg_3)
#results obtained

df1['dummy_cat'] = 0
#compute using only the variables that are in exog
beta_cat_list = []
beta_list = []
for i in range(len(exog)):
    for cat in all_categories:
        if exog[i] == 'cat_'+cat:
            beta_cat_list.append(exog[i])
    for exo in exog_var:
        if exog[i] == exo:
            beta_list.append(exog[i])

#The same computation applies to every outcome, so one generic block replaces
#the four copies (the non-discount branches referenced the undefined discount_hat)
beta_col = o + '_beta'
y_hat = fe_reg_3.yhat
for i in range(len(df1)):
    for cat in beta_cat_list:
        df1.loc[i, beta_col] = y_hat[i] - (df1.loc[i,'dummy_cat']-df1.loc[i,cat] * fe_reg_3.beta[cat])
        for exo in beta_list:
            df1.loc[i, beta_col] = df1.loc[i, beta_col]- df1.loc[i,exo]*fe_reg_3.beta[exo]

#create weigths - working well
nrep_s = df1.groupby(['authority_code','year']).size().unstack(level=1)
df1_nrep = pd.DataFrame(nrep_s)/len(df1)
df1['weights'] = np.nan
for auth in all_authorities:
    for yr in all_years:
        df1.loc[(df1['authority_code']==auth)&(df1['year']==yr),'weights'] = df1_nrep.loc[auth, yr]

#Keep only beta coefficients for state*year terms
collapse_list = [o +'_beta', 'authority_code', 'year', 'fpsb_auction', 'municipality', 'fiscal_efficiency', 'missing', 'turin_co_sample', 'weights'] + year_list + auth_list
collapse = df1.groupby(['auth_anno'])[collapse_list].mean()

df2 = collapse
df2 = df2.reset_index()
df2.columns
df2.shape

#Core conley-taber method
exog_var = ['fpsb_auction', 'municipality', 'fiscal_efficiency']
reg_col = []
reg_col_new = []

#reg_col.append(j)
for i in auth_list:
    reg_col.append(i)
for j in year_list:
    reg_col.append(j)

for k in reg_col:
    for j in df2.columns:
        if k ==j:
            reg_col_new.append(j)

exog = exog_var + reg_col_new


X = df2.loc[:,exog]
vif = calc_vif(X)

#delete from col list
for i in range(len(vif)):
    if np.isnan(vif.loc[i, 'VIF']) == True:
        reg_col.remove(vif.loc[i, 'variables'])

exog = exog_var + reg_col_new

exog.remove('year_dum_2000.0')
exog.remove('auth_dum_3.0')
exog.remove('auth_dum_1866.0')

wls = mt.reg(df2, o+'_beta', exog , cluster = 'auth_anno',addcons = True, awt_name = 'weights')
print(wls)
#values differ, which is expected because beta differs

#predic res
df2['eta'] = wls.resid
df2['eta'] = df2['eta']+ df2['fpsb_auction']*wls.beta['fpsb_auction']

#Create tilde
df2 = df2.sort_values(by = 'year',ascending = True)
df2_wls= df2[(df2['authority_code']==3090272) | (df2['authority_code']==3070001)]
df2_wls = pd.DataFrame(df2_wls.groupby(['year'])['fpsb_auction'].mean())
for i in range(len(df2)):
    if df2.loc[i, 'authority_code']==3090272 or df2.loc[i, 'authority_code']==3070001:
        for j in list(df2_wls.index):
            if df2.loc[i, 'year'] == j:
                df2.loc[i,'djtga'] = df2_wls.loc[j, 'fpsb_auction']

df2_wls = pd.DataFrame(df2.groupby(['year'])['djtga'].sum())
for i in range(len(df2)):
    for j in list(df2_wls.index):
        if df2.loc[i, 'year'] == j:
            df2.loc[i,'djt'] = df2_wls.loc[j, 'djtga']

df2 = df2.sort_values(by = 'authority_code', ascending = True)
df2_wls = pd.DataFrame(df2.groupby(['authority_code'])['djt'].mean())
for i in range(len(df2)):
    for j in list(df2_wls.index):
        if df2.loc[i, 'authority_code'] == j:
            df2.loc[i,'meandjt'] = df2_wls.loc[j, 'djt']

df2['dtil'] = df2['djt'] - df2['meandjt']
#df2['meandjt'] = df2['djt'].mean()
#df2['dtil'] = df2['djt'] - df2['meandjt']
#df2['dtil'].value_counts() #only zeros come out..

#obtain diff in diff coeff
#renormalize weights
df2.loc[(df2['authority_code']==3090272) | (df2['authority_code']==3070001),'tot_weights'] = df2['weights'].sum()
df2['new_weights'] = df2['weights']/df2['tot_weights']
df2_wls = df2[(df2['authority_code']==3090272) | (df2['authority_code']==3070001)]
wls_2 = mt.reg(df2_wls, 'eta' , 'dtil' , awt_name = 'new_weights', addcons = True,check_colinear = True)
print(wls_2)
#roughly matches
alpha = [wls_2.beta['dtil']] 
df2 = df2.drop(['tot_weights','new_weights'],axis = 1)

#simulataneous for each public
asim = []
'''auth_wls_3 = []
for auth in all_authorities:
    for i in range(len(df2)):
        if df2.loc[i, 'authority_code'] == auth:
            auth_wls_3.append(auth)'''

for auth in all_authorities:
    if auth !=3090272 and auth !=3070001:
        df2.loc[df2['authority_code']==auth, 'tot_weights'] = df2['weights'].sum()
        df2['new_weights'] = df2['weights']/df2['tot_weights']
        df2_wls_3 = df2[df2['authority_code']==auth]
        wls_3 = mt.reg(df2_wls_3, 'eta' , 'dtil' , awt_name = 'new_weights')
        asim.append(wls_3.beta['dtil'])
        df2 = df2.drop(['tot_weights','new_weights'],axis = 1)

#make asim and alpha the same length
for i in range(len(asim)-1):
    alpha.append(alpha[0])

asim_tmp = []
for i in range(min(len(alpha),len(asim))):
    asim_tmp.append(alpha[i] - asim[i])

asim = asim_tmp
df2['ci'] = np.nan
df2['asim'] = np.nan
for i in range(len(asim)):
    df2.loc[i, 'ci'] = asim[i]
    df2.loc[i, 'asim'] = asim[i]

#form confidence level
numst=len(asim)+1
i025=math.floor(0.025*(numst-1))
i025=max([i025,1])
i975=math.ceil(0.975*(numst-1))
i05=math.floor(0.050*(numst-1))
i05=max([i05,i025+1])
i95=math.ceil(0.950*(numst-1))
i95=min([i95,numst-2])

stima_ta = alpha[0]
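#sort_values() returns a new frame; assign it and reset the index so loc picks order statistics
df2 = df2.sort_values(by = 'asim',ascending = True).reset_index(drop = True)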
ci_ta025 = min([df2.loc[i025,'ci'], df2.loc[i975, 'ci'] ])
ci_ta975 = max([df2.loc[i025,'ci'], df2.loc[i975, 'ci'] ])

#wls_4 setting
reg_col = []
for i in year_list:
    reg_col.append(i)
for j in auth_list:
    reg_col.append(j)
exog_var = ['fpsb_auction','municipality','fiscal_efficiency']
exog = reg_col+exog_var

exog.remove('year_dum_2000.0')
exog.remove('auth_dum_3.0')
exog.remove('auth_dum_1708.0')

wls_4 = mt.reg(df2, o+'_beta', exog, awt_name = 'weights', cluster = 'authority_code', addcons = True)
#ci_low=$ci_ta025, replace ci_high=$ci_ta975
print(wls_4)
#roughly reproduces the original result
#end of first specification

Dependent variable:	discount
N:			1262
R-squared:		0.6395
Estimation method:	OLS
VCE method:		Cluster
  Cluster variable:	  auth_anno
  No. of clusters:	  101
                    coeff    se      t   p>t  CI_low CI_high
fpsb_auction       12.178 1.329  9.163 0.000   9.541  14.815
reserve_price       0.000 0.000  5.988 0.000   0.000   0.000
municipality       -7.233 1.930 -3.748 0.000 -11.062  -3.404
fiscal_efficiency  -2.611 4.586 -0.569 0.570 -11.709   6.486
work_dum_OG02       0.223 0.629  0.354 0.724  -1.024   1.470
work_dum_OG03      -0.560 0.543 -1.032 0.304  -1.638   0.517
work_dum_OG04      -4.715 3.311 -1.424 0.157 -11.283   1.853
work_dum_OG06       0.816 0.667  1.223 0.224  -0.508   2.140
work_dum_OG07       4.494 5.104  0.881 0.381  -5.632  14.620
work_dum_OG08       0.788 1.600  0.492 0.624  -2.386   3.961
work_dum_OG10       8.384 0.510 16.450 0.000   7.373   9.396
work_dum_OG11       5.676 1.066  5.325 0.000   3.561   7.791
work_dum_OG12       5.928 1.926  3.079 0.003   2
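
The cut-offs built under `#form confidence level` above select order statistics of the deviations `alpha - asim`. The following is a minimal sketch of that idea, assuming 1-based ranks as in Stata; the helper name `percentile_interval` and the toy numbers are hypothetical and not part of the replication code.

```python
import math
import numpy as np

def percentile_interval(point_estimate, placebo_estimates, level=0.95):
    """Return (low, high) order statistics of point_estimate - placebo_estimates.

    Hypothetical helper: ranks are 1-based and clamped to [1, n].
    """
    # Deviations of the pooled estimate from each authority-specific estimate
    deviations = np.sort(point_estimate - np.asarray(placebo_estimates, dtype=float))
    n = len(deviations)
    tail = (1.0 - level) / 2.0
    # Lower rank is floored but never below the first order statistic
    lo_rank = max(math.floor(tail * n), 1)
    # Upper rank is ceiled but never above the last order statistic
    hi_rank = min(math.ceil((1.0 - tail) * n), n)
    # Convert the 1-based ranks to 0-based positions
    return deviations[lo_rank - 1], deviations[hi_rank - 1]

# Toy example: pooled estimate 10 and ten authority-specific estimates 1..10
low, high = percentile_interval(10.0, range(1, 11))
print(low, high)
```

With only ten placebo estimates the 2.5 percent rank floors to zero, so it is raised to the first order statistic, which mirrors `max([i025,1])` in the cell above.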

In [64]:
df = table5_setting(data)

In [95]:
pd.DataFrame(table5_PanelA_odd(df, 'days_to_award'))

              CI_low  CI_high
fpsb_auction     8.0     53.0
fpsb_auction     5.0     56.0


In [96]:
pd.DataFrame(table5_PanelB_odd(df, 'discount'))

              CI_low  CI_high
fpsb_auction     7.0     11.0
fpsb_auction     8.0     10.0


In [191]:
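from statsmodels.stats.outliers_influence import variance_inflation_factor

def calc_vif(X):
    """Return the variance inflation factor of every column of X."""

    # Calculating VIF
    vif = pd.DataFrame()
    vif["variables"] = X.columns
    vif["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

    return vif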

def table5_PanelB_even(data, o):
    t = 'turin_pr_sample'
    g = 'ctrl_exp'
    c_outcomes=1
    i = 5
    df1 = data #use the function argument instead of the global df
    df1 = df1[(df1[t]==1) & (df1[g +'_' + t]==1) & (df1['post_experience']>=i) & (df1['pre_experience']>=i)& (df1['post_experience'].isnull()==False) & (df1['pre_experience'].isnull()==False) & (df1['missing']==0) & (df1[o].isnull()==False) & (df1['fiscal_efficiency'].isnull()==False) & (df1['reserve_price'].isnull()==False)&(df1['municipality'].isnull()==False)&(df1['trend'].isnull()==False)&(df1['trend_treat'].isnull()==False)]

    df1 = df1.sort_values(by = 'authority_code', ascending = True)
    df1 = df1.reset_index() #reset after sorting so loc[i-1] is the previous row in sorted order
    #df1 value checked 1262

    df1['ind'] = np.nan
    for i in range(len(df1)):
        if i == 0:
            df1.loc[i, 'ind'] = 1
        else:
            if df1.loc[i, 'authority_code'] != df1.loc[i-1, 'authority_code']:
                df1.loc[i, 'ind'] = 1

    #create dummies for administration-year pairs 
    all_years = df1['year'].unique()
    all_authorities = df1['authority_code'].unique()
    auth_year_reg_col = []
    for auth in all_authorities:
        for yr in all_years:
            df1['auth_year_' + str(auth)+'_' + str(yr)] = 0
            auth_year_reg_col.append('auth_year_' + str(auth)+'_' + str(yr))
            df1.loc[(df1['year']==yr) & (df1['authority_code']==auth), 'auth_year_' + str(auth)+'_' + str(yr) ] = 1

    ##regression for first stage
    #create dummies for work category
    all_categories = df1['work_category'].unique()
    for cat in all_categories:
        df1['cat_'+cat] = 0
        df1.loc[df1['work_category']==cat, 'cat_'+cat] =1

    ### Regression first stage 
    #setting
    work_dum = pd.get_dummies(df1['work_category']).rename(columns=lambda x: 'work_dum_' + str(x))
    year_dum = pd.get_dummies(df1['year']).rename(columns=lambda x: 'year_dum_' + str(x))
    auth_dum = pd.get_dummies(df1['authority_code']).rename(columns=lambda x: 'auth_dum_' + str(x))
    
    dum_df = pd.concat([work_dum, year_dum, auth_dum],axis = 1)
    #after adding these dummies, fe_reg_1 started failing with a singular matrix
    df1 = pd.concat([df1,dum_df],axis = 1)

    work_list = list(work_dum.columns)
    year_list = list(year_dum.columns)
    auth_list = list(auth_dum.columns)

    reg_col = []
    for i in work_list:
        reg_col.append(i)
    for j in year_list:
        reg_col.append(j)
    #for k in auth_list:
    #    reg_col.append(k)

    exog_var = ['fpsb_auction','reserve_price','municipality','fiscal_efficiency','trend','trend_treat']
    
    for i in range(1,36):
        exog_var.append('trend_pa_remained_'+str(i))
        
    exog = exog_var + reg_col 
    

    exog.remove('year_dum_2000.0')
    exog.remove('work_dum_OG01')
    exog.remove('year_dum_2006.0')
    #exog.remove('auth_dum_3.0')
    #exog.remove('auth_dum_1246.0')
    for i in [2,4,6,7,9,11,12,13,15,16,17,18,20,21,22,23,24,25,26,28,34,35]:
        exog.remove('trend_pa_remained_'+str(i))
    

    #1. reg
    fe_reg_1 = mt.reg(df1, o, exog, cluster = 'auth_anno', check_colinear = True)

    #2. reg
    
    fe_reg_2 = mt.reg(df1, o, exog, cluster = 'authority_code', check_colinear = True)
    
    ci_1 = fe_reg_1.summary.loc['fpsb_auction',['CI_low', 'CI_high']]
    ci_2 = fe_reg_2.summary.loc['fpsb_auction',['CI_low', 'CI_high']]
    
    return(ci_1,ci_2)

In [192]:
table5_PanelB_even(df, 'discount')

(CI_low     1.411035
 CI_high    7.264471
 Name: fpsb_auction, dtype: float64,
 CI_low     2.558635
 CI_high    6.116871
 Name: fpsb_auction, dtype: float64)
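
The row-by-row loop that sets `ind` in `table5_PanelB_even` marks the first contract of each authority after sorting. A minimal vectorized sketch of the same idea is given below; the helper name `flag_first_per_group` and the toy data are hypothetical, and the sketch assumes that a fresh 0..n-1 index is acceptable.

```python
import numpy as np
import pandas as pd

def flag_first_per_group(frame, key):
    """Sort by key and set 'ind' to 1 on the first row of each group, NaN elsewhere."""
    # A stable sort keeps the original order of rows within each group
    ordered = frame.sort_values(by=key, kind="stable").reset_index(drop=True)
    # duplicated() is False on the first occurrence of each key value
    is_first = ~ordered[key].duplicated()
    ordered["ind"] = np.where(is_first, 1.0, np.nan)
    return ordered

# Toy data with three authorities
toy = pd.DataFrame({
    "authority_code": [30, 3, 30, 20, 3],
    "discount": [12.0, 8.5, 10.1, 15.3, 9.9],
})
flagged = flag_first_per_group(toy, "authority_code")
print(flagged)
```

Because the flag is computed on the sorted frame, it does not depend on how the index labels were assigned before sorting.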

In [103]:
t = 'turin_co_sample'
g = 'ctrl_exp'
o = 'discount'
c_outcomes=1
i = 5
df1 = df
df1 = df1[(df1[t]==1) & (df1[g +'_' + t]==1) & (df1['post_experience']>=i) & (df1['pre_experience']>=i)& (df1['post_experience'].isnull()==False) & (df1['pre_experience'].isnull()==False) & (df1['missing']==0) & (df1[o].isnull()==False) & (df1['fiscal_efficiency'].isnull()==False) & (df1['reserve_price'].isnull()==False)&(df1['municipality'].isnull()==False)&(df1['trend'].isnull()==False)&(df1['trend_treat'].isnull()==False)]

df1 = df1.sort_values(by = 'authority_code', ascending = True)
df1 = df1.reset_index() #reset after sorting so loc[i-1] is the previous row in sorted order
#df1 value checked 1262

df1['ind'] = np.nan
for i in range(len(df1)):
    if i == 0:
        df1.loc[i, 'ind'] = 1
    else:
        if df1.loc[i, 'authority_code'] != df1.loc[i-1, 'authority_code']:
            df1.loc[i, 'ind'] = 1

#create dummies for administration-year pairs 
all_years = df1['year'].unique()
all_authorities = df1['authority_code'].unique()
auth_year_reg_col = []
for auth in all_authorities:
    for yr in all_years:
        df1['auth_year_' + str(auth)+'_' + str(yr)] = 0
        auth_year_reg_col.append('auth_year_' + str(auth)+'_' + str(yr))
        df1.loc[(df1['year']==yr) & (df1['authority_code']==auth), 'auth_year_' + str(auth)+'_' + str(yr) ] = 1

##regression for first stage
#create dummies for work category
all_categories = df1['work_category'].unique()
for cat in all_categories:
    df1['cat_'+cat] = 0
    df1.loc[df1['work_category']==cat, 'cat_'+cat] =1

### Regression first stage 
#setting
work_dum = pd.get_dummies(df1['work_category']).rename(columns=lambda x: 'work_dum_' + str(x))
year_dum = pd.get_dummies(df1['year']).rename(columns=lambda x: 'year_dum_' + str(x))
auth_dum = pd.get_dummies(df1['authority_code']).rename(columns=lambda x: 'auth_dum_' + str(x))

dum_df = pd.concat([work_dum, year_dum, auth_dum],axis = 1)
#after adding these dummies, fe_reg_1 started failing with a singular matrix
df1 = pd.concat([df1,dum_df],axis = 1)

work_list = list(work_dum.columns)
year_list = list(year_dum.columns)
auth_list = list(auth_dum.columns)

reg_col = []
for i in work_list:
    reg_col.append(i)
for j in year_list:
    reg_col.append(j)
for k in auth_list:
    reg_col.append(k)

exog_var = ['fpsb_auction','reserve_price','municipality','fiscal_efficiency','trend','trend_treat']

for i in range(1,36):
    exog_var.append('trend_pa_remained_'+str(i))

exog = exog_var + reg_col 

X = df1.loc[:,exog]
vif = calc_vif(X)

  vif = 1. / (1. - r_squared_i)
  return 1 - self.ssr/self.centered_tss
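
The two lines above appear to be the tail of runtime warnings raised while computing the VIF. Such warnings are what one would expect when a regressor is an exact linear combination of the others, because the auxiliary R-squared is then 1 and the VIF divides by zero. The sketch below uses made-up data to show, under that assumption, that including every category of two dummy sets makes the design matrix rank deficient, while dropping one reference category per set restores full rank. The helper `is_full_rank_with_constant` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy panel: three authorities observed in two years
toy = pd.DataFrame({
    "authority_code": [3, 3, 20, 20, 30, 30],
    "year": [2000, 2001, 2000, 2001, 2000, 2001],
})

# Full dummy sets: each set sums to one in every row
full = pd.concat(
    [pd.get_dummies(toy["authority_code"], prefix="auth_dum"),
     pd.get_dummies(toy["year"], prefix="year_dum")],
    axis=1,
).astype(float)

# Same sets with the first category of each dropped as the reference group
reduced = pd.concat(
    [pd.get_dummies(toy["authority_code"], prefix="auth_dum", drop_first=True),
     pd.get_dummies(toy["year"], prefix="year_dum", drop_first=True)],
    axis=1,
).astype(float)

def is_full_rank_with_constant(X):
    """Check whether [1, X] has linearly independent columns."""
    design = np.column_stack([np.ones(len(X)), X.to_numpy()])
    return bool(np.linalg.matrix_rank(design) == design.shape[1])

print(is_full_rank_with_constant(full))
print(is_full_rank_with_constant(reduced))
```

The `exog.remove(...)` calls in the cells above serve the same purpose by hand: they drop one reference column from each dummy set before the regression is run.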


In [106]:
df1.columns

Index([           'level_0',              'index',     'authority_code',
                     'year',          'auth_anno',       'fpsb_auction',
                'n_bidders',           'discount',      'reserve_price',
            'work_category',
       ...
            'auth_dum_20.0',      'auth_dum_25.0',      'auth_dum_30.0',
          'auth_dum_1246.0',    'auth_dum_1708.0',    'auth_dum_1739.0',
          'auth_dum_1768.0',    'auth_dum_1858.0',    'auth_dum_1866.0',
       'auth_dum_3090272.0'],
      dtype='object', length=354)