# Paper notebook: Corruption risk across federal and local contracts

This notebook has two purposes:

1. Aggregate data at state level and save tables to create correlation figures in R.
2. Run statistical analyses of the relationship between the contract variables and corruption perception.

# 0. Import modules

In [27]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [28]:
import networkx as nx
import graph_tool as gt

# 1. Load data

We use data from two sources:

- Contract data
- Corruption perception data

## 1.1 Contracts

Load the dataset and summarize the number of contracts and the total amount per year.

In [44]:
cnts = pd.read_csv('../../data/pre-process/contratos_4.csv')

In [45]:
cnts = cnts.rename(columns={'code_b': 'code'})

In [31]:
cnts.groupby(['gvmnt_level', 'year']).amount.sum().reset_index().groupby('gvmnt_level').mean()

| gvmnt_level | year | amount |
|---|---|---|
| F | 2015 | 15463500000.0 |
| M | 2015 | 499788000.0 |
| S | 2015 | 2588365000.0 |


In [32]:
cnts.groupby('year').amount.sum().mean()

18551648414.743237

In [33]:
cnts.year.unique()

array([2011, 2012, 2019, 2018, 2016, 2014])
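The cells above total the contracted amount, but they do not show a contract count. A minimal sketch of producing both per year is given below, assuming each row of the table is one contract. The `toy` frame and its values are invented for illustration. Only the column names `gvmnt_level`, `year` and `amount` come from the notebook.

```python
import pandas as pd

# Toy stand-in for the contracts table; the values are invented for illustration.
toy = pd.DataFrame({
    "gvmnt_level": ["F", "F", "S", "M", "F", "S"],
    "year": [2011, 2011, 2011, 2012, 2012, 2012],
    "amount": [100.0, 50.0, 30.0, 10.0, 200.0, 40.0],
})

# One row per year: number of contracts (rows) and their total amount.
per_year = (
    toy.groupby("year")
       .agg(n_contracts=("amount", "size"), total_amount=("amount", "sum"))
       .reset_index()
)
print(per_year)
```

Replacing `toy` with `cnts` should give the per-year summary for the real data, provided the one-row-per-contract assumption holds.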

In [46]:
state_f = cnts[cnts.gvmnt_level=='F'].groupby('code').mean(numeric_only=True).reset_index()

In [47]:
state_s = cnts[cnts.gvmnt_level=='S'].groupby('code').mean(numeric_only=True).reset_index()

In [48]:
state_m = cnts[cnts.gvmnt_level=='M'].groupby('code').mean(numeric_only=True).reset_index()
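The three cells above repeat the same filter-and-average step for each government level. One possible way to express it once is sketched below. The helper `state_means` and the toy data are hypothetical and not part of the original notebook. `numeric_only=True` is passed so that text columns such as `gvmnt_level` are dropped instead of raising an error in recent pandas versions.

```python
import pandas as pd

# Toy stand-in for the contracts table; the values are invented for illustration.
toy = pd.DataFrame({
    "gvmnt_level": ["F", "F", "S", "S", "M"],
    "code": [1, 1, 1, 2, 2],
    "year": [2011, 2012, 2011, 2012, 2014],
    "amount": [100.0, 300.0, 20.0, 40.0, 5.0],
})

def state_means(df, level):
    """Average every numeric column per state code for one government level."""
    subset = df[df["gvmnt_level"] == level]
    return subset.groupby("code").mean(numeric_only=True).reset_index()

# One aggregated table per government level, keyed by the level code.
by_level = {level: state_means(toy, level) for level in ["F", "S", "M"]}
print(by_level["F"])
```

With the real data, `by_level["F"]`, `by_level["S"]` and `by_level["M"]` would play the roles of `state_f`, `state_s` and `state_m`.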

## 1.2 INEGI (corruption)

Load the corruption data and merge it into the state-level aggregates.

In [50]:
corruption = pd.read_csv("../../data/states/federal.csv")

In [52]:
corruption = corruption[['code', 'very_high_r', 'very_low_r', 'c_very_high_r', 'c_low_r', 'e_exp_r']]

In [54]:
state_f = pd.merge(state_f, corruption, on='code', how='left')
state_s = pd.merge(state_s, corruption, on='code', how='left')
state_m = pd.merge(state_m, corruption, on='code', how='left')
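A left merge keeps every state in the contracts table, so a state with no corruption record ends up with missing values without any warning. A minimal sketch of checking for this follows, assuming `code` is the only join key and appears at most once per state in the corruption table. The toy frames and their values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for a state-level aggregate and the corruption table.
state = pd.DataFrame({"code": [1, 2, 3], "amount": [200.0, 40.0, 5.0]})
corr = pd.DataFrame({"code": [1, 2], "very_high_r": [0.4, 0.6]})

# validate="m:1" raises if a code is duplicated in the corruption table;
# indicator=True adds a `_merge` column recording where each row matched.
merged = pd.merge(state, corr, on="code", how="left",
                  validate="m:1", indicator=True)

# State codes present in the contracts data but absent from the corruption data.
missing = merged.loc[merged["_merge"] == "left_only", "code"].tolist()
merged = merged.drop(columns="_merge")
print(missing)
```

An empty `missing` list on the real data would indicate that every state received corruption values.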

Save the merged tables as CSV files for use in R.

In [56]:
state_f.to_csv('../../data/states/r_fed.csv', index=False)
state_s.to_csv('../../data/states/r_state.csv', index=False)
state_m.to_csv('../../data/states/r_mun.csv', index=False)