# Election expenses

Recently we found a congressperson who claimed a reimbursement, allegedly for postal services, paid to a company created for his own benefit: the legal entity registered for his next election candidacy.

We believe there are more cases like this one.

In [1]:
import pandas as pd
import numpy as np

reimbursements = pd.read_csv('../data/2016-12-06-reimbursements.xz',
                             dtype={'cnpj_cpf': str},
                             low_memory=False)
companies = pd.read_csv('../data/2016-09-03-companies.xz', low_memory=False)

In [2]:
# Keep only the digits so the CNPJ matches the format used in the reimbursements
companies['cnpj'] = companies['cnpj'].str.replace(r'\D', '', regex=True)

In [3]:
dataset = pd.merge(reimbursements, companies,
                   left_on='cnpj_cpf', right_on='cnpj')

In [4]:
is_election_company = \
    dataset['legal_entity'] == '409-0 - CANDIDATO A CARGO POLITICO ELETIVO'
suspect = dataset[is_election_company]
suspect.shape

(47, 259)

In [5]:
suspect['total_net_value'].sum()

11930.769999999999

In [6]:
suspect['total_net_value'].describe()

count      47.000000
mean      253.846170
std       272.379969
min        50.000000
25%       139.690000
50%       157.230000
75%       247.190000
max      1470.270000
Name: total_net_value, dtype: float64

In [7]:
suspect['name'].sample(10)

1249308    ELEICAO 2006 JOSE CARLOS LEAO DE ARAUJO DEPUTA...
759989     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
760015     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
759994     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
760001     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
1395813     ELEICOES 2004 - ANIVALDO JOSE WINCKEL - VEREADOR
760017     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
759996     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
1205658    ELEICAO 2002 - CANDIDATO EDUARDO BRANDAO DE AZ...
760013     ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO ...
Name: name, dtype: object

In [8]:
dataset['name'] = dataset['name'].astype(str)
# Match "eleicao" or "eleicoes"; the non-capturing group avoids the pandas match-group warning
contains_election_str = dataset['name'].str.lower().str.contains(r'eleic(?:ao|oes)')
company_name_suspects = dataset[contains_election_str].index

In [9]:
np.array_equal(suspect.index, company_name_suspects)

True

In [10]:
import unicodedata

def normalize_string(string):
    """Lowercase a string and strip accents; return None for non-strings."""
    if isinstance(string, str):
        nfkd_form = unicodedata.normalize('NFKD', string.lower())
        return nfkd_form.encode('ASCII', 'ignore').decode('ASCII')
    return None

In [11]:
# assign() returns a new DataFrame, so nothing is written to a slice of `dataset`
suspect = suspect.assign(
    congressperson_name=suspect['congressperson_name'].apply(normalize_string),
    name=suspect['name'].apply(normalize_string))

# Keep rows where the congressperson's full name appears in the company name
suspect[suspect.apply(lambda row: row['congressperson_name'] in row['name'], axis=1)]


Unnamed: 0,year,applicant_id,document_id,reimbursement_value_total,total_net_value,reimbursement_numbers,congressperson_name,congressperson_id,congressperson_document,term,...,secondary_activity_97_code,secondary_activity_98,secondary_activity_98_code,secondary_activity_99,secondary_activity_99_code,secondary_activity_9_code,latitude,longitude,latitude.1,longitude.1


In [12]:
suspect[['congressperson_name', 'name']]

| | congressperson_name | name |
|---|---|---|
| 759986 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759987 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759988 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759989 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759990 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759991 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759992 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759993 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759994 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |
| 759995 | carlos alberto lereia | eleicao 2006 carlos alberto da silva deputado ... |


## Conclusion

Yes, there are more cases: the dataset currently contains 47 suspicious reimbursements. Not all of them benefited the congressperson's own campaign; some were paid to other candidates' campaign entities. Either way, they could be reported to the Chamber of Deputies.
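
To separate payments to a congressperson's own campaign from payments to other candidates, one option is to compare name tokens instead of requiring the full name to appear in the company name. The sketch below is only an illustration, not the method used in this notebook: `normalize`, `shared_name_tokens`, the `min_length` cutoff and the example names are all hypothetical.

```python
import unicodedata


def normalize(text):
    """Lowercase and strip accents; non-strings become an empty string."""
    if not isinstance(text, str):
        return ''
    decomposed = unicodedata.normalize('NFKD', text.lower())
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def shared_name_tokens(congressperson_name, company_name, min_length=3):
    """Return the congressperson's name tokens that also appear in the company name.

    Tokens shorter than `min_length` (an assumed cutoff) are ignored so that
    particles such as "da" or "de" do not count as a match.
    """
    company_tokens = set(normalize(company_name).split())
    person_tokens = set(normalize(congressperson_name).split())
    return sorted(token for token in person_tokens
                  if len(token) >= min_length and token in company_tokens)


# Hypothetical rows shaped like the (congressperson_name, name) pairs above.
rows = [
    ('ANA EXEMPLO PEREIRA', 'ELEICAO 2006 ANA EXEMPLO PEREIRA DEPUTADO FEDERAL'),
    ('BRUNO MODELO LIMA', 'ELEICOES 2004 - CARLA TESTE ROCHA - VEREADOR'),
]

# Assumed heuristic: two or more shared tokens suggest the candidate's own campaign.
likely_own_campaign = [len(shared_name_tokens(person, company)) >= 2
                       for person, company in rows]
```

On the notebook's data, the same function could be applied row by row, for example with `suspect.apply(lambda row: shared_name_tokens(row['congressperson_name'], row['name']), axis=1)`. Common given names can produce false positives, so any match would still need manual review.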