# Exploring the first version of the Sex Place Distances dataset

This notebook plays around with data from the first version of the dataset `data/2017-04-21-sex-place-distances.xz`. This first version collected data about companies:

* In one of these cities: BOA VISTA, CURVELO, SANTA CRUZ DO SUL, PORTO ALEGRE, SAO JOSE DOS PINHAIS, SETE LAGOAS, BOITUVA, IPATINGA, UBERABA, CONGONHAS, SOROCABA, PARAOPEBA, CHAPECO, CUIABA, SALVADOR, BAURU and LAJEADO
* With expenses whose total net value is equal to or higher than 100 BRL
* In which congresspeople from the 2015 term have spent public money

The set of cities was taken from a [random sample that sounded promising](https://twitter.com/cuducos/status/840882495868530688)… but hold your horses: further analysis is disappointing… let's get started.

In [1]:
import numpy as np
import pandas as pd
from serenata_toolbox.datasets import fetch

# Read CNPJ/CPF as strings so leading zeros are kept (np.str was removed from recent NumPy versions)
DTYPE = dict(cnpj=str, cnpj_cpf=str)

In [2]:
fetch('2017-04-21-sex-place-distances.xz', '../data')

In [3]:
companies = pd.read_csv('../data/2016-09-03-companies.xz', dtype=DTYPE, low_memory=False)
companies.cnpj = companies.cnpj.str.replace(r'\D', '', regex=True)
companies.shape

(60047, 228)
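
Reading `cnpj` as a string matters because a CNPJ is a 14-digit identifier that may start with a zero, which a numeric dtype would drop. The sketch below shows what the `\D` replacement does to a few values (the CNPJs appear in the outputs of this notebook; the punctuation in the first one was added for illustration). The `normalize_cnpj` helper and its `zfill(14)` step are assumptions, not part of the original analysis.

```python
import pandas as pd


def normalize_cnpj(series):
    """Strip every non-digit character and left-pad the result to 14 digits.

    Hypothetical helper: the notebook itself only strips non-digits; the
    zero-padding is an extra assumption for values that lost leading zeros.
    """
    digits_only = series.astype(str).str.replace(r'\D', '', regex=True)
    return digits_only.str.zfill(14)


# One formatted value, one clean value and one value missing its leading zero
sample = pd.Series(['04.788.281/0001-40', '12990433000110', '4289974000198'])
normalized = normalize_cnpj(sample)
print(normalized.tolist())
```

Padding is only safe when the shorter value is known to be a CNPJ that lost its leading zeros; it would hide other kinds of truncation.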

In [4]:
sex_places = pd.read_csv('../data/2017-04-21-sex-place-distances.xz', dtype=DTYPE)
sex_places.shape

(2245, 11)

This dataset has all sorts of distances between companies and the closest sex place:

In [5]:
sex_places.distance.describe()

count     2245.000000
mean       453.605521
std        878.745363
min          0.000000
25%        139.348086
50%        289.045854
75%        551.421772
max      28095.285403
Name: distance, dtype: float64

## Sex places _close enough_ to places in which congresspeople were

From this sample (n=2245), 81 places are less than 3m away from a venue in which at least one congressperson has made an expense since 2015:

In [6]:
close_enough = sex_places.query('distance < 3')
close_enough.shape

(81, 11)
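
The number of rows is not necessarily the number of distinct sex places, since the same place can be the closest one to more than one company. A minimal sketch on invented data (only the column names `id`, `cnpj` and `distance` come from the dataset) shows the same filter and one possible way to count distinct places and companies among the matches:

```python
import pandas as pd

# Toy data: column names follow the dataset, values are invented
toy = pd.DataFrame({
    'id': ['place-a', 'place-a', 'place-b', 'place-c'],
    'cnpj': ['00000000000001', '00000000000002',
             '00000000000003', '00000000000004'],
    'distance': [0.0, 0.0, 2.9, 3.0],
})

# Same filter as in the notebook: strictly less than 3
toy_close = toy.query('distance < 3')

matches = len(toy_close)                             # rows passing the filter
distinct_places = toy_close['id'].nunique()          # unique sex places
distinct_companies = toy_close['cnpj'].nunique()     # unique companies
print(matches, distinct_places, distinct_companies)
```

Note that the row with a distance of exactly 3.0 is excluded, because the comparison is strict.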

In [7]:
close_enough.head()

| Unnamed: 0 | id | keyword | latitude | longitude | distance | name | address | phone | cnpj | company_name | company_trade_name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | ChIJAQAAAFwPk40RoHav112BS0k | motel | 2.82351 | -60.675833 | 0.0 | Swing Motel Nada Substitui o Prazer | Rua Campo Palmares, 607 - Aeroporto, Boa Vista... | (95) 3627-2201 | 4788281000140 | B. H. M. PACHECO & CIA LTDA - EPP | VISUAL AUTDOOR |
| 42 | ChIJAQAAAFwPk40RoHav112BS0k | motel | 2.82351 | -60.675833 | 0.0 | Swing Motel Nada Substitui o Prazer | Rua Campo Palmares, 607 - Aeroporto, Boa Vista... | (95) 3627-2201 | 12990433000110 | L T COMERCIO E SERVICOS LTDA - EPP | L T COMERCIO E SERVICOS |
| 50 | ChIJAQAAAFwPk40RoHav112BS0k | motel | 2.82351 | -60.675833 | 0.0 | Swing Motel Nada Substitui o Prazer | Rua Campo Palmares, 607 - Aeroporto, Boa Vista... | (95) 3627-2201 | 13406312000141 | FERREIRA E PICAO LTDA - EPP | MAC ALUGUEL DE VEICULOS |
| 61 | ChIJAQAAAFwPk40RoHav112BS0k | motel | 2.82351 | -60.675833 | 0.0 | Swing Motel Nada Substitui o Prazer | Rua Campo Palmares, 607 - Aeroporto, Boa Vista... | (95) 3627-2201 | 4289974000198 | M. A. DE LACERDA - ME | |
| 78 | ChIJKXpclrIFk40RZ6LOqRlQpO4 | night club | 2.82351 | -60.675833 | 0.0 | Clube do Servidor | Centro, Boa Vista - State of Roraima, Brazil | | 10402370000109 | MR - OPERADORA DE VIAGENS E TURISMO LTDA - ME | MONTE RORAIMA TURISMO |


## Taking a closer look

Let's generate some Jarbas links to take a closer look at them!

In [8]:
link = 'https://jarbas.serenatadeamor.org/#/cnpjCpf/{}'
# Iterate over the cnpj column directly instead of going through iterrows()
for cnpj in close_enough.cnpj:
    print(link.format(cnpj))

https://jarbas.serenatadeamor.org/#/cnpjCpf/04788281000140
https://jarbas.serenatadeamor.org/#/cnpjCpf/12990433000110
https://jarbas.serenatadeamor.org/#/cnpjCpf/13406312000141
https://jarbas.serenatadeamor.org/#/cnpjCpf/04289974000198
https://jarbas.serenatadeamor.org/#/cnpjCpf/10402370000109
https://jarbas.serenatadeamor.org/#/cnpjCpf/17191347000161
https://jarbas.serenatadeamor.org/#/cnpjCpf/05677013000113
https://jarbas.serenatadeamor.org/#/cnpjCpf/02579214000190
https://jarbas.serenatadeamor.org/#/cnpjCpf/06243454000170
https://jarbas.serenatadeamor.org/#/cnpjCpf/13569064003760
https://jarbas.serenatadeamor.org/#/cnpjCpf/23748593000174
https://jarbas.serenatadeamor.org/#/cnpjCpf/19986561000130
https://jarbas.serenatadeamor.org/#/cnpjCpf/13180886000144
https://jarbas.serenatadeamor.org/#/cnpjCpf/13569064001989
https://jarbas.serenatadeamor.org/#/cnpjCpf/95425369002287
https://jarbas.serenatadeamor.org/#/cnpjCpf/08717252000148
https://jarbas.serenatadeamor.org/#/cnpjCpf/901070530001

### Villa Gorini

Two CNPJs were assigned to a night club called Villa Gorini:

In [9]:
close_enough[close_enough.cnpj == '03874976000181'].iloc[0]

id                         ChIJp5g39UEBqAARuei7IxmZ6UM
keyword                                     night club
latitude                                      -18.7222
longitude                                     -44.9429
distance                                             0
name                                      Villa Gorini
address               BR-040, Felixlândia - MG, Brazil
phone                                              NaN
cnpj                                    03874976000181
company_name               SOLAR DO ENGENHO LTDA - EPP
company_trade_name                    SOLAR DO ENGENHO
Name: 901, dtype: object

In [10]:
close_enough[close_enough.cnpj == '17084369000122'].iloc[0]

id                         ChIJp5g39UEBqAARuei7IxmZ6UM
keyword                                     night club
latitude                                      -18.7222
longitude                                     -44.9429
distance                                             0
name                                      Villa Gorini
address               BR-040, Felixlândia - MG, Brazil
phone                                              NaN
cnpj                                    17084369000122
company_name          POSTO PROFETAS DE CONGONHAS LTDA
company_trade_name                                 NaN
Name: 1082, dtype: object

However, the point here is that the Google Places API seems rather imprecise when looking up an address that has a highway kilometer marker (KM) instead of a street number:

In [11]:
companies[companies.cnpj == '03874976000181'].iloc[0].address

'ROD BR 040 KM 480'

In [12]:
companies[companies.cnpj == '17084369000122'].iloc[0].address

'ROD BR 040'

And Villa Gorini actually [is at KM 701](https://encrypted.google.com/search?q=villa%20gorini) (not at KM 480, for instance).
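
One way to gauge how often this happens would be to flag company addresses that look like highway addresses without a street number. The sketch below is only a heuristic under stated assumptions: the `looks_like_highway_address` helper and its regular expression are hypothetical, tuned to the two company addresses shown above, with a regular street address added for contrast.

```python
import re

# Hypothetical pattern: the address starts with a highway prefix ("ROD" or
# "RODOVIA") or carries a "KM <number>" marker anywhere in the string
HIGHWAY_PATTERN = re.compile(r'^\s*ROD(OVIA)?\b|\bKM\s*\d+', re.IGNORECASE)


def looks_like_highway_address(address):
    """Return True when the address seems to reference a highway or a KM marker."""
    return bool(HIGHWAY_PATTERN.search(address))


addresses = ['ROD BR 040 KM 480', 'ROD BR 040', 'RUA CAMPO PALMARES 607']
flags = [looks_like_highway_address(address) for address in addresses]
print(flags)
```

Rows flagged this way could be reviewed by hand or left out before trusting a distance close to zero.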

### Yume Espaço Terapêutico

[Yume Espaço Terapêutico](https://jarbas.serenatadeamor.org/#/cnpjCpf/14310257000154) is another example of a false positive.

In [13]:
close_enough.iloc[66]

id                                          ChIJTcQRimgbFgcRP_VjyPon-GA
keyword                                                        massagem
latitude                                                       -12.9792
longitude                                                      -38.4576
distance                                                              0
name                                            Yume Espaço Terapêutico
address               Edifício Esplanada Tower - Av.Tancredo Neves, ...
phone                                                    (71) 4103-7695
cnpj                                                     14310257000154
company_name           SETE QUATRO COMUNICACAO E PUBLICIDADE LTDA - EPP
company_trade_name                  SETE QUATRO COMUNICAO E PUBLICIDADE
Name: 1922, dtype: object

The company _SETE QUATRO COMUNICACAO E PUBLICIDADE LTDA - EPP_ is clearly in an [office building](https://goo.gl/maps/k6z8bdHLVxR2), probably the same building in which _Yume Espaço Terapêutico_ offers its services.

### Céu Azul Motel

Finally, [CONTRASTE EDITORA E INDUSTRIA GRAFICA EIRELI](https://jarbas.serenatadeamor.org/#/cnpjCpf/33867664000101) has very appealing pictures in Jarbas (actually in Google Street View).

In [14]:
close_enough.iloc[-1]

id                                          ChIJ5XYdhycFFgcRgTWWpzPjoR8
keyword                                                           motel
latitude                                                       -12.9664
longitude                                                      -38.4974
distance                                                    6.87978e-10
name                                                     Céu Azul Motel
address               R. Barão de Macaúbas, 27 - Macaúbas, Salvador ...
phone                                                    (71) 3241-3714
cnpj                                                     33867664000101
company_name          CONTRASTE EDITORA E INDUSTRIA GRAFICA EIRELI -...
company_trade_name                                                  NaN
Name: 2191, dtype: object

However, a [Google Maps search](https://www.google.com/maps/place/R.+Bar%C3%A3o+de+Maca%C3%BAbas,+27+-+Barbalho,+Salvador+-+BA,+40302-000,+Brazil/@-12.96636,-38.4996147,17z/data=!3m1!4b1!4m5!3m4!1s0x71604d877430fb3:0x2e71207287112e18!8m2!3d-12.96636!4d-38.497426?hl=en) shows that there is actually a printing company **and** a motel at this address.

## Final considerations

So far, the attempt to fetch sex places has brought mainly false positives. We need to improve our methods before it is worth pursuing this hypothesis further.