## Project 2: Building a dashboard
In class we saw that it is possible to build dynamic charts.

Now is the chance to combine what we learned about pandas, seaborn, and plotly, and integrate it all into a dashboard hosted on Streamlit.

### Goals:
- Build a dynamic dashboard with data that interests you
    - Stocks
    - COVID cases
    - Uber trips
    - Any kind of data!
- Apply good development practices with pandas and charts
- Choose a suitable color palette
- Choose suitable chart types
- Tell the user a story with the data

### Extra goal

- Publish the dashboard

How to publish the dashboard: [https://docs.streamlit.io/en/stable/deploy_streamlit_app.html](https://docs.streamlit.io/en/stable/deploy_streamlit_app.html)

There is no need to create 10 different charts, or a single chart packed with information. Sometimes one or two charts are enough to convey the message objectively!

- Streamlit link: [https://streamlit.io/](https://streamlit.io/)  

The documentation is very detailed! Read it carefully and explore the available buttons, filters, and charts.  
[https://docs.streamlit.io/en/stable/](https://docs.streamlit.io/en/stable/)
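
As a starting point, a dashboard script could look roughly like the sketch below. It is a minimal sketch rather than a reference implementation: the helper `filter_by_region`, the function `render_dashboard`, and the toy columns are assumptions made for illustration. Streamlit is imported inside `render_dashboard` so that the filtering helper can be tried out without it.

```python
import pandas as pd


def filter_by_region(df, region):
    """Return only the rows of `df` whose 'Region' equals `region`."""
    return df[df["Region"] == region].reset_index(drop=True)


def render_dashboard(df):
    """Draw a tiny page: a title, a region filter and the filtered table."""
    import streamlit as st  # lazy import: only needed when the page is rendered

    st.title("Life expectancy by region")
    region = st.selectbox("Region", sorted(df["Region"].unique()))
    st.dataframe(filter_by_region(df, region))


# Toy data standing in for a real dataset (illustrative only).
toy = pd.DataFrame({
    "Country": ["Spain", "Canada", "Benin"],
    "Region": ["Europe", "Americas", "Africa"],
    "LifeExp": [73.2, 72.0, 50.4],
})
europe = filter_by_region(toy, "Europe")
```

In a file such as `app.py` (a hypothetical name), ending the script with `render_dashboard(toy)` and launching it with `streamlit run app.py` should open the page in the browser.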



### Data:
`yfinance`: a module for retrieving stock prices.

- [https://data.world/datasets/uber](https://data.world/datasets/uber)

- [https://ourworldindata.org/](https://ourworldindata.org/)


In [2]:
import pandas as pd
import numpy as np

In [3]:
# load the drinks dataset
dfdrinks = pd.read_csv('datasets/drinks.csv')
dfdrinks

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
0,Afghanistan,0,0,0,0.0
1,Albania,89,132,54,4.9
2,Algeria,25,0,14,0.7
3,Andorra,245,138,312,12.4
4,Angola,217,57,45,5.9
...,...,...,...,...,...
188,Venezuela,333,100,3,7.7
189,Vietnam,111,2,1,2.0
190,Yemen,6,0,0,0.1
191,Zambia,32,19,4,2.5


In [4]:
dfdrinks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   country                       193 non-null    object 
 1   beer_servings                 193 non-null    int64  
 2   spirit_servings               193 non-null    int64  
 3   wine_servings                 193 non-null    int64  
 4   total_litres_of_pure_alcohol  193 non-null    float64
dtypes: float64(1), int64(3), object(1)
memory usage: 7.7+ KB


In [5]:
# load the life expectancy dataset
dflifeexp = pd.read_csv('datasets/lifeexpectancy-verbose.csv')
dflifeexp

Unnamed: 0,GhoCode,GhoDisplay,PublishStateCode,PublishStateDisplay,YearCode,YearDisplay,RegionCode,RegionDisplay,WorldBankIncomeGroupGroupCode,WorldBankIncomeGroupDisplay,CountryCode,CountryDisplay,SexCode,SexDisplay,DisplayValue,Numeric
0,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,68,68.00000
1,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2012,2012,EUR,Europe,WB_HI,High_income,IRL,Ireland,MLE,Male,69,69.30000
2,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,EMR,Eastern Mediterranean,WB_LI,Low_income,YEM,Yemen,BTSX,Both sexes,61,61.00000
3,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,AFR,Africa,WB_LMI,Lower_middle_income,NGA,Nigeria,FMLE,Female,48,48.00000
4,WHOSIS_000015,Life expectancy at age 60 (years),PUBLISHED,Published,1990,1990,SEAR,South_East Asia,WB_LMI,Lower_middle_income,THA,Thailand,FMLE,Female,19,19.00000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6403,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,EMR,Eastern Mediterranean,WB_LMI,Lower_middle_income,MAR,Morocco,BTSX,Both sexes,68,68.00000
6404,WHOSIS_000015,Life expectancy at age 60 (years),PUBLISHED,Published,1990,1990,WPR,Western Pacific,WB_UMI,Upper_middle_income,PLW,Palau,BTSX,Both sexes,16,16.00000
6405,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AFR,Africa,WB_LI,Low_income,BFA,Burkina Faso,FMLE,Female,51,51.00000
6406,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2012,2012,EUR,Europe,WB_LMI,Lower_middle_income,ALB,Albania,MLE,Male,64,63.50000


In [6]:
dflifeexp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6408 entries, 0 to 6407
Data columns (total 16 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   GhoCode                        6408 non-null   object 
 1   GhoDisplay                     6408 non-null   object 
 2   PublishStateCode               6408 non-null   object 
 3   PublishStateDisplay            6408 non-null   object 
 4   YearCode                       6408 non-null   int64  
 5   YearDisplay                    6408 non-null   int64  
 6   RegionCode                     6408 non-null   object 
 7   RegionDisplay                  6408 non-null   object 
 8   WorldBankIncomeGroupGroupCode  5244 non-null   object 
 9   WorldBankIncomeGroupDisplay    5244 non-null   object 
 10  CountryCode                    6408 non-null   object 
 11  CountryDisplay                 6408 non-null   object 
 12  SexCode                        6408 non-null   o

In [7]:
dflifeexp[dflifeexp.CountryCode == 'NIC']

Unnamed: 0,GhoCode,GhoDisplay,PublishStateCode,PublishStateDisplay,YearCode,YearDisplay,RegionCode,RegionDisplay,WorldBankIncomeGroupGroupCode,WorldBankIncomeGroupDisplay,CountryCode,CountryDisplay,SexCode,SexDisplay,DisplayValue,Numeric
0,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,68,68.0
344,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2013,2013,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,62,61.9
578,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,FMLE,Female,74,74.0
1130,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,FMLE,Female,76,76.0
1257,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,70,70.0
1967,WHOSIS_000015,Life expectancy at age 60 (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,21,21.0
2258,WHOSIS_000015,Life expectancy at age 60 (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,FMLE,Female,24,24.0
2383,WHOSIS_000015,Life expectancy at age 60 (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,BTSX,Both sexes,22,22.0
2478,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2012,2012,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,70,70.0
2538,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2012,2012,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,FMLE,Female,66,65.8


In [8]:
dflifeexp.GhoDisplay.unique()

array(['Life expectancy at birth (years)',
       'Healthy life expectancy (HALE) at birth (years)',
       'Life expectancy at age 60 (years)'], dtype=object)

In [9]:
dflifeexp[(dflifeexp.GhoCode == 'WHOSIS_000001') & (dflifeexp.CountryCode == 'NIC')]

Unnamed: 0,GhoCode,GhoDisplay,PublishStateCode,PublishStateDisplay,YearCode,YearDisplay,RegionCode,RegionDisplay,WorldBankIncomeGroupGroupCode,WorldBankIncomeGroupDisplay,CountryCode,CountryDisplay,SexCode,SexDisplay,DisplayValue,Numeric
0,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,68,68.0
578,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,FMLE,Female,74,74.0
1130,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,FMLE,Female,76,76.0
1257,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2000,2000,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,70,70.0
2478,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2012,2012,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,MLE,Male,70,70.0
3004,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2013,2013,AMR,Americas,,,NIC,Nicaragua,MLE,Male,71,70.73053
3178,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2013,2013,AMR,Americas,,,NIC,Nicaragua,FMLE,Female,77,76.62982
4930,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,1990,1990,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,BTSX,Both sexes,71,71.0
5319,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2012,2012,AMR,Americas,WB_LMI,Lower_middle_income,NIC,Nicaragua,BTSX,Both sexes,73,73.0
5926,WHOSIS_000001,Life expectancy at birth (years),PUBLISHED,Published,2013,2013,AMR,Americas,,,NIC,Nicaragua,BTSX,Both sexes,74,73.64848


In [10]:
dflifeexp.groupby('CountryDisplay')['SexDisplay'].nunique().sort_values() < 3

CountryDisplay
Afghanistan    False
Niger          False
Nigeria        False
Niue           False
Norway         False
               ...  
Grenada        False
Guatemala      False
Guinea         False
Denmark        False
Zimbabwe       False
Name: SexDisplay, Length: 194, dtype: bool
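
The boolean Series above is hard to scan because it prints one line per country. A possible way to list only the countries that fail the check is to index the counts with the condition itself. The sketch below uses a small made-up frame, not the WHO file.

```python
import pandas as pd

# Made-up rows: country "B" is missing the "Both sexes" category.
toy = pd.DataFrame({
    "CountryDisplay": ["A", "A", "A", "B", "B"],
    "SexDisplay": ["Male", "Female", "Both sexes", "Male", "Female"],
})

n_categories = toy.groupby("CountryDisplay")["SexDisplay"].nunique()
incomplete = n_categories[n_categories < 3]  # keep only countries below 3
print(incomplete)
```

An empty result would mean that every country has all three categories, which is what the `False` values shown above suggest.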

In [11]:
df_ultimo_ano = dflifeexp[(dflifeexp.SexCode == 'BTSX') & (dflifeexp.GhoCode == 'WHOSIS_000002')].sort_values(by = 'YearCode', ascending = False)
df_ultimo_ano

Unnamed: 0,GhoCode,GhoDisplay,PublishStateCode,PublishStateDisplay,YearCode,YearDisplay,RegionCode,RegionDisplay,WorldBankIncomeGroupGroupCode,WorldBankIncomeGroupDisplay,CountryCode,CountryDisplay,SexCode,SexDisplay,DisplayValue,Numeric
3141,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2013,2013,AMR,Americas,WB_LMI,Lower_middle_income,GUY,Guyana,BTSX,Both sexes,54,54.4
4653,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2013,2013,EMR,Eastern Mediterranean,WB_HI,High_income,OMN,Oman,BTSX,Both sexes,66,66.0
4857,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2013,2013,AFR,Africa,WB_LI,Low_income,ERI,Eritrea,BTSX,Both sexes,55,54.5
2670,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2013,2013,AMR,Americas,WB_LMI,Lower_middle_income,PRY,Paraguay,BTSX,Both sexes,65,64.7
1564,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2013,2013,EUR,Europe,WB_HI,High_income,ISL,Iceland,BTSX,Both sexes,72,72.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1383,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2000,2000,AFR,Africa,WB_UMI,Upper_middle_income,NAM,Namibia,BTSX,Both sexes,49,49.1
5008,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2000,2000,WPR,Western Pacific,WB_LMI,Lower_middle_income,MNG,Mongolia,BTSX,Both sexes,55,55.5
4161,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2000,2000,EUR,Europe,WB_LMI,Lower_middle_income,GEO,Georgia,BTSX,Both sexes,63,62.8
5026,WHOSIS_000002,Healthy life expectancy (HALE) at birth (years),PUBLISHED,Published,2000,2000,EUR,Europe,WB_HI,High_income,MCO,Monaco,BTSX,Both sexes,70,70.3


In [12]:
dfle = df_ultimo_ano.groupby('CountryDisplay')[['CountryCode', 'RegionDisplay', 'YearDisplay', 'WorldBankIncomeGroupDisplay', 'Numeric']].agg('first')
dfle = dfle.reset_index().rename(columns = {
    'CountryDisplay': 'Country',
    'RegionDisplay': 'Region',
    'YearDisplay': 'Year',
    'WorldBankIncomeGroupDisplay': 'IncomeGroup',
    'Numeric': 'LifeExp'
})
dfle.sample(10)

Unnamed: 0,Country,CountryCode,Region,Year,IncomeGroup,LifeExp
37,Comoros,COM,Africa,2013,Low_income,53.8
5,Antigua and Barbuda,ATG,Americas,2013,High_income,64.1
11,Bahamas,BHS,Americas,2013,High_income,64.4
108,Mauritius,MUS,Africa,2013,Upper_middle_income,64.8
14,Barbados,BRB,Americas,2013,High_income,66.2
38,Congo,COG,Africa,2013,Lower_middle_income,50.3
117,Namibia,NAM,Africa,2013,Upper_middle_income,58.1
172,Timor_Leste,TLS,South_East Asia,2013,Lower_middle_income,57.2
69,Guatemala,GTM,Americas,2013,Lower_middle_income,62.5
16,Belgium,BEL,Europe,2013,High_income,70.5
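
One detail of the `agg('first')` step is worth knowing: in pandas, `first` returns the first non-null value of each column separately, so a missing value in the most recent row can be filled from an older row. Whether that is desirable depends on the analysis. The sketch below contrasts it with `drop_duplicates`, which keeps the most recent row as a whole; the data is made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "A"],
    "Year": [2012, 2013],
    "IncomeGroup": ["Low_income", None],
    "LifeExp": [69.0, 70.0],
}).sort_values(by="Year", ascending=False)

# Column-wise first non-null value: IncomeGroup comes from the 2012 row.
by_first = toy.groupby("Country").agg("first").reset_index()

# Whole most recent row: IncomeGroup stays missing.
by_row = toy.drop_duplicates(subset="Country", keep="first").reset_index(drop=True)
```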


In [13]:
dfle.Region.unique()

array(['Eastern Mediterranean', 'Europe', 'Africa', 'Americas',
       'Western Pacific', 'South_East Asia'], dtype=object)

---
# Consumption per year 

In [14]:
import pandas as pd

dfbeer_raw = pd.read_csv('datasets/beer-consumption-per-person.csv')
dfbeer_raw.sample(5)

Unnamed: 0,Entity,Code,Year,"Indicator:Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) - Beverage Types:Beer"
1175,Bulgaria,BGR,2006,3.96
7369,Somalia,SOM,1978,0.01
570,Bahrain,BHR,1997,2.26
2598,Eritrea,ERI,1974,0.57
7014,Senegal,SEN,1988,0.27


In [15]:
dfbeer = dfbeer_raw.rename(columns = {
    'Indicator:Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) - Beverage Types:Beer': 'Beer_LitersAlcPerYear'
})
dfbeer['Beer_LitersPerYear'] = dfbeer['Beer_LitersAlcPerYear'] / 0.05  # 100 L of beer = 5 L of alcohol (assumes 5% ABV)

In [16]:
dfwine_raw = pd.read_csv('datasets/wine-consumption-per-person.csv')
dfwine_raw.sample(5)

Unnamed: 0,Entity,Code,Year,"Indicator:Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) - Beverage Types:Wine"
8181,Tunisia,TUN,1975,0.67
2620,Eritrea,ERI,1996,0.0
4137,Jordan,JOR,1979,0.02
6342,Philippines,PHL,1976,0.0
4544,Lesotho,LSO,1987,0.05


In [17]:
dfwine = dfwine_raw.rename(columns = {
    'Indicator:Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) - Beverage Types:Wine': 'Wine_LitersAlcPerYear'
})
dfwine['Wine_LitersPerYear'] = dfwine['Wine_LitersAlcPerYear'] / 0.12  # 3 L of alcohol = 25 one-liter bottles of wine (assumes 12% ABV)

In [18]:
dfspirits_raw = pd.read_csv('datasets/spirits-consumption-per-person.csv')
dfspirits_raw.sample(5)

Unnamed: 0,Entity,Code,Year,"Indicator:Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) - Beverage Types:Spirits"
6705,Saint Kitts and Nevis,KNA,1967,3.56
8845,Yemen,YEM,2013,0.0
8713,Venezuela,VEN,1989,3.55
2013,Cuba,CUB,1977,1.17
3377,Guyana,GUY,1964,2.1


In [19]:
dfspirits = dfspirits_raw.rename(columns = {
    'Indicator:Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) - Beverage Types:Spirits': 'Spirits_LitersAlcPerYear'
})
dfspirits['Spirits_LitersPerYear'] = dfspirits['Spirits_LitersAlcPerYear'] / 0.4  # vodka is roughly 40% alcohol by volume
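
The three conversions in this section follow the same arithmetic: beverage volume = pure alcohol volume / alcohol by volume (ABV). A small helper makes that explicit. The function name is hypothetical, and the ABV values are the same rough assumptions used in the cells above (5% for beer, 12% for wine, 40% for spirits).

```python
# Rough ABV assumptions taken from the conversions above.
ABV = {"Beer": 0.05, "Wine": 0.12, "Spirits": 0.40}


def beverage_liters(pure_alcohol_liters, beverage):
    """Convert liters of pure alcohol into liters of the given beverage."""
    return pure_alcohol_liters / ABV[beverage]


# 5 L of pure alcohol corresponds to about 100 L of beer at 5% ABV.
beer = beverage_liters(5.0, "Beer")
# 3 L of pure alcohol corresponds to about 25 L of wine at 12% ABV.
wine = beverage_liters(3.0, "Wine")
print(round(beer, 2), round(wine, 2))
```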

In [20]:
dfalc_intermediate = dfbeer.merge(dfwine, on = ['Code', 'Year'], how = 'inner', suffixes = ('', '_y'))
dfalc = dfalc_intermediate.drop(columns = 'Entity_y').merge(dfspirits, on = ['Code', 'Year'], how = 'inner', suffixes = ('', '_y')).drop(columns = 'Entity_y')
dfalc['Total_LitersAlcPerYear'] = dfalc[[ f'{c}_LitersAlcPerYear' for c in ['Beer', 'Wine', 'Spirits'] ]].sum(axis=1)
dfalc.to_csv('datasets/alcool-tratado.csv')
dfalc.sample(5)

Unnamed: 0,Entity,Code,Year,Beer_LitersAlcPerYear,Beer_LitersPerYear,Wine_LitersAlcPerYear,Wine_LitersPerYear,Spirits_LitersAlcPerYear,Spirits_LitersPerYear,Total_LitersAlcPerYear
4810,Malawi,MWI,1970,0.11,2.2,0.01,0.083333,0.08,0.2,0.2
1305,Cambodia,KHM,2000,0.29,5.8,0.01,0.083333,1.21,3.025,1.51
3017,Georgia,GEO,1998,0.14,2.8,1.89,15.75,0.06,0.15,2.09
2434,Ecuador,ECU,2004,1.81,36.2,0.05,0.416667,1.77,4.425,3.63
1999,Cuba,CUB,1965,1.02,20.4,0.11,0.916667,1.12,2.8,2.25


In [21]:
# religion

dfrel_raw = pd.read_csv('datasets/religiao.csv')
dfrel_raw

Unnamed: 0,Country or Area,Year,Area,Sex,Religion,Record Type,Reliability,Source Year,Value,Value Footnotes
0,Albania,2011,Total,Both Sexes,Total,Census - de jure - complete tabulation,"Final figure, complete",2013,2800138.0,
1,Albania,2011,Total,Both Sexes,Atheist,Census - de jure - complete tabulation,"Final figure, complete",2013,69995.0,
2,Albania,2011,Total,Both Sexes,Catholic,Census - de jure - complete tabulation,"Final figure, complete",2013,280921.0,
3,Albania,2011,Total,Both Sexes,Evangelical,Census - de jure - complete tabulation,"Final figure, complete",2013,3797.0,
4,Albania,2011,Total,Both Sexes,Muslim,Census - de jure - complete tabulation,"Final figure, complete",2013,1587608.0,
...,...,...,...,...,...,...,...,...,...,...
20875,Zambia,2010,Rural,Female,Hindu,Census - de facto - complete tabulation,"Final figure, complete",2015,189.0,
20876,Zambia,2010,Rural,Female,Muslim,Census - de facto - complete tabulation,"Final figure, complete",2015,12038.0,
20877,Zambia,2010,Rural,Female,Protestant,Census - de facto - complete tabulation,"Final figure, complete",2015,2915043.0,
20878,Zambia,2010,Rural,Female,Other,Census - de facto - complete tabulation,"Final figure, complete",2015,72222.0,


In [22]:
rel_mask = (
    (dfrel_raw['Reliability'] == 'Final figure, complete')
    # & (dfrel_raw['Record Type'] == 'Census - de jure - complete tabulation')
    & (dfrel_raw['Sex'] == 'Both Sexes')
    & (dfrel_raw['Area'] == 'Total')
)
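
Each comparison in the mask is wrapped in parentheses because `&` binds more tightly than `==` in Python. Without them, an expression such as `a & b == c` is parsed as `(a & b) == c`. The sketch below shows the parenthesized pattern on a made-up frame, followed by the same precedence rule with plain integers.

```python
import pandas as pd

toy = pd.DataFrame({
    "Sex": ["Both Sexes", "Male", "Both Sexes"],
    "Area": ["Total", "Total", "Rural"],
})

# Parentheses make each comparison evaluate before the element-wise AND.
mask = (toy["Sex"] == "Both Sexes") & (toy["Area"] == "Total")
selected = toy[mask]

# The same precedence rule with plain integers.
without_parens = 1 & 3 == 3   # parsed as (1 & 3) == 3
with_parens = 1 & (3 == 3)    # comparison first, then the AND
```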

In [102]:
def maior_religiao(country_year_rel_pop):
    df = country_year_rel_pop.copy()
    if df['Religion'].iloc[0] == 'Total':
        res = df.iloc[1]  # return the 2nd row, i.e. the largest religion (the first row is the Total)
    else:
        res = df.iloc[0]

    return res['Religion']

dfrel_year = dfrel_raw[rel_mask].sort_values(by = 'Value', ascending = False).groupby(['Country or Area', 'Year'])[['Religion', 'Value']].apply(maior_religiao)
dfrel_year.name = 'MajorReligion'
dfrel_year = dfrel_year.reset_index()
dfrel_year

Unnamed: 0,Country or Area,Year,MajorReligion
0,Albania,2011,Muslim
1,Anguilla,2001,Anglican
2,Antigua and Barbuda,2001,Anglican
3,Armenia,2011,Armenian Apostolic
4,Aruba,2000,Roman Catholic
...,...,...,...
191,United Kingdom of Great Britain and Northern I...,2001,Christian
192,United Kingdom of Great Britain and Northern I...,2011,Christian
193,Vanuatu,2009,Presbyterian
194,Viet Nam,1999,No Religion
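
An alternative to the custom `maior_religiao` function, sketched here on made-up numbers, is to drop the `Total` rows first and then pick the row with the largest `Value` in each group via `idxmax`. This is one possible formulation and not necessarily equivalent in every edge case (ties, for instance).

```python
import pandas as pd

toy = pd.DataFrame({
    "Country or Area": ["X", "X", "X", "Y", "Y"],
    "Year": [2011, 2011, 2011, 2010, 2010],
    "Religion": ["Total", "Muslim", "Catholic", "Hindu", "Protestant"],
    "Value": [100.0, 60.0, 40.0, 10.0, 90.0],
})

no_total = toy[toy["Religion"] != "Total"]
# Index label of the largest Value within each (country, year) group.
top_rows = no_total.groupby(["Country or Area", "Year"])["Value"].idxmax()
major = (
    no_total.loc[top_rows, ["Country or Area", "Year", "Religion"]]
    .rename(columns={"Religion": "MajorReligion"})
    .reset_index(drop=True)
)
```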


In [103]:
# one row per country; with ascending = True, 'first' keeps the earliest year on record
dfrel = dfrel_year.sort_values(by = 'Year', ascending = True).groupby('Country or Area')['MajorReligion'].agg('first').sort_index().reset_index()
dfrel

Unnamed: 0,Country or Area,MajorReligion
0,Albania,Muslim
1,Anguilla,Anglican
2,Antigua and Barbuda,Anglican
3,Armenia,Armenian Apostolic
4,Aruba,Roman Catholic
...,...,...
107,Uganda,Catholic
108,United Kingdom of Great Britain and Northern I...,Christian
109,Vanuatu,Presbyterian
110,Viet Nam,No Religion


In [104]:
# try to get country codes
# https://restcountries.eu/
# https://github.com/apilayer/restcountries

import requests
from ratelimiter import RateLimiter

reqget = RateLimiter(max_calls = 30, period = 1)(requests.get)

def api_url_search_fullname(name):
    return rf'https://restcountries.eu/rest/v2/name/{name}?fullText=true'

def api_get_alpha3(name, return_json = False):
    r = reqget(api_url_search_fullname(name))
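    try:
        country = r.json()
    except ValueError:  # response body is not valid JSON
        return None
    
    if return_json:
        return country
    
    try:
        return country[0]['alpha3Code']
    except KeyError:  # response was not a list of matches (e.g. an error object)
        return None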


dfrel['CountryCode'] = dfrel['Country or Area'].apply(api_get_alpha3)
dfrel

Unnamed: 0,Country or Area,MajorReligion,CountryCode
0,Albania,Muslim,ALB
1,Anguilla,Anglican,AIA
2,Antigua and Barbuda,Anglican,ATG
3,Armenia,Armenian Apostolic,ARM
4,Aruba,Roman Catholic,ABW
...,...,...,...
107,Uganda,Catholic,UGA
108,United Kingdom of Great Britain and Northern I...,Christian,GBR
109,Vanuatu,Presbyterian,VUT
110,Viet Nam,No Religion,VNM


In [106]:
dfrel[dfrel.CountryCode.isna()]

Unnamed: 0,Country or Area,MajorReligion,CountryCode
15,British Virgin Islands,Methodist,
27,Czechia,No Religion,
73,North Macedonia,Orthodox,
85,Saint Helena ex. dep.,Church of England,


In [107]:
# fill in the missing codes
dfrel.loc[dfrel['Country or Area'] == 'North Macedonia', 'CountryCode'] = 'MKD'
dfrel.loc[dfrel['Country or Area'] == 'British Virgin Islands', 'CountryCode'] = 'VGB'
dfrel.loc[dfrel['Country or Area'] == 'Czechia', 'CountryCode'] = 'CZE'
dfrel.loc[dfrel['Country or Area'] == 'Saint Helena ex. dep.', 'CountryCode'] = 'SHN'
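
The manual assignments can also be expressed as a dictionary applied with `map`, which keeps the corrections in one place. A sketch on a made-up frame; the name `manual_codes` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country or Area": ["Albania", "Czechia", "North Macedonia"],
    "CountryCode": ["ALB", None, None],
})

# Manual corrections for names the lookup did not resolve.
manual_codes = {"Czechia": "CZE", "North Macedonia": "MKD"}

# Fill only the missing codes; existing codes are left untouched.
toy["CountryCode"] = toy["CountryCode"].fillna(toy["Country or Area"].map(manual_codes))
```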

In [108]:
dfrel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112 entries, 0 to 111
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Country or Area  112 non-null    object
 1   MajorReligion    112 non-null    object
 2   CountryCode      112 non-null    object
dtypes: object(3)
memory usage: 2.8+ KB


In [109]:
dfrel.to_csv('datasets/religiao-tratado.csv', index = False)
dfrel

Unnamed: 0,Country or Area,MajorReligion,CountryCode
0,Albania,Muslim,ALB
1,Anguilla,Anglican,AIA
2,Antigua and Barbuda,Anglican,ATG
3,Armenia,Armenian Apostolic,ARM
4,Aruba,Roman Catholic,ABW
...,...,...,...
107,Uganda,Catholic,UGA
108,United Kingdom of Great Britain and Northern I...,Christian,GBR
109,Vanuatu,Presbyterian,VUT
110,Viet Nam,No Religion,VNM


In [100]:
dfrelfn = pd.read_csv('datasets/religiao_footnotes.csv')
dfrelfn

Unnamed: 0,footnoteSeqID,Footnote
0,1,"Because of rounding, totals are not in all cas..."
1,2,These data have been randomly rounded to prote...
2,3,"Including population in off-shore, migratory a..."
3,4,"Category 'No religion' includes Agnoticism, At..."
4,5,Including Church of Christ.
...,...,...
79,80,Data refer to non-institutional population.
80,81,Data refer to resident population.
81,82,Excluding population enumerated in hotels.
82,83,"Other includes Orthodox, Bahai, Other Christia..."


---
# Roadmap

In [26]:
# 1. MERGE the alcohol consumption data 'dfdrinks' with life expectancy 'dfle'

# 2. choropleth map: color is alcohol consumption
#                    hover = 'religion', 'income'

# 3. scatter x: alcohol consumption y: life expectancy
#            each point is a country
#            hue: ( 'region', 'level of income', 'religion' (try to get it) )
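
Steps 2 and 3 of the roadmap could be sketched with Plotly Express as below. This is an illustration under assumptions: the function names are hypothetical, the column names follow the merged frame built later in this notebook, and Plotly is imported inside the function so the data-preparation helper can be run on its own.

```python
import pandas as pd


def latest_per_country(df):
    """Keep the most recent year available for each country code."""
    ordered = df.sort_values(by="Year", ascending=False)
    return ordered.drop_duplicates(subset="CountryCode").reset_index(drop=True)


def make_figures(df):
    """Return (choropleth, scatter) figures for a frame with one row per country."""
    import plotly.express as px  # lazy import: only needed for plotting

    choropleth = px.choropleth(
        df,
        locations="CountryCode",  # ISO-3 codes, Plotly's default location mode
        color="Total_LitersAlcPerYear",
        hover_name="Country",
        hover_data=["MajorReligion", "IncomeGroup"],
    )
    scatter = px.scatter(
        df,
        x="Total_LitersAlcPerYear",
        y="LifeExp",
        color="Region",
        hover_name="Country",
    )
    return choropleth, scatter


# Made-up rows to exercise the helper.
toy = pd.DataFrame({
    "CountryCode": ["AAA", "AAA", "BBB"],
    "Year": [2000, 2010, 2005],
    "Total_LitersAlcPerYear": [1.0, 2.0, 3.0],
})
latest = latest_per_country(toy)
```

Reducing the data to one row per country first keeps each point on the scatter plot, and each country on the map, tied to a single year.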


In [27]:
dfle.sample(10)

Unnamed: 0,Country,CountryCode,Region,Year,IncomeGroup,LifeExp
161,Spain,ESP,Europe,2013,High_income,73.2
18,Benin,BEN,Africa,2013,Low_income,50.4
155,Slovakia,SVK,Europe,2013,High_income,66.7
31,Canada,CAN,Americas,2013,High_income,72.0
35,China,CHN,Western Pacific,2013,Lower_middle_income,68.0
181,Ukraine,UKR,Europe,2013,Lower_middle_income,62.9
67,Greece,GRC,Europe,2013,High_income,71.3
190,Viet Nam,VNM,Western Pacific,2013,Low_income,65.9
57,Estonia,EST,Europe,2013,High_income,67.0
117,Namibia,NAM,Africa,2013,Upper_middle_income,58.1


In [28]:
dfalc.sample(10)

Unnamed: 0,Entity,Code,Year,Beer_LitersAlcPerYear,Beer_LitersPerYear,Wine_LitersAlcPerYear,Wine_LitersPerYear,Spirits_LitersAlcPerYear,Spirits_LitersPerYear,Total_LitersAlcPerYear
8257,Turkey,TUR,1999,0.78,15.6,0.11,0.916667,0.57,1.425,1.46
1439,Cape Verde,CPV,1978,0.33,6.6,1.37,11.416667,0.51,1.275,2.21
1512,Central African Republic,CAF,2000,0.42,8.4,0.03,0.25,0.02,0.05,0.47
2430,Ecuador,ECU,2000,1.35,27.0,0.03,0.25,2.61,6.525,3.99
1868,Costa Rica,CRI,1969,0.53,10.6,0.04,0.333333,3.13,7.825,3.7
3968,Italy,ITA,1973,1.04,20.8,15.93,132.75,2.92,7.3,19.89
2059,Cyprus,CYP,1971,1.28,25.6,1.42,11.833333,2.3,5.75,5.0
8342,Uganda,UGA,1985,0.09,1.8,0.0,0.0,0.2,0.5,0.29
1722,Colombia,COL,2000,2.89,57.8,0.03,0.25,1.7,4.25,4.62
7287,Slovenia,SVN,2002,4.32,86.4,3.28,27.333333,1.69,4.225,9.29


In [45]:
dfrel.sample(10)

Unnamed: 0,Country or Area,MajorReligion
102,Timor-Leste,Catholic
35,Georgia,Christian Orthodox
93,Sint Maarten (Dutch part),Roman Catholic
43,Indonesia,Muslim
57,Maldives,Islam
36,Germany,Other
0,Albania,Muslim
63,Montenegro,Christian Orthodox
22,Cayman Islands,Church of God
74,Palau,Catholic


In [33]:
# 1. MERGE
dfalc_s = dfalc.sort_values(by = ['Year', 'Code'])
dfle_s = dfle.sort_values(by = ['Year', 'CountryCode'])
dfalcle = pd.merge_asof(left = dfalc_s, right = dfle_s, left_by = 'Code', left_on = 'Year', right_by = 'CountryCode', right_on = 'Year', direction = 'nearest')
dfalcle.sample(10)

Unnamed: 0,Entity,Code,Year,Beer_LitersAlcPerYear,Beer_LitersPerYear,Wine_LitersAlcPerYear,Wine_LitersPerYear,Spirits_LitersAlcPerYear,Spirits_LitersPerYear,Total_LitersAlcPerYear,Country,CountryCode,Region,IncomeGroup,LifeExp
7883,Costa Rica,CRI,2008,2.46,49.2,0.2,1.666667,1.53,3.825,4.19,Costa Rica,CRI,Americas,Upper_middle_income,69.3
5783,Samoa,WSM,1996,2.36,47.2,0.07,0.583333,0.78,1.95,3.21,Samoa,WSM,Western Pacific,Lower_middle_income,64.1
178,Denmark,DNK,1962,5.05,101.0,0.56,4.666667,0.95,2.375,6.56,Denmark,DNK,Europe,High_income,70.0
491,France,FRA,1964,2.76,55.2,17.91,149.25,3.29,8.225,23.96,France,FRA,Europe,High_income,71.8
6078,Mexico,MEX,1998,3.73,74.6,0.03,0.25,1.07,2.675,4.83,Mexico,MEX,Americas,Upper_middle_income,67.0
2465,Bangladesh,BGD,1977,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Bangladesh,BGD,South_East Asia,Low_income,60.8
8651,India,IND,2012,0.19,3.8,0.0,0.0,2.91,7.275,3.1,India,IND,South_East Asia,Lower_middle_income,57.5
312,Canada,CAN,1963,4.32,86.4,0.44,3.666667,2.51,6.275,7.27,Canada,CAN,Americas,High_income,72.0
5380,Saudi Arabia,SAU,1994,0.05,1.0,0.0,0.0,0.0,0.0,0.05,Saudi Arabia,SAU,Eastern Mediterranean,High_income,65.4
1464,Malaysia,MYS,1970,0.27,5.4,0.01,0.083333,0.13,0.325,0.41,Malaysia,MYS,Western Pacific,Upper_middle_income,64.6
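
Because `dfle` holds a single year per country, `direction = 'nearest'` attaches that same life expectancy to every year of consumption data, including years decades earlier (Denmark 1962 in the sample above). If that is not intended, `merge_asof` accepts a `tolerance` that limits how far apart the years may be. A sketch on made-up data; the 5-year limit is an arbitrary assumption.

```python
import pandas as pd

alc = pd.DataFrame({
    "Code": ["AAA", "AAA"],
    "Year": [1990, 2010],
    "Total": [1.0, 2.0],
}).sort_values(by="Year")
le = pd.DataFrame({
    "CountryCode": ["AAA"],
    "Year": [2013],
    "LifeExp": [70.0],
}).sort_values(by="Year")

common = dict(left_by="Code", right_by="CountryCode", on="Year", direction="nearest")
unbounded = pd.merge_asof(alc, le, **common)             # both years are matched
bounded = pd.merge_asof(alc, le, tolerance=5, **common)  # only years within 5 of 2013
```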


In [118]:
dfalclerel = pd.merge(left = dfalcle, right = dfrel, on = 'CountryCode', how = 'left')

dfalclerel = dfalclerel[[
    'Country', 'CountryCode', 'Region','IncomeGroup','LifeExp', 'MajorReligion', 
    'Year', 'Beer_LitersAlcPerYear', 'Beer_LitersPerYear',
    'Wine_LitersAlcPerYear', 'Wine_LitersPerYear',
    'Spirits_LitersAlcPerYear', 'Spirits_LitersPerYear',
    'Total_LitersAlcPerYear'
]]

#dfalclerel.loc[dfalclerel.MajorReligion.isna(), 'MajorReligion'] = 'Not polled'

dfalclerel.to_csv('app/datasets/alcool-expect_vida-religiao.csv', index = False)
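
Since the religion table covers 112 countries, the left merge leaves `MajorReligion` empty for the remaining ones. One way to see how many rows were matched is the `indicator` option of `merge`; the commented-out line above hints at labeling the gaps instead. A sketch on made-up rows:

```python
import pandas as pd

left = pd.DataFrame({"CountryCode": ["ALB", "BRA"], "LifeExp": [64.0, 65.0]})
right = pd.DataFrame({"CountryCode": ["ALB"], "MajorReligion": ["Muslim"]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left merge.
merged = left.merge(right, on="CountryCode", how="left", indicator=True)
match_counts = merged["_merge"].value_counts()

# Optional: label countries that have no religion record.
merged["MajorReligion"] = merged["MajorReligion"].fillna("Not polled")
```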