# What is the relationship between countries' GDP and their greenhouse gas emissions?

- This project sets out to answer the question above through an exploratory data analysis. The dataset is rich enough to support analyses of many other aspects, but this project focuses on that main question. With the climate crisis growing ever more urgent, we want to understand the relationship between economic growth, measured by GDP, and countries' greenhouse gas emissions.

- To do so, we use the Our World in Data dataset, which provides this data in detail and is available at: https://github.com/owid/co2-data/blob/7de6849c657ef2c4bcc199c9d8de9acf2ff8d81d/owid-co2-data.csv

- This notebook covers data cleaning and preparation plus an initial exploratory analysis, complemented by a dashboard built in Tableau, which is linked at the end of this notebook.

- The data dictionary describing every column in detail is available at: https://github.com/owid/co2-data/blob/7de6849c657ef2c4bcc199c9d8de9acf2ff8d81d/owid-co2-codebook.csv

# 1. Importing the data

In [None]:
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt

Importing the CO2 emissions data from Our World in Data
Available at: https://github.com/owid/co2-data/blob/7de6849c657ef2c4bcc199c9d8de9acf2ff8d81d/owid-co2-data.csv

In [None]:
emissoes = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Análise Emissões CO2 por País/owid-co2-data.csv")
emissoes

Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
0,Afghanistan,1850,AFG,3752993.0,,,,,,,...,,,,,,,,,,
1,Afghanistan,1851,AFG,3767956.0,,,,,,,...,,0.165,0.000,0.000,0.000,0.0,,,,
2,Afghanistan,1852,AFG,3783940.0,,,,,,,...,,0.164,0.000,0.000,0.000,0.0,,,,
3,Afghanistan,1853,AFG,3800954.0,,,,,,,...,,0.164,0.000,0.000,0.000,0.0,,,,
4,Afghanistan,1854,AFG,3818038.0,,,,,,,...,,0.163,0.000,0.000,0.000,0.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50593,Zimbabwe,2017,ZWE,14751101.0,2.194784e+10,0.469,0.032,9.596,-0.937,-8.899,...,,0.114,0.001,0.001,0.002,0.0,115.59,28.30,0.910,9.486
50594,Zimbabwe,2018,ZWE,15052191.0,2.271535e+10,0.558,0.037,11.795,2.199,22.920,...,,0.114,0.001,0.001,0.002,0.0,118.22,30.83,0.771,6.537
50595,Zimbabwe,2019,ZWE,15354606.0,,0.570,0.037,11.115,-0.681,-5.772,...,,0.113,0.001,0.001,0.002,0.0,117.96,30.53,0.978,8.795
50596,Zimbabwe,2020,ZWE,15669663.0,,0.570,0.036,10.608,-0.507,-4.559,...,,0.112,0.001,0.001,0.002,0.0,,,1.006,9.481
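
The cell above reads the CSV from a personal Google Drive path, which only works for the author. As a sketch of a more portable option, the pinned GitHub link can be converted into a raw-file URL that `pd.read_csv` can read directly. The helper names `blob_to_raw` and `carregar_emissoes` are hypothetical, introduced only for this illustration, and the download step assumes network access.

```python
import pandas as pd

# Pinned dataset link quoted in this notebook
BLOB_URL = (
    "https://github.com/owid/co2-data/blob/"
    "7de6849c657ef2c4bcc199c9d8de9acf2ff8d81d/owid-co2-data.csv"
)


def blob_to_raw(url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching raw-file URL."""
    return (
        url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
        .replace("/blob/", "/", 1)
    )


def carregar_emissoes(url: str = BLOB_URL) -> pd.DataFrame:
    """Hypothetical helper: download the CSV at the pinned commit (needs network)."""
    return pd.read_csv(blob_to_raw(url))


RAW_URL = blob_to_raw(BLOB_URL)
print(RAW_URL)
```

Calling `carregar_emissoes()` would then stand in for the Drive-based `pd.read_csv(...)` call, with the commit hash keeping the data version fixed.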


In [None]:
pd.DataFrame(emissoes.dtypes)

Unnamed: 0,0
country,object
year,int64
iso_code,object
population,float64
gdp,float64
...,...
temperature_change_from_n2o,float64
total_ghg,float64
total_ghg_excluding_lucf,float64
trade_co2,float64


We also load the data dictionary into this notebook, to make any future lookups easier.

In [None]:
dicionario_dados = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Análise Emissões CO2 por País/owid-co2-codebook.csv')
dicionario_dados

Unnamed: 0,column,description,source
0,country,Geographic location.,Our World in Data
1,year,Year of observation.,Our World in Data
2,iso_code,"ISO 3166-1 alpha-3, three-letter country codes.",International Organization for Standardization
3,population,Population of each country or region.,Our World in Data based on different sources (...
4,gdp,Gross domestic product measured in internation...,Maddison Project Database 2020 (Bolt and van Z...
...,...,...,...
74,temperature_change_from_n2o,Change in global mean surface temperature (in ...,Our World in Data based on Jones et al. (2023)
75,total_ghg,Total greenhouse gas emissions including land-...,Our World in Data based on Climate Analysis In...
76,total_ghg_excluding_lucf,Total greenhouse gas emissions excluding land-...,Our World in Data based on Climate Analysis In...
77,trade_co2,Annual net carbon dioxide (CO₂) emissions embe...,Global Carbon Budget (2022)


# 2. Cleaning the Data



### Cleaning the country column

Inspecting the data shows that the "country" column holds regions as well as countries. We therefore want to split it into 2 DataFrames, one with only the countries and one with only the regions, to avoid double counting and confusion in the data.

In [None]:
emissoes['country'].unique()

array(['Afghanistan', 'Africa', 'Africa (GCP)', 'Aland Islands',
       'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola',
       'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina',
       'Armenia', 'Aruba', 'Asia', 'Asia (GCP)',
       'Asia (excl. China and India)', 'Australia', 'Austria',
       'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan',
       'Bolivia', 'Bonaire Sint Eustatius and Saba',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Central African Republic', 'Central America (GCP)', 'Chad',
       'Chile', 'China', 'Christmas Island', 'Colombia', 'Comoros',
       'Congo', 'Cook Islands', 'Costa Rica', "Cote d'Ivoire", 'Croatia',
       'Cuba', 'Curacao', 'Cyprus', 'Czechia',
       'Democratic Republic of Congo', 'Denmar

From the list above, we picked out every entry that is not a country so that it can be excluded.

In [None]:
regioes = ['Africa', 'Africa (GCP)', 'Aland Islands', 'Asia', 'Asia (GCP)', 'Asia (excl. China and India)', 'Europe', 'Europe (GCP)', 'Europe (excl. EU-27)', 'Europe (excl. EU-28)', 'European Union (27)',
        'European Union (27) (GCP)', 'European Union (28)', 'High-income countries', 'International transport', 'Least developed countries (Jones et al. 2023)', 'Low-income countries', 'Lower-middle-income countries',
        'Middle East (GCP)', 'Non-OECD (GCP)', 'North America', 'North America (GCP)', 'North America (excl. USA)', 'OECD (GCP)', 'OECD (Jones et al. 2023)', 'Oceania', 'Oceania (GCP)', 'South America',
        'South America (GCP)', 'Upper-middle-income countries', 'World', 'Central America (GCP)', 'French Equatorial Africa (GCP)', 'French Equatorial Africa (Jones et al. 2023)', 'French West Africa (GCP)', 'French West Africa (Jones et al. 2023)', 'Kuwaiti Oil Fires (GCP)', 'Kuwaiti Oil Fires (Jones et al. 2023)', 'Leeward Islands (GCP)', 'Leeward Islands (Jones et al. 2023)',
        'Micronesia (country)',
        'Ryukyu Islands (GCP)', 'Ryukyu Islands (Jones et al. 2023)', 'Saint Kitts-Nevis-Anguilla (GCP)', 'Saint Kitts-Nevis-Anguilla (Jones et al. 2023)', 'Panama Canal Zone (GCP)', 'Panama Canal Zone (Jones et al. 2023)',
           'St. Kitts-Nevis-Anguilla (GCP)', 'St. Kitts-Nevis-Anguilla (Jones et al. 2023)']

mask = emissoes['country'].isin(regioes)

emissoes_filtradas = emissoes[~mask]

emissoes_filtradas['country'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
       'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda',
       'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria',
       'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan',
       'Bolivia', 'Bonaire Sint Eustatius and Saba',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Central African Republic', 'Chad', 'Chile', 'China',
       'Christmas Island', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
       'Cyprus', 'Czechia', 'Democratic Republic of Congo', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'East Timor',
       'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea'
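
The text mentions two DataFrames, one for countries and one for regions, while the cell above only keeps the countries. A minimal sketch of getting both from the same boolean mask is shown here on made-up toy data. The names `exemplo`, `regioes_exemplo`, `paises_df` and `regioes_df` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are made up
exemplo = pd.DataFrame({
    "country": ["Africa", "Brazil", "World", "Zimbabwe"],
    "co2": [1.0, 2.0, 3.0, 4.0],
})
regioes_exemplo = ["Africa", "World"]  # shortened, illustrative list

mask = exemplo["country"].isin(regioes_exemplo)

# .copy() makes each result an independent DataFrame, so later
# modifications do not trigger a SettingWithCopyWarning
paises_df = exemplo[~mask].copy()   # countries only
regioes_df = exemplo[mask].copy()   # regions and aggregates only

print(paises_df["country"].tolist())
print(regioes_df["country"].tolist())
```

Because the two frames are complements of the same mask, every row of the original lands in exactly one of them.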

In [None]:
emissoes_filtradas

Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
0,Afghanistan,1850,AFG,3752993.0,,,,,,,...,,,,,,,,,,
1,Afghanistan,1851,AFG,3767956.0,,,,,,,...,,0.165,0.000,0.000,0.000,0.0,,,,
2,Afghanistan,1852,AFG,3783940.0,,,,,,,...,,0.164,0.000,0.000,0.000,0.0,,,,
3,Afghanistan,1853,AFG,3800954.0,,,,,,,...,,0.164,0.000,0.000,0.000,0.0,,,,
4,Afghanistan,1854,AFG,3818038.0,,,,,,,...,,0.163,0.000,0.000,0.000,0.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50593,Zimbabwe,2017,ZWE,14751101.0,2.194784e+10,0.469,0.032,9.596,-0.937,-8.899,...,,0.114,0.001,0.001,0.002,0.0,115.59,28.30,0.910,9.486
50594,Zimbabwe,2018,ZWE,15052191.0,2.271535e+10,0.558,0.037,11.795,2.199,22.920,...,,0.114,0.001,0.001,0.002,0.0,118.22,30.83,0.771,6.537
50595,Zimbabwe,2019,ZWE,15354606.0,,0.570,0.037,11.115,-0.681,-5.772,...,,0.113,0.001,0.001,0.002,0.0,117.96,30.53,0.978,8.795
50596,Zimbabwe,2020,ZWE,15669663.0,,0.570,0.036,10.608,-0.507,-4.559,...,,0.112,0.001,0.001,0.002,0.0,,,1.006,9.481


Our DataFrame has many empty values (NaN), which could distort the analysis. To deal with them, we replace all of them with zero.

In [None]:
# Reassign instead of calling fillna(inplace=True) on a filtered slice,
# which is what raises pandas' SettingWithCopyWarning
emissoes_filtradas = emissoes_filtradas.fillna(0)
emissoes_filtradas


Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
0,Afghanistan,1850,AFG,3752993.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.000,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
1,Afghanistan,1851,AFG,3767956.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.165,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
2,Afghanistan,1852,AFG,3783940.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.164,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
3,Afghanistan,1853,AFG,3800954.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.164,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
4,Afghanistan,1854,AFG,3818038.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.163,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50593,Zimbabwe,2017,ZWE,14751101.0,2.194784e+10,0.469,0.032,9.596,-0.937,-8.899,...,0.0,0.114,0.001,0.001,0.002,0.0,115.59,28.30,0.910,9.486
50594,Zimbabwe,2018,ZWE,15052191.0,2.271535e+10,0.558,0.037,11.795,2.199,22.920,...,0.0,0.114,0.001,0.001,0.002,0.0,118.22,30.83,0.771,6.537
50595,Zimbabwe,2019,ZWE,15354606.0,0.000000e+00,0.570,0.037,11.115,-0.681,-5.772,...,0.0,0.113,0.001,0.001,0.002,0.0,117.96,30.53,0.978,8.795
50596,Zimbabwe,2020,ZWE,15669663.0,0.000000e+00,0.570,0.036,10.608,-0.507,-4.559,...,0.0,0.112,0.001,0.001,0.002,0.0,0.00,0.00,1.006,9.481
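
Filling NaN with zero turns "no measurement" into "a measurement of zero", and that changes summary statistics and correlations. The sketch below shows the effect on a tiny made-up table (the `toy` values are illustrative, not taken from the dataset). In which direction the numbers shift on the real data depends on where the gaps are.

```python
import numpy as np
import pandas as pd

# Made-up values: the first three rows are perfectly linear
toy = pd.DataFrame({
    "gdp": [1.0, 2.0, 3.0, np.nan, np.nan],
    "total_ghg": [2.0, 4.0, 6.0, 8.0, np.nan],
})

# With NaN kept, pandas uses only the rows where both values exist
corr_com_nan = toy["gdp"].corr(toy["total_ghg"])
mediana_com_nan = toy["gdp"].median()

# With zeros, the former gaps count as real observations
toy_zeros = toy.fillna(0)
corr_com_zeros = toy_zeros["gdp"].corr(toy_zeros["total_ghg"])
mediana_com_zeros = toy_zeros["gdp"].median()

print(corr_com_nan, corr_com_zeros)
print(mediana_com_nan, mediana_com_zeros)
```

An alternative worth considering is to leave the NaN values in place and let `describe()` and `.corr()` skip them, or to drop incomplete rows only for the columns used in a given calculation.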


## Translating the DataFrame's columns and country names

Below we translate the column names. The dictionary was generated with the help of ChatGPT to speed up the work, since details of the naming can be adjusted later if needed, either in the final Tableau dashboard or in the legends of the charts plotted here.

In [None]:
# Mapping of column names from English to Portuguese, without accents or special characters, replacing "luc" with "mut", "lucf" with "mutf" and "ghg" with "gee"
mapeamento_traducao = {
    'country': 'pais',
    'year': 'ano',
    'iso_code': 'codigo_iso',
    'population': 'populacao',
    'gdp': 'pib',
    'cement_co2': 'cimento_co2',
    'cement_co2_per_capita': 'cimento_co2_per_capita',
    'co2': 'co2',
    'co2_growth_abs': 'crescimento_abs_co2',
    'co2_growth_prct': 'crescimento_pct_co2',
    'co2_including_luc': 'co2_incluindo_mut',
    'co2_including_luc_growth_abs': 'crescimento_abs_co2_incluindo_mut',
    'co2_including_luc_growth_prct': 'crescimento_pct_co2_incluindo_mut',
    'co2_including_luc_per_capita': 'co2_incluindo_mut_per_capita',
    'co2_including_luc_per_gdp': 'co2_incluindo_mut_per_pib',
    'co2_including_luc_per_unit_energy': 'co2_incluindo_mut_per_unidade_energia',
    'co2_per_capita': 'co2_per_capita',
    'co2_per_gdp': 'co2_per_pib',
    'co2_per_unit_energy': 'co2_per_unidade_energia',
    'coal_co2': 'carvao_co2',
    'coal_co2_per_capita': 'carvao_co2_per_capita',
    'consumption_co2': 'consumo_co2',
    'consumption_co2_per_capita': 'consumo_co2_per_capita',
    'consumption_co2_per_gdp': 'consumo_co2_per_pib',
    'cumulative_cement_co2': 'co2_cimento_cumulativo',
    'cumulative_co2': 'co2_cumulativo',
    'cumulative_co2_including_luc': 'co2_cumulativo_incluindo_mut',
    'cumulative_coal_co2': 'co2_carvao_cumulativo',
    'cumulative_flaring_co2': 'co2_queimadas_cumulativo',
    'cumulative_gas_co2': 'co2_gas_cumulativo',
    'cumulative_luc_co2': 'co2_mut_cumulativo',
    'cumulative_oil_co2': 'co2_oleo_cumulativo',
    'cumulative_other_co2': 'co2_outros_cumulativo',
    'energy_per_capita': 'energia_per_capita',
    'energy_per_gdp': 'energia_per_pib',
    'flaring_co2': 'co2_queimadas',
    'flaring_co2_per_capita': 'co2_queimadas_per_capita',
    'gas_co2': 'co2_gas',
    'gas_co2_per_capita': 'co2_gas_per_capita',
    'ghg_excluding_lucf_per_capita': 'gee_excluindo_mutf_per_capita',
    'ghg_per_capita': 'gee_per_capita',
    'land_use_change_co2': 'co2_mudanca_uso_do_solo',
    'land_use_change_co2_per_capita': 'co2_mudanca_uso_do_solo_per_capita',
    'methane': 'metano',
    'methane_per_capita': 'metano_per_capita',
    'nitrous_oxide': 'oxido_nitroso',
    'nitrous_oxide_per_capita': 'oxido_nitroso_per_capita',
    'oil_co2': 'co2_oleo',
    'oil_co2_per_capita': 'co2_oleo_per_capita',
    'other_co2_per_capita': 'outros_co2_per_capita',
    'other_industry_co2': 'co2_outros_industria',
    'primary_energy_consumption': 'consumo_energia_primaria',
    'share_global_cement_co2': 'participacao_global_cimento_co2',
    'share_global_co2': 'participacao_global_co2',
    'share_global_co2_including_luc': 'participacao_global_co2_incluindo_mut',
    'share_global_coal_co2': 'participacao_global_coal_co2',
    'share_global_cumulative_cement_co2': 'participacao_global_cimento_co2_cumulativo',
    'share_global_cumulative_co2': 'participacao_global_co2_cumulativo',
    'share_global_cumulative_co2_including_luc': 'participacao_global_co2_cumulativo_incluindo_mut',
    'share_global_cumulative_coal_co2': 'participacao_global_coal_co2_cumulativo',
    'share_global_cumulative_flaring_co2': 'participacao_global_co2_queimadas_cumulativo',
    'share_global_cumulative_gas_co2': 'participacao_global_co2_gas_cumulativo',
    'share_global_cumulative_luc_co2': 'participacao_global_co2_mut_cumulativo',
    'share_global_cumulative_oil_co2': 'participacao_global_co2_oleo_cumulativo',
    'share_global_cumulative_other_co2': 'participacao_global_co2_outros_cumulativo',
    'share_global_flaring_co2': 'participacao_global_co2_queimadas',
    'share_global_gas_co2': 'participacao_global_co2_gas',
    'share_global_luc_co2': 'participacao_global_co2_mut',
    'share_global_oil_co2': 'participacao_global_co2_oleo',
    'share_global_other_co2': 'participacao_global_co2_outros',
    'share_of_temperature_change_from_ghg': 'participacao_mudanca_temperatura_por_gee',
    'temperature_change_from_ch4': 'mudanca_temperatura_por_ch4',
    'temperature_change_from_co2': 'mudanca_temperatura_por_co2',
    'temperature_change_from_ghg': 'mudanca_temperatura_por_gee',
    'temperature_change_from_n2o': 'mudanca_temperatura_por_n2o',
    'total_ghg': 'total_gee',
    'total_ghg_excluding_lucf': 'total_gee_excluindo_mutf',
    'trade_co2': 'co2_comercio',
    'trade_co2_share': 'participacao_co2_comercio'
}


# Create a copy of the "emissoes_filtradas" DataFrame with column names in Portuguese
emissoes_traduzidas = emissoes_filtradas.rename(columns=mapeamento_traducao)

emissoes_traduzidas

Unnamed: 0,pais,ano,codigo_iso,populacao,pib,cimento_co2,cimento_co2_per_capita,co2,crescimento_abs_co2,crescimento_pct_co2,...,participacao_global_co2_outros,participacao_mudanca_temperatura_por_gee,mudanca_temperatura_por_ch4,mudanca_temperatura_por_co2,mudanca_temperatura_por_gee,mudanca_temperatura_por_n2o,total_gee,total_gee_excluindo_mutf,co2_comercio,participacao_co2_comercio
0,Afghanistan,1850,AFG,3752993.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.000,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
1,Afghanistan,1851,AFG,3767956.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.165,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
2,Afghanistan,1852,AFG,3783940.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.164,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
3,Afghanistan,1853,AFG,3800954.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.164,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
4,Afghanistan,1854,AFG,3818038.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.163,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50593,Zimbabwe,2017,ZWE,14751101.0,2.194784e+10,0.469,0.032,9.596,-0.937,-8.899,...,0.0,0.114,0.001,0.001,0.002,0.0,115.59,28.30,0.910,9.486
50594,Zimbabwe,2018,ZWE,15052191.0,2.271535e+10,0.558,0.037,11.795,2.199,22.920,...,0.0,0.114,0.001,0.001,0.002,0.0,118.22,30.83,0.771,6.537
50595,Zimbabwe,2019,ZWE,15354606.0,0.000000e+00,0.570,0.037,11.115,-0.681,-5.772,...,0.0,0.113,0.001,0.001,0.002,0.0,117.96,30.53,0.978,8.795
50596,Zimbabwe,2020,ZWE,15669663.0,0.000000e+00,0.570,0.036,10.608,-0.507,-4.559,...,0.0,0.112,0.001,0.001,0.002,0.0,0.00,0.00,1.006,9.481
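
`DataFrame.rename` silently skips mapping keys that do not match a column and leaves unmapped columns untouched, so a typo in a generated dictionary can go unnoticed. A small sketch of a coverage check follows. The frame `df` and the dictionary `mapeamento` are toy stand-ins, and `new_column` / `old_column` are invented names used only to trigger both cases.

```python
import pandas as pd

# Toy frame and mapping; "new_column" and "old_column" are invented
df = pd.DataFrame(columns=["country", "year", "gdp", "new_column"])
mapeamento = {
    "country": "pais",
    "year": "ano",
    "gdp": "pib",
    "old_column": "coluna_antiga",
}

# Columns that would keep their English name after the rename
sem_traducao = [c for c in df.columns if c not in mapeamento]
# Mapping keys that match no column (possible typos or removed columns)
chaves_sobrando = [k for k in mapeamento if k not in df.columns]

renomeado = df.rename(columns=mapeamento)

print(sem_traducao)
print(chaves_sobrando)
print(renomeado.columns.tolist())
```

Running the same two list comprehensions against `emissoes_filtradas.columns` and `mapeamento_traducao` would show whether the generated dictionary covers every column.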


Here we also chose to translate the names of all the countries present in our dataset.

In [None]:
traducoes = {
    "Afghanistan": "Afeganistão",
    "Albania": "Albânia",
    "Algeria": "Argélia",
    "American Samoa": "Samoa Americana",
    "Andorra": "Andorra",
    "Angola": "Angola",
    "Anguilla": "Anguilla",
    "Antarctica": "Antártica",
    "Antigua and Barbuda": "Antígua e Barbuda",
    "Argentina": "Argentina",
    "Armenia": "Armênia",
    "Aruba": "Aruba",
    "Australia": "Austrália",
    "Austria": "Áustria",
    "Azerbaijan": "Azerbaijão",
    "Bahamas": "Bahamas",
    "Bahrain": "Bahrain",
    "Bangladesh": "Bangladesh",
    "Barbados": "Barbados",
    "Belarus": "Bielorrússia",
    "Belgium": "Bélgica",
    "Belize": "Belize",
    "Benin": "Benin",
    "Bermuda": "Bermudas",
    "Bhutan": "Butão",
    "Bolivia": "Bolívia",
    "Bonaire Sint Eustatius and Saba": "Bonaire, Santo Eustáquio e Saba",
    "Bosnia and Herzegovina": "Bósnia e Herzegovina",
    "Botswana": "Botsuana",
    "Brazil": "Brasil",
    "British Virgin Islands": "Ilhas Virgens Britânicas",
    "Brunei": "Brunei",
    "Bulgaria": "Bulgária",
    "Burkina Faso": "Burkina Faso",
    "Burundi": "Burundi",
    "Cambodia": "Camboja",
    "Cameroon": "Camarões",
    "Canada": "Canadá",
    "Cape Verde": "Cabo Verde",
    "Central African Republic": "República Centro-Africana",
    "Chad": "Chade",
    "Chile": "Chile",
    "China": "China",
    "Christmas Island": "Ilha de Natal",
    "Colombia": "Colômbia",
    "Comoros": "Comores",
    "Congo": "Congo",
    "Cook Islands": "Ilhas Cook",
    "Costa Rica": "Costa Rica",
    "Cote d'Ivoire": "Costa do Marfim",
    "Croatia": "Croácia",
    "Cuba": "Cuba",
    "Curacao": "Curaçao",
    "Cyprus": "Chipre",
    "Czechia": "Tchéquia",
    "Democratic Republic of Congo": "República Democrática do Congo",
    "Denmark": "Dinamarca",
    "Djibouti": "Djibouti",
    "Dominica": "Dominica",
    "Dominican Republic": "República Dominicana",
    "East Timor": "Timor-Leste",
    "Ecuador": "Equador",
    "Egypt": "Egito",
    "El Salvador": "El Salvador",
    "Equatorial Guinea": "Guiné Equatorial",
    "Eritrea": "Eritreia",
    "Estonia": "Estônia",
    "Eswatini": "Suazilândia",
    "Ethiopia": "Etiópia",
    "Falkland Islands": "Ilhas Malvinas",
    "Faroe Islands": "Ilhas Faroe",
    "Fiji": "Fiji",
    "Finland": "Finlândia",
    "France": "França",
    "French Guiana": "Guiana Francesa",
    "French Polynesia": "Polinésia Francesa",
    "Gabon": "Gabão",
    "Gambia": "Gâmbia",
    "Georgia": "Geórgia",
    "Germany": "Alemanha",
    "Ghana": "Gana",
    "Greece": "Grécia",
    "Greenland": "Groenlândia",
    "Grenada": "Granada",
    "Guadeloupe": "Guadalupe",
    "Guatemala": "Guatemala",
    "Guernsey": "Guernsey",
    "Guinea": "Guiné",
    "Guinea-Bissau": "Guiné-Bissau",
    "Guyana": "Guiana",
    "Haiti": "Haiti",
    "Honduras": "Honduras",
    "Hong Kong": "Hong Kong",
    "Hungary": "Hungria",
    "Iceland": "Islândia",
    "India": "Índia",
    "Indonesia": "Indonésia",
    "Iran": "Irã",
    "Iraq": "Iraque",
    "Ireland": "Irlanda",
    "Isle of Man": "Ilha de Man",
    "Israel": "Israel",
    "Italy": "Itália",
    "Jamaica": "Jamaica",
    "Japan": "Japão",
    "Jersey": "Jersey",
    "Jordan": "Jordânia",
    "Kazakhstan": "Cazaquistão",
    "Kenya": "Quênia",
    "Kiribati": "Kiribati",
    "Kosovo": "Kosovo",
    "Kuwait": "Kuwait",
    "Kyrgyzstan": "Quirguistão",
    "Laos": "Laos",
    "Latvia": "Letônia",
    "Lebanon": "Líbano",
    "Lesotho": "Lesoto",
    "Liberia": "Libéria",
    "Libya": "Líbia",
    "Liechtenstein": "Liechtenstein",
    "Lithuania": "Lituânia",
    "Luxembourg": "Luxemburgo",
    "Macao": "Macau",
    "Madagascar": "Madagáscar",
    "Malawi": "Malawi",
    "Malaysia": "Malásia",
    "Maldives": "Maldivas",
    "Mali": "Mali",
    "Malta": "Malta",
    "Marshall Islands": "Ilhas Marshall",
    "Martinique": "Martinica",
    "Mauritania": "Mauritânia",
    "Mauritius": "Maurício",
    "Mayotte": "Mayotte",
    "Mexico": "México",
    "Moldova": "Moldávia",
    "Mongolia": "Mongólia",
    "Montenegro": "Montenegro",
    "Montserrat": "Montserrat",
    "Morocco": "Marrocos",
    "Mozambique": "Moçambique",
    "Myanmar": "Mianmar",
    "Namibia": "Namíbia",
    "Nauru": "Nauru",
    "Nepal": "Nepal",
    "Netherlands": "Países Baixos",
    "Netherlands Antilles": "Antilhas Holandesas",
    "New Caledonia": "Nova Caledônia",
    "New Zealand": "Nova Zelândia",
    "Nicaragua": "Nicarágua",
    "Niger": "Níger",
    "Nigeria": "Nigéria",
    "Niue": "Niue",
    "North Korea": "Coreia do Norte",
    "North Macedonia": "Macedônia do Norte",
    "Norway": "Noruega",
    "Oman": "Omã",
    "Pakistan": "Paquistão",
    "Palau": "Palau",
    "Palestine": "Palestina",
    "Panama": "Panamá",
    "Papua New Guinea": "Papua-Nova Guiné",
    "Paraguay": "Paraguai",
    "Peru": "Peru",
    "Philippines": "Filipinas",
    "Poland": "Polônia",
    "Portugal": "Portugal",
    "Puerto Rico": "Porto Rico",
    "Qatar": "Catar",
    "Reunion": "Reunião",
    "Romania": "Romênia",
    "Russia": "Rússia",
    "Rwanda": "Ruanda",
    "Saint Helena": "Santa Helena",
    "Saint Kitts and Nevis": "São Cristóvão e Nevis",
    "Saint Lucia": "Santa Lúcia",
    "Saint Martin (French part)": "São Martinho (parte francesa)",
    "Saint Pierre and Miquelon": "São Pedro e Miquelão",
    "Saint Vincent and the Grenadines": "São Vicente e Granadinas",
    "Samoa": "Samoa",
    "Sao Tome and Principe": "São Tomé e Príncipe",
    "Saudi Arabia": "Arábia Saudita",
    "Senegal": "Senegal",
    "Serbia": "Sérvia",
    "Seychelles": "Seychelles",
    "Sierra Leone": "Serra Leoa",
    "Singapore": "Singapura",
    "Sint Maarten (Dutch part)": "Sint Maarten (parte holandesa)",
    "Slovakia": "Eslováquia",
    "Slovenia": "Eslovênia",
    "Solomon Islands": "Ilhas Salomão",
    "Somalia": "Somália",
    "South Africa": "África do Sul",
    "South Korea": "Coreia do Sul",
    "South Sudan": "Sudão do Sul",
    "Spain": "Espanha",
    "Sri Lanka": "Sri Lanka",
    "Sudan": "Sudão",
    "Suriname": "Suriname",
    "Svalbard and Jan Mayen": "Svalbard e Jan Mayen",
    "Sweden": "Suécia",
    "Switzerland": "Suíça",
    "Syria": "Síria",
    "Taiwan": "Taiwan",
    "Tajikistan": "Tajiquistão",
    "Tanzania": "Tanzânia",
    "Thailand": "Tailândia",
    "Togo": "Togo",
    "Tonga": "Tonga",
    "Trinidad and Tobago": "Trinidad e Tobago",
    "Tunisia": "Tunísia",
    "Turkey": "Turquia",
    "Turkmenistan": "Turcomenistão",
    "Turks and Caicos Islands": "Ilhas Turks e Caicos",
    "Tuvalu": "Tuvalu",
    "Uganda": "Uganda",
    "Ukraine": "Ucrânia",
    "United Arab Emirates": "Emirados Árabes Unidos",
    "United Kingdom": "Reino Unido",
    "United States": "Estados Unidos",
    "United States Virgin Islands": "Ilhas Virgens dos EUA",
    "Uruguay": "Uruguai",
    "Uzbekistan": "Uzbequistão",
    "Vanuatu": "Vanuatu",
    "Venezuela": "Venezuela",
    "Vietnam": "Vietnã",
    "Wallis and Futuna": "Wallis e Futuna",
    "Western Sahara": "Saara Ocidental",
    "Yemen": "Iêmen",
    "Zambia": "Zâmbia",
    "Zimbabwe": "Zimbábue",
}


In [None]:
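# Assign the result back: replace(inplace=True) on a selected column is a chained
# operation that may act on a temporary copy in recent pandas versions
emissoes_traduzidas['pais'] = emissoes_traduzidas['pais'].replace(traducoes)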
emissoes_traduzidas

Unnamed: 0,pais,ano,codigo_iso,populacao,pib,cimento_co2,cimento_co2_per_capita,co2,crescimento_abs_co2,crescimento_pct_co2,...,participacao_global_co2_outros,participacao_mudanca_temperatura_por_gee,mudanca_temperatura_por_ch4,mudanca_temperatura_por_co2,mudanca_temperatura_por_gee,mudanca_temperatura_por_n2o,total_gee,total_gee_excluindo_mutf,co2_comercio,participacao_co2_comercio
0,Afeganistão,1850,AFG,3752993.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.000,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
1,Afeganistão,1851,AFG,3767956.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.165,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
2,Afeganistão,1852,AFG,3783940.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.164,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
3,Afeganistão,1853,AFG,3800954.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.164,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
4,Afeganistão,1854,AFG,3818038.0,0.000000e+00,0.000,0.000,0.000,0.000,0.000,...,0.0,0.163,0.000,0.000,0.000,0.0,0.00,0.00,0.000,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50593,Zimbábue,2017,ZWE,14751101.0,2.194784e+10,0.469,0.032,9.596,-0.937,-8.899,...,0.0,0.114,0.001,0.001,0.002,0.0,115.59,28.30,0.910,9.486
50594,Zimbábue,2018,ZWE,15052191.0,2.271535e+10,0.558,0.037,11.795,2.199,22.920,...,0.0,0.114,0.001,0.001,0.002,0.0,118.22,30.83,0.771,6.537
50595,Zimbábue,2019,ZWE,15354606.0,0.000000e+00,0.570,0.037,11.115,-0.681,-5.772,...,0.0,0.113,0.001,0.001,0.002,0.0,117.96,30.53,0.978,8.795
50596,Zimbábue,2020,ZWE,15669663.0,0.000000e+00,0.570,0.036,10.608,-0.507,-4.559,...,0.0,0.112,0.001,0.001,0.002,0.0,0.00,0.00,1.006,9.481


# 3. Analyzing the Data

Below we use the describe method to build a new DataFrame with the main summary statistics of our data.

In [None]:
descricoes = pd.DataFrame(emissoes_filtradas['population'].describe().round(2))
descricoes['PIB'] = emissoes_filtradas['gdp'].describe().round(2)
descricoes['CO2'] = emissoes_filtradas['co2'].describe().round(2)
descricoes['Crescimento Anual CO2'] = emissoes_filtradas['co2_growth_abs'].describe().round(2)
descricoes['Crescimento Anual CO2 em %'] = emissoes_filtradas['co2_growth_prct'].describe().round(2)
descricoes['CO2 per Capita'] = emissoes_filtradas['co2_per_capita'].describe().round(2)
descricoes['CO2 por PIB'] = emissoes_filtradas['co2_per_gdp'].describe().round(2)
descricoes['CO2 por KWh'] = emissoes_filtradas['co2_per_unit_energy'].describe().round(2)
descricoes['Participacao Global de CO2'] = emissoes_filtradas['share_global_co2'].describe().round(2)
descricoes['Mudanca na Temperatura por Emissoes de GEEs'] = emissoes_filtradas['temperature_change_from_ghg'].describe().round(2)
descricoes['Emissao Total de GEEs'] = emissoes_filtradas['total_ghg'].describe().round(2)
descricoes.rename(columns={'population': 'populacao'}, inplace=True)
descricoes

Unnamed: 0,populacao,PIB,CO2,Crescimento Anual CO2,Crescimento Anual CO2 em %,CO2 per Capita,CO2 por PIB,CO2 por KWh,Participacao Global de CO2,Mudanca na Temperatura por Emissoes de GEEs,Emissao Total de GEEs
count,41970.0,41970.0,41970.0,41970.0,41970.0,41970.0,41970.0,41970.0,41970.0,41970.0,41970.0
mean,12794980.0,78179580000.0,40.36,0.86,11.25,2.12,0.12,0.05,0.64,0.0,27.68
std,65898580.0,562819400000.0,309.28,16.4,534.42,12.07,0.34,0.13,5.19,0.01,287.7
min,0.0,0.0,0.0,-543.45,-100.0,0.0,0.0,0.0,0.0,-0.0,-186.55
25%,80619.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1348150.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,5743231.0,8854880000.0,3.62,0.04,4.13,1.2,0.11,0.0,0.04,0.0,0.0
max,1425894000.0,18151620000000.0,11472.37,911.9,102318.51,824.46,37.61,10.69,100.0,0.28,12055.41
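
The summary table above is assembled one column at a time. As a sketch of a more compact equivalent, the same kind of table can be produced by selecting the columns once, calling `describe()` on the selection and renaming the result. The frame `toy` and the dictionary `rotulos` below are illustrative, with made-up values.

```python
import pandas as pd

# Made-up values standing in for three of the dataset's columns
toy = pd.DataFrame({
    "population": [10.0, 20.0, 30.0, 40.0],
    "gdp": [1.0, 2.0, 3.0, 4.0],
    "co2": [0.5, 1.5, 2.5, 3.5],
})

# Source column -> label shown in the summary table
rotulos = {"population": "populacao", "gdp": "PIB", "co2": "CO2"}

descricoes_toy = (
    toy[list(rotulos)]      # keep only the columns of interest, in order
    .describe()
    .round(2)
    .rename(columns=rotulos)
)
print(descricoes_toy)
```

Adding a statistic for another column then only requires one more entry in the dictionary.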


## Understanding the Correlation between GDP and Emissions
We now compute the correlation between the two variables under study (GDP and GHG emissions) using the Pandas .corr() method, which returns a value between -1 and 1. It measures how strongly, and in which direction, the two variables move together linearly; on its own it does not show that one causes the other.

In [None]:
correlacao = emissoes_traduzidas['pib'].corr(emissoes_traduzidas['total_gee'])
correlacao_porcentagem = correlacao * 100
print(f"The correlation between countries' GDP and their total greenhouse gas emissions is: {correlacao_porcentagem:.2f}%")

The correlation between countries' GDP and their total greenhouse gas emissions is: 84.06%
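
By default, `.corr()` computes the Pearson coefficient: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. As a sketch, the calculation can be reproduced by hand on two short made-up series and compared with the pandas result.

```python
import numpy as np
import pandas as pd

# Made-up series, not taken from the dataset
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 1.0, 4.0, 3.0, 5.0])

# Deviations of each value from its series mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r written out step by step
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_pandas = x.corr(y)

print(round(r_manual, 4), round(r_pandas, 4))
```

Both values come out at 0.8 here, which the notebook's percentage convention would report as 80.00%.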


To get a better view of this correlation, we turn to a chart.

In [None]:
fig = px.scatter(emissoes_traduzidas, x='pib', y='total_gee', trendline='ols', trendline_color_override='darkblue', opacity=0.65)

fig.update_layout(
    title="Correlation between GDP and Total GHG Emissions",
    xaxis_title="GDP (international-$)",
    yaxis=dict(
        title_text="Total GHG emissions<br> (million tonnes of CO₂ equivalent)",
        title_standoff=5
    ))

fig.show()

However, my hypothesis is that countries with larger populations will naturally have both a higher GDP and higher emissions. To test it, we also look at how population correlates with each of these two indicators.

First, population and emissions

In [None]:
correlacao = emissoes_traduzidas['populacao'].corr(emissoes_traduzidas['total_gee'])
correlacao_porcentagem = correlacao * 100
print(f"The correlation between countries' population and their total greenhouse gas emissions is: {correlacao_porcentagem:.2f}%")

The correlation between countries' population and their total greenhouse gas emissions is: 56.48%


Now, population and GDP

In [None]:
correlacao = emissoes_traduzidas['populacao'].corr(emissoes_traduzidas['pib'])
correlacao_porcentagem = correlacao * 100
print(f"The correlation between countries' population and their GDP is: {correlacao_porcentagem:.2f}%")

The correlation between countries' population and their GDP is: 55.00%


In this case the hypothesis is supported: population correlates substantially with both GDP and emissions, which in statistics makes it a **confounding variable**.

That is, it is a variable that influences both the dependent and the independent variable, and it can obscure the true relationship between the two.
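
To make the idea concrete, here is a sketch on synthetic data in which population drives both GDP and emissions, while the two have no direct link. The raw correlation comes out high, but it nearly vanishes once the part explained by population is removed from each variable (a partial correlation, computed here through residuals of a straight-line fit). This is a different technique from the per capita approach, shown only for illustration; all numbers are simulated and `residuos` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: population drives both variables, plus independent noise
populacao = rng.uniform(1, 100, size=500)
pib = 2 * populacao + rng.normal(0, 10, size=500)
emissoes_sint = 3 * populacao + rng.normal(0, 10, size=500)


def residuos(y, x):
    """Hypothetical helper: part of y left over after a straight-line fit on x."""
    inclinacao, intercepto = np.polyfit(x, y, 1)
    return y - (inclinacao * x + intercepto)


# Correlation as measured directly
corr_bruta = np.corrcoef(pib, emissoes_sint)[0, 1]

# Correlation after removing what population explains in each variable
corr_parcial = np.corrcoef(
    residuos(pib, populacao), residuos(emissoes_sint, populacao)
)[0, 1]

print(f"raw: {corr_bruta:.2f}, controlling for population: {corr_parcial:.2f}")
```

In the real dataset GDP and emissions may well be linked beyond population, so the adjusted value is not expected to drop to zero there.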

### Resolving the Confounding: Analysis with GDP and Emissions per Capita

To account for the influence of population on the result, we use GDP per capita and emissions per capita. The emissions per capita variable already exists; the GDP one has to be built from the existing variables.

Creating the GDP per capita variable

In [None]:
emissoes_traduzidas['pib_per_capita'] = emissoes_traduzidas['pib'] / emissoes_traduzidas['populacao']
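
Since missing values were replaced by zero earlier, some rows have a population of 0, and dividing by it yields NaN (for 0/0) or infinity (for a positive GDP). A sketch of a guarded division is shown below on a made-up frame named `toy`; it treats zero population as missing before dividing.

```python
import numpy as np
import pandas as pd

# Made-up rows covering the three cases: valid, 0/0 and x/0
toy = pd.DataFrame({
    "pib": [100.0, 0.0, 50.0],
    "populacao": [4.0, 0.0, 0.0],
})

# Treat a population of zero as "unknown" rather than a real value
populacao_valida = toy["populacao"].replace(0, np.nan)
toy["pib_per_capita"] = toy["pib"] / populacao_valida

print(toy)
```

Rows with unknown population then carry NaN in `pib_per_capita`, which `.corr()` skips and which a later `.dropna()` can remove explicitly.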

Now we can redo the correlation analyses above

In [None]:
# Temporary df that leaves out rows where per capita GDP or per capita emissions is 0
filtered_df = emissoes_traduzidas[(emissoes_traduzidas['pib_per_capita'] != 0) & (emissoes_traduzidas['gee_per_capita'] != 0)]

# Calculando a correlação
correlacao = filtered_df['pib_per_capita'].corr(filtered_df['gee_per_capita'])
correlacao_porcentagem = correlacao * 100
print(f"The correlation between countries' GDP per capita and their greenhouse gas emissions per capita is: {correlacao_porcentagem:.2f}%")

The correlation between countries' GDP per capita and their greenhouse gas emissions per capita is: 59.91%


In [None]:
fig = px.scatter(filtered_df, x='pib_per_capita', y='gee_per_capita', trendline='ols', trendline_color_override='darkblue', opacity=0.65)

fig.update_layout(
    title="Correlation between GDP per capita and GHG Emissions per capita",
    xaxis_title="GDP per capita (international-$ per person)",
    yaxis=dict(
        title_text="GHG emissions per capita<br> (tonnes of CO₂ equivalent per person)",
        title_standoff=5
    ))

fig.show()

With the correlation recomputed on per capita values, we still see a fairly strong correlation, 59.9%, but a considerably weaker and less linear one than with the totals, which reached 84%.
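
Because the Pearson coefficient only captures linear association, a relationship that always rises but follows a curve scores below 1. One way to check for this, sketched below on made-up values, is to correlate the ranks instead of the raw numbers, which gives the Spearman coefficient. This is an optional complement and not part of the original analysis.

```python
import pandas as pd

# Made-up example: y always grows with x, but along a curve
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

pearson = x.corr(y)                 # penalized by the curvature
spearman = x.rank().corr(y.rank())  # Pearson on the ranks = Spearman

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

Applying the same two lines to `filtered_df['pib_per_capita']` and `filtered_df['gee_per_capita']` would indicate how much of the drop from 84% reflects non-linearity and how much reflects a weaker association.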

## Exporting the Data and Continuing the Analysis in Tableau
To better visualize and understand each country and continent individually, as well as the overall picture, I decided to export the data and build a dashboard in Tableau.

It is available here: https://public.tableau.com/app/profile/gustavo.baida/viz/EntendendoaRelaoEmissesvsPIB/Dashboard1




In [None]:
filtered_df.to_excel('/content/drive/MyDrive/Colab Notebooks/Análise Emissões CO2 por País/dataset_emissoes_tratado.xlsx', index=False)