## Sustainable Development: A Data-Driven Analysis

In this project, my goal is to put into practice what I learned in Alura's "Python para Data Science" course track. To that end, I carry out an exploratory data analysis (EDA) on a dataset obtained from Kaggle about the Sustainable Development Goals (SDGs).

Importing the dataset:

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('dados/sdg_index_2000-2022.csv')

Viewing the first rows of the DataFrame:

In [3]:
df.head()

Unnamed: 0,country_code,country,year,sdg_index_score,goal_1_score,goal_2_score,goal_3_score,goal_4_score,goal_5_score,goal_6_score,...,goal_8_score,goal_9_score,goal_10_score,goal_11_score,goal_12_score,goal_13_score,goal_14_score,goal_15_score,goal_16_score,goal_17_score
0,AFG,Afghanistan,2000,36.0,28.8,27.3,19.2,1.6,20.8,32.4,...,38.5,5.2,0.0,25.8,94.7,99.4,0.0,51.9,39.2,34.2
1,AFG,Afghanistan,2001,36.3,28.8,30.6,19.4,1.6,20.8,32.4,...,38.5,5.2,0.0,25.8,94.5,99.4,0.0,51.9,39.2,34.2
2,AFG,Afghanistan,2002,36.3,28.8,30.7,19.7,1.6,20.8,32.7,...,38.4,5.2,0.0,26.1,94.1,99.4,0.0,51.8,39.2,34.2
3,AFG,Afghanistan,2003,36.7,28.8,32.5,19.9,1.6,20.8,33.0,...,38.4,5.2,0.0,26.5,94.4,99.4,0.0,51.8,39.2,34.2
4,AFG,Afghanistan,2004,37.1,28.8,32.1,21.1,1.6,20.8,33.3,...,38.5,5.2,0.0,26.8,94.8,99.4,0.0,51.8,39.2,34.2


Extracting the non-null counts, column names, and data types:

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4140 entries, 0 to 4139
Data columns (total 21 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   country_code     4140 non-null   object 
 1   country          4140 non-null   object 
 2   year             4140 non-null   int64  
 3   sdg_index_score  4140 non-null   float64
 4   goal_1_score     4140 non-null   float64
 5   goal_2_score     4140 non-null   float64
 6   goal_3_score     4140 non-null   float64
 7   goal_4_score     4140 non-null   float64
 8   goal_5_score     4140 non-null   float64
 9   goal_6_score     4140 non-null   float64
 10  goal_7_score     4140 non-null   float64
 11  goal_8_score     4140 non-null   float64
 12  goal_9_score     4140 non-null   float64
 13  goal_10_score    4140 non-null   float64
 14  goal_11_score    4140 non-null   float64
 15  goal_12_score    4140 non-null   float64
 16  goal_13_score    4140 non-null   float64
 17  goal_14_score 

Confirming that there really are no null values:

In [5]:
df.isna().sum().sum()

0
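When that total is not zero, a per-column breakdown shows where the gaps are. The following is a minimal sketch on a toy DataFrame; the name `toy` and all of its values are invented for illustration and are not taken from the SDG dataset.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not from the SDG dataset)
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "goal_1_score": [10.0, np.nan, 30.0],
    "goal_2_score": [np.nan, np.nan, 5.0],
})

# Null count for each column
nulls_per_column = toy.isna().sum()

# Grand total across all columns, the same quantity computed in the cell above
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
print(total_nulls)
```

The first `sum()` collapses each column into its null count, and the second adds those counts together, which is why chaining `.sum().sum()` gives a single number.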

Checking for duplicate rows:

In [6]:
df.duplicated().sum()

0
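Note that `duplicated()` with no arguments only flags rows that are identical in every column. For panel data like this, it may also be worth checking whether any country and year pair appears more than once. The sketch below assumes `country_code` and `year` together should identify a row; that is my assumption, not something the dataset documentation states here, and the toy values are invented.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the SDG dataset)
toy = pd.DataFrame({
    "country_code": ["AAA", "AAA", "BBB", "AAA"],
    "year": [2000, 2001, 2000, 2000],
    "sdg_index_score": [50.0, 51.0, 60.0, 50.5],
})

# Rows that are identical across every column
full_row_dups = int(toy.duplicated().sum())

# Rows that repeat an earlier (country_code, year) pair, even if scores differ
key_dups = int(toy.duplicated(subset=["country_code", "year"]).sum())

print(full_row_dups, key_dups)
```

In the toy data the last row repeats the key `("AAA", 2000)` with a different score, so the full-row check misses it while the key-based check catches it.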

Checking which countries are present in our samples:

In [7]:
df['country'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas, The',
       'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'Brunei Darussalam', 'Bulgaria',
       'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus',
       'Czechia', 'Denmark', 'Djibouti', 'Dominican Republic',
       'East and South Asia', 'Eastern Europe and Central Asia',
       'Ecuador', 'Egypt, Arab Rep.', 'El Salvador', 'Estonia',
       'Eswatini', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon',
       'Gambia, The', 'Georgia', 'Germany', 'Ghana', 'Greece',
       'Guatemala', 'Guinea', 'Guyana', 'Haiti', 'High-income
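
Alongside `unique()`, counting the distinct entries and the number of rows for each one can help spot entries that are not individual countries or that have an unexpected number of years. This is a minimal sketch on invented toy data, not the real dataset.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the SDG dataset)
toy = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Chile", "World", "World", "World"],
    "year": [2000, 2001, 2000, 2000, 2001, 2002],
})

# Number of distinct entries in the country column
n_entities = toy["country"].nunique()

# Number of rows (here, years) recorded for each entry
rows_per_entity = toy["country"].value_counts()

print(n_entities)
print(rows_per_entity)
```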

Selecting the samples we want to remove from the DataFrame (regional, income-group, and world aggregates rather than individual countries):

In [14]:
# Aggregate entries (regions, income groups, world) that are not individual countries
aggregates = [
    "East and South Asia", "Eastern Europe and Central Asia",
    "High-income Countries", "Latin America and the Caribbean",
    "Lower & Lower-middle Income", "Lower-middle-income Countries",
    "Low-income Countries", "Middle East and North Africa",
    "Oceania", "OECD members", "Small Island Developing States",
    "Sub-Saharan Africa", "Upper-middle-income Countries", "World",
]
# Boolean mask: True for every row whose country is one of the aggregates
amostras_del = df['country'].isin(aggregates)

Filtering our dataset to keep only the desired samples:

In [15]:
df = df.loc[~amostras_del]
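
After a filter like this, the DataFrame keeps its original row labels, so the index has gaps where rows were dropped. The sketch below shows one way to confirm that the unwanted entries are gone and, optionally, to renumber the index. The toy data and the names `toy`, `unwanted`, and `countries_only` are hypothetical, and resetting the index is my suggestion rather than a step from the original notebook.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the SDG dataset)
toy = pd.DataFrame({
    "country": ["Brazil", "World", "Chile", "Oceania"],
    "sdg_index_score": [70.0, 65.0, 75.0, 60.0],
})

# Entries to drop, and a boolean mask marking their rows
unwanted = ["World", "Oceania"]
mask = toy["country"].isin(unwanted)

# Keep the rows where the mask is False, then renumber the index from 0
countries_only = toy.loc[~mask].reset_index(drop=True)

# Sanity check: none of the unwanted entries remain
remaining = int(countries_only["country"].isin(unwanted).sum())

print(countries_only)
print(remaining)
```

Without `reset_index(drop=True)`, the surviving rows would keep the labels 0 and 2, which is harmless for label-based access but can be surprising when code assumes consecutive row numbers.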