<a href="https://colab.research.google.com/github/gabrielborja/python_data_analysis/blob/main/sustainability_analytics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Global Climate Change Analysis

Global Climate Change Data from 1750-2015 can be found [here](https://data.world/data-society/global-climate-change-data)

## Importing packages and uploading data

In [1]:
#Importing necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

In [None]:
#Remove any previous copy of the uploaded file (-f avoids an error if none exists)
!rm -f GlobalLandTemperaturesByCountry.csv

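The shell command above only runs inside a notebook. A plain-Python sketch of the same cleanup, assuming the file sits in the current working directory, uses `pathlib.Path.unlink(missing_ok=True)` (Python 3.8+), which does nothing when the file is absent. Clearing the old copy matters because, if a file with the same name already exists, Colab may save a new upload under a modified name, and the dictionary key used in the next cells would no longer match. The helper name `remove_previous_upload` is hypothetical.

```python
from pathlib import Path

def remove_previous_upload(filename: str) -> None:
    """Delete an earlier copy of an uploaded file, if one exists (hypothetical helper)."""
    # missing_ok=True (Python 3.8+) suppresses FileNotFoundError,
    # mirroring `rm -f` rather than plain `rm`
    Path(filename).unlink(missing_ok=True)

remove_previous_upload('GlobalLandTemperaturesByCountry.csv')
```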
In [2]:
#Uploading file from local drive
from google.colab import files
uploaded1 = files.upload()

Saving GlobalLandTemperaturesByCountry.csv to GlobalLandTemperaturesByCountry.csv


In [11]:
#Storing the dataset in a pandas DataFrame
import io
df1_co = pd.read_csv(io.BytesIO(uploaded1['GlobalLandTemperaturesByCountry.csv']))

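`files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, which is why the contents are wrapped in `io.BytesIO` before being passed to `pd.read_csv`. The sketch below reproduces that step without Colab; the `fake_upload` dictionary and its two rows are made up for illustration. Passing `parse_dates=['dt']` is an optional variant that converts the date column while reading instead of in a separate step.

```python
import io
import pandas as pd

# Hypothetical stand-in for the dict returned by google.colab.files.upload():
# keys are file names, values are the raw file contents as bytes
fake_upload = {
    'GlobalLandTemperaturesByCountry.csv': (
        b'dt,AverageTemperature,AverageTemperatureUncertainty,Country\n'
        b'1743-11-01,4.384,2.294,\xc3\x85land\n'  # \xc3\x85 is UTF-8 for the letter Å
        b'1744-04-01,1.53,4.68,\xc3\x85land\n'
    )
}

# Wrap the bytes in a file-like object so read_csv can consume them;
# parse_dates converts 'dt' from strings to datetime64 while reading
df = pd.read_csv(io.BytesIO(fake_upload['GlobalLandTemperaturesByCountry.csv']),
                 parse_dates=['dt'])
print(df.dtypes)
```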
In [12]:
#Checking the dataframe information
df1_co.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 577462 entries, 0 to 577461
Data columns (total 4 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   dt                             577462 non-null  object 
 1   AverageTemperature             544811 non-null  float64
 2   AverageTemperatureUncertainty  545550 non-null  float64
 3   Country                        577462 non-null  object 
dtypes: float64(2), object(2)
memory usage: 17.6+ MB


## Data Cleaning

In [13]:
#Checking for missing values in the dataframe
df1_co.isna().sum()

dt                                   0
AverageTemperature               32651
AverageTemperatureUncertainty    31912
Country                              0
dtype: int64

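Raw counts are easier to judge as a share of all rows: 32,651 missing temperatures out of 577,462 rows is roughly 5.65%, and 31,912 missing uncertainties is roughly 5.53%. A small sketch computing the percentage per column with `isna().mean()`, shown on a made-up three-row frame:

```python
import numpy as np
import pandas as pd

# Made-up sample: one row out of three lacks both measurements
sample = pd.DataFrame({
    'AverageTemperature': [4.384, np.nan, 6.702],
    'AverageTemperatureUncertainty': [2.294, np.nan, 1.789],
    'Country': ['Åland', 'Åland', 'Åland'],
})

# isna() yields booleans; their column mean is the fraction of missing values
missing_pct = sample.isna().mean().mul(100).round(2)
print(missing_pct)
```

On the full dataframe, the same expression applied to `df1_co` gives the per-column percentages directly.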
In [14]:
#Removing rows with a missing AverageTemperature
df1_co.dropna(axis=0, how='any', subset=['AverageTemperature'], inplace=True)
df1_co.isna().sum()

dt                               0
AverageTemperature               0
AverageTemperatureUncertainty    0
Country                          0
dtype: int64

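After dropping rows based on `AverageTemperature` alone, the uncertainty column also reports zero missing values. That can only happen if every row lacking an uncertainty value also lacks a temperature. A sketch of a direct check, on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up sample: two missing temperatures, one missing uncertainty
sample = pd.DataFrame({
    'AverageTemperature': [4.384, np.nan, np.nan, 11.609],
    'AverageTemperatureUncertainty': [2.294, np.nan, 1.2, 1.577],
})

temp_missing = sample['AverageTemperature'].isna()
unc_missing = sample['AverageTemperatureUncertainty'].isna()

# Rows where the uncertainty is missing but the temperature is present;
# zero means the uncertainty gaps are a subset of the temperature gaps
orphan_gaps = int((unc_missing & ~temp_missing).sum())
print(orphan_gaps)
```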
In [15]:
#Parse date column to datetime object and reset index
df1_co['dt'] = pd.to_datetime(df1_co['dt'], format='%Y-%m-%d', errors='coerce') #==> dt holds date-only strings such as '1743-11-01'
df1_co.reset_index(drop=True, inplace=True)
df1_co.head()

|   | dt | AverageTemperature | AverageTemperatureUncertainty | Country |
|---|---|---|---|---|
| 0 | 1743-11-01 | 4.384 | 2.294 | Åland |
| 1 | 1744-04-01 | 1.53 | 4.68 | Åland |
| 2 | 1744-05-01 | 6.702 | 1.789 | Åland |
| 3 | 1744-06-01 | 11.609 | 1.577 | Åland |
| 4 | 1744-07-01 | 15.342 | 1.41 | Åland |

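The `dt` strings in this file hold only a date (for example `1743-11-01`), so a date-only format string matches them exactly. Because `errors='coerce'` silently turns anything that fails to parse into `NaT`, counting `NaT` values afterwards is a cheap sanity check. A sketch with made-up strings, one of them deliberately malformed:

```python
import pandas as pd

raw = pd.Series(['1743-11-01', '1744-04-01', 'not a date'])

# Date-only format; unparseable entries become NaT instead of raising
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# If this count is close to the column length, the format string is probably wrong
n_failed = int(parsed.isna().sum())
print(n_failed)
```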

In [16]:
#Checking the number of unique values in Country (these include regions such as 'Africa' and 'Asia')
df1_co['Country'].nunique()

242

In [18]:
#Listing unique country names to spot duplicated variants (e.g. 'Denmark' and 'Denmark (Europe)')
df1_co['Country'].unique()

array(['Åland', 'Afghanistan', 'Africa', 'Albania', 'Algeria',
       'American Samoa', 'Andorra', 'Angola', 'Anguilla',
       'Antigua And Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Asia',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Baker Island', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bhutan', 'Bolivia',
       'Bonaire, Saint Eustatius And Saba', 'Bosnia And Herzegovina',
       'Botswana', 'Brazil', 'British Virgin Islands', 'Bulgaria',
       'Burkina Faso', 'Burma', 'Burundi', "Côte D'Ivoire", 'Cambodia',
       'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands',
       'Central African Republic', 'Chad', 'Chile', 'China',
       'Christmas Island', 'Colombia', 'Comoros',
       'Congo (Democratic Republic Of The)', 'Congo', 'Costa Rica',
       'Croatia', 'Cuba', 'Curaçao', 'Cyprus', 'Czech Republic',
       'Denmark (Europe)', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt'

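Rather than scanning the full array by eye, names that differ only by a trailing parenthetical qualifier can be flagged automatically. This is a sketch on a hand-picked subset of names from the list above. Its output is only a list of candidates to review, since a shared base name does not always mean the same place: 'Congo (Democratic Republic Of The)' and 'Congo' are two different countries.

```python
import pandas as pd

# Hand-picked subset of the Country column
names = pd.Series(['Denmark', 'Denmark (Europe)', 'France', 'France (Europe)',
                   'Congo', 'Congo (Democratic Republic Of The)', 'Chad'])

# Strip a trailing parenthetical qualifier such as ' (Europe)'
base = names.str.replace(r'\s*\(.*\)$', '', regex=True)

# Keep names that changed and whose stripped form also appears as a name
is_variant = (base != names) & base.isin(names)
candidates = dict(zip(names[is_variant], base[is_variant]))
print(candidates)
```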
In [19]:
#Harmonizing country names that appear under two variants, e.g. 'Denmark (Europe)' and 'Denmark'
#'Congo (Democratic Republic Of The)' is not mapped: it is a different country from 'Congo' (Republic of the Congo)
countries_dict = {'Denmark (Europe)': 'Denmark', 'France (Europe)': 'France',
                  'Netherlands (Europe)': 'Netherlands', 'United Kingdom (Europe)': 'United Kingdom'}

df1_co['Country'] = df1_co['Country'].replace(to_replace=countries_dict)

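Mapping two name variants onto one label can leave two rows for the same country and month whenever both variants have data for that month. A sketch of a check with `DataFrame.duplicated`, on made-up values. If duplicates show up in the real data, the analysis needs an explicit rule for them, such as keeping only one variant; which rule fits depends on how the two variants are defined in the source data.

```python
import pandas as pd

# Made-up rows: both name variants carry a value for January 1850
sample = pd.DataFrame({
    'dt': pd.to_datetime(['1850-01-01', '1850-01-01', '1850-02-01']),
    'AverageTemperature': [0.5, 0.9, 1.1],
    'Country': ['Denmark (Europe)', 'Denmark', 'Denmark (Europe)'],
})

# Harmonize the names in the same way as the cell above
sample['Country'] = sample['Country'].replace({'Denmark (Europe)': 'Denmark'})

# keep=False flags every row of a duplicated (Country, dt) group, not just the repeats
dupes = sample[sample.duplicated(subset=['Country', 'dt'], keep=False)]
print(dupes)
```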
In [20]:
#Exporting to Excel and downloading the file to the local disk
from google.colab import files
df1_co.to_excel('global_land_temp_by_country.xlsx', index=False) #==> Excluding index from file
files.download('global_land_temp_by_country.xlsx')


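A single Excel worksheet holds at most 1,048,576 rows, so `to_excel` suits this table (544,811 rows after cleaning) but would not suit a much larger one. A hedged sketch of a small guard that falls back to CSV, which has no such limit and needs no Excel writer library; the helper name `export_table` and its `max_rows` parameter are hypothetical.

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit, including the header row

def export_table(df: pd.DataFrame, stem: str, max_rows: int = EXCEL_MAX_ROWS) -> str:
    """Write df to .xlsx if it fits in one sheet, otherwise to .csv; return the path."""
    # +1 accounts for the header row written above the data
    if len(df) + 1 <= max_rows:
        path = f'{stem}.xlsx'
        df.to_excel(path, index=False)  # requires an Excel writer such as openpyxl
    else:
        path = f'{stem}.csv'
        df.to_csv(path, index=False)
    return path
```

In the notebook, `export_table(df1_co, 'global_land_temp_by_country')` would return the path to pass to `files.download`.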
## Data Manipulation