# Challenge - Global warming data analysis

![global warming](https://th.bing.com/th/id/R.59a9ab47b7b9a1e50fb75b124c9b3c9f?rik=qBHpxbnGeuMNUA&pid=ImgRaw&r=0)

### Description

It's now time to look at our first dataset, including visualisation and data cleaning. 

In this case study, we will analyze global land temperature data by country, with the goal of finding any underlying relationships between the change in temperature and geographical location.

In addition, we should analyze the dataset as a whole: extracting statistical parameters, preprocessing the data and doing a bit of visualisation.

### Data

For this task, we will be using the [Climate Change: Earth Surface Temperature dataset](https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data?select=GlobalLandTemperaturesByCountry.csv), which combines 1.6 billion temperature reports from 16 pre-existing archives (starting from the 1750s).


### Tasks

1. Preprocess and statistically describe the data
2. Find and visualize the 20 countries with the highest mean temperature
3. Which countries had the largest change in temperature?
4. What is the overall tendency?

### Import data

In [None]:
# pandas loads the CSV; the kaggle package provides the API client behind the `kaggle` command used below
import pandas as pd
import kaggle

In [None]:
# Download the dataset from Kaggle
!kaggle datasets download -d berkeleyearth/climate-change-earth-surface-temperature-data


In [None]:
# Unzip the downloaded dataset
!unzip climate-change-earth-surface-temperature-data.zip

In [None]:
# Load the dataset into a Pandas DataFrame
df = pd.read_csv('GlobalLandTemperaturesByCountry.csv')

# Display the first few rows of the dataset
df.head()

### Reading and describing data

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
# Read the CSV that was unzipped into the working directory above
path = "GlobalLandTemperaturesByCountry.csv"
data = pd.read_csv(path)

# Take a first look at what the data consists of
data.head()

In [None]:
# Concise summary: column types, non-null counts and memory usage
data.info()

In [None]:
# Count the distinct countries in the dataset
num_countries = data['Country'].nunique()
print(num_countries)
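Task 1 asks for a statistical description, which `data.info()` alone does not provide. A minimal sketch, assuming the column names shown by `data.head()` (`dt`, `AverageTemperature`, `AverageTemperatureUncertainty`, `Country`); the helper `summarize_temperatures` and the toy frame are hypothetical and only make the example self-contained:

```python
import pandas as pd

def summarize_temperatures(df: pd.DataFrame) -> dict:
    """Basic descriptive statistics for a temperature-by-country frame."""
    # Parse the date strings so min/max compare real dates
    dates = pd.to_datetime(df["dt"])
    return {
        # count, mean, std, min, quartiles and max of each numeric column
        "describe": df.describe(),
        "n_countries": df["Country"].nunique(),
        "first_date": dates.min(),
        "last_date": dates.max(),
        # fraction of missing values per column, between 0 and 1
        "missing_share": df.isna().mean(),
    }

# Hypothetical toy frame with the same columns as the real CSV
toy = pd.DataFrame({
    "dt": ["1900-01-01", "1900-02-01", "1900-01-01", "1900-02-01"],
    "AverageTemperature": [1.0, 3.0, 25.0, None],
    "AverageTemperatureUncertainty": [0.5, 0.4, 0.3, None],
    "Country": ["A", "A", "B", "B"],
})
summary = summarize_temperatures(toy)
print(summary["describe"])
```

On the real data, calling `summarize_temperatures(data)` before any rows are dropped also shows how much of each column is missing.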

### Preprocessing

As can already be seen at first glance, the data contains many NaN values, which cannot be used in further analysis. We therefore remove them from the dataset.

In [None]:
# Check the number of missing values
print(data.isna().sum())

In [None]:
# Drop every row that contains at least one missing value
data.dropna(inplace=True)

In [None]:
# Check that no missing values remain
print(data.isna().sum())
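Called without arguments, `dropna()` removes a row if *any* column is missing, including `AverageTemperatureUncertainty`. A minimal sketch for seeing how much each country loses, and for dropping only rows whose temperature itself is missing via the `subset` parameter; the helper `missing_by_country` and the toy frame are hypothetical:

```python
import pandas as pd

def missing_by_country(df: pd.DataFrame) -> pd.Series:
    """Fraction of each country's rows with a missing AverageTemperature, highest first."""
    return (
        df["AverageTemperature"].isna().astype(float)
        .groupby(df["Country"])
        .mean()
        .sort_values(ascending=False)
    )

# Hypothetical toy frame
toy = pd.DataFrame({
    "AverageTemperature": [1.0, 2.0, None, 25.0, None],
    "AverageTemperatureUncertainty": [0.5, 0.4, None, None, 0.3],
    "Country": ["A", "A", "B", "B", "B"],
})
missing = missing_by_country(toy)
print(missing)

# Drop only rows whose temperature is missing; the uncertainty may still be NaN
cleaned = toy.dropna(subset=["AverageTemperature"])
```

Here `cleaned` keeps three rows, whereas a plain `toy.dropna()` keeps two, because one remaining row has a temperature but no uncertainty value.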

### Visualizing data

Before completing the intended analysis, it may also be useful to explore the data further.

In [None]:
# Find the 20 countries with the highest mean temperature
# numeric_only=True skips the non-numeric 'dt' column, which pandas 2.x refuses to average
temperature = data.groupby('Country').mean(numeric_only=True)
hottest = temperature.sort_values('AverageTemperature', ascending=False)[:20]
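An equivalent way to get the top 20 is to average only the temperature column and use `Series.nlargest`, which also avoids touching non-numeric columns such as `dt`. A minimal sketch; the helper `hottest_countries` and the toy data are hypothetical:

```python
import pandas as pd

def hottest_countries(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Mean AverageTemperature per country, keeping only the n highest."""
    # Selecting the column first means only numeric values are averaged
    return df.groupby("Country")["AverageTemperature"].mean().nlargest(n)

# Hypothetical toy data
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "AverageTemperature": [10.0, 20.0, 30.0, 28.0, 5.0],
})
top = hottest_countries(toy, n=2)
print(top)
```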

In [None]:
# Bar chart of the mean temperature of the 20 hottest countries
x = hottest.index
y = hottest['AverageTemperature']
plt.bar(x, y)
# Dashed red reference line: mean of all country means
plt.axhline(y=temperature['AverageTemperature'].mean(), color='r', linestyle='--', label='Mean over all countries')
plt.yticks(np.arange(0, 35, 2))
plt.xticks(rotation=90, fontsize=8)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Mean temperature (°C)', fontsize=12)
plt.title('Countries with the highest average temperature')
plt.legend()
plt.show()

### Largest Temperature Change

After preprocessing and describing the data, we can now find the countries that underwent the largest temperature change. There are numerous ways to complete this task; the guided approach below is only a suggestion.

In [None]:
# Extracting the names of countries from the dataset
countries = data['Country'].unique()

In [None]:
# Compute each country's temperature change with a loop
Temp_Data = {}

for country in countries:
    # Monthly temperatures of this country (assumes rows are in chronological order within each country)
    temp = data.loc[data['Country'] == country, 'AverageTemperature']
    # Change = last recorded value minus first recorded value
    Temp_Data[country] = round(temp.iloc[-1] - temp.iloc[0], 2)
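The loop above compares a single monthly reading at each end of the record. Because the first and last readings need not fall in the same calendar month, part of the resulting "change" can be seasonal rather than long-term. Sorting in descending order also ranks warming only, so large cooling would not appear. A minimal sketch of one alternative (not the approach used above): compare mean temperatures over the first and last `window` complete years of each country, then rank by magnitude. The helper `window_change`, the default `window=10`, and the rule that a year counts only with all 12 months present are assumptions chosen for illustration:

```python
import pandas as pd

def window_change(df: pd.DataFrame, window: int = 10) -> pd.Series:
    """Per country: mean of the last `window` complete years minus mean of the first `window`."""
    df = df.assign(year=pd.to_datetime(df["dt"]).dt.year)
    # Keep only years with 12 monthly values, so each annual mean covers every season
    counts = df.groupby(["Country", "year"])["AverageTemperature"].transform("count")
    annual = df[counts == 12].groupby(["Country", "year"])["AverageTemperature"].mean()

    def change(series: pd.Series) -> float:
        # series holds one country's annual means, sorted by year
        if len(series) < 2 * window:
            return float("nan")  # record too short for two separate windows
        return series.iloc[-window:].mean() - series.iloc[:window].mean()

    return annual.groupby(level="Country").apply(change).dropna()

# Hypothetical toy data: A warms by 1 °C, B cools by 3 °C, C has only one year
dates = pd.date_range("2000-01-01", periods=24, freq="MS").strftime("%Y-%m-%d")
toy = pd.DataFrame({
    "dt": list(dates) * 2 + list(dates[:12]),
    "Country": ["A"] * 24 + ["B"] * 24 + ["C"] * 12,
    "AverageTemperature": [10.0] * 12 + [11.0] * 12 + [20.0] * 12 + [17.0] * 12 + [5.0] * 12,
})
changes = window_change(toy, window=1)
# Rank by absolute change so that large cooling is listed as well
largest = changes.sort_values(key=lambda s: s.abs(), ascending=False)
print(largest)
```

On the real data, `window_change(data).sort_values(key=lambda s: s.abs(), ascending=False)[:20]` would play the role of `highest_change`.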

In [None]:
# Converting to dataframe and sorting in descending order
temp_change = pd.DataFrame(Temp_Data, index = [0]).T
highest_change = temp_change.sort_values([0], ascending = False)[:20]

In [None]:
# Generating bar plot
y = highest_change.values[:, 0]
x = highest_change.index
plt.bar(x, y)
plt.xticks(rotation = 90)
plt.xlabel('Country', fontsize = 12)
plt.ylabel('Temperature Change', fontsize = 12)
plt.title('Countries with the largest temperature change', fontsize = 13)
plt.show()

### Overall Tendencies

There are a few ways to determine the overall tendency in global surface temperature: we could analyze the temperature change per country, or we could observe the change in the average global temperature. Let's look at both.

In [None]:
# Plot the bar graph for all countries

highest_change = temp_change.sort_values([0], ascending = False)
y = highest_change.values[:, 0]
x = highest_change.index
plt.bar(x, y)
plt.xticks(visible = False)
plt.ylabel('Temperature Change', fontsize = 12)
plt.title('Temperature Change Distribution', fontsize = 13)
plt.show()
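A sorted bar chart with hidden labels mainly conveys the shape of the distribution; a histogram plus a couple of summary numbers shows the same thing more directly. A minimal sketch, assuming a Series of per-country changes such as `temp_change[0]`; the helper `summarize_changes` and the toy values are hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd

def summarize_changes(changes: pd.Series, bins: int = 30):
    """Share of countries that warmed, median change, and a histogram of all changes."""
    clean = changes.dropna()
    fig, ax = plt.subplots()
    ax.hist(clean, bins=bins)
    # Dashed line at zero separates cooling (left) from warming (right)
    ax.axvline(0, color="r", linestyle="--")
    ax.set_xlabel("Temperature change (°C)")
    ax.set_ylabel("Number of countries")
    ax.set_title("Temperature Change Distribution")
    share_warmer = float((clean > 0).mean())
    return share_warmer, float(clean.median()), fig

# Hypothetical per-country changes, standing in for temp_change[0]
toy_changes = pd.Series([1.2, -0.3, 0.8, 2.0], index=["A", "B", "C", "D"])
share, median, fig = summarize_changes(toy_changes, bins=5)
print(f"{share:.0%} of countries warmed; median change {median:.2f} °C")
```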

In [None]:
# Average the temperature over all reporting countries for each month
# (selecting the column first avoids averaging non-numeric columns)
average_world_temp = data.groupby('dt')['AverageTemperature'].mean()

# Plot the monthly averages against their dates
dates = pd.to_datetime(average_world_temp.index)
plt.plot(dates, average_world_temp.values)
plt.xlabel('Year')
plt.ylabel('Average global temperature (°C)')
plt.title('Average Global Temperature Historical Data')
plt.show()
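Grouping by `dt` averages whichever countries report in a given month, and that set changes over the record, so shifts in the curve can partly reflect which countries are included rather than warming; the monthly values also carry the seasonal cycle. One common way to reduce both effects (a swapped-in technique, not the one used above) is to average temperature *anomalies*: each value minus that country's mean for the same calendar month. A minimal sketch; the helper `global_anomaly_trend`, the default `start_year=1900`, and the use of a straight-line fit via `np.polyfit` are assumptions for illustration, and the anomalies reduce but do not fully remove the coverage effect:

```python
import numpy as np
import pandas as pd

def global_anomaly_trend(df: pd.DataFrame, start_year: int = 1900):
    """Annual mean anomaly across countries and its linear trend in °C per century."""
    d = df.assign(date=pd.to_datetime(df["dt"]))
    d = d[d["date"].dt.year >= start_year]
    d = d.assign(year=d["date"].dt.year, month=d["date"].dt.month)
    # Baseline: each country's mean temperature for each calendar month over the period
    baseline = d.groupby(["Country", "month"])["AverageTemperature"].transform("mean")
    # Anomalies remove the seasonal cycle and each country's typical climate
    d = d.assign(anomaly=d["AverageTemperature"] - baseline)
    annual = d.groupby("year")["anomaly"].mean()
    # Least-squares straight line through the annual anomalies; slope is in °C per year
    slope, _ = np.polyfit(annual.index.to_numpy(dtype=float), annual.to_numpy(), deg=1)
    return annual, slope * 100

# Hypothetical toy data: two countries with different climates, both 1 °C warmer in 2001
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
base = [float(ts.month + (ts.year - 2000)) for ts in dates]
toy = pd.DataFrame({
    "dt": list(dates.strftime("%Y-%m-%d")) * 2,
    "Country": ["A"] * 24 + ["B"] * 24,
    "AverageTemperature": base + [t + 20.0 for t in base],
})
annual, per_century = global_anomaly_trend(toy, start_year=2000)
print(annual)
print(f"Trend: {per_century:.1f} °C per century")
```

For plotting the real data, `annual.rolling(10, center=True).mean()` gives a smoothed curve on top of the noisy annual values.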