# **Overview**
Earth's climate is changing. The more greenhouse gases we emit, the larger future climate changes will be. ... Human activities such as driving, manufacturing, electricity generation, and the clearing of forests contribute to greenhouse gas emissions and warm the planet. According to NOAA's 2020 Annual Climate Report, the combined land and ocean temperature has increased at an average rate of 0.08 degrees Celsius per decade since 1880. However, the average rate of increase since 1981 (0.18°C / 0.32°F per decade) has been more than twice that rate.

Let's study how the environmental factors in the provided dataset have changed over the years. The dataset corresponds to Finland, a country in Northern Europe. It covers roughly 10 years of data, from 2006 to 2016.

## **Problem Statement**
We believe that a close study of the weather data should let us identify patterns and correlated factors in key indicators of climate change across the country.

**The Null Hypothesis H0 is "The Apparent temperature and humidity, compared monthly across 10 years of the data, indicate an increase due to Global warming"**


#### **Tools used:**
* numpy
* pandas 
* matplotlib
* seaborn

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
sns.set()

### **Data Preparation**

Let's first import the dataset from Kaggle into this notebook.

In [None]:
df = pd.read_csv('/kaggle/input/weather-dataset/weatherHistory.csv')
df.head(10)

In [None]:
print(df.info())

* There are 12 attributes, most of them of datatype float.
* Some attributes contain null values

In [None]:
print(df.isnull().sum())

* Precip Type contains null values, which will affect further analysis

Let's keep the null values as a neutral category (in this case 'Neither').

In [None]:
df['Precip Type'] = df['Precip Type'].map({'rain':'rain', 'snow':'snow',np.nan: 'Neither'})
print(df['Precip Type'].unique())
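
As a side note, the same replacement can be sketched with `fillna`, which only touches the missing entries. The toy Series below is a made-up stand-in for `df['Precip Type']`, not the real column. One difference from a dictionary `map`: a label that is not a key in the dictionary would become NaN with `map`, whereas `fillna` leaves it as it is.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for df['Precip Type'] (not the real data)
precip = pd.Series(['rain', 'snow', np.nan, 'rain', np.nan], name='Precip Type')

# fillna replaces only the missing entries and leaves existing labels untouched
precip_filled = precip.fillna('Neither')

print(precip_filled.unique())
print("Remaining nulls:", precip_filled.isnull().sum())
```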

Let's get a brief description of each of these attributes

In [None]:
print(df.describe())

* The minimum values of Humidity, Wind Speed (km/h), Wind Bearing (degrees), and Visibility (km) are zero, which is a plausible value for each of these attributes.

In [None]:
#Rounding the values
df['Temperature (C)'] = round(df['Temperature (C)'],1)
df['Apparent Temperature (C)'] = round(df['Apparent Temperature (C)'],1)
df['Humidity'] = round(df['Humidity'],1)
df['Wind Speed (km/h)'] = round(df['Wind Speed (km/h)'],1)
df['Wind Bearing (degrees)'] = round(df['Wind Bearing (degrees)'],1)
df['Visibility (km)'] = round(df['Visibility (km)'],1)
df['Pressure (millibars)'] = round(df['Pressure (millibars)'],1)

Storing the data in a new DataFrame and dropping the column 'Loud Cover'

In [None]:
df1 = df.copy()
df1 = df1.drop(['Loud Cover'],axis=1)
#df1 = df1.dropna()
df1.info()

Splitting the date column into year, month, day, and hour columns

In [None]:
df1['date'] = pd.to_datetime(df1['Formatted Date'],utc=True)
df1['year'] = pd.DatetimeIndex(df1['date']).year
df1['month'] = pd.DatetimeIndex(df1['date']).month
df1['day'] = pd.DatetimeIndex(df1['date']).day
df1['hour'] = pd.DatetimeIndex(df1['date']).hour
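
A minimal sketch of the same split using the `.dt` accessor, on two made-up timestamps (the sample strings and the `sample` DataFrame are assumptions for illustration only). It also shows a side effect worth keeping in mind: with `utc=True`, timestamps that carry an offset are converted to UTC, so the extracted hour (and occasionally the day or month) can differ from the local clock time.

```python
import pandas as pd

# Hypothetical sample timestamps with UTC offsets (not taken from the dataset)
sample = pd.DataFrame({
    'Formatted Date': ['2006-04-01 00:00:00+02:00', '2006-12-31 23:30:00+01:00']
})

# utc=True converts every timestamp to UTC before the parts are extracted
sample['date'] = pd.to_datetime(sample['Formatted Date'], utc=True)
sample['year'] = sample['date'].dt.year
sample['month'] = sample['date'].dt.month
sample['day'] = sample['date'].dt.day
sample['hour'] = sample['date'].dt.hour

print(sample[['date', 'year', 'month', 'day', 'hour']])
```

In the first row, local midnight on 1 April becomes 22:00 on 31 March in UTC, so that reading is counted under March.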

In [None]:
df1 = df1.drop(['Formatted Date'],axis=1)
df1 = df1.drop(['Daily Summary'],axis=1)

In [None]:
df2 = df1.copy()

Building a new DataFrame for monthly weather data throughout the year

In [None]:
mavgtemp = []
mavgatemp = []
mavghum = []
mavgvis = []
mavgws = []
mavgpre = []
year = []
month = []
for i in range(2006,2017):
  yeari = df2[df2['year'] == i]
  for j in range(1,13):
    year.append(i)
    month.append(j)
    mavgtemp.append(yeari[yeari['month']==j]['Temperature (C)'].mean())
    mavgatemp.append(yeari[yeari['month']==j]['Apparent Temperature (C)'].mean())
    mavghum.append(yeari[yeari['month']==j]['Humidity'].mean())
    mavgvis.append(yeari[yeari['month']==j]['Visibility (km)'].mean())
    mavgws.append(yeari[yeari['month']==j]['Wind Speed (km/h)'].mean())
    mavgpre.append(yeari[yeari['month']==j]['Pressure (millibars)'].mean())
    
monthlydf = pd.DataFrame(year,columns=['Year'])
monthlydf['Month'] = month
monthlydf['Temperature'] = mavgtemp
monthlydf['App Temperature'] = mavgatemp
monthlydf['Visibility'] = mavgvis
monthlydf['Wind Speed'] = mavgws
monthlydf['Humidity'] = mavghum
monthlydf['Pressure'] = mavgpre

monthlydf['Month'] = round(monthlydf['Month'],1)
monthlydf['Temperature'] = round(monthlydf['Temperature'],1)
monthlydf['App Temperature'] = round(monthlydf['App Temperature'],1)
monthlydf['Visibility'] = round(monthlydf['Visibility'],1)
monthlydf['Wind Speed'] = round(monthlydf['Wind Speed'],1)
monthlydf['Humidity'] = round(monthlydf['Humidity'],1)
monthlydf['Pressure'] = round(monthlydf['Pressure'],1)
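
The same monthly averages could probably be built more compactly with `groupby`. The sketch below uses a small made-up DataFrame (`toy`) in place of `df2`, so the numbers are illustrative only. One difference from the loop above: `groupby` only returns year/month combinations that actually occur in the data, while the loop adds a NaN row for any month without readings.

```python
import pandas as pd

# Hypothetical toy rows standing in for df2 (values are made up)
toy = pd.DataFrame({
    'year':  [2006, 2006, 2006, 2007],
    'month': [1, 1, 2, 1],
    'Temperature (C)': [1.0, 3.0, 5.0, 7.0],
    'Humidity': [0.8, 0.6, 0.9, 0.7],
})

# Average every numeric column of interest per (year, month) pair
monthly = (
    toy.groupby(['year', 'month'])[['Temperature (C)', 'Humidity']]
       .mean()
       .round(1)
       .reset_index()
)

print(monthly)
```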


Building a new DataFrame for yearly weather data

In [None]:
avgtemp = []
avgatemp = []
avghum = []
avgvis = []
avgws = []
avgpre = []
year = []
for i in range(2006,2017):
  yeari = df2[df2['year'] == i]
  year.append(i)
  avgtemp.append(yeari['Temperature (C)'].mean())
  avgatemp.append(yeari['Apparent Temperature (C)'].mean())
  avghum.append(yeari['Humidity'].mean())
  avgvis.append(yeari['Visibility (km)'].mean())
  avgws.append(yeari['Wind Speed (km/h)'].mean())
  avgpre.append(yeari['Pressure (millibars)'].mean())

yearlydf = pd.DataFrame(year,columns=['Year'])
yearlydf['Temperature'] = avgtemp
yearlydf['App Temperature'] = avgatemp
yearlydf['Visibility'] = avgvis
yearlydf['Wind Speed'] = avgws
yearlydf['Humidity'] = avghum
yearlydf['Pressure'] = avgpre

yearlydf['Temperature'] = round(yearlydf['Temperature'],1)
yearlydf['App Temperature'] = round(yearlydf['App Temperature'],1)
yearlydf['Visibility'] = round(yearlydf['Visibility'],1)
yearlydf['Wind Speed'] = round(yearlydf['Wind Speed'],1)
yearlydf['Humidity'] = round(yearlydf['Humidity'],1)
yearlydf['Pressure'] = round(yearlydf['Pressure'],1)


In [None]:
monthlydf.head()

In [None]:
yearlydf.head()

### **Data Visualization**

In [None]:
sns.set_style("whitegrid")
plt.figure(figsize=(70,20))
sns.countplot(x=df2['Summary'],palette='Set2')  # pass the column by keyword; recent seaborn versions reject positional data
plt.title("Weather Count",fontsize=50)

Observation:
* Cloudy/Rain or Clear are the most frequent weather conditions in Finland

In [None]:
sns.relplot(data=df2,x='month',y='Temperature (C)',hue='Precip Type',palette='inferno_r')
plt.title("Temperature vs Month",fontsize=20)

Observation:
* Rain and snow dominate most of the year
* It snows when the temperature is below -1 degree Celsius

In [None]:
fig,axes = plt.subplots(2,2)
fig.set_figheight(10)
fig.set_figwidth(10)
fig.suptitle("Parameters during the day",fontsize=18)
axes[0,0].set_title("Temperature vs Hour",fontsize=13)
axes[0,1].set_title("Visibility vs Hour",fontsize=13)
axes[1,0].set_title("Humidity vs Hour",fontsize=13)
axes[1,1].set_title("Wind Speed vs Hour",fontsize=13)
sns.lineplot(ax=axes[0,0],x=df2['hour'],y=df2['Temperature (C)'],color='aqua')
sns.lineplot(ax=axes[0,1],x=df2['hour'],y=df2['Visibility (km)'],color='#190B45')
sns.lineplot(ax=axes[1,0],x=df2['hour'],y=df2['Humidity'],color='#FF5864')
sns.lineplot(ax=axes[1,1],x=df2['hour'],y=df2['Wind Speed (km/h)'],color='#6051AE')

Observation:
* Temperature stays above 15 degrees Celsius from 10AM to 4PM
* It is coolest in the early mornings (before 6AM), and visibility is lowest during that time of the day
* Humidity decreases gradually in the afternoon
* Wind speed stays strong during the afternoons

In [None]:
sns.relplot(data=df,x='Temperature (C)',y='Apparent Temperature (C)',hue='Summary',palette='Set1')
plt.title("Apparent Temperature vs Temperature",fontsize=15)

Observation:
* Regardless of the weather, Apparent Temperature and Temperature have a linear relationship
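
One way to put a number on a linear relationship like this is the Pearson correlation coefficient. The sketch below uses synthetic values (the slope, offset, and noise level are arbitrary assumptions), so it only illustrates the call and says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: apparent temperature as a noisy linear function of temperature
rng = np.random.default_rng(0)
temp = rng.uniform(-10, 30, size=200)
apparent = 1.1 * temp - 2.0 + rng.normal(0, 1.0, size=200)

toy = pd.DataFrame({
    'Temperature (C)': temp,
    'Apparent Temperature (C)': apparent,
})

# Series.corr defaults to the Pearson correlation; values near 1 suggest a strong linear relation
r = toy['Temperature (C)'].corr(toy['Apparent Temperature (C)'])
print(f"Pearson r = {r:.3f}")
```

On the notebook's data the equivalent call would be `df['Temperature (C)'].corr(df['Apparent Temperature (C)'])`.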

In [None]:
x = round(df['Humidity'],1)
sns.countplot(x=x,hue=df['Precip Type'],palette='Set2')
plt.title("Humidity Count during Rain/Snow through the day",fontsize=15)

Observation:
* Humidity is mostly in the range of 70%-90%, and rarely below 40%
* It doesn't snow in Finland when humidity is below 60%.
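
A claim such as "no snow below a given humidity" can be cross-checked with a grouped summary instead of being read off the chart. This is only a sketch on made-up rows (`toy` is hypothetical); on the real data the same call would run on `df`.

```python
import pandas as pd

# Hypothetical toy readings (made-up values, humidity as a fraction)
toy = pd.DataFrame({
    'Precip Type': ['rain', 'rain', 'snow', 'snow', 'Neither'],
    'Humidity':    [0.35, 0.80, 0.65, 0.90, 0.50],
})

# Lowest, median, and highest humidity observed for each precipitation type
summary = toy.groupby('Precip Type')['Humidity'].agg(['min', 'median', 'max'])
print(summary)
```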

In [None]:
fig,axs = plt.subplots(1,2)
fig.set_figheight(5)
fig.set_figwidth(15)
axs[0].set_title("Temperature vs Year",fontsize=20)
axs[1].set_title("Humidity vs Year",fontsize=20)
sns.lineplot(ax=axs[0],x=yearlydf['Year'],y=yearlydf['Temperature'],color='#0245A3',lw=3)  # palette has no effect without hue, so only color is set
sns.barplot(ax=axs[1],x=yearlydf['Year'],y=yearlydf['Humidity'],color='#0245A3')

Observation:
* If we compare the average temperature in 2006 and 2016, there is a rise in temperature
* When we look at the sequence of years, the average temperature has increased and decreased inconsistently. It wasn't only an increasing trend.
* Average Humidity has been between 70% and 80%, mostly around 70%.
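
To summarise an up-and-down series like this with one number, a straight line can be fitted through the yearly averages and its slope read as the average change per year. The values below are made up for illustration and are not taken from `yearlydf`.

```python
import numpy as np

years = np.arange(2006, 2017)
# Hypothetical yearly average temperatures in degrees C (illustrative values only)
temps = np.array([11.2, 11.9, 12.0, 11.5, 11.0, 11.6, 11.8, 12.1, 12.6, 12.3, 11.9])

# Centre the years so the fit is numerically well behaved; the slope is unaffected
centred = years - years.mean()
slope, intercept = np.polyfit(centred, temps, deg=1)

print(f"Trend: {slope:.3f} C per year ({slope * 10:.2f} C per decade)")
```

A slope near zero relative to the year-to-year swings would be consistent with the "no clear trend" reading above, although eleven points are very few for a firm statement.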

In [None]:
month_names = ["January","February","March","April","May","June",
               "July","August","September","October","November","December"]
fig,axes = plt.subplots(3,4)
fig.set_figheight(40)
fig.set_figwidth(40)
fig.suptitle("Humidity vs Year (For every month)",fontsize=40)
# One panel per month; x and y are filtered to the same rows so they stay aligned
for m, ax in enumerate(axes.flat, start=1):
  monthdata = monthlydf[monthlydf['Month']==m]
  ax.set_title(month_names[m-1],fontsize=22)
  sns.barplot(ax=ax,x=monthdata['Year'],y=monthdata['Humidity'],color='#88DFF0')

Observation:
* Average humidity for every month across the years has been 60%-70%
* In November, December, January, and February it has touched 90%

In [None]:
month_names = ["January","February","March","April","May","June",
               "July","August","September","October","November","December"]
fig,axes = plt.subplots(3,4)
fig.set_figheight(30)
fig.set_figwidth(30)
# The plotted column is 'App Temperature', so the title names it explicitly
fig.suptitle("Apparent Temperature vs Year (For every month)",fontsize=30)
# One panel per month; x and y are filtered to the same rows so they stay aligned
for m, ax in enumerate(axes.flat, start=1):
  monthdata = monthlydf[monthlydf['Month']==m]
  ax.set_title(month_names[m-1],fontsize=22)
  sns.lineplot(ax=ax,x=monthdata['Year'],y=monthdata['App Temperature'],color='#1A3263')

Observation:
* Again, temperature has changed inconsistently across the years, with no specific consistent trend.
* Deviation has been greatest in the months of September, October, November, and December (Autumn/Winter season).
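
The month-to-month comparison of deviation can also be made numeric by taking the standard deviation of each month's values across the years. The sketch below uses a tiny made-up table with the same column names as `monthlydf`. The values are illustrative only.

```python
import pandas as pd

# Hypothetical monthly averages for two months over three years (made-up values)
toy = pd.DataFrame({
    'Year':  [2006, 2007, 2008, 2006, 2007, 2008],
    'Month': [1, 1, 1, 7, 7, 7],
    'App Temperature': [-4.0, 0.0, 4.0, 21.0, 22.0, 23.0],
})

# Sample standard deviation across years for each month, largest spread first
spread = toy.groupby('Month')['App Temperature'].std().sort_values(ascending=False)
print(spread)
```

Applied to `monthlydf`, the months at the top of `spread` would be the ones with the largest year-to-year variation.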

In [None]:
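# histplot with kde=True replaces the deprecated distplot; stat="density" keeps the same normalisation
sns.FacetGrid(df2, hue="Summary", height=10,palette='Set1').map(sns.histplot, "Humidity", kde=True, stat="density").add_legend()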
plt.show()

In [None]:
# Compare 2016 against the 2006-2010 average for each yearly metric
first5years = yearlydf[yearlydf['Year'] < 2011]
row2016 = yearlydf[yearlydf['Year'] == 2016].iloc[0]  # scalar values instead of a one-row Series

first5yeartemp = first5years['Temperature'].mean()
deltatemp = row2016['Temperature'] - first5yeartemp
print("Temperature in 2016 : " + str(row2016['Temperature']) + " C")
print("Average Temperature from 2006-2010 : " + str(round(first5yeartemp,2)) + " C")
print("Temperature change : " + str(round(deltatemp,2)) + " C")

# Humidity is not a temperature, so the " C" suffix is dropped
first5yearhum = first5years['Humidity'].mean()
deltahum = row2016['Humidity'] - first5yearhum
print("Humidity in 2016 : " + str(row2016['Humidity']))
print("Average Humidity from 2006-2010 : " + str(round(first5yearhum,2)))
print("Humidity change : " + str(round(deltahum,2)))

# The apparent-temperature change is measured against its own baseline
first5yeartempap = first5years['App Temperature'].mean()
deltatempap = row2016['App Temperature'] - first5yeartempap
print("Apparent Temperature in 2016 : " + str(row2016['App Temperature']) + " C")
print("Average Apparent Temperature from 2006-2010 : " + str(round(first5yeartempap,2)) + " C")
print("Apparent Temperature change : " + str(round(deltatempap,2)) + " C")

Given:

The Null Hypothesis H0 is "The Apparent temperature and humidity, compared monthly across 10 years of the data, indicate an increase due to Global warming".

The Alternative Hypothesis H1 is "The Apparent temperature and humidity, compared monthly across 10 years of the data, do not indicate an increase due to Global warming".
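
For a more formal check than visual inspection, one option is a permutation test on the trend slope. The sketch below is an illustration under stated assumptions, not the method used in this notebook: the helper names `trend_slope` and `permutation_pvalue` are hypothetical, the yearly values are made up, and "no trend" is treated as the reference case, which is a choice of this sketch rather than the H0/H1 labelling given above.

```python
import numpy as np

def trend_slope(x, y):
    """Least-squares slope of y against x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def permutation_pvalue(years, values, n_perm=5000, seed=0):
    """One-sided p-value: how often a shuffled series has a slope at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = trend_slope(years, values)
    count = 0
    for _ in range(n_perm):
        # Shuffling the values breaks any link between year and value
        if trend_slope(years, rng.permutation(values)) >= observed:
            count += 1
    # The +1 terms count the observed ordering itself, so the p-value is never exactly 0
    return observed, (count + 1) / (n_perm + 1)

years = np.arange(2006, 2017, dtype=float)
# Hypothetical yearly averages in degrees C (illustrative values only)
values = np.array([11.2, 11.9, 12.0, 11.5, 11.0, 11.6, 11.8, 12.1, 12.6, 12.3, 11.9])

observed, p_value = permutation_pvalue(years, values)
print(f"Observed slope: {observed:.3f} C per year, permutation p-value: {p_value:.3f}")
```

A small p-value would suggest that an increase of this size is unlikely to come from year-to-year shuffling alone, while a large one would be in line with the "no continuous rise" reading.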

# **Conclusion:**
By analysing the temperature changes, it is evident that there is no continuous rise or fall, and humidity has been nearly constant throughout 2006-2016.
**Therefore, we conclude that H0 is not accepted, so we accept H1.**