## Performing Analysis Of Meteorological Data

**Task**: Determine whether the average apparent temperature and the average
humidity for a given month (April, in this example) have increased over the
period from 2006 to 2016.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
%matplotlib inline
import matplotlib.pyplot as plt 

In [None]:
path = '../input/weather-dataset/weatherHistory.csv'
df=pd.read_csv(path)

In [None]:
df.sample(5)

This dataset provides historical data on meteorological parameters such as pressure, temperature, humidity, wind speed, and visibility. It contains hourly records covering roughly ten years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.

In [None]:
df.shape

In [None]:
df.dtypes

Before visualizing the data, we need to convert the date column from strings to datetime objects. For this we use the to_datetime() function.

In [None]:
df['Formatted Date'] = pd.to_datetime(df['Formatted Date'], utc=True)
df['Formatted Date']
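To illustrate what utc=True does, here is a minimal sketch on two made-up timestamps written in the same text format as the Formatted Date column (the values are illustrative and are not taken from the dataset). Each timestamp is shifted by its own offset and stored in UTC.

```python
import pandas as pd

# Two made-up timestamps with different UTC offsets (illustrative only).
raw = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-10-29 01:00:00.000 +0100",
])

# utc=True converts every value to UTC, so the column gets one timezone-aware dtype.
parsed = pd.to_datetime(raw, utc=True)

print(parsed.dt.tz)    # timezone of the parsed column
print(parsed.iloc[0])  # midnight at +0200 is 22:00 UTC on the previous day
```

Note that a local midnight on the first of a month falls on the last day of the previous month once converted to UTC, which matters when the data is later grouped by month.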

In [None]:
df.describe()

Set Formatted Date as the index:

In [None]:
df = df.set_index("Formatted Date")
df.head(2)

Since the data is hourly, we need to resample it to a monthly frequency. Resampling is a convenient method for frequency conversion; it requires the object to have a datetime-like index.

In [None]:
data_columns = ['Apparent Temperature (C)', 'Humidity']
df_monthly_mean = df[data_columns].resample('MS').mean()
df_monthly_mean.head()

Here "MS" denotes month start: each monthly bin is labeled with the first day of its month. The mean() function then gives the average apparent temperature and humidity for each month.
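The following sketch shows the same resampling step on a small synthetic hourly series (constant values per month, invented for illustration), so the effect of the "MS" rule is easy to check by eye.

```python
import numpy as np
import pandas as pd

# Synthetic hourly index covering January and February 2006 (not the real dataset).
idx = pd.date_range("2006-01-01", "2006-02-28 23:00", freq="h", tz="UTC")

# Constant value per month so the monthly means are obvious: 10.0 in January, 20.0 in February.
values = np.where(idx.month == 1, 10.0, 20.0)
hourly = pd.DataFrame({"Humidity": values}, index=idx)

# "MS" groups the rows by calendar month and labels each group with the month's first day.
monthly = hourly.resample("MS").mean()
print(monthly)
```

The result has one row per month, indexed by the first day of that month.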

In [None]:
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
plt.figure(figsize=(14,6))
plt.title("Variation in Apparent Temperature and Humidity with time")
sns.lineplot(data=df_monthly_mean)

Observation: From the above plot, humidity remained almost constant over these years. The average apparent temperature also follows nearly the same yearly pattern (the peaks lie at about the same level).

To retrieve the data for a particular month from every year, say April:

In [None]:
df1 = df_monthly_mean[df_monthly_mean.index.month==4]
print(df1)
df1.dtypes

Plotting the variation in Apparent Temperature and Humidity for the month of April every year:

In [None]:
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(18,7))
ax.plot(df1.loc['2006-04-01':'2016-04-01', 'Apparent Temperature (C)'], marker='o', linestyle='-',label='Apparent Temperature (C)')
ax.plot(df1.loc['2006-04-01':'2016-04-01', 'Humidity'], marker='o', linestyle='-',label='Humidity')
#ax.set_xticks(['04-01-2006','04-01-2007','04-01-2008','04-01-2009','04-01-2010','04-01-2011','04-01-2012','04-01-2013','04-01-2014','04-01-2015','04-01-2016'])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d %m %Y'))
ax.legend(loc ='center right')
ax.set_xlabel('Month of April')

Observation: No change in average humidity. The average apparent temperature rose in 2009, dropped in 2010, rose slightly in 2011, dropped significantly in 2015, and rose again in 2016.
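One way to put a number on "increased or not" is to fit a straight line through the yearly April means and look at its slope. The sketch below does this with np.polyfit on hypothetical values; the series name april and its numbers are assumptions for illustration, not results from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical April means, one per year (illustrative numbers, not the dataset's).
april = pd.Series(
    [10.0, 10.5, 11.0, 11.5, 12.0],
    index=pd.to_datetime(
        ["2006-04-01", "2007-04-01", "2008-04-01", "2009-04-01", "2010-04-01"]
    ),
)

# Use years since the first observation as x so the slope reads as "change per year".
years = april.index.year.to_numpy(dtype=float)
slope, intercept = np.polyfit(years - years[0], april.to_numpy(), deg=1)

print(f"Trend: {slope:+.2f} per year")
```

With the notebook's data, the April column of the monthly means could be passed in place of april. A slope close to zero would support the "no change" reading; a clearly positive or negative slope would indicate a trend.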

The April plot shows a sharp rise in temperature after 2010 and a fall after 2014.
Let's look at some more visualizations to get a clearer picture.

In [None]:
sns.lmplot(x='Apparent Temperature (C)',y='Humidity',data=df_monthly_mean)
plt.show()

An lmplot (or regplot) draws a scatter plot together with a fitted linear regression line. Here it shows apparent temperature vs. humidity.

In [None]:
corr = df_monthly_mean.corr()
sns.heatmap(corr)

A Heatmap is a two-dimensional graphical representation of data where the individual values that are contained in a matrix are represented as colors.
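The colors in this heatmap encode the values returned by corr(), which computes the Pearson correlation coefficient by default. As a minimal sketch on made-up numbers (the demo frame below is an assumption for illustration), the coefficient can be checked against its textbook formula.

```python
import numpy as np
import pandas as pd

# Made-up monthly means in which humidity falls as temperature rises (illustrative only).
demo = pd.DataFrame({
    "Apparent Temperature (C)": [0.0, 5.0, 10.0, 15.0, 20.0],
    "Humidity": [0.90, 0.85, 0.80, 0.70, 0.60],
})

# DataFrame.corr() returns the pairwise Pearson correlation matrix by default.
demo_corr = demo.corr()

# The same coefficient from the definition: covariance over the product of spreads.
t = demo["Apparent Temperature (C)"]
h = demo["Humidity"]
r = ((t - t.mean()) * (h - h.mean())).sum() / np.sqrt(
    ((t - t.mean()) ** 2).sum() * ((h - h.mean()) ** 2).sum()
)

print(demo_corr)
print(r)
```

A value near -1 means the two columns move in opposite directions, a value near +1 means they move together, and a value near 0 means there is little linear relationship.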

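Now let's plot the distribution of humidity using the Seaborn library. The plot shows a histogram with a kernel density estimate (KDE) line drawn over it.

In [None]:
# distplot is deprecated in recent seaborn releases; histplot with kde=True draws a comparable figure
sns.histplot(df['Humidity'], kde=True, stat='density', color='red')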

Summarizing humidity vs. apparent temperature with the relplot() method, colored by the Summary column:

In [None]:
sns.relplot(data=df,x="Apparent Temperature (C)",y="Humidity",color="purple",hue="Summary")

## Conclusion

Global warming is deteriorating the climate and affecting various environmental parameters. From this analysis we infer that the average apparent temperature for April showed both sharp rises and sharp falls over the 10 years, while the average humidity remained almost the same throughout. We therefore conclude that global warming has made temperature less predictable, whereas humidity has stayed nearly constant over the period.