# Weather Analysis

This dataset describes the weather in Szeged, Hungary from 2006 to 2016. It has 12 columns and 96453 rows.

Source: https://www.kaggle.com/budincsevity/szeged-weather

### Importing data, Cleaning data and formatting dataset 

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
df = pd.read_csv('../input/szeged-weather/weatherHistory.csv')
df.info()

In [None]:
df.describe()

In [None]:
df.isnull().sum()

In [None]:
df=df.dropna()
df.isnull().sum()

In [None]:
df['Precip Type'].value_counts()

In [None]:
df['Formatted Date'] = df['Formatted Date'].str.split(" ").str[0].str.split("-").str[0]
df.head()
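
The year extraction above relies on every timestamp starting with a four-digit year. A minimal sketch of that step, on made-up timestamps that are only assumed to follow the same layout as the `Formatted Date` column, compares the chained split with a plain four-character slice:

```python
import pandas as pd

# Made-up timestamps, assumed to match the layout of 'Formatted Date'.
sample = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2016-12-31 23:00:00.000 +0100",
    ]
})

# Chained split, as in the cell above: drop the time part, then keep the year.
year_split = sample["Formatted Date"].str.split(" ").str[0].str.split("-").str[0]

# Shortcut that gives the same strings when the year is the first four characters.
year_slice = sample["Formatted Date"].str[:4]

print(year_split.tolist())
print(year_slice.tolist())
```

Both variants keep the year as a string, which is why the later filters compare against values such as `'2006'` rather than the integer `2006`.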

In [None]:
df['Formatted Date'].value_counts()

In [None]:
df.describe()

In [None]:
df = df.rename(columns={"Formatted Date":"Year"})
df.head()

In [None]:
df = df.drop(['Loud Cover'],axis=1)
df.head()

## Analysis of year 2006

In [None]:
a = df[df['Year']=='2006']
a.shape

In [None]:
a.info()

In [None]:
a['Temperature (C)'].mean()

In [None]:
a['Wind Speed (km/h)'].mean()

In [None]:
a.describe()

In [None]:
temp_range_2006 = a['Temperature (C)'].max() - a['Temperature (C)'].min()
temp_range_2006

This shows that the temperature ranged from -10<sup>o</sup>C to 34.8<sup>o</sup>C, i.e. a spread of ~45<sup>o</sup>C.

In [None]:
humidity_range_2006 = a['Humidity'].max() - a['Humidity'].min()
humidity_range_2006

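This shows that the humidity ranged from 0.76 to 1, i.e. a change in humidity of 0.77.

Similarly, the pressure ranged from 0 millibars to 1038.01 millibars, i.e. a change in pressure of 1038.01 millibars. A reading of 0 millibars is not physically plausible for surface air pressure, so it most likely marks a missing measurement rather than a real observation.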
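
The ranges quoted above are each a maximum minus a minimum. As a rough sketch, a small helper can return all three numbers for any column at once. The name `column_range` is hypothetical and the readings are made up, not taken from the Szeged data:

```python
import pandas as pd

def column_range(frame, column):
    """Return (minimum, maximum, maximum - minimum) for one numeric column."""
    low = frame[column].min()
    high = frame[column].max()
    return low, high, high - low

# Made-up readings, not values from the Szeged dataset.
sample = pd.DataFrame({
    "Temperature (C)": [-10.0, 4.5, 34.8],
    "Humidity": [0.25, 0.6, 1.0],
})

for name in sample.columns:
    low, high, spread = column_range(sample, name)
    print(f"{name}: min={low}, max={high}, range={spread}")
```

Printing the minimum and maximum next to the range makes it easier to check that the three quoted numbers agree with each other.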

In [None]:
plt.figure(figsize=[20,10])
# numeric_only=True skips text columns such as 'Year' and 'Summary'
sns.heatmap(a.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.48
- Wind Bearing and Wind Speed : 0.08

<b>Negative Correlation</b>

- Pressure and Temperature : -0.021

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.08 the relationship is very weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that pressure tended to decrease slightly as temperature increased in 2006, although at -0.021 the relationship is close to zero.
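
Reading individual values off a heatmap is easy to get wrong. The following is a minimal sketch of listing every pair of numeric columns once, sorted by correlation. The helper name `ranked_correlations` is hypothetical, the data is made up, and `numeric_only=True` assumes pandas 1.5 or newer:

```python
from itertools import combinations

import pandas as pd

def ranked_correlations(frame):
    """Return one correlation per pair of numeric columns, highest first."""
    corr = frame.corr(numeric_only=True)
    # combinations() yields each unordered pair once and skips the diagonal.
    pairs = {
        (first, second): corr.loc[first, second]
        for first, second in combinations(corr.columns, 2)
    }
    return pd.Series(pairs).sort_values(ascending=False)

# Made-up readings, not values from the Szeged dataset.
sample = pd.DataFrame({
    "Temperature (C)": [0.0, 10.0, 20.0, 30.0],
    "Visibility (km)": [5.0, 8.0, 11.0, 14.0],
    "Pressure (millibars)": [1020.0, 1015.0, 1010.0, 1005.0],
    "Summary": ["Foggy", "Overcast", "Clear", "Clear"],
})

ranked = ranked_correlations(sample)
print(ranked)
```

The first rows of the result are the strongest positive correlations and the last rows the strongest negative ones, so the same call could be applied to each yearly dataframe.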

## Analysis of year 2007

In [None]:
b = df[df['Year']=='2007']
b.shape

In [None]:
b.info()

In [None]:
b.describe()

In [None]:
temp_range_2007 = b['Temperature (C)'].max() - b['Temperature (C)'].min()
temp_range_2007

This shows that the temperature ranged from -10.15<sup>o</sup>C to 39.9<sup>o</sup>C, i.e. a spread of ~50<sup>o</sup>C.

The maximum temperature therefore rose by ~5<sup>o</sup>C from 2006 to 2007.

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(b.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.28
- Wind Bearing and Wind Speed : 0.14

<b>Negative Correlation</b>

- Pressure and Temperature : -0.027

The positive correlation shows that visibility tends to increase with temperature, although the correlation is weaker than in 2006 (0.28 versus 0.48). Wind speed and wind bearing are also positively correlated, and at 0.14 the relationship is stronger than in 2006 but still weak.

The negative correlation shows that pressure tended to decrease slightly as temperature increased in 2007, although at -0.027 the relationship is close to zero.

## Analysis of year 2008

In [None]:
c = df[df['Year']=='2008']
c.shape

In [None]:
c.info()

In [None]:
c.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(c.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.28
- Wind Bearing and Wind Speed : 0.098

<b>Negative Correlation</b>

- Pressure and Temperature : -0.026

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.098 the relationship is weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that pressure tended to decrease slightly as temperature increased in 2008, although at -0.026 the relationship is close to zero.

## Analysis of year 2009

In [None]:
d = df[df['Year'] == '2009']
d.shape

In [None]:
d.info()

In [None]:
d.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(d.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.33
- Wind Bearing and Wind Speed : 0.18

<b>Negative Correlation</b>

- Pressure and Humidity : -0.014

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.18 the relationship is weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that humidity tended to decrease slightly as pressure increased in 2009, although at -0.014 the relationship is close to zero.

## Analysis of year 2010

In [None]:
e = df[df['Year'] == '2010']
e.shape

In [None]:
e.info()

In [None]:
e.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(e.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.44
- Wind Bearing and Wind Speed : 0.15

<b>Negative Correlation</b>

- Pressure and Temperature : -0.036

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.15 the relationship is weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that pressure tended to decrease slightly as temperature increased in 2010, although at -0.036 the relationship is close to zero.

## Analysis of year 2011

In [None]:
f = df[df['Year']=='2011']
f.shape

In [None]:
f.info()

In [None]:
f.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(f.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.5
- Wind Bearing and Wind Speed : 0.11

<b>Negative Correlation</b>

- Pressure and Temperature : -0.049

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.11 the relationship is weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that pressure tended to decrease slightly as temperature increased in 2011, although at -0.049 the relationship is close to zero.

## Analysis of year 2012

In [None]:
g = df[df['Year'] == '2012']
g.shape

In [None]:
g.info()

In [None]:
g.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(g.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.31
- Wind Bearing and Wind Speed : 0.15

<b>Negative Correlation</b>

- Pressure and Temperature : -0.13

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.15 the relationship is weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that pressure tended to decrease as temperature increased in 2012. At -0.13 the relationship is still weak, but it is the strongest pressure and temperature correlation among the years analysed here.

## Analysis of year 2013

In [None]:
h = df[df['Year'] == '2013']
h.shape

In [None]:
h.info()

In [None]:
h.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(h.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.38
- Wind Bearing and Wind Speed : 0.034

<b>Negative Correlation</b>

- Pressure and Humidity : -0.055

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.034 the relationship is close to zero, so wind direction barely varies with wind speed.

The negative correlation shows that humidity tended to decrease slightly as pressure increased in 2013, although at -0.055 the relationship is close to zero.

## Analysis of year 2014

In [None]:
i = df[df['Year'] == '2014']
i.shape

In [None]:
i.info()

In [None]:
i.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(i.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.43
- Wind Bearing and Wind Speed : 0.06

<b>Negative Correlation</b>

- Pressure and Humidity : -0.026

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.06 the relationship is very weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that humidity tended to decrease slightly as pressure increased in 2014, although at -0.026 the relationship is close to zero.

## Analysis of year 2015

In [None]:
j = df[df['Year']=='2015']
j.shape

In [None]:
j.info()

In [None]:
j.describe()

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(j.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.5
- Wind Bearing and Wind Speed : 0.091

<b>Negative Correlation</b>

- Pressure and Humidity : -0.013

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.091 the relationship is weak, so wind direction varies only slightly with wind speed.

The negative correlation shows that humidity tended to decrease slightly as pressure increased in 2015, although at -0.013 the relationship is close to zero.

## Analysis of year 2016

In [None]:
k = df[df['Year'] == '2016']
k.shape

In [None]:
k.info()

In [None]:
k.describe()

In [None]:
temp_range_2016 = k['Temperature (C)'].max() - k['Temperature (C)'].min()
temp_range_2016

In [None]:
plt.figure(figsize=[20,10])
sns.heatmap(k.corr(numeric_only=True),cmap='coolwarm',annot=True)

### Observations:

<b>Positive Correlation</b>

- Visibility and Temperature : 0.48
- Wind Bearing and Wind Speed : 0.039

<b>Negative Correlation</b>

- Pressure and Humidity : -0.015

The positive correlation shows that visibility tends to increase with temperature. Wind speed and wind bearing are also positively correlated, but at 0.039 the relationship is close to zero, so wind direction barely varies with wind speed.

The negative correlation shows that humidity tended to decrease slightly as pressure increased in 2016, although at -0.015 the relationship is close to zero.

## Analysis from 2006 to 2016

In [None]:
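# Yearly dataframes in chronological order, 2006 to 2016
x = [a,b,c,d,e,f,g,h,i,j,k]
max_temp = []
min_temp = []
avg_temp = []
# The loop variable is named year_df so it does not overwrite i, the 2014 dataframe
for year_df in x:
    max_temp.append(year_df['Temperature (C)'].max())
    min_temp.append(year_df['Temperature (C)'].min())
    avg_temp.append(year_df['Temperature (C)'].mean())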

In [None]:
year_list=[2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016]
plt.figure(figsize=(12,6))
sns.lineplot(data=pd.DataFrame({'Year':year_list,'Max Temperature':max_temp,'Min Temperature':min_temp,
                                'Average Temperature':avg_temp}).set_index('Year'),markers=True, dashes=False)
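
The per-year maximum, minimum and mean can also be computed without one dataframe per year. This is a minimal sketch using `groupby` on made-up readings, assuming a string `Year` column like the one created earlier:

```python
import pandas as pd

# Made-up readings, not values from the Szeged dataset.
sample = pd.DataFrame({
    "Year": ["2006", "2006", "2007", "2007"],
    "Temperature (C)": [-10.0, 30.0, -5.0, 35.0],
})

# One row per year, one column per statistic.
yearly = sample.groupby("Year")["Temperature (C)"].agg(["max", "min", "mean"])
print(yearly)
```

The resulting table is indexed by year, so it could be passed to `sns.lineplot` in the same way as the hand-built dataframe above.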

### Conclusion

The graph shows the maximum, minimum and average temperature for each year from 2006 to 2016. There is a sharp drop in temperature in 2012, while the highest maximum temperature was recorded in 2007. The average temperature remained roughly the same from 2006 to 2016.