# Assignment 1

## Problem

We are developing a predictive classifier that uses historical weather data for Sydney to predict whether it will rain the next day. The target variable is RainTomorrow.

## Libraries used

<ol>
    <li>pandas</li>
    <li>numpy</li>
    <li>matplotlib</li>
    <li>seaborn</li>
    <li>plotly</li>
</ol>


### Dataset Links, Sources & Acknowledgements
<ul><b>Observations were drawn from numerous weather stations</b>
    <li>The daily observations are available from <a href="http://www.bom.gov.au/climate/data">http://www.bom.gov.au/climate/data</a></li>
    <li>Definitions adapted from <a href="http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml">http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml</a> </li>
</ul>
<ul><b>Data source</b>
    <li><a href="http://www.bom.gov.au/climate/dwo/">http://www.bom.gov.au/climate/dwo/</a></li>
    <li><a href="http://www.bom.gov.au/climate/data">http://www.bom.gov.au/climate/data</a></li>
    <li><a href="https://www.kaggle.com/datasets/arunavakrchakraborty/australia-weather-data">https://www.kaggle.com/datasets/arunavakrchakraborty/australia-weather-data</a></li>
</ul>

In [26]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

data = pd.read_csv("WeatherTestData.csv")

### Data Description
<br>
Location - Name of the Australian city.<br>
MinTemp - The minimum temperature during a particular day. (degree Celsius)<br>
MaxTemp - The maximum temperature during a particular day. (degree Celsius)<br>
Rainfall - Rainfall during a particular day. (millimeters)<br>
Evaporation - Evaporation during a particular day. (millimeters)<br>
Sunshine - Bright sunshine during a particular day. (hours)<br>
WindGustDir - The direction of the strongest gust during a particular day. (16 compass points)<br>
WindGustSpeed - Speed of the strongest gust during a particular day. (kilometers per hour)<br>
WindDir9am - The direction of the wind for 10 min prior to 9 am. (compass points)<br>
WindDir3pm - The direction of the wind for 10 min prior to 3 pm. (compass points)<br>
WindSpeed9am - Speed of the wind for 10 min prior to 9 am. (kilometers per hour)<br>
WindSpeed3pm - Speed of the wind for 10 min prior to 3 pm. (kilometers per hour)<br>
Humidity9am - The humidity at 9 am. (percent)<br>
Humidity3pm - The humidity at 3 pm. (percent)<br>
Pressure9am - Atmospheric pressure at 9 am. (hectopascals)<br>
Pressure3pm - Atmospheric pressure at 3 pm. (hectopascals)<br>
Cloud9am - Cloud-obscured portions of the sky at 9 am. (eighths)<br>
Cloud3pm - Cloud-obscured portions of the sky at 3 pm. (eighths)<br>
Temp9am - The temperature at 9 am. (degree Celsius)<br>
Temp3pm - The temperature at 3 pm. (degree Celsius)<br>
RainToday - If today is rainy then ‘Yes’. If today is not rainy then ‘No’.<br>
RainTomorrow - If tomorrow is rainy then 1 (Yes). If tomorrow is not rainy then 0 (No).<br>
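
The description above stores RainToday as text (‘Yes’/‘No’) while RainTomorrow is numeric (1/0). A minimal sketch, assuming a classifier will eventually need both as numbers, of how the text labels could be mapped. The small frame `sample` and the column name `RainTodayFlag` are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
sample = pd.DataFrame({"RainToday": ["Yes", "No", None, "Yes"]})

# Map the text labels to 1/0; missing values stay missing (NaN).
sample["RainTodayFlag"] = sample["RainToday"].map({"Yes": 1, "No": 0})

print(sample)
```

Because `map` leaves unmatched or missing labels as NaN, any gaps in RainToday remain visible for the later cleaning step instead of being silently turned into 0.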

In [27]:
data.head()

| | row ID | Location | MinTemp | MaxTemp | MeanTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | ... | AvgHumidity | Pressure9am | Pressure3pm | AvgPressure | Cloud9am | Cloud3pm | AvgCloud | Temp9am | Temp3pm | RainToday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Sydney | 21.6 | 24.5 | 23.1 | 6.6 | 2.4 | 0.1 | | | ... | 87.0 | 1016.7 | 1015.6 | 1016.2 | 7.0 | 8.0 | 7.5 | 23.5 | 23.0 | Yes |
| 1 | 2 | Sydney | 18.6 | 26.3 | 22.5 | 6.2 | 5.2 | 5.2 | | | ... | 77.5 | 999.0 | 1000.3 | 999.65 | 4.0 | 7.0 | 5.5 | 21.7 | 22.3 | Yes |
| 2 | 3 | Sydney | 18.4 | 22.8 | 20.6 | 14.4 | 7.0 | 3.3 | | | ... | 78.5 | 1009.2 | 1011.7 | 1010.5 | 8.0 | 7.0 | 7.5 | 20.9 | 21.0 | Yes |
| 3 | 4 | Sydney | 16.9 | 24.3 | 20.6 | 3.0 | 3.2 | 8.7 | | | ... | 66.0 | 1017.2 | 1016.5 | 1016.9 | 7.0 | 1.0 | 4.0 | 18.4 | 23.3 | Yes |
| 4 | 5 | Sydney | 16.7 | 24.1 | 20.4 | 0.0 | 6.2 | 8.8 | | | ... | 64.5 | 1023.0 | 1022.6 | 1022.8 | 7.0 | 6.0 | 6.5 | 19.8 | 23.3 | No |


In [28]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 976 entries, 0 to 975
Data columns (total 27 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   row ID         976 non-null    int64  
 1   Location       976 non-null    object 
 2   MinTemp        976 non-null    float64
 3   MaxTemp        976 non-null    float64
 4   MeanTemp       976 non-null    float64
 5   Rainfall       973 non-null    float64
 6   Evaporation    962 non-null    float64
 7   Sunshine       972 non-null    float64
 8   WindGustDir    682 non-null    object 
 9   WindGustSpeed  682 non-null    float64
 10  WindDir9am     951 non-null    object 
 11  WindDir3pm     965 non-null    object 
 12  WindSpeed9am   965 non-null    float64
 13  WindSpeed3pm   967 non-null    float64
 14  AvgWindSpeed   976 non-null    float64
 15  Humidity9am    974 non-null    float64
 16  Humidity3pm    971 non-null    float64
 17  AvgHumidity    976 non-null    float64
 18  Pressure9a

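Before using this data, we need to clean it by removing every row that contains an empty cell. We will use `dropna()` for this. Note that `dropna()` returns a new DataFrame and does not modify `data` in place unless the result is assigned back.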

In [29]:
data = data.dropna()  # assign the result back so the cleaned rows are kept
data

| | row ID | Location | MinTemp | MaxTemp | MeanTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | ... | AvgHumidity | Pressure9am | Pressure3pm | AvgPressure | Cloud9am | Cloud3pm | AvgCloud | Temp9am | Temp3pm | RainToday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 280 | 281 | Sydney | 14.8 | 23.8 | 19.30 | 0.0 | 6.8 | 9.6 | SSE | 54.0 | ... | 72.5 | 1016.0 | 1014.7 | 1015.35 | 2.0 | 7.0 | 4.5 | 20.2 | 20.6 | No |
| 283 | 284 | Sydney | 17.2 | 23.6 | 20.40 | 5.6 | 11.2 | 7.0 | SSE | 37.0 | ... | 67.0 | 1025.0 | 1024.2 | 1024.60 | 7.0 | 8.0 | 7.5 | 19.7 | 23.2 | Yes |
| 284 | 285 | Sydney | 19.7 | 27.2 | 23.50 | 8.6 | 7.2 | 9.3 | E | 30.0 | ... | 60.0 | 1014.9 | 1012.0 | 1013.45 | 5.0 | 3.0 | 4.0 | 20.1 | 25.0 | Yes |
| 287 | 288 | Sydney | 17.1 | 22.3 | 19.70 | 0.4 | 6.6 | 4.2 | SSE | 28.0 | ... | 70.5 | 1017.1 | 1015.7 | 1016.40 | 7.0 | 6.0 | 6.5 | 17.5 | 21.0 | No |
| 288 | 289 | Sydney | 16.5 | 24.0 | 20.25 | 0.0 | 9.0 | 11.4 | NE | 44.0 | ... | 68.0 | 1026.0 | 1024.3 | 1025.15 | 6.0 | 2.0 | 4.0 | 19.3 | 23.2 | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 971 | 972 | Sydney | 10.7 | 20.1 | 15.40 | 0.6 | 1.2 | 6.4 | W | 22.0 | ... | 79.0 | 1027.9 | 1024.4 | 1026.20 | 2.0 | 6.0 | 4.0 | 11.9 | 18.7 | No |
| 972 | 973 | Sydney | 11.3 | 18.0 | 14.65 | 1.8 | 2.0 | 6.3 | S | 52.0 | ... | 72.5 | 1025.7 | 1025.8 | 1025.80 | 3.0 | 5.0 | 4.0 | 12.9 | 17.6 | Yes |
| 973 | 974 | Sydney | 11.3 | 20.0 | 15.65 | 4.4 | 2.2 | 5.8 | W | 26.0 | ... | 74.0 | 1028.7 | 1025.2 | 1027.00 | 6.0 | 1.0 | 3.5 | 11.9 | 19.7 | Yes |
| 974 | 975 | Sydney | 8.6 | 19.6 | 14.10 | 0.0 | 2.0 | 7.8 | SSE | 37.0 | ... | 62.5 | 1025.9 | 1025.3 | 1025.60 | 2.0 | 2.0 | 2.0 | 10.5 | 17.9 | No |
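
In the `data.info()` output above, WindGustDir and WindGustSpeed have only 682 non-null values out of 976, so dropping every row with any empty cell removes at least 294 rows. A minimal sketch of how the row loss could be inspected before committing to it, and of the `subset` argument of `dropna()` as a less aggressive option. The frame `demo` and its values are hypothetical, and the choice of subset columns is an assumption made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; values and gaps are made up for illustration.
demo = pd.DataFrame({
    "MeanTemp": [23.1, 22.5, 20.6, 19.3],
    "Rainfall": [6.6, np.nan, 14.4, 0.0],
    "WindGustDir": [None, None, "SSE", "E"],
})

# Missing values per column, to see which columns drive the row loss.
missing_per_column = demo.isna().sum()

# Drop every row that has at least one missing value.
strict = demo.dropna()

# Drop rows only when one of the listed columns is missing.
lenient = demo.dropna(subset=["MeanTemp", "Rainfall"])

print(missing_per_column)
print(len(demo), len(strict), len(lenient))
```

Whether the stricter or the more lenient variant is appropriate depends on which columns the classifier ends up using.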


In [30]:
figure = px.line(data, x="row ID", 
                 y="MeanTemp", 
                 title='Mean Temperature in Sydney Over the Years')
figure.show()

In [31]:
figure = px.line(data, x="row ID", 
                 y="AvgHumidity", 
                 title='Humidity in Sydney Over the Years')
figure.show()

In [32]:
figure = px.scatter(data_frame = data, x="AvgHumidity",
                    y="MeanTemp", size="MeanTemp", 
                    trendline="ols", 
                    title = "Relationship Between Temperature and Humidity")
figure.show()