# Cleaning the data

We obtained historical weather data for Montreal from OpenWeatherMap. The data cover the period from January 1st, 1979 to July 31st, 2020.

The collected features are: 

- <code> city_name </code> City name
- <code> lat </code> Geographical coordinates of the location (latitude)
- <code> lon </code> Geographical coordinates of the location (longitude)
- <code> main </code>
    - <code> main.temp </code> Temperature
    - <code> main.feels_like </code> This temperature parameter accounts for the human perception of weather
    - <code> main.pressure </code> Atmospheric pressure (at sea level), hPa
    - <code> main.humidity </code> Humidity, %
    - <code> main.temp_min </code> Minimum temperature at the moment. This is the deviation from the current temperature that is possible in large, geographically extended cities and megalopolises (use this parameter optionally).
    - <code> main.temp_max </code> Maximum temperature at the moment. This is the deviation from the current temperature that is possible in large, geographically extended cities and megalopolises (use this parameter optionally).
- <code> wind </code>
    - <code> wind.speed </code> Wind speed, m/s (default unit)
    - <code> wind.deg </code> Wind direction, degrees (meteorological)
- <code> clouds </code>
    - <code> clouds.all </code> Cloudiness, %
- <code> rain </code>
    - <code> rain.1h </code> Rain volume for the last hour, mm
    - <code> rain.3h </code> Rain volume for the last 3 hours, mm
- <code> snow </code>
    - <code> snow.1h </code> Snow volume for the last hour, mm (in liquid state)
    - <code> snow.3h </code> Snow volume for the last 3 hours, mm (in liquid state)
- <code> weather </code> 
    - <code> weather.id </code> Weather condition id
    - <code> weather.main </code> Group of weather parameters (Rain, Snow, Extreme, etc.)
    - <code> weather.description </code> Weather condition within the group
    - <code> weather.icon </code> Weather icon id
- <code> dt </code> Time of data calculation, Unix timestamp (seconds), UTC
- <code> dt_iso </code> Date and time of data calculation as a readable string, UTC
- <code> timezone </code> Shift in seconds from UTC
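Since `dt` is a Unix timestamp in UTC and `timezone` is a shift in seconds from UTC, local wall-clock time can be recovered by adding the two. The pandas sketch below assumes columns named `dt` and `timezone` and uses a single hypothetical row: 283996800 is 1979-01-01 00:00:00 UTC, and -18000 s (UTC-5) is Montreal's offset in standard time.

```python
import pandas as pd

# Hypothetical row for illustration only (not taken from the dataset):
# 283996800 s after the epoch is 1979-01-01 00:00:00 UTC,
# and -18000 s is UTC-5, Montreal's standard-time offset.
df = pd.DataFrame({"dt": [283996800], "timezone": [-18000]})

# Interpret dt as seconds since the Unix epoch, as a UTC-aware timestamp.
df["time_utc"] = pd.to_datetime(df["dt"], unit="s", utc=True)

# Drop the UTC tag (keeping the UTC clock time), then add the per-row
# offset to obtain naive local wall-clock time.
df["time_local"] = (
    df["time_utc"].dt.tz_localize(None)
    + pd.to_timedelta(df["timezone"], unit="s")
)

print(df[["time_utc", "time_local"]])
```

This uses the offset stored in each row rather than a time-zone database, so the result follows whatever offset the data itself records.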

Weather condition ids and icon ids are explained at https://openweathermap.org/weather-conditions.

We import the libraries we need.

```python
import numpy as np                       # numerical arrays
import pandas as pd                      # tabular data handling
from datetime import datetime, timedelta # date arithmetic
import time
import matplotlib.pyplot as plt          # plotting
```

We read the CSV file and check its size and columns.

```python
# Load the raw OpenWeatherMap export into a DataFrame.
df_data = pd.read_csv('weather_data_montreal.csv')

print('Number of Entries = {}'.format(df_data.shape[0]))
print('Data Shape = {}'.format(df_data.shape))
print(df_data.columns)
```

```
Number of Entries = 373025
Data Shape = (373025, 25)
Index(['dt', 'dt_iso', 'timezone', 'city_name', 'lat', 'lon', 'temp',
       'feels_like', 'temp_min', 'temp_max', 'pressure', 'sea_level',
       'grnd_level', 'humidity', 'wind_speed', 'wind_deg', 'rain_1h',
       'rain_3h', 'snow_1h', 'snow_3h', 'clouds_all', 'weather_id',
       'weather_main', 'weather_description', 'weather_icon'],
      dtype='object')
```
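The CSV stores the nested fields from the feature list under flattened column names: the `main.` prefix is dropped (`main.temp` becomes `temp`) and the remaining dots become underscores (`wind.speed` becomes `wind_speed`). The hypothetical helper `flatten_name` below encodes that rule, which is inferred from the printed columns rather than from OpenWeatherMap documentation, and checks the correspondence. The column list is copied from the output above so the block runs on its own; in the notebook one would use `df_data.columns` instead.

```python
import pandas as pd

def flatten_name(field: str) -> str:
    """Map a documented field name to its CSV column name.

    Hypothetical helper; the rule is inferred from the printed columns.
    """
    if field.startswith("main."):
        # main.temp -> temp, main.humidity -> humidity, ...
        return field[len("main."):]
    # wind.speed -> wind_speed, rain.1h -> rain_1h, weather.id -> weather_id, ...
    return field.replace(".", "_")

documented = [
    "main.temp", "main.feels_like", "main.pressure", "main.humidity",
    "main.temp_min", "main.temp_max", "wind.speed", "wind.deg",
    "clouds.all", "rain.1h", "rain.3h", "snow.1h", "snow.3h",
    "weather.id", "weather.main", "weather.description", "weather.icon",
]
top_level = {"dt", "dt_iso", "timezone", "city_name", "lat", "lon"}

# Column names as printed by df_data.columns.
csv_columns = pd.Index([
    "dt", "dt_iso", "timezone", "city_name", "lat", "lon", "temp",
    "feels_like", "temp_min", "temp_max", "pressure", "sea_level",
    "grnd_level", "humidity", "wind_speed", "wind_deg", "rain_1h",
    "rain_3h", "snow_1h", "snow_3h", "clouds_all", "weather_id",
    "weather_main", "weather_description", "weather_icon",
])

mapped = [flatten_name(f) for f in documented]
# Documented fields with no matching column.
missing = [c for c in mapped if c not in csv_columns]
# Columns present in the CSV but absent from the feature list.
extra = sorted(set(csv_columns) - set(mapped) - top_level)

print("missing:", missing)
print("undocumented:", extra)
```

With the columns above, every documented field has a matching column, and the check surfaces `grnd_level` and `sea_level` as the two columns not described in the feature list.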


We split the data into a training set, a cross-validation set, and a test set.
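The split method is not specified here. Because each row is a time-stamped observation, one reasonable choice (an assumption, not necessarily the authors' method) is a chronological split: the model is then validated and tested on periods that come after the ones it was trained on. Below is a minimal sketch with a hypothetical helper `chronological_split` and hypothetical 70/15/15 fractions, run on a toy frame of 20 hourly timestamps.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_frac: float = 0.70, cv_frac: float = 0.15):
    """Split rows by time into train / cross-validation / test sets.

    Hypothetical helper; the default 70/15/15 fractions are an assumption.
    """
    if not (train_frac > 0 and cv_frac > 0 and train_frac + cv_frac < 1):
        raise ValueError("fractions must be positive and sum to less than 1")
    # Sort by the Unix timestamp so earlier records come first.
    df_sorted = df.sort_values("dt").reset_index(drop=True)
    n = len(df_sorted)
    train_end = round(n * train_frac)
    cv_end = round(n * (train_frac + cv_frac))
    return (
        df_sorted.iloc[:train_end],
        df_sorted.iloc[train_end:cv_end],
        df_sorted.iloc[cv_end:],
    )

# Toy data: 20 hourly timestamps, shuffled to show that the split sorts by time.
toy = pd.DataFrame({"dt": [283996800 + 3600 * i for i in range(20)]})
toy = toy.sample(frac=1, random_state=0)

train, cv, test = chronological_split(toy)
print(len(train), len(cv), len(test))
```

A random shuffled split, common for independent samples, would let the model train on hours adjacent to the ones it is evaluated on; the chronological version avoids that leakage at the cost of evaluating on a single, later period.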