# Predict tomorrow's rain

## Introduction
This notebook builds a model that predicts whether it will rain tomorrow.
To do that, I first have to look at the major triggers of tomorrow's rain.

For each of temperature, humidity, and wind speed, I have to extract the minimum, maximum, and average over the last 6 hours.
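
The 6-hour minimum, maximum, and average described above could be computed with a time-based rolling window. The following is a minimal sketch on synthetic 3-hourly readings, not the notebook's actual pipeline: the `demo` frame and the `rolling_features` helper are hypothetical names introduced only for illustration. Note that pandas' default trailing window `(t - 6h, t]` excludes the reading exactly 6 hours back, so with 3-hourly data each window holds two readings.

```python
import pandas as pd

# Synthetic 3-hourly readings standing in for the real dataset (assumption).
index = pd.date_range("2024-01-01", periods=8, freq="3h")
demo = pd.DataFrame(
    {
        "Temperature (°c)": [25, 24, 24, 25, 27, 29, 28, 26],
        "Wind (km/h)": [18, 19, 18, 21, 21, 20, 17, 15],
    },
    index=index,
)


def rolling_features(frame, window="6h"):
    """Return min, max and mean of every column over a trailing time window."""
    stats = frame.rolling(window).agg(["min", "max", "mean"])
    # Flatten the (column, statistic) MultiIndex into names like "Min Temperature (°c)".
    stats.columns = [f"{stat.capitalize()} {col}" for col, stat in stats.columns]
    return stats


features = rolling_features(demo)
print(features.round(2))
```

A longer window such as `"9h"` would include three readings per row; which choice is appropriate depends on how "the last 6 hours" should be interpreted.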

## Dependencies
- Pandas
- Matplotlib
- NumPy
- Seaborn
- scikit-learn (`sklearn`)


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import mean_squared_error, precision_score, accuracy_score


In [3]:
df = pd.read_csv('../../../data/cleaned_ikorodu_weather_2016_2024.csv')

In [4]:
df.head()

| | Time (hr) | Temperature (°c) | Forecast (°c) | Rain (mm) | Rain (%) | Cloud (%) | Pressure (mb) | Wind (km/h) | Gust (km/h) | Direction (deg) | ... | Moonset (hr) | Sunrise (hr) | Sunset (hr) | Weather | Date | Moonrise (min) | Moonset (min) | Sunrise (min) | Sunset (min) | Season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25 | 27 | 0.1 | 45.0 | 87.0 | 1013.0 | 18.0 | 27.0 | 225.3 | ... | 05:17 PM | 06:39 AM | 06:53 PM | Light rain shower | 2024-08-31 | 271.0 | 1037.0 | 399 | 1133 | wet |
| 1 | 3 | 24 | 26 | 0.4 | 100.0 | 100.0 | 1012.0 | 19.0 | 28.0 | 225.1 | ... | 05:17 PM | 06:39 AM | 06:53 PM | Light rain shower | 2024-08-31 | 271.0 | 1037.0 | 399 | 1133 | wet |
| 2 | 6 | 24 | 26 | 0.3 | 100.0 | 100.0 | 1013.0 | 18.0 | 27.0 | 230.1 | ... | 05:17 PM | 06:39 AM | 06:53 PM | Light rain shower | 2024-08-31 | 271.0 | 1037.0 | 399 | 1133 | wet |
| 3 | 9 | 25 | 28 | 0.1 | 45.0 | 76.0 | 1014.0 | 21.0 | 28.0 | 226.8 | ... | 05:17 PM | 06:39 AM | 06:53 PM | Light rain shower | 2024-08-31 | 271.0 | 1037.0 | 399 | 1133 | wet |
| 4 | 12 | 27 | 29 | 0.0 | 45.0 | 61.0 | 1014.0 | 21.0 | 27.0 | 229.2 | ... | 05:17 PM | 06:39 AM | 06:53 PM | Light rain shower | 2024-08-31 | 271.0 | 1037.0 | 399 | 1133 | wet |


In [5]:
df.columns

Index(['Time (hr)', 'Temperature (°c)', 'Forecast (°c)', 'Rain (mm)',
       'Rain (%)', 'Cloud (%)', 'Pressure (mb)', 'Wind (km/h)', 'Gust (km/h)',
       'Direction (deg)', 'Moonrise (hr)', 'Moonset (hr)', 'Sunrise (hr)',
       'Sunset (hr)', 'Weather', 'Date', 'Moonrise (min)', 'Moonset (min)',
       'Sunrise (min)', 'Sunset (min)', 'Season'],
      dtype='object')

### Data Cleaning

In [6]:
df['Date'] = pd.to_datetime(df['Date'])

In [7]:
# Combine the date with the hour offset to get a full timestamp for each reading
df['Time (hr)'] = df['Date'] + pd.to_timedelta(df['Time (hr)'], unit='h')

In [8]:
df.set_index('Time (hr)', inplace=True)

### Data Wrangling

In [9]:
df['Rain (mm)'].tail()

Time (hr)
2016-09-03 09:00:00    1.7
2016-09-03 12:00:00    2.5
2016-09-03 15:00:00    0.6
2016-09-03 18:00:00    0.0
2016-09-03 21:00:00    0.0
Name: Rain (mm), dtype: float64

In [10]:
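# Add the next rain to the dataframe
# NOTE: as written, this copies 'Rain (mm)' and only sets the last value to 0;
# it does not shift the readings in time, which the identical output below confirms.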
df['Next Rain (mm)'] = [x if i != len(df['Rain (mm)']) - 1 else 0 for i, x in enumerate(df['Rain (mm)'])]

In [11]:
df['Next Rain (mm)'].tail()

Time (hr)
2016-09-03 09:00:00    1.7
2016-09-03 12:00:00    2.5
2016-09-03 15:00:00    0.6
2016-09-03 18:00:00    0.0
2016-09-03 21:00:00    0.0
Name: Next Rain (mm), dtype: float64
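
Because the `Next Rain (mm)` values shown here are identical to `Rain (mm)`, a true look-ahead target would need the rows in chronological order and a shift. The sketch below shows one possible way to do that on synthetic data. The `demo`, `ordered`, and `daily` frames and the `Rain Tomorrow` columns are hypothetical names for illustration, and the notebook's results are not recomputed with it.

```python
import pandas as pd

# Synthetic 3-hourly rain readings, deliberately out of chronological order (assumption).
index = pd.to_datetime(
    [
        "2024-01-02 00:00",
        "2024-01-02 03:00",
        "2024-01-01 00:00",
        "2024-01-01 03:00",
        "2024-01-03 00:00",
    ]
)
demo = pd.DataFrame({"Rain (mm)": [0.0, 0.0, 1.5, 0.5, 2.0]}, index=index)

# Sort chronologically first, otherwise "next" has no consistent meaning.
ordered = demo.sort_index()
# shift(-1) moves each value up one row, so row t holds the reading at t + 1.
ordered["Next Rain (mm)"] = ordered["Rain (mm)"].shift(-1)

# Daily variant: total rain per day, paired with the following day's total.
daily = ordered["Rain (mm)"].resample("1D").sum().to_frame("Rain (mm)")
daily["Rain Tomorrow (mm)"] = daily["Rain (mm)"].shift(-1)
daily["Rain Tomorrow"] = daily["Rain Tomorrow (mm)"] > 0
# The last day has no "tomorrow" in the data, so drop it.
daily = daily.dropna(subset=["Rain Tomorrow (mm)"])

print(ordered)
print(daily)
```

The boolean `Rain Tomorrow` column is the kind of label a classifier could be trained on, while `Rain Tomorrow (mm)` would suit a regressor.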

In [12]:
# Take a look at the major triggers of tomorrow's rain.
correlations = df[['Temperature (°c)', 'Forecast (°c)', 'Rain (mm)',
                   'Rain (%)', 'Cloud (%)', 'Pressure (mb)', 'Wind (km/h)',
                   'Gust (km/h)']].corrwith(df['Next Rain (mm)'])

In [13]:
def get_correlations(df, items, intervals, target):
    """Correlate the mean, max and min of `items` with the mean of `target` for each resampling interval."""
    correlation_df = pd.DataFrame(columns=intervals)
    for interval in intervals:
        sample = df.resample(interval)
        df_interval_mean = sample[[target, *items]].mean()
        df_interval_max = sample[[*items]].max()
        df_interval_min = sample[[*items]].min()

        # Prefix the aggregated columns so they stay distinct after concatenation
        df_interval_max.columns = ['Max ' + x for x in df_interval_max.columns]
        df_interval_min.columns = ['Min ' + x for x in df_interval_min.columns]

        df_interval = pd.concat([df_interval_mean, df_interval_max, df_interval_min], axis=1)
        columns = [*items, *df_interval_max.columns, *df_interval_min.columns]
        # Use the `target` argument rather than a hard-coded column name
        c = df_interval[columns].corrwith(df_interval[target])
        correlation_df[interval] = c

    return correlation_df

In [14]:
intervals = ['6h', '12h', '24h']
get_correlations(df, ['Temperature (°c)', 'Cloud (%)', 'Pressure (mb)', 'Wind (km/h)', 'Gust (km/h)'], intervals, target='Next Rain (mm)')

| | 6h | 12h | 24h |
|---|---|---|---|
| Temperature (°c) | -0.152714 | -0.238194 | -0.371834 |
| Cloud (%) | 0.296551 | 0.33615 | 0.408413 |
| Pressure (mb) | 0.135821 | 0.165649 | 0.225104 |
| Wind (km/h) | 0.025435 | 0.042858 | 0.025664 |
| Gust (km/h) | 0.098773 | 0.129914 | 0.128906 |
| Max Temperature (°c) | -0.158031 | -0.238375 | -0.415304 |
| Max Cloud (%) | 0.266011 | 0.256155 | 0.266289 |
| Max Pressure (mb) | 0.138231 | 0.157799 | 0.180939 |
| Max Wind (km/h) | 0.034584 | 0.04106 | -0.069478 |
| Max Gust (km/h) | 0.110415 | 0.149009 | 0.138248 |
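
To see which features stand out, the correlations can be ranked by absolute value, since a strong negative correlation is as informative as a strong positive one. This is a small sketch that copies the 24h column from the output above into a Series; `corr_24h` and `ranked` are hypothetical names used only here.

```python
import pandas as pd

# 24h correlations copied from the output above.
corr_24h = pd.Series(
    {
        "Temperature (°c)": -0.371834,
        "Cloud (%)": 0.408413,
        "Pressure (mb)": 0.225104,
        "Wind (km/h)": 0.025664,
        "Gust (km/h)": 0.128906,
        "Max Temperature (°c)": -0.415304,
        "Max Cloud (%)": 0.266289,
        "Max Pressure (mb)": 0.180939,
        "Max Wind (km/h)": -0.069478,
        "Max Gust (km/h)": 0.138248,
    }
)

# Order by magnitude while keeping the original signs.
ranked = corr_24h.reindex(corr_24h.abs().sort_values(ascending=False).index)
print(ranked)
```

At the 24h interval, maximum temperature, mean cloud cover, and mean temperature come out on top, while the wind speed features sit near zero.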


In [15]:
years = df.index.year.unique()

In [None]:
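# One subplot per year; squeeze=False keeps `axes` two-dimensional even for a single year
fig, axes = plt.subplots(len(years), figsize=(10, 6 * len(years)), squeeze=False)
for ax, year in zip(axes.ravel(), years):
    year_df = df[df.index.year == year]
    # Sum the readings per month so each bar shows the monthly total
    # instead of many overlapping bars at the same x position
    monthly_rain = year_df['Rain (mm)'].groupby(year_df.index.month).sum()
    ax.bar(monthly_rain.index, monthly_rain)
    ax.set_title(f'Rainfall in {year}')
    ax.set_xlabel('Month')
    ax.set_ylabel('Rainfall (mm)')

fig.tight_layout()
plt.show()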