Project Title:
Flood Prediction and Early Warning System for Gujarat's Urban Areas

Problem Statement: Gujarat experiences severe flooding during monsoons, amplified by climate change, leading to loss of life, displacement, and economic damage. For instance, urban centers like Ahmedabad and Vadodara face risks from river overflows and heavy rainfall, but current warning systems often lack real-time precision, delaying evacuations and resource allocation.

Description: This project uses the "Flood Risk Prediction Dataset in India", a comprehensive dataset designed to support the development and evaluation of predictive models for flood risk across regions of India; here it is restricted to Gujarat. The dataset covers meteorological, geographical, hydrological, socio-economic, and historical flood features. These features help explain the factors that contribute to flood occurrence and are the inputs for building accurate prediction models.

Load Libraries

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import joblib

In [6]:
# Load augmented dataset
df = pd.read_csv('flood_risk_dataset_india.csv')

In [7]:
# Display the first few rows
print(df.head())

    Latitude  Longitude  Rainfall (mm)  Temperature (°C)  Humidity (%)  \
0  18.861663  78.835584     218.999493         34.144337     43.912963   
1  35.570715  77.654451      55.353599         28.778774     27.585422   
2  29.227824  73.108463     103.991908         43.934956     30.108738   
3  25.361096  85.610733     198.984191         21.569354     34.453690   
4  12.524541  81.822101     144.626803         32.635692     36.292267   

   River Discharge (m³/s)  Water Level (m)  Elevation (m)    Land Cover  \
0             4236.182888         7.415552     377.465433    Water Body   
1             2472.585219         8.811019    7330.608875        Forest   
2              977.328053         4.631799    2205.873488  Agricultural   
3             3683.208933         2.891787    2512.277800        Desert   
4             2093.390678         3.188466    2001.818223  Agricultural   

  Soil Type  Population Density  Infrastructure  Historical Floods  \
0      Clay         7276.742184   

Basic info

Data Preprocessing

In [None]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 11 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Date                          365 non-null    object 
 1   Cumulative_Rainfall_3d_mm     365 non-null    float64
 2   Soil_Moisture_percent         365 non-null    float64
 3   Urban_Density_people_per_km2  365 non-null    int64  
 4   Drainage_Capacity_mm_per_day  365 non-null    float64
 5   Imperviousness_Factor         365 non-null    float64
 6   Flood_Probability             365 non-null    float64
 7   Flood_Label                   365 non-null    int64  
 8   Rainfall_mm                   365 non-null    float64
 9   River_Level_m                 365 non-null    float64
 10  Temperature_C                 365 non-null    float64
dtypes: float64(8), int64(2), object(1)
memory usage: 31.5+ KB
None


In [18]:
print(df.isnull().sum())

Date                            0
Cumulative_Rainfall_3d_mm       0
Soil_Moisture_percent           0
Urban_Density_people_per_km2    0
Drainage_Capacity_mm_per_day    0
Imperviousness_Factor           0
Flood_Probability               0
Flood_Label                     0
Rainfall_mm                     0
River_Level_m                   0
Temperature_C                   0
dtype: int64


In [21]:
print(df.describe())

       Cumulative_Rainfall_3d_mm  Soil_Moisture_percent  \
count                 365.000000             365.000000   
mean                  175.721479              10.817589   
std                   285.507869               7.199194   
min                     1.120000               0.000000   
25%                     7.180000               5.410000   
50%                     9.660000               9.710000   
75%                   289.040000              15.150000   
max                  1280.220000              39.140000   

       Urban_Density_people_per_km2  Drainage_Capacity_mm_per_day  \
count                         365.0                    365.000000   
mean                        12000.0                     28.163616   
std                             0.0                      2.933012   
min                         12000.0                     20.090000   
25%                         12000.0                     26.130000   
50%                         12000.0                   

In [8]:
print(df.shape)

(10000, 14)


In [12]:
print(df.columns)

Index(['Latitude', 'Longitude', 'Rainfall (mm)', 'Temperature (°C)',
       'Humidity (%)', 'River Discharge (m³/s)', 'Water Level (m)',
       'Elevation (m)', 'Land Cover', 'Soil Type', 'Population Density',
       'Infrastructure', 'Historical Floods', 'Flood Occurred'],
      dtype='object')
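
The column listing above mixes numeric measurements with two text columns (`Land Cover`, `Soil Type`) and a binary target (`Flood Occurred`). A minimal sketch of how features and target might be separated before modelling is given here. It runs on a tiny hand-made frame with invented values rather than the real file, and the helper name `split_features_target` is hypothetical, not part of the notebook.

```python
import pandas as pd

def split_features_target(frame, target="Flood Occurred"):
    """Return (numeric feature names, non-numeric feature names, target series)."""
    features = frame.drop(columns=[target])
    numeric = features.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric is treated as categorical text here.
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical, frame[target]

# Toy rows with invented values; only the column names follow the dataset.
toy = pd.DataFrame({
    "Rainfall (mm)": [200.0, 50.0, 100.0],
    "Land Cover": ["Water Body", "Forest", "Agricultural"],
    "Soil Type": ["Clay", "Peat", "Loam"],
    "Flood Occurred": [1, 0, 1],
})

numeric_cols, categorical_cols, y = split_features_target(toy)
print(numeric_cols, categorical_cols)
```

The categorical names returned here are the columns that would need encoding (for example with the imported `LabelEncoder`) before a tree-based classifier can use them.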


Filter data for the state of Gujarat

In [9]:

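# Filter for Gujarat using an approximate latitude/longitude bounding box
gujarat_df = df[
    (df['Latitude'] >= 20.0) & (df['Latitude'] <= 24.7) &
    (df['Longitude'] >= 68.4) & (df['Longitude'] <= 74.4)
].copy()  # copy so later column assignments do not trigger SettingWithCopyWarning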
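
The filter above can be wrapped in a small reusable helper. The following is a sketch under stated assumptions: the function name `filter_bounding_box` and the `GUJARAT_BOUNDS` constant are hypothetical, and the toy coordinates are invented for illustration. The bounds themselves are the ones used in the cell above. A rectangle only approximates the state boundary, so some rows near the edges may belong to neighbouring regions.

```python
import pandas as pd

# Bounds in degrees, copied from the filter cell above.
GUJARAT_BOUNDS = {"lat": (20.0, 24.7), "lon": (68.4, 74.4)}

def filter_bounding_box(frame, bounds=GUJARAT_BOUNDS):
    """Return a copy of the rows whose coordinates fall inside the box (inclusive)."""
    lat_min, lat_max = bounds["lat"]
    lon_min, lon_max = bounds["lon"]
    mask = (
        frame["Latitude"].between(lat_min, lat_max)
        & frame["Longitude"].between(lon_min, lon_max)
    )
    return frame.loc[mask].copy()

# Toy points: one inside the box, one well outside, one exactly on the corner.
toy = pd.DataFrame({
    "Latitude": [22.3, 28.6, 20.0],
    "Longitude": [72.1, 77.2, 68.4],
})
inside = filter_bounding_box(toy)
print(inside.index.tolist())
```

`Series.between` is inclusive on both ends by default, which matches the `>=` and `<=` comparisons in the original filter.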

In [10]:
print("Filtered Gujarat data:", gujarat_df.shape)
gujarat_df.head()

Filtered Gujarat data: (336, 14)


      Latitude  Longitude  Rainfall (mm)  Temperature (°C)  Humidity (%)  \
18   20.526406  70.923409     179.721453         34.841212     20.712469   
39   20.764422  70.892198      13.577494         15.989522     56.491764   
41   22.360130  72.082343     225.614078         17.490126     77.849922   
136  23.044928  71.457629      75.527640         42.226979     37.207684   
153  22.194130  69.540498     189.906366         20.805518     74.021856   

     River Discharge (m³/s)  Water Level (m)  Elevation (m)  Land Cover  Soil Type  \
18              3038.203913         4.714103    5106.693174      Forest       Peat   
39              2240.941880         4.666739    7313.993796  Water Body       Peat   
41              3503.323335         9.432808    3729.181832       Urban       Loam   
136             1009.717102         4.336522    6775.972468      Desert       Clay   
153             1771.197538         1.790897    8544.733938       Urban       Clay   

     Population Density  Infrastructure  Historical Floods  Flood Occurred  
18          1427.409995               0                  0               1  
39          3299.939146               1                  0               0  
41          6879.586521               1                  0               0  
136         1623.784864               0                  0               1  
153          442.296537               1                  1               1  


In [11]:
# Check missing data
print(gujarat_df.isnull().sum())

Latitude                  0
Longitude                 0
Rainfall (mm)             0
Temperature (°C)          0
Humidity (%)              0
River Discharge (m³/s)    0
Water Level (m)           0
Elevation (m)             0
Land Cover                0
Soil Type                 0
Population Density        0
Infrastructure            0
Historical Floods         0
Flood Occurred            0
dtype: int64
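
With no missing values in the Gujarat subset, a natural next check on a sample this small (336 rows) is how the target classes are distributed. The sketch below shows one way to compute that with pandas. The helper name `class_balance` is hypothetical and the toy labels are invented, so the printed fractions say nothing about the real data.

```python
import pandas as pd

def class_balance(frame, target="Flood Occurred"):
    """Share of rows per target class, as fractions that sum to 1."""
    return frame[target].value_counts(normalize=True).sort_index()

# Toy labels only: three non-flood rows and five flood rows.
toy = pd.DataFrame({"Flood Occurred": [1, 0, 0, 1, 1, 0, 1, 1]})
balance = class_balance(toy)
print(balance)
```

If the real subset turned out to be strongly imbalanced, passing `stratify=y` to `train_test_split` would keep the class proportions similar in the training and test sets.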
