Project Title:

Flood Prediction and Climate Risk Analysis

Problem Statement:
Floods are one of the most frequent and devastating climate disasters, causing human casualties, economic damage, and environmental loss. Predicting flood occurrence (Yes/No) using factors such as rainfall, temperature, humidity, and water level can help improve disaster preparedness, risk management, and climate resilience.

Description:
This project explores flood risk patterns in a dataset of 50,000 records and 21 columns. The columns visible in the output below are integer-valued risk factors such as MonsoonIntensity, TopographyDrainage, Deforestation, and Urbanization, rather than raw measurements of rainfall, temperature, humidity, or water level. Exploratory data analysis (EDA) helps identify how these factors relate to flood occurrence. The insights can guide disaster management authorities in designing early warning systems, allocating resources, and making policy decisions. The dataset can later be used for AI/ML modeling.
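As a rough illustration of the kind of EDA described above, the sketch below compares factor scores between flood and non-flood records. It runs on a tiny made-up sample, not the project data, and the label column `FloodOccurred` is a hypothetical name chosen for the example; the real dataset's target column may be named and encoded differently.

```python
import pandas as pd

# Tiny synthetic sample for illustration only; these values are made up.
# "FloodOccurred" is a hypothetical Yes/No label, not a confirmed column name.
sample = pd.DataFrame({
    "MonsoonIntensity": [3, 8, 3, 4, 9, 7],
    "Deforestation": [6, 7, 1, 7, 8, 2],
    "FloodOccurred": ["No", "Yes", "No", "No", "Yes", "Yes"],
})
factors = ["MonsoonIntensity", "Deforestation"]

# Mean factor score within each label group: a first look at which
# factors tend to be higher when a flood is recorded.
group_means = sample.groupby("FloodOccurred")[factors].mean()
print(group_means)

# Correlation of each factor with a 0/1 encoding of the label.
encoded = sample.assign(FloodFlag=(sample["FloodOccurred"] == "Yes").astype(int))
correlations = encoded[factors].corrwith(encoded["FloodFlag"])
print(correlations)
```

Group means and correlations only show association in the sample; they do not establish that a factor causes flooding.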

In [6]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# ML-related imports (future scope, not required for Week 1)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import joblib

In [14]:
# load the dataset
df = pd.read_csv('fd.csv/flood.csv')
# display first few rows
print(df.head())


   MonsoonIntensity  TopographyDrainage  RiverManagement  Deforestation  \
0                 3                   8                6              6   
1                 8                   4                5              7   
2                 3                  10                4              1   
3                 4                   4                2              7   
4                 3                   7                5              2   

   Urbanization  ClimateChange  DamsQuality  Siltation  AgriculturalPractices  \
0             4              4            6          2                      3   
1             7              9            1          5                      5   
2             7              5            4          7                      4   
3             3              4            1          4                      6   
4             5              8            5          2                      7   

   Encroachments  ...  DrainageSystems  CoastalVulnerability  
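With 21 columns, `print(df.head())` wraps and elides columns, as the output above shows. One possible way to see every column is sketched below, using a stand-in DataFrame with placeholder column names (`factor_0`, `factor_1`, ...) in place of the real data.

```python
import pandas as pd

# Stand-in frame with 21 columns; the column names are placeholders.
wide = pd.DataFrame({f"factor_{i}": range(5) for i in range(21)})

# Option 1: temporarily lift the column limit and widen the line width,
# so no columns are replaced by "..." inside this block.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    print(wide.head())

# Option 2: transpose so each column becomes a row, which is often
# easier to read for wide data.
transposed = wide.head().T
print(transposed)
```

In a notebook, ending a cell with `df.head()` instead of `print(df.head())` renders a scrollable HTML table, which may be enough on its own.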

In [17]:
# Basic info about the dataset (df.info() prints its report and returns None)
df.info()

# Statistical summary (only for numerical columns)
print(df.describe())

# Checking missing values
print(df.isnull().sum())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 21 columns):
MonsoonIntensity                   50000 non-null int64
TopographyDrainage                 50000 non-null int64
RiverManagement                    50000 non-null int64
Deforestation                      50000 non-null int64
Urbanization                       50000 non-null int64
ClimateChange                      50000 non-null int64
DamsQuality                        50000 non-null int64
Siltation                          50000 non-null int64
AgriculturalPractices              50000 non-null int64
Encroachments                      50000 non-null int64
IneffectiveDisasterPreparedness    50000 non-null int64
DrainageSystems                    50000 non-null int64
CoastalVulnerability               50000 non-null int64
Landslides                         50000 non-null int64
Watersheds                         50000 non-null int64
DeterioratingInfrastructure        50000 non-null i