# Urban Flood Risk

**Problem Statement:**  
Urban flooding poses significant risks to infrastructure, public safety, and economic stability. This project focuses on identifying flood-prone areas within cities, analyzing the underlying causes, and providing actionable insights to support urban planning and disaster management efforts.

**Description:**  
This project uses a synthetic dataset of micro-areas (“segments”) in cities around the world to assess urban pluvial (rainfall-driven) flood risk. Each record represents one spatial segment and carries its geographic coordinates, hydrologic context, drainage infrastructure characteristics, rainfall sources and intensities, and qualitative risk labels. By combining global elevation and land datasets, local and remote rainfall sources, and infrastructure proximity metrics, the project supports hotspot detection, risk scoring, model training, and operational monitoring for flood risk management.


In [None]:
%pip install pandas
%pip install matplotlib
%pip install seaborn
%pip install scikit-learn
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
%matplotlib inline

In [4]:
# Load the dataset (adjust the path to wherever your local copy lives)
df = pd.read_csv("D:/intern_dataset/archive/urban_pluvial_flood_risk_dataset.csv")
print(df.head())

  segment_id             city_name    admin_ward   latitude   longitude  \
0  SEG-00001    Colombo, Sri Lanka  Borough East   6.920633   79.912600   
1  SEG-00002        Chennai, India        Ward D  13.076487   80.281774   
2  SEG-00003      Ahmedabad, India     Sector 12  23.019473   72.638578   
3  SEG-00004      Hong Kong, China     Sector 14  22.302602  114.078673   
4  SEG-00005  Durban, South Africa      Sector 5 -29.887602   30.911008   

  catchment_id  elevation_m            dem_source       land_use soil_group  \
0      CAT-136          NaN  Copernicus_EEA-10_v5  Institutional        NaN   
1      CAT-049        -2.19  Copernicus_EEA-10_v5    Residential          D   
2      CAT-023        30.88             SRTM_3arc     Industrial          B   
3      CAT-168        24.28             SRTM_3arc    Residential          B   
4      CAT-171        35.70             SRTM_3arc     Industrial          C   

   drainage_density_km_per_km2  storm_drain_proximity_m storm_drain_type  
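
The preview above already shows `NaN` in `elevation_m` and `soil_group` for SEG-00001, so some imputation is likely to be needed before modelling. The sketch below shows one possible approach with `SimpleImputer` (imported earlier); it is not necessarily the strategy used later in this notebook. It assumes median filling for numeric columns and most-frequent filling for categorical ones. The `toy` frame is a stand-in: the `elevation_m` and `soil_group` values are copied from the preview, while the `drainage_density_km_per_km2` values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Stand-in data: elevation_m and soil_group mirror the preview rows,
# drainage_density_km_per_km2 values are made up for this example.
toy = pd.DataFrame({
    "elevation_m": [np.nan, -2.19, 30.88, 24.28, 35.70],
    "drainage_density_km_per_km2": [4.0, np.nan, 6.0, 5.0, 7.0],
    "soil_group": [np.nan, "D", "B", "B", "C"],
})

numeric_cols = ["elevation_m", "drainage_density_km_per_km2"]
categorical_cols = ["soil_group"]

# Assumption: median for numeric columns, most frequent value for categories.
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

imputed = toy.copy()
imputed[numeric_cols] = num_imputer.fit_transform(toy[numeric_cols])
# Cast to object so the imputer sees plain Python strings and np.nan.
imputed[categorical_cols] = cat_imputer.fit_transform(
    toy[categorical_cols].astype(object)
)

# Every column should now report zero missing values.
print(imputed.isnull().sum())
```

If the data is later split with `train_test_split`, fitting the imputers on the training portion only and then calling `transform` on the test portion avoids leaking test-set statistics into the model.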

In [5]:

print("\n1. DATASET INFO:")
print("-" * 20)
df.info()

print("\n\n2. STATISTICAL SUMMARY:")
print("-" * 20)
print(df.describe())

print("\n\n3. MISSING VALUES CHECK:")
print("-" * 20)
missing_values = df.isnull().sum()
print("Missing values per column:")
print(missing_values)



1. DATASET INFO:
--------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2963 entries, 0 to 2962
Data columns (total 17 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   segment_id                           2963 non-null   object 
 1   city_name                            2963 non-null   object 
 2   admin_ward                           2963 non-null   object 
 3   latitude                             2963 non-null   float64
 4   longitude                            2963 non-null   float64
 5   catchment_id                         2963 non-null   object 
 6   elevation_m                          2802 non-null   float64
 7   dem_source                           2963 non-null   object 
 8   land_use                             2963 non-null   object 
 9   soil_group                           2601 non-null   object 
 10  drainage_density_km_per_km2          2679 non-null   floa