Flood Risk Regression Model

A univariate flood risk model using logistic regression, trained on about 4,500 labelled random geo-coordinates in Malaysia.

A multivariate example is given here.

Table of Contents:

Objective
Methodology
- Data Preparation
Limitation

Objective

To predict flood risk at any location in Malaysia.
To determine the model that best describes flood risk in Malaysia.

Methodology

Ordinary Least Squares (OLS)
```
Y = b_0 + b_1X + e
```
Explanation of variables:
- Y is the flood risk. It can be a risk score or, when logistic regression is used, a binary outcome.
- X is the distance from the nearest historical flood location.
- b_0 (intercept) and b_1 (slope) are estimated from the data, and e is the error term. Generally:
  - The slope should be negative, since a greater distance from a historical flood location should reduce the flood risk.
  - A one-unit change in distance changes the predicted flood risk by b_1. In the logistic case, this change is in the log-odds of flooding.
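To make the role of the intercept and slope concrete, here is a minimal sketch of a one-feature logistic regression fitted by plain gradient descent. It is not the repository's code: the data is synthetic, and the names `fit_logistic`, `distance_km` and `flooded`, the learning rate and the "true" coefficients used to simulate labels are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, lr=0.1, epochs=2000):
    """Fit P(Y=1) = sigmoid(b_0 + b_1 * x) by batch gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # gradient of the log-loss w.r.t. the linear term
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Synthetic data (an assumption, not the project's dataset):
# flooding is simulated to be less likely the farther a point is from a flood site.
random.seed(0)
distance_km = [random.uniform(0.0, 5.0) for _ in range(300)]
flooded = [1 if random.random() < sigmoid(2.0 - 1.5 * d) else 0 for d in distance_km]

b0, b1 = fit_logistic(distance_km, flooded)
risk_near = sigmoid(b0 + b1 * 0.2)  # predicted risk 0.2 km from a flood site
risk_far = sigmoid(b0 + b1 * 4.0)   # predicted risk 4 km from a flood site

print(f"b_0 = {b0:.2f}, b_1 = {b1:.2f}")
print(f"risk at 0.2 km = {risk_near:.2f}, risk at 4 km = {risk_far:.2f}")
```

On data like this the fitted slope comes out negative, so the predicted risk falls as distance grows. In practice a library such as scikit-learn or statsmodels would likely replace the hand-written loop.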
Some alternatives to the linear/logistic regression approach:
- XGBoost
- Decision Tree

Data Preparation

To build a regression model for flood risk, we need both a predictor variable, x, and a response variable, y.

The predictor variable is collected as follows:
- Random addresses throughout Malaysia were scraped from a random address generator using the Python Selenium package, then cleaned.
- The addresses were then geocoded to obtain their corresponding longitude and latitude.
- The distance from a historical flood location is computed as distance = | npoint - point |, where npoint is the nearest historical flood point and point is the random address point.
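The distance step above can be sketched as follows. The source does not say which distance metric was used, so the haversine (great-circle) formula here is an assumption, as are the function names and the sample coordinates.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_flood_distance_m(point, flood_points):
    """Distance in metres from `point` to the closest historical flood point."""
    lat, lon = point
    return min(haversine_m(lat, lon, flat, flon) for flat, flon in flood_points)


# Hypothetical coordinates as (latitude, longitude); not taken from the dataset.
flood_points = [(3.1390, 101.6869), (5.4141, 100.3288)]
address_point = (3.1500, 101.7000)

d = nearest_flood_distance_m(address_point, flood_points)
print(f"distance to nearest historical flood point: {d:.0f} m")
```

For about 4,500 addresses a brute-force minimum like this is usually fast enough. A spatial index would only matter for much larger point sets.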
The response variable is collected as follows:
- The response variable is binary: 0 means the location has not been flooded to date, and 1 means it has been flooded before.
- A radius of about 500 m is drawn around each historical flood point and checked for intersection with the address points. A point that falls within a radius is assigned 1; otherwise it is assigned 0.
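The labelling rule above can be sketched as a threshold on the distance to the nearest historical flood point, since a point falls inside at least one 500 m circle exactly when its nearest flood point is within 500 m. The function name, the sample distances and the choice to count a point exactly on the boundary as flooded are assumptions.

```python
def label_flooded(nearest_distance_m, radius_m=500.0):
    """Return 1 if the point lies within `radius_m` of its nearest historical flood point, else 0."""
    return 1 if nearest_distance_m <= radius_m else 0


# Hypothetical nearest-flood-point distances in metres.
distances_m = [120.0, 499.9, 500.0, 1900.7, 25000.0]
labels = [label_flooded(d) for d in distances_m]

print(labels)  # [1, 1, 1, 0, 0]
```

Under this reading, the label is a direct function of the same distance used as the predictor.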

Limitation

This is a simple model with a single feature, "distance to historical flood site", so it may not be fully reliable. The model's high accuracy is perhaps due to the data labelling process, in which the flood coverage of every historical flood event was assumed to be the same.

More features and a more robust data collection plan are needed to make the predictions sounder and more trustworthy.

