Shoa Hung Lin, Chendong Cai
Determining appropriate poverty reduction strategies is hard, because it requires measuring poverty in the first place. The dataset, provided by the World Bank, comes from in-depth household surveys conducted with a subset of the country's population. To measure poverty, most of these surveys collect detailed data on household consumption in order to get a clearer picture of a household's poverty status.
The aim of this project is to build a model that accurately predicts poverty for a specific country, using techniques such as data preprocessing, logistic regression, gradient descent, cross-validation, and regularization.
- Build a logistic regression model with gradient descent
- Compare our model to the scikit-learn package
- Introduce regularization terms (L1 and L2) and compare the results
- Compare results with different poverty prediction probability thresholds
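
The steps listed above can be illustrated with a minimal sketch. This is not the notebook's code: the function names (`fit_logistic`, `predict_proba`, `predict`), the hyperparameter values, and the toy data are all assumptions made for illustration. It fits a logistic regression by batch gradient descent, with an optional L2 penalty, and then turns predicted probabilities into labels at a chosen threshold. An L1 penalty would need a subgradient or proximal step and is left out here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=0.0):
    """Fit weights and bias by batch gradient descent on the mean log loss.

    X is a list of feature rows and y a list of 0/1 labels.
    l2 is the strength of an optional penalty 0.5 * l2 * ||w||^2
    (the bias is not penalized). All defaults are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: p(x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient step; the L2 term shrinks each weight toward zero
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def predict(X, w, b, threshold=0.5):
    """Label a row 1 when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in predict_proba(X, w, b)]


# Toy data, made up for illustration: one feature, label 1 for positive values.
X_toy = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

w_plain, b_plain = fit_logistic(X_toy, y_toy)
w_ridge, b_ridge = fit_logistic(X_toy, y_toy, l2=0.5)
```

On this toy data the penalized fit ends with a smaller weight than the unpenalized one, and raising `threshold` in `predict` can only reduce the number of rows labeled 1. The same fit could be checked against scikit-learn's `LogisticRegression`, keeping in mind that its `C` parameter is an inverse regularization strength.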
The fitting code is written in Python and is demonstrated in the file STAT689_Poverty Prediction.ipynb.
We use the log loss function to evaluate our results: https://en.wikipedia.org/wiki/Loss_functions_for_classification
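
As a rough illustration of the metric, the sketch below computes the mean binary log loss, -(1/n) Σ [y log p + (1 - y) log(1 - p)]. The function name `log_loss`, the clipping constant `eps`, and the example values are assumptions for illustration and are not taken from the notebook.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uninformative = [0.5, 0.5, 0.5, 0.5]

loss_confident = log_loss(labels, confident)
loss_uninformative = log_loss(labels, uninformative)
```

Predicting 0.5 for every row gives a loss of log 2, so a useful model should score below that; confident predictions that turn out wrong are penalized heavily.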
DrivenData https://www.drivendata.org/competitions/50/worldbank-poverty-prediction/
Stanford cs229 http://cs229.stanford.edu/notes/cs229-notes1.pdf
Log loss function https://en.wikipedia.org/wiki/Loss_functions_for_classification
Gradient descent https://en.wikipedia.org/wiki/Gradient_descent