Finding Donors for CharityML

Exercise evaluating classifiers in `scikit-learn`

Example screen:

This project is Project 1 of the Udacity Data Scientist Nanodegree program.

I use several classification algorithms included in the scikit-learn package to model person-level income, using data collected from the 1994 U.S. Census, and predict whether an individual makes more than $50,000 per year.

First, I obtain preliminary results from a small set of algorithms. Second, I choose the best candidate algorithm and further optimize it to improve performance.

Main files in the repository:

census.csv: 1994 U.S. Census data; 14 features for 45,222 individuals. Barry Becker extracted the data from the Census database, keeping a set of reasonably clean records that satisfy the condition ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)).
find_donors-sklearn.ipynb: Jupyter notebook including main Python code.
visuals.py: Plotting functions.
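
The extraction condition listed for census.csv can be read as a simple record filter. The sketch below is illustrative only: the dictionary layout and the sample values are assumptions made for this example, and only the variable names (AAGE, AGI, AFNLWGT, HRSWK) and thresholds come from the condition itself.

```python
# Hypothetical record layout: one dict per person, keyed by the variable
# names that appear in the documented extraction condition.
def is_clean_record(record):
    """Return True when a record satisfies the extraction condition."""
    return (
        record["AAGE"] > 16
        and record["AGI"] > 100
        and record["AFNLWGT"] > 1
        and record["HRSWK"] > 0
    )


# Made-up sample values, used only to exercise the filter.
sample_records = [
    {"AAGE": 39, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},  # passes
    {"AAGE": 15, "AGI": 500, "AFNLWGT": 1200, "HRSWK": 10},     # fails AAGE > 16
    {"AAGE": 52, "AGI": 30000, "AFNLWGT": 2000, "HRSWK": 0},    # fails HRSWK > 0
]

clean_records = [r for r in sample_records if is_clean_record(r)]
print(len(clean_records), "of", len(sample_records), "records kept")
```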

Economic or Business question

The goal of the project is to build a tool that predicts whether an individual makes more than $50,000 per year. This sort of task can arise in a non-profit setting, where organizations survive on donations and have limited resources for fund-raising. Estimating people's income can make a non-profit's fund-raising more cost-effective, for example by helping it decide whether to contact a potential donor at all.

Data Science motivation

In this project I use several supervised algorithms included in the scikit-learn package to model person-level income using data collected from the 1994 U.S. Census. First, I obtain preliminary results from a set of algorithms. Second, I choose the best candidate algorithm and further optimize it to best model the data.
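
The two-step workflow can be sketched roughly as follows. This is not the notebook's code: the synthetic data from `make_classification`, the three candidate models, the parameter grids, and the F-beta metric with `beta=0.5` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the census features and the binary income label.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: preliminary results from a small set of candidate algorithms.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = fbeta_score(y_test, model.predict(X_test), beta=0.5)

best_name = max(scores, key=scores.get)

# Step 2: tune the best candidate with a small, illustrative grid search.
param_grids = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "decision_tree": {"max_depth": [3, 5, None]},
    "random_forest": {"max_depth": [3, 5, None]},
}
search = GridSearchCV(
    candidates[best_name],
    param_grids[best_name],
    scoring=make_scorer(fbeta_score, beta=0.5),
    cv=3,
)
search.fit(X_train, y_train)
tuned_score = fbeta_score(y_test, search.predict(X_test), beta=0.5)

print("Best candidate:", best_name)
print("Best parameters:", search.best_params_)
```

Scoring on a held-out test split keeps the comparison between candidates separate from the cross-validation folds used during tuning.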

Usage Example

The Jupyter project strongly recommends that new users install Anaconda, since it conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

Use the following installation steps:

Download Anaconda.
Install the version of Anaconda that you downloaded, following the instructions on the download page.

To run the notebook:

jupyter notebook find_donors-sklearn.ipynb

Python version

3.7.1 (default, Oct 23 2018, 14:07:42)

Python libraries

The Jupyter Notebook file, find_donors-sklearn.ipynb, requires the following Python libraries:

IPython
matplotlib
numpy
pandas
sklearn (installed as the scikit-learn package)
sys (standard library)
time (standard library)
warnings (standard library)
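
Before opening the notebook, it can help to confirm that these libraries are importable in the active environment. The helper below is a minimal sketch and not part of the repository; the function name `missing_modules` is made up for this example.

```python
import importlib.util

# Module names as listed in this README.
REQUIRED_MODULES = [
    "IPython", "matplotlib", "numpy", "pandas",
    "sklearn", "sys", "time", "warnings",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```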

1994 Census data

census.csv: 1994 U.S. Census data; 14 features for 45,222 individuals.

Outcome

'income': >50K, <=50K.
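
Most scikit-learn classifiers expect a numeric target, so the two income labels are typically mapped to 0 and 1 before training. The sketch below assumes pandas and uses a handful of made-up label values; the choice of 1 for `>50K` is a convention picked for this example.

```python
import pandas as pd

# Made-up sample of the 'income' column, for illustration only.
income_raw = pd.Series([">50K", "<=50K", "<=50K", ">50K"], name="income")

# Encode '>50K' as 1 and '<=50K' as 0.
income = (income_raw == ">50K").astype(int)
print(income.tolist())
```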

Attribute information

'age': continuous.
'workclass': Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
'education': Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
'education-num': continuous.
'marital-status': Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
'occupation': Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
'relationship': Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
'race': White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
'sex': Female, Male.
'capital-gain': continuous.
'capital-loss': continuous.
'hours-per-week': continuous.
'native-country': United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
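
The attributes mix categorical and continuous columns, so some encoding is usually needed before fitting a classifier. One common approach, shown here as an assumption rather than as the notebook's actual preprocessing, is to one-hot encode the categorical columns with `pandas.get_dummies` and min-max scale the continuous ones. The three sample rows are made up for illustration.

```python
import pandas as pd

# Made-up rows using a subset of the attributes listed above.
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "sex": ["Male", "Male", "Female"],
    "hours-per-week": [40, 13, 40],
})
categorical = ["workclass", "sex"]
continuous = ["age", "hours-per-week"]

# One-hot encode each categorical column into indicator columns.
encoded = pd.get_dummies(sample, columns=categorical)

# Min-max scale each continuous column to the range [0, 1].
for col in continuous:
    col_min, col_max = encoded[col].min(), encoded[col].max()
    encoded[col] = (encoded[col] - col_min) / (col_max - col_min)

print(encoded.columns.tolist())
```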

Acknowledgments

Udacity: Data Scientist Nanodegree program.
Jupyter Documentation: Installing Jupyter Notebook

Author

Juan Carlos Lopez

Contributing

Fork it (https://github.com/jclh/finding-donors-classifier/fork)
Create your feature branch (git checkout -b feature/fooBar)
Commit your changes (git commit -am 'Add some fooBar')
Push to the branch (git push origin feature/fooBar)
Create a new Pull Request
