NLP Based Project in Collaboration w/ Code For America's Get Cal-Fresh Program

Code for America Get Calfresh NLP Project

This project is a part of the Data Science Working Group at Code for San Francisco. Other DSWG projects can be found at the main GitHub repo.

-- Project Status: [Active]

Project Intro/Objective

The purpose of this project is to help Code for America process and analyze free-text responses from Get Calfresh applications to better understand the circumstances in which people apply to the program. The goals of the project are to:

  1. Spellcheck the text for better downstream processing and analysis
  2. Remove personally identifiable information (PII) from the text
  3. Complete exploratory data analysis and topic modelling of the text
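The project's own custom spellchecker lives in its data processing code. To illustrate the general idea of goal 1 only, here is a hypothetical, minimal Norvig-style corrector: a word that is already known is kept, and otherwise the most frequent known word one edit away is chosen. The tiny `CORPUS` vocabulary and all function names are assumptions made for this sketch, not the repository's implementation.

```python
import re
from collections import Counter

# Hypothetical vocabulary source. A real corrector would build this from a
# large, clean corpus rather than from the sensitive application text.
CORPUS = """
i lost my job and need help buying food for my family
my hours were cut and rent went up so we need food assistance
"""

WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word: str) -> str:
    """Return a known word (lowercased) or the most frequent known word one edit away."""
    lower = word.lower()
    if lower in WORD_COUNTS:
        return lower
    candidates = [w for w in edits1(lower) if w in WORD_COUNTS]
    if not candidates:
        return word  # leave unrecognized tokens (e.g. names) unchanged
    return max(candidates, key=WORD_COUNTS.__getitem__)


def correct_text(text: str) -> str:
    """Spellcheck every alphabetic token, keeping the surrounding punctuation and spacing."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group()), text)
```

For example, `correct_text("I lsot my jbo")` gives `"i lost my job"`, and `correct_word("fod")` picks `"food"` over `"for"` because it is more frequent in the vocabulary. Leaving unrecognized tokens untouched is a deliberate choice here: "correcting" a surname into a dictionary word would make it harder for a later PII-removal step to find it.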


Methods Used

  • Data Pipelines
  • Natural Language Processing
  • Object Oriented Programming
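As a rough illustration of how data pipelines and object-oriented programming can fit together for this kind of text processing, here is a hypothetical pipeline class that applies named cleaning steps in a fixed order. The class name, method names, and example steps are invented for this sketch and are not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import Callable

# A step takes one response string and returns the transformed string.
TextStep = Callable[[str], str]


@dataclass
class TextPipeline:
    """Apply a sequence of named text-processing steps, in the order they were added."""
    steps: list[tuple[str, TextStep]] = field(default_factory=list)

    def add_step(self, name: str, func: TextStep) -> "TextPipeline":
        self.steps.append((name, func))
        return self  # return self so calls can be chained

    def run(self, text: str) -> str:
        for _name, func in self.steps:
            text = func(text)
        return text

    def run_all(self, texts: list[str]) -> list[str]:
        return [self.run(t) for t in texts]


# Hypothetical example steps: collapse runs of whitespace, then lowercase.
pipeline = (
    TextPipeline()
    .add_step("normalize_whitespace", lambda s: " ".join(s.split()))
    .add_step("lowercase", str.lower)
)
```

Here `pipeline.run("  Need   HELP with food ")` returns `"need help with food"`. Keeping each step as a separate, named function makes the ordering explicit, which matters for this project because spelling is corrected before personal information is removed.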


Technologies

  • Python
  • Jupyter Notebook
  • NLTK and other NLP libraries

Project Description

About Calfresh

19.4% of Californians did not have enough resources to meet basic needs in 2016 (source: the Economist). One of the initiatives supervised by the state of California to help those in need is CalFresh, California's name for the federal Supplemental Nutrition Assistance Program (SNAP). CalFresh provides monthly food benefits that help low-income households buy the food they need to maintain adequate nutrition. These benefits are issued on an Electronic Benefit Transfer (EBT) card, which looks like an ordinary credit card. Because the application process can be confusing and difficult to navigate, Code for America's GetCalFresh program works to ensure that everyone can access food assistance benefits.

Partnership with DSWG

Code for America wants to use the rich information in the free-response portion of Get Calfresh applications to better understand the sentiments and circumstances behind why people apply. The hope is that this will help educate the public and break down common stereotypes and stigmas associated with food stamp program recipients, and that it may also encourage others to apply. The DSWG is helping CFA process the text data so that spelling errors are corrected first, which allows personal information to be removed more effectively and aids downstream analysis. We also plan to use machine learning and NLP methods such as topic modelling to help classify and quantify the circumstances described in the responses.
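To make the PII-removal step concrete, here is a hypothetical, regex-only redaction sketch for a few common US-style identifiers (email addresses, phone numbers, and SSN-formatted numbers). The patterns and placeholder labels are assumptions for illustration; they are not the project's method, they miss many real-world formats, and they cannot catch names or addresses, which generally need dictionary lookups or named-entity recognition as well.

```python
import re

# Hypothetical patterns for a few US-style identifiers. Illustrative only:
# they will miss many formats and do not detect names at all.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched identifier with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `redact_pii("Call me at (415) 555-0123 or email jane.doe@example.com")` returns `"Call me at [PHONE] or email [EMAIL]"`. Replacing matches with typed labels rather than deleting them keeps the sentence structure intact for later topic modelling.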

Needs of this project

  • NLP processing pipelines
  • Spellchecker improvements
  • Data exploration/descriptive statistics
  • Topic modelling
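For the data exploration and descriptive statistics work, a first pass might look like the hypothetical sketch below: it tokenizes each response, reports response-length statistics, and counts the most common non-stop-words. The small `STOP_WORDS` set is an assumption for this example; NLTK's stopwords corpus is a common, fuller alternative once it has been downloaded.

```python
import re
from collections import Counter
from statistics import mean, median

# A tiny, hypothetical stop-word list used only for this sketch.
STOP_WORDS = {"a", "and", "for", "i", "my", "of", "the", "to", "we", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens, allowing one internal apostrophe (e.g. "can't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def describe_responses(responses: list[str], top_n: int = 5) -> dict:
    """Return simple descriptive statistics for a non-empty list of responses."""
    if not responses:
        raise ValueError("responses must not be empty")
    token_lists = [tokenize(r) for r in responses]
    lengths = [len(tokens) for tokens in token_lists]
    word_counts = Counter(
        tok for tokens in token_lists for tok in tokens if tok not in STOP_WORDS
    )
    return {
        "n_responses": len(responses),
        "mean_tokens": mean(lengths),
        "median_tokens": median(lengths),
        # Ties keep first-seen order, as documented for Counter.most_common.
        "top_words": word_counts.most_common(top_n),
    }
```

On three made-up responses ("I lost my job and need food for my kids", "Rent went up and we need help with food", "My hours were cut"), `describe_responses(..., top_n=2)` reports a median of 9 tokens and top words `[("need", 2), ("food", 2)]`. Word counts like these are a cheap sanity check before committing to a topic model.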

Getting Started

  1. Clone this repo (for help see this tutorial).
  2. The Data for this project contains sensitive information. Please reach out to the leads for access.
  3. Data processing code (including our custom spellchecker) can be accessed here.

Featured Notebooks/Analysis/Deliverables

Contributing DSWG Members

Team Leads (Contacts): Rocio Ng (@rocio)

Other Members:

Name            Slack Handle
Ian Colrick     @icorlick
Melodie Belot   @Melodie


  • If you haven't joined the SF Brigade Slack, you can do that here.
  • Our slack channel is #datasci-calfresh
  • Feel free to contact team leads with any questions or if you are interested in contributing!