This project aims to use analytics of search text to improve the user experience at https://data.sfgov.org/
0-archive
Figs
data
helper_modules
processed_search_term_data
NER_people.py
NER_people_list.p
NER_people_location.py
NER_spacy.py
README.md
corrected_spellng_errors.csv
data_combiner.R
location_count.csv
notes.md
open_search_exploration.Rmd
open_search_exploration.md
people_count.csv
search_data_processing.ipynb
search_data_processing_baolin.ipynb
search_data_processing_baolin_v2.ipynb
search_terms.py
spell_corrector.py
street_names_clean.csv
threshold.py
usage_writeup.Rmd
usage_writeup.md
word2vec_modelling.ipynb
word_count_sorted.p
word_list.txt

Open Data Search

This project is a part of the Data Science Working Group at Code for San Francisco. Other DSWG projects can be found at the main GitHub repo.

-- Project Status: Active

Project Intro/Objective

The purpose of this project is to use analytics and topic modelling of search text to improve the user experience at https://data.sfgov.org/

Partner

Methods Used

  • Data Analysis
  • Descriptive Statistics/Data Visualization
  • Natural Language Processing
  • Word2Vec Modelling

Technologies

  • R
  • Python
    • Pandas, Spacy

Project Description

The major goals of the project are as follows:

  • Clean and process search terms, then categorize them by quality
  • Apply Natural Language Processing and Topic Modelling to valid search terms, and cluster the terms to gauge potential demand for data sources
  • Provide actionable insights to improve search functionality on the site
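The first goal above (cleaning search terms and sorting them by quality) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the quality categories (`empty`, `no_letters`, `too_short`, `valid`) are hypothetical and are not taken from the project's scripts or notebooks.

```python
import re
from collections import Counter


def clean_term(raw):
    """Lowercase, trim, and collapse internal whitespace in a raw search term."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def categorize_term(term):
    """Assign a cleaned term to a hypothetical quality category."""
    if not term:
        return "empty"
    if not re.search(r"[a-z]", term):
        return "no_letters"  # e.g. purely numeric or punctuation-only input
    if len(term) < 3:
        return "too_short"
    return "valid"


def summarize(raw_terms):
    """Return (counts of valid cleaned terms, counts per quality category)."""
    cleaned = [clean_term(t) for t in raw_terms]
    quality_counts = Counter(categorize_term(t) for t in cleaned)
    valid_counts = Counter(t for t in cleaned if categorize_term(t) == "valid")
    return valid_counts, quality_counts


if __name__ == "__main__":
    valid, quality = summarize(["  Police  Reports ", "police reports", "12345", "", "ab"])
    print(valid.most_common())
    print(dict(quality))
```

Counting only the terms labelled `valid` keeps noise out of any later clustering step; the thresholds here are placeholders and would need tuning against the real search logs.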

Needs of this project

  • NLP/Topic Modelling Expertise

Getting Started

  1. Clone this repo (for help, see this tutorial).

  2. The data is kept here

  3. The data processing/transformation script is Data Combiner

    • Script that combines raw data from .tsv files into a single .csv file
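As a rough illustration of what combining raw .tsv files into a single .csv file involves, here is a minimal Python sketch using only the standard library. It is not the project's Data Combiner script. The function name `combine_tsv_files` is hypothetical, and the sketch assumes every input file shares the same header row, which the README does not state.

```python
import csv
from pathlib import Path


def combine_tsv_files(input_dir, output_csv):
    """Concatenate every .tsv file in input_dir into one .csv file.

    Assumes all files share an identical header row (an assumption made
    for this sketch). Returns the number of data rows written.
    """
    tsv_paths = sorted(Path(input_dir).glob("*.tsv"))
    header = None
    rows_written = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in tsv_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # Write the header once, taken from the first non-empty file.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path.name}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Using the `csv` module rather than plain string replacement means values that contain commas are quoted correctly in the output.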

Featured Notebooks/Analysis

Contributing DSWG Members

Team Lead (Contact): Rocio S Ng (@Rocio)

Other Members:

Name             Slack Handle
Bao Lin Liu      @jbaolinliu
Scott Brenstuhl  @scott_brenstuhl

Contact

  • If you haven't joined the SF Brigade Slack, you can do that here.
  • Our Slack channel is #datasci-open-data_src
  • Feel free to contact team leads with any questions or if you are interested in contributing!