Python_Open_Data

Python code for NYC Open Data Sources, tidy, features, charts

Script to generate chart of New York City ride-share major competitors market-share Uses SODA api to download updated NYC open data for taxi industry. (SODA doesn't work in latest version of Python 3.7)
Pre-processing of strings fields to date objects, integers, for grouping operations. Wrote function to tidy company names and multiple names for Uber, Lyft, VIA, Gett Aggregated volume by month and presented in graph
Script finding best Predictive Features of Restaurant Closing using NYC Open Data, Restaurant Violations
Tidy the data, and learn if certain types of restaurants and violations, had a high correlation to closure, why?
Classification logistic regression, which types of food (cuisine type) and violation types, may be leading indicators of restaurant closings (for health violations).
Preprocessed string fields, to time-objects, time-deltas, created dummies for cuisine and violation types,
Used k-best algorithm to find the most informative features.

NYC TAP Files

Review of NYC open data related to metals concentration in NYC tap water.

Coded in Python, primarily utilizing Pandas, Matplotlib and Basemap
Intensity of map color parallels zip code areas relative concentration level (charts maps the copper concentrations in the 1-2 minute draw).
Data cleansing was performed on NYC open data, i.e. fixing data types, missing valuesnaming inconsistencies
Geocoding (latitiude and longitude) data was appended to the NYC dataset to facilitate the heat-map chart

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
Copper_Lead_NYC_TAP.pdf		Copper_Lead_NYC_TAP.pdf
Copper_NYC_Tap.png		Copper_NYC_Tap.png
Copper_in_NYC_Tap		Copper_in_NYC_Tap
NYC_Restaurant_Closings_Insights.py		NYC_Restaurant_Closings_Insights.py
NYC_Restaurant_Violations.py		NYC_Restaurant_Violations.py
NYC_Ride_Share_Competition.png		NYC_Ride_Share_Competition.png
README.md		README.md
taxi_git.py		taxi_git.py