
Using Spark and other tools to analyze large, disparate data sources. Term group project for COMP119, Tufts F'19.


mrogove/NewHampshireOpioidDeepDive


Term Project for Tufts University's COMP119 Big Data course, Fall 2019

R API examples adapted from WaPo ARCOS stories.

ARCOS opioid data courtesy of The Washington Post. Social Vulnerability Index from the CDC. GIS shapefiles from the Census Bureau.

All Python/PySpark code and R code adaptation/troubleshooting by Michael Rogove. QGIS images produced by Megana Lakshmi Padmanabhan.

Michael Rogove | Megana Lakshmi Padmanabhan | Kevin Hederman

Project Highlights

Using R to reproduce work from WaPo's ARCOS API samples

At first, the prescription opioid crisis seems most pronounced in southern New Hampshire, but normalizing per capita throws a spotlight on northern, rural New Hampshire.
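The normalization step is simple but decisive. A minimal pandas sketch of the idea (the county names are real, but the pill counts and populations below are made-up illustrative figures, not values from the ARCOS data):

```python
import pandas as pd

# Hypothetical county totals for illustration only; the real figures
# come from the WaPo ARCOS API and Census population estimates.
counties = pd.DataFrame({
    "county": ["Hillsborough", "Coos"],
    "total_pills": [40_000_000, 5_000_000],  # raw shipped dosage units
    "population": [420_000, 32_000],
})

# Raw totals favor populous southern counties; dividing by population
# reverses the ranking and highlights rural northern counties.
counties["pills_per_capita"] = counties["total_pills"] / counties["population"]
print(counties.sort_values("pills_per_capita", ascending=False))
```

The same two-line transformation applies whether the aggregation is done in R against the API extracts or in PySpark against the raw ARCOS files.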

Using R to identify suspicious pharmacies; verifying results from raw data via PySpark

Clearly, something is fishy in Coos County, particularly in its Rite Aid pharmacies.
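One simple screen for "suspicious" is an order volume far above the statewide typical pharmacy. A hedged sketch in pandas (the PySpark verification in the notebook works on the raw data; the pharmacy names and pill counts here are hypothetical):

```python
import pandas as pd

# Illustrative pharmacy-level totals; not actual ARCOS figures.
pharmacies = pd.DataFrame({
    "pharmacy": ["Rite Aid Berlin", "CVS Nashua", "Walgreens Concord", "Indep. Keene"],
    "total_pills": [3_000_000, 900_000, 850_000, 300_000],
})

# Flag any pharmacy ordering more than three times the statewide median.
# The median is robust to the outliers we are trying to detect.
median = pharmacies["total_pills"].median()
pharmacies["suspicious"] = pharmacies["total_pills"] > 3 * median
```

Using the median rather than the mean as the baseline matters here: a single huge outlier drags the mean upward and can hide itself from a mean-based threshold.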

Draft Queries to Dive Deeper into Raw Opioid Data; Python to blend results with Social Vulnerability Index

Here, we show who the most suspicious pharmacy bought from (it doubled its McKesson orders in just 7 years and added a major oxycodone source from Eckerd). We can also see that the flood of prescription opioids correlates with rural, poorer counties with higher disability rates.
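The blending step is a join on the shared county key followed by a correlation check. A minimal pandas sketch (the county names are real, but every number and the `disability_rate` column name below are hypothetical stand-ins for the actual CDC SVI fields):

```python
import pandas as pd

# Hypothetical per-county figures for illustration; the real values come
# from the ARCOS per-capita computation and the CDC SVI dataset.
arcos = pd.DataFrame({
    "county": ["Coos", "Grafton", "Hillsborough", "Rockingham"],
    "pills_per_capita": [150.0, 95.0, 70.0, 60.0],
})
svi = pd.DataFrame({
    "county": ["Coos", "Grafton", "Hillsborough", "Rockingham"],
    "disability_rate": [0.19, 0.14, 0.10, 0.09],
})

# Blend the two sources on the shared county key, then measure how
# strongly shipment intensity tracks the vulnerability indicator.
blended = arcos.merge(svi, on="county")
r = blended["pills_per_capita"].corr(blended["disability_rate"])
```

With real data the join key needs care (county name casing, "County" suffixes, FIPS codes); joining on FIPS codes where both sources provide them avoids most of that.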


What files are here?

Five groups:

  1. Main Presentation (.pptx file).
  2. R analysis of API data.
  3. PySpark analysis of raw ARCOS data.
  4. SVI data and analysis.
  5. (Bonus: QGIS and related files)

1. Main Presentation

This is what we will mostly use during the final presentation. START HERE.

  • COMP119F19OpioidTermProject.pptx

2. R analysis of API data.

Jupyter notebook: "OpioidProjectNotebook"

You only need to view one of these files, in order of preference:

  1. ROpioidProjectNotebook.html
  2. ROpioidProjectNotebook.ipynb
  3. .pdf (just in case)

3. PySpark analysis of ARCOS data.

Jupyter notebook.

You only need to view one of these files, in order of preference:

  1. PySparkOpioidProjectNotebook.html
  2. PySparkOpioidProjectNotebook.ipynb
  3. .pdf (just in case)

4. SVI Data and Analysis

Jupyter notebook and an Excel file.

The Excel file contains pivot tables and comparisons: NewHampshireSVI_analysis.xlsx

How we generated the plot/correlation matrix:

  1. NH_county_summary.html
  2. NH_county_summary.pdf (just in case)

5. QGIS files

  • zipped separately

If you are a journalist or researcher in New Hampshire, or want to expand this research nationwide, let us know and we can add more color on how to do this quickly and cheaply using Google Cloud Platform.
