# Purpose

The goal of this notebook is to analyze the dataset(s) I've pulled down and wrangled in other notebooks and scripts in this repository. To start with, I'll take a deeper look at hospital-level data from the Centers for Medicare and Medicaid Services (CMS) (e.g. the number of times patients are readmitted to the hospital shortly after a previous inpatient stay).

This notebook will cover the analysis for both Phases I and II of this project (please see the project's [README](README.md) for info on the Phases of this project).

# Background

CMS collects all sorts of information on US hospitals as part of its role as a federally-managed insurance organization. I was inspired to think about modeling hospital outcomes by the news in early 2019 that CMS had mandated that all US hospital chargemaster data must be published online in machine-readable format on hospital websites. These tables provide the pre-insurance-negotiation prices for all procedures and consumables in a given hospital.

After digging a bit into the chargemaster data I could find (and forking [a very helpful repo](https://github.com/vsoch/hospital-chargemaster) that had done a lot of the heavy lifting of parsing many hospitals' chargemaster data), I determined that cleaning those data would be a bigger task, in part because the shorthand used to describe many items makes it hard to discern which procedures and consumables are equivalent. So I split the work into two phases, as described in the [README](README.md).

In [2]:
# Package import

#import autoreload
%load_ext autoreload
%autoreload 2

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import numpy as np
import pandas as pd
import plotly.express as px
import geopandas as gpd

# Phase I Analysis

In this analysis, I'm going to explore the connections among several variables that I found particularly interesting (from a patient-centric quality-of-care perspective) in my initial wrangling. The questions that arise from these variables include:

1. What are the hallmarks of a hospital that sees a higher-than-average number of deaths among patients with serious treatable complications after surgery? 
2. What features dictate high rates of excess readmissions for heart attacks (AKA "acute myocardial infarction" or AMI), a form of heart disease, which is the leading cause of death for Americans?
3. Do hospitals that incur greater Medicare spending per patient care event also tend to have high quality-of-care measures? Or is the reverse true, with more cost translating to poorer care (perhaps due to a lack of patients with private insurance to supplement Medicare, for example)? Or is the trend less obvious than either of these options suggests?
4. What are the common characteristics of hospitals whose patients most often said they were very likely to recommend the hospital to their friends and family?
    * I'm considering this the best "overall quality" measure of the hospital that takes into account many different patients' (hopefully honest) experiences.
    * One thing to note with this feature: like all of the survey response scores, it's highly positively biased (the midpoint score is somewhere around 90/100). Nothing that can't be fixed via feature scaling, but something good to note.
5. 
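
As a minimal sketch of the feature scaling mentioned under question 4, the cell below standardizes a set of survey-style scores that cluster near the top of a 0-100 scale. The scores are synthetic and the column name `recommend_score` is hypothetical; neither comes from the CMS data, and the real analysis may well end up using a different scaler.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for survey scores bunched near 90/100 (NOT real CMS data)
rng = np.random.default_rng(seed=0)
scores = pd.Series(
    np.clip(rng.normal(loc=90, scale=4, size=500), 0, 100),
    name="recommend_score",  # hypothetical column name
)


def standardize(series: pd.Series) -> pd.Series:
    """Center on the mean and scale to unit (population) standard deviation."""
    return (series - series.mean()) / series.std(ddof=0)


scaled = standardize(scores)

# Compare the raw and standardized distributions side by side
summary = pd.DataFrame({"raw": scores.describe(), "scaled": scaled.describe()})
print(summary.round(2))
```

After scaling, a hospital's value reads as "how many standard deviations above or below the average hospital", so the fact that nearly every raw score sits near 90 no longer hides the differences between hospitals. Standardizing only recenters and rescales; it does not change the shape of the distribution.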

## Load Up the Data

I'll use the static data file I created in the Phase I notebook for now, but will write up a script for pulling down the latest API-based datasets, cleaning them, and merging them later.

In [None]:
# Load up our hospitals data


In [None]:
# Load up the US Census 2018 US states GeoJSON for contextual mapping
# Assumes the GeoJSON file sits in the same directory as this notebook
states_gdf = gpd.read_file('US_states_Census_2018.json')