###### Team IPPS 

The API information is available at this [link](https://data.cms.gov/Medicare-Inpatient/Inpatient-Prospective-Payment-System-IPPS-Provider/97k6-zzx3).

Objectives (the scope will later expand from data frame construction to include data cleanup by Jessie):
1. Pull all of the data as JSON using the provided API
2. Transform the JSON into a data frame


### Step 1: Installing sodapy, a Python package, in Jupyter

In [1]:
# Install sodapy, which is needed to retrieve the data using the method described on the API page
import sys
!{sys.executable} -m pip install sodapy

Collecting sodapy
  Downloading https://files.pythonhosted.org/packages/45/bb/ca05f9ec808ea57417fccdc3e2810ca4fd08d3f85b656b398b77105fb238/sodapy-1.4.7-py2.py3-none-any.whl
Collecting future==0.16.0 (from sodapy)
  Downloading https://files.pythonhosted.org/packages/00/2b/8d082ddfed935f3608cc61140df6dcbf0edea1bc3ab52fb6c29ae3e81e85/future-0.16.0.tar.gz (824kB)
Building wheels for collected packages: future
  Running setup.py bdist_wheel for future: started
  Running setup.py bdist_wheel for future: finished with status 'done'
  Stored in directory: C:\Users\Luffy4G\AppData\Local\pip\Cache\wheels\bf\c9\a3\c538d90ef17cf7823fa51fc701a7a7a910a80f6a405bf15b1a
Successfully built future
Installing collected packages: future, sodapy
Successfully installed future-0.16.0 sodapy-1.4.7


You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.


### Step 2: Retrieving the dataset

In [1]:
import sys

In [39]:
# The API link
link = "https://data.cms.gov/resource/ehrv-m9r6.json"

# The app token
key = "oBbcgRhXZS4dqtTJVyz6zQujv"

# Code snippet for data retrieval using python, as provided by the API information page
import pandas as pd
from sodapy import Socrata

# Client created with the app token only. No username or password is passed,
# which is sufficient for public datasets such as this one:
client = Socrata("data.cms.gov", key)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cms.gov",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# Up to 163065 results (intended to cover the whole dataset), returned as JSON
# from the API and converted to a Python list of dictionaries by sodapy.
results = client.get("ehrv-m9r6", limit=163065)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

# Checking the head of the data frame
results_df.head(50)

Unnamed: 0,average_covered_charges,average_medicare_payments,average_medicare_payments_2,drg_definition,hospital_referral_region_description,provider_city,provider_id,provider_name,provider_state,provider_street_address,provider_zip_code,total_discharges
0,32963.07,5777.24,4763.73,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Dothan,DOTHAN,10001,SOUTHEAST ALABAMA MEDICAL CENTER,AL,1108 ROSS CLARK CIRCLE,36301,91
1,15131.85,5787.57,4976.71,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,BOAZ,10005,MARSHALL MEDICAL CENTER SOUTH,AL,2505 U S HIGHWAY 431 NORTH,35957,14
2,37560.37,5434.95,4453.79,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,FLORENCE,10006,ELIZA COFFEE MEMORIAL HOSPITAL,AL,205 MARENGO STREET,35631,24
3,13998.28,5417.56,4129.16,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,BIRMINGHAM,10011,ST VINCENT'S EAST,AL,50 MEDICAL PARK EAST DRIVE,35235,25
4,31633.27,5658.33,4851.44,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,ALABASTER,10016,SHELBY BAPTIST MEDICAL CENTER,AL,1000 FIRST STREET NORTH,35007,18
5,16920.79,6653.8,5374.14,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Montgomery,MONTGOMERY,10023,BAPTIST MEDICAL CENTER SOUTH,AL,2105 EAST SOUTH BOULEVARD,36116,67
6,11977.13,5834.74,4761.41,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,OPELIKA,10029,EAST ALABAMA MEDICAL CENTER AND SNF,AL,2000 PEPPERELL PARKWAY,36801,51
7,35841.09,8031.12,5858.5,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,BIRMINGHAM,10033,UNIVERSITY OF ALABAMA HOSPITAL,AL,619 SOUTH 19TH STREET,35233,32
8,28523.39,6113.38,5228.4,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Huntsville,HUNTSVILLE,10039,HUNTSVILLE HOSPITAL,AL,101 SIVLEY RD,35801,135
9,75233.38,5541.05,4386.94,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,AL - Birmingham,GADSDEN,10040,GADSDEN REGIONAL MEDICAL CENTER,AL,1007 GOODYEAR AVENUE,35903,34
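
The cell above requests every row in a single call by setting a large `limit`. As an alternative, the pull can be split into pages using the `limit` and `offset` parameters that `client.get` forwards to the API. The following is only a sketch: `fetch_all`, `page_size`, and the `:id` ordering are assumptions made for illustration and are not part of the original notebook.

```python
def fetch_all(client, dataset_id, page_size=50000):
    """Collect every row of a dataset by requesting one page at a time.

    `client` is expected to behave like sodapy's Socrata client, i.e. to
    offer get(dataset_id, limit=..., offset=..., order=...).
    """
    rows = []
    offset = 0
    while True:
        # Ordering by the internal row id keeps paging stable (assumption:
        # the dataset exposes the usual ":id" system field).
        page = client.get(dataset_id, limit=page_size, offset=offset, order=":id")
        rows.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Hypothetical usage with the client created earlier (requires network access):
# results = fetch_all(client, "ehrv-m9r6")
# results_df = pd.DataFrame.from_records(results)
```

Paging avoids having to know the exact row count in advance, at the cost of several smaller requests.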


In [37]:
resultsSVPC = results_df[["provider_state","provider_name"]]
resultsSVPC.nunique()

provider_state      51
provider_name     3201
dtype: int64

In [46]:
# Keep only the Alaska rows
AK_State = resultsSVPC.loc[resultsSVPC["provider_state"] == "AK"]
# Sort by provider name, then keep one row per provider
AK_State_Sorted = AK_State.sort_values(by=["provider_name"])
AK_State_Sorted = AK_State_Sorted.drop_duplicates(subset="provider_name", keep="last")
AK_State_Sorted

Unnamed: 0,provider_state,provider_name
131261,AK,ALASKA NATIVE MEDICAL CENTER
18856,AK,ALASKA REGIONAL HOSPITAL
158158,AK,BARTLETT REGIONAL HOSPITAL
155136,AK,CENTRAL PENINSULA GENERAL HOSPITAL
115944,AK,FAIRBANKS MEMORIAL HOSPITAL
88472,AK,MAT-SU REGIONAL MEDICAL CENTER
101921,AK,MT EDGECUMBE HOSPITAL
77749,AK,PROVIDENCE ALASKA MEDICAL CENTER
124946,AK,YUKON KUSKOKWIM DELTA REG HOSPITAL


# 1) State vs. provider count: find the total number of providers (hospitals) in each state

In [52]:
resultsSVPC = results_df[["provider_state","provider_name"]].groupby(["provider_state"])
resultsGB_data = pd.DataFrame(resultsSVPC["provider_name"].nunique())
resultsGB_data.head()

Unnamed: 0_level_0,provider_name
provider_state,Unnamed: 1_level_1
AK,9
AL,93
AR,45
AZ,61
CA,295


In [53]:
resultsGB_data = resultsGB_data.sort_values(by="provider_name", ascending = False)
resultsGB_data.head()

Unnamed: 0_level_0,provider_name
provider_state,Unnamed: 1_level_1
TX,308
CA,295
FL,166
NY,161
PA,151
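
The counts above are based on unique `provider_name` values. If two different hospitals in the same state share a name, they would be counted once. A possible cross-check is to count unique `provider_id` values as well. The sketch below uses a small made-up sample (not rows from the dataset) to show the difference; on the real data the same `groupby` could be applied to `results_df`.

```python
import pandas as pd

# Hypothetical sample: two distinct provider ids in AL share one name
sample = pd.DataFrame({
    "provider_state": ["AL", "AL", "AL", "AK"],
    "provider_id": ["10001", "10005", "10005", "20001"],
    "provider_name": ["MEMORIAL HOSPITAL", "MEMORIAL HOSPITAL",
                      "MEMORIAL HOSPITAL", "ALASKA REGIONAL HOSPITAL"],
})

# Count unique names and unique ids side by side for each state
provider_counts = (
    sample.groupby("provider_state")[["provider_name", "provider_id"]]
    .nunique()
    .sort_values(by="provider_id", ascending=False)
)
print(provider_counts)
```

Where the two columns disagree for a state, the id-based count is probably the better estimate of the number of hospitals.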


In [17]:
# resultsSVPC = results_df[["provider_state","provider_name"]]
# resultsSVPC = resultsSVPC.groupby("provider_state")
# resultsSVPC.head()

# 2) State vs. total discharges: find how many procedures each state performs

In [5]:
resultsGB = results_df.groupby("provider_state")

In [6]:
resultsGB

<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001BB54B2BA90>

In [11]:
resultsGB_data = pd.DataFrame(resultsGB["drg_definition"].nunique())

In [12]:
resultsGB_data

Unnamed: 0_level_0,drg_definition
provider_state,Unnamed: 1_level_1
AK,78
AL,100
AR,100
AZ,100
CA,100
CO,100
CT,100
DC,100
DE,100
FL,100


In [14]:
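# Count the rows for each state and DRG combination (reset_index already returns a DataFrame)
resultGB = results_df.groupby(["provider_state","drg_definition"]).size().reset_index(name='counts')
# Within each DRG, list the states from the highest count to the lowest
resultGB.sort_values(["drg_definition", "counts"], ascending = [True, False])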

Unnamed: 0,provider_state,drg_definition,counts
878,FL,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,81
4246,TX,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,78
378,CA,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,67
1361,IL,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,52
3447,OH,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,48
3347,NY,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,46
3746,PA,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,44
2160,MI,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,40
1461,IN,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,37
4146,TN,039 - EXTRACRANIAL PROCEDURES W/O CC/MCC,32
