# Goals Option 1

## How much are nearby Hospitals and ASCs paid for the same procedure?

Filters for Physician/Other Supplier
- Entity Type = O
- Provider Type = Ambulatory Surgical Center
- CPT = 43249

Filters for Hospital Outpatient
- APC = 5302

Compare results within a CBSA

# Stretch Goals

## Extend to other HCPCS/APCs
- Best place to start is to find an APC that is well-populated in the Hospital file
- Next find a well-populated HCPCS from the APC in the Physician/Other Supplier File

## Compare Doctor Utilization for a procedure
- Filters for Physician/Other Supplier
  - Entity Type = I
  - Place of Service = O
- HCPCS = 99213 (or any office visit – 99211-99215 / 99201-99205)
- How many different patients does each specialty typically see in a year? How does that vary by CBSA?

# To Do List

- Create full_name column for providers and outpatients(?)
- classification1 dataframe setup is broken, can't import it

In [1]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [2]:
# set up dataframes
providers_tsv = '../data/Medicare_Provider_Util_Payment_PUF_CY2017.tsv'
outpatient_excel = '../data/MUP_OHP_R19_P04_V10_D17_APC_Provider.xlsx'
classification1_csv = '../data/508-Compliant-Version-of-2020_january_web_addendum_b.12312019.csv'
classification2_excel = '../data/2020_january_web_addendum_b.12312019.xlsx'
cbsa_excel = '../data/ZIP_CBSA_032020.xlsx'

# NOTE: providers_csv is a TAB-DELIMITED file, Use sep='\t'
providers = pd.read_csv(providers_tsv, sep='\t', low_memory=False) 
outpatient = pd.read_excel(outpatient_excel, skiprows=5) 
# classification1 = pd.read_csv(classification1_csv, engine='python') 
# classification2 = pd.read_excel(classification2_excel) 
classifications = pd.read_excel(classification2_excel) 
cbsa = pd.read_excel(cbsa_excel) 

# NOTE: the first line in providers_tsv is copyright info
outpatient.head()

Unnamed: 0,Provider ID,Provider Name,Provider Street Address,Provider City,Provider\nState,Provider\nZip Code,Provider\nHospital Referral Region\n(HRR),APC,APC\nDescription,Beneficiaries,Comprehensive APC\nServices,Average\nEstimated\nTotal\nSubmitted\nCharges,Average\nMedicare\nAllowed\nAmount,Average\nMedicare\nPayment\nAmount,Outlier\nComprehensive\nAPC\nServices,Average\nMedicare\nOutlier\nAmount
0,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5072,Level 2 Excision/ Biopsy/ Incision and Drainage,249.0,259,9575.005714,1038.454672,826.277954,,
1,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5073,Level 3 Excision/ Biopsy/ Incision and Drainage,52.0,53,12578.280189,1792.596792,1423.246792,,
2,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5091,Level 1 Breast/Lymphatic Surgery and Related P...,26.0,27,11337.608889,2113.58,1683.99,0.0,0.0
3,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5092,Level 2 Breast/Lymphatic Surgery and Related P...,23.0,23,17116.158261,3737.14,2977.55,0.0,0.0
4,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5112,Level 2 Musculoskeletal Procedures,17.0,17,7382.732353,1029.46,820.21,0.0,0.0


In [3]:
# Inner join on HCPS code will eliminate any duplicates
# NOTE: Trailing spaces were found in column names, they're included below and will be removed later
# memory usage after join only increased by 0.5 MB

# classifications = pd.merge(left=classification1, right=classification2, 
#                            how='inner', 
#                            on=['HCPCS Code','Short Descriptor','SI','APC ',
#                                'Relative Weight','Payment Rate ','National Unadjusted Copayment ',
#                               'Minimum Unadjusted Copayment '])
# classifications.info()

# Format Dataframes
- clean up column names and make lowercase
- make provider names and cities lowercase as well

In [4]:
# column names contain newline characters, remove them
# also convert spaces to underscores and make lowercase
outpatient.columns = outpatient.columns.str.replace('\n', '_').str.replace(' ', '_').str.lower()
providers.columns = providers.columns.str.replace(' ', '_').str.lower()
classifications.columns = classifications.columns.str.replace(' ', '_').str.lower()
cbsa.columns = cbsa.columns.str.replace(' ', '_').str.lower()

providers.head()

Unnamed: 0,npi,nppes_provider_last_org_name,nppes_provider_first_name,nppes_provider_mi,nppes_credentials,nppes_provider_gender,nppes_entity_code,nppes_provider_street1,nppes_provider_street2,nppes_provider_city,...,hcpcs_code,hcpcs_description,hcpcs_drug_indicator,line_srvc_cnt,bene_unique_cnt,bene_day_srvc_cnt,average_medicare_allowed_amt,average_submitted_chrg_amt,average_medicare_payment_amt,average_medicare_standard_amt
0,1,CPT copyright 2016 American Medical Associatio...,,,,,,,,,...,,,,,,,,,,
1,1003000126,ENKESHAFI,ARDALAN,,M.D.,M,I,900 SETON DR,,CUMBERLAND,...,99217.0,Hospital observation care discharge,N,100.0,96.0,100.0,73.3988,325.78,56.8272,57.4924
2,1003000126,ENKESHAFI,ARDALAN,,M.D.,M,I,900 SETON DR,,CUMBERLAND,...,99218.0,Hospital observation care typically 30 minutes,N,26.0,25.0,26.0,100.08,449.0,78.46,79.306154
3,1003000126,ENKESHAFI,ARDALAN,,M.D.,M,I,900 SETON DR,,CUMBERLAND,...,99219.0,Hospital observation care typically 50 minutes,N,52.0,51.0,52.0,136.38,614.0,102.807692,103.895385
4,1003000126,ENKESHAFI,ARDALAN,,M.D.,M,I,900 SETON DR,,CUMBERLAND,...,99220.0,Hospital observation care typically 70 minutes...,N,59.0,59.0,59.0,190.363729,755.932203,141.293559,142.865763


In [5]:
# make provider names and cities lowercase as well
providers['nppes_provider_last_org_name'] = providers['nppes_provider_last_org_name'].str.capitalize()
providers['nppes_provider_first_name'] = providers['nppes_provider_first_name'].str.capitalize()
providers['nppes_provider_city'] = providers['nppes_provider_city'].str.capitalize()


providers.head()



Unnamed: 0,npi,nppes_provider_last_org_name,nppes_provider_first_name,nppes_provider_mi,nppes_credentials,nppes_provider_gender,nppes_entity_code,nppes_provider_street1,nppes_provider_street2,nppes_provider_city,...,hcpcs_code,hcpcs_description,hcpcs_drug_indicator,line_srvc_cnt,bene_unique_cnt,bene_day_srvc_cnt,average_medicare_allowed_amt,average_submitted_chrg_amt,average_medicare_payment_amt,average_medicare_standard_amt
0,1,Cpt copyright 2016 american medical associatio...,,,,,,,,,...,,,,,,,,,,
1,1003000126,Enkeshafi,Ardalan,,M.D.,M,I,900 SETON DR,,Cumberland,...,99217.0,Hospital observation care discharge,N,100.0,96.0,100.0,73.3988,325.78,56.8272,57.4924
2,1003000126,Enkeshafi,Ardalan,,M.D.,M,I,900 SETON DR,,Cumberland,...,99218.0,Hospital observation care typically 30 minutes,N,26.0,25.0,26.0,100.08,449.0,78.46,79.306154
3,1003000126,Enkeshafi,Ardalan,,M.D.,M,I,900 SETON DR,,Cumberland,...,99219.0,Hospital observation care typically 50 minutes,N,52.0,51.0,52.0,136.38,614.0,102.807692,103.895385
4,1003000126,Enkeshafi,Ardalan,,M.D.,M,I,900 SETON DR,,Cumberland,...,99220.0,Hospital observation care typically 70 minutes...,N,59.0,59.0,59.0,190.363729,755.932203,141.293559,142.865763


In [6]:
outpatient.head()

Unnamed: 0,provider_id,provider_name,provider_street_address,provider_city,provider_state,provider_zip_code,provider_hospital_referral_region_(hrr),apc,apc_description,beneficiaries,comprehensive_apc_services,average_estimated_total_submitted_charges,average_medicare_allowed_amount,average_medicare_payment_amount,outlier_comprehensive_apc_services,average_medicare_outlier_amount
0,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5072,Level 2 Excision/ Biopsy/ Incision and Drainage,249.0,259,9575.005714,1038.454672,826.277954,,
1,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5073,Level 3 Excision/ Biopsy/ Incision and Drainage,52.0,53,12578.280189,1792.596792,1423.246792,,
2,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5091,Level 1 Breast/Lymphatic Surgery and Related P...,26.0,27,11337.608889,2113.58,1683.99,0.0,0.0
3,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5092,Level 2 Breast/Lymphatic Surgery and Related P...,23.0,23,17116.158261,3737.14,2977.55,0.0,0.0
4,10001,Southeast Alabama Medical Center,1108 Ross Clark Circle,Dothan,AL,36301,AL - Dothan,5112,Level 2 Musculoskeletal Procedures,17.0,17,7382.732353,1029.46,820.21,0.0,0.0


# Join Zip to CBSA table with both providers AND outpatient
- Note: when I convert datatype here, it breaks due to Canadian postal codes. Move this section to after non-Tenn. data has been filtered out

In [7]:
# providers['nppes_provider_zip'] = pd.to_numeric(providers['nppes_provider_zip'])

# providers.info()


# Create Full_Name Column in Providers

## This needs to be added before the individual dataframes are set up
- IF Entity_Type = O THEN Last_Name
- IF Entity_Type = I THEN First_Name + “ “ + Middle Initial + “ “ + Last_Name + “, “ + Credentials


In [8]:
# create full_name column for providers
# df['Price'] = [1500 if x =='Music' else 800 for x in df['Event']] 


# if x > 10:
#   print("Above ten,")
#   if x > 20:
#     print("and also above 20!")
#   else:
#     print("but not above 20.")

    
# if providers['nppes_entity_code'] == 'O':
#     providers['full_name'] = providers['nppes_provider_last_org_name']
#     if providers['nppes_entity_code'] == 'I':
#         providers['full_name'] = providers['nppes_provider_first_name'] + ' '
#         + providers['nppes_provider_mi'] + ' ' + providers['nppes_provider_last_org_name']
#     else:
#         providers['full_name'] = NULL
        


# providers['full_name'] = [nppes_provider_last_org_name 
#                            if x == 'O'
# #                            else nppes_provider_first_name + ' ' 
# #                               + nppes_provider_mi + ' ' 
# #                               + nppes_provider_last_org_name
# #                            for x in providers['nppes_entity_code']
#                            else providers['nppes_provider_first_name'] + ' ' 
#                               + providers['nppes_provider_mi'] + ' ' 
#                               + providers['nppes_provider_last_org_name']
#                            for x in providers['nppes_entity_code']
#                           ]
# providers.head()

# Set up City Dataframes

We're examining 3 cities: Nashville, Los Angeles, New York City. Starting with Nashville, we set up dataframes and take a look at the procedures done there.

In [9]:
# set up Nashville Outpatient Dataframe
tennessee_outpatient = outpatient[outpatient ['provider_state'] == 'TN'] 
nashville_outpatient = tennessee_outpatient[tennessee_outpatient ['provider_city'] == 'Nashville'] 
nashville_outpatient

Unnamed: 0,provider_id,provider_name,provider_street_address,provider_city,provider_state,provider_zip_code,provider_hospital_referral_region_(hrr),apc,apc_description,beneficiaries,comprehensive_apc_services,average_estimated_total_submitted_charges,average_medicare_allowed_amount,average_medicare_payment_amount,outlier_comprehensive_apc_services,average_medicare_outlier_amount
49924,440006,Tristar Skyline Medical Center,3441 Dickerson Pike,Nashville,TN,37207,TN - Nashville,5072,Level 2 Excision/ Biopsy/ Incision and Drainage,77.0,82,16658.510122,1135.492683,898.364268,0.0,0.0
49925,440006,Tristar Skyline Medical Center,3441 Dickerson Pike,Nashville,TN,37207,TN - Nashville,5073,Level 3 Excision/ Biopsy/ Incision and Drainage,26.0,26,25117.071923,2010.120000,1601.550000,0.0,0.0
49926,440006,Tristar Skyline Medical Center,3441 Dickerson Pike,Nashville,TN,37207,TN - Nashville,5091,Level 1 Breast/Lymphatic Surgery and Related P...,19.0,20,28844.444500,2244.898500,1769.688500,0.0,0.0
49927,440006,Tristar Skyline Medical Center,3441 Dickerson Pike,Nashville,TN,37207,TN - Nashville,5092,Level 2 Breast/Lymphatic Surgery and Related P...,11.0,12,39405.177500,4134.010000,3293.760000,0.0,0.0
49928,440006,Tristar Skyline Medical Center,3441 Dickerson Pike,Nashville,TN,37207,TN - Nashville,5112,Level 2 Musculoskeletal Procedures,18.0,19,14518.818421,1061.058947,835.680526,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51293,440197,Tristar Southern Hills Medical Center,391 Wallace Rd,Nashville,TN,37211,TN - Nashville,5431,Level 1 Nerve Procedures,17.0,19,24275.466316,1401.114211,1103.864211,0.0,0.0
51294,440197,Tristar Southern Hills Medical Center,391 Wallace Rd,Nashville,TN,37211,TN - Nashville,8011,Comprehensive Observation Services,118.0,121,18325.429752,1983.275785,1559.381405,0.0,0.0
51296,440218,Saint Thomas Hospital For Specialty Surgery,2011 Murphy Avenue,Nashville,TN,37203,TN - Nashville,5114,Level 4 Musculoskeletal Procedures,325.0,334,22372.576766,4804.725539,3811.837395,0.0,0.0
51297,440218,Saint Thomas Hospital For Specialty Surgery,2011 Murphy Avenue,Nashville,TN,37203,TN - Nashville,5115,Level 5 Musculoskeletal Procedures,13.0,13,50129.440769,6003.778462,4687.778462,0.0,0.0


In [10]:
# set up Nashville Providers Dataframe
tennessee_providers = providers[providers ['nppes_provider_state'] == 'TN'] 
nashville_providers = tennessee_providers[tennessee_providers ['nppes_provider_city'] == 'Nashville'] 
nashville_providers

Unnamed: 0,npi,nppes_provider_last_org_name,nppes_provider_first_name,nppes_provider_mi,nppes_credentials,nppes_provider_gender,nppes_entity_code,nppes_provider_street1,nppes_provider_street2,nppes_provider_city,...,hcpcs_code,hcpcs_description,hcpcs_drug_indicator,line_srvc_cnt,bene_unique_cnt,bene_day_srvc_cnt,average_medicare_allowed_amt,average_submitted_chrg_amt,average_medicare_payment_amt,average_medicare_standard_amt
2396,1003013947,Otto,Robert,J,M.D.,M,I,2400 PATTERSON STREET SUITE 100,,Nashville,...,20610,Aspiration and/or injection of large joint or ...,N,61.0,37.0,60.0,64.691967,268.032787,47.191311,54.860164
2397,1003013947,Otto,Robert,J,M.D.,M,I,2400 PATTERSON STREET SUITE 100,,Nashville,...,27447,Repair of knee joint,N,26.0,23.0,26.0,1307.530385,4319.000000,1021.338846,1142.162692
2398,1003013947,Otto,Robert,J,M.D.,M,I,2400 PATTERSON STREET SUITE 100,,Nashville,...,72170,"X-ray of pelvis, 1 or 2 views",N,65.0,65.0,65.0,29.410000,78.000000,21.150000,25.368308
2399,1003013947,Otto,Robert,J,M.D.,M,I,2400 PATTERSON STREET SUITE 100,,Nashville,...,73502,"X-ray of hip with pelvis, 2-3 views",N,65.0,39.0,65.0,38.310000,114.000000,28.781692,31.684308
2400,1003013947,Otto,Robert,J,M.D.,M,I,2400 PATTERSON STREET SUITE 100,,Nashville,...,73562,"X-ray of knee, 3 views",N,43.0,36.0,43.0,35.302326,107.441860,25.268372,28.054884
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9845871,1992985949,Beck,Robert,L,M.D,M,I,2400 PATTERSON ST,SUITE 400,Nashville,...,G0008,Administration of influenza virus vaccine,N,52.0,52.0,52.0,23.730000,67.000000,23.260000,25.320000
9845872,1992985949,Beck,Robert,L,M.D,M,I,2400 PATTERSON ST,SUITE 400,Nashville,...,G0439,"Annual wellness visit, includes a personalized...",N,91.0,91.0,91.0,109.980000,312.000000,107.780000,115.360000
9846017,1992987085,Horton,Susanne,M,CRNA,F,I,1211 MEDICAL CENTER DR,,Nashville,...,00320,Anesthesia for procedure on esophagus and neck...,N,17.0,16.0,17.0,186.568824,2194.941176,146.270588,153.111765
9846018,1992987085,Horton,Susanne,M,CRNA,F,I,1211 MEDICAL CENTER DR,,Nashville,...,00740,Anesthesia for procedure on gastrointestinal t...,N,61.0,58.0,60.0,76.923443,917.975410,60.307377,63.126557


In [11]:
# Set up New York City and Los Angeles dataframes
ny_outpatient = outpatient[outpatient ['provider_state'] == 'NY'] 
nyc_outpatient = ny_outpatient[ny_outpatient ['provider_city'] == 'New York'] 
nyc_outpatient

ca_outpatient = outpatient[outpatient ['provider_state'] == 'CA'] 
la_outpatient = ca_outpatient[ca_outpatient ['provider_city'] == 'Los Angeles'] 
la_outpatient

ny_providers = providers[providers ['nppes_provider_state'] == 'NY'] 
nyc_providers = ny_providers[ny_providers ['nppes_provider_city'] == 'New York'] 
nyc_providers

ca_providers = providers[providers ['nppes_provider_state'] == 'CA'] 
la_providers = ca_providers[ca_providers ['nppes_provider_city'] == 'Los Angeles'] 
la_providers

Unnamed: 0,npi,nppes_provider_last_org_name,nppes_provider_first_name,nppes_provider_mi,nppes_credentials,nppes_provider_gender,nppes_entity_code,nppes_provider_street1,nppes_provider_street2,nppes_provider_city,...,hcpcs_code,hcpcs_description,hcpcs_drug_indicator,line_srvc_cnt,bene_unique_cnt,bene_day_srvc_cnt,average_medicare_allowed_amt,average_submitted_chrg_amt,average_medicare_payment_amt,average_medicare_standard_amt


# Filter dataframes

In [12]:
## Filter outpatient
# APC = 5302

# individual statements for filters
tennessee_outpatient = tennessee_outpatient.loc[
    tennessee_outpatient['apc'] == 5302
]

nashville_outpatient = nashville_outpatient.loc[
    nashville_outpatient['apc'] == 5302
]

nyc_outpatient = nyc_outpatient.loc[
    nyc_outpatient['apc'] == 5302
]

la_outpatient = la_outpatient.loc[
    la_outpatient['apc'] == 5302
]

# nyc_outpatient.head()

In [13]:
## Filter providers
# SELECT entity type = 'O'
# Provider Type = Ambulatory Surgical Center
# CPT/HCPCS = 43249
# DEBUGDEBUG: I'm getting zero rows when I add the hcpcs filter <-------------------------------------


# individual statements for filters
tennessee_providers = tennessee_providers.loc[
    (tennessee_providers['nppes_entity_code'] == 'O') &
    (tennessee_providers['provider_type'] == 'Ambulatory Surgical Center')
   & (tennessee_providers['hcpcs_code'] == '43249')
]

nashville_providers = nashville_providers.loc[
    (nashville_providers['nppes_entity_code'] == 'O') &
    (nashville_providers['provider_type'] == 'Ambulatory Surgical Center')
   & (nashville_providers['hcpcs_code'] == '43249')
]

nyc_providers = nyc_providers.loc[
    (nyc_providers['nppes_entity_code'] == 'O') &
    (nyc_providers['provider_type'] == 'Ambulatory Surgical Center')
    & (nyc_providers['hcpcs_code'] == '43249')
]

la_providers = la_providers.loc[
    (la_providers['nppes_entity_code'] == 'O') &
    (la_providers['provider_type'] == 'Ambulatory Surgical Center')
   & (la_providers['hcpcs_code'] == '43249')
]

tennessee_providers.head()

Unnamed: 0,npi,nppes_provider_last_org_name,nppes_provider_first_name,nppes_provider_mi,nppes_credentials,nppes_provider_gender,nppes_entity_code,nppes_provider_street1,nppes_provider_street2,nppes_provider_city,...,hcpcs_code,hcpcs_description,hcpcs_drug_indicator,line_srvc_cnt,bene_unique_cnt,bene_day_srvc_cnt,average_medicare_allowed_amt,average_submitted_chrg_amt,average_medicare_payment_amt,average_medicare_standard_amt
990426,1104055425,Wartburg surgery center llc,,,,,O,1236 KNOXVILLE HWY,,Wartburg,...,43249,Balloon dilation of esophagus using an endoscope,N,41.0,36.0,41.0,512.0,2491.5,401.404878,465.44439
1272943,1124498357,Endoscopy center of inland empire llc,,,,,O,401 COMMERCE ST,SUITE 600,Nashville,...,43249,Balloon dilation of esophagus using an endoscope,N,16.0,16.0,16.0,660.99,2200.0,512.294375,484.535
2185064,1225039761,Endoscopy center of st thomas lp,,,,,O,4230 HARDING RD,SUITE 400,Nashville,...,43249,Balloon dilation of esophagus using an endoscope,N,17.0,16.0,17.0,565.636471,1500.0,435.024706,468.650588
2881267,1295708964,Columbia tn endoscopy asc llc,,,,,O,1510 1/2 HATCHER LN,,Columbia,...,43249,Balloon dilation of esophagus using an endoscope,N,34.0,29.0,34.0,587.11,2427.0,454.431471,471.222941
3187896,1326054172,Fort sanders west op surgery center,,,,,O,210 FORT SANDERS WEST BLVD,,Knoxville,...,43249,Balloon dilation of esophagus using an endoscope,N,25.0,22.0,25.0,524.8,1151.5,398.998,480.5668


# Add CBSA info

In [14]:
# convert zip code to numeric fields
tennessee_providers['nppes_provider_zip'] = pd.to_numeric(tennessee_providers['nppes_provider_zip'])
tennessee_outpatient['provider_zip_code'] = pd.to_numeric(tennessee_outpatient['provider_zip_code'])

tennessee_providers = pd.merge(left=tennessee_providers, right=cbsa, left_on='nppes_provider_zip', right_on='zip', how='outer')
tennessee_outpatient = pd.merge(left=tennessee_outpatient, right=cbsa, left_on='provider_zip_code', right_on='zip', how='outer')

tennessee_outpatient.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47417 entries, 0 to 47416
Data columns (total 22 columns):
provider_id                                  79 non-null float64
provider_name                                79 non-null object
provider_street_address                      79 non-null object
provider_city                                79 non-null object
provider_state                               79 non-null object
provider_zip_code                            79 non-null float64
provider_hospital_referral_region_(hrr)      79 non-null object
apc                                          79 non-null float64
apc_description                              79 non-null object
beneficiaries                                77 non-null float64
comprehensive_apc_services                   79 non-null float64
average_estimated_total_submitted_charges    79 non-null float64
average_medicare_allowed_amount              79 non-null float64
average_medicare_payment_amount              79 non-

In [15]:
tennessee_providers.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 47425 entries, 0 to 47424
Data columns (total 32 columns):
npi                                 10 non-null float64
nppes_provider_last_org_name        10 non-null object
nppes_provider_first_name           0 non-null object
nppes_provider_mi                   0 non-null object
nppes_credentials                   0 non-null object
nppes_provider_gender               0 non-null object
nppes_entity_code                   10 non-null object
nppes_provider_street1              10 non-null object
nppes_provider_street2              2 non-null object
nppes_provider_city                 10 non-null object
nppes_provider_zip                  10 non-null float64
nppes_provider_state                10 non-null object
nppes_provider_country              10 non-null object
provider_type                       10 non-null object
medicare_participation_indicator    10 non-null object
place_of_service                    10 non-null object
hcpcs_code    

# These Dataframes are Loaded and Ready for Use

## Outpatients
- tennessee_outpatient
- nashville_outpatient
- nyc_outpatient
- la_outpatient

## Providers
- tennessee_providers
- nashville_providers
- nyc_providers
- la_providers

# Next Steps
If no more processing, time to save these in new files for analysis
- tennessee_outpatient_clean
- nashville_outpatient_clean
- nyc_outpatient_clean
- la_outpatient_clean
- nashville_providers_clean
- nyc_providers_clean
- la_providers_clean


In [16]:
# save clean dataframes
tennessee_outpatient.to_csv('../data/tennessee_outpatient_clean.csv')
nashville_outpatient.to_csv('../data/nashville_outpatient_clean.csv')
tennessee_providers.to_csv('../data/tennessee_providers_clean.csv')
nashville_providers.to_csv('../data/nashville_providers_clean.csv')

# nyc_outpatient.to_csv('../data/nyc_outpatient_clean.csv')
# la_outpatient.to_csv('../data/la_outpatient_clean.csv')
# nyc_providers.to_csv('../data/nyc_providers_clean.csv')
# la_providers.to_csv('../data/la_providers_clean.csv')


In [17]:
nashville_outpatient

Unnamed: 0,provider_id,provider_name,provider_street_address,provider_city,provider_state,provider_zip_code,provider_hospital_referral_region_(hrr),apc,apc_description,beneficiaries,comprehensive_apc_services,average_estimated_total_submitted_charges,average_medicare_allowed_amount,average_medicare_payment_amount,outlier_comprehensive_apc_services,average_medicare_outlier_amount
49935,440006,Tristar Skyline Medical Center,3441 Dickerson Pike,Nashville,TN,37207,TN - Nashville,5302,Level 2 Upper GI Procedures,13.0,15,14899.436667,1248.62,994.84,0.0,0.0
50269,440039,Vanderbilt University Medical Center,1211 Medical Center Drive,Nashville,TN,37232,TN - Nashville,5302,Level 2 Upper GI Procedures,477.0,623,7120.824045,1205.620546,958.820642,,
50660,440082,Saint Thomas West Hospital,"4220 Harding Rd, Po Box 380",Nashville,TN,37205,TN - Nashville,5302,Level 2 Upper GI Procedures,189.0,207,6350.16401,1224.590048,970.810048,0.0,0.0
50790,440111,Metro Nashville General Hospital,1818 Albion Street,Nashville,TN,37208,TN - Nashville,5302,Level 2 Upper GI Procedures,11.0,12,4488.7875,1248.62,994.84,0.0,0.0
50923,440133,Saint Thomas Midtown Hospital,2000 Church St,Nashville,TN,37236,TN - Nashville,5302,Level 2 Upper GI Procedures,76.0,98,8420.009184,1238.468571,984.688571,,
51056,440161,Tristar Centennial Medical Center,2300 Patterson Street,Nashville,TN,37203,TN - Nashville,5302,Level 2 Upper GI Procedures,151.0,170,11861.729353,1207.656,953.876,0.0,0.0
51288,440197,Tristar Southern Hills Medical Center,391 Wallace Rd,Nashville,TN,37211,TN - Nashville,5302,Level 2 Upper GI Procedures,44.0,49,20273.752041,1228.336327,973.597143,0.0,0.0


In [18]:
cbsa

Unnamed: 0,zip,cbsa,res_ratio,bus_ratio,oth_ratio,tot_ratio
0,501,35620,0.0,1.0,0.0,1.0
1,601,38660,1.0,1.0,1.0,1.0
2,602,10380,1.0,1.0,1.0,1.0
3,603,10380,1.0,1.0,1.0,1.0
4,604,10380,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...
47410,99925,99999,0.0,0.0,1.0,1.0
47411,99926,99999,0.0,0.0,1.0,1.0
47412,99927,99999,0.0,0.0,1.0,1.0
47413,99928,28540,0.0,0.0,1.0,1.0
