# Data Loading to BigQuery

This notebook covers taking all of the Price Transparency files that we've discovered and loading them into a standard format in BigQuery.

We first work through a single file end to end, and then we will apply the same steps to the rest. See https://github.com/pauldria/ncssm-2022-jterm-price-transparency#data for more.

In [1]:
from google.cloud import bigquery

import pandas as pd
import urllib.request  # import the submodule explicitly so urllib.request.urlopen is available
import datetime

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

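Note: the recorded output of this cell was `ModuleNotFoundError: No module named 'google'`, which means the `google-cloud-bigquery` package was not installed in that environment. Install it (for example with `pip install google-cloud-bigquery`) before running the notebook.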

Let's look at the first file from the list, from Alamance Regional Medical Center.

In [2]:
filepath = "https://www.conehealth.com/560529994_Alamance-Regional-Medical-Center-Inc_standardcharges.csv"

`urlopen` hands us raw bytes rather than text, so we have to decode each line as `utf-8` before printing it. Sigh. Try it without the `decode` portion to see what the raw bytes look like otherwise.

In [3]:
with urllib.request.urlopen(filepath) as f:
    print(f.readline().decode("utf-8"))
    print(f.readline().decode("utf-8"))
    print(f.readline().decode("utf-8"))

﻿Alamance Regional Medical Center,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Updated 09/28/2021,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
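To make the bytes-versus-text point concrete without hitting the network, here is a minimal sketch on a made-up byte string (the sample content is an assumption, not the real file). It also shows that files exported from spreadsheet tools can start with a byte order mark, which plain `utf-8` decoding keeps as an invisible first character while `utf-8-sig` drops it.

```python
import codecs

# Made-up sample: a UTF-8 byte order mark followed by one CSV line.
raw = codecs.BOM_UTF8 + "Alamance Regional Medical Center,,,\n".encode("utf-8")

# Before decoding we only have bytes.
print(type(raw))

# Plain utf-8 keeps the byte order mark as the character U+FEFF.
plain = raw.decode("utf-8")
print(repr(plain[:9]))

# utf-8-sig strips a leading byte order mark if one is present.
clean = raw.decode("utf-8-sig")
print(repr(clean[:8]))
```

Whether the real file needs `utf-8-sig` depends on how it was exported, so treat this as something to check rather than a required change.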



In [4]:
date_obtained = datetime.datetime.now().strftime("%Y-%m-%d")
date_provided = "2021-09-28"
name_of_hospital = "Alamance Regional Medical Center"
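The `date_provided` value above is copied by hand from the `Updated 09/28/2021` line of the file. As a sketch only, the same value could be derived from that line; `parse_updated_line` is a hypothetical helper, and it assumes the line always has the form `Updated MM/DD/YYYY` followed by commas, which may not hold for other hospitals.

```python
import datetime


def parse_updated_line(line):
    """Turn a line such as 'Updated 09/28/2021,,,' into '2021-09-28'."""
    # Keep only the first CSV field, then drop the leading word.
    first_field = line.split(",")[0].strip()
    date_text = first_field.replace("Updated", "").strip()
    parsed = datetime.datetime.strptime(date_text, "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%d")


date_provided_example = parse_updated_line("Updated 09/28/2021,,,,")
print(date_provided_example)
```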

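When we [read the file](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html), we skip the three preamble lines shown above (`skiprows = 3`) and force every column to be read as a string (`dtype = str`, which pandas reports as `object`). Empty cells still come back as missing values, so in the next step we replace them with empty strings. This is to prevent headaches later.

We simply do not know the contents of this data yet, so reading everything as text is a safe default: nothing is silently reinterpreted, and we will explicitly cast and shape the data later when we know more.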

In [5]:
df = pd.read_csv(filepath, skiprows = 3, dtype = str, na_values = "")
df.columns = [x.strip() for x in df.columns]
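A small in-memory sketch of why reading everything as a string is the safe default. The CSV text below is invented to mimic the file's layout (three preamble lines, then a header with stray spaces); it is not taken from the real data.

```python
import io

import pandas as pd

# Invented sample: three preamble lines, then the real header and two rows.
sample = (
    "Example Hospital,,\n"
    "Updated 09/28/2021,,\n"
    ",,\n"
    " Code ,Rev Code,Gross Charge\n"
    '00123,0450,"$1,200.00"\n'
    "00456,,\n"
)

# Same options as the notebook: skip the preamble, keep every value as text.
as_str = pd.read_csv(io.StringIO(sample), skiprows=3, dtype=str)
as_str.columns = [c.strip() for c in as_str.columns]

# Letting pandas infer types turns the codes into integers and drops the zeros.
inferred = pd.read_csv(io.StringIO(sample), skiprows=3)

print(as_str["Code"].tolist())
print(inferred[" Code "].tolist())
print(as_str["Rev Code"].isna().tolist())
```

The last line shows that empty cells are still read as missing values even with `dtype = str`, which is why the next cell fills them with empty strings.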

In [6]:
df.fillna("", inplace = True)
for c in df.columns:
    df[c] = df[c].map(str.strip)
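The order of the two steps above matters: `str.strip` only accepts strings, so mapping it over a column that still contains missing values fails. A minimal sketch on a made-up column:

```python
import pandas as pd

# Made-up column with one missing value, as read_csv would leave it.
col = pd.Series([" MS001 ", None, "0450 "], dtype=object)

# Stripping before filling fails because the missing value is not a string.
try:
    col.map(str.strip)
    failed = False
except TypeError:
    failed = True

# Filling first makes every value a string, so stripping is safe.
cleaned = col.fillna("").map(str.strip)

print(failed)
print(cleaned.tolist())
```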

In [7]:
df.info(verbose = True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113385 entries, 0 to 113384
Data columns (total 115 columns):
 #    Column                                                                                 Dtype 
---   ------                                                                                 ----- 
 0    Common Billing Code (CPT/HCPCS/MS-DRG)                                                 object
 1    NDC (for medications/drugs)                                                            object
 2    Rev Code                                                                               object
 3    Procedure Description                                                                  object
 4    Gross Charge                                                                           object
 5    De-identified Min Inpatient Negotiated Rate Across All Payers                          object
 6    De-identified Max Inpatient Negotiated Rate Across All Payers                     

In [8]:
df.shape

(113385, 115)

In [9]:
df.head()

Unnamed: 0,Common Billing Code (CPT/HCPCS/MS-DRG),NDC (for medications/drugs),Rev Code,Procedure Description,Gross Charge,De-identified Min Inpatient Negotiated Rate Across All Payers,De-identified Max Inpatient Negotiated Rate Across All Payers,De-identified Min Outpatient Negotiated Rate Across All Payers,De-identified Max Outpatient Negotiated Rate Across All Payers,AETNA WHOLE HEALTH Negotiated Inpatient Rate,AETNA WHOLE HEALTH Negotiated Outpatient Rate,AETNA CAROLINA PERFERRED Negotiated Inpatient Rate,AETNA CAROLINA PREFERRED Negotiated Outpatient Rate,AETNA DESIGNATED PRODUCTS Negotiated Inpatient Rate,AETNA DESIGNATED PRODUCTS Negotiated Outpatient Rate,AETNA FIRST HEALTH Negotiated Inpatient Rate,AETNA FIRST HEALTH Negotiated Outpatient Rate,AETNA MEDICARE Negotiated Inpatient Rate,AETNA MEDICARE Negotiated Outpatient Rate,AMBETTER Negotiated Inpatient Rate,AMBETTER Negotiated Outpatient Rate,BEACON HEALTH VALUE OPTIONS Negotiated Inpatient Rate,BEACON HEALTH VALUE OPTIONS Negotiated Outpatient Rate,BEECH STREET NETWORK Negotiated Inpatient Rate,BEECH STREET NETWORK Negotiated Outpatient Rate,BLUE CROSS BLUE SHIELD (Excluding Market Place plans) Negotiated Inpatient Rate,BLUE CROSS BLUE SHIELD (Excluding Market Place Plans) Negotiated Outpatient Rate,BLUE CROSS BLUE SHIELD MEDICARE Negotiated Inpatient Rate,BLUE CROSS BLUE SHIELD MEDICARE Negotiated Outpatient Rate,BRIGHT HEALTH Negotiated Inpatient Rate,BRIGHT HEALTH Negotiated Outpatient Rate,CARDINAL INNOVATIONS Negotiated Inpatient Rate,CARDINAL INNOVATIONS Negotiated Outpatient Rate,CARE & CARE MEDICARE Negotiated Inpatient Rate,CARE & CARE MEDICARE Negotiated Outpatient Rate,CAROLINA BEHAVIORAL HEALTH Negotiated Inpatient Rate,CAROLINA BEHAVIORAL HEALTH Negotiated Outpatient Rate,CENTIVO (ALL PAYER PLANS) Negotiated Inpatient Rate,CENTIVO (ALL PAYER PLANS) Negotiated Outpatient Rate,CHOICE CARE (ALL PAYER PLANS) Negotiated Inpatient Rate,CHOICE CARE (ALL PAYER PLANS) Negotiated Outpatient Rate,CIGNA 
ALL PAYER PLANS Negotiated Inpatient Rate,CIGNA ALL PAYER PLANS Negotiated Outpatient Rate,CIGNA MEDICARE ADVANTAGE Negotiated Inpatient Rate,CIGNA MEDICARE ADVANTAGE Negotiated Outpatient Rate,COMPYSCH COPORATION Negotiated Inpatient Rate,COMPYSCH COPORATION Negotiated Outpatient Rate,FIDELIS Medicare Advantage Negotiated Inpatient Rate,FIDELIS Medicare Advantage Negotiated Outpatient Rate,GATEWAY HEALTH Negotiated Inpatient Rate,GATEWAY HEALTH Negotiated Outpatient Rate,HEALTHTEAM ADVANTAGE Negotiated Inpatient Rate,HEALTHTEAM ADVANTAGE Negotiated Outpatient Rate,HOSPICE -Authora CARE Negotiated Inpatient Rate,HOSPICE -Authora CARE Negotiated Outpatient Rate,HUMANA (ALL PAYER PLANS ) Negotiated Inpatient Rate,HUMANA (ALL PAYER PLANS ) Negotiated Outpatient Rate,HUMANA MEDICARE Negotiated Inpatient Rate,HUMANA MEDICARE Negotiated Outpatient Rate,LIBERTY ADVANTAGE MEDICARE Negotiated Inpatient Rate,LIBERTY ADVANTAGE MEDICARE Negotiated Outpatient Rate,LONGEVITY HEALTH Negotiated Inpatient Rate,LONGEVITY HEALTH Negotiated Outpatient Rate,MAGELLAN Negotiated Inpatient Rate,MAGELLAN Negotiated Outpatient Rate,MEDCOST NETWORK Negotiated Inpatient Rate,MEDCOST NETWORK Negotiated Outpatient Rate,MEDCOST ULTRA NETWORK Negotiated Inpatient Rate,MEDCOST ULTRA NETWORK Negotiated Outpatient Rate,MUTUAL OF OMAHA Special Risk Negotiated Inpatient Rate,MUTUAL OF OMAHA Special Risk Negotiated Outpatient Rate,NC DEPT OF PUBLIC SAFETYSTATE INMATES Negotiated Inpatient Rate,NC DEPT OF PUBLIC SAFETYSTATE INMATES Negotiated Outpatient Rate,NC MEDICAID PREPAID HEALTH PLAN HEALTHY BLUE Negotiated Inpatient Rate,NC MEDICAID PREPAID HEALTH PLAN HEALTHY BLUE Negotiated Outpatient Rat,NC MEDICAID PREPAID HEALTHPLAN UNITED HEALTHCARE COMMUNITY Negotiated Inpatient Rate,NC MEDICAID PREPAID HEALTHPLAN UNITED HEALTHCARE COMMUNITY Negotiated Outpatient Rate,NC MEDICAID PREPAID HEALTHPLAN WELLCARE Negotiated Inpatient Rate,NC MEDICAID PREPAID HEALTHPLAN WELLCARE Necotiated Outpatient Rate,NYSHIP 
EMPIRE Negotiated Inpatient Rate,NYSHIP EMPIRE Negotiated Outpatient Rate,ONENET Negotiated Inpatient Rate,ONENET Negotiated Outpatient Rate,OXFORD HEALTH Negotiated Inpatient Rate,OXFORD HEALTH Negotiated Outpatient Rate,PACE OF THE GUILFORD & ROCKINGHAM COUNTIES Negotiated Inpatient Rate,PACE OF THE GUILFORD & ROCKINGHAM COUNTIES Negotiated Outpatient Rate,PHCS MULTIPLAN NETWORK Negotiated Inpatient Rate,PHCS MULTIPLAN NETWORK Negotiated Outpatient Rate,PRIMARY PHYSICIAN CARE Negotiated Inpatient Rate,PRIMARY PHYSICIAN CARE Negotiated Outpatient Rate,PRISON HEALTH SERVICES INC Negotiated Inpatient Rate,PRISON HEALTH SERVICES INC Negotiated Outpatient Rate,PROVIDER PARTNERS OF NC Negotiated Inpatient Rate,PROVIDER PARTNERS OF NC Negotiated Outpatient Rate,PYRAMID TODAYS OPTIONS MEDICARE Negotiated Inpatient Rate,PYRAMID TODAYS OPTIONS MEDICARE Negotiated Outpatient Rate,SANDHILLS MEDICAID Negotiated Inpatient Rate,SANDHILLS MEDICAID Negotiated Outpatient Rate,TRICARE Negotiated Inpatient Rate,TRICARE Negotiated Outpatient Rate,UNICARE SECURITY CHOICE MEDICARE Negotiated Inpatient Rate,UNICARE SECURITY CHOICE MEDICARE Negotiated Outpatient Rate,UNITED HEALTHCARE (ALL PAYER PLANS) Negotiated Inpatient Rate,UNITED HEALTHCARE (ALL PAYER PLANS) Negotiated Outpatient Rate,UNITED HEALTHCARE MEDICARE Negotiated Inpatient Rate,UNITED HEALTHCARE MEDICARE Negotiated Outpatient Rate,US MARSHALL IN CUSTODY Negotiated Inpatient Rate,US MARSHALL IN CUSTODY Negotiated Outpatient Rate,VAYA MEDICAID LME Negotiated Inpatient Rate,VAYA MEDICAID LME Negotiated Outpatient Rate,VETERANS ADMINISTRATION Negotiated Inpatient Rate,VETERANS ADMINISTRATION Negotiated Outpatient Rate,VIRGINIA PREMIER MEDICARE ADVANTAGE Negotiated Inpatient Rate,VIRGINIA PREMIER MEDICARE ADVANTAGE Negotiated Outpatient Rate
0,MS001,,,Heart Transplant Or Implant Of Heart Assist Sy...,"$490,771.98","$9,416.22","$490,771.98",,,"$372,986.70",,"$348,448.11",,"$314,094.07",,"$357,282.00",,"$189,232.52",,"$416,311.54",,,,"$480,956.54",,"$365,625.13",,"$189,232.52",,"$388,133.68",,,,"$189,232.52",,,,"$368,078.99",,"$372,986.70",,"$391,030.20",,"$191,634.16",,,,"$191,634.16",,"$402,433.02",,"$191,634.16",,$937.49 per day,,"$372,986.70",,"$189,232.52",,"$189,232.52",,"$189,232.52",,,,"$366,704.82",,"$311,591.13",,"$353,355.83",,"$208,916.54",,"$329,644.82",,"$334,468.83",,"$331,252.83",,"$490,771.98",,"$490,771.98",,"$490,771.98",,"$189,232.52",,"$466,233.38",,"$402,433.02",,"$441,694.78",,"$189,232.52",,"$189,232.52",,,,"$9,416.22",,"$191,634.16",,"$490,771.98",,"$189,232.52",,"$189,232.52",,,,"$189,232.52",,"$189,232.52",
1,MS002,,,Heart Transplant Or Implant Of Heart Assist Sy...,"$540,615.96","$13,055.26","$529,803.64",,,"$410,868.13",,"$383,837.33",,"$345,994.21",,"$393,568.42",,"$125,032.52",,"$275,071.54",,,,"$529,803.64",,"$402,758.89",,"$125,032.52",,"$213,475.40",,,,"$125,032.52",,,,"$405,461.97",,"$410,868.13",,"$215,068.50",,"$125,032.52",,,,"$125,032.52",,"$443,305.09",,"$125,032.52",,$937.49 per day,,"$410,868.13",,"$125,032.52",,"$125,032.52",,"$125,032.52",,,,"$403,948.25",,"$343,237.07",,"$389,243.49",,"$113,404.93",,"$178,939.16",,"$181,557.75",,"$179,812.02",,"$337,450.44",,"$337,450.44",,"$337,450.44",,"$125,032.52",,"$513,585.16",,"$443,305.09",,"$486,554.36",,"$125,032.52",,"$125,032.52",,,,"$13,055.26",,"$125,032.52",,"$337,450.44",,"$125,032.52",,"$125,032.52",,,,"$125,032.52",,"$125,032.52",
2,MS003,,,Ecmo Or Tracheostomy With Mv >96 Hours Or Prin...,"$256,535.52","$8,918.55","$256,535.52",,,"$194,967.00",,"$182,140.22",,"$164,182.73",,"$186,757.86",,"$129,096.95",,"$256,535.52",,,,"$251,404.81",,"$191,118.96",,"$129,096.95",,"$254,482.08",,,,"$129,096.95",,,,"$192,401.64",,"$194,967.00",,"$203,800.36",,"$128,167.11",,,,"$128,167.11",,"$210,359.13",,"$127,561.95",,$937.49 per day,,"$194,967.00",,"$129,096.95",,"$129,096.95",,"$129,096.95",,,,"$191,683.34",,"$162,874.40",,"$184,705.57",,"$132,604.45",,"$209,233.66",,"$212,295.57",,"$210,254.29",,"$256,535.52",,"$256,535.52",,"$256,535.52",,"$129,096.95",,"$243,708.74",,"$210,359.13",,"$230,881.97",,"$129,096.95",,"$129,096.95",,,,"$8,918.55",,"$128,167.11",,"$256,535.52",,"$129,096.95",,"$129,096.95",,,,"$129,096.95",,"$129,096.95",
3,MS004,,,Tracheostomy With Mv >96 Hours Or Principal Di...,"$256,923.78","$6,167.31","$251,785.30",,,"$195,262.07",,"$182,415.88",,"$164,431.22",,"$187,040.51",,"$78,603.85",,"$172,928.47",,,,"$251,785.30",,"$191,408.22",,"$78,603.85",,"$158,950.80",,,,"$78,603.85",,,,"$192,692.84",,"$195,262.07",,"$160,137.00",,"$81,682.00",,,,"$81,682.00",,"$210,677.50",,"$81,303.59",,$937.49 per day,,"$195,262.07",,"$78,603.85",,"$78,603.85",,"$78,603.85",,,,"$191,973.45",,"$163,120.91",,"$184,985.12",,"$101,753.19",,"$160,554.12",,"$162,903.67",,"$161,337.31",,"$251,260.88",,"$251,260.88",,"$251,260.88",,"$78,603.85",,"$244,077.59",,"$210,677.50",,"$231,231.40",,"$78,603.85",,"$78,603.85",,,,"$6,167.31",,"$81,682.00",,"$251,260.88",,"$78,603.85",,"$78,603.85",,,,"$78,603.85",,"$78,603.85",
4,MS011,,,"Tracheostomy For Face, Mouth And Neck Diagnose...","$58,037.92","$8,155.74","$67,219.76",,,"$44,108.82",,"$41,206.92",,"$37,144.27",,"$42,251.61",,"$48,810.60",,"$58,037.92",,,,"$56,877.16",,"$43,238.25",,"$48,810.60",,"$67,219.76",,,,"$48,810.60",,,,"$43,528.44",,"$44,108.82",,"$58,037.92",,"$35,477.55",,,,"$35,477.55",,"$47,591.09",,"$35,317.94",,$937.49 per day,,"$44,108.82",,"$48,810.60",,"$48,810.60",,"$48,810.60",,,,"$43,365.93",,"$36,848.28",,"$41,787.30",,"$31,041.40",,"$48,979.55",,"$49,696.32",,"$49,218.47",,"$58,037.92",,"$58,037.92",,"$58,037.92",,"$48,810.60",,"$55,136.02",,"$47,591.09",,"$52,234.13",,"$48,810.60",,"$48,810.60",,,,"$8,155.74",,"$35,477.55",,"$58,037.92",,"$48,810.60",,"$48,810.60",,,,"$48,810.60",,"$48,810.60",
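Several cost cells above are not plain dollar amounts (for example `$937.49 per day`), which is one reason the columns are kept as strings for now. Below is a minimal sketch of how a later cast might separate plain amounts from everything else. `parse_dollars` is a hypothetical helper, not part of the notebook, and the pattern it accepts is an assumption about what counts as a plain amount.

```python
import re
from decimal import Decimal

# Optional dollar sign, digits with optional thousands commas, optional cents.
PLAIN_DOLLARS = re.compile(r"^\$?([\d,]+(?:\.\d+)?)$")


def parse_dollars(text):
    """Return a Decimal for values like '$490,771.98', otherwise None."""
    match = PLAIN_DOLLARS.match(text.strip())
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


for value in ["$490,771.98", "$937.49 per day", ""]:
    print(repr(value), "->", parse_dollars(value))
```

Returning `None` instead of guessing keeps per-day rates and empty cells visible, so they can be handled deliberately once we know more about the data.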


In [10]:
fixed_columns = df.columns[0:4]
melted_columns = df.columns[4:]
df_transformed = df.melt(id_vars = fixed_columns, value_vars = melted_columns, var_name = "payer", value_name = "cost")

In [11]:
df_transformed.shape

(12585735, 6)
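The row count follows directly from the reshape: each of the 113385 rows is repeated once for every melted column, and 115 - 4 = 111 columns are melted. A toy sketch of the same wide-to-long step, with invented payer names and values:

```python
import pandas as pd

# Invented wide table: four identifying columns, then three price columns.
wide = pd.DataFrame(
    {
        "code": ["MS001", "MS002"],
        "ndc": ["", ""],
        "rev_code": ["", ""],
        "description": ["Procedure A", "Procedure B"],
        "Gross Charge": ["$10", "$20"],
        "PAYER X Inpatient": ["$7", ""],
        "PAYER Y Inpatient": ["$8", "$15"],
    }
)

# Keep the first four columns fixed and stack the rest into payer/cost pairs.
id_cols = list(wide.columns[:4])
long = wide.melt(id_vars=id_cols, var_name="payer", value_name="cost")

print(long.shape)           # 2 rows x 3 melted columns, 4 id columns + 2 new
print(113385 * (115 - 4))   # the same arithmetic for the real file
```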

In [13]:
df_transformed.columns = ["code", "ndc", "rev_code", "description", "payer", "cost"]

In [14]:
df_transformed.info(verbose = True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12585735 entries, 0 to 12585734
Data columns (total 6 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   code         object
 1   ndc          object
 2   rev_code     object
 3   description  object
 4   payer        object
 5   cost         object
dtypes: object(6)
memory usage: 576.1+ MB


## Now we are ready to load to BigQuery