# What's a custom pivot?

In this example, I know which columns I want in the pivot, but I don't know whether they all exist in the data, or whether the data contains extra columns that I simply want to ignore.

Back to our complications data...

Suppose that we want a column for the following measure IDs:
* PSI_90_SAFETY
* PSI_13_POST_SEPSIS
* PSI_08_HIP_FRAC


In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('/data/complications.csv')

In [3]:
data.columns

Index(['provider_id', 'hospital_name', 'address', 'city', 'state', 'zip_code',
       'county_name', 'phone_number', 'measure_id', 'measure_name',
       'compared_to_national', 'denominator', 'score', 'lower_estimate',
       'higher_estimate', 'footnote', 'measure_start_date',
       'measure_end_date'],
      dtype='object')

In [4]:
data.head()

| | provider_id | hospital_name | address | city | state | zip_code | county_name | phone_number | measure_id | measure_name | compared_to_national | denominator | score | lower_estimate | higher_estimate | footnote | measure_start_date | measure_end_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | SOUTHEAST ALABAMA MEDICAL CENTER | 1108 ROSS CLARK CIRCLE | DOTHAN | AL | 36301 | HOUSTON | (334) 793-8701 | COMP_HIP_KNEE | Rate of complications for hip/knee replacement... | No Different Than the National Rate | 292 | 3.2 | 2.1 | 4.8 | | 2015-04-01T00:00:00.000 | 2018-03-31T00:00:00.000 |
| 1 | 10001 | SOUTHEAST ALABAMA MEDICAL CENTER | 1108 ROSS CLARK CIRCLE | DOTHAN | AL | 36301 | HOUSTON | (334) 793-8701 | MORT_30_AMI | Death rate for heart attack patients | No Different Than the National Rate | 688 | 13.0 | 11.0 | 15.5 | | 2015-07-01T00:00:00.000 | 2018-06-30T00:00:00.000 |
| 2 | 10001 | SOUTHEAST ALABAMA MEDICAL CENTER | 1108 ROSS CLARK CIRCLE | DOTHAN | AL | 36301 | HOUSTON | (334) 793-8701 | MORT_30_CABG | Death rate for CABG surgery patients | No Different Than the National Rate | 291 | 4.3 | 2.6 | 6.8 | | 2015-07-01T00:00:00.000 | 2018-06-30T00:00:00.000 |
| 3 | 10001 | SOUTHEAST ALABAMA MEDICAL CENTER | 1108 ROSS CLARK CIRCLE | DOTHAN | AL | 36301 | HOUSTON | (334) 793-8701 | MORT_30_COPD | Death rate for COPD patients | No Different Than the National Rate | 411 | 8.8 | 6.7 | 11.4 | | 2015-07-01T00:00:00.000 | 2018-06-30T00:00:00.000 |
| 4 | 10001 | SOUTHEAST ALABAMA MEDICAL CENTER | 1108 ROSS CLARK CIRCLE | DOTHAN | AL | 36301 | HOUSTON | (334) 793-8701 | MORT_30_HF | Death rate for heart failure patients | No Different Than the National Rate | 869 | 12.7 | 10.7 | 15.0 | | 2015-07-01T00:00:00.000 | 2018-06-30T00:00:00.000 |


In [5]:
data['measure_id'].unique()

array(['COMP_HIP_KNEE', 'MORT_30_AMI', 'MORT_30_CABG', 'MORT_30_COPD',
       'MORT_30_HF', 'MORT_30_PN', 'MORT_30_STK', 'PSI_10_POST_KIDNEY',
       'PSI_11_POST_RESP', 'PSI_12_POSTOP_PULMEMB_DVT',
       'PSI_13_POST_SEPSIS', 'PSI_14_POSTOP_DEHIS', 'PSI_15_ACC_LAC',
       'PSI_3_ULCER', 'PSI_4_SURG_COMP', 'PSI_6_IAT_PTX',
       'PSI_8_POST_HIP', 'PSI_90_SAFETY', 'PSI_9_POST_HEM'], dtype=object)
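
Before building anything, it can help to check which of the wanted measure IDs actually occur in the data. The sketch below uses a small made-up frame (`toy`) in place of the real file, and the names `wanted`, `present`, `missing`, and `ignored` are illustrative only.

```python
import pandas as pd

# Toy stand-in for the complications data (rows are made up for illustration).
toy = pd.DataFrame({
    "provider_id": [10001, 10001, 10005],
    "measure_id": ["PSI_90_SAFETY", "PSI_13_POST_SEPSIS", "MORT_30_AMI"],
})

# The measure IDs we would like as columns in the final table.
wanted = ["PSI_90_SAFETY", "PSI_13_POST_SEPSIS", "PSI_08_HIP_FRAC"]
available = set(toy["measure_id"].unique())

present = [m for m in wanted if m in available]      # wanted and found
missing = [m for m in wanted if m not in available]  # wanted but absent
ignored = sorted(available - set(wanted))            # found but not wanted

print("present:", present)
print("missing:", missing)
print("ignored:", ignored)
```

Run against the real `data['measure_id']` values listed above, the same check would report `PSI_08_HIP_FRAC` as missing, since the closest ID in the data is `PSI_8_POST_HIP`.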

**Strategy**

Retrieve each subset of rows that we want and put it into its own data frame.

Then merge those data frames on the key column (here `provider_id`), so that each key value becomes one row of the result.
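
The same strategy can be sketched as a loop, which may be handy once the list of wanted measures grows. This is a minimal sketch on made-up data, not the notebook's own code: the helper `subset_for` and the frame `toy` are hypothetical, and renaming the value column to the measure ID before merging is a design choice made here to avoid clashing column names.

```python
from functools import reduce

import pandas as pd

# Toy stand-in for the complications data (values are made up for illustration).
toy = pd.DataFrame({
    "provider_id": [10001, 10001, 10005, 10005],
    "measure_id": ["PSI_90_SAFETY", "PSI_13_POST_SEPSIS",
                   "PSI_90_SAFETY", "MORT_30_AMI"],
    "denominator": ["Not Available", "1607", "Not Available", "688"],
})


def subset_for(df, measure_id, value_col="denominator"):
    """Hypothetical helper: rows for one measure, key plus one value column.

    The value column is renamed to the measure ID so that later merges
    do not produce names such as denominator_x / denominator_y.
    """
    rows = df.loc[df["measure_id"] == measure_id, ["provider_id", value_col]]
    return rows.rename(columns={value_col: measure_id})


wanted = ["PSI_90_SAFETY", "PSI_13_POST_SEPSIS", "PSI_08_HIP_FRAC"]
subsets = [subset_for(toy, m) for m in wanted]

# Outer-merge the subsets one after another on provider_id.
wide = reduce(
    lambda left, right: left.merge(right, how="outer", on="provider_id"),
    subsets,
)
print(wide)
```

A measure that is absent from the data yields an empty subset, and the outer merge then leaves its column filled with missing values instead of raising an error.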

In [6]:
data.columns

Index(['provider_id', 'hospital_name', 'address', 'city', 'state', 'zip_code',
       'county_name', 'phone_number', 'measure_id', 'measure_name',
       'compared_to_national', 'denominator', 'score', 'lower_estimate',
       'higher_estimate', 'footnote', 'measure_start_date',
       'measure_end_date'],
      dtype='object')

In [7]:
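# Keep only the rows for each measure, along with the columns we need.
psi_90 = data[data['measure_id'] == 'PSI_90_SAFETY'][['provider_id','denominator','score']]
psi_13 = data[data['measure_id'] == 'PSI_13_POST_SEPSIS'][['provider_id','denominator']]
# 'PSI_08_HIP_FRAC' is not among the measure IDs in the data, so psi_08 comes back empty.
psi_08 = data[data['measure_id'] == 'PSI_08_HIP_FRAC'][['provider_id','denominator']]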

In [8]:
psi_90.head()

| | provider_id | denominator | score |
|---|---|---|---|
| 17 | 10001 | Not Available | 0.99 |
| 36 | 10005 | Not Available | 1.01 |
| 55 | 10006 | Not Available | 1.02 |
| 74 | 10007 | Not Available | 0.98 |
| 93 | 10008 | Not Available | 0.99 |


In [9]:
psi_13.head()

| | provider_id | denominator |
|---|---|---|
| 10 | 10001 | 1607 |
| 29 | 10005 | 641 |
| 48 | 10006 | 551 |
| 67 | 10007 | 41 |
| 86 | 10008 | Not Available |
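
Because some denominators hold the text `Not Available`, the column is stored as strings, not numbers. If numeric values are needed later, one option (a sketch, not something this notebook does) is `pd.to_numeric` with `errors="coerce"`, shown here on the five values from the output above.

```python
import pandas as pd

# The denominator values shown in the psi_13 output, as strings.
denominator = pd.Series(["1607", "641", "551", "41", "Not Available"])

# Anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(denominator, errors="coerce")
print(numeric)
```

Another option is to pass `na_values=['Not Available']` to `pd.read_csv`, so the placeholder is treated as missing when the file is loaded.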


In [10]:
psi_08.head()

| | provider_id | denominator |
|---|---|---|


In [11]:
# Outer merges keep every provider, even one that is missing from a subset.
psis = (
    psi_90
    .merge(psi_13, how='outer', on=['provider_id'])
    .merge(psi_08, how='outer', on=['provider_id'])
)


In [12]:
psis.head()

| | provider_id | denominator_x | score | denominator_y | denominator |
|---|---|---|---|---|---|
| 0 | 10001 | Not Available | 0.99 | 1607 | |
| 1 | 10005 | Not Available | 1.01 | 641 | |
| 2 | 10006 | Not Available | 1.02 | 551 | |
| 3 | 10007 | Not Available | 0.98 | 41 | |
| 4 | 10008 | Not Available | 0.99 | Not Available | |


In [13]:
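# Rename by position: denominator_x and score come from psi_90, denominator_y from psi_13, denominator from psi_08.
psis.columns = ['provider_id','PSI_90 Den','PSI_90 Score','PSI_13','PSI_08']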

In [14]:
psis.head()

| | provider_id | PSI_90 Den | PSI_90 Score | PSI_13 | PSI_08 |
|---|---|---|---|---|---|
| 0 | 10001 | Not Available | 0.99 | 1607 | |
| 1 | 10005 | Not Available | 1.01 | 641 | |
| 2 | 10006 | Not Available | 1.02 | 551 | |
| 3 | 10007 | Not Available | 0.98 | 41 | |
| 4 | 10008 | Not Available | 0.99 | Not Available | |
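
For comparison, a similar table can be built with an ordinary pivot followed by `reindex`, which drops unwanted columns and adds missing ones in a single step. This is an alternative sketch on made-up data (`toy`), not the approach taken above. It assumes each `provider_id` and `measure_id` pair appears at most once, because `DataFrame.pivot` raises an error on duplicate pairs.

```python
import pandas as pd

# Toy stand-in for the complications data (values are made up for illustration).
toy = pd.DataFrame({
    "provider_id": [10001, 10001, 10005, 10005],
    "measure_id": ["PSI_90_SAFETY", "PSI_13_POST_SEPSIS",
                   "PSI_90_SAFETY", "MORT_30_AMI"],
    "denominator": ["Not Available", "1607", "Not Available", "688"],
})

wanted = ["PSI_90_SAFETY", "PSI_13_POST_SEPSIS", "PSI_08_HIP_FRAC"]

wide = (
    toy.pivot(index="provider_id", columns="measure_id", values="denominator")
       .reindex(columns=wanted)  # drop extras; add absent IDs as all-NaN columns
       .reset_index()
)
print(wide)
```

The merge-based version stays useful when different measures need different value columns, as with `PSI_90_SAFETY` above, where both `denominator` and `score` are kept.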
