# Transforming Data
There are 100 values of DRG Definition. Construct 100 DRG Charges features, one for
each unique value of DRG Definition. The feature should record the Average Covered
Charges for the specified DRG category. Then construct a transformed version of the data
that only includes the provider id, provider state, and the 100 new DRG Charges features.
For example, the data should look like the format in the table below. Make sure to include
missing values for any provider that doesn’t have a charge for a specific DRG.
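
The reshaping described above can be tried on a small table first. The following is only a sketch: the toy values, the `toy` and `wide` names, and the handling of thousands separators are assumptions for illustration and do not come from the real file.

```python
import pandas as pd

# Toy data standing in for the real file; all values are made up.
toy = pd.DataFrame({
    "Provider Id": [1, 1, 2, 3],
    "Provider State": ["AL", "AL", "AL", "TX"],
    "DRG Definition": ["039 - A", "057 - B", "039 - A", "057 - B"],
    " Average Covered Charges ": ["$100.50", "$200.00", "$300.25", "$1,400.00"],
})

# Strip stray whitespace from the column names so later lookups are less fragile.
toy.columns = toy.columns.str.strip()

# Remove "$" and any thousands separators before converting to float.
toy["Average Covered Charges"] = (
    toy["Average Covered Charges"]
    .str.replace(r"[$,]", "", regex=True)
    .astype(float)
)

# One row per provider, one column per DRG; absent combinations become NaN.
wide = toy.pivot(
    index=["Provider Id", "Provider State"],
    columns="DRG Definition",
    values="Average Covered Charges",
)
print(wide)
```

Provider 2 has no row for the second DRG in the toy data, so that cell is NaN in `wide`, which is the behaviour the task asks for.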

In [24]:
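import pandas as pd

df = pd.read_csv('data.csv')

# The charges are read as strings with a one-character prefix (presumably the
# currency symbol); drop it and convert to float. The column name is padded with spaces.
df[' Average Covered Charges '] = df[' Average Covered Charges '].str[1:].astype(float)

# One row per provider, one column per DRG; missing combinations become NaN.
pv = df.pivot(index=['Provider Id', 'Provider State'],
              values=' Average Covered Charges ',
              columns='DRG Definition')
print(pv)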

DRG Definition              039 - EXTRACRANIAL PROCEDURES W/O CC/MCC  \
Provider Id Provider State                                             
10001       AL                                              32963.07   
10005       AL                                              15131.85   
10006       AL                                              37560.37   
10007       AL                                                   NaN   
10008       AL                                                   NaN   
...                                                              ...   
670072      TX                                                   NaN   
670073      TX                                                   NaN   
670075      TX                                                   NaN   
670076      TX                                                   NaN   
670077      TX                                                   NaN   

DRG Definition              057 - DEGENERATIVE NERVOUS SYSTEM D

# Correlation

1. Find the highest and lowest correlations.

In [29]:
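# Missing charges stay as NaN: corr() skips them pairwise, whereas filling
# with 0.0 would treat a missing DRG as a zero charge.
#pv = pv.fillna(value=0.0)

def get_redundant_pairs(df):
    """Return the diagonal and lower-triangle column pairs of the correlation matrix."""
    pairs_to_drop = set()
    cols = df.columns
    for i in range(df.shape[1]):
        for j in range(i + 1):
            pairs_to_drop.add((cols[i], cols[j]))
    return pairs_to_drop

def get_top_abs_correlations(df, n=2):
    """Return the n column pairs with the largest absolute correlation."""
    au_corr = df.corr().abs().unstack()
    labels_to_drop = get_redundant_pairs(df)
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=False)
    return au_corr.iloc[:n]

def get_bottom_abs_correlations(df, n=2):
    """Return the n column pairs with the smallest absolute correlation."""
    au_corr = df.corr().abs().unstack()
    labels_to_drop = get_redundant_pairs(df)
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=True)
    return au_corr.iloc[:n]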

print(get_top_abs_correlations(pv))
print(get_bottom_abs_correlations(pv))

DRG Definition                                        DRG Definition                                            
481 - HIP & FEMUR PROCEDURES EXCEPT MAJOR JOINT W CC  482 - HIP & FEMUR PROCEDURES EXCEPT MAJOR JOINT W/O CC/MCC    0.968424
191 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W CC      192 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W/O CC/MCC        0.965424
dtype: float64
DRG Definition                               DRG Definition 
473 - CERVICAL SPINAL FUSION W/O CC/MCC      885 - PSYCHOSES    0.405575
460 - SPINAL FUSION EXCEPT CERVICAL W/O MCC  885 - PSYCHOSES    0.413419
dtype: float64
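
As a possible alternative to building the set of redundant pairs in a Python loop, the upper triangle of the correlation matrix can be selected with a boolean mask. This is a sketch on toy data; the function name `unique_abs_correlations` and the toy columns are hypothetical. It also exposes the `min_periods` argument of `DataFrame.corr`, which may matter here because a pivoted table with many NaNs can produce correlations that rest on only a few shared providers.

```python
import numpy as np
import pandas as pd

def unique_abs_correlations(df, min_periods=1):
    """Return |corr| for each unordered column pair, sorted in descending order."""
    corr = df.corr(min_periods=min_periods).abs()
    # Boolean mask that is True strictly above the diagonal.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # where() blanks the diagonal and lower triangle; dropna() removes those cells.
    return corr.where(upper).stack().dropna().sort_values(ascending=False)

# Toy data (made up): b is an exact multiple of a, c is negatively related to both.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

result = unique_abs_correlations(toy)
print(result.head(2))  # strongest pairs
print(result.tail(2))  # weakest pairs
```

Because the result is already sorted, `head(n)` and `tail(n)` give the strongest and weakest pairs from a single computation of the correlation matrix.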


# Scatterplots

2. Plot a scatterplot to show their relationship.
Make sure to label both axes of the plot with the feature names.
Discuss whether the observed relations are interesting or expected, given the DRG category names.
(This will result in 4 scatter plots total.)

In [31]:
import plotly.express as px

fig = px.scatter(pv,
                 x="481 - HIP & FEMUR PROCEDURES EXCEPT MAJOR JOINT W CC",
                 y="482 - HIP & FEMUR PROCEDURES EXCEPT MAJOR JOINT W/O CC/MCC")
fig.show()


In [32]:
fig = px.scatter(pv,
                 x="191 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W CC",
                 y="192 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W/O CC/MCC")
fig.show()

In [33]:
fig = px.scatter(pv, 
                 x="460 - SPINAL FUSION EXCEPT CERVICAL W/O MCC", 
                 y="885 - PSYCHOSES")
fig.show()

In [34]:
fig = px.scatter(pv, 
                 x="473 - CERVICAL SPINAL FUSION W/O CC/MCC", 
                 y="885 - PSYCHOSES")
fig.show()
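
The four cells above repeat the same call, so a small helper may be convenient. This is a sketch under assumptions: `complete_pairs` and `scatter_pair` are hypothetical names, and the toy frame is made up. The helper keeps only the rows where both features are present and reports how many remain, which can help when judging how much weight a correlation deserves.

```python
import pandas as pd

def complete_pairs(df, x, y):
    """Keep only the rows where both columns are observed."""
    return df[[x, y]].dropna()

def scatter_pair(df, x, y):
    """Scatter plot of y against x; plotly labels the axes with the column names."""
    import plotly.express as px  # imported lazily; only needed when plotting
    data = complete_pairs(df, x, y)
    return px.scatter(data, x=x, y=y,
                      title=f"{len(data)} complete observations")

# Toy data (made up) with one missing value in each column.
toy = pd.DataFrame({
    "u": [1.0, 2.0, None, 4.0],
    "v": [2.0, None, 3.0, 8.0],
})
complete = complete_pairs(toy, "u", "v")
print(len(complete))
```

With the pivoted table, a call such as `scatter_pair(pv, "460 - SPINAL FUSION EXCEPT CERVICAL W/O MCC", "885 - PSYCHOSES").show()` would produce one of the four plots.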