# MBS Combined 2014-2022: Exploratory Analysis

In [83]:
# import libraries
import pandas as pd
import numpy as np
import os

## Import MBS Combined Dataset

In [84]:
# import the transformed mbs file and assign to a dataframe

# setup path to original dataset
path = r"/Users/patel/Documents/CF-Data Anaylst Course/portfolio_projects/mbs_analysis/datasets/"

df_mbs_2014_23 = pd.read_pickle(
    os.path.join(path, "clean_datasets/mbs_data/2014-22_phc_combined_mbs.pkl")
)
# the first positional argument of info() is `verbose`; pass it explicitly
df_mbs_2014_23.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 258535 entries, 0 to 258534
Data columns (total 16 columns):
 #   Column                                        Non-Null Count   Dtype  
---  ------                                        --------------   -----  
 0   Year                                          258535 non-null  int64  
 1   StateTerritory                                258535 non-null  object 
 2   GeographicCode                                258535 non-null  object 
 3   GeographicAreaName                            258535 non-null  object 
 4   GeographicGroup                               258535 non-null  object 
 5   ServiceLevel                                  258535 non-null  object 
 6   Service                                       258535 non-null  object 
 7   DemographicGroup                              258535 non-null  object 
 8   Medicare benefits per 100 people ($)          234301 non-null  float64
 9   No. of patients                               23

## Descriptive Statistical Analysis

In [85]:
df_mbs_2014_23.describe()

| Statistic | Year | Medicare benefits per 100 people ($) | No. of patients | No. of services | Percentage of people who had the service (%) | Services per 100 people | Total Medicare benefits paid ($) | Total provider fees ($) | Estimated resident population |
|---|---|---|---|---|---|---|---|---|---|
| count | 258535.0 | 234301.0 | 234301.0 | 234301.0 | 234301.0 | 234301.0 | 234301.0 | 234301.0 | 258505.0 |
| mean | 2018.002162 | 6546.570367 | 21725.315317 | 93415.819766 | 23.257384 | 104.4814 | 5642181.0 | 6530087.0 | 126692.421748 |
| std | 2.583762 | 11229.591164 | 228847.398569 | 1353282.511084 | 28.371166 | 208.22682 | 69630720.0 | 78485830.0 | 835320.657884 |
| min | 2014.0 | 0.0 | 0.0 | -102.0 | 0.0 | -0.34 | 0.0 | 0.0 | 0.0 |
| 25% | 2016.0 | 97.0 | 521.0 | 1063.0 | 1.11 | 2.17 | 47439.0 | 53897.0 | 20080.0 |
| 50% | 2018.0 | 1146.11 | 3290.0 | 7721.0 | 8.99 | 19.43 | 537654.0 | 621628.0 | 43450.0 |
| 75% | 2020.0 | 7662.81 | 11101.0 | 30794.0 | 35.7 | 91.07 | 2486454.0 | 3009783.0 | 77425.0 |
| max | 2022.0 | 110280.55 | 23099650.0 | 188694030.0 | 100.0 | 2041.2 | 9082284000.0 | 10005620000.0 | 25697298.0 |


1. About 9.4% of values (24,234 of 258,535 rows) are missing in every numeric column except Year and Estimated resident population (ERP); ERP is missing only 30 values. The missing values were suppressed by the source and not published, so they will not be populated.
2. `No. of services` has a minimum of -102. Investigation required.
3. `Services per 100 people` also has a negative minimum (-0.34).
4. Check how many SA areas have an ERP of 0.
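
The missing-value share in the first note can be read off the `count` row, but it can also be computed directly. The sketch below is a minimal illustration on a small made-up frame; `toy` and `missing_share` are hypothetical names, and the values are not taken from the MBS dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df_mbs_2014_23 (values are made up)
toy = pd.DataFrame(
    {
        "Year": [2014, 2015, 2016, 2017],
        "No. of services": [10.0, np.nan, 30.0, 40.0],
        "Estimated resident population": [100.0, 200.0, 300.0, np.nan],
    }
)


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


share = missing_share(toy)
print(share)
```

Applied to the real dataframe, the same helper would be called as `missing_share(df_mbs_2014_23)`.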

### Address Negative Values

In [86]:
df_mbs_2014_23[df_mbs_2014_23["No. of services"] < 0]

| Row | Year | StateTerritory | GeographicCode | GeographicAreaName | GeographicGroup | ServiceLevel | Service | DemographicGroup | Medicare benefits per 100 people ($) | No. of patients | No. of services | Percentage of people who had the service (%) | Services per 100 people | Total Medicare benefits paid ($) | Total provider fees ($) | Estimated resident population |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 159089 | 2019 | Qld | 31502 | Outback - North | Remote (incl. very remote) | Level 3 | GP Multidisciplinary Case Conference | All persons | 5.0 | 183 | -102 | 0.61 | -0.34 | 1548.0 | 1586.0 | 30139 |


The negative values are present in the original data, and the source gives no explanation for a negative `No. of services` or `Services per 100 people`.

Assumption: the negative sign is an error. Because patients were recorded for this row, both values are set to their absolute (positive) value.

In [87]:
df_mbs_2014_23.loc[159089, "No. of services"] = abs(
    df_mbs_2014_23.loc[159089, "No. of services"]
)
df_mbs_2014_23.loc[159089, "Services per 100 people"] = abs(
    df_mbs_2014_23.loc[159089, "Services per 100 people"]
)
df_mbs_2014_23.loc[159089]

Year                                                                            2019
StateTerritory                                                                   Qld
GeographicCode                                                                 31502
GeographicAreaName                                                   Outback - North
GeographicGroup                                           Remote (incl. very remote)
ServiceLevel                                                                 Level 3
Service                                         GP Multidisciplinary Case Conference
DemographicGroup                                                         All persons
Medicare benefits per 100 people ($)                                             5.0
No. of patients                                                                  183
No. of services                                                                  102
Percentage of people who had the service (%)                     
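
The fix above targets one known row label. As an alternative sketch, assuming the same "negative sign is an error" rule should apply to any row, the correction can be written without hard-coding the label. `toy`, `count_cols`, and `flip_negative` are hypothetical names, and the second row of the toy frame is made up.

```python
import pandas as pd

# Hypothetical toy frame; the first row mirrors the affected record,
# the second row is invented for illustration
toy = pd.DataFrame(
    {
        "No. of patients": [183, 50],
        "No. of services": [-102, 75],
        "Services per 100 people": [-0.34, 1.2],
    },
    index=[159089, 7],
)

count_cols = ["No. of services", "Services per 100 people"]


def flip_negative(df: pd.DataFrame, cols: list) -> tuple:
    """Return a copy with `cols` made non-negative, plus the number of rows changed."""
    out = df.copy()
    # Count rows with at least one negative value before correcting them
    n_changed = int((out[cols] < 0).any(axis=1).sum())
    # abs() leaves non-negative values untouched, so it is safe on every row
    out[cols] = out[cols].abs()
    return out, n_changed


fixed, n_fixed = flip_negative(toy, count_cols)
print(n_fixed)
print(fixed)
```

Returning the number of changed rows makes it easy to confirm that only the expected record was affected before overwriting the original dataframe.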

### Investigate Estimated Resident Population of 0

In [88]:
zero_erp = df_mbs_2014_23[df_mbs_2014_23["Estimated resident population"] == 0]
zero_erp.shape

(314, 16)

In [89]:
zero_erp[["StateTerritory", "GeographicCode", "GeographicAreaName"]].value_counts(
    dropna=False
)

StateTerritory     GeographicCode  GeographicAreaName         
Other Territories  90104           Norfolk Island                 249
NSW                12402           Blue Mountains - South          45
                   10702           Illawarra Catchment Reserve     20
dtype: int64

From the previous investigation, we know that Blue Mountains - South and Illawarra Catchment Reserve have no resident population.
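
To check whether an area has no population at all, rather than a zero in only some rows, one option is to summarise ERP per area. This is a minimal sketch on made-up data; `toy`, `flags`, `summary`, and the area codes `"A01"` to `"A03"` are hypothetical and do not come from the MBS dataset.

```python
import pandas as pd

erp = "Estimated resident population"

# Hypothetical toy data: A01 is always zero, A02 is zero in one year only
toy = pd.DataFrame(
    {
        "GeographicCode": ["A01", "A01", "A02", "A02", "A03"],
        "Year": [2014, 2015, 2014, 2018, 2014],
        erp: [0, 0, 0, 1750, 5000],
    }
)

# Flag zero-ERP rows, then aggregate per area
flags = toy.assign(is_zero=toy[erp].eq(0))
summary = flags.groupby("GeographicCode").agg(
    zero_rows=("is_zero", "sum"),
    total_rows=("is_zero", "size"),
    max_erp=(erp, "max"),
)
# An area is "always zero" when its largest recorded ERP is still 0
summary["always_zero"] = summary["max_erp"].eq(0)
print(summary)
```

Areas where `always_zero` is true behave like unpopulated reserves, while areas with a mix of zero and non-zero rows need a closer look by year.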

In [90]:
# investigating Norfolk Island
zero_erp[zero_erp["GeographicCode"] == "90104"]["GeographicGroup"].value_counts(
    dropna=False
)

Remote (incl. very remote)    249
Name: GeographicGroup, dtype: int64

In [91]:
norfolk_data = df_mbs_2014_23[(df_mbs_2014_23["GeographicCode"] == "90104")]

Norfolk Island has a small population, so its data was either suppressed or the services were not used. It is also suspected that data collection for Norfolk Island started only after the 2016 census.
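
One way to probe the suspicion about timing is to break a single area's rows down by year and see when a non-zero ERP first appears. The sketch below uses made-up numbers; `toy_area`, `flags`, `by_year`, and `first_nonzero_year` are hypothetical names, and on the real data `norfolk_data` would take the place of `toy_area`.

```python
import numpy as np
import pandas as pd

erp = "Estimated resident population"

# Hypothetical rows for one area (values are made up, not Norfolk Island figures)
toy_area = pd.DataFrame(
    {
        "Year": [2014, 2014, 2016, 2017, 2017],
        erp: [0, 0, 0, 1200, 1200],
        "No. of services": [np.nan, np.nan, 5.0, 10.0, np.nan],
    }
)

# Flag zero-ERP rows and rows with an unpublished (missing) service count
flags = toy_area.assign(
    zero_erp=toy_area[erp].eq(0),
    suppressed=toy_area["No. of services"].isna(),
)

by_year = flags.groupby("Year").agg(
    rows=("zero_erp", "size"),
    zero_erp_rows=("zero_erp", "sum"),
    suppressed_rows=("suppressed", "sum"),
)

# Earliest year with a non-zero ERP
first_nonzero_year = flags.loc[~flags["zero_erp"], "Year"].min()
print(by_year)
print(first_nonzero_year)
```

If every zero-ERP row falls before a single cut-off year, that would be consistent with population figures only becoming available later, although it would not prove the cause.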

In [92]:
df_mbs_2014_23.dtypes.to_clipboard()

In [103]:
df_mbs_2014_23.shape

(258535, 16)

In [98]:
df_mbs_2014_23.to_pickle(
    os.path.join(path, "clean_datasets/mbs_data/2014-22_phc_complete_mbs.pkl")
)