Primary Questions: 
- What factors influence Indigenous healthcare access? 
- How do patterns of healthcare access differ among Indigenous groups?

> Health Gaps: Compare healthcare access between Indigenous and non-Indigenous populations

> Within-group Analysis: Compare First Nations, Métis, and Inuit populations, or rural vs. urban Indigenous populations

In [22]:
import pandas as pd
import numpy as np

# Healthcare Access

## Load Data

In [2]:
def data_loader(data_type, table_name):
    path = f'data/{data_type}/{table_name}-eng/{table_name}.csv'
    df = pd.read_csv(path)
    return df

In [3]:
healthcare_df = data_loader('healthcare-access', '41100081')
healthcare_df.head()

Unnamed: 0,REF_DATE,GEO,DGUID,Selected characteristics of health care access and experiences,Indigenous group,Gender,Statistics,UOM,UOM_ID,SCALAR_FACTOR,SCALAR_ID,VECTOR,COORDINATE,VALUE,STATUS,SYMBOL,TERMINATED,DECIMALS
0,2024,Canada,2021A000011124,"Total, unmet health care needs in the past 12 ...",First Nations,"Total, gender",Percentage,Percent,239,units,0,v1663833852,1.1.1.1.1,100.0,,,,1
1,2024,Canada,2021A000011124,"Total, unmet health care needs in the past 12 ...",First Nations,"Total, gender",Low 95% confidence interval,Percent,239,units,0,v1663833853,1.1.1.1.2,100.0,,,,1
2,2024,Canada,2021A000011124,"Total, unmet health care needs in the past 12 ...",First Nations,"Total, gender",High 95% confidence interval,Percent,239,units,0,v1663833854,1.1.1.1.3,100.0,,,,1
3,2024,Canada,2021A000011124,"Total, unmet health care needs in the past 12 ...",First Nations,Men+,Percentage,Percent,239,units,0,v1663833855,1.1.1.2.1,100.0,,,,1
4,2024,Canada,2021A000011124,"Total, unmet health care needs in the past 12 ...",First Nations,Men+,Low 95% confidence interval,Percent,239,units,0,v1663833856,1.1.1.2.2,100.0,,,,1


In [4]:
# Check data types
healthcare_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1566 entries, 0 to 1565
Data columns (total 18 columns):
 #   Column                                                          Non-Null Count  Dtype  
---  ------                                                          --------------  -----  
 0   REF_DATE                                                        1566 non-null   int64  
 1   GEO                                                             1566 non-null   object 
 2   DGUID                                                           1566 non-null   object 
 3   Selected characteristics of health care access and experiences  1566 non-null   object 
 4   Indigenous group                                                1566 non-null   object 
 5   Gender                                                          1566 non-null   object 
 6   Statistics                                                      1566 non-null   object 
 7   UOM                                                

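The `VALUE` column holds the statistic reported for each combination of Indigenous group, gender, and healthcare experience: either the estimated percentage of people who selected that response, or the lower or upper bound of its 95% confidence interval (see the `Statistics` column).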

## Rename Column Names


In [5]:
healthcare_df.columns

Index(['REF_DATE', 'GEO', 'DGUID',
       'Selected characteristics of health care access and experiences',
       'Indigenous group', 'Gender', 'Statistics', 'UOM', 'UOM_ID',
       'SCALAR_FACTOR', 'SCALAR_ID', 'VECTOR', 'COORDINATE', 'VALUE', 'STATUS',
       'SYMBOL', 'TERMINATED', 'DECIMALS'],
      dtype='object')

In [6]:
# change column names to lower case
healthcare_df.columns = healthcare_df.columns.str.lower()

In [7]:
# replace spaces with underscores
healthcare_df.columns = healthcare_df.columns.str.replace(' ', '_')

In [8]:
# rename `selected characteristics of health care access and experiences` to `healthcare_access_experience`
healthcare_df.rename(columns={'selected_characteristics_of_health_care_access_and_experiences': 'healthcare_access_experience'}, inplace=True)


## Statistics of the Data

In [9]:
# statistics of the numerical columns
healthcare_df.describe()

Unnamed: 0,ref_date,uom_id,scalar_id,value,symbol,terminated,decimals
count,1566.0,1566.0,1566.0,1530.0,0.0,0.0,1566.0
mean,2024.0,239.0,0.0,51.694575,,,1.0
std,0.0,0.0,0.0,34.336181,,,0.0
min,2024.0,239.0,0.0,2.0,,,1.0
25%,2024.0,239.0,0.0,20.9,,,1.0
50%,2024.0,239.0,0.0,43.25,,,1.0
75%,2024.0,239.0,0.0,89.275,,,1.0
max,2024.0,239.0,0.0,100.0,,,1.0


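`ref_date`, `uom_id`, `scalar_id`, and `decimals` each have a standard deviation of 0 and identical min and max, so every row holds the same value.

`symbol` and `terminated` have a count of 0, so they contain no non-null values.

Therefore, `value` is the only numerical column useful for analysis. Its count of 1530 (out of 1566 rows) also shows that 36 values are missing.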

In [10]:
# categorical columns
healthcare_df.describe(include=['object'])

Unnamed: 0,geo,dguid,healthcare_access_experience,indigenous_group,gender,statistics,uom,scalar_factor,vector,coordinate,status
count,1566,1566,1566,1566,1566,1566,1566,1566,1566,1566,36
unique,1,1,58,3,3,3,1,1,1566,1566,1
top,Canada,2021A000011124,"Total, unmet health care needs in the past 12 ...",First Nations,"Total, gender",Percentage,Percent,units,v1663833852,1.1.1.1.1,F
freq,1566,1566,27,522,522,522,1566,1566,1,1,36


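`geo`, `dguid`, `uom`, and `scalar_factor` each have a single unique value across all 1566 rows, so they are not useful for analysis. `status` also has one unique value (`F`) but is populated in only 36 rows, while `vector` and `coordinate` are unique per row and act as identifiers rather than features.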
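
The two `describe()` readings above can also be checked programmatically: a column carries no information if it is entirely null or holds a single distinct value. The following is a minimal sketch on a small hypothetical frame; `toy_df` and `uninformative_columns` are illustrative names that do not appear elsewhere in this notebook.

```python
import pandas as pd

def uninformative_columns(df: pd.DataFrame) -> list[str]:
    """Return columns that are entirely null or hold one distinct value."""
    # nunique() ignores missing values by default, so all-null columns report 0
    counts = df.nunique()
    return counts[counts <= 1].index.tolist()

# Hypothetical stand-in for the loaded table
toy_df = pd.DataFrame({
    "ref_date": [2024, 2024, 2024],
    "geo": ["Canada", "Canada", "Canada"],
    "symbol": [None, None, None],
    "value": [24.4, 19.2, 30.3],
})

print(uninformative_columns(toy_df))
```

Because missing values are ignored, a sparsely populated column with one distinct value (like `status` here) would be flagged as well, which matches the reading above.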

In [11]:
# unique values in the categorical columns
for col in ['indigenous_group', 'gender', 'statistics']:
    print(f'{col}: {healthcare_df[col].nunique()} unique values')
    print(healthcare_df[col].unique())
    print()

indigenous_group: 3 unique values
['First Nations' 'Métis' 'Inuk (Inuit)']

gender: 3 unique values
['Total, gender' 'Men+' 'Women+']

statistics: 3 unique values
['Percentage' 'Low 95% confidence interval' 'High 95% confidence interval']



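Each estimate comes with a 95% confidence interval: the `Low 95% confidence interval` and `High 95% confidence interval` rows bound the range that is expected, with 95% confidence, to contain the true population percentage.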

In [12]:
healthcare_df[
    (healthcare_df['indigenous_group'] == 'First Nations') &
    (healthcare_df['gender'] == 'Men+') &
    (healthcare_df['healthcare_access_experience'] == 'Yes, had an unmet health care need in the past 12 months')
]

Unnamed: 0,ref_date,geo,dguid,healthcare_access_experience,indigenous_group,gender,statistics,uom,uom_id,scalar_factor,scalar_id,vector,coordinate,value,status,symbol,terminated,decimals
30,2024,Canada,2021A000011124,"Yes, had an unmet health care need in the past...",First Nations,Men+,Percentage,Percent,239,units,0,v1663833882,1.2.1.2.1,24.4,,,,1
31,2024,Canada,2021A000011124,"Yes, had an unmet health care need in the past...",First Nations,Men+,Low 95% confidence interval,Percent,239,units,0,v1663833883,1.2.1.2.2,19.2,,,,1
32,2024,Canada,2021A000011124,"Yes, had an unmet health care need in the past...",First Nations,Men+,High 95% confidence interval,Percent,239,units,0,v1663833884,1.2.1.2.3,30.3,,,,1
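
If the confidence bounds are needed later, the three `statistics` rows can be pivoted into side-by-side columns instead of being discarded. This is only a sketch under stated assumptions: `long_df` reproduces the three rows shown above, and the column names `estimate`, `ci_low`, and `ci_high` are hypothetical.

```python
import pandas as pd

# The three First Nations / Men+ rows shown above, in long format
long_df = pd.DataFrame({
    "indigenous_group": ["First Nations"] * 3,
    "gender": ["Men+"] * 3,
    "healthcare_access_experience": [
        "Yes, had an unmet health care need in the past 12 months"
    ] * 3,
    "statistics": [
        "Percentage",
        "Low 95% confidence interval",
        "High 95% confidence interval",
    ],
    "value": [24.4, 19.2, 30.3],
})

# One row per group/gender/experience, one column per statistic
wide_df = (
    long_df
    .pivot(
        index=["indigenous_group", "gender", "healthcare_access_experience"],
        columns="statistics",
        values="value",
    )
    .rename(columns={
        "Percentage": "estimate",
        "Low 95% confidence interval": "ci_low",
        "High 95% confidence interval": "ci_high",
    })
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover 'statistics' axis label

print(wide_df[["estimate", "ci_low", "ci_high"]])
```

Keeping the bounds makes it possible to see whether the intervals of two groups overlap before treating a difference in percentages as meaningful.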


For the rest of the analysis we keep only the point estimates, i.e. rows where `statistics == 'Percentage'`.

In [13]:
# filter statistics == 'Percentage'
hc_filtered_df = healthcare_df[healthcare_df['statistics'] == 'Percentage']

## Drop Unnecessary Columns

In [14]:
hc_filtered_df.columns

Index(['ref_date', 'geo', 'dguid', 'healthcare_access_experience',
       'indigenous_group', 'gender', 'statistics', 'uom', 'uom_id',
       'scalar_factor', 'scalar_id', 'vector', 'coordinate', 'value', 'status',
       'symbol', 'terminated', 'decimals'],
      dtype='object')

In [15]:
selected_cols = ['indigenous_group', 'gender', 'healthcare_access_experience', 'value']

hc_filtered_df = hc_filtered_df[selected_cols]

## Drop Missing Values 

In [16]:
# Check NA value counts and proportions
na_counts = hc_filtered_df.isna().sum()
na_proportions = hc_filtered_df.isna().mean()
na_summary = pd.DataFrame({'count': na_counts, 'proportion': na_proportions})
na_summary

Unnamed: 0,count,proportion
indigenous_group,0,0.0
gender,0,0.0
healthcare_access_experience,0,0.0
value,12,0.022989


Missing values make up only ~2.3% of the rows (12 of 522), so we drop them.

In [17]:
# Drop rows with NA values
hc_cleaned_df = hc_filtered_df.dropna()
print(f"Dropped {hc_filtered_df.shape[0] - hc_cleaned_df.shape[0]} rows with NA values")

Dropped 12 rows with NA values
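
Before dropping rows, it can be worth checking whether the missing values are spread evenly or concentrated in one group, since concentrated gaps would bias later comparisons. Below is a minimal sketch on hypothetical data; the values and the pattern of missingness in `toy_df` are made up for illustration and are not taken from the table.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the missingness pattern is illustrative only
toy_df = pd.DataFrame({
    "indigenous_group": ["First Nations", "Métis", "Inuk (Inuit)", "Inuk (Inuit)"],
    "gender": ["Men+", "Men+", "Men+", "Women+"],
    "value": [24.4, 20.0, np.nan, np.nan],
})

# Count missing values per Indigenous group
missing_by_group = (
    toy_df["value"].isna()
    .groupby(toy_df["indigenous_group"])
    .sum()
)
print(missing_by_group)
```

The same pattern applied to `hc_filtered_df` would show which groups, genders, or experience categories lose rows in the `dropna()` step.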


## Reported Access to Healthcare: Yes vs. No

In [21]:
hc_cleaned_df['healthcare_access_experience'].unique()

array(['Total, unmet health care needs in the past 12 months',
       'Yes, had an unmet health care need in the past 12 months',
       'No, did not have an unmet health care need in the past 12 months',
       'Total, consulted a health care provider for a non-urgent primary health care need in the past 12 months',
       'Yes, consulted a health care provider for a non-urgent primary health care need in the past 12 months',
       'No, did not consult a health care provider for a non-urgent primary health care need in the past 12 months',
       'Total, wait time between requesting care and speaking with a primary health care provider',
       'Same day or next', 'Two days to less than two weeks',
       'Two weeks or more',
       'Total, satisfaction with wait times between requesting care and speaking with a primary health care provider',
       'Very satisfied or satisfied',
       'Neither satisfied nor dissatisfied',
       'Dissatisfied or very dissatisfied',
       'Total, p

In [18]:
hc_cleaned_df[hc_cleaned_df['healthcare_access_experience'].str.contains("No")]['value'].mean()

54.84027777777778

In [19]:
hc_cleaned_df[hc_cleaned_df['healthcare_access_experience'].str.contains("Yes")]['value'].mean()

42.09074074074075

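These two averages should not be read as the share of Indigenous people with met or unmet healthcare needs. `str.contains("No")` and `str.contains("Yes")` pool the responses to every Yes/No question in the table (unmet needs, non-urgent consultations, mental health care, prescriptions, travel for care, discrimination), and the `"No"` pattern also matches any label that merely begins with those letters, such as the `None of the above` and `Not very important or not important at all` labels handled in the feature engineering step. In addition, a "No" on the unmet-needs question means the person did *not* have an unmet need. For the unmet-needs question itself, the table reports, for example, that 24.4% of First Nations men+ had an unmet health care need in the past 12 months (95% CI: 19.2 to 30.3). Comparisons should therefore be made per question, group, and gender rather than from these pooled means.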
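
Because substring matching on "Yes" and "No" pools unrelated questions, a per-question comparison with exact label matching is easier to interpret. The sketch below uses the First Nations men+ estimate shown earlier (24.4); the other two values and the frame `toy_df` are illustrative assumptions, not figures from the table.

```python
import pandas as pd

YES = "Yes, had an unmet health care need in the past 12 months"
NO = "No, did not have an unmet health care need in the past 12 months"

toy_df = pd.DataFrame({
    "indigenous_group": ["First Nations"] * 3,
    "gender": ["Men+"] * 3,
    "healthcare_access_experience": [YES, NO, "None of the above"],
    "value": [24.4, 75.6, 50.0],  # 75.6 and 50.0 are illustrative only
})

# Exact matching restricts the comparison to the unmet-needs question
unmet = toy_df[toy_df["healthcare_access_experience"].isin([YES, NO])]
summary = unmet.pivot(
    index=["indigenous_group", "gender"],
    columns="healthcare_access_experience",
    values="value",
)
print(summary)

# Substring matching, by contrast, also picks up an unrelated label
loose = toy_df[toy_df["healthcare_access_experience"].str.contains("No")]
print(loose["healthcare_access_experience"].tolist())
```

The second print shows that the loose filter returns the `None of the above` row alongside the genuine "No" answer, which is why its mean mixes different questions.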

## Feature Engineering

In [23]:
def split_healthcare_access_experience(df):
    # 1. Unmet healthcare need in past 12 months
    df['unmet_healthcare_need_past_12mo'] = np.where(
        df['healthcare_access_experience'].str.contains('unmet health care need in the past 12 months') & 
        df['healthcare_access_experience'].str.contains('Yes'),
        'Yes',
        np.where(
            df['healthcare_access_experience'].str.contains('unmet health care need in the past 12 months') & 
            df['healthcare_access_experience'].str.contains('No'),
            'No',
            None  # np.nan would be cast to the string 'nan' here
        )
    )

    # 2. Consulted provider for non-urgent care
    df['consulted_provider_nonurgent_12mo'] = np.where(
        df['healthcare_access_experience'].str.contains('consulted a health care provider for a non-urgent primary health care need in the past 12 months') & 
        df['healthcare_access_experience'].str.contains('Yes'),
        'Yes',
        np.where(
            df['healthcare_access_experience'].str.contains('consulted a health care provider for a non-urgent primary health care need in the past 12 months') & 
            df['healthcare_access_experience'].str.contains('No'),
            'No',
            None  # np.nan would be cast to the string 'nan' here
        )
    )

    # 3. Wait time category
    def wait_time_category(val):
        if val == 'Same day or next':
            return 'Same day or next'
        elif val == 'Two days to less than two weeks':
            return '2d-2w'
        elif val == 'Two weeks or more':
            return '2w+'
        return np.nan
    df['wait_time'] = df['healthcare_access_experience'].apply(wait_time_category)

    # 4. Satisfaction with wait time
    def wait_satisfaction(val):
        if val == 'Very satisfied or satisfied':
            return 'Satisfied'
        elif val == 'Neither satisfied nor dissatisfied':
            return 'Neutral'
        elif val == 'Dissatisfied or very dissatisfied':
            return 'Dissatisfied'
        return np.nan
    df['wait_satisfaction'] = df['healthcare_access_experience'].apply(wait_satisfaction)

    # 5. Perceived mental health
    def mental_health(val):
        if val == 'Excellent or very good perceived mental health':
            return 'Excellent/Very Good'
        elif val == 'Good perceived mental health':
            return 'Good'
        elif val == 'Fair or poor perceived mental health':
            return 'Fair/Poor'
        return np.nan
    df['mental_health_status'] = df['healthcare_access_experience'].apply(mental_health)

    # 6. Needed mental health care
    # None (not np.nan) keeps missing values real; np.where would cast np.nan to the string 'nan'
    df['needed_mental_health_care'] = np.where(
        df['healthcare_access_experience'] == 'Yes, needed mental health care', 'Yes',
        np.where(df['healthcare_access_experience'] == 'No, did not need mental health care', 'No', None)
    )

    # 7. Extent to which mental health needs were met
    def mh_needs_met(val):
        if val == 'Sought mental health care, and needs were fully met':
            return 'Fully Met'
        elif val == 'Sought mental health care, and needs were partially met':
            return 'Partially Met'
        elif val == 'Sought mental health care, and needs were unmet':
            return 'Unmet'
        elif val == 'Did not seek mental health care':
            return 'Not Sought'
        return np.nan
    df['mental_healthcare_needs_met'] = df['healthcare_access_experience'].apply(mh_needs_met)

    # 8. Prescription for medication in past 12 months
    # None (not np.nan) keeps missing values real; np.where would cast np.nan to the string 'nan'
    df['prescription_12mo'] = np.where(
        df['healthcare_access_experience'] == 'Yes, had a prescription for medication', 'Yes',
        np.where(df['healthcare_access_experience'] == 'No, did not have a prescription for medication', 'No', None)
    )

    # 9. Cost-related non-adherence (condensed)
    def cost_nonadherence(val):
        if 'Did not fill or collect' in val:
            return 'Did not fill/collect'
        elif 'Skipped doses' in val:
            return 'Skipped doses'
        elif 'Reduced the dosage' in val:
            return 'Reduced dosage'
        elif 'Delayed filling' in val:
            return 'Delayed'
        elif 'Other decision' in val:
            return 'Other'
        elif 'Did not do any of the above' in val:
            return 'None'
        return np.nan
    df['cost_related_nonadherence'] = df['healthcare_access_experience'].apply(cost_nonadherence)

    # 10. Reason unable to fill prescription
    def unable_to_fill(val):
        if 'Unable to get new prescription' in val:
            return 'No new prescription'
        elif 'Unable to fill enough' in val:
            return '30-day limit'
        elif 'Medication not available' in val:
            return 'Not available'
        elif 'Other reason' in val:
            return 'Other'
        elif 'None of the above' in val:
            return 'None'
        return np.nan
    df['unable_to_fill_prescription'] = df['healthcare_access_experience'].apply(unable_to_fill)

    # 11. Traveled outside community for care
    # None (not np.nan) keeps missing values real; np.where would cast np.nan to the string 'nan'
    df['traveled_for_care'] = np.where(
        df['healthcare_access_experience'] == 'Yes, travelled outside community to access health care services', 'Yes',
        np.where(df['healthcare_access_experience'] == 'No, did not travel outside community to access health care services', 'No', None)
    )

    # 12. Reported discrimination
    # None (not np.nan) keeps missing values real; np.where would cast np.nan to the string 'nan'
    df['reported_discrimination'] = np.where(
        df['healthcare_access_experience'] == 'Yes, reported unfair treatment, racism or discrimination from health care professional', 'Yes',
        np.where(df['healthcare_access_experience'] == 'No, did not report unfair treatment, racism or discrimination from health care professional', 'No', None)
    )

    # 13. Importance of Indigenous-supportive care
    def indigenous_support(val):
        if val == 'Very or somewhat important':
            return 'Important'
        elif val == 'Not very important or not important at all':
            return 'Not important'
        return np.nan
    df['importance_indigenous_support'] = df['healthcare_access_experience'].apply(indigenous_support)

    # 14. Reason why Indigenous-supportive care is important
    def reason_indigenous_support(val):
        if val == 'Provide better overall quality of care':
            return 'Better quality'
        elif val == 'Feel respected for culture, beliefs and identity':
            return 'Feel respected'
        elif val == 'Feel safer discussing sensitive or traumatic experiences':
            return 'Safer to discuss'
        elif val == 'More likely to seek health care':
            return 'More likely to seek'
        elif val == 'Other reason':
            return 'Other'
        elif val == 'Prefer not to say':
            return 'Prefer not to say'
        return np.nan
    df['reason_indigenous_support'] = df['healthcare_access_experience'].apply(reason_indigenous_support)
    
    return df
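
The exact-match recodings above (wait time, satisfaction, mental health status, and so on) could also be written as lookup tables applied with `Series.map`, which returns a true `NaN` for any label not in the table. This is a sketch of that alternative, not a replacement for the function; `WAIT_TIME_LABELS` and `labels` are hypothetical names.

```python
import pandas as pd

# Lookup table mirroring the wait_time_category mapping above
WAIT_TIME_LABELS = {
    "Same day or next": "Same day or next",
    "Two days to less than two weeks": "2d-2w",
    "Two weeks or more": "2w+",
}

# Hypothetical sample of labels, including one from a different question
labels = pd.Series([
    "Same day or next",
    "Two weeks or more",
    "Very satisfied or satisfied",
])

# Labels missing from the dictionary map to NaN
wait_time = labels.map(WAIT_TIME_LABELS)
print(wait_time)
```

A dictionary per feature keeps the label-to-category pairs in one place, which makes them easier to review against the list of unique labels.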

In [24]:
# Pass a copy so the new columns are not assigned to a slice of healthcare_df (avoids SettingWithCopyWarning)
hc_feature_eng_df = split_healthcare_access_experience(hc_cleaned_df.copy())

In [25]:
hc_feature_eng_df.shape[0]

510

In [26]:
hc_feature_eng_df.columns

Index(['indigenous_group', 'gender', 'healthcare_access_experience', 'value',
       'unmet_healthcare_need_past_12mo', 'consulted_provider_nonurgent_12mo',
       'wait_time', 'wait_satisfaction', 'mental_health_status',
       'needed_mental_health_care', 'mental_healthcare_needs_met',
       'prescription_12mo', 'cost_related_nonadherence',
       'unable_to_fill_prescription', 'traveled_for_care',
       'reported_discrimination', 'importance_indigenous_support',
       'reason_indigenous_support'],
      dtype='object')

In [33]:
hc_feature_eng_df[(hc_feature_eng_df['consulted_provider_nonurgent_12mo'] == 'Yes') &
                (hc_feature_eng_df['indigenous_group'] == 'First Nations') &
                (hc_feature_eng_df['gender'] == 'Total, gender')
]

Unnamed: 0,indigenous_group,gender,healthcare_access_experience,value,unmet_healthcare_need_past_12mo,consulted_provider_nonurgent_12mo,wait_time,wait_satisfaction,mental_health_status,needed_mental_health_care,mental_healthcare_needs_met,prescription_12mo,cost_related_nonadherence,unable_to_fill_prescription,traveled_for_care,reported_discrimination,importance_indigenous_support,reason_indigenous_support
108,First Nations,"Total, gender","Yes, consulted a health care provider for a no...",66.2,,Yes,,,,,,,,,,,,


## Store Final Data as Parquet

In [34]:
print(f"Total number of rows: {hc_feature_eng_df.shape[0]}")

Total number of rows: 510


In [35]:
# save as parquet
hc_feature_eng_df.to_parquet('data/healthcare-access/41100081-eng/healthcare_access.parquet', index=False)