# AnyoneAI - Final Project
> Hospitalization Prediction for Elderly Population

The project aims to predict the probability of hospitalization for elderly Mexican individuals using machine learning algorithms and data such as demographics, health indicators, and medical history. The model can help healthcare providers identify high-risk patients and allocate resources accordingly. This project is similar to what you have implemented in the last few sprints, but it focuses on building a model that makes accurate predictions from only a few features.


### Goal:
The main goal of this project is to ask users to complete a form and use the provided information to predict that person's risk of hospitalization in the next year. To do so, a Machine Learning model must be trained. Keep in mind that the dataset has thousands of features, but we can't ask users to fill in that many fields in a form. We suggest you start with classic models such as Decision Trees, Random Forests, or Gradient Boosting, identify the most important features, and refine your model's input based on them.
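As a minimal sketch of the "identify the most important features" step, the snippet below keeps the `k` highest-scoring features from a name-to-importance mapping. The function name `select_top_features` and the example feature names and scores are hypothetical; in practice the mapping could come from a fitted scikit-learn tree ensemble, for example by zipping the training column names with its `feature_importances_` attribute.

```python
from typing import List, Mapping


def select_top_features(importances: Mapping[str, float], k: int = 50) -> List[str]:
    """Return the names of the k features with the highest importance score."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Sort by descending importance; break ties alphabetically for reproducibility.
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical scores, e.g. dict(zip(X.columns, model.feature_importances_)).
example_importances = {
    "age": 0.31,
    "prior_hospital_stay": 0.42,
    "self_rated_health": 0.18,
    "number_of_marriages": 0.01,
    "urban_or_rural": 0.08,
}

top_three = select_top_features(example_importances, k=3)
print(top_three)
```

Retraining on the reduced feature set and comparing scores against the full model is one way to check how much predictive power the shorter form gives up.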

### Main Deliverables:
- Exploratory Dataset Analysis (EDA)
- Scripts used for data pre-processing and data preparation
- Training scripts and trained models. Description of how to reproduce results
- Implementation and training of model for hospitalization prediction
- The final model should need fewer than 50 features to make predictions
- The final model's AUC score must be over 0.9
- Present results and a demo of the model making predictions in real time through an API
- Everything must be containerized using Docker
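
To make the AUC requirement in the list above concrete, here is a small pure-Python sketch of the ROC AUC, computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `roc_auc` and the toy labels and scores are illustrative only; for real evaluation, `sklearn.metrics.roc_auc_score` is the usual choice and scales much better than this pairwise loop.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = hospitalized, 0 = not hospitalized.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```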

### Additional Optional Deliverables:
- Experiment with Transformer models: you can convert the data to a string, try different subsets of features, and evaluate how good the predictions are with only a few input variables.
- Build a UI for the demo in which users complete a form to get the prediction.
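
One possible way to "convert the data to a string" for the Transformer experiment is sketched below. The helper name `record_to_text`, the `name: value` layout, and the example field names are assumptions made for illustration; the project does not prescribe a serialization format.

```python
from typing import Any, Mapping, Optional, Sequence


def record_to_text(record: Mapping[str, Any], features: Optional[Sequence[str]] = None) -> str:
    """Serialize one survey record as 'name: value' pairs for a text model."""
    names = list(features) if features is not None else list(record)
    parts = []
    for name in names:
        value = record.get(name)
        # Skip missing answers so the string only carries observed information.
        if value is None or value != value:  # value != value is True only for NaN
            continue
        parts.append(f"{name}: {value}")
    return "; ".join(parts)


# Hypothetical record with one missing value.
person = {"age": 72, "gender": "female", "bmi": float("nan"), "falls_last_2y": 1}
print(record_to_text(person, features=["age", "gender", "bmi"]))
```

Passing different `features` lists makes it easy to compare how the predictions change as the subset of input variables shrinks.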

## 1. Introduction
The dataset comprises multiple files with information about aging, health, and retirement from the Mexican Family of Health and Retirement surveys:
`H_MHAS_c2.sas7bdat`, `H_MHAS_EOL_b.sas7bdat`  and `H_MEX_COG_A2_2016.dta`.

In [1]:
# Import libraries
import numpy as np
import pandas as pd

from src import config, data_utils

### Getting the data

To access the data for this project, you only need to execute the code below. It downloads the data files into the `dataset` folder. The main file is:

- `H_MHAS_c2.sas7bdat`: The MHAS dataset is public and free to use, but you have to register on the MHAS website to get full access. All available data products are listed at https://www.mhasweb.org/DataProducts/HarmonizedData.aspx. This file consolidates information across the whole survey period and is linked by user ID.

In [2]:
h_mhas_c2 = data_utils.get_datasets()
print(f'in our dataset we have {h_mhas_c2.shape[0]} subjects')
print(f'in our dataset we have {h_mhas_c2.shape[1]} features')

Downloading...
From: https://drive.google.com/uc?id=1PZRLL7cq6UAVLG3DFeuPuhrS1LJTsNvY&confirm=t
To: c:\Users\Carlos\Documents\Projects\Final_project\hospitalization-prediction-for-elderly-population\dataset\H_MHAS_c2.sas7bdat
100%|██████████| 482M/482M [00:46<00:00, 10.3MB/s] 
Downloading...
From: https://drive.google.com/uc?id=1bH9hNW0FoN_1q-rovnB8_59ovi3jXs3j&confirm=t
To: c:\Users\Carlos\Documents\Projects\Final_project\hospitalization-prediction-for-elderly-population\dataset\H_MHAS_EOL_b.sas7bdat
100%|██████████| 5.23M/5.23M [00:00<00:00, 6.86MB/s]
Downloading...
From: https://drive.google.com/uc?id=1MnDC30Jo5Jztmb99z5PLJQA47vj-iDdA&confirm=t
To: c:\Users\Carlos\Documents\Projects\Final_project\hospitalization-prediction-for-elderly-population\dataset\H_MEX_COG_A2_2016.dta
100%|██████████| 1.26M/1.26M [00:00<00:00, 3.46MB/s]
[H_MHAS_c2.sas7bdat] column count mismatch


in our dataset we have 26839 subjects
in our dataset we have 5241 features


### Data and Features
The dataset `H_MHAS_c2.sas7bdat` has 26839 rows and 5241 features.

SOME IMPORTANT INSIGHTS
-   The first character of the majority of variables indicates whether the variable refers to the reference person (“r”), spouse (“s”), or household (“h”).
-   The second character indicates the wave to which the variable pertains: “1”, “2”, “3”, “4”, “5”, or “A”. The “A” indicates “all”, i.e. the variable is not specific to any single wave.
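
The naming convention described above can be sketched as a small parser. The helper `parse_variable_name` and the example names passed to it are illustrative assumptions. Because the convention only holds for the majority of variables, an unrelated name that happens to start with the same letters would be misclassified by this heuristic.

```python
import re
from typing import Optional, Tuple

UNIT_BY_PREFIX = {"r": "respondent", "s": "spouse", "h": "household"}
# First character: unit (r/s/h); second character: wave (1-5 or A); rest: concept.
NAME_PATTERN = re.compile(r"^([rsh])([1-5a])(.+)$", re.IGNORECASE)


def parse_variable_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Split a name into (unit, wave, concept); None if it does not fit the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    prefix, wave, concept = match.groups()
    return UNIT_BY_PREFIX[prefix.lower()], wave.upper(), concept


print(parse_variable_name("r3agey"))    # wave-specific respondent variable
print(parse_variable_name("ragender"))  # "A" wave: not tied to a single wave
print(parse_variable_name("acthog"))    # does not follow the convention
```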

All features are divided into the following sections.

- SECTION A: DEMOGRAPHICS, IDENTIFIERS, AND WEIGHTS
- SECTION B: HEALTH
- SECTION C: HEALTH CARE UTILIZATION AND INSURANCE
- SECTION D: COGNITION
- SECTION E: FINANCIAL AND HOUSING WEALTH
- SECTION F: INCOME
- SECTION G: FAMILY STRUCTURE
- SECTION H: EMPLOYMENT HISTORY
- SECTION I: RETIREMENT
- SECTION J: PENSION
- SECTION K: PHYSICAL MEASURES
- SECTION L: ASSISTANCE AND CAREGIVING
- SECTION M: STRESS
- SECTION O: END OF LIFE PLANNING
- SECTION Q: PSYCHOSOCIAL

<span style="font-size:0.8em;">

<h3><center> EDA Section A - DEMOGRAPHICS, IDENTIFIERS, AND WEIGHTS </center></h3>

This section consists of 26 subsections (245 features) which are shown below:
-  Person Specific Identifier  
-  Household Identifier  
-  Spouse Identifier  
-  Wave Status: Response Indicator  
-  Wave Status: Interview Status  
-  Sample Cohort  
-  Whether Proxy Interview  
-  Number of Household Respondents  
-  Whether Couple Household  
-  Household Analysis Weight  
-  Person-Level Analysis Weight
-  Interview Dates  
-  Birth Date: Month and Year  
-  Death Date: Month and Year  
-  Age at Interview (Months and Years)  
-  Gender  
-  Education  
-  Education: Categories by ISCED Codes  
-  Education: Harmonized Education  
-  Literacy and Numeracy  
-  Indigenous Language  
-  Current Marital Status: Current Partnership Status  
-  Current Marital Status: With Partnership  
-  Current Marital Status: Without Partnership  
-  Number of Marriages  
-  Urban or Rural
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.


In [3]:
elimin_A = [
    # 'Person Specific Identifier', # Person Specific Identifier  
    # 'Household Identifier', # Household Identifier  
    # 'Spouse Identifier', # Spouse Identifier  
    # 'Response Indicator', # Wave Status: Response Indicator  
    # 'Interview Status', # Wave Status: Interview Status  
    # 'Sample Cohort', # Sample Cohort  
    # 'Whether Proxy Interview', # Whether Proxy Interview  
    # 'Number of Household Respondents', # Number of Household Respondents  
    # 'Whether Couple Household', # Whether Couple Household  
    # 'Household Analysis Weight', # Household Analysis Weight  
    # 'Person-Level Analysis Weight', # Person-Level Analysis Weight
    # 'Interview Dates', # Interview Dates  
    # 'Birth Date', # Birth Date: Month and Year  
    # 'Death Date', # Death Date: Month and Year  
    # 'Age at Interview', # Age at Interview (Months and Years)  
    # 'Gender', # Gender  
    # 'Education', # Education  
    # 'Education_Cat', # Education: Categories by ISCED Codes  
    # 'Education_ Harmon', # Education: Harmonized Education  
    # 'Literacy and Numeracy', # Literacy and Numeracy  
    # 'Indigenous Language', # Indigenous Language  
    # 'Current Partnership Status', # Current Marital Status: Current Partnership Status  
    # 'With Partnership', # Current Marital Status: With Partnership  
    # 'Without Partnership', # Current Marital Status: Without Partnership  
    # 'Number of Marriages', # Number of Marriages  
    # 'Urban or Rural' # Urban or Rural
]

<span style="font-size:0.8em;">

<h3><center> EDA Section B - HEALTH </center></h3>

This section consists of 27 subsections (1624 features) which are shown below:

-  Self-Report of Health  
-  Activities of Daily Living (ADLs): Raw Recodes  
-  Activities of Daily Living (ADLs): Some Difficulty 
-  Instrumental Activities of Daily Living (IADLs): Raw Recodes  
-  Instrumental Activities of Daily Living (IADLs): Some Difficulty  
-  Other Functional Limitations: Raw Recodes  
-  Other Functional Limitations: Some Difficulty  
-  ADL Summary: Sum ADLs Where Respondent Reports Any Difficulty  
-  IADL Summary: Sum IADLs Where Respondent Reports Any Difficulty  
-  Other Summary Indices: Mobility, Large Muscle, Gross, Fine Motor, Total, Upper, Lower Body Mobility, and NAGI Activities 
-  Doctor Diagnosed Health Problems: Ever Have Condition  
-  Doctor Diagnosed Diseases: Whether Receives Treatment or Medication for Disease  
-  Doctor Diagnosed Diseases: Whether Disease Limits Activity  
-  Doctor Diagnosed Diseases: Age of Diagnosis  
-  Vision  
-  Hearing 
-  Falls  
-  Urinary Incontinence  
-  Persistent Health Problems  
-  Sleep  
-  Pain  
-  Menopause  
-  BMI  
-  Health Behaviors: Physical Activity or Exercise  
-  Health Behaviors: Drinking  
-  Health Behaviors: Smoking (Cigarettes)  
-  Health Behaviors: Preventive Care  
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [4]:
elimin_B = {
    # 'Self-Report of Health',  # Self-Report of Health  
    # 'ADLs Raw Recodes',  # Activities of Daily Living (ADLs): Raw Recodes  
    # 'ADLs Some Difficulty',  # Activities of Daily Living (ADLs): Some Difficulty 
    # 'IADLs Raw Recodes',  # Instrumental Activities of Daily Living (IADLs): Raw Recodes  
    # 'IADLs Some Difficulty',  # Instrumental Activities of Daily Living (IADLs): Some Difficulty  
    # 'Other Limitations Raw Recodes',  # Other Functional Limitations: Raw Recodes  
    # 'Other Limitations Some Difficulty',  # Other Functional Limitations: Some Difficulty  
    # 'ADL Summary',  # ADL Summary: Sum ADLs Where Respondent Reports Any Difficulty  
    # 'IADL Summary',  # IADL Summary: Sum IADLs Where Respondent Reports Any Difficulty  
    # 'Other Summary Indices',  # Other Summary Indices: Mobility, Large Muscle, Gross, Fine Motor, Total, Upper, Lower Body Mobility, and NAGI Activities 
    # 'Doctor Diagnosed Health Problems',  # Doctor Diagnosed Health Problems: Ever Have Condition  
    # 'Treatment or Medication for Disease',  # Doctor Diagnosed Diseases: Whether Receives Treatment or Medication for Disease  
    # 'Whether Disease Limits Activity',  # Doctor Diagnosed Diseases: Whether Disease Limits Activity  
    # 'Age of Diagnosis',  # Doctor Diagnosed Diseases: Age of Diagnosis  
    # 'Vision',  # Vision  
    # 'Hearing',  # Hearing 
    # 'Falls',  # Falls  
    # 'Urinary Incontinence',  # Urinary Incontinence  
    # 'Persistent Health Problems',  # Persistent Health Problems  
    # 'Sleep',  # Sleep  
    # 'Pain',  # Pain  
    # 'Menopause',  # Menopause  
    # 'BMI',  # BMI  
    # 'Physical Activity or Exercise',  # Health Behaviors: Physical Activity or Exercise  
    # 'Drinking',  # Health Behaviors: Drinking  
    # 'Smoking (Cigarettes)',  # Health Behaviors: Smoking (Cigarettes)  
    # 'Preventive Care',  # Health Behaviors: Preventive Care
}

<span style="font-size:0.8em;">

<h3><center> EDA Section C - HEALTH CARE UTILIZATION AND INSURANCE </center></h3>

This section consists of 8 subsections (236 features) which are shown below:

-  Medical Care Utilization: Hospital  
-  Medical Care Utilization: Doctor  
-  Medical Care Utilization: Other Medical Care Utilization  
-  Medical Expenditures: Out of Pocket and Total  
-  Covered by Federal Government Health Insurance Program  
-  Covered by Private Health Insurance  
-  Covered by Health Insurance from a Current or Previous Employer  
-  Number of Health Insurance Plans 
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [5]:
elimin_C = {
    # 'Medical Care Hospital',  # Medical Care Utilization: Hospital  
    # 'Medical Care Doctor',  # Medical Care Utilization: Doctor  
    # 'Medical Care Other',  # Medical Care Utilization: Other Medical Care Utilization  
    # 'Medical Expenditures',  # Medical Expenditures: Out of Pocket and Total  
    # 'Covered by Government',  # Covered by Federal Government Health Insurance Program  
    # 'Covered by Private',  # Covered by Private Health Insurance  
    # 'Covered by Employer',  # Covered by Health Insurance from a Current or Previous Employer  
    # 'Number of Health Insurance Plans',  # Number of Health Insurance Plans
}

<span style="font-size:0.8em;">

<h3><center> EDA Section D - COGNITION </center></h3>

This section consists of 15 subsections (706 features) which are shown below:
 
-  Cognition Testing Conditions 
-  Self-Reported Memory 
-  Immediate Word Recall 
-  Delayed Word Recall 
-  Summary Scores 
-  Picture Drawing 
-  Verbal Fluency 
-  Visual Scanning 
-  Backwards Counting From 20 
-  Date Naming/Orientation 
-  Serial 7’s 
-  Proxy Cognition: JORM IQCODE 
-  Proxy Cognition: Ratings of Memory and Abilities 
-  Proxy Cognition: Cognitive Impairment 
-  Proxy Cognition: Problem Behaviors in Past Week 
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [6]:
elimin_D = {
    # 'Cognition Testing Conditions',  # Cognition Testing Conditions 
    # 'Self-Reported Memory',  # Self-Reported Memory 
    # 'Immediate Word Recall',  # Immediate Word Recall 
    # 'Delayed Word Recall',  # Delayed Word Recall 
    # 'Summary Scores',  # Summary Scores 
    # 'Picture Drawing',  # Picture Drawing 
    # 'Verbal Fluency',  # Verbal Fluency 
    # 'Visual Scanning',  # Visual Scanning 
    # 'Backwards Counting From 20',  # Backwards Counting From 20 
    # 'Date Naming/Orientation',  # Date Naming/Orientation 
    # 'Serial 7s',  # Serial 7’s -> 'Serial 7s'
    # 'JORM IQCODE',  # Proxy Cognition: JORM IQCODE 
    # 'Ratings of Memory and Abilities',  # Proxy Cognition: Ratings of Memory and Abilities 
    # 'Cognitive Impairment',  # Proxy Cognition: Cognitive Impairment 
    # 'Problem Behaviors in Past Week',  # Proxy Cognition: Problem Behaviors in Past Week
}  

<span style="font-size:0.8em;">

<h3><center> EDA Section E - FINANCIAL AND HOUSING WEALTH </center></h3>

This section consists of 15 subsections (148 features) which are shown below:

-  Inflation Multiplier 
-  Net Value of Real Estate (Not Primary Residence) 
-  Net Value of Cars 
-  Net Value of Businesses 
-  Value of Stocks Shares and Bonds 
-  Value of Checking Savings Accounts 
-  Value of Other Assets
-  Value of Primary Residence 
-  Value of All Mortgages (Primary Residence)
-  Net Value of Primary Residence
-  Home ownership
-  Value of Other Debt
-  Value of Loans Lent
-  Net Value of Non-Housing Financial Wealth (Excluding IRAs)
-  Total Wealth

</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [7]:
elimin_E = {
    # 'Inflation Multiplier',  # Inflation Multiplier 
    # 'Net Value of Real Estate',  # Net Value of Real Estate (Not Primary Residence) 
    # 'Net Value of Cars',  # Net Value of Cars 
    # 'Net Value of Businesses',  # Net Value of Businesses 
    # 'Value of Stocks',  # Value of Stocks Shares and Bonds 
    # 'Value of Accounts',  # Value of Checking Savings Accounts 
    # 'Value of Other',  # Value of Other Assets
    # 'Value of Residence',  # Value of Primary Residence 
    # 'Value of Mortgages',  # Value of All Mortgages (Primary Residence)
    # 'Net Value P Residence',  # Net Value of Primary Residence
    # 'Home ownership',  # Home ownership
    # 'Value of Other Debt',  # Value of Other Debt
    # 'Value of Loans Lent',  # Value of Loans Lent
    # 'Net Value Financial Wealth',  # Net Value of Non-Housing Financial Wealth (Excluding IRAs)
    # 'Total Wealth',  # Total Wealth
}   

<span style="font-size:0.8em;">

<h3><center> EDA Section F - INCOME </center></h3>

This section consists of 10 subsections (212 features) which are shown below:

-  Individual Earnings
-  Household Capital Income
-  Individual Income from Private Pension
-  Individual Public Pension Income
-  Individual Other Pensions Income
-  Individual Total Pensions Income
-  Individual Income from Other Government Transfers
-  All Other Income
-  Total Household Income (respondent & spouse)
-  Total Household Consumption (full household)
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [8]:
elimin_F = {
    # 'Individual Earnings',  # Individual Earnings
    # 'Household Income',  # Household Capital Income
    # 'Income from Private Pension',  # Individual Income from Private Pension
    # 'Public Pension Income',  # Individual Public Pension Income
    # 'Other Pensions Income',  # Individual Other Pensions Income
    # 'Total Pensions Income',  # Individual Total Pensions Income
    # 'Other Government Transfers',  # Individual Income from Other Government Transfers
    # 'All Other Income',  # All Other Income
    # 'Total Household Income r&s',  # Total Household Income (respondent & spouse)
    # 'Total Household Consumption h',  # Total Household Consumption (full household)
}

<span style="font-size:0.8em;">

<h3><center> EDA Section G - FAMILY STRUCTURE </center></h3>

This section consists of 19 subsections (223 features) which are shown below:

-  Number of People Living in Household
-  Number of Living Children
-  Number of Deceased Children
-  Number of Children Ever Born
-  Number of Grandchildren
-  Number of Living Siblings
-  Number of Deceased Siblings
-  Number of Living Parents
-  Parental Mortality
-  Parents' Current Age or Age at Death
-  Parents' Education
-  Any Child Co-Resides with Respondent
-  Any Children Living in the Same City
-  Any Weekly Contact with Children
-  Frequent or Weekly Contact with Relatives and Friends
-  Any Weekly Social Activities or Participate in Religious Groups
-  Financial Transfer from Children
-  Financial Transfer to Children
-  Financial Transfer to Parents
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [9]:
elimin_G = {
    # 'N Living in Household',  # Number of People Living in Household
    # 'N Living Children',  # Number of Living Children
    # 'N Deceased Children',  # Number of Deceased Children
    # 'N Children Ever Born',  # Number of Children Ever Born
    # 'N Grandchildren',  # Number of Grandchildren
    # 'N Living Siblings',  # Number of Living Siblings
    # 'N Deceased Siblings',  # Number of Deceased Siblings
    # 'N Living Parents',  # Number of Living Parents
    # 'Parental Mortality',  # Parental Mortality
    # 'Parents Age or Age at Death',  # Parents' Current Age or Age at Death
    # 'Parents Education',  # Parents' Education
    # 'Any Child with Respondent',  # Any Child Co-Resides with Respondent
    # 'Any Children Same City',  # Any Children Living in the Same City
    # 'Any Weekly Contact with Children',  # Any Weekly Contact with Children
    # 'Frequent Contact with Relatives',  # Frequent or Weekly Contact with Relatives and Friends
    # 'Weekly Social Activities',  # Any Weekly Social Activities or Participate in Religious Groups
    # 'Financial Transfer from Children',  # Financial Transfer from Children
    # 'Financial Transfer to Children',  # Financial Transfer to Children
    # 'Financial Transfer to Parents',  # Financial Transfer to Parents
}

<span style="font-size:0.8em;">

<h3><center> EDA Section H - EMPLOYMENT HISTORY </center></h3>

This section consists of 12 subsections (110 features) which are shown below:

-  Currently Working for Pay
-  Whether Self-Employed
-  Labor Force Status
-  In the Labor Force
-  Unemployment Status
-  Retired Employment Status
-  Hours at Main Job
-  Main Activity Years of Tenure
-  Job Allows Move to Less Demanding Work
-  Occupation Code for Job with Longest Reported Tenure
-  Year Last Job Ended
-  Reason Job Ended
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [10]:
elimin_H = {
    # 'Currently Working for Pay',  # Currently Working for Pay
    # 'Whether Self-Employed',  # Whether Self-Employed
    # 'Labor Force Status',  # Labor Force Status
    # 'In the Labor Force',  # In the Labor Force
    # 'Unemployment Status',  # Unemployment Status
    # 'Retired Employment Status',  # Retired Employment Status
    # 'Hours at Main Job',  # Hours at Main Job
    # 'Main Activity Years of Tenure',  # Main Activity Years of Tenure
    # 'Job Allows Move',  # Job Allows Move to Less Demanding Work
    # 'Occupation Code with Longest',  # Occupation Code for Job with Longest Reported Tenure
    # 'Year Last Job Ended',  # Year Last Job Ended
    # 'Reason Job Ended',  # Reason Job Ended
}

<span style="font-size:0.8em;">

<h3><center> EDA Section I - RETIREMENT </center></h3>

This section consists of 2 subsections (16 features) which are shown below:

-  Whether Retired: Retirement year, if says retired
-  Whether Retired: Retirement age, if says retired
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [11]:
elimin_I = {
    # 'Retirement year',  # Whether Retired: Retirement year, if says retired
    # 'Retirement age',  # Whether Retired: Retirement age, if says retired
}


<span style="font-size:0.8em;">

<h3><center> EDA Section J - PENSION </center></h3>

This section consists of 7 subsections (54 features) which are shown below:

-  Whether Receives Public Pension
-  Whether Receives Private Pension
-  Whether Receives Other Pension
-  Age When Started to Receive a Public Pension
-  Age When Started to Receive a Private Pension
-  Whether Current Public Pension(s) Can Continue
-  Whether Current Private Pension Can Continue
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [12]:
elimin_J = {
    # 'Public Pension',  # Whether Receives Public Pension
    # 'Private Pension',  # Whether Receives Private Pension
    # 'Other Pension',  # Whether Receives Other Pension
    # 'Age Public Pension',  # Age When Started to Receive a Public Pension
    # 'Age Private Pension',  # Age When Started to Receive a Private Pension
    # 'Public Pension Continuity',  # Whether Current Public Pension(s) Can Continue
    # 'Private Pension Continuity',  # Whether Current Private Pension Can Continue
}

<span style="font-size:0.8em;">

<h3><center> EDA Section K - PHYSICAL MEASURES </center></h3>

This section consists of 12 subsections (282 features) which are shown below:
 
-  Height, Weight, Waist and Hip Circumference Measurements
-  Height, Weight, Waist and Hip Circumference Measurements: Reason Didn't Complete
-  Sitting Height
-  Sitting Height: Reason Didn't Complete
-  Balance Test
-  Balance Test: Reason Didn't Complete
-  Blood Pressure Measurements
-  Blood Pressure Measurements: Reason Didn't Complete
-  Timed Walk Measurements
-  Timed Walk Measurements: Reason Didn't Complete
-  Hand Grip Strength Measurements
-  Hand Grip Strength Measurements: Reason Didn't Complete
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [13]:
elimin_K = {
    # 'Waist and Hip Measure',  # Height, Weight, Waist and Hip Circumference Measurements
    # 'Waist and Hip Measure Incomplete',  # Height, Weight, Waist and Hip Circumference Measurements: Reason Didn't Complete
    # 'Sitting Height',  # Sitting Height
    # 'Sitting Height Incomplete',  # Sitting Height: Reason Didn't Complete
    # 'Balance Test',  # Balance Test
    # 'Balance Test Incomplete',  # Balance Test: Reason Didn't Complete
    # 'Blood Pressure Measure',  # Blood Pressure Measurements
    # 'Blood Pressure Measure Incomplete',  # Blood Pressure Measurements: Reason Didn't Complete
    # 'Timed Walk Measure',  # Timed Walk Measurements
    # 'Timed Walk Measure Incomplete',  # Timed Walk Measurements: Reason Didn't Complete
    # 'Hand Grip Measure',  # Hand Grip Strength Measurements
    # 'Hand Grip Measure Incomplete',  # Hand Grip Strength Measurements: Reason Didn't Complete
}

<span style="font-size:0.8em;">

<h3><center> EDA Section L - ASSISTANCE AND CAREGIVING </center></h3>

This section consists of 32 subsections (1103 features) which are shown below:
 
-  ADL Help
-  IADL Help
-  Whether Uses Personal Aids
-  Future ADL Help
-  Activities of Daily Living: Whether Receives Any Care
-  Activities of Daily Living: Whether Receives Any Informal Care
-  Activities of Daily Living: Receives Informal Care from Spouse
-  Activities of Daily Living: Receives Informal Care from Children or Grandchildren
-  Activities of Daily Living: Receives Informal Care from Relatives
-  Activities of Daily Living: Receives Informal Care from Other Individuals
-  Activities of Daily Living: Whether Receives Any Formal Care
-  Activities of Daily Living: Receives Formal Care from Paid Professional
-  Instrumental Activities of Daily Living: Whether Receives Any Care
-  Instrumental Activities of Daily Living: Whether Receives Any Informal Care
-  Instrumental Activities of Daily Living: Receives Informal Care from Spouse
-  Instrumental Activities of Daily Living: Receives Informal Care from Children or Grandchildren
-  Instrumental Activities of Daily Living: Receives Informal Care from Relatives
-  Instrumental Activities of Daily Living: Receives Informal Care from Other Individuals
-  Instrumental Activities of Daily Living: Whether Receives Any Formal Care
-  Instrumental Activities of Daily Living: Receives Formal Care from Paid Professional
-  Activities of Daily Living and Instrumental Activities of Daily Living: Whether Receives Any Care
-  Activities of Daily Living and Instrumental Activities of Daily Living: Whether Receives Any Informal Care
-  Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Spouse
-  Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Children or Grandchildren
-  Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Relatives
-  Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Other Individuals
-  Activities of Daily Living and Instrumental Activities of Daily Living: Whether Receives Any Formal Care
-  Activities of Daily Living and Instrumental Activities of Daily Living: Receives Formal Care from Paid Professional
-  Receives Help with Chores from Children or Grandchildren
-  Provides Informal Care to Children or Grandchildren
-  Provides Personal Care to Parents
-  Provides Informal Care for Sick or Disabled Adults
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [14]:
elimin_L = {
    # 'ADL Help',  # ADL Help
    # 'IADL Help',  # IADL Help
    # 'Personal Aids',  # Whether Uses Personal Aids
    # 'Future ADL Help',  # Future ADL Help
    # 'ADL Any Care',  # Activities of Daily Living: Whether Receives Any Care
    # 'ADL Informal Care',  # Activities of Daily Living: Whether Receives Any Informal Care
    # 'ADL Informal Care Spouse',  # Activities of Daily Living: Receives Informal Care from Spouse
    # 'ADL Care Children or Grandchildren',  # Activities of Daily Living: Receives Informal Care from Children or Grandchildren
    # 'ADL Care Relatives',  # Activities of Daily Living: Receives Informal Care from Relatives
    # 'ADL Care Other Individuals',  # Activities of Daily Living: Receives Informal Care from Other Individuals
    # 'ADL Formal Care',  # Activities of Daily Living: Whether Receives Any Formal Care
    # 'ADL Care Paid Professional',  # Activities of Daily Living: Receives Formal Care from Paid Professional
    # 'IADL Any Care',  # Instrumental Activities of Daily Living: Whether Receives Any Care
    # 'IADL Informal Care',  # Instrumental Activities of Daily Living: Whether Receives Any Informal Care
    # 'IADL Informal Care Spouse',  # Instrumental Activities of Daily Living: Receives Informal Care from Spouse
    # 'IADL Care Children or Grandchildren',  # Instrumental Activities of Daily Living: Receives Informal Care from Children or Grandchildren
    # 'IADL Care Relatives',  # Instrumental Activities of Daily Living: Receives Informal Care from Relatives
    # 'IADL Care Other Individuals',  # Instrumental Activities of Daily Living: Receives Informal Care from Other Individuals
    # 'IADL Formal Care',  # Instrumental Activities of Daily Living: Whether Receives Any Formal Care
    # 'IADL Care Paid Professional',  # Instrumental Activities of Daily Living: Receives Formal Care from Paid Professional
    # 'ADL & IADL Any Care',  # Activities of Daily Living and Instrumental Activities of Daily Living: Whether Receives Any Care
    # 'ADL & IADL Informal Care',  # Activities of Daily Living and Instrumental Activities of Daily Living: Whether Receives Any Informal Care
    # 'ADL & IADL Care Spouse',  # Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Spouse
    # 'ADL & IADL Care Children or Grandchildren',  # Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Children or Grandchildren
    # 'ADL & IADL Care Relatives',  # Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Relatives
    # 'ADL & IADL Care Other Individuals',  # Activities of Daily Living and Instrumental Activities of Daily Living: Receives Informal Care from Other Individuals
    # 'ADL & IADL Formal Care',  # Activities of Daily Living and Instrumental Activities of Daily Living: Whether Receives Any Formal Care
    # 'ADL & IADL Care Paid Professional',  # Activities of Daily Living and Instrumental Activities of Daily Living: Receives Formal Care from Paid Professional
    # 'Help Children or Grandchildren',  # Receives Help with Chores from Children or Grandchildren
    # 'Provides Care to Children or Grandchildren',  # Provides Informal Care to Children or Grandchildren
    # 'Provides Care to Parents',  # Provides Personal Care to Parents
    # 'Provides Care for Sick or Disabled Adults',  # Provides Informal Care for Sick or Disabled Adults
}

<span style="font-size:0.8em;">

<h3><center> EDA Section M - STRESS </center></h3>

This section consists of 4 subsections (77 features) which are shown below:
  
-  Social Support: Spouse
-  Social Support: Children
-  Social Support: Friends
-  Experienced Death of a Child
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [15]:
elimin_M = {
    # 'Support Spouse',  # Social Support: Spouse
    # 'Support Children',  # Social Support: Children
    # 'Support: Friends',  # Social Support: Friends
    # 'Experienced Death of a Child',  # Experienced Death of a Child
}

<span style="font-size:0.8em;">

<h3><center> EDA Section O - END OF LIFE PLANNING </center></h3>

This section consists of 3 subsections (33 features) which are shown below:

-  Will: Whether Has a Will
-  Will: Beneficiaries of Will
-  Covered by Life Insurance
</span>

For this project, we don't need every subsection listed above. The subsections to exclude, and the reasons for excluding them, are still to be documented.

In [16]:
elimin_O = {
    # 'Has a Will',  # Will: Whether Has a Will
    # 'Beneficiaries Will',  # Will: Beneficiaries of Will
    # 'Covered Life Insurance',  # Covered by Life Insurance
}

<span style="font-size:0.8em;">

<h3><center> EDA Section Q - PSYCHOSOCIAL </center></h3>

This section consists of 4 subsections (166 features) which are shown below:

-  Depressive Symptoms: CESD
-  Satisfaction with Life Scale
-  Single Life Satisfaction Question
-  Cantril Ladder
</span>

In [17]:
elimin_Q = {
    # 'Depressive Symptoms',  # Depressive Symptoms: CESD
    # 'Satisfaction Life',  # Satisfaction with Life Scale
    # 'Single Life Satisfaction',  # Single Life Satisfaction Question
    # 'Cantril Ladder',  # Cantril Ladder
}

In [18]:
# Collect the per-section elimination lists, in section order
elimins = [
    elimin_A, elimin_B, elimin_C, elimin_D, elimin_E,
    elimin_F, elimin_G, elimin_H, elimin_I, elimin_J,
    elimin_K, elimin_L, elimin_M, elimin_O, elimin_Q,
]

h_mhas_c2_unified = data_utils.outline_dataframe(h_mhas_c2, elimins)
print(h_mhas_c2_unified.columns.tolist())


['acthog', 'cX000cpindex', 'cX001cpindex', 'cX002cpindex', 'cX003cpindex', 'cX011cpindex', 'cX012cpindex', 'cX013cpindex', 'cX014cpindex', 'cX015cpindex', 'cX016cpindex', 'cX017cpindex', 'cX018cpindex', 'cX019cpindex', 'codent01', 'codent03', 'cunicah', 'ent2', 'hXabdstk', 'hXabsns', 'hXachck', 'hXadebt', 'hXafbdstk', 'hXafbsns', 'hXafchck', 'hXafdebt', 'hXafhous', 'hXaflend', 'hXafmort', 'hXafothr', 'hXafrles', 'hXaftotb', 'hXaftotf', 'hXaftoth', 'hXaftran', 'hXahous', 'hXalend', 'hXamort', 'hXaothr', 'hXarles', 'hXatotb', 'hXatotf', 'hXatoth', 'hXatran', 'hXchdeathe', 'hXchild', 'hXcoresd', 'hXcpl', 'hXdau', 'hXdchild', 'hXfcamt', 'hXfcany', 'hXfcflag', 'hXgapcare', 'hXgapcarehr', 'hXgccare_m', 'hXgccarehr_m', 'hXgrchild', 'hXhhid', 'hXhhidc', 'hXhhres', 'hXhhresp', 'hXhownrnt', 'hXicap', 'hXifcap', 'hXifrent', 'hXifsemp', 'hXiftot', 'hXiftrest', 'hXirent', 'hXisemp', 'hXitot', 'hXitrest', 'hXkcnt', 'hXlvnear', 'hXrcchore', 'hXrcchorehr', 'hXrcchorenf', 'hXrural', 'hXrural_m', 'hXson
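
The printed names suggest that a wave digit in second position has been replaced by the placeholder `X` (for example `hXatotb`), so that the same measure from different waves shares one name. The sketch below shows one way such a normalization could be written. It is a guess made for illustration, not the actual implementation of `data_utils.outline_dataframe`, and the helper names `unify_wave` and `group_by_unified_name`, as well as the wave-specific input names, are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# One leading letter, then a wave digit 1-5, then the rest of the name.
WAVE_PATTERN = re.compile(r"^([a-z])([1-5])(.*)$")


def unify_wave(name: str) -> str:
    """Replace a wave digit in second position with the placeholder 'X'."""
    return WAVE_PATTERN.sub(r"\1X\3", name)


def group_by_unified_name(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Map each wave-independent name to the original columns it covers."""
    groups = defaultdict(list)
    for column in columns:
        groups[unify_wave(column)].append(column)
    return dict(groups)


# Hypothetical wave-specific column names plus two names without a wave digit.
columns = ["h1atotb", "h2atotb", "h3atotb", "acthog", "subhog_12"]
print(group_by_unified_name(columns))
```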

In [19]:
print(h_mhas_c2_unified.head())
print(h_mhas_c2_unified.shape)
h_mhas_c2_unified.info()

  acthog  cX000cpindex  cX001cpindex  cX002cpindex  cX003cpindex  \
0     00           NaN           NaN           NaN           NaN   
1     00           NaN           NaN           NaN           NaN   
2     00           NaN           NaN           NaN           NaN   
3     00           NaN           NaN           NaN           NaN   
4     01           NaN           NaN           NaN           NaN   

   cX011cpindex  cX012cpindex  cX013cpindex  cX014cpindex  cX015cpindex  ...  \
0           NaN           NaN           NaN           NaN           NaN  ...   
1           NaN           NaN           NaN           NaN           NaN  ...   
2           NaN           NaN           NaN           NaN           NaN  ...   
3           NaN           NaN           NaN           NaN           NaN  ...   
4           NaN           NaN           NaN           NaN           NaN  ...   

   subhog_12  subhog_15  subhog_18  tipent_01  tipent_03  tipent_12  \
0        1.0       31.0       99.0     