## Dummy Variables

### Income

|        | income_middle | income_upper |
|--------|---------------|--------------|
| lower  | 0             | 0            |
| middle | 1             | 0            |
| upper  | 0             | 1            |


### Age

|              | age_young_adult | age_mature_adult | age_senior |
|--------------|-----------------|------------------|------------|
| student      | 0               | 0                | 0          |
| young_adult  | 1               | 0                | 0          |
| mature_adult | 0               | 1                | 0          |
| senior       | 0               | 0                | 1          |


### Relationship

|         | relationship_married | relationship_single |
|---------|----------------------|---------------------|
| unknown | 0                    | 0                   |
| married | 1                    | 0                   |
| single  | 0                    | 1                   |


### Home

|         | home_owner | home_renter |
|---------|------------|-------------|
| unknown | 0          | 0           |
| owner   | 1          | 0           |
| renter  | 0          | 1           |


### Home Size

|        | homesize_couple | homesize_group |
|--------|-----------------|----------------|
| single | 0               | 0              |
| couple | 1               | 0              |
| group  | 0               | 1              |
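
Each table above uses reference-level coding: the first row is the baseline and is encoded as all zeros. As a minimal sketch, the following reproduces the Income table on toy data. The `toy` frame is made up for illustration and is not read from the dataset.

```python
import pandas as pd

# Toy data for illustration only; not taken from hh_demographic.csv.
toy = pd.DataFrame({"income": ["lower", "middle", "upper", "middle"]})

# Create one 0/1 indicator column per level.
encoded = pd.get_dummies(toy, columns=["income"], dtype=int)

# Drop the baseline level so that 'lower' is represented by all zeros.
encoded = encoded.drop(columns=["income_lower"])

print(encoded)
```

Dropping one level per variable avoids perfectly collinear indicator columns when the result is fed to a model with an intercept.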

In [1]:
import numpy as np
import pandas as pd

In [30]:
households = 'data/hh_demographic.csv'
df_hh = pd.read_csv(households)

Income Categories:

    * Lower: 0 - 34K
    * Middle: 35 - 74K
    * Upper: 75K+

In [31]:
df_hh['income'] = df_hh.INCOME_DESC.map({
    'Under 15K': 'lower',
    '15-24K': 'lower',
    '25-34K': 'lower',
    '35-49K': 'middle',
    '50-74K': 'middle',
    '75-99K': 'upper',
    '100-124K': 'upper',
    '125-149K': 'upper',
    '150-174K': 'upper',
    '175-199K': 'upper',
    '200-249K': 'upper',
    '250K+': 'upper'
})
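
`Series.map` with a dictionary returns `NaN` for any label that has no key in the dictionary, so a typo or an unexpected category would silently drop out of the dummy columns later. A small sketch of how such labels could be listed, using a toy frame and a truncated mapping (the `'300K+'` label is invented for the example):

```python
import pandas as pd

# Toy frame standing in for df_hh; '300K+' is a made-up label with no mapping.
toy = pd.DataFrame({"INCOME_DESC": ["Under 15K", "50-74K", "300K+"]})

# Truncated mapping, for illustration only.
income_map = {"Under 15K": "lower", "50-74K": "middle"}

toy["income"] = toy["INCOME_DESC"].map(income_map)

# Source labels whose mapped value came back as NaN.
unmapped = toy.loc[toy["income"].isna(), "INCOME_DESC"].unique().tolist()
print(unmapped)
```

The same check would apply to the other mapped columns.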

Age Categories:

    * Students: 19-24
    * Young Adults: 25-44
    * Mature Adults: 45-64
    * Seniors: 65+

In [32]:
df_hh['age'] = df_hh.AGE_DESC.map({
    '19-24': 'student',
    '25-34': 'young_adult',
    '35-44': 'young_adult',
    '45-54': 'mature_adult',
    '55-64': 'mature_adult',
    '65+': 'senior'
})

Relationship Categories:

    * Married
    * Single
    * Unknown

In [33]:
df_hh['relationship'] = df_hh.MARITAL_STATUS_CODE.map({
    'A': 'married',
    'B': 'single',
    'U': 'unknown'
})

Home Categories:

    * Owner: Homeowners or Probable Owners
    * Renter: Renters or Probable Renters
    * Unknown

In [34]:
df_hh['home'] = df_hh.HOMEOWNER_DESC.map({
    'Probable Renter': 'renter',
    'Renter': 'renter',
    'Probable Owner': 'owner',
    'Homeowner': 'owner',
    'Unknown': 'unknown'
})

Home Size Categories:

    * Single: 1
    * Couple: 2
    * Group: 3+

In [35]:
df_hh['homesize'] = df_hh.HOUSEHOLD_SIZE_DESC.map({
    '1': 'single',
    '2': 'couple',
    '3': 'group',
    '4': 'group',
    '5+': 'group'
})
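
The five recoding cells share one pattern: map a source column through a dictionary into a new column. One possible way to express that pattern as a loop is sketched below on toy data. The `recodes` structure is a hypothetical helper, not part of the notebook, and only two of the five mappings are shown.

```python
import pandas as pd

# Hypothetical structure: source column -> (new column name, label mapping).
recodes = {
    "MARITAL_STATUS_CODE": (
        "relationship",
        {"A": "married", "B": "single", "U": "unknown"},
    ),
    "HOUSEHOLD_SIZE_DESC": (
        "homesize",
        {"1": "single", "2": "couple", "3": "group", "4": "group", "5+": "group"},
    ),
}

# Toy rows for illustration only.
toy = pd.DataFrame({
    "MARITAL_STATUS_CODE": ["A", "U", "B"],
    "HOUSEHOLD_SIZE_DESC": ["2", "5+", "1"],
})

# Apply every mapping in turn, adding one recoded column per source column.
for source_col, (new_col, mapping) in recodes.items():
    toy[new_col] = toy[source_col].map(mapping)

print(toy[["relationship", "homesize"]])
```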

In [47]:
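cat_cols = ['income', 'age', 'relationship', 'home', 'homesize']
df_dummy = df_hh[['household_key'] + cat_cols]
# dtype=int keeps the indicators as 0/1 (recent pandas versions default to booleans).
df_dummy = pd.get_dummies(df_dummy, columns=cat_cols, dtype=int)
# Drop the baseline level of each variable; it is represented by all zeros.
df_dummy = df_dummy.drop(['income_lower', 'age_student', 'relationship_unknown', 'home_unknown', 'homesize_single'], axis=1)
# index=False keeps the row index out of the CSV.
df_dummy.to_csv('hh_dummy_vars.csv', index=False)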
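
After the baseline columns are dropped, each household should have at most one indicator set per variable. A rough sanity check along those lines is sketched here on a toy frame. `max_indicators_per_row` is a hypothetical helper, and the trailing underscore in the prefix match is what keeps `home_*` columns separate from `homesize_*` columns.

```python
import pandas as pd

def max_indicators_per_row(df, prefix):
    """Largest number of indicators set in a single row among '<prefix>_*' columns."""
    cols = [c for c in df.columns if c.startswith(prefix + "_")]
    return int(df[cols].sum(axis=1).max())

# Toy frame shaped like the dummy-variable output; values are made up.
toy = pd.DataFrame({
    "household_key": [1, 2, 3],
    "income_middle": [0, 1, 0],
    "income_upper": [0, 0, 1],
    "home_owner": [1, 0, 0],
    "home_renter": [0, 1, 0],
    "homesize_group": [0, 1, 1],
})

# A value above 1 would mean a household falls into two levels of one variable.
result = {p: max_indicators_per_row(toy, p) for p in ["income", "home", "homesize"]}
print(result)
```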