## Population

> This data supplements our census data in the next step of data prep. We use the Population dataset because it is more accurate (the census figures are only estimates) and because it lets us define the age groups ourselves. It has nearly 400,000 rows because it reports population by year, county, and each individual age, so the row count is roughly 60 years × 64 counties × 90 single-year ages (the file loaded below has 381,504 rows). We want an age grouping that separates school-age students from adults, so we split at age 19: under 19, and 19 or older.

In [1]:
import pandas as pd
import numpy as np
import df_util
from df_util import head
input_path = lambda name: f'../input-data/{name}.csv'
work_path = lambda name: f'../working-data/{name}.csv'

df_raw = pd.read_csv(input_path('county_population__eeah-cmy8'))
head(df_raw)

7 cols x 381504 rows


Unnamed: 0,county,fipscode,year,age,malepopulation,femalepopulation,totalpopulation
0,Adams,1.0,1990.0,0.0,2354.0,2404.0,4758.0
1,Adams,1.0,1990.0,1.0,2345.0,2375.0,4720.0
2,Adams,1.0,1990.0,2.0,2413.0,2219.0,4632.0
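
The row-count arithmetic in the introduction can be checked directly. Below is a minimal sketch, assuming every county, year, and age combination appears exactly once; `toy` and `expected_rows` are hypothetical names used only for illustration, and the same function could be applied to `df_raw`.

```python
import pandas as pd

# Hypothetical stand-in for df_raw: 2 counties x 2 years x 3 ages
toy = pd.DataFrame(
    [(c, y, a)
     for c in ['Adams', 'Alamosa']
     for y in [1990, 1991]
     for a in [0, 1, 2]],
    columns=['county', 'year', 'age'],
)

def expected_rows(frame):
    """Rows expected if every county/year/age combination appears exactly once."""
    return (frame['county'].nunique()
            * frame['year'].nunique()
            * frame['age'].nunique())

# For a complete grid the two numbers agree; a mismatch on real data
# would point to missing or duplicated combinations.
print(len(toy), expected_rows(toy))
```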


## Create age groups (< 19, >= 19)

In [2]:
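df = df_raw.copy()
df['county'] = df['county'].str.upper()
# Default every row to the adult group, then relabel ages 0 through 18
df['age_range'] = 'over18'
df.loc[df['age'] <= 18, 'age_range'] = 'under19'
# Move age_range up next to the other identifier columns
df.insert(2, 'age_range', df.pop('age_range'))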

df_with_groups = df
head(df_with_groups)

8 cols x 381504 rows


Unnamed: 0,county,fipscode,age_range,year,age,malepopulation,femalepopulation,totalpopulation
0,ADAMS,1.0,under19,1990.0,0.0,2354.0,2404.0,4758.0
1,ADAMS,1.0,under19,1990.0,1.0,2345.0,2375.0,4720.0
2,ADAMS,1.0,under19,1990.0,2.0,2413.0,2219.0,4632.0
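
The same labelling can be written in a single step with `numpy.where`. This is an alternative sketch, not the notebook's own method; `toy` is a hypothetical frame whose ages sit on either side of the cutoff. One caveat applies to both versions: a missing age compares as `False` against 18, so it would be labelled `over18`.

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen to sit on either side of the cutoff
toy = pd.DataFrame({'age': [0.0, 18.0, 19.0, 90.0]})

# Ages 0 through 18 become 'under19'; everything else becomes 'over18'
toy['age_range'] = np.where(toy['age'] <= 18, 'under19', 'over18')

print(toy['age_range'].tolist())
# ['under19', 'under19', 'over18', 'over18']
```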


In [3]:
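# Collapse the single-year ages into one row per year, county, and age group.
# drop() already returns a new frame, so no explicit copy is needed.
df = (df_with_groups
    .drop(columns=['age', 'fipscode'])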
    .rename(columns={
        'malepopulation': 'male',
        'femalepopulation': 'female',
        'totalpopulation': 'total',
    })
    .groupby(['year', 'county', 'age_range'])
    .sum()
    .reset_index()
)
df_grouped = df
head(df_grouped)

6 cols x 7808 rows


Unnamed: 0,year,county,age_range,male,female,total
0,1990.0,ADAMS,over18,90383.0,94282.0,184665.0
1,1990.0,ADAMS,under19,41519.0,39525.0,81044.0
2,1990.0,ALAMOSA,over18,4488.0,4823.0,9311.0


### Pivot the `age_range` values into their own columns

Each year and county currently takes two rows, one per `age_range` value. We want one row per year and county, with the age group folded into the column names.

- First, pivot `age_range` across the male, female, and total columns.
- That leaves a multilevel column index, so we flatten it and rename every column by hand.
- Lastly, recompute the overall total, male, and female columns, since the pivot split each of them into an under-19 half and an over-18 half.

In [4]:
df = (df_grouped
    .copy()
    .pivot(
        index=['year', 'county'],
        columns='age_range',
        values=['male', 'female', 'total']
    )
    .reset_index()
    .flatten_multi_level_cols()
    .set_columns(['year', 'county', 'over18_male', 'under19_male', 'over18_female', 'under19_female', 'over18', 'under19'])
)
df_pivoted = df
head(df_pivoted)

8 cols x 3904 rows


Unnamed: 0,year,county,over18_male,under19_male,over18_female,under19_female,over18,under19
0,1990.0,ADAMS,90383.0,41519.0,94282.0,39525.0,184665.0,81044.0
1,1990.0,ALAMOSA,4488.0,2189.0,4823.0,2117.0,9311.0,4306.0
2,1990.0,ARAPAHOE,134481.0,57241.0,146820.0,54747.0,281301.0,111988.0
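
`flatten_multi_level_cols` and `set_columns` are not built-in pandas methods, and their definitions are not shown in this notebook. As a rough equivalent, the sketch below does the same pivot and flattening in plain pandas. It builds each column name from the two index levels instead of listing names by position, so it does not depend on the order the pivot returns. `grouped` and `wide` are hypothetical names, and the rows are copied from the grouped output above.

```python
import pandas as pd

# Rows copied from the grouped output (ADAMS and ALAMOSA, 1990)
grouped = pd.DataFrame({
    'year': [1990.0, 1990.0, 1990.0, 1990.0],
    'county': ['ADAMS', 'ADAMS', 'ALAMOSA', 'ALAMOSA'],
    'age_range': ['over18', 'under19', 'over18', 'under19'],
    'male': [90383.0, 41519.0, 4488.0, 2189.0],
    'female': [94282.0, 39525.0, 4823.0, 2117.0],
    'total': [184665.0, 81044.0, 9311.0, 4306.0],
})

wide = grouped.pivot(
    index=['year', 'county'],
    columns='age_range',
    values=['male', 'female', 'total'],
)

# Columns are now two-level tuples such as ('male', 'over18').
# Join the levels into one name; the 'total' measure keeps just the age label.
wide.columns = [
    age if measure == 'total' else f'{age}_{measure}'
    for measure, age in wide.columns
]
wide = wide.reset_index()

print(sorted(wide.columns))
```

Deriving the names from the index levels is a little longer than a hand-written list, but a reordered or added measure cannot silently end up under the wrong label.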


In [5]:
# New calculated columns
df = df_pivoted.copy()
df['total'] = df.under19 + df.over18
df['male'] = df.under19_male + df.over18_male
df['female'] = df.under19_female + df.over18_female

df = df[['year', 'county', 'total', 'male', 'female', 'over18', 'under19', 'under19_male', 'under19_female', 'over18_male', 'over18_female']]
df_with_calculated_cols = df
head(df_with_calculated_cols)

11 cols x 3904 rows


Unnamed: 0,year,county,total,male,female,over18,under19,under19_male,under19_female,over18_male,over18_female
0,1990.0,ADAMS,265709.0,131902.0,133807.0,184665.0,81044.0,41519.0,39525.0,90383.0,94282.0
1,1990.0,ALAMOSA,13617.0,6677.0,6940.0,9311.0,4306.0,2189.0,2117.0,4488.0,4823.0
2,1990.0,ARAPAHOE,393289.0,191722.0,201567.0,281301.0,111988.0,57241.0,54747.0,134481.0,146820.0
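
Because `total` is rebuilt from the age halves while `male` and `female` are rebuilt separately, comparing them is a cheap consistency check before writing the file. The sketch below runs that check on the three rows shown above; `result`, `sexes_match`, and `ages_match` are hypothetical names, and the same comparison could be run on `df_with_calculated_cols`.

```python
import pandas as pd

# The three rows shown in the output above
result = pd.DataFrame({
    'county': ['ADAMS', 'ALAMOSA', 'ARAPAHOE'],
    'total': [265709.0, 13617.0, 393289.0],
    'male': [131902.0, 6677.0, 191722.0],
    'female': [133807.0, 6940.0, 201567.0],
    'over18': [184665.0, 9311.0, 281301.0],
    'under19': [81044.0, 4306.0, 111988.0],
})

# The total should agree whether it is split by sex or by age group
sexes_match = (result['male'] + result['female']).eq(result['total'])
ages_match = (result['over18'] + result['under19']).eq(result['total'])

print(sexes_match.all(), ages_match.all())
# True True
```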


In [6]:
df_with_calculated_cols.to_csv(work_path('county_population'), index=False)