<a href="https://colab.research.google.com/github/npr99/PlanningMethods/blob/master/PLAN604_ACS_Population_byAge_Sex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Application of Population Composition by Age and Sex from ACS Data
 
---
This Google Colab notebook provides a complete workflow (a sequence of steps from start to finish) for exploring population composition by age and sex in American Community Survey (ACS) data.

This notebook obtains and cleans population percentages from the ACS.

The notebook also explores outliers for a single age and sex cohort.

### Assignment Steps:
1. Run all blocks of the notebook.
2. In Step 4, explore outliers for a sex and age cohort of your choice (select them in the "Your Choice" code block).
3. Choose a state to find a specific county.
4. Update the last block of code with the county you want to explore in more detail.
5. In the class discussion board on Canvas, create a discussion post about the county you found to be an outlier.




Helpful Links that inspired this notebook:

https://www.youtube.com/watch?v=jMBaY-rO4G0

https://walker-data.com/tidycensus/articles/other-datasets.html#migration-flows-1


# Your Choice
In the code block below, select a sex and an age cohort to explore.

Sex options: choose one by uncommenting its line of code.

Age cohort options: choose one by uncommenting its line of code.

Leave year as 2019.

Select a state based on its FIPS code [see link](https://www.census.gov/library/reference/code-lists/ansi/ansi-codes-for-states.html).

To uncomment a line, remove the # from the beginning of that line, and add a # to the beginning of the line that is currently uncommented.



In [1]:
# Select Sex (uncomment one line)
sex = "Female"
#sex = "Male"

# Select age cohort (uncomment one line)
#agecohort = "Under 5"
#agecohort = "5 to 9"
#agecohort = "10 to 14"
#agecohort = "15 to 19"
agecohort = "20 to 24"
#agecohort = "25 to 29"
#agecohort = "30 to 34"
#agecohort = "35 to 39"
#agecohort = "40 to 44"
#agecohort = "45 to 49"
#agecohort = "50 to 54"
#agecohort = "55 to 59"
#agecohort = "60 to 64"
#agecohort = "65 to 69"
#agecohort = "70 to 74"
#agecohort = "75 to 79"
#agecohort = "80 to 84"
#agecohort = "85"

# Select state by FIPS Code
state = '48'

# Select year (leave as 2019 for now)
year = '2019'

In [2]:
# Downloading and running python script from github
# https://jckantor.github.io/cbe61622/A.02-Downloading_Python_source_files_from_github.html
# Make sure the url is the raw version of the file on GitHub

user = "npr99"
repo = "PlanningMethods"
pyfile = "_planning_methods.py"
url = f"https://raw.githubusercontent.com/{user}/{repo}/master/{pyfile}"
!wget --no-cache --quiet --backups=1 {url}
print("Reading in python file from",url)
exec(open(pyfile).read())

'wget' is not recognized as an internal or external command,
operable program or batch file.


Reading in python file from https://raw.githubusercontent.com/npr99/PlanningMethods/master/_planning_methods.py
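The `wget` message in the output above appears when the cell runs on a machine without `wget` (for example, Windows outside of Colab). A minimal sketch of a portable alternative, using only the Python standard library, is shown below. The helper names `raw_github_url` and `download_file` are hypothetical and are not part of `_planning_methods.py`.

```python
import urllib.request
from pathlib import Path


def raw_github_url(user: str, repo: str, pyfile: str, branch: str = "master") -> str:
    """Build the raw-file URL for a file at the top level of a GitHub repo."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{pyfile}"


def download_file(url: str, dest: str) -> Path:
    """Download url and save it to dest, overwriting any existing copy."""
    with urllib.request.urlopen(url) as response:
        Path(dest).write_bytes(response.read())
    return Path(dest)


url = raw_github_url("npr99", "PlanningMethods", "_planning_methods.py")
print("Would download from", url)

# To actually fetch the file (requires network access), uncomment:
# download_file(url, "_planning_methods.py")
```

Once the file is saved locally, the existing `exec(open(pyfile).read())` line works unchanged.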


## Steps 1 and 2: Obtain Data and Clean Data
The previous step read in a python file with the Obtain Data and Clean Data functions presented in the notebook on [Sample Size and Confidence Intervals](https://github.com/npr99/PlanningMethods/blob/master/PLAN604_Population_vs_Sample_USCounties.ipynb) and the notebook on [Comparison of two proportions](https://github.com/npr99/PlanningMethods/blob/master/PLAN604_Comparison_of_two_proportions_tractlevel.ipynb). The block of code with the python functions needs to be run first and then the function can be called in future blocks of code.

## Run Obtain Census API for ACS data with Age and Sex Characteristics
The next block of code calls the function and gets variables related to population by age and sex. The list of variables is built in a loop.

For more variables see:

https://www.census.gov/data/developers/data-sets/acs-5year.2019.html

https://api.census.gov/data/2019/acs/acs5/subject/groups/S0101.html

https://data.census.gov/cedsci/table?tid=ACSST1Y2019.S0101


### Use a loop to make a list of variables to get

In [3]:
acs_df = planning_methods.percent_pop_by_age_sex()

Census API data from: https://api.census.gov/data/2019/acs/acs5/subject?get=GEO_ID,NAME,S0101_C03_001E,S0101_C03_001M,S0101_C04_002E,S0101_C04_002M,S0101_C04_003E,S0101_C04_003M,S0101_C04_004E,S0101_C04_004M,S0101_C04_005E,S0101_C04_005M,S0101_C04_006E,S0101_C04_006M,S0101_C04_007E,S0101_C04_007M,S0101_C04_008E,S0101_C04_008M,S0101_C04_009E,S0101_C04_009M,S0101_C04_010E,S0101_C04_010M,S0101_C04_011E,S0101_C04_011M,S0101_C04_012E,S0101_C04_012M,S0101_C04_013E,S0101_C04_013M,S0101_C04_014E,S0101_C04_014M,S0101_C04_015E,S0101_C04_015M,S0101_C04_016E,S0101_C04_016M,S0101_C04_017E,S0101_C04_017M,S0101_C04_018E,S0101_C04_018M,S0101_C04_019E,S0101_C04_019M&in=state:*&in=county:*&for=county:*
Census API data from: https://api.census.gov/data/2019/acs/acs5/subject?get=GEO_ID,NAME,S0101_C05_001E,S0101_C05_001M,S0101_C06_002E,S0101_C06_002M,S0101_C06_003E,S0101_C06_003M,S0101_C06_004E,S0101_C06_004M,S0101_C06_005E,S0101_C06_005M,S0101_C06_006E,S0101_C06_006M,S0101_C06_007E,S0101_C06_007M,S0101_C0
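The request URLs printed above follow a regular pattern: row 001 of one S0101 column (the total), then rows 002 to 019 of the next column (the percentages), each with an estimate (`E`) and a margin of error (`M`). The sketch below rebuilds the first URL from that pattern. It is an illustration under that assumption, not the code inside `percent_pop_by_age_sex`, and the function names are hypothetical.

```python
def build_variable_list(total_col: int, percent_col: int) -> list:
    """List the S0101 variables for one sex: total population plus 18 age cohorts."""
    variables = ["GEO_ID", "NAME"]
    # Row 001 of the total column: estimate (E) and margin of error (M)
    for suffix in ("E", "M"):
        variables.append(f"S0101_C{total_col:02d}_001{suffix}")
    # Rows 002 to 019 of the percent column: one row per age cohort
    for row in range(2, 20):
        for suffix in ("E", "M"):
            variables.append(f"S0101_C{percent_col:02d}_{row:03d}{suffix}")
    return variables


def build_county_url(year: str, variables: list) -> str:
    """Build an ACS 5-year subject-table request for every county in every state."""
    get = ",".join(variables)
    return (
        f"https://api.census.gov/data/{year}/acs/acs5/subject"
        f"?get={get}&in=state:*&in=county:*&for=county:*"
    )


# Columns 3 and 4 reproduce the first URL in the output above
first_vars = build_variable_list(total_col=3, percent_col=4)
first_url = build_county_url("2019", first_vars)
print(len(first_vars), "variables")
print(first_url)
```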

In [4]:
acs_df['2019 Percent Male'].head()

Unnamed: 0,Geography,Geographic Area Name,Total population 2019 (Estimate),Total population 2019 (MOE),Under 5 years 2019 (Estimate),Under 5 years 2019 (MOE),5 to 9 years 2019 (Estimate),5 to 9 years 2019 (MOE),10 to 14 years 2019 (Estimate),10 to 14 years 2019 (MOE),...,70 to 74 years 2019 (Estimate),70 to 74 years 2019 (MOE),75 to 79 years 2019 (Estimate),75 to 79 years 2019 (MOE),80 to 84 years 2019 (Estimate),80 to 84 years 2019 (MOE),85 years and over 2019 (Estimate),85 years and over 2019 (MOE),state,county
0,0500000US17097,"Lake County, Illinois",350466,84.0,5.9,0.1,7.0,0.2,7.3,0.2,...,3.2,0.1,2.1,0.1,1.3,0.1,1.2,0.1,17,97
1,0500000US17051,"Fayette County, Illinois",11469,113.0,5.4,0.4,5.4,0.8,5.5,0.6,...,4.6,0.7,3.5,0.6,1.6,0.4,1.7,0.5,17,51
2,0500000US17107,"Logan County, Illinois",14298,218.0,5.8,0.3,5.3,0.7,5.2,0.7,...,3.8,0.6,2.9,0.5,1.4,0.4,2.4,0.6,17,107
3,0500000US17165,"Saline County, Illinois",11807,118.0,6.8,0.5,5.2,0.9,6.8,1.1,...,4.1,0.8,3.1,0.7,2.2,0.5,1.8,0.5,17,165
4,0500000US17127,"Massac County, Illinois",6714,54.0,5.8,0.4,7.7,1.2,5.1,1.2,...,5.9,1.0,3.8,0.9,2.9,0.9,1.5,0.7,17,127


In [5]:
acs_df['2019 Percent Female'].head()

Unnamed: 0,Geography,Geographic Area Name,Total population 2019 (Estimate),Total population 2019 (MOE),Under 5 years 2019 (Estimate),Under 5 years 2019 (MOE),5 to 9 years 2019 (Estimate),5 to 9 years 2019 (MOE),10 to 14 years 2019 (Estimate),10 to 14 years 2019 (MOE),...,70 to 74 years 2019 (Estimate),70 to 74 years 2019 (MOE),75 to 79 years 2019 (Estimate),75 to 79 years 2019 (MOE),80 to 84 years 2019 (Estimate),80 to 84 years 2019 (MOE),85 years and over 2019 (Estimate),85 years and over 2019 (MOE),state,county
0,0500000US17097,"Lake County, Illinois",351007,84.0,5.7,0.1,6.4,0.2,7.2,0.2,...,3.5,0.2,2.4,0.1,1.7,0.1,2.3,0.1,17,97
1,0500000US17051,"Fayette County, Illinois",10096,113.0,5.6,0.1,4.9,1.2,7.0,0.8,...,5.7,0.8,4.7,0.8,3.1,0.6,2.9,0.7,17,51
2,0500000US17107,"Logan County, Illinois",14705,218.0,4.9,0.2,4.9,0.6,5.3,0.5,...,4.4,0.5,3.1,0.6,2.0,0.4,5.1,0.6,17,107
3,0500000US17165,"Saline County, Illinois",12187,118.0,5.7,0.2,4.7,1.0,6.4,0.8,...,5.7,0.9,3.7,0.8,2.8,0.7,4.1,0.8,17,165
4,0500000US17127,"Massac County, Illinois",7505,54.0,5.4,0.5,9.5,1.4,4.3,1.4,...,5.4,1.0,3.6,1.0,4.1,1.3,3.6,1.0,17,127


## Step 3: Explore Data

#### 3.1 Use descriptive statistics to check cleaning
A descriptive statistics table is a quick way to check that the variables have been created correctly.
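To make the columns of such a table concrete, here is a small standard-library sketch that computes the same eight statistics for the five "Under 5 years" male estimates shown in the `head()` output above. The `describe` helper is hypothetical; `descriptive_stats_table` may compute its table differently (the column names suggest a pandas `describe()` call, but that is an assumption).

```python
import statistics


def describe(values: list) -> dict:
    """Return count, mean, std, min, quartiles, and max for a list of numbers."""
    # method="inclusive" interpolates linearly between the min and max,
    # which should match the default quantile method in pandas
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# "Under 5 years" male percentages for the five Illinois counties shown above
under5_male = [5.9, 5.4, 5.8, 6.8, 5.8]
summary = describe(under5_male)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

When checking a cleaned variable, look for values that cannot be right for a percentage, such as a negative minimum or a maximum above 100.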

In [6]:
planning_methods.descriptive_stats_table(acs_df['2019 Percent Male'], 
      who = "Percent male population by age",
      what = "descriptive statistics",
      when = "in 2015-2019",
      where = "for all US counties")

| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Under 5 years 2019 (Estimate) | 3220.0 | 5.92 | 1.41 | 0.0 | 5.1 | 5.9 | 6.7 | 21.1 |
| 5 to 9 years 2019 (Estimate) | 3220.0 | 6.27 | 1.46 | 0.0 | 5.4 | 6.3 | 7.1 | 14.6 |
| 10 to 14 years 2019 (Estimate) | 3220.0 | 6.59 | 1.47 | 0.0 | 5.8 | 6.6 | 7.4 | 21.4 |
| 15 to 19 years 2019 (Estimate) | 3220.0 | 6.71 | 1.63 | 0.0 | 5.9 | 6.7 | 7.3 | 28.3 |
| 20 to 24 years 2019 (Estimate) | 3220.0 | 6.49 | 2.57 | 0.0 | 5.3 | 6.1 | 7.0 | 31.8 |
| 25 to 29 years 2019 (Estimate) | 3220.0 | 6.28 | 1.61 | 0.0 | 5.3 | 6.1 | 7.0 | 16.6 |
| 30 to 34 years 2019 (Estimate) | 3220.0 | 5.97 | 1.4 | 0.0 | 5.2 | 5.8 | 6.6 | 17.7 |
| 35 to 39 years 2019 (Estimate) | 3220.0 | 6.04 | 1.36 | 0.0 | 5.3 | 6.0 | 6.7 | 16.7 |
| 40 to 44 years 2019 (Estimate) | 3220.0 | 5.79 | 1.29 | 0.0 | 5.1 | 5.8 | 6.4 | 22.2 |
| 45 to 49 years 2019 (Estimate) | 3220.0 | 6.06 | 1.09 | 0.0 | 5.5 | 6.1 | 6.7 | 15.4 |


In [7]:
planning_methods.descriptive_stats_table(acs_df['2019 Percent Female'], 
      who = "Percent female population by age",
      what = "descriptive statistics",
      when = "in 2015-2019",
      where = "for all US counties")

| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Under 5 years 2019 (Estimate) | 3220.0 | 5.65 | 1.49 | 0.0 | 4.9 | 5.6 | 6.2 | 36.4 |
| 5 to 9 years 2019 (Estimate) | 3220.0 | 5.96 | 1.41 | 0.0 | 5.1 | 5.9 | 6.7 | 14.1 |
| 10 to 14 years 2019 (Estimate) | 3220.0 | 6.26 | 1.41 | 0.0 | 5.5 | 6.2 | 7.0 | 16.9 |
| 15 to 19 years 2019 (Estimate) | 3220.0 | 6.18 | 1.63 | 0.0 | 5.38 | 6.0 | 6.8 | 19.7 |
| 20 to 24 years 2019 (Estimate) | 3220.0 | 5.8 | 2.31 | 0.0 | 4.8 | 5.4 | 6.3 | 28.0 |
| 25 to 29 years 2019 (Estimate) | 3220.0 | 5.76 | 1.36 | 0.0 | 5.0 | 5.7 | 6.4 | 15.5 |
| 30 to 34 years 2019 (Estimate) | 3220.0 | 5.64 | 1.14 | 0.0 | 5.0 | 5.6 | 6.2 | 16.7 |
| 35 to 39 years 2019 (Estimate) | 3220.0 | 5.77 | 1.22 | 0.0 | 5.1 | 5.8 | 6.5 | 14.4 |
| 40 to 44 years 2019 (Estimate) | 3220.0 | 5.6 | 1.14 | 0.0 | 5.0 | 5.6 | 6.2 | 15.7 |
| 45 to 49 years 2019 (Estimate) | 3220.0 | 5.97 | 0.99 | 0.0 | 5.5 | 6.0 | 6.5 | 14.3 |


## Step 4: Explore Data - Z-Score Outliers
One way to identify outliers is to look at the z-score, the number of standard deviations an observation falls from the mean.
The formula for the z-score is

>$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$

If a z-score is greater than 3 or less than -3, the observation is considered an outlier.
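The z-score rule above can be sketched in a few lines of standard-library Python. The county names and percentages below are made up for illustration, and the use of the sample standard deviation is an assumption; `find_zscore_outliers` may differ in both its inputs and its details.

```python
import statistics


def zscore_outliers(values_by_name: dict, threshold: float = 3.0) -> dict:
    """Return the z-scores of observations more than `threshold` SDs from the mean."""
    values = list(values_by_name.values())
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (an assumption)
    zscores = {name: (value - mean) / std for name, value in values_by_name.items()}
    # Keep observations whose z-score is above +threshold or below -threshold
    return {name: z for name, z in zscores.items() if abs(z) > threshold}


# Hypothetical percent of population aged 20 to 24 in 20 counties
typical = [5.3, 6.1, 7.0, 5.8, 6.4, 5.9, 6.2, 5.5, 6.8, 6.0,
           5.7, 6.3, 6.6, 5.4, 6.1, 5.9, 6.5, 5.6, 6.2]
percents = {f"County {i:02d}": value for i, value in enumerate(typical, start=1)}
percents["College County"] = 28.0  # a made-up county with a large student population

outliers_sketch = zscore_outliers(percents)
for name, z in outliers_sketch.items():
    print(f"{name}: z = {z:.2f}")
```

With a small sample, a single extreme value inflates the standard deviation and limits how large its own z-score can get, so the rule works best with many observations, such as the 3,220 counties used here.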


In [8]:
outliers = planning_methods.find_zscore_outliers(acs_df[f'{year} Percent {sex}'],f'{agecohort} years {year}')
outliers.head()

Unnamed: 0,Geography,Geographic Area Name,Total population 2019 (Estimate),Total population 2019 (MOE),Under 5 years 2019 (Estimate),Under 5 years 2019 (MOE),5 to 9 years 2019 (Estimate),5 to 9 years 2019 (MOE),10 to 14 years 2019 (Estimate),10 to 14 years 2019 (MOE),...,80 to 84 years 2019 (Estimate),80 to 84 years 2019 (MOE),85 years and over 2019 (Estimate),85 years and over 2019 (MOE),state,county,20 to 24 years 2019 Z-score,Z-score Outlier 20 to 24 years 2019 (Estimate),20 to 24 years 2019 (Estimate) SE,20 to 24 years 2019 (Estimate) CV
3101,0500000US51750,"Radford city, Virginia",9303,137.0,2.7,0.4,3.5,1.2,1.6,1.1,...,1.3,0.6,1.7,0.7,51,750,9.593563,1,1.033435,0.036908
198,0500000US51830,"Williamsburg city, Virginia",7998,122.0,2.0,0.6,2.7,0.8,1.5,0.8,...,2.7,0.7,1.1,0.6,51,830,7.951414,1,1.641337,0.067824
769,0500000US46027,"Clay County, South Dakota",7124,102.0,4.4,0.6,4.4,1.2,4.1,1.2,...,1.4,0.6,2.7,0.7,46,27,7.951414,1,1.276596,0.052752
693,0500000US16065,"Madison County, Idaho",19219,160.0,9.2,0.4,7.0,1.0,5.9,1.0,...,0.7,0.3,1.2,0.4,16,65,6.957483,1,0.547112,0.024982
1433,0500000US20161,"Riley County, Kansas",35308,193.0,5.9,0.2,4.6,0.6,4.3,0.6,...,0.7,0.2,2.2,0.4,20,161,6.914268,1,0.303951,0.013943


In [9]:
planning_methods.descriptive_stats_table(outliers, 
      who = f"Percent {sex} population by age",
      what = "descriptive statistics",
      when = "in 2015-2019",
      where = "for outlier US counties")

| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Under 5 years 2019 (Estimate) | 77.0 | 5.03 | 1.27 | 0.0 | 4.5 | 5.1 | 5.5 | 9.2 |
| 5 to 9 years 2019 (Estimate) | 77.0 | 4.85 | 1.22 | 0.0 | 4.2 | 4.8 | 5.4 | 8.5 |
| 10 to 14 years 2019 (Estimate) | 77.0 | 5.03 | 1.29 | 1.0 | 4.4 | 5.1 | 5.7 | 9.5 |
| 15 to 19 years 2019 (Estimate) | 77.0 | 10.83 | 2.96 | 4.6 | 8.9 | 9.9 | 12.2 | 19.7 |
| 20 to 24 years 2019 (Estimate) | 77.0 | 16.17 | 3.0 | 12.8 | 14.0 | 15.4 | 17.6 | 28.0 |
| 25 to 29 years 2019 (Estimate) | 77.0 | 6.77 | 1.56 | 0.0 | 5.8 | 7.0 | 7.6 | 10.9 |
| 30 to 34 years 2019 (Estimate) | 77.0 | 5.64 | 1.09 | 1.9 | 5.1 | 5.7 | 6.2 | 9.2 |
| 35 to 39 years 2019 (Estimate) | 77.0 | 5.41 | 1.05 | 2.7 | 4.8 | 5.5 | 5.9 | 9.8 |
| 40 to 44 years 2019 (Estimate) | 77.0 | 4.69 | 1.0 | 1.5 | 4.2 | 4.8 | 5.3 | 7.6 |
| 45 to 49 years 2019 (Estimate) | 77.0 | 4.86 | 0.73 | 2.9 | 4.4 | 4.9 | 5.3 | 7.2 |


In [11]:
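# Keep outlier counties in the selected state whose estimate is reliable (CV below 0.1).
# The CV column name follows the age cohort and year chosen in the first code block.
outliers.loc[(outliers['state'] == state) & (outliers[f'{agecohort} years {year} (Estimate) CV'] < 0.1)]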

Unnamed: 0,Geography,Geographic Area Name,Total population 2019 (Estimate),Total population 2019 (MOE),Under 5 years 2019 (Estimate),Under 5 years 2019 (MOE),5 to 9 years 2019 (Estimate),5 to 9 years 2019 (MOE),10 to 14 years 2019 (Estimate),10 to 14 years 2019 (MOE),...,80 to 84 years 2019 (Estimate),80 to 84 years 2019 (MOE),85 years and over 2019 (Estimate),85 years and over 2019 (MOE),state,county,20 to 24 years 2019 Z-score,Z-score Outlier 20 to 24 years 2019 (Estimate),20 to 24 years 2019 (Estimate) SE,20 to 24 years 2019 (Estimate) CV
488,0500000US48041,"Brazos County, Texas",110352,76.0,6.1,0.1,5.8,0.4,5.3,0.4,...,1.2,0.2,1.2,0.2,48,41,5.315334,1,0.121581,0.006717
2897,0500000US48471,"Walker County, Texas",30104,238.0,4.8,0.1,4.2,0.7,6.0,0.8,...,1.8,0.6,3.0,0.6,48,471,5.099262,1,1.155015,0.065626
2791,0500000US48273,"Kleberg County, Texas",15318,235.0,6.8,0.6,7.2,1.2,6.1,1.4,...,1.5,0.7,2.3,0.8,48,273,4.451046,1,0.729483,0.04531
508,0500000US48143,"Erath County, Texas",21494,278.0,5.4,0.3,5.8,1.5,5.1,1.4,...,1.7,0.6,2.6,0.7,48,143,4.321403,1,0.972644,0.06156
482,0500000US48473,"Waller County, Texas",25988,157.0,5.7,0.4,6.5,1.0,7.4,1.1,...,1.3,0.5,1.6,0.5,48,473,3.543543,1,0.911854,0.065132
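The SE and CV columns in the table above are consistent with the usual ACS conversions: the standard error is the margin of error divided by 1.645 (ACS margins of error are published at the 90 percent confidence level), and the coefficient of variation is the standard error divided by the estimate. The sketch below uses hypothetical inputs (an estimate of 18.1 percent with a margin of error of 0.2) that happen to reproduce the SE and CV shown for Brazos County. The function names are illustrative and are not taken from `_planning_methods.py`.

```python
Z_90 = 1.645  # z-value for the 90 percent confidence level used by the ACS


def standard_error(moe: float) -> float:
    """Convert an ACS margin of error to a standard error."""
    return moe / Z_90


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Standard error as a fraction of the estimate (smaller is more reliable)."""
    return standard_error(moe) / estimate


# Hypothetical inputs, chosen to match the Brazos County SE and CV shown above
estimate, moe = 18.1, 0.2
se = standard_error(moe)
cv = coefficient_of_variation(estimate, moe)
is_reliable = cv < 0.1  # same cutoff as the filter in the code cell above

print(f"SE = {se:.6f}, CV = {cv:.6f}, reliable = {is_reliable}")
```

Filtering on the CV matters for outliers because small counties can show extreme percentages that are mostly sampling noise.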


## Generate Links to Sources
Public sources of Census data include Census Reporter, ACS Narrative Profiles, and data.census.gov. The sources can be quickly located using links based on geography codes.

In [18]:
planning_methods.generate_source_county_links(year,'48041')

For an ACS Narrative Profile click on link:
https://www.census.gov/acs/www/data/data-tables-and-tools/narrative-profiles/2019/report.php?geotype=county&state=48&county=041


For a Census Reporter Profile click on link:
https://censusreporter.org/profiles/05000US48041


For data.census.gov data click on link:
https://data.census.gov/cedsci/table?tid=ACSST5Y2019.S0101&g=0500000US48041
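The three links above share a simple structure built from the year and the five-digit county FIPS code (two digits for the state, three for the county). A minimal sketch that reproduces them follows. The function name `county_source_links` is hypothetical, and the URL patterns are copied from the output above rather than from any official specification, so they may change over time.

```python
def county_source_links(year: str, county_fips: str) -> dict:
    """Build source links for a county from its 5-digit FIPS code (state + county)."""
    state, county = county_fips[:2], county_fips[2:]
    return {
        "ACS Narrative Profile": (
            "https://www.census.gov/acs/www/data/data-tables-and-tools/"
            f"narrative-profiles/{year}/report.php"
            f"?geotype=county&state={state}&county={county}"
        ),
        "Census Reporter Profile": (
            f"https://censusreporter.org/profiles/05000US{county_fips}"
        ),
        "data.census.gov": (
            "https://data.census.gov/cedsci/table"
            f"?tid=ACSST5Y{year}.S0101&g=0500000US{county_fips}"
        ),
    }


# Brazos County, Texas: state 48, county 041
links = county_source_links("2019", "48041")
for source, link in links.items():
    print(f"{source}: {link}")
```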
