# Texas Choroshape Examples

This script builds county-level choropleth maps from Texas demographic data. It defines some basic classes and walks through a few use cases. The city and county shapefiles were created with ArcGIS and are not included. Colors were chosen with ColorBrewer 2.0 (http://colorbrewer2.org/).
<br/><br/>
All data is publicly available. The data file was downloaded from the Texas State Data Center on 8/10/2016 (http://osd.texas.gov/Data/TPEPP/Projections/). The data comprises the 2014 Population Projections with a Full 2000-10 Migration Rate.

In [200]:
# standard library
import os

# third-party libraries
import pandas as pd
import geopandas as gpd

%load_ext autoreload
%autoreload 2
%matplotlib inline

# show at most 10 rows when a dataframe is displayed
pd.set_option("display.max_rows", 10)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Set up some variables. A Census API key must be specified here, as must the output path for storing the map image files.

In [201]:
OUTPATH = os.path.expanduser('~/Desktop/Example_Files/')
CURR_PATH = os.path.realpath('')  # absolute path of the current working directory

TSDC_FILE = os.path.normpath(os.path.join(CURR_PATH,
    'Data_Files/TSDC_PopulationProj_County_AgeGroup Yr2014 - 1.0ms.xlsx'))
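
Since the map images are written to the output path, it can help to make sure that folder exists before running the rest of the notebook. The following is a minimal sketch, not part of the original script: `ensure_dir` is a hypothetical helper, and a temporary directory stands in for `OUTPATH` so the example does not touch the Desktop.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parents) and return the normalized path."""
    path = os.path.normpath(os.path.expanduser(path))
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


# Hypothetical stand-in for OUTPATH: a folder inside a fresh temporary directory.
demo_outpath = ensure_dir(os.path.join(tempfile.mkdtemp(), 'Example_Files'))
print(os.path.isdir(demo_outpath))
```

Calling `ensure_dir` a second time on the same path is harmless, because the directory is only created when it is missing.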

# Data from Excel


Let's load and clean some data. First, load the file into a pandas dataframe.

In [202]:
pop_data = pd.read_excel(TSDC_FILE)
print('the dimensions are in rows, cols: ', pop_data.shape)
pop_data

('the dimensions are in rows, cols: ', (1530, 20))


Unnamed: 0,FIPS,area_name,migration_scenario,year,age_group,total,total_male,total_female,total_anglo,anglo_male,anglo_female,total_black,black_male,black_female,total_hispanic,hispanic_male,hispanic_female,total_other,other_male,other_female
0,0,State of Texas,1.0 Scen,2014,ALL,27161942,13499310,13662632,11624881,5753306,5871575,3114187,1506024,1608163,10740456,5417985,5322471,1682418,821995,860423
1,0,State of Texas,1.0 Scen,2014,<18,7216132,3691030,3525102,2330976,1195370,1135606,824890,420626,404264,3578159,1829787,1748372,482107,245247,236860
2,0,State of Texas,1.0 Scen,2014,18-24,2790003,1445762,1344241,964491,495432,469059,354617,180591,174026,1302563,683133,619430,168332,86606,81726
3,0,State of Texas,1.0 Scen,2014,25-44,7569054,3800392,3768662,2934438,1479361,1455077,889282,426914,462368,3206202,1635324,1570878,539132,258793,280339
4,0,State of Texas,1.0 Scen,2014,45-64,6500397,3188648,3311749,3381136,1672570,1708566,778233,368572,409661,1976562,974421,1002141,364466,173085,191381
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1525,507,Zavala County,1.0 Scen,2014,<18,3714,1851,1863,127,55,72,13,6,7,3566,1787,1779,8,3,5
1526,507,Zavala County,1.0 Scen,2014,18-24,1406,732,674,43,24,19,5,1,4,1356,707,649,2,0,2
1527,507,Zavala County,1.0 Scen,2014,25-44,2829,1495,1334,141,73,68,8,6,2,2674,1413,1261,6,3,3
1528,507,Zavala County,1.0 Scen,2014,45-64,2637,1273,1364,169,88,81,6,4,2,2450,1175,1275,12,6,6


We're not interested in state-level data, so let's drop those rows (FIPS code 0) from our dataframe.

In [203]:
pop_data = pop_data[pop_data['FIPS'] != 0]
pop_data

Unnamed: 0,FIPS,area_name,migration_scenario,year,age_group,total,total_male,total_female,total_anglo,anglo_male,anglo_female,total_black,black_male,black_female,total_hispanic,hispanic_male,hispanic_female,total_other,other_male,other_female
6,1,Anderson County,1.0 Scen,2014,ALL,59991,36443,23548,35957,19870,16087,12456,9030,3426,10242,6895,3347,1336,648,688
7,1,Anderson County,1.0 Scen,2014,<18,11542,5942,5600,6555,3376,3179,1707,850,857,2756,1458,1298,524,258,266
8,1,Anderson County,1.0 Scen,2014,18-24,4747,2847,1900,2526,1393,1133,951,653,298,1152,744,408,118,57,61
9,1,Anderson County,1.0 Scen,2014,25-44,18335,12981,5354,8965,5525,3440,4968,4169,799,4097,3134,963,305,153,152
10,1,Anderson County,1.0 Scen,2014,45-64,17014,10700,6314,11002,6308,4694,3868,2912,956,1880,1356,524,264,124,140
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1525,507,Zavala County,1.0 Scen,2014,<18,3714,1851,1863,127,55,72,13,6,7,3566,1787,1779,8,3,5
1526,507,Zavala County,1.0 Scen,2014,18-24,1406,732,674,43,24,19,5,1,4,1356,707,649,2,0,2
1527,507,Zavala County,1.0 Scen,2014,25-44,2829,1495,1334,141,73,68,8,6,2,2674,1413,1261,6,3,3
1528,507,Zavala County,1.0 Scen,2014,45-64,2637,1273,1364,169,88,81,6,4,2,2450,1175,1275,12,6,6
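
The filter above relies on boolean masking: the comparison produces a True/False Series, and indexing the dataframe with it keeps only the rows marked True. Here is a small sketch of the same idea on a toy frame. The frame `toy` and the names `is_county` and `counties` are made up for illustration, with a few values copied from the tables above.

```python
import pandas as pd

# Toy frame with the same key columns as the real data (illustrative only).
toy = pd.DataFrame({
    'FIPS': [0, 0, 1, 1, 3],
    'area_name': ['State of Texas', 'State of Texas',
                  'Anderson County', 'Anderson County', 'Andrews County'],
    'age_group': ['ALL', '<18', 'ALL', '<18', 'ALL'],
    'total': [27161942, 7216132, 59991, 11542, 15861],
})

# The comparison yields one boolean per row.
is_county = toy['FIPS'] != 0

# Indexing with the mask keeps the True rows and preserves their original index labels.
counties = toy[is_county]
print(counties)
```

Note that the surviving rows keep their original index labels, which is why the county tables in this notebook start at index 6 rather than 0.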


Get the total population for each county in Texas.

In [204]:
total_data = pop_data[pop_data['age_group'] == 'ALL']
print('the dimensions are in rows, cols: ', total_data.shape)
total_data[:10]

('the dimensions are in rows, cols: ', (254, 20))


Unnamed: 0,FIPS,area_name,migration_scenario,year,age_group,total,total_male,total_female,total_anglo,anglo_male,anglo_female,total_black,black_male,black_female,total_hispanic,hispanic_male,hispanic_female,total_other,other_male,other_female
6,1,Anderson County,1.0 Scen,2014,ALL,59991,36443,23548,35957,19870,16087,12456,9030,3426,10242,6895,3347,1336,648,688
12,3,Andrews County,1.0 Scen,2014,ALL,15861,7942,7919,7211,3535,3676,202,103,99,8114,4141,3973,334,163,171
18,5,Angelina County,1.0 Scen,2014,ALL,89854,44325,45529,54815,26830,27985,13348,6365,6983,19441,10082,9359,2250,1048,1202
24,7,Aransas County,1.0 Scen,2014,ALL,24431,12061,12370,16874,8220,8654,265,150,115,6358,3224,3134,934,467,467
30,9,Archer County,1.0 Scen,2014,ALL,9416,4711,4705,8437,4192,4245,34,20,14,781,428,353,164,71,93
36,11,Armstrong County,1.0 Scen,2014,ALL,1957,967,990,1775,880,895,11,6,5,131,64,67,40,17,23
42,13,Atascosa County,1.0 Scen,2014,ALL,49165,24151,25014,17195,8274,8921,273,153,120,31064,15412,15652,633,312,321
48,15,Austin County,1.0 Scen,2014,ALL,31434,15504,15930,19567,9473,10094,2966,1466,1500,8290,4278,4012,611,287,324
54,17,Bailey County,1.0 Scen,2014,ALL,7670,3899,3771,2719,1324,1395,70,43,27,4808,2499,2309,73,33,40
60,19,Bandera County,1.0 Scen,2014,ALL,22335,11049,11286,17804,8759,9045,88,49,39,4007,2045,1962,436,196,240


There are a lot of extra columns we don't need. We're really only interested in what will go in our report, so let's build a list of the relevant column names and use it to slice the data.

In [205]:
cols_to_keep = [col for col in total_data.columns if 'total' in col]
cols_to_keep = ['FIPS', 'area_name'] + cols_to_keep
cols_to_keep

['FIPS',
 'area_name',
 u'total',
 u'total_male',
 u'total_female',
 u'total_anglo',
 u'total_black',
 u'total_hispanic',
 u'total_other']

Ah, much better...

In [206]:
total_data = total_data.loc[:,cols_to_keep]
total_data

Unnamed: 0,FIPS,area_name,total,total_male,total_female,total_anglo,total_black,total_hispanic,total_other
6,1,Anderson County,59991,36443,23548,35957,12456,10242,1336
12,3,Andrews County,15861,7942,7919,7211,202,8114,334
18,5,Angelina County,89854,44325,45529,54815,13348,19441,2250
24,7,Aransas County,24431,12061,12370,16874,265,6358,934
30,9,Archer County,9416,4711,4705,8437,34,781,164
...,...,...,...,...,...,...,...,...,...
1500,499,Wood County,44655,21974,22681,37385,2126,4208,936
1506,501,Yoakum County,8531,4254,4277,3073,59,5284,115
1512,503,Young County,19074,9482,9592,14962,227,3529,356
1518,505,Zapata County,15241,7686,7555,844,11,14324,62


Oh wait, we also don't need the 'total_other' column.

In [207]:
total_data = total_data.drop('total_other', axis=1)
total_data

Unnamed: 0,FIPS,area_name,total,total_male,total_female,total_anglo,total_black,total_hispanic
6,1,Anderson County,59991,36443,23548,35957,12456,10242
12,3,Andrews County,15861,7942,7919,7211,202,8114
18,5,Angelina County,89854,44325,45529,54815,13348,19441
24,7,Aransas County,24431,12061,12370,16874,265,6358
30,9,Archer County,9416,4711,4705,8437,34,781
...,...,...,...,...,...,...,...,...
1500,499,Wood County,44655,21974,22681,37385,2126,4208
1506,501,Yoakum County,8531,4254,4277,3073,59,5284
1512,503,Young County,19074,9482,9592,14962,227,3529
1518,505,Zapata County,15241,7686,7555,844,11,14324
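
The two steps above (keep the columns whose names contain 'total', then drop one of them) can be sketched on a toy frame as follows. This is only an illustration: the frame `toy` and the names `keep` and `trimmed` are hypothetical, and `drop(columns=...)` is used as an equivalent spelling of `drop(..., axis=1)` that assumes a reasonably recent pandas.

```python
import pandas as pd

# Toy frame with a mix of wanted and unwanted columns (illustrative only).
toy = pd.DataFrame({
    'FIPS': [1, 3],
    'area_name': ['Anderson County', 'Andrews County'],
    'year': [2014, 2014],
    'total': [59991, 15861],
    'total_other': [1336, 334],
    'other_male': [648, 163],
})

# Identifier columns first, then every column whose name contains 'total'.
keep = ['FIPS', 'area_name'] + [c for c in toy.columns if 'total' in c]

# Select those columns, then drop the one we decided we don't need.
trimmed = toy.loc[:, keep].drop(columns=['total_other'])
print(list(trimmed.columns))
```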


Let's go back to our original dataframe and filter on a different age group to find the number of children in each county.

In [208]:
children = pop_data[pop_data['age_group'] == '<18']
print('the dimensions are in rows, cols: ', children.shape)
children

('the dimensions are in rows, cols: ', (254, 20))


Unnamed: 0,FIPS,area_name,migration_scenario,year,age_group,total,total_male,total_female,total_anglo,anglo_male,anglo_female,total_black,black_male,black_female,total_hispanic,hispanic_male,hispanic_female,total_other,other_male,other_female
7,1,Anderson County,1.0 Scen,2014,<18,11542,5942,5600,6555,3376,3179,1707,850,857,2756,1458,1298,524,258,266
13,3,Andrews County,1.0 Scen,2014,<18,4563,2390,2173,1509,800,709,57,33,24,2895,1506,1389,102,51,51
19,5,Angelina County,1.0 Scen,2014,<18,23711,12101,11610,11728,6011,5717,3688,1884,1804,7454,3801,3653,841,405,436
25,7,Aransas County,1.0 Scen,2014,<18,4507,2335,2172,2263,1168,1095,55,29,26,1998,1035,963,191,103,88
31,9,Archer County,1.0 Scen,2014,<18,2111,1043,1068,1783,883,900,12,7,5,274,135,139,42,18,24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1501,499,Wood County,1.0 Scen,2014,<18,8705,4421,4284,6470,3272,3198,331,171,160,1571,809,762,333,169,164
1507,501,Yoakum County,1.0 Scen,2014,<18,2594,1283,1311,679,345,334,10,5,5,1874,922,952,31,11,20
1513,503,Young County,1.0 Scen,2014,<18,4537,2347,2190,3024,1539,1485,52,33,19,1341,714,627,120,61,59
1519,505,Zapata County,1.0 Scen,2014,<18,5207,2762,2445,119,69,50,2,1,1,5075,2686,2389,11,6,5


And we'll pare down the dataframe here, keeping only the FIPS code and the total.

In [209]:
children = children[['FIPS', 'total']]  # select
children

Unnamed: 0,FIPS,total
7,1,11542
13,3,4563
19,5,23711
25,7,4507
31,9,2111
...,...,...
1501,499,8705
1507,501,2594
1513,503,4537
1519,505,5207


And we'll rename the 'total' column to make it less confusing.

In [210]:
children.columns = ['FIPS', 'total_children']  # rename
children

Unnamed: 0,FIPS,total_children
7,1,11542
13,3,4563
19,5,23711
25,7,4507
31,9,2111
...,...,...
1501,499,8705
1507,501,2594
1513,503,4537
1519,505,5207


And let's add our children column back in with a dataframe merge on the FIPS code.

In [211]:
total_data = pd.merge(total_data, children, how='left', on='FIPS')
print('the dimensions are in rows, cols: ', total_data.shape)
total_data

('the dimensions are in rows, cols: ', (254, 9))


Unnamed: 0,FIPS,area_name,total,total_male,total_female,total_anglo,total_black,total_hispanic,total_children
0,1,Anderson County,59991,36443,23548,35957,12456,10242,11542
1,3,Andrews County,15861,7942,7919,7211,202,8114,4563
2,5,Angelina County,89854,44325,45529,54815,13348,19441,23711
3,7,Aransas County,24431,12061,12370,16874,265,6358,4507
4,9,Archer County,9416,4711,4705,8437,34,781,2111
...,...,...,...,...,...,...,...,...,...
249,499,Wood County,44655,21974,22681,37385,2126,4208,8705
250,501,Yoakum County,8531,4254,4277,3073,59,5284,2594
251,503,Young County,19074,9482,9592,14962,227,3529,4537
252,505,Zapata County,15241,7686,7555,844,11,14324,5207
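
A left merge keeps every row of the left frame, so it is worth checking that the key is unique on both sides and that no county went unmatched. The sketch below shows one way to do that with the `validate` and `indicator` options of `pd.merge`, assuming a pandas version that supports them. The toy frames `totals` and `kids` are made up for illustration, and one county is deliberately left without a match.

```python
import pandas as pd

# Toy frames: 'kids' is deliberately missing FIPS 5 (illustrative only).
totals = pd.DataFrame({'FIPS': [1, 3, 5], 'total': [59991, 15861, 89854]})
kids = pd.DataFrame({'FIPS': [1, 3], 'total_children': [11542, 4563]})

# validate='one_to_one' raises an error if FIPS is duplicated on either side;
# indicator=True adds a '_merge' column recording where each row came from.
merged = pd.merge(totals, kids, how='left', on='FIPS',
                  validate='one_to_one', indicator=True)

# Rows found only in the left frame have no children count (NaN).
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['FIPS', 'total_children']])
```

In the notebook's data both frames have 254 rows, one per county, so a check like this would be expected to find no unmatched rows.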


What if we want ratios? Divide each count column by the county's total population.

In [212]:
new_data = total_data.iloc[:,:2].join(total_data.iloc[:,3:].div(total_data.total, axis='index'), rsuffix='_ratio')
new_data

Unnamed: 0,FIPS,area_name,total_male,total_female,total_anglo,total_black,total_hispanic,total_children
0,1,Anderson County,0.607474,0.392526,0.599373,0.207631,0.170726,0.192396
1,3,Andrews County,0.500725,0.499275,0.454637,0.012736,0.511569,0.287687
2,5,Angelina County,0.493300,0.506700,0.610045,0.148552,0.216362,0.263884
3,7,Aransas County,0.493676,0.506324,0.690680,0.010847,0.260243,0.184479
4,9,Archer County,0.500319,0.499681,0.896028,0.003611,0.082944,0.224193
...,...,...,...,...,...,...,...,...
249,499,Wood County,0.492084,0.507916,0.837196,0.047609,0.094234,0.194939
250,501,Yoakum County,0.498652,0.501348,0.360216,0.006916,0.619388,0.304068
251,503,Young County,0.497116,0.502884,0.784419,0.011901,0.185016,0.237863
252,505,Zapata County,0.504298,0.495702,0.055377,0.000722,0.939833,0.341644
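
Notice that the ratio columns above kept their original names: `rsuffix` in `DataFrame.join` is only applied to column names that appear in both frames, and here there is no overlap. If suffixed names are wanted, one option is to rename the ratio columns before joining. This is a sketch on a toy frame rather than the notebook's own approach, and the names `toy`, `count_cols`, `ratios`, and `result` are hypothetical.

```python
import pandas as pd

# Toy frame with values copied from the first two counties (illustrative only).
toy = pd.DataFrame({
    'FIPS': [1, 3],
    'area_name': ['Anderson County', 'Andrews County'],
    'total': [59991, 15861],
    'total_male': [36443, 7942],
    'total_female': [23548, 7919],
})

count_cols = ['total_male', 'total_female']

# Divide each count column row-wise by the county total, then tag the names.
ratios = toy[count_cols].div(toy['total'], axis='index').add_suffix('_ratio')

# Join the identifier columns with the suffixed ratio columns (aligned on the index).
result = toy[['FIPS', 'area_name']].join(ratios)
print(result)
```

Passing `axis='index'` makes the division align on row labels, so each row is divided by its own county's total.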
