# Link Job Records with Person Records Files

This notebook attempts to link the Job Records with the Person Records using the variables they have in common: age, sex, race, Hispanic origin (`hispan`), and earnings.
The link uses a random merge process that starts at the block level and then moves up to the block group, tract, and county levels.
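The random merge idea can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the implementation in `pyincore_data_addons`. The function names `random_merge` and `hierarchical_random_merge`, their arguments, and the toy records are hypothetical. Within each group of records that share the same geography and characteristics, both tables are shuffled, numbered with a within-group counter, and joined on the group columns plus the counter. Records left unmatched at one geography level become eligible at the next, coarser level.

```python
import pandas as pd


def random_merge(primary, secondary, by, pid, sid, seed):
    """Randomly pair rows of `primary` and `secondary` that share the same
    values in every column of `by`; each row is used at most once.
    Returns a DataFrame of matched ID pairs (pid, sid)."""
    # Shuffle each table reproducibly, keeping only the ID and group columns.
    p = primary[[pid] + by].sample(frac=1, random_state=seed)
    s = secondary[[sid] + by].sample(frac=1, random_state=seed + 1)
    # Number the shuffled rows within each group: 0, 1, 2, ...
    p = p.assign(_counter=p.groupby(by).cumcount())
    s = s.assign(_counter=s.groupby(by).cumcount())
    # Joining on the group columns plus the counter pairs the k-th shuffled
    # primary row with the k-th shuffled secondary row in the same group.
    pairs = p.merge(s, on=by + ["_counter"], how="inner")
    return pairs[[pid, sid]]


def hierarchical_random_merge(primary, secondary, geo_levels, char_vars,
                              pid, sid, seed=9876):
    """Run `random_merge` from the finest to the coarsest geography level.
    Rows matched at one level are excluded from later levels."""
    matched_p, matched_s, rounds = set(), set(), []
    for level in geo_levels:
        p_left = primary[~primary[pid].isin(matched_p)]
        s_left = secondary[~secondary[sid].isin(matched_s)]
        pairs = random_merge(p_left, s_left, [level] + char_vars,
                             pid, sid, seed)
        matched_p.update(pairs[pid])
        matched_s.update(pairs[sid])
        rounds.append(pairs.assign(merge_level=level))
    return pd.concat(rounds, ignore_index=True)


# Hypothetical toy records: person home geography and job residence-side
# geography, plus one shared characteristic (sex).
persons = pd.DataFrame({"precid": ["p1", "p2", "p3"],
                        "Block": ["B1", "B2", "B3"],
                        "Tract": ["T1", "T1", "T2"],
                        "sex": [1, 1, 2]})
jobs = pd.DataFrame({"jobid": ["j1", "j2", "j3"],
                     "Block": ["B1", "B9", "B3"],
                     "Tract": ["T1", "T1", "T2"],
                     "sex": [1, 1, 1]})
pairs = hierarchical_random_merge(persons, jobs, ["Block", "Tract"], ["sex"],
                                  pid="precid", sid="jobid")
print(pairs)
```

In the toy example, `p1` is paired with `j1` at the Block level and `p2` with `j2` at the Tract level. `p3` stays unmatched because no remaining job shares its tract and sex. Each group here holds a single record, so the result does not depend on the seed. With larger groups, the seed controls which records are paired.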

## Description of Program
- program:    ICD_1cv1_LinkJob_HUI_PREC
- task:       Obtain and clean data for Community Intersection Data.
- See GitHub commits for a description of program updates
- Current Version:    2022-01-12
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, N. (2022)

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import os # For saving output to path

In [2]:
# Display versions being used - important information for replication
import sys
print("Python Version     ", sys.version)
print("numpy version:     ", np.__version__)
print("pandas version:    ", pd.__version__)

Python Version      3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:22:46) [MSC v.1916 64 bit (AMD64)]
numpy version:      1.22.2
pandas version:     1.4.1


In [3]:
# Store Program Name for output files to have the same name
programname = "ICD_1cv1_LinkJob_HUI_PREC"
# Output folder: because of long folder-name paths, output is saved to a folder with a shorter name.
# Files from this program are saved with the program name, which helps to follow the overall workflow.
outputfolder = "ICD_workflow_2022-01-19"
# Make the output directory if it does not already exist
os.makedirs(outputfolder, exist_ok=True)

In [4]:
os.getcwd()

'g:\\Shared drives\\HRRC_IN-CORE\\Tasks\\M5.2-01 Pop inventory\\WorkNPR'

### Version of housing unit inventory workflow

Versioning standard based on Semantic Versioning 2.0.0 (https://semver.org/).

Basic overview:
Given a version number MAJOR.MINOR.PATCH, increment the:
1. MAJOR version when you make incompatible API changes,
2. MINOR version when you add functionality in a backwards compatible manner, and
3. PATCH version when you make backwards compatible bug fixes.

API = application programming interface

How should I deal with revisions in the 0.y.z initial development phase?

The simplest thing to do is start your initial development release at 0.1.0 and then increment the minor version for each subsequent release.

I am releasing the current data as version 0.2.0, written as 0-2-0 in the file names. The version number indicates which program generated the data, and the leading 0 indicates that the software is not yet being used in production. I am using the number 2 to indicate that this is the “second” version; the files released earlier should be considered v0.1.0. The two versions are backwards compatible.
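The increment rules above, and the conversion from `0.2.0` to the file-name form `v0-2-0` used in this notebook, can be sketched as follows. The helper names `bump_version` and `version_to_text` are hypothetical; the notebook itself sets both strings by hand. Following the Semantic Versioning specification, incrementing one part resets the parts to its right to 0.

```python
def bump_version(version, part):
    """Increment one part of a MAJOR.MINOR.PATCH version string and reset
    the parts to its right to 0, following Semantic Versioning."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def version_to_text(version):
    """Convert a version such as '0.2.0' to the file-name form 'v0-2-0'."""
    return "v" + version.replace(".", "-")


version = bump_version("0.1.0", "minor")
version_text = version_to_text(version)
print(version, version_text)  # 0.2.0 v0-2-0
```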


In [5]:
version = '0.2.0'
version_text = 'v0-2-0'

### Set up notebook environment to access cloned GitHub package
This notebook uses packages that are in development. The packages are available at:

https://github.com/npr99/Labor_Market_Allocation

To replicate this notebook, clone the GitHub package to a folder that is a sibling of the folder containing this notebook.

To access the sibling package, you will need to append the parent directory ('..') to the system path list.

In [6]:
# To access a new package in a sibling folder, the system path list needs to include the parent folder (..).
# Append the path of the directory that contains the GitHub repository.
sys.path.append("..\\github_com\\npr99\\Population_Inventory")
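The path appended above uses Windows-style backslashes. The sketch below builds the same relative location with `pathlib`, so it also works on POSIX systems. It assumes the notebook runs from its own folder, which is the working directory printed earlier. The variable name `repo_path` is hypothetical.

```python
import sys
from pathlib import Path

# Same location as "..\\github_com\\npr99\\Population_Inventory", built
# relative to the parent of the current working directory.
repo_path = Path.cwd().parent / "github_com" / "npr99" / "Population_Inventory"

# Append only once, so re-running the cell does not grow sys.path.
if str(repo_path) not in sys.path:
    sys.path.append(str(repo_path))
```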

# Set up access to IN-CORE
https://incore.ncsa.illinois.edu/

In [7]:
#from pyincore import IncoreClient, Dataset, FragilityService, MappingSet, DataService
#from pyincore_viz.geoutil import GeoUtil as viz

### IN-CORE addons
This program uses code that is being developed as potential add-ons to pyincore. These functions are in a folder called pyincore_data_addons, which is located in the same directory as this notebook.
The add-on functions are organized to mirror the folder structure of https://github.com/IN-CORE/pyincore

Each add on function attempts to follow the structure of existing pyincore functions and includes some help information.

In [8]:
# Turn on autoreload so that edited submodules are reloaded automatically
%load_ext autoreload
%autoreload 2
# Import reusable functions from the add-on package
from pyincore_data_addons.SourceData.api_census_gov.tidy_censusapi import tidy_censusapi

from pyincore_data_addons.ICD01a_obtain_sourcedata import obtain_sourcedata
from pyincore_data_addons.ICD02a_clean import clean_comm_data_intrsctn
from pyincore_data_addons.ICD04c_linestring_map import wkt_line

from pyincore_data_addons.ICD02b_tidy import icd_tidy as icdtidy

## Read in Census Data for a County

In [9]:
#state_counties = {'48167' : 'Galveston County, TX'}
#state_counties = {'37155' : 'Robeson County, NC'}
#state_counties = {'26127' : 'Oceana County, MI'}
#state_counties = {'48041' : 'Brazos County, TX'}
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}},                   
                }

seed = 9876
basevintage = '2010'
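The `communities` dictionary is nested: each community lists one or more counties keyed by an integer. As a sketch, it can be flattened into a table with one row per community-county pair, splitting each five-digit county FIPS code into its two-digit state and three-digit county parts. The names `rows` and `county_table` are hypothetical and are not used later in the notebook.

```python
import pandas as pd

# Same structure as the `communities` dictionary defined above.
communities = {'Lumberton_NC': {
                   'community_name': 'Lumberton, NC',
                   'counties': {
                       1: {'FIPS Code': '37155', 'Name': 'Robeson County, NC'}}},
               }

# One row per community-county pair. A five-digit county FIPS code is the
# two-digit state code followed by the three-digit county code.
rows = []
for community_id, community_info in communities.items():
    for county_info in community_info['counties'].values():
        fips = county_info['FIPS Code']
        rows.append({'community': community_id,
                     'community_name': community_info['community_name'],
                     'county_name': county_info['Name'],
                     'state_fips': fips[:2],
                     'county_fips': fips[2:]})
county_table = pd.DataFrame(rows)
print(county_table)
```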

In [10]:
from pyincore_data_addons.ICD02f_intersect_lodes_estab import intersect_lodes_estab

# Establishment data are based on Data Axle data, which are not public
estab_data_folder_path = 'C:\\MyProjects\\HRRCProjects\\IN-CORE\\WorkNPR\\'
lodes_file_name = "joblist_v010_JT07_37155_2010_rs133234_prechui.csv"
posted_relative_path = '\\..\\Posted\\Labor_Market_Allocation_Output'
# Merge job records with establishment location data
jrec_estab_df = intersect_lodes_estab(communities=communities,
                                outputfolder = outputfolder,
                                seed = seed,
                                basevintage = basevintage,
                                estab_data_folder_path = estab_data_folder_path,
                                lodes_file_name = lodes_file_name,
                                posted_relative_path = posted_relative_path)


Intersect Community Data for: Lumberton, NC
Robeson County, NC : county FIPS Code 37155
File g:\Shared drives\HRRC_IN-CORE\Tasks\M5.2-01 Pop inventory\WorkNPR/ICD_workflow_2022-01-19/RobesonCounty_NC/01_SourceData/dataaxle_01a_obtain_37155_2010.csv Already exists - Skipping Obtain data-axel.
******************************
 Spatial Join Establishments with blocks
******************************
10 Establishments have missing block data
Double check to ensure that missing block data is due to
establishment geocode being just outside of county boundary.
******************************
 Expand Dataset
******************************
     Add Counter
     Generate unique ID
     Counter max length 4
******************************
 Read in LODES Dataset
******************************

***************************************
    Random merge between Job Records and Establishment.
***************************************

File ICD_workflow_2022-01-19/RobesonCounty_NC/04_RandomMerge/jrec_randomesta
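The log above reports an "Expand Dataset" step that adds a counter and generates a unique ID. The package code is not shown here, so the following is only a hypothetical sketch of how such a step can work. Each establishment row is repeated once per job slot and numbered with a counter. The counter is then zero-padded to the width of the largest counter and appended to the establishment ID. The `employees` column, the `jobslot_id` name, and the ID format are assumptions.

```python
import pandas as pd

# Hypothetical establishment records with an employee count per row.
estab = pd.DataFrame({'estabid': ['E001', 'E002'],
                      'employees': [3, 12]})

# Repeat each row once per employee (one row per job slot).
expanded = estab.loc[estab.index.repeat(estab['employees'])].reset_index(drop=True)

# Counter within each establishment: 1, 2, 3, ...
expanded['counter'] = expanded.groupby('estabid').cumcount() + 1

# Pad every counter to the width of the largest counter so IDs sort cleanly.
width = len(str(expanded['counter'].max()))
expanded['jobslot_id'] = (expanded['estabid'] + 'J'
                          + expanded['counter'].astype(str).str.zfill(width))
print(width, expanded['jobslot_id'].tolist()[:4])
```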

In [11]:
jrec_estab_df['primary'].head(1).T

|  | 0 |
|---|---|
| jobid | WB371559601021075HB371419206021018jidodJT07212... |
| County2010 | 37155 |
| Tract2010 | 37155960102 |
| BlockGroup2010 | 371559601021 |
| Block2010 | 371559601021075 |
| ... | ... |
| Longitude | -78.98466 |
| NAICS2D | 44 |
| NAICS4D | 4453 |
| SIName |  |


In [12]:
from pyincore_data_addons.ICD02c_intersect_prechui_nces import intersect_prechui_nces



# Merge person records with National Center for Education Statistics Data
prechui_srec_df = intersect_prechui_nces(communities=communities,
                                outputfolder = outputfolder,
                                seed = seed,
                                basevintage = basevintage)

Intersect Community Data for: Lumberton, NC
Robeson County, NC : county FIPS Code 37155
    Obtaining Housing Unit Inventory and Person Records for Lumberton, NC
     Robeson County, NC : county FIPS Code 37155
File ICD_workflow_2022-01-19/RobesonCounty_NC/02_TidySourceData/nces_tidy_ccd_37155_09.csv Already exists - Skipping Tidy NCES.

***************************************
    Random merge between Person Records and Student records.
***************************************

Round 1

***************************************
***************************************

Performing random merge at geography level: ncessch_1

***************************************
***************************************


***************************************
***************************************

Attempt to merge students on all common group vars.

***************************************
***************************************

Running random merge by ['ncessch_1', 'gradelevel1', 'sex', 'racecat5']

***

In [13]:
prechui_srec_df['primary'].head(1).T

|  | 0 |
|---|---|
| precid | B371559601011002P0002 |
| huid | B371559601011002H001 |
| pernum | 1 |
| Block2010str | B371559601011002 |
| Block2010 | 371559601011002 |
| Tract2010 | 37155960101 |
| sex | 1 |
| race | 2.0 |
| hispan | 0.0 |
| randageP12 | 62.0 |


In [14]:
from pyincore_data_addons.ICD03a_results_table import pop_results_table as viz
viz.pop_results_table(prechui_srec_df['primary'],
                who = "Total Population by Persons", 
                what = "by Gradelevel",
                where = "Robeson County, NC",
                when = "2010",
                row_index = 'gradelevel',
                col_index = 'CommunityFocus')
                #row_percent = "1 Family Household")

['Inside Community', 'Outside Community', 'Total Population by Persons']


| gradelevel | Inside Community (%) | Outside Community (%) | Total Population by Persons (%) |
|---|---|---|---|
| G01 | 274 (7.3%) | 1,528 (7.9%) | 1,802 (7.8%) |
| G02 | 281 (7.5%) | 1,485 (7.7%) | 1,766 (7.6%) |
| G03 | 307 (8.2%) | 1,544 (8.0%) | 1,851 (8.0%) |
| G04 | 310 (8.2%) | 1,627 (8.4%) | 1,937 (8.4%) |
| G05 | 301 (8.0%) | 1,543 (8.0%) | 1,844 (8.0%) |
| G06 | 289 (7.7%) | 1,543 (8.0%) | 1,832 (7.9%) |
| G07 | 311 (8.3%) | 1,433 (7.4%) | 1,744 (7.5%) |
| G08 | 287 (7.6%) | 1,351 (7.0%) | 1,638 (7.1%) |
| G09 | 294 (7.8%) | 1,767 (9.1%) | 2,061 (8.9%) |
| G10 | 307 (8.2%) | 1,496 (7.7%) | 1,803 (7.8%) |
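`pop_results_table` comes from the add-on package and its internals are not shown here. A similar "count (percent)" layout, with each percentage computed within its column and a total column added, can be sketched with `pandas.crosstab`. The helper name `count_pct_table` and the toy data are hypothetical.

```python
import pandas as pd


def count_pct_table(df, row, col, total_label='Total'):
    """Cross-tabulate `row` by `col` and format each cell as
    'count (column percent%)', adding a total column."""
    counts = pd.crosstab(df[row], df[col])
    counts[total_label] = counts.sum(axis=1)
    # Percent of each column's total.
    pcts = counts / counts.sum(axis=0) * 100
    fmt_counts = counts.apply(lambda s: s.map('{:,}'.format))
    fmt_pcts = pcts.apply(lambda s: s.map('{:.1f}'.format))
    return fmt_counts + ' (' + fmt_pcts + '%)'


people = pd.DataFrame({
    'gradelevel': ['G01', 'G01', 'G02', 'G02', 'G02', 'G03'],
    'CommunityFocus': ['Inside Community', 'Outside Community',
                       'Inside Community', 'Outside Community',
                       'Outside Community', 'Outside Community']})
table = count_pct_table(people, 'gradelevel', 'CommunityFocus')
print(table)
```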


In [15]:
from pyincore_data_addons.ICD02d_intersect_prechui_lodes import intersect_prechui_lodes

seed = 9876
basevintage = '2010'
lodes_file_name = "joblist_v010_JT07_37155_2010_rs133234_prechui.csv"
posted_relative_path = '\\..\\Posted\\Labor_Market_Allocation_Output'

# Merge person records with LODES job records
prechui_lodes_df = intersect_prechui_lodes(communities=communities,
                                outputfolder = outputfolder,
                                seed = seed,
                                basevintage = basevintage,
                                lodes_file_name = lodes_file_name,
                                posted_relative_path = posted_relative_path)


Intersect Community Data for: Lumberton, NC
Robeson County, NC : county FIPS Code 37155
    Obtaining Housing Unit Inventory and Person Records for Lumberton, NC
     Robeson County, NC : county FIPS Code 37155
******************************
 Read in LODES Dataset
******************************

***************************************
    Try to clean geometry lat lon.
***************************************

******************************
 Split LODES Dataset
 To append workers that live outside of county to Prechui
******************************


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  super().__setitem__(key, value)


******************************
 Prepare LODES Dataset for Merge.
 Merge workers with persons living in county.
******************************

***************************************
    Random merge between Person Records and Jobs.
***************************************

Round 1

***************************************
***************************************

Performing random merge at geography level: Block

***************************************
***************************************


***************************************
***************************************

Attempt to merge householder                         on all common group vars.

***************************************
***************************************

Running random merge by ['Block2010', 'agegroupLODES', 'sex', 'race', 'hispan', 'earngingsgroupLODES1']

***************************************
    Setting up  primary data with primary key and flags
***************************************

Longest Tract2010 :
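The `SettingWithCopyWarning` messages in the log above come from code inside the add-on package, which is not shown here. The usual remedy, described in the pandas documentation linked in the warning, is to take an explicit `.copy()` of a filtered subset before adding columns, or to assign through `.loc` on the original frame. A minimal sketch with hypothetical data and a hypothetical column name `lives_outside_state`:

```python
import pandas as pd

jobs = pd.DataFrame({'jobid': ['j1', 'j2', 'j3'],
                     'h_stabbr': ['nc', 'sc', 'nc']})

# Option 1: filtering can return an object that pandas treats as a slice of
# `jobs`; an explicit .copy() makes the subset independent, so adding a
# column to it does not trigger SettingWithCopyWarning.
outside = jobs[jobs['h_stabbr'] != 'nc'].copy()
outside['lives_outside_state'] = True

# Option 2: assign on the original frame with .loc and a boolean mask.
# Rows that do not match the mask get NaN in the new column.
jobs.loc[jobs['h_stabbr'] != 'nc', 'lives_outside_state'] = True
print(outside)
print(jobs)
```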

In [16]:
prechui_lodes_df.head()

|  | precid | County2010 | Tract2010 | BlockGroup2010 | Block2010 | huid | pernum | Block2010str | sex | race | ... | IndustryCode | jobtype | Earnings | Education | h_stabbr | od_distance | w_geocode_str | h_geometry | hcb_lat | hcb_lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B371559601011002P0002 | 37155 | 37155960101 | 371559601011 | 371559601011002 | B371559601011002H001 | 1.0 | B371559601011002 | 1 | 2.0 | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | B371559601011002P0001 | 37155 | 37155960101 | 371559601011 | 371559601011002 | B371559601011002H001 | 2.0 | B371559601011002 | 2 | 1.0 | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | B371559601011003P0002 | 37155 | 37155960101 | 371559601011 | 371559601011003 | B371559601011003H001 | 1.0 | B371559601011003 | 1 | 1.0 | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | B371559601011003P0003 | 37155 | 37155960101 | 371559601011 | 371559601011003 | B371559601011003H002 | 1.0 | B371559601011003 | 2 | 1.0 | ... | 15.0 | JT07 | 3.0 | 2.0 | nc | 30.027827 |  |  |  |  |
| 4 | B371559601011003P0001 | 37155 | 37155960101 | 371559601011 | 371559601011003 | B371559601011003H002 | 2.0 | B371559601011003 | 1 | 1.0 | ... |  |  |  |  |  |  |  |  |  |  |


In [17]:
viz.pop_results_table(prechui_lodes_df,
                who = "Total Jobs for persons working in county", 
                what = "by Job Type and NAICS Code",
                where = "Robeson County, NC",
                when = "2010",
                row_index = 'NAICS Industry Sector',
                col_index = 'Job Type')

['Public Sector Primary Jobs', 'Total Jobs for persons working in county']


| NAICS Industry Sector | Public Sector Primary Jobs (%) | Total Jobs for persons working in county (%) |
|---|---|---|
| 44-45 Retail Trade | 44 (0.6%) | 44 (0.6%) |
| 48-49 Transportation and Warehousing | 4 (0.1%) | 4 (0.1%) |
| 51 Information | 24 (0.3%) | 24 (0.3%) |
| 56 Administration & Support, Waste Management and Remediation | 1 (0.0%) | 1 (0.0%) |
| 61 Educational Services | 5,415 (70.5%) | 5,415 (70.5%) |
| 62 Health Care and Social Assistance | 98 (1.3%) | 98 (1.3%) |
| 92 Public Administration | 2,099 (27.3%) | 2,099 (27.3%) |
| Total | 7,685 (100.0%) | 7,685 (100.0%) |


## Merge Establishment location with LODES data

In [18]:
estab_keep_vars = ['jobid','estabid','Longitude','Latitude','NAICS2D','NAICS4D','SIName']
prechui_lodes_estab_df = pd.merge(
        left = prechui_lodes_df,
        right = jrec_estab_df['primary'][estab_keep_vars],
        on = 'jobid',
        how = 'left'
)
lodes_keep_vars = ['agegroupLODES','Earnings','Education',
        'jobtype','IndustryCode', 'Block2010str',
        'w_geocode_str','h_stabbr','od_distance','w_geometry','h_geometry',
        'hcb_lat','hcb_lon','wcb_lat','wcb_lon','race','sex','hispan']
keep_vars =  ['precid'] + estab_keep_vars + lodes_keep_vars
prechui_lodes_estab_df_keep_vars = prechui_lodes_estab_df[keep_vars]
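The left merge above assumes that each `jobid` appears at most once in the establishment table. Persons without a job carry the placeholder `jobid` `-999`, as the output below shows, and find no establishment. A sketch of how to check these assumptions with the `validate` and `indicator` arguments of `pandas.merge`, using hypothetical toy data:

```python
import pandas as pd

persons = pd.DataFrame({'precid': ['P1', 'P2', 'P3'],
                        'jobid': ['J1', '-999', 'J2']})
estabs = pd.DataFrame({'jobid': ['J1', 'J2'],
                       'estabid': ['E10', 'E20']})

# validate='many_to_one' raises pandas.errors.MergeError if any jobid is
# duplicated in the right-hand table; indicator=True adds a _merge column
# that records whether each row found a match.
merged = pd.merge(persons, estabs, on='jobid', how='left',
                  validate='many_to_one', indicator=True)
print(merged['_merge'].value_counts())
```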

In [19]:
prechui_lodes_estab_df_keep_vars.head(1).T

|  | 0 |
|---|---|
| precid | B371559601011002P0002 |
| jobid | -999 |
| estabid |  |
| Longitude |  |
| Latitude |  |
| NAICS2D |  |
| NAICS4D |  |
| SIName |  |
| agegroupLODES | 3.0 |
| Earnings |  |


In [20]:
# Merge the LODES establishment data with the student record data.
# Merge on person ID, race, sex, and hispan so that these values are kept for the new persons added from the job list.
# The job list adds people who live outside of Robeson County.
prechui_srec_lodes_estab_df = pd.merge(
        left = prechui_srec_df['primary'],
        right = prechui_lodes_estab_df_keep_vars,
        on = ['precid','race','sex','hispan'],
        how = 'outer'
)

prechui_srec_lodes_estab_df.head(1).T

|  | 0 |
|---|---|
| precid | B371559601011002P0002 |
| huid | B371559601011002H001 |
| pernum | 1.0 |
| Block2010str_x | B371559601011002 |
| Block2010 | 371559601011002 |
| ... | ... |
| h_geometry |  |
| hcb_lat_y |  |
| hcb_lon_y |  |
| wcb_lat |  |
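Because this is an outer merge, a row can come from the student-record side only, from the job side only (for example, workers added from the job list who live outside Robeson County), or from both. As a sketch with hypothetical toy data, `indicator=True` labels the source of each row so the three groups can be counted:

```python
import pandas as pd

students = pd.DataFrame({'precid': ['P1', 'P2'], 'sex': [1, 2],
                         'gradelevel': ['G01', None]})
jobs = pd.DataFrame({'precid': ['P2', 'P9'], 'sex': [2, 1],
                     'jobid': ['J1', 'J2']})

# _merge is 'left_only', 'right_only', or 'both' for each row.
merged = pd.merge(students, jobs, on=['precid', 'sex'], how='outer',
                  indicator=True)
print(merged['_merge'].value_counts())
```

Merging on the shared characteristics as well as the ID keeps a single copy of those columns. If a `precid` appeared on both sides with different `sex` values, however, the outer merge would keep it as two separate rows rather than one.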


In [21]:
list(prechui_srec_lodes_estab_df.columns)

['precid',
 'huid',
 'pernum',
 'Block2010str_x',
 'Block2010',
 'Tract2010',
 'sex',
 'race',
 'hispan',
 'randageP12',
 'agegroupP12',
 'randagePCT12',
 'agegroupP43',
 'agegroupH18',
 'agegroupH17',
 'child',
 'gqtype',
 'numprec',
 'family',
 'PLCGEOID10',
 'PLCNAME10',
 'PUMGEOID10',
 'rppnt4269',
 'gradelevel1',
 'gradelevel2',
 'gradelevel3',
 'racecat5',
 'BLOCKID10',
 'ncessch_3',
 'high_schnm',
 'ncessch_2',
 'mid_schnm',
 'ncessch_1',
 'primary_schnm',
 'ncessch_5',
 'overalsab_schnm',
 'ncessch_6',
 'charter_schnm',
 'CommunityFocus',
 'NCESSCH',
 'NCESSCH_flagsetrm',
 'NCESSCH_ncessch_1_flagsetrm',
 'NCESSCH_ncessch_2_flagsetrm',
 'NCESSCH_ncessch_3_flagsetrm',
 'NCESSCH_ncessch_5_flagsetrm',
 'NCESSCH_ncessch_6_flagsetrm',
 'SCHNAM09',
 'gradelevel',
 'LATCOD09',
 'LONCOD09',
 'geometry',
 'geocode_type',
 'hcb_lon_x',
 'hcb_lat_x',
 'Race Ethnicity',
 'Family Type',
 'jobid',
 'estabid',
 'Longitude',
 'Latitude',
 'NAICS2D',
 'NAICS4D',
 'SIName',
 'agegroupLODES',
 'Ea

## Keep Primary Columns After Merge


In [22]:
def update_cols_after_merge(input_df, column_name):
    """
    After a merge, combine the duplicated columns column_name_x and
    column_name_y into a single column_name. Values from the _y column are
    used where present; missing values are filled from the _x column.
    """
    if column_name+'_y' in list(input_df.columns):
        input_df[column_name] = \
            input_df[column_name+'_y'].fillna(input_df[column_name+'_x'])
        # Drop the duplicated suffixed columns
        input_df = input_df.drop([column_name+'_x', column_name+'_y'], axis=1)

    return input_df

prechui_srec_lodes_estab_df = update_cols_after_merge(prechui_srec_lodes_estab_df,
                                column_name = 'Block2010str')
                                
prechui_srec_lodes_estab_df = update_cols_after_merge(prechui_srec_lodes_estab_df,
                                column_name = 'hcb_lat')
prechui_srec_lodes_estab_df = update_cols_after_merge(prechui_srec_lodes_estab_df,
                                column_name = 'hcb_lon')
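`update_cols_after_merge` is called once per column name. As a sketch, the same rule (prefer the `_y` value, fall back to `_x`) can be applied to every suffixed pair at once. The function name `resolve_merge_suffixes` and the toy data are hypothetical.

```python
import pandas as pd


def resolve_merge_suffixes(df, left='_x', right='_y'):
    """Collapse every pair of columns c_x / c_y left by a merge into a
    single column c, preferring the _y value and falling back to _x."""
    out = df.copy()
    for col in list(out.columns):
        if isinstance(col, str) and col.endswith(left):
            base = col[:-len(left)]
            if base + right in out.columns:
                out[base] = out[base + right].fillna(out[base + left])
                out = out.drop(columns=[base + left, base + right])
    return out


merged = pd.DataFrame({'precid': ['P1', 'P2'],
                       'hcb_lat_x': [34.6, 34.7],
                       'hcb_lat_y': [None, 34.9],
                       'sex': [1, 2]})
clean = resolve_merge_suffixes(merged)
print(clean)
```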
            

In [23]:
person_recordid_keep_vars =  ['precid','huid','pernum','Block2010str']
person_record_char_keep_vars = ['sex','race','hispan','randagePCT12',
                    'child', 'gqtype', 'numprec', 'family']
student_record_char_keep_vars = ['gradelevel']
student_record_id_keep_vars = ['NCESSCH','SCHNAM09','LATCOD09','LONCOD09']
geo_keep_vars = ['PLCGEOID10','PLCNAME10','CommunityFocus',
                 'PUMGEOID10','rppnt4269']
estab_geo_keep_vars = ['Longitude','Latitude']
job_char_keep_vars = ['agegroupLODES','Earnings','Education',
                        'jobtype','IndustryCode',
                        'NAICS2D','NAICS4D','estabid','SIName']
lodes_geo_keep_vars =  ['h_stabbr','od_distance','h_geometry',
                        'hcb_lat','hcb_lon','wcb_lat','wcb_lon']
keep_vars = person_recordid_keep_vars + geo_keep_vars + \
    person_record_char_keep_vars + \
    student_record_char_keep_vars + job_char_keep_vars + \
    student_record_id_keep_vars + \
    estab_geo_keep_vars + lodes_geo_keep_vars
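If a column name appears in more than one of these lists, selecting `keep_vars` produces duplicate columns in `icd_df` and in the saved CSV. A small guard, sketched here with a hypothetical helper name, removes repeated names while preserving their order, because dictionary keys keep insertion order in Python 3.7 and later.

```python
def unique_in_order(names):
    """Return `names` with repeats removed, keeping first occurrences in order."""
    return list(dict.fromkeys(names))


print(unique_in_order(['h_stabbr', 'od_distance', 'h_geometry', 'od_distance']))
# ['h_stabbr', 'od_distance', 'h_geometry']
```

In the notebook, this would be applied as `keep_vars = unique_in_order(keep_vars)` before selecting the columns.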



In [24]:
# .copy() makes icd_df independent of the merged DataFrame before new columns are added
icd_df = prechui_srec_lodes_estab_df[keep_vars].copy()
icd_df.head(1).T

|  | 0 |
|---|---|
| precid | B371559601011002P0002 |
| huid | B371559601011002H001 |
| pernum | 1.0 |
| Block2010str | B371559601011002 |
| PLCGEOID10 |  |
| PLCNAME10 | Unincorporated |
| CommunityFocus | Outside Community |
| PUMGEOID10 | 3705100.0 |
| rppnt4269 | POINT (-78.94751707644164 34.9055645) |
| sex | 1 |


## Rename Geography Columns

In [25]:
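# Prefix coordinates by source: wet = establishment (workplace) location, ncs = NCES school location
icd_df = icd_df.rename(columns={'Longitude':'wet_lon',
                                'Latitude' :'wet_lat',
                                'LONCOD09' :'ncs_lon',
                                'LATCOD09' :'ncs_lat'})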

In [26]:
icd_df.head(1).T

|  | 0 |
|---|---|
| precid | B371559601011002P0002 |
| huid | B371559601011002H001 |
| pernum | 1.0 |
| Block2010str | B371559601011002 |
| PLCGEOID10 |  |
| PLCNAME10 | Unincorporated |
| CommunityFocus | Outside Community |
| PUMGEOID10 | 3705100.0 |
| rppnt4269 | POINT (-78.94751707644164 34.9055645) |
| sex | 1 |


## Add Line Strings for Origins and Destinations

In [27]:
icd_df = wkt_line(icd_df, x1='hcb_lon', y1='hcb_lat', x2='ncs_lon', y2='ncs_lat',
                    linestring_var = 'hcb_ncs_line')
icd_df = wkt_line(icd_df, x1='hcb_lon', y1='hcb_lat', x2='wet_lon', y2='wet_lat',
                    linestring_var = 'hcb_wet_line')

Number of rows in selected dataframe:  23122
Number of rows in not selected dataframe:  112840
return pandas dataframe with WKT Linestring.
Linestring can be converted to geometry variable.
Number of rows in selected dataframe:  7685
Number of rows in not selected dataframe:  128277
return pandas dataframe with WKT Linestring.
Linestring can be converted to geometry variable.
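`wkt_line` is part of the add-on package and its code is not shown here. Judging from its log messages, it builds a WKT `LINESTRING` for the rows that have both endpoints. A minimal sketch of that idea with pandas string operations follows. The function name `wkt_line_sketch` is hypothetical, and unlike the log above it does not split and recombine the DataFrame. WKT lists each point as `x y`, that is, longitude then latitude.

```python
import pandas as pd


def wkt_line_sketch(df, x1, y1, x2, y2, linestring_var):
    """Add a WKT LINESTRING from (x1, y1) to (x2, y2) for rows where all four
    coordinates are present; other rows get None."""
    out = df.copy()
    has_coords = out[[x1, y1, x2, y2]].notna().all(axis=1)
    sub = out.loc[has_coords]
    out[linestring_var] = None
    out.loc[has_coords, linestring_var] = (
        'LINESTRING (' + sub[x1].astype(str) + ' ' + sub[y1].astype(str) + ', '
        + sub[x2].astype(str) + ' ' + sub[y2].astype(str) + ')')
    return out


homes_schools = pd.DataFrame({'hcb_lon': [-78.95, -79.07],
                              'hcb_lat': [34.91, 34.86],
                              'ncs_lon': [-79.01, None],
                              'ncs_lat': [34.62, None]})
lines = wkt_line_sketch(homes_schools, 'hcb_lon', 'hcb_lat',
                        'ncs_lon', 'ncs_lat', 'hcb_ncs_line')
print(lines['hcb_ncs_line'].tolist())
```

The strings can then be turned into geometries, for example with `shapely.wkt.loads` or `geopandas.GeoSeries.from_wkt`.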


In [28]:
icd_df.head(1).T

|  | 12050 |
|---|---|
| precid | B371559602012079P0063 |
| huid | B371559602012079H033 |
| pernum | 5.0 |
| Block2010str | B371559602012079 |
| PLCGEOID10 |  |
| PLCNAME10 | Unincorporated |
| CommunityFocus | Outside Community |
| PUMGEOID10 | 3705100.0 |
| rppnt4269 | POINT (-79.07091010522959 34.8609445) |
| sex | 2 |


In [29]:
from pyincore_data_addons.ICD00b_directory_design import directory_design

# Set up the directory design (the next cell reuses `community` and `outputfolders` from the last loop iteration)
for community in communities.keys():
    print("Intersect Community Data for:",\
        communities[community]['community_name'])
    for county in communities[community]['counties'].keys():
        state_county = communities[community]['counties'][county]['FIPS Code']
        state_county_name  = communities[community]['counties'][county]['Name']
        print(state_county_name,': county FIPS Code',state_county)
    
        outputfolders = directory_design(state_county_name = state_county_name,
                                            outputfolder = outputfolder)

Intersect Community Data for: Lumberton, NC
Robeson County, NC : county FIPS Code 37155


In [30]:
output_filename = f'prec_{version_text}_{community}_{basevintage}_rs{seed}'
output_filepath = os.path.join(outputfolders['top'], output_filename)
print('Saving to:',output_filepath)
icd_df.to_csv(output_filepath+'.csv', index=False)

Saving to: ICD_workflow_2022-01-19/RobesonCounty_NC\prec_v0-2-0_Lumberton_NC_2010_rs9876


## Explore results

In [31]:
viz.pop_results_table(icd_df,
                who = "Total Population by Persons", 
                what = "by Gradelevel",
                where = "Robeson County, NC",
                when = "2010",
                row_index = 'gradelevel',
                col_index = 'CommunityFocus')

['Inside Community', 'Outside Community', 'Total Population by Persons']


| gradelevel | Inside Community (%) | Outside Community (%) | Total Population by Persons (%) |
|---|---|---|---|
| G01 | 274 (7.3%) | 1,528 (7.9%) | 1,802 (7.8%) |
| G02 | 281 (7.5%) | 1,485 (7.7%) | 1,766 (7.6%) |
| G03 | 307 (8.2%) | 1,544 (8.0%) | 1,851 (8.0%) |
| G04 | 310 (8.2%) | 1,627 (8.4%) | 1,937 (8.4%) |
| G05 | 301 (8.0%) | 1,543 (8.0%) | 1,844 (8.0%) |
| G06 | 289 (7.7%) | 1,543 (8.0%) | 1,832 (7.9%) |
| G07 | 311 (8.3%) | 1,433 (7.4%) | 1,744 (7.5%) |
| G08 | 287 (7.6%) | 1,351 (7.0%) | 1,638 (7.1%) |
| G09 | 294 (7.8%) | 1,767 (9.1%) | 2,061 (8.9%) |
| G10 | 307 (8.2%) | 1,496 (7.7%) | 1,803 (7.8%) |


In [32]:
icd_df["LiveLumbertonCity"] = "LiveOutsideRobesonCounty"
# Block IDs starting with B37155 are in Robeson County, NC; na=False treats missing IDs as outside the county
icd_df.loc[icd_df['Block2010str'].str.startswith('B37155', na=False),
    "LiveLumbertonCity"] = "LiveOutsideLumbertonCity"
icd_df.loc[icd_df['PLCNAME10']=='Lumberton',"LiveLumbertonCity"] = "LiveInsideLumbertonCity"
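The three assignments above run in order, so a later condition overrides an earlier one. The same classification can be sketched in a single step with `numpy.select`. There, the first matching condition wins, so the most specific condition (living in Lumberton) is listed first. The toy rows are hypothetical, and the second block ID is made up.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    'Block2010str': ['B371559601011002', 'B371559613001001', None],
    'PLCNAME10': ['Unincorporated', 'Lumberton', None]})

# np.select needs plain boolean arrays; missing values count as False.
conditions = [
    people['PLCNAME10'].eq('Lumberton'),
    people['Block2010str'].str.startswith('B37155', na=False).astype(bool),
]
choices = ['LiveInsideLumbertonCity', 'LiveOutsideLumbertonCity']
people['LiveLumbertonCity'] = np.select(conditions, choices,
                                        default='LiveOutsideRobesonCounty')
print(people['LiveLumbertonCity'].tolist())
```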

In [33]:
viz.pop_results_table(icd_df,
                who = "Total Jobs", 
                what = "by Job Type and NAICS Code",
                where = "Robeson County, NC",
                when = "2010",
                row_index = 'NAICS Industry Sector',
                col_index = 'LiveLumbertonCity',
                row_percent = 'LiveInsideLumbertonCity')

['LiveInsideLumbertonCity', 'LiveOutsideLumbertonCity', 'LiveOutsideRobesonCounty', 'Total Jobs']


| NAICS Industry Sector | LiveInsideLumbertonCity (%) | LiveOutsideLumbertonCity (%) | LiveOutsideRobesonCounty (%) | Total Jobs (%) | Percent Row LiveInsideLumbertonCity |
|---|---|---|---|---|---|
| 44-45 Retail Trade | 7 (0.6%) | 28 (0.6%) | 9 (0.5%) | 44 (0.6%) | 15.9% |
| 48-49 Transportation and Warehousing | 2 (0.2%) | 2 (0.0%) | nan (nan%) | 4 (0.1%) | 50.0% |
| 51 Information | 7 (0.6%) | 11 (0.2%) | 6 (0.3%) | 24 (0.3%) | 29.2% |
| 56 Administration & Support, Waste Management and Remediation | nan (nan%) | 1 (0.0%) | nan (nan%) | 1 (0.0%) | nan% |
| 61 Educational Services | 786 (63.3%) | 3,216 (69.2%) | 1,413 (78.8%) | 5,415 (70.5%) | 14.5% |
| 62 Health Care and Social Assistance | 29 (2.3%) | 28 (0.6%) | 41 (2.3%) | 98 (1.3%) | 29.6% |
| 92 Public Administration | 411 (33.1%) | 1,363 (29.3%) | 325 (18.1%) | 2,099 (27.3%) | 19.6% |
| Total | 1,242 (100.0%) | 4,649 (100.0%) | 1,794 (100.0%) | 7,685 (100.0%) | 16.2% |
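The last column of the table is the share of each sector's total jobs held by people who live inside Lumberton city. As a check, recomputing it from the counts in the table reproduces the reported values. Sector 56 is left out because its inside-Lumberton count is missing.

```python
# Counts copied from the table above: (inside Lumberton city, total jobs).
counts = {
    '44-45 Retail Trade': (7, 44),
    '48-49 Transportation and Warehousing': (2, 4),
    '51 Information': (7, 24),
    '61 Educational Services': (786, 5415),
    '62 Health Care and Social Assistance': (29, 98),
    '92 Public Administration': (411, 2099),
    'Total': (1242, 7685),
}
# Row percent = inside / total * 100, rounded to one decimal place.
row_percent = {sector: round(100 * inside / total, 1)
               for sector, (inside, total) in counts.items()}
print(row_percent)
```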


## Add GUID with Address point Place
To determine whether a job is inside Lumberton, we need to look at the location of the building. The result will not match the numbers provided by the Census Bureau's Center for Economic Studies (CES) OnTheMap tool. The difference is due to the bridge between the Data Axle establishment locations and the locations where LODES data are geocoded.