# Pandas practice

## Tasks:

1. Find the unique state FIPS codes in the data dictionary (the first two digits of FIPS).
    a. The result should look something like [33, ?].
2. Remove the `_long` tables from the dictionary.
3. Split the data dictionary so that:
    a. one part has only county-level data
    b. one part has tract-level data
    c. one part has all other data (block level, etc., i.e. tables without a FIPS code or with a code that is not 10 digits)
4. No index, just a row number (reset_index()).
5. FIPS has to be an integer, not a string.

FIPS format:

- first two digits: state
- next three digits: county
- last five digits: tract (determined by population)
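Given that layout, pulling the parts out of a FIPS code is a matter of string slicing. The helper below, `split_fips`, is hypothetical (it is not part of the original notebook) and assumes the layout listed above: two state digits, three county digits, then whatever digits remain for the tract. It works on strings, because converting a code to an integer would drop any leading zero.

```python
def split_fips(code):
    """Split a FIPS code into (state, county, tract) strings.

    Hypothetical helper: assumes 2 state digits, then 3 county digits,
    then the tract digits (empty for a county-level code).
    """
    code = str(code)  # note: str() of an int has already lost any leading zero
    return code[:2], code[2:5], code[5:]


print(split_fips('33001'))       # a county-level code: tract part is empty
print(split_fips('3300196510'))  # a hypothetical 10-digit tract code
```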

Import the necessary libraries and load the pickled dataset.

```python
import pandas as pd
import numpy as np

# the pickle holds a dictionary of DataFrames keyed by table name
pickle_data = pd.read_pickle('C:/Users/zella/Desktop/Personal/Education/University of Kentucky/spatial_mapping/Spatial_Mapping_Training/dataset.pickle')
```
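Because the pickle holds a dictionary rather than a single DataFrame, a quick way to see what it contains is to map each table name to its shape. The sketch below is an assumption-laden stand-in: it round-trips a hypothetical two-table toy dictionary through a temporary file with `pd.to_pickle` / `pd.read_pickle`, so it runs without the real dataset.

```python
import os
import tempfile

import pandas as pd

# hypothetical toy tables standing in for the real dataset
toy = {
    'rf_and_screening_county': pd.DataFrame({'FIPS': ['33001', '33003'],
                                             'BMI_Obese': [0.308, 0.294]}),
    'broadband_speeds': pd.DataFrame({'speed': [25, 100]}),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.pickle')
    pd.to_pickle(toy, path)        # write the whole dict to disk
    data = pd.read_pickle(path)    # read it back: still a dict of DataFrames

# table name -> (rows, columns)
shapes = {name: df.shape for name, df in data.items()}
print(shapes)
```

On the real data, the same dictionary comprehension applied to `pickle_data` gives a one-line overview of every table.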

### 1. Find unique state FIPS codes in the data dictionary

The state code is the first two digits of FIPS. The result should look something like [33, ?].

```python
# FIPS is part of the index; reset_index() turns it into a regular column
pickle_no_index = pickle_data['rf_and_screening_county'].reset_index()  # a DataFrame

# isolate the FIPS column (a Series of strings)
fips = pickle_no_index['FIPS']

# the first two characters are the state code; np.unique returns them sorted
# (the `if i[:2]` guard skips empty strings)
state_codes = np.unique([int(i[:2]) for i in fips if i[:2]]).tolist()

state_codes
```

```
[33, 50]
```
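The cell above only looks at one table, `rf_and_screening_county`. To check the whole data dictionary, the same idea can be applied to every table that has a FIPS column. This is a minimal sketch: the function name `unique_state_codes` and the toy tables are hypothetical, and it assumes FIPS is still stored as strings (as it is before step 5), so the first two characters are the state code.

```python
import pandas as pd


def unique_state_codes(tables):
    """Return the sorted unique state codes across all tables with a FIPS column.

    Assumes FIPS values are strings, so the first two characters are the state.
    """
    codes = set()
    for df in tables.values():
        flat = df.reset_index()  # FIPS may live in the index, as in the county table
        if 'FIPS' not in flat.columns:
            continue  # tables without FIPS contribute nothing
        prefixes = flat['FIPS'].dropna().astype(str).str[:2]
        codes.update(int(p) for p in prefixes if p)
    return sorted(codes)


# hypothetical toy dictionary, not the real dataset
toy_tables = {
    'economy_county': pd.DataFrame({'v': [1, 2]},
                                   index=pd.Index(['33001', '50001'], name='FIPS')),
    'economy_tract': pd.DataFrame({'FIPS': ['3300196510'], 'v': [3]}),
    'broadband_speeds': pd.DataFrame({'speed': [25]}),
}

state_codes_all = unique_state_codes(toy_tables)
print(state_codes_all)
```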

### 2. Remove the `_long` tables from the dictionary

```python
# keep every table whose name does not contain "_long"
pickle_no_long = {k: v for k, v in pickle_data.items() if '_long' not in k}

pickle_no_long.keys()
```

```
dict_keys(['rf_and_screening_county', 'rf_and_screening_tract', 'cancer_incidence', 'cancer_mortality', 'economy_county', 'economy_tract', 'ht_county', 'ht_tract', 'sociodemographics_county', 'sociodemographics_tract', 'environment_county', 'environment_tract', 'broadband_speeds', 'facilities_and_providers'])
```

### 3. Split the data dictionary

Create one dictionary with only county-level data, one with tract-level data, and one with all other data (block level, etc., i.e. tables without a FIPS code or with a code that is not 10 digits).

```python
# show the remaining table names
print(pickle_no_long.keys())

# county-level tables (cancer_incidence and cancer_mortality are also county-level)
county_extra = {'cancer_incidence', 'cancer_mortality'}
counties = {k: v for k, v in pickle_no_long.items() if '_county' in k or k in county_extra}

# tract-level tables
tracts = {k: v for k, v in pickle_no_long.items() if '_tract' in k}

# everything else (here: broadband_speeds and facilities_and_providers)
other = {k: v for k, v in pickle_no_long.items() if k not in counties and k not in tracts}
```

```
dict_keys(['rf_and_screening_county', 'rf_and_screening_tract', 'cancer_incidence', 'cancer_mortality', 'economy_county', 'economy_tract', 'ht_county', 'ht_tract', 'sociodemographics_county', 'sociodemographics_tract', 'environment_county', 'environment_tract', 'broadband_speeds', 'facilities_and_providers'])
```
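The split above relies on naming conventions and hard-codes the two cancer tables. An alternative, closer to the wording of task 3c, is to classify each table by the length of its FIPS codes. This is a sketch under assumptions: `classify_table`, `split_tables`, and the toy tables are hypothetical; the default lengths (5 for counties, 10 for tracts) follow the task description, and `tract_len` is a parameter because the standard Census tract GEOID uses a six-digit tract code (11 digits in total), so the real data may need `tract_len=11`.

```python
import pandas as pd


def classify_table(df, county_len=5, tract_len=10):
    """Label a table 'county', 'tract', or 'other' from the length of its FIPS codes."""
    flat = df.reset_index()  # FIPS may be stored in the index
    if 'FIPS' not in flat.columns:
        return 'other'
    lengths = set(flat['FIPS'].dropna().astype(str).str.len().tolist())
    if lengths == {county_len}:
        return 'county'
    if lengths == {tract_len}:
        return 'tract'
    return 'other'  # no codes, mixed lengths, or e.g. block-level codes


def split_tables(tables):
    """Split a dict of DataFrames into (counties, tracts, other) dicts."""
    groups = {'county': {}, 'tract': {}, 'other': {}}
    for name, df in tables.items():
        groups[classify_table(df)][name] = df
    return groups['county'], groups['tract'], groups['other']


# hypothetical toy dictionary, not the real dataset
toy_tables = {
    'economy_county': pd.DataFrame({'FIPS': ['33001', '50001'], 'v': [1, 2]}),
    'economy_tract': pd.DataFrame({'FIPS': ['3300196510'], 'v': [3]}),
    'facilities_and_providers': pd.DataFrame({'name': ['clinic']}),
}

counties_demo, tracts_demo, other_demo = split_tables(toy_tables)
print(list(counties_demo), list(tracts_demo), list(other_demo))
```

Because it only inspects the data, this version keeps working if a county-level table is added without `_county` in its name.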


### 4. No index, just a row number (reset_index())

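```python
# reset_index() already gave pickle_no_index a default 0, 1, 2, ... row index
pickle_no_index
```

| measure | FIPS | County | State | BMI_Obese | Cancer_Prevalence | Currently_Smoke | Met_Breast_Screen | Met_Cervical_Screen | Met_Colon_Screen |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 33001 | Belknap | NH | 0.308 | 0.091 | 0.157 | 0.747 | 0.86 | 0.703 |
| 1 | 33003 | Carroll | NH | 0.294 | 0.1 | 0.156 | 0.747 | 0.854 | 0.707 |
| 2 | 33005 | Cheshire | NH | 0.3 | 0.079 | 0.164 | 0.724 | 0.85 | 0.687 |
| 3 | 33007 | Coos | NH | 0.343 | 0.088 | 0.2 | 0.689 | 0.849 | 0.638 |
| 4 | 33009 | Grafton | NH | 0.265 | 0.084 | 0.149 | 0.728 | 0.86 | 0.72 |
| 5 | 33011 | Hillsborough | NH | 0.325 | 0.074 | 0.152 | 0.791 | 0.88 | 0.723 |
| 6 | 33013 | Merrimack | NH | 0.302 | 0.081 | 0.152 | 0.743 | 0.873 | 0.688 |
| 7 | 33015 | Rockingham | NH | 0.28 | 0.08 | 0.139 | 0.788 | 0.887 | 0.721 |
| 8 | 33017 | Strafford | NH | 0.306 | 0.069 | 0.155 | 0.759 | 0.846 | 0.686 |
| 9 | 33019 | Sullivan | NH | 0.317 | 0.089 | 0.183 | 0.695 | 0.849 | 0.66 |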
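The same step can be applied to every table in a dictionary at once. In this minimal sketch the helper `flatten_all` and the toy tables are hypothetical, and the `(FIPS, County)` index on the toy table is an assumed layout. One design choice is worth noting: a table whose index is already an unnamed `RangeIndex` would gain a useless `index` column from a plain `reset_index()`, so the sketch passes `drop=True` for those.

```python
import pandas as pd


def flatten_all(tables):
    """Give every table a plain 0, 1, 2, ... row index."""
    flat = {}
    for name, df in tables.items():
        # an unnamed RangeIndex carries no information, so discard it instead of
        # turning it into an extra "index" column
        plain = isinstance(df.index, pd.RangeIndex) and df.index.name is None
        flat[name] = df.reset_index(drop=plain)
    return flat


# hypothetical toy tables; the (FIPS, County) index is an assumed layout
idx = pd.MultiIndex.from_tuples([('33001', 'Belknap'), ('33003', 'Carroll')],
                                names=['FIPS', 'County'])
toy_tables = {
    'rf_and_screening_county': pd.DataFrame({'BMI_Obese': [0.308, 0.294]}, index=idx),
    'broadband_speeds': pd.DataFrame({'speed': [25, 100]}),
}

flat_tables = flatten_all(toy_tables)
print(flat_tables['rf_and_screening_county'])
```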


### 5. FIPS has to be an integer, not a string

```python
# convert the FIPS strings to integers
pickle_no_index['FIPS'] = pd.to_numeric(pickle_no_index['FIPS'])

# show the column; the dtype line in the output confirms int64
pickle_no_index['FIPS']
```

```
0     33001
1     33003
2     33005
3     33007
4     33009
5     33011
6     33013
7     33015
8     33017
9     33019
10    50001
11    50003
12    50005
13    50007
14    50009
15    50011
16    50013
17    50015
18    50017
19    50019
20    50021
21    50023
22    50025
23    50027
Name: FIPS, dtype: int64
```
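To apply the conversion to every table, and to make the cost of integer FIPS visible, here is a minimal sketch. The helper `fips_to_int` and the toy tables are hypothetical. It uses `astype('int64')` after `pd.to_numeric` so that a missing FIPS raises an error instead of quietly turning the column into floats. The toy code `'01001'` shows that integers drop leading zeros (not an issue for states 33 and 50, but it would be for states 01 to 09); `str.zfill(5)` pads county codes back when a string is needed again.

```python
import pandas as pd


def fips_to_int(tables):
    """Return copies of the tables with FIPS converted from string to int64."""
    converted = {}
    for name, df in tables.items():
        df = df.copy()  # leave the original tables untouched
        if 'FIPS' in df.columns:
            # astype('int64') raises if any FIPS is missing, rather than producing floats
            df['FIPS'] = pd.to_numeric(df['FIPS']).astype('int64')
        converted[name] = df
    return converted


# hypothetical toy tables, not the real dataset
toy_tables = {
    'economy_county': pd.DataFrame({'FIPS': ['01001', '33001']}),
    'broadband_speeds': pd.DataFrame({'speed': [25]}),
}

converted = fips_to_int(toy_tables)
ints = converted['economy_county']['FIPS']
print(ints.tolist())                      # the leading zero of '01001' is gone

restored = ints.astype(str).str.zfill(5)  # pad back to 5 digits for county codes
print(restored.tolist())
```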