## `acs_sdoh`

In [7]:
from CIFTools import acs_sdoh

`acs_sdoh` downloads data not only for Cancer_InFocus but also for your own data projects.   
Let's first look at how to download data with `acs_sdoh`.   
`acs_sdoh` takes the following arguments:   
* year: int
* state_fips : str, int or list of str (2 digit state fips code)
* query_level : str (possible values: 'state', 'county', 'tract', 'county subregion', 'block', 'zip')
* key : census key
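
Because `state_fips` may be a str, an int, or a list, it can help to normalize your input to two-digit strings before passing it on. The helper below is a minimal sketch; `normalize_state_fips` is a hypothetical name and is not part of CIFTools.

```python
from typing import Iterable, List, Union


def normalize_state_fips(
    state_fips: Union[str, int, Iterable[Union[str, int]]]
) -> List[str]:
    """Return a list of two-digit state FIPS strings (hypothetical helper)."""
    # Wrap a single code so that every input is handled as a list.
    if isinstance(state_fips, (str, int)):
        state_fips = [state_fips]
    codes = []
    for code in state_fips:
        # Left-pad with zeros so that 1 becomes '01'.
        text = str(code).strip().zfill(2)
        if len(text) != 2 or not text.isdigit():
            raise ValueError(f"not a two-digit state FIPS code: {code!r}")
        codes.append(text)
    return codes


single = normalize_state_fips(1)
several = normalize_state_fips(["21", 22])
```

With this helper, `normalize_state_fips(1)` gives `['01']` and a mixed list such as `["21", 22]` gives `['21', '22']`.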

You can sign up for the census key at : https://api.census.gov/data/key_signup.html

In this tutorial, we use a sample key that may have expired by the time you read this.

In [8]:
key = 'f1a4c4de1f35fe90fc1ceb60fd97b39c9a96e436' # provide the census api user key
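
Rather than hard-coding your own key in a notebook, you may prefer to read it from an environment variable. This is a minimal sketch; the variable name `CENSUS_API_KEY` and the helper `load_census_key` are assumptions, not something CIFTools defines.

```python
import os


def load_census_key(env=None, var_name="CENSUS_API_KEY"):
    """Read the Census API key from an environment mapping (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"set the {var_name} environment variable to your Census API key"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
demo_key = load_census_key({"CENSUS_API_KEY": "  your-key-here  "})
```

In a real session you would call `key = load_census_key()` after exporting the variable in your shell.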

In [13]:
sdoh = acs_sdoh(2019, '11', 'tract', key = key)

To download the data used by Cancer_InFocus, call `cancer_infocus_download`.   
This function does not require any arguments.

In [14]:
data_dictionary = sdoh.cancer_infocus_download()

`cancer_infocus_download()` returns a dictionary whose keys are dataset names and whose values are the corresponding pandas DataFrames.

In [15]:
data_dictionary.keys()

dict_keys(['insurance', 'vacancy', 'poverty', 'transportation', 'employment', 'gini_index', 'rent_to_income', 'houses_before_1960', 'public_assistance', 'education', 'income', 'demographic_age', 'demographic_race'])

In [16]:
data_dictionary['demographic_age']

Unnamed: 0,FIPS,Tract,County,State,Total,Under 18,18 to 64,Over 64
0,11001000100,Census Tract 1,District of Columbia,District of Columbia,4888,622,3172,1094
1,11001000201,Census Tract 2.01,District of Columbia,District of Columbia,3922,67,3830,25
2,11001000202,Census Tract 2.02,District of Columbia,District of Columbia,4709,347,3579,783
3,11001000300,Census Tract 3,District of Columbia,District of Columbia,6585,1025,4986,574
4,11001000400,Census Tract 4,District of Columbia,District of Columbia,1413,244,775,394
...,...,...,...,...,...,...,...,...
174,11001010700,Census Tract 107,District of Columbia,District of Columbia,1768,34,1473,261
175,11001010800,Census Tract 108,District of Columbia,District of Columbia,6460,62,6338,60
176,11001010900,Census Tract 109,District of Columbia,District of Columbia,3779,1171,2355,253
177,11001011000,Census Tract 110,District of Columbia,District of Columbia,4099,135,2893,1071
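
The age table holds counts, so a common next step is to convert them to shares of the total. The sketch below does this for the first row of the output above (Census Tract 1); `age_shares` is a hypothetical helper, not a CIFTools function.

```python
def age_shares(total, under_18, age_18_to_64, over_64):
    """Convert age-group counts to shares of the total (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        "Under 18": under_18 / total,
        "18 to 64": age_18_to_64 / total,
        "Over 64": over_64 / total,
    }


# Counts copied from the first row of the demographic_age output.
row = {"Total": 4888, "Under 18": 622, "18 to 64": 3172, "Over 64": 1094}
shares = age_shares(row["Total"], row["Under 18"], row["18 to 64"], row["Over 64"])
```

For this tract the three groups add up to the total, so the shares sum to 1.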


In [10]:
data_dictionary['transportation'].head()

Unnamed: 0,FIPS,County,State,no_vehicle,two_or_more_vehicle,three_or_more_vehicle
0,22001,Acadia Parish,Louisiana,0.012522,0.783917,0.345327
1,22003,Allen Parish,Louisiana,0.01076,0.783041,0.384327
2,22005,Ascension Parish,Louisiana,0.011004,0.837007,0.364278
3,22007,Assumption Parish,Louisiana,0.024013,0.831469,0.397807
4,22009,Avoyelles Parish,Louisiana,0.023306,0.752653,0.31879


### Query other datasets

To query other datasets, use the `add_custom_table` function.   
The function requires the following three arguments:
* group_id: acs group id (e.g. B01001)
    - you can provide multiple group ids in a list (e.g. \["B01001", "C27007"\])
    - however, all the acs groups must belong to the same acs type
* acs_type:
    - '' : acs5
    - 'profile' : acs5/profile
    - 'subject' : acs5/subject
    - for more information, please visit: https://api.census.gov/data.html
* name: the dictionary key under which the dataset is stored in data_dictionary
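
The three `acs_type` values line up with the three ACS 5-year endpoints of the Census Data API. The sketch below shows what a tract-level request URL for one group could look like. How `acs_sdoh` builds its requests internally is not shown in this tutorial, so `ACS_PATHS`, `build_acs_group_url`, and the exact query layout are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed mapping from the acs_type argument to a Census API dataset path.
ACS_PATHS = {
    "": "acs/acs5",
    "profile": "acs/acs5/profile",
    "subject": "acs/acs5/subject",
}


def build_acs_group_url(year, state_fips, group_id, acs_type="", key=None):
    """Build a tract-level group query URL (hypothetical helper)."""
    path = ACS_PATHS[acs_type]
    params = {
        "get": f"group({group_id})",   # every variable in the group
        "for": "tract:*",              # all tracts ...
        "in": f"state:{state_fips}",   # ... within one state
    }
    if key:
        params["key"] = key
    # Keep the characters the Census API uses literally in its queries.
    query = urlencode(params, safe="():*")
    return f"https://api.census.gov/data/{year}/{path}?{query}"


url = build_acs_group_url(2019, "11", "B01001")
```

Each endpoint serves its own set of groups, which is one plausible reason why all groups in a single `add_custom_table` call must share the same acs type.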

Use `add_custom_table` as a decorator on a function that defines how the dataframe downloaded from the census should be organized.   
In the following example, the function only drops the `B27001_001E` column and otherwise leaves the raw data unchanged.

In [91]:
sdoh.clean_functions()

In [93]:
@sdoh.add_custom_table(["C27007", "B27001"], '', 'sample')
def download_custom_data(df):
    df = df.drop('B27001_001E', axis = 1)
    return df
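
If decorators that take arguments are new to you, the sketch below shows one way such a registration pattern can work. It is not the CIFTools implementation: `TableRegistry`, `fake_fetch`, and the sample values are hypothetical, and plain lists of dicts stand in for DataFrames.

```python
class TableRegistry:
    """Toy stand-in that stores cleaning functions by dataset name."""

    def __init__(self):
        self.custom_tables = {}

    def add_custom_table(self, group_id, acs_type, name):
        # Accept a single group id or a list of group ids.
        groups = [group_id] if isinstance(group_id, str) else list(group_id)

        def decorator(clean_func):
            # Remember what to download and how to clean it.
            self.custom_tables[name] = {
                "groups": groups,
                "acs_type": acs_type,
                "clean": clean_func,
            }
            return clean_func

        return decorator

    def download_all(self, fetch):
        # Fetch each registered table, then apply its cleaning function.
        return {
            name: spec["clean"](fetch(spec["groups"], spec["acs_type"]))
            for name, spec in self.custom_tables.items()
        }


registry = TableRegistry()


@registry.add_custom_table(["C27007", "B27001"], "", "sample")
def drop_total(rows):
    # Drop one column from every row, like df.drop(..., axis=1).
    return [{k: v for k, v in row.items() if k != "B27001_001E"} for row in rows]


def fake_fetch(groups, acs_type):
    # Placeholder data; a real fetch would call the Census API.
    return [{"FIPS": "22001", "B27001_001E": 1, "B27001_002E": 2}]


result = registry.download_all(fake_fetch)
```

The decorated function is stored at definition time and only runs when the download method is called, which matches the order of the cells in this tutorial.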

In [95]:
data_dictionary = sdoh.download_all()

In [96]:
data_dictionary['sample'].head()

Unnamed: 0,FIPS,County,State,B27001_002E,B27001_003E,B27001_004E,B27001_005E,B27001_006E,B27001_007E,B27001_008E,...,C27007_012E,C27007_013E,C27007_014E,C27007_015E,C27007_016E,C27007_017E,C27007_018E,C27007_019E,C27007_020E,C27007_021E
0,22001,Acadia Parish,Louisiana,29899,2662,2635,27,6107,5768,339,...,31634,8335,4946,3389,18438,4623,13815,4861,1171,3690
1,22003,Allen Parish,Louisiana,10344,957,904,53,2219,2176,43,...,10940,2764,1693,1071,6136,1708,4428,2040,478,1562
2,22005,Ascension Parish,Louisiana,60261,5536,5472,64,12416,12055,361,...,62023,16719,5206,11513,37741,4851,32890,7563,1231,6332
3,22007,Assumption Parish,Louisiana,10702,666,632,34,1985,1963,22,...,11552,2645,1494,1151,6853,1816,5037,2054,423,1631
4,22009,Avoyelles Parish,Louisiana,17579,1636,1576,60,3632,3569,63,...,19420,4833,3121,1712,11248,3162,8086,3339,830,2509


### ACSConfig

When exploring an acs group, you may use `ACSConfig`.
`ACSConfig` requires the following arguments:
* year : str or int
* state_fips : a list of state fips or a single state fips as str
* query_level: str
* acs_group  : str
* acs_type   : str (optional)

In [97]:
from CIF_Config import ACSConfig
from pprint import pprint

In [98]:
cfg = ACSConfig(2020, 21, 'tract', 'B15001', acs_type = '')

In [99]:
cfg

ACSConfig(year=2020, state_fips=21, query_level='tract', acs_group='B15001', acs_type='')

ACSConfig can provide both variables within the group and their labels.   
In addition, it also provides a table explaining details of each variable.

In [100]:
print(cfg.variables)

['B15001_001E', 'B15001_002E', 'B15001_003E', 'B15001_004E', 'B15001_005E', 'B15001_006E', 'B15001_007E', 'B15001_008E', 'B15001_009E', 'B15001_010E', 'B15001_011E', 'B15001_012E', 'B15001_013E', 'B15001_014E', 'B15001_015E', 'B15001_016E', 'B15001_017E', 'B15001_018E', 'B15001_019E', 'B15001_020E', 'B15001_021E', 'B15001_022E', 'B15001_023E', 'B15001_024E', 'B15001_025E', 'B15001_026E', 'B15001_027E', 'B15001_028E', 'B15001_029E', 'B15001_030E', 'B15001_031E', 'B15001_032E', 'B15001_033E', 'B15001_034E', 'B15001_035E', 'B15001_036E', 'B15001_037E', 'B15001_038E', 'B15001_039E', 'B15001_040E', 'B15001_041E', 'B15001_042E', 'B15001_043E', 'B15001_044E', 'B15001_045E', 'B15001_046E', 'B15001_047E', 'B15001_048E', 'B15001_049E', 'B15001_050E', 'B15001_051E', 'B15001_052E', 'B15001_053E', 'B15001_054E', 'B15001_055E', 'B15001_056E', 'B15001_057E', 'B15001_058E', 'B15001_059E', 'B15001_060E', 'B15001_061E', 'B15001_062E', 'B15001_063E', 'B15001_064E', 'B15001_065E', 'B15001_066E', 'B15001_0

In [27]:
print(cfg.labels)

['Total', 'Male', 'Male - 18 to 24 years', '18 to 24 years - Less than 9th grade', '18 to 24 years - 9th to 12th grade, no diploma', '18 to 24 years - High school graduate (includes equivalency)', '18 to 24 years - Some college, no degree', "18 to 24 years - Associate's degree", "18 to 24 years - Bachelor's degree", '18 to 24 years - Graduate or professional degree', 'Male - 25 to 34 years', '25 to 34 years - Less than 9th grade', '25 to 34 years - 9th to 12th grade, no diploma', '25 to 34 years - High school graduate (includes equivalency)', '25 to 34 years - Some college, no degree', "25 to 34 years - Associate's degree", "25 to 34 years - Bachelor's degree", '25 to 34 years - Graduate or professional degree', 'Male - 35 to 44 years', '35 to 44 years - Less than 9th grade', '35 to 44 years - 9th to 12th grade, no diploma', '35 to 44 years - High school graduate (includes equivalency)', '35 to 44 years - Some college, no degree', "35 to 44 years - Associate's degree", "35 to 44 years 

In [29]:
cfg.var_desc

Unnamed: 0,name,label,concept
9131,B15001_014E,Estimate!!Total:!!Male:!!25 to 34 years:!!High...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
9133,B15001_015E,Estimate!!Total:!!Male:!!25 to 34 years:!!Some...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
9136,B15001_016E,Estimate!!Total:!!Male:!!25 to 34 years:!!Asso...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
9138,B15001_017E,Estimate!!Total:!!Male:!!25 to 34 years:!!Bach...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
9141,B15001_010E,Estimate!!Total:!!Male:!!18 to 24 years:!!Grad...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
...,...,...,...
18017,B15001_043E,Estimate!!Total:!!Female:,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
18024,B15001_044E,Estimate!!Total:!!Female:!!18 to 24 years:,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
18030,B15001_045E,Estimate!!Total:!!Female:!!18 to 24 years:!!Le...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
18046,B15001_040E,Estimate!!Total:!!Male:!!65 years and over:!!A...,SEX BY AGE BY EDUCATIONAL ATTAINMENT FOR THE P...
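
The raw labels in `var_desc` use `!!` as a separator between levels. If you want to work with the levels directly, a small parser such as the sketch below may help; `split_census_label` is a hypothetical helper and is not part of `ACSConfig`.

```python
def split_census_label(label):
    """Split a raw Census label into its hierarchy levels (hypothetical helper)."""
    # Levels are separated by '!!' and intermediate levels end with ':'.
    parts = [part.strip().rstrip(":") for part in label.split("!!")]
    # Drop the leading 'Estimate' marker when present.
    if parts and parts[0] == "Estimate":
        parts = parts[1:]
    return [part for part in parts if part]


levels = split_census_label("Estimate!!Total:!!Female:!!18 to 24 years:")
```

For the label of `B15001_044E` shown above, this returns `['Total', 'Female', '18 to 24 years']`.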


## Downloading facility data

To download facility data, use the `gen_facility_data` function.   
The function requires only one argument:
* location : str or List\[str\] (abbreviation(s) of state name(s))

In [13]:
from CIFTools import gen_facility_data

In [11]:
facility_data = gen_facility_data(['KY','WV'])

Process is complete


downloading fqhc data file: 100%|██████████| 12.4M/12.4M [00:00<00:00, 16.3MiB/s]
downloading hpsa data file: 100%|██████████| 15.6M/15.6M [00:00<00:00, 17.7MiB/s]
downloading lcs data file: 100%|██████████| 343k/343k [00:00<00:00, 431kiB/s]
downloading toxRel data file: 100%|██████████| 73.0M/73.0M [00:02<00:00, 35.7MiB/s]


In [12]:
facility_data.keys()

dict_keys(['nppes', 'mammography', 'hpsa', 'fqhc', 'lung_cancer_screening', 'tri_facility', 'superfund_site'])

In [4]:
facility_data['hpsa']

Unnamed: 0,Type,Name,Address,State,Phone_number,Notes,latitude,longitude
0,HPSA Correctional Facility,FCC - Hazelton,"1640 Skyview Ln, Bruceton Mills, WV 26525",WV,,,39.671864,-79.499303
1,HPSA Rural Health Clinic,FAMILY HEALTHCARE ASSOCIATES,"114 Main St, Man, WV 25635",WV,,,37.743543,-81.874243
2,HPSA Rural Health Clinic,BOONE MEMORIAL HOSPITAL FAMILY MEDICAL CENTER,,WV,,,38.048889,-81.806171
3,HPSA Rural Health Clinic,ST MARYS EXPRESS CARE,"201 2nd St, Saint Marys, WV 26170",WV,,,39.389380,-81.207532
4,HPSA Rural Health Clinic,FAMILY HEALTHCARE ASSOCIATES,"205 Howard Ave, Mullens, WV 25882",WV,,,37.583248,-81.381166
...,...,...,...,...,...,...,...,...
101,HPSA Federally Qualified Health Center Look A ...,"Faith Healthcare, Inc","126 Franklin Rd, Monticello, KY 42633",KY,,,36.866140,-84.827390
102,HPSA Rural Health Clinic,PHYSICIAN SERVICES OF MEMORIAL HOSPITAL,"94 Marie Langdon Dr STE 4, Manchester, KY 40962",KY,,,37.162858,-83.760707
103,HPSA Rural Health Clinic,"FAMILY HEALTHCARE ASSOCIATES 7, LLC","422 N Highway 27, Whitley City, KY 42653",KY,,,36.817664,-84.486655
104,HPSA Rural Health Clinic,"FAMILY HEALTHCARE ASSOCIATES 8, LLC","305 Danville Ave, Stanford, KY 40484",KY,,,37.531983,-84.668236


In [5]:
import pandas as pd
all_facility = pd.concat(facility_data.values(), axis = 0).reset_index(drop = True)

In [6]:
all_facility

Unnamed: 0,Type,Name,Address,State,Phone_number,Notes,latitude,longitude,FIPS
0,Gastroenterology,Marilyn Moaga,"2513 Sun Seeker Ct, Lexington, KY 40503",KY,859-523-4442,,,,
1,Gastroenterology,Jeannine M Keeler RN,"650 Joel Dr, Fort Campbell, KY 42223",KY,270-956-0297,,,,
2,Gastroenterology,Samantha Bridges R.N. BSN,"650 Joel Dr, Fort Campbell, KY 42223",KY,270-956-0489,,,,
3,Gastroenterology,Molly E Whitledge RN,"650 Joel Dr, Fort Campbell, KY 42223",KY,270-798-8422,,,,
4,Gastroenterology,Carole P Whitledge APRN,"800 Zorn Ave, Louisville, KY 40206",KY,502-287-5894,,,,
...,...,...,...,...,...,...,...,...,...
3819,Superfund Site,WEST VIRGINIA ORDNANCE (USARMY),"ROUTE 1 BOX 125, POINT PLEASANT, WV 25550",WV,,Currently on the Final NPL,38.926389,-82.076389,54053
3820,Superfund Site,SHAFFER EQUIPMENT/ARBUCKLE CREEK AREA,"WV ROUTE 17 (A.K.A. MINDEN ROAD), MINDEN, WV 2...",WV,,Currently on the Final NPL,37.97651,-81.1265,54019
3821,Superfund Site,VIENNA TETRACHLOROETHENE,"30TH STREET, GRAND CENTRAL AVE, VIENNA, WV 26105",WV,,Currently on the Final NPL,39.325167,-81.548778,54107
3822,Superfund Site,NORTH 25TH STREET GLASS AND ZINC,"N. 25TH STREET, CLARKSBURG, WV 26301",WV,,Currently on the Final NPL,39.297053,-80.357433,54033


## Downloading Cancer Data

Cancer statistics are queried from https://www.statecancerprofiles.cancer.gov/   
To query cancer statistics, first create a `scp_cancer_data` object with the *state_fips* argument.   
*state_fips* can be a single state FIPS code as a string or a list of state FIPS codes.

In [4]:
from CIFTools import scp_cancer_data
cancer = scp_cancer_data(['01','02'])

You can then retrieve the data from the `cancer_data` attribute, a dictionary in which each value is a pandas DataFrame of incidence or mortality data.

In [6]:
cancer.cancer_data['incidence']

Unnamed: 0,FIPS,County,State,Type,Site,AAR,AAC
0,01000,Alabama,Alabama,Incidence,All Site,451.7,27407.0
1,01000,Alabama,Alabama,Incidence,Bladder,17.7,1092.0
2,01000,Alabama,Alabama,Incidence,Brain & ONS,6.4,360.0
3,01000,Alabama,Alabama,Incidence,Cervix,9.5,241.0
4,01000,Alabama,Alabama,Incidence,Colon & Rectum,42.4,2533.0
...,...,...,...,...,...,...,...
1355,01133,Winston County,Alabama,Incidence,Ovary,,
1356,01133,Winston County,Alabama,Incidence,Pancreas,15.3,6.0
1357,01133,Winston County,Alabama,Incidence,Prostate,104.7,20.0
1358,01133,Winston County,Alabama,Incidence,Stomach,,
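
In the output above, state-level rows carry a FIPS code ending in `000` and some county rows have no rate. The sketch below separates county rows with a reported rate from the rest. It uses plain tuples copied from the output instead of a DataFrame, and `is_state_level` and `county_rows_with_rate` are hypothetical helpers.

```python
def is_state_level(fips):
    """True when a 5-digit FIPS code refers to a whole state (county part 000)."""
    return str(fips).zfill(5).endswith("000")


def county_rows_with_rate(rows):
    """Keep county-level rows whose rate is present (hypothetical helper)."""
    return [
        (fips, site, rate)
        for fips, site, rate in rows
        if not is_state_level(fips) and rate is not None
    ]


# (FIPS, Site, AAR) values copied from the incidence output; None marks a blank.
rows = [
    ("01000", "All Site", 451.7),
    ("01133", "Ovary", None),
    ("01133", "Pancreas", 15.3),
    ("01133", "Prostate", 104.7),
]
kept = county_rows_with_rate(rows)
```

Only the Pancreas and Prostate rows for Winston County remain after filtering.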


## Risk Behavior and Screening Data

CDC risk behavior and screening data are queried using the Socrata API.   
You need the following:
* domain : str =  "chronicdata.cdc.gov"
* app_token : str = "nx4zQ2205wpLwaaaZeZp9zAOs"
* user_name : str
* password : str   

First define a `SocrataConfig` object with the arguments above.   
*user_name* and *password* are optional, but omitting them may slow down dataset downloads.

In [1]:
from CIF_Config import SocrataConfig

kwargs = {"domain": "chronicdata.cdc.gov",
      "app_token": "nx4zQ2205wpLwaaaZeZp9zAOs"}

cfg = SocrataConfig(**kwargs)
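
If you do have Socrata credentials, you can add them to the same dictionary only when both are available. The sketch below builds such a dictionary using the four argument names listed above; `build_socrata_kwargs` is a hypothetical helper, and the token and credentials in the demonstration are placeholders.

```python
def build_socrata_kwargs(domain, app_token, user_name=None, password=None):
    """Assemble keyword arguments, adding credentials only when both are set."""
    kwargs = {"domain": domain, "app_token": app_token}
    if user_name and password:
        kwargs["user_name"] = user_name
        kwargs["password"] = password
    return kwargs


# Without credentials only the two required entries are present.
anonymous = build_socrata_kwargs("chronicdata.cdc.gov", "your-app-token")

# With credentials all four entries are present (placeholder values).
authenticated = build_socrata_kwargs(
    "chronicdata.cdc.gov", "your-app-token", "me@example.com", "secret"
)
```

The resulting dictionary can be unpacked in the same way as `kwargs` above, for example `SocrataConfig(**authenticated)`.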

Then provide the config object and state_fips (str or List\[str\]) to `places_data`.

In [2]:
from CIFTools import places_data
cdc = places_data(['21','22'], cfg)

Then you can retrieve county-level and tract-level data from the `places_data` attribute

In [3]:
cdc.places_data

{'county':       FIPS         County      State Cancer_Prevalence Met_Cervical_Screen  \
 0    21085        Grayson   Kentucky               7.5                78.3   
 1    21061       Edmonson   Kentucky               7.8                79.3   
 2    21223        Trimble   Kentucky               7.6                80.1   
 3    21075         Fulton   Kentucky               7.7                79.0   
 4    21179         Nelson   Kentucky               7.1                82.3   
 ..     ...            ...        ...               ...                 ...   
 179  22117     Washington  Louisiana               7.2                79.8   
 180  22103    St. Tammany  Louisiana               7.4                83.8   
 181  22031        De Soto  Louisiana               7.1                81.1   
 182  22077  Pointe Coupee  Louisiana               7.7                80.1   
 183  22013      Bienville  Louisiana               7.7                79.9   
 
     Met_Colon_Screen Currently_Smoke Me

## Downloading Food Desert Data

The food desert data is downloaded from the USDA Economic Research Service: https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/   
Similar to `scp_cancer_data`, you first provide `food_desert` with a state FIPS code, or a list of state FIPS codes if you are interested in more than one state.   
`food_desert` also accepts *var_name*. By default it is **LILATracts_Vehicle**, the variable used in **Cancer InFocus**; if you are interested in a different variable, provide its name. Variable names are listed at the link above.

In [18]:
from CIFTools import food_desert
fd = food_desert(['21','22'])


You can query and view the food desert data from the `food_desert_data` attribute. 

In [19]:
fd.food_desert_data

downloading food desert data file: 100%|██| 81.8M/81.8M [00:02<00:00, 35.8MiB/s]


{'Tract':               FIPS  LILATracts_Vehicle
 0       1001020100                   0
 1       1001020200                   0
 2       1001020300                   0
 3       1001020400                   0
 4       1001020500                   0
 ...            ...                 ...
 62432  56043000200                   0
 62433  56043000301                   0
 62434  56043000302                   0
 62435  56045951100                   0
 62436  56045951300                   0
 
 [62437 rows x 2 columns],
 'County':        FIPS  LILATracts_Vehicle
 0     10001            0.164720
 1     10003            0.129582
 2     10005            0.042145
 3     10010            0.064586
 4     10030            0.114731
 ...     ...                 ...
 3010  56037            0.137542
 3011  56039            0.000000
 3012  56041            0.343245
 3013  56043            0.000000
 3014  56045            0.000000
 
 [3015 rows x 2 columns]}
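
Some FIPS codes in the output above appear to have lost a leading zero: `1001020100` has 10 digits, and county codes such as `10010` look like the first five characters of an unpadded tract code. The sketch below pads tract codes to 11 digits before taking the county prefix. The county values look like shares between 0 and 1, so it also averages the tract flags per county, which is an assumption about how the county table is built. `county_from_tract` and `county_share` are hypothetical helpers, and the flag values in the example are made up.

```python
from collections import defaultdict


def county_from_tract(tract_fips):
    """Return the 5-digit county FIPS for a tract code, restoring leading zeros."""
    return str(tract_fips).zfill(11)[:5]


def county_share(tract_flags):
    """Average 0/1 tract flags per county (assumed aggregation)."""
    totals = defaultdict(lambda: [0, 0])  # county -> [flag sum, tract count]
    for fips, flag in tract_flags:
        bucket = totals[county_from_tract(fips)]
        bucket[0] += flag
        bucket[1] += 1
    return {county: flagged / count for county, (flagged, count) in totals.items()}


# Illustrative flags only; they are not taken from the real dataset.
example = [(1001020100, 0), (1001020200, 1), (56043000200, 0)]
shares_by_county = county_share(example)
```

With padding, the Alabama tract `1001020100` maps to county `01001` rather than `10010`.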

## Downloading Water Violation Data

The `water_violation` class provides the cumulative number of water violations in each county between `start_year` and `end_year`. When `end_year` is **None** (the default), it gives the cumulative number of violations since `start_year`. The default for `start_year` is 2016. To retrieve data for a single year, set `start_year` equal to `end_year`.
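
The year-range rule can be sketched in a few lines. This is only an illustration of the semantics described above, assuming both ends of the range are inclusive; `count_violations` and the sample years are hypothetical and are not part of CIFTools.

```python
def count_violations(violation_years, start_year=2016, end_year=None):
    """Count violations from start_year through end_year, both inclusive."""
    if end_year is not None and end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    return sum(
        1
        for year in violation_years
        if year >= start_year and (end_year is None or year <= end_year)
    )


# Made-up violation years for one county.
years = [2015, 2016, 2018, 2018, 2021]
since_2016 = count_violations(years)                 # open-ended range
only_2018 = count_violations(years, 2018, 2018)      # single year
through_2018 = count_violations(years, 2016, 2018)   # closed range
```

Setting both arguments to the same year counts only that year, which is why `start_year` == `end_year` gives single-year results.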

In [1]:
from CIFTools import water_violation

wv = water_violation(['21','22'])

In [2]:
wv.water_violation_data

Unnamed: 0,County,State,Counts
0,Adair County,Kentucky,0.0
1,Allen County,Kentucky,2.0
2,Anderson County,Kentucky,2.0
3,Ballard County,Kentucky,0.0
4,Barren County,Kentucky,0.0
...,...,...,...
179,Webster Parish County,Louisiana,14.0
180,West Baton Rouge Parish County,Louisiana,0.0
181,West Carroll Parish County,Louisiana,11.0
182,West Feliciana Parish County,Louisiana,0.0


## Downloading BLS Employment Data

While ACS5 provides 5-year estimates of the unemployment rate, the BLS provides the most recent county-level unemployment numbers.    
Calling `BLS` with one or more state FIPS codes provides the data.

In [1]:
from CIFTools import BLS
bls = BLS(['21','22'])

In [2]:
bls.bls_data

Unnamed: 0,FIPS,Unemployment Rate,Period
0,21001,4.7,Nov-22
1,21003,3.7,Nov-22
2,21005,3.1,Nov-22
3,21007,4.4,Nov-22
4,21009,4.1,Nov-22
...,...,...,...
179,22119,3.4,Nov-22
180,22121,2.6,Nov-22
181,22123,4.2,Nov-22
182,22125,2.0,Nov-22
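
The `Period` column is a string such as `Nov-22`. If you need to sort or compare periods, you can convert it to a year and month. The sketch below assumes the format is always a three-letter English month abbreviation followed by a two-digit year in the 2000s; `parse_period` is a hypothetical helper.

```python
MONTHS = {
    abbr: number
    for number, abbr in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_period(period):
    """Convert a period like 'Nov-22' to (year, month), assuming years 2000-2099."""
    month_abbr, year_2d = period.strip().split("-")
    return 2000 + int(year_2d), MONTHS[month_abbr]


year_month = parse_period("Nov-22")
```

Tuples of `(year, month)` sort chronologically, so they work directly as sort keys.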
