First, you'll work with a dataset containing the number of primary care physicians in each county in the United States. It was obtained from the Area Health Resources File, published by the [Health Resources and Services Administration](https://data.hrsa.gov/topics/health-workforce/ahrf). This data is contained in the file `primary_care_physicians.csv`.

Second, the file `population_by_county.csv` contains the Census Bureau's 2019 population estimates for each U.S. county. It also contains a column `urban`, which uses data from the National Bureau of Economic Research to classify each county as either urban or rural. The U.S. Office of Management and Budget designates counties as metropolitan (an urban core of 50,000 or more people), micropolitan (an urban core of at least 10,000 but fewer than 50,000 people), or neither. Here, a county is considered "urban" if it is part of a metropolitan or micropolitan area and "rural" if it is not.

 1. First, import the primary care physicians dataset (`primary_care_physicians.csv`) into a DataFrame named `physicians`.
 2. Filter `physicians` down to just the counties in Tennessee. Save the filtered DataFrame back to `physicians`. Verify that the resulting DataFrame has 95 rows.
 3. Look at the distribution of the number of primary care physicians. What do you notice?
 4. Now, import the population by county dataset (`population_by_county.csv`) into a DataFrame named `population`.
 5. Merge the `physicians` DataFrame with the `population` DataFrame. Keep only the values for Tennessee. When you merge, be sure to include both the `population` and `urban` columns in the merged results. Save the result of the merge back to `physicians`.
 6. How many Tennessee counties are considered urban?
 7. The State Health Access Data Assistance Center (SHADAC) (https://www.shadac.org/) classifies counties into three groups based on the number of residents per primary care physician. First, counties with fewer than 1500 residents per primary care physician are considered to have an "adequate" supply. Counties with at least 1500 residents but fewer than 3500 residents per primary care physician are considered to have a "moderately inadequate" supply, and counties with at least 3500 residents per primary care physician are considered to have a "low inadequate" supply. How many counties in Tennessee are in each group? 
 8. Does there appear to be any detectable relationship between whether a county is urban or rural and its supply of primary care physicians?

#1: First, import the primary care physicians dataset (primary_care_physicians.csv) into a data frame named physicians.

In [1]:
import pandas as pd

In [2]:
physicians = pd.read_csv('../data/primary_care_physicians.csv')

In [3]:
physicians.head(3)

,FIPS,state,county,primary_care_physicians
0,1001,Alabama,Autauga,26.0
1,1003,Alabama,Baldwin,153.0
2,1005,Alabama,Barbour,8.0


#2: Filter physicians down to just the counties in Tennessee. Save the filtered DataFrame back to physicians. Verify that the resulting DataFrame has 95 rows.

In [5]:
physicians = physicians.loc[physicians['state']=="Tennessee"]

In [7]:
physicians.shape

(95, 4)
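The two source files spell the state differently ("Tennessee" in the physicians file, upper-case names such as "ILLINOIS" and "TENNESSEE" in the population file), so an exact string match is easy to get wrong. A minimal sketch, using a hypothetical mini-table, that compares on an upper-cased copy and turns the row-count check into an assertion (for the real data the equivalent would be `assert physicians.shape[0] == 95`):

```python
import pandas as pd

# Hypothetical mini-table mixing the two spellings seen in the source files
counties = pd.DataFrame({
    "FIPS": [47001, 47003, 1001],
    "state": ["Tennessee", "TENNESSEE", "Alabama"],
})

# Compare on an upper-cased copy so "Tennessee" and "TENNESSEE" both match
tn = counties.loc[counties["state"].str.upper() == "TENNESSEE"]

# Fail loudly if the row count is not what we expect (95 for the real data)
expected_rows = 2
assert len(tn) == expected_rows, f"expected {expected_rows} rows, got {len(tn)}"
print(tn)
```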

#3: Look at the distribution of the number of primary care physicians. What do you notice?

In [11]:
# Summarize the distribution; methods such as value_counts() or describe() only run when called with ()
physicians['primary_care_physicians'].describe()
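`value_counts` only runs when called with parentheses; without them pandas just returns the bound method. For a count variable like this one, summary statistics and a binned tally usually say more about the distribution than a list of distinct values. A minimal sketch on a hypothetical sample of ten county counts (a histogram via `.plot.hist()` would also work but needs matplotlib, so it is left out here):

```python
import pandas as pd

# Hypothetical sample of ten county physician counts (not the full 95 counties)
pcp_counts = pd.Series([39.0, 15.0, 3.0, 1.0, 90.0, 5.0, 18.0, 9.0, 338.0, 43.0],
                       name="primary_care_physicians")

# Without parentheses, value_counts is just the method object, not a result
assert callable(pcp_counts.value_counts)

# count, mean, std, min, quartiles, max
print(pcp_counts.describe())

# A mean far above the median and a positive skew point to a long right tail
print("mean:", pcp_counts.mean(), "median:", pcp_counts.median())
print("skew:", pcp_counts.skew())

# Bucket the counts to see how many counties fall in each range
binned = pd.cut(pcp_counts, bins=[0, 10, 50, 100, 400]).value_counts().sort_index()
print(binned)
```

Comparing the mean with the median is a quick way to spot skew: in this sample a single large county pulls the mean to 56.1 while the median stays at 16.5.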

#4: Now, import the population by county dataset (population_by_county.csv) into a DataFrame named population.

In [12]:
population = pd.read_csv('../data/population_by_county.csv')

In [13]:
population.head()

,FIPS,population,county,state,urban
0,17051,21565,Fayette County,ILLINOIS,Rural
1,17107,29003,Logan County,ILLINOIS,Rural
2,17165,23994,Saline County,ILLINOIS,Rural
3,17097,701473,Lake County,ILLINOIS,Urban
4,17127,14219,Massac County,ILLINOIS,Rural


#5: Merge the physicians DataFrame with the population DataFrame. Keep only the values for Tennessee. When you merge, be sure to include both the population and urban columns in the merged results. Save the result of the merge back to physicians.

In [16]:
pcp = pd.merge(left=physicians,
               right=population.loc[population['state'] == "TENNESSEE"],
               on="FIPS")

In [17]:
pcp.head()

,FIPS,state_x,county_x,primary_care_physicians,population,county_y,state_y,urban
0,47001,Tennessee,Anderson,39.0,76061,Anderson County,TENNESSEE,Urban
1,47003,Tennessee,Bedford,15.0,48292,Bedford County,TENNESSEE,Rural
2,47005,Tennessee,Benton,3.0,16140,Benton County,TENNESSEE,Rural
3,47007,Tennessee,Bledsoe,1.0,14836,Bledsoe County,TENNESSEE,Rural
4,47009,Tennessee,Blount,90.0,129927,Blount County,TENNESSEE,Urban


In [18]:
pcp.shape

(95, 8)

In [19]:
physicians = pcp
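Merging the full population table carries its `county` and `state` columns along, which is where the `_x`/`_y` suffixes come from. A sketch of a tighter merge, assuming hypothetical mini versions of both inputs (a few rows taken from the outputs above, plus one Illinois county): it selects only `FIPS`, `population`, and `urban` before merging, uses `validate="one_to_one"` to raise if a FIPS code is duplicated, and swaps the default inner join for `how="left"` with `indicator=True` so any county missing from the population file shows up instead of silently disappearing.

```python
import pandas as pd

# Hypothetical mini versions of the two inputs (not the full files)
physicians = pd.DataFrame({
    "FIPS": [47001, 47003, 47005],
    "state": ["Tennessee", "Tennessee", "Tennessee"],
    "county": ["Anderson", "Bedford", "Benton"],
    "primary_care_physicians": [39.0, 15.0, 3.0],
})
population = pd.DataFrame({
    "FIPS": [47001, 47003, 47005, 17051],
    "population": [76061, 48292, 16140, 21565],
    "county": ["Anderson County", "Bedford County", "Benton County", "Fayette County"],
    "state": ["TENNESSEE", "TENNESSEE", "TENNESSEE", "ILLINOIS"],
    "urban": ["Urban", "Rural", "Rural", "Rural"],
})

# Keep only Tennessee rows and only the columns the exercise asks for,
# so the result has no duplicated county/state columns with _x/_y suffixes
tn_population = population.loc[population["state"] == "TENNESSEE",
                               ["FIPS", "population", "urban"]]

# validate= raises MergeError if FIPS repeats on either side;
# indicator=True adds a "_merge" column recording where each row came from
merged = pd.merge(physicians, tn_population, on="FIPS", how="left",
                  validate="one_to_one", indicator=True)

# Counties with no match in the population file are marked "left_only"
unmatched = merged.loc[merged["_merge"] == "left_only", "county"]
print("Counties with no population match:", list(unmatched))

merged = merged.drop(columns="_merge")
print(merged)
```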

#6: How many Tennessee counties are considered urban? 38 (the other 57 are rural).

In [21]:
physicians['urban'].value_counts()

Rural    57
Urban    38
Name: urban, dtype: int64

#7: The State Health Access Data Assistance Center (SHADAC) (https://www.shadac.org/) classifies counties into three groups based on the number of residents per primary care physician. First, counties with fewer than 1500 residents per primary care physician are considered to have an "adequate" supply. Counties with at least 1500 residents but fewer than 3500 residents per primary care physician are considered to have a "moderately inadequate" supply, and counties with at least 3500 residents per primary care physician are considered to have a "low inadequate" supply. How many counties in Tennessee are in each group?

In [36]:
physicians['res_per_pcp'] = physicians['population'] / physicians['primary_care_physicians']

In [37]:
# Default label: fewer than 1500 residents per PCP is an "adequate" supply
physicians['pcp_supply'] = "adequate"

In [44]:
# At least 1500 residents per PCP: "moderately inadequate" supply
physicians.loc[physicians['res_per_pcp'] >= 1500, 'pcp_supply'] = "moderately inadequate"

In [45]:
# At least 3500 residents per PCP: "low inadequate" supply (overwrites the label above)
physicians.loc[physicians['res_per_pcp'] >= 3500, 'pcp_supply'] = "low inadequate"

In [42]:
physicians.head()

,FIPS,state_x,county_x,primary_care_physicians,population,county_y,state_y,urban,pcp_supply,res_per_pcp
0,47001,Tennessee,Anderson,39.0,76061,Anderson County,TENNESSEE,Urban,moderately inadequate,1950.282051
1,47003,Tennessee,Bedford,15.0,48292,Bedford County,TENNESSEE,Rural,moderately inadequate,3219.466667
2,47005,Tennessee,Benton,3.0,16140,Benton County,TENNESSEE,Rural,low inadequate,5380.0
3,47007,Tennessee,Bledsoe,1.0,14836,Bledsoe County,TENNESSEE,Rural,low inadequate,14836.0
4,47009,Tennessee,Blount,90.0,129927,Blount County,TENNESSEE,Urban,adequate,1443.633333


In [46]:
physicians['pcp_supply'].value_counts()

moderately inadequate    50
low inadequate           31
adequate                 14
Name: pcp_supply, dtype: int64
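The SHADAC thresholds can also be applied in a single vectorized step with `numpy.select`, which picks the first condition that holds and falls back to a default. A minimal sketch on a small frame: the first five rows reuse values from the outputs above, and the last row is a hypothetical county with zero physicians, whose residents-per-physician ratio divides to `inf` and therefore lands in "low inadequate". A missing (NaN) ratio would also fall through to the default, so it is worth checking for missing values first.

```python
import numpy as np
import pandas as pd

# Five rows from the Tennessee data plus one hypothetical zero-physician county
df = pd.DataFrame({
    "county": ["Anderson", "Bedford", "Benton", "Bledsoe", "Blount", "Hypothetical"],
    "population": [76061, 48292, 16140, 14836, 129927, 5000],
    "primary_care_physicians": [39.0, 15.0, 3.0, 1.0, 90.0, 0.0],
})
df["res_per_pcp"] = df["population"] / df["primary_care_physicians"]  # x / 0.0 -> inf

# Conditions are checked in order; the first True one decides the label
conditions = [df["res_per_pcp"] < 1500, df["res_per_pcp"] < 3500]
labels = ["adequate", "moderately inadequate"]
df["pcp_supply"] = np.select(conditions, labels, default="low inadequate")

print(df[["county", "res_per_pcp", "pcp_supply"]])
print(df["pcp_supply"].value_counts())
```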

#8: Does there appear to be any detectable relationship between whether a county is urban or rural and its supply of primary care physicians?

In [48]:
# Count counties in each combination of urban/rural status and physician supply
pd.crosstab(physicians['urban'], physicians['pcp_supply'])
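One way to look for a relationship is to cross-tabulate urban/rural status against supply group with `pd.crosstab`, normalize each row so urban and rural counties can be compared as shares, and compute a Pearson chi-square statistic on the counts. A sketch on hypothetical toy data; the statistic is computed by hand here to avoid an extra dependency, whereas `scipy.stats.chi2_contingency` would return it together with a p-value. With only 95 counties some expected cell counts may be small, which makes the chi-square approximation less reliable.

```python
import pandas as pd


def chi_square_statistic(observed: pd.DataFrame) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    counts = observed.to_numpy(dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    col_totals = counts.sum(axis=0, keepdims=True)
    # Expected count for each cell if the two variables were independent
    expected = row_totals @ col_totals / counts.sum()
    return float(((counts - expected) ** 2 / expected).sum())


# Hypothetical toy data standing in for the merged Tennessee table
toy = pd.DataFrame({
    "urban": ["Urban", "Urban", "Rural", "Rural", "Rural", "Urban"],
    "pcp_supply": ["adequate", "moderately inadequate", "low inadequate",
                   "low inadequate", "moderately inadequate", "adequate"],
})

counts = pd.crosstab(toy["urban"], toy["pcp_supply"])
# Share of each row (rural or urban) falling in each supply group
shares = pd.crosstab(toy["urban"], toy["pcp_supply"], normalize="index")

stat = chi_square_statistic(counts)
dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
print(shares.round(2))
print("chi-square:", stat, "degrees of freedom:", dof)
```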