# Creating the datafile to draw *Manufacturer Choropleths*

In this notebook we create the data file needed to draw a choropleth similar to the one The Washington Post published in [one of their articles](https://www.washingtonpost.com/graphics/2019/investigations/dea-pain-pill-database/), but broken down per manufacturer.

The API endpoint `/v1/total_manufacturers_county` does not seem to work unless both state and county are specified. In a previous exercise we collected all the states and counties in the larger dataset using another API endpoint, so let's load that list into a Pandas dataframe now.

In [9]:
import pandas as pd

counties_df = pd.read_csv('tof/static/data/counties.csv')
counties_df

Unnamed: 0,fips,state,county,name
0,1001,AL,AUTAUGA,"Autauga County, Alabama"
1,1003,AL,BALDWIN,"Baldwin County, Alabama"
2,1005,AL,BARBOUR,"Barbour County, Alabama"
3,1007,AL,BIBB,"Bibb County, Alabama"
4,1009,AL,BLOUNT,"Blount County, Alabama"
...,...,...,...,...
3029,56037,WY,SWEETWATER,"Sweetwater County, Wyoming"
3030,56039,WY,TETON,"Teton County, Wyoming"
3031,56041,WY,UINTA,"Uinta County, Wyoming"
3032,56043,WY,WASHAKIE,"Washakie County, Wyoming"
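
The `fips` values in the output above are displayed as integers (`1001` rather than `01001`), so the leading zero of a five-digit county FIPS code is not shown. Choropleth tooling commonly keys counties by zero-padded five-character strings, so it may be worth normalizing the codes before joining them to map shapes. A minimal sketch, assuming that padding is what the map needs; `normalize_fips` is a hypothetical helper and is not part of the original notebook.

```python
def normalize_fips(value):
    """Return a county FIPS code as a zero-padded five-character string."""
    # str() handles both integer and string inputs; zfill pads on the left with zeros
    return str(value).zfill(5)

# Values taken from the dataframe output above
print(normalize_fips(1001))   # Autauga County, Alabama
print(normalize_fips(56037))  # Sweetwater County, Wyoming
```

If the CSV file itself already stores padded codes, passing `dtype={'fips': str}` to `pd.read_csv` should keep them intact instead.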


Now it is time to make the API calls, one for every state and county pair. The URL of any request that fails is printed.

In [2]:
import pandas as pd
import urllib.parse

for index, row in counties_df.iterrows():
    # The county name is URL-encoded with urllib.parse.quote (e.g. spaces become %20)
    url = f"https://arcos-api.ext.nile.works/v1/total_manufacturers_county?state={row['state']}&county={urllib.parse.quote(row['county'])}&key=WaPo"
    
    try:
        # dtype=False prevents Pandas from changing the JSON data types (i.e. countyfips)
        df = pd.read_json(url, dtype=False)
        # Append to the per-state CSV; note that re-running this cell appends the same rows again
        df.to_csv(f"tof/static/data/total_manufacturers_county_{row['state']}.csv", header=False, index=False, mode="a+")
    except Exception:
        # Print the URL of the failed request so it can be inspected later
        print(url)

print("done!")

https://arcos-api.ext.nile.works/v1/total_manufacturers_county?state=AK&county=PRINCE%20OF%20WALES%20HYDER&key=WaPo
https://arcos-api.ext.nile.works/v1/total_manufacturers_county?state=AK&county=WRANGELL&key=WaPo
done!
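
The query string can also be assembled with `urllib.parse.urlencode`, which encodes every parameter rather than only the county. The following is a minimal sketch of that alternative, not the code used to produce the output above; `build_url` is a hypothetical helper name, and the base URL and key are copied from the cell above.

```python
import urllib.parse

BASE_URL = "https://arcos-api.ext.nile.works/v1/total_manufacturers_county"

def build_url(state, county, key="WaPo"):
    """Build the request URL for one state and county pair."""
    params = {"state": state, "county": county, "key": key}
    # quote_via=urllib.parse.quote encodes spaces as %20 (the default would use +)
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{BASE_URL}?{query}"

print(build_url("AK", "PRINCE OF WALES HYDER"))
```

This produces the same URL format as the f-string in the loop, so it could replace that line without changing the requests.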


Go through all the CSV files generated in the previous step and load them into a SQLite database.

In [23]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///db/tof.sqlite', echo=False)

for state in counties_df.state.unique():
    try:
        df = pd.read_csv(f"tof/static/data/total_manufacturers_county_{state}.csv", header=None)
        df.columns = ['buyer_state',
                      'buyer_county',
                      'combined_labeler_name',
                      'total_dosage_unit',
                      'total_records']
        df.to_sql('total_manufacturers_county', con=engine, index=False, if_exists='append')
    except Exception:
        # Print the state whose CSV file could not be read or loaded
        print(state)

print("done!")

done!
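
Since some API calls failed earlier, it can be useful to check which state and county pairs ended up with no rows in the `total_manufacturers_county` table. The sketch below uses only the standard library and an in-memory database as a stand-in for `db/tof.sqlite`; the inserted rows are placeholders for illustration, not real ARCOS data, and `missing_counties` is a hypothetical helper.

```python
import sqlite3

def missing_counties(conn, expected):
    """Return the (state, county) pairs from `expected` that have no rows in the table."""
    loaded = set(conn.execute(
        "SELECT DISTINCT buyer_state, buyer_county FROM total_manufacturers_county"
    ))
    return [pair for pair in expected if pair not in loaded]

# In-memory stand-in for db/tof.sqlite, using the column names from the cell above
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE total_manufacturers_county ("
    "buyer_state TEXT, buyer_county TEXT, combined_labeler_name TEXT, "
    "total_dosage_unit INTEGER, total_records INTEGER)"
)
# Placeholder rows only; labeler names and numbers are made up for the example
conn.executemany(
    "INSERT INTO total_manufacturers_county VALUES (?, ?, ?, ?, ?)",
    [("AL", "AUTAUGA", "LABELER A", 100, 10),
     ("AL", "AUTAUGA", "LABELER B", 50, 5)],
)

expected = [("AL", "AUTAUGA"), ("AK", "WRANGELL")]
print(missing_counties(conn, expected))
```

In the notebook, `expected` could be built with `list(zip(counties_df.state, counties_df.county))` and `conn` opened on the real database file.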


In [20]:
counties_df.state.unique()

array(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA',
       'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA',
       'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY',
       'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
       'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'], dtype=object)