# Data Validation

In the previous notebook, two pins were saved:

- City of Chicago - Business License Data (RAW): `chicago-business-license-data`
- City of Chicago - Food Inspection Data (RAW): `chicago-food-inspection-data`

## Setup

In [1]:
import os

import ibis
import pins
import pandas as pd
import numpy as np
import pandera as pa
from sqlalchemy import create_engine, text

In [2]:
pd.options.display.max_columns = 999

In [3]:
# Database details
db_user = "posit"
db_password = os.environ["CONF23_DB_PASSWORD"]
db_host = os.environ["CONF23_DB_HOST"]
db_port = 5432
db_database = "conf23_python"

# Set up sqlalchemy for writing data
engine = create_engine(f"postgresql+psycopg2://{db_user}:{db_password}@{db_host}:{db_port}/{db_database}")

# Set up ibis for reading data
con = ibis.postgres.connect(
    user=db_user,
    password=db_password,
    host=db_host,
    port=db_port,
    database=db_database
)

## Tips

- Use multiple cursors in VS Code to easily edit many lines at the same time (<https://code.visualstudio.com/docs/getstarted/tips-and-tricks#_column-box-selection>).
- Use `df["col_name"].value_counts()` to understand the distribution of categorical columns.
- Use `df["col_name"].hist()` to understand the distribution of numeric columns.
- Use `df.info()` to understand column types and null values.
- Use [ydata-profiling](https://pypi.org/project/ydata-profiling/) to generate an automated data report.

```python
from ydata_profiling import ProfileReport
ProfileReport(df)
```
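As a rough illustration of the first few tips, the sketch below runs the same kind of quick checks on a small made-up frame. The column names and values are assumptions for illustration only, not rows from the Chicago data.

```python
import pandas as pd

# Toy frame standing in for one of the raw tables (values are made up).
df = pd.DataFrame(
    {
        "results": ["Pass", "Fail", "Pass", None],
        "latitude": [41.95, 41.75, None, 41.81],
    }
)

# Distribution of a categorical column; dropna=False also counts missing values.
result_counts = df["results"].value_counts(dropna=False)
print(result_counts)

# Null count per column, a compact complement to df.info().
null_counts = df.isna().sum()
print(null_counts)

# Summary statistics for a numeric column, a text alternative to .hist().
print(df["latitude"].describe())
```

Counting missing values explicitly is useful here because several schema columns later in this notebook are declared `nullable=True`.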

## Load raw data

Use `ibis` to read the data from Postgres.

In [4]:
business_license_raw = con.table("business_license_raw").limit(100_000).to_pandas()

In [5]:
food_inspection_raw = con.table("food_inspection_raw").limit(100_000).to_pandas()

## Data set (1): Business License Data

<https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr>

In [6]:
business_license_raw

Unnamed: 0,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,ward,precinct,ward_precinct,police_district,license_code,license_description,business_activity_id,business_activity,license_number,application_type,application_created_date,application_requirements_complete,payment_date,conditional_approval,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude,location
0,2594643-20180417,2594643,425578,1,CHILDHOOD FRACTURED,CHILDHOOD FRACTURED,1432 W IRVING PARK RD 1ST,CHICAGO,IL,60613,47,24,47-24,19,1010,Limited Business License,602 | 709,Administrative Commercial Office | Miscellaneo...,2594643,ISSUE,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,N,2018-04-17T00:00:00.000,2021-07-15T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,AAI,,,41.954446304,-87.665535122,"\n, \n(41.954446303973185, -87.66553512156656)"
1,2594647-20180417,2594647,426826,1,"BOOST ON DAMEN, INC.",ABC CHOICE,1934 W 79TH ST 1ST,CHICAGO,IL,60620,17,41,17-41,6,1010,Limited Business License,922,Retail Sales of Cell Phones and Accessories,2594647,ISSUE,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,N,2018-04-17T00:00:00.000,2020-07-15T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,AAI,,,41.750352746,-87.672202458,"\n, \n(41.75035274562417, -87.6722024581212)"
2,2594647-20200516,2723053,426826,1,"BOOST ON DAMEN, INC.",ABC CHOICE,1934 W 79TH ST 1ST,CHICAGO,IL,60620,17,41,17-41,6,1010,Limited Business License,922,Retail Sales of Cell Phones and Accessories,2594647,RENEW,,2020-03-15T00:00:00.000,2020-06-11T00:00:00.000,N,2020-05-16T00:00:00.000,2022-05-15T00:00:00.000,2020-06-11T00:00:00.000,2020-06-12T00:00:00.000,AAI,,,41.750352746,-87.672202458,"\n, \n(41.75035274562417, -87.6722024581212)"
3,2594649-20180523,2594649,426834,1,ALFONSO GARCIA,ALFONSO GARCIA,4556 S RICHMOND ST 2,CHICAGO,IL,60632,12,12,12-12,9,4404,Regulated Business License,689,Junk Peddler,2594649,ISSUE,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,N,2018-05-23T00:00:00.000,2021-07-15T00:00:00.000,2018-05-22T00:00:00.000,2018-05-23T00:00:00.000,AAI,,,41.810181861,-87.698080415,"\n, \n(41.81018186074113, -87.69808041538582)"
4,2594650-20181221,2594650,426833,1,"PIRMA AMERICA, INC.",PIRMA,6801 - 6805 S PULASKI RD,CHICAGO,IL,60629,23,44,23-44,8,1010,Limited Business License,911,Retail Sales of Clothing / Accessories / Shoes,2594650,ISSUE,2018-04-17T00:00:00.000,2018-12-21T00:00:00.000,2018-12-21T00:00:00.000,N,2018-12-21T00:00:00.000,2021-07-15T00:00:00.000,2018-12-21T00:00:00.000,2018-12-21T00:00:00.000,AAI,,3,41.769511072,-87.722380058,"\n, \n(41.769511071960956, -87.72238005764007)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99995,32815-20080616,1900150,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1472,Music and Dance,,,32815,RENEW,,2008-04-15T00:00:00.000,2008-06-06T00:00:00.000,N,2008-06-16T00:00:00.000,2010-06-15T00:00:00.000,2008-06-06T00:00:00.000,2008-06-09T00:00:00.000,AAI,,17,41.949706195,-87.658701413,"\n, \n(41.94970619451691, -87.65870141348526)"
99996,32815-20100616,2028371,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1472,Music and Dance,,,32815,RENEW,,2010-04-15T00:00:00.000,2010-06-15T00:00:00.000,N,2010-06-16T00:00:00.000,2012-06-15T00:00:00.000,2010-06-15T00:00:00.000,2010-06-29T00:00:00.000,AAI,,17,41.949706195,-87.658701413,"\n, \n(41.94970619451691, -87.65870141348526)"
99997,32815-20120616,2148402,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1472,Music and Dance,,,32815,RENEW,,2012-04-16T00:00:00.000,2012-04-23T00:00:00.000,N,2012-06-16T00:00:00.000,2014-06-15T00:00:00.000,2012-04-23T00:00:00.000,2013-05-30T00:00:00.000,AAI,,17,41.949706195,-87.658701413,"\n, \n(41.94970619451691, -87.65870141348526)"
99998,32815-20140616,2321712,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1472,Music and Dance,916,Music and Dance,32815,RENEW,,2014-04-15T00:00:00.000,2014-06-10T00:00:00.000,N,2014-06-16T00:00:00.000,2016-06-15T00:00:00.000,2014-06-10T00:00:00.000,2014-06-11T00:00:00.000,AAI,,17,41.949706195,-87.658701413,"\n, \n(41.94970619451691, -87.65870141348526)"


The business license data includes licenses for all Chicago businesses. For this analysis, we are only interested in the licenses where a food inspection may apply. To figure out which licenses are in scope:

- Perform an inner join on the business license and food inspection data.
- Identify all of the unique license codes where food inspections apply.
- Filter the data to include only those businesses.

In [7]:
food_inspection_raw

Unnamed: 0,inspection_id,dba_name,aka_name,license_,facility_type,risk,address,city,state,zip,inspection_date,inspection_type,results,violations,latitude,longitude,location
0,104236,TEMPO CAFE,TEMPO CAFE,80916,Restaurant,Risk 1 (High),6 E CHESTNUT ST,CHICAGO,IL,60611,2010-01-04T00:00:00.000,Canvass,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,41.89843137207629,-87.6280091630558,"(41.89843137207629, -87.6280091630558)"
1,67733,WOLCOTT'S,TROQUET,1992040,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961605669949854,-87.67596676683779,"(41.961605669949854, -87.67596676683779)"
2,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948,Restaurant,Risk 1 (High),8750 W BRYN WAWR AVE,CHICAGO,IL,60631,2010-01-04T00:00:00.000,License,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,,,
3,52234,Cafe 608,Cafe 608,2013328,Restaurant,Risk 1 (High),608 W BARRY AVE,CHICAGO,IL,60657,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.938006880423615,-87.6447545707008,"(41.938006880423615, -87.6447545707008)"
4,67757,DUNKIN DONUTS/BASKIN-ROBBINS,DUNKIN DONUTS/BASKIN-ROBBINS,1380279,Restaurant,Risk 2 (Medium),100 W RANDOLPH ST,CHICAGO,IL,60601,2010-01-04T00:00:00.000,Tag Removal,Pass,,41.88458626715456,-87.63101044588599,"(41.88458626715456, -87.63101044588599)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99995,1522545,PAKEEZA,PAKEEZA,1991022,Restaurant,Risk 1 (High),1009-1011 N ORLEANS ST,CHICAGO,IL,60610,2015-02-03T00:00:00.000,Complaint,Fail,8. SANITIZING RINSE FOR EQUIPMENT AND UTENSILS...,41.900827710854195,-87.63718892231834,"(41.900827710854195, -87.63718892231834)"
99996,1522547,MOUNT GREENWOOD ELEMENTARY,MOUNT GREENWOOD ELEMENTARY,24591,School,Risk 2 (Medium),10841 S Homan (3400W),CHICAGO,IL,60655,2015-02-03T00:00:00.000,Canvass,Pass w/ Conditions,21. * CERTIFIED FOOD MANAGER ON SITE WHEN POTE...,41.695748452263636,-87.70577383458786,"(41.695748452263636, -87.70577383458786)"
99997,1522593,NEW LITTLE CHINA,NEW LITTLE CHINA,1844410,Restaurant,Risk 1 (High),1737 E 95TH ST,CHICAGO,IL,60617,2015-02-03T00:00:00.000,Canvass,Pass,"30. FOOD IN ORIGINAL CONTAINER, PROPERLY LABEL...",41.72228503083544,-87.58107166595508,"(41.72228503083544, -87.58107166595508)"
99998,1522563,TACO JOE INC,TACO JOE INC,54004,Restaurant,Risk 1 (High),3458 W 111TH ST,CHICAGO,IL,60655,2015-02-03T00:00:00.000,Canvass,Out of Business,,41.691554773175255,-87.70810064840053,"(41.691554773175255, -87.70810064840053)"


In [8]:
in_scope_license_codes = (
    pd.merge(
        food_inspection_raw,
        business_license_raw,
        how="inner",
        left_on="license_",
        right_on="license_id"
    )
    .loc[:, "license_code"]
    .dropna()
    .unique()
)

in_scope_license_codes

array(['1010', '1006'], dtype=object)
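To see what the join above is doing, here is a minimal sketch on made-up rows. The frames `inspections` and `licenses` and all of their values are hypothetical; only the column names `license_`, `license_id`, and `license_code` come from the notebook.

```python
import pandas as pd

# Hypothetical inspection rows; "license_" is the key into the license table.
inspections = pd.DataFrame(
    {"inspection_id": ["a", "b", "c"], "license_": ["1", "2", "2"]}
)

# Hypothetical license rows; license "3" has never been inspected.
licenses = pd.DataFrame(
    {
        "license_id": ["1", "2", "3"],
        "license_code": ["1006", "1010", "4404"],
    }
)

# The inner join keeps only licenses that appear in at least one inspection.
matched = pd.merge(
    inspections,
    licenses,
    how="inner",
    left_on="license_",
    right_on="license_id",
)

# Unique license codes among the matched rows.
codes = matched["license_code"].dropna().unique()

# Keep only licenses whose code is in scope.
in_scope = licenses.loc[licenses["license_code"].isin(codes)]
print(sorted(codes))
print(in_scope)
```

Both key columns are strings in this sketch. If the two keys had different dtypes (for example `int` on one side and `str` on the other), the merge would either raise an error or match nothing, so it is worth checking `dtypes` before joining.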

**Data cleaning**

Apply some basic cleaning steps to the data.

In [9]:
business_license_tidy = (business_license_raw
    
    # Only keep in scope licenses
    .loc[business_license_raw["license_code"].isin(in_scope_license_codes)]

    # Filter on the relevant state and city only.
    .loc[business_license_raw["state"] == "IL"]
    .loc[business_license_raw["city"] == "CHICAGO"]

    # Convert conditional approval to a boolean value.
    .assign(conditional_approval=lambda x: x["conditional_approval"] == "Y")
    
    # Drop the "location" column; the same data is already stored in the "latitude"
    # and "longitude" columns.
    .drop(columns=["location"])

    # Reset the index.
    .reset_index(drop=True)
)

business_license_tidy

Unnamed: 0,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,ward,precinct,ward_precinct,police_district,license_code,license_description,business_activity_id,business_activity,license_number,application_type,application_created_date,application_requirements_complete,payment_date,conditional_approval,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude
0,2594643-20180417,2594643,425578,1,CHILDHOOD FRACTURED,CHILDHOOD FRACTURED,1432 W IRVING PARK RD 1ST,CHICAGO,IL,60613,47,24,47-24,19,1010,Limited Business License,602 | 709,Administrative Commercial Office | Miscellaneo...,2594643,ISSUE,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,False,2018-04-17T00:00:00.000,2021-07-15T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,AAI,,,41.954446304,-87.665535122
1,2594647-20180417,2594647,426826,1,"BOOST ON DAMEN, INC.",ABC CHOICE,1934 W 79TH ST 1ST,CHICAGO,IL,60620,17,41,17-41,6,1010,Limited Business License,922,Retail Sales of Cell Phones and Accessories,2594647,ISSUE,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,False,2018-04-17T00:00:00.000,2020-07-15T00:00:00.000,2018-04-17T00:00:00.000,2018-04-17T00:00:00.000,AAI,,,41.750352746,-87.672202458
2,2594647-20200516,2723053,426826,1,"BOOST ON DAMEN, INC.",ABC CHOICE,1934 W 79TH ST 1ST,CHICAGO,IL,60620,17,41,17-41,6,1010,Limited Business License,922,Retail Sales of Cell Phones and Accessories,2594647,RENEW,,2020-03-15T00:00:00.000,2020-06-11T00:00:00.000,False,2020-05-16T00:00:00.000,2022-05-15T00:00:00.000,2020-06-11T00:00:00.000,2020-06-12T00:00:00.000,AAI,,,41.750352746,-87.672202458
3,2594650-20181221,2594650,426833,1,"PIRMA AMERICA, INC.",PIRMA,6801 - 6805 S PULASKI RD,CHICAGO,IL,60629,23,44,23-44,8,1010,Limited Business License,911,Retail Sales of Clothing / Accessories / Shoes,2594650,ISSUE,2018-04-17T00:00:00.000,2018-12-21T00:00:00.000,2018-12-21T00:00:00.000,False,2018-12-21T00:00:00.000,2021-07-15T00:00:00.000,2018-12-21T00:00:00.000,2018-12-21T00:00:00.000,AAI,,3,41.769511072,-87.722380058
4,2594650-20210116,2761701,426833,1,"PIRMA AMERICA, INC.",PIRMA,6801 - 6805 S PULASKI RD,CHICAGO,IL,60629,23,44,23-44,8,1010,Limited Business License,911,Retail Sales of Clothing / Accessories / Shoes,2594650,RENEW,,2020-11-15T00:00:00.000,2021-01-12T00:00:00.000,False,2021-01-16T00:00:00.000,2023-01-15T00:00:00.000,2021-01-12T00:00:00.000,2021-01-13T00:00:00.000,AAI,,3,41.769511072,-87.722380058
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48517,32813-20120616,2148403,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2012-04-16T00:00:00.000,2012-04-23T00:00:00.000,False,2012-06-16T00:00:00.000,2014-06-15T00:00:00.000,2012-04-23T00:00:00.000,2013-05-30T00:00:00.000,AAI,,17,41.949706195,-87.658701413
48518,32813-20140616,2321713,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2014-04-15T00:00:00.000,2014-06-10T00:00:00.000,False,2014-06-16T00:00:00.000,2016-06-15T00:00:00.000,2014-06-10T00:00:00.000,2014-06-11T00:00:00.000,AAI,,17,41.949706195,-87.658701413
48519,32813-20160616,2459164,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2016-04-15T00:00:00.000,2016-06-07T00:00:00.000,False,2016-06-16T00:00:00.000,2018-06-15T00:00:00.000,2016-06-07T00:00:00.000,2016-06-13T00:00:00.000,AAI,,17,41.949706195,-87.658701413
48520,32813-20180616,2590563,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2018-04-15T00:00:00.000,2018-06-14T00:00:00.000,False,2018-06-16T00:00:00.000,2019-06-15T00:00:00.000,2018-06-14T00:00:00.000,2018-06-14T00:00:00.000,AAI,,17,41.949706195,-87.658701413


**Data validation**

Use pandera to validate the data and convert each column to the correct type.

In [10]:
business_license_schema = pa.DataFrameSchema({
    "id": pa.Column(str, coerce=True),
    "license_id": pa.Column(str, coerce=True, unique=True), # Primary Key
    "account_number": pa.Column(str, coerce=True),
    "site_number": pa.Column(str, coerce=True),
    "legal_name": pa.Column(str, coerce=True),
    "doing_business_as_name": pa.Column(str, coerce=True, nullable=True),
    "address": pa.Column(str, coerce=True),
    "city": pa.Column(str, coerce=True, nullable=True, checks=[
        pa.Check.eq("CHICAGO")
    ]),
    "state": pa.Column(str, coerce=True, nullable=True, checks=[
        pa.Check.eq("IL")
    ]),
    "zip_code": pa.Column(str, coerce=True, nullable=True, checks=[
        pa.Check(lambda x: x.str.match(r'^\d{5}$').all())
    ]),
    "ward": pa.Column(str, coerce=True, nullable=True),
    "precinct": pa.Column(str, coerce=True, nullable=True),
    "ward_precinct": pa.Column(str, coerce=True, nullable=True),
    "police_district": pa.Column(pa.Category, coerce=True, nullable=True),
    "license_code": pa.Column(pa.Category, coerce=True, checks=[
        pa.Check.isin(in_scope_license_codes)
    ]),
    "license_description": pa.Column(str, coerce=True),
    "business_activity_id": pa.Column(str, coerce=True, nullable=True),
    "business_activity": pa.Column(pa.Category, coerce=True, nullable=True),
    "license_number": pa.Column(str, coerce=True),
    "application_type": pa.Column(pa.Category, coerce=True),
    "application_created_date": pa.Column(str, coerce=True, nullable=True),
    "application_requirements_complete": pa.Column(pa.DateTime, coerce=True, nullable=True),
    "payment_date": pa.Column(pa.DateTime, coerce=True, nullable=True),
    "conditional_approval": pa.Column(bool, coerce=True),
    "license_start_date": pa.Column(pa.DateTime, coerce=True, nullable=True),
    "expiration_date": pa.Column(pa.DateTime, coerce=True, nullable=True),
    "license_approved_for_issuance": pa.Column(pa.DateTime, coerce=True, nullable=True),
    "date_issued": pa.Column(pa.DateTime, coerce=True),
    "license_status": pa.Column(pa.Category, coerce=True),
    "license_status_change_date": pa.Column(pa.DateTime, coerce=True, nullable=True),
    "ssa": pa.Column(str, coerce=True, nullable=True),
    "latitude": pa.Column(pa.Float, coerce=True, nullable=True, checks=[
        pa.Check.between(38, 44)
    ]),
    "longitude": pa.Column(pa.Float, coerce=True, nullable=True, checks=[
        pa.Check.between(-89, -84)
    ]),
})



business_license_validated = business_license_schema.validate(business_license_tidy)
business_license_validated

Unnamed: 0,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,ward,precinct,ward_precinct,police_district,license_code,license_description,business_activity_id,business_activity,license_number,application_type,application_created_date,application_requirements_complete,payment_date,conditional_approval,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude
0,2594643-20180417,2594643,425578,1,CHILDHOOD FRACTURED,CHILDHOOD FRACTURED,1432 W IRVING PARK RD 1ST,CHICAGO,IL,60613,47,24,47-24,19,1010,Limited Business License,602 | 709,Administrative Commercial Office | Miscellaneo...,2594643,ISSUE,2018-04-17T00:00:00.000,2018-04-17,2018-04-17,False,2018-04-17,2021-07-15,2018-04-17,2018-04-17,AAI,NaT,,41.954446,-87.665535
1,2594647-20180417,2594647,426826,1,"BOOST ON DAMEN, INC.",ABC CHOICE,1934 W 79TH ST 1ST,CHICAGO,IL,60620,17,41,17-41,6,1010,Limited Business License,922,Retail Sales of Cell Phones and Accessories,2594647,ISSUE,2018-04-17T00:00:00.000,2018-04-17,2018-04-17,False,2018-04-17,2020-07-15,2018-04-17,2018-04-17,AAI,NaT,,41.750353,-87.672202
2,2594647-20200516,2723053,426826,1,"BOOST ON DAMEN, INC.",ABC CHOICE,1934 W 79TH ST 1ST,CHICAGO,IL,60620,17,41,17-41,6,1010,Limited Business License,922,Retail Sales of Cell Phones and Accessories,2594647,RENEW,,2020-03-15,2020-06-11,False,2020-05-16,2022-05-15,2020-06-11,2020-06-12,AAI,NaT,,41.750353,-87.672202
3,2594650-20181221,2594650,426833,1,"PIRMA AMERICA, INC.",PIRMA,6801 - 6805 S PULASKI RD,CHICAGO,IL,60629,23,44,23-44,8,1010,Limited Business License,911,Retail Sales of Clothing / Accessories / Shoes,2594650,ISSUE,2018-04-17T00:00:00.000,2018-12-21,2018-12-21,False,2018-12-21,2021-07-15,2018-12-21,2018-12-21,AAI,NaT,3,41.769511,-87.722380
4,2594650-20210116,2761701,426833,1,"PIRMA AMERICA, INC.",PIRMA,6801 - 6805 S PULASKI RD,CHICAGO,IL,60629,23,44,23-44,8,1010,Limited Business License,911,Retail Sales of Clothing / Accessories / Shoes,2594650,RENEW,,2020-11-15,2021-01-12,False,2021-01-16,2023-01-15,2021-01-12,2021-01-13,AAI,NaT,3,41.769511,-87.722380
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48517,32813-20120616,2148403,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2012-04-16,2012-04-23,False,2012-06-16,2014-06-15,2012-04-23,2013-05-30,AAI,NaT,17,41.949706,-87.658701
48518,32813-20140616,2321713,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2014-04-15,2014-06-10,False,2014-06-16,2016-06-15,2014-06-10,2014-06-11,AAI,NaT,17,41.949706,-87.658701
48519,32813-20160616,2459164,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2016-04-15,2016-06-07,False,2016-06-16,2018-06-15,2016-06-07,2016-06-13,AAI,NaT,17,41.949706,-87.658701
48520,32813-20180616,2590563,14590,1,MATSUI-JONES CORPORATION,FULL SHILLING,3724-3726 N CLARK ST,CHICAGO,IL,60613,44,50,44-50,19,1006,Retail Food Establishment,775,Retail Sales of Perishable Foods,32813,RENEW,,2018-04-15,2018-06-14,False,2018-06-16,2019-06-15,2018-06-14,2018-06-14,AAI,NaT,17,41.949706,-87.658701


Insert the validated data into PostgreSQL.

In [11]:
# Insert the data into postgres. Inserting large amounts of data can be slow, so
# iterate over 10,000 rows at a time.

n_rows = business_license_validated.shape[0]
step_size = 10_000

for i in range(0, n_rows, step_size):
    index_start = i
    index_end = min(n_rows, i + step_size) - 1  # .loc slices include the end label

    if i == 0:
        if_exists = "replace"
    else:
        if_exists = "append"

    print(f"Inserting rows: {index_start:,} - {index_end:,}")

    business_license_validated \
        .loc[index_start:index_end, :] \
        .to_sql("business_license_validated", engine, if_exists=if_exists, index=False)

Inserting rows: 0 - 9,999
Inserting rows: 10,000 - 19,999
Inserting rows: 20,000 - 29,999
Inserting rows: 30,000 - 39,999
Inserting rows: 40,000 - 48,521


In [12]:
# Confirm number of rows
with engine.begin() as conn:
    query = text("SELECT COUNT(*) FROM business_license_validated")
    row_count = pd.read_sql_query(query, conn)

print(row_count)

   count
0  48522
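The batching pattern above can be sketched as a small reusable helper. This is an illustration under stated assumptions: `insert_in_chunks` is a hypothetical name, the toy frame is made up, and an in-memory SQLite database stands in for Postgres so the example runs without a server.

```python
import sqlite3

import pandas as pd


def insert_in_chunks(df, table_name, conn, step_size=10_000):
    """Write df to table_name in batches and return the number of batches."""
    n_rows = len(df)
    n_batches = 0
    for start in range(0, n_rows, step_size):
        # Positional, half-open slice: rows start .. stop - 1.
        stop = min(start + step_size, n_rows)
        df.iloc[start:stop].to_sql(
            table_name,
            conn,
            # Replace the table on the first batch, append afterwards.
            if_exists="replace" if start == 0 else "append",
            index=False,
        )
        n_batches += 1
    return n_batches


# In-memory SQLite connection as a stand-in for the Postgres engine.
conn = sqlite3.connect(":memory:")
toy = pd.DataFrame({"id": range(25), "name": [f"row-{i}" for i in range(25)]})

n_batches = insert_in_chunks(toy, "toy_table", conn, step_size=10)
row_count = pd.read_sql_query("SELECT COUNT(*) AS n FROM toy_table", conn)["n"].iloc[0]
print(n_batches, row_count)
conn.close()
```

Slicing by position with `iloc` avoids having to reason about inclusive label bounds. `to_sql` also accepts a `chunksize` argument, which may be a simpler option when per-batch progress messages are not needed.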


## Data set (2): Food inspections

<https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5>

In [13]:
food_inspection_raw

Unnamed: 0,inspection_id,dba_name,aka_name,license_,facility_type,risk,address,city,state,zip,inspection_date,inspection_type,results,violations,latitude,longitude,location
0,104236,TEMPO CAFE,TEMPO CAFE,80916,Restaurant,Risk 1 (High),6 E CHESTNUT ST,CHICAGO,IL,60611,2010-01-04T00:00:00.000,Canvass,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,41.89843137207629,-87.6280091630558,"(41.89843137207629, -87.6280091630558)"
1,67733,WOLCOTT'S,TROQUET,1992040,Restaurant,Risk 1 (High),1834 W MONTROSE AVE,CHICAGO,IL,60613,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.961605669949854,-87.67596676683779,"(41.961605669949854, -87.67596676683779)"
2,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948,Restaurant,Risk 1 (High),8750 W BRYN WAWR AVE,CHICAGO,IL,60631,2010-01-04T00:00:00.000,License,Fail,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,,,
3,52234,Cafe 608,Cafe 608,2013328,Restaurant,Risk 1 (High),608 W BARRY AVE,CHICAGO,IL,60657,2010-01-04T00:00:00.000,License Re-Inspection,Pass,,41.938006880423615,-87.6447545707008,"(41.938006880423615, -87.6447545707008)"
4,67757,DUNKIN DONUTS/BASKIN-ROBBINS,DUNKIN DONUTS/BASKIN-ROBBINS,1380279,Restaurant,Risk 2 (Medium),100 W RANDOLPH ST,CHICAGO,IL,60601,2010-01-04T00:00:00.000,Tag Removal,Pass,,41.88458626715456,-87.63101044588599,"(41.88458626715456, -87.63101044588599)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99995,1522545,PAKEEZA,PAKEEZA,1991022,Restaurant,Risk 1 (High),1009-1011 N ORLEANS ST,CHICAGO,IL,60610,2015-02-03T00:00:00.000,Complaint,Fail,8. SANITIZING RINSE FOR EQUIPMENT AND UTENSILS...,41.900827710854195,-87.63718892231834,"(41.900827710854195, -87.63718892231834)"
99996,1522547,MOUNT GREENWOOD ELEMENTARY,MOUNT GREENWOOD ELEMENTARY,24591,School,Risk 2 (Medium),10841 S Homan (3400W),CHICAGO,IL,60655,2015-02-03T00:00:00.000,Canvass,Pass w/ Conditions,21. * CERTIFIED FOOD MANAGER ON SITE WHEN POTE...,41.695748452263636,-87.70577383458786,"(41.695748452263636, -87.70577383458786)"
99997,1522593,NEW LITTLE CHINA,NEW LITTLE CHINA,1844410,Restaurant,Risk 1 (High),1737 E 95TH ST,CHICAGO,IL,60617,2015-02-03T00:00:00.000,Canvass,Pass,"30. FOOD IN ORIGINAL CONTAINER, PROPERLY LABEL...",41.72228503083544,-87.58107166595508,"(41.72228503083544, -87.58107166595508)"
99998,1522563,TACO JOE INC,TACO JOE INC,54004,Restaurant,Risk 1 (High),3458 W 111TH ST,CHICAGO,IL,60655,2015-02-03T00:00:00.000,Canvass,Out of Business,,41.691554773175255,-87.70810064840053,"(41.691554773175255, -87.70810064840053)"


**Data cleaning**

Apply some basic cleaning steps to the data.

In [14]:
food_inspection_tidy = (food_inspection_raw

    # Filter on the relevant state and city only.
    .loc[food_inspection_raw["state"] == "IL"]
    .loc[food_inspection_raw["city"] == "CHICAGO"]

    # Drop columns that also exist in the business license data.
    .drop(columns=["address", "city", "state", "latitude", "longitude", "location"])

    # Convert categorical columns to be all upper case for consistency
    .assign(
        dba_name=lambda x: x["dba_name"].str.upper(),
        aka_name=lambda x: x["aka_name"].str.upper(),
        facility_type=lambda x: x["facility_type"].str.upper(),
        risk=lambda x: x["risk"].str.upper(),
        inspection_type=lambda x: x["inspection_type"].str.upper(),
        results=lambda x: x["results"].str.upper(),
        violations=lambda x: x["violations"].str.upper(),
    )

    # Specify the order of categorical columns.
    .assign(risk=lambda x: x["risk"].astype("category").cat.set_categories(["ALL", "RISK 1 (HIGH)", "RISK 2 (MEDIUM)", "RISK 3 (LOW)"], ordered=True))

    # The "violations" column can hold multiple violations separated by " | ". E.g.
    # "32. FOOD AND NON-FOOD ... REPLACED. | 33. FOOD AND NON-FOOD CONTACT E"
    # To make the data easier to work with, split each violation into its own item.
    # The violations column then contains a list of strings per inspection.
    .assign(violations=lambda x: x["violations"].str.split(pat=r" \| "))

    # Reset the index.
    .reset_index(drop=True)
)

food_inspection_tidy

Unnamed: 0,inspection_id,dba_name,aka_name,license_,facility_type,risk,zip,inspection_date,inspection_type,results,violations
0,104236,TEMPO CAFE,TEMPO CAFE,80916,RESTAURANT,RISK 1 (HIGH),60611,2010-01-04T00:00:00.000,CANVASS,FAIL,[18. NO EVIDENCE OF RODENT OR INSECT OUTER OPE...
1,67733,WOLCOTT'S,TROQUET,1992040,RESTAURANT,RISK 1 (HIGH),60613,2010-01-04T00:00:00.000,LICENSE RE-INSPECTION,PASS,
2,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948,RESTAURANT,RISK 1 (HIGH),60631,2010-01-04T00:00:00.000,LICENSE,FAIL,[18. NO EVIDENCE OF RODENT OR INSECT OUTER OPE...
3,52234,CAFE 608,CAFE 608,2013328,RESTAURANT,RISK 1 (HIGH),60657,2010-01-04T00:00:00.000,LICENSE RE-INSPECTION,PASS,
4,67757,DUNKIN DONUTS/BASKIN-ROBBINS,DUNKIN DONUTS/BASKIN-ROBBINS,1380279,RESTAURANT,RISK 2 (MEDIUM),60601,2010-01-04T00:00:00.000,TAG REMOVAL,PASS,
...,...,...,...,...,...,...,...,...,...,...,...
99557,1522545,PAKEEZA,PAKEEZA,1991022,RESTAURANT,RISK 1 (HIGH),60610,2015-02-03T00:00:00.000,COMPLAINT,FAIL,[8. SANITIZING RINSE FOR EQUIPMENT AND UTENSIL...
99558,1522547,MOUNT GREENWOOD ELEMENTARY,MOUNT GREENWOOD ELEMENTARY,24591,SCHOOL,RISK 2 (MEDIUM),60655,2015-02-03T00:00:00.000,CANVASS,PASS W/ CONDITIONS,[21. * CERTIFIED FOOD MANAGER ON SITE WHEN POT...
99559,1522593,NEW LITTLE CHINA,NEW LITTLE CHINA,1844410,RESTAURANT,RISK 1 (HIGH),60617,2015-02-03T00:00:00.000,CANVASS,PASS,"[30. FOOD IN ORIGINAL CONTAINER, PROPERLY LABE..."
99560,1522563,TACO JOE INC,TACO JOE INC,54004,RESTAURANT,RISK 1 (HIGH),60655,2015-02-03T00:00:00.000,CANVASS,OUT OF BUSINESS,
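The effect of splitting the `violations` column can be seen on a toy frame. The rows below are made up, and the `explode` step is an optional follow-up that the notebook itself does not perform; `regex=False` assumes pandas 1.4 or later.

```python
import pandas as pd

# Made-up inspections: one with two violations, one with none.
toy = pd.DataFrame(
    {
        "inspection_id": ["1", "2"],
        "violations": [
            "32. FOOD AND NON-FOOD SURFACES. | 33. FOOD CONTACT EQUIPMENT.",
            None,
        ],
    }
)

# regex=False treats " | " as a literal separator rather than a pattern.
toy["violations"] = toy["violations"].str.split(" | ", regex=False)

# One row per violation; an inspection without violations keeps one missing row.
violations_long = toy.explode("violations").reset_index(drop=True)
print(violations_long)
```

A long format like this can be convenient for counting how often each violation occurs, while the list-per-row format keeps one row per inspection.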


**Data validation**

Use pandera to validate the data and convert each column to the correct type.

In [15]:
food_inspection_schema = pa.DataFrameSchema({
    "inspection_id": pa.Column(str, coerce=True, unique=True), # Primary Key
    "dba_name": pa.Column(str, coerce=True),
    "aka_name": pa.Column(str, coerce=True, nullable=True),
    "license_": pa.Column(str, coerce=True, nullable=True), # Foreign Key
    "facility_type": pa.Column(pa.Category, coerce=True, nullable=True),
    "risk": pa.Column(str, coerce=True, nullable=True, checks=[
        pa.Check.isin(["ALL", "RISK 1 (HIGH)", "RISK 2 (MEDIUM)", "RISK 3 (LOW)"])
    ]),
    "zip": pa.Column(str, coerce=True, nullable=True),
    "inspection_date": pa.Column(pa.DateTime, coerce=True),
    "inspection_type": pa.Column(pa.Category, coerce=True, nullable=True),
    "results": pa.Column(pa.Category, coerce=True),
    "violations": pa.Column(pa.Object, coerce=True, nullable=True)
})

food_inspection_validated = food_inspection_schema.validate(food_inspection_tidy)
food_inspection_validated

Unnamed: 0,inspection_id,dba_name,aka_name,license_,facility_type,risk,zip,inspection_date,inspection_type,results,violations
0,104236,TEMPO CAFE,TEMPO CAFE,80916,RESTAURANT,RISK 1 (HIGH),60611,2010-01-04,CANVASS,FAIL,[18. NO EVIDENCE OF RODENT OR INSECT OUTER OPE...
1,67733,WOLCOTT'S,TROQUET,1992040,RESTAURANT,RISK 1 (HIGH),60613,2010-01-04,LICENSE RE-INSPECTION,PASS,
2,67738,MICHAEL'S ON MAIN CAFE,MICHAEL'S ON MAIN CAFE,2008948,RESTAURANT,RISK 1 (HIGH),60631,2010-01-04,LICENSE,FAIL,[18. NO EVIDENCE OF RODENT OR INSECT OUTER OPE...
3,52234,CAFE 608,CAFE 608,2013328,RESTAURANT,RISK 1 (HIGH),60657,2010-01-04,LICENSE RE-INSPECTION,PASS,
4,67757,DUNKIN DONUTS/BASKIN-ROBBINS,DUNKIN DONUTS/BASKIN-ROBBINS,1380279,RESTAURANT,RISK 2 (MEDIUM),60601,2010-01-04,TAG REMOVAL,PASS,
...,...,...,...,...,...,...,...,...,...,...,...
99557,1522545,PAKEEZA,PAKEEZA,1991022,RESTAURANT,RISK 1 (HIGH),60610,2015-02-03,COMPLAINT,FAIL,[8. SANITIZING RINSE FOR EQUIPMENT AND UTENSIL...
99558,1522547,MOUNT GREENWOOD ELEMENTARY,MOUNT GREENWOOD ELEMENTARY,24591,SCHOOL,RISK 2 (MEDIUM),60655,2015-02-03,CANVASS,PASS W/ CONDITIONS,[21. * CERTIFIED FOOD MANAGER ON SITE WHEN POT...
99559,1522593,NEW LITTLE CHINA,NEW LITTLE CHINA,1844410,RESTAURANT,RISK 1 (HIGH),60617,2015-02-03,CANVASS,PASS,"[30. FOOD IN ORIGINAL CONTAINER, PROPERLY LABE..."
99560,1522563,TACO JOE INC,TACO JOE INC,54004,RESTAURANT,RISK 1 (HIGH),60655,2015-02-03,CANVASS,OUT OF BUSINESS,


Insert the validated data into PostgreSQL.

In [16]:

# Insert the data into postgres. Inserting large amounts of data can be slow, so
# iterate over 10,000 rows at a time.

n_rows = food_inspection_validated.shape[0]
step_size = 10_000

for i in range(0, n_rows, step_size):
    index_start = i
    index_end = min(n_rows, i + step_size) - 1  # .loc slices include the end label

    if i == 0:
        if_exists = "replace"
    else:
        if_exists = "append"

    print(f"Inserting rows: {index_start:,} - {index_end:,}")

    food_inspection_validated \
        .loc[index_start:index_end, :] \
        .to_sql("food_inspection_validated", engine, if_exists=if_exists, index=False)

Inserting rows: 0 - 9,999
Inserting rows: 10,000 - 19,999
Inserting rows: 20,000 - 29,999
Inserting rows: 30,000 - 39,999
Inserting rows: 40,000 - 49,999
Inserting rows: 50,000 - 59,999
Inserting rows: 60,000 - 69,999
Inserting rows: 70,000 - 79,999
Inserting rows: 80,000 - 89,999
Inserting rows: 90,000 - 99,561


In [17]:
# Confirm number of rows
with engine.begin() as conn:
    query = text("SELECT COUNT(*) FROM food_inspection_validated")
    row_count = pd.read_sql_query(query, conn)

print(row_count)

   count
0  99562


## Data set (3): Map Data

Generate map data for use in downstream applications.

In [18]:
business_license_validated.sort_values(by=["legal_name", "expiration_date"])

Unnamed: 0,id,license_id,account_number,site_number,legal_name,doing_business_as_name,address,city,state,zip_code,ward,precinct,ward_precinct,police_district,license_code,license_description,business_activity_id,business_activity,license_number,application_type,application_created_date,application_requirements_complete,payment_date,conditional_approval,license_start_date,expiration_date,license_approved_for_issuance,date_issued,license_status,license_status_change_date,ssa,latitude,longitude
36039,2895839-20230224,2895839,494470,1,"""ART ON YOU"" STUDIO INC.","""ART ON YOU"" STUDIO INC.",1838 N WESTERN AVE 102,CHICAGO,IL,60647,1,29,1-29,14,1010,Limited Business License,1112,Provide Photography or Videography Services,2895839,ISSUE,2023-02-24T00:00:00.000,2023-02-24,2023-02-24,False,2023-02-24,2025-03-15,2023-02-24,2023-02-24,AAI,NaT,33,41.915086,-87.687494
12546,2677346-20190621,2677346,457471,1,"""V"" STEAM LUXURY BAR LLC","""V"" STEAM LUXURY BAR",4016 S WESTERN AVE 1,CHICAGO,IL,60609,12,21,12-21,9,1010,Limited Business License,956 | 709 | 767 | 896,Provide Waxing Services | Miscellaneous Person...,2677346,ISSUE,2019-06-21T00:00:00.000,2019-06-21,2019-06-21,False,2019-06-21,2021-07-15,2019-06-21,2019-06-21,AAI,NaT,,41.820586,-87.684900
12547,2677346-20210716,2790866,457471,1,"""V"" STEAM LUXURY BAR LLC","""V"" STEAM LUXURY BAR",4016 S WESTERN AVE 1,CHICAGO,IL,60609,12,21,12-21,9,1010,Limited Business License,709 | 767 | 896 | 956,Miscellaneous Personal Services | Retail Sales...,2677346,RENEW,,2021-05-15,2021-07-15,False,2021-07-16,2023-07-15,2021-07-15,2022-05-02,AAI,NaT,,41.820586,-87.684900
14992,2695112-20191028,2695112,463564,1,"# 1 CHOP SUEY RESTAURANT, INC.","# 1 CHOP SUEY RESTAURANT, INC.",7342 S STONY ISLAND AVE 2,CHICAGO,IL,60649,7,1,7-1,3,1006,Retail Food Establishment,782,Sale of Food Prepared Onsite Without Dining Area,2695112,ISSUE,2019-10-11T00:00:00.000,2019-10-11,2019-10-11,False,2019-10-28,2021-11-15,2019-10-25,2019-10-28,AAI,NaT,42,41.761138,-87.586346
14993,2695112-20211116,2812216,463564,1,"# 1 CHOP SUEY RESTAURANT, INC.","# 1 CHOP SUEY RESTAURANT, INC.",7342 S STONY ISLAND AVE 2,CHICAGO,IL,60649,7,1,7-1,3,1006,Retail Food Establishment,782,Sale of Food Prepared Onsite Without Dining Area,2695112,RENEW,,2021-09-15,2022-05-06,False,2021-11-16,2023-11-15,2022-05-06,2022-05-09,AAI,NaT,42,41.761138,-87.586346
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35673,2891036-20230201,2891036,493821,1,"thyssenkrupp Supply Chain Services NA, Inc.","thyssenkrupp Supply Chain Services NA, Inc.",303 E WACKER DR 21 HIVE 7,CHICAGO,IL,60601,42,,42-,,1010,Limited Business License,602,Administrative Commercial Office,2891036,ISSUE,2023-01-31T00:00:00.000,2023-01-31,2023-02-01,False,2023-02-01,2025-02-15,2023-02-01,2023-02-01,AAI,NaT,,41.887705,-87.620588
18804,2728371-20200513,2728371,467899,1,two brothers pizza inc,Two brothers pizza inc./Domino's,3145 S ASHLAND AVE GROUND 9,CHICAGO,IL,60609,12,,12-,,1006,Retail Food Establishment,782,Sale of Food Prepared Onsite Without Dining Area,2728371,ISSUE,2020-04-30T00:00:00.000,2020-04-30,2020-04-30,False,2020-05-13,2022-05-15,2020-05-12,2020-05-13,AAI,NaT,13,41.836468,-87.665567
18805,2728371-20220516,2839711,467899,1,two brothers pizza inc,Two brothers pizza inc./Domino's,3145 S ASHLAND AVE GROUND 9,CHICAGO,IL,60609,12,,12-,,1006,Retail Food Establishment,782,Sale of Food Prepared Onsite Without Dining Area,2728371,RENEW,,2022-03-15,2022-09-20,False,2022-05-16,2024-05-15,2022-09-20,2022-09-21,AAI,NaT,13,41.836468,-87.665567
21402,2754011-20201013,2754011,470651,1,viridiana lucas,ventas viri,4100 S ASHLAND AVE OUT,CHICAGO,IL,60609,12,,12-,,1010,Limited Business License,904,Retail Sales of General Merchandise,2754011,ISSUE,2020-10-09T00:00:00.000,2020-10-09,2020-10-09,False,2020-10-13,2022-10-15,2020-10-09,2020-10-13,AAI,NaT,10,41.819441,-87.665376


The map data should have one row per geographical location. The business license data can have many rows for a single location, because one location can hold multiple licenses (for example, an original license and its renewals). To tidy the data, collapse it so that each location has exactly one row, and nest that location's license details in a new column, `license_data`.

In [19]:


# Columns that identify a location (one output row per unique combination).
location_cols = [
    "legal_name",
    "doing_business_as_name",
    "address",
    "zip_code",
    "latitude",
    "longitude",
]

# Columns that describe a single license at that location.
license_cols = [
    "license_id",
    "license_code",
    "license_description",
    "license_start_date",
    "expiration_date",
]

# Collapse the licenses so that each location has one row. Note that groupby
# drops rows with a missing value in any of the location columns by default.
map_data = (
    business_license_validated
    .loc[:, location_cols + license_cols]
    .drop_duplicates()
    .groupby(location_cols)[license_cols]
    # Turn each group's licenses into a list of dictionaries, one per license.
    .apply(lambda licenses: licenses.to_dict("records"))
    .rename("license_data")
    .reset_index()
)

map_data

Unnamed: 0,legal_name,doing_business_as_name,address,zip_code,latitude,longitude,license_data
0,"""ART ON YOU"" STUDIO INC.","""ART ON YOU"" STUDIO INC.",1838 N WESTERN AVE 102,60647,41.915086,-87.687494,"[{'license_id': '2895839', 'license_code': '10..."
1,"""V"" STEAM LUXURY BAR LLC","""V"" STEAM LUXURY BAR",4016 S WESTERN AVE 1,60609,41.820586,-87.684900,"[{'license_id': '2677346', 'license_code': '10..."
2,"# 1 CHOP SUEY RESTAURANT, INC.","# 1 CHOP SUEY RESTAURANT, INC.",7342 S STONY ISLAND AVE 2,60649,41.761138,-87.586346,"[{'license_id': '2695112', 'license_code': '10..."
3,#1 GYM,#1 GYM,3055 N SHEFFIELD AVE 1ST,60657,41.937930,-87.653893,"[{'license_id': '1205262', 'license_code': '10..."
4,#1 MARKET INC,#1 MARKET INC,2555 W 63RD ST,60629,41.779082,-87.688290,"[{'license_id': '2753756', 'license_code': '10..."
...,...,...,...,...,...,...,...
20148,taxline accounitng and tax professionals inc,taxline accounitng and tax professionals inc,4721 S CICERO AVE F,60632,41.806969,-87.742988,"[{'license_id': '2744414', 'license_code': '10..."
20149,"thyssenkrupp Supply Chain Services NA, Inc.","thyssenkrupp Supply Chain Services NA, Inc.",303 E WACKER DR 21 HIVE 7,60601,41.887705,-87.620588,"[{'license_id': '2891036', 'license_code': '10..."
20150,two brothers pizza inc,Two brothers pizza inc./Domino's,3145 S ASHLAND AVE GROUND 9,60609,41.836468,-87.665567,"[{'license_id': '2728371', 'license_code': '10..."
20151,viridiana lucas,ventas viri,4100 S ASHLAND AVE OUT,60609,41.819441,-87.665376,"[{'license_id': '2754011', 'license_code': '10..."


The license details of a specific row can be accessed as a list of dictionaries in the `license_data` column.

In [20]:
map_data.loc[2, :]

legal_name                                   # 1 CHOP SUEY RESTAURANT, INC.
doing_business_as_name                       # 1 CHOP SUEY RESTAURANT, INC.
address                                           7342 S STONY ISLAND AVE 2
zip_code                                                              60649
latitude                                                          41.761138
longitude                                                        -87.586346
license_data              [{'license_id': '2695112', 'license_code': '10...
Name: 2, dtype: object

In [21]:
map_data.loc[2, "license_data"]

[{'license_id': '2695112',
  'license_code': '1006',
  'license_description': 'Retail Food Establishment',
  'license_start_date': Timestamp('2019-10-28 00:00:00'),
  'expiration_date': Timestamp('2021-11-15 00:00:00')},
 {'license_id': '2812216',
  'license_code': '1006',
  'license_description': 'Retail Food Establishment',
  'license_start_date': Timestamp('2021-11-16 00:00:00'),
  'expiration_date': Timestamp('2023-11-15 00:00:00')}]
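
Because each `license_data` value is an ordinary Python list of dictionaries, it can be queried with plain Python. As a minimal sketch, the hypothetical helper `latest_license` (not part of this notebook) picks the license with the latest expiration date. The sample list copies the two licenses shown above, with `datetime` standing in for pandas `Timestamp` so that the example runs without pandas.

```python
from datetime import datetime


def latest_license(licenses):
    """Return the license dictionary with the latest expiration date."""
    return max(licenses, key=lambda lic: lic["expiration_date"])


# Values copied from the output above; datetime replaces pandas Timestamp.
license_data = [
    {
        "license_id": "2695112",
        "license_code": "1006",
        "license_description": "Retail Food Establishment",
        "license_start_date": datetime(2019, 10, 28),
        "expiration_date": datetime(2021, 11, 15),
    },
    {
        "license_id": "2812216",
        "license_code": "1006",
        "license_description": "Retail Food Establishment",
        "license_start_date": datetime(2021, 11, 16),
        "expiration_date": datetime(2023, 11, 15),
    },
]

current = latest_license(license_data)
print(current["license_id"], current["expiration_date"].date())
```

On the full table, the same function could be applied to every location with `map_data["license_data"].apply(latest_license)`, assuming no license is missing its expiration date.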

Save the map data as a pin on Connect for easy access by other applications.

In [24]:
# Set up the board
board = pins.board_connect(server_url=os.environ["CONNECT_SERVER"])
user_name = "sam.edwardes"

In [25]:
# Pin the data to Connect
pin_name = f"{user_name}/chicago-business-map-data"

board.pin_write(
    map_data, 
    name=pin_name, 
    type="arrow", 
    versioned=True,
    title="City of Chicago - Business Map Data"
)

Writing pin:
Name: 'sam.edwardes/chicago-business-map-data'
Version: 20230811T012059Z-97fa3


Meta(title='City of Chicago - Business Map Data', description=None, created='20230811T012059Z', pin_hash='97fa3888e87c66be', file='chicago-business-map-data.arrow', file_size=2559914, type='arrow', api_version=1, version=VersionRaw(version='5'), tags=None, name='sam.edwardes/chicago-business-map-data', user={}, local={})