# STA 141 Data & Web Technologies for Data Analysis


### Lecture 10, 02/05/26, APIs


### Today's topics

- Undocumented APIs

### Announcements

- Midterm next Thursday

### Resources
- [University of California Compensation](https://ucannualwage.ucop.edu/wage/)
- [Yolo County Health Inspections](https://inspections.myhealthdepartment.com/yolocountyeh/)


[OLD Health inspection](https://yoloeco.envisionconnect.com/)

### Recap: HTTP

A response to an HTTP request always includes a status code that summarizes whether the request was successful. Wikipedia has a full [list of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). Generally,

* 200-299: Your request succeeded.
* 300-399: You need to take further action to complete the request.
* 400-499: Your request wasn't valid (you made a mistake). You've probably seen 404 before!
* 500-599: Your request failed (the server made a mistake).

Some important status codes:
- 200: OK
- 403: Forbidden (access denied)
- 404: Not found
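
As a small illustration of the ranges above, the sketch below maps a status code to its broad category. The function name `describe_status` is made up for this example and is not part of `requests`; in practice, `response.raise_for_status()` already raises an error for 4xx and 5xx responses.

```python
def describe_status(code: int) -> str:
    """Map an HTTP status code to the broad category it belongs to."""
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "further action needed"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


# Print the category for a few common codes
for code in (200, 403, 404, 503):
    print(code, describe_status(code))
```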

#### Documented API:

In [107]:
import numpy as np
import pandas as pd
import time
import requests

You can look up your browser's User-Agent string here: https://www.whatismybrowser.com/detect/what-is-my-user-agent/

In [108]:
# Define the User-Agent header
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}

response = requests.get("https://itunes.apple.com/search", headers = headers, params = {
        "term": "swift", 
        "media": "music",
        "entity": "album",
        "attribute": "artistTerm", 
        "country": "US", 
        "limit": "1"
    })

response.raise_for_status()
result = response.json()

In [109]:
result

{'resultCount': 1,
 'results': [{'wrapperType': 'collection',
   'collectionType': 'Album',
   'artistId': 159260351,
   'collectionId': 1440933849,
   'amgArtistId': 816977,
   'artistName': 'Taylor Swift',
   'collectionName': 'reputation',
   'collectionCensoredName': 'reputation',
   'artistViewUrl': 'https://music.apple.com/us/artist/taylor-swift/159260351?uo=4',
   'collectionViewUrl': 'https://music.apple.com/us/album/reputation/1440933849?uo=4',
   'artworkUrl60': 'https://is1-ssl.mzstatic.com/image/thumb/Music221/v4/eb/e6/06/ebe606da-e00f-82d3-47f3-b79904eed541/17UM1IM24651.rgb.jpg/60x60bb.jpg',
   'artworkUrl100': 'https://is1-ssl.mzstatic.com/image/thumb/Music221/v4/eb/e6/06/ebe606da-e00f-82d3-47f3-b79904eed541/17UM1IM24651.rgb.jpg/100x100bb.jpg',
   'collectionPrice': 9.99,
   'collectionExplicitness': 'notExplicit',
   'trackCount': 15,
   'copyright': '℗ 2017 Taylor Swift',
   'country': 'USA',
   'currency': 'USD',
   'releaseDate': '2017-11-10T08:00:00Z',
   'primaryGen

### Undocumented Web APIs

Many websites use undocumented web APIs to get data. For example:

 - [University of California Compensation](https://ucannualwage.ucop.edu/wage/)
 - [Yolo County Health Inspections](https://inspections.myhealthdepartment.com/yolocountyeh/)

You can identify these websites by looking at the requests in your browser's developer tools. In Firefox and Chrome, open them with <kbd>Ctrl</kbd> + <kbd>Shift</kbd> + <kbd>i</kbd> on Windows or <kbd>&#8984;</kbd> + <kbd>&#8997;</kbd> + <kbd>i</kbd> on MacOS.

Requests to web APIs almost always return JSON or XML data. By examining the browser requests, you can work out the endpoints and parameters, allowing you to use the API.

**CAUTION:** Web APIs that are undocumented are often undocumented for a reason. Using an undocumented API may make someone angry or get you into legal trouble! Government and quasi-government websites (like the examples above) are probably okay, as long as you cache and rate-limit your requests. For everything else, look for an alternative or get permission first.
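
To make "cache and rate-limit" concrete, here is a minimal sketch using only the standard library. The class `PoliteFetcher` and the function `fake_fetch` are hypothetical names for this illustration; the notebook itself relies on `requests_cache` and `time.sleep` for the same purpose.

```python
import time


class PoliteFetcher:
    """Wrap a fetch function with an in-memory cache and a minimum delay."""

    def __init__(self, fetch, min_interval=1.0):
        self.fetch = fetch                # hypothetical function: key -> data
        self.min_interval = min_interval  # seconds between real requests
        self.cache = {}
        self.last_call = None

    def get(self, key):
        if key in self.cache:             # cached: no request, no waiting
            return self.cache[key]
        if self.last_call is not None:
            wait = self.min_interval - (time.monotonic() - self.last_call)
            if wait > 0:
                time.sleep(wait)
        self.last_call = time.monotonic()
        self.cache[key] = self.fetch(key)
        return self.cache[key]


calls = []


def fake_fetch(key):
    """Stand-in for a real HTTP request; records every call."""
    calls.append(key)
    return {"key": key}


fetcher = PoliteFetcher(fake_fetch, min_interval=0.1)
fetcher.get("page-1")
fetcher.get("page-1")   # served from the cache, no second request
fetcher.get("page-2")   # waits about 0.1 s before the next real request
print(calls)
```

Only two real calls are made for three lookups, which is the behavior you want when re-running a notebook many times against the same server.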

Let's reverse engineer the website so that we can get the data.

### UC WAGE

Okay, so how do we find an undocumented API?

A __step by step__ guide:
- Open the [website](https://ucannualwage.ucop.edu/wage/).
- Inspect the page, either via `right-click + Inspect` or with `cmd + option + i`.
- Click on the `Network` tab.
- Interact with the page to trigger the API request (e.g. click the Search button).
- (Optional): Filter the URLs.
- Look for a POST/GET request whose response is, ideally, in `json`.
- Click on the element.
- On the right-hand side: select the field `response` to see what is returned.
- On the right-hand side: select the field `request` to see how to access the data.
- Add these parameters to your request in Python. (Hint: right-click, Copy Value, Copy POST DATA, and paste them into Python!)

{"op":"search",
 "page":1,
 "rows":20,
 "sidx":"firstname",
 "sord":"asc",
 "count":0,
 "year":"2024",
 "firstname":"",
 "location":"ALL",
 "lastname":"",
 "title":"Prof",
 "startSal":"",
 "endSal":""}

In [None]:
{"op":"search","page":1,"rows":60,"sidx":"lastname","sord":"asc","count":0,"year":"2023","firstname":"","location":"ALL","lastname":"","title":"Prof","startSal":"","endSal":""}

In [None]:
https://ucannualwage.ucop.edu/wage/search
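
The POST data copied from the browser is JSON text. A minimal sketch of turning it into a Python dictionary and adjusting a few parameters before sending it; the variable names `copied` and `payload` are chosen for this example only.

```python
import json

# Payload as copied from the browser's Network tab (a JSON string)
copied = (
    '{"op":"search","page":1,"rows":20,"sidx":"firstname","sord":"asc",'
    '"count":0,"year":"2024","firstname":"","location":"ALL",'
    '"lastname":"","title":"Prof","startSal":"","endSal":""}'
)

payload = json.loads(copied)    # JSON text -> Python dict
payload["rows"] = 60            # ask for more rows per page
payload["location"] = "Davis"   # restrict the search to one campus

print(json.dumps(payload, indent=2))
```

A dictionary like this can be passed to the `json=` argument of `requests.post`, which serializes it back to JSON for the request body.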

In [110]:
import requests
import requests_cache

In [111]:
session = requests_cache.CachedSession('../output/uc_salary')

In [112]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}

json_data = {
    "op":"search",
    "page":1,
    "rows":60,
    "sidx":"firstname",
    "sord":"asc",
     "count":0,
     "year":"2024",
     "firstname":"",
     "location":"ALL",
     "lastname":"",
     "title":"Prof",
     "startSal":"",
     "endSal":""
}

response = session.post('https://ucannualwage.ucop.edu/wage/search', headers=headers, json=json_data)

In [113]:
response.raise_for_status()

In [114]:
response.json()

{'op': 'search',
 'records': 26088,
 'page': 1,
 'total': 435,
 'pageSize': 60,
 'sidx': 'firstname',
 'sord': 'asc',
 'rows': [{'id': 1,
   'year': '2024',
   'location': 'Berkeley',
   'firstname': '*****',
   'lastname': '*****',
   'title': 'ASST PROF-AY',
   'grosspay': '86,266.00',
   'basepay': '55,542.00',
   'overtimepay': '0.00',
   'adjustpay': '30,724.00'},
  {'id': 2,
   'year': '2024',
   'location': 'Berkeley',
   'firstname': '*****',
   'lastname': '*****',
   'title': 'ASST PROF-AY',
   'grosspay': '196,237.00',
   'basepay': '147,167.00',
   'overtimepay': '0.00',
   'adjustpay': '49,070.00'},
  {'id': 3,
   'year': '2024',
   'location': 'Berkeley',
   'firstname': '*****',
   'lastname': '*****',
   'title': 'ATH PROFL 1 BYA',
   'grosspay': '1,710.00',
   'basepay': '0.00',
   'overtimepay': '0.00',
   'adjustpay': '1,710.00'},
  {'id': 4,
   'year': '2024',
   'location': 'Berkeley',
   'firstname': '*****',
   'lastname': '*****',
   'title': 'ATH PROFL 1 BYA',


In [115]:
def get_page(page_nr, location='Davis'):
    """Return one page of search results (a list of rows), or [] on failure."""
    json_data = {
        "op":"search",
        "page":page_nr,
        "rows":60,
        "sidx":"grosspay",
        "sord":"desc",
        "count":1200, # limit to 20 pages
        "year":"2024",
        "firstname":"",
        "location":location,
        "lastname":"",
        "title":"PROF-AY",
        "startSal":"",
        "endSal":""
    }

    response = session.post('https://ucannualwage.ucop.edu/wage/search', headers=headers, json=json_data)
    try:
        response.raise_for_status()
        return response.json()['rows']
    except (requests.RequestException, ValueError, KeyError):
        # Failed request or unexpected response body: treat as an empty page
        return []

In [116]:
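data = []
# NOTE: page numbers seem to start at 1. Page 0 returns the same rows as page 1,
# which is why duplicated rows show up in the sorted output further down.
# Use range(1, 21) to avoid fetching the first page twice.
for p in range(20):
    time.sleep(1)  # rate-limit: at most one request per second
    tmp = get_page(p)
    if len(tmp):
        data += tmp
    else:
        break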
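
The same pattern can be written as a reusable helper. This is a sketch under the assumption that pages are numbered from 1 and that an empty page marks the end; `fetch_all_pages` and `fake_fetch_page` are hypothetical names, and a real version would also sleep between requests.

```python
def fetch_all_pages(fetch_page, first_page=1, max_pages=20):
    """Collect rows page by page until a page comes back empty."""
    rows = []
    for page in range(first_page, first_page + max_pages):
        batch = fetch_page(page)
        if not batch:        # an empty page means we ran past the last one
            break
        rows.extend(batch)
    return rows


# Stand-in for a real API call: three pages of fake rows, numbered from 1
FAKE_PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}


def fake_fetch_page(page):
    return FAKE_PAGES.get(page, [])


print(fetch_all_pages(fake_fetch_page))
```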

In [117]:
data

[{'id': 1,
  'year': '2024',
  'location': 'Davis',
  'firstname': 'HEMANT',
  'lastname': 'BHARGAVA',
  'title': 'PROF-AY-B/E/E',
  'grosspay': '487,517.00',
  'basepay': '399,383.00',
  'overtimepay': '0.00',
  'adjustpay': '88,133.00'},
 {'id': 2,
  'year': '2024',
  'location': 'Davis',
  'firstname': 'AYAKO',
  'lastname': 'YASUDA',
  'title': 'PROF-AY-B/E/E',
  'grosspay': '448,884.00',
  'basepay': '364,158.00',
  'overtimepay': '0.00',
  'adjustpay': '84,726.00'},
 {'id': 3,
  'year': '2024',
  'location': 'Davis',
  'firstname': 'LORI',
  'lastname': 'LUBIN',
  'title': 'PROF-AY',
  'grosspay': '447,178.00',
  'basepay': '284,583.00',
  'overtimepay': '0.00',
  'adjustpay': '162,594.00'},
 {'id': 4,
  'year': '2024',
  'location': 'Davis',
  'firstname': 'HANS-GEORG',
  'lastname': 'MUELLER',
  'title': 'PROF-AY',
  'grosspay': '440,583.00',
  'basepay': '339,567.00',
  'overtimepay': '0.00',
  'adjustpay': '101,016.00'},
 {'id': 5,
  'year': '2024',
  'location': 'Davis',
  '

In [118]:
df = pd.DataFrame(data)

In [119]:
df.head()

| | id | year | location | firstname | lastname | title | grosspay | basepay | overtimepay | adjustpay |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2024 | Davis | HEMANT | BHARGAVA | PROF-AY-B/E/E | 487517.0 | 399383.0 | 0.0 | 88133.0 |
| 1 | 2 | 2024 | Davis | AYAKO | YASUDA | PROF-AY-B/E/E | 448884.0 | 364158.0 | 0.0 | 84726.0 |
| 2 | 3 | 2024 | Davis | LORI | LUBIN | PROF-AY | 447178.0 | 284583.0 | 0.0 | 162594.0 |
| 3 | 4 | 2024 | Davis | HANS-GEORG | MUELLER | PROF-AY | 440583.0 | 339567.0 | 0.0 | 101016.0 |
| 4 | 5 | 2024 | Davis | WOLF | HEYER | PROF-AY | 437168.0 | 320933.0 | 0.0 | 116234.0 |


In [14]:
# df = pd.DataFrame(response.json()['rows'])

In [15]:
# df['grosspay'] = df['grosspay'].str.replace(',', '').astype(float)

In [120]:
def change_format(number_str):
    return number_str.str.replace(',', '').astype(float)

In [121]:
df[['grosspay', 'basepay', 'adjustpay']] = df[['grosspay', 'basepay', 'adjustpay']].apply(change_format)
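
The conversion is needed because a string like `'487,517.00'` cannot be passed to `float` directly: the thousands separator raises a `ValueError`. A plain-Python sketch of the same idea, where `parse_money` is a hypothetical helper name:

```python
def parse_money(text: str) -> float:
    """Convert a string such as '487,517.00' to the float 487517.0."""
    return float(text.replace(",", ""))


# One row of pay fields as returned by the API (values are strings)
row = {
    "grosspay": "487,517.00",
    "basepay": "399,383.00",
    "overtimepay": "0.00",
    "adjustpay": "88,133.00",
}

numeric = {key: parse_money(value) for key, value in row.items()}
print(numeric)
```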

In [122]:
df.head()

| | id | year | location | firstname | lastname | title | grosspay | basepay | overtimepay | adjustpay |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2024 | Davis | HEMANT | BHARGAVA | PROF-AY-B/E/E | 487517.0 | 399383.0 | 0.0 | 88133.0 |
| 1 | 2 | 2024 | Davis | AYAKO | YASUDA | PROF-AY-B/E/E | 448884.0 | 364158.0 | 0.0 | 84726.0 |
| 2 | 3 | 2024 | Davis | LORI | LUBIN | PROF-AY | 447178.0 | 284583.0 | 0.0 | 162594.0 |
| 3 | 4 | 2024 | Davis | HANS-GEORG | MUELLER | PROF-AY | 440583.0 | 339567.0 | 0.0 | 101016.0 |
| 4 | 5 | 2024 | Davis | WOLF | HEYER | PROF-AY | 437168.0 | 320933.0 | 0.0 | 116234.0 |


In [19]:
df.sort_values(by=['basepay'], ascending=False)

| | id | year | location | firstname | lastname | title | grosspay | basepay | overtimepay | adjustpay |
|---|---|---|---|---|---|---|---|---|---|---|
| 65 | 6 | 2024 | Davis | GEORGE | MANGUN | PROF-AY | 433344.0 | 422233.0 | 0.00 | 11111.0 |
| 5 | 6 | 2024 | Davis | GEORGE | MANGUN | PROF-AY | 433344.0 | 422233.0 | 0.00 | 11111.0 |
| 0 | 1 | 2024 | Davis | HEMANT | BHARGAVA | PROF-AY-B/E/E | 487517.0 | 399383.0 | 0.00 | 88133.0 |
| 60 | 1 | 2024 | Davis | HEMANT | BHARGAVA | PROF-AY-B/E/E | 487517.0 | 399383.0 | 0.00 | 88133.0 |
| 71 | 12 | 2024 | Davis | VIKRAM | AMAR | PROF-AY-LAW | 410383.0 | 385383.0 | 0.00 | 25000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1115 | 1056 | 2024 | Davis | LAURA | CAMMARISANO | ASST PROF-AY | 9217.0 | 9217.0 | 0.00 | 0.0 |
| 1116 | 1057 | 2024 | Davis | LIANG | CHEN | ASST PROF-AY | 9217.0 | 9217.0 | 0.00 | 0.0 |
| 1117 | 1058 | 2024 | Davis | GEMMA | ZHAO | ASST PROF-AY | 9217.0 | 9217.0 | 0.00 | 0.0 |
| 1118 | 1059 | 2024 | Davis | CAITLIN | PATLER | ASSOC ADJ PROF-AY | 5000.0 | 0.0 | 0.00 | 5000.0 |


In [123]:
df.shape

(1120, 10)

In [124]:
def get_Salinfo(startSal, endSal, location):
    """Return the number of records in the salary range, or None on failure."""
    time.sleep(0.2)  # rate-limit the requests
    json_data = {
        "op":"search",
        "page":1,
        "rows":60,
        "sidx":"grosspay",
        "sord":"desc",
        "count":0,
        "year":"2024",
        "firstname":"",
        "location":location,
        "lastname":"",
        "title":"",  # no title filter: count all employees
        "startSal":startSal,
        "endSal":endSal
    }

    response = session.post('https://ucannualwage.ucop.edu/wage/search', headers=headers, json=json_data)
    try:
        response.raise_for_status()
        # Only the total number of matching records is needed, not the rows
        return response.json()['records']
    except (requests.RequestException, ValueError, KeyError):
        return None

In [125]:
get_Salinfo(0, 100_000, 'Davis')

38558

In [126]:
l1 = list(range(0, 400_000, 50_000))
l2 = list(range(400_000, 1_000_000, 100_000))
l1+l2

[0,
 50000,
 100000,
 150000,
 200000,
 250000,
 300000,
 350000,
 400000,
 500000,
 600000,
 700000,
 800000,
 900000]

In [127]:
list(zip(l1+l2, l1[1:]+l2+[4_000_000]))

[(0, 50000),
 (50000, 100000),
 (100000, 150000),
 (150000, 200000),
 (200000, 250000),
 (250000, 300000),
 (300000, 350000),
 (350000, 400000),
 (400000, 500000),
 (500000, 600000),
 (600000, 700000),
 (700000, 800000),
 (800000, 900000),
 (900000, 4000000)]
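
The pairing of lower and upper bounds can be wrapped in a small helper. This is a sketch with a hypothetical function name `make_bins`; it produces the same pairs as the `zip` expression above.

```python
def make_bins(edges, top):
    """Pair each edge with the next one; the last bin ends at `top`."""
    return list(zip(edges, edges[1:] + [top]))


# Steps of 50,000 up to 400,000, then steps of 100,000 up to 1,000,000
edges = list(range(0, 400_000, 50_000)) + list(range(400_000, 1_000_000, 100_000))
bins = make_bins(edges, top=4_000_000)

print(bins[0], bins[-1], len(bins))
```

Whether the API treats `startSal` and `endSal` as inclusive is not visible from the request alone, so a salary that falls exactly on an edge might be counted in two neighbouring bins.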

In [128]:
new_data = {l: [get_Salinfo(s1, s2, l) 
                for s1, s2 in zip(l1+l2, l1[1:]+l2+[4_000_000])]
            for l in ['Davis', 'Berkeley', 'Irvine', 'Los Angeles', 'Merced', 'Riverside', 'San Diego', 'San Francisco']}

In [129]:
new_data

{'Davis': [24543,
  14015,
  5314,
  4242,
  1815,
  691,
  343,
  204,
  253,
  116,
  40,
  25,
  13,
  15],
 'Berkeley': [25091,
  6104,
  3094,
  1377,
  576,
  326,
  193,
  127,
  131,
  44,
  12,
  2,
  1,
  4],
 'Irvine': [19680,
  9773,
  4543,
  2208,
  869,
  376,
  264,
  168,
  198,
  118,
  59,
  32,
  13,
  47],
 'Los Angeles': [35109,
  22281,
  8359,
  4355,
  2046,
  1059,
  664,
  482,
  613,
  365,
  223,
  121,
  61,
  129],
 'Merced': [4441, 974, 406, 165, 72, 25, 10, 11, 2, 0, 1, 0, 0, 0],
 'Riverside': [10273, 2445, 1054, 409, 207, 101, 61, 29, 22, 3, 1, 1, 0, 1],
 'San Diego': [27577,
  15008,
  6569,
  3434,
  1248,
  642,
  472,
  306,
  376,
  189,
  68,
  47,
  31,
  63],
 'San Francisco': [10516,
  12705,
  6987,
  4184,
  3818,
  1721,
  734,
  423,
  473,
  277,
  165,
  58,
  38,
  54]}

In [28]:
import itertools

In [130]:
df_sal = pd.DataFrame(new_data, index = itertools.chain(range(0, 400_000, 50_000), range(400_000, 1_000_000, 100_000)))

In [131]:
df_sal

| | Davis | Berkeley | Irvine | Los Angeles | Merced | Riverside | San Diego | San Francisco |
|---|---|---|---|---|---|---|---|---|
| 0 | 24543 | 25091 | 19680 | 35109 | 4441 | 10273 | 27577 | 10516 |
| 50000 | 14015 | 6104 | 9773 | 22281 | 974 | 2445 | 15008 | 12705 |
| 100000 | 5314 | 3094 | 4543 | 8359 | 406 | 1054 | 6569 | 6987 |
| 150000 | 4242 | 1377 | 2208 | 4355 | 165 | 409 | 3434 | 4184 |
| 200000 | 1815 | 576 | 869 | 2046 | 72 | 207 | 1248 | 3818 |
| 250000 | 691 | 326 | 376 | 1059 | 25 | 101 | 642 | 1721 |
| 300000 | 343 | 193 | 264 | 664 | 10 | 61 | 472 | 734 |
| 350000 | 204 | 127 | 168 | 482 | 11 | 29 | 306 | 423 |
| 400000 | 253 | 131 | 198 | 613 | 2 | 22 | 376 | 473 |
| 500000 | 116 | 44 | 118 | 365 | 0 | 3 | 189 | 277 |


In [132]:
df_sal.loc['SUM',:] = df_sal.sum()

In [133]:
df_sal

| | Davis | Berkeley | Irvine | Los Angeles | Merced | Riverside | San Diego | San Francisco |
|---|---|---|---|---|---|---|---|---|
| 0 | 24543.0 | 25091.0 | 19680.0 | 35109.0 | 4441.0 | 10273.0 | 27577.0 | 10516.0 |
| 50000 | 14015.0 | 6104.0 | 9773.0 | 22281.0 | 974.0 | 2445.0 | 15008.0 | 12705.0 |
| 100000 | 5314.0 | 3094.0 | 4543.0 | 8359.0 | 406.0 | 1054.0 | 6569.0 | 6987.0 |
| 150000 | 4242.0 | 1377.0 | 2208.0 | 4355.0 | 165.0 | 409.0 | 3434.0 | 4184.0 |
| 200000 | 1815.0 | 576.0 | 869.0 | 2046.0 | 72.0 | 207.0 | 1248.0 | 3818.0 |
| 250000 | 691.0 | 326.0 | 376.0 | 1059.0 | 25.0 | 101.0 | 642.0 | 1721.0 |
| 300000 | 343.0 | 193.0 | 264.0 | 664.0 | 10.0 | 61.0 | 472.0 | 734.0 |
| 350000 | 204.0 | 127.0 | 168.0 | 482.0 | 11.0 | 29.0 | 306.0 | 423.0 |
| 400000 | 253.0 | 131.0 | 198.0 | 613.0 | 2.0 | 22.0 | 376.0 | 473.0 |
| 500000 | 116.0 | 44.0 | 118.0 | 365.0 | 0.0 | 3.0 | 189.0 | 277.0 |


In [134]:
df_rel = df_sal.apply(lambda x: round(100*x/x['SUM'],1))

In [135]:
df_rel

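| | Davis | Berkeley | Irvine | Los Angeles | Merced | Riverside | San Diego | San Francisco |
|---|---|---|---|---|---|---|---|---|
| 0 | 47.5 | 67.7 | 51.3 | 46.3 | 72.7 | 70.3 | 49.2 | 24.9 |
| 50000 | 27.1 | 16.5 | 25.5 | 29.4 | 15.9 | 16.7 | 26.8 | 30.1 |
| 100000 | 10.3 | 8.3 | 11.8 | 11.0 | 6.6 | 7.2 | 11.7 | 16.6 |
| 150000 | 8.2 | 3.7 | 5.8 | 5.7 | 2.7 | 2.8 | 6.1 | 9.9 |
| 200000 | 3.5 | 1.6 | 2.3 | 2.7 | 1.2 | 1.4 | 2.2 | 9.1 |
| 250000 | 1.3 | 0.9 | 1.0 | 1.4 | 0.4 | 0.7 | 1.1 | 4.1 |
| 300000 | 0.7 | 0.5 | 0.7 | 0.9 | 0.2 | 0.4 | 0.8 | 1.7 |
| 350000 | 0.4 | 0.3 | 0.4 | 0.6 | 0.2 | 0.2 | 0.5 | 1.0 |
| 400000 | 0.5 | 0.4 | 0.5 | 0.8 | 0.0 | 0.2 | 0.7 | 1.1 |
| 500000 | 0.2 | 0.1 | 0.3 | 0.5 | 0.0 | 0.0 | 0.3 | 0.7 |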
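
To see what the `apply` call computes for one column, here is the same arithmetic in plain Python for the Merced counts listed earlier. The helper name `to_percent` is made up for this sketch.

```python
def to_percent(counts):
    """Express each count as a percentage of the column total, 1 decimal."""
    total = sum(counts)
    if total == 0:           # avoid dividing by zero for an empty column
        return [0.0 for _ in counts]
    return [round(100 * c / total, 1) for c in counts]


# Merced counts per salary bin, taken from the output above
merced_counts = [4441, 974, 406, 165, 72, 25, 10, 11, 2, 0, 1, 0, 0, 0]
merced_share = to_percent(merced_counts)

print(sum(merced_counts))
print(merced_share[:3])
```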


### HEALTH INSPECTIONS

In [210]:
!pip install requests_cache



Check the [docs](https://requests.readthedocs.io/en/latest/api/?highlight=post#requests.post) for `requests`!

#### GET RESTAURANT VISITS BY NAME

No API is involved here: the search term is simply added to the URL as a query parameter, and the server returns an HTML page.

https://inspections.myhealthdepartment.com/yolocountyeh/search?searchStr=Manna
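
A short sketch of how such a URL is assembled from a base URL and a dictionary of parameters, using only the standard library. The second search term is an invented example to show percent-encoding.

```python
from urllib.parse import urlencode

base_url = "https://inspections.myhealthdepartment.com/yolocountyeh/search"
params = {"searchStr": "Manna"}

# Join the base URL and the encoded parameters with "?"
url = base_url + "?" + urlencode(params)
print(url)

# Spaces and special characters are encoded automatically
print(urlencode({"searchStr": "Pizza & Pasta"}))
```

Passing a dictionary to the `params=` argument of `requests.get` takes care of this encoding, so you rarely need to build the query string by hand.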

In [211]:
import requests_cache
requests_cache.install_cache("../output/lecture10a")

In [212]:
url = 'https://inspections.myhealthdepartment.com/yolocountyeh/search'

In [213]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
result = requests.get(url, headers = headers, params = {
    'searchStr': "Manna"
})
result.raise_for_status()

In [214]:
result.url

'https://inspections.myhealthdepartment.com/yolocountyeh/search?searchStr=Manna'

In [215]:
result.content

b'<!doctype html>\n<html lang="en" data-wf-page="5c477d7234329c8ae83693a0" data-wf-site="5c477d7234329c39c036939f">\n<head>\n\t<meta charset="utf-8">\n\t<title>Search | Yolo County California | My Health Department</title>\n\t<meta content="width=device-width, initial-scale=1" name="viewport">\n\t<meta content="Webflow" name="generator">\n\t<!-- Sentry Integration -->\n\t<script\n\t\tsrc="https://d1x0oaju7ljwc0.cloudfront.net/sentry/bundle.tracing.replay.feedback.min.js"\n\t\tintegrity="sha384-Yy2UXIFrWRfe56w1BuJ8/pgltHwWyYP4Q7dYKueJ/c6RG8B/bPJmGv+TBTQSuTSv"\n\t\tcrossorigin="anonymous"\n\t></script>\n\t<script src="/js/sentry.js"></script>\n\t<!-- include Flatpickr JS and CSS for datepicker-->\n\t<link href="css/vendor/flatpickr/flatpickr.css" rel="stylesheet" type="text/css">\n\t<script src="js/vendor/flatpickr/flatpickr.js" type="text/javascript"></script>\n\t<link href="css/myhd-full.min.css" rel="stylesheet" type="text/css">\n\t<script src="https://use.typekit.net/ksy3qdn.js" type

#### GET ALL INSPECTIONS OF THIS MONTH

In this case, there is an undocumented API in the background!

Okay, but how do we find an undocumented API?

A __step by step__ guide:
- Open the [website](https://inspections.myhealthdepartment.com/yolocountyeh/).
- Inspect the page, either via `right-click + Inspect` or with `cmd + option + i`.
- Click on the `Network` tab.
- Interact with the page to trigger the API request (e.g. change the date).
- (Optional): Filter the URLs.
- Look for a POST/GET request whose response is, ideally, in `json`.
- Click on the element.
- On the right-hand side: select the field `response` to see what is returned.
- On the right-hand side: select the field `request` to see how to access it.
- Add these parameters to your request in Python.

In [216]:
{"data":{"path":"yolocountyeh","programName":"","filters":{"date":"2026-01-01 to 2026-02-05","purpose":"Routine"},"start":0,"count":20,"searchStr":"","lat":0,"lng":0,"sort":{}},"task":"searchInspections"}

{'data': {'path': 'yolocountyeh',
  'programName': '',
  'filters': {'date': '2026-01-01 to 2026-02-05', 'purpose': 'Routine'},
  'start': 0,
  'count': 20,
  'searchStr': '',
  'lat': 0,
  'lng': 0,
  'sort': {}},
 'task': 'searchInspections'}

In [217]:
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:147.0) Gecko/20100101 Firefox/147.0'
}

json_data = {
    'data': {
        'path': 'yolocountyeh',
        'programName': '',
        'filters': {
            'date': '2026-01-01 to 2026-02-05',
            'purpose': 'Routine',
        },
        'start': 0,
        'count': 20,
        'searchStr': '',
        'lat': 0,
        'lng': 0,
        'sort': {},
    },
    'task': 'searchInspections',
}

response = requests.post('https://inspections.myhealthdepartment.com/', 
                         headers=headers, json=json_data)

In [218]:
response.raise_for_status()

In [219]:
response.json()

[{'nick': 'ycc',
  'inspectionID': 'F97D72A2-3C4A-4CDF-A1F6-38534A68C852',
  'inspectionDate': '2026-02-03T00:00:00.000Z',
  'score': None,
  'inspectionType': 'Rec Health Routine Inspection',
  'comments': 'No violation observed at the time of inspection.',
  'StartTime': '1899-12-30T14:00:00.000Z',
  'establishmentName': 'LA SALLE APTS',
  'addressLine1': '880 Alvarado Ave ',
  'addressLine2': '',
  'city': 'Davis',
  'state': 'CA',
  'zip': '95616-0677',
  'permitID': 'CA773C21-0442-4B86-B888-058C7576C648',
  'progIdent': None,
  'cfg_permit_typeID': 'E05DC677-7441-496C-9D7C-6E4AF8D8CC6C',
  'INSP_PURPOSEID': 'Routine',
  'permitType': 'PUBLIC SWIMMING POOL OR SPA - YEAR PERMIT',
  'programName': 'Recreational Health'},
 {'nick': 'ycc',
  'inspectionID': '4716B6A1-6D94-408C-9367-0255EA59CA3F',
  'inspectionDate': '2026-02-03T00:00:00.000Z',
  'score': None,
  'inspectionType': 'Retail Food Routine Inspection',
  'comments': '',
  'StartTime': '1899-12-30T13:00:00.000Z',
  'establish

##### INSERTION (do not execute)

A cURL command copied from the browser can be converted to Python code with https://curlconverter.com/

In [45]:
import json

with open("../keys/cookies.json", "r") as json_file:
    cookies = json.load(json_file)

In [46]:
type(cookies)

dict

In [47]:
len(cookies)

3

##### INSERTION END

In [220]:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:147.0) Gecko/20100101 Firefox/147.0',
}

json_data = {
    'data': {
        'path': 'yolocountyeh',
        'programName': '',
        'filters': {
            'date': '2026-01-01 to 2026-02-05',
            'purpose': 'Routine',
        },
        'start': 26,
        'count': 40,
        'searchStr': '',
        'lat': 0,
        'lng': 0,
        'sort': {},
    },
    'task': 'searchInspections',
}

response = requests.post('https://inspections.myhealthdepartment.com/', 
#                         cookies=cookies, 
                         headers=headers, 
                         json=json_data)

In [221]:
response.raise_for_status()

In [222]:
data = response.json()

In [223]:
data

[{'nick': 'ycc',
  'inspectionID': '3E429B50-7421-4E89-A706-CA67745674EF',
  'inspectionDate': '2026-01-30T00:00:00.000Z',
  'score': None,
  'inspectionType': 'Retail Food Routine Inspection',
  'comments': 'No violation observed at the time of inspection.',
  'StartTime': '1899-12-30T11:00:00.000Z',
  'establishmentName': "IKE'S LOVE & SANDWICHES",
  'addressLine1': '212 F St Ste B ',
  'addressLine2': '',
  'city': 'Davis',
  'state': 'CA',
  'zip': '95616-4592',
  'permitID': '268502A9-3BF8-47CD-8E98-A07BD8648215',
  'progIdent': None,
  'cfg_permit_typeID': '904A5E18-0946-499C-8DF3-B8C23EBC3CB8',
  'INSP_PURPOSEID': 'Routine',
  'permitType': 'Restaurant 26-49 seats, RC 2',
  'programName': 'Food'},
 {'nick': 'ycc',
  'inspectionID': '6E2C2AA3-4385-4B28-A4FD-FB889EC607D7',
  'inspectionDate': '2026-01-30T00:00:00.000Z',
  'score': None,
  'inspectionType': 'Retail Food Routine Inspection',
  'comments': 'PIC: Damien & Makayla',
  'StartTime': '1899-12-30T13:00:00.000Z',
  'establi

In [224]:
df = pd.DataFrame(data)

In [225]:
df.head()

Unnamed: 0,nick,inspectionID,inspectionDate,score,inspectionType,comments,StartTime,establishmentName,addressLine1,addressLine2,city,state,zip,permitID,progIdent,cfg_permit_typeID,INSP_PURPOSEID,permitType,programName
0,ycc,3E429B50-7421-4E89-A706-CA67745674EF,2026-01-30T00:00:00.000Z,,Retail Food Routine Inspection,No violation observed at the time of inspection.,1899-12-30T11:00:00.000Z,IKE'S LOVE & SANDWICHES,212 F St Ste B,,Davis,CA,95616-4592,268502A9-3BF8-47CD-8E98-A07BD8648215,,904A5E18-0946-499C-8DF3-B8C23EBC3CB8,Routine,"Restaurant 26-49 seats, RC 2",Food
1,ycc,6E2C2AA3-4385-4B28-A4FD-FB889EC607D7,2026-01-30T00:00:00.000Z,,Retail Food Routine Inspection,PIC: Damien & Makayla,1899-12-30T13:00:00.000Z,DOMINO'S PIZZA #8696,4120 Chiles Rd,,Davis,CA,95618-6096,AE638505-EC25-4D2B-8AF5-226306D80E57,,C94A3B79-8277-4B03-8C28-99C7F1C9E5F7,Routine,RESTAURANT OVER 650 SQ FT - YEAR PERMIT,Food
2,ycc,49D5D31C-8B7A-4C69-9B2F-7CE7E1B93A4D,2026-01-30T00:00:00.000Z,,Retail Food Routine Inspection,No violtion observed at the time of inspection.,1899-12-30T12:00:00.000Z,BLAZE FAST FIRE'D PIZZA,212 F St Ste A,,Davis,CA,95616-4592,FB0544B9-1574-45F4-9CDC-5E6B29F36463,,904A5E18-0946-499C-8DF3-B8C23EBC3CB8,Routine,"Restaurant 26-49 seats, RC 2",Food
3,ycc,7D1EAA3A-CD2F-4B2F-B8F5-38942865D1E3,2026-01-30T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T10:15:00.000Z,BROS LIQUOR 1,1964 E 8th St,,Davis,CA,95616-2504,37A1EE0F-2E5E-4B4D-B3BA-B2A95BD3438B,,2939927A-FB66-4C15-9334-C4ED4FE1AE1F,Routine,"Retail Food Markets less than 2,000 square fee...",Food
4,ycc,80288AB9-CE88-4909-AFD8-8CFDBEC6FCC8,2026-01-30T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T09:00:00.000Z,BEST WESTERN UNIVERSITY LODGE - FOOD,123 B St,,Davis,CA,95616-4636,5935F280-4BDB-4732-A6E8-8235A4182363,,E24FBE78-E62B-4246-8A3C-F32C304C5D42,Routine,"Restaurant 0-25 seats, RC 2",Food


In [226]:
df.shape

(25, 19)

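These are the results of a single request: 25 rows, even though `count` was set to 40, so the server appears to return at most 25 records per request. So how do we get the results from all pages?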

Visit the [page](https://inspections.myhealthdepartment.com/yolocountyeh) again.

In [229]:
def get_page(start):
    """Return up to 25 inspections, beginning at record offset `start`."""

    json_data = {
        'data': {
            'path': 'yolocountyeh',
            'programName': '',
            'filters': {
                'date': '2025-12-01 to 2026-02-05',
                'purpose': 'Routine',
            },
            'start': start,  # offset of the first record, not a page number
            'count': 25,
            'searchStr': '',
            'lat': 0,
            'lng': 0,
            'sort': {},
        },
        'task': 'searchInspections',
    }

    response = requests.post('https://inspections.myhealthdepartment.com/', 
#                         cookies=cookies,  # only if the optional cookie cell was run
                         headers=headers, json=json_data)
    response.raise_for_status()
    tmp = response.json()
    if isinstance(tmp, list):
        return tmp
    else:
        return []

In [230]:
import time

In [231]:
get_page(150)

[{'nick': 'ycc',
  'inspectionID': '0149B1FC-2E25-437E-B068-0DA23DBE92D6',
  'inspectionDate': '2026-01-07T00:00:00.000Z',
  'score': 0,
  'inspectionType': 'Retail Food Routine Inspection',
  'comments': "A routine inspection was completed. Two major violations were observed during today's inspection which resulted in a conditional pass. A follow-up inspection will be completed within two days to ensure all violations have been corrected. There are electrical issues causing the circuit breaker to trip when multiple cooking equipment is used simultaneously, this needs to be addressed. \n\nThe time indicated on the report includes inspection, report writing, reconciliation, and travel. ",
  'StartTime': '1899-12-30T13:00:00.000Z',
  'establishmentName': 'EL SINALOENSE',
  'addressLine1': '374 California St ',
  'addressLine2': '',
  'city': 'Woodland',
  'state': 'CA',
  'zip': '95695-2996',
  'permitID': '95B62C19-A3F9-4095-ACB9-B7506FDFB734',
  'progIdent': None,
  'cfg_permit_typeID'

In [232]:
data = []
for i in range(20):
    time.sleep(1) # SLOW DOWN the process!
    tmp = get_page(25*i)
    if not len(tmp):
        print('Stopped at page ' + str(i))
        break
    data += tmp

Stopped at page 9
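
A variation on this loop, sketched with a fake data source: because this API is offset-based (`start` and `count`), the loop can also stop as soon as a page comes back shorter than the page size, which saves the final empty request. The names `iter_offsets` and `fake_fetch` are hypothetical, and a real version would sleep between requests.

```python
def iter_offsets(fetch, page_size=25, max_requests=20):
    """Yield records from an offset-based API until a short or empty page."""
    for i in range(max_requests):
        batch = fetch(start=i * page_size, count=page_size)
        yield from batch
        if len(batch) < page_size:   # a short page is the last page
            break


# Stand-in for the real API: 60 fake records served in slices
RECORDS = list(range(60))
requests_made = []


def fake_fetch(start, count):
    requests_made.append(start)
    return RECORDS[start:start + count]


collected = list(iter_offsets(fake_fetch))
print(len(collected), requests_made)
```

When the total is an exact multiple of the page size, one extra request still returns an empty page, which is the case the loop above ran into with 225 records.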


In [233]:
data

[{'nick': 'ycc',
  'inspectionID': 'F97D72A2-3C4A-4CDF-A1F6-38534A68C852',
  'inspectionDate': '2026-02-03T00:00:00.000Z',
  'score': None,
  'inspectionType': 'Rec Health Routine Inspection',
  'comments': 'No violation observed at the time of inspection.',
  'StartTime': '1899-12-30T14:00:00.000Z',
  'establishmentName': 'LA SALLE APTS',
  'addressLine1': '880 Alvarado Ave ',
  'addressLine2': '',
  'city': 'Davis',
  'state': 'CA',
  'zip': '95616-0677',
  'permitID': 'CA773C21-0442-4B86-B888-058C7576C648',
  'progIdent': None,
  'cfg_permit_typeID': 'E05DC677-7441-496C-9D7C-6E4AF8D8CC6C',
  'INSP_PURPOSEID': 'Routine',
  'permitType': 'PUBLIC SWIMMING POOL OR SPA - YEAR PERMIT',
  'programName': 'Recreational Health'},
 {'nick': 'ycc',
  'inspectionID': '4716B6A1-6D94-408C-9367-0255EA59CA3F',
  'inspectionDate': '2026-02-03T00:00:00.000Z',
  'score': None,
  'inspectionType': 'Retail Food Routine Inspection',
  'comments': '',
  'StartTime': '1899-12-30T13:00:00.000Z',
  'establish

In [234]:
type(data)

list

In [235]:
df = pd.DataFrame(data)
df

Unnamed: 0,nick,inspectionID,inspectionDate,score,inspectionType,comments,StartTime,establishmentName,addressLine1,addressLine2,city,state,zip,permitID,progIdent,cfg_permit_typeID,INSP_PURPOSEID,permitType,programName
0,ycc,F97D72A2-3C4A-4CDF-A1F6-38534A68C852,2026-02-03T00:00:00.000Z,,Rec Health Routine Inspection,No violation observed at the time of inspection.,1899-12-30T14:00:00.000Z,LA SALLE APTS,880 Alvarado Ave,,Davis,CA,95616-0677,CA773C21-0442-4B86-B888-058C7576C648,,E05DC677-7441-496C-9D7C-6E4AF8D8CC6C,Routine,PUBLIC SWIMMING POOL OR SPA - YEAR PERMIT,Recreational Health
1,ycc,4716B6A1-6D94-408C-9367-0255EA59CA3F,2026-02-03T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T13:00:00.000Z,FOUR TEASONS COFFEE AND TEA,620 W Covell Blvd Ste A,,Davis,CA,95616-1081,160A773D-78C1-44CC-A71C-198985832679,,E24FBE78-E62B-4246-8A3C-F32C304C5D42,Routine,"Restaurant 0-25 seats, RC 2",Food
2,ycc,751B7BF5-8AF6-4B90-B17E-D5EF65826A4C,2026-02-03T00:00:00.000Z,0.0,Retail Food Routine Inspection,A routine inspection was completed. The time i...,1899-12-30T10:00:00.000Z,PUPUSERIA LA CHICANA,9 Main St Ste 123,,Woodland,CA,95695-3177,CF5ECEAC-B65A-4D58-BDEF-ACF19D2B8195,,504AD2F6-5E0B-4DED-A2AB-217132E64D80,Routine,"Restaurant 0-25 seats, RC 3",Food
3,ycc,AD3EE630-C16D-4AF4-95A8-EAA2B0DEB839,2026-02-03T00:00:00.000Z,,Rec Health Routine Inspection,Note - \n\nSpa has a auto-fil system.,1899-12-30T11:15:00.000Z,EL MACERO APARTMENTS,4735 Cowell Blvd,,Davis,CA,95618-4461,EFCE0F71-2A7A-4483-92E3-145B15B30270,,A668F0F3-043F-4345-92CA-3A9A9F4EEB16,Routine,ADDITIONAL POOL OR SPA - YEAR PERMIT,Recreational Health
4,ycc,7757D162-F4CB-4D1A-B50B-AC97CA3E11EE,2026-02-03T00:00:00.000Z,,Rec Health Routine Inspection,,1899-12-30T10:45:00.000Z,EL MACERO APARTMENTS,4735 Cowell Blvd,,Davis,CA,95618-4461,19D6529D-BF3C-43BD-9175-70907F657599,,E05DC677-7441-496C-9D7C-6E4AF8D8CC6C,Routine,PUBLIC SWIMMING POOL OR SPA - YEAR PERMIT,Recreational Health
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
220,ycc,9E87A57A-C13B-4752-BC52-508F78635949,2025-12-04T00:00:00.000Z,,Body Art Routine Inspection,- Stericycle for sharps waste disposal. Record...,1899-12-30T13:00:00.000Z,SACRED TIGER,735 H St # B,,Davis,CA,95616-3763,FAC819EF-682D-4ACF-9B14-779D7166FE98,,E65D8EB6-5D05-48A8-9640-018830303671,Routine,BODY ART FACILITY PERMIT,Body Art
221,ycc,9B1F79F1-63EF-4194-A775-DFFD190ACC9D,2025-12-03T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T13:00:00.000Z,TIM'S HAWAIIAN BBQ,247 3rd St,,Davis,CA,95616-4524,E0BA8259-7EC1-44C4-8943-D005CEC7FB53,,904A5E18-0946-499C-8DF3-B8C23EBC3CB8,Routine,"Restaurant 26-49 seats, RC 2",Food
222,ycc,9122F376-4585-4C05-B2CC-24E6633653AD,2025-12-03T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T11:30:00.000Z,3RD & U CAFE,223 3rd St,,Davis,CA,95616-4501,F5317037-F065-4160-8363-F89D4512D5E1,,153791A7-6FFF-4287-9D24-80E1D3AAD718,Routine,"Restaurant 50-149 seats, RC 2",Food
223,ycc,AEDBA9D8-8F36-42EA-8CEF-01C0D24D604C,2025-12-03T00:00:00.000Z,0.0,Retail Food Routine Inspection,Flooring in the rear food preparation area is ...,1899-12-30T13:00:00.000Z,STEVE'S PIZZA,714 Main St,,Woodland,CA,95695-3407,04464A18-650A-4B19-ABC5-C4D472A77E23,,C94A3B79-8277-4B03-8C28-99C7F1C9E5F7,Routine,RESTAURANT OVER 650 SQ FT - YEAR PERMIT,Food
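
The `inspectionDate` and `StartTime` columns are still strings. In the rows shown, `StartTime` always carries the date 1899-12-30, so this sketch assumes that date is only a placeholder and that the time of day is the meaningful part. It also ignores the trailing `Z` (the result has no time zone attached). The function name `inspection_start` is made up for this example.

```python
from datetime import datetime

# Format of the timestamps returned by the API, e.g. 2026-02-03T00:00:00.000Z
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def inspection_start(record):
    """Combine the inspection day with the time of day from StartTime."""
    day = datetime.strptime(record["inspectionDate"], ISO_FORMAT).date()
    clock = datetime.strptime(record["StartTime"], ISO_FORMAT).time()
    return datetime.combine(day, clock)


record = {
    "inspectionDate": "2026-02-03T00:00:00.000Z",
    "StartTime": "1899-12-30T14:00:00.000Z",
}
print(inspection_start(record))
```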


In [236]:
df.describe(include=['object'])

Unnamed: 0,nick,inspectionID,inspectionDate,inspectionType,comments,StartTime,establishmentName,addressLine1,addressLine2,city,state,zip,permitID,progIdent,cfg_permit_typeID,INSP_PURPOSEID,permitType,programName
count,225,225,225,225,225.0,225,225,225,224.0,225,225,225,225,14.0,225,225,225,225
unique,1,225,36,4,94.0,37,211,205,5.0,17,1,163,225,1.0,40,1,40,4
top,ycc,F97D72A2-3C4A-4CDF-A1F6-38534A68C852,2026-01-21T00:00:00.000Z,Retail Food Routine Inspection,,1899-12-30T12:00:00.000Z,LA SALLE APTS,880 Alvarado Ave,,Davis,CA,95616,CA773C21-0442-4B86-B888-058C7576C648,,C94A3B79-8277-4B03-8C28-99C7F1C9E5F7,Routine,RESTAURANT OVER 650 SQ FT - YEAR PERMIT,Food
freq,225,1,18,190,107.0,38,2,2,220.0,99,225,8,1,14.0,34,225,34,193


In [237]:
df[df['city'] == 'Davis']

Unnamed: 0,nick,inspectionID,inspectionDate,score,inspectionType,comments,StartTime,establishmentName,addressLine1,addressLine2,city,state,zip,permitID,progIdent,cfg_permit_typeID,INSP_PURPOSEID,permitType,programName
0,ycc,F97D72A2-3C4A-4CDF-A1F6-38534A68C852,2026-02-03T00:00:00.000Z,,Rec Health Routine Inspection,No violation observed at the time of inspection.,1899-12-30T14:00:00.000Z,LA SALLE APTS,880 Alvarado Ave,,Davis,CA,95616-0677,CA773C21-0442-4B86-B888-058C7576C648,,E05DC677-7441-496C-9D7C-6E4AF8D8CC6C,Routine,PUBLIC SWIMMING POOL OR SPA - YEAR PERMIT,Recreational Health
1,ycc,4716B6A1-6D94-408C-9367-0255EA59CA3F,2026-02-03T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T13:00:00.000Z,FOUR TEASONS COFFEE AND TEA,620 W Covell Blvd Ste A,,Davis,CA,95616-1081,160A773D-78C1-44CC-A71C-198985832679,,E24FBE78-E62B-4246-8A3C-F32C304C5D42,Routine,"Restaurant 0-25 seats, RC 2",Food
3,ycc,AD3EE630-C16D-4AF4-95A8-EAA2B0DEB839,2026-02-03T00:00:00.000Z,,Rec Health Routine Inspection,Note - \n\nSpa has a auto-fil system.,1899-12-30T11:15:00.000Z,EL MACERO APARTMENTS,4735 Cowell Blvd,,Davis,CA,95618-4461,EFCE0F71-2A7A-4483-92E3-145B15B30270,,A668F0F3-043F-4345-92CA-3A9A9F4EEB16,Routine,ADDITIONAL POOL OR SPA - YEAR PERMIT,Recreational Health
4,ycc,7757D162-F4CB-4D1A-B50B-AC97CA3E11EE,2026-02-03T00:00:00.000Z,,Rec Health Routine Inspection,,1899-12-30T10:45:00.000Z,EL MACERO APARTMENTS,4735 Cowell Blvd,,Davis,CA,95618-4461,19D6529D-BF3C-43BD-9175-70907F657599,,E05DC677-7441-496C-9D7C-6E4AF8D8CC6C,Routine,PUBLIC SWIMMING POOL OR SPA - YEAR PERMIT,Recreational Health
5,ycc,767B51F0-10CC-4ABF-BA4B-14A1D5A9667C,2026-02-03T00:00:00.000Z,,Retail Food Routine Inspection,bk17484@ghaimanagement.com,1899-12-30T09:30:00.000Z,BURGER KING #17484,2026 Lyndell Ter,,Davis,CA,95616-6203,E0E159B7-607B-44CE-9D15-AC182BB291C3,,904A5E18-0946-499C-8DF3-B8C23EBC3CB8,Routine,"Restaurant 26-49 seats, RC 2",Food
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
216,ycc,FB7F93B2-F824-4960-BA15-DCE21C7D3017,2025-12-04T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T10:45:00.000Z,FOUR SEASONS GOURMET,1601 Research Park Dr,,Davis,CA,95618-6159,80CD1675-CB92-4ED3-965C-3099FFAE1EE5,,504AD2F6-5E0B-4DED-A2AB-217132E64D80,Routine,"Restaurant 0-25 seats, RC 3",Food
217,ycc,7DFF5355-5CD8-41B0-9E1E-DC9EA01BA256,2025-12-04T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T12:15:00.000Z,FAST & EASY MART #33 - FOOD,1601 Research Park Dr,,Davis,CA,95618-6159,7DCDBDB5-DF7E-44FE-81CB-BCAF77A53B04,,2939927A-FB66-4C15-9334-C4ED4FE1AE1F,Routine,"Retail Food Markets less than 2,000 square fee...",Food
220,ycc,9E87A57A-C13B-4752-BC52-508F78635949,2025-12-04T00:00:00.000Z,,Body Art Routine Inspection,- Stericycle for sharps waste disposal. Record...,1899-12-30T13:00:00.000Z,SACRED TIGER,735 H St # B,,Davis,CA,95616-3763,FAC819EF-682D-4ACF-9B14-779D7166FE98,,E65D8EB6-5D05-48A8-9640-018830303671,Routine,BODY ART FACILITY PERMIT,Body Art
221,ycc,9B1F79F1-63EF-4194-A775-DFFD190ACC9D,2025-12-03T00:00:00.000Z,,Retail Food Routine Inspection,,1899-12-30T13:00:00.000Z,TIM'S HAWAIIAN BBQ,247 3rd St,,Davis,CA,95616-4524,E0BA8259-7EC1-44C4-8943-D005CEC7FB53,,904A5E18-0946-499C-8DF3-B8C23EBC3CB8,Routine,"Restaurant 26-49 seats, RC 2",Food


In [238]:
df.shape

(225, 19)

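#### Get all inspections for one specific restaurant

Let's investigate this further: each row of the search results carries a `permitID`, which we can use to request the permit page of one restaurant and collect all of its inspections.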

In [239]:
def get_rest_visits(permitId):
    # Request the permit page for the given permitID
    response = requests.get(
        'https://inspections.myhealthdepartment.com/yolocountyeh/permit',
        params={
            'permitID': permitId
        },
        headers=headers
    )
    response.raise_for_status()
    return response.content  # raw bytes of the HTML page

In [240]:
df['permitID'][0]

'CA773C21-0442-4B86-B888-058C7576C648'

In [241]:
page = get_rest_visits(df['permitID'][0])

In [242]:
page

b'<!doctype html>\n<html lang="en" data-wf-page="5c477d7234329c8ae83693a0" data-wf-site="5c477d7234329c39c036939f">\n<head>\n\t<meta charset="utf-8">\n\t<title>Permit | Yolo County California | My Health Department</title>\n\t<meta content="width=device-width, initial-scale=1" name="viewport">\n\t<meta content="Webflow" name="generator">\n\t<!-- Sentry Integration -->\n\t<script\n\t\tsrc="https://d1x0oaju7ljwc0.cloudfront.net/sentry/bundle.tracing.replay.feedback.min.js"\n\t\tintegrity="sha384-Yy2UXIFrWRfe56w1BuJ8/pgltHwWyYP4Q7dYKueJ/c6RG8B/bPJmGv+TBTQSuTSv"\n\t\tcrossorigin="anonymous"\n\t></script>\n\t<script src="/js/sentry.js"></script>\n\t<!-- include Flatpickr JS and CSS for datepicker-->\n\t<link href="css/vendor/flatpickr/flatpickr.css" rel="stylesheet" type="text/css">\n\t<script src="js/vendor/flatpickr/flatpickr.js" type="text/javascript"></script>\n\t<link href="css/myhd-full.min.css" rel="stylesheet" type="text/css">\n\t<script src="https://use.typekit.net/ksy3qdn.js" type

In [243]:
import lxml.html as lx

html = lx.fromstring(page)
html

<Element html at 0x16817e850>

In [244]:
links = html.xpath("//a[contains(@class, 'inspection-listing-score w-button')]/@href")

In [245]:
for l in links:
    print(l)

/yolocountyeh/inspection/?inspectionID=F97D72A2-3C4A-4CDF-A1F6-38534A68C852
/yolocountyeh/inspection/?inspectionID=D49C5D2D-9823-48D6-B06E-EF8F26E2476A


In [246]:
base_url = "https://inspections.myhealthdepartment.com"

In [247]:
response = requests.get(base_url + links[0], headers=headers)
response.raise_for_status()
html = lx.fromstring(response.content)

In [248]:
response.url

'https://inspections.myhealthdepartment.com/yolocountyeh/inspection/?inspectionID=F97D72A2-3C4A-4CDF-A1F6-38534A68C852'
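
Concatenating `base_url + links[0]` works here because the scraped links are absolute paths that start with `/`. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves a link against a base URL the way a browser would. The sketch below reuses the base URL and the first link printed above; the variable names `base`, `link`, and `full_url` are illustrative only.

```python
from urllib.parse import urljoin

base = "https://inspections.myhealthdepartment.com"
link = "/yolocountyeh/inspection/?inspectionID=F97D72A2-3C4A-4CDF-A1F6-38534A68C852"

# urljoin resolves the link relative to the base, like a browser does
full_url = urljoin(base, link)
print(full_url)
```

Note that if the base URL itself carries a path (for example one ending in `/tennessee`), a link beginning with `/` replaces that path, so it is worth checking the joined URL before requesting it.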

In [253]:
score_btn = html.xpath("//span[contains(@class, 'inspection-score-v2 w-button')]/span/strong")[0]

In [256]:
score = score_btn.text

Now, let's put everything together into one function.

In [257]:
def get_score_from_inspection(inspectionID):
    response = requests.get(
        'https://inspections.myhealthdepartment.com/yolocountyeh/inspection/',
        params={
            'inspectionID': inspectionID   
        },
        headers=headers
    )
    response.raise_for_status()
    html = lx.fromstring(response.content)
    score_btn = html.xpath("//span[contains(@class, 'inspection-score-v2 w-button')]/span/strong")[0]
    score = int(score_btn.text)
    
    return(score)

In [258]:
get_score_from_inspection("BEA0E8F3-4D2D-41A5-9470-FAE452250500")

0

In [259]:
import re

def extract_inspectionID(link):
    # Capture everything after the first '=' in the link
    match = re.search(r'=(.*)', link)
    if match:
        return match.group(1)
    else:
        return None

In [260]:
extract_inspectionID(links[0])

'F97D72A2-3C4A-4CDF-A1F6-38534A68C852'
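
The regular expression `=(.*)` captures everything after the first `=`, which is fine for these links but would also swallow any additional query parameters. A minimal sketch of a more targeted approach, assuming the link always carries the ID in a query parameter, uses `urllib.parse` from the standard library. The helper name `extract_query_param` is hypothetical and not part of the lecture code.

```python
from urllib.parse import urlparse, parse_qs

def extract_query_param(link, name="inspectionID"):
    """Return the first value of query parameter `name`, or None if absent."""
    query = urlparse(link).query          # the part after '?'
    values = parse_qs(query).get(name)    # list of values, or None
    return values[0] if values else None

link = "/yolocountyeh/inspection/?inspectionID=F97D72A2-3C4A-4CDF-A1F6-38534A68C852"
print(extract_query_param(link))
```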

In [261]:
get_score_from_inspection("BEA0E8F3-4D2D-41A5-9470-FAE452250500")

0

In [262]:
base_url = "https://inspections.myhealthdepartment.com/tennessee"

In [263]:
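def get_page(page_nr, county_name = "tennessee"):
    # NOTE: despite its name, `page_nr` is sent as 'start', which appears to be
    # a row offset (the loop below calls get_page(25*i)), not a page index.
    # `cookies` and `headers` are assumed to be defined in an earlier cell.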

    json_data = {
        'data': {
            'path': county_name,
            'programName': '',
            'filters': {
                'date': '2026-01-01 to 2026-01-25',
                'purpose': '',
            },
            'start': page_nr,
            'count': 25,
            'searchStr': '',
            'lat': 0,
            'lng': 0,
            'sort': {},
        },
        'task': 'searchInspections',
    }

    response = requests.post(
        'https://inspections.myhealthdepartment.com/',
        cookies=cookies,
        headers=headers,
        json=json_data
    )
    response.raise_for_status()
    result = response.json()
    if isinstance(result, list):
        return result
    else:
        return None     

In [264]:
get_page(1)

[{'nick': 'stdh',
  'inspectionID': '8BA844A0-25BD-4C48-8E82-D1DDB65D4695',
  'inspectionDate': '2026-01-23T00:00:00.000Z',
  'recommendation': '',
  'score': 100,
  'inspectionType': 'Food Service Establishment Inspection',
  'purpose': 'Routine',
  'comments': 'Approved mobile to operate. Will send copy of permit. Provided food safety fact sheets for op\n\nDiscussed proper hand washing, ware washing, food source, holding and cook temps, cooling when applicable, employee hygiene, employee health, demonstration of knowledge, and storage and use of toxic items. Food Establishment Regulations can be found at https://publications.tnsosfiles.com/rules/1200/1200-23/1200-23-01.20150716.pdf. Please be sure you and all food handling employees are familiar with these regulations. Guidance/Educational documents can be found at https://www.tn.gov/health/health-program-areas/eh/eh-foodlaw.html. These are an excellent resource to help reduce the risk of a foodborne illness. If you have any question

In [265]:
data = []
for i in range(5):
    time.sleep(1)  # pause between requests so we do not overload the server
    tmp = get_page(25*i)
    if not tmp:
        print('Stopped at page ' + str(i))
        break
    data += tmp
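
The loop above can be wrapped in a reusable helper. The following is only a sketch: it assumes, as the loop does, that the argument passed to the fetch function is a row offset, and it additionally assumes that a page with fewer than `page_size` records is the last one. The names `fetch_all`, `fake_fetch`, and `fake_rows` are hypothetical; the fake fetcher stands in for `get_page` so that the example runs without network access.

```python
import time

def fetch_all(fetch, page_size=25, max_pages=5, delay=0.0):
    """Collect records from `fetch(start)` until a page comes back empty."""
    records = []
    for i in range(max_pages):
        page = fetch(page_size * i)   # `start` is a row offset, not a page index
        if not page:                  # None or empty list: nothing more to read
            break
        records.extend(page)
        if len(page) < page_size:     # assumption: a short page is the last page
            break
        time.sleep(delay)             # pause between requests
    return records

# Hypothetical stand-in for get_page: serves 60 made-up records in slices.
fake_rows = [{"inspectionID": n} for n in range(60)]

def fake_fetch(start, count=25):
    return fake_rows[start:start + count]

result = fetch_all(fake_fetch)
print(len(result))
```

With the real API one would pass `get_page` and a delay of about one second, as in the loop above.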

In [266]:
data

[{'nick': 'stdh',
  'inspectionID': 'F14D9D7C-6040-4579-ACCB-864AD185037F',
  'inspectionDate': '2026-01-23T00:00:00.000Z',
  'recommendation': '',
  'score': 100,
  'inspectionType': 'Public Swimming Pools',
  'purpose': 'Routine',
  'comments': '',
  'timein': '1899-12-30T09:00:00.000Z',
  'establishmentName': 'Hamilton Family YMCA',
  'addressLine1': '7430 Shallowford Rd.',
  'addressLine2': '',
  'city': 'Chattanooga',
  'state': 'TN',
  'zip': '37421',
  'scoreDisplay': '',
  'permitID': 'F6556C9D-1982-419B-B96E-164D9F9831B7',
  'permitType': 'Type A- General public and institutional pools',
  'programName': 'Public Swimming Pool',
  'programCode': '690'},
 {'nick': 'stdh',
  'inspectionID': '8BA844A0-25BD-4C48-8E82-D1DDB65D4695',
  'inspectionDate': '2026-01-23T00:00:00.000Z',
  'recommendation': '',
  'score': 100,
  'inspectionType': 'Food Service Establishment Inspection',
  'purpose': 'Routine',
  'comments': 'Approved mobile to operate. Will send copy of permit. Provided foo

In [267]:
df = pd.DataFrame(data)
df.head()

Unnamed: 0,nick,inspectionID,inspectionDate,recommendation,score,inspectionType,purpose,comments,timein,establishmentName,addressLine1,addressLine2,city,state,zip,scoreDisplay,permitID,permitType,programName,programCode
0,stdh,F14D9D7C-6040-4579-ACCB-864AD185037F,2026-01-23T00:00:00.000Z,,100.0,Public Swimming Pools,Routine,,1899-12-30T09:00:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F6556C9D-1982-419B-B96E-164D9F9831B7,Type A- General public and institutional pools,Public Swimming Pool,690
1,stdh,8BA844A0-25BD-4C48-8E82-D1DDB65D4695,2026-01-23T00:00:00.000Z,,100.0,Food Service Establishment Inspection,Routine,Approved mobile to operate. Will send copy of ...,1899-12-30T09:45:00.000Z,La Media Naranja LLC #2 Mobile FSE,7826 Santos Dr,,Murfreesboro,TN,37129,,53A4904B-BFF6-4E5F-BF43-79253A7BD020,Commercial Food <51 (Mobile),Food Service Establishment,605
2,stdh,C47AF1AD-1C7B-4AC4-9C4A-2D320A035257,2026-01-23T00:00:00.000Z,,100.0,Public Swimming Pools,Routine,,1899-12-30T09:10:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F9B53208-596A-40AB-AB59-D953C102B970,"Type D- Whirlpools, hot tubs",Public Swimming Pool,690
3,stdh,292318AF-BD49-4DE0-B891-E5E42590424E,2026-01-23T00:00:00.000Z,,100.0,Tattoo Studios Inspection,Routine,Establishment is very clean and organized!,1899-12-30T13:00:00.000Z,Got Ink? #615,8204 Florence Rd,,Smyrna,TN,37167,,F31C828F-71DC-4DDA-A56B-895DA3DFE22C,Tattoo Studios,Tattoo Studios,665
4,stdh,78B4C99F-3EF0-4538-9A6C-EFF04954352D,2026-01-23T00:00:00.000Z,,97.0,Food Service Establishment Inspection,Routine,,1899-12-30T14:00:00.000Z,P.F. Chang's,259 Opry Mills Dr,,Nashville,TN,37214,,C4D6C57D-D19D-47E2-A880-6D6715C6E716,Commercial Food 51+,Food Service Establishment,605


In [268]:
df.shape

(125, 20)

In [269]:
df.describe(include=['object'])

Unnamed: 0,nick,inspectionID,inspectionDate,recommendation,inspectionType,purpose,comments,timein,establishmentName,addressLine1,addressLine2,city,state,zip,scoreDisplay,permitID,permitType,programName,programCode
count,125,125,125,125.0,125,125,125.0,125,125,125,125.0,125,125,125,125.0,125,125,125,125
unique,1,93,1,2.0,8,2,50.0,76,88,88,11.0,43,2,64,1.0,93,20,8,8
top,stdh,B5952897-F402-45C4-82D1-C1A7D79ED894,2026-01-23T00:00:00.000Z,,Food Service Establishment Inspection,Routine,,1899-12-30T10:05:00.000Z,Good Shepherd's Home,1051 Lake St.,,Nashville,TN,37863,,F6B39954-D80A-4222-BBB7-A802919745D4,Commercial Food <51,Food Service Establishment,605
freq,125,3,125,121.0,88,81,62.0,5,3,5,114.0,18,123,6,125.0,3,28,88,88
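
The summary above reports 125 rows but only 93 unique `inspectionID` values, so some inspections were returned more than once across the paged requests. A minimal sketch of removing such repeats before further analysis is shown below; the helper `dedupe_by_key` and the toy records are made up for illustration.

```python
def dedupe_by_key(records, key="inspectionID"):
    """Keep the first record seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

# Toy records (not real data), with one repeated ID.
toy = [
    {"inspectionID": "A", "score": 100},
    {"inspectionID": "B", "score": 97},
    {"inspectionID": "A", "score": 100},
]
unique = dedupe_by_key(toy)
print(len(unique))
```

On a DataFrame, `df.drop_duplicates(subset='inspectionID')` should do the same job.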


In [275]:
def get_score_from_inspection(inspectionID, base_url):
    response = requests.get(
        base_url + "/inspection/",
        params={
            'inspectionID': inspectionID   
        },
        headers=headers
    )
    response.raise_for_status()
    html = lx.fromstring(response.content)
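    try:
        # The score element may be missing, empty, or non-numeric on some pages
        score_btn = html.xpath("//span[contains(@class, 'inspection-score-v2 w-button')]/span/strong")[0]
        score = int(score_btn.text)
    except (IndexError, TypeError, ValueError):
        return None
    else:
        return score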
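
Wrapping the parsing step in `try`/`except` lets the function return `None` when a page has no numeric score. The sketch below isolates that idea in a small standalone helper that catches only the exceptions expected from parsing, so that unrelated problems (a typo, a keyboard interrupt) are not silently hidden. The name `parse_score` is hypothetical and not part of the lecture code.

```python
def parse_score(text):
    """Convert scraped score text to int, or return None if that is not possible."""
    try:
        return int(text.strip())
    except (AttributeError, ValueError):
        # AttributeError: text is None (the element had no text)
        # ValueError: text is not an integer, e.g. "N/A" or ""
        return None

print(parse_score(" 97 "))
print(parse_score(None))
```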

In [276]:
df['score'] = None

In [277]:
df

Unnamed: 0,nick,inspectionID,inspectionDate,recommendation,score,inspectionType,purpose,comments,timein,establishmentName,addressLine1,addressLine2,city,state,zip,scoreDisplay,permitID,permitType,programName,programCode
0,stdh,F14D9D7C-6040-4579-ACCB-864AD185037F,2026-01-23T00:00:00.000Z,,,Public Swimming Pools,Routine,,1899-12-30T09:00:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F6556C9D-1982-419B-B96E-164D9F9831B7,Type A- General public and institutional pools,Public Swimming Pool,690
1,stdh,8BA844A0-25BD-4C48-8E82-D1DDB65D4695,2026-01-23T00:00:00.000Z,,,Food Service Establishment Inspection,Routine,Approved mobile to operate. Will send copy of ...,1899-12-30T09:45:00.000Z,La Media Naranja LLC #2 Mobile FSE,7826 Santos Dr,,Murfreesboro,TN,37129,,53A4904B-BFF6-4E5F-BF43-79253A7BD020,Commercial Food <51 (Mobile),Food Service Establishment,605
2,stdh,C47AF1AD-1C7B-4AC4-9C4A-2D320A035257,2026-01-23T00:00:00.000Z,,,Public Swimming Pools,Routine,,1899-12-30T09:10:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F9B53208-596A-40AB-AB59-D953C102B970,"Type D- Whirlpools, hot tubs",Public Swimming Pool,690
3,stdh,292318AF-BD49-4DE0-B891-E5E42590424E,2026-01-23T00:00:00.000Z,,,Tattoo Studios Inspection,Routine,Establishment is very clean and organized!,1899-12-30T13:00:00.000Z,Got Ink? #615,8204 Florence Rd,,Smyrna,TN,37167,,F31C828F-71DC-4DDA-A56B-895DA3DFE22C,Tattoo Studios,Tattoo Studios,665
4,stdh,78B4C99F-3EF0-4538-9A6C-EFF04954352D,2026-01-23T00:00:00.000Z,,,Food Service Establishment Inspection,Routine,,1899-12-30T14:00:00.000Z,P.F. Chang's,259 Opry Mills Dr,,Nashville,TN,37214,,C4D6C57D-D19D-47E2-A880-6D6715C6E716,Commercial Food 51+,Food Service Establishment,605
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,stdh,E518B0C0-DAB3-4825-8CAC-F9947854ED01,2026-01-23T00:00:00.000Z,,,Food Service Establishment Inspection,Routine,"2026 permit avail., no cooling down process , ...",1899-12-30T09:15:00.000Z,Superlishous Express (MU),3309 E Foxburrow Cir,,Memphis,TN,38115,,C1D06A3B-E25A-430E-881F-94AF2EA858C2,Commercial Food <51 (Mobile),Food Service Establishment,605
121,stdh,5FE8E3CF-67FD-4752-A40C-655C4D5999C6,2026-01-23T00:00:00.000Z,,,Food Service Establishment Inspection,Follow-Up,,1899-12-30T15:48:00.000Z,Pure Pastry by El Hornito Bakery,2962 S Rutherford Blvd,Suite H,Murfreesboro,TN,37127,,5F2F7FA8-1883-4F84-9477-144CF993B529,Commercial Food <51,Food Service Establishment,605
122,stdh,E31F2CFD-C535-45CD-87E0-41EACABEA82F,2026-01-23T00:00:00.000Z,,,Food Service Establishment Inspection,Routine,Mobile unit is neat and organized. New mini fr...,1899-12-30T11:40:00.000Z,Tacos El Primo,216 Chaho Rd,,Knoxville,TN,37934,,EAA1B6D4-1E19-48E6-9A40-7167C3FD6CD9,Commercial Food <51 (Mobile),Food Service Establishment,605
123,stdh,A6C32E21-7E0B-46AB-9620-30F5C47FB826,2026-01-23T00:00:00.000Z,,,Hotels Motels Inspection,Follow-Up,*critical item 15 corrected at follow up,1899-12-30T12:44:00.000Z,Quality Inn,479 Gordonsville Hwy,,Gordonsville,TN,38563,,A975D714-20D9-4106-A761-A261A8628811,Hotel<51,Hotel,620


In [278]:
df['inspectionID']

0      F14D9D7C-6040-4579-ACCB-864AD185037F
1      8BA844A0-25BD-4C48-8E82-D1DDB65D4695
2      C47AF1AD-1C7B-4AC4-9C4A-2D320A035257
3      292318AF-BD49-4DE0-B891-E5E42590424E
4      78B4C99F-3EF0-4538-9A6C-EFF04954352D
                       ...                 
120    E518B0C0-DAB3-4825-8CAC-F9947854ED01
121    5FE8E3CF-67FD-4752-A40C-655C4D5999C6
122    E31F2CFD-C535-45CD-87E0-41EACABEA82F
123    A6C32E21-7E0B-46AB-9620-30F5C47FB826
124    A1BF01E7-E9C0-4B95-B09A-79B70BC954CC
Name: inspectionID, Length: 125, dtype: object

In [279]:
df['score'] = None
for key, val in df['inspectionID'].items():
    time.sleep(1)
    df.loc[key,'score'] = get_score_from_inspection(val, base_url)

In [280]:
df.head()

Unnamed: 0,nick,inspectionID,inspectionDate,recommendation,score,inspectionType,purpose,comments,timein,establishmentName,addressLine1,addressLine2,city,state,zip,scoreDisplay,permitID,permitType,programName,programCode
0,stdh,F14D9D7C-6040-4579-ACCB-864AD185037F,2026-01-23T00:00:00.000Z,,100,Public Swimming Pools,Routine,,1899-12-30T09:00:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F6556C9D-1982-419B-B96E-164D9F9831B7,Type A- General public and institutional pools,Public Swimming Pool,690
1,stdh,8BA844A0-25BD-4C48-8E82-D1DDB65D4695,2026-01-23T00:00:00.000Z,,100,Food Service Establishment Inspection,Routine,Approved mobile to operate. Will send copy of ...,1899-12-30T09:45:00.000Z,La Media Naranja LLC #2 Mobile FSE,7826 Santos Dr,,Murfreesboro,TN,37129,,53A4904B-BFF6-4E5F-BF43-79253A7BD020,Commercial Food <51 (Mobile),Food Service Establishment,605
2,stdh,C47AF1AD-1C7B-4AC4-9C4A-2D320A035257,2026-01-23T00:00:00.000Z,,100,Public Swimming Pools,Routine,,1899-12-30T09:10:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F9B53208-596A-40AB-AB59-D953C102B970,"Type D- Whirlpools, hot tubs",Public Swimming Pool,690
3,stdh,292318AF-BD49-4DE0-B891-E5E42590424E,2026-01-23T00:00:00.000Z,,100,Tattoo Studios Inspection,Routine,Establishment is very clean and organized!,1899-12-30T13:00:00.000Z,Got Ink? #615,8204 Florence Rd,,Smyrna,TN,37167,,F31C828F-71DC-4DDA-A56B-895DA3DFE22C,Tattoo Studios,Tattoo Studios,665
4,stdh,78B4C99F-3EF0-4538-9A6C-EFF04954352D,2026-01-23T00:00:00.000Z,,97,Food Service Establishment Inspection,Routine,,1899-12-30T14:00:00.000Z,P.F. Chang's,259 Opry Mills Dr,,Nashville,TN,37214,,C4D6C57D-D19D-47E2-A880-6D6715C6E716,Commercial Food 51+,Food Service Establishment,605


In [281]:
df['score'] = df['score'].astype('float64')

In [282]:
sorted_df = df.sort_values(by='score', ascending=False)

In [283]:
sorted_df[~sorted_df['score'].isna()]

Unnamed: 0,nick,inspectionID,inspectionDate,recommendation,score,inspectionType,purpose,comments,timein,establishmentName,addressLine1,addressLine2,city,state,zip,scoreDisplay,permitID,permitType,programName,programCode
0,stdh,F14D9D7C-6040-4579-ACCB-864AD185037F,2026-01-23T00:00:00.000Z,,100.0,Public Swimming Pools,Routine,,1899-12-30T09:00:00.000Z,Hamilton Family YMCA,7430 Shallowford Rd.,,Chattanooga,TN,37421,,F6556C9D-1982-419B-B96E-164D9F9831B7,Type A- General public and institutional pools,Public Swimming Pool,690
37,stdh,B5952897-F402-45C4-82D1-C1A7D79ED894,2026-01-23T00:00:00.000Z,,100.0,Food Service Establishment Inspection,Follow-Up,,1899-12-30T10:05:00.000Z,Station Camp H. S. Food Service,1040 Bison Trail.,,Gallatin,TN,37066,,F6B39954-D80A-4222-BBB7-A802919745D4,School Cafeteria,Food Service Establishment,605
70,stdh,84F9EA65-DE8B-4F02-B801-30DC19A5CD87,2026-01-23T00:00:00.000Z,,100.0,Food Service Establishment Inspection,Follow-Up,,1899-12-30T12:43:00.000Z,Jimmy John's 2088,"1725 ""A"" Wilma Rudolph Blvd.",,Clarksville,TN,37040,,B3839485-7CB2-491F-A654-8663645515D3,Commercial Food 51+,Food Service Establishment,605
68,stdh,2B997866-45C5-4F0F-B54A-6DAE89252D66,2026-01-23T00:00:00.000Z,,100.0,Food Service Establishment Inspection,Follow-Up,,1899-12-30T15:00:00.000Z,AJs Restaurant,449 Opry Mills Dr,,Nashville,TN,37214,,061E663A-B520-453A-AB11-5F2DE2EF3A45,Commercial Food 51+,Food Service Establishment,605
66,stdh,A7DB7E1D-2F26-4FF7-83F2-515219F8D87A,2026-01-23T00:00:00.000Z,,100.0,Food Service Establishment Inspection,Routine,Sharon.doran@hctnschools.com,1899-12-30T10:35:00.000Z,Northside Elementary,1450 E. Main St.,,Savannah,TN,38372,,A2ABAF3B-DC53-42CE-9B08-A147B4D97196,School Cafeteria,Food Service Establishment,605
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
80,stdh,504C7D52-A320-4DB7-8A1F-370E3832FB27,2026-01-23T00:00:00.000Z,,90.0,Food Service Establishment Inspection,Routine,,1899-12-30T09:45:00.000Z,Huddle House,9401 Reco Dr.,,Soddy Daisy,TN,37379,,66F39A8F-6D5E-452D-B415-1EAFB60A5F62,Commercial Food 51+,Food Service Establishment,605
123,stdh,A6C32E21-7E0B-46AB-9620-30F5C47FB826,2026-01-23T00:00:00.000Z,,87.0,Hotels Motels Inspection,Follow-Up,*critical item 15 corrected at follow up,1899-12-30T12:44:00.000Z,Quality Inn,479 Gordonsville Hwy,,Gordonsville,TN,38563,,A975D714-20D9-4106-A761-A261A8628811,Hotel<51,Hotel,620
109,stdh,E3724635-AACA-48DE-9E89-B6921C32DBF5,2026-01-23T00:00:00.000Z,,85.0,School Buildings Inspection,Follow-Up,,1899-12-30T10:03:00.000Z,Smith County Middle School,134 Scms Lane.,,Carthage,TN,37030,,8AE88BF1-68AA-48F0-A2FA-73FB8943EDE8,Middle/Jr. High School Building,School Building,635
59,stdh,EC62CB6E-1209-4FF0-A0C5-2F5B2C575B21,2026-01-23T00:00:00.000Z,,82.0,Food Service Establishment Inspection,Routine,The establishment received a complaint of Cook...,1899-12-30T13:56:00.000Z,Bigfoot Philly Cheesesteaks,2005 Wears Valley Rd,,Sevierville,TN,37862,,06D2214A-32AA-4CBF-837B-22BF786AA592,Commercial Food <51,Food Service Establishment,605


### Summary 

- Check the request method (GET or POST), headers, and parameters (or JSON payload) in the browser's developer tools.
- Often, a site makes several API requests to display a single result.