Traditionally, Python programmers use [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) to scrape content from the internet. Instead of being *traditional*, we're going to use [Playwright](https://playwright.dev/python/), a **browser automation tool**! This means you actually control the browser: filling out forms, clicking buttons, downloading documents... it's magic!!!✨✨✨

# Maryland locksmiths

This notebook covers:

- Inspecting the page
- Filling in a text box
- Working through a list of inputs (zip codes, in this case)
- Combining dataframes
- Using the back button

## Installation

We need to install a few tools first! Remove the `#` and run the cell to install the Python packages and browsers that we'll need for our scraping adventure.

In [1]:
# %pip install --quiet lxml html5lib beautifulsoup4 pandas
# %pip install --quiet playwright
# !playwright install

## Opening up the browser and visiting our destination


In [18]:
from playwright.async_api import async_playwright

# "Hey, open up a browser"
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=False)

# Create a new browser window
page = await browser.new_page()

In [19]:
await page.goto("https://www.dllr.state.md.us/cgi-bin/ElectronicLicensing/OP_Search/OP_search.cgi?calling_app=LOCKSMITH::LOCKSMITH_personal_location")

<Response url='https://www.dllr.state.md.us/cgi-bin/ElectronicLicensing/OP_Search/OP_search.cgi?calling_app=LOCKSMITH::LOCKSMITH_personal_location' request=<Request url='https://www.dllr.state.md.us/cgi-bin/ElectronicLicensing/OP_Search/OP_search.cgi?calling_app=LOCKSMITH::LOCKSMITH_personal_location' method='GET'>>

## Filling in a text box

A good first attempt is `await page.locator("input").fill("whatever you want")`. If the page has more than one input, you'll get an error, because Playwright doesn't know which one you want to use. Read the error: it lists the elements that matched, so you can pick a more specific selector for the one you need (here, `[name='zip']`).

In [21]:
# 20601 
# 20602
# 20603
# 20606
# 20607
# 20608
# 20609

# await page.locator("input").fill("20601")
await page.locator("[name='zip']").fill("20601")

In [7]:
# await page.get_by_text("Search").click()
await page.get_by_role("button", name="Search").click()

## Grab the tables from the page

[Pandas](https://pandas.pydata.org/) is the Python equivalent to Excel, and it's great at dealing with tabular data! Often the data on a web page that looks like a spreadsheet can be read with `pd.read_html`.

You use `await page.content()` to grab the page's HTML as a string, then feed it to `pd.read_html` to find the tables. `read_html` gives you back a list of dataframes, so `len(tables)` tells you how many tables it found. From there you manually poke around to see which one you're interested in: `tables[0]` is the first one, `tables[1]` is the second one, and so on...

In [9]:
import pandas as pd
from io import StringIO

html = await page.content()
tables = pd.read_html(StringIO(html))
len(tables)

1

In [10]:
tables[0]

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Personal Name,Business Legal/Trading as Name and Street Address,City,State,Zip,Expiration,Category,Reg. #,Suffix
1,BARRY BRAAN,ABECO SAFE AND LOCK CO. 10520 BEECHWOOD DRIVE ...,WALDORF,MD,20601,2025-03-07,LOCKSMITH,393,
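
In the output above, the column names ended up in row 0 instead of in the header. One possible fix, sketched here on a tiny made-up table rather than the real results page, is to pass `header=0` to `pd.read_html` so the first row becomes the column names. The HTML below is a hypothetical stand-in, and it assumes an HTML parser such as `lxml` is installed (the installation cell takes care of that).

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the results page: the header is an ordinary <td> row
html = """
<table>
  <tr><td>Personal Name</td><td>City</td><td>Zip</td></tr>
  <tr><td>BARRY BRAAN</td><td>WALDORF</td><td>20601</td></tr>
</table>
"""

# header=0 tells read_html to use the first row as the column names
tables = pd.read_html(StringIO(html), header=0)
df = tables[0]

print(list(df.columns))
print(len(df))
```

Whether the real page behaves the same way depends on its markup, so check `tables[0].head()` after making the change.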


In [12]:
await page.go_back()

<Response url='https://www.dllr.state.md.us/cgi-bin/ElectronicLicensing/OP_Search/OP_search.cgi?calling_app=LOCKSMITH::LOCKSMITH_personal_location' request=<Request url='https://www.dllr.state.md.us/cgi-bin/ElectronicLicensing/OP_Search/OP_search.cgi?calling_app=LOCKSMITH::LOCKSMITH_personal_location' method='GET'>>

## Fill out the ZIP code field again and again and again

I found a list of zipcodes on the internet! I pasted them below, then used `.split("\n")` to break the text into a Python list with one zipcode per item.

In [3]:
zipcodes = """20906
21234
20878
21740
20874
21122
21222
21117
20904
20744
21061
21215
20902
20772
21207
20850
21206
20774
20783
21228
20854
20852
21043
21702
21218
21044
21921
20910
21224
21229""".split("\n")

print(zipcodes)

['20906', '21234', '20878', '21740', '20874', '21122', '21222', '21117', '20904', '20744', '21061', '21215', '20902', '20772', '21207', '20850', '21206', '20774', '20783', '21228', '20854', '20852', '21043', '21702', '21218', '21044', '21921', '20910', '21224', '21229']
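
Pasted lists often pick up blank lines or stray spaces, and `.split("\n")` would keep those as bogus entries. A minimal sketch of a more forgiving version, using a shortened made-up paste (the variable `raw` is hypothetical):

```python
# A shortened, deliberately messy paste: a blank line and a leading space
raw = """20906
21234

 20878
"""

# splitlines() handles any line ending; strip() trims spaces; the `if` drops blank lines
zipcodes = [line.strip() for line in raw.splitlines() if line.strip()]

print(zipcodes)
```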


Now we fill out the form for each and every zip code, one by one, pulling out the table from each results page and adding it to one big dataframe.

In [12]:
import pandas as pd
from io import StringIO

all_data = pd.DataFrame()

# Go to the front page
await page.goto("https://www.dllr.state.md.us/cgi-bin/ElectronicLicensing/OP_Search/OP_search.cgi?calling_app=LOCKSMITH::LOCKSMITH_personal_location")

# Search for each zipcode
for zipcode in zipcodes:
    print("Searching for", zipcode)

    # Fill out the form and search
    await page.locator("[name='zip']").fill(zipcode)
    await page.get_by_role("button", name="Search").click()

    # Get all of the tables on the page
    html = await page.content()
    try:
        tables = pd.read_html(StringIO(html))
    except ValueError:
        # read_html raises ValueError when the page has no tables
        tables = []

    # Get the table (and edit if necessary)
    if len(tables) > 0:
        df = tables[0]
        print("Found", len(df))
    
        # Add this page's table to the combined dataframe
        all_data = pd.concat([all_data, df], ignore_index = True)
    else:
        print("Nothing found")

    # Go back and start again
    await page.go_back()

Searching for 20906
Found 3
Searching for 21234
Found 3
Searching for 20878
Found 3
Searching for 21740
Found 4
Searching for 20874
Found 2
Searching for 21122
Found 4
Searching for 21222
Found 6
Searching for 21117
Found 5
Searching for 20904
Found 4
Searching for 20744
Found 3
Searching for 21061
Found 2
Searching for 21215
Found 4
Searching for 20902
Found 18
Searching for 20772
Found 3
Searching for 21207
Found 3
Searching for 20850
Found 3
Searching for 21206
Nothing found
Searching for 20774
Found 2
Searching for 20783
Found 4
Searching for 21228
Found 4
Searching for 20854
Found 2
Searching for 20852
Found 14
Searching for 21043
Nothing found
Searching for 21702
Found 2
Searching for 21218
Nothing found
Searching for 21044
Found 4
Searching for 21921
Found 3
Searching for 20910
Found 3
Searching for 21224
Nothing found
Searching for 21229
Nothing found


In [13]:
len(all_data)

108
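
Growing a dataframe with `pd.concat` inside the loop works fine at this size. A common alternative, sketched here with made-up stand-in tables rather than real scraped pages, is to collect the dataframes in a list and combine them once at the end. The names `page_tables`, `frames`, and `combined` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for what each zipcode search returned
page_tables = {
    "20906": pd.DataFrame({0: ["A", "B"], 1: ["SILVER SPRING", "SILVER SPRING"]}),
    "21206": None,  # a search that found nothing
    "21234": pd.DataFrame({0: ["C"], 1: ["BALTIMORE"]}),
}

frames = []
for zipcode, df in page_tables.items():
    if df is None:
        continue  # skip searches with no table
    frames.append(df)

# Combine everything in one step, renumbering the rows from 0
combined = pd.concat(frames, ignore_index=True)

print(len(combined))
```

If every search comes back empty, `pd.concat` raises an error on the empty list, so check `frames` before combining in that case.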

In [14]:
all_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Personal Name,Business Legal/Trading as Name and Street Address,City,State,Zip,Expiration,Category,Reg. #,Suffix
1,JOSE A. MALDONADO,ELKIN LOCKSMITH 3719 FERRARA DRIVE Total Activ...,SILVER SPRING,MD,20906,2026-05-06,LOCKSMITH,635,
2,TERRY ROSEMOND,"SERVICE REPAIRS, LLC 13108 CAMELLIA DRIVE Tota...",SILVER SPRING,MD,20906,2024-11-30,LOCKSMITH,380,
3,Personal Name,Business Legal/Trading as Name and Street Address,City,State,Zip,Expiration,Category,Reg. #,Suffix
4,ROBERT EASTER,EASTER'S LOCK & SECURITY SOLUTIONS 1713 E JOPP...,BALTIMORE,MD,21234,2025-01-31,LOCKSMITH,10,
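
Notice that the "Personal Name" header row shows up again at row 3: every results page contributed its own copy. One way to tidy that up, sketched on a trimmed-down copy of the output above (the `sample` dataframe is a hypothetical miniature with only three columns), is to drop every row that matches the header and then use the header text as the column names.

```python
import pandas as pd

# Hypothetical miniature of the combined data: header text repeated as rows
sample = pd.DataFrame([
    ["Personal Name", "City", "Zip"],
    ["JOSE A. MALDONADO", "SILVER SPRING", "20906"],
    ["TERRY ROSEMOND", "SILVER SPRING", "20906"],
    ["Personal Name", "City", "Zip"],
    ["ROBERT EASTER", "BALTIMORE", "21234"],
])

# The first row holds the real column names
header = sample.iloc[0]

# Flag rows where every cell matches the header row
is_header_row = sample.eq(header).all(axis=1)

# Keep the real records and promote the header text to column names
cleaned = sample[~is_header_row].reset_index(drop=True)
cleaned.columns = list(header)

print(cleaned)
```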


## Saving the results

Now we'll save it to a CSV file! Easy peasy.

In [16]:
all_data.to_csv("output.csv", index=False)