# Texas Tow Trucks (`.apply` and Selenium)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

### What URL will Selenium be starting on?

- Tip: The answer is *not* `https://www.tdlr.texas.gov/tools_search/`

In [1]:
# It's the search form's own page, not the tools_search landing page:
# https://www.tdlr.texas.gov/tools_search/mccs_search.asp

### Why are you using Selenium for this?

In [2]:
# just for fun

## Scrape this page

Scrape this page, displaying the following:

- The business name
- Phone number
- License status
- Physical address

**You should know how to do `.post` requests by now.**

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [3]:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
import pandas as pd

In [4]:
driver = webdriver.Chrome()

In [5]:
driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')

In [6]:
# find_element('xpath', ...) replaces find_element_by_xpath, which Selenium 4 removed
driver.find_element('xpath', '//*[@id="mcrdata"]').send_keys('006179570C')

In [7]:
driver.find_element('xpath', '//*[@id="submit3"]').click()

In [8]:
doc = BeautifulSoup(driver.page_source, 'html.parser')

In [9]:
cells = doc.find_all('td')
# Text nodes after the "Physical:" label: street line first, then city/state/ZIP
address_lines = doc.find('strong', string='Physical:').find_next_siblings(string=True)

new_license = {
    # Each slice skips the leading label text in its cell
    'business name': cells[5].text[8:],
    'phone number': cells[9].text[9:],
    'license status': cells[12].text[9:],
    'physical address': address_lines[0].strip() + ', ' + address_lines[1].strip()
}

new_license

{'business name': 'B.D. SMITH TOWING',
 'license status': 'Active',
 'phone number': '8173330706',
 'physical address': '13619 BRETT JACKSON RD., FORT WORTH,\xa0TX.\xa076179'}
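
The address above still contains `\xa0` characters, which are non-breaking spaces carried over from the page. Here is a minimal sketch of one way to normalize them. The helper name `clean_text` is hypothetical and not part of the original notebook.

```python
def clean_text(value):
    """Turn non-breaking spaces into regular spaces and collapse repeated whitespace."""
    # str.split() with no arguments splits on any Unicode whitespace, including '\xa0'
    return ' '.join(value.split())

raw_address = '13619 BRETT JACKSON RD., FORT WORTH,\xa0TX.\xa076179'
cleaned_address = clean_text(raw_address)
print(cleaned_address)
```

Wrapping each scraped value in a helper like this before storing it keeps the saved CSV free of invisible characters.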

# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks; we'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [10]:
df = pd.read_csv('trucks-subset.csv')
df

| | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |


### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*

In [11]:
# nope, everything's fine
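
One difference this question may be hinting at (an assumption, since the three IDs in this subset contain a letter and are already read as text) is that an ID column made up only of digits loses its leading zeros when pandas infers a numeric type. The sketch below uses made-up, all-digit IDs to show the effect and how `dtype` avoids it.

```python
import io
import pandas as pd

# Hypothetical CSV contents: these all-digit IDs are invented for illustration
csv_text = "TDLR Number\n0064946800\n0064844400\n"

# Default type inference reads the column as integers
as_default = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the values exactly as written in the file
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'TDLR Number': str})

default_values = as_default['TDLR Number'].tolist()
text_values = as_text['TDLR Number'].tolist()
print(default_values)  # integers, leading zeros gone
print(text_values)     # strings, leading zeros kept
```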

## Use `.apply` to go through each row of the dataset, printing out information about each tow truck company.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*
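
As a warm-up for the tips above, here is a minimal sketch of `.apply` with `axis=1` on a toy dataframe, with no scraping involved. The names `df_demo`, `show_number`, and `seen` are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({'TDLR Number': ['006507931C', '006179570C']})
seen = []  # collects the values the function was called with

def show_number(row):
    # With axis=1, `row` is a Series holding one row; index it by column name
    seen.append(row['TDLR Number'])
    print('Looking up', row['TDLR Number'])

# The function returns nothing, so the result is a Series of None values
result = df_demo.apply(show_number, axis=1)
```

Because the function only prints, the returned Series carries no useful data, which is why a column of `None` shows up under the printed output.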

In [12]:
def do_the_stuff(row):
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number'])
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    cells = doc.find_all('td')
    # Text nodes after the "Physical:" label: street line first, then city/state/ZIP
    address_lines = doc.find('strong', string='Physical:').find_next_siblings(string=True)
    # Each slice skips the leading label text in its cell
    print('business name:', cells[5].text[8:])
    print('phone number:', cells[9].text[9:])
    print('license status:', cells[12].text[9:])
    print('physical address:', address_lines[0].strip() + ', ' + address_lines[1].strip())
    print('----------')

df.apply(do_the_stuff, axis=1)

business name: AUGUSTUS E SMITH
phone number: 9032276464
license status: Active
physical address: 103 N MAIN ST, BONHAM, TX. 75418
----------
business name: B.D. SMITH TOWING
phone number: 8173330706
license status: Active
physical address: 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179
----------
business name: BARRY MICHAEL SMITH
phone number: 8066544404
license status: Active
physical address: 4501 W CEMETERY RD, CANYON, TX. 79015
----------


0    None
1    None
2    None
dtype: object
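
The function above builds each detail-page URL by string concatenation. As a hedged alternative, the sketch below builds the same URL with `urllib.parse.urlencode`, which also escapes any unexpected characters. The helper name `detail_url` is hypothetical, and the `.strip()` call assumes stray whitespace could appear in the CSV.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.tdlr.texas.gov/tools_search/mccs_display.asp'

def detail_url(tdlr_number):
    """Return the detail-page URL for one TDLR number."""
    # strip() guards against stray spaces around the ID (an assumption about the data)
    return BASE_URL + '?' + urlencode({'mcrnumber': tdlr_number.strip()})

example_url = detail_url(' 006179570C ')
print(example_url)
```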

## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but using the function a little differently.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*
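
To illustrate the `return` tip without touching the website, here is a minimal sketch on a toy dataframe. Returning a `pd.Series` from the function makes each key a column in the result, which can then be joined back onto the original rows. The names `df_demo` and `describe` and the derived columns are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({'TDLR Number': ['006507931C', '006179570C']})

def describe(row):
    number = row['TDLR Number']
    # Each key of the returned Series becomes a column in the .apply result
    return pd.Series({'suffix': number[-1], 'digits': number[:-1]})

new_columns = df_demo.apply(describe, axis=1)

# join() lines the new columns up with the original rows by index
combined = new_columns.join(df_demo)
print(combined)
```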

In [13]:
def do_the_stuff_right(row):
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number'])
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    cells = doc.find_all('td')
    # Text nodes after the "Physical:" label: street line first, then city/state/ZIP
    address_lines = doc.find('strong', string='Physical:').find_next_siblings(string=True)
    # Returning a Series turns each key into a column; each slice skips the cell's label text
    return pd.Series({
        'business name': cells[5].text[8:],
        'phone number': cells[9].text[9:],
        'license status': cells[12].text[9:],
        'physical address': address_lines[0].strip() + ', ' + address_lines[1].strip()
    })

df = df.apply(do_the_stuff_right, axis=1).join(df)

### Save your dataframe as a CSV

In [14]:
df.to_csv('trucks-results.csv', index=False)

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [15]:
pd.read_csv('trucks-results.csv')

| | business name | license status | phone number | physical address | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Active | 9032276464 | 103 N MAIN ST, BONHAM, TX. 75418 | 006507931C |
| 1 | B.D. SMITH TOWING | Active | 8173330706 | 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179 | 006179570C |
| 2 | BARRY MICHAEL SMITH | Active | 8066544404 | 4501 W CEMETERY RD, CANYON, TX. 79015 | 006502097C |
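
The "extra weird column" this check looks for is the index that `to_csv` writes by default. Here is a minimal sketch of the difference, using in-memory buffers instead of files; `df_demo` and the buffer names are hypothetical.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({'TDLR Number': ['006507931C', '006179570C']})

# Default behaviour: the index is written as a first column with no header
with_index = io.StringIO()
df_demo.to_csv(with_index)
with_index.seek(0)

# index=False leaves the index out of the file
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)
without_index.seek(0)

cols_with_index = pd.read_csv(with_index).columns.tolist()
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_with_index)
print(cols_without_index)
```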


## Repeat this process for the entire `tow-trucks.csv` file

In [16]:
def do_the_stuff_right_and_for_all(row):
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number'])
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    cells = doc.find_all('td')
    # Text nodes after the "Physical:" label: street line first, then city/state/ZIP
    address_lines = doc.find('strong', string='Physical:').find_next_siblings(string=True)
    # Returning a Series turns each key into a column; each slice skips the cell's label text
    return pd.Series({
        'business name': cells[5].text[8:],
        'phone number': cells[9].text[9:],
        'license status': cells[12].text[9:],
        'physical address': address_lines[0].strip() + ', ' + address_lines[1].strip()
    })

df = pd.read_csv('tow-trucks.csv')
df = df.apply(do_the_stuff_right_and_for_all, axis=1).join(df)

df.to_csv('trucks-results-all.csv', index=False)
pd.read_csv('trucks-results-all.csv')

| | business name | license status | phone number | physical address | TDLR Number |
|---|---|---|---|---|---|
| 0 | AUGUSTUS E SMITH | Active | 9032276464 | 103 N MAIN ST, BONHAM, TX. 75418 | 006507931C |
| 1 | B.D. SMITH TOWING | Active | 8173330706 | 13619 BRETT JACKSON RD., FORT WORTH, TX. 76179 | 006179570C |
| 2 | BARRY MICHAEL SMITH | Active | 8066544404 | 4501 W CEMETERY RD, CANYON, TX. 79015 | 006502097C |
| 3 | HEATH SMITH | Expired | 940-552-0687 | 1529 WILBARGER ST, VERNON, TX. 76384 | 006494912C |
| 4 | HEATH SMITH | Expired | 9405520687 | 1529 WILBARGER ST, VERNON, TX. 76384 | 0649468VSF |
| 5 | HYSMITH AUTOMOTIVE | | icer: ASHLEY ERIN HYSMITH ... | 1210 US 380 BYPASS, GRAHAM, TX. 76450 | 006448786C |
| 6 | HYSMITH AUTOMOTIVE & TRUCK REPAIR INC | | icer: WILLIAM THOMAS HYSMITH ... | 927 LOVING HWY, GRAHAM, TX. 76450 | 0648444VSF |
| 7 | HYSMITH AUTOMOTIVE & TRUCK REPAIR INC | | icer: ASHLEY ERIN HYSMITH ... | 1210 380 BYPASS, GRAHAM, TX. 76450 | 0651667VSF |
| 8 | JEFF & WENDY SMITH | | icer: WENDY SMITH ... | 10842 FM 2138 N, JACKSONVILLE, TX. 75766 | 006017767C |
| 9 | JEFF SMITH | Active | 8324354670 | 4338 HARVEY RD, CROSBY, TX. 77532 | 006495492C |
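
Rows 5 through 8 above have an empty license status and a phone number that starts with `icer:`. That suggests those pages arrange their table cells differently, so the fixed `td` positions pick up the wrong text. Here is a minimal sketch of flagging such rows for a second look, using a few values copied from the results; the column name `phone looks valid` is hypothetical.

```python
import pandas as pd

# A few rows copied from the results above
results = pd.DataFrame({
    'business name': ['B.D. SMITH TOWING', 'HEATH SMITH', 'HYSMITH AUTOMOTIVE'],
    'phone number': ['8173330706', '940-552-0687', 'icer: ASHLEY ERIN HYSMITH ...'],
})

# Remove common separators, then accept only values made up entirely of digits
digits_only = results['phone number'].str.replace(r'[\s().-]', '', regex=True)
results['phone looks valid'] = digits_only.str.isdigit()

# Rows that fail the check probably need a different parsing rule
suspect_names = results.loc[~results['phone looks valid'], 'business name'].tolist()
print(suspect_names)
```

When running a check like this on a CSV read back from disk, passing `dtype=str` to `pd.read_csv` keeps the phone column as text so the `.str` methods apply to every row.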


In [17]:
# quit() closes the browser window and also ends the chromedriver session
driver.quit()