# Texas Tow Trucks (`.apply` and Selenium)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

### What URL will Selenium be starting on?

- Tip: The answer is *not* `https://www.tdlr.texas.gov/tools_search/`

In [None]:
# https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006179570C 
# or the search page
# https://www.tdlr.texas.gov/tools_search/
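
The first URL carries the license number in its query string, so a results page can be addressed directly. A minimal sketch, assuming the `mcrnumber` parameter seen in that URL works the same way for other TDLR numbers; `results_url` is a hypothetical helper name, not part of the assignment.

```python
from urllib.parse import urlencode

# Results-page address copied from the URL above; the helper name is made up.
RESULTS_PAGE = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp"

def results_url(tdlr_number):
    """Build the direct results-page URL for one TDLR number."""
    return RESULTS_PAGE + "?" + urlencode({"mcrnumber": tdlr_number})

print(results_url("006179570C"))
```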


### Why are you using Selenium for this?

In [None]:
# Strictly speaking, we don't have to: the results page has its own URL, so no form is needed.
# But we've heard this page behaves oddly when fetched with plain requests.

## Scrape this page

Scrape this page, displaying:

- The business name
- Phone number
- License status
- Physical address

**You should know how to do `.post` requests by now.**

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [85]:
from selenium import webdriver

In [86]:
driver = webdriver.Chrome()  # open a Selenium-controlled Chrome window

In [87]:
driver.get("https://www.tdlr.texas.gov/tools_search/")

In [88]:
# type the TDLR number into the search box (id="mcrdata")
tdlr_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
tdlr_input.send_keys('006179570C')

In [89]:
# click the search button to submit the form
search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
search_button.click()

In [90]:
# Scrape the results page with XPath, pulling out:

# - The business name
# - Phone number
# - License status
# - Physical address

In [91]:
biz_name_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]')
biz_name_tag.text.replace("Name:", "").strip()

'B.D. SMITH TOWING'

In [92]:
phone_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]')
phone_tag.text.replace("Phone:", "").strip()

'8173330706'

In [93]:
status_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font/font')
status_tag.text

'Active'

In [94]:
# this cell holds more than the address; the physical address comes last
address_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')
address_tag.text

'Carrier Type:  Tow Truck Company\nNumber of Active Tow Trucks:   0\n\nAddress Information\nMailing:\n13619 BRETT JACKSON RD\nFORT WORTH, TX. 76179\n\nPhysical:\n13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179'

In [95]:
# the physical address is everything after the last colon
print(address_tag.text.split(":")[-1].strip())

13619 BRETT JACKSON RD.
FORT WORTH, TX. 76179
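
Splitting on the last colon works here because the physical address is the final labelled item in the cell. A slightly more defensive sketch, assuming the cell text always contains a `Physical:` label as in the output above; `physical_address` is a hypothetical helper name.

```python
# Text copied from the address cell shown above.
cell_text = (
    "Carrier Type:  Tow Truck Company\n"
    "Number of Active Tow Trucks:   0\n\n"
    "Address Information\n"
    "Mailing:\n13619 BRETT JACKSON RD\nFORT WORTH, TX. 76179\n\n"
    "Physical:\n13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179"
)

def physical_address(text):
    """Return whatever follows the 'Physical:' label, or None if it is missing."""
    label = "Physical:"
    if label not in text:
        return None
    # partition splits on the first occurrence of the label
    return text.partition(label)[2].strip()

print(physical_address(cell_text))
```

Looking for the label by name means the result does not depend on the address being the last thing in the cell.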


# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks; we'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [96]:
import pandas as pd
df = pd.read_csv('trucks-subset.csv')
df.head()

,TDLR Number
0,006507931C
1,006179570C
2,006502097C


### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*

In [97]:
# file is okay
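
The usual thing to check here is whether pandas turned an ID column into numbers and dropped its leading zeros. These TDLR numbers end in a letter, so they are read as strings and nothing is lost. A sketch of the check, using inline CSV text with a made-up all-digit `Zip` column (not in the real file) to show the failure and the `dtype=str` fix:

```python
import io
import pandas as pd

# Inline stand-in for a CSV file; the Zip column is invented for illustration.
csv_text = "TDLR Number,Zip\n006507931C,02134\n006179570C,76179\n"

default_df = pd.read_csv(io.StringIO(csv_text))
string_df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# The trailing letter keeps TDLR Number as text either way.
print(default_df["TDLR Number"].tolist())
# All-digit values are parsed as integers by default, losing the leading zero.
print(default_df["Zip"].tolist())
# dtype=str keeps every column exactly as written in the file.
print(string_df["Zip"].tolist())
```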

## Use `.apply` to go through each row of the dataset, printing out information about each tow truck company.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*
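
With `axis=1`, `.apply` calls the function once per row and hands it that row as a Series, so columns are read by name. A minimal sketch, with the three TDLR numbers from the dataframe typed in by hand and a stand-in function (`describe`, hypothetical) in place of the Selenium work:

```python
import pandas as pd

# The three rows shown above, typed in by hand instead of read from the CSV.
trucks = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C", "006502097C"]})

def describe(row):
    # row is a Series for one row; index it by column name
    message = "Looking up " + row["TDLR Number"]
    print(message)
    return message

# axis=1 means "one call per row"; the return values come back as a Series
messages = trucks.apply(describe, axis=1)
```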

In [98]:
df.head()

,TDLR Number
0,006507931C
1,006179570C
2,006502097C


In [99]:
# Scrape with a function: print the results for now, return a Series later.


def process_truck(row):
    # visit the search page
    driver.get("https://www.tdlr.texas.gov/tools_search/")

    # fill out the form and submit it
    tdlr_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    tdlr_input.send_keys(row['TDLR Number'])

    search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
    search_button.click()

    # pull each field out of the results page and print it
    biz_name_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]')
    print("Name", biz_name_tag.text.replace("Name:", "").strip())

    phone_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]')
    print("Phone", phone_tag.text.replace("Phone:", "").strip())

    status_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font/font')
    print("Status", status_tag.text)

    address_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')
    print("Address", address_tag.text.split(":")[-1].strip())

    print('-------')

# open Selenium once, scrape every row, then shut the browser down
driver = webdriver.Chrome()
df.apply(process_truck, axis=1)
driver.quit()


Name AUGUSTUS E SMITH
Phone 9032276464
Status Active
Address 103 N MAIN ST
BONHAM, TX. 75418
-------
Name B.D. SMITH TOWING
Phone 8173330706
Status Active
Address 13619 BRETT JACKSON RD.
FORT WORTH, TX. 76179
-------
Name BARRY MICHAEL SMITH
Phone 8066544404
Status Active
Address 4501 W CEMETERY RD
CANYON, TX. 79015
-------


## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but using the function a little differently.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*
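
When the function returns a `pd.Series`, `.apply(..., axis=1)` assembles those Series into a new dataframe with one column per key, which can then be joined back onto the original. A sketch of that mechanism only: `fake_lookup` and its hard-coded dictionary are stand-ins for the Selenium scraping, filled with values printed earlier in this notebook.

```python
import pandas as pd

trucks = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C"]})

# Stand-in for the scraped pages (hypothetical; values taken from the output above).
FAKE_PAGES = {
    "006507931C": {"Name": "AUGUSTUS E SMITH", "Status": "Active"},
    "006179570C": {"Name": "B.D. SMITH TOWING", "Status": "Active"},
}

def fake_lookup(row):
    page = FAKE_PAGES[row["TDLR Number"]]
    # Each key of the returned Series becomes a column in the result
    return pd.Series({"Name": page["Name"], "Status": page["Status"]})

scraped = trucks.apply(fake_lookup, axis=1)  # new dataframe, same index as trucks
combined = scraped.join(trucks)              # line the new columns up by index
print(combined)
```

Note that `trucks` itself is untouched: `.apply` hands back a new object, which is why the result has to be stored or joined.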

In [100]:
df.head()

,TDLR Number
0,006507931C
1,006179570C
2,006502097C


In [105]:
def process_truck(row):
    # visit the search page
    driver.get("https://www.tdlr.texas.gov/tools_search/")

    # fill out the form and submit it
    tdlr_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    tdlr_input.send_keys(row['TDLR Number'])

    search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
    search_button.click()

    # grab the tag holding each field
    biz_name_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]')
    phone_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]')
    status_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font/font')
    address_tag = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')

    # return the cleaned-up values; each key becomes a column
    return pd.Series({
        'Name': biz_name_tag.text.replace("Name:", "").strip(),
        'Phone': phone_tag.text.replace("Phone:", "").strip(),
        'Status': status_tag.text,
        'Address': address_tag.text.split(":")[-1].strip()
    })


# .apply returns a new dataframe; df itself is left unchanged
driver = webdriver.Chrome()
df.apply(process_truck, axis=1)
driver.quit()

In [106]:
df

,TDLR Number
0,006507931C
1,006179570C
2,006502097C


In [109]:
# so join the scraped columns back onto the original dataframe
driver = webdriver.Chrome()
complete_df = df.apply(process_truck, axis=1).join(df)
driver.quit()

In [110]:
complete_df

,Address,Name,Phone,Status,TDLR Number
0,"103 N MAIN ST\nBONHAM, TX. 75418",AUGUSTUS E SMITH,9032276464,Active,006507931C
1,"13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179",B.D. SMITH TOWING,8173330706,Active,006179570C
2,"4501 W CEMETERY RD\nCANYON, TX. 79015",BARRY MICHAEL SMITH,8066544404,Active,006502097C


### Save your dataframe as a CSV

In [111]:
# be sure not to save the index!
complete_df.to_csv('complete_trucks.csv', index=False)

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [112]:
pd.read_csv('complete_trucks.csv')

,Address,Name,Phone,Status,TDLR Number
0,"103 N MAIN ST\nBONHAM, TX. 75418",AUGUSTUS E SMITH,9032276464,Active,006507931C
1,"13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179",B.D. SMITH TOWING,8173330706,Active,006179570C
2,"4501 W CEMETERY RD\nCANYON, TX. 79015",BARRY MICHAEL SMITH,8066544404,Active,006502097C
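
The "extra weird column" to look for is the index: `to_csv` writes it as an unnamed first column unless told otherwise, and `read_csv` then labels it `Unnamed: 0`. A sketch of the difference, writing to in-memory buffers (an assumption made so the example touches no files) instead of `complete_trucks.csv`:

```python
import io
import pandas as pd

frame = pd.DataFrame({"Name": ["B.D. SMITH TOWING"], "Status": ["Active"]})

def round_trip(df, **to_csv_options):
    """Write df to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    df.to_csv(buffer, **to_csv_options)
    buffer.seek(0)
    return pd.read_csv(buffer)

with_index = round_trip(frame)                  # default: the index is written too
without_index = round_trip(frame, index=False)  # index left out

print(list(with_index.columns))
print(list(without_index.columns))
```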


## Repeat this process for the entire `tow-trucks.csv` file
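
On the full file, a single missing element or unknown number would otherwise stop the whole `.apply` run. One way to guard against that is sketched below with a stand-in `scrape_one` function (hypothetical; the real one would drive Selenium) that fails for one made-up ID. Catching every `Exception` is a simplification; with Selenium, a narrower choice such as `NoSuchElementException` may fit better.

```python
import pandas as pd

def scrape_one(tdlr_number):
    # Stand-in for the Selenium steps; raises for an ID it cannot find.
    if tdlr_number == "000000000X":  # made-up ID used to force a failure
        raise LookupError("no results page")
    return {"Name": "COMPANY " + tdlr_number, "Status": "Active"}

def safe_process(row):
    # Start from the same keys every time so all rows share the same columns
    record = {"Name": None, "Status": None, "Error": None}
    try:
        record.update(scrape_one(row["TDLR Number"]))
    except Exception as error:
        # Keep going; record the problem so the row can be retried later
        record["Error"] = str(error)
    return pd.Series(record)

trucks = pd.DataFrame({"TDLR Number": ["006507931C", "000000000X"]})
results = trucks.apply(safe_process, axis=1).join(trucks)
print(results)
```

Rows with a non-empty `Error` column can then be filtered out and re-run on their own.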