# Texas Tow Trucks (`.apply` and Selenium)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

### What URL will Selenium be starting on?

- Tip: The answer is *not* `https://www.tdlr.texas.gov/tools_search/`

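Selenium starts on the search form: https://www.tdlr.texas.gov/tools_search/mccs_search.asp

Searching for `006179570C` leads to https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=006179570C

The TDLR Number shows up in the URL! So we could also skip the form and go straight to `https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=` followed by the number.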
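
Since the number is just a query parameter, the display URL can be put together in code. This is a minimal sketch, assuming the `mcrnumber` parameter works the same way for every TDLR Number as it does for `006179570C`. The helper name `display_url` is made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp"

def display_url(tdlr_number):
    # urlencode builds the "mcrnumber=..." part and escapes anything unusual
    return BASE_URL + "?" + urlencode({"mcrnumber": tdlr_number})

print(display_url("006179570C"))
```

A URL built this way could be handed to `driver.get` in place of filling in the search form.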

### Why are you using Selenium for this?

(We don't really have to, because the TDLR Number shows up in the URL.)
But with Selenium we can fill in the search form and click through the results in code, so we can look up many numbers without doing each search by hand.

## Scrape this page

Scrape this page, displaying:

- The business name
- Phone number
- License status
- Physical address

**You should know how to do `.post` requests by now.**

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [1]:
# First, import everything we might need.
# selenium is a package; webdriver is a module within selenium.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests

# Open a Chrome window and load the search form.
driver = webdriver.Chrome()
driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')

In [2]:
my_url = "https://www.tdlr.texas.gov/tools_search/mccs_search.asp"
raw_html = urlopen(my_url).read()
soup_doc = BeautifulSoup(raw_html, "html.parser")

In [3]:
response = requests.post('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')
response.text

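# requests.post is different from requests.get because .post sends form data in the body of the request,
# which is how a search form submits what you typed; .get just asks for a URL.
# Without a data= argument, as above, no search terms are sent.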


# We don't need this, but here it is:

# data = {
# 'namedata':'',
# 'name_carrier_type' : 'COMPANY',
# 'searchtype' : 'mcr',
# 'mcrdata' : '006179570C',
# 'citydata':'',
# 'city_status':'A',
# 'city_carrier_type':'tow',
# 'zipcodedata':'',
# 'zip_status':'ALL',
# 'zip_carrier_type':'all',
# 'proc':''
# }

# Headers = {
#     "Referer" : "https://www.tdlr.texas.gov/tools_search/mccs_search.asp?message=mcrerr",
#     'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
# }


'\r\n\r\n<html>\r\n<head>\r\n\t<title>TDLR Tow Truck and Vehicle Storage Facility Inquiry</title>\r\n\r\n<meta\tcontent="text/html; charset=windows-1252" http-equiv="Content-Type" />\r\n<meta\tNAME="GENERATOR" Content="Microsoft Visual Studio" /> \r\n\r\n<meta\tHTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8" />\r\n<meta\tname="description" \r\n\t\tcontent="Welcome to the Tow Truck and Vehicle Storage Facility Inquiry Information Page. \r\n\t\t\t\tThis web application allows users to obtain information on companies that have \r\n\t\t\t\tobtained registration through TDLR. This includes addresses, insurance records, \r\n\t\t\t\trecent activities, and vehicle data." />\r\n<meta\tname="keywords" \r\n\t\tcontent="Tow Trucks, Vehicle Storage Facility, registration, insurance, Permit \r\n\t\t\t\tRestrictions, Texas Department of Licensing and Regulation, TDLR" />\r\n<meta\tname="subject" content="Transportation" />\r\n<meta\tname="type" content="Programs and services" />\r\n<meta
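
To see what a `.post` with form data looks like, here is a sketch that builds the request with the standard library but never sends it. The two field names come from the commented-out `data` dictionary above. Whether the site also needs the other fields or the headers is an assumption that has not been checked here.

```python
from urllib.parse import urlencode
from urllib.request import Request

search_url = "https://www.tdlr.texas.gov/tools_search/mccs_search.asp"

# Two of the form fields from the commented-out dictionary
form_data = {
    "searchtype": "mcr",
    "mcrdata": "006179570C",
}

# Form fields travel in the request body, encoded like a query string
body = urlencode(form_data).encode("ascii")

# Giving Request a body makes it a POST; nothing is sent until urlopen() is called
post_request = Request(search_url, data=body)

print(post_request.get_method())
print(body)
```

With `requests`, the same dictionary would go in the `data=` argument of `requests.post`.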

In [4]:
# Grab the TDLR Number field so we can type into it.
TDLR_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')

# send_keys types the number into the field; clicking the search button submits the form.
TDLR_input.send_keys('006179570C')
search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
search_button.click()



In [5]:
name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]')
name.text

'Name:   B.D. SMITH TOWING'

In [6]:
phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]')
phone.text

'Phone:   8173330706'
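
The text that comes back still carries its label (`Name:`, `Phone:`). One way to keep only the value is to split at the first colon. This is a small sketch; `strip_label` is a made-up helper name, and it assumes the label never contains a colon itself.

```python
def strip_label(text):
    # partition splits once, at the first colon
    label, colon, value = text.partition(":")
    if not colon:
        # No label present, so just trim the whitespace
        return text.strip()
    return value.strip()

print(strip_label("Name:   B.D. SMITH TOWING"))
print(strip_label("Phone:   8173330706"))
```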

In [7]:
license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[1]/font/font')
license_status.text

'Active'

In [8]:
physical_address = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')
physical_address.text

# How do we get down to just the physical address? Should we use BeautifulSoup? The only handle we have is that
# it starts after the 9th line break, right after the "Physical:" label.


'Carrier Type:  Tow Truck Company\nNumber of Active Tow Trucks:   0\n\nAddress Information\nMailing:\n13619 BRETT JACKSON RD\nFORT WORTH, TX. 76179\n\nPhysical:\n13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179'
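
Because the physical address always follows the `Physical:` label in this text, plain string methods may be enough. The sketch below assumes every record uses that same label and puts nothing after the physical address. The function name `extract_physical_address` is made up for this example.

```python
def extract_physical_address(cell_text):
    marker = "Physical:"
    if marker not in cell_text:
        # Some records might not list a physical address
        return None
    # Keep only what comes after the label
    after_marker = cell_text.split(marker, 1)[1]
    # Drop empty lines and join the rest into one line
    lines = [line.strip() for line in after_marker.splitlines() if line.strip()]
    return ", ".join(lines)

sample = (
    "Carrier Type:  Tow Truck Company\nNumber of Active Tow Trucks:   0\n\n"
    "Address Information\nMailing:\n13619 BRETT JACKSON RD\nFORT WORTH, TX. 76179\n\n"
    "Physical:\n13619 BRETT JACKSON RD.\nFORT WORTH, TX. 76179"
)
print(extract_physical_address(sample))
```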

# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks, we'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [9]:
import pandas as pd
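# The column is called 'TDLR Number', so that is the key dtype needs.
# Reading it as a string keeps the leading zeros.
df = pd.read_csv('trucks-subset.csv', dtype={'TDLR Number': 'str'})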
df.head()
df.dtypes

TDLR Number    object
dtype: object

### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*
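
The usual difference is leading zeros: pandas turns a digits-only column into numbers unless told otherwise. The sketch below uses made-up, digits-only IDs so the effect is visible. The IDs shown in this notebook end in a letter, so pandas would keep those as text either way, but setting `dtype` for the exact column name is the safer habit.

```python
import io
import pandas as pd

# Made-up, digits-only IDs so the effect is visible
csv_text = "TDLR Number\n006507931\n006179570\n"

as_numbers = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"TDLR Number": str})

print(as_numbers["TDLR Number"].tolist())  # leading zeros are lost
print(as_text["TDLR Number"].tolist())     # leading zeros are kept
```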

In [10]:
df.head()

| | TDLR Number |
|---|---|
| 0 | 006507931C |
| 1 | 006179570C |
| 2 | 006502097C |


## Use `.apply` to go through each row of the dataset, printing out information about each tow truck company.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*

In [11]:
def looking_at_something(row):
    # Start each lookup from a fresh copy of the search form.
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')

    # Type this row's TDLR Number into the form and submit it.
    TDLR_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    TDLR_input.send_keys(row['TDLR Number'])
    search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
    search_button.click()

    name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]')
    print(name.text)
    phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]')
    print(phone.text)
    license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[1]/font/font')
    print(license_status.text)
    physical_address = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]')
    print(physical_address.text)

# axis=1 hands the function one row at a time.
df.apply(looking_at_something, axis=1)

## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but using the function a little differently.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*
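
Returning a `pd.Series` from the function is what turns each scraped value into its own column. Here is a browser-free sketch of that idea: `fake_lookup` is a made-up stand-in for the scraping function, and its values are invented from the row itself.

```python
import pandas as pd

df_demo = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C"]})

def fake_lookup(row):
    # Stand-in for the scraping: build values from the row instead of a web page
    number = row["TDLR Number"]
    return pd.Series({
        "name": "Company " + number,
        "last char": number[-1],
    })

# Each key of the returned Series becomes a column in the result
new_columns = df_demo.apply(fake_lookup, axis=1)

# .join lines the new columns up with the original rows by index
combined = new_columns.join(df_demo)
print(combined)
```

If the function printed the values without returning them, `.apply` would hand back a column of `None` values.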

In [15]:
def scrape_trucks_info(row):

    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')
    
    #NOTE: don't do the webdriver piece here, because otherwise you'll open a bajillion webdrivers
    #think of the actions you take when you go to the web page. do that same thing.
    
    TDLR_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    TDLR_input.send_keys(row['TDLR Number'])
    search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
    search_button.click()
    
    name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
    phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text
    license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[1]/font/font').text
    physical_address = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]').text
    
    return pd.Series({
        'name': name,
        'phone': phone,
        'license status':license_status,
        'physical address': physical_address
    })

driver = webdriver.Chrome()

# Apply scrape_trucks_info to every row of df (remember, that's the CSV file we loaded!),
# then join the scraped columns back onto the original dataframe.
trucks_sub_df = df.apply(scrape_trucks_info, axis=1).join(df)

driver.close()
# Note: we could also call .strip() on each .text result to trim stray whitespace.

### Save your dataframe as a CSV

In [17]:
trucks_sub_df.head()
trucks_sub_df.to_csv('texas_trucks.csv', index=False)


### Re-open your dataframe to confirm you didn't save any extra weird columns

In [18]:
trucks_sub_df = pd.read_csv('texas_trucks.csv')
trucks_sub_df.head()

| | license status | name | phone | physical address | TDLR Number |
|---|---|---|---|---|---|
| 0 | Active | Name: AUGUSTUS E SMITH | Phone: 9032276464 | Carrier Type: Tow Truck Company\nNumber of Ac... | 006507931C |
| 1 | Active | Name: B.D. SMITH TOWING | Phone: 8173330706 | Carrier Type: Tow Truck Company\nNumber of Ac... | 006179570C |
| 2 | Active | Name: BARRY MICHAEL SMITH | Phone: 8066544404 | Carrier Type: Tow Truck Company\nNumber of Ac... | 006502097C |
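
The "extra weird column" to watch for is the saved index, which pandas reads back as `Unnamed: 0`. The sketch below shows the difference on a small made-up dataframe, using in-memory text instead of a file.

```python
import io
import pandas as pd

# Made-up data, just to compare the two ways of saving
demo = pd.DataFrame({"name": ["A TOWING", "B TOWING"], "phone": ["111", "222"]})

# With no filename, to_csv returns the CSV as a string
csv_with_index = demo.to_csv()
csv_without_index = demo.to_csv(index=False)

cols_with_index = pd.read_csv(io.StringIO(csv_with_index)).columns.tolist()
cols_without_index = pd.read_csv(io.StringIO(csv_without_index)).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```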


## Repeat this process for the entire `tow-trucks.csv` file

In [None]:
tow_trucks_all_df = pd.read_csv('tow-trucks.csv', dtype={'TDLR Number': 'str'})
tow_trucks_all_df.dtypes

In [None]:
len(tow_trucks_all_df)

In [None]:
tow_trucks_all_df.head()
tow_trucks_all_df.tail()


In [21]:
tow_trucks_all_df = pd.read_csv('tow-trucks.csv', dtype={'TDLR Number': 'str'})
tow_trucks_all_df.head()

def scrape_trucks_all(row):

    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')
    
    #think of all the actions you take when you go to the web page. do that same thing, just via selenium
    
    TDLR_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
    TDLR_input.send_keys(row['TDLR Number'])
    search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
    search_button.click()
    
    name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
    phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text 
    license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font').text 
    physical_address = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]').text 

    
    return pd.Series({
        'name': name,
        'phone': phone,
        'license status':license_status,
        'physical address': physical_address
    })
    
#         license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[1]/font/font').text

trucks_new_df = tow_trucks_all_df.apply(scrape_trucks_all, axis=1).join(tow_trucks_all_df)
trucks_new_df.head()

# Pattern: new_df = df_from_csv.apply(function, axis=1).join(df_from_csv)

# It seems to be breaking because it isn't finding a record for 0646264VSF, so the scraper needs a way
# to skip numbers that have no record.

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]"}
  (Session info: chrome=58.0.3029.110)
  (Driver info: chromedriver=2.30.477690 (c53f4ad87510ee97b5c3425a14c0e79780cdf262),platform=Mac OS X 10.12.5 x86_64)
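
One common way to keep a single missing record from stopping the whole run is to catch the error and return blanks for that row. The sketch below shows the pattern without a browser. `fake_lookup` and `FAKE_RECORDS` are made-up stand-ins for the Selenium calls, and `LookupError` stands in for `NoSuchElementException`.

```python
# Made-up records standing in for the website
FAKE_RECORDS = {"006179570C": "B.D. SMITH TOWING"}

def fake_lookup(tdlr_number):
    # Raises KeyError (a kind of LookupError) when there is no record
    return FAKE_RECORDS[tdlr_number]

def scrape_or_blank(tdlr_number):
    try:
        name = fake_lookup(tdlr_number)
    except LookupError:
        # No record: return an empty value instead of crashing
        name = None
    return {"TDLR Number": tdlr_number, "name": name}

results = [scrape_or_blank(n) for n in ["006179570C", "0646264VSF"]]
print(results)
```

In the real function, the `except` line would name `NoSuchElementException`, which can be imported from `selenium.common.exceptions`.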


In [31]:
from selenium.common.exceptions import NoSuchElementException

tow_trucks_all_df = pd.read_csv('tow-trucks.csv', dtype={'TDLR Number': 'str'})
tow_trucks_all_df.head()

def scrape_trucks_all(row):
    # A while loop isn't needed here: .apply already calls this function once per row.
    # try/except lets a row with no record come back blank instead of stopping the whole run.
    try:
        driver.get('https://www.tdlr.texas.gov/tools_search/mccs_search.asp')

        # think of all the actions you take when you go to the web page. do that same thing, just via selenium
        TDLR_input = driver.find_element_by_xpath('//*[@id="mcrdata"]')
        TDLR_input.send_keys(row['TDLR Number'])
        search_button = driver.find_element_by_xpath('//*[@id="submit3"]/b')
        search_button.click()

        name = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[2]/td[1]').text
        phone = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[2]/tbody/tr[4]/td[1]').text
        license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[1]/td[2]/font').text
        physical_address = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[2]').text
    except NoSuchElementException:
        # No record came back for this TDLR Number (0646264VSF seems to be one), so leave its columns empty.
        name = phone = license_status = physical_address = None

    return pd.Series({
        'name': name,
        'phone': phone,
        'license status': license_status,
        'physical address': physical_address
    })

#         license_status = driver.find_element_by_xpath('//*[@id="t1"]/tbody/tr/td/font/table[3]/tbody/tr[2]/td[1]/font/font').text

trucks_new_df = tow_trucks_all_df.apply(scrape_trucks_all, axis=1).join(tow_trucks_all_df)
trucks_new_df.head()

# Pattern: new_df = df_from_csv.apply(function, axis=1).join(df_from_csv)

# Syntax reminders: every `try` needs an `except` (or `finally`), `if` needs a condition,
# and `return` only works inside a function.