# Texas Tow Trucks (`.apply` and Selenium)

We're going to scrape some [tow trucks in Texas](https://www.tdlr.texas.gov/tools_search/).

Try searching for the TDLR Number `006179570C`.

## Preparation

### What URL will Selenium be starting on?

- Tip: The answer is *not* `https://www.tdlr.texas.gov/tools_search/`

In [234]:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.tdlr.texas.gov/tools_search/mccs_search.asp")

In [235]:
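from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# find_element_by_xpath was removed in Selenium 4; use find_element with a By locator
tdlr_number = driver.find_element(By.XPATH, '//*[@id="mcrdata"]')
tdlr_number.send_keys('006179570C')
tdlr_number.send_keys(Keys.RETURN)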

### Why are you using Selenium for this?

In [236]:
# To imitate typing the TDLR Number 006179570C into the search box in the browser and submitting the form

## Scrape this page

Scrape this page, displaying:

- The business name
- Phone number
- License status
- Physical address

**You should know how to do `.post` requests by now.**

- *TIP: For physical address, **ask me on the board** and I'll give you a secret trick about situations like this.*

In [237]:
from bs4 import BeautifulSoup
import requests

In [238]:
doc = BeautifulSoup(driver.page_source, 'html.parser')
doc

<html xmlns="http://www.w3.org/1999/xhtml"><head><link href="style.css" rel="stylesheet" type="text/css"/>
<title>Towing Company Information</title></head>
<body onload="scrollTo(4000,4000)" topmargin="0">
<style media="screen" type="text/css">
	body {
		background-color: #CCCCCC;
		border-color: #000000;
		color: #2e2d2c;
		font-family: Arial, Helvetica, sans-serif;
		font-size: .9em;
		line-height: normal;
		margin: 0 0 0 0;
		padding: 0 0 0 0;
		text-align: center;
	}

	#outerWrapper {
		position: relative;
		background-color: #ffc;
		margin: 0 auto 0 auto; 
		width: 960px;
	}
	#outerWrapper #header {
		background-color: #dbc9a0;
		text-align: left; 
	}
	#outerWrapper #header h1 {
		background-color: #fff6d9;
		text-align: left; 
		color: #000000;
	}
	#outerWrapper #topNavigation {
		clear: left;
		margin: 0 0 0 0;
		line-height: 2em;
		border-bottom: 1px solid #ccc;
		background-color: #FFC;
		border-bottom: 1px solid #ccc;
		border-top: 1px solid #ccc;
	}
	#outerWrapper #main-nav 

In [239]:
table = doc.find_all('table', attrs={'align':'center'})

In [240]:
truck_info = {
    'business_name': table[2].find_all('td')[2].strong.next.next,
    'phone_number': table[2].find_all('td')[6].strong.next_sibling[3:],
    'license_status': table[3].find_all('td')[1].text[9:],
    'physical_address': table[3].find_all('td')[3].find_all('strong')[-1].find_next_siblings(text=True)
}
truck_info

{'business_name': 'B.D. SMITH TOWING',
 'license_status': 'Active',
 'phone_number': '8173330706',
 'physical_address': ['\n\n13619 BRETT JACKSON RD.',
  '\n\t\t\t    FORT WORTH,\xa0TX.\xa076179\n        ']}
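The `physical_address` value above comes back as a list of text fragments padded with newlines, tabs, and non-breaking spaces. A minimal sketch of one way to tidy it is shown here; `clean_address` is a hypothetical helper name, not part of the original notebook, and it assumes every fragment is a plain string.

```python
def clean_address(parts):
    """Join scraped address fragments into one line with normalized spacing."""
    cleaned = []
    for part in parts:
        # str.split() with no arguments splits on any Unicode whitespace,
        # including newlines, tabs, and non-breaking spaces (\xa0).
        text = " ".join(part.split())
        if text:
            cleaned.append(text)
    return ", ".join(cleaned)


# The fragments exactly as they appear in the output above.
raw_address = [
    '\n\n13619 BRETT JACKSON RD.',
    '\n\t\t\t    FORT WORTH,\xa0TX.\xa076179\n        ',
]
address = clean_address(raw_address)
print(address)
```

Storing a single cleaned string instead of a list also tends to make the value easier to read once it is written to a CSV.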

# Using .apply to find data about SEVERAL tow truck companies

The file `trucks-subset.csv` has information about the trucks. We'll use it to find the pages to scrape.

### Open up `trucks-subset.csv` and save it into a dataframe

In [241]:
import pandas as pd
df = pd.read_csv('trucks-subset.csv')
df

Unnamed: 0,TDLR Number
0,006507931C
1,006179570C
2,006502097C


### Open up `trucks-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: I can help with this.*

In [242]:
# N/A

## Use `.apply` to go through each row of the dataset, printing out information about each tow truck company.

- The business name
- Phone number
- License status
- Physical address

Just print it out for now.

- *TIP: Use .apply and a function*
- *TIP: If you need help with .apply, look at the "Using apply in pandas" notebook*

In [243]:
def truck_company(row):   
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number'])
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    table = doc.find_all('table', attrs={'align':'center'})
    print('business_name:', table[2].find_all('td')[2].strong.next.next)
    print('phone_number:', table[2].find_all('td')[6].strong.next_sibling[3:])
    print('license_status:', table[3].find_all('td')[1].text[9:])
    print('physical_address:', table[3].find_all('td')[3].find_all('strong')[-1].find_next_siblings(text=True))
    print("--------")

df.apply(truck_company, axis=1)

business_name: AUGUSTUS E SMITH
phone_number: 9032276464
license_status: Active
physical_address: ['\n\n103 N MAIN ST', '\n\t\t\t    BONHAM,\xa0TX.\xa075418\n        ']
--------
business_name: B.D. SMITH TOWING
phone_number: 8173330706
license_status: Active
physical_address: ['\n\n13619 BRETT JACKSON RD.', '\n\t\t\t    FORT WORTH,\xa0TX.\xa076179\n        ']
--------
business_name: BARRY MICHAEL SMITH
phone_number: 8066544404
license_status: Active
physical_address: ['\n\n4501 W CEMETERY RD', '\n\t\t\t    CANYON,\xa0TX.\xa079015\n        ']
--------


0    None
1    None
2    None
dtype: object
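The function above builds each detail-page URL by concatenating the TDLR Number onto the `mccs_display.asp?mcrnumber=` address. As a small sketch, the same URL could be assembled with `urllib.parse.urlencode`, which also escapes any unexpected characters. `build_detail_url` is a hypothetical helper name, and stripping surrounding whitespace is an assumption about how messy the CSV values might be.

```python
from urllib.parse import urlencode

# Base address taken from the scraping function in this notebook.
BASE_URL = "https://www.tdlr.texas.gov/tools_search/mccs_display.asp"


def build_detail_url(tdlr_number):
    """Return the detail-page URL for one TDLR Number."""
    # strip() guards against stray spaces around the value (an assumption).
    query = urlencode({"mcrnumber": tdlr_number.strip()})
    return f"{BASE_URL}?{query}"


url = build_detail_url(" 006179570C ")
print(url)
```

The `0 None / 1 None / 2 None` series printed above is simply what `.apply` collects when the function prints but does not return anything.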

## Scrape the following information for each row of the dataset, and save it into new columns in your dataframe.

- The business name
- Phone number
- License status
- Physical address

It's basically what we did before, but using the function a little differently.

- *TIP: Use .apply and a function*
- *TIP: Remember to use `return`*

In [253]:
def truck_company(row):   
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number'])
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    table = doc.find_all('table', attrs={'align':'center'})
    return pd.Series({
        'business_name': table[2].find_all('td')[2].strong.next.next,
        'phone_number': table[2].find_all('td')[6].strong.next_sibling[3:],
        'license_status': table[3].find_all('td')[1].text[9:],
        'physical_address': table[3].find_all('td')[3].find_all('strong')[-1].find_next_siblings(text=True)
    })

df.apply(truck_company, axis=1).join(df)

Unnamed: 0,business_name,license_status,phone_number,physical_address,TDLR Number
0,AUGUSTUS E SMITH,Active,9032276464,"[ 103 N MAIN ST, BONHAM, TX. 75418  ...",006507931C
1,B.D. SMITH TOWING,Active,8173330706,"[ 13619 BRETT JACKSON RD., FORT WORTH...",006179570C
2,BARRY MICHAEL SMITH,Active,8066544404,"[ 4501 W CEMETERY RD, CANYON, TX. 790...",006502097C
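To see why returning a `pd.Series` produces new columns, it can help to try the same pattern without a browser. The sketch below swaps the scraper for a hard-coded dictionary; `FAKE_PAGES`, `lookup_company`, and `demo_df` are illustrative names that do not appear in the original notebook, and the values are copied from the output above.

```python
import pandas as pd

# Stand-in for the scraped pages so the example runs offline (an assumption,
# not how the notebook actually gets its data).
FAKE_PAGES = {
    "006507931C": {"business_name": "AUGUSTUS E SMITH", "license_status": "Active"},
    "006179570C": {"business_name": "B.D. SMITH TOWING", "license_status": "Active"},
}


def lookup_company(row):
    """Return one Series per row; its index labels become column names."""
    return pd.Series(FAKE_PAGES[row["TDLR Number"]])


demo_df = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C"]})

# apply(axis=1) stacks the returned Series into a new dataframe, and
# join() lines it up with the original rows by index.
scraped = demo_df.apply(lookup_company, axis=1)
results = scraped.join(demo_df)
print(results)
```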


### Save your dataframe as a CSV

In [254]:
df_results = df.apply(truck_company, axis=1).join(df)

In [255]:
df_results.to_csv('trucks-results.csv', index=False)
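The `index=False` argument is what keeps an extra column out of the saved file. The following sketch, using an in-memory buffer instead of a real file, compares the column names that come back with and without it; `demo` and the buffer names are illustrative only.

```python
import io

import pandas as pd

demo = pd.DataFrame({"TDLR Number": ["006507931C", "006179570C"]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False leaves the index out of the file entirely.
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```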

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [256]:
df_results = pd.read_csv('trucks-results.csv')
df_results.head()

         business_name license_status  phone_number  \
0     AUGUSTUS E SMITH         Active    9032276464   
1    B.D. SMITH TOWING         Active    8173330706   
2  BARRY MICHAEL SMITH         Active    8066544404   

                                    physical_address TDLR Number  
0  ['\n\n103 N MAIN ST', '\n\t\t\t    BONHAM,\xa0...  006507931C  
1  ['\n\n13619 BRETT JACKSON RD.', '\n\t\t\t    F...  006179570C  
2  ['\n\n4501 W CEMETERY RD', '\n\t\t\t    CANYON...  006502097C  
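After the round trip through CSV, `physical_address` is no longer a list: it is the text representation of one, escape sequences included. If the list is needed again, one possible approach is `ast.literal_eval`, sketched below on a value shaped like the first row above. The variable names are illustrative, and this assumes the column only ever holds list literals.

```python
import ast

# What the CSV cell holds after saving: the repr of a list, stored as text.
stored = r"['\n\n103 N MAIN ST', '\n\t\t\t    BONHAM,\xa0TX.\xa075418\n        ']"

# literal_eval safely parses Python literals (lists, strings, numbers)
# without executing arbitrary code the way eval() would.
parts = ast.literal_eval(stored)

print(type(stored).__name__)
print(type(parts).__name__, len(parts))
```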

## Repeat this process for the entire `tow-trucks.csv` file

In [257]:
df_all = pd.read_csv('tow-trucks.csv')
df_all

Unnamed: 0,TDLR Number
0,006507931C
1,006179570C
2,006502097C
3,006494912C
4,0649468VSF
5,006448786C
6,0648444VSF
7,0651667VSF
8,006017767C
9,006495492C


In [258]:
def truck_company_all(row):   
    driver.get('https://www.tdlr.texas.gov/tools_search/mccs_display.asp?mcrnumber=' + row['TDLR Number'])
    doc = BeautifulSoup(driver.page_source, 'html.parser')
    table = doc.find_all('table', attrs={'align':'center'})
    return pd.Series({
        'business_name': table[2].find_all('td')[2].strong.next.next,
        'phone_number': table[2].find_all('td')[6].strong.next_sibling[3:],
        'license_status': table[3].find_all('td')[1].text[9:],
        'physical_address': table[3].find_all('td')[3].find_all('strong')[-1].find_next_siblings(text=True)
    })

df_all.apply(truck_company_all, axis=1).join(df_all)

Unnamed: 0,business_name,license_status,phone_number,physical_address,TDLR Number
0,AUGUSTUS E SMITH,Active,9032276464,"[ 103 N MAIN ST, BONHAM, TX. 75418  ...",006507931C
1,B.D. SMITH TOWING,Active,8173330706,"[ 13619 BRETT JACKSON RD., FORT WORTH...",006179570C
2,BARRY MICHAEL SMITH,Active,8066544404,"[ 4501 W CEMETERY RD, CANYON, TX. 790...",006502097C
3,HEATH SMITH,Expired,940-552-0687,"[ 1529 WILBARGER ST, VERNON, TX. 763...",006494912C
4,HEATH SMITH,Expired,9405520687,"[ 1529 WILBARGER ST, VERNON, TX. 763...",0649468VSF
5,HYSMITH AUTOMOTIVE,Active,LEY ERIN HYSMITH ...,"[ 1210 US 380 BYPASS, GRAHAM, TX. 764...",006448786C
6,HYSMITH AUTOMOTIVE & TRUCK REPAIR INC,Expired,LIAM THOMAS HYSMITH ...,"[ 927 LOVING HWY, GRAHAM, TX. 76450 ...",0648444VSF
7,HYSMITH AUTOMOTIVE & TRUCK REPAIR INC,Active,LEY ERIN HYSMITH ...,"[ 1210 380 BYPASS, GRAHAM, TX. 76450 ...",0651667VSF
8,JEFF & WENDY SMITH,Expired,DY SMITH ...,"[ 10842 FM 2138 N, JACKSONVILLE, TX. ...",006017767C
9,JEFF SMITH,Active,8324354670,"[ 4338 HARVEY RD, CROSBY, TX. 77532 ...",006495492C


In [261]:
df_all_results = df_all.apply(truck_company_all, axis=1).join(df_all)
df_all_results.to_csv('trucks-results-all.csv', index=False)

In [262]:
df_all_results = pd.read_csv('trucks-results-all.csv')
df_all_results

Unnamed: 0,business_name,license_status,phone_number,physical_address,TDLR Number
0,AUGUSTUS E SMITH,Active,9032276464,"['\n\n103 N MAIN ST', '\n\t\t\t BONHAM,\xa0...",006507931C
1,B.D. SMITH TOWING,Active,8173330706,"['\n\n13619 BRETT JACKSON RD.', '\n\t\t\t F...",006179570C
2,BARRY MICHAEL SMITH,Active,8066544404,"['\n\n4501 W CEMETERY RD', '\n\t\t\t CANYON...",006502097C
3,HEATH SMITH,Expired,940-552-0687,"['\n\n1529 WILBARGER ST', '\n\t\t\t VERNON...",006494912C
4,HEATH SMITH,Expired,9405520687,"['\n\n1529 WILBARGER ST', '\n\t\t\t VERNON...",0649468VSF
5,HYSMITH AUTOMOTIVE,Active,LEY ERIN HYSMITH ...,"['\n\n1210 US 380 BYPASS', '\n\t\t\t GRAHAM...",006448786C
6,HYSMITH AUTOMOTIVE & TRUCK REPAIR INC,Expired,LIAM THOMAS HYSMITH ...,"['\n\n927 LOVING HWY', '\n\t\t\t GRAHAM,\x...",0648444VSF
7,HYSMITH AUTOMOTIVE & TRUCK REPAIR INC,Active,LEY ERIN HYSMITH ...,"['\n\n1210 380 BYPASS', '\n\t\t\t GRAHAM,\x...",0651667VSF
8,JEFF & WENDY SMITH,Expired,DY SMITH ...,"['\n\n10842 FM 2138 N', '\n\t\t\t JACKSONVI...",006017767C
9,JEFF SMITH,Active,8324354670,"['\n\n4338 HARVEY RD', '\n\t\t\t CROSBY,\x...",006495492C
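Several `phone_number` values in the table above are not phone numbers (for example `LEY ERIN HYSMITH ...`), which suggests the fixed cell positions and slice offsets do not line up on every page. A minimal sketch for flagging such rows follows; `normalize_phone` is a hypothetical helper, and treating exactly ten digits as valid is an assumption about these records.

```python
import re


def normalize_phone(value):
    """Return the ten digits of a phone number, or None if it does not look like one."""
    # Drop everything that is not a digit, so '940-552-0687' and
    # '9405520687' end up in the same form.
    digits = re.sub(r"\D", "", str(value))
    return digits if len(digits) == 10 else None


# Values copied from the phone_number column above.
samples = ["9032276464", "940-552-0687", "LEY ERIN HYSMITH ...", "DY SMITH ..."]
for sample in samples:
    print(repr(sample), "->", normalize_phone(sample))
```

Rows that come back as `None` are the ones worth opening in the browser to see how their page layout differs.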
