# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [1]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome()

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [2]:
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [3]:
# Selenium 4 removed find_element_by_name; 'name' is the locator string behind By.NAME
lastname = driver.find_element('name', 'pht_lnm')
lastname.send_keys('nguyen')

In [4]:
# The profession dropdown is the <select> named pht_status
profession = Select(driver.find_element('name', 'pht_status'))
profession.select_by_visible_text('Cosmetologists')

In [5]:
# Absolute XPath to the search button; it will break if the page layout changes
button = driver.find_element('xpath', '//*[@id="dat-menu"]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[18]/td/input[1]')
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).

In [9]:
#//*[@id="dat-menu"]/div/div[2]/div/div/section/div/div/table/tbody/tr[2]/td[1]
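# 'tag name' is the locator string behind By.TAG_NAME (find_elements_by_tag_name is gone in Selenium 4)
spaguys = driver.find_elements('tag name', 'td')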
for guy in spaguys[:30]:
    print(guy.text)

NGUYEN, TOAN HUU
City: SAN ANTONIO
County: BEXAR
Zip Code: 78217


License #(s): 780948, 1706491, 1699123

Complaint # COS20180004289
Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $500.
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
NGUYEN, HANH CONG
City: EL PASO
County: EL PASO
Zip Code: 79934


License #: 737708

Complaint # COS20180006594
Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $1,000.
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
NGUYEN, KHIEM VAN
City: LONGVIEW
County: GREGG
Zip Code: 75604


License #: 731665

Complaint # COS20180000257
Date: 5/17/2018

Respondent is assessed an administrative penalty in the amount of $1,250.
Respondent failed to follow whirlpool foot spas 

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
    # try to do something that might raise an error
    risky_value = int("not a number")
except Exception:
    print("It didn't work")
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` statement.
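To see the pattern without opening a browser, here is a minimal plain-Python sketch. It assumes each result has already been turned into a dictionary; the `rows` list is hypothetical sample data (the names come from the output above), and its first entry stands in for the header row, which has no name. Looking up `"name"` on that entry raises a `KeyError`, which the loop skips with `pass`.

```python
# Hypothetical sample data: the first entry plays the role of the header row,
# which has no "name" key, just like the first result on the page has no name.
rows = [
    {"header": "Name / Complaint / Order"},
    {"name": "NGUYEN, TOAN HUU"},
    {"name": "NGUYEN, HANH CONG"},
]

names = []
for row in rows:
    try:
        # This line raises KeyError for the header row
        names.append(row["name"])
    except KeyError:
        # Skip anything without a name instead of crashing the whole loop
        pass

for name in names:
    print(name)
```

Catching the specific `KeyError` (rather than everything) means a genuinely different bug would still show up instead of being silently skipped.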

**Why doesn't the first one have a name?**

In [19]:
#//*[@id="dat-menu"]/div/div[2]/div/div/section/div/div/table/tbody/tr[2]/td[1]/span[1]
for guy in spaguys:
    print(guy)

<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-1")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-2")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-3")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-4")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-5")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-6")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.5575122002849555-7")>
<selenium.webdriver.remote.webelement.WebElement (session="3d9f04523080f2fd7c38d2fda01cb2ec", element="0.557512

### Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...*

### Loop through each result, printing the complaint number

- TIP: Think about the order of the elements
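If the element order gets confusing, one alternative (a text-parsing approach, not the element-order approach the tip hints at) is to pull the number out of a result's text with a regular expression. This is a minimal sketch assuming the text looks like the printed output above, with `Complaint # ` followed by the number; `complaint_number` is a hypothetical helper, and in the notebook you would pass it something like `guy.text`.

```python
import re

# Text of one result, copied from the printed output above
row_text = """NGUYEN, HANH CONG
City: EL PASO
County: EL PASO
Zip Code: 79934

License #: 737708

Complaint # COS20180006594
Date: 5/30/2018"""


def complaint_number(text):
    """Return the complaint number in text, or None if there isn't one."""
    match = re.search(r"Complaint # (\S+)", text)
    return match.group(1) if match else None


print(complaint_number(row_text))          # COS20180006594
print(complaint_number("no number here"))  # None
```

Returning `None` for text without a complaint number (like the header) lets you skip those rows with a simple `if` instead of a `try`/`except`.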

## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).
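Here is a minimal sketch of turning one result's text into a dictionary, assuming the layout in the printed output above: the name on the first line, then `City:`, `County:`, `Zip Code:`, `License #` and `Complaint #` lines. Two further assumptions are worth flagging: the violation description is taken to be the last non-empty line, and the complaint number is used as the "violation number." `parse_result` is a hypothetical helper; in the notebook you would call it on each row's `.text` and append the result to your list (the header row still needs skipping).

```python
import re


def parse_result(text):
    """Turn the text of one result into a dictionary.

    Assumes the layout seen in the printed output: the name on the first
    line, then "City:", "County:", "Zip Code:", "License #" and
    "Complaint #" lines, with the violation description on the last
    non-empty line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    def field(pattern):
        # First captured group for pattern, or None if the label is missing
        match = re.search(pattern, text)
        return match.group(1).strip() if match else None

    # Matches both "License #:" and "License #(s):"
    licenses = field(r"License #(?:\(s\))?: (.+)")
    return {
        "name": lines[0],
        "city": field(r"City: (.+)"),
        "county": field(r"County: (.+)"),
        "zip_code": field(r"Zip Code: (\d+)"),
        "license_numbers": licenses.split(", ") if licenses else [],
        "complaint_number": field(r"Complaint # (\S+)"),
        "description": lines[-1],
    }


sample = """NGUYEN, TOAN HUU
City: SAN ANTONIO
County: BEXAR
Zip Code: 78217

License #(s): 780948, 1706491, 1699123

Complaint # COS20180004289
Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $500.
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day."""

results = [parse_result(sample)]
print(results[0]["license_numbers"])  # ['780948', '1706491', '1699123']
```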

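> *Tip: Selenium has no equivalent of BeautifulSoup's `find_next_sibling`. To ask for the "next sibling," use an XPath axis instead: `element.find_element(By.XPATH, "following-sibling::div")` finds the next div, and `element.find_element(By.XPATH, "following-sibling::*")` finds the next element of any kind. (`By` comes from `selenium.webdriver.common.by`; older Selenium versions write this as `element.find_element_by_xpath(...)`.)*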

### Save that to a CSV

- Tip: You'll want to use pandas here
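A minimal pandas sketch, assuming your list of dictionaries looks roughly like the hypothetical `violations` list below (values copied from the output above; the filename is also just an example). Passing `index=False` to `to_csv` keeps pandas from writing its row labels as an extra column.

```python
import pandas as pd

# Hypothetical list of dictionaries, one per result (values from the output above).
# License numbers are joined into one string; a Python list would be written
# to the CSV as its text representation, brackets and quotes included.
violations = [
    {"name": "NGUYEN, TOAN HUU", "license_numbers": "780948, 1706491, 1699123",
     "zip_code": "78217", "county": "BEXAR", "city": "SAN ANTONIO"},
    {"name": "NGUYEN, HANH CONG", "license_numbers": "737708",
     "zip_code": "79934", "county": "EL PASO", "city": "EL PASO"},
]

df = pd.DataFrame(violations)

# index=False stops pandas from writing its 0, 1, 2, ... row labels as an extra first column
df.to_csv("cosmetologist-violations.csv", index=False)
```

Values that contain commas, like the names, are quoted automatically in the CSV, so they still read back as a single column.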

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.
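The "weird unnamed column" usually comes from saving with pandas' default `index=True`: the row index is written as a first column with a blank header, and `read_csv` labels it `Unnamed: 0`. This self-contained sketch (hypothetical filenames and a tiny made-up DataFrame using names from the output above) saves the same data both ways and reads them back so you can compare.

```python
import pandas as pd

df = pd.DataFrame({"name": ["NGUYEN, TOAN HUU", "NGUYEN, HANH CONG"],
                   "zip_code": ["78217", "79934"]})

# The default index=True writes the row index as an unlabeled first column
df.to_csv("with-index.csv")
df.to_csv("without-index.csv", index=False)

with_index = pd.read_csv("with-index.csv")
without_index = pd.read_csv("without-index.csv")

print(with_index.columns.tolist())     # ['Unnamed: 0', 'name', 'zip_code']
print(without_index.columns.tolist())  # ['name', 'zip_code']
print(without_index.head())
```

If a file already has the extra column, `pd.read_csv(..., index_col=0)` reads it back as the index instead of as data.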