# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait 
import pandas as pd

driver = webdriver.Chrome()

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [2]:
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [3]:
dropdown = driver.find_element('name', 'pht_status')
select = Select(dropdown)
select.select_by_visible_text('Cosmetologists')

In [4]:
last_name = driver.find_element('name', 'pht_lnm')
last_name.send_keys("Nguyen")

In [5]:
button = driver.find_element('xpath', '//*[@id="dat-menu"]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[18]/td/input[1]')
button.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).
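
As a quick illustration of what the slice does, here is a sketch with a plain Python list. The `placeholder_rows` list is a made-up stand-in for the real rows, not data from the site.

```python
# Hypothetical stand-in for the scraped rows: 25 placeholder strings
placeholder_rows = [f"row {i}" for i in range(25)]

# [:10] keeps the first ten items and leaves the original list untouched
first_ten = placeholder_rows[:10]

print(len(first_ten))   # 10
print(first_ten[0])     # row 0
print(first_ten[-1])    # row 9

# Slicing does not raise an error when the list has fewer than ten items
short_list = ["row 0", "row 1"]
print(short_list[:10])  # ['row 0', 'row 1']
```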

In [6]:
listings = driver.find_elements('tag name', 'tr')

for listing in listings[:10]:
    print("----------")
    print(listing.text)

----------
Name and Location Order Basis for Order
----------
NGUYEN, TOAN HUU
City: SAN ANTONIO
County: BEXAR
Zip Code: 78217


License #(s): 780948, 1706491, 1699123

Complaint # COS20180004289 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $500. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
----------
NGUYEN, HANH CONG
City: EL PASO
County: EL PASO
Zip Code: 79934


License #: 737708

Complaint # COS20180006594 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $1,000. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
----------
NGUYEN, KHIEM VAN
City: LONGVIEW
County: GREGG
Zip Code: 75604


License #: 731665

Complaint # COS20180000257 Date: 5/17/2018

Respondent is assessed an administrati

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
    ...  # try to do something
except:
    print("It didn't work")
```

It should help you out. If you don't want to print anything, you can write `pass` in place of the `print` call.
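
Here is a small, self-contained sketch of the same idea, with plain lists standing in for table rows. The `fake_rows` structure is illustrative (only the two names come from the results), and the empty first list is an assumption about what a row without a name looks like. Catching `IndexError` specifically, rather than using a bare `except:`, is also a choice made for this sketch: it keeps other, unexpected errors visible.

```python
# Each inner list stands in for the name cells found in one table row.
# The first one is empty, as an assumed stand-in for a row with no name cell.
fake_rows = [
    [],
    ["NGUYEN, TOAN HUU"],
    ["NGUYEN, HANH CONG"],
]

found_names = []
for cells in fake_rows:
    try:
        # cells[0] raises IndexError when the list is empty
        found_names.append(cells[0])
    except IndexError:
        # Nothing to record for this row, so move on
        pass

print(found_names)  # ['NGUYEN, TOAN HUU', 'NGUYEN, HANH CONG']
```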

**Why doesn't the first one have a name?**

In [7]:
for listing in listings:
    try:
        print("----------")
        print(listing.find_elements("class name", "results_text")[0].text)
    except:
        pass

----------
----------
NGUYEN, TOAN HUU
----------
NGUYEN, HANH CONG
----------
NGUYEN, KHIEM VAN
----------
NGUYEN, DIEP THI NGOC


### Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...*

In [8]:
for listing in listings:
    try:
        print("----------")
        print(listing.find_elements("tag name", "td")[2].text)
    except:
        pass

----------
----------
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
----------
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
----------
Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect all wax pots.
----------
Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution; Respondent failed to disinfect multi-use equipment, implements, and tools prior to use on each client.


### Loop through each result, printing the complaint number

- TIP: Think about the order of the elements

In [9]:
for listing in listings:
    try:
        print("----------")
        print(listing.find_elements("class name", "results_text")[5].text)
    except:
        pass

----------
----------
COS20180004289
----------
COS20180006594
----------
COS20180000257
----------
COS20180004915
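
If counting elements gets confusing, another option is to pull the complaint number out of a row's text with a regular expression. This is only a sketch, not the approach used in the cell above. It assumes every result's text contains `Complaint # ` followed by the number, as in the rows printed earlier, and `complaint_number` is a helper name made up for this example.

```python
import re

# Matches "Complaint #" followed by a run of non-space characters
COMPLAINT_PATTERN = re.compile(r"Complaint #\s*(\S+)")

def complaint_number(row_text):
    """Return the complaint number found in a row's text, or None."""
    match = COMPLAINT_PATTERN.search(row_text)
    return match.group(1) if match else None

# Text copied from the first result printed earlier in this notebook
sample_text = (
    "NGUYEN, TOAN HUU\n"
    "City: SAN ANTONIO\n"
    "County: BEXAR\n"
    "Zip Code: 78217\n"
    "License #(s): 780948, 1706491, 1699123\n"
    "Complaint # COS20180004289 Date: 5/30/2018\n"
)

print(complaint_number(sample_text))  # COS20180004289
print(complaint_number("Name and Location Order Basis for Order"))  # None
```

With Selenium you would pass `listing.text` to the helper; rows where it returns `None` (like the header) can simply be skipped.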


## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).
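
To make the target shape concrete, here is a browser-free sketch of building one dictionary per result. The field order in `FIELD_ORDER` is inferred from the indexes used elsewhere in this notebook (0 for the name through 5 for the complaint number), and `build_row` is a hypothetical helper, so treat both as assumptions rather than something the site guarantees.

```python
# Assumed order of the "results_text" cells, inferred from the notebook's indexes
FIELD_ORDER = ["name", "city", "county", "zip_code",
               "license_number", "violation_number"]

def build_row(cell_texts, description):
    """Pair each cell's text with its field name and add the description."""
    if len(cell_texts) < len(FIELD_ORDER):
        return None  # not a full result row (for example, the header)
    row = dict(zip(FIELD_ORDER, cell_texts))
    row["violation_description"] = description
    return row

# Values copied from the first result shown earlier
example_cells = ["NGUYEN, TOAN HUU", "SAN ANTONIO", "BEXAR", "78217",
                 "780948, 1706491, 1699123", "COS20180004289"]
example_description = ("Respondent failed to clean and sanitize whirlpool "
                       "foot spas as required at the end of each day.")

# An empty cell list stands in for the header row and is dropped
candidates = [build_row([], ""), build_row(example_cells, example_description)]
example_rows = [row for row in candidates if row is not None]

print(len(example_rows))          # 1
print(example_rows[0]["county"])  # BEXAR
```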

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium. Instead, use `element.find_element("xpath", "following-sibling::div")` to find the next div, or `element.find_element("xpath", "following-sibling::*")` to find the next element of any kind.*

In [26]:
rows = []

for listing in listings:
    try:
        # Look up the cells once instead of re-querying for every field
        fields = listing.find_elements("class name", "results_text")
        cells = listing.find_elements("tag name", "td")

        row = {
            'name': fields[0].text,
            'violation_description': cells[2].text,
            'violation_number': fields[5].text,
            'license_number': fields[4].text,
            'zip_code': fields[3].text,
            'county': fields[2].text,
            'city': fields[1].text,
        }
        rows.append(row)
    except IndexError:
        # Rows missing the expected cells (like the header) are skipped
        pass


rows

[{'name': 'NGUYEN, TOAN HUU',
  'violation_description': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.',
  'violation_number': 'COS20180004289',
  'license_number': '780948, 1706491, 1699123',
  'zip_code': '78217',
  'county': 'BEXAR',
  'city': 'SAN ANTONIO'},
 {'name': 'NGUYEN, HANH CONG',
  'violation_description': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.',
  'violation_number': 'COS20180006594',
  'license_number': '737708',
  'zip_code': '79934',
  'county': 'EL PASO',
  'city': 'EL PASO'},
 {'name': 'NGUYEN, KHIEM VAN',
  'violation_description': 'Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; 

### Save that to a CSV

- Tip: You'll want to use pandas here

In [28]:
df = pd.DataFrame(rows)

df

df.to_csv("texas-cosmetology-violations.csv", index=False)

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.
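
With pandas, `pd.read_csv("texas-cosmetology-violations.csv").head()` shows the first rows, and a saved index would appear as a column called `Unnamed: 0`. The sketch below does a similar check with only the standard library `csv` module, where a saved index shows up as a blank first column name. The helper names `inspect_csv` and `has_unnamed_column` and the small demo file are made up for illustration; the demo values are copied from the results above.

```python
import csv
import os
import tempfile
from itertools import islice

def inspect_csv(path, n=5):
    """Return the header and the first n data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        first_rows = list(islice(reader, n))
    return header, first_rows

def has_unnamed_column(header):
    """True if a column name is blank or looks like pandas' 'Unnamed: 0'."""
    return any(name.strip() == "" or name.startswith("Unnamed")
               for name in header)

# Demo on a small stand-in file; use "texas-cosmetology-violations.csv"
# as the path for the real check
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "county", "city"])
    writer.writerow(["NGUYEN, TOAN HUU", "BEXAR", "SAN ANTONIO"])
    writer.writerow(["NGUYEN, HANH CONG", "EL PASO", "EL PASO"])

header, first_rows = inspect_csv(demo_path)
print(header)                      # ['name', 'county', 'city']
print(has_unnamed_column(header))  # False
```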