# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [1]:
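from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

# Note: the find_element_by_* helpers used in this notebook were removed in Selenium 4.3.
# On newer versions, use driver.find_element(By.ID, "...") together with
# `from selenium.webdriver.common.by import By`.
driver = webdriver.Chrome()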

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for **cosmetologist violations** for people with the last name **Nguyen**.

In [2]:
driver.get("https://www.tdlr.texas.gov/cimsfo/fosearch.asp")

In [3]:
driver.find_element_by_id("pht_status").send_keys("Cosmetologists")
driver.find_element_by_id("pht_lnm").send_keys("Nguyen")
button = driver.find_element_by_xpath("/html/body/div[1]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[18]/td/input[1]")
button.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).
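
Slicing never raises an error when the list is shorter than the slice, so `[:10]` is safe even if a search returns fewer than ten rows. A quick sketch with made-up lists (plain strings standing in for Selenium elements):

```python
# Hypothetical stand-in for the list of rows Selenium returns
rows = ["row " + str(i) for i in range(25)]

first_ten = rows[:10]      # items 0 through 9
print(len(first_ten))

short_list = ["row 0", "row 1", "row 2"]
print(short_list[:10])     # all three items, no IndexError
```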

In [4]:
table_rows = driver.find_elements_by_tag_name("tr")
for row in table_rows[:10]:
    print(row.text.strip())
    print("----------")

Name and Location Order Basis for Order
----------
NGUYEN, HUNG VU
City: HOUSTON
County: HARRIS
Zip Code: 77086


License #: 730692

Complaint # COS20190011025 Date: 11/24/2020

Respondent is assessed an administrative penalty in the amount of $1,375. Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to discard single use implements after each use; Respondent failed to properly ventilate the salon to eliminate strong odors away from the public area; Respondent failed to keep all products properly labeled in compliance with OSHA requirements.
----------
NGUYEN, MIMI PHAM
City: KATY
County: HARRIS
Zip Code: 77449


License #: 784210

Complaint # COS20190010072 Date: 11/12/2020

Respondent is assessed an administrative penalty in the amount of $1,125. Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant so

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
    # try to do something that might fail
    number = int("not a number")
except:
    # Instead of stopping on an error, it'll jump down here instead
    print("It didn't work")
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` statement. Most people use `pass`, but it's also nice to print out debug statements so you know when/where it's running into errors.
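
A bare `except:` hides every kind of error, including typos in your own code. A slightly safer pattern is to name the exception you expect. The sketch below uses plain dictionaries as hypothetical stand-ins for table rows, so the expected error is `KeyError`; with Selenium, the usual one for a missing element is `NoSuchElementException` from `selenium.common.exceptions`.

```python
# Hypothetical stand-ins for table rows: the header has no "name" key
rows = [
    {"header": "Name and Location"},
    {"name": "NGUYEN, HUNG VU"},
    {"name": "NGUYEN, MIMI PHAM"},
]

names = []
skipped = 0
for row in rows:
    try:
        names.append(row["name"])
    except KeyError:
        # Only a missing name is ignored; any other error still stops the loop
        skipped += 1

print(names)
print("Skipped rows:", skipped)
```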

**Why doesn't the first one have a name?**

In [5]:
table_cells = driver.find_elements_by_tag_name("td")
table_cells[5].text.strip()

'Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.'

In [6]:
for row in table_rows:
    try:
        print(row.find_element_by_class_name("results_text").text.strip())
    except:
        # The header row has no "results_text" element, so it ends up here
        print("Did not work")

Did not work
NGUYEN, HUNG VU
NGUYEN, MIMI PHAM
NGUYEN, HA
NGUYEN, THAO HONG
NGUYEN, MAI
NGUYEN, CINDY
NGUYEN, CHAU KHANH LINH
NGUYEN, TRANG T
NGUYEN, DUNG MINH
NGUYEN, YEN NHI THI
NGUYEN, JOHNNY DAT
NGUYEN, KELLY PHUONG N
NGUYEN, NGA THU
NGUYEN, IVY
NGUYEN, DIEMTRINH T
NGUYEN, HUAN CAO
NGUYEN, THOA KIM
NGUYEN, TONY
NGUYEN, HIEN
NGUYEN, NGOC TRAM
NGUYEN, TRAN NAM
NGUYEN, PHILLIP
NGUYEN, THUY T
NGUYEN, TRACY
NGUYEN, LE PHUC
NGUYEN, HAI MINH
NGUYEN, TUYET
NGUYEN, BA VAN
NGUYEN, LAN THANH
NGUYEN, PHUOC BA
NGUYEN, LINH THUY KIEU
NGUYEN, TONY MINH
NGUYEN, THANH
NGUYEN, HIEN THI MINH
NGUYEN, VAN NGOC THAO
NGUYEN, TAM THANH
NGUYEN, ANDY HUU
NGUYEN, CUONG HULL
NGUYEN, NANCY HOA
NGUYEN, THOA THI KIM
NGUYEN, TRANG
NGUYEN, THONG VAN
NGUYEN, TUAN
NGUYEN, LYNDA
NGUYEN, CATHY H
NGUYEN, PAMELA DAN
NGUYEN, SON QUOC
NGUYEN, KIM
NGUYEN, THAI VAN
NGUYEN, HANH THI
NGUYEN, DIEP THI NGOC
NGUYEN, CASEY
NGUYEN, ANTHONY VAN
NGUYEN, CHRISTINA D
NGUYEN, PHUONG T
NGUYEN, KENNY
NGUYEN, THAO (MINH THAO
NGUYEN, KENNY

### Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...*
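
One way to find out which row is causing the problem is to print the row's position and the error itself inside the `except` block. This is a sketch with plain lists standing in for each row's cells; the data is made up to mirror a header row that has no `td` cells.

```python
# Hypothetical stand-ins: each inner list plays the role of a row's <td> cells
rows = [
    [],  # header row: no <td> cells at all
    ["NGUYEN, HUNG VU", "Order text", "Basis for order 1"],
    ["NGUYEN, MIMI PHAM", "Order text", "Basis for order 2"],
]

problems = []
for index, cells in enumerate(rows):
    try:
        print(cells[2])
    except IndexError as error:
        # Record where it failed and why, instead of a generic message
        problems.append((index, type(error).__name__))
        print("Row", index, "failed:", error)
```

Once you know the failing row is the header, skipping it on purpose is a reasonable choice.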

In [24]:
for row in table_rows:
    try:
        print(row.find_elements_by_tag_name("td")[2].text.strip())
    except:
        print("Did not work")
    print("--------")

Did not work
--------
Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to discard single use implements after each use; Respondent failed to properly ventilate the salon to eliminate strong odors away from the public area; Respondent failed to keep all products properly labeled in compliance with OSHA requirements.
--------
Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.
--------
Respondent failed to clean and sanitize four (4) whirlpool foot spas as required at the end of each day, constituting two (2) violations; Respondent failed to keep a record of the date and time of four (4) foot spas daily or bi-weekly cleaning and if the foot spas were not used, constituting two (2) violations.
--------
Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after e

### Loop through each result, printing the complaint number

- TIP: Think about the order of the elements
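
Counting elements works, but it breaks if the page adds or removes one. As an alternative sketch (not the approach the exercise asks for), you could pull the complaint number out of the row's text with a regular expression. The sample text is copied from the first result above; the pattern assumes complaint numbers are `COS` followed by digits, which matches every number shown here but is otherwise a guess.

```python
import re

# Text of one row, as returned by row.text (copied from the first result)
row_text = """NGUYEN, HUNG VU
City: HOUSTON
County: HARRIS
Zip Code: 77086
License #: 730692
Complaint # COS20190011025 Date: 11/24/2020"""

# Assumed format: "Complaint #" followed by COS and a run of digits
match = re.search(r"Complaint #\s*(COS\d+)", row_text)
complaint_number = match.group(1) if match else None
print(complaint_number)
```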

In [31]:
for row in table_rows:
    print("Complaint #:")
    try:
        print(row.find_elements_by_tag_name("span")[10].text.strip())
    except:
        print("Did not work")
    print("--------")

Complaint #:
Did not work
--------
Complaint #:
COS20190011025
--------
Complaint #:
COS20190010072
--------
Complaint #:
COS20190016762
--------
Complaint #:
COS20200010387
--------
Complaint #:
COS20200007264
--------
Complaint #:
COS20200010502
--------
Complaint #:
COS20190008104
--------
Complaint #:
COS20200010511
--------
Complaint #:
COS20200004202
--------
Complaint #:
COS20190004199
--------
Complaint #:
COS20200000101
--------
Complaint #:
COS20200011664
--------
Complaint #:
COS20200010961
--------
Complaint #:
COS20200008858
--------
Complaint #:
COS20200008859
--------
Complaint #:
COS20200009732
--------
Complaint #:
COS20200006548
--------
Complaint #:
COS20200009605
--------
Complaint #:
COS20190016479
--------
Complaint #:
COS20190012148
--------
Complaint #:
COS20190010318
--------
Complaint #:
COS20190014688
--------
Complaint #:
COS20190004016
--------
Complaint #:
COS20190016499
--------
Complaint #:
COS20200006146
--------
Complaint #:
COS20190016549
--------
Com

## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium. Instead, use `element.find_element_by_xpath("following-sibling::div")` to find the next div, or `element.find_element_by_xpath("following-sibling::*")` to find the next anything.*
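
If sibling lookups get fiddly, another option is to split the text of the first cell on its labels. This is a minimal sketch, assuming the cell text looks like the first result printed earlier (the name on the first line, then one `Label: value` pair per line). `parse_location` is a hypothetical helper, not part of Selenium.

```python
def parse_location(cell_text):
    """Turn the 'Name and Location' cell text into a dictionary."""
    lines = [line.strip() for line in cell_text.splitlines() if line.strip()]
    record = {"name": lines[0]}  # the first non-empty line is the name
    for line in lines[1:]:
        label, separator, value = line.partition(":")
        if separator:  # skip lines without a "Label: value" shape
            record[label.strip()] = value.strip()
    return record

cell_text = """NGUYEN, HUNG VU
City: HOUSTON
County: HARRIS
Zip Code: 77086

License #: 730692
"""

record = parse_location(cell_text)
print(record)
```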

In [32]:
results = []
for row in table_rows:
    try:
        # Look up the cells and spans once per row instead of once per field
        cells = row.find_elements_by_tag_name("td")
        spans = row.find_elements_by_tag_name("span")
        result = {
            'name': row.find_element_by_class_name("results_text").text.strip(),
            'v_description': cells[2].text.strip(),
            'v_number': spans[10].text.strip(),
            'lic_number': spans[8].text.strip(),
            'zip_code': spans[6].text.strip(),
            'county': spans[4].text.strip(),
            'city': spans[2].text.strip(),
        }
        results.append(result)
    except:
        # The header row has none of these elements, so skip it
        pass
results

[{'name': 'NGUYEN, HUNG VU',
  'v_description': 'Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to discard single use implements after each use; Respondent failed to properly ventilate the salon to eliminate strong odors away from the public area; Respondent failed to keep all products properly labeled in compliance with OSHA requirements.',
  'v_number': 'COS20190011025',
  'lic_number': '730692',
  'zip_code': '77086',
  'county': 'HARRIS',
  'city': 'HOUSTON'},
 {'name': 'NGUYEN, MIMI PHAM',
  'v_description': 'Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.',
  'v_number': 'COS20190010072',
  'lic_number': '784210',
  'zip_code': '77449',
  'county': 'HARRIS',
  'city': 'KATY'},
 {'name': 'NGUYEN, HA',
  'v_description': 'Respondent failed to clean and sanitize four (4) whirlpool f

### Save that to a CSV

- Tip: Use `pd.DataFrame` to create a dataframe, and then save it to a CSV.
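
For comparison, the standard library's `csv.DictWriter` can also write a list of dictionaries directly, and it never adds an index column. This sketch writes to an in-memory buffer rather than a file so it runs anywhere; `sample_results` holds shortened copies of two results from above.

```python
import csv
import io

sample_results = [
    {"name": "NGUYEN, HUNG VU", "v_number": "COS20190011025", "zip_code": "77086"},
    {"name": "NGUYEN, MIMI PHAM", "v_number": "COS20190010072", "zip_code": "77449"},
]

# To save a real file, use open("texas_cosmet.csv", "w", newline="") instead
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(sample_results[0].keys()))
writer.writeheader()
writer.writerows(sample_results)

csv_text = buffer.getvalue()
print(csv_text)
```

Names that contain a comma are quoted automatically, so they stay in one column.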

In [33]:
import pandas as pd
df = pd.DataFrame(results)
df.head()



|  | name | v_description | v_number | lic_number | zip_code | county | city |
|---|---|---|---|---|---|---|---|
| 0 | NGUYEN, HUNG VU | Respondent failed properly clean and sanitize ... | COS20190011025 | 730692 | 77086 | HARRIS | HOUSTON |
| 1 | NGUYEN, MIMI PHAM | Respondent failed properly clean and sanitize ... | COS20190010072 | 784210 | 77449 | HARRIS | KATY |
| 2 | NGUYEN, HA | Respondent failed to clean and sanitize four (... | COS20190016762 | 764888 | 76017 | TARRANT | ARLINGTON |
| 3 | NGUYEN, THAO HONG | Respondent failed to clean, disinfect, and ste... | COS20200010387 | 799926, 1753491 | 78238 | BEXAR | SAN ANTONIO |
| 4 | NGUYEN, MAI | Respondent failed to clean, disinfect, and ste... | COS20200007264 | 687294 | 75150 | DALLAS | MESQUITE |


In [34]:
df.to_csv("texas_cosmet.csv", index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.

In [35]:
df1 = pd.read_csv("texas_cosmet.csv")
df1.head()

|  | name | v_description | v_number | lic_number | zip_code | county | city |
|---|---|---|---|---|---|---|---|
| 0 | NGUYEN, HUNG VU | Respondent failed properly clean and sanitize ... | COS20190011025 | 730692 | 77086 | HARRIS | HOUSTON |
| 1 | NGUYEN, MIMI PHAM | Respondent failed properly clean and sanitize ... | COS20190010072 | 784210 | 77449 | HARRIS | KATY |
| 2 | NGUYEN, HA | Respondent failed to clean and sanitize four (... | COS20190016762 | 764888 | 76017 | TARRANT | ARLINGTON |
| 3 | NGUYEN, THAO HONG | Respondent failed to clean, disinfect, and ste... | COS20200010387 | 799926, 1753491 | 78238 | BEXAR | SAN ANTONIO |
| 4 | NGUYEN, MAI | Respondent failed to clean, disinfect, and ste... | COS20200007264 | 687294 | 75150 | DALLAS | MESQUITE |


## Let's do this an easier way

Use Selenium and `pd.read_html` to get the table as a dataframe.

In [37]:
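# pd.read_html needs an HTML parser installed (lxml, or BeautifulSoup with html5lib).
# These imports only confirm the parsers are available; we don't call them directly.
import lxml
from bs4 import BeautifulSoup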

In [42]:
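from io import StringIO

# pd.read_html returns a list with one dataframe per <table> on the page.
# Wrapping the HTML in StringIO avoids a deprecation warning in newer pandas versions.
tables = pd.read_html(StringIO(driver.page_source))
tables[0]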

|  | Name and Location | Order | Basis for Order |
|---|---|---|---|
| 0 | NGUYEN, HUNG VU City: HOUSTON County: HARRIS Z... | Date: 11/24/2020Respondent is assessed an admi... | Respondent failed properly clean and sanitize ... |
| 1 | NGUYEN, MIMI PHAM City: KATY County: HARRIS Zi... | Date: 11/12/2020Respondent is assessed an admi... | Respondent failed properly clean and sanitize ... |
| 2 | NGUYEN, HA City: ARLINGTON County: TARRANT Zip... | Date: 11/12/2020Respondent is assessed an admi... | Respondent failed to clean and sanitize four (... |
| 3 | NGUYEN, THAO HONG City: SAN ANTONIO County: BE... | Date: 11/12/2020Respondent is assessed an admi... | Respondent failed to clean, disinfect, and ste... |
| 4 | NGUYEN, MAI City: MESQUITE County: DALLAS Zip ... | Date: 10/29/2020Respondent is assessed an admi... | Respondent failed to clean, disinfect, and ste... |
| ... | ... | ... | ... |
| 154 | NGUYEN, SHARON City: BASTROP County: BASTROP Z... | Date: 9/17/2018Respondent is assessed an admin... | Respondent leased space in a salon to an indiv... |
| 155 | NGUYEN, BINH THANH City: LAREDO County: WEBB Z... | Date: 9/12/2018Respondent is assessed an admin... | Respondent leased space in a salon to an indiv... |
| 156 | NGUYEN, SAMANTHA TRAN City: MCKINNEY County: C... | Date: 9/4/2018Respondent is assessed an admini... | Respondent failed to clean, disinfect, and ste... |
| 157 | NGUYEN, THU LE City: SAN ANTONIO County: BEXAR... | Date: 8/7/2018The Respondent's Cosmetology Man... | Respondent failed to comply with an order prev... |
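
In the `pd.read_html` version, the date runs straight into the order text (`Date: 11/24/2020Respondent ...`). A regular expression can separate them again. This is a sketch on a single string, assuming every `Order` cell starts with `Date: M/D/YYYY`; the sample joins the date and the first sentence of the first result shown earlier, and `split_order` is a hypothetical helper.

```python
import re
from datetime import datetime

def split_order(order_text):
    """Split 'Date: M/D/YYYYtext...' into a date and the remaining text."""
    match = re.match(r"Date:\s*(\d{1,2}/\d{1,2}/\d{4})\s*(.*)", order_text, re.DOTALL)
    if match is None:
        return None, order_text  # leave unexpected formats untouched
    order_date = datetime.strptime(match.group(1), "%m/%d/%Y").date()
    return order_date, match.group(2)

sample = ("Date: 11/24/2020Respondent is assessed an administrative "
          "penalty in the amount of $1,375.")
order_date, order_text = split_order(sample)
print(order_date)
print(order_text)
```

The same function could be applied to the whole `Order` column with the dataframe's `.apply`, though that is left out here to keep the sketch free of pandas.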
