# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [83]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from webdriver_manager.chrome import ChromeDriverManager

import pandas as pd

In [84]:
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))  # Selenium 4 takes the driver path via Service



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Driver [/Users/jmingram/.wdm/drivers/chromedriver/mac64/96.0.4664.45/chromedriver] found in cache


## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [87]:
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [88]:
lname_input = driver.find_element(By.ID, 'pht_lnm')
lname_input.send_keys('Nguyen')

In [89]:
driver.find_element(By.XPATH, '/html/body/div[1]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[18]/td/input[1]').click()

## Scraping

Once you are on the results page, do this. **I step you through things bit by bit, so it's going to be a little different than we did in class.** Also, no `pd.read_html` allowed because this isn't actual tabular data!

> You can use either Selenium by itself or Selenium+BeautifulSoup to scrape the results page. The choice is up to you!

### Loop through each result and print the entire row

Okay wait, maybe not, it's a heck of a lot of rows. Use `[:10]` to only do the first ten! For example, if you saved the table rows into `results` you might do something like this:

```python
for result in results[:10]:
    print(result)
```

Although you'd want to print out the text from the row (I give example output below).

> *Tip: If you're using Selenium, `By.TAG_NAME` is used if you don't have a class or ID. If you're using BeautifulSoup, just do your normal thing.*

In [90]:
results = driver.find_elements(By.TAG_NAME, 'tr')

In [74]:
for result in results[:10]:
    print(result.text)

Name and Location Order Basis for Order
NGUYEN, THANH
City: FRISCO
County: COLLIN
Zip Code: 75034


License #: 790672

Complaint # COS20210004784 Date: 11/16/2021

Respondent is assessed an administrative penalty in the amount of $1,875. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
NGUYEN, LONG D
City: SAN SABA
County: SAN SABA
Zip Code: 76877


License #(s): 760420, 1620583

Complaint # COS20210009745 Date: 11/16/2021

Respondent is assessed an administrative penalty in the amount of $1,550. Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables pr

The result should look something like this:

```
Name and Location Order Basis for Order
NGUYEN, THANH
City: FRISCO
County: COLLIN
Zip Code: 75034


License #: 790672

Complaint # COS20210004784 Date: 11/16/2021

Respondent is assessed an administrative penalty in the amount of $1,875. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
NGUYEN, LONG D
City: SAN SABA
County: SAN SABA
Zip Code: 76877
```

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
   # try to do something
except:
   print("It didn't work")
```

It should help you out. If you don't want to print anything when there's an error, you can type `pass` instead of the `print` statement.
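
Here is a minimal sketch of that pattern. The `rows` list is a hypothetical stand-in for the scraped rows (it is not the real page), and its first entry plays the role of the header row with no name. The sketch catches one specific exception, `KeyError`, instead of using a bare `except:`, so unrelated bugs still show up. With Selenium, the matching exception for a missing element is `NoSuchElementException` from `selenium.common.exceptions`.

```python
# Hypothetical stand-in for scraped rows: the first entry plays the
# role of the header row, which has no name.
rows = [
    {"header": "Name and Location Order Basis for Order"},
    {"name": "NGUYEN, THANH"},
    {"name": "NGUYEN, LONG D"},
]

names = []
for row in rows:
    try:
        names.append(row["name"])
    except KeyError:
        # Only the "missing name" case is handled; other errors still raise.
        names.append("Doesn't have a name")

for name in names:
    print(name)
```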

**Why doesn't the first one have a name?**

Output should look like this:

```
Doesn't have a name
NGUYEN, THANH
NGUYEN, LONG D
NGUYEN, LUCIE HUONG
NGUYEN, CHINH
NGUYEN, JIMMY
```

* *Tip: The name has a class you can use. The class name is reused in a lot of places, but because it's the first one you don't have to worry about that!*
* *Tip: Instead of searching across the entire page – `driver.find_element` or `doc.select_one` – you should be doing your searching just inside of each **row** (I used this technique in the beginning of class with BeautifulSoup when we were scraping the books page)* 
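
To illustrate the second tip without opening a browser, here is a rough sketch that uses the standard library's `xml.etree.ElementTree` as a stand-in for Selenium. The markup is hypothetical and much simpler than the real results page (the `span` tags in particular are an assumption). The point is only that the search happens inside each row, not across the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real results page is more complex.
html = """
<table>
  <tr><th>Name and Location</th><th>Order</th><th>Basis for Order</th></tr>
  <tr><td><span class="results_text">NGUYEN, THANH</span>
          <span class="results_text">FRISCO</span></td></tr>
  <tr><td><span class="results_text">NGUYEN, LONG D</span>
          <span class="results_text">SAN SABA</span></td></tr>
</table>
""".strip()

table = ET.fromstring(html)
names = []
for row in table.iter("tr"):
    # Search only inside this row; find() returns the first match or None.
    first = row.find(".//span[@class='results_text']")
    names.append(first.text if first is not None else "Doesn't have a name")

print(names)
```

In Selenium the same idea is `result.find_element(By.CLASS_NAME, 'results_text')` called on the row element rather than on `driver`.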

In [75]:
for result in results:
    try:
        print(result.find_element(By.CLASS_NAME, 'results_text').text)
    except:
        pass

NGUYEN, THANH
NGUYEN, LONG D
NGUYEN, LUCIE HUONG
NGUYEN, CHINH
NGUYEN, JIMMY
NGUYEN, NAM
NGUYEN, DUC
NGUYEN, THU THAO THI
NGUYEN, MINH NHU
NGUYEN, DUNG VAN
NGUYEN, TINH
NGUYEN, HANG NU THANH
NGUYEN, LAN N
NGUYEN, TIEP HUY
NGUYEN, DUNG
NGUYEN, THAI VAN
NGUYEN, THI THAO UYEN
NGUYEN, THANH TRONG
NGUYEN, KHANH V
NGUYEN, DUC VAN
NGUYEN, THI HOAN CHAU
NGUYEN, PHUCHUNG
NGUYEN, KIM
NGUYEN, HUE VAN
NGUYEN, HOAI THU T
NGUYEN, NGHIA T
NGUYEN, HANH THI
NGUYEN, HUYEN
NGUYEN, CHRISTINA M
NGUYEN, THI THIBICH
NGUYEN, KIM AN THI
NGUYEN, VIVIAN KIM
NGUYEN, LONG
NGUYEN, MAI
NGUYEN, CHINH THI
NGUYEN, THU HA
NGUYEN, HA THI THU
NGUYEN, CHAU
NGUYEN, TONY THANH
NGUYEN, MIMI PHAM
NGUYEN, HA
NGUYEN, THAO HONG
NGUYEN, CINDY
NGUYEN, CHAU KHANH LINH
NGUYEN, TRANG T
NGUYEN, DUNG MINH
NGUYEN, YEN NHI THI
NGUYEN, JOHNNY DAT
NGUYEN, KELLY PHUONG N
NGUYEN, NGA THU
NGUYEN, IVY
NGUYEN, NHO HONG THI
NGUYEN, TAMMY
NGUYEN, DIEMTRINH T
NGUYEN, HUAN CAO
NGUYEN, THOA KIM
NGUYEN, TONY
NGUYEN, TAMMY
NGUYEN, HIEN
NGUYEN, NGOC TRA

### Loop through each result, printing each violation description ("Basis for order")

Your results should look something like:

```
Doesn't have a violation
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.
...
```

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: If you're using Selenium by itself, you can get the HTML of something by doing `.get_attribute('innerHTML')` – that way it'll look like BeautifulSoup when you print it. It might help you diagnose your issue!*
> - *Tip: Or I guess you could just skip the one with the problem...*
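
One way to find the row causing the problem is to count the cells in each row before indexing into them. The sketch below uses plain lists as a hypothetical stand-in for the `<td>` cells of each row, with abbreviated text, and assumes the header row has no `<td>` cells at all. With Selenium you could check `len(result.find_elements(By.TAG_NAME, 'td'))` the same way.

```python
# Hypothetical stand-in: each inner list holds the text of one row's <td> cells.
# The header row is assumed to have no <td> cells at all.
rows_of_cells = [
    [],
    ["NGUYEN, THANH", "Penalty of $1,875", "Failed to clean foot spas"],
    ["NGUYEN, LONG D", "Penalty of $1,550", "Failed to keep cleaning records"],
]

for index, cells in enumerate(rows_of_cells):
    if len(cells) < 3:
        # cells[2] would raise IndexError for this row
        print(f"Row {index} has only {len(cells)} cells")
    else:
        print(cells[2])

# Indexes of every row that is too short to have a third cell
problem_rows = [i for i, cells in enumerate(rows_of_cells) if len(cells) < 3]
```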

In [76]:
for result in results:
    try:
        print(result.find_elements(By.TAG_NAME, 'td')[2].text)
    except:
        print("Doesn't have a violation")

Doesn't have a violation
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used; Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to

Respondent failed to clean and disinfect facial chairs and beds, including headrest, prior to providing service to each client
Respondent leased space in a salon to an individual who engaged in the practice of cosmetology but had not obtained a cosmetology license.
Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to have chairs made of or covered in a non-porous material so that they can be disinfected.
Respondent tested positive for a prohibited substance before a bout.
Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.
Respondent failed to clean and sanitize four (4) whirlpool foot spas as required at the end of each day, constituting two (2) violations; Respondent failed to keep a record of the date and time of four (4) foot spas daily or bi-weekly cleaning and if the foot spas were not 

Respondent possessed or used a prohibited implement, specifically a blade or cutting tool intended for the purpose of removing corns or calluses.
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used.
Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to disinfect the salon's whirlpool foot spa basins with an EPA-registered disinfectant.
Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used; Respondent failed to dispose of single use items after each use.
Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to keep all products properly labeled in compliance with OSHA requirements; Respondent failed to ma

### Loop through each result, printing the complaint number

Output should look like this:

```
Doesn't have a complaint number
COS20210004784
COS20210009745
COS20210011484
...
```

- *Tip: Think about the order of the elements. Can you count from the opposite direction than you normally do?*
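
Counting from the opposite direction means using a negative index. As a small sketch, the list below stands in for the text of the `results_text` elements in one row's first cell. The order is an assumption inferred from the output elsewhere in this notebook.

```python
# Stand-in for the text of each `results_text` element in a row's first cell.
# The order is an assumption based on this notebook's printed output.
cell_texts = ["NGUYEN, THANH", "FRISCO", "COLLIN", "75034", "790672", "COS20210004784"]

# -1 is the last item, -2 the one before it, and so on.
complaint_no = cell_texts[-1]
license_numbers = cell_texts[-2]

print(complaint_no)
print(license_numbers)
```

A negative index keeps working even if a row has more or fewer items before the one you want.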

In [77]:
for result in results:
    try:
        cell = result.find_elements(By.TAG_NAME, 'td')[0]
        print(cell.find_elements(By.CLASS_NAME, 'results_text')[-1].text)
    except:
        print("Doesn't have a complaint number")

Doesn't have a complaint number
COS20210004784
COS20210009745
COS20210011484
COS20210011721
COS20200007069
COS20210010530
COS20200007141
COS20200000839
COS20210009714
COS20210005838
COS20200010129
COS20210003441
COS20200009912
COS20210004018
COS20210000935
COS20210003238
COS20200005799
COS20210004209
COS20200012887
COS20190014959
COS20190011954
COS20200016795
COS20200016258
COS20200017028
COS20200016091
COS20210001519
COS20200012575
COS20200007138
COS20200010817
COS20190012445
COS20190010554
COS20200009961
COS20200011967
COS20200007264
COS20200009985
COS20200014601
COS20200015200
COS20210002574
BOX20200006941
COS20190010072
COS20190016762
COS20200010387
COS20200010502
COS20190008104
COS20200010511
COS20200004202
COS20190004199
COS20200000101
COS20200011664
COS20200010961
COS20200008858
MAS20190005842
MAS20190013644
COS20200008859
COS20200009732
COS20200006548
COS20200009605
MAS20190013644
COS20190016479
COS20190012148
COS20190010318
COS20190014688
COS20190004016
COS20190016499
COS20200006146


## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

Based on what you print out, the output might look something like:

```
This row is broken: Name and Location Order Basis for Order
{'name': 'NGUYEN, THANH', 'city': 'FRISCO', 'county': 'COLLIN', 'zip_code': '75034', 'complaint_no': 'COS20210004784', 'license_numbers': '790672', 'complaint': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.'}
{'name': 'NGUYEN, LONG D', 'city': 'SAN SABA', 'county': 'SAN SABA', 'zip_code': '76877', 'complaint_no': 'COS20210009745', 'license_numbers': '760420, 1620583', 'complaint': 'Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.'}
```

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium. Instead, use `element.find_element(By.XPATH, "following-sibling::div")` to find the next div, or `element.find_element(By.XPATH, "following-sibling::*")` to find the next anything.*
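
Another possible way to fill in the fields listed above, without relying on element positions or siblings, is to read the labels out of each row's `.text`. This is only a sketch: `parse_row_text` is a hypothetical helper, and it assumes every row's text follows the same `City:` / `County:` / `Zip Code:` / `License #:` / `Complaint #` layout seen in the printed output earlier.

```python
import re

# Text of one result row, copied from the printed output earlier in this notebook.
row_text = """NGUYEN, THANH
City: FRISCO
County: COLLIN
Zip Code: 75034


License #: 790672

Complaint # COS20210004784 Date: 11/16/2021
"""


def parse_row_text(text):
    """Pull labeled fields out of a row's text (hypothetical helper)."""

    def grab(pattern):
        # MULTILINE lets ^ and $ match at the start and end of each line.
        match = re.search(pattern, text, flags=re.MULTILINE)
        return match.group(1).strip() if match else None

    return {
        "name": text.strip().splitlines()[0],
        "city": grab(r"^City:[ \t]*(.+)$"),
        "county": grab(r"^County:[ \t]*(.+)$"),
        "zip_code": grab(r"^Zip Code:[ \t]*(.+)$"),
        # Handles both "License #:" and "License #(s):"
        "license_numbers": grab(r"^License #(?:\(s\))?:[ \t]*(.+)$"),
        "complaint_no": grab(r"^Complaint #[ \t]*(\S+)"),
    }


record = parse_row_text(row_text)
print(record)
```

The violation description would still come from the row's third `td`, as in the earlier step.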

In [95]:
rows = []
for result in results:
    try:
        row_dict = {}
        row_dict['name'] = result.find_element(By.CLASS_NAME, 'results_text').text
        row_dict['city'] = result.find_elements(By.CLASS_NAME, 'results_text')[1].text
        row_dict['county'] = result.find_elements(By.CLASS_NAME, 'results_text')[2].text
        row_dict['zip_code'] = result.find_elements(By.CLASS_NAME, 'results_text')[3].text
        row_dict['complaint_no'] = result.find_elements(By.CLASS_NAME, 'results_text')[5].text
        row_dict['license_numbers'] = result.find_elements(By.CLASS_NAME, 'results_text')[4].text
        row_dict['complaint'] = result.find_elements(By.TAG_NAME, 'td')[2].text
        rows.append(row_dict)
    except:
        pass

In [96]:
rows

[{'name': 'NGUYEN, THANH',
  'city': 'FRISCO',
  'county': 'COLLIN',
  'zip_code': '75034',
  'complaint_no': 'COS20210004784',
  'license_numbers': '790672',
  'complaint': 'Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day, the Department is charging 2 violations; Respondent operated a cosmetology salon without the appropriate license.'},
 {'name': 'NGUYEN, LONG D',
  'city': 'SAN SABA',
  'county': 'SAN SABA',
  'zip_code': '76877',
  'complaint_no': 'COS20210009745',
  'license_numbers': '760420, 1620583',
  'complaint': 'Respondent failed to keep a record of the date and time of each foot spa daily or bi-weekly cleaning and if the foot spa was not used, the Department is charging 2 violations; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect manicure tables prior to use for each client.'},
 {'name': 'NGUYEN, LUCIE HUONG',
  'city': 'UVALDE',
  

### Save that to a CSV named `output.csv`

The dataframe should look something like...

|index|name|city|county|zip_code|complaint_no|license_numbers|complaint|
|---|---|---|---|---|---|---|---|
|0|NGUYEN, THANH|FRISCO|COLLIN|75034|COS20210004784|790672|Respondent failed to clean and sanitize whirlp...|
|1|NGUYEN, LONG D|SAN SABA|SAN SABA|76877|COS20210009745|760420, 1620583|Respondent failed to keep a record of the date...|


- *Tip: If you send a list of dictionaries to `pd.DataFrame(...)`, it will create a dataframe out of that list!*

In [99]:
nguyen_violations = pd.DataFrame(rows)

In [101]:
nguyen_violations.head()

|index|name|city|county|zip_code|complaint_no|license_numbers|complaint|
|---|---|---|---|---|---|---|---|
|0|NGUYEN, THANH|FRISCO|COLLIN|75034|COS20210004784|790672|Respondent failed to clean and sanitize whirlp...|
|1|NGUYEN, LONG D|SAN SABA|SAN SABA|76877|COS20210009745|760420, 1620583|Respondent failed to keep a record of the date...|
|2|NGUYEN, LUCIE HUONG|UVALDE|UVALDE|78801|COS20210011484|762626, 1811788|Respondent failed to keep a record of the date...|
|3|NGUYEN, CHINH|TEMPLE|BELL|76502|COS20210011721|777067|Respondent failed to follow whirlpool foot spa...|
|4|NGUYEN, JIMMY|ROWLETT|DALLAS|75088|COS20200007069|796773|Respondent failed to clean and sanitize whirlp...|


In [104]:
nguyen_violations.to_csv('output.csv', index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.
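
As a rough sketch of where that extra column comes from, the example below writes a tiny dataframe twice, once with the default settings and once with `index=False`, then reads both back. The two rows are abbreviated from the results above, and in-memory `io.StringIO` buffers stand in for `output.csv` so nothing is written to disk.

```python
import io

import pandas as pd

# Two abbreviated rows; only a few of the real columns are included.
rows = [
    {"name": "NGUYEN, THANH", "city": "FRISCO", "zip_code": "75034"},
    {"name": "NGUYEN, LONG D", "city": "SAN SABA", "zip_code": "76877"},
]
df = pd.DataFrame(rows)

# Default: the index is written as an extra first column with no header.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

The first list starts with `Unnamed: 0`, which is the name pandas gives the headerless index column when it reads the file back in.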

In [105]:
output = pd.read_csv('output.csv')
output.head()

|index|name|city|county|zip_code|complaint_no|license_numbers|complaint|
|---|---|---|---|---|---|---|---|
|0|NGUYEN, THANH|FRISCO|COLLIN|75034|COS20210004784|790672|Respondent failed to clean and sanitize whirlp...|
|1|NGUYEN, LONG D|SAN SABA|SAN SABA|76877|COS20210009745|760420, 1620583|Respondent failed to keep a record of the date...|
|2|NGUYEN, LUCIE HUONG|UVALDE|UVALDE|78801|COS20210011484|762626, 1811788|Respondent failed to keep a record of the date...|
|3|NGUYEN, CHINH|TEMPLE|BELL|76502|COS20210011721|777067|Respondent failed to follow whirlpool foot spa...|
|4|NGUYEN, JIMMY|ROWLETT|DALLAS|75088|COS20200007069|796773|Respondent failed to clean and sanitize whirlp...|
