# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

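We'll be using Selenium for this, *not* BeautifulSoup and requests. The cells below use Selenium's `find_element_by_*` methods, which were removed in Selenium 4.3. On a newer version, write `driver.find_element(By.XPATH, ...)` (after `from selenium.webdriver.common.by import By`) in place of `driver.find_element_by_xpath(...)`, and likewise for the other `find_element(s)_by_*` calls.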

In [103]:
from selenium import webdriver

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for **cosmetologist violations** for people with the last name **Nguyen**.

In [104]:
driver = webdriver.Chrome()
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [105]:
license_type = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[3]/td/select')

In [106]:
# Sending the option's text to the dropdown selects the matching option
license_type.send_keys('Cosmetologists')

In [107]:
name_textbox = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[7]/td/p/input')

In [108]:
name_textbox.send_keys('Nguyen')

In [109]:
# Submit the search form
driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div/div/section/div/div/table/tbody/tr/td/form/table/tbody/tr[18]/td/input[1]').click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).
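Slicing works on any Python list, including the list of rows Selenium hands back. Because the header row is part of that list, `[:10]` gives you the header plus nine results, while `[1:11]` skips the header and gives ten results. A quick sketch with stand-in strings (not real Selenium elements):

```python
# Hypothetical stand-in for the list of <tr> elements: one header row
# followed by twelve result rows
rows = ["header"] + [f"result {n}" for n in range(1, 13)]

first_ten = rows[:10]     # header + results 1-9
ten_results = rows[1:11]  # starts after the header: results 1-10

print(len(first_ten), first_ten[0])
print(len(ten_results), ten_results[0], ten_results[-1])
```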

In [110]:
table = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div/div/section/div/div/table/tbody')
rows = table.find_elements_by_tag_name('tr')

# Only the first ten rows (the header row counts as one of them)
for row in rows[:10]:
    print(row.text)

Name and Location Order Basis for Order
NGUYEN, MIMI PHAM
City: KATY
County: HARRIS
Zip Code: 77449


License #: 784210

Complaint # COS20190010072 Date: 11/12/2020

Respondent is assessed an administrative penalty in the amount of $1,125. Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.
NGUYEN, HA
City: ARLINGTON
County: TARRANT
Zip Code: 76017


License #: 764888

Complaint # COS20190016762 Date: 11/12/2020

Respondent is assessed an administrative penalty in the amount of $2,250. Respondent failed to clean and sanitize four (4) whirlpool foot spas as required at the end of each day, constituting two (2) violations; Respondent failed to keep a record of the date and time of four (4) foot spas daily or bi-weekly cleaning and if the foot spas were not used, constituting two (2) violations.
NGUYEN, THAO HONG
City: SAN ANTONIO
County: BEXAR
Zip

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
    # try to do something (this line always fails, on purpose)
    int("not a number")
except:
    # Instead of stopping on an error, it'll jump down here instead
    print("It didn't work")
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` call. Most people use `pass`, but it's also nice to print debug messages so you know when and where it's running into errors.
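Naming the exception you expect (`except KeyError:` in this sketch) instead of using a bare `except:` means genuine bugs, like a typo in a variable name, still stop the loop. With Selenium, a failed `find_element...` call raises `NoSuchElementException` from `selenium.common.exceptions`. The rows below are plain dicts standing in for Selenium elements, and the loop prints a debug message instead of using `pass`:

```python
# Hypothetical stand-in rows: plain dicts instead of Selenium elements.
# Like the real results table, the first row (the header) has no name.
rows = [
    {"header": "Name and Location"},
    {"name": "NGUYEN, MIMI PHAM"},
    {"name": "NGUYEN, HA"},
]

names = []
skipped = []
for i, row in enumerate(rows):
    try:
        names.append(row["name"])
    except KeyError:
        # Debug message instead of `pass`, so you can see which row failed
        print(f"Row {i} has no name, skipping it")
        skipped.append(i)

print(names)
```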

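**Why doesn't the first one have a name?** The first row is the table's header row ("Name and Location / Order / Basis for Order"), so it has no `results_text` element to find.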

In [111]:
for row in rows[:10]:
    try:
        name = row.find_element_by_class_name('results_text')
        print(name.text)
    except:
        pass
     

NGUYEN, MIMI PHAM
NGUYEN, HA
NGUYEN, THAO HONG
NGUYEN, CINDY
NGUYEN, CHAU KHANH LINH
NGUYEN, TRANG T
NGUYEN, DUNG MINH
NGUYEN, YEN NHI THI
NGUYEN, JOHNNY DAT


### Loop through each result, printing each violation description ("Basis for Order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...*
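One way to find the row that breaks: loop with `enumerate` so you know each row's position, and check how many cells a row has before asking for index `2`. This sketch uses lists of strings in place of Selenium elements, and assumes the problem row has fewer than three `<td>` cells, which is a guess for illustration. On the real page you'd print `row.get_attribute('innerHTML')` for the bad row to see what it actually contains:

```python
# Hypothetical stand-in rows: each is the list of <td> texts in one <tr>.
# The first row stands in for a header row with no <td> cells (an assumption).
rows = [
    [],
    ["name cell", "order cell", "basis cell 1"],
    ["name cell", "order cell", "basis cell 2"],
]

bad_rows = []
for i, cells in enumerate(rows):
    if len(cells) <= 2:
        # Report the position and contents of the row that would fail on cells[2]
        print(f"Row {i} has {len(cells)} cells: {cells!r}")
        bad_rows.append(i)
        continue
    print(cells[2])
```

Checking the length up front, rather than catching the `IndexError`, makes the skipped rows explicit, which is handy while you are still figuring out what the page looks like.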

In [112]:
for row in rows[1:11]:
    # rows[1:11] skips the header row; the third <td> is "Basis for Order"
    basis_cell = row.find_elements_by_tag_name('td')[2]
    print(basis_cell.text)
    print('-------')

Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.
-------
Respondent failed to clean and sanitize four (4) whirlpool foot spas as required at the end of each day, constituting two (2) violations; Respondent failed to keep a record of the date and time of four (4) foot spas daily or bi-weekly cleaning and if the foot spas were not used, constituting two (2) violations.
-------
Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use.
-------
Respondent failed to clean and disinfect all wax pots; Respondent failed to properly clean multi-use items prior to each service.
-------
Respondent engaged in fraud or deceit in obtaining a certificate, license, or permit.
-------
Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to wipe clean and disinfect el

### Loop through each result, printing the complaint number

- Tip: Think about the order of the elements.
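Within a data row, the `results_text` elements come back in page order. Reading off the indices this notebook itself uses (not any documentation from the site), that order is name, city, county, zip code, license number, complaint number, so the complaint number sits at index 5. Writing the order down once turns the magic number into something readable:

```python
# Assumed order of the `results_text` fields within one data row, matching
# the indices this notebook uses (index 5 holds the complaint number)
FIELD_ORDER = ["name", "city", "county", "zip code", "license number", "complaint number"]

# Texts from the first result row, as they appear in this notebook's output
first_row_texts = ["NGUYEN, MIMI PHAM", "KATY", "HARRIS", "77449", "784210", "COS20190010072"]

complaint_index = FIELD_ORDER.index("complaint number")
print(complaint_index, first_row_texts[complaint_index])  # 5 COS20190010072
```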

In [113]:
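for row in rows[:10]:
    try:
        fields = row.find_elements_by_class_name('results_text')
        # The complaint number is the sixth `results_text` field in each row
        print(fields[5].text)
    except:
        # The header row has no `results_text` fields, so fields[5] fails
        pass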

COS20190010072
COS20190016762
COS20200010387
COS20200010502
COS20190008104
COS20200010511
COS20200004202
COS20190004199
COS20200000101


## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License number(s)
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium, you need to use `element.find_element_by_xpath("following-sibling::div")` to find the next div, or `element.find_element_by_xpath("following-sibling::*")` to find the next anything.*
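Instead of assigning each key on its own line, you can pair a list of field names with a row's texts using `zip` and `dict`. A sketch with stand-in data copied from the output above, assuming the same field order as the indices used in the dictionary cell below; the description, which lives in a separate `<td>`, is left out to keep it short:

```python
FIELD_NAMES = ["name", "city", "county", "zip code", "license number", "violation number"]

# Hypothetical stand-in for what Selenium returns: the `results_text` texts
# of each <tr>. The header row has none, so its list is empty.
scraped_rows = [
    [],
    ["NGUYEN, MIMI PHAM", "KATY", "HARRIS", "77449", "784210", "COS20190010072"],
    ["NGUYEN, HA", "ARLINGTON", "TARRANT", "76017", "764888", "COS20190016762"],
]

def rows_to_dicts(rows):
    """Turn each complete row into a dict keyed by FIELD_NAMES."""
    records = []
    for texts in rows:
        if len(texts) < len(FIELD_NAMES):
            # Skip rows, like the header, that are missing fields
            continue
        # zip pairs each field name with the text in the same position
        records.append(dict(zip(FIELD_NAMES, texts)))
    return records

records = rows_to_dicts(scraped_rows)
print(records[1])
```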

In [114]:
dictionary_list = []
for row in rows[:10]:
    try:
        dictionary = {}
        name = row.find_element_by_class_name('results_text')
        dictionary['name'] = name.text
        # The "Basis for Order" text is in the third <td>
        description = row.find_elements_by_tag_name('td')[2]
        dictionary['description'] = description.text
        # The other fields are the remaining `results_text` elements, in page order
        fields = row.find_elements_by_class_name('results_text')
        dictionary['violation number'] = fields[5].text
        dictionary['license number'] = fields[4].text
        dictionary['zip code'] = fields[3].text
        dictionary['county'] = fields[2].text
        dictionary['city'] = fields[1].text
        dictionary_list.append(dictionary)
    except:
        # Skips the header row, which has no `results_text` elements
        pass
print(dictionary_list)

[{'name': 'NGUYEN, MIMI PHAM', 'description': 'Respondent failed properly clean and sanitize the metal implements used at the Salon; Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution.', 'violation number': 'COS20190010072', 'license number': '784210', 'zip code': '77449', 'county': 'HARRIS', 'city': 'KATY'}, {'name': 'NGUYEN, HA', 'description': 'Respondent failed to clean and sanitize four (4) whirlpool foot spas as required at the end of each day, constituting two (2) violations; Respondent failed to keep a record of the date and time of four (4) foot spas daily or bi-weekly cleaning and if the foot spas were not used, constituting two (2) violations.', 'violation number': 'COS20190016762', 'license number': '764888', 'zip code': '76017', 'county': 'TARRANT', 'city': 'ARLINGTON'}, {'name': 'NGUYEN, THAO HONG', 'description': 'Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use.'

### Save that to a CSV

- Tip: Use `pd.DataFrame` to create a dataframe, and then save it to a CSV.
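`DataFrame.to_csv` writes the index as an extra, unlabeled first column unless you pass `index=False`; reading that file back turns it into a column called `Unnamed: 0`. A small round trip through an in-memory `StringIO` (so nothing is written to disk) shows the difference:

```python
from io import StringIO

import pandas as pd

# Two records in the same shape as dictionary_list (values from the output above)
records = [
    {"name": "NGUYEN, MIMI PHAM", "city": "KATY"},
    {"name": "NGUYEN, HA", "city": "ARLINGTON"},
]
df = pd.DataFrame(records)

# to_csv() with no path returns the CSV text; by default it includes the index
cols_with_index = list(pd.read_csv(StringIO(df.to_csv())).columns)
print(cols_with_index)  # ['Unnamed: 0', 'name', 'city']

# index=False leaves the index out of the file
cols_without_index = list(pd.read_csv(StringIO(df.to_csv(index=False))).columns)
print(cols_without_index)  # ['name', 'city']
```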

In [115]:
import pandas as pd 

In [116]:
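df = pd.DataFrame(dictionary_list)
df.to_csv('nguyen_table.csv', index=False)

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

An `Unnamed: 0` column, like the one in the output below, means the DataFrame's index was written to the file; passing `index=False` to `to_csv` leaves it out.

In [117]:
df = pd.read_csv('nguyen_table.csv')
df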

| Unnamed: 0 | name | description | violation number | license number | zip code | county | city |
|---|---|---|---|---|---|---|---|
| 0 | NGUYEN, MIMI PHAM | Respondent failed properly clean and sanitize ... | COS20190010072 | 784210 | 77449 | HARRIS | KATY |
| 1 | NGUYEN, HA | Respondent failed to clean and sanitize four (... | COS20190016762 | 764888 | 76017 | TARRANT | ARLINGTON |
| 2 | NGUYEN, THAO HONG | Respondent failed to clean, disinfect, and ste... | COS20200010387 | 799926, 1753491 | 78238 | BEXAR | SAN ANTONIO |
| 3 | NGUYEN, CINDY | Respondent failed to clean and disinfect all w... | COS20200010502 | 806232, 1260359, 1280071 | 78414 | NUECES | CORPUS CHRISTI |
| 4 | NGUYEN, CHAU KHANH LINH | Respondent engaged in fraud or deceit in obtai... | COS20190008104 | 1764073 | 36116 | OUT OF STATE | MONTGOMERY |
| 5 | NGUYEN, TRANG T | Respondent failed properly clean and sanitize ... | COS20200010511 | 748483 | 78155 | GUADALUPE | SEGUIN |
| 6 | NGUYEN, DUNG MINH | Respondent failed properly clean and sanitize ... | COS20200004202 | 785878 | 77066 | HARRIS | HOUSTON |
| 7 | NGUYEN, YEN NHI THI | Respondent failed to clean and disinfect all w... | COS20190004199 | 763645 | 78717 | TRAVIS | AUSTIN |
| 8 | NGUYEN, JOHNNY DAT | Respondent failed to clean and sanitize whirlp... | COS20200000101 | 797651 | 78574 | HIDALGO | MISSION |


## Let's do this an easier way

Use Selenium and `pd.read_html` to get the table as a dataframe.
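`pd.read_html` only looks inside `<table>` elements, so the HTML you give it has to include the `<table>` tag itself, not just a `<tbody>`. A self-contained sketch with a made-up two-column table; it assumes `lxml` is installed (passing `flavor="lxml"` makes pandas use only that parser) and wraps the HTML string in `StringIO`, since recent pandas versions deprecate passing literal HTML as a plain string:

```python
from io import StringIO

import pandas as pd

# A tiny made-up table, not the real page's markup
table_html = (
    "<table>"
    "<thead><tr><th>Name</th><th>City</th></tr></thead>"
    "<tbody><tr><td>NGUYEN, HA</td><td>ARLINGTON</td></tr></tbody>"
    "</table>"
)
# The same rows without the surrounding <table> tag
tbody_html = "<tbody><tr><td>NGUYEN, HA</td><td>ARLINGTON</td></tr></tbody>"

# read_html returns a list with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(table_html), flavor="lxml")
print(tables[0])

# With no <table> element in the HTML, read_html raises ValueError
error_message = None
try:
    pd.read_html(StringIO(tbody_html), flavor="lxml")
except ValueError as err:
    error_message = str(err)
print(error_message)
```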

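In [118]:
from io import StringIO

# Reuse the browser that already has the search results loaded. Pass the whole
# <table>, not the <tbody> stored in `table` above: read_html only looks for
# <table> elements, so the <tbody> alone raises "ValueError: No tables found".
results_table = driver.find_element_by_xpath('/html/body/div[1]/div/div[2]/div/div/section/div/div/table')

# read_html returns a list of DataFrames, one per <table>; take the first
results_df = pd.read_html(StringIO(results_table.get_attribute('outerHTML')))[0]
results_df.head()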