# Texas Cosmetologist Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for cosmetologists!

## Setup: Import what you'll need to scrape the page

We'll be using Selenium for this, *not* BeautifulSoup and requests.

In [13]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd

## Starting your search

Starting from [here](https://www.tdlr.texas.gov/cimsfo/fosearch.asp), search for cosmetologist violations for people with the last name **Nguyen**.

In [14]:
# Load the page
driver = webdriver.Chrome()
driver.get('https://www.tdlr.texas.gov/cimsfo/fosearch.asp')

In [15]:
select = Select(driver.find_element_by_name('pht_status'))
select.select_by_visible_text('Cosmetologists')

In [16]:
last_name = driver.find_element_by_name('pht_lnm')
last_name.send_keys('Nguyen')


In [17]:
search = driver.find_element_by_name('B1')
search.click()

## Scraping

Once you are on the results page, do this.

### Loop through each result and print the entire row

Okay wait, that's a heck of a lot. Use `[:10]` to only do the first ten (`listname[:10]` gives you the first ten).
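
As a quick illustration of slicing, here is a small sketch that uses a plain Python list in place of the Selenium results. The name `fake_rows` and its contents are made up for this example.

```python
# A stand-in for the list of rows; in the notebook this would be
# the list that Selenium returns.
fake_rows = [f"row {i}" for i in range(25)]

first_ten = fake_rows[:10]   # items at positions 0 through 9
rest = fake_rows[1:]         # everything except the first item

print(len(first_ten))
print(first_ten[0], first_ten[-1])
print(len(rest))
```

The same idea works in the other direction: `listname[1:]` skips the first item, which comes in handy later on.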

In [18]:
cosmos = driver.find_elements_by_tag_name('tr')
# Print the text of the first ten rows only
for cosmo in cosmos[:10]:
    print(cosmo.text)

Name and Location Order Basis for Order
NGUYEN, TOAN HUU
City: SAN ANTONIO
County: BEXAR
Zip Code: 78217


License #(s): 780948, 1706491, 1699123

Complaint # COS20180004289 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $500. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
NGUYEN, HANH CONG
City: EL PASO
County: EL PASO
Zip Code: 79934


License #: 737708

Complaint # COS20180006594 Date: 5/30/2018

Respondent is assessed an administrative penalty in the amount of $1,000. Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
NGUYEN, KHIEM VAN
City: LONGVIEW
County: GREGG
Zip Code: 75604


License #: 731665

Complaint # COS20180000257 Date: 5/17/2018

Respondent is assessed an administrative penalty in the amount of $1,250. Responde

### Loop through each result and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen?! If you want to ignore an error, you use code like this:

```python
try:
    # try to do something
    ...
except:
    print("It didn't work")
```

It should help you out. If you don't want to print anything, you can type `pass` instead of the `print` statement.
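
Here is a minimal sketch of that pattern without a browser. The nested list `fake_matches` is a made-up stand-in for what you get back when you look for names inside each row, and it assumes the header row returns an empty list. The sketch catches `IndexError` specifically, which is a slightly stricter variation on the bare `except:` shown above.

```python
# Each inner list stands in for the matches found inside one table row.
# Assumption: the header row has no matches at all.
fake_matches = [
    [],                                    # header row: nothing found
    ["NGUYEN, TOAN HUU", "SAN ANTONIO"],
    ["NGUYEN, HANH CONG", "EL PASO"],
]

names = []
for matches in fake_matches:
    try:
        names.append(matches[0])   # raises IndexError on the empty list
    except IndexError:
        pass                       # skip rows that have no name

print(names)
```

Catching only `IndexError` means other, unexpected problems still show up as errors instead of being silently ignored.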

**Why doesn't the first one have a name?**

In [19]:
for cosmo in cosmos[:10]:
    print('----')
    try:
        info_name = cosmo.find_elements_by_class_name('results_text')
        print(info_name[0].text)
    except:
        pass

----
----
NGUYEN, TOAN HUU
----
NGUYEN, HANH CONG
----
NGUYEN, KHIEM VAN
----
NGUYEN, DIEP THI NGOC
----
NGUYEN, LAN T-THUY
----
NGUYEN, TUAN A
----
NGUYEN, THAO B
----
NGUYEN, BETH MARIA
----
NGUYEN, TRUNG N


### Loop through each result, printing each violation description ("Basis for order")

> - *Tip: You'll get an error even if you're ALMOST right - which row is causing the problem?*
> - *Tip: You can get the HTML of something by doing `.get_attribute('innerHTML')` - it might help you diagnose your issue.*
> - *Tip: Or I guess you could just skip the one with the problem...*
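
One way to work out which row is causing the problem is to check how many cells each row has before indexing into it. The sketch below uses a made-up nested list, `fake_cells`, in place of the real page, and it assumes the header row produces no `td` cells.

```python
# Hypothetical stand-in: each inner list holds the <td> texts of one row.
# Assumption: the header row produces no <td> cells.
fake_cells = [
    [],
    ["name and location A", "order A", "basis for order A"],
    ["name and location B", "order B", "basis for order B"],
]

# Find the rows that are too short to have a third cell.
problem_rows = [i for i, cells in enumerate(fake_cells) if len(cells) < 3]
print(problem_rows)

# Skip the offending row and read the third cell from the rest.
descriptions = [cells[2] for cells in fake_cells[1:]]
print(descriptions)
```

If only the first row shows up in `problem_rows`, slicing with `[1:]` is enough to skip it.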

In [20]:
for cosmo in cosmos[1:10]:
    print('----')
    info_desc = cosmo.find_elements_by_tag_name('td')
    print('The violation description is:', info_desc[2].text)


----
The violation description is: Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day.
----
The violation description is: Respondent failed to clean and sanitize whirlpool foot spas as required at the end of each day; Respondent failed to use items subject to possible cross contamination in a manner that does not contaminate the remaining product.
----
The violation description is: Respondent failed to follow whirlpool foot spas cleaning and sanitization procedures as required; Respondent failed to clean, disinfect, and sterilize manicure and pedicure implements after each use; Respondent failed to clean and disinfect all wax pots.
----
The violation description is: Respondent failed to disinfect tools, implements, and supplies with an EPA-registered disinfectant solution; Respondent failed to disinfect multi-use equipment, implements, and tools prior to use on each client.
----
The violation description is: Respondent failed to clean, disinf

### Loop through each result, printing the complaint number

- TIP: Think about the order of the elements
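
To see why order matters, here is a sketch with the values from the first result printed earlier. It assumes each value sits in its own element and that they come back in the order shown; the list name `texts` is made up for this example.

```python
# Values from the first result, in the order they appear in the row
# (assumed: name, city, county, zip, licenses, complaint number, date).
texts = [
    "NGUYEN, TOAN HUU", "SAN ANTONIO", "BEXAR", "78217",
    "780948, 1706491, 1699123", "COS20180004289", "5/30/2018",
]

complaint_number = texts[-2]   # second from the end
license_numbers = texts[-3]    # third from the end

print(complaint_number)
print(license_numbers)
```

Counting from the end with negative indexes is one option. It keeps working even if a row happens to have extra items near the front.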

In [21]:
for cosmo in cosmos[1:10]:
    print('----')
    try:
        info_comp = cosmo.find_elements_by_class_name('results_text')
        print(info_comp[-2].text)
    except:
        pass

----
COS20180004289
----
COS20180006594
----
COS20180000257
----
COS20180004915
----
COS20180009255
----
COS20140018343
----
COS20180008846
----
COS20180000897
----
COS20170023893


## Saving the results

### Loop through each result to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number
- License Numbers
- Zip Code
- County
- City

Create a new dictionary for each result (except the header).

> *Tip: If you want to ask for the "next sibling," you can't use `find_next_sibling` in Selenium. Instead, use `element.find_element_by_xpath("following-sibling::div")` to find the next div, or `element.find_element_by_xpath("following-sibling::*")` to find the next anything.*
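
The "one new dictionary per result" idea can be sketched without a browser. The helper `build_row` and the `samples` list are hypothetical and exist only for this example. The values come from the first two results shown above, and their order is an assumption.

```python
def build_row(texts, description):
    """Turn one result's texts into a dictionary (hypothetical helper)."""
    return {
        "Name": texts[0],
        "City": texts[1],
        "County": texts[2],
        "Zip Code": texts[3],
        "License Numbers": texts[-3],
        "Violation number": texts[-2],
        "Violation Description": description,
    }

# Each sample pairs a row's texts with its violation description.
samples = [
    (["NGUYEN, TOAN HUU", "SAN ANTONIO", "BEXAR", "78217",
      "780948, 1706491, 1699123", "COS20180004289", "5/30/2018"],
     "Respondent failed to clean and sanitize whirlpool foot spas "
     "as required at the end of each day."),
    (["NGUYEN, HANH CONG", "EL PASO", "EL PASO", "79934",
      "737708", "COS20180006594", "5/30/2018"],
     "Respondent failed to clean and sanitize whirlpool foot spas "
     "as required at the end of each day; ..."),
]

# A fresh dictionary is created for every result, then collected in a list.
records = [build_row(texts, description) for texts, description in samples]
print(records[0]["Name"], records[0]["Zip Code"])
print(len(records))
```

Creating the dictionary inside the loop (or inside a helper, as here) matters. If you reuse one dictionary, every entry in the list ends up pointing at the same object.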

In [22]:
rows = []
# Skip the header row, which has no result data
for loading, cosmo in enumerate(cosmos[1:], start=1):
    info_name = cosmo.find_elements_by_class_name('results_text')
    info_desc = cosmo.find_elements_by_tag_name('td')
    # Build a new dictionary for each result
    row = {}
    row['Name'] = info_name[0].text
    row['Violation Description'] = info_desc[2].text
    # Count from the end for the complaint and license numbers
    row['Violation number'] = info_name[-2].text
    row['License Numbers'] = info_name[-3].text
    row['Zip Code'] = info_name[3].text
    row['County'] = info_name[2].text
    row['City'] = info_name[1].text
    rows.append(row)
    print('working on entry number', loading, 'out of ', len(cosmos))
print(rows[:3])

working on entry number 1 out of  531
working on entry number 2 out of  531
working on entry number 3 out of  531
working on entry number 4 out of  531
working on entry number 5 out of  531
working on entry number 6 out of  531
working on entry number 7 out of  531
working on entry number 8 out of  531
working on entry number 9 out of  531
working on entry number 10 out of  531
working on entry number 11 out of  531
working on entry number 12 out of  531
working on entry number 13 out of  531
working on entry number 14 out of  531
working on entry number 15 out of  531
working on entry number 16 out of  531
working on entry number 17 out of  531
working on entry number 18 out of  531
working on entry number 19 out of  531
working on entry number 20 out of  531
working on entry number 21 out of  531
working on entry number 22 out of  531
working on entry number 23 out of  531
working on entry number 24 out of  531
working on entry number 25 out of  531
working on entry number 26 out of 

working on entry number 209 out of  531
working on entry number 210 out of  531
working on entry number 211 out of  531
working on entry number 212 out of  531
working on entry number 213 out of  531
working on entry number 214 out of  531
working on entry number 215 out of  531
working on entry number 216 out of  531
working on entry number 217 out of  531
working on entry number 218 out of  531
working on entry number 219 out of  531
working on entry number 220 out of  531
working on entry number 221 out of  531
working on entry number 222 out of  531
working on entry number 223 out of  531
working on entry number 224 out of  531
working on entry number 225 out of  531
working on entry number 226 out of  531
working on entry number 227 out of  531
working on entry number 228 out of  531
working on entry number 229 out of  531
working on entry number 230 out of  531
working on entry number 231 out of  531
working on entry number 232 out of  531
working on entry number 233 out of  531


working on entry number 414 out of  531
working on entry number 415 out of  531
working on entry number 416 out of  531
working on entry number 417 out of  531
working on entry number 418 out of  531
working on entry number 419 out of  531
working on entry number 420 out of  531
working on entry number 421 out of  531
working on entry number 422 out of  531
working on entry number 423 out of  531
working on entry number 424 out of  531
working on entry number 425 out of  531
working on entry number 426 out of  531
working on entry number 427 out of  531
working on entry number 428 out of  531
working on entry number 429 out of  531
working on entry number 430 out of  531
working on entry number 431 out of  531
working on entry number 432 out of  531
working on entry number 433 out of  531
working on entry number 434 out of  531
working on entry number 435 out of  531
working on entry number 436 out of  531
working on entry number 437 out of  531
working on entry number 438 out of  531


### Save that to a CSV

- Tip: You'll want to use pandas here

In [23]:
df = pd.DataFrame(rows)
df.head(15)

| | City | County | License Numbers | Name | Violation Description | Violation number | Zip Code |
|---|---|---|---|---|---|---|---|
| 0 | SAN ANTONIO | BEXAR | 780948, 1706491, 1699123 | NGUYEN, TOAN HUU | Respondent failed to clean and sanitize whirlp... | COS20180004289 | 78217 |
| 1 | EL PASO | EL PASO | 737708 | NGUYEN, HANH CONG | Respondent failed to clean and sanitize whirlp... | COS20180006594 | 79934 |
| 2 | LONGVIEW | GREGG | 731665 | NGUYEN, KHIEM VAN | Respondent failed to follow whirlpool foot spa... | COS20180000257 | 75604 |
| 3 | HOUSTON | HARRIS | 1347649, 760528 | NGUYEN, DIEP THI NGOC | Respondent failed to disinfect tools, implemen... | COS20180004915 | 77014 |
| 4 | SAN ANTONIO | BEXAR | 767339 | NGUYEN, LAN T-THUY | Respondent failed to clean, disinfect, and ste... | COS20180009255 | 78255 |
| 5 | AUSTIN | TRAVIS | 681274 | NGUYEN, TUAN A | Respondent failed to clean and disinfect all w... | COS20140018343 | 78723 |
| 6 | EULESS | TARRANT | 721373, 1142884 | NGUYEN, THAO B | Respondent failed to clean and sanitize whirlp... | COS20180008846 | 76039 |
| 7 | HOUSTON | HARRIS | 1470271 | NGUYEN, BETH MARIA | The Respondent's license was revoked upon Resp... | COS20180000897 | 77083 |
| 8 | AMARILLO | POTTER | 1196244, 767015, 767014 | NGUYEN, TRUNG N | Respondent failed to clean, disinfect, and ste... | COS20170023893 | 79106 |
| 9 | PITTSBURG | CAMP | 759931 | NGUYEN, NGAT THI | Respondent failed to follow whirlpool foot spa... | COS20180004076 | 75686 |


In [24]:
df.to_csv('dirty-cosmologists.csv', index=False)

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.
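
The extra column usually comes from pandas writing the DataFrame's index into the file. Here is a small sketch of the difference, using in-memory buffers instead of real files. The DataFrame `sample_df` and the buffer names are made up for this example.

```python
import io

import pandas as pd

# A tiny made-up DataFrame with a fixed column order.
sample_df = pd.DataFrame(
    [
        {"Name": "NGUYEN, TOAN HUU", "City": "SAN ANTONIO"},
        {"Name": "NGUYEN, HANH CONG", "City": "EL PASO"},
    ],
    columns=["Name", "City"],
)

# Write to in-memory buffers so nothing is saved to disk.
with_index = io.StringIO()
sample_df.to_csv(with_index)                 # default: the index is written too
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)

# Rewind the buffers and read them back.
with_index.seek(0)
without_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)
columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)
print(columns_without_index)
```

The first list should include an `Unnamed: 0` column, while the second should contain only `Name` and `City`.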

In [25]:
pd.read_csv('dirty-cosmologists.csv').head()

| | City | County | License Numbers | Name | Violation Description | Violation number | Zip Code |
|---|---|---|---|---|---|---|---|
| 0 | SAN ANTONIO | BEXAR | 780948, 1706491, 1699123 | NGUYEN, TOAN HUU | Respondent failed to clean and sanitize whirlp... | COS20180004289 | 78217 |
| 1 | EL PASO | EL PASO | 737708 | NGUYEN, HANH CONG | Respondent failed to clean and sanitize whirlp... | COS20180006594 | 79934 |
| 2 | LONGVIEW | GREGG | 731665 | NGUYEN, KHIEM VAN | Respondent failed to follow whirlpool foot spa... | COS20180000257 | 75604 |
| 3 | HOUSTON | HARRIS | 1347649, 760528 | NGUYEN, DIEP THI NGOC | Respondent failed to disinfect tools, implemen... | COS20180004915 | 77014 |
| 4 | SAN ANTONIO | BEXAR | 767339 | NGUYEN, LAN T-THUY | Respondent failed to clean, disinfect, and ste... | COS20180009255 | 78255 |
