# Texas Barber Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for barbers in Houston!

## Preparation: Knowing your tags

These questions are the same for every data set, and might not work exactly for yours.

### What is the tag and class name for every row of data?

The data is displayed inside a `<table>` (the only table in the whole HTML document).

We find the individual rows of data inside a set of `<tr>`-tags (i.e. the table rows).

Some `<tr>` elements have `style="background:#dedede;"`, others have `style="background:#ffffff;"`. The trailing semicolon matters when matching the attribute value exactly.

### What is the tag and class name for every person's name?

The name is in the first `<td>` of the row, inside the first `<span class="results_text">`.

### What is the tag and class name for the violation number?

Same as above, except that it is the last `<span>` inside the `<td>` rather than the first.

### What is the tag and class name for the description of their violation?

The description is in the third `<td>` of the `<tr>` described above.
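
The answers above can be checked against a small stand-in for one result row. The HTML here is a hypothetical fragment written to match the notes (three `<td>` cells, with `results_text` spans in the first); the names and values are made up, and the real page's markup may differ in detail.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the notes above, not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr style="background:#ffffff;">
    <td>
      <span class="results_text">DOE, JANE</span>
      <span class="results_text">Houston</span>
      <span class="results_text">BAR00000000001</span>
    </td>
    <td>1/1/2018</td>
    <td>Example violation description.</td>
  </tr>
</table>
"""

sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
row = sample_doc.find("tr")
cells = row.find_all("td")
spans = cells[0].find_all("span", class_="results_text")

name = spans[0].text.strip()               # first span in the first cell
violation_number = spans[-1].text.strip()  # last span in the first cell
description = cells[2].text.strip()        # third cell of the row
print(name, violation_number, description, sep=" | ")
```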

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [3]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[1].text` to get the text of the second `<tr>` element (index 1), which is the first row of results when the search succeeds.

- If the result starts with **\n\nPlease enter at least one (1) parameter**, you were NOT successful.
- If the result starts with **MONTES DE OCA, REINIER**, you were successful.

### Try to request the page however you think you should.

"Try" to do it, because it *will not work.* Once you've learned that it won't work, you should **ask how to do it on the board**.

In [4]:
url = "https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp"

In [5]:
response = requests.get(url)

In [6]:
doc = BeautifulSoup(response.text, "html.parser")

In [7]:
doc.find_all("tr")[1].text

"\n\nPlease enter at least one (1) parameter; two (2) if License Program Type is 'Cosmetologists'.\nFinal Orders - (Note: The earliest available Order Date is 7/30/2014.)\n\n"

### Try to request the page with the correct data parameters

Secret tip: It still won't work. **Ask why not on the board.**

In [8]:
formdata = {
    "pht_status": "BAR",
    "pht_lic": "",
    "pht_lnm": "",
    "pht_fnm": "",
    "pht_old_name": "",
    "phy_zip": "",
    "phy_city": "Houston",
    "phy_cnty": "-1",
    "B1": "Search"
}
headerdata = {
    "Referer": "https://www.tdlr.texas.gov/cimsfo/fosearch.asp"
}

In [9]:
response = requests.post(url, data=formdata, headers=headerdata)

In [10]:
doc = BeautifulSoup(response.text, "html.parser")
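
When a request keeps failing, it can help to look at exactly what `requests` would send and compare it with the browser's network tab. The sketch below builds the request without sending it, using `requests.Request(...).prepare()`. The form dictionary is a shortened copy of `formdata`, trimmed only to keep the printout readable; whether the server accepts the shortened form is not tested here.

```python
import requests

url = "https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp"
# Shortened copy of the form fields, for display purposes only.
formdata = {"pht_status": "BAR", "phy_city": "Houston", "phy_cnty": "-1", "B1": "Search"}
headerdata = {"Referer": "https://www.tdlr.texas.gov/cimsfo/fosearch.asp"}

# Build the request object; nothing goes over the network here.
prepared = requests.Request("POST", url, data=formdata, headers=headerdata).prepare()

print(prepared.method)
print(prepared.headers["Content-Type"])
print(prepared.headers["Referer"])
print(prepared.body)
```

Headers that `requests.post` adds on its own, such as its default `User-Agent`, do not appear because the request is prepared outside a session.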

### What is the smallest `curl` command that still gives you a result?

In [11]:
#didn't do curl because using windows, not set up yet...
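
If `curl` is not available yet, the command can at least be drafted from Python and run later. This is a rough sketch: `build_curl` is a hypothetical helper, the form fields are a trimmed example, and nothing here establishes which fields or headers the server actually requires.

```python
import shlex
import requests

def build_curl(url, data, headers):
    """Hypothetical helper: format a form POST as a curl command line."""
    # Let requests do the form encoding so the body matches what it would send.
    body = requests.Request("POST", url, data=data).prepare().body
    parts = ["curl", shlex.quote(url)]
    for key, value in headers.items():
        parts += ["-H", shlex.quote(f"{key}: {value}")]
    parts += ["--data", shlex.quote(body)]  # --data makes curl send a POST
    return " ".join(parts)

command = build_curl(
    "https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp",
    {"pht_status": "BAR", "phy_city": "Houston"},
    {"Referer": "https://www.tdlr.texas.gov/cimsfo/fosearch.asp"},
)
print(command)
```

To find the smallest working command, remove one `-H` header or form field at a time and rerun it until the result breaks. `shlex.quote` follows POSIX shell rules, so the quoting may need adjusting for the Windows command prompt.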

## Request the page with the correct data parameters AND the correct MINIMUM headers

This time it should work.
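
One way to confirm that it worked is to turn the check from "Try to scrape the page" into a small function. This is a sketch: `search_succeeded` and `FAILURE_PREFIX` are names invented here, and the function only tests for the failure message quoted earlier.

```python
from bs4 import BeautifulSoup

# Start of the message the site returns when the search parameters are missing.
FAILURE_PREFIX = "Please enter at least one (1) parameter"

def search_succeeded(html):
    """Return True if the second <tr> does not hold the failure message."""
    doc = BeautifulSoup(html, "html.parser")
    rows = doc.find_all("tr")
    if len(rows) < 2:
        return False  # no second row at all, so there are no results to read
    return not rows[1].text.strip().startswith(FAILURE_PREFIX)
```

With a live response this would be called as `search_succeeded(response.text)`.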

## Scraping

### Loop through each `tr` and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen? I'm happy to help if you ask on the board.
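
One possible way around the error is to skip any row that has no name span. The sketch below runs on hypothetical HTML (the header labels and the two people are made up) rather than on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one header row without spans, then two data rows.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Order</th><th>Basis for Order</th></tr>
  <tr><td><span class="results_text">DOE, JANE</span><span class="results_text">BAR00000000001</span></td><td></td><td>First example.</td></tr>
  <tr><td><span class="results_text">ROE, RICHARD</span><span class="results_text">BAR00000000002</span></td><td></td><td>Second example.</td></tr>
</table>
"""

sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
names = []
for row in sample_doc.find_all("tr"):
    span = row.find("span", class_="results_text")
    if span is None:
        continue  # header row: there is no name to print
    names.append(span.text.strip())
    print(names[-1])
```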

In [18]:
rows = doc.find_all("tr", style="background:#ffffff;")

In [19]:
rows += doc.find_all("tr", style="background:#dedede;")
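
Collecting the white rows and then appending the grey rows puts all rows of one color before the other, so the list no longer follows the order on the page. If page order matters, one alternative is to pass a compiled regular expression as the `style` filter so both colors are matched in a single pass. The sketch uses hypothetical HTML to show the difference.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical rows with alternating background colors.
SAMPLE_HTML = """
<table>
  <tr><td>header</td></tr>
  <tr style="background:#ffffff;"><td>row 1</td></tr>
  <tr style="background:#dedede;"><td>row 2</td></tr>
  <tr style="background:#ffffff;"><td>row 3</td></tr>
</table>
"""
sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Two separate searches, concatenated: grouped by color.
grouped = (sample_doc.find_all("tr", style="background:#ffffff;")
           + sample_doc.find_all("tr", style="background:#dedede;"))

# One search with a regex on the style attribute: document order is kept.
row_style = re.compile(r"background:#(?:ffffff|dedede)")
ordered = sample_doc.find_all("tr", style=row_style)

print([row.text.strip() for row in grouped])
print([row.text.strip() for row in ordered])
```

The header row is skipped in both cases because it has no `style` attribute.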

In [24]:
cases = []

for row in rows:
    current = {}

    # The first <td> holds the name and violation number; the third holds the description.
    tds = row.find_all("td")
    spans = tds[0].find_all("span", class_="results_text")

    name = spans[0]        # first span: the person's name
    violation = spans[-1]  # last span: the violation number
    describe = tds[2]

    current["name"] = name.text
    print(name.text)

    if violation:
        current["violation"] = violation.text.rstrip()
        print(violation.text.rstrip())

    current["describe"] = describe.text
    print(describe.text)

    cases.append(current)
    print("----------------")

MONTES DE OCA, REINIER 
BAR20170009735
Respondent performed barbering without the required license.
----------------
CHAPMAN, JESSICA 
BAR20160014463
Respondent failed to electronically submit to the Department at least one time per month student's accrued hours.
----------------
GONZALES, DAVID 
BAR20160024898
Respondent leased space in a barber shop to an individual who engaged in the practice of barbering but had not obtained a barber license.
----------------
ARMSTEAD, CEDRIC J
BAR20170017750
The Respondent's license was revoked upon Respondent's imprisonment in a penitentiary.
----------------
TREJO, BLADIMAR A
BAR20170015712
Respondent performed barbering without the required license.
----------------
HOPKINS, JOSHUA 
BAR20170004945
Respondent performed barbering without the required license.
----------------
HEATH, LOLETHA N
BAR20170008862
Respondent leased space in a barber shop to an individual who engaged in the practice of barbering with an expired barber license.
----------

## Loop through each `tr`, printing each violation description

- TIP: What is the container tag name for it?
- TIP: You'll get an error even if you're ALMOST right - which row is causing the problem?
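
To see which row causes the problem, one option is to number the rows with `enumerate` and catch the `IndexError` instead of letting it stop the loop. The HTML below is hypothetical: it assumes a header row with no `<td>` cells, which may not match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical table: the header row uses <th>, so it has no <td> cells.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Order</th><th>Basis for Order</th></tr>
  <tr><td>DOE, JANE</td><td></td><td>First example.</td></tr>
  <tr><td>ROE, RICHARD</td><td></td><td>Second example.</td></tr>
</table>
"""
sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

descriptions = []
problem_rows = []
for index, row in enumerate(sample_doc.find_all("tr")):
    cells = row.find_all("td")
    try:
        description = cells[2].text.strip()  # third <td> holds the description
    except IndexError:
        problem_rows.append(index)           # remember which row had too few cells
        continue
    descriptions.append(description)

print("rows without a description:", problem_rows)
print(descriptions)
```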

In [None]:
#we did this together in class groupwork on the other computer, just showing our final code here

## Loop through each `tr`, printing the complaint number

- TIP: It should be the last piece of the first `td`

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number

Create a new dictionary for each `tr` (except the header).

### Save that to a CSV

In [25]:
df = pd.DataFrame(cases)

In [26]:
df.to_csv("cases.csv", index=False)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.

In [27]:
df_in = pd.read_csv("cases.csv")

In [28]:
df_in.head(10)

|   | describe | name | violation |
|---|---|---|---|
| 0 | Respondent performed barbering without the req... | MONTES DE OCA, REINIER | BAR20170009735 |
| 1 | Respondent failed to electronically submit to ... | CHAPMAN, JESSICA | BAR20160014463 |
| 2 | Respondent leased space in a barber shop to an... | GONZALES, DAVID | BAR20160024898 |
| 3 | The Respondent's license was revoked upon Resp... | ARMSTEAD, CEDRIC J | BAR20170017750 |
| 4 | Respondent performed barbering without the req... | TREJO, BLADIMAR A | BAR20170015712 |
| 5 | Respondent performed barbering without the req... | HOPKINS, JOSHUA | BAR20170004945 |
| 6 | Respondent leased space in a barber shop to an... | HEATH, LOLETHA N | BAR20170008862 |
| 7 | Respondent performed barbering without the req... | MONTES DE OCA, REINIER | BAR20170009735 |
| 8 | Respondent performed barbering without the req... | TOP STYLES BARBER SHOP | BAR20170015711 |
| 9 | The Respondent's license was revoked upon Resp... | SHEPHARD, JAMES C | BAR20170012408 |
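
An extra `Unnamed: 0` column typically appears when a CSV is written with its index and then read back. The sketch below shows the difference in memory, using `io.StringIO` instead of a file and two made-up records; `roundtrip_columns` is a hypothetical helper.

```python
import io
import pandas as pd

# Made-up records with the same keys as the scraped dictionaries.
cases = [
    {"name": "DOE, JANE", "violation": "BAR00000000001", "describe": "First example."},
    {"name": "ROE, RICHARD", "violation": "BAR00000000002", "describe": "Second example."},
]
df = pd.DataFrame(cases)

def roundtrip_columns(frame, **to_csv_kwargs):
    """Write the frame to an in-memory CSV, read it back, and return the column names."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return list(pd.read_csv(buffer).columns)

with_index = roundtrip_columns(df)                   # index saved as a nameless column
without_index = roundtrip_columns(df, index=False)   # index left out of the file

print(with_index)
print(without_index)
```

A quick check on a real file would be `any(col.startswith("Unnamed") for col in df_in.columns)`.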
