# Texas Barber Violations

Texas has a system for [searching for license violations](https://www.tdlr.texas.gov/cimsfo/fosearch.asp). You're going to search for barbers in Houson!

## Preparation: Knowing your tags

These questions are the same for every data set, and might not work exactly for yours.

### What is the tag and class name for every row of data?

In [77]:
from bs4 import BeautifulSoup
import requests

# Name and Location	Order	Basis for Order

### What is the tag and class name for every person's name?

### What is the tag and class name for the violation number?

### What is the tag and class name for the description of their violation?

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[1].text` to get the text of the first `<tr>` element.

- If the result starts with  **nPlease enter at least one (1) parameter** you were NOT successful.
- If the result starts with **MONTES DE OCA, REINIER**, you were successful.

### Try to request the page however you think you should.

"Try" to do it, because it *will not work.* Once you've learned that it won't work, you should **ask how to do it on the board**.

In [78]:
# pht_status:BAR
# pht_lic:
# pht_lnm:
# pht_fnm:
# pht_oth_name:
# phy_city:-1
# phy_cnty:-1
# phy_zip:
# B1:Search

data = {
    'pht_status':'BAR',
    'pht_lic':'',
    'pht_lnm':'',
    'pht_fnm':'',
    'pht_oth_name':'',
    'phy_city':'HOUSTON',
    'phy_cnty':'-1',
    'phy_zip':'',
    'B1':'Search'
}

### Try to request the page with the correct data parameters

Secret tip: It still won't work. **Ask why not on the board.**

In [79]:
headers = {
    'Referer': 'https://www.tdlr.texas.gov/cimsfo/'
}

### What is the smallest `curl` command that still gives you a result?

In [80]:
# Referer: 'https://www.tdlr.texas.gov/cimsfo/fosearch.asp

## Request the page with the correct data parameters AND the correct MINIMUM headers

This time it should work.

In [81]:
import pandas

In [82]:
response = requests.post("https://www.tdlr.texas.gov/cimsfo/fosearch_results.asp", data = data, headers = headers)
doc = BeautifulSoup(response.text, "html.parser")
table = doc.find_all('tr')

for x in table:
    print(x.prettify())

<tr>
 <th style="padding:4px; text-align:left; background:#c2c2c2;">
  Name and Location
 </th>
 <th style="padding:4px; text-align:left; background:#c2c2c2;">
  Order
 </th>
 <th style="padding:4px; text-align:left; background:#c2c2c2;">
  Basis for Order
 </th>
</tr>

<tr style="background:#ffffff;">
 <td style="padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:22%;">
  <span class="results_text">
   MONTES DE OCA, REINIER
  </span>
  <br/>
  <br/>
  <span class="default_text">
   Company:
  </span>
  <span class="results_text">
   LA BENDICION
  </span>
  <br/>
  <span class="default_text">
   City:
  </span>
  <span class="results_text">
   HOUSTON
  </span>
  <br/>
  <span class="default_text">
   County:
  </span>
  <span class="results_text">
   HARRIS
  </span>
  <br/>
  <span class="default_text">
   Zip Code:
  </span>
  <span class="results_text">
   77072
  </span>
  <br/>
  <br/>
  <br/>
  <span class="default_text">
   License:
  </spa

## Scraping

### Loop through each `tr` and print each person's name

You'll get an error because the first one doesn't have a name. How do you make that not happen? I'm happy to help if you ask on the board.

In [83]:
for a in table:
    name = a.find("span", attrs = {'class': "results_text"})
    if not name:
        continue
    print(name.text)

MONTES DE OCA, REINIER 
ALFORD, RAYMOND 
CHAPMAN, JESSICA 
SALAZAR-ALVAREZ, SAMUEL 
GONZALES, DAVID 
FLORES, CHRISTOPHER 
ARMSTEAD, CEDRIC J
MORAH, PATRICK 
TREJO, BLADIMAR A
DAVIS, RICHARD D
HOPKINS, JOSHUA 
NINO, ROBERT 
HEATH, LOLETHA N
SALAZAR-ALVAREZ, SAMUEL 
MONTES DE OCA, REINIER 
MARLEN'S BEAUTY SALON LIC 747062
TOP STYLES BARBER SHOP
SUTTON, EMANUEL B
SHEPHARD, JAMES C
HERNANDEZ, MARIA DIOCELINA 
WILLIAMS, DONTUEL 
JOHNSON, JEFFERY J
PERFECTION BARBER & HAIR STUDIO
HUERTA, FRANCISCO 
TIPTON, SELINA I
ARREOLA, ERIC D
HARRISON, OTTO M
RIVERA TORRES, ANGEL D
PECK, MARVIN 
MOTA SOTO, CRISTIAN D
WADDLE, EDDIE D
SON, YOUNG J
HILL, BRIAN 
BROWN, DELRICK JAREL 
FRANKLIN, KELVIN 
LEDET, LEON 
WILLIAMS, DONTUEL 
LACY, JUSTIN J
MAKE THE CUT
ARELLANO, GREGORY F
MACEDO, ANTONIO 
MILLER, SHAWN ERIC 
HAYWARD, ABBIE DEAN 
BROWN, CHARLES EARL 
MCQUEEN, IDA M
MCQUEEN, IDA M
CAESAR, RON 
MORRIS, VICTOR B
NOLAN, CHRIS B
BICKHAM, DONNELL 
LOUIS, DIONNE N
HARRELL, KENTON D
SUBRAHMANIAN, CHITRA N
FR

## Loop through each `tr`, printing each violation description

- TIP: What is the container tag name for it?
- TIP: You'll get an error even if you're ALMOST right - which row is causing the problem?

In [84]:
for x in table:
    order = x.find("td", attrs = {"style": "padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:39%;"})
    description = order
    if not order:
        continue
    print(order.text)

Date: 5/24/2017Respondent is assessed an administrative penalty in the amount of $1,125.
Date: 5/24/2017Respondent is ordered to immediately cease and desist from acting as or impersonating a licensed barber and/or performing or offering to provide barbering services for customers, clients, or the public.
Date: 5/23/2017Respondent is assessed an administrative penalty in the amount of $300.
Date: 5/23/2017Respondent is assessed an administrative penalty in the amount of $1,125.
Date: 5/19/2017Respondent is assessed an administrative penalty in the amount of $1,125.
Date: 5/12/2017Respondent is assessed an administrative penalty in the amount of $1,125.
Date: 5/10/2017Respondent's Class A Barber license was revoked by operation of law on 01/12/16.
Date: 5/9/2017Respondent is assessed an administrative penalty in the amount of $1,000.
Date: 4/28/2017Respondent is ordered to immediately cease and desist from acting as or impersonating a licensed barber and/or performing or offering to pro

## Loop through each `tr`, printing the complaint number

- TIP: It should be the last piece of the fist `td`

In [85]:
for x in table:
    results = x.find_all("span", attrs = {'class': "results_text"})
    if not results:
        continue
    complaint = results[-2]
    print(complaint.text)

BAR20170009735      
BAR20170013061      
BAR20160014463      
BAR20170009706      
BAR20160024898      
BAR20170003858      
BAR20170017750      
BAR20170001067      
BAR20170015712      
BAR20160026976      
BAR20170004945      
BAR20170005752      
BAR20170008862      
BAR20170009706      
BAR20170009735      
BAR20170010211      
BAR20170015711      
BAR20170005607      
BAR20170012408      
BAR20160015455      
BAR20170004000      
BAR20170004622      
BAR20170009953      
BAR20160019178      
BAR20170003998      
BAR20170005585      
BAR20170004247      
BAR20170004644      
BAR20170001084      
BAR20170003233      
BAR20170007267      
BAR20170004607      
BAR20170004726      
BAR20170000258      
BAR20170000872      
BAR20170000888      
BAR20170004000      
BAR20170001296      
BAR20170001765      
BAR20160000930      
BAR20160012081      
BAR20160023793      
BAR20160020239      
BAR20160025221      
BAR20160003560      
BAR20160014501      
BAR20160020292      
BAR2016001971

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Person's name
- Violation description
- Violation number

Create a new dictionary for each `tr` (except the header).

In [89]:
barbor_list = []

for x in table:
    barbor_dict = {}
    person = a.find("span", attrs = {'class': "results_text"})
    if not person:
        continue
    barbor_dict["Person's name"] = person.text
    
    order = x.find("td", attrs = {"style": "padding:4px; text-align:left; font-size:11px; font:Arial, Helvetica, sans-serif; width:39%;"})
    description = order
    if not order:
        continue
    barbor_dict["Violation description"] = order.text
    
    results = x.find_all("span", attrs = {'class': "results_text"})
    if not results:
        continue
    complaint = results[-2]
    barbor_dict["Violation number"] = complaint.text
    
    barbor_list.append(barbor_dict)
    
barbor_list

[{"Person's name": 'RAND, CHARLES ',
  'Violation description': 'Date: 5/24/2017Respondent is assessed an administrative penalty in the amount of $1,125.',
  'Violation number': 'BAR20170009735      '},
 {"Person's name": 'RAND, CHARLES ',
  'Violation description': 'Date: 5/24/2017Respondent is ordered to immediately cease and desist from acting as or impersonating a licensed barber and/or performing or offering to provide barbering services for customers, clients, or the public.',
  'Violation number': 'BAR20170013061      '},
 {"Person's name": 'RAND, CHARLES ',
  'Violation description': 'Date: 5/23/2017Respondent is assessed an administrative penalty in the amount of $300.',
  'Violation number': 'BAR20160014463      '},
 {"Person's name": 'RAND, CHARLES ',
  'Violation description': 'Date: 5/23/2017Respondent is assessed an administrative penalty in the amount of $1,125.',
  'Violation number': 'BAR20170009706      '},
 {"Person's name": 'RAND, CHARLES ',
  'Violation description

### Save that to a CSV

In [90]:
df = pd.DataFrame(barbor_list)

### Open the CSV file and examine the first few. Make sure you didn't save an extra weird unnamed column.

In [91]:
df.head()

Unnamed: 0,Person's name,Violation description,Violation number
0,"RAND, CHARLES",Date: 5/24/2017Respondent is assessed an admin...,BAR20170009735
1,"RAND, CHARLES",Date: 5/24/2017Respondent is ordered to immedi...,BAR20170013061
2,"RAND, CHARLES",Date: 5/23/2017Respondent is assessed an admin...,BAR20160014463
3,"RAND, CHARLES",Date: 5/23/2017Respondent is assessed an admin...,BAR20170009706
4,"RAND, CHARLES",Date: 5/19/2017Respondent is assessed an admin...,BAR20160024898
