# Scraping mine data with `.apply`

## The pages we'll be looking at

If I want to read specific information about a specific mine, it takes a few steps. **Do these steps with your browser before you try any programming.**

1. Visit the [Mine Data Retrieval System](https://arlweb.msha.gov/drs/drshome.htm)
2. Scroll down to **Mine Identification Number (ID) Search**
3. Type in a mine ID number, such as `3503598`, and click **Search**
4. I'm on a page! It lists the MINE NAME and MINE OWNER.

After searching for and finding a mine, I can use this page to **find reports about this mine**. Some of the reports are on accidents, violations, inspections, health samples and more. To get those reports:

1. Search for a mine (if you haven't already)
2. Scroll down and change **Beginning Date** to `1/1/1995` (violation reports begin in 1995, accidents begin in 1983)
3. Select the report type of `Violations`
4. Click **Get Report**
5. I'm on a page! It lists ALL OF THE MINE'S VIOLATIONS.

By changing the report type, you can pull up all sorts of different data about the same mine.

# Doing this programmatically

## First, scraping a single page

### Import your imports

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

import pandas as pd
import time

### Searching for a mine

Visit the [Mine Data Retrieval System](https://arlweb.msha.gov/drs/drshome.htm) and use Selenium to search for `3503598`.

- *TIP: You might need to use the Selenium code to scroll down to the right spot on the page. Or not!*
- *TIP: Use `.send_keys` to type into the box*
- *TIP: On pages that never change, you can usually just use XPath if you're feeling lazy*

In [2]:
driver = webdriver.Chrome()

driver.get("https://arlweb.msha.gov/drs/drshome.htm")

In [3]:
search_field = driver.find_element_by_name('MineId')
driver.execute_script("arguments[0].scrollIntoView(true)", search_field)
search_field.send_keys('3503598')
driver.find_element_by_xpath('//*[@id="content"]/table[3]/tbody/tr[3]/td[2]/input').click()

### Finding reports

On the "Report Selection Page" (where you should be after you search), use Selenium to...

- Change the **Beginning Date** to `1/1/1995`
- Select the report type of `Violations`
- Click **Get Report**

.

- *TIP: Remember, if something isn't on the page Selenium can't click it!*

In [4]:
bdate = driver.find_element_by_name('BDate')
bdate.send_keys('1/1/1995')

In [5]:
type_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table[3]/tbody/tr[2]/td[2]/table/tbody/tr[1]/td/input')
type_button.click()

In [6]:
get_report_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table[3]/tbody/tr[3]/td[2]/input')
get_report_button.click()

### Saving reports

Save all of the rows of data on that page into a new dataframe. Each column on the page gets its own column in the dataframe, **and you also need to save the URL from the 'Standard' column.** Here, I even made you a blank dictionary:

```python
data = {}
data['violator'] = ''
data['contract_id'] = ''
data['citation_no'] = ''
data['case_no'] = ''
data['date_issues'] = ''
data['final_order_date'] = ''
data['section_of_act'] = ''
data['date_terminated'] = ''
data['citation'] = ''
data['s_and_s'] = ''
data['standard'] = ''
data['standard_url'] = ''
data['proposed_penalty'] = ''
data['citation_status'] = ''
data['current_penalty'] = ''
data['amount_paid'] = ''
```

- *TIP: Some of those table rows aren't what you want. How can you tell them apart from the good ones? (the previous mine owner ones are okay, I just mean the weird headers)*
- *TIP: I sense `.find_elements` + a lot of square brackets*
- *TIP: This is just like scraping a search results page!*
- *TIP: For the URL, you'll need to find the `a` inside of the cell*
- *TIP: class name is sadly not going to save your life here, because some of the `tr`s and `td`s have the same class! It's stupid. But there's a trick: CSS selectors! Something like `div#container` finds a `div` with the id of `container`, while `span.important` finds a `span` with the class of `important`. It should be helpful! And use `.find_elements_by_` + tab to see what the command is*
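
To make the `tag.class` idea concrete without opening a browser, here is a small standard-library sketch. The HTML snippet is made up for illustration (the real report page's markup may differ), and `CellCollector` is a hypothetical helper, not part of Selenium. It keeps only `td` elements with the class `drsviols`, which is what the selector `td.drsviols` would match, and it skips the header row even though that row's `tr` carries the same class.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td class="drsviols">, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'tr':
            # Every row starts a new (possibly empty) list of cells.
            self.rows.append([])
        elif tag == 'td' and 'drsviols' in classes and self.rows:
            # Tag AND class must match, like the selector td.drsviols.
            self._in_cell = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


# Hypothetical markup: a header row and a data row that share a class name.
SAMPLE = '''
<table>
  <tr class="drsviols"><th class="drsviols">Violator</th><th class="drsviols">Citation No.</th></tr>
  <tr class="drsviols"><td class="drsviols">Newberg Rock &amp; Dirt</td><td class="drsviols">8992368</td></tr>
</table>
'''

parser = CellCollector()
parser.feed(SAMPLE)

# Rows without any matching cells (the header) come back empty, so drop them.
data_rows = [row for row in parser.rows if row]
print(data_rows)
```

In Selenium 3, which this notebook uses, the equivalent lookup on a row would be `report.find_elements_by_css_selector('td.drsviols')`.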

In [7]:
reports = driver.find_elements_by_xpath('//*[@id="content"]/table[5]/tbody/tr')
#print(reports)

violations = []

for report in reports[1:]:
    
    print('---------')

    data = {}
    
    if len(report.find_elements_by_class_name('drsviols')) != 0:
        print(report.find_element_by_class_name('drsviols').text)

#row = report.find_elements_by_class_name('drsviols')

        data['violator'] = report.find_elements_by_class_name('drsviols')[0].text
        print(report.find_elements_by_class_name('drsviols')[0].text)

        data['contract_id'] = report.find_elements_by_class_name('drsviols')[1].text
        print(report.find_elements_by_class_name('drsviols')[1].text)

        data['citation_no'] = report.find_elements_by_class_name('drsviols')[2].text
        print(report.find_elements_by_class_name('drsviols')[2].text)

        data['case_no'] = report.find_elements_by_class_name('drsviols')[3].text
        print(report.find_elements_by_class_name('drsviols')[3].text)

        data['date_issues'] = report.find_elements_by_class_name('drsviols')[4].text
        print(report.find_elements_by_class_name('drsviols')[4].text)

        data['final_order_date'] = report.find_elements_by_class_name('drsviols')[5].text
        print(report.find_elements_by_class_name('drsviols')[5].text)

        data['section_of_act'] = report.find_elements_by_class_name('drsviols')[6].text
        print(report.find_elements_by_class_name('drsviols')[6].text)

        data['date_terminated'] = report.find_elements_by_class_name('drsviols')[7].text
        print(report.find_elements_by_class_name('drsviols')[7].text)

        data['citation'] = report.find_elements_by_class_name('drsviols')[8].text
        print(report.find_elements_by_class_name('drsviols')[8].text)

        data['s_and_s'] = report.find_elements_by_class_name('drsviols')[9].text
        print(report.find_elements_by_class_name('drsviols')[9].text)

        data['standard'] = report.find_elements_by_class_name('drsviols')[10].text
        print(report.find_elements_by_class_name('drsviols')[10].text)

        data['standard_url'] = report.find_elements_by_class_name('drsviols')[10].find_element_by_tag_name('a').get_attribute('href')
        print(report.find_elements_by_class_name('drsviols')[10].find_element_by_tag_name('a').get_attribute('href'))

        data['proposed_penalty'] = report.find_elements_by_class_name('drsviols')[11].text
        print(report.find_elements_by_class_name('drsviols')[11].text)

        data['citation_status'] = report.find_elements_by_class_name('drsviols')[12].text
        print(report.find_elements_by_class_name('drsviols')[12].text)

        data['current_penalty'] = report.find_elements_by_class_name('drsviols')[13].text
        print(report.find_elements_by_class_name('drsviols')[13].text)

        data['amount_paid'] = report.find_elements_by_class_name('drsviols')[14].text
        print(report.find_elements_by_class_name('drsviols')[14].text)

        violations.append(data)



---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8992368  
000440809
5/2/2017
7/23/2017 
104(a)
5/2/2017
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf
116.00
Closed
116.00 
116.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8992214  
000435717
2/7/2017
5/21/2017 
104(a)
2/8/2017
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf
116.00
Closed
116.00 
116.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8872462  
000398910
10/21/2015
1/13/2016 
104(a)
10/27/2015
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-14107.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8872461  
000398910
10/21/2015
1/13/2016 
104(a)
10/27/2015
C
N
56.12004
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-12004.pdf
100.00
Closed
100.00 
100.00 
---

100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479612  
000180614
2/26/2009
5/22/2010 
104(a)
3/4/2009
C
Y
56.9200(d)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-9200.pdf
263.00
Closed
263.00 
263.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479619  
000180614
2/26/2009
5/22/2010 
104(a)
2/26/2009
C
N
56.14100(a)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-14100.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479616  
000180614
2/26/2009
5/22/2010 
104(a)
3/4/2009
C
N
56.14103(b)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-14103.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479617  
000180614
2/26/2009
5/8/2009 
104(a)
3/4/2009
C
N
56.4201(a)(2)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-4201.pdf
100.00
Closed
100.0

In [9]:
len(violations)

49
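
The loop above looks up the same cells again for every field. One possible way to shorten it, sketched here on plain strings so it runs without a browser, is to pair a list of column names with the cell texts using `zip`. `COLUMNS` and `row_to_dict` are names invented for this sketch, and stripping whitespace is an extra step the original loop does not do.

```python
# Column names in the order the cells appear on the page (indexes 0-14 above).
COLUMNS = [
    'violator', 'contract_id', 'citation_no', 'case_no', 'date_issues',
    'final_order_date', 'section_of_act', 'date_terminated', 'citation',
    's_and_s', 'standard', 'proposed_penalty', 'citation_status',
    'current_penalty', 'amount_paid',
]


def row_to_dict(cell_texts, standard_url=None):
    """Pair each cell's text with its column name, in page order."""
    data = dict(zip(COLUMNS, (text.strip() for text in cell_texts)))
    # The link lives inside the 'standard' cell, so it is passed in separately.
    data['standard_url'] = standard_url
    return data


# Cell texts copied from the first row printed above.
example_cells = [
    'Newberg Rock & Dirt', '  ', '8992368  ', '000440809', '5/2/2017',
    '7/23/2017 ', '104(a)', '5/2/2017', 'C', 'N', '56.14107(a)',
    '116.00', 'Closed', '116.00 ', '116.00 ',
]
example = row_to_dict(
    example_cells,
    standard_url='http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf',
)
print(example['citation_no'], example['amount_paid'])
```

Inside the scraping loop this would be fed with something like `[cell.text for cell in report.find_elements_by_class_name('drsviols')]`, so each row is searched once instead of thirty times.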

### Saving that data

Save the dataframe to a CSV file called `3503598-violations.csv` (that's the mine ID)

In [10]:
df = pd.DataFrame(violations)
df.head()

| | amount_paid | case_no | citation | citation_no | citation_status | contract_id | current_penalty | date_issues | date_terminated | final_order_date | proposed_penalty | s_and_s | section_of_act | standard | standard_url | violator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 116.0 | 440809 | C | 8992368 | Closed | | 116.0 | 5/2/2017 | 5/2/2017 | 7/23/2017 | 116.0 | N | 104(a) | 56.14107(a) | http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-... | Newberg Rock & Dirt |
| 1 | 116.0 | 435717 | C | 8992214 | Closed | | 116.0 | 2/7/2017 | 2/8/2017 | 5/21/2017 | 116.0 | N | 104(a) | 56.14107(a) | http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-... | Newberg Rock & Dirt |
| 2 | 100.0 | 398910 | C | 8872462 | Closed | | 100.0 | 10/21/2015 | 10/27/2015 | 1/13/2016 | 100.0 | N | 104(a) | 56.14107(a) | http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-... | Newberg Rock & Dirt |
| 3 | 100.0 | 398910 | C | 8872461 | Closed | | 100.0 | 10/21/2015 | 10/27/2015 | 1/13/2016 | 100.0 | N | 104(a) | 56.12004 | http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-... | Newberg Rock & Dirt |
| 4 | 100.0 | 383840 | C | 8790965 | Closed | | 100.0 | 4/15/2015 | 4/15/2015 | 7/17/2015 | 100.0 | N | 104(a) | 56.13015 | http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-... | Newberg Rock & Dirt |
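
The cell above builds the dataframe but never writes it out. A minimal sketch of the save step follows, assuming the filename is built from the mine ID and the file goes to a temporary folder so the example doesn't clutter your working directory. The two rows are a cut-down sample of the scraped data, not the full set of columns.

```python
import os
import tempfile

import pandas as pd

mine_id = '3503598'

# A cut-down sample of what the scraping loop collects.
violations = [
    {'citation_no': '8992368', 'case_no': '000440809', 'proposed_penalty': '116.00'},
    {'citation_no': '8992214', 'case_no': '000435717', 'proposed_penalty': '116.00'},
]
df = pd.DataFrame(violations)

# index=False leaves the 0, 1, 2... row labels out of the file.
out_path = os.path.join(tempfile.mkdtemp(), mine_id + '-violations.csv')
df.to_csv(out_path, index=False)

# Reading back with dtype=str keeps leading zeros such as 000440809.
round_trip = pd.read_csv(out_path, dtype=str)
print(round_trip)
```

Without `dtype=str` on the way back in, pandas would turn `000440809` into the number `440809`.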


## Put that all in ONE cell that runs correctly

The **entire process**, from searching to saving as a CSV

In [11]:
driver.get("https://arlweb.msha.gov/drs/drshome.htm")

search_field = driver.find_element_by_name('MineId')
driver.execute_script("arguments[0].scrollIntoView(true)", search_field)
search_field.send_keys('3503598')
driver.find_element_by_xpath('//*[@id="content"]/table[3]/tbody/tr[3]/td[2]/input').click()

time.sleep(1)

bdate = driver.find_element_by_name('BDate')
bdate.send_keys('1/1/1995')

type_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table[3]/tbody/tr[2]/td[2]/table/tbody/tr[1]/td/input')
type_button.click()

get_report_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table[3]/tbody/tr[3]/td[2]/input')
get_report_button.click()

time.sleep(1)

reports = driver.find_elements_by_xpath('//*[@id="content"]/table[5]/tbody/tr')
#print(reports)

violations = []

for report in reports[1:]:
    
    print('---------')

    data = {}
    
    if len(report.find_elements_by_class_name('drsviols')) != 0:
        print(report.find_element_by_class_name('drsviols').text)

#row = report.find_elements_by_class_name('drsviols')

        data['violator'] = report.find_elements_by_class_name('drsviols')[0].text
        print(report.find_elements_by_class_name('drsviols')[0].text)

        data['contract_id'] = report.find_elements_by_class_name('drsviols')[1].text
        print(report.find_elements_by_class_name('drsviols')[1].text)

        data['citation_no'] = report.find_elements_by_class_name('drsviols')[2].text
        print(report.find_elements_by_class_name('drsviols')[2].text)

        data['case_no'] = report.find_elements_by_class_name('drsviols')[3].text
        print(report.find_elements_by_class_name('drsviols')[3].text)

        data['date_issues'] = report.find_elements_by_class_name('drsviols')[4].text
        print(report.find_elements_by_class_name('drsviols')[4].text)

        data['final_order_date'] = report.find_elements_by_class_name('drsviols')[5].text
        print(report.find_elements_by_class_name('drsviols')[5].text)

        data['section_of_act'] = report.find_elements_by_class_name('drsviols')[6].text
        print(report.find_elements_by_class_name('drsviols')[6].text)

        data['date_terminated'] = report.find_elements_by_class_name('drsviols')[7].text
        print(report.find_elements_by_class_name('drsviols')[7].text)

        data['citation'] = report.find_elements_by_class_name('drsviols')[8].text
        print(report.find_elements_by_class_name('drsviols')[8].text)

        data['s_and_s'] = report.find_elements_by_class_name('drsviols')[9].text
        print(report.find_elements_by_class_name('drsviols')[9].text)

        data['standard'] = report.find_elements_by_class_name('drsviols')[10].text
        print(report.find_elements_by_class_name('drsviols')[10].text)

        data['standard_url'] = report.find_elements_by_class_name('drsviols')[10].find_element_by_tag_name('a').get_attribute('href')
        print(report.find_elements_by_class_name('drsviols')[10].find_element_by_tag_name('a').get_attribute('href'))

        data['proposed_penalty'] = report.find_elements_by_class_name('drsviols')[11].text
        print(report.find_elements_by_class_name('drsviols')[11].text)

        data['citation_status'] = report.find_elements_by_class_name('drsviols')[12].text
        print(report.find_elements_by_class_name('drsviols')[12].text)

        data['current_penalty'] = report.find_elements_by_class_name('drsviols')[13].text
        print(report.find_elements_by_class_name('drsviols')[13].text)

        data['amount_paid'] = report.find_elements_by_class_name('drsviols')[14].text
        print(report.find_elements_by_class_name('drsviols')[14].text)

        violations.append(data)

df = pd.DataFrame(violations)

df.to_csv("3503598-violations.csv", index=False)

---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8992368  
000440809
5/2/2017
7/23/2017 
104(a)
5/2/2017
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf
116.00
Closed
116.00 
116.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8992214  
000435717
2/7/2017
5/21/2017 
104(a)
2/8/2017
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf
116.00
Closed
116.00 
116.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8872461  
000398910
10/21/2015
1/13/2016 
104(a)
10/27/2015
C
N
56.12004
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-12004.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8872462  
000398910
10/21/2015
1/13/2016 
104(a)
10/27/2015
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-14107.pdf
100.00
Closed
100.00 
100.00 
---

100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479620  
000180614
2/26/2009
5/8/2009 
104(a)
3/4/2009
C
N
56.12028
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-12028.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479615  
000180614
2/26/2009
5/22/2010 
104(a)
3/4/2009
C
N
56.14100(b)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-14100.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479617  
000180614
2/26/2009
5/8/2009 
104(a)
3/4/2009
C
N
56.4201(a)(2)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-4201.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479613  
000180614
2/26/2009
5/22/2010 
104(a)
2/26/2009
C
Y
56.3200
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-3200.pdf
100.00
Closed
100.00 
100.
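
The one-cell version relies on `time.sleep(1)` to give each page time to load, which is either too long or not long enough depending on the connection. Selenium's own answer is `WebDriverWait`, which is imported at the top of this notebook but never used. The sketch below is not Selenium code: it is a plain-Python stand-in, with the hypothetical name `wait_until`, that shows the same polling idea and can run without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll_every=0.25):
    """Call `condition` repeatedly until it returns something truthy.

    Returns that truthy value, or raises TimeoutError once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition was not met within %s seconds' % timeout)
        time.sleep(poll_every)


# Stand-in for a page element that only shows up on the third check.
checks = []


def ready_on_third_try():
    checks.append(1)
    return len(checks) >= 3


print(wait_until(ready_on_third_try, timeout=5, poll_every=0.01))
print(len(checks))
```

With a live driver, the condition could be something like `lambda: driver.find_elements_by_name('BDate')`, since an empty list counts as "not there yet".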

# Using .apply to find data about SEVERAL mines

The file `mines-subset.csv` has a list of mine IDs. We're going to scrape the violations for each of those mines.

### Open up `mines-subset.csv` and save it into a dataframe

In [13]:
mines_df = pd.read_csv('mines-subset.csv')
mines_df.head()

| | id |
|---|---|
| 0 | 4104757 |
| 1 | 801306 |
| 2 | 3609931 |


In [14]:
mines_df.dtypes

id    int64
dtype: object

### Open up `mines-subset.csv` in a text editor, then look at your dataframe. Is something different about them? If so, make them match.

- *TIP: You can zero fill if you want, but another option is that when reading in a CSV, `dtype='str'` will force everything to be a string*

In [15]:
mines_df = pd.read_csv('mines-subset.csv', dtype={'id': str})
mines_df.head()

| | id |
|---|---|
| 0 | 4104757 |
| 1 | 801306 |
| 2 | 3609931 |


In [16]:
mines_df.dtypes

id    object
dtype: object
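
Here is a small sketch of both options from the tip, run on an in-memory CSV so it doesn't need the real file. The leading zero on the second ID and the seven-character width are assumptions made for the example, based on the other IDs in this notebook being seven digits long.

```python
import io

import pandas as pd

# Hypothetical file contents: the second ID starts with a zero.
csv_text = "id\n4104757\n0801306\n3609931\n"

# Default behaviour: pandas guesses integers, and the leading zero is lost.
as_int = pd.read_csv(io.StringIO(csv_text))

# Option 1: tell pandas to keep the column as text.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={'id': str})

# Option 2: convert after the fact and pad back out to seven characters.
refilled = as_int['id'].astype(str).str.zfill(7)

print(as_int['id'].tolist())
print(as_str['id'].tolist())
print(refilled.tolist())
```

Reading as strings is the safer of the two, because zero filling only works if you already know how long every ID should be.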

### Convert your one-cell scraper into a function, and use it on each row of our dataset

- *TIP: You'll be using `.apply`*
- *TIP: You won't be joining this back into your dataframe, so you don't need to `return` anything or `join` or any of that.*
- *TIP: Be careful of your **other variable names** - if you're calling the thing you're sending your function `row`, you can't use it anywhere else (like in your loop)*
- *TIP: **BE CAREFUL WHAT YOU NAME YOUR DATAFRAMES.** If you name the citations dataframe `df` it can overwrite your mine ID `df`*
- *TIP: You'll be saving a dataframe each time*
- *TIP: Be sure you change everything that refers to the mine ID to refer to the current row's ID instead of `3503598`*
- *TIP: BE SURE TO CHANGE EVERYTHING THAT REFERS TO THE MINE ID*
- *TIP: EVERYTHING, EVERYTHING, EVERYTHING! Look at the end of your function! Maybe I'm overreacting, I don't know.*
- *TIP: If you hit an error about list index out of range, see what line it's happening on and go look at the page. What's different about this page than the previous ones? (answer: the last three columns!) If you assign those columns later using `try`/`except` you should be able to get some data from those rows without throwing it all out. If you can't figure it out, just wrap it all in try/except and give up on those rows*
- *TIP: Some of the standards might not have links, either, so you might want to wrap that in a `try`/`except`, too!*
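
As a warm-up for the tips above, here is a tiny sketch of what `.apply` with `axis=1` hands to your function: one row at a time, with the mine ID available as `row['id']`. The dataframe and the function name `output_filename` are made up for the example. The function returns the filename only so the result is easy to inspect; the real scraper doesn't need to return anything.

```python
import pandas as pd

# A stand-in for the mines dataframe, with IDs already read in as strings.
mines = pd.DataFrame({'id': ['3503598', '0801306']})


def output_filename(row):
    # row is a single row of the dataframe; row['id'] is that mine's ID.
    return row['id'] + '-violations.csv'


# axis=1 means "call the function once per row" rather than once per column.
filenames = mines.apply(output_filename, axis=1)
print(filenames.tolist())
```

Because the IDs are strings, `row['id'] + '-violations.csv'` works directly; with integer IDs the same line would raise a `TypeError`.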

In [17]:
def get_violations(row):

    driver.get("https://arlweb.msha.gov/drs/drshome.htm")

    search_field = driver.find_element_by_name('MineId')
    driver.execute_script("arguments[0].scrollIntoView(true)", search_field)
    search_field.send_keys(row['id'])
    driver.find_element_by_xpath('//*[@id="content"]/table[3]/tbody/tr[3]/td[2]/input').click()

    time.sleep(1)

    bdate = driver.find_element_by_name('BDate')
    bdate.send_keys('1/1/1995')

    type_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table[3]/tbody/tr[2]/td[2]/table/tbody/tr[1]/td/input')
    type_button.click()

    get_report_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table[3]/tbody/tr[3]/td[2]/input')
    get_report_button.click()

    time.sleep(1)

    reports = driver.find_elements_by_xpath('//*[@id="content"]/table[5]/tbody/tr')
    #print(reports)

    violations = []

    for report in reports[1:]:

        print('---------')

        data = {}

        if len(report.find_elements_by_class_name('drsviols')) != 0:
            print(report.find_element_by_class_name('drsviols').text)

    #row = report.find_elements_by_class_name('drsviols')

            data['violator'] = report.find_elements_by_class_name('drsviols')[0].text
            print(report.find_elements_by_class_name('drsviols')[0].text)

            data['contract_id'] = report.find_elements_by_class_name('drsviols')[1].text
            print(report.find_elements_by_class_name('drsviols')[1].text)

            data['citation_no'] = report.find_elements_by_class_name('drsviols')[2].text
            print(report.find_elements_by_class_name('drsviols')[2].text)

            data['case_no'] = report.find_elements_by_class_name('drsviols')[3].text
            print(report.find_elements_by_class_name('drsviols')[3].text)

            data['date_issues'] = report.find_elements_by_class_name('drsviols')[4].text
            print(report.find_elements_by_class_name('drsviols')[4].text)

            data['final_order_date'] = report.find_elements_by_class_name('drsviols')[5].text
            print(report.find_elements_by_class_name('drsviols')[5].text)

            data['section_of_act'] = report.find_elements_by_class_name('drsviols')[6].text
            print(report.find_elements_by_class_name('drsviols')[6].text)

            data['date_terminated'] = report.find_elements_by_class_name('drsviols')[7].text
            print(report.find_elements_by_class_name('drsviols')[7].text)

            data['citation'] = report.find_elements_by_class_name('drsviols')[8].text
            print(report.find_elements_by_class_name('drsviols')[8].text)

            data['s_and_s'] = report.find_elements_by_class_name('drsviols')[9].text
            print(report.find_elements_by_class_name('drsviols')[9].text)
            
            try:
                data['standard'] = report.find_elements_by_class_name('drsviols')[10].text
                print(report.find_elements_by_class_name('drsviols')[10].text)

                data['standard_url'] = report.find_elements_by_class_name('drsviols')[10].find_element_by_tag_name('a').get_attribute('href')
                print(report.find_elements_by_class_name('drsviols')[10].find_element_by_tag_name('a').get_attribute('href'))
                
            except Exception:
                print('Missing url')

            data['proposed_penalty'] = report.find_elements_by_class_name('drsviols')[11].text
            print(report.find_elements_by_class_name('drsviols')[11].text)

            data['citation_status'] = report.find_elements_by_class_name('drsviols')[12].text
            print(report.find_elements_by_class_name('drsviols')[12].text)

            data['current_penalty'] = report.find_elements_by_class_name('drsviols')[13].text
            print(report.find_elements_by_class_name('drsviols')[13].text)

            data['amount_paid'] = report.find_elements_by_class_name('drsviols')[14].text
            print(report.find_elements_by_class_name('drsviols')[14].text)

            violations.append(data)

    new_df = pd.DataFrame(violations)

    new_df.to_csv(row['id']+"_violations_data.csv", index=False)
    

In [18]:
mines_df.apply(get_violations, axis=1)

---------
Dirt Works
Dirt Works
  
8778046  
000374480
10/14/2014
3/27/2015 
104(a)
10/14/2014
C
N
56.14132(a)
http://www.gpo.gov/fdsys/pkg/CFR-2014-title30-vol1/pdf/CFR-2014-title30-vol1-sec56-14132.pdf
100.00
Closed
100.00 
100.00 
---------
Dirt Works
Dirt Works
  
8778047  
000374480
10/14/2014
3/27/2015 
104(a)
11/4/2014
C
N
56.18010
http://www.gpo.gov/fdsys/pkg/CFR-2014-title30-vol1/pdf/CFR-2014-title30-vol1-sec56-18010.pdf
162.00
Closed
162.00 
162.00 
---------
Dirt Works
Dirt Works
  
8771783  
000345454
1/22/2014
4/20/2014 
104(a)
1/22/2014
C
Y
56.9300(a)
http://www.gpo.gov/fdsys/pkg/CFR-2014-title30-vol1/pdf/CFR-2014-title30-vol1-sec56-9300.pdf
243.00
Closed
243.00 
243.00 
---------
Dirt Works
Dirt Works
  
8771781  
000348280
1/22/2014
5/21/2014 
104(a)
1/22/2014
C
N
56.14100(b)
http://www.gpo.gov/fdsys/pkg/CFR-2014-title30-vol1/pdf/CFR-2014-title30-vol1-sec56-14100.pdf
100.00
Closed
100.00 
100.00 
---------
Dirt Works
Dirt Works
  
8771784  
000345454
1/22/2014
4/20/2014

6466144  
000206354
11/3/2009
1/20/2010 
104(a)
11/3/2009
C
Y
56.9300(a)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-9300.pdf
100.00
Closed
100.00 
100.00 
---------
Holley Dirt Company, Inc
Holley Dirt Company, Inc
  
9421625  
 
5/15/2018
104(a)
5/15/2018
C
Y
56.20003(a)
Not Assessed Yet
Missing url


IndexError: ('list index out of range', 'occurred at index 1')
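
The `IndexError` above comes from a row with fewer cells than the code expects: the citation hasn't been assessed yet, so the penalty columns aren't all there. Following the `try`/`except` tip, one possible fix is a small helper that returns `None` instead of crashing when a cell is missing. The sketch below works on plain strings so it runs without a browser; `text_at` is a hypothetical name, and the short row is an illustration modelled on the output above, not an exact copy of the page.

```python
def text_at(texts, index):
    """Return the stripped text at `index`, or None when the row is too short."""
    try:
        return texts[index].strip()
    except IndexError:
        return None


# Illustrative short row: 11 cells where a fully assessed row has 15.
short_row = [
    'Holley Dirt Company, Inc', '  ', '9421625  ', ' ', '5/15/2018',
    '104(a)', '5/15/2018', 'C', 'Y', '56.20003(a)', 'Not Assessed Yet',
]

data = {
    'violator': text_at(short_row, 0),
    'citation_no': text_at(short_row, 2),
    # These indexes exist on complete rows but not on this one.
    'proposed_penalty': text_at(short_row, 11),
    'citation_status': text_at(short_row, 12),
    'current_penalty': text_at(short_row, 13),
    'amount_paid': text_at(short_row, 14),
}
print(data)
```

The row still makes it into the dataframe, with empty values where the page had nothing to offer, instead of taking the whole mine down with it.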

# Okay, now do it for ALL of the mines

Open up `mines.csv` using pandas and do the same thing. It will just be for more mines this time.

In [19]:
pd.read_csv('mines.csv')

| | id |
|---|---|
| 0 | 3503598 |
| 1 | 4801789 |
| 2 | 5001797 |
| 3 | 4608254 |
| 4 | 2103723 |
| 5 | 4104757 |
| 6 | 801306 |
| 7 | 3901432 |
| 8 | 3609624 |
| 9 | 3609931 |

In [20]:
all_df = pd.read_csv('mines.csv')
all_df.head()

| | id |
|---|---|
| 0 | 3503598 |
| 1 | 4801789 |
| 2 | 5001797 |
| 3 | 4608254 |
| 4 | 2103723 |


In [21]:
all_df.dtypes

id    int64
dtype: object

In [22]:
all_df = pd.read_csv('mines.csv', dtype={'id': str})
all_df.head()

| | id |
|---|---|
| 0 | 3503598 |
| 1 | 4801789 |
| 2 | 5001797 |
| 3 | 4608254 |
| 4 | 2103723 |


In [23]:
all_df.dtypes

id    object
dtype: object

In [27]:
#all_df.apply(get_moreviolations, axis=1)

In [32]:
all_df = pd.read_csv('mines.csv', dtype={'id': str})
all_df

| | id |
|---|---|
| 0 | 3503598 |
| 1 | 4801789 |
| 2 | 5001797 |
| 3 | 4608254 |
| 4 | 2103723 |
| 5 | 4104757 |
| 6 | 801306 |
| 7 | 3901432 |
| 8 | 3609624 |
| 9 | 3609931 |


In [34]:
all_df.apply(get_violations, axis=1)

---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8992368  
000440809
5/2/2017
7/23/2017 
104(a)
5/2/2017
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf
116.00
Closed
116.00 
116.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8992214  
000435717
2/7/2017
5/21/2017 
104(a)
2/8/2017
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2017-title30-vol1/pdf/CFR-2017-title30-vol1-sec56-14107.pdf
116.00
Closed
116.00 
116.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8872461  
000398910
10/21/2015
1/13/2016 
104(a)
10/27/2015
C
N
56.12004
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-12004.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
8872462  
000398910
10/21/2015
1/13/2016 
104(a)
10/27/2015
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-14107.pdf
100.00
Closed
100.00 
100.00 
---

100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479620  
000180614
2/26/2009
5/8/2009 
104(a)
3/4/2009
C
N
56.12028
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-12028.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479613  
000180614
2/26/2009
5/22/2010 
104(a)
2/26/2009
C
Y
56.3200
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-3200.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479612  
000180614
2/26/2009
5/22/2010 
104(a)
3/4/2009
C
Y
56.9200(d)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-9200.pdf
263.00
Closed
263.00 
263.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6479617  
000180614
2/26/2009
5/8/2009 
104(a)
3/4/2009
C
N
56.4201(a)(2)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-4201.pdf
100.00
Closed
100.00 
100.00

8601402  
000357404
6/17/2014
9/3/2014 
104(a)
6/17/2014
C
N
56.4201(a)(1)
http://www.gpo.gov/fdsys/pkg/CFR-2014-title30-vol1/pdf/CFR-2014-title30-vol1-sec56-4201.pdf
100.00
Closed
100.00 
100.00 
---------
Dirt Company
Dirt Company
  
8601222  
000293537
5/31/2012
8/9/2012 
104(a)
5/31/2012
C
Y
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2012-title30-vol1/pdf/CFR-2012-title30-vol1-sec56-14107.pdf
108.00
Closed
108.00 
108.00 
---------
Dirt Company
Dirt Company
  
8601223  
000293537
5/31/2012
8/9/2012 
104(a)
5/31/2012
C
Y
56.14100(d)
http://www.gpo.gov/fdsys/pkg/CFR-2012-title30-vol1/pdf/CFR-2012-title30-vol1-sec56-14100.pdf
243.00
Closed
243.00 
243.00 
---------
Dirt Company
Dirt Company
  
8601224  
000293537
5/31/2012
8/9/2012 
104(a)
5/31/2012
C
N
56.4201(a)(1)
http://www.gpo.gov/fdsys/pkg/CFR-2012-title30-vol1/pdf/CFR-2012-title30-vol1-sec56-4201.pdf
100.00
Closed
100.00 
100.00 
---------
Dirt Company
Dirt Company
  
8601228  
000293537
5/31/2012
8/9/2012 
104(a)
5/31/2012
C

C
N
56.14100(b)
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-14100.pdf
285.00
Closed
285.00 
285.00 
---------
Sunshine Reclamation Inc.
Sunshine Reclamation Inc.
  
8570064  
000221815
4/15/2010
7/23/2010 
104(a)
4/19/2010
C
N
56.14100(b)
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-14100.pdf
100.00
Closed
100.00 
100.00 
---------
Sunshine Reclamation Inc.
Sunshine Reclamation Inc.
  
8570067  
000221815
4/15/2010
7/23/2010 
104(a)
4/15/2010
C
N

http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec-.pdf
100.00
Closed
100.00 
100.00 
---------
Sunshine Reclamation Inc.
Sunshine Reclamation Inc.
  
8570070  
000221815
4/15/2010
7/23/2010 
104(a)
4/19/2010
C
N
56.14132(a)
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-14132.pdf
100.00
Closed
100.00 
100.00 
---------
Sunshine Reclamation Inc.
Sunshine Reclamation Inc.
  
8570065  
000221815
4/15/201

Newberg Rock & Dirt
Newberg Rock & Dirt
  
6338511  
000210055
12/29/2009
1/10/2013 
104(a)
1/11/2010
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-14107.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6338507  
000210055
12/29/2009
1/10/2013 
104(a)
1/11/2010
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-14107.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6338508  
000210055
12/29/2009
1/10/2013 
104(a)
12/30/2009
C
N
56.11002
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-11002.pdf
100.00
Closed
100.00 
100.00 
---------
Newberg Rock & Dirt
Newberg Rock & Dirt
  
6338512  
000210055
12/29/2009
1/10/2013 
104(a)
12/29/2009
C
N
56.12004
http://www.gpo.gov/fdsys/pkg/CFR-2009-title30-vol1/pdf/CFR-2009-title30-vol1-sec56-12004.pdf
100.00
Closed
100.00 
100.00 
---------


114.00 
---------
Dirt Company
Dirt Company
  
8601712  
000423303
10/1/2016
12/16/2016 
104(a)
10/28/2016
C
N
56.14112(a)(1)
http://www.gpo.gov/fdsys/pkg/CFR-2016-title30-vol1/pdf/CFR-2016-title30-vol1-sec56-14112.pdf
114.00
Closed
114.00 
114.00 
---------
Dirt Company
Dirt Company
  
6356395  
000416201
6/28/2016
9/16/2016 
104(a)
6/30/2016
C
N
56.14112(b)
http://www.gpo.gov/fdsys/pkg/CFR-2016-title30-vol1/pdf/CFR-2016-title30-vol1-sec56-14112.pdf
114.00
Closed
114.00 
114.00 
---------
Dirt Company
Dirt Company
  
6356396  
000416201
6/28/2016
9/16/2016 
104(a)
6/28/2016
C
N
56.14132(b)(2)
http://www.gpo.gov/fdsys/pkg/CFR-2016-title30-vol1/pdf/CFR-2016-title30-vol1-sec56-14132.pdf
114.00
Closed
114.00 
114.00 
---------
Dirt Company
Dirt Company
  
8601395  
000390299
7/2/2015
10/8/2015 
104(a)
8/25/2015
C
N
56.14107(a)
http://www.gpo.gov/fdsys/pkg/CFR-2015-title30-vol1/pdf/CFR-2015-title30-vol1-sec56-14107.pdf
100.00
Closed
100.00 
100.00 
---------
Dirt Company
Dirt Company
  
86

176.00
Closed
176.00 
176.00 
---------
Dirt Con
Dirt Con
J392 
8580026  
000252992
10/26/2010
5/28/2011 
104(a)
10/26/2010
C
Y
56.14207
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-14207.pdf
1,944.00
Closed
1,944.00 
1,944.00 
---------
Dirt Con
Dirt Con
J392 
8580028  
000239419
10/26/2010
1/2/2011 
104(a)
10/26/2010
C
Y
56.14132(a)
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-14132.pdf
585.00
Closed
585.00 
585.00 
---------
Sunshine Reclamation Inc.
Sunshine Reclamation Inc.
  
8580029  
000240535
10/26/2010
1/13/2011 
104(a)
10/26/2010
C
N
56.14132(a)
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-14132.pdf
362.00
Closed
362.00 
362.00 
---------
Sunshine Reclamation Inc.
Sunshine Reclamation Inc.
  
8570069  
000221815
4/19/2010
7/23/2010 
104(a)
5/4/2010
C
N
56.9300(b)
http://www.gpo.gov/fdsys/pkg/CFR-2010-title30-vol1/pdf/CFR-2010-title30-vol1-sec56-9300.pdf
150.00
Cl

IndexError: ('list index out of range', 'occurred at index 3')
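
As the error shows, one bad page stops `.apply` partway through, and every mine after it is skipped. If you'd rather keep going and deal with the failures afterward, one option is to wrap the scraper so errors are recorded instead of raised. This is a sketch, not part of the assignment: `run_safely` is a hypothetical helper, and `flaky_scraper` is a stand-in for `get_violations` that fails on one ID chosen for the example.

```python
def run_safely(func, failures):
    """Wrap `func` so an exception is logged to `failures` instead of raised."""
    def wrapper(row):
        try:
            return func(row)
        except Exception as error:
            # Remember which mine failed and why, then move on.
            failures.append((row['id'], repr(error)))
            return None
    return wrapper


def flaky_scraper(row):
    # Stand-in for get_violations: pretend one mine's page is malformed.
    if row['id'] == '4608254':
        raise IndexError('list index out of range')
    return row['id'] + '_violations_data.csv'


failures = []
safe_scraper = run_safely(flaky_scraper, failures)

# Plain dicts stand in for dataframe rows here.
results = [safe_scraper({'id': mine_id}) for mine_id in ['3503598', '4608254', '2103723']]
print(results)
print(failures)
```

With the real dataframe this would look like `all_df.apply(run_safely(get_violations, failures), axis=1)`, and `failures` would tell you which mine pages to go look at by hand.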