# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm), thank goodness we can search for these things.

## Setup: Import what you'll need to search and scrape and Selenium

In [2]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd

## Starting from `https://arlweb.msha.gov/drs/drshome.htm`, search for every operator with 'dirt' in their name, including abandoned mines.

> - *Tip: If you can't make an element work using name, class or ID, try to use the XPath*

In [3]:
#Load the page
driver = webdriver.Chrome()
driver.get('https://arlweb.msha.gov/drs/drshome.htm')

In [4]:
#Seach for the 'operator name' input box and enter search terms
text_input = driver.find_element_by_name('OperSearch')
text_input.send_keys('dirt')

In [5]:
#click the checkbox for 'include abandoned mines'
check = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[3]/td[3]/table/tbody/tr/td/input')
check.click()

In [6]:
#click the 'search' button
search = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[7]/td[3]/input[1]')
search.click()

## Scrape the results page, saving it as `dirt-operators.csv`

> - *Tip: Think about what each row in your dataset will be, and start by looping through that*
> - *Tip: Printing is cool and good! Print everything! Move it into a dictionary later.*
> - *Tip: If you don't want a row, think about what's in the row that makes it different. You can use an `if` statement or list slicing to skip the ones you aren't interested in.*
> - *Tip: Make sure your dictionary and your loop variable have DIFFERENT NAMES*
> - *Tip: After you've made your dictionary (and printed it, of course), you'll want to add it to your list of rows*
> - *Tip: Be sure to import pandas to convert it to a dataframe*
> - *Tip: Make sure you don't include the index when saving your dataframe*

### Hopefully you know that each `tr` is supposed to be a row of your data. What is the index of the first row element that is actually a result?

> - *Tip: `.text` will help you here.*
> - *Tip: You aren't interesting in annotations or anything, just mines and where they are from*
> - *Tip: Using `print("-----")` will help you keep track of different rows*
> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third. You can use this to skip ahead to the 'good' data if you want*

In [7]:
operators = driver.find_elements_by_tag_name('tr')
operators = operators[7:-2]
print(operators[0].text)

3503598
OR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC 


In [8]:
print(operators[-1].text)

4103429
TX  Y B Dirt & Loam   Y B Mine Surface M  Abandoned  Construction Sand and Gravel 


### Loop through each operator result, printing its name

> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third.*
> - *Tip: You can use list slicing or an `if` statement to skip the non-data row(s). List slicing is probably easier, even if you aren't comfortable with it.*
> - *Tip: or honestly you can use `try` and `except` if you know how it works.*
> - *Tip: Once you have the "right" rows of data, you're going to be looking for a certain tag inside*
> - *Tip: Sometimes you can't say "give me this class," and instead you have to say "give me all of the `div` elements, and then give me the third one."*

In [10]:
for operator in operators[7:-2]:
    print('------')
    columns = operator.find_elements_by_tag_name('td')
    print('Operator name:', columns[2].text)

------
Operator name: Barber'S Dirt Pit  
------
Operator name: Bender Sand & Dirt  
------
Operator name: BERT'S DIRT  
------
Operator name: Big D Dirt Service Inc  
------
Operator name: Big Red Dirt Farm LLC  
------
Operator name: Big River Dirt Pit  
------
Operator name: Bob Harris Dirt Contracting  
------
Operator name: Bohannon Sand & Dirt  
------
Operator name: Bratcher'S Sand & Dirt  
------
Operator name: Brewer Dirt Works  
------
Operator name: Buck'S Dirt Pit  
------
Operator name: C & G Dirt Hauling  
------
Operator name: C N C Dirt Movers, Inc.  
------
Operator name: Cambridge Dirt Sand and Gravel LLC  
------
Operator name: Central Iowa Dirt & Demo LLC  
------
Operator name: Crowes Trucking & Dirt Pit Services  
------
Operator name: D & H Dirt  
------
Operator name: Diez Dirt & Sand Hauling Inc  
------
Operator name: Dirt Cheap  
------
Operator name: Dirt Company  
------
Operator name: Dirt Company  
------
Operator name: Dirt Company  
------
Operator name

### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.

In [11]:
for operator in operators[7:-2]:
    print('------')
    columns = operator.find_elements_by_tag_name('td')
    print('Operator ID is:', columns[0].text)

------
Operator ID is: 4103265
------
Operator ID is: 1401575
------
Operator ID is: 1700776
------
Operator ID is: 1601251
------
Operator ID is: 0301963
------
Operator ID is: 1601082
------
Operator ID is: 3401751
------
Operator ID is: 1600916
------
Operator ID is: 3401211
------
Operator ID is: 0301267
------
Operator ID is: 1600956
------
Operator ID is: 2200033
------
Operator ID is: 0504953
------
Operator ID is: 3401929
------
Operator ID is: 1302445
------
Operator ID is: 1601106
------
Operator ID is: 3400915
------
Operator ID is: 1600983
------
Operator ID is: 4503200
------
Operator ID is: 3401266
------
Operator ID is: 3401468
------
Operator ID is: 5001797
------
Operator ID is: 4608254
------
Operator ID is: 1510279
------
Operator ID is: 2103723
------
Operator ID is: 0100776
------
Operator ID is: 4104016
------
Operator ID is: 2103914
------
Operator ID is: 4104757
------
Operator ID is: 0301729
------
Operator ID is: 0404851
------
Operator ID is: 2200734
------
O

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

> - *Tip: Start with an empty dictionary, then add the keys one at a time like we did during class*
> - *Tip: You might want to save all of the cells in a variable, then use indexes to get the second, third, fourth, etc.*
> - *Tip: I know you already skipped a bunch of rows already, but one of them still might be bad! Which one is it? How can you skip it? You might need to slice out some of the end of your list, too. Use `print` to help you debug, or just look at the page closely.*
> - *Tip: Or, if you did the other homework already, `try` / `except` is also an option*

In [13]:
rows= []
line = 0
for operator in operators[7:-2]:
    line += 1
    print('Now reading line ', line, 'out of ', len(operators[7:-2]))
    row={}
    columns = operator.find_elements_by_tag_name('td')
    row['Operator ID'] = columns[0].text
    row['State'] = columns[1].text
    row['Operator Name'] = columns[2].text
    row['Mine Name'] = columns[3].text
    row['Mine Type'] = columns[4].text
    row['Coal or Metal'] = columns[5].text
    row['Status'] = columns[6].text
    row['Commodity'] = columns[7].text
    rows.append(row)
print('Our list of dictionaries looks like:', rows)

Now reading line  1 out of  122
Now reading line  2 out of  122
Now reading line  3 out of  122
Now reading line  4 out of  122
Now reading line  5 out of  122
Now reading line  6 out of  122
Now reading line  7 out of  122
Now reading line  8 out of  122
Now reading line  9 out of  122
Now reading line  10 out of  122
Now reading line  11 out of  122
Now reading line  12 out of  122
Now reading line  13 out of  122
Now reading line  14 out of  122
Now reading line  15 out of  122
Now reading line  16 out of  122
Now reading line  17 out of  122
Now reading line  18 out of  122
Now reading line  19 out of  122
Now reading line  20 out of  122
Now reading line  21 out of  122
Now reading line  22 out of  122
Now reading line  23 out of  122
Now reading line  24 out of  122
Now reading line  25 out of  122
Now reading line  26 out of  122
Now reading line  27 out of  122
Now reading line  28 out of  122
Now reading line  29 out of  122
Now reading line  30 out of  122
Now reading line  3

### Save that to a CSV named `dirt-operators.csv`

In [14]:
df = pd.DataFrame(rows)
df.head()

Unnamed: 0,Coal or Metal,Commodity,Mine Name,Mine Type,Operator ID,Operator Name,State,Status
0,M,Construction Sand and Gravel,Barber'S Dirt Pit,Surface,4103265,Barber'S Dirt Pit,TX,Abandoned
1,M,Construction Sand and Gravel,BENDER SAND & DIRT,Surface,1401575,Bender Sand & Dirt,KS,Intermittent
2,M,Construction Sand and Gravel,BERT'S DIRT,Surface,1700776,BERT'S DIRT,ME,Abandoned
3,M,Construction Sand and Gravel,Dorothy V Pit,Surface,1601251,Big D Dirt Service Inc,LA,Abandoned
4,M,Construction Sand and Gravel,Big Red Dirt Farm,Surface,301963,Big Red Dirt Farm LLC,AR,Abandoned


In [15]:
df.to_csv('dirt-operators.csv', index = False)

### Open the CSV file and examine the first few.

Make sure you didn't save that extra weird unnamed index column.

In [17]:
pd.read_csv('dirt-operators.csv').head()

Unnamed: 0,Coal or Metal,Commodity,Mine Name,Mine Type,Operator ID,Operator Name,State,Status
0,M,Construction Sand and Gravel,Barber'S Dirt Pit,Surface,4103265,Barber'S Dirt Pit,TX,Abandoned
1,M,Construction Sand and Gravel,BENDER SAND & DIRT,Surface,1401575,Bender Sand & Dirt,KS,Intermittent
2,M,Construction Sand and Gravel,BERT'S DIRT,Surface,1700776,BERT'S DIRT,ME,Abandoned
3,M,Construction Sand and Gravel,Dorothy V Pit,Surface,1601251,Big D Dirt Service Inc,LA,Abandoned
4,M,Construction Sand and Gravel,Big Red Dirt Farm,Surface,301963,Big Red Dirt Farm LLC,AR,Abandoned
