# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm). Thank goodness we can search for these things.

## Setup: Import what you'll need to search and scrape with Selenium

In [1]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

## Starting from `https://arlweb.msha.gov/drs/drshome.htm`, search for every operator with 'dirt' in their name, including abandoned mines.

> - *Tip: If you can't make an element work using name, class or ID, try to use the XPath*

In [2]:
driver = webdriver.Chrome()

driver.get("https://arlweb.msha.gov/drs/drshome.htm")

In [3]:
text_input = driver.find_element_by_name('OperSearch')

In [4]:
driver.execute_script("arguments[0].scrollIntoView(true)", text_input)

In [5]:
text_input.send_keys('dirt')

In [6]:
abandoned_button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[3]/td[3]/table/tbody/tr/td/input')
abandoned_button.click()

In [7]:
button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[7]/td[3]/input[1]')
button.click()

## Scrape the results page, saving it as `dirt-operators.csv`

> - *Tip: Think about what each row in your dataset will be, and start by looping through that*
> - *Tip: Printing is cool and good! Print everything! Move it into a dictionary later.*
> - *Tip: If you don't want a row, think about what's in the row that makes it different. You can use an `if` statement or list slicing to skip the ones you aren't interested in.*
> - *Tip: Make sure your dictionary and your loop variable have DIFFERENT NAMES*
> - *Tip: After you've made your dictionary (and printed it, of course), you'll want to add it to your list of rows*
> - *Tip: Be sure to import pandas to convert it to a dataframe*
> - *Tip: Make sure you don't include the index when saving your dataframe*

### Hopefully you know that each `tr` is supposed to be a row of your data. What is the index of the first row element that is actually a result?

> - *Tip: `.text` will help you here.*
> - *Tip: You aren't interested in annotations or anything, just mines and where they are from*
> - *Tip: Using `print("-----")` will help you keep track of different rows*
> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third. You can use this to skip ahead to the 'good' data if you want*
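The slicing tip can be tried out on a plain list before pointing it at the page. This is only a small sketch: the `animals` list and its contents are made up, and the same pattern applies to the list of `tr` elements.

```python
# A made-up list, standing in for the list of rows on the page
animals = ["cat", "dog", "emu", "fox", "gnu"]

# Skip the first two items and keep the rest
without_first_two = animals[2:]
print(without_first_two)

# Skip the first two AND drop the last one
without_ends = animals[2:-1]
print(without_ends)
```

Slicing never raises an error, even when the list is shorter than the slice asks for; it just returns fewer items.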

In [8]:
driver.find_elements_by_tag_name('tr')[7].text


'3503598\nOR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC '

### Loop through each operator result, printing its name

> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third.*
> - *Tip: You can use list slicing or an `if` statement to skip the non-data row(s). List slicing is probably easier, even if you aren't comfortable with it.*
> - *Tip: Or, honestly, you can use `try` and `except` if you know how it works.*
> - *Tip: Once you have the "right" rows of data, you're going to be looking for a certain tag inside*
> - *Tip: Sometimes you can't say "give me this class," and instead you have to say "give me all of the `div` elements, and then give me the third one."*

In [9]:
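operators = driver.find_elements_by_tag_name('tr')
# Skip the seven non-result rows at the top of the table
for operator in operators[7:]:
    # The operator name is in the third cell of each row
    name_cell = operator.find_elements_by_tag_name('td')[2]
    print(name_cell.text)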

Newberg Rock & Dirt  
Allied Dirt Moving Company  
AM Dirtworks & Aggregate Sales  
Atlas-Dirty Devil Mining  
Atlas-Dirty Devil Mining  
Babe's Dirt Work  
Bar-Lin Dirt Company  
Barber'S Dirt Pit  
Bender Sand & Dirt  
BERT'S DIRT  
Big D Dirt Service Inc  
Big Red Dirt Farm LLC  
Big River Dirt Pit  
Bob Harris Dirt Contracting  
Bohannon Sand & Dirt  
Bratcher'S Sand & Dirt  
Brewer Dirt Works  
Buck'S Dirt Pit  
C & G Dirt Hauling  
C N C Dirt Movers, Inc.  
Cambridge Dirt Sand and Gravel LLC  
Central Iowa Dirt & Demo LLC  
Crowes Trucking & Dirt Pit Services  
D & H Dirt  
Diez Dirt & Sand Hauling Inc  
Dirt Cheap  
Dirt Company  
Dirt Company  
Dirt Company  
Dirt Con  
Dirt Diggers Inc  
Dirt Doctor Inc  
Dirt Inc  
Dirt Pit  
Dirt Work Specialists LLC  
Dirt Works  
Dirtco Inc  
Dirtman Trucking  
DIRTWORKS, INC.  
Dirtworks, Inc.  
Dirty Coal  
Dorchester Dirt Company Inc  
Douglas Dirt Sand & Gravel Company  
Ell Dirt Works LLC.  
Floyd Smith Dirt Pit  
Gary Kelm Dirt Servi

IndexError: list index out of range
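The `IndexError` above means that at least one `tr` after the first results has fewer than three `td` cells, so `[2]` has nothing to grab. One way to get past it, sketched below, is to check the number of cells with an `if` before indexing. `FakeCell` and `FakeRow` are hypothetical stand-ins so the sketch runs without a browser, the short row is invented, and `operator_names` is a name made up for this example.

```python
class FakeCell:
    """Hypothetical stand-in for a Selenium <td> element."""
    def __init__(self, text):
        self.text = text


class FakeRow:
    """Hypothetical stand-in for a Selenium <tr> element."""
    def __init__(self, *texts):
        self.cells = [FakeCell(text) for text in texts]

    def find_elements_by_tag_name(self, tag):
        # Only 'td' lookups are supported in this stand-in
        return self.cells if tag == 'td' else []


def operator_names(rows):
    """Collect the third cell of every row that has at least three cells."""
    names = []
    for row in rows:
        cells = row.find_elements_by_tag_name('td')
        if len(cells) < 3:
            continue  # not a result row, so skip it
        names.append(cells[2].text.strip())
    return names


rows = [
    FakeRow('3503598', 'OR ', 'Newberg Rock & Dirt  '),
    FakeRow('not a result row'),  # invented short row
    FakeRow('0502030', 'CO ', 'Allied Dirt Moving Company  '),
]
names = operator_names(rows)
print(names)
```

Because the function only relies on `find_elements_by_tag_name` and `.text`, it should also accept the real `operators[7:]` list from the cell above, assuming the same Selenium version used in this notebook.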

### Loop through each operator result, printing its ID

There should be ONE ID per row, and NO empty rows between them.

In [None]:
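operators = driver.find_elements_by_tag_name('tr')
# Skip the seven non-result rows at the top of the table
for operator in operators[7:]:
    # The operator ID is in the first cell of each row
    operator_id = operator.find_elements_by_tag_name('td')[0]
    print(operator_id.text)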

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

> - *Tip: Start with an empty dictionary, then add the keys one at a time like we did during class*
> - *Tip: You might want to save all of the cells in a variable, then use indexes to get the second, third, fourth, etc.*
> - *Tip: I know you already skipped a bunch of rows, but one of them still might be bad! Which one is it? How can you skip it? You might need to slice out some of the end of your list, too. Use `print` to help you debug, or just look at the page closely.*
> - *Tip: Or, if you did the other homework already, `try` / `except` is also an option*

In [14]:
operators = driver.find_elements_by_tag_name('tr')[7:]

In [15]:
mines = []

In [18]:
for operator in operators:
    print('-----------')
    # Grab all of the cells in this row once, then pick them out by position
    cells = operator.find_elements_by_tag_name('td')

    row = {}
    row['ID'] = cells[0].text
    row['State'] = cells[1].text
    row['Operator_Name'] = cells[2].text
    row['Mine_Name'] = cells[3].text
    row['Type'] = cells[4].text
    row['CM'] = cells[5].text
    row['Status'] = cells[6].text
    row['Commodity'] = cells[7].text

    print("dictionary look like", row)

    mines.append(row)

-----------
dictionary look like {'ID': '3503598', 'State': 'OR ', 'Operator_Name': 'Newberg Rock & Dirt  ', 'Mine_Name': 'Newberg Rock & Dirt', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Active ', 'Commodity': 'Crushed, Broken Stone NEC '}
-----------
dictionary look like {'ID': '0502030', 'State': 'CO ', 'Operator_Name': 'Allied Dirt Moving Company  ', 'Mine_Name': 'Allied Dirt Moving Co Pit & Plant', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '4801789', 'State': 'ND ', 'Operator_Name': 'AM Dirtworks & Aggregate Sales  ', 'Mine_Name': 'AM Dirtworks & Aggregate Sales', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '4201449', 'State': 'UT ', 'Operator_Name': 'Atlas-Dirty Devil Mining  ', 'Mine_Name': 'Unit Train Loading Facility', 'Type': 'Facility', 'CM': 'C ', 'Status': 'Abandoned ', 'Commodity'

dictionary look like {'ID': '4104757', 'State': 'TX ', 'Operator_Name': 'Dirt Works  ', 'Mine_Name': 'Portable #1', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Intermittent ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '0301729', 'State': 'AR ', 'Operator_Name': 'Dirtco Inc  ', 'Mine_Name': 'DIRTCO INC', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '0404851', 'State': 'CA ', 'Operator_Name': 'Dirtman Trucking  ', 'Mine_Name': 'Dirtman Sand & Gravel #2', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '2200734', 'State': 'MS ', 'Operator_Name': 'DIRTWORKS, INC.  ', 'Mine_Name': 'DIRTWORKS, INC.', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '5002028', 'State': 'AK ',

dictionary look like {'ID': '1600953', 'State': 'LA ', 'Operator_Name': 'Little-G-Dirt Pit  ', 'Mine_Name': 'Little-G-Dirt Pit', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '4104185', 'State': 'TX ', 'Operator_Name': 'Lone Star Dirt & Paving  ', 'Mine_Name': 'Lone Star Crusher', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Crushed, Broken Stone NEC '}
-----------
dictionary look like {'ID': '2901536', 'State': 'NM ', 'Operator_Name': "Loyd'S Dirt & Gravel  ", 'Mine_Name': "LOYD'S PIT", 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '3609624', 'State': 'PA ', 'Operator_Name': 'M R Dirt  ', 'Mine_Name': 'Forbes Pit', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Temporarily Idled ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '3800709', 'S

dictionary look like {'ID': '4102682', 'State': 'TX ', 'Operator_Name': 'Peveto Dirt Pit  ', 'Mine_Name': 'Peveto Pit', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '0801259', 'State': 'FL ', 'Operator_Name': 'Phil-Dirt Industries, Inc  ', 'Mine_Name': 'PIT #1', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '0203332', 'State': 'AZ ', 'Operator_Name': 'Prescott Dirt, LLC  ', 'Mine_Name': 'Sandretto Drive', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Intermittent ', 'Commodity': 'Construction Sand and Gravel '}
-----------
dictionary look like {'ID': '0302015', 'State': 'AR ', 'Operator_Name': 'R & R Dirtworks  ', 'Mine_Name': 'Martins Quarry', 'Type': 'Surface', 'CM': 'M ', 'Status': 'Abandoned ', 'Commodity': 'Crushed, Broken Limestone NEC '}
-----------
dictionary look like {'ID': '2901986', 'State'

IndexError: list index out of range
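The loop stops with an `IndexError` as soon as it reaches a row that doesn't have all eight cells. Following the `try` / `except` tip, here is a minimal sketch of skipping such rows while building the dictionaries. It works on plain lists of strings standing in for the `.text` of each `td`; `COLUMNS` and `rows_to_dicts` are names invented for this example, and the short row is made up.

```python
# Column names in the order the cells appear in each result row
COLUMNS = ['ID', 'State', 'Operator_Name', 'Mine_Name',
           'Type', 'CM', 'Status', 'Commodity']


def rows_to_dicts(table_rows):
    """Turn lists of cell text into dictionaries, skipping short rows."""
    mines = []
    for cells in table_rows:
        try:
            # Index every expected column; a short row raises IndexError
            row = {name: cells[i].strip() for i, name in enumerate(COLUMNS)}
        except IndexError:
            continue  # not a full result row, so skip it
        mines.append(row)
    return mines


table_rows = [
    ['3503598', 'OR ', 'Newberg Rock & Dirt  ', 'Newberg Rock & Dirt',
     'Surface', 'M ', 'Active ', 'Crushed, Broken Stone NEC '],
    ['made-up short row'],
]
mines = rows_to_dicts(table_rows)
print(mines)
```

With Selenium, each inner list could come from something like `[cell.text for cell in operator.find_elements_by_tag_name('td')]`. The `.strip()` call also removes the trailing spaces visible in values such as `'OR '` above.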

### Save that to a CSV named `dirt-operators.csv`

In [17]:
import pandas as pd

In [19]:
df = pd.DataFrame(mines)
df.head(10)

Unnamed: 0,CM,Commodity,ID,Mine_Name,Operator_Name,State,Status,Type
0,M,"Crushed, Broken Stone NEC",3503598,Newberg Rock & Dirt,Newberg Rock & Dirt,OR,Active,Surface
1,M,Construction Sand and Gravel,502030,Allied Dirt Moving Co Pit & Plant,Allied Dirt Moving Company,CO,Abandoned,Surface
2,M,Construction Sand and Gravel,4801789,AM Dirtworks & Aggregate Sales,AM Dirtworks & Aggregate Sales,ND,Abandoned,Surface
3,C,Coal (Bituminous),4201449,Unit Train Loading Facility,Atlas-Dirty Devil Mining,UT,Abandoned,Facility
4,C,Coal (Bituminous),4201450,Blackie Surface Mine & Prep Plant,Atlas-Dirty Devil Mining,UT,Abandoned,Surface
5,M,Construction Sand and Gravel,1002257,"Hitt Pit, Inc.",Babe's Dirt Work,ID,Abandoned,Surface
6,M,Construction Sand and Gravel,1601167,Bar-Lin Dirt Pit,Bar-Lin Dirt Company,LA,Abandoned,Surface
7,M,Construction Sand and Gravel,4103265,Barber'S Dirt Pit,Barber'S Dirt Pit,TX,Abandoned,Surface
8,M,Construction Sand and Gravel,1401575,BENDER SAND & DIRT,Bender Sand & Dirt,KS,Intermittent,Surface
9,M,Construction Sand and Gravel,1700776,BERT'S DIRT,BERT'S DIRT,ME,Abandoned,Surface


In [20]:
df.to_csv("dirt-operators.csv", index=False)

### Open the CSV file and examine the first few rows.

Make sure you didn't save that extra weird unnamed index column.

In [21]:
df = pd.read_csv('dirt-operators.csv')
df

Unnamed: 0,CM,Commodity,ID,Mine_Name,Operator_Name,State,Status,Type
0,M,"Crushed, Broken Stone NEC",3503598,Newberg Rock & Dirt,Newberg Rock & Dirt,OR,Active,Surface
1,M,Construction Sand and Gravel,502030,Allied Dirt Moving Co Pit & Plant,Allied Dirt Moving Company,CO,Abandoned,Surface
2,M,Construction Sand and Gravel,4801789,AM Dirtworks & Aggregate Sales,AM Dirtworks & Aggregate Sales,ND,Abandoned,Surface
3,C,Coal (Bituminous),4201449,Unit Train Loading Facility,Atlas-Dirty Devil Mining,UT,Abandoned,Facility
4,C,Coal (Bituminous),4201450,Blackie Surface Mine & Prep Plant,Atlas-Dirty Devil Mining,UT,Abandoned,Surface
5,M,Construction Sand and Gravel,1002257,"Hitt Pit, Inc.",Babe's Dirt Work,ID,Abandoned,Surface
6,M,Construction Sand and Gravel,1601167,Bar-Lin Dirt Pit,Bar-Lin Dirt Company,LA,Abandoned,Surface
7,M,Construction Sand and Gravel,4103265,Barber'S Dirt Pit,Barber'S Dirt Pit,TX,Abandoned,Surface
8,M,Construction Sand and Gravel,1401575,BENDER SAND & DIRT,Bender Sand & Dirt,KS,Intermittent,Surface
9,M,Construction Sand and Gravel,1700776,BERT'S DIRT,BERT'S DIRT,ME,Abandoned,Surface
