# Mines, Part 2

You can get information about a specific mine by using its Mine ID.

**Try searching using the Mine ID `3503598`**.

## Preparation: Knowing your tags

These questions are asked of every data set, so some of them might not map exactly onto this page.

### What is the tag and class name for the mine operator name?

In [None]:
# <b> tag within a <font> tag within a <td> tag without any class

### What is the tag and class name for the current controller?

In [None]:
# <b> tag within a <font> tag within a <td> tag without any class

### What is the tag and class name for the operator history area?

In [None]:
# The operator history area is a <table> tag without a class

### What is the tag and class name for the mine's address?

In [None]:
# The mine's address is a <font> tag without a class, inside a <td> tag
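
The tag descriptions above translate fairly directly into BeautifulSoup lookups. As a minimal sketch, assuming a made-up HTML fragment that only imitates the nesting described (the real page's markup may well differ), a CSS selector can pick out a `<b>` inside a `<font>` inside a `<td>` even though none of them has a class:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that only imitates the nesting described above;
# it is not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td><font>Operator:</font></td>
    <td><font><b>Example Operator Co</b></font></td>
  </tr>
</table>
"""

sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "td > font > b" matches a <b> directly inside a <font> directly inside a <td>.
bold_cells = [b.get_text(strip=True) for b in sample_doc.select("td > font > b")]
print(bold_cells)  # ['Example Operator Co']
```

Because nothing here has a class, the selector matches every cell with that nesting, so on a real page you would still need to pick the right match from the list.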

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [169]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Scrape this page

Scrape this page and display:

- The operator
- The current address
- The current controller

**You should know how to do `.post` requests by now.**

In [189]:
data = {
    'MineId':'3503598',
    'x':'0',
    'y':'0'
}
response = requests.post('https://arlweb.msha.gov/drs/ASP/BasicMineInfonew.asp', data=data)
doc = BeautifulSoup(response.text, "html.parser")
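
The request above can also be wrapped in a small helper so it is easy to reuse for other Mine IDs. This is only a sketch: the names `MINE_INFO_URL`, `build_payload`, and `fetch_mine_page`, and the 30-second timeout, are assumptions added for illustration. The URL and form fields are the ones used in the cell above.

```python
import requests

MINE_INFO_URL = "https://arlweb.msha.gov/drs/ASP/BasicMineInfonew.asp"

def build_payload(mine_id):
    """Build the form fields from the cell above; x and y are sent as '0'."""
    return {"MineId": str(mine_id), "x": "0", "y": "0"}

def fetch_mine_page(mine_id, timeout=30):
    """POST the form and return the HTML, raising on HTTP errors or a timeout."""
    response = requests.post(MINE_INFO_URL, data=build_payload(mine_id), timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text

print(build_payload(3503598))  # {'MineId': '3503598', 'x': '0', 'y': '0'}
```

Calling `fetch_mine_page(3503598)` would then return the HTML to hand to `BeautifulSoup`, provided the site is reachable.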

In [228]:
infos_table = doc.find_all('table')[1]
# Every <tr> that follows the table's first <tr>; the indexes below are
# positional and depend on this page's exact layout.
rows = infos_table.find('tr').find_all_next('tr')
print('Operator:', rows[2].find_all('td')[4].text.strip())
print('Address:', rows[17].find_all('td')[1].text.strip())
print('Controller:', rows[10].find_all('td')[3].text.strip())

Operator: Newberg Rock & Dirt
Address: Yamhill County,  OR
Controller: S-2 Contractors Inc
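
Positional indexes like the ones above break as soon as a page has an extra or missing row. One alternative, sketched here on a made-up fragment, is to look for the cell holding a label and read the cell next to it. The helper name `value_after_label` and the label texts `Operator:` and `Current Controller:` are assumptions, so check the real page's labels before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the label wording on the real page may differ.
SAMPLE_HTML = """
<table>
  <tr><td><font>Operator:</font></td><td><font><b>Example Operator Co</b></font></td></tr>
  <tr><td><font>Current Controller:</font></td><td><font><b>Example Controller Inc</b></font></td></tr>
</table>
"""

def value_after_label(doc, label):
    """Return the text of the <td> following the <td> whose text equals label, or None."""
    for cell in doc.find_all("td"):
        if cell.get_text(strip=True) == label:
            neighbour = cell.find_next_sibling("td")
            return neighbour.get_text(strip=True) if neighbour else None
    return None

sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_after_label(sample_doc, "Operator:"))           # Example Operator Co
print(value_after_label(sample_doc, "Current Controller:"))  # Example Controller Inc
print(value_after_label(sample_doc, "Address:"))             # None
```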


## Getting information on many mines

### Reading in our source

Using pandas, read in `mines-subset.csv`.

In [233]:
mines_sublet = pd.read_csv('mines-subset.csv')
mines_sublet.head()

| | id |
|---|---|
| 0 | 2501216 |
| 1 | 3200965 |
| 2 | 2901371 |
| 3 | 2901544 |


## Scrape every single row, storing the current controller and mine operator in new columns.

You probably want to open up the Jupyter Notebook that's about `.apply`.
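
As a quick reminder of the pattern, here is a minimal sketch of `.apply` with `axis=1`, where the function receives one row at a time and returns a `pd.Series` whose keys become columns. The function `describe` and the columns `first_two` and `rest` are made up for illustration and stand in for the scraping step.

```python
import pandas as pd

ids = pd.DataFrame({"id": [2501216, 3200965]})

def describe(row):
    """Stand-in for scraping: derive two new values from the row's id."""
    mine_id = str(row["id"])
    return pd.Series({
        "id": row["id"],
        "first_two": mine_id[:2],  # hypothetical derived column
        "rest": mine_id[2:],       # hypothetical derived column
    })

# axis=1 passes each row to the function; the returned Series become rows
# of a new dataframe, one column per key.
expanded = ids.apply(describe, axis=1)
print(expanded)
```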

In [234]:
def transform(r):
    
    data = {
        'MineId': r['id'],
        'x':'0',
        'y':'0'
    }

    response = requests.post('https://arlweb.msha.gov/drs/ASP/BasicMineInfonew.asp', data=data)
    doc = BeautifulSoup(response.text, "html.parser")
    
    infos_table = doc.find_all('table')[1]
    # Every <tr> that follows the table's first <tr>; the indexes are positional
    rows = infos_table.find('tr').find_all_next('tr')

    return pd.Series({
        'id': r['id'],
        'operator': rows[2].find_all('td')[4].text.strip(),
        'controller': rows[8].find_all('td')[1].text.strip(),
        'address': rows[15].find_all('td')[1].get_text(strip=True)
    })

mines_sublet = mines_sublet.apply(transform, axis=1)
mines_sublet

| | address | controller | id | operator |
|---|---|---|---|---|
| 0 | 24617 W Center RdWaterloo, NE 68069 | David A Iske | 2501216 | Iske Dirt Sand & Gravel |
| 1 | 485 Helene StPalermo, ND 58769 | John Lynn | 3200965 | J M Lynn Dirtwork |
| 2 | E Hwy 60HEREFORD, TX 79045 | Lawson Warner | 2901371 | Jake Diel Dirt & Paving Inc |
| 3 | E Hwy 60HEREFORD, TX 79045 | Lawson Warner | 2901544 | Jake Diel Dirt & Paving Inc |
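
The addresses in this output run the street and city together (`24617 W Center RdWaterloo`) because `get_text(strip=True)` joins the pieces of text in the cell with nothing between them. A small sketch of one way around that, assuming the two address lines are separated by a `<br>` inside a single cell (an assumption about the page's markup):

```python
from bs4 import BeautifulSoup

# Assumed markup: the two address lines sit in one cell, split by a <br>.
cell = BeautifulSoup("<td>24617 W Center Rd<br>Waterloo, NE 68069</td>", "html.parser").td

fused = cell.get_text(strip=True)
separated = cell.get_text(separator=", ", strip=True)

print(fused)      # 24617 W Center RdWaterloo, NE 68069
print(separated)  # 24617 W Center Rd, Waterloo, NE 68069
```

The `separator` argument is placed between the stripped text fragments, so the street and city stay readable.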


### Save your dataframe

In [235]:
mines_sublet.to_csv('mines-complete.csv', index=False)

### Re-open your dataframe to confirm you didn't save any extra weird columns

In [236]:
pd.read_csv('mines-complete.csv')

| | address | controller | id | operator |
|---|---|---|---|---|
| 0 | 24617 W Center RdWaterloo, NE 68069 | David A Iske | 2501216 | Iske Dirt Sand & Gravel |
| 1 | 485 Helene StPalermo, ND 58769 | John Lynn | 3200965 | J M Lynn Dirtwork |
| 2 | E Hwy 60HEREFORD, TX 79045 | Lawson Warner | 2901371 | Jake Diel Dirt & Paving Inc |
| 3 | E Hwy 60HEREFORD, TX 79045 | Lawson Warner | 2901544 | Jake Diel Dirt & Paving Inc |
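
The "extra weird column" this check guards against is the dataframe index, which `to_csv` writes by default and `read_csv` then reads back as a column named `Unnamed: 0`. A minimal sketch of the difference, using in-memory buffers instead of files and a one-row dataframe built from the output above:

```python
import io
import pandas as pd

frame = pd.DataFrame({"id": [2501216], "operator": ["Iske Dirt Sand & Gravel"]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)

# index=False: only the real columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)

columns_with_index = list(pd.read_csv(with_index).columns)
columns_without_index = list(pd.read_csv(without_index).columns)
print(columns_with_index)     # ['Unnamed: 0', 'id', 'operator']
print(columns_without_index)  # ['id', 'operator']
```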


## Repeat this process for the entire `mines.csv` file
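
Running against a whole file means one request per row, so it may be worth pausing between requests and carrying on when a single mine fails. The sketch below is one possible shape for that and is not part of the original assignment. The helper `scrape_politely`, its `delay_seconds` parameter, and the stand-in `fake_fetch` are all hypothetical names, and `fake_fetch` replaces the real network call so the example runs offline.

```python
import time

def scrape_politely(mine_ids, fetch, delay_seconds=1.0):
    """Call fetch(mine_id) for each id, pausing between calls and recording failures as None."""
    results = {}
    for position, mine_id in enumerate(mine_ids):
        if position > 0:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        try:
            results[mine_id] = fetch(mine_id)
        except Exception as error:  # keep going if one mine fails
            results[mine_id] = None
            print(f"Mine {mine_id} failed: {error}")
    return results

# Stand-in for a real request so the sketch runs without network access.
def fake_fetch(mine_id):
    if mine_id == 502030:
        raise ValueError("no data table on page")
    return f"<html>page for {mine_id}</html>"

pages = scrape_politely([502030, 4801789], fake_fetch, delay_seconds=0)
print(pages)
```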

In [277]:
def transform(r):
    
    data = {
        'MineId': r['Operator ID'],
        'x':'0',
        'y':'0'
    }
    
    response = requests.post('https://arlweb.msha.gov/drs/ASP/BasicMineInfonew.asp', data=data)
    doc = BeautifulSoup(response.text, "html.parser")
    
    tables = doc.find_all('table')
    if len(tables) > 1:
        # Every <tr> that follows the table's first <tr>; the indexes are positional
        rows = tables[1].find('tr').find_all_next('tr')

        return pd.Series({
            'controller': rows[8].find_all('td')[1].text.strip(),
            'address': rows[15].find_all('td')[1].get_text(strip=True)
        })
    else:
        # No data table on the page: store real missing values rather than the string 'NaN'
        return pd.Series({
            'controller': None,
            'address': None
        })
    
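# [1:10] keeps only rows 1 through 9 as a trial run; remove the slice to process the entire file
mines = pd.read_csv('mines.csv')[1:10]
# .join matches the scraped columns to the original rows by index label
merged = mines.apply(transform, axis=1).join(mines)
merged.head()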

| | address | controller | Coal or metal | Commodity | Mine name | Mine type | Operator ID | Operator name | State | Status |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | M | Construction Sand and Gravel | Allied Dirt Moving Co Pit & Plant | Surface | 502030 | Allied Dirt Moving Company | CO | Abandoned |
| 2 | 120 Dally LnBuffalo, WY 82834 | Matt Mitchell | M | Construction Sand and Gravel | AM Dirtworks & Aggregate Sales | Surface | 4801789 | AM Dirtworks & Aggregate Sales | ND | Intermittent |
| 3 | 8002 Dogwood TrailHAUGHTON, LA 71037 | Barlow James & John Lindsey | M | Construction Sand and Gravel | Bar-Lin Dirt Pit | Surface | 1601167 | Bar-Lin Dirt Company | LA | Abandoned |
| 4 | Orange County, TX | Barber'S Dirt Pit | M | Construction Sand and Gravel | Barber'S Dirt Pit | Surface | 4103265 | Barber'S Dirt Pit | TX | Abandoned |
| 5 | Oreana Star RteLOVELOCK, NV 89419 | Cramer Basil | M | Gold Ore | Pay Dirt | Surface | 2601714 | Basil Cramer | NV | Abandoned |
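
The row labels in this output start at 1 because the slice keeps the original index, and `.join` lines the scraped columns up with the source rows by those labels. A minimal sketch of that behaviour, with placeholder values standing in for scraped results:

```python
import pandas as pd

# Three rows, then a slice like the one above; the kept rows retain labels 1 and 2.
mines = pd.DataFrame({"Operator ID": [502030, 4801789, 1601167]})[1:3]

# Stand-in for the scraped columns (placeholder values), indexed like the slice.
scraped = pd.DataFrame({"controller": ["placeholder A", "placeholder B"]}, index=mines.index)

# join aligns on index labels, so each scraped row lands next to its source row.
merged = scraped.join(mines)
print(list(merged.index))    # [1, 2]
print(list(merged.columns))  # ['controller', 'Operator ID']
```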
