# School Board Minutes

Scrape all of the school board minutes from http://www.mineral.k12.nv.us/pages/School_Board_Minutes

Save a CSV called `minutes.csv` with the date and the URL to the file. The date should be formatted as YYYY-MM-DD.

**Bonus:** Download the PDF files

**Bonus 2:** Use [PDF OCR X](https://solutions.weblite.ca/pdfocrx/index.php) on one of the PDF files and see if it can be converted into text successfully.

* **Hint:** If you're just looking for links, there are a lot of other links on that page! Can you look at a link to tell whether it points to minutes or not? You'll want to use an "if" statement.
* **Hint:** You could also filter out bad links later on using pandas instead of while scraping.
* **Hint:** If you get a weird error that you can't really figure out, you can always tell Python to just ignore it using `try` and `except`, like below. Python will try to run the code inside of `try`, but if it hits an error it will skip right out to `except` instead of crashing.
* **Hint:** Remember the codes at http://strftime.org
* **Hint:** If you have a pandas column of dates that you've parsed, you can use `.dt.strftime` to turn it into specially-formatted strings. You use the same codes (like `%B` etc.) that you use for converting strings into dates.

```python
try:
    # Put the code that might fail here, for example:
    value = int("not a number")
except Exception:
    # The error is ignored and Python keeps going
    pass
```
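
The hints above can be combined into one small loop. This is only a sketch using the standard library: the `sample_links` pairs are made-up stand-ins for the page's links, and both the `"minutes"` check on the href and the `"%B %d, %Y"` date format are assumptions about what the real page looks like.

```python
from datetime import datetime

# Hypothetical (link text, href) pairs standing in for the links on the page
sample_links = [
    ("Home", "/"),
    ("June 4, 2019", "/files/example_a_minutes.pdf"),
    ("Contact Us", None),
    ("May 28, 2019", "/files/example_b_minutes.pdf"),
]

rows = []
for text, href in sample_links:
    # The "if" hint: skip links with no href or ones that don't look like minutes
    if not href or "minutes" not in href.lower():
        continue
    try:
        # Assumed format for the link text, e.g. "June 4, 2019"
        parsed = datetime.strptime(text.strip(), "%B %d, %Y")
    except ValueError:
        # The link text wasn't a date in that format, so skip this link
        continue
    # strftime uses the same codes to build a YYYY-MM-DD string
    rows.append({"date": parsed.strftime("%Y-%m-%d"), "URL": href})

print(rows)
```

Catching only `ValueError` here, rather than every possible error, means a typo elsewhere in the loop will still show up instead of being silently skipped.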

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
response = requests.get("http://www.mineral.k12.nv.us/pages/School_Board_Minutes")
# Name the parser explicitly so BeautifulSoup doesn't warn or guess
doc = BeautifulSoup(response.text, "html.parser")

In [3]:
minutes = []
links = doc.find_all('a')
for link in links:
    minute = {}
    minute['date'] = link.text
    minute['URL1'] = link.get('href')
    minutes.append(minute)

In [4]:
import pandas as pd
df = pd.DataFrame(minutes)

In [5]:
df = df.dropna(subset=['URL1', 'date'])
df = df[df.date.str.contains(',')]

In [6]:
df['date'] = pd.to_datetime(df.date)
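
If you want the dates stored as explicit YYYY-MM-DD strings instead of relying on how pandas writes datetimes, `.dt.strftime` can do that. Here is a small sketch on a made-up dataframe called `example`; the sample dates and the `"%B %d, %Y"` input format are assumptions about the link text, not values taken from the page.

```python
import pandas as pd

# Hypothetical link text standing in for the scraped 'date' column
example = pd.DataFrame({"date": ["June 4, 2019", "May 28, 2019"]})

# Parse the text into real datetimes first
example["date"] = pd.to_datetime(example["date"], format="%B %d, %Y")

# Then format them back into strings with the strftime codes
example["date_string"] = example["date"].dt.strftime("%Y-%m-%d")

print(example["date_string"].tolist())
```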

In [7]:
df['URL'] = "http://www.mineral.k12.nv.us" + df.URL1
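
Gluing the domain onto every href works when all of the hrefs start with `/`. If some might already be full URLs, `urljoin` from the standard library is a safer option. A small sketch, where both hrefs are made up for illustration:

```python
from urllib.parse import urljoin

base = "http://www.mineral.k12.nv.us/pages/School_Board_Minutes"

# Hypothetical hrefs: one relative to the site root, one already absolute
relative_href = "/files/example_minutes.pdf"
absolute_href = "http://example.com/other.pdf"

relative_result = urljoin(base, relative_href)
absolute_result = urljoin(base, absolute_href)

print(relative_result)  # http://www.mineral.k12.nv.us/files/example_minutes.pdf
print(absolute_result)  # http://example.com/other.pdf
```

On the dataframe this could be applied with something like `df.URL1.apply(lambda href: urljoin(base, href))`.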

In [8]:
df = df[['date', 'URL']]
df.head()

| | date | URL |
|---|---|---|
| 22 | 2019-06-04 | http://www.mineral.k12.nv.us/files/6.4.19_minu... |
| 23 | 2019-05-28 | http://www.mineral.k12.nv.us/files/5.28.19_min... |
| 24 | 2019-05-07 | http://www.mineral.k12.nv.us/files/5.7.19_minu... |
| 25 | 2019-04-23 | http://www.mineral.k12.nv.us/files/4.23.19_min... |
| 26 | 2019-04-08 | http://www.mineral.k12.nv.us/files/4.8.19_minu... |


In [9]:
df.to_csv("minutes.csv", index=False)
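
For the first bonus, one possible approach is to loop over the URLs and save each response to disk. This is a sketch, not a tested solution for this site: the helper names `filename_from_url` and `download_pdfs`, the `minutes_pdfs` folder, and the 30-second timeout are all made up for illustration.

```python
import os
from urllib.parse import urlparse

import requests


def filename_from_url(url):
    """Use the last part of the URL path as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_pdfs(urls, folder="minutes_pdfs"):
    """Download each URL into folder and return the paths that were saved."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
        except requests.RequestException as error:
            # Skip files that fail to download instead of stopping the loop
            print(f"Skipping {url}: {error}")
            continue
        path = os.path.join(folder, filename_from_url(url))
        # PDFs are binary, so write the raw bytes
        with open(path, "wb") as f:
            f.write(response.content)
        saved.append(path)
    return saved
```

With the dataframe from above, calling `download_pdfs(df.URL)` would try each file in turn.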