# Rock and Mineral Clubs

Scrape all of the rock and mineral clubs listed at https://rocktumbler.com/blog/rock-and-mineral-clubs/ (but don't just cut and paste!)

Save a CSV called `rock-clubs.csv` with the name of the club, their URL, and the city they're located in.

**Bonus**: Add a column for the state. There are a few ways to do this, but knowing that `element.parent` goes 'up' one element might be helpful.
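One way to use `.parent` for the bonus is to climb from a club's cell to its enclosing `table`, then look back for the nearest `h3` heading. This is a minimal sketch against made-up markup. The page layout (one `h3` per state followed by a table of clubs) is an assumption, and the club names, cities, and `example.com` URLs are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each state gets an h3 heading followed by a table of clubs
html = """
<h3>Alabama Rock and Mineral Clubs</h3>
<table>
  <tr><td width="60%"><a href="http://example.com/a">Club A</a></td><td width="40%">Birmingham</td></tr>
</table>
<h3>New York Rock and Mineral Clubs</h3>
<table>
  <tr><td width="60%"><a href="http://example.com/b">Club B</a></td><td width="40%">Albany</td></tr>
</table>
"""
doc = BeautifulSoup(html, "html.parser")


def state_for(cell):
    """Climb from a td to its enclosing table, then find the h3 just before that table."""
    node = cell
    while node is not None and node.name != "table":
        node = node.parent  # .parent goes 'up' one element: td -> tr -> (tbody ->) table
    heading = node.find_previous_sibling("h3")
    return heading.get_text(strip=True).replace(" Rock and Mineral Clubs", "")


for cell in doc.find_all("td", attrs={"width": "60%"}):
    print(cell.get_text(strip=True), "->", state_for(cell))
```

Looping upward until a `table` appears, rather than hard-coding two `.parent` hops, keeps the sketch working whether or not the markup contains a `tbody`.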

* _**Hint:** The club name and the city are both inside `td` elements, and they can't be told apart by class. Instead, ask for all of the `td`s in a row and take the text from the first or second one._
* _**Hint:** If you use BeautifulSoup, you can select elements by attributes other than class or id._
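Both hints can be tried on a tiny stand-in table. This is a sketch only: the markup, club names, and `example.com` URL are made up. It reads each row's `td`s by position (name first, city second), which also keeps each name paired with its own city. It then shows two equivalent ways to select by the `width` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one state's table on the real page
html = """
<table>
  <tr><td width="60%"><a href="http://example.com/a">Club A</a></td><td width="40%">Birmingham</td></tr>
  <tr><td width="60%">Club B</td><td width="40%">Dothan</td></tr>
</table>
"""
doc = BeautifulSoup(html, "html.parser")

# Hint 1: take all the tds in a row, then read the first (name) and second (city)
clubs = []
for row in doc.find_all("tr"):
    tds = row.find_all("td")
    if len(tds) < 2:
        continue  # skip rows without at least two cells, e.g. header rows
    link = tds[0].find("a")
    clubs.append({
        "name": tds[0].get_text(strip=True),
        "URL": link.get("href") if link else None,  # Club B has no link here
        "city": tds[1].get_text(strip=True),
    })

# Hint 2: select by an attribute other than class or id, two equivalent ways
by_attrs = doc.find_all("td", attrs={"width": "60%"})
by_css = doc.select('td[width="60%"]')

print(clubs)
```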

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
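response = requests.get("https://rocktumbler.com/blog/rock-and-mineral-clubs/")
response.raise_for_status()  # stop early if the page didn't load
doc = BeautifulSoup(response.text, "html.parser")  # name the parser explicitly to avoid a warning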

In [3]:
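clubs = []
# Club names live in the 60%-wide cells (cities are in the 40%-wide ones)
lines = doc.find_all('td', attrs={'width': "60%"})
for line in lines:
    club = {}
    link = line.find('a')
    club['URL'] = link.get('href') if link else None  # not every club may list a website
    club['name'] = line.get_text(strip=True)
    # The nearest h3 above each row is its state heading, e.g. "Alabama Rock and Mineral Clubs"
    club['belongs_to'] = line.find_previous('h3').get_text(strip=True)
    clubs.append(club)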

In [4]:
import pandas as pd
df = pd.DataFrame(clubs)

In [5]:
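citylist = []
cities = doc.find_all('td', attrs={'width': "40%"})
for city in cities:
    citylist.append(city.get_text(strip=True))
# Assumes the 40% cells line up one-to-one with the 60% cells; pandas raises if the lengths differ
df['city'] = citylist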

In [6]:
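# Raw string avoids an invalid-escape warning; (.+?) keeps multi-word states like "New York" whole,
# where (\w+) would capture only "York"
df['state'] = df['belongs_to'].str.extract(r"^\s*(.+?)\s+Rock and Mineral Clubs", expand=False)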
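The choice of pattern matters for states whose names have more than one word. Here is a quick check with Python's `re` module on a few sample headings. The headings are assumed to follow the "State Rock and Mineral Clubs" form that the extraction cell expects.

```python
import re

headings = [
    "Alabama Rock and Mineral Clubs",
    "New York Rock and Mineral Clubs",
    "District of Columbia Rock and Mineral Clubs",
]

one_word = re.compile(r"(\w+) Rock and Mineral Clubs")
any_words = re.compile(r"^\s*(.+?)\s+Rock and Mineral Clubs")

for h in headings:
    # search() tries each start position in turn, so (\w+) only succeeds on the last word before "Rock"
    print(one_word.search(h).group(1), "|", any_words.search(h).group(1))
```

The single-word pattern yields `York` and `Columbia` for the last two headings, while the lazy `(.+?)` version keeps `New York` and `District of Columbia` intact.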

In [7]:
df = df[['name', 'URL', 'city', 'state']]
df.head()

|   | name | URL | city | state |
|---|------|-----|------|-------|
| 0 | Alabama Mineral & Lapidary Society | http://www.lapidaryclub.com/ | Birmingham | Alabama |
| 1 | Dothan Gem & Mineral Club | http://www.wiregrassrockhounds.com/ | Dothan | Alabama |
| 2 | Huntsville Gem and Mineral Society | http://huntsvillegms.org/ | Huntsville | Alabama |
| 3 | Mobile Rock & Gem Society | http://www.mobilerockandgem.com/ | Mobile | Alabama |
| 4 | Montgomery Gem & Mineral Society | http://montgomerygemandmineralsociety.com/mgms/ | Montgomery | Alabama |


In [8]:
df.to_csv("rock-clubs.csv", index=False)
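If pandas isn't needed for anything else, the standard library's `csv.DictWriter` can write the same list of dicts directly. This is a minimal sketch, assuming each dict already carries `name`, `URL`, `city`, and `state` keys; the two sample rows are copied from the output table above. `io.StringIO` stands in for the file so the result is easy to inspect.

```python
import csv
import io

clubs = [
    {"name": "Alabama Mineral & Lapidary Society", "URL": "http://www.lapidaryclub.com/",
     "city": "Birmingham", "state": "Alabama"},
    {"name": "Dothan Gem & Mineral Club", "URL": "http://www.wiregrassrockhounds.com/",
     "city": "Dothan", "state": "Alabama"},
]

buffer = io.StringIO()  # swap for open("rock-clubs.csv", "w", newline="") to write a real file
# extrasaction="ignore" silently drops extra keys such as 'belongs_to'
writer = csv.DictWriter(buffer, fieldnames=["name", "URL", "city", "state"], extrasaction="ignore")
writer.writeheader()
writer.writerows(clubs)
print(buffer.getvalue())
```

Fixing `fieldnames` up front plays the same role as the column reordering in `df[['name', 'URL', 'city', 'state']]`.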