# Rock and Mineral Clubs

Scrape all of the rock and mineral clubs listed at https://rocktumbler.com/blog/rock-and-mineral-clubs/ (but don't just cut and paste!)

Save a CSV called `rock-clubs.csv` with the name of the club, their URL, and the city they're located in.

**Bonus**: Add a column for the state. There are a few ways to do this, but knowing that `element.parent` goes 'up' one element might be helpful.
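
A minimal sketch of the `element.parent` idea, run against a made-up HTML fragment rather than the real page. The markup layout and the `state_for` helper are assumptions for illustration only; the actual page may nest its elements differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one state heading row followed by one club row.
html = """
<section>
  <table>
    <tr><td colspan="2"><h3>Alabama Rock and Mineral Clubs</h3></td></tr>
    <tr><td><a href="http://example.com/a">Club A</a></td><td>Birmingham</td></tr>
  </table>
</section>
"""

doc = BeautifulSoup(html, "html.parser")
link = doc.find("a")

# .parent goes up exactly one element at a time: <a> -> <td> -> <tr>
cell = link.parent
row = cell.parent


def state_for(element):
    """Hypothetical helper: climb to the enclosing table and read its heading."""
    table = element.find_parent("table")
    heading = table.find("h3").get_text(strip=True)
    return heading.replace("Rock and Mineral Clubs", "").strip()


print(cell.name, row.name)  # td tr
print(state_for(link))      # Alabama
```

`find_parent` keeps climbing until it finds a matching tag, which can be easier than chaining `.parent` several times.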

* _**Hint:** The name of the club and the city are both inside of td elements, and they aren't distinguishable by class. Instead you'll just want to ask for all of the tds and then just ask for the text from the first or second one._
* _**Hint:** If you use BeautifulSoup, you can select elements by attributes other than class or id - instead of `doc.find_all(attrs={'class': 'cat'})` you can do things like `doc.find_all(attrs={'other_attribute': 'blah'})` (sorry for the awful example)._
* _**Hint:** If you love `pd.read_html` you might also be interested in `pd.concat` and potentially `list()`. But you'll have to clean a little more!_
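
The first two hints can be sketched on a small, invented table. The `data-kind` attribute and the example URLs are assumptions made up for this illustration and do not come from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two club rows and one unrelated row.
html = """
<table>
  <tr data-kind="club"><td><a href="http://example.com/a">Club A</a></td><td>Birmingham</td></tr>
  <tr data-kind="club"><td><a href="http://example.com/b">Club B</a></td><td>Dothan</td></tr>
  <tr data-kind="note"><td>Not a club</td><td></td></tr>
</table>
"""

doc = BeautifulSoup(html, "html.parser")

# attrs= filters on any attribute, not just class or id.
rows = doc.find_all("tr", attrs={"data-kind": "club"})

clubs = []
for row in rows:
    # The tds have no distinguishing class, so pick them by position.
    tds = row.find_all("td")
    clubs.append({
        "name": tds[0].get_text(strip=True),
        "url": tds[0].a["href"],
        "city": tds[1].get_text(strip=True),
    })

print(clubs)
```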

In [None]:
#Importing necessary libraries

import requests
import pandas as pd
from bs4 import BeautifulSoup

In [None]:
url = "https://rocktumbler.com/blog/rock-and-mineral-clubs/"
response = requests.get(url)
response.raise_for_status()  #Stop early if the page could not be fetched

soup_doc = BeautifulSoup(response.content, "html.parser")
print(soup_doc.prettify())

In [None]:
sections = soup_doc.find_all('section')[1:]
sections[0]

In [72]:
#Creating list of dictionaries 'rock_list', one dictionary per club

rock_list = []

#Skip the last two sections
for section in sections[:-2]:

    #The first row holds the state heading; the remaining rows are clubs
    trs = section.find_all('tr')
    state = trs[0].find('h3')

    for row in trs[1:]:
        rocks = {}
        rocks['title'] = row.find('a').text
        rocks['state'] = state.text.replace("Rock and Mineral Clubs", "").strip()
        rocks['link'] = row.a['href']
        rocks['city'] = row.find_all('td')[1].text
        rock_list.append(rocks)

rock_list

[{'title': 'Alabama Mineral & Lapidary Society',
  'state': 'Alabama',
  'link': 'http://www.lapidaryclub.com/',
  'city': 'Birmingham'},
 {'title': 'Dothan Gem & Mineral Club',
  'state': 'Alabama',
  'link': 'http://www.wiregrassrockhounds.com/',
  'city': 'Dothan'},
 {'title': 'Huntsville Gem and Mineral Society',
  'state': 'Alabama',
  'link': 'http://huntsvillegms.org/',
  'city': 'Huntsville'},
 {'title': 'Mobile Rock & Gem Society',
  'state': 'Alabama',
  'link': 'http://www.mobilerockandgem.com/',
  'city': 'Mobile'},
 {'title': 'Montgomery Gem & Mineral Society',
  'state': 'Alabama',
  'link': 'http://montgomerygemandmineralsociety.com/mgms/',
  'city': 'Montgomery'},
 {'title': 'Chugach Gem & Mineral Society',
  'state': 'Alaska',
  'link': 'http://www.chugachgemandmineralsociety.com/',
  'city': 'Anchorage'},
 {'title': 'Mat-Su Rock and Mineral Club',
  'state': 'Alaska',
  'link': 'http://matsurockclub.com/',
  'city': 'Palmer'},
 {'title': 'Apache Junction Rock and Gem 

In [76]:
#Creating dataframe rock_df from rock_list

rock_df = pd.DataFrame(rock_list)
rock_df.head(50)

| | title | state | link | city |
|---|---|---|---|---|
| 0 | Alabama Mineral & Lapidary Society | Alabama | http://www.lapidaryclub.com/ | Birmingham |
| 1 | Dothan Gem & Mineral Club | Alabama | http://www.wiregrassrockhounds.com/ | Dothan |
| 2 | Huntsville Gem and Mineral Society | Alabama | http://huntsvillegms.org/ | Huntsville |
| 3 | Mobile Rock & Gem Society | Alabama | http://www.mobilerockandgem.com/ | Mobile |
| 4 | Montgomery Gem & Mineral Society | Alabama | http://montgomerygemandmineralsociety.com/mgms/ | Montgomery |
| 5 | Chugach Gem & Mineral Society | Alaska | http://www.chugachgemandmineralsociety.com/ | Anchorage |
| 6 | Mat-Su Rock and Mineral Club | Alaska | http://matsurockclub.com/ | Palmer |
| 7 | Apache Junction Rock and Gem Club | Arizona | http://www.ajrockclub.com/ | Apache Junction |
| 8 | Black Canyon City Rock Club | Arizona | http://www.bccrockclub.mysite.com/ | Black Canyon City |
| 9 | Daisy Mountain Rock & Mineral Club | Arizona | http://www.dmrmc.com/ | Anthem |


In [75]:
#Creating .csv file from df

rock_df.to_csv('rock-clubs.csv', index=False)
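
A small sketch of why `index=False` matters when saving. It uses an in-memory buffer and a one-row stand-in DataFrame (both assumptions for illustration, not the scraped data) to compare the columns that come back from `pd.read_csv` with and without the index written.

```python
import io

import pandas as pd

# Stand-in data with the same columns as rock_df.
df = pd.DataFrame([{
    "title": "Club A",
    "state": "Alabama",
    "link": "http://example.com/a",
    "city": "Birmingham",
}])

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False writes only the real columns.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'title', 'state', 'link', 'city']
print(cols_without_index)  # ['title', 'state', 'link', 'city']
```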