# Rock and Mineral Clubs

Scrape all of the rock and mineral clubs listed at https://rocktumbler.com/blog/rock-and-mineral-clubs/ (but don't just cut and paste!)

Save a CSV called `rock-clubs.csv` with each club's name, its URL, and the city it's located in.

**Bonus**: Add a column for the state. There are a few ways to do this, but knowing that `element.parent` goes 'up' one element might be helpful.
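
As a minimal sketch of the `.parent` idea, the snippet below walks up from a table row to the element that encloses it and reads a heading from there. The HTML fragment is made up for illustration: the `section`/`h3`/`table` layout and the heading text are assumptions, not a copy of the real page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one section per state, with the state named in an h3.
html = """
<section>
  <h3>Alabama Rock and Mineral Clubs</h3>
  <table>
    <tr><th>Club</th><th>City</th></tr>
    <tr><td><a href="http://example.com/">Example Club</a></td><td>Birmingham</td></tr>
  </table>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find_all("tr")[1]          # first data row (index 0 is the header)

table = row.parent                    # one step up: the <table>
section = table.parent                # two steps up: the <section>
heading = section.find("h3").text

# find_parent() jumps straight to a named ancestor without counting steps.
same_section = row.find_parent("section")
```

Counting `.parent` steps depends on the parser, since some parsers insert a `tbody` between `table` and `tr`; `find_parent` is one way to avoid that dependency.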

* _**Hint:** The club name and the city are both inside td elements that can't be told apart by class. Instead, ask for all of the tds in a row and take the text from the first or second one._
* _**Hint:** If you use BeautifulSoup, you can select elements by attributes other than class or id._
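
Both hints can be tried out on a small, invented row before touching the real page. This is only a sketch: the `target` attribute, the club name, and the URL below are placeholders, and the real page's attributes may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical row; the real page's attributes may differ.
html = """
<table>
  <tr>
    <td><a href="http://example.com/" target="_blank">Example Club</a></td>
    <td>Springfield</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find("tr").find_all("td")

club = cells[0].text.strip()       # first td: club name
city = cells[1].text.strip()       # second td: city
url = cells[0].find("a")["href"]   # the link lives inside the first td

# Selecting by an attribute other than class or id:
links_by_attrs = soup.find_all("a", attrs={"target": "_blank"})
links_by_css = soup.select('a[href^="http"]')   # CSS attribute selector
```

`find_all(..., attrs={...})` and `select(...)` with a CSS attribute selector are two routes to the same elements; which one reads better is mostly a matter of taste.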

In [1]:
import requests
import pandas as pd
import re

from bs4 import BeautifulSoup

In [2]:
url = 'https://rocktumbler.com/blog/rock-and-mineral-clubs/'
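# verify=False turns off TLS certificate checks; drop it if the site's certificate validates.
response = requests.get(url, verify=False)
# Name the parser explicitly so the result doesn't depend on which parsers are installed.
doc = BeautifulSoup(response.text, 'html.parser')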

In [3]:
doc

<!DOCTYPE doctype html>

<html>
<head>
<meta charset="utf-8"/>
<link href="https://rocktumbler.com/blog/rock-and-mineral-clubs/" rel="canonical"/>
<title>450+ Rock and Mineral Clubs Across the USA</title>
<meta content="Rock mineral and gem clubs are a great way to meet people with knowledge and interest in the field. Find one near you." name="description"/>
<meta content="Rock and mineral clubs in the United States" name="page-topic"/>
<link href="https://rocktumbler.com/cssa.css" media="all" rel="stylesheet" type="text/css"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<link href="https://rocktumbler.com/favicon.ico" rel="SHORTCUT ICON"/>
<script async="async" src="https://www.googletagservices.com/tag/js/gpt.js"></script>
<script>
  var googletag = googletag || {};
  googletag.cmd = googletag.cmd || [];
</script>
<script>
  googletag.cmd.push(function() {
    googletag.defineSlot('/10057

In [11]:
# Slice off the first section and the last three.
tables = doc.find_all('section')[1:-3]
LIST = []
for table in tables:
    rows = table.find_all('tr')[1:]  # skip the header row
    for row in rows:
        cells = row.find_all('td')
        DIC = {}
        DIC['club'] = cells[0].text
        DIC['url'] = cells[0].find('a')['href']
        DIC['city'] = cells[1].text
        # Climb from the row to its section and take the first capitalized word of the h3.
        DIC['state'] = re.findall(r'([A-Z]\w*)', row.parent.parent.find('h3').text)[0]
        LIST.append(DIC)
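
One thing to watch in the state column: the pattern `[A-Z]\w*` followed by `[0]` keeps only the first capitalized word, so a two-word state name would be cut short. The sketch below shows one possible alternative. It assumes each heading is the state name followed by the fixed text " Rock and Mineral Clubs"; that suffix and the helper `state_from_heading` are hypothetical and should be checked against the real headings.

```python
import re

# Assumed heading format; check it against the real page before relying on it.
SUFFIX = " Rock and Mineral Clubs"

def state_from_heading(heading: str) -> str:
    """Return the heading text with the assumed suffix removed."""
    heading = heading.strip()
    if heading.endswith(SUFFIX):
        return heading[: -len(SUFFIX)]
    return heading  # unexpected format: return it unchanged for inspection

# The first-capitalized-word approach truncates multi-word names.
first_word = re.findall(r"([A-Z]\w*)", "New Mexico Rock and Mineral Clubs")[0]
full_name = state_from_heading("New Mexico Rock and Mineral Clubs")
```

Returning the heading unchanged when the suffix is missing makes odd rows easy to spot later with something like `df['state'].unique()`.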

In [10]:
LIST

[{'club': 'Alabama Mineral & Lapidary Society',
  'url': 'http://www.lapidaryclub.com/',
  'city': 'Birmingham',
  'state': 'Alabama'},
 {'club': 'Dothan Gem & Mineral Club',
  'url': 'http://www.wiregrassrockhounds.com/',
  'city': 'Dothan',
  'state': 'Alabama'},
 {'club': 'Huntsville Gem and Mineral Society',
  'url': 'http://huntsvillegms.org/',
  'city': 'Huntsville',
  'state': 'Alabama'},
 {'club': 'Mobile Rock & Gem Society',
  'url': 'http://www.mobilerockandgem.com/',
  'city': 'Mobile',
  'state': 'Alabama'},
 {'club': 'Montgomery Gem & Mineral Society',
  'url': 'http://montgomerygemandmineralsociety.com/mgms/',
  'city': 'Montgomery',
  'state': 'Alabama'},
 {'club': 'Chugach Gem & Mineral Society',
  'url': 'http://www.chugachgemandmineralsociety.com/',
  'city': 'Anchorage',
  'state': 'Alaska'},
 {'club': 'Mat-Su Rock and Mineral Club',
  'url': 'http://matsurockclub.com/',
  'city': 'Palmer',
  'state': 'Alaska'},
 {'club': 'Apache Junction Rock and Gem Club',
  'url':

In [6]:
df = pd.DataFrame(LIST)
df.shape

(481, 4)

In [7]:
df.head()

| | city | club | state | url |
|---|---|---|---|---|
| 0 | Birmingham | Alabama Mineral & Lapidary Society | Alabama | http://www.lapidaryclub.com/ |
| 1 | Dothan | Dothan Gem & Mineral Club | Alabama | http://www.wiregrassrockhounds.com/ |
| 2 | Huntsville | Huntsville Gem and Mineral Society | Alabama | http://huntsvillegms.org/ |
| 3 | Mobile | Mobile Rock & Gem Society | Alabama | http://www.mobilerockandgem.com/ |
| 4 | Montgomery | Montgomery Gem & Mineral Society | Alabama | http://montgomerygemandmineralsociety.com/mgms/ |


In [8]:
df.to_csv("rock-clubs.csv", index=False)
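
A quick way to check the saved file is to read it back and look at the columns. The sketch below does the round trip in memory with two made-up rows standing in for the scraped data, and fixes the column order explicitly; the names and URLs are placeholders.

```python
import io
import pandas as pd

# Made-up rows standing in for the scraped list.
rows = [
    {"club": "Example Club", "url": "http://example.com/", "city": "Springfield", "state": "Ohio"},
    {"club": "Sample Gem & Mineral Society", "url": "http://example.org/", "city": "Salem", "state": "Oregon"},
]
sample = pd.DataFrame(rows, columns=["club", "url", "city", "state"])

buffer = io.StringIO()
sample.to_csv(buffer, index=False)   # index=False keeps the row numbers out of the file
buffer.seek(0)

reloaded = pd.read_csv(buffer)
```

Without `index=False`, the row numbers are written as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back.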