**Scrape an example HTML page for data science meetings. The HTML looks like this:**
```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Data Science Meetings</title>
  </head>
  <body>
    <h1>2017 Meeting Schedule</h1>
      
    <div class="information">
      <h2>Learn Data Science</h2>
      <p title="description">Learn Data science with Julia, Python &amp; R (Jupyter)</p>
      <address>216 N Mosley St, Wichita, KS 67202</address>
      <p>Time: 6:30 PM <strong>Meeting lasts 1 hour</strong></p>
      <p><a href="https://www.continuum.io/why-anaconda">Anaconda</a></p>
      <a href="mailto:datascience@goat.org">datascience@goat.org</a>
    </div>
      
    <ul id="meeting-list">
      <li>January 23</li>
      <li>February 12</li>
      <li>March 9</li>
      <li>April 30</li>
      <li>May 1</li>
    </ul>
  </body>
</html>
```

**Import libraries**

In [1]:
import urllib.request as request
from bs4 import BeautifulSoup as BS
import re

**Fetch the sample meeting schedule HTML page**

Then parse it into a BeautifulSoup object for easy querying.

In [40]:
url = "http://localhost:8888/files/Open%20Wichita/files/meeting_schedule.html"
with request.urlopen(url) as response:
    html = response.read()
    
soup = BS(html, 'html.parser')
type(soup)

bs4.BeautifulSoup
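
The URL above only works while a local Jupyter server is serving the file. As a minimal sketch, the same parsing can be tried on an inline string instead. `SAMPLE_HTML` is a trimmed copy of the page and, like `soup_from_string`, is a name made up for this example.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the sample page, inlined so no web server is needed.
SAMPLE_HTML = """
<html>
  <body>
    <div class="information">
      <h2>Learn Data Science</h2>
      <address>216 N Mosley St, Wichita, KS 67202</address>
    </div>
    <ul id="meeting-list">
      <li>January 23</li>
      <li>February 12</li>
    </ul>
  </body>
</html>
"""

soup_from_string = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Look up elements by class and by id, the two lookups this notebook relies on.
title_text = soup_from_string.find(class_="information").h2.get_text()
dates = [li.get_text() for li in soup_from_string.find(id="meeting-list").find_all("li")]

print(title_text)
print(dates)
```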

**Get the information div**

It contains the meeting details: title, description, address, time and links.

In [3]:
info = soup.find(class_="information")
info

<div class="information">
<h2>Learn Data Science</h2>
<p title="description">Learn Data science with Julia, Python &amp; R (Jupyter)</p>
<address>216 N Mosley St, Wichita, KS 67202</address>
<p>Time: 6:30 PM <strong>Meeting lasts 1 hour</strong></p>
<p><a href="https://www.continuum.io/why-anaconda">Anaconda</a></p>
<a href="mailto:datascience@goat.org">datascience@goat.org</a>
</div>

**Create meeting fields dictionary**

In [4]:
fields = {"type": "data_science"}

**Get the meeting summary (title)**

It's in the h2 tag, which Beautiful Soup exposes as an attribute of the information div we got above.

In [5]:
summary = info.h2
summary

<h2>Learn Data Science</h2>

**get_text extracts text from a tag**

In [6]:
summary.get_text()

'Learn Data Science'

In [7]:
fields["summary"] = summary.get_text()
fields

{'summary': 'Learn Data Science', 'type': 'data_science'}

**The description is in the p tag that has a title attribute**

In [8]:
description = info.find(lambda tag: tag.has_attr("title"))
description

<p title="description">Learn Data science with Julia, Python &amp; R (Jupyter)</p>

In [9]:
fields["description"] = description.get_text()

**Address is in the address tag**

In [10]:
address = info.address
address

<address>216 N Mosley St, Wichita, KS 67202</address>

In [11]:
fields["location"] = address.get_text()

**Find the meeting time by matching "Time" in the text**

In [12]:
meeting_time = info.find(string=re.compile("Time"))
meeting_time

'Time: 6:30 PM '

In [13]:
type(meeting_time)

bs4.element.NavigableString

**Match the hour and minutes**

In [14]:
pattern = re.compile(r"\d{1,2}:\d\d")  # raw string so \d reaches the regex engine unchanged
match = pattern.findall(meeting_time)
match

['6:30']

In [15]:
hour, minute = match[0].split(":")
hour, minute = int(hour), int(minute)
hour, minute

(6, 30)

In [16]:
'PM' in meeting_time

True

In [17]:
# 12 PM is already hour 12 on a 24-hour clock, so only shift 1-11 PM
if 'PM' in meeting_time and hour != 12:
    hour += 12
    
hour

18
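
Twelve-hour times have two edge cases: 12 PM (noon) stays at hour 12, and 12 AM (midnight) becomes hour 0. The following is a minimal sketch of a helper that handles both. `to_24_hour` and `TIME_PATTERN` are illustrative names, not part of the notebook or of any library.

```python
import re

# Hour, minute and an AM/PM marker, e.g. "6:30 PM" inside "Time: 6:30 PM ".
TIME_PATTERN = re.compile(r"(\d{1,2}):(\d{2})\s*([AP]M)", re.IGNORECASE)

def to_24_hour(text):
    """Return (hour, minute) on a 24-hour clock for the first time found in text."""
    match = TIME_PATTERN.search(text)
    if match is None:
        raise ValueError("no 12-hour time found in {!r}".format(text))
    hour, minute = int(match.group(1)), int(match.group(2))
    is_pm = match.group(3).upper() == "PM"
    # hour % 12 maps 12 to 0, so noon becomes 0 + 12 and midnight stays 0.
    hour = hour % 12 + (12 if is_pm else 0)
    return hour, minute

print(to_24_hour("Time: 6:30 PM "))  # (18, 30)
print(to_24_hour("12:15 PM"))        # (12, 15)
print(to_24_hour("12:05 AM"))        # (0, 5)
```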

In [18]:
date_props = {"hour": hour, "minute": minute}

**Get the meeting length**

In [19]:
duration = info.find("strong")
duration

<strong>Meeting lasts 1 hour</strong>

In [20]:
match_day = re.search(r"\d+", duration.get_text())  # one or more digits: the meeting length in hours
match_day

<_sre.SRE_Match object; span=(14, 15), match='1'>

In [21]:
fields["duration"] = int(match_day.group())
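
The duration is stored as a bare number of hours. If an end time is needed, one option is to turn it into a `timedelta` and add it to the start. This is only a sketch: `parse_duration` is a hypothetical helper, and it assumes the text always gives the length in whole hours.

```python
import re
from datetime import datetime, timedelta

def parse_duration(text):
    """Return a timedelta for text such as 'Meeting lasts 1 hour' (whole hours only)."""
    match = re.search(r"(\d+)\s*hours?", text)
    if match is None:
        raise ValueError("no duration in hours found in {!r}".format(text))
    return timedelta(hours=int(match.group(1)))

start = datetime(2017, 5, 1, 18, 30)
end = start + parse_duration("Meeting lasts 1 hour")
print(end)  # 2017-05-01 19:30:00
```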

**Find the links in information**

In [22]:
links = info.find_all("a")
links

[<a href="https://www.continuum.io/why-anaconda">Anaconda</a>,
 <a href="mailto:datascience@goat.org">datascience@goat.org</a>]

In [23]:
agenda_link = links[0]
agenda_link.attrs

{'href': 'https://www.continuum.io/why-anaconda'}

In [24]:
fields["agenda"] = agenda_link.attrs['href']

In [25]:
fields["email"] = links[1].text
fields

{'agenda': 'https://www.continuum.io/why-anaconda',
 'description': 'Learn Data science with Julia, Python & R (Jupyter)',
 'duration': 1,
 'email': 'datascience@goat.org',
 'location': '216 N Mosley St, Wichita, KS 67202',
 'summary': 'Learn Data Science',
 'type': 'data_science'}
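
The email above comes from the link text, which happens to match the address. If the link text were a label such as "Contact us", the address could be read from the `href` instead. A minimal sketch using the standard library follows; `email_from_mailto` is a name invented for this example.

```python
from urllib.parse import urlparse

def email_from_mailto(href):
    """Return the address part of a mailto: link, ignoring any ?subject=... query."""
    parsed = urlparse(href)
    if parsed.scheme != "mailto":
        raise ValueError("not a mailto link: {!r}".format(href))
    return parsed.path

print(email_from_mailto("mailto:datascience@goat.org"))  # datascience@goat.org
```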

**Extract meeting dates**

In [26]:
meeting_list = soup.find(id="meeting-list")
meeting_list

<ul id="meeting-list">
<li>January 23</li>
<li>February 12</li>
<li>March 9</li>
<li>April 30</li>
<li>May 1</li>
</ul>

In [27]:
meeting_list.get_text()

'\nJanuary 23\nFebruary 12\nMarch 9\nApril 30\nMay 1\n'

In [28]:
# month names in calendar order; a name's index + 1 is its month number
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 
          'November', 'December']

**Match any month name, a space, and a one- or two-digit day**

In [29]:
event_pattern = re.compile(r"(\w+ \d{1,2})", re.IGNORECASE)
event_pattern

re.compile(r'(\w+ \d{1,2})', re.IGNORECASE|re.UNICODE)

In [30]:
events = event_pattern.findall(meeting_list.get_text())
events

['January 23', 'February 12', 'March 9', 'April 30', 'May 1']

**Match month & day separately**

In [31]:
pattern = re.compile(r"\w+", re.IGNORECASE)
pattern.findall("March 3")

['March', '3']

In [32]:
for event in events:
    match = pattern.findall(event)
    month_text, day_text = match[0], match[1]
    month = months.index(month_text) + 1 # convert to ordinal month
    day = int(day_text)
    
    print("meeting event on {0}/{1}".format(month, day))

meeting event on 1/23
meeting event on 2/12
meeting event on 3/9
meeting event on 4/30
meeting event on 5/1
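
The `\w+ \d{1,2}` pattern accepts any word followed by a number, so text like "Room 12" would also match and then fail the month lookup. As a sketch of a stricter alternative, the pattern can be built from the month names themselves, with one capture group for the month and one for the day. `MONTHS`, `EVENT_PATTERN` and `parse_events` are illustrative names, and the "Room 12" line is added here only to show a rejected match.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Only real month names can match; the two groups capture month and day separately.
EVENT_PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})\b", re.IGNORECASE)

def parse_events(text):
    """Return (month, day) integer pairs for every 'Month D' found in text."""
    lowered = [name.lower() for name in MONTHS]
    return [(lowered.index(name.lower()) + 1, int(day))
            for name, day in EVENT_PATTERN.findall(text)]

text = "\nJanuary 23\nFebruary 12\nMarch 9\nApril 30\nMay 1\nRoom 12\n"
print(parse_events(text))  # [(1, 23), (2, 12), (3, 9), (4, 30), (5, 1)]
```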


<hr style="border: 6px outset pink">

**Add Data Science Meeting**

Inherit from the Meeting base class in meetings_scraper.

In [33]:
from meetings_scraper import Meeting
from datetime import datetime

In [34]:
class DataScienceMeeting(Meeting):
    def __init__(self, date, fields):
        # Make sure date and fields are the right data type
        assert isinstance(date, dict), "date must be a dictionary"
        assert isinstance(fields, dict), "fields must be a dictionary"
        # TODO: also check that the date values are integers
        
        self.date = datetime(date["year"], date["month"], date["day"], date["hour"], date["minute"])
        
        # Add all the fields as attributes for this object
        for key, value in fields.items():
            setattr(self, key, value)
            
    # scraper method from code above could be added here
    # def parse_meetings(self, soup):
    #     Parse that beautiful soup, and create some meetings
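
The `Meeting` base class lives in meetings_scraper and is not shown in this notebook. Judging only from how it is used here (a readable repr and iteration over key/value pairs), a stand-in might look like the sketch below. Everything in it is an assumption made for illustration, including the `DemoMeeting` subclass. It is not the real implementation.

```python
from datetime import datetime

class Meeting:
    """Hypothetical stand-in for meetings_scraper.Meeting (the real class is not shown)."""

    def __iter__(self):
        # Yield (attribute name, value) pairs so `for key, value in meeting` works.
        return iter(vars(self).items())

    def __repr__(self):
        # Weekday, month, day and 12-hour time, followed by the summary.
        return "{}: {}".format(self.date.strftime("%a %b %d %I:%M %p"), self.summary)

class DemoMeeting(Meeting):
    """Small subclass used only to exercise the stand-in above."""

    def __init__(self, date, summary):
        self.date = date
        self.summary = summary

demo = DemoMeeting(datetime(2017, 4, 13, 18, 30), "Learn Data Science")
print(repr(demo))
print(dict(demo))
```

The weekday and month abbreviations produced by `strftime` depend on the active locale; the sketch assumes the default English names.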

In [35]:
date_props["year"] = 2017
date_props["month"] = 4
date_props["day"] = 13

meeting = DataScienceMeeting(date_props, fields)
meeting

Thu Apr 13 06:30 PM: Learn Data Science

In [36]:
for key, value in meeting:
    print(key, ":", value)

date : 2017-04-13 18:30:00
location : 216 N Mosley St, Wichita, KS 67202
description : Learn Data science with Julia, Python & R (Jupyter)
email : datascience@goat.org
summary : Learn Data Science
agenda : https://www.continuum.io/why-anaconda
duration : 1
type : data_science


**Iteratively create meetings from parsed events list**

In [37]:
meetings = []
for event in events:
    match = pattern.findall(event)
    month_text, day_text = match[0], match[1]
    month = months.index(month_text) + 1 # convert to ordinal month
    day = int(day_text)
    
    date_props["year"] = 2017
    date_props["month"] = month
    date_props["day"] = day
    
    meeting = DataScienceMeeting(date_props, fields)
    meetings.append(meeting)
    
meetings

[Mon Jan 23 06:30 PM: Learn Data Science,
 Sun Feb 12 06:30 PM: Learn Data Science,
 Thu Mar 09 06:30 PM: Learn Data Science,
 Sun Apr 30 06:30 PM: Learn Data Science,
 Mon May 01 06:30 PM: Learn Data Science]

**Save the last meeting to a file in iCalendar (.ics) format**

In [38]:
filename = meetings[-1].to_ics(True)
filename

Meeting saved to files/data_science_meeting_5_1_2017.ics


'files/data_science_meeting_5_1_2017.ics'

**Read the .ics file back in**

In [39]:
# The with block closes the file on exit, so no explicit f.close() is needed
with open(filename, 'r') as f:
    for line in f:
        print(line.rstrip("\n"))

BEGIN:VCALENDAR
BEGIN:VEVENT
SUMMARY:Learn Data Science
DTSTART;VALUE=DATE-TIME:20170501T183000
DTEND;VALUE=DATE-TIME:20170501T193000
DTSTAMP;VALUE=DATE-TIME:20170501T183000Z
DESCRIPTION:Learn Data science with Julia\, Python & R (Jupyter)
LOCATION:216 N Mosley St\, Wichita\, KS 67202
URL:https://www.continuum.io/why-anaconda
END:VEVENT
END:VCALENDAR
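
The commas in DESCRIPTION and LOCATION are backslash-escaped because the iCalendar format (RFC 5545) requires escaping commas, semicolons, backslashes and newlines in text values. The `to_ics` method itself is not shown in this notebook, so the following is only a sketch of how such output could be produced. `escape_ics_text` and `build_ics` are hypothetical helpers, and the result covers just a subset of the properties in the file above (DTSTAMP, for instance, is left out), so it is not a complete or validated calendar file.

```python
from datetime import datetime, timedelta

def escape_ics_text(value):
    """Escape backslash, semicolon, comma and newline in an iCalendar text value."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))

def build_ics(summary, start, hours, description, location, url):
    """Return the text of a single-event calendar as a string."""
    stamp = "%Y%m%dT%H%M%S"
    end = start + timedelta(hours=hours)
    lines = [
        "BEGIN:VCALENDAR",
        "BEGIN:VEVENT",
        "SUMMARY:" + escape_ics_text(summary),
        "DTSTART;VALUE=DATE-TIME:" + start.strftime(stamp),
        "DTEND;VALUE=DATE-TIME:" + end.strftime(stamp),
        "DESCRIPTION:" + escape_ics_text(description),
        "LOCATION:" + escape_ics_text(location),
        "URL:" + url,
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"

ics = build_ics(
    "Learn Data Science",
    datetime(2017, 5, 1, 18, 30),
    1,
    "Learn Data science with Julia, Python & R (Jupyter)",
    "216 N Mosley St, Wichita, KS 67202",
    "https://www.continuum.io/why-anaconda",
)
print(ics)
```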
