Our first step in analyzing Nashville's MeetUp scene is to collect the data from the web. Fortunately, MeetUp has a great REST API that we can use to build our database. Head on over to their [interactive console](https://secure.meetup.com/meetup_api/console) if you want to see its complete functionality.

For our purposes, **event attendance** is the fundamental piece of data we need to build the graph. In MeetUp terminology, we are after the **Yes RSVPs** individuals make when a MeetUp group organizes an event. This doesn't guarantee their actual attendance, of course, but it's the best proxy we have. 
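To make the idea concrete, here is a minimal sketch of how "yes" RSVP records could be turned into graph edges. The toy data and the `num_rsvps` weight are assumptions for illustration only; the column names follow the `rsvps.csv` header used later in this notebook.

```python
import pandas as pd

# Toy "yes" RSVP records (hypothetical values; columns mirror rsvps.csv)
rsvps = pd.DataFrame({
    'group_urlname': ['hiking', 'hiking', 'improv', 'hiking'],
    'event_id': ['e1', 'e2', 'e3', 'e1'],
    'member_id': [101, 101, 101, 202],
})

# One weighted edge per (member, group) pair:
# the weight counts how many of the group's events the member RSVPed "yes" to
edge_weights = (rsvps.groupby(['member_id', 'group_urlname'])
                     .size()
                     .reset_index(name='num_rsvps'))
print(edge_weights)
```

Counting RSVPs per pair is only one possible weighting; a simple presence/absence edge would work with the same records.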

Since we're interested in guiding decisions now, we will limit our search to events occurring in the past two years. We also want to get metadata for individual members and groups so that we can enrich our graph once we have it.
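The two-year window shows up in the code below as a hard-coded date string. As a small sketch, the cutoff could instead be derived from a reference date. The helper name `two_years_before` and the reference date of 2017-11-01 are assumptions, chosen so the result matches the `'2015-11-01'` filter used in this notebook.

```python
from datetime import date

def two_years_before(today: date) -> str:
    """Return the ISO date string two years before `today`."""
    try:
        cutoff = today.replace(year=today.year - 2)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28
        cutoff = today.replace(year=today.year - 2, day=28)
    return cutoff.isoformat()

# Assumed reference date; swap in date.today() for a rolling window
date_filter_str = two_years_before(date(2017, 11, 1))
print(date_filter_str)
```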

Our approach will be:

1. Download a list of all groups in Nashville.
2. Download a list of events from the past two years from each group.
3. Download a list of all "yes" RSVPs for each event.
4. Download metadata for each member ID that RSVPed. 

#### 1. Download a list of all groups in Nashville.

In [2]:
from utils import get_all_groups

groups = get_all_groups('nashville tn', radius=25, write_path=None)

groups.head()

ConnectionError: HTTPSConnectionPool(host='api.meetup.com', port=443): Max retries exceeded with url: /find/groups?&sign=true&location=nashville%20tn&radius=25&page=200&offset=0&key=<API_KEY> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000024A35473240>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
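The `getaddrinfo failed` message means the hostname could not be resolved, which usually points to a missing network connection rather than a problem with the request itself. For transient failures, one option is to wrap the download in a retry helper. The sketch below is not part of `utils`; `call_with_retries` and its parameters are hypothetical names, and the flaky function only simulates a failing download.

```python
import time

def call_with_retries(fetch, max_attempts=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() and retry on the given exceptions, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))

# Simulated download that fails twice before succeeding
attempts = []
def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network failure")
    return ["group-a", "group-b"]

delays = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_fetch, sleep=delays.append)
print(result, delays)
```

The default `retry_on=(OSError,)` is a deliberate choice: the connection error raised by `requests` derives from `OSError`, as does Python's builtin `ConnectionError`, so both are covered. In real use the call would look something like `call_with_retries(lambda: get_all_groups('nashville tn', radius=25, write_path=None))`.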

#### 2. Download a list of events from the past two years from each group.

In [None]:
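import pandas as pd
from utils import get_events

# Collect one DataFrame per group, then concatenate once.
# (DataFrame.append has no `inplace` option and was removed in pandas 2.0.)
events = pd.concat(
    [get_events(gp_url, date_filter_str='2015-11-01') for gp_url in groups.group_urlname],
    ignore_index=True,
)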

#### 3. Download a list of all "yes" RSVPs for each event.

In [None]:
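from utils import get_all_event_rsvps

# Fetch RSVPs one group at a time. The call mirrors the one in the "Run Data"
# cell: (group urlname, list of event IDs, api_key), so `api_key` must be defined
# and `events` is assumed to carry `id` and `group_urlname` columns.
rsvps = pd.concat(
    [get_all_event_rsvps(gp_url, gp_events.id.tolist(), api_key)
     for gp_url, gp_events in events.groupby('group_urlname')],
    ignore_index=True,
)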

#### 4. Download metadata for each member ID that RSVPed. 

In [5]:
unique_members = rsvps.member_id.unique()



NameError: name 'rsvps' is not defined
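The `NameError` most likely appears because the step 3 cell was never run in this session (it still shows `In [None]`), so `rsvps` does not exist yet. Once it does, `unique()` reduces the RSVP rows to one entry per member, so each member's metadata only needs to be requested once. A minimal sketch on toy data (the values are made up):

```python
import pandas as pd

# Toy RSVP table: member 101 RSVPed to three events, member 202 to one
rsvps = pd.DataFrame({
    'group_urlname': ['hiking', 'hiking', 'improv', 'improv'],
    'event_id': ['e1', 'e2', 'e3', 'e3'],
    'member_id': [101, 101, 101, 202],
})

unique_members = rsvps.member_id.unique()
print(len(rsvps), 'RSVPs from', len(unique_members), 'members')
```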

## Run Data

In [4]:
import pandas as pd

# read in membership data (the filter to "recent" visits below is currently disabled)
edges = pd.read_csv('data/memberships.csv', parse_dates=['joined','visited'])
# edges = edges.loc[edges.visited > pd.to_datetime('2016-01-01')]

# create a "members" dataset: one row per member, plus the number of groups they belong to
members = edges[['member_id', 'name', 'hometown', 'city','state', 'lat', 'lon']].groupby('member_id').first()
members['num_groups'] = edges.groupby('member_id').size()

# read in group information and trim down to only groups with edges
groups = pd.read_csv('data/groups.csv', index_col='group_id')
groups = groups.loc[edges.group_id.unique()]
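To see what the cell above produces, here is the same logic on a tiny, made-up membership table. Only a few of the real columns are included, and all values are hypothetical.

```python
import pandas as pd

# Toy membership edges: one row per (member, group) pair
edges = pd.DataFrame({
    'member_id': [1, 1, 2],
    'group_id': [10, 20, 10],
    'name': ['Ana', 'Ana', 'Ben'],
    'city': ['Nashville', 'Nashville', 'Franklin'],
})

# One row per member, taking the first value seen for each attribute
members = edges[['member_id', 'name', 'city']].groupby('member_id').first()
# size() is indexed by member_id, so it aligns with `members` on assignment
members['num_groups'] = edges.groupby('member_id').size()

# Toy group table; group 30 has no members and is dropped by the filter
groups = pd.DataFrame({'group_name': ['Hiking', 'Improv', 'Chess']},
                      index=pd.Index([10, 20, 30], name='group_id'))
groups = groups.loc[edges.group_id.unique()]

print(members)
print(groups)
```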

In [113]:
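import os
from time import sleep

# requests raises its own ConnectionError; the builtin one would not catch it
from requests.exceptions import ConnectionError
from utils import get_events, get_all_event_rsvps

all_events_filename = './data/events.csv'
all_rsvps_filename = './data/rsvps.csv'

# Create the output files with a header row on the first run
if not os.path.exists(all_events_filename):
    (pd.DataFrame(columns=['event_id', 'name', 'status', 'time', 'yes_rsvp_count', 'group_urlname'])
         .to_csv(all_events_filename, header=True, index=None) )
if not os.path.exists(all_rsvps_filename):
    (pd.DataFrame(columns=['group_urlname', 'event_id', 'member_id'])
         .to_csv(all_rsvps_filename, header=True, index=None) )

# Start at row 712 of the cleaned group list, e.g. to resume an interrupted run
for i, g in groups.dropna().iloc[712:].iterrows():
    try:
        print(g.group_name)
        events = get_events(g.group_urlname, api_key, date_filter_str='2015-11-01')
        rsvps = get_all_event_rsvps(g.group_urlname, events.id.tolist(), api_key)

        # Append this group's rows right away so progress survives an interruption
        events.to_csv(all_events_filename, header=False, mode='a', index=None)
        rsvps.to_csv(all_rsvps_filename, header=False, mode='a', index=None)
        sleep(1)  # sleep briefly so that the API doesn't get overwhelmed
    except (ValueError, ConnectionError) as exc:
        print(exc)
    finally:
        # Clear per-group results before the next iteration
        if 'events' in dir(): 
            del events
        if 'rsvps' in dir():
            del rsvps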

Nashville Hiking Meetup
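The loop above relies on a simple pattern: write the CSV header once, then append each group's rows without a header. A self-contained sketch of that pattern, using a temporary directory and made-up batches in place of real API results:

```python
import os
import tempfile

import pandas as pd

columns = ['group_urlname', 'event_id', 'member_id']
path = os.path.join(tempfile.mkdtemp(), 'rsvps.csv')

# Write the header row only if the file does not exist yet
if not os.path.exists(path):
    pd.DataFrame(columns=columns).to_csv(path, header=True, index=False)

# Hypothetical per-group results standing in for API downloads
batches = [
    pd.DataFrame([['hiking', 'e1', 101]], columns=columns),
    pd.DataFrame([['improv', 'e2', 202], ['improv', 'e2', 303]], columns=columns),
]

# Append each batch without repeating the header
for batch in batches:
    batch.to_csv(path, header=False, mode='a', index=False)

saved = pd.read_csv(path)
print(saved)
```

Because rows are appended by position rather than by name, every batch must list its columns in the same order as the header.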


In [100]:
# Nashville Improv & Comedy Meetup
# Nasville Slow Ride

Int64Index([18964683], dtype='int64', name='group_id')

In [27]:
# from time import sleep
# import json

# edges = pd.DataFrame()
# err_ids = []
# for pid in members.index:
#     r = requests.get('https://api.meetup.com/2/groups?&sign=true&member_id={}&page=200&key={}'.format(pid, api_key))
#     try:
#         r = r.json()
#         for membership in r['results']:
#             edge = pd.Series({'member_id': pid, 
#                               'group_id': membership['id'], 
#                               'group_name': membership['name']})
#             edges = edges.append(edge, ignore_index=True)
#     except json.decoder.JSONDecodeError:
#         print(pid)
#         err_ids.append(pid)
    
#     # Sleep briefly so that API doesn't get overwhelmed
#     sleep(0.2)
        

# # Write to computer
# write_data = True
# if write_data == True:
#     edges.to_csv('data_edges.csv') 
