Our first step in analyzing Nashville's MeetUp scene is to pull the data from the web. Fortunately, MeetUp has a great REST API that we can use to build our database. Head on over to their [interactive console](https://secure.meetup.com/meetup_api/console) if you want to see its complete functionality.

For our purposes, **event attendance** is the fundamental piece of data we need to build the graph. In MeetUp terminology, we are after the **Yes RSVPs** individuals make when a MeetUp group organizes an event. This doesn't guarantee their actual attendance, of course, but it's the best proxy we have. 

Since we're interested in guiding decisions now, we will limit our search to events occurring in the past two years. We also want to get metadata for individual members and groups so that we can enrich our graph once we have it.
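
As a rough illustration of the two-year window, the cutoff date can be computed rather than typed by hand. This is only a sketch: `two_years_before` is a hypothetical helper, not part of the notebook's `utils` module, and it assumes the cutoff is passed on as an ISO date string like the `date_filter_str` values used later.

```python
from datetime import date

def two_years_before(day: date) -> str:
    """Return the ISO date string two calendar years before `day`."""
    try:
        cutoff = day.replace(year=day.year - 2)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        cutoff = day.replace(year=day.year - 2, day=28)
    return cutoff.isoformat()

print(two_years_before(date(2018, 10, 22)))  # 2016-10-22
```

Calling it with `date.today()` would keep the window anchored to whenever the notebook is run.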

Our approach will be:

1. Download a list of all groups in Nashville.
2. Download a list of events from the past two years from each group.
3. Download a list of all "yes" RSVPs for each event.
4. Download metadata for each member ID that RSVPed. 
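
The steps above can be sketched as one nested loop. The three `fetch_*` callables below are hypothetical stand-ins for the real API calls (they are not the functions in `utils`), and the toy dictionaries exist only so the sketch runs without network access.

```python
def collect_rsvp_edges(fetch_groups, fetch_events, fetch_rsvps):
    """Chain steps 1-3 and return (member_id, group, event_id) tuples.

    The three fetch_* callables are hypothetical stand-ins for the API calls.
    """
    edges = []
    for group in fetch_groups():
        for event_id in fetch_events(group):
            for member_id in fetch_rsvps(group, event_id):
                edges.append((member_id, group, event_id))
    return edges

def unique_members(edges):
    """Step 4 input: the distinct member IDs that RSVPed at least once."""
    return sorted({member_id for member_id, _, _ in edges})

# Toy stand-ins so the sketch runs offline.
toy_events = {'group-a': ['e1', 'e2'], 'group-b': ['e3']}
toy_rsvps = {'e1': [1, 2], 'e2': [2], 'e3': [2, 3]}

edges = collect_rsvp_edges(
    fetch_groups=lambda: list(toy_events),
    fetch_events=lambda group: toy_events[group],
    fetch_rsvps=lambda group, event_id: toy_rsvps[event_id],
)
print(len(edges))             # 5
print(unique_members(edges))  # [1, 2, 3]
```

Each tuple is one "yes" RSVP, which is the raw material for the member-to-group edges built later.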

In [1]:
import pandas as pd
import requests

api_key = '646928682c4b5d6f5f6c782a6b351b29' # for a dummy account

#### 1. Download a list of all groups in Nashville.

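urlname = 'nashville-hiking'  # example group URL name; any group_urlname works here
q = 'https://api.meetup.com/{}/events?'.format(urlname)
q += 'sign=true&page=200&status=past&only=id,name,status,time,yes_rsvp_count&desc=True'
q += '&key={}'.format(api_key)  # the key is sent as a named `key` parameter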
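
A slightly safer way to assemble the same kind of query is to let the standard library encode the parameters. This is a sketch under assumptions: `build_events_url` is a hypothetical helper, and the `key` parameter name is taken from the commented-out request later in this notebook.

```python
from urllib.parse import urlencode

def build_events_url(urlname, api_key):
    """Assemble the past-events query for one group (hypothetical helper)."""
    params = {
        'sign': 'true',
        'page': 200,
        'status': 'past',
        'only': 'id,name,status,time,yes_rsvp_count',
        'desc': 'True',
        'key': api_key,
    }
    # safe=',' keeps the comma-separated field list readable in the URL.
    query = urlencode(params, safe=',')
    return 'https://api.meetup.com/{}/events?{}'.format(urlname, query)

url = build_events_url('nashville-hiking', 'YOUR_API_KEY')
print(url)
```

Building the query from a dictionary makes it harder to drop a parameter name or an `&` by accident.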

In [2]:
from utils import get_all_groups

groups = get_all_groups('nashville tn', radius=25, write_path=None)

groups.head()

Community Experience Testers - Nashville 'category'


group_id,group_name,num_members,category_id,category_name,organizer_id,group_urlname
339011,Nashville Hiking Meetup,18669,23,Outdoors & Adventure,4353803,nashville-hiking
6335372,Nashville soccer,3461,32,Sports & Recreation,108448302,Nashville-soccer
20040517,Franklin 40+ Out and About,2105,31,Socializing,115037452,Franklin-40-Out-and-About
18297014,Nashville Christian Singles,2074,23,Outdoors & Adventure,183268581,Nashville-Active-Christian-Singles
24440003,Outgoing Introverts of Nashville (20 - 35),2153,31,Socializing,105492502,Nashchill


#### 2. Download a list of events from the past two years from each group.

In [3]:
from utils import get_events

In [4]:
# Collect one DataFrame per group, then concatenate once at the end.
event_frames = []
for gp_url in groups.group_urlname:
    event_frames.append(get_events(gp_url, date_filter_str='2016-10-22', verbose=True))
events = pd.concat(event_frames)

nashville-hiking
Nashville-soccer
Franklin-40-Out-and-About




Nashville-Active-Christian-Singles
Nashchill
nashville-ux
steppingoutsocialdance
nashjs
Nashville-Tennis-Meetup
Nashville-NET-User-Group
Nashville-Blockchain-Meetup
newintown-890
NashvilleYoungCrowd
sexposnashville
All-Things-Fablous-Black-Professional-Women
The-Nashville-Microsoft-Azure-Users-Group
NashBI
Sunday-Assembly-Nashville
Nashville-Networking-Business-Luncheon
horrorfriends
Tails-of-the-Trail
PyNash
nashville-software-beginners
Tennessee-Golf-Events
Greater-Nashville-Healthcare-Analytics
diablos-que-bailan
Southeast-Golf-Events
Agile-Nashville-User-Group
Nashville-Spanish-English-Conversation-Group
nashvillewriters
NAGACentral
dnd-49
Franklin-Friends-50
J-Love-Christian-Singles-of-Nashville
DesignThinkingNashville
NashvilleWomenProgrammers
MCBS-Beer
Middle-Tennessee-Hiking-Outdoor-Meetup
NashvilleMDUG
EatLoveNash
Nashville-Modern-Excel-User-Group
aawstn
Bellevue-Friends-ages-21-And-Up
MOMS-Club-of-Brentwood-2
code-for-nashville
Dazzling-Dames-50
serendipitylabsnashville
MOMSC

Williamson-County-Republican-Party
meetup-group-beautyandthebrush
Nashville-Cross-Culture-Meetup
Nashville-HAES-and-Intuitive-Eating-Meetup
Wellness-Nashville
Connect-Franklin
Nashville-Toastmasters
-- No event results for  Nashville-Toastmasters
Nashville-Fun-Times-Meetup
meetup-group-kGWCrbFX
Mediumship-and-Intuitive-Development-Circle-of-Nashville
Nashville-Classic-and-Contemporary-Fiction-Group-for-women
Shopify-Nashville
Beyond-the-Brunch-Drunk-on-Life-Takes-Nashville
Nashville-Young-Christians
Middle-Tennesse-Naturists-Nudists
Calm-Abiding-Meditation-followed-by-Heart-Sutra
Nashville-International-Travel
Women-of-Purpose-and-Power
Nashville-Cyber-Security-for-Control-Systems
Grace-Business-Networking-Group
931-Singles-30s-60s
Nashville-Synthesizer-Meetup
Its-a-FIT-Life-Creation-Nashville
-- No event results for  Its-a-FIT-Life-Creation-Nashville
Songwriter-Survival-Kit-Nashville
Pumps-And-Passports-Nashville
Women-n-Wine-of-Nashville
NashReact-Meetup
East-Nashville-Book-Club
Vega

Nasvhille-Bitcoin-Crypto-Meetup
meetup-group-qTWwMeMm
The-Leadership-Circle
meetup-group-adTZVUra
-- No event results for  meetup-group-adTZVUra
Nashville-Cardinals-Club
Advanced-Audio-Applications-Exchange-A3E-Nashville
Blockchain-Technology-Disrupting-Healthcare
French-Toastmasters-club
Lipscomb-Environmental-Sustainability-Meetup
Nashville-QA
Craft-Beer-Lovers-of-Nashville-Meetup
Nashville-Podcasting-for-Music-Business
Nashville-Fishing-Meetup
Detroit-Red-Wings-Fans-in-Nashville-Meetup
Nashville-Feminist-Meetup
Nashville-Tennessee-Magento
Analytics-At-Speed-Nashville
Nashville-Screenwriters-Meetup
Ingress
Shopify-Entrepreneurs-of-Nashville
Nashville-Wanderlusters
Bi-Tennessee
Nashvilles-House-of-Poets
CaliforniaRelocation
Nashville-Artificial-Intelligence-Society
Nashville-Apache-Kafka-Meetup-by-Confluent
-- No event results for  Nashville-Apache-Kafka-Meetup-by-Confluent
UU-Moms
Nashville-Real-Estate-Investor-Entourage
Nashville-EOS-Meetup
-- No event results for  Nashville-EOS-Mee

-- No event results for  Kundalini-Yoga-Meditation-Mount-Juliet-Tn
DrugFree-Wilco
-- No event results for  DrugFree-Wilco
Mount-Juliet-Exercise-Meetup
-- No event results for  Mount-Juliet-Exercise-Meetup
Mount-Juliet-Titans-Fans
Wilson-County-Real-Estate-Investment-Group
Wilson-County-Women-Volunteering-and-Having-Fun
Mount-Juliet-Supper-Club
Kindness-as-the-Ultimate-Expression-of-Self-Love
-- No event results for  Kindness-as-the-Ultimate-Expression-of-Self-Love
meetup-group-KuYyZLuP
Nashvilles-Elite-Wine-Travel-2-0
Healthy-Fit-Fabulous
Williamson-County-Mobile-Developers
Franklin-Dining-Out-Meetup
-- No event results for  Franklin-Dining-Out-Meetup
SOWBO_Nashville
Real-Estate-Investors-of-Williamson-County-How-to-succeed
Bible-Reading
MHO-Nashville-Active-Adventures
Organ-Transplant-Support-Group
-- No event results for  Organ-Transplant-Support-Group
Franklin-Happy-Hour-Meetup
Resist-Franklin
Nashville-Computer-Science-Meetup
New-Franklin-Locals
-- No event results for  New-Frankli

In [7]:
events.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25867 entries, 0 to 4
Data columns (total 6 columns):
group_urlname     25867 non-null object
id                25867 non-null object
name              25867 non-null object
status            20293 non-null object
time              25867 non-null datetime64[ns]
yes_rsvp_count    25867 non-null object
dtypes: datetime64[ns](1), object(5)
memory usage: 1.4+ MB


In [8]:
events.head()

,group_urlname,id,name,status,time,yes_rsvp_count
0,nashville-hiking,253584350,Marcella Vivrette Smith Park - moderate to str...,past,2018-10-21 14:15:00,21
1,nashville-hiking,zxqrgqyxnbbc,Fontanel Hike then lunch at Cafe Fontanella,past,2018-10-20 14:00:00,20
2,nashville-hiking,255586555,"Happy Brain, Healthy Body: Nature Workshop",past,2018-10-20 12:30:00,1
3,nashville-hiking,hnfqdqyxnbxb,Edwin Warner Park Evening Hike,past,2018-10-18 23:00:00,22
4,nashville-hiking,254588203,Group Volunteering at Second Harvest Food Bank...,past,2018-10-16 22:30:00,11


#### 3. Download a list of all "yes" RSVPs for each event.

In [17]:
# Iterating over a Series yields (index label, value), not (event ID, value).
for idx, gp_url in events.head().group_urlname.items():
    print(idx, gp_url)

0 nashville-hiking
1 nashville-hiking
2 nashville-hiking
3 nashville-hiking
4 nashville-hiking


In [20]:
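from utils import get_all_event_rsvps

# get_all_event_rsvps is called with a list of event IDs for one group (as in
# the batch loop under "Run Data"); passing the integer row index raised
# "TypeError: 'int' object is not iterable".
rsvp_frames = []
for gp_url, gp_events in events.groupby('group_urlname'):
    rsvp_frames.append(get_all_event_rsvps(gp_url, gp_events.id.tolist(), api_key))
rsvps = pd.concat(rsvp_frames, ignore_index=True)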

#### 4. Download metadata for each member ID that RSVPed. 

In [None]:
unique_members = rsvps.member_id.unique()

## Run Data

In [None]:
# read in membership data and trim to "recent" visits
edges = pd.read_csv('data/memberships.csv', parse_dates=['joined','visited'])
# edges = edges.loc[edges.visited > pd.to_datetime('2016-01-01')]

# create a "members" dataset
members = edges[['member_id', 'name', 'hometown', 'city','state', 'lat', 'lon']].groupby('member_id').first()
members['num_groups'] = edges[['member_id']].groupby('member_id').size()

# read in group information and trim down to only groups with edges
groups = pd.read_csv('data/groups.csv', index_col='group_id')
groups = groups.loc[edges.group_id.unique()]

In [None]:
import os
from time import sleep

all_events_filename = './data/events.csv'
all_rsvps_filename = './data/rsvps.csv'

if not os.path.exists(all_events_filename):
    (pd.DataFrame(columns=['event_id', 'name', 'status', 'time', 'yes_rsvp_count', 'group_urlname'])
         .to_csv(all_events_filename, header=True, index=None) )
if not os.path.exists(all_rsvps_filename):
    (pd.DataFrame(columns=['group_urlname', 'event_id', 'member_id'])
         .to_csv(all_rsvps_filename, header=True, index=None) )

for i, g in groups.dropna().iloc[712:].iterrows():
    try:
        print(g.group_name)
        events = get_events(g.group_urlname, api_key, date_filter_str='2015-11-01')
        rsvps = get_all_event_rsvps(g.group_urlname, events.id.tolist(), api_key)
        
        events.to_csv(all_events_filename, header=False, mode='a', index=None)
        rsvps.to_csv(all_rsvps_filename, header=False, mode='a', index=None)
        sleep(1)
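    except (ValueError, ConnectionError, requests.exceptions.ConnectionError) as exc:
        # Assumes the utils helpers call the API through requests, whose
        # ConnectionError is not a subclass of the built-in one.
        print(exc)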
    finally:
        if 'events' in dir(): 
            del events
        if 'rsvps' in dir():
            del rsvps

In [None]:
# Nashville Improv & Comedy Meetup
# Nasville Slow Ride

In [None]:
# from time import sleep
# import json

# edges = pd.DataFrame()
# err_ids = []
# for pid in members.index:
#     r = requests.get('https://api.meetup.com/2/groups?&sign=true&member_id={}&page=200&key=1eb16676d664fa48314391ae5b6c'.format(pid))
#     try:
#         r = r.json()
#         for membership in r['results']:
#             edge = pd.Series({'member_id': pid, 
#                               'group_id': membership['id'], 
#                               'group_name': membership['name']})
#             edges = edges.append(edge, ignore_index=True)
#     except json.decoder.JSONDecodeError:
#         print(pid)
#         err_ids.append(pid)
    
#     # Sleep briefly so that API doesn't get overwhelmed
#     sleep(0.2)
        

# # Write to computer
# write_data = True
# if write_data == True:
#     edges.to_csv('data_edges.csv') 


In [None]:
import pathlib
import pandas as pd

data_files = [x for x in pathlib.Path('data').iterdir()]
events = pd.read_csv('data/events.csv')
members = pd.read_csv('data/members.csv')
groups = pd.read_csv('data/groups.csv')

rsvps = pd.read_csv('data/rsvps.csv')


In [None]:
import matplotlib.pyplot as plt


In [None]:
rsvps_full = rsvps.merge(events[['event_id', 'group_urlname']], on='event_id', how='left')
#.merge(members, on='member_id', how='left', suffixes=('_event', '_member'))

group_attendance = rsvps_full.groupby(['member_id', 'group_urlname']).size()
group_attendance_trimmed = group_attendance.loc[group_attendance>0].reset_index()
mgdf = group_attendance_trimmed.merge(groups[['group_id', 'group_urlname']], on='group_urlname')
mgdf = mgdf.rename(columns={0: 'weight'})[['member_id', 'group_id', 'weight']]
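
To make the weighting concrete, here is a dependency-free sketch of the same counting idea on toy data: the weight of a member-to-group edge is the number of "yes" RSVPs that member made to that group's events. The records and names below are made up for illustration only.

```python
from collections import Counter

# Toy RSVP records: (member_id, group_urlname), one entry per "yes" RSVP.
toy_rsvps = [
    (1, 'PyNash'), (1, 'PyNash'), (1, 'nashjs'),
    (2, 'PyNash'), (3, 'nashjs'), (3, 'nashjs'), (3, 'nashjs'),
]

# Counting repeated (member, group) pairs gives the edge weight.
weights = Counter(toy_rsvps)
edge_list = sorted((m, g, w) for (m, g), w in weights.items())

for member_id, group, weight in edge_list:
    print(member_id, group, weight)
```

This mirrors what the `groupby(['member_id', 'group_urlname']).size()` call does on the full RSVP table.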


In [None]:
mgdf.to_csv('data/member-to-group-edges.csv', index=None)

In [None]:
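import networkx as nx

g = nx.from_pandas_edgelist(mgdf, 'member_id', 'group_id', 'weight')
# connected_component_subgraphs was removed in networkx 2.4; take the largest component explicitly.
Gc = g.subgraph(max(nx.connected_components(g), key=len)).copy()
member_nodes, group_nodes = nx.bipartite.sets(Gc)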

mg = nx.bipartite.weighted_projected_graph(Gc, nodes=member_nodes, ratio=False)
gg = nx.bipartite.weighted_projected_graph(Gc, nodes=group_nodes, ratio=False)
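
With `ratio=False`, the weighted projection links two nodes of the same kind whenever they share a neighbor, and the edge weight is the number of shared neighbors. The sketch below reproduces that idea in plain Python on made-up data; `project_members` is a hypothetical helper, not a networkx function.

```python
from itertools import combinations

# Toy member -> groups attended mapping (the bipartite edges).
member_groups = {
    'ann': {'PyNash', 'nashjs'},
    'bob': {'PyNash', 'nashjs', 'NashBI'},
    'cat': {'NashBI'},
}

def project_members(member_groups):
    """Link members who share a group; weight = number of shared groups."""
    projected = {}
    for a, b in combinations(sorted(member_groups), 2):
        shared = len(member_groups[a] & member_groups[b])
        if shared:
            projected[(a, b)] = shared
    return projected

member_edges = project_members(member_groups)
print(member_edges)  # {('ann', 'bob'): 2, ('bob', 'cat'): 1}
```

Note that the RSVP counts on the bipartite edges play no role here: only the number of shared neighbors matters.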

In [None]:
len(mg.nodes)

In [None]:
df = nx.to_pandas_edgelist(mg, source='member1', target='member2')

In [None]:
df.to_csv('data/member_edges.csv')

In [None]:
groups = groups.set_index('group_id').loc[[x for x in g.nodes]]

In [None]:
groups.sort_values(by='num_members', ascending=False)

In [None]:
groups.loc[groups.group_name=='PyNash']

# Working on PyNash

In [None]:
import networkx as nx 
import pandas as pd

df = pd.read_csv('data/group_edges.csv')
g = nx.from_pandas_edgelist(df, source='group1', target='group2', edge_attr='weight')

In [None]:
nx.spring_layout?

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns 

colors = sns.color_palette('muted')

In [None]:
pynash_nodes = [11625832] + [x for x in g[11625832].keys()]

fig, ax = plt.subplots(1,1,figsize=(5,5), dpi=150)

pos = nx.spring_layout(g.subgraph(pynash_nodes), k=3)
# with_labels is only accepted by nx.draw_networkx, not by the edge/node helpers.
nx.draw_networkx_edges(g.subgraph(pynash_nodes), pos,
                 width=0.1, alpha=0.5)
nx.draw_networkx_nodes(g.subgraph(pynash_nodes), pos,
                 node_color=colors[0], node_size=100, alpha=0.9)
nx.draw_networkx_nodes(g, pos, nodelist=[11625832], node_color=colors[1])

plt.axis('off')

plt.show()

In [None]:
nx.draw_networkx?

In [None]:
groups.loc[[25903892, 1505523, 1585196, 11625832]]

In [None]:
groups['clustering'] = pd.Series(nx.clustering(g, weight='weight'))

In [None]:
groups.groupby('category_name').clustering.mean().index

In [None]:
fig, ax = plt.subplots(1,1, figsize=(4,8))

cats_to_use = groups.groupby('category_name').size().sort_values().iloc[15:].index
sns.barplot(data=groups.loc[groups.category_name.isin(cats_to_use)], 
            y='category_name', x='clustering', 
              order = (groups.loc[groups.category_name.isin(cats_to_use)]
                           .groupby('category_name').clustering.median()
                           .sort_values().index),
              ax=ax)
ax.set_yticklabels(ax.get_yticklabels(), fontsize=20, ha='right')
ax.set_ylabel('')
ax.set_title('Clustering Coefficient\nBy Category', fontsize=28)
plt.show()

In [None]:
pynash_revelers = nx.shortest_path(g, source=25903892, target=11625832)
gsub = g.subgraph(pynash_revelers)
label_dict = groups.loc[pynash_revelers, 'group_name'].to_dict()

fig, ax = plt.subplots(1,1,figsize=(5,5), dpi=150)

pos = nx.circular_layout(gsub)
nx.draw_networkx_edges(gsub, pos, width=5, alpha=0.5)
nx.draw_networkx_nodes(gsub, pos,
                 node_color=colors[0], node_size=1000, alpha=1)
nx.draw_networkx_nodes(gsub, pos, nodelist=[11625832], node_color=colors[1])
nx.draw_networkx_labels(gsub, pos, labels=label_dict)

plt.axis('off')

plt.show()

In [None]:
groups.loc[[x for x in g[11625832].keys()]].category_name.value_counts()

In [None]:
cent = nx.betweenness_centrality(g, weight=None, normalized=True)
groups['centrality'] = pd.Series(cent)
groups['degree'] = pd.Series(dict(nx.degree(g)))

In [None]:
cent_names = groups.loc[[x for x in g.nodes]].sort_values(by='centrality', ascending=False)
cent_names.loc[cent_names.centrality>0, ['group_name', 'category_name', 'centrality']].head(8)

In [None]:
list(ax.get_xticklabels())

In [None]:
fig, ax = plt.subplots(1,1,figsize=(8,5), dpi=150)

sns.regplot(data=groups, x='num_members', y='centrality', order=1, ax=ax)
ax.set_xlim([0,6500])

textprops = {'fontsize': 18}
ax.set_xlabel('Number of Members', **textprops)
ax.set_ylabel('Centrality', **textprops)
#ax.set_xticklabels(ax.get_xticklabels(), fontsize=14)
#ax.set_yticklabels(ax.get_yticklabels(), fontsize=14)

#x.set_frame_on(False)

plt.show()

In [None]:
fig, ax = plt.subplots(1,1,figsize=(8,5), dpi=150)

sns.regplot(data=groups, x='clustering', y='centrality', logx=True, ax=ax)
#ax.set_xlim([0,6500])

textprops = {'fontsize': 18}
ax.set_xlabel('Clustering Coefficient', **textprops)
ax.set_ylabel('Centrality', **textprops)
#ax.set_xticklabels(ax.get_xticklabels(), fontsize=14)
#ax.set_yticklabels(ax.get_yticklabels(), fontsize=14)

#x.set_frame_on(False)

plt.show()

In [None]:
nx.betweenness_centrality?

In [None]:
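import numpy as np
import seaborn as sns 

# Assumption: `ws` holds the edge weights of the group graph `g`;
# it is not defined anywhere earlier in the notebook.
ws = [w for _, _, w in g.edges(data='weight')]

fig, ax = plt.subplots(1,1, dpi=300)

# Note: kernel, bw and shade are the pre-0.11 seaborn argument names.
sns.kdeplot(np.array(ws), kernel='gau', bw=2, shade=True, ax=ax)
ax.set_xlim([0,15])
ax.set_frame_on(True)
ax.set_title('Distribution of Edge Weights', fontsize=24)
ax.vlines(2.3, 0, 1, linestyle='--')
ax.set_yticks([0.05, 0.1, 0.15, 0.2])
ax.set_yticklabels(['5%', '10%', '15%', '20%'])  # one label per tick
ax.set_ylim([0,0.2])
ax.annotate('Mean Shared Members\nis 2.3', xy=(4, 0.15), fontsize=18)
plt.show()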

In [None]:
import pandas as pd

df = pd.read_csv('data/member-to-group-edges.csv')
len(df.group_id.unique())

In [None]:
groups = pd.read_csv('data/groups.csv')
members = pd.read_csv('data/members.csv')
events = pd.read_csv('data/events.csv')
rsvps = pd.read_csv('data/rsvps.csv')

In [None]:
groups.loc[groups.group_id.isin(df.group_id.unique())].to_csv('kaggle/meta-groups.csv', index=None)
members.loc[members.member_id.isin(df.member_id.unique())].to_csv('kaggle/meta-members.csv', index=None)

ev  = events.merge(groups[['group_id', 'group_urlname']], on='group_urlname')[['event_id', 'group_id', 'name', 'time']]
ev.loc[ev.group_id.isin(df.group_id.unique())].to_csv('kaggle/meta-events.csv', index=None)



In [None]:
rp = rsvps.loc[rsvps.event_id.isin(ev.event_id.unique()) & rsvps.member_id.isin(df.member_id.unique())]
rp.merge(ev[['event_id', 'group_id']], on='event_id').to_csv('kaggle/rsvps.csv')