# Analysis of membership overlap between 3 PMF FB groups

The FB webpage listing all members of each FB group was saved manually to the local machine, and BeautifulSoup was used to parse the HTML. A bit of exploratory data analysis showed that each member's name appears in the "recently joined" section, inside a div with a particular class label. That information was used to scrape the names from each FB group, and from those names the member count of each group and the overlaps between groups were computed. These counts were slightly smaller than the member counts FB reported. Finally, the Venn diagram was made with an online tool:

https://www.meta-chart.com/share/fb-group-overlap-as-of-7918

In [None]:
import pandas as pd
import numpy as np
import os
from bs4 import BeautifulSoup

Making sure I'm in the same directory as the saved HTML files:

In [76]:
os.listdir()

['.ipynb_checkpoints',
 'FB_overlap.ipynb',
 'Presidential Management Fellows (PMF) - 2017.html',
 'Presidential Management Fellows (PMF) - 2017_files',
 'Presidential Management Fellows (PMF) 2018.html',
 'Presidential Management Fellows (PMF) 2018_files',
 'Presidential Management Fellows (PMF) DC Metro Area.html',
 'Presidential Management Fellows (PMF) DC Metro Area_files',
 'VennDiagram.jpeg']

In [58]:
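# Keep only the saved .html pages; os.listdir() order is arbitrary, so sort to fix
# the order (2017, 2018, DC) that later cells rely on when indexing by position
FB_pages = sorted(elm for elm in os.listdir() if elm.endswith('.html'))
FB_pages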

['Presidential Management Fellows (PMF) - 2017.html',
 'Presidential Management Fellows (PMF) 2018.html',
 'Presidential Management Fellows (PMF) DC Metro Area.html']

Looping through each HTML file: first finding the "recently joined" subsection, then finding every div whose class attribute marks a member name, and extracting the name text. Duplicate names are dropped, and the result is a list of name lists, one per FB group.

In [84]:
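name_set = []
for page in FB_pages:
    with open(page, 'r', encoding='utf-8') as html:
        soup = BeautifulSoup(html, 'html.parser')
    # Restrict the search to the "recently joined" member list
    section = soup.find("div", {"id": "groupsMemberSection_recently_joined"})
    if section is None:
        raise ValueError(f"No recently-joined section found in {page}")
    # Each member name sits in a div with this exact (FB-generated) class string
    names = [div.text for div in section.find_all("div", {"class": "_60ri fsl fwb fcb"})]
    # Drop duplicate names, keeping first-seen order
    name_set.append(list(pd.Series(names).unique()))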
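The same scrape can be sketched without BeautifulSoup, using the standard library's `html.parser.HTMLParser` instead (a swapped-in parser, not the method used above). This is a minimal sketch, assuming the saved pages keep the same `id` and exact class string found during the exploratory analysis; `MemberNameParser` and `scrape_names` are hypothetical names, and the sample markup in the test is invented, not real FB HTML. It shows the two-level search: ignore everything until the "recently joined" div opens, then collect the full text (including nested tags) of each name div inside it.

```python
from html.parser import HTMLParser

# Assumed markers copied from the notebook; FB's generated class names change over time.
SECTION_ID = "groupsMemberSection_recently_joined"
NAME_CLASS = "_60ri fsl fwb fcb"


class MemberNameParser(HTMLParser):
    """Collect the text of NAME_CLASS divs nested inside the SECTION_ID div."""

    def __init__(self):
        super().__init__()
        self.section_depth = 0  # open <div>s since entering the section (0 = outside)
        self.name_depth = 0     # open <div>s since entering a name div (0 = outside)
        self._chunks = []       # text pieces of the name currently being read
        self.names = []         # every name found, duplicates included

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if self.section_depth == 0:
            # Outside the section: only look for the section's opening div
            if attrs.get("id") == SECTION_ID:
                self.section_depth = 1
            return
        self.section_depth += 1
        if self.name_depth:
            self.name_depth += 1
        elif attrs.get("class") == NAME_CLASS:
            # Exact string match on the class attribute, like the bs4 call above
            self.name_depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag != "div" or self.section_depth == 0:
            return
        self.section_depth -= 1
        if self.name_depth:
            self.name_depth -= 1
            if self.name_depth == 0:
                self.names.append("".join(self._chunks))

    def handle_data(self, data):
        # Text inside a name div, including text in nested tags such as <a>
        if self.name_depth:
            self._chunks.append(data)


def scrape_names(html_text):
    """Return unique member names in first-seen order (like pd.Series.unique)."""
    parser = MemberNameParser()
    parser.feed(html_text)
    parser.close()
    return list(dict.fromkeys(parser.names))
```

Tracking div depth is what keeps name divs outside the section from being counted; a real page with badly nested markup could still confuse a hand-written parser like this, which is one reason a tolerant library such as BeautifulSoup is convenient here.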

Basic counts of the unique names found in each FB group page's HTML. These are slightly lower than the member counts FB reports.

In [85]:
for page, names in zip(FB_pages, name_set):
    print(page)
    print(len(names))

Presidential Management Fellows (PMF) - 2017.html
407
Presidential Management Fellows (PMF) 2018.html
278
Presidential Management Fellows (PMF) DC Metro Area.html
803
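Because the scraping loop deduplicates by display name, each count above is the number of distinct names, which can only be less than or equal to the number of name divs scraped. A minimal sketch for checking how many rows that step merged, assuming the raw `names` list for a page is kept before `pd.Series(names).unique()` is applied (the loop above does not keep it); `collapsed_names` is a hypothetical helper and the names are invented sample data.

```python
from collections import Counter


def collapsed_names(raw_names):
    """Map each name that occurs more than once in the raw scrape to its count.

    Deduplicating by display name merges different members who share a name,
    and also merges a member listed twice in the saved page; this helper only
    shows how many rows were merged, not which of those cases applies.
    """
    counts = Counter(raw_names)
    return {name: n for name, n in counts.items() if n > 1}


raw = ["Ada Lovelace", "Alan Turing", "Ada Lovelace", "Grace Hopper"]
dupes = collapsed_names(raw)
print(dupes)
print(len(raw) - len(set(raw)))  # rows dropped by deduplication
```

This does not explain the gap to FB's reported totals on its own; it only separates the effect of deduplication from whatever the saved page did not contain.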


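Overlap counts between each pair of groups and across all three. Each pairwise count includes the members who belong to all three groups.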

In [86]:
# Sets give fast membership tests; each name list is already duplicate-free, so the counts are unchanged
s2017, s2018, sDC = (set(names) for names in name_set)
print('2017 2018 overlap')
print(len(s2017 & s2018))
print('2017 DC overlap')
print(len(s2017 & sDC))
print('2018 DC overlap')
print(len(s2018 & sDC))
print('2017 2018 DC overlap')
print(len(s2017 & s2018 & sDC))

2017 2018 overlap
99
2017 DC overlap
250
2018 DC overlap
137
2017 2018 DC overlap
64
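A Venn diagram shows exclusive regions (for example "in 2017 and DC but not 2018"), while the pairwise counts above include the triple overlap. A minimal sketch that counts each exclusive region directly from sets of names, assuming a dict mapping a group label to its set of names; `venn_regions` is a hypothetical helper and the sample groups are invented.

```python
from collections import Counter


def venn_regions(groups):
    """Count members by the exact combination of groups they belong to.

    groups: dict mapping a group label to a set of member names.
    Returns a Counter keyed by a tuple of labels (in the dict's order);
    e.g. ('2017', 'DC') means "in 2017 and DC, but in no other group".
    """
    labels = list(groups)
    everyone = set().union(*groups.values())
    regions = Counter()
    for name in everyone:
        key = tuple(label for label in labels if name in groups[label])
        regions[key] += 1
    return regions


groups = {
    "2017": {"Ada", "Alan", "Grace"},
    "2018": {"Alan", "Barbara"},
    "DC": {"Ada", "Alan", "Edsger"},
}
for key, size in sorted(venn_regions(groups).items()):
    print(" & ".join(key), size)
```

The same regions follow from the printed counts by inclusion-exclusion. The 2017-only region is 407 - 99 - 250 + 64 = 122, the "2017 and 2018 only" region is 99 - 64 = 35, and the union of all three groups is 407 + 278 + 803 - 99 - 250 - 137 + 64 = 1066 distinct names.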
