# PanelJam Web Scraping
## Friendships Graph construction

***PanelJam.com*** is a small online community of artists where cartoons are published. Each cartoon (also called a jam) is made up of several panels drawn by different users, so it is the result of a collaboration among artists.

This script performs **Web Scraping**: it analyzes the HTML code of the site's pages (without using any API) to collect information about friendship relationships between users. This information is used to model an undirected **Friendships Graph**.

The libraries used are:
- ***requests***: a Python HTTP library, used to make HTTP requests easily.
- ***bs4***: BeautifulSoup, a Python library for pulling data out of HTML and XML files. It works with a parser of your choice to provide idiomatic ways of navigating, searching, and modifying the parse tree. The code below uses the `lxml` parser, which must be installed separately.
- ***networkx***: a Python library for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
- ***pickle***: a standard library module that implements binary protocols for serializing and de-serializing a Python object structure.

In [None]:
import requests
import bs4
import networkx as nx
import pickle

The defined functions are:

- ***getUsersOnPage ( page )***: takes an integer page number of the user listing on *PanelJam.com* and returns a list with the names of the users shown on that page.

In [None]:
def getUsersOnPage(page):
    #sends an HTTP request
    res = requests.get('https://www.paneljam.com/stars/?page=' + str(page))
    #parses the obtained page
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    #finds all the <a> elements with class strip-preview-click, which hold the users' information
    soupUsers = soup.find_all("a", {"class": "strip-preview-click"})
    
    usersList = []
    
    #adds users names to usersList
    for i in soupUsers:
        usersList.append(i['href'].replace('/',''))
        
    return usersList
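
To see what the extraction step produces without contacting the site, here is a minimal offline sketch. It uses the standard library `html.parser` in place of BeautifulSoup so that it runs with no extra installs. The sample markup and the `UserLinkParser` class are invented for illustration, and the real page structure may differ.

```python
from html.parser import HTMLParser

# Invented sample markup; the real PanelJam page may be structured differently.
SAMPLE_HTML = """
<div>
  <a class="strip-preview-click" href="/alice/">Alice</a>
  <a class="other-link" href="/about/">About</a>
  <a class="strip-preview-click" href="/bob/">Bob</a>
</div>
"""


class UserLinkParser(HTMLParser):
    """Hypothetical helper: collects names from <a class="strip-preview-click"> links."""

    def __init__(self):
        super().__init__()
        self.users = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if tag == "a" and "strip-preview-click" in classes and href:
            # Same clean-up as getUsersOnPage: drop the slashes around the name
            self.users.append(href.replace("/", ""))


parser = UserLinkParser()
parser.feed(SAMPLE_HTML)
sample_users = parser.users
print(sample_users)
```

Only the two links carrying the `strip-preview-click` class contribute a name; the unrelated link is ignored.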

- ***getFriendsOf ( name )***: takes the name of a user and returns a list with the names of their friends on *PanelJam.com*.

In [None]:
def getFriendsOf(name):
    #sends an HTTP request
    res = requests.get('https://www.paneljam.com/' + name + '/friends/')
    #parses the obtained page
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    #finds all the <div> elements with class "box group", which hold the friends' information
    soupFriends = soup.find_all("div", {"class": "box group"})
    
    friendsList = []
    
    #adds friends names to friendsList
    for i in soupFriends:
        friendsList.append(i.find('a')['href'].replace('/','').replace('%20',' '))
        
    return friendsList
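
The `replace('%20',' ')` call above decodes spaces only. If profile links ever contain other percent-encoded characters (an assumption, not checked against the site), `urllib.parse.unquote` from the standard library decodes all of them. The sketch below uses a hypothetical helper, `href_to_name`, which is not part of the original script.

```python
from urllib.parse import unquote


def href_to_name(href):
    """Hypothetical helper: turn a profile href such as '/john%20doe/' into a user name."""
    # Remove the slashes first, then decode every percent-escape (UTF-8 by default)
    return unquote(href.replace('/', ''))


print(href_to_name('/john%20doe/'))
print(href_to_name('/caf%C3%A9/'))
```

Applying the same decoding in both `getUsersOnPage` and `getFriendsOf` would keep one spelling per user, so the same person does not end up as two different graph nodes.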

The following code collects the friendship relationships between *PanelJam.com* users and models them as a graph. The users are found by browsing the pages that list all *PanelJam.com* users, ordered by their score on the platform. Finally, the resulting graph is saved in the file *friendshipGraph.pckl*.
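
The crawl logic can be tried offline before running it against the live site. The sketch below is a dry run under stated assumptions: the two `fake_*` functions and their data are invented stand-ins for `getUsersOnPage` and `getFriendsOf`, and a set of `frozenset` pairs stands in for `nx.Graph` so that only the standard library is needed.

```python
# Invented stand-in data: two pages of users, then an empty page.
FAKE_PAGES = {1: ['alice', 'bob'], 2: ['carol']}
FAKE_FRIENDS = {
    'alice': ['bob', 'carol'],
    'bob': ['alice'],
    'carol': ['alice', 'dave'],
}


def fake_users_on_page(page):
    """Stand-in for getUsersOnPage: returns [] past the last page."""
    return FAKE_PAGES.get(page, [])


def fake_friends_of(name):
    """Stand-in for getFriendsOf."""
    return FAKE_FRIENDS.get(name, [])


def crawl(users_on_page, friends_of):
    """Visit pages until an empty one is found; return (nodes, undirected edges)."""
    nodes, edges = set(), set()
    page = 1
    while True:
        users = users_on_page(page)
        if not users:
            break
        nodes.update(users)
        for user in users:
            for friend in friends_of(user):
                nodes.add(friend)
                # A frozenset makes (a, b) and (b, a) the same undirected edge
                edges.add(frozenset((user, friend)))
        page += 1
    return nodes, edges


nodes, edges = crawl(fake_users_on_page, fake_friends_of)
print(sorted(nodes))
print(sorted(tuple(sorted(edge)) for edge in edges))
```

A friendship listed on both users' pages is stored once, which is also how an undirected `nx.Graph` treats `add_edge(i, j)` and `add_edge(j, i)`. Users who appear only as someone's friend (`dave` here) still become nodes.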

In [None]:
#creates an empty undirected graph
G = nx.Graph()

page = 1
while True:
    
    #print('Page: ' + str(page) + ', nodes: ' + str(len(G.nodes)) + ', edges: ' + str(len(G.edges)))
    
    #gets the users shown on nth page
    users = getUsersOnPage(page)
    
    #if the nth page is empty, the search for users stops
    if not users:
        break

    #adds the users to the graph nodes
    G.add_nodes_from(users)
    for i in users:

        #retrieves the friends of each user, and adds graph nodes and edges
        friends = getFriendsOf(i)
        G.add_nodes_from(friends)

        for j in friends:
            G.add_edge(i, j)

    #moves to the next page
    page = page + 1

#saves the graph; the with statement closes the file even if an error occurs
with open('friendshipGraph.pckl', 'wb') as f:
    pickle.dump(G, f)
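
Reading the file back later mirrors the save step. Below is a minimal sketch of the round trip. It assumes a plain dictionary as a stand-in for `G` and writes to a temporary directory, so it runs without the crawl. With the real file, opening `friendshipGraph.pckl` in `'rb'` mode and calling `pickle.load` is enough, provided `networkx` is installed in the environment that loads it.

```python
import os
import pickle
import tempfile

# Stand-in for the graph G: adjacency stored as a dict of sets (invented data).
stand_in_graph = {'alice': {'bob'}, 'bob': {'alice'}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'friendshipGraph.pckl')

    # Save, as in the cell above
    with open(path, 'wb') as f:
        pickle.dump(stand_in_graph, f)

    # Load the object back from disk
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == stand_in_graph)
```

Pickle files can execute arbitrary code when loaded, so only unpickle files that come from a trusted source.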