# Scraping a Letterboxd Ego Network

In this notebook, we will guide you through the process of scraping and mapping a social network from Letterboxd, a popular social media platform for film enthusiasts. We'll cover the background and requirements, step-by-step instructions for mapping the social network, and finally, we'll present the results of our analysis.


## What is Letterboxd? 
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/93/Letterboxd_2018_logo_%28vertical%29.svg/1200px-Letterboxd_2018_logo_%28vertical%29.svg.png" alt="Letterboxd Logo" style="width: 20%;">

Letterboxd is a global social network for grass-roots film discussion and discovery, and a service for sharing film reviews and lists. You can use it as a diary to record and share your opinions about films as you watch them, or simply to keep track of films you have seen in the past. In this project, we aim to map the social network of Letterboxd users by scraping follower and following relationships.

### Tools and Libraries

To achieve this, we will use the following tools and libraries, among others:
- **Python**: The main programming language used for this project.
- **Requests**: For sending HTTP requests to fetch web pages.
- **BeautifulSoup**: For parsing HTML and extracting data.
- **Pyvis** (`pyvis.network.Network`): For building the social network graph and rendering it as an interactive HTML file.
- **Matplotlib**: For visualizing the network.

You can find the complete list of required libraries and install them using the requirements file available on GitHub: [requirements.txt](https://github.com/juanjuanjuanfer/SNA-of-Letterboxd/blob/main/requirements.txt).


### Limitations

There are a few limitations to consider:
- **Rate Limiting**: Letterboxd may limit the number of requests we can send in a given time period. Avoid scraping several accounts in a short period of time.
- **Data Privacy**: Ensure that the data collection respects the privacy policies of Letterboxd and the users.
- **Incomplete Data**: We might not be able to scrape all connections due to account privacy settings or rate limiting.
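
The rate-limiting concern above can be handled in code by backing off when the server answers with HTTP 429 (Too Many Requests). The following is a minimal sketch, not part of the project's `utils` module: `fetch_with_backoff` and its parameters are hypothetical names, and the delay values are arbitrary assumptions rather than documented Letterboxd limits.

```python
import time


def fetch_with_backoff(get, url, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call get(url), waiting longer after each HTTP 429 response.

    `get` is any callable that returns an object with a `status_code`
    attribute, for example `requests.get`. Hypothetical helper for illustration.
    """
    response = get(url)
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * (2 ** attempt))
        response = get(url)
    return response
```

With Requests installed, a call could look like `fetch_with_backoff(requests.get, "https://letterboxd.com/fer_nwn/")`. Adding a fixed pause between different profiles reduces the load further.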

If you prefer not to scrape data, you can use the example JSON file provided: [example_network.json](https://github.com/juanjuanjuanfer/SNA-of-Letterboxd/blob/main/example_network.json).

## Mapping an Ego Network

### What is an Ego Network?

An ego network is a particular type of network that maps the connections of, and from the perspective of, a single person (the "ego") and the individuals they are directly connected to (their "alters"). [1](https://bookdown.org/omarlizardo/_main/2-10-ego-centric-networks.html)
In ego networks, strong ties tend to be homophilous. That is, people have their strongest ties with people who are similar to themselves on key attributes such as social class, age, sex, race, and political views. [2](http://www.analytictech.com/networks/egonet.htm)
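
To make the definition concrete, the sketch below extracts an ego network from a small, made-up set of follow relationships: the ego, its alters (everyone the ego follows or is followed by), and the ties among those people. The user names and the `extract_ego_network` helper are invented for illustration.

```python
def extract_ego_network(follows, ego):
    """Return (alters, edges) for the one-step ego network of `ego`.

    `follows` maps each user to the set of users they follow (directed ties).
    """
    # Alters: anyone the ego follows, plus anyone who follows the ego
    alters = set(follows.get(ego, set()))
    alters |= {user for user, targets in follows.items() if ego in targets}
    alters.discard(ego)

    members = alters | {ego}
    # Keep only the ties whose two endpoints are both inside the ego network
    edges = {
        (source, target)
        for source, targets in follows.items()
        if source in members
        for target in targets
        if target in members
    }
    return alters, edges


# Toy data (hypothetical users)
follows = {
    "ana": {"ben", "cat"},
    "ben": {"ana"},
    "cat": {"dan"},
    "dan": {"eve"},
    "eve": {"ana"},
}

alters, edges = extract_ego_network(follows, "ana")
print(sorted(alters))  # ['ben', 'cat', 'eve']
print(sorted(edges))
```

Here `dan` is left out because he is only reachable through `cat`, which makes him a second-degree connection rather than an alter.
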
#### Why Study Ego Networks?

**Reason 1: Individuals are Interesting!**

In today’s hyper-connected workspace, organizations are constantly trying to enhance the Social Capital of their employees by identifying issues or opportunities that arise from the constraints or benefits of the networks around them. There are multiple direct applications of Ego Network Analysis:

- **Leadership Development**: Identify how well leaders are positioned to leverage their networks as support mechanisms, information pathways, and innovation channels.
- **High Potential Programs**: Use the ego networks of high-potential employees to quantify how effectively they are cultivating their relationships and developing themselves as leaders.
- **Social Capital**: The Ego Network of every employee can provide a wealth of information on areas of improvement such as the quality and quantity of connections, diversity of relationships, and the level of support provided by colleagues.

**Reason 2: Complete Networks Get Very Large, Very Quickly!**

If we were to run an Organizational Network Analysis (ONA) of a midsize organization with 5,000 people, and each person had 15 immediate links (i.e., first-degree connections), the result would be a network with about 75,000 connections (5,000 × 15, counting each link once for the person who reports it). This number increases significantly when considering second- and third-degree connections (i.e., friends of friends). [3](https://medium.com/orglens/enhancing-employee-social-capital-with-ego-network-analysis-4ff0fc6738e3)
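
The arithmetic behind this estimate is easy to reproduce. The sketch below uses the figures from the paragraph above (5,000 people, 15 links each). The two-step figure is only an upper bound that assumes no two people share a contact, an assumption made here purely for illustration.

```python
people = 5000
links_per_person = 15

# Every person reports 15 links, so there are 5000 * 15 reported links
reported_links = people * links_per_person
print(reported_links)  # 75000

# If every link is mutual, each tie has been counted twice
mutual_ties = reported_links // 2
print(mutual_ties)  # 37500

# By contrast, one ego network contains the ego plus its direct contacts
ego_network_size = 1 + links_per_person
print(ego_network_size)  # 16

# Upper bound on people within two steps of one person, assuming no overlap
# between contact lists: each contact brings up to 14 new people
two_step_reach = links_per_person + links_per_person * (links_per_person - 1)
print(two_step_reach)  # 225
```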

### Step-by-Step Guide Summary

1. **Setting Up the Environment**
    - Install the required libraries.
    - Import the necessary modules.

2. **Fetching Data from Letterboxd**
    - Define the target user (ego) whose network we want to map.
    - Send requests to the Letterboxd website to retrieve the list of followers and followings.
    - Parse the HTML content to extract user data.

3. **Creating the Network Graph**
    - Initialize a graph using Pyvis Network.
    - Add nodes and edges to represent the ego and their connections.
    - Optionally, fetch additional data for each alter to enrich the network.

4. **Analyzing the Network**
    - Calculate basic network metrics (e.g., degree, centrality).
    - Identify key influencers or hubs within the network.

5. **Visualizing the Network**
    - Plot the network graph using Matplotlib.
    - Customize the visualization for better clarity and presentation.
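
Step 4 can be sketched without any extra libraries. The example below assumes the scraped data has the shape used by the code in this notebook, a dictionary mapping each user to `"following"` and `"followers"` lists, and computes out-degree, in-degree, and mutual follows from a made-up sample. The `degree_summary` helper and the user names are hypothetical.

```python
from collections import Counter


def degree_summary(network):
    """Count outgoing and incoming follows for every user in the data."""
    edges = set()
    for user, data in network.items():
        for followed in data["following"]:
            edges.add((user, followed))
        for follower in data["followers"]:
            edges.add((follower, user))

    out_degree = Counter()
    in_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1

    # A mutual follow exists when both directions are present
    mutual = {tuple(sorted(edge)) for edge in edges if (edge[1], edge[0]) in edges}
    return out_degree, in_degree, mutual


# Made-up sample in the same shape as the scraped JSON
sample = {
    "ego": {"following": ["a", "b"], "followers": ["a", "c"]},
    "a": {"following": ["ego"], "followers": ["ego"]},
}

out_degree, in_degree, mutual = degree_summary(sample)
print(in_degree.most_common(1))  # [('ego', 2)]
print(sorted(mutual))            # [('a', 'ego')]
```

Users with a high in-degree are the most followed accounts in the data, which is one simple way to spot hubs. Both lists are used here so that users who were not scraped themselves still contribute edges.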

## Results

### Visualizations

We include interactive visualizations of the network graph to help illustrate the results. For the visualization, we use a library that generates an HTML file in which you can interact with the graph. We chose this method because it keeps the information readable. Later in the notebook we explain how to open the file; in short, you just need to drag it into your browser.
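
Besides dragging the file into a browser window, the generated HTML file can be opened from Python with the standard-library `webbrowser` module. This is a small sketch: `graph_uri` and `open_graph` are hypothetical helpers, and the file name in the comment is only an example.

```python
import webbrowser
from pathlib import Path


def graph_uri(html_file):
    """Turn a path to the generated HTML file into a file:// URI."""
    return Path(html_file).resolve().as_uri()


def open_graph(html_file):
    """Open the interactive graph in the default web browser."""
    return webbrowser.open(graph_uri(html_file))


# Example (the file name is an assumption; use the name you passed to net.show):
# open_graph("fer_nwn_ego_network_graph_withimage.html")
```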



# Let's begin!

The first thing we need to do is install the requirements and import the necessary libraries. If you haven't already installed the required libraries, you can do so by running the command below in your terminal:

```bash
pip install -r requirements.txt
```
Don't forget that you can download the requirements file [here](https://github.com/juanjuanjuanfer/SNA-of-Letterboxd/blob/main/requirements.txt).

Import the `utils` module, which contains the scraper functions.
Also import the `json` library to save the scraped data.

```python
# Libraries
import json

import utils  # module that contains the scraper functions

# Change the user to define the target (ego) of the ego network
user = "fer_nwn"
network = utils.scrape_network(user)
short_network = utils.scrape_network(user, depth=1)

# You can change the name of the file where the JSON will be saved.
# Change f"{user}_network.json" to whatever you want (keep the .json extension, since the scraped data is JSON).
with open(f"{user}_network.json", "w") as f:
    json.dump(network, f, indent=4)
with open(f"{user}_short_network.json", "w") as g:
    json.dump(short_network, g, indent=4)
```
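
The `depth` argument suggests a crawl that expands outward from the ego one level at a time. The actual `utils.scrape_network` is not shown in this notebook, so the following is only a sketch of how a depth-limited, breadth-first crawl could work. `crawl_network` and `get_connections` are hypothetical names, the default `depth=2` is an assumption, and the fake data stands in for real requests.

```python
from collections import deque


def crawl_network(ego, get_connections, depth=2):
    """Breadth-first crawl starting at `ego`, at most `depth` levels deep.

    `get_connections(user)` must return a dict with "following" and
    "followers" lists, the same shape as the JSON saved by this notebook.
    """
    network = {}
    queue = deque([(ego, 1)])
    while queue:
        user, level = queue.popleft()
        if user in network:
            continue  # already scraped
        network[user] = get_connections(user)
        if level < depth:
            # Schedule this user's connections for the next level
            for other in network[user]["following"] + network[user]["followers"]:
                if other not in network:
                    queue.append((other, level + 1))
    return network


# Fake data instead of HTTP requests (hypothetical users)
fake = {
    "ego": {"following": ["a"], "followers": ["b"]},
    "a": {"following": ["c"], "followers": ["ego"]},
    "b": {"following": ["ego"], "followers": []},
    "c": {"following": [], "followers": ["a"]},
}

print(sorted(crawl_network("ego", fake.__getitem__, depth=1)))  # ['ego']
print(sorted(crawl_network("ego", fake.__getitem__, depth=2)))  # ['a', 'b', 'ego']
```

In this sketch, `depth=1` scrapes only the ego's own lists, while each extra level also scrapes the lists of the users discovered so far, which is why the number of requests grows quickly with depth.
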

### That's it!

You just scraped your network! Now you can use this data to perform various analyses. So far, however, the code has only generated JSON files. Next, we are going to use Pyvis to create the graph from this data.

First, let's load the JSON file and then construct the network graph.


```python
import json
from collections import defaultdict

import requests
from bs4 import BeautifulSoup
from pyvis.network import Network

ego_network = "fer_nwn"

# Load the network data
with open(f"{ego_network}_network.json", "r") as f:
    network = json.load(f)

# Create a pyvis network
net = Network(notebook=True, directed=True, height="750px", width="100%", bgcolor="#111111", font_color="white")

# Collect all unique nodes
all_nodes = set(network.keys())
for data in network.values():
    all_nodes.update(data["following"])
    all_nodes.update(data["followers"])

# Add all nodes to the network, using each profile picture as the node image.
# This sends one request per user, so it can take a long time on large networks.
for node in all_nodes:
    profile_link = f"https://letterboxd.com/{node}/"
    profile_image_link = None
    try:
        response = requests.get(profile_link, timeout=10)
        # Parse the HTML with BeautifulSoup
        soup = BeautifulSoup(response.text, "html.parser")
        # Find the div with the class 'profile-avatar' and then the img within it
        div = soup.find("div", class_="profile-avatar")
        img = div.find("img") if div else None
        if img and "src" in img.attrs:
            profile_image_link = img["src"]
    except requests.RequestException:
        pass  # fall back to a plain node if the profile page cannot be fetched

    if profile_image_link:
        net.add_node(node, label=node, color="#2C916B", shape="circularImage", image=profile_image_link)
    else:
        # Always add the node: pyvis refuses to add an edge to a node that does not exist
        net.add_node(node, label=node, color="#2C916B", shape="dot")

# Add edges, at most one per direction (so at most two edges between any two nodes)
edge_count = defaultdict(lambda: defaultdict(int))

for usr, data in network.items():
    for following in data["following"]:
        if edge_count[usr][following] < 1:
            net.add_edge(usr, following, color="#D4D4D4", alpha=0.5, arrows=None)
            edge_count[usr][following] += 1

# Show the graph
net.show(f"{ego_network}_ego_network_graph_withimage.html")
```



```python
# Same steps as the previous cell, using the smaller depth-1 network
ego_network = "fer_nwn"

# Load the network data
with open(f"{ego_network}_short_network.json", "r") as f:
    network = json.load(f)

# Create a pyvis network
net = Network(notebook=True, directed=True, height="750px", width="100%", bgcolor="#111111", font_color="white")

# Collect all unique nodes
all_nodes = set(network.keys())
for data in network.values():
    all_nodes.update(data["following"])
    all_nodes.update(data["followers"])

# Add all nodes to the network, using each profile picture as the node image
for node in all_nodes:
    profile_link = f"https://letterboxd.com/{node}/"
    profile_image_link = None
    try:
        response = requests.get(profile_link, timeout=10)
        # Parse the HTML with BeautifulSoup
        soup = BeautifulSoup(response.text, "html.parser")
        # Find the div with the class 'profile-avatar' and then the img within it
        div = soup.find("div", class_="profile-avatar")
        img = div.find("img") if div else None
        if img and "src" in img.attrs:
            profile_image_link = img["src"]
    except requests.RequestException:
        pass  # fall back to a plain node if the profile page cannot be fetched

    if profile_image_link:
        net.add_node(node, label=node, color="#2C916B", shape="circularImage", image=profile_image_link)
    else:
        # Always add the node: pyvis refuses to add an edge to a node that does not exist
        net.add_node(node, label=node, color="#2C916B", shape="dot")

# Add edges, at most one per direction (so at most two edges between any two nodes)
edge_count = defaultdict(lambda: defaultdict(int))

for usr, data in network.items():
    for following in data["following"]:
        if edge_count[usr][following] < 1:
            net.add_edge(usr, following, color="#D4D4D4", alpha=0.5, arrows=None)
            edge_count[usr][following] += 1

# Show the graph
net.show(f"{ego_network}_short_ego_network_graph_withimage.html")
```

Output file: `fer_nwn_short_ego_network_graph_withimage.html`
