# Social Network Analysis Project: Report 1
# Exploring the *Titanic* dataset

## Introduction - *Unveiling Titanic's Social Fabric: A Network Analysis*

### Group F members
The minds behind this journey are: **Leonardo Azzi**, **Sofia Bruni**, **Francesca Romana Sanna**, **Alexandra Tabarani** and **Marta Torella**, five students of the Bachelor in Management and Computer Science.

Over the weeks of the course, we aim not only to visualize the connections between the characters of the movie *Titanic*, but also to analyze them through various graph metrics.

In our project, the **characters** aboard the Titanic take center stage as *nodes*, while the *edges* connecting them represent **shared scenes**, with *weights* reflecting the **frequency** of these shared appearances.
This analytical approach allows us to decode the complex web of interactions that shaped the destinies of the movie characters.

Through careful examination, we will uncover both the characters and the patterns of connection that underlie the narrative.

*Join us as we sail through the Titanic's data!*

### Imports
We start by importing the necessary libraries for our analysis.

In [3]:
# Basic Imports with aliases
import numpy as np 
import matplotlib.pyplot as plt
import csv
import networkx as nx

## Week 1: *Introduction to Graph Creation and Basic Analysis*

### Objective
The primary objective this week was to build and visualize a graph of the relationships and interactions among the characters in the Titanic dataset.

### Graph Construction
In our graphical illustration:
- **Nodes** represent individual *characters* in the movie
- **Edges** indicate the presence of a *shared scene* between two characters
- **Edge weights** reflect the *frequency* of these shared scenes
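
To make the mapping above concrete, here is a minimal sketch of how shared scenes could be turned into weighted edges. The `scenes` list and the character names in it are made up for illustration; the project itself loads ready-made edges from a CSV file, so this is not how the dataset was actually built.

```python
from itertools import combinations

import networkx as nx

# Hypothetical scene listing (not taken from the dataset):
# each inner list names the characters appearing in one scene.
scenes = [
    ["ROSE", "JACK"],
    ["ROSE", "JACK", "CAL"],
    ["ROSE", "CAL"],
]

G_demo = nx.Graph()
for scene in scenes:
    # Every unordered pair of distinct characters in a scene shares that scene.
    for a, b in combinations(sorted(set(scene)), 2):
        if G_demo.has_edge(a, b):
            G_demo[a][b]["weight"] += 1      # one more shared scene
        else:
            G_demo.add_edge(a, b, weight=1)  # first shared scene

for a, b, w in G_demo.edges(data="weight"):
    print(a, b, w)
```

Each edge weight ends up equal to the number of scenes in which both characters appear.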

### Graph Analysis
We then analyzed the graph by determining some of its basic properties:
- **Total number of nodes and edges**: to understand the *scale and complexity* of the graph
- **Average degree**: computed as $\frac{2E}{N}$, where $E$ is the total number of edges and $N$ is the number of nodes, giving the average number of *connections per node*
- **Graph density**: calculated as $D = \frac{2E}{N(N-1)}$, where $D$ is the density, $E$ is the number of edges and $N$ is the number of nodes. This metric compares the number of connections present in the graph with the maximum possible number of connections in an undirected graph, $\frac{N(N-1)}{2}$, and so provides insight into the *overall connectivity*. The density ranges from 0 to 1, with 0 indicating a graph with no edges and 1 a graph with the maximum number of edges.
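
As a quick illustration of the formulas above, the following sketch computes the same metrics on a small toy graph (not the Titanic data) and compares the hand-computed density with `nx.density`. The graph and the variable names are assumptions made for this example.

```python
import networkx as nx

# Toy undirected graph, unrelated to the dataset: 4 nodes, 4 weighted edges.
G_toy = nx.Graph()
G_toy.add_weighted_edges_from([
    ("A", "B", 3),
    ("A", "C", 1),
    ("B", "C", 2),
    ("C", "D", 1),
])

N = G_toy.number_of_nodes()
E = G_toy.number_of_edges()

avg_degree = 2 * E / N           # each edge contributes to the degree of two nodes
density = 2 * E / (N * (N - 1))  # edges present / maximum possible edges

print(f"Nodes: {N}, Edges: {E}")
print(f"Average degree: {avg_degree:.2f}")
print(f"Density: {density:.4f} (networkx: {nx.density(G_toy):.4f})")
```

Both metrics count edges only, so the edge weights do not affect them.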

### Insights gained
By computing these metrics, we gained a better understanding of the graph's structure and complexity. They give us a foundation that we will build on in the following weeks.

### Loading the data from the given CSV files

In [4]:
G = nx.Graph() # Create a graph

# Read nodes from the nodes.csv file
with open('../Graph/nodes.csv', 'r', newline='') as file:  # newline='' is what the csv module recommends
    reader = csv.DictReader(file)                   # DictReader maps each row to a dictionary keyed by the header names
    for row in reader:                              # Iterate over the rows of the CSV file; each row is a dictionary
        G.add_node(row['Id'], label=row['Label'])   # Add a node to the graph with the node id and label

# Read edges from the edges.csv file
with open('../Graph/edges.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Add an edge with its source, target and integer weight
        G.add_edge(row['Source'], row['Target'], weight=int(row['Weight']))
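
The same loading pattern can be tried without the real files. The sketch below uses in-memory CSV text with made-up rows that only mimic the column names used above (`Id`, `Label`, `Source`, `Target`, `Weight`), and adds a small sanity check. The check relies on the fact that `add_edge` silently creates any endpoint that is not yet in the graph, so an edge pointing to an unknown id would show up as a node without a `label`.

```python
import csv
import io

import networkx as nx

# Made-up sample rows; only the column names match the files used above.
nodes_csv = io.StringIO("Id,Label\n1,ROSE\n2,JACK\n3,CAL\n")
edges_csv = io.StringIO("Source,Target,Weight\n1,2,5\n1,3,2\n")

G_check = nx.Graph()
for row in csv.DictReader(nodes_csv):
    G_check.add_node(row["Id"], label=row["Label"])
for row in csv.DictReader(edges_csv):
    G_check.add_edge(row["Source"], row["Target"], weight=int(row["Weight"]))

# Nodes created implicitly by add_edge have no 'label' attribute.
unlabeled = [n for n, data in G_check.nodes(data=True) if "label" not in data]

print("Nodes:", G_check.number_of_nodes())
print("Edges:", G_check.number_of_edges())
print("Nodes without a label:", unlabeled)
```

Note that `DictReader` returns every field as a string, so the node ids here are `'1'`, `'2'`, `'3'` rather than integers, and the weight has to be converted explicitly.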


### Plotting the graph

In [None]:
# Setting up the plot dimensions
plt.figure(figsize=(18, 18))

# Use the spring layout algorithm for positioning the nodes
pos = nx.spring_layout(G)

# Adjust node sizes based on their degrees