# Project 2
Authors: Richie Rivera and Naomi Buell

*Instructions*
1. *Identify a large 2-node network dataset—you can start with a dataset in a repository.  Your data should meet the criteria that it consists of ties between and not within two (or more) distinct groups.*
2. *Reduce the size of the network using a method such as the island method described in chapter 4 of social network analysis.*
3. *What can you infer about each of the distinct groups?*

## 1. Identify and load data

First, we load the necessary libraries.

In [1]:
# Import libraries
import networkx as nx
from networkx.algorithms import bipartite as bi
import pandas as pd
import numpy as np
import requests
import zipfile
import io

We identified a bipartite One Piece character network dataset used by Sugishita and Masuda (2023).[^1] The data are available on GitHub at https://github.com/KS-92/Manga. The unprojected data are equivalent to a temporal bipartite graph whose two node types are characters and pages, where an edge connects a character to each page on which that character appears. Note that the characters are anonymized.

[^1]: [Sugishita, K., Masuda, N. Social network analysis of manga: similarities to real-world social networks and trends over decades. Appl Netw Sci 8, 79 (2023). https://doi.org/10.1007/s41109-023-00604-0](#references)

In [2]:
# Download the zip file
zip_url = 'https://github.com/KS-92/Manga/raw/main/Manga_network_data.zip'
response = requests.get(zip_url)
response.raise_for_status()  # stop early if the download failed
with zipfile.ZipFile(io.BytesIO(response.content)) as z:
    # Extract the required files to memory
    with z.open('Manga_network_data/Temporal/Temporal_One_Piece.csv') as f:
        temporal_one_piece = pd.read_csv(f)
    with z.open('Manga_network_data/Static/Static_One_Piece.edgelist') as f:
        static_one_piece = nx.read_weighted_edgelist(f, nodetype=int)
        static_one_piece_df = nx.to_pandas_edgelist(static_one_piece)
        
# Create the bipartite graph: nodes are characters and pages, edges connect characters to pages they appear on
B = nx.Graph()
characters = set(temporal_one_piece['i']).union(set(temporal_one_piece['j']))
pages = set(temporal_one_piece['t'])

for _, row in temporal_one_piece.iterrows():
    B.add_edge(row['i'], row['t'])

# Get the character and page node lists (as in the bipartite sets)
character_list = sorted(characters)
page_list = sorted(pages)

print("Biadjacency matrix")
print(bi.biadjacency_matrix(B, character_list, page_list))

Biadjacency matrix
<Compressed Sparse Row sparse array of dtype 'int64'
	with 1039 stored elements and shape (34, 475)>
  Coords	Values
  (0, 0)	1
  (0, 1)	1
  (0, 2)	1
  (0, 3)	1
  (0, 4)	1
  (0, 5)	1
  (0, 6)	1
  (0, 7)	1
  (0, 8)	1
  (0, 10)	1
  (0, 13)	1
  (0, 14)	1
  (0, 15)	1
  (0, 16)	1
  (0, 17)	1
  (0, 18)	1
  (0, 19)	1
  (0, 20)	1
  (0, 21)	1
  (0, 22)	1
  (0, 23)	1
  (0, 24)	1
  (0, 25)	1
  (0, 26)	1
  (0, 27)	1
  :	:
  (30, 437)	1
  (30, 438)	1
  (30, 439)	1
  (30, 441)	1
  (30, 442)	1
  (30, 443)	1
  (30, 444)	1
  (30, 445)	1
  (30, 446)	1
  (30, 447)	1
  (30, 448)	1
  (30, 474)	1
  (31, 458)	1
  (31, 459)	1
  (31, 460)	1
  (31, 461)	1
  (31, 462)	1
  (31, 463)	1
  (31, 464)	1
  (31, 465)	1
  (31, 466)	1
  (31, 468)	1
  (31, 469)	1
  (31, 470)	1
  (31, 471)	1


The biadjacency matrix shows there are 34 characters (rows) and 475 pages (columns) in the dataset, with 1,039 edges between characters and pages. Each stored value of 1 is a binary indicator that a character appears on a page. For example, coordinate (0, 0) with a value of 1 means the character in row 0 appears on the page in column 0.
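To make the row and column layout concrete, here is a minimal sketch on a hypothetical toy graph (three characters, three pages). The string labels `c1`, `p1`, and so on, and the `toy_*` names, are assumptions for illustration and are not part of the One Piece data. Prefixing the labels keeps character and page nodes distinct even when the raw identifiers are integers in both groups.

```python
import networkx as nx
from networkx.algorithms import bipartite as bi

# Hypothetical toy data: (character, page) appearance pairs
appearances = [("c1", "p1"), ("c2", "p1"), ("c1", "p2"), ("c3", "p3")]

toy_characters = sorted({c for c, _ in appearances})
toy_pages = sorted({p for _, p in appearances})

# Build the two-mode graph, tagging each node with its group
toy = nx.Graph()
toy.add_nodes_from(toy_characters, bipartite=0)
toy.add_nodes_from(toy_pages, bipartite=1)
toy.add_edges_from(appearances)

# Rows follow toy_characters, columns follow toy_pages
toy_matrix = bi.biadjacency_matrix(toy, toy_characters, toy_pages).toarray()
print(toy_matrix)
```

In this toy matrix, the row for `c1` has ones in the columns for `p1` and `p2`, and the total number of ones equals the number of appearance pairs.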

The character network is constructed by projecting the bipartite graph onto the character nodes. That is, in the character network, two characters are connected by an edge if and only if they appear on the same page at least once. We read in two projected datasets:

1. A time-stamped record of character copresence in One Piece volumes one through three, which we read in as `temporal_one_piece` above and preview below. The copresence of two characters on a page was recorded as an interaction between them.

In [3]:
# Show the first few rows of the temporal data
temporal_one_piece.head()

|   | i | j | t |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 1 | 2 | 4 |
| 2 | 1 | 2 | 5 |
| 3 | 1 | 2 | 6 |
| 4 | 1 | 2 | 7 |


2. A static character network, which we read in as `static_one_piece` above and preview below as the edge list `static_one_piece_df`. The static character network aggregates the temporal character network over time: the weight of an edge is the number of pages on which the two characters are copresent. Because we are interested in interactions between characters, isolated nodes in the character networks are excluded from the analysis.

In [4]:
static_one_piece_df.head()

|   | source | target | weight |
|---|--------|--------|--------|
| 0 | 1 | 2 | 31.0 |
| 1 | 1 | 4 | 16.0 |
| 2 | 1 | 26 | 19.0 |
| 3 | 1 | 33 | 9.0 |
| 4 | 1 | 30 | 13.0 |
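The aggregation from time-stamped copresence to static edge weights can be illustrated with a small sketch. The rows below are hypothetical and only mimic the `(i, j, t)` layout shown earlier. The sketch also assumes each pair is always stored in the same order (the same character in column `i`), which we have not verified for the published files.

```python
import pandas as pd
import networkx as nx

# Hypothetical rows in the same (i, j, t) layout as the temporal data
toy_temporal = pd.DataFrame(
    {"i": [1, 1, 1, 2], "j": [2, 2, 3, 3], "t": [1, 2, 2, 2]}
)

# Edge weight = number of distinct pages on which a pair is copresent
toy_weights = (
    toy_temporal.groupby(["i", "j"])["t"]
    .nunique()
    .reset_index(name="weight")
)

# Build the weighted static network from the aggregated edge list
toy_static = nx.from_pandas_edgelist(
    toy_weights, source="i", target="j", edge_attr="weight"
)
print(toy_weights)
```

Here characters 1 and 2 share two pages, so their edge carries a weight of 2, while the other two pairs share one page each.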


## 2. Perform island method
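As a starting point, the sketch below illustrates the general idea of the island method named in the instructions: raise a threshold on edge weight step by step and keep only the edges above it, so that weakly tied nodes drop away and the strongly tied "islands" remain. It runs on a hypothetical toy graph, not on the One Piece data. The function names `trim_edges` and `island_method`, the strict `>` comparison, and the step-size rule are assumptions for illustration.

```python
import networkx as nx

def trim_edges(g, threshold):
    """Return a new graph keeping only edges with weight above threshold."""
    trimmed = nx.Graph()
    for u, v, data in g.edges(data=True):
        if data["weight"] > threshold:
            trimmed.add_edge(u, v, **data)
    return trimmed

def island_method(g, iterations=5):
    """Trim the graph at evenly spaced thresholds between min and max weight."""
    weights = [data["weight"] for _, _, data in g.edges(data=True)]
    low, high = int(min(weights)), int(max(weights))
    # Guard against a zero step when the weight range is narrow
    step = max(1, (high - low) // iterations)
    return [(t, trim_edges(g, t)) for t in range(low, high, step)]

# Hypothetical weighted toy graph
toy = nx.Graph()
toy.add_weighted_edges_from(
    [("a", "b", 10), ("b", "c", 6), ("c", "d", 2), ("d", "e", 1)]
)

islands = island_method(toy, iterations=3)
for threshold, island in islands:
    print(
        threshold,
        island.number_of_nodes(),
        nx.number_connected_components(island),
    )
```

Each printed row gives a threshold, the number of nodes that survive it, and the number of connected components (islands). A threshold for the real network could then be chosen where the graph is small enough to interpret but not yet fragmented.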

## 3. Inference

We can infer that...