# Week Four - Assignment Centrality Measures

Authors: Naomi Buell and Richie Rivera

## Instructions

*Centrality measures can be used to predict (positive or negative) outcomes for a node.*

*Your task in this week’s assignment is to identify an interesting set of network data that is available on the web (either through web scraping or web APIs) that could be used for analyzing and comparing centrality measures across nodes.  As an additional constraint, there should be at least one categorical variable available for each node (such as “Male” or “Female”; “Republican”, “Democrat,” or “Undecided”, etc.).*

*In addition to identifying your data source, you should create a high level plan that describes how you would load the data for analysis, and describe a hypothetical outcome that could be predicted from comparing degree centrality across categorical groups.*

*For this week’s assignment, you are not required to actually load or analyze the data.*

## Step 1: Identify Data Source

For this assignment, we use the [One Piece Interaction](https://github.com/jonaszeu/one-piece-interaction-data) dataset from jonaszeu on GitHub. It was created by scraping a wiki for the show, documenting every interaction between two characters, and categorizing each one by interaction type (Communication, Confrontation, Cooperation, etc.). The dataset is available in several formats; we will use the `.csv` file as our data source.

In this dataset, each row corresponds to one interaction (edge) between two characters (nodes). The categorical variables recorded for each interaction, and therefore available for the characters involved, are:
- `Interaction`, which describes the type of interaction between the two characters (e.g. Perception, Communication, Confrontation, Cooperation, Emotional, Indirect, Physical, or Other),
- `Saga`, which describes the saga in which the interaction took place (e.g. East Blue, Alabasta, Sky Island, etc.),
- `Arc`, which describes the smaller story arc in which the interaction took place (e.g. Romance Dawn, Orange Town, Syrup Village, etc.), and
- `Filler`, which indicates whether the interaction took place in a filler episode (True or False).
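
Because these variables are recorded per interaction, one possible way to obtain a single label per character is to take the most frequent value across that character's interactions. The sketch below illustrates the idea on toy rows using only the standard library. The character column names (`Character 1`, `Character 2`), the placeholder characters, and the helper `dominant_category` are assumptions made for illustration; the real CSV header should be checked before adapting it.

```python
from collections import Counter, defaultdict

# Hypothetical column names for the two characters; the real CSV header may differ.
SOURCE_COL = "Character 1"
TARGET_COL = "Character 2"


def make_row(source, target, interaction, arc):
    """Build one toy interaction row in the shape described above."""
    return {
        SOURCE_COL: source,
        TARGET_COL: target,
        "Interaction": interaction,
        "Saga": "East Blue",
        "Arc": arc,
        "Filler": False,
    }


# Toy rows with placeholder characters A-D (illustrative only, not real data).
rows = [
    make_row("A", "B", "Communication", "Romance Dawn"),
    make_row("A", "C", "Communication", "Romance Dawn"),
    make_row("B", "C", "Confrontation", "Orange Town"),
    make_row("B", "D", "Confrontation", "Orange Town"),
    make_row("A", "D", "Cooperation", "Orange Town"),
]


def dominant_category(rows, column):
    """Return each character's most frequent value of `column`."""
    tallies = defaultdict(Counter)
    for row in rows:
        # An interaction counts toward both characters involved.
        for character in (row[SOURCE_COL], row[TARGET_COL]):
            tallies[character][row[column]] += 1
    return {
        character: counts.most_common(1)[0][0]
        for character, counts in tallies.items()
    }


print(dominant_category(rows, "Interaction"))
```

Ties are resolved in favor of the value seen first, so a real analysis might prefer an explicit tie-breaking rule.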

## Step 2: Analysis Plan

Below, we outline a high-level analysis plan for loading the data described above and calculating characters' degree centrality to predict whether they are a central character in a given story arc.

1. Import Libraries: Use NetworkX for graph analysis, pandas for data manipulation, and matplotlib for visualization.

2. Load the Dataset: Read the CSV file directly from the GitHub repository (using the URL `https://raw.githubusercontent.com/jonaszeu/one-piece-interaction-data/refs/heads/main/one_piece_interactions_1-1085.csv`) into a pandas DataFrame.

3. Visualize the Data: Use NetworkX to draw the data as a graph, with characters as nodes and interactions as edges. Display the `Saga` and `Arc` attributes with different colors or shapes, and weight edges by interaction frequency.

4. Calculate Degree Centrality with NetworkX:
    - Calculate the degree centrality of each node/character using `networkx.degree_centrality()`.
    - Aggregate and group scores across the categorical variables `Saga` and `Arc`.
    - Aggregate and group scores across `Interaction` to see how characters rely on specific interaction types (e.g., some characters may rely solely on confrontation).

5. Determine main characters in each saga and arc:
    - By comparing degree centrality scores, we can identify the main characters in each saga and arc and see how they interact with other main characters.
    - By comparing degree centrality scores across `Interaction`, we can see how characters rely on different interaction types.
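
The plan above can be sketched roughly as follows. This is a minimal illustration on toy data rather than the final analysis: the character column names (`Character 1`, `Character 2`), the placeholder characters, and the helpers `build_graph` and `degree_centrality_by_group` are assumptions, and the real CSV would be loaded with `pd.read_csv` from the URL in step 2 instead of the inline DataFrame. Repeated interactions between the same pair are collapsed into one weighted edge, so degree centrality here reflects the share of other characters someone interacts with.

```python
from collections import Counter

import networkx as nx
import pandas as pd

# Hypothetical column names for the two characters; check the real CSV header.
SOURCE_COL = "Character 1"
TARGET_COL = "Character 2"

# Toy rows standing in for the real CSV (placeholder characters, not real data).
toy = pd.DataFrame(
    {
        SOURCE_COL: ["A", "A", "B", "A", "A", "A", "B"],
        TARGET_COL: ["B", "C", "A", "B", "C", "D", "C"],
        "Arc": ["Romance Dawn"] * 3 + ["Orange Town"] * 4,
    }
)


def build_graph(df, source_col=SOURCE_COL, target_col=TARGET_COL):
    """Collapse repeated interactions into one weighted, undirected edge per pair."""
    # Sort each pair so that (A, B) and (B, A) count as the same edge.
    counts = Counter(
        tuple(sorted(pair)) for pair in zip(df[source_col], df[target_col])
    )
    graph = nx.Graph()
    for (u, v), weight in counts.items():
        graph.add_edge(u, v, weight=weight)
    return graph


def degree_centrality_by_group(df, group_col):
    """Compute degree centrality separately within each value of `group_col`."""
    records = []
    for group_value, group_rows in df.groupby(group_col):
        graph = build_graph(group_rows)
        for character, score in nx.degree_centrality(graph).items():
            records.append(
                {
                    group_col: group_value,
                    "character": character,
                    "degree_centrality": score,
                }
            )
    return pd.DataFrame(records)


scores = degree_centrality_by_group(toy, "Arc")

# The highest-scoring character in each arc is a candidate "main character".
top_per_arc = scores.loc[scores.groupby("Arc")["degree_centrality"].idxmax()]
print(top_per_arc)
```

Passing `"Saga"` or `"Interaction"` as the grouping column would give the other comparisons described in the plan, provided those columns are present in the DataFrame.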

## Presentation Video

Link to presentation video: [insert].