The Star Wars social network
This repository contains the code I used to create my blog post on the Star Wars social network.

Update: I added the 7th episode, Star Wars: The Force Awakens, and included more network analysis on my blog.

The folder data contains:

  • characters.csv: extracted list of named characters that appeared in the screenplays.

  • aliases.csv: csv file with alternative names for some of the characters

  • charactersPerScene.csv: each line contains the name of a character followed by the relative times at which the character is mentioned in the screenplay. I used this data to generate the character timelines. The values were computed as

     episode number + scene number/number of scenes in episode

    Values in [0,1] correspond to mentions in Episode I, values in [1,2] to mentions in Episode II, etc.

    [Note that this is not a valid csv file because each line contains a different number of columns]
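As a rough illustration of the timeline values described above, here is a minimal Python sketch (the repository's own code is in F#, so this is not the original implementation). It assumes that the episode term in the formula is an offset counted from zero, so that Episode I lands in [0,1], and that fields are separated by commas. The function names and the sample rows are hypothetical and are not taken from the data files.

```python
import csv
import io


def relative_time(episode_offset, scene_number, scenes_in_episode):
    """Timeline position of a scene.

    Assumption: episode_offset is 0 for Episode I, 1 for Episode II, etc.,
    which matches the stated ranges [0,1], [1,2], ...
    """
    return episode_offset + scene_number / scenes_in_episode


def parse_character_timelines(text):
    """Parse ragged lines of the form 'name,time,time,...' into {name: [times]}.

    Each line may have a different number of columns, so every row is
    handled on its own instead of being loaded as a fixed-width table.
    """
    timelines = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, *times = row
        timelines[name.strip()] = [float(t) for t in times if t.strip()]
    return timelines


# Hypothetical sample rows, made up for illustration only.
sample = "LUKE,3.25,3.5\nHAN,3.5\n"
timelines = parse_character_timelines(sample)
```

Reading the file row by row avoids the problem noted above: tools that expect the same number of columns on every line may reject or misread the file.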


The folder networks contains:

  • starwars-episode-N-interactions.json contains the social network extracted from Episode N, where the links between characters are defined by the number of times the characters speak within the same scene.
  • starwars-episode-N-mentions.json contains the social network extracted from Episode N, where the links between characters are defined by the number of times the characters are mentioned within the same scene.
  • starwars-episode-N-interactions-allCharacters.json is the interactions network with R2-D2 and Chewbacca added in using data from the mentions network.
  • starwars-full-... contain the corresponding social networks for the whole set of 6 episodes.

Description of networks

The json files representing the networks contain the following information:


The nodes contain the following fields:

  • name: Name of the character
  • value: Number of scenes the character appeared in
  • colour: Colour in the visualization


Links represent connections between characters. The link information corresponds to:

  • source: zero-based index of the character that is one end of the link. The order of nodes is the order in which they are listed in the “nodes” element.
  • target: zero-based index of the character that is the other end of the link.
  • value: Number of scenes where the “source character” and “target character” of the link appeared together. Please note that the network is undirected. Which character is the source and which is the target is arbitrary; they are simply the two ends of the link.
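To show how the node indices in the links map back to character names, here is a minimal Python sketch. It assumes the top-level keys are called "nodes" and "links" (only "nodes" is named in this description, so "links" is a guess), and the sample network below is hypothetical, not taken from the data files. The function name load_network is also just for illustration.

```python
import json
from collections import defaultdict


def load_network(json_text):
    """Return (names, neighbours) for a network in the format described above.

    neighbours[a][b] is the link value between characters a and b.
    Because the network is undirected, each link is stored in both directions.
    """
    data = json.loads(json_text)
    # A node's position in the "nodes" list is the index used by source/target.
    names = [node["name"] for node in data["nodes"]]
    neighbours = defaultdict(dict)
    for link in data["links"]:  # assumption: the key is called "links"
        a = names[link["source"]]
        b = names[link["target"]]
        neighbours[a][b] = link["value"]
        neighbours[b][a] = link["value"]
    return names, dict(neighbours)


# Hypothetical sample, made up for illustration only.
sample_network = """
{
  "nodes": [
    {"name": "LUKE", "value": 10, "colour": "#3881e5"},
    {"name": "LEIA", "value": 8,  "colour": "#dca3c2"},
    {"name": "HAN",  "value": 7,  "colour": "#ff9400"}
  ],
  "links": [
    {"source": 0, "target": 1, "value": 3},
    {"source": 2, "target": 1, "value": 5}
  ]
}
"""
names, neighbours = load_network(sample_network)
```

Storing each link under both endpoints means a lookup works no matter which character happened to be written as the source.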

The other files contain the code used to generate the data files and the plots.

  • interactions.html and episode-interactions.html contain the D3.js code to visualize the networks.
  • The rest of the code is in F# and uses FsLab.
  • To run the code, use the Paket reference manager to download all the dependencies.