# Bacon Number


[Jian Tao](https://coehpc.engr.tamu.edu/people/jian-tao/), Texas A&M University

June 30, 2023

The [Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#Bacon_numbers) of an actor or actress is the number of degrees of separation between them and the actor Kevin Bacon, as defined by the game known as [Six Degrees of Kevin Bacon](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon). The higher the Bacon number, the farther the actor is from Kevin Bacon.

For example, Kevin Bacon's Bacon number is 0. If an actor works in a movie with Kevin Bacon, the actor's Bacon number is 1. If an actor works with an actor who worked with Kevin Bacon in a movie, the first actor's Bacon number is 2, and so forth.
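
The definition above can be sketched on a tiny hand-built graph. Every name except Kevin Bacon is a hypothetical placeholder, and the edges are assumptions made purely for illustration; the Bacon number is simply the shortest-path distance from the `Kevin Bacon` node.

```python
import networkx as nx

# Hypothetical co-star graph: all names except Kevin Bacon are made up.
toy = nx.Graph()
toy.add_edges_from([
    ("Kevin Bacon", "Actor A"),  # A shared a movie with Kevin Bacon
    ("Actor A", "Actor B"),      # B shared a movie with A only
    ("Actor B", "Actor C"),      # C shared a movie with B only
])

# Breadth-first search from Kevin Bacon gives the Bacon number
# of every actor reachable from him.
toy_bacon_numbers = nx.single_source_shortest_path_length(toy, "Kevin Bacon")
for name, number in toy_bacon_numbers.items():
    print(name, number)
```

In this toy graph, Kevin Bacon gets 0, Actor A gets 1, Actor B gets 2, and Actor C gets 3, matching the rule described above.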

Use the file Movie_Data.txt in the repository to:

1. construct a graph with pandas and NetworkX;
2. implement a function to find the Bacon number of an arbitrary actor/actress;
3. with your function, find the Bacon number of Bruce Lee and Elizabeth Taylor, or of your favorite actor/actress.

The movie data was downloaded and uncompressed from https://oracleofbacon.org/data.txt.bz2. It was collected with a Ruby script by Patrick Reynolds, available at https://github.com/piki/wikipedia-film-database.

In [None]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from itertools import combinations

Read in the movie data and explore the content.

In [None]:
# Detect whether we are running on Google Colab.
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
  datafile = "https://raw.githubusercontent.com/jtao/dswebinar/master/networkx/case2.2/Movie_Data.txt"
else:  
  datafile = "Movie_Data.txt"
df = pd.read_json(datafile, lines = True)

In [None]:
df.head(5)

In [None]:
df.info()

List all the movies in which Bruce Lee appeared.

In [None]:
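# Print the title of every movie whose cast list includes Bruce Lee.
# Rows without a cast list (for example, missing values) are skipped.
for title, cast in zip(df["title"], df["cast"]):
    if isinstance(cast, list) and "Bruce Lee" in cast:
        print(title)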

To get the Bacon Number, we first build a graph that links actors/actresses through the movies they share. The actor/actress names are the nodes, and two nodes are connected by an edge if the two people appear in the same movie.

In [None]:
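# Nodes are actors/actresses; an edge joins two people who share a movie.
G = nx.Graph()
for cast in df['cast']:
    # Skip rows without a usable cast list (for example, missing values).
    if isinstance(cast, list):
        G.add_edges_from(combinations(cast, 2))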
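
To make the pairing step concrete, here is a minimal sketch on two hypothetical movie records. The titles and actor names are invented, and the record layout (a `title` string and a `cast` list) is an assumption based on the columns used in this notebook.

```python
import networkx as nx
from itertools import combinations

# Hypothetical records with the same 'title' and 'cast' fields as the data.
toy_movies = [
    {"title": "Movie X", "cast": ["Actor A", "Actor B", "Actor C"]},
    {"title": "Movie Y", "cast": ["Actor C", "Actor D"]},
]

toy_graph = nx.Graph()
for movie in toy_movies:
    # A cast of n people yields n * (n - 1) / 2 co-star pairs.
    toy_graph.add_edges_from(combinations(movie["cast"], 2))

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
print(toy_graph.has_edge("Actor A", "Actor D"))
```

Movie X contributes three edges and Movie Y one. Actor A and Actor D never share a movie, so they are linked only indirectly through Actor C.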

Define a function to find the Bacon Number of an actor/actress.

In [None]:
def Bacon_Number(Actor_Name):
    """Print and return the Bacon number of Actor_Name, listing every shortest path."""
    # Length of the shortest path from Kevin Bacon to the actor in graph G.
    bcn_num = nx.shortest_path_length(G, 'Kevin Bacon', Actor_Name)
    print("Bacon Number of %s is %d" % (Actor_Name, bcn_num))
    # Several shortest paths may exist; print each one.
    for sp in nx.all_shortest_paths(G, 'Kevin Bacon', Actor_Name):
        print(sp)
    return bcn_num
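
A name that is misspelled or absent from the data, or an actor with no chain of co-stars leading to Kevin Bacon, makes `nx.shortest_path_length` raise an exception. The following is one possible defensive variant, not part of the original exercise; the helper name `bacon_number_or_none` and the demo graph are hypothetical.

```python
import networkx as nx

def bacon_number_or_none(graph, actor_name, source="Kevin Bacon"):
    """Return the Bacon number, or None if the actor is missing or unreachable."""
    # Check membership first so an unknown name does not raise NodeNotFound.
    if source not in graph or actor_name not in graph:
        return None
    try:
        return nx.shortest_path_length(graph, source, actor_name)
    except nx.NetworkXNoPath:
        # The actor exists but sits in a component without Kevin Bacon.
        return None

# Hypothetical graph with one component that is cut off from Kevin Bacon.
demo = nx.Graph()
demo.add_edge("Kevin Bacon", "Actor A")
demo.add_edge("Actor X", "Actor Y")

print(bacon_number_or_none(demo, "Actor A"))
print(bacon_number_or_none(demo, "Actor X"))
print(bacon_number_or_none(demo, "Nobody"))
```

Returning `None` rather than raising keeps a loop over many names running; whether that is preferable to an exception depends on how the result is used.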

Let's find the Bacon Number of your favorite actor/actress!

In [None]:
Bacon_Number('Bruce Lee')
# Determine the Bacon Number of Bruce Lee.

In [None]:
Bacon_Number('Elizabeth Taylor')
# Determine the Bacon Number of Elizabeth Taylor.