<font style='font-size:1.5em'>**NOTEBOOK TITLE**</font>

<font style='font-size:1.2em'>Notebook Subtitle</font>

**AUTHOR:**  

**DEPARTMENT:** 

**DATE:** 

---


## Imports 
Section with library imports.

In [3]:
import pandas as pd
import networkx as nx
import altair as alt

DATA_DIR = '../data'

### Custom functions
Section with some general functions used throughout the notebook.

In [4]:
def read_data(data, data_dir=DATA_DIR):
    """Reads a JSON file and returns a dataframe.

    Args:
        data (str): Name of the JSON file, without the `.json` extension.
        data_dir (str): Path to the directory where the JSON file is located.

    Returns:
        df (pandas.DataFrame): Dataframe containing the data from the JSON file.

    Raises:
        NotImplementedError: If the JSON file contains more than one list.
                             Know how to handle this? Please open a PR!
    """

    # I found it easier to read the JSON file as a series and then convert it to a dataframe
    series = pd.read_json(f'{data_dir}/{data}.json', typ='series')

    # Anything other than a single list is not supported yet
    if len(series) != 1:
        error = f'JSON file {data} contains more than one list. Please check the file.'
        raise NotImplementedError(error)

    # The JSON file contains one single list, so parse it as a dataframe
    return pd.DataFrame.from_dict(series.iloc[0])
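
The `len(...) == 1` check in `read_data` suggests that each JSON file wraps its records in a single top-level key. Below is a minimal sketch of that unwrapping step using only the standard library. The payload and the helper name `unwrap_single_list` are made up for illustration and are not part of the notebook or the dataset.

```python
import json


def unwrap_single_list(text):
    """Return the records stored under the only top-level key of a JSON object.

    Hypothetical helper that mirrors the single-list check in `read_data`.
    """
    payload = json.loads(text)
    if not isinstance(payload, dict) or len(payload) != 1:
        raise NotImplementedError('Expected exactly one top-level key.')
    # Take the single value, whatever its key is called
    (records,) = payload.values()
    return records


# Made-up payload in the assumed shape: one key wrapping a list of records
sample = '{"characters": [{"characterName": "Arya Stark", "houseName": "Stark"}]}'
records = unwrap_single_list(sample)
print(records[0]['characterName'])
```

Passing `records` to `pd.DataFrame` would then give one row per record.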

# 💽 1: The Data

I am using data collected by [jeffreylancaster/game-of-thrones](https://github.com/jeffreylancaster/game-of-thrones) to create a network of characters from the Game of Thrones series. 

I cloned the repository and copied its `data` folder into the `data` folder of my project. The license of the data is permissive, so I could copy it into my project as long as I gave credit to the original author (always do that!).

## 1.1: Load the data

Let's look at the data I am using.

**Character data**

In [5]:
df_characters = read_data('characters')
df_characters[['characterName', 'houseName', 'royal']].head(20)

Unnamed: 0,characterName,houseName,royal
0,Addam Marbrand,,
1,Aegon Targaryen,Targaryen,True
2,Aeron Greyjoy,Greyjoy,
3,Aerys II Targaryen,Targaryen,True
4,Akho,,
5,Alliser Thorne,,
6,Alton Lannister,Lannister,
7,Alys Karstark,,
8,Amory Lorch,,
9,Anguy,,


**Character groups**

In [6]:
df_characters_groups = read_data('characters-groups')
df_characters_groups

Unnamed: 0,name,characters
0,Stark,"[Arya Stark, Benjen Stark, Bran Stark, Catelyn..."
1,Targaryen,"[Daenerys Targaryen, Drogon, Rhaegal, Viserion..."
2,Baratheon,"[Joffrey Baratheon, Myrcella Baratheon, Renly ..."
3,Lannister,"[Cersei Lannister, Jaime Lannister, Kevan Lann..."
4,Night's Watch,"[Alliser Thorne, Eddison Tollett, Grenn, Jeor ..."
5,Dothraki,"[Doreah, Irri, Khal Drogo, Rakharo, Qhono]"
6,Greyjoy,"[Balon Greyjoy, Euron Greyjoy, Theon Greyjoy, ..."
7,Tyrell,"[Loras Tyrell, Mace Tyrell, Margaery Tyrell, O..."
8,Wildlings,"[Baby Sam, Craster, Gilly, Mag the Mighty, Man..."
9,Martell,"[Doran Martell, Ellaria Sand, Nymeria Sand, Ob..."


**Episode data**

In [7]:
df_episodes = read_data('episodes')
df_episodes.shape

(73, 8)

In [7]:
print(df_episodes.head(1).to_markdown())

|    | seasonNum | episodeNum | episodeTitle | episodeLink | episodeAirDate | episodeDescription | openingSequenceLocations | scenes |

**Inspect Scenes**

In [45]:
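# `sceneStart` must still be a datetime column here, i.e. before it is converted to minutes in the cell that builds `plot_df`
plot_df['sceneStart'].dt.minute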

0        0
1        1
2        3
3        3
4        3
        ..
4160    17
4161    17
4162    18
4163    18
4164    19
Name: sceneStart, Length: 4165, dtype: int64

In [59]:
# Seconds elapsed since the earliest scene start (also needs `sceneStart` as datetime)
(plot_df['sceneStart'] - plot_df['sceneStart'].min()).dt.total_seconds()

0         34.0
1         99.0
2        198.0
3        205.0
4        212.0
         ...  
4160    4638.0
4161    4669.0
4162    4685.0
4163    4694.0
4164    4774.0
Name: sceneStart, Length: 4165, dtype: float64
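
The two outputs above answer different questions. `.dt.minute` returns only the minute-of-the-hour component, so it wraps around after 59, whereas subtracting the earliest start and calling `.dt.total_seconds()` gives the full elapsed time. Here is a small standard-library sketch of the same distinction, using the 4774 seconds of the last row above and an arbitrary placeholder date:

```python
from datetime import datetime, timedelta

# Arbitrary placeholder date; only the time of day matters here
origin = datetime(2000, 1, 1)
scene_start = origin + timedelta(seconds=4774)  # 1 h 19 min 34 s after the origin

minute_component = scene_start.minute                     # minute within the hour
elapsed_seconds = (scene_start - origin).total_seconds()  # full elapsed time
elapsed_minutes = elapsed_seconds / 60

print(minute_component, elapsed_seconds, round(elapsed_minutes, 6))
# 19 4774.0 79.566667
```

This is why row 4164 shows minute `19` in the first output even though the scene starts almost 80 minutes into the episode.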

In [122]:
plot_df = \
    [{'seasonNum': row.seasonNum,
      'episodeNum': row.episodeNum,
      'sceneStart': pd.to_datetime(scene['sceneStart']),
      'sceneEnd': pd.to_datetime(scene['sceneEnd']),
      'location': scene['location'],
      'numCharacters': len(scene['characters']),
      'characters': sorted([character['name'] for character in scene['characters']])}
      for _, row in df_episodes.iterrows()
      for scene in row['scenes']
    ]
plot_df = pd.DataFrame(plot_df)
plot_df['characters']  = plot_df['characters'].apply(lambda x: ', '.join(x))

# Convert from datetime to minutes (float) elapsed since the earliest scene start
min_time = plot_df['sceneStart'].min()
plot_df['sceneStart'] = (plot_df['sceneStart'] - min_time).dt.total_seconds() / 60
plot_df['sceneEnd'] = (plot_df['sceneEnd'] - min_time).dt.total_seconds() / 60
plot_df['timeSpan'] = plot_df['sceneEnd'] - plot_df['sceneStart']

plot_df

Unnamed: 0,seasonNum,episodeNum,sceneStart,sceneEnd,location,numCharacters,characters,timeSpan
0,1,1,0.566667,1.650000,The Wall,3,"Gared, Waymar Royce, Will",1.083333
1,1,1,1.650000,3.300000,North of the Wall,3,"Gared, Waymar Royce, Will",1.650000
2,1,1,3.300000,3.416667,North of the Wall,2,"Wight Wildling Girl, Will",0.116667
3,1,1,3.416667,3.533333,North of the Wall,1,Will,0.116667
4,1,1,3.533333,3.633333,North of the Wall,0,,0.100000
...,...,...,...,...,...,...,...,...
4160,8,6,77.300000,77.816667,The North,1,Sansa Stark,0.516667
4161,8,6,77.816667,78.083333,The Sunset Sea,1,Arya Stark,0.266667
4162,8,6,78.083333,78.233333,The Wall,3,"Ghost, Jon Snow, Tormund Giantsbane",0.150000
4163,8,6,78.233333,79.566667,North of the Wall,3,"Ghost, Jon Snow, Tormund Giantsbane",1.333333
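
The cell above flattens the nested `scenes` list of each episode into one row per scene and then expresses the times in minutes. Below is a minimal sketch of the same idea on a made-up episode record, written in plain Python so the steps are easy to follow. The `H:MM:SS` timestamp format, the helper `parse_timestamp`, and the sample values are assumptions for illustration, not taken from the dataset.

```python
from datetime import timedelta


def parse_timestamp(text):
    """Parse an 'H:MM:SS' string (assumed format) into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Made-up episode with two scenes, shaped like a row of `df_episodes`
episodes = [{
    'seasonNum': 1,
    'episodeNum': 1,
    'scenes': [
        {'sceneStart': '0:00:34', 'sceneEnd': '0:01:39', 'location': 'The Wall',
         'characters': [{'name': 'Will'}, {'name': 'Gared'}]},
        {'sceneStart': '0:01:39', 'sceneEnd': '0:03:18', 'location': 'North of the Wall',
         'characters': []},
    ],
}]

# One output row per (episode, scene) pair
rows = []
for episode in episodes:
    for scene in episode['scenes']:
        start = parse_timestamp(scene['sceneStart']).total_seconds() / 60
        end = parse_timestamp(scene['sceneEnd']).total_seconds() / 60
        names = sorted(character['name'] for character in scene['characters'])
        rows.append({
            'seasonNum': episode['seasonNum'],
            'episodeNum': episode['episodeNum'],
            'sceneStart': start,
            'sceneEnd': end,
            'timeSpan': end - start,
            'numCharacters': len(names),
            'characters': ', '.join(names),
        })

for row in rows:
    print(row)
```

Unlike the notebook cell, this sketch parses the timestamps directly as offsets from zero, so there is no earliest start to subtract.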


In [123]:
# Define the selection dropdown
season_select = alt.binding_select(options=list(range(1, plot_df['seasonNum'].max()+1)))
season_filter = alt.selection_point(fields=['seasonNum'], bind=season_select, name='Season', value=1)

# Create the Altair chart
chart = alt.Chart(plot_df).properties(
    title='Bubble Plot of Scenes',
    width=900,
    height=400
).add_params(
    season_filter
)

# Add the rectangle marks
rects = chart.mark_rect(color='lightgray', opacity=0.2, stroke='black', strokeWidth=0.5).encode(
    y=alt.Y('episodeNum:N', axis=alt.Axis(title='Episode Number')),
    x=alt.X('sceneStart:Q', axis=alt.Axis(title='Time Span (min)', values=list(range(0, 100, 10)))),
    x2=alt.X2('sceneEnd'),  # in minutes, like sceneStart, so not a temporal field
    color=alt.Color('numCharacters:Q', 
                    legend=alt.Legend(title='Num Characters'),
                    scale=alt.Scale(scheme='cividis')),
    tooltip=[alt.Tooltip('location', title='Scene Location'),
             alt.Tooltip('numCharacters', title='Number of Characters'),
             alt.Tooltip('characters', title='Names')]
).transform_filter(
    season_filter
)

chart = rects # + circles

chart = chart.configure_title(fontSize=40)
chart = chart.configure_axis(labelFontSize=20, titleFontSize=30, grid=False)
chart = chart.configure_legend(labelFontSize=20, titleFontSize=20)



chart