<h1 style="text-align:center;border:2px solid Violet;">The Office</h1>

<img src = https://img.nbc.com/sites/nbcunbc/files/images/2016/1/19/MDot-TheOffice-640x360-MP.jpg> </img>

The IMDb website describes The Office (US) as follows: 

### " A mockumentary on a group of typical office workers, where the workday consists of ego clashes, inappropriate behavior, and tedium."

In short, this describes The Office perfectly. 

<h1 style="text-align:center;border:3px solid Blue;">Kaggle, this is foolofatook.</h1>

1. [The Data](#data)

2. [Seasonal Trends](#trends)

3. [Dialogues Analysis](#dialogues)

4. [Finding Similar Episodes](#simepi)

5. [Season Wise First and Last Lines](#linesseason)

6. [First and Last Lines of Cast](#linescast)

In [None]:
import pandas as pd
import re

#importing plotly for creating visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode
import plotly.io as pio

# importing tensorflow and tf_hub for finding similar episodes
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.decomposition import PCA

# setting default template to plotly_white for all visualizations
pio.templates.default = "plotly_white"
# for charts to be rendered properly
init_notebook_mode()

<a id=#data></a>
<h1 style="text-align:center;border:3px solid Blue;">The Data</h1>

In [None]:
episodesData = pd.read_csv('../input/the-office-dataset/the_office_series.csv')
transcripts = pd.read_csv('../input/the-office-us-complete-dialoguetranscript/The-Office-Lines-V3.csv', encoding = 'latin-1')

Here's what the dataset that contains the information about each episode looks like. 

In [None]:
episodesData.head()

We also have a dataset which contains all the dialogues in the show along with the name of the speaker. Here's what it looks like. 

In [None]:
transcripts.head()

<a id=#trends></a>
<h1 style="text-align:center;border:3px solid Blue;">Seasonal Trends</h1>

In [None]:
# dataframe for average duration of each Season
averageDurationRating = episodesData.groupby(['Season'])[['Duration','Ratings']].mean().reset_index()

# plotting a scatter plot of Season vs Ratings and adding a trendline
fig = px.scatter(averageDurationRating, x = 'Season', y = 'Ratings',trendline = 'lowess',size = 'Duration',
                 title = '<b>Ratings over each Season</b>')

# adding The Office cast image to the background
fig.add_layout_image(
        dict(
            source="https://www.scarymommy.com/wp-content/uploads/2019/10/the-office-quotes-587x308.jpg",
            xref="x",
            yref="y",
            x=0,
            y=8.7,
            sizex=10,
            sizey=1.5,
            opacity = 0.6,
            sizing="stretch",
            layer="below")
)

# updating the Scatter plot markers
fig.update_traces(marker=dict(line=dict(width=2,color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
fig.show()

We see that the ratings follow a roughly inverted-U shape: they rise over the first few seasons and then drop sharply. 

## Some Observations and Personal Notes

- Season 1 of the show has a rating of around 8. This can be attributed to the show's unusual style of humour, which can feel quite cringy to first-time viewers. I personally loved the first season, but I understand that it takes some getting used to before you realize that The Office is not 'OMG Hahaha', but rather 'Ummm...that is hilarious but I'm not sure if I should laugh'.
    - Also, Season 1 had only 6 episodes, so it served as a way for the writers to see how the audience reacted and to adapt and modify characters in subsequent seasons. Season 1 is also the gateway to the rest of the show: new viewers try The Office on a recommendation, may find it is not what they expected, and never realize that the other seasons are much more enjoyable. 
- The show was at its peak rating in Season 3, after which there is a steady decline up until Season 5. Season 6 was a sharp dip, which was recovered in Season 7. As an Office fan, I tend to ignore the existence of Season 8. Season 9 was pretty good for the most part. 
- The average duration of episodes has also changed across seasons. Season 3, which has the highest rating, has an average episode duration of 25 mins, while Season 4 has the highest average episode duration at 32.5 mins.  

In [None]:
# dataframe of average Votes and Viewership each season
views_votes = episodesData.groupby('Season')[['Votes','Viewership']].mean().reset_index()

# creating subplots for plotting the Average Votes and Average Viewership each season
fig = make_subplots(rows = 1, cols = 2, horizontal_spacing=0.2,
                    subplot_titles=['Average Number of Votes Each Season', 'Average Viewership Each Season'])


# trace for Average Votes Each Season
trace0 = go.Bar(
    x = views_votes['Votes'],
    y = views_votes['Season'],
    orientation = 'h',
    marker_color = px.colors.sequential.tempo[::-1],
    name = 'Votes'
)

fig.add_trace(trace0, row = 1, col = 1)
fig.update_xaxes(title_text = 'Average Number of Votes', row = 1, col = 1)
fig.update_yaxes(title_text = 'Season', row = 1, col = 1)

# trace for Average Viewership Each Season
trace1 = go.Bar(
    x = views_votes['Viewership'],
    y = views_votes['Season'],
    orientation = 'h',
    marker_color = px.colors.sequential.deep[::-1],
    name = 'Viewership'
)

fig.add_trace(trace1, row = 1, col = 2)
fig.update_xaxes(title_text = 'Average Viewership', row = 1, col = 2)
fig.update_yaxes(title_text = 'Season', row = 1, col = 2)


# changing font size and font color
fig.update_layout(
    showlegend = False,
    font=dict(
        family="Fira Code",
        size=16,
        color="RebeccaPurple"
    )
)
fig.show()

The average number of votes per episode has a somewhat consistent downward trend. 

The average viewership, on the other hand, has a general upward trend until Season 5, after which we see it declining. 

In [None]:
episodesData['TotalRating'] = episodesData['Ratings'] * episodesData['Votes']
averageDurationTotalRating = episodesData.groupby(['Season'])[['Duration','TotalRating']].mean().reset_index()
fig = px.scatter(averageDurationTotalRating, x = 'Season', y = 'TotalRating',trendline = 'ols',size = 'Duration', 
                 title = '<b>Total Rating across Seasons</b>')
fig.show()

To capture the overall effect of each season, I computed what I call the Total Rating: the product of each episode's rating and its number of votes (`Ratings * Votes` in the code above). 

Averaging the Total Rating across seasons shows a much more prominent downward trend, which suggests that the **popularity of the show declined over the seasons.**
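
As a rough illustration of the metric computed in the cell above (`Ratings * Votes`, averaged per season), here is a minimal sketch in plain Python. The episode values are made up for the example and are not taken from the dataset, and `average_total_rating` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (season, rating, votes) rows; illustrative values only.
episodes = [
    (1, 8.0, 4000),
    (1, 8.5, 3000),
    (2, 9.0, 2000),
]

def average_total_rating(rows):
    """Return {season: mean of rating * votes} for the given rows."""
    totals = defaultdict(list)
    for season, rating, votes in rows:
        totals[season].append(rating * votes)
    return {season: sum(vals) / len(vals) for season, vals in totals.items()}

result = average_total_rating(episodes)
print(result)
```

Because the product weights each rating by the number of votes, the 9.0-rated episode with 2000 votes in this toy data ends up with a lower Total Rating than the 8.0-rated episode with 4000 votes.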

In [None]:
directedEpisodes = episodesData['Director'].value_counts().reset_index()
directedEpisodes.columns = ['Director','Number of Episodes']
directedEpisodes = directedEpisodes[directedEpisodes['Director'] != 'See full summary']

directorAvgRating = episodesData.groupby('Director')['Ratings'].mean().reset_index()
directorAvgRating = directorAvgRating[directorAvgRating['Director'] != 'See full summary']
directorAvgRating = directorAvgRating.sort_values(by = 'Ratings',ascending = False)

fig = make_subplots(rows = 1, cols = 2,
                    subplot_titles=['Number of Episodes Directed', 'Average Rating of Episodes'], 
                   horizontal_spacing=0.2)

colors = ['#5cb3ce'] * 10
colors[9] = '#1c4d5d'

trace0 = go.Bar(
    x = directedEpisodes['Number of Episodes'][:10],
    y = directedEpisodes['Director'][:10],
    orientation = 'h',
    name = 'Directors',
    marker_color = colors,
    opacity = 0.8,
    text = directedEpisodes['Number of Episodes'][:10],
    textposition='inside'
)

fig.add_trace(trace0, row = 1, col = 1)
fig.update_xaxes(title_text = 'Number of Episodes Directed', row = 1, col = 1)
fig.update_yaxes(title_text = 'Director Name', row = 1, col = 1)

colors = ['#ff2d4c']*10
colors[2] = '#ca001e'

trace1 = go.Bar(
    x = directorAvgRating['Ratings'][:10],
    y = directorAvgRating['Director'][:10],
    orientation = 'h',
    name = 'Ratings',
    marker_color = colors,
    opacity = 0.8,
    text = directorAvgRating['Ratings'][:10],
    textposition = 'inside'
)

fig.add_trace(trace1, row = 1, col = 2)
fig.update_xaxes(title_text = 'Average Rating', row = 1, col = 2)
fig.update_yaxes(title_text = 'Director Name', row = 1, col = 2)

fig.update_layout(
    showlegend = False,
    font=dict(
        family="Fira Code",
        size=10,
    )
)
fig.show()

Paul Feig and Randall Einhorn have directed the most episodes. Paul Feig directed some of the most memorable ones.

Paul Feig directed Goodbye Michael, Dinner Party (an episode that is both amazing and difficult to watch) and Niagara: Part 1 & 2. Randall Einhorn has also directed some great episodes like Golden Ticket and Product Recall. 

Harold Ramis has the highest average rating per directed episode at 8.825. Although he has directed only 4 episodes, they include episodes like Safety Training. 

He is followed by Jason Reitman and our very own Steve Carell. It should, however, be noted that Jason Reitman has directed only one episode: Local Ad.
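
Averages based on one or a handful of episodes are easy to over-read. One way to guard against that is to rank only directors with enough episodes. The minimal sketch below assumes a hypothetical `min_episodes` threshold, which the notebook itself does not use; the director names and ratings are placeholders.

```python
from collections import defaultdict

# Hypothetical (director, rating) pairs; not taken from the dataset.
rows = [
    ("Director A", 9.0),
    ("Director B", 8.5), ("Director B", 8.7), ("Director B", 8.3),
    ("Director C", 8.0), ("Director C", 8.4),
]

def rank_directors(pairs, min_episodes=2):
    """Rank directors by mean rating, skipping those with too few episodes."""
    ratings = defaultdict(list)
    for director, rating in pairs:
        ratings[director].append(rating)
    averages = {
        d: sum(r) / len(r) for d, r in ratings.items() if len(r) >= min_episodes
    }
    # Highest average first
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

ranking = rank_directors(rows, min_episodes=2)
print(ranking)
```

With the threshold at 2, "Director A" drops out of the ranking despite the single high rating, which is the situation the note about one-episode directors describes.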

A few of the cast members have also directed episodes. Let us look at the average ratings of the episodes they directed.

In [None]:
castDirectors = ['Paul Lieberstein', 'B.J. Novak','Steve Carell', 'John Krasinski','Rainn Wilson','Mindy Kaling']
fig = px.bar(directorAvgRating[directorAvgRating['Director'].isin(castDirectors)],
            x = 'Ratings',y='Director',orientation='h',color='Ratings',color_continuous_scale='peach'
            )
fig.update_layout(coloraxis_showscale=False)
fig.show()

Steve Carell has the highest average Rating of the Episodes he has directed. 

In [None]:
fig = make_subplots(rows = 3, cols = 2, specs = [[{"type":"table"}] * 2]*3,
                                subplot_titles = castDirectors,
                                horizontal_spacing=0.03,vertical_spacing = 0.07)
for i, director in enumerate(castDirectors):
    df = episodesData[episodesData['Director'] == director].sort_values(by = 'Ratings',ascending = False)
    trace = go.Table(header = dict(values = ['<b>Episode Name</b>','<b>Rating</b>']), cells = dict(values = [df['EpisodeTitle'], df['Ratings']]))
    fig.add_trace(trace, row = (i//2)+1 , col = (i%2)+1)
fig.update_layout(height = 500,margin=dict(l=80, r=80, t=100, b=20),
                  title = { 'text' : '<b>Cast Directed Episodes</b>', 'x' : 0.5})
fig.show()

TIL that Scott's Tots, one of my favourite episodes, was directed by B.J. Novak. 

### "I fell in love with these kids, and I didn't want to see them fall victim to the system. So I made them a promise. I told them that if they graduated from high school, I would pay for their college education. I have made some empty promises in my life, but hands down, that was the most generous."

In [None]:
def countNumberofGuestStars(guestStars):
    if(guestStars == ''):
        return 0
    else:
        stars = guestStars.split(',')
        return len(stars)

episodesData['GuestStars'] = episodesData['GuestStars'].fillna('')
episodesData['Number of Guests'] = episodesData['GuestStars'].apply(lambda x: countNumberofGuestStars(str(x)))

In [None]:
df = episodesData.sort_values(by = 'Number of Guests', ascending = False)
trace = go.Table(header = dict(values = ['<b>Episode Name</b>','<b>Rating</b>','<b>Number of Guests</b>']), 
                 cells = dict(values = [df['EpisodeTitle'][:3], df['Ratings'][:3], df['Number of Guests'][:3]]))
fig = go.Figure([trace])
fig.update_layout(title = '<b>Top 3 Episodes with the Most Guest Stars</b>',height = 300)
fig.show()

In [None]:
fig = px.box(episodesData, y = 'Ratings',x = 'Number of Guests', color = 'Number of Guests', title = 'Ratings Distribution With the Number of Guests')
fig.update_layout(showlegend = False)
fig.show()

Episodes with more guest stars seem to have higher ratings.
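
A box plot compares the rating distribution for each guest count. A minimal sketch of the same comparison in numbers is to compute the median rating and the group size per guest count. The pairs below are hypothetical, and `rating_summary_by_guests` is an illustrative helper rather than part of the notebook.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (number_of_guests, rating) pairs; illustrative values only.
rows = [(0, 8.0), (0, 8.2), (0, 8.4), (1, 8.3), (1, 8.5), (3, 9.1)]

def rating_summary_by_guests(pairs):
    """Return {guest_count: (median_rating, n_episodes)}."""
    groups = defaultdict(list)
    for guests, rating in pairs:
        groups[guests].append(rating)
    return {g: (median(vals), len(vals)) for g, vals in sorted(groups.items())}

summary = rating_summary_by_guests(rows)
for guests, (med, n) in summary.items():
    print(f"{guests} guests: median rating {med:.2f} over {n} episode(s)")
```

Reporting the group size next to the median helps show when a high median rests on only one or two episodes.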

The Guest Stars in these Episodes are:

- **Finale** : Joan Cusack, Ed Begley Jr., Rachael Harris, Nancy Walls
- **The Search** : Will Arnett,  Ray Romano, Jim Carrey, Ricky Gervais
- **Stress Relief** : Cloris Leachman, Jack Black, Jessica Alba

In [None]:
numOfGuests = episodesData.groupby('Season')['Number of Guests'].mean().reset_index()
fig = px.bar(numOfGuests, x = 'Season', y = 'Number of Guests', title = '<b>Average Number of Guests per Season</b>')
fig.show()

We see a high average number of guests per episode in the 7th season. 

Possible explanation: after the then all-time-low rating in Season 6, the creators may have decided to add more episodes with guest stars to win back some popularity. If so, they were more or less successful, as S7 had a better rating than S6. 

The Season 7 episodes with guests in them are:

- **The Search** - 4 Guest Stars
- **Nepotism** - 1 Guest Star
- **The Sting** - 1 Guest Star
- **Training Day** - 1 Guest Star
- **The Seminar** - The Guest Star! Ricky Gervais

In [None]:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
writerDf = pd.DataFrame({})
writerDf['WriterList'] = episodesData['Writers'].apply(lambda x: [y.strip() for y in x.split('|')])

mlb.fit(writerDf['WriterList'])
#creating columns = the classes of the multilabelbinarizer
writerDf[mlb.classes_] = mlb.transform(writerDf['WriterList'])
writerDf.drop('WriterList',axis = 1, inplace = True)

In [None]:
writerEpisodes = writerDf.sum().reset_index()
writerEpisodes.columns = ['Writer', 'Number of Episodes']
writerEpisodes = writerEpisodes.sort_values(by = 'Number of Episodes')
fig = px.bar(writerEpisodes,x = 'Number of Episodes', y = 'Writer', title = 'Number of Episodes Written',
             height  = 1000, color = 'Number of Episodes', color_continuous_scale='greens', template = 'plotly_dark')
fig.show()

Mindy Kaling, Paul Lieberstein and B.J. Novak have written 22, 16 and 16 episodes respectively. These are multi-talented people who can not only act and direct, but also write. 
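
The writer counts come from splitting each episode's `Writers` string on `|` and counting how often each name appears. The same idea can be sketched without `MultiLabelBinarizer` using `collections.Counter`. The writer strings below are placeholders, not rows from the dataset, and `count_episodes_per_writer` is a hypothetical helper name.

```python
from collections import Counter

# Placeholder values standing in for the pipe-separated Writers column.
writers_column = ["Writer A | Writer B", "Writer A", "Writer C|Writer B", "Writer A"]

def count_episodes_per_writer(values):
    """Count, for each writer, the number of episodes they are credited on."""
    counts = Counter()
    for value in values:
        # A set ensures a writer listed twice on one episode is counted once,
        # matching the 0/1 indicators a binarizer produces.
        names = {name.strip() for name in value.split("|")}
        counts.update(names)
    return counts

counts = count_episodes_per_writer(writers_column)
print(counts.most_common())
```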

<a id=#dialogues></a>
<h1 style="text-align:center;border:3px solid Blue;">Dialogues Analysis</h1>

In [None]:
print(f"There are {transcripts['speaker'].nunique()} unique speakers in the show")

In [None]:
numberOfLinesSpoken = transcripts['speaker'].value_counts().reset_index()
numberOfLinesSpoken.columns = ['Speaker','Number of Dialogues']
numberOfLinesSpoken = numberOfLinesSpoken.sort_values(by = 'Number of Dialogues', ascending = False)
fig = px.bar(numberOfLinesSpoken[:10][::-1],x = 'Number of Dialogues', y = 'Speaker', orientation = 'h', 
             title = '<b>Top 10 Speakers with the Most Lines</b>',
            color_continuous_scale=px.colors.sequential.Blugrn,color = 'Number of Dialogues',text = 'Number of Dialogues')
fig.show()

It is no surprise that Michael had a lot of dialogue, but the size of his lead over the other characters is amazing, especially since we only see him for 7 seasons and once in the series finale.

In [None]:
seasonLines = transcripts.groupby('season')['line'].count().reset_index()
seasonLines['Number of Episodes'] = episodesData.groupby('Season')['EpisodeTitle'].count().to_list()
seasonLines['Average Duration'] = episodesData.groupby('Season')['Duration'].mean().to_list()
seasonLines['Number of Lines Per Episode'] = seasonLines['line'] / seasonLines['Number of Episodes']
# lines per minute: lines per episode divided by the average episode duration (in minutes)
seasonLines['Rate of Dialogues'] = seasonLines['Number of Lines Per Episode'] / seasonLines['Average Duration']

fig = make_subplots(rows = 1, cols = 2, 
                    subplot_titles=['Number of Lines Per Episode', 'Rate of Dialogue Delivery'],
                   horizontal_spacing=0.1)
trace0 = go.Bar(x = seasonLines['Number of Lines Per Episode'], y = seasonLines['season'], name = 'Number of Lines Per Episode',
               orientation = 'h', text = seasonLines['Number of Lines Per Episode'], textposition='inside',texttemplate='%{text:.2f}')
fig.add_trace(trace0, row = 1, col = 1)
trace1 = go.Bar(x = seasonLines['Rate of Dialogues'], y = seasonLines['season'], name = 'Rate of Dialogue Delivery',
               orientation = 'h', text = seasonLines['Rate of Dialogues'], textposition='inside',texttemplate='%{text:.2f}')
fig.add_trace(trace1, row = 1, col = 2)
fig.update_layout(showlegend = False)
fig.show()

We see that Season 4 has the highest average number of lines per episode, which is consistent with Season 4 also having the longest average episode duration.
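
To make the two quantities concrete, here is a small worked sketch with made-up numbers. It assumes a per-minute rate defined as lines per episode divided by the average episode duration in minutes. The season figures are hypothetical and `dialogue_stats` is an illustrative helper.

```python
def dialogue_stats(total_lines, n_episodes, avg_duration_min):
    """Return (lines per episode, lines per minute) for one season."""
    lines_per_episode = total_lines / n_episodes
    lines_per_minute = lines_per_episode / avg_duration_min
    return lines_per_episode, lines_per_minute

# Hypothetical seasons: (total lines, number of episodes, average duration in minutes)
short_season = dialogue_stats(6000, 20, 25)
long_season = dialogue_stats(5600, 14, 35)

print(short_season)
print(long_season)
```

In this toy case the season with longer episodes has more lines per episode but a slightly lower per-minute rate, which is the pattern one would expect when episodes simply run longer.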

In [None]:
fig = make_subplots(rows = 3,cols = 3, 
                    subplot_titles=[f'Season {i}' for i in range(1,10)],
                   horizontal_spacing=0.1)

for i in range(3):
    for j in range(3):
        season = i*3 + j + 1
        seasonDf = transcripts[transcripts['season'] == season]
        speakerDialogues = seasonDf['speaker'].value_counts().reset_index()
        speakerDialogues.columns = ['Speaker','Number of Dialogues']
        speakerDialogues = speakerDialogues.sort_values(by = 'Number of Dialogues', ascending = False).iloc[:3,]
        trace = go.Bar(x = speakerDialogues['Number of Dialogues'], y = speakerDialogues['Speaker'], name = f'Season {season}', orientation = 'h')
        fig.add_trace(trace, row = i+1, col = j+1)
fig.update_layout(showlegend = False, title = '<b>Top 3 Speakers each Season</b>')
fig.show()

Each season, the boss speaks the most lines.

I am not going to say out loud that I hate Andy (well, I might just have), but S08 being the worst season of The Office and Andy having the most lines in that season is something I cannot ignore. 

Coincidence? I think not!

In [None]:
episodeDialogues = transcripts.groupby('title')['line'].count().reset_index()
episodeDialogues = pd.merge(episodeDialogues,episodesData, left_on = 'title', right_on = 'EpisodeTitle')
fig  = px.scatter(episodeDialogues, x = 'line', y = 'Ratings', trendline = 'ols', color = episodeDialogues['Season'].astype('category'),
                 hover_name='EpisodeTitle',
                 title = 'Relation Between Number of Dialogues and Rating')
fig.show()

PS. You can click on each Season in the Legend and Isolate and observe the trends for each season. 

In [None]:
def formatLine(line):
    line = line.lower()
    line = re.sub(r'[^\w\s]','',line)
    return line

transcripts['formatted_lines'] = transcripts['line'].apply(lambda x:formatLine(x))

The following cells count how many lines mention each of a few terms and catchphrases, and show the speakers who use them most often (up to 5 per term). 
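
The cells that follow repeat the same filter-and-count steps for each phrase. As a sketch of that logic in one place, the hypothetical helper `top_speakers` below normalizes each line the same way `formatLine` does and counts matching lines per speaker. The sample rows are invented for illustration.

```python
import re
from collections import Counter

def format_line(line):
    """Lowercase a line and strip punctuation, mirroring formatLine above."""
    return re.sub(r"[^\w\s]", "", line.lower())

def top_speakers(lines, pattern, n=5):
    """lines: iterable of (speaker, raw_line). Return the n speakers with the most matching lines."""
    regex = re.compile(pattern)
    counts = Counter(
        speaker for speaker, raw in lines if regex.search(format_line(raw))
    )
    return counts.most_common(n)

# Hypothetical transcript rows, for illustration only.
sample = [
    ("Michael", "That's what she said!"),
    ("Michael", "Welcome to Dunder Mifflin."),
    ("Dwight", "Dunder Mifflin, this is Dwight."),
    ("Jim", "Dunder-Mifflin... sure."),
    ("Michael", "THAT'S WHAT SHE SAID."),
]

print(top_speakers(sample, "thats what she said"))
print(top_speakers(sample, "dunder mifflin"))
```

Note that stripping punctuation turns "Dunder-Mifflin" into "dundermifflin", so the hyphenated spelling is not counted by the pattern "dunder mifflin". The count is also per line, so a phrase repeated within one line counts once.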

In [None]:
df = transcripts[transcripts['formatted_lines'].str.contains('dunder mifflin')]
df = df['speaker'].value_counts().reset_index()[:5]
df.columns = ['Speaker', 'Number of References']
fig = go.Figure()
for i in range(len(df)):
    trace = go.Indicator(
        mode = "number",
        value = df.iloc[i,:]['Number of References'],
        title = {"text": f"<b>{df.iloc[i,:]['Speaker']}</b>"},
        domain = {'x': [0.1*(i+1), 0.2*(i+1)], 'y': [0, 1]}
    )
    fig.add_trace(trace)
fig.update_layout(title = '<b>Dunder Mifflin Score</b>',height = 200)
fig.show()

In [None]:
df = transcripts[transcripts['formatted_lines'].str.contains("thats what she said")]
df = df['speaker'].value_counts().reset_index()[:3]
df.columns = ['Speaker', 'Number of References']
fig = go.Figure()
for i in range(len(df)):
    trace = go.Indicator(
        mode = "number",
        value = df.iloc[i,:]['Number of References'],
        title = {"text": f"<b>{df.iloc[i,:]['Speaker']}</b>"},
        domain = {'x': [0.1*(i+1), 0.30*(i+1)], 'y': [0, 1]}
    )
    fig.add_trace(trace)
fig.update_layout(title = "<b>That's What She Said Score</b>",height = 200)
fig.show()

In [None]:
df = transcripts[transcripts['formatted_lines'].str.contains("boss|manager")]
df = df['speaker'].value_counts().reset_index()[:5]
df.columns = ['Speaker', 'Number of References']
fig = go.Figure()
for i in range(len(df)):
    trace = go.Indicator(
        mode = "number",
        value = df.iloc[i,:]['Number of References'],
        title = {"text": f"<b>{df.iloc[i,:]['Speaker']}</b>"},
        domain = {'x': [0.1*(i+1), 0.2*(i+1)], 'y': [0, 1]},
    )
    fig.add_trace(trace)
fig.update_layout(title = "<b>Boss / Manager Score</b>",height = 200)
fig.show()

In [None]:
df = transcripts[transcripts['formatted_lines'].str.contains("sale|sales")]
df = df['speaker'].value_counts().reset_index()[:5]
df.columns = ['Speaker', 'Number of References']
fig = go.Figure()
for i in range(len(df)):
    trace = go.Indicator(
        mode = "number",
        value = df.iloc[i,:]['Number of References'],
        title = {"text": f"<b>{df.iloc[i,:]['Speaker']}</b>"},
        domain = {'x': [0.1*(i+1), 0.2*(i+1)], 'y': [0, 1]},
    )
    fig.add_trace(trace)
fig.update_layout(title = "<b>Sale Score</b>",height = 200)
fig.show()

In [None]:
df = transcripts[transcripts['formatted_lines'].str.contains("sabre")]
df = df['speaker'].value_counts().reset_index()[:5]
df.columns = ['Speaker', 'Number of References']
fig = go.Figure()
for i in range(len(df)):
    trace = go.Indicator(
        mode = "number",
        value = df.iloc[i,:]['Number of References'],
        title = {"text": f"<b>{df.iloc[i,:]['Speaker']}</b>"},
        domain = {'x': [0.1*(i+1), 0.2*(i+1)], 'y': [0, 1]},
    )
    fig.add_trace(trace)
fig.update_layout(title = "<b>Sabre Score</b>",height = 200)
fig.show()

<a id=#simepi></a>
<h1 style="text-align:center;border:3px solid Blue;">Finding Similar Episodes</h1>

In [None]:
episodeCorpus = pd.DataFrame({'Episode Number' : [], 'Full Text': [], 'Season' : []})
episodes = []
episodeTexts = []
seasons = []
for season in range(1,10):
    subSeason = transcripts[(transcripts['season'] == season)]
    for episodeNo, df in subSeason.groupby('episode'):
        full_text = df['formatted_lines'].values
        episodes.append(episodeNo)
        episodeTexts.append(" ".join(full_text).lower())
        seasons.append(season)
episodeCorpus['Episode Number'] = episodes
episodeCorpus['Full Text'] = episodeTexts
episodeCorpus['Season'] = seasons

In [None]:
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4" 
model = hub.load(module_url)

features = model(episodeCorpus['Full Text'].values)
pca = PCA(n_components=2, random_state=42)
reduced_features = pca.fit_transform(features)

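episodeTitles = episodesData['EpisodeTitle'].to_list()
# drop two titles by position, presumably so the list lines up with the episodes in episodeCorpus
# (the indices are specific to these two datasets)
episodeTitles.pop(108)
episodeTitles.pop(95)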

episodeCorpus['Dimension 1'] = reduced_features[:,0]
episodeCorpus['Dimension 2'] = reduced_features[:,1]
episodeCorpus['Episode Titles'] = episodeTitles
fig = px.scatter(episodeCorpus, x = 'Dimension 1', y = 'Dimension 2', color = 'Season', hover_name='Episode Titles',
                title = '<b>Finding Similar Episodes</b>',
                template = 'plotly_white')
fig.update_traces(marker=dict(size=12))
fig.show()

This graph is extremely interesting, as it helps us find episodes that are more or less similar in terms of their content. The clear separation between groups makes it easier to find episodes similar to the ones we like. 

We can clearly see how the festival-themed episodes are clubbed together on the right side of the graph and are quite different from the other episodes. 

We can also see that S05E25 - Broke is quite different from all the other episodes. 

We can use this graph to find episodes that do not fall in the central grouping and are thus somewhat different. A few of these episodes are:

- **Sexual Harassment**
- **PDA**
- **Trivia**
- **Livin' the Dream**
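
The scatter plot compares episodes visually after PCA. A complementary approach, which the notebook does not use, is to rank episodes by cosine similarity on the full embedding vectors. The sketch below uses tiny hypothetical 3-dimensional vectors in place of real sentence-encoder outputs, and `most_similar` is an illustrative helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(title, embeddings, k=2):
    """Return the k titles whose vectors are closest to the given title's vector."""
    target = embeddings[title]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != title
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical 3-d vectors standing in for sentence-encoder outputs.
toy_embeddings = {
    "Episode A": [1.0, 0.0, 0.0],
    "Episode B": [0.9, 0.1, 0.0],
    "Episode C": [0.0, 1.0, 0.0],
}

print(most_similar("Episode A", toy_embeddings, k=1))
```

Working on the full vectors avoids the information lost when projecting to two dimensions, at the cost of not having a picture to look at.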

<a id=#linesseason></a>
<h1 style="text-align:center;border:3px solid Blue;">Season wise First and Last Lines</h1>

In [None]:
for season, seasonTranscript in transcripts.groupby('season'):
    print(f'Season {season}')
    print('-' * 50)
    print(f"First Line of the Season was said by {seasonTranscript.iloc[0,]['speaker']}")
    print(seasonTranscript.iloc[0,]['line'].strip())
    print('-' * 50)
    print(f"Last Line of the Season was said by {seasonTranscript.iloc[-1,]['speaker']}")
    print(seasonTranscript.iloc[-1,]['line'].strip())
    print('#' * 50)

<a id=#linescast></a>
<h1 style="text-align:center;border:3px solid Blue;">First and Last Lines of the cast members</h1>

In [None]:
mainCast = ['Michael','Dwight','Jim','Pam','Andy','Kevin','Angela','Erin','Oscar','Ryan','Creed','Kelly','Stanley','Jan']
for castMember in mainCast:
    temp = transcripts[transcripts['speaker'] == castMember]
    print(f'Character - {castMember}')
    print('-' * 50)
    print(f"First Line -> {temp.iloc[0,]['line'].strip()}")
    print('-' * 50)
    print(f"Last Line-> {temp.iloc[-1,]['line'].strip()}")
    print('#' * 50)

Work in Progress... 
I am not done yet. 

# That's what She Said