It is widely acknowledged that screenwriters are influenced by the geographical, historical and social context in which they work. Major social and historical events strongly affect public consciousness, and often change people's tastes, their interests and the problems they face and pay attention to. This is likely to affect the themes of films, since screenwriters seek on the one hand to adapt to the tastes and interests of the public, and on the other hand to expose, raise or offer partial answers to the problems of their time.

The research question at the core of this investigation is: does cinema reflect political events? In other words, can the evolution of the frequencies of different socio-political themes in films be explained coherently by the historical and socio-political events that occur over time? Which socio-political themes prevail in movies across different five-year periods (quinquennia), and how do they align with historical events and cultural shifts?
 
To answer this question, we focus on the 20th century and on a set of predefined socio-political themes linked to key historical and social events or periods of that century. For each movie plot summary, a score is assigned with respect to each theme. The score measures how close the plot summary is to the theme: embeddings are computed for each theme and each plot summary, and the cosine similarity between the plot embedding and the theme embedding is taken as the score. The similarity scores are then converted to binary data (0/1) by applying a threshold: 1 means the theme is considered present in the plot, and 0 means it is absent. The data are then grouped by year, and the average frequency of each theme is computed within a 5-year window (quinquennium) that moves forward one year at a time. The resulting frequency curves are plotted over time and compared. The significance of an increase or decrease in the frequency of a theme at a given date is assessed through its z-score, which indicates how far a frequency value lies from the mean computed over all frequency values of that theme.
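The scoring and thresholding steps described above can be sketched without any model, as a minimal illustration only. The helper names `cosine_similarity` and `theme_present` and the three-dimensional toy vectors are assumptions made for this example; real sentence embeddings have hundreds of dimensions. The default threshold of 0.3 is the value applied later in this notebook.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def theme_present(plot_vec, theme_vec, threshold=0.3):
    """Binary label: True when the similarity exceeds the threshold."""
    return cosine_similarity(plot_vec, theme_vec) > threshold


# Hypothetical toy "embeddings", for illustration only.
theme_vec = [1.0, 0.0, 0.0]
plot_close = [0.9, 0.1, 0.0]  # points almost in the same direction as the theme
plot_far = [0.0, 1.0, 0.0]    # orthogonal to the theme

labels = [theme_present(p, theme_vec) for p in (plot_close, plot_far)]
```

Here `labels` marks the first plot as containing the theme and the second as not containing it. The choice of threshold directly controls how many films are counted for a theme, so the resulting frequencies should be read with that in mind.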

In [276]:
# Import the required libraries.
import pandas as pd
import numpy as np
import spacy
import json
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import scipy.stats as stats
import ast

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer, util
from tqdm.notebook import tqdm
tqdm.pandas()

pd.options.mode.chained_assignment = None  

In [2]:
# Load the dataset, which is a merge of the IMDb and CMU datasets.
df = pd.read_csv('dataset/cmu_merged_with_imdb.csv')

# Remove movies without a release date or a plot summary.
df = df.dropna(subset=['MovieReleaseDate', 'PlotSummaries'])

# Sort movies by release date in ascending order.
df = df.sort_values(by='MovieReleaseDate')

# Preview.
df.head()



Unnamed: 0,IMDbID,MovieName,MovieReleaseDate,MovieBoxOfficeRevenue,MovieRuntime,MovieLanguages,MovieCountries,PlotSummaries,averageRating,genres
67918,tt0000009,Miss Jerry,1894.0,,,"{""/m/06ppq"": ""Silent film""}","{""/m/09c7w0"": ""United States of America""}",After finding out that her father is suffering...,5.3,Romance
81467,,Corbett and Courtney Before the Kinetograph,1894.0,,,"{""/m/06ppq"": ""Silent film""}","{""/m/09c7w0"": ""United States of America""}",James J. Corbett and Peter Courtney both take ...,,
9265,,The Photographical Congress Arrives in Lyon,1895.0,,,"{""/m/06ppq"": ""Silent film""}","{""/m/0f8l9c"": ""France""}",Photographers leave the deck of a riverboat in...,,
80428,,Le Manoir du diable,1896.0,,3.0,{},"{""/m/0f8l9c"": ""France""}",The film starts off with a large bat flying in...,,
49416,tt0000147,The Corbett-Fitzsimmons Fight,1897.0,100000.0,,{},{},The film no longer exists in its entirety; how...,5.3,"Documentary,News,Sport"


In [3]:

# List of predefined socio-political themes of interest for the 20th century.
themes = ['War','cold war','space race','Economic hardship, struggle for survival','Revolution, communism',
          'China revolution','Berlin wall','Soviet union dissolution','decolonization','women rights','apartheid',
          'civil rights, racial segregation']

# Years covered by the analysis: 1900 to 2009.
years = np.arange(1900, 2010)

# Compute an embedding for each theme with a Sentence-BERT (Siamese BERT-network) model.
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
embeddings_themes = model.encode(themes)

In [4]:
# Compute an embedding for each movie plot with the same Sentence-BERT model.
PlotSumEmbeddings = df['PlotSummaries'].progress_apply(model.encode)


In [313]:
for theme, theme_embedding in zip(themes, embeddings_themes):
    # Cosine similarity between each movie plot embedding and the theme embedding.
    # .item() turns the 1x1 tensor returned by pytorch_cos_sim into a plain float.
    similarity = PlotSumEmbeddings.apply(lambda x: util.pytorch_cos_sim(x, theme_embedding).item())
    # Convert the similarity into a binary response: True if it is above the threshold,
    # meaning the theme is present in the movie plot; otherwise False, meaning it is absent.
    df[theme] = similarity > 0.3

# Independent copy kept for further analysis (z-score analysis).
df2 = df.copy()

# Preview.
df.head()

Unnamed: 0,IMDbID,MovieName,MovieReleaseDate,MovieBoxOfficeRevenue,MovieRuntime,MovieLanguages,MovieCountries,PlotSummaries,averageRating,genres,...,space race,"Economic hardship, struggle for survival","Revolution, communism",China revolution,Berlin wall,Soviet union dissolution,decolonization,women rights,apartheid,"civil rights, racial segregation"
67918,tt0000009,Miss Jerry,1894.0,,,"{""/m/06ppq"": ""Silent film""}","{""/m/09c7w0"": ""United States of America""}",After finding out that her father is suffering...,5.3,Romance,...,False,False,False,False,False,False,False,False,False,False
81467,,Corbett and Courtney Before the Kinetograph,1894.0,,,"{""/m/06ppq"": ""Silent film""}","{""/m/09c7w0"": ""United States of America""}",James J. Corbett and Peter Courtney both take ...,,,...,False,False,False,False,False,False,False,False,False,False
9265,,The Photographical Congress Arrives in Lyon,1895.0,,,"{""/m/06ppq"": ""Silent film""}","{""/m/0f8l9c"": ""France""}",Photographers leave the deck of a riverboat in...,,,...,False,False,False,False,False,False,False,False,False,False
80428,,Le Manoir du diable,1896.0,,3.0,{},"{""/m/0f8l9c"": ""France""}",The film starts off with a large bat flying in...,,,...,False,False,False,False,False,False,False,False,False,False
49416,tt0000147,The Corbett-Fitzsimmons Fight,1897.0,100000.0,,{},{},The film no longer exists in its entirety; how...,5.3,"Documentary,News,Sport",...,False,False,False,False,False,False,False,False,False,False


In [314]:
# Function that takes the name of a theme as input and returns a DataFrame with the
# significant z-scores (>= 2) and the years associated with them.
# f_list must hold one list of window frequencies per theme, in the same order as `themes`.
def z_score(nom_theme):
    i = themes.index(nom_theme)
    window_years = years[0:len(years) - 5]  # first year of each 5-year window
    frequency = f_list[i]
    scores = stats.zscore(frequency)
    d = pd.DataFrame({'MovieReleaseDate': window_years, 'zscore': scores})
    d = d.loc[d['zscore'] >= 2.0]
    return d

In [315]:

# Creation of the Plotly figures.
fig0 = go.Figure()
fig1 = go.Figure()
fig2 = go.Figure()
fig3 = go.Figure()
fig4 = go.Figure()
fig5 = go.Figure()
fig6 = go.Figure()
fig7 = go.Figure()
fig8 = go.Figure()
fig9 = go.Figure()
fig10 = go.Figure()
fig11 = go.Figure()
figure_list = [fig0,fig1,fig2,fig3,fig4,fig5,fig6,fig7,fig8,fig9,fig10,fig11]
# Compute, for each theme, its frequency within a 5-year window that moves year by year.
# The frequencies are also stored in f_list, which z_score() reads.
f_list = []
for j in range(len(themes)):
    f = []
    for i in range(0, len(years) - 5):
        a = years[i]      # lower bound of the 5-year window
        b = years[i + 4]  # upper bound of the 5-year window (inclusive)

        # Frequency of films about the theme within the 5-year window.
        data = df[(df['MovieReleaseDate'] >= a) & (df['MovieReleaseDate'] <= b)]
        window_frequency = len(data[data[themes[j]] == True]) / len(data)
        f.append(window_frequency)
    f_list.append(f)

    # Add the trace to the figure; x is the first year of each window, as in z_score().
    figure_list[j].add_trace(go.Scatter(x=years[:len(f)], y=f, mode='lines', name=themes[j]))

# Plot the figures. The four rows show themes 8 to 11: decolonization, women rights,
# apartheid, and civil rights / racial segregation.
fig = make_subplots(rows=4, cols=1, subplot_titles=themes[8:12],
                    vertical_spacing=0.04, horizontal_spacing=0.04)
fig.add_trace(fig8.data[0], row=1, col=1)
fig.add_vrect(x0=1919, x1=1923,fillcolor = 'red', opacity=0.3, row=1, col=1)
fig.add_vrect(x0=1914, x1=1918,annotation_text = 'World War I',fillcolor = "green",opacity = 0.2,line_width = 0, row=1, col=1)
fig.add_vrect(x0=1939, x1=1945,annotation_text = 'World War II',fillcolor = 'orange', opacity=0.2,line_width = 0, row=1, col=1)
fig.add_vrect(x0=1960, x1=1960,annotation_text = 'Start decolonization',fillcolor = 'brown',annotation_position="top left", opacity=0.4,line_width = 1, row=1, col=1)
fig.add_trace(fig9.data[0], row=2, col=1)
fig.add_vrect(x0=1920, x1=1924,fillcolor = 'red', opacity=0.3, row=2, col=1)
fig.add_vrect(x0=1954, x1=1968,annotation_text = 'Civil rights movement',fillcolor = "yellow",opacity = 0.4,line_width = 0, row=2, col=1)
fig.add_vrect(x0=1960, x1=1970,annotation_text = "Women's rights movement",fillcolor = 'violet', opacity=0.4,line_width = 0,annotation_position="bottom left", row=2, col=1)
fig.add_trace(fig10.data[0], row=3, col=1)
fig.add_vrect(x0=1911, x1=1915,fillcolor = 'red', opacity=0.3, row=3, col=1)
fig.add_vrect(x0=1954, x1=1968,annotation_text = 'Civil rights movement',fillcolor = "yellow",opacity = 0.4,line_width = 0, row=3, col=1)
fig.add_vrect(x0=1960, x1=1970,annotation_text = "Women's rights movement",fillcolor = 'violet', opacity=0.4,line_width = 0,annotation_position="bottom left", row=3, col=1)
fig.add_vrect(x0=1994, x1=1994,annotation_text = 'End of Apartheid',fillcolor = 'black', opacity=0.4,line_width = 1,annotation_position="top left", row=3, col=1)
fig.add_trace(fig11.data[0], row=4, col=1)
fig.add_vrect(x0=1960, x1=1970,fillcolor = 'red', opacity=0.3, row=4, col=1)
fig.add_vrect(x0=1954, x1=1968,annotation_text = 'Civil rights movement',fillcolor = "yellow",opacity = 0.4,line_width = 0, row=4, col=1)
fig.add_vrect(x0=1960, x1=1970,annotation_text = "Women's rights movement",fillcolor = 'violet', opacity=0.4,line_width = 0,annotation_position="bottom left", row=4, col=1)
fig.add_vrect(x0=1994, x1=1994,annotation_text = 'End of Apartheid',fillcolor = 'black', opacity=0.4,line_width = 1,annotation_position="top left",row=4, col=1)
fig.update_layout(height=1000, width=1000, title_text='Evolution of the frequency of socio-political themes across years',title_x=1, title_y=1)

fig_json = fig.to_json()
with open('socio-political-themes-frequency-evolution', 'w') as json_file:
    json.dump(json.loads(fig_json), json_file, indent=4)
fig.show()
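
As a hedged illustration of the sliding-window computation performed in the cell above, here is a minimal dependency-free sketch. The function name `window_frequencies`, its parameters and the toy data are assumptions made for this example and do not appear in the notebook. Unlike the cell above, the sketch returns `None` for a window that contains no film instead of dividing by zero.

```python
def window_frequencies(records, first_year, last_year, width=5):
    """Share of films flagged for a theme in each sliding window.

    records: iterable of (release_year, flag) pairs, where flag is True/False.
    Returns a list of (window_start, frequency) pairs; frequency is None
    when no film falls inside the window.
    """
    records = list(records)
    result = []
    # Every full window [start, start + width - 1] that fits in the year range.
    for start in range(first_year, last_year - width + 2):
        end = start + width - 1  # inclusive upper bound
        flags = [flag for year, flag in records if start <= year <= end]
        freq = sum(flags) / len(flags) if flags else None
        result.append((start, freq))
    return result


# Hypothetical toy data: (release year, theme present?)
films = [(1900, True), (1901, False), (1903, True), (1905, False), (1906, False)]
freqs = window_frequencies(films, 1900, 1906)
```

With this toy data there are three windows, starting in 1900, 1901 and 1902. Labelling each window by its first year is a convention, chosen here to match the years used in `z_score`.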

##### Analysis
To assess how the depicted socio-political themes align with historical events and cultural shifts, we proceed as follows. We look for significant z-scores (greater than or equal to 2). A run of successive years, or of years close to each other, with significant z-scores for a given theme is treated as a time window in which the theme was addressed significantly more often than in other periods. These time windows are drawn as red vertical rectangles. We then look for historical or political events whose period overlaps or lies near a high z-score window, in order to suggest either an influence of the events on the frequency of the theme, or, conversely, an awareness of an issue that later led to social and political events. The periods corresponding to historical events are drawn as vertical rectangles of other colors. We observe the following results.
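
The procedure above (flag years with a z-score of at least 2, then merge nearby flagged years into windows) can be sketched as follows. This is an illustration under stated assumptions: the function names and the `max_gap` parameter are hypothetical, since the notebook does not define how close "close to each other" must be, and the toy frequencies are invented. The standard deviation is the population one, which is the default (`ddof=0`) of `scipy.stats.zscore`.

```python
import statistics


def significant_years(years, frequencies, z_min=2.0):
    """Years whose frequency lies at least z_min standard deviations above the mean."""
    mean = statistics.fmean(frequencies)
    std = statistics.pstdev(frequencies)  # population standard deviation (ddof=0)
    if std == 0:
        return []  # a flat series has no significant year
    return [y for y, f in zip(years, frequencies) if (f - mean) / std >= z_min]


def group_into_windows(flagged_years, max_gap=1):
    """Merge flagged years separated by at most max_gap years into (start, end) windows."""
    windows = []
    for year in sorted(flagged_years):
        if windows and year - windows[-1][1] <= max_gap:
            windows[-1] = (windows[-1][0], year)  # extend the current window
        else:
            windows.append((year, year))          # open a new window
    return windows


# Hypothetical toy series: a flat baseline with a three-year spike.
toy_years = list(range(1900, 1920))
toy_freqs = [0.10 if 1910 <= y <= 1912 else 0.01 for y in toy_years]

flagged = significant_years(toy_years, toy_freqs)
windows = group_into_windows(flagged)
```

In this toy series only the three spike years are flagged, and they merge into a single window. A larger `max_gap` merges more loosely spaced years, so the reported windows depend on that choice.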

The themes of decolonization, apartheid and women's rights are all significantly addressed in the 1910s and early 1920s, that is, around the First World War and at the start of the interwar period. Their respective high z-score time windows are [1919-1923], [1911-1915] and [1920-1924]. These themes are therefore heavily addressed in cinema long before the social and political events linked to them, such as the beginning of decolonization, the end of apartheid, and the start of the civil rights and women's rights movements. One possible explanation is that this period was marked by particularly hard conditions for people living in colonies and for women, which raised awareness of these themes among the public and hence among screenwriters. Films on these themes may in turn have amplified this awareness and contributed to the later decolonization, anti-apartheid and women's rights movements.

For the theme of civil rights and racial segregation, we see a time window [1960-1970] made up of years with high z-scores close to each other. The theme of civil rights and racial or sex segregation was thus particularly addressed during this window. Since this period overlaps the civil rights movement (1954-1968) and the women's rights movement (1960-1970), we can suggest that these movements probably inspired this theme in cinema.

To conclude, the theme of civil rights and racial or sex segregation undergoes a significant increase in frequency during a period characterised by social and political events linked to it, which suggests that this theme reflects those events. For other themes, such as apartheid, decolonization or women's rights, the period during which they were particularly addressed occurred long before the events linked to them. In these cases, we could hypothesise that movies do not reflect the events themselves but a growing public awareness of the issues involved, an awareness that probably contributed to those events later on. In these situations, cinema gives a premonitory glimpse of the events to come.

In [309]:
PlotSumEmbeddings.to_csv('embedding', index=False)