# Individual Playwrights in the Context of Their Time Periods
In this analysis, we will explore the playwrights in the context of the time period in which they created their comedies. We will focus on the following features:
- The number of dramatic characters
- The mobility coefficient
- The standard range of the number of speaking characters
- The percentage of polylogues
- The percentage of monologues.

Our agenda:
1. For each time period and each feature, we will identify the comedy that represents the maximum, the comedy that represents the minimum, and the comedies that are the closest to the mean.
2. For each time period, we create a summary for selected playwrights:
    - Calculate the mean number of dramatic characters, mean mobility coefficient, the standard range of the number of speaking characters (sigma), the mean percentage of polylogues, and the mean percentage of monologues.
    - Calculate open-form scores for each playwright that tell us how experimental each playwright is in the context of his or her time.
3. Examine the evolution of Jean-François Collin d’Harleville, whose oeuvre spanned two literary periods.

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import re

### Define Functions

In [2]:
# the period boundaries were determined based on our hypothesis testing
def determine_period(row):
    """Map a year of creation to its period: 1 (up to 1695), 2 (1696 to 1795), or 3 (after 1795)."""
    if row <= 1695:
        period = 1
    elif 1696 <= row <= 1795:
        period = 2
    else:
        period = 3

    return period

In [3]:
def make_list(row):
    speech_dist = []
    for value in row[1:-1].split('\n '):
        speech_dist.append([int(num) for num in re.findall('[0-9]+', value)])
        
    return speech_dist
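
As a minimal sketch of what `make_list` does, the snippet below parses one string-encoded speech distribution. It assumes that each cell of the `speech_distribution` column holds the printed form of an array of `[number_of_speakers, number_of_scenes]` pairs; the sample string is hypothetical and is not taken from the dataset.

```python
import re


def make_list(row):
    # Same logic as the notebook's helper, repeated so the sketch runs on its own.
    speech_dist = []
    for value in row[1:-1].split('\n '):
        speech_dist.append([int(num) for num in re.findall('[0-9]+', value)])
    return speech_dist


# Hypothetical cell value (an assumption about how the CSV stores the data):
# each bracketed pair is [number_of_speakers, number_of_scenes].
raw_value = "[[ 1 20]\n [ 2 32]\n [ 3 18]\n [ 4  9]\n [ 5  1]]"

parsed = make_list(raw_value)
print(parsed)
```

Each inner list can then be turned into a column of a dataframe, which is what `speech_distribution_by_period` relies on.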

In [4]:
def speech_distribution_by_period(period_df):
    all_distributions = []
    for row in period_df['speech_distribution']:
        speech_dist_df = pd.DataFrame(row).T
        # rename columns to make sure they start with 1 and not 0
        speech_dist_df.columns = speech_dist_df.iloc[0, :]
        # no need to include the variants as a row - they will be column names
        only_counts_df = pd.DataFrame(speech_dist_df.iloc[1, :])
        only_counts_df.columns = ['raw_numbers']
        only_counts_df['percentage'] = only_counts_df['raw_numbers'] / only_counts_df.sum().values[0]
        all_distributions.append(round(only_counts_df['percentage'], 4))
    period_df_dist = pd.concat(all_distributions, axis=1).fillna(0)
    # take the mean for each period
    mean_per_type = pd.DataFrame(period_df_dist.mean(axis=1)).T 
    mean_per_type.index.name = 'number_of_speakers'
    mean_per_type = (mean_per_type * 100).round(2)
        
    return mean_per_type

In [5]:
def sigma_iarkho(df):
    """
    Calculate the standard range following Iarkho's procedure.
    Parameters:
        df  - a dataframe where the columns are variants, i.e., the distinct numbers of speakers in ascending order,
              e.g. [1, 2, 3, 4, 5], and the values are the weights corresponding to these variants, i.e.,
              the numbers of scenes, e.g. [20, 32, 18, 9, 1]
    Returns:
        sigma - standard range per Iarkho
    """
    weighted_mean_variants = np.average(df.columns.tolist(), weights=df.values[0])
    differences_squared = [(variant - weighted_mean_variants)**2 for variant in df.columns]
    weighted_mean_difference = np.average(differences_squared, weights=df.values[0])
    sigma = round(weighted_mean_difference**0.5, 2)

    return sigma
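
The calculation in `sigma_iarkho` can be traced step by step on the example from its docstring. The sketch below repeats the same arithmetic on a hypothetical one-row dataframe (variants 1 to 5 with weights 20, 32, 18, 9, 1); the variable names are illustrative and are not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame: the columns are the numbers of speakers,
# the values are the numbers of scenes (the example from the docstring).
example = pd.DataFrame([[20, 32, 18, 9, 1]], columns=[1, 2, 3, 4, 5])

variants = example.columns.to_numpy()
weights = example.values[0]

# Step 1: weighted mean of the variants.
weighted_mean = np.average(variants, weights=weights)
# Step 2: weighted mean of the squared deviations from that mean.
weighted_variance = np.average((variants - weighted_mean) ** 2, weights=weights)
# Step 3: the square root, rounded to two decimals, is the standard range.
sigma = round(weighted_variance ** 0.5, 2)

print(weighted_mean, sigma)
```

For these weights, the weighted mean is 179 / 80 = 2.2375 and the standard range rounds to 0.99.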

In [6]:
def sigma_for_playwrights(df, playwrights_lst):
    sigmas = []
    for playwright in playwrights_lst:
        selection = df[(df.last_name == playwright[0]) & (df.first_name == playwright[1])].copy()
        sigma = selection.pipe(speech_distribution_by_period).pipe(sigma_iarkho)
        sigmas.append(sigma)
        
    summary = pd.DataFrame(sigmas, columns=['sigma_iarkho'])
    summary['z_score'] = (summary['sigma_iarkho'] - df['sigma_iarkho'].mean()) / df['sigma_iarkho'].std()
    summary.index = playwrights_lst
    
    return summary

In [7]:
def summary_feature(df, feature):
    print('Mean, standard deviation, median, min and max values for the period:')
    display(df[feature].describe()[['mean', 'std', '50%','min', 'max']].round(2))
    print('Period Max:')
    display(df[df[feature] == df[feature].max()][['last_name', 'first_name', 'title', 'date', feature]].round(2))
    print('Period Min:')
    display(df[df[feature] == df[feature].min()][['last_name', 'first_name', 'title', 'date', feature]].round(2))
    print('The closest to the mean:')
    df_copy = df.copy()
    df_copy['diff_with_mean'] = (df_copy[feature] - df_copy[feature].mean()).abs()
    display(df_copy[df_copy['diff_with_mean'] == df_copy['diff_with_mean'].min()][['last_name', 'first_name', 'title', 'date', feature]].round(2))

In [8]:
def authors_by_period(period_df, feature, playwrights_lst):
    period_df = period_df.sort_values(by='last_name')
    period_summary = period_df.groupby(['last_name', 'first_name']).describe()
    period_mean = round(period_df[feature].mean(), 2)
    period_std = round(period_df[feature].std(), 2)
    statistics = ['mean', '50%', 'std', 'min', 'max']
    
    all_authors = period_summary.loc[[author[0] for author in playwrights_lst], 
                                    feature][statistics].round(2)

    all_authors['z_score'] = (all_authors['mean'] - period_mean) / period_std
    return  all_authors
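
The z-score step in `authors_by_period` compares each author's mean with the mean and standard deviation of the whole period. The sketch below shows the idea on toy data that is invented for illustration only; unlike the function above, it does not round the period mean and standard deviation before dividing.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the corpus).
toy_period = pd.DataFrame({
    'last_name': ['A', 'A', 'B', 'B', 'C'],
    'mobility_coefficient': [40, 50, 30, 34, 46],
})

# Period-level statistics computed over all comedies of the period.
period_mean = toy_period['mobility_coefficient'].mean()
period_std = toy_period['mobility_coefficient'].std()  # sample std (ddof=1), the pandas default

# Author-level means, standardized against the period.
author_means = toy_period.groupby('last_name')['mobility_coefficient'].mean()
z_scores = ((author_means - period_mean) / period_std).round(2)

print(z_scores)
```

A positive value means that the author's comedies are, on average, above the period mean for that feature; a negative value means that they are below it.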

In [9]:
def playwrights_place(df, playwrights, with_mode=True):
    if with_mode:
        column = 'z_score'
        sigma_col = column
    else:
        column = ['mean']
        sigma_col = 'sigma_iarkho'
    summary = pd.DataFrame(authors_by_period(df, 'num_present_characters', playwrights)[column])
    summary.columns = ['num_present_characters']
    # make sure the order of the playwrights is the same
    
    ind = summary.index
    summary['mobility_coefficient'] = authors_by_period(df, 'mobility_coefficient', 
                                                        playwrights).loc[ind, column]
    summary['sigma_iarkho'] = sigma_for_playwrights(df, ind)[sigma_col]
    summary['polylogues'] = authors_by_period(df, 'percentage_polylogues', 
                                                         playwrights).loc[ind, column]
    summary['monologues'] = authors_by_period(df, 'percentage_monologues', 
                                                         playwrights).loc[ind, column]
    summary = summary.round(2)
    if with_mode:
        summary['monologues'] = summary['monologues'].apply(lambda x: -x)
        summary['open_form_score'] = round(summary.apply(lambda x: x.mean(), axis=1), 2)
    
    return summary

### Load Data

In [10]:
data = pd.read_csv('../French_Comedies/Data/French_Comedies_Data.csv')

In [11]:
data.shape

(279, 25)

In [12]:
# include only five act comedies and only original comedies
original_comedies = data[(data['num_acts'] == 5) &
                         (data['translation/adaptation/contrastive'] == 0)].copy()

In [13]:
original_comedies.head(3)

Unnamed: 0,index,title,last_name,first_name,date,translation/adaptation/contrastive,num_acts,url,num_present_characters,num_scenes_text,...,percentage_above_two_speakers,av_percentage_non_speakers,sigma_iarkho,number_scenes_with_discontinuous_change_characters,percentage_scenes_with_discontinuous_change_characters,total_utterances,num_verse_lines,dialogue_vivacity,five_year_intervals,decades
2,F_3,Mélite ou Les fausses lettres,Corneille,Pierre,1629,0,5,http://www.theatre-classique.fr/pages/document...,8,35,...,23.08,0.513,0.906,12,30.769,483.0,1822.0,0.265,1630,1630
3,F_5,La Veuve ou Le Traître trahi,Corneille,Pierre,1633,0,5,http://www.theatre-classique.fr/pages/document...,12,40,...,20.0,3.519,1.062,12,26.667,521.0,2010.0,0.259,1635,1640
4,F_9,La Célimène,Rotrou,Jean de,1633,0,5,http://www.xn--thtre-documentation-cvb0m.com/c...,10,36,...,22.22,8.963,1.092,5,11.111,,,,1635,1640


In [14]:
original_comedies.shape

(257, 25)

In [15]:
sorted_comedies = original_comedies.sort_values(by='date')

In [16]:
sorted_comedies['period'] = sorted_comedies['date'].apply(determine_period)

In [17]:
sorted_comedies = sorted_comedies.rename(columns={'num_scenes_iarkho': 'mobility_coefficient', 
                                                  'percentage_non_duologues': 'percentage_non_dialogues',
                                                  'percentage_above_two_speakers': 'percentage_polylogues'})

In [18]:
# remove white spaces
sorted_comedies['last_name'] = sorted_comedies['last_name'].str.strip()
sorted_comedies['first_name'] = sorted_comedies['first_name'].str.strip()
sorted_comedies['first_name'] = sorted_comedies['first_name'].fillna('')
sorted_comedies['speech_distribution'] = sorted_comedies['speech_distribution'].apply(make_list)

In [19]:
period_one = sorted_comedies[sorted_comedies.period == 1].copy()
period_two = sorted_comedies[sorted_comedies.period == 2].copy()
period_three = sorted_comedies[sorted_comedies.period == 3].copy()

In [20]:
features = ['num_present_characters', 
            'mobility_coefficient',
            'sigma_iarkho',
            'percentage_monologues', 
            'percentage_non_dialogues', 
            'percentage_polylogues']

In [21]:
playwrights_period_one = [('Corneille', 'Pierre'),
                          ('Corneille', 'Thomas'),
                          ('Molière', ''),
                          ('Montfleury', ''),
                          ('Scarron', 'Paul'),
                          ('Boisrobert', 'François Le Métel de'),
                          ('Quinault', 'Philippe'),
                          ('Boursault', 'Edmé'),
                          ('La Fontaine', 'Jean de')]

In [22]:
playwrights_period_two = [('Boissy', 'Louis de'),
                        ('Rousseau', 'Jean-Baptiste'),
                        ('Voltaire', ''),
                        ('Regnard', 'Jean-François'),
                        ('Néricault Destouches', 'Philippe'),
                        ('Dorat', 'Claude-Joseph'),
                        ('Desforges', ''),
                        ('Dancourt', 'Pierre Claude'),
                        ('Nivelle de la Chaussée', ''),
                        ('Collin d’Harleville', 'Jean-François')]

In [23]:
playwrights_period_three = [('Duval', 'Alexandre'),
                           ('Delavigne', 'Casimir'),
                           ('Bonjour', 'Casimir'),
                           ('Gosse', 'Étienne'),
                           ('Collin d’Harleville', 'Jean-François'),
                           ('Picard','Louis-Benoît')]

## Part 1. The Maximum, The Minimum, and The Closest to The Mean

### Period 1 (1629 to 1695)

#### Number of Dramatic Characters

In [24]:
summary_feature(period_one, 'num_present_characters')

Mean, standard deviation, median, min and max values for the period:


mean    11.91
std      4.05
50%     11.00
min      7.00
max     30.00
Name: num_present_characters, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
80,Chalussay,Le Boulanger de,Elomire Hypocondre,1670,30


Period Min:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
25,Corneille,Pierre,La Suite du menteur,1643,7
70,Montfleury,,L’École des filles,1666,7
81,Marcel,,Le Mariage sans mariage,1671,7


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
3,Corneille,Pierre,La Veuve ou Le Traître trahi,1633,12
6,Corneille,Pierre,La Galerie du Palais,1634,12
54,Boisrobert,François Le Métel de,Les Apparences trompeuses,1655,12
60,Molière,,Le Dépit amoureux,1658,12
65,Montfleury,,Le Mari sans femme,1663,12
66,Boursault,Edmé,Les Nicandres,1663,12
68,Boursault,Edmé,Les deux frères gémeaux,1665,12
79,Poisson,Raymond,Les Pipeurs ou les Femmes coquettes,1670,12


#### Mobility Coefficient

In [25]:
summary_feature(period_one, 'mobility_coefficient')

Mean, standard deviation, median, min and max values for the period:


mean    41.9
std     11.6
50%     40.0
min     19.0
max     85.0
Name: mobility_coefficient, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
86,Hauteroche,"Noël Lebreton, sieur de",Crispin musicien,1674,85


Period Min:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
18,Guérin de Bouscal,Guyon,Dom Quixote de la Manche,1639,19


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
39,Boisrobert,François Le Métel de,La Folle Gageure,1651,42
45,Corneille,Thomas,Le Charme de la voix,1653,42
79,Poisson,Raymond,Les Pipeurs ou les Femmes coquettes,1670,42


#### Standard Range of the Number of Speaking Characters (Sigma)

In [26]:
summary_feature(period_one, 'sigma_iarkho')

Mean, standard deviation, median, min and max values for the period:


mean    1.25
std     0.40
50%     1.19
min     0.60
max     3.54
Name: sigma_iarkho, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
80,Chalussay,Le Boulanger de,Elomire Hypocondre,1670,3.54


Period Min:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
69,Quinault,Philippe,La Mère coquette,1665,0.6


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
89,Montfleury,,Crispin gentilhomme,1677,1.25


#### The Percentage of Polylogues

In [27]:
summary_feature(period_one, 'percentage_polylogues')

Mean, standard deviation, median, min and max values for the period:


mean    42.36
std     13.87
50%     41.46
min     10.64
max     86.21
Name: percentage_polylogues, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
41,Boisrobert,François Le Métel de,Les trois Orontes,1652,86.21


Period Min:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
7,Corneille,Pierre,La Suivante,1634,10.64


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
26,Ouville,Antoine d',La Dame suivante,1643,42.31


#### The Percentage of Monologues

In [28]:
summary_feature(period_one, 'percentage_monologues')

Mean, standard deviation, median, min and max values for the period:


mean    13.41
std      9.55
50%     12.12
min      0.00
max     33.33
Name: percentage_monologues, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
4,Rotrou,Jean de,La Célimène,1633,33.33


Period Min:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
18,Guérin de Bouscal,Guyon,Dom Quixote de la Manche,1639,0.0
35,Scarron,Paul,L'Héritier ridicule ou la Dame intéressée,1649,0.0
42,Scarron,Paul,Don Japhet d'Arménie,1652,0.0
46,Boisribert,François,La Belle plaideuse,1654,0.0
61,Corneille,Thomas,Le Galant doublé,1660,0.0
69,Quinault,Philippe,La Mère coquette,1665,0.0
71,Molière,,Le Misanthrope,1666,0.0
73,Corneille,Thomas,Le Baron d'Albikrac,1668,0.0
77,Corneille,Thomas,La Comtesse d'Orgueil,1670,0.0


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
19,Mareschal,André,Le Véritable Capitan Matamore,1640,13.33


### Period 2 (1696 to 1795)

#### Number of Dramatic Characters

In [29]:
summary_feature(period_two, 'num_present_characters')

Mean, standard deviation, median, min and max values for the period:


mean    10.53
std      2.89
50%     10.00
min      6.00
max     24.00
Name: num_present_characters, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
201,Cubières-Palmézeaux,Michel de,L'Homme d'état imaginaire,1789,24


Period Min:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
158,Néricault Destouches,,Le Mari confident,1758,6
163,Bastide,Jean-François de,Le Jeune homme,1766,6


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
106,Dancourt,,Les Enfants de Paris ou la Famille à la mode,1699,11
126,Néricault Destouches,,Les Philosophes amoureux,1729,11
128,Du Fresny,Charles,Le Faux Sincère,1731,11
132,Rousseau,Jean-Baptiste,Les ayeux chimériques ou la comtesse de Critognac,1735,11
138,Néricault Destouches,,L'Ambitieux et l'Indiscrète,1737,11
187,Bièvre,Marquis de,Le Séducteur,1783,11
192,Borel,,Le Méfiant,1786,11
197,Collin d’Harleville,Jean-François,L'Optimiste ou l'Homme content de tout,1788,11
202,Fabre d'Églantine,,Le Philinte de Molière,1790,11


#### Mobility Coefficient

In [30]:
summary_feature(period_two, 'mobility_coefficient')

Mean, standard deviation, median, min and max values for the period:


mean    49.29
std     11.32
50%     48.50
min     29.00
max     91.00
Name: mobility_coefficient, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
164,Chauveau,,L'homme de cour,1767,91


Period Min:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
142,Nivelle de la Chaussée,Pierre Claude,Mélanide,1741,29


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
104,Rousseau,Jean-Baptiste,Le Flatteur,1696,49
113,Regnard,Jean-François,Le Légataire Universel,1708,49
124,Néricault Destouches,,Le Philosophe marié,1727,49
140,Boissy,Louis de,Les Dehors trompeurs,1740,49
148,Voltaire,,La Prude,1747,49
156,La Noue,Jean-Baptiste,La Coquette corrigée,1757,49
173,Bret,Antoine,Le protecteur bourgeois ou la confiance trahie,1772,49
209,Reynier,L.,L'Avare fastueux,1794,49


#### Sigma

In [31]:
summary_feature(period_two, 'sigma_iarkho')

Mean, standard deviation, median, min and max values for the period:


mean    1.12
std     0.29
50%     1.10
min     0.63
max     1.98
Name: sigma_iarkho, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
154,Rousseau,Jean-Baptiste,La Femme qui ne parle point ou l'hypocondre,1751,1.98


Period Min:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
179,Dorat,Claude-Joseph,Roséïde ou l'Intrigant,1779,0.63


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
120,Néricault Destouches,,Le Médisant,1715,1.12


#### The Percentage of Polylogues

In [32]:
summary_feature(period_two, 'percentage_polylogues')

Mean, standard deviation, median, min and max values for the period:


mean    31.62
std     11.03
50%     31.50
min      8.70
max     58.06
Name: percentage_polylogues, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
132,Rousseau,Jean-Baptiste,Les ayeux chimériques ou la comtesse de Critognac,1735,58.06


Period Min:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
180,Delon,,"Le Financier, comédie en cinq actes et en vers",1779,8.7


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
107,Regnard,Jean-François,Démocrite amoureux,1700,31.58


#### The Percentage of Monologues

In [33]:
summary_feature(period_two, 'percentage_monologues')

Mean, standard deviation, median, min and max values for the period:


mean    22.08
std      7.75
50%     21.88
min      3.23
max     40.48
Name: percentage_monologues, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
163,Bastide,Jean-François de,Le Jeune homme,1766,40.48


Period Min:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
132,Rousseau,Jean-Baptiste,Les ayeux chimériques ou la comtesse de Critognac,1735,3.23


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
207,Collin d’Harleville,Jean-François,Le Vieux célibataire,1792,22.03


### Period 3 (1796 to 1849)

#### Number of Dramatic Characters

In [34]:
summary_feature(period_three, 'num_present_characters')

Mean, standard deviation, median, min and max values for the period:


mean    12.11
std      4.28
50%     11.00
min      6.00
max     28.00
Name: num_present_characters, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
259,Gosse,Étienne,"Les Jésuites, ou les autres Tartuffes",1827,28


Period Min:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
278,Augier,Émile,Gabrielle,1849,6


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,num_present_characters
215,Fabre d'Églantine,,Les Précepteurs,1799,12
228,Étienne,Charles-Guillaume,Les Deux gendres,1810,12
242,Théaulon,Emmanuel,"L'Artiste ambitieux, ou l'Adoption",1820,12
250,Bonjour,Casimir,L’Éducation ou les deux Cousine,1823,12
258,Bonjour,Casimir,L’Argent ou les Moeurs du Jour,1826,12


#### Mobility Coefficient

In [35]:
summary_feature(period_three, 'mobility_coefficient')

Mean, standard deviation, median, min and max values for the period:


mean    54.58
std      9.99
50%     53.50
min     34.00
max     86.00
Name: mobility_coefficient, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
259,Gosse,Étienne,"Les Jésuites, ou les autres Tartuffes",1827,86


Period Min:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
240,Duval,Alexandre,Le Chevalier d'industrie,1818,34


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,mobility_coefficient
227,Picard,Louis-Benoît,Les Capitulations de conscience,1809,55
245,Delavigne,Casimir,Les Comédiens,1820,55


#### Sigma

In [36]:
summary_feature(period_three, 'sigma_iarkho')

Mean, standard deviation, median, min and max values for the period:


mean    1.33
std     0.32
50%     1.26
min     0.84
max     2.54
Name: sigma_iarkho, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
262,Delavigne,Casimir,La Princesse Aurélie,1828,2.54


Period Min:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
241,Michaud,L. G.,"Le Faux ami de cour, ou le Danger des liaisons",1818,0.84


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,sigma_iarkho
275,Rey,Charles,La séduction et l'amour vrai,1847,1.33


#### The Percentage of Polylogues

In [37]:
summary_feature(period_three, 'percentage_polylogues')

Mean, standard deviation, median, min and max values for the period:


mean    38.09
std      8.70
50%     37.31
min     18.46
max     65.00
Name: percentage_polylogues, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
239,Merville,,"La Famille Glinet, ou Les premiers temps de la...",1818,65.0


Period Min:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
252,Bonjour,Casimir,"Le Mari à bonnes fortunes, ou La Leçon",1824,18.46


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,percentage_polylogues
258,Bonjour,Casimir,L’Argent ou les Moeurs du Jour,1826,37.88


#### The Percentage of Monologues

In [38]:
summary_feature(period_three, 'percentage_monologues')

Mean, standard deviation, median, min and max values for the period:


mean    20.08
std      5.95
50%     18.94
min     11.36
max     43.08
Name: percentage_monologues, dtype: float64

Period Max:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
252,Bonjour,Casimir,"Le Mari à bonnes fortunes, ou La Leçon",1824,43.08


Period Min:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
278,Augier,Émile,Gabrielle,1849,11.36


The closest to the mean:


Unnamed: 0,last_name,first_name,title,date,percentage_monologues
244,Gosse,Étienne,Le Flatteur,1820,20.0


## Part 2. Summary For Selected Playwrights

We calculate the mean number of dramatic characters, the mean mobility coefficient, the standard range of the number of speaking characters (sigma), the mean percentage of polylogues, and the mean percentage of monologues. We then quantify how much a playwright prefers to write comedies with more open forms, based on all features and within the context of the period in which he or she writes.

#### Open Form Scores:
1. For all features, we will calculate **the z-score**: $z = (x - \mu)/\sigma$, where $x$ is the playwright's value of the feature, $\mu$ is the period mean of the feature, and $\sigma$ is the period standard deviation of the feature. For the percentage of monologues, we will reverse the sign, i.e., we will use **- z-score**, since it is the lower percentage of monologues that indicates a more open form.
2. The **open form score** will be the mean z-score. For example, if Boisrobert has the following z-scores (0.07, -0.35, 0.41, 0.68) and a -z-score for the percentage of monologues of 0.97, his **open form score** = (0.07 - 0.35 + 0.41 + 0.68 + 0.97) / 5 ≈ 0.36. The open form scores can be positive or negative: the larger the positive score, the more open the form, and the larger the negative score, the less open the form.
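
The Boisrobert example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the z-scores are the ones quoted above, and the raw monologue z-score of -0.97 is inferred from the reversed value of 0.97 given in the text.

```python
import pandas as pd

# Z-scores quoted in the text for Boisrobert. The monologue entry is the raw
# z-score (-0.97), i.e., the value before its sign is reversed.
z_scores = pd.Series({
    'num_present_characters': 0.07,
    'mobility_coefficient': -0.35,
    'sigma_iarkho': 0.41,
    'polylogues': 0.68,
    'monologues': -0.97,
})

# Fewer monologues indicate a more open form, so the sign is flipped.
z_scores['monologues'] = -z_scores['monologues']

# The open form score is the mean of the five (sign-adjusted) z-scores.
open_form_score = round(z_scores.mean(), 2)
print(open_form_score)
```

The sum of the five values is 1.78, so the mean is 0.356, which rounds to 0.36.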

### Period One

In [39]:
playwrights_place(period_one, playwrights_period_one, with_mode=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues
last_name,first_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Boisrobert,François Le Métel de,12.2,37.8,1.42,51.74,4.17
Boursault,Edmé,16.0,44.8,1.2,37.86,16.03
Corneille,Pierre,10.12,42.0,0.95,23.08,21.49
Corneille,Thomas,11.25,42.08,1.14,51.76,5.69
La Fontaine,Jean de,19.0,49.0,1.61,38.4,22.4
Molière,,11.14,32.57,1.43,37.1,11.06
Montfleury,,9.38,50.38,0.98,32.27,19.52
Quinault,Philippe,8.33,38.0,1.05,45.61,5.52
Scarron,Paul,11.5,39.5,1.48,44.72,13.57


#### Z-Scores and Open-Form Scores

In [40]:
playwrights_place(period_one, playwrights_period_one, with_mode=True)

Unnamed: 0_level_0,Unnamed: 1_level_0,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues,open_form_score
last_name,first_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Boisrobert,François Le Métel de,0.07,-0.35,0.41,0.68,0.97,0.36
Boursault,Edmé,1.01,0.25,-0.14,-0.32,-0.27,0.11
Corneille,Pierre,-0.44,0.01,-0.76,-1.39,-0.85,-0.69
Corneille,Thomas,-0.16,0.02,-0.28,0.68,0.81,0.21
La Fontaine,Jean de,1.75,0.61,0.88,-0.29,-0.94,0.4
Molière,,-0.19,-0.8,0.44,-0.38,0.25,-0.14
Montfleury,,-0.62,0.73,-0.68,-0.73,-0.64,-0.39
Quinault,Philippe,-0.88,-0.34,-0.51,0.23,0.83,-0.13
Scarron,Paul,-0.1,-0.21,0.56,0.17,-0.02,0.08


### Period Two

In [41]:
playwrights_place(period_two, playwrights_period_two, with_mode=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues
last_name,first_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Boissy,Louis de,8.71,45.57,0.95,26.61,21.89
Collin d’Harleville,Jean-François,10.67,60.33,1.07,36.26,18.99
Dancourt,,11.0,56.33,1.31,45.6,14.8
Desforges,,11.5,53.75,1.42,37.3,20.61
Dorat,Claude-Joseph,11.5,44.5,1.32,23.56,23.21
Nivelle de la Chaussée,Pierre Claude,8.75,42.75,1.02,24.03,24.91
Néricault Destouches,,10.38,50.08,1.25,35.52,21.04
Regnard,Jean-François,12.0,51.2,1.3,39.22,19.52
Rousseau,Jean-Baptiste,11.0,37.75,1.4,42.1,16.86
Voltaire,,10.25,37.25,1.31,42.94,11.6


#### Z-Scores and Open-Form Scores

In [42]:
playwrights_place(period_two, playwrights_period_two, with_mode=True)

Unnamed: 0_level_0,Unnamed: 1_level_0,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues,open_form_score
last_name,first_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Boissy,Louis de,-0.63,-0.33,-0.59,-0.45,0.02,-0.4
Collin d’Harleville,Jean-François,0.05,0.98,-0.17,0.42,0.4,0.34
Dancourt,,0.16,0.62,0.66,1.27,0.94,0.73
Desforges,,0.34,0.39,1.04,0.51,0.19,0.49
Dorat,Claude-Joseph,0.34,-0.42,0.69,-0.73,-0.15,-0.05
Nivelle de la Chaussée,Pierre Claude,-0.62,-0.58,-0.35,-0.69,-0.37,-0.52
Néricault Destouches,,-0.05,0.07,0.45,0.35,0.13,0.19
Regnard,Jean-François,0.51,0.17,0.63,0.69,0.33,0.47
Rousseau,Jean-Baptiste,0.16,-1.02,0.97,0.95,0.67,0.35
Voltaire,,-0.1,-1.06,0.66,1.03,1.35,0.38


### Period Three

#### Z-Scores and Open-Form Scores

In [43]:
playwrights_place(period_three, playwrights_period_three, with_mode=True)

Unnamed: 0_level_0,Unnamed: 1_level_0,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues,open_form_score
last_name,first_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Bonjour,Casimir,-0.17,1.18,0.55,-0.34,-1.04,0.04
Collin d’Harleville,Jean-François,-0.26,0.39,-0.27,0.59,0.66,0.22
Delavigne,Casimir,0.56,-0.81,1.05,-0.09,-0.12,0.12
Duval,Alexandre,-0.87,-0.82,-0.43,0.24,-0.17,-0.41
Gosse,Étienne,1.61,2.09,1.15,-0.39,-0.26,0.84
Picard,Louis-Benoît,0.03,0.37,-0.46,0.11,0.59,0.13


#### Mean Values

In [44]:
playwrights_place(period_three, playwrights_period_three, with_mode=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues
last_name,first_name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bonjour,Casimir,11.4,66.4,1.5,35.1,26.24
Collin d’Harleville,Jean-François,11.0,58.5,1.24,43.22,16.18
Delavigne,Casimir,14.5,46.5,1.66,37.35,20.82
Duval,Alexandre,8.4,46.4,1.19,40.14,21.09
Gosse,Étienne,19.0,75.5,1.69,34.74,21.63
Picard,Louis-Benoît,12.25,58.25,1.18,39.01,16.55


## Part 3. The Evolution of Jean-François Collin d’Harleville (1755-1806)

In [45]:
print('Number dramatic characters in period two:',  
      round(period_two[period_two.last_name=='Collin d’Harleville']['num_present_characters'].mean(), 2))
print('Number dramatic characters in period three:', 
      period_three[period_three.last_name=='Collin d’Harleville']['num_present_characters'].mean())

Number dramatic characters in period two: 10.67
Number dramatic characters in period three: 11.0


In [46]:
print('Mobility coefficient in period two:',
      round(period_two[period_two.last_name=='Collin d’Harleville']['mobility_coefficient'].mean(), 2))
print('Mobility coefficient in period three:', 
      period_three[period_three.last_name=='Collin d’Harleville']['mobility_coefficient'].mean())

Mobility coefficient in period two: 60.33
Mobility coefficient in period three: 58.5


In [47]:
print('Standard range of the number of speaking characters in period two:', 
      period_two[period_two.last_name=='Collin d’Harleville'].pipe(speech_distribution_by_period).pipe(sigma_iarkho))
print('Standard range of the number of speaking characters in period three:',
       period_three[period_three.last_name=='Collin d’Harleville'].pipe(speech_distribution_by_period).pipe(sigma_iarkho))

Standard range of the number of speaking characters in period two: 1.07
Standard range of the number of speaking characters in period three: 1.24


In [48]:
print('The percentage of polylogues in period two:', 
     round(period_two[period_two.last_name=='Collin d’Harleville']['percentage_polylogues'].mean(), 2))
print('The percentage of polylogues in period three:',
     round(period_three[period_three.last_name=='Collin d’Harleville']['percentage_polylogues'].mean(), 2))

The percentage of polylogues in period two: 36.26
The percentage of polylogues in period three: 43.22


In [49]:
print('The percentage of monologues in period two:',
     round(period_two[period_two.last_name=='Collin d’Harleville']['percentage_monologues'].mean(), 2))
print('The percentage of monologues in period three:',
     round(period_three[period_three.last_name=='Collin d’Harleville']['percentage_monologues'].mean(), 2))

The percentage of monologues in period two: 18.99
The percentage of monologues in period three: 16.18


Collin d’Harleville created comedies in both period two and period three, and in both periods he tended to be among the more experimental playwrights (his open-form score was 0.34 in period two and 0.22 in period three). The evolution of his comedies is fascinating because it echoes the broader literary trends in all features except the mobility coefficient:

- The number of dramatic characters: the period mean increases from 10.53 in period two to 12.11 in period three. The mean in Collin d’Harleville's comedies increases from 10.67 in period two to 11 in period three.
- The standard range of the number of speaking characters: the period mean increases from 1.12 in period two to 1.33 in period three. The standard range in Collin d’Harleville's comedies increases from 1.07 in period two to 1.24 in period three.
- The percentage of polylogues: the period mean increases from 31.62 in period two to 38.09 in period three. The percentage of polylogues in Collin d’Harleville's comedies increases from 36.26 in period two to 43.22 in period three.
- The percentage of monologues: the period mean decreases from 22.08 in period two to 20.08 in period three. The percentage of monologues in Collin d’Harleville's comedies decreases from 18.99 to 16.18.
- The mobility coefficient: the period mean increases from 49.29 in period two to 54.58 in period three. Collin d’Harleville's comedies, however, have a higher mean mobility coefficient in period two (60.33) than in period three (58.5), although in both periods he is above the mean mobility coefficient of the period.