## Exploration of Individual Russian Comedies and Authors

In this analysis, we explore the Russian comedies that represent the minimum and the maximum of each feature, as well as the comedies closest to the mean. We also analyze the speech distribution of each playwright. Finally, we generate an open-form score for each playwright, which helps us gauge how experimental each was within the history of the Russian four- and five-act comedy in verse.

To account for the different number of acts (4 vs. 5), we multiply the mobility coefficient, which depends directly on the number of acts, by 5/4 for the four-act comedies and round the result to the nearest integer.
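
The adjustment can be illustrated with a minimal sketch. The three plays and their values below are hypothetical and are not taken from the corpus; only the 5/4 factor and the rounding step come from the description above.

```python
import pandas as pd

# Hypothetical plays (not from the corpus), used only to show the scaling.
plays = pd.DataFrame({
    'title': ['Play A', 'Play B', 'Play C'],
    'num_acts': [4, 4, 5],
    'mobility_coefficient': [48, 49, 61],
})

# Scale only the four-act plays by 5/4, then round to the nearest integer.
is_four_act = plays['num_acts'] == 4
plays['mobility_scaled'] = plays['mobility_coefficient'].astype(float)
plays.loc[is_four_act, 'mobility_scaled'] = (
    plays.loc[is_four_act, 'mobility_coefficient'] * 5 / 4
).round(0)

print(plays[['title', 'num_acts', 'mobility_scaled']])
```

Five-act plays keep their original value, so both groups end up on a comparable five-act scale.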

In [1]:
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import re
from os import listdir
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
def ranking_for_woe(df, feature):
    df_copy = df.copy()
    if df_copy[df_copy.title == 'Gore ot uma'][feature].values[0] > df_copy[feature].mean():
        bool_val = True
        print('Ranking From Highest to Lowest:')
    else:
        bool_val = False
        print('Ranking From Lowest to Highest:')
    
    reverse_dict = {True: False, False: True}
    df_copy['rank'] = df_copy[feature].rank(ascending= reverse_dict[bool_val])
    df_copy = df_copy.sort_values(by='rank', ascending=bool_val)
    display(df_copy[['title', 
                      'last_name', 
                      'first_name', 
                      'creation_date',
                       feature, 
                       'rank']])
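
The fractional ranks that appear in the rankings (for example 2.5) come from how `Series.rank` treats ties: by default, tied values share the average of the positions they occupy. A small sketch with made-up labels, using values in the same pattern as the character counts reported later:

```python
import pandas as pd

# Labels A-D are placeholders; two entries tie on the value 24.
values = pd.Series([34, 24, 24, 23], index=['A', 'B', 'C', 'D'])

# ascending=False gives rank 1 to the largest value;
# the default method='average' splits positions 2 and 3 between the tie.
ranks = values.rank(ascending=False)
print(ranks)
```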

In [3]:
def summary_features(df, feature):
    print('Mean, standard deviation, median, min and max values for the period:')
    display(pd.DataFrame(df[feature].describe()[['mean', 'std', '50%','min', 'max']]).round(2))
    print('Period Max:')
    display(pd.DataFrame(df[df[feature] == df[feature].max()][['last_name', 
                                                               'first_name', 
                                                               'title', 
                                                               'creation_date', feature]]).round(2))
    print('Period Min:')
    display(pd.DataFrame(df[df[feature] == df[feature].min()][['last_name', 
                                                               'first_name', 
                                                               'title', 
                                                               'creation_date', 
                                                               feature]]).round(2))
    print('The closest to the mean:')
    df_copy = df.copy()
    df_copy['diff_with_mean'] = df_copy[feature].apply(lambda x: np.absolute(x - df_copy[feature].mean()))
    display(pd.DataFrame(df_copy[df_copy['diff_with_mean'] == df_copy['diff_with_mean'].min()][['last_name', 
                                                                                   'first_name', 
                                                                                   'title', 
                                                                                   'creation_date', 
                                                                                   feature]]).round(2))
    ranking_for_woe(df, feature)

In [4]:
def coefficient_unused_dramatic_characters(data):
    total_present = 0
    total_non_speakers = 0
    for act in data['play_summary'].keys():
        for scene in data['play_summary'][act].keys():
            # identify the raw number of non-speaking dramatic characters
            num_non_speakers = len([item for item in data['play_summary'][act][scene].items() 
                                if (item[1] == 0  or item[1] == 'non_speaking') and item[0] not in ['num_utterances',
                                                                   'num_speakers',
                                                                   'perc_non_speakers']])
            total_non_speakers += num_non_speakers
            # calculate the total number of dramatic characters
            total_present += (data['play_summary'][act][scene]['num_speakers'] + num_non_speakers)
    coefficient_unused = (total_non_speakers / total_present ) * 100        
    
    return coefficient_unused
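
To make the calculation concrete, here is a compact sketch that mirrors the logic above on a toy input. The dictionary layout (acts, scenes, per-character utterance counts plus three summary keys) is inferred from the function and is an assumption; the characters and numbers are invented.

```python
# Summary keys that are not dramatic characters (taken from the function above).
META_KEYS = {'num_utterances', 'num_speakers', 'perc_non_speakers'}

# Hypothetical play with one act and two scenes.
toy_play = {
    'play_summary': {
        'act_1': {
            'scene_1': {'A': 3, 'B': 2, 'C': 0,
                        'num_utterances': 5, 'num_speakers': 2,
                        'perc_non_speakers': 33.33},
            'scene_2': {'A': 4, 'D': 'non_speaking',
                        'num_utterances': 4, 'num_speakers': 1,
                        'perc_non_speakers': 50.0},
        }
    }
}

def unused_coefficient(data):
    total_present = 0
    total_non_speakers = 0
    for act in data['play_summary'].values():
        for scene in act.values():
            # A character is unused if present but silent in the scene.
            non_speakers = sum(
                1 for name, count in scene.items()
                if name not in META_KEYS and (count == 0 or count == 'non_speaking')
            )
            total_non_speakers += non_speakers
            total_present += scene['num_speakers'] + non_speakers
    return total_non_speakers / total_present * 100

toy_result = unused_coefficient(toy_play)
print(toy_result)
```

In the toy play, 2 of the 5 character appearances are silent, so the coefficient is 40%.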

In [5]:
def get_data(input_directory):
    all_files = [f for f in listdir(input_directory) if f.count('.json') > 0]
    dfs = []
    for file in all_files:
        with open(input_directory + '/' + file) as json_file:
            data = json.load(json_file)
            not_used = coefficient_unused_dramatic_characters(data)
            df = pd.DataFrame([not_used], columns=['coefficient_unused'], index=[file.replace('.json','')])
            dfs.append(df)
            
    features_df = pd.concat(dfs, axis=0, sort=False).round(2)
    
    return features_df

In [6]:
def make_list(row):
    speech_dist = []
    for value in row[1:-1].split('\n '):
        speech_dist.append([int(num) for num in re.findall('[0-9]+', value)])
        
    return speech_dist
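
The parser expects the `speech_distribution` column to hold the string form of a two-column array, with one `[variant count]` pair per line. That input format is an assumption inferred from the slicing and splitting above; the sample string below is made up.

```python
import re

def make_list(row):
    speech_dist = []
    # Drop the outer brackets, then split into one "[variant count]" chunk per line.
    for value in row[1:-1].split('\n '):
        speech_dist.append([int(num) for num in re.findall('[0-9]+', value)])
    return speech_dist

# Hypothetical cell value: 20 scenes with 1 speaker, 32 with 2, 18 with 3.
raw = '[[1 20]\n [2 32]\n [3 18]]'
parsed = make_list(raw)
print(parsed)
```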

In [7]:
def speech_distribution_by_period(period_df):
    all_distributions = []
    for row in period_df['speech_distribution']:
        speech_dist_df = pd.DataFrame(row).T
        # rename columns to make sure they start with 1 and not 0
        speech_dist_df.columns = speech_dist_df.iloc[0, :]
        # no need to include the variants as a row - they will be column names
        only_counts_df = pd.DataFrame(speech_dist_df.iloc[1, :])
        only_counts_df.columns = ['raw_numbers']
        only_counts_df['percentage'] = only_counts_df['raw_numbers'] / only_counts_df.sum().values[0]
        all_distributions.append(round(only_counts_df['percentage'], 4))
    period_df_dist = pd.concat(all_distributions, axis=1).fillna(0)
    # take the mean for each period
    mean_per_type = pd.DataFrame(period_df_dist.mean(axis=1)).T 
    mean_per_type.index.name = 'number_of_speakers'
    mean_per_type = (mean_per_type * 100).round(2)
        
    return mean_per_type

In [8]:
def sigma_iarkho(df):
    """
    Calculate the standard range following Iarkho's procedure.
    Parameters:
        df  - a dataframe where columns are variants, i.e., the distinct number of speakers in the ascending order, 
              e.g. [1, 2, 3, 4, 5] and values weights corresponding to these variants, i.e.,
              the number of scenes, e.g. [20, 32, 18, 9, 1]
    Returns:
        sigma - standard range per Iarkho
    """
    weighted_mean_variants = np.average(df.columns.tolist(), weights=df.values[0])
    differences_squared = [(variant - weighted_mean_variants)**2 for variant in df.columns]
    weighted_mean_difference = np.average(differences_squared, weights=df.values[0])
    sigma = round(weighted_mean_difference**0.5, 2)

    return sigma
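
In other words, sigma is a weighted standard deviation: the square root of the weighted mean of squared deviations from the weighted mean of the variants. The sketch below recomputes it step by step with plain NumPy, using the distribution reported for Nikolai Krol' later in this notebook (percentages of scenes with 1 to 4 speakers).

```python
import numpy as np

# Variants (number of speakers) and their weights (percentage of scenes),
# copied from the Krol' speech distribution table in this notebook.
variants = np.array([1, 2, 3, 4])
weights = np.array([33.96, 49.06, 15.09, 1.89])

weighted_mean = np.average(variants, weights=weights)
weighted_var = np.average((variants - weighted_mean) ** 2, weights=weights)
sigma = round(float(weighted_var ** 0.5), 2)

print(sigma)  # 0.74
```

The result agrees with the value of 0.74 that the notebook reports for Krol'.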

In [9]:
def sigma_summary(df, playwrights_lst):
    sigmas = []
    for playwright in playwrights_lst:
        selection = df[(df.last_name == playwright[0]) & (df.first_name == playwright[1])].copy()
        sigma = selection.pipe(speech_distribution_by_period).pipe(sigma_iarkho)
        sigmas.append(sigma)
        
    summary = pd.DataFrame(sigmas, columns=['sigma_iarkho'])
    summary['z_score'] = (summary['sigma_iarkho'] - df['sigma_iarkho'].mean()) / df['sigma_iarkho'].std()
    summary.index = playwrights_lst
    
    return summary

In [10]:
def authors_data(data_df, feature):
    overall_mean = round(data_df[feature].mean(), 2)
    overall_std = round(data_df[feature].std(), 2)
    all_authors = pd.DataFrame(data_df.groupby(['last_name', 'first_name'])[feature].mean())
    all_authors.columns= ['mean']
    all_authors['z_score'] = (all_authors['mean'] - overall_mean) / overall_std
    
    return  all_authors

In [11]:
def playwrights_place(df, with_z_score=True):
    if with_z_score:
        column = 'z_score'
        sigma_col = column
    else:
        column = ['mean']
        sigma_col = 'sigma_iarkho'
    summary = pd.DataFrame(authors_data(df, 'num_present_characters')[column])
    summary.columns = ['num_present_characters']
    # make sure the order of the playwrights is the same
    
    ind = summary.index
    summary['mobility_coefficient'] = authors_data(df, 'mobility_coefficient', 
                                                        ).loc[ind, column]
    summary['sigma_iarkho'] = sigma_summary(df, ind)[sigma_col]
    summary['polylogues'] = authors_data(df, 'percentage_polylogues', 
                                                         ).loc[ind, column]
    summary['monologues'] = authors_data(df, 'percentage_monologues', 
                                                         ).loc[ind, column]
    summary = summary.round(2)
    if with_z_score:
        summary['monologues'] = summary['monologues'].apply(lambda x: -x)
        summary['open_form_score'] = round(summary.apply(lambda x: x.mean(), axis=1), 2)
        summary = summary.sort_values(by='open_form_score', ascending=False)
        
    return summary
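
The open-form score is the mean of the five z-scores after the sign of the monologue z-score is flipped, so that a low share of monologues raises the score. A minimal sketch with two hypothetical playwrights and invented z-scores:

```python
import pandas as pd

# Invented z-scores for illustration only.
z = pd.DataFrame(
    {
        'num_present_characters': [1.0, -0.5],
        'mobility_coefficient': [0.5, -1.0],
        'sigma_iarkho': [1.5, 0.0],
        'polylogues': [0.5, -0.5],
        'monologues': [-1.5, 1.0],
    },
    index=['Playwright A', 'Playwright B'],
)

# Fewer monologues count toward a more open form, so invert the sign.
z['monologues'] = -z['monologues']

# Average the five features per playwright and sort from most to least open.
z['open_form_score'] = z.mean(axis=1).round(2)
z = z.sort_values(by='open_form_score', ascending=False)
print(z['open_form_score'])
```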

In [12]:
comedies = pd.read_csv('../Russian_Comedies/Data/Comedies_Raw_Data.csv')
# sort by creation date
comedies_sorted = comedies.sort_values(by='creation_date').copy()
# select only original comedies (exclude translations and adaptations)
original_comedies = comedies_sorted[(comedies_sorted['translation/adaptation'] == 0)].copy()

# rename the columns 
original_comedies = original_comedies.rename(columns={'stage_directions_frequency': 'frequency',
                                                   'average_length_of_stage_direction': 'average_length',
                                                   'degree_of_verse_prose_interaction': 'verse_prose_interaction',
                                                   'num_scenes_iarkho': 'mobility_coefficient', 
                                                   'percentage_non_duologues': 'percentage_non_dialogues',
                                                   'percentage_above_two_speakers': 'percentage_polylogues',
                                                    'percentage_scenes_with_discontinuous_change_characters': 'discontinuous_scenes'})

In [13]:
# calculate the coefficient of non-used dramatic characters
unused_coefficient = get_data('../Russian_Comedies/Play_Jsons/')
unused_coefficient['index'] = unused_coefficient.index.tolist()
original_comedies = original_comedies.merge(unused_coefficient, on='index')

In [14]:
original_comedies['last_name'] = original_comedies['last_name'].str.strip()
original_comedies['speech_distribution'] = original_comedies['speech_distribution'].apply(make_list)

In [15]:
four_act = original_comedies[original_comedies.num_acts == 4].copy()
five_act = original_comedies[original_comedies.num_acts == 5].copy()
four_act['mobility_coefficient'] = round(four_act['mobility_coefficient'] * 5/4, 0)

In [16]:
combined_df = pd.concat([four_act, five_act])
combined_df = combined_df.sort_values(by='creation_date')

## Part 1. Iarkho's Original Features

### The Number of Dramatic Characters

In [17]:
summary_features(combined_df, 'num_present_characters')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,num_present_characters
mean,15.67
std,6.86
50%,14.0
min,8.0
max,34.0


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,num_present_characters
14,Griboedov,Aleksandr,Gore ot uma,1824,34


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,num_present_characters
0,Nikolev,Nikolai,Samoliubivyi stikhotvorets,1775,8
2,Efim’ev,Dmitrii,Prestupnik ot igry ili bratom prodannaia sestra,1788,8


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,num_present_characters
3,Kniazhnin,Iakov,Chudaki,1790,15
5,Kapnist,Vasilii,Iabeda,1794,15


Ranking From Highest to Lowest:


Unnamed: 0,title,last_name,first_name,creation_date,num_present_characters,rank
14,Gore ot uma,Griboedov,Aleksandr,1824,34,1.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,24,2.5
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,24,2.5
17,Nedovol’nye,Zagoskin,Mikhail,1835,23,4.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,21,5.5
19,Komediia iz sovremennoi zhizni,Krol’,Nikolai,1849,21,5.5
8,V sem''e ne bez uroda,Unknown,Unknown,1813,19,7.0
6,Novye chudaki ili Prozhekter,Golitsyn,Aleksei,1797,17,8.0
3,Chudaki,Kniazhnin,Iakov,1790,15,9.5
5,Iabeda,Kapnist,Vasilii,1794,15,9.5


### The Mobility Coefficient

In [18]:
summary_features(combined_df, 'mobility_coefficient')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,mobility_coefficient
mean,61.43
std,18.14
50%,59.0
min,41.0
max,111.0


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,mobility_coefficient
20,Grigor’ev,Petr,Zhiteiiskaia shkola,1849,111.0


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,mobility_coefficient
13,Kokoshkin,Fedor,"Vospitalie, ili vot pridanoe",1824,41.0


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,mobility_coefficient
3,Kniazhnin,Iakov,Chudaki,1790,60.0
4,Klushin,Aleksandr,Smekh i gore,1792,60.0
9,Shakhovskoi,Aleksandr,"Urok koketkam, ili lipetskie vody",1815,60.0


Ranking From Highest to Lowest:


Unnamed: 0,title,last_name,first_name,creation_date,mobility_coefficient,rank
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,111.0,1.0
14,Gore ot uma,Griboedov,Aleksandr,1824,94.0,2.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,84.0,3.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,82.0,4.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,72.0,5.0
19,Komediia iz sovremennoi zhizni,Krol’,Nikolai,1849,66.0,6.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,63.0,7.0
4,Smekh i gore,Klushin,Aleksandr,1792,60.0,9.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,60.0,9.0
3,Chudaki,Kniazhnin,Iakov,1790,60.0,9.0


### The Standard Range of the Number of Speaking Characters (Sigma)

In [19]:
summary_features(combined_df, 'sigma_iarkho')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,sigma_iarkho
mean,1.54
std,0.52
50%,1.48
min,0.74
max,2.77


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,sigma_iarkho
5,Kapnist,Vasilii,Iabeda,1794,2.77


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,sigma_iarkho
19,Krol’,Nikolai,Komediia iz sovremennoi zhizni,1849,0.74


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,sigma_iarkho
16,Zagoskin,Mikhail,Blagorodnyi teatr,1828,1.57


Ranking From Highest to Lowest:


Unnamed: 0,title,last_name,first_name,creation_date,sigma_iarkho,rank
5,Iabeda,Kapnist,Vasilii,1794,2.771,1.0
14,Gore ot uma,Griboedov,Aleksandr,1824,2.545,2.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,2.323,3.0
13,"Vospitalie, ili vot pridanoe",Kokoshkin,Fedor,1824,1.863,4.0
6,Novye chudaki ili Prozhekter,Golitsyn,Aleksei,1797,1.749,5.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,1.726,6.0
10,Tri zhenikha ili liubov‘ nyneshniago sveta,Sobolev,Aleksandr,1817,1.693,7.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,1.574,8.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,1.572,9.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,1.481,10.0


### The Percentage of Polylogues

In [20]:
summary_features(combined_df, 'percentage_polylogues')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,percentage_polylogues
mean,38.51
std,13.86
50%,39.39
min,15.38
max,61.22


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_polylogues
12,Shakhovskoi,Aleksandr,Pustodumy,1819,61.22


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_polylogues
2,Efim’ev,Dmitrii,Prestupnik ot igry ili bratom prodannaia sestra,1788,15.38


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_polylogues
3,Kniazhnin,Iakov,Chudaki,1790,38.33


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,percentage_polylogues,rank
12,Pustodumy,Shakhovskoi,Aleksandr,1819,61.22,21.0
5,Iabeda,Kapnist,Vasilii,1794,59.57,20.0
10,Tri zhenikha ili liubov‘ nyneshniago sveta,Sobolev,Aleksandr,1817,58.7,19.0
1,Khvastun,Kniazhnin,Iakov,1785,50.91,18.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,48.48,17.0
4,Smekh i gore,Klushin,Aleksandr,1792,46.67,16.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,46.55,15.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,44.44,14.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,43.33,13.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,42.34,12.0


### The Percentage of Monologues

In [21]:
summary_features(combined_df, 'percentage_monologues')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,percentage_monologues
mean,22.03
std,8.79
50%,21.21
min,6.12
max,42.31


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_monologues
2,Efim’ev,Dmitrii,Prestupnik ot igry ili bratom prodannaia sestra,1788,42.31


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_monologues
12,Shakhovskoi,Aleksandr,Pustodumy,1819,6.12


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_monologues
20,Grigor’ev,Petr,Zhiteiiskaia shkola,1849,22.52


Ranking From Highest to Lowest:


Unnamed: 0,title,last_name,first_name,creation_date,percentage_monologues,rank
2,Prestupnik ot igry ili bratom prodannaia sestra,Efim’ev,Dmitrii,1788,42.31,1.0
19,Komediia iz sovremennoi zhizni,Krol’,Nikolai,1849,33.96,2.0
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,31.11,3.0
6,Novye chudaki ili Prozhekter,Golitsyn,Aleksei,1797,29.79,4.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,28.57,5.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,27.38,6.0
14,Gore ot uma,Griboedov,Aleksandr,1824,26.67,7.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,25.42,8.0
13,"Vospitalie, ili vot pridanoe",Kokoshkin,Fedor,1824,24.24,9.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,22.52,10.0


### Speech Distribution for Each Playwright

#### Nikolai Krol' (1823 - 1871)

In [22]:
krol = speech_distribution_by_period(combined_df[combined_df.last_name == 'Krol’'])
display(krol)
print('The standard range of the number of speaking characters:', sigma_iarkho(krol))

number_of_speakers,1,2,3,4
percentage,33.96,49.06,15.09,1.89


The standard range of the number of speaking characters: 0.74


#### Rafail Zotov (1796 - 1871)

In [23]:
zotov = speech_distribution_by_period(combined_df[combined_df.last_name == 'Zotov'])
display(zotov)
print('The standard range of the number of speaking characters:', sigma_iarkho(zotov))

number_of_speakers,1,2,3,4,5
percentage,15.38,57.69,21.15,3.85,1.92


The standard range of the number of speaking characters: 0.81


#### Dmitrii Efim'ev (1768 - 1804)

In [24]:
efimev = speech_distribution_by_period(combined_df[combined_df.last_name == 'Efim’ev'])
display(efimev)
print('The standard range of the number of speaking characters:', sigma_iarkho(efimev))

number_of_speakers,1,2,3,4,5
percentage,42.31,42.31,11.54,1.92,1.92


The standard range of the number of speaking characters: 0.86


#### Nikolai Nikolev (1758 - 1815)

In [25]:
nikolev = speech_distribution_by_period(combined_df[combined_df.last_name == 'Nikolev'])
display(nikolev)
print('The standard range of the number of speaking characters:', sigma_iarkho(nikolev))

number_of_speakers,1,2,3,4,5,6
percentage,31.11,51.11,4.44,8.89,2.22,2.22


The standard range of the number of speaking characters: 1.12


#### Iakov Kniazhnin (1740 - 1791)

In [26]:
kniazhnin = speech_distribution_by_period(combined_df[combined_df.last_name == 'Kniazhnin'])
display(kniazhnin)
print('The standard range of the number of speaking characters:', sigma_iarkho(kniazhnin))

number_of_speakers,1,2,3,4,5,6,8
percentage,15.61,39.78,22.72,13.11,4.39,3.49,0.91


The standard range of the number of speaking characters: 1.32


#### Nikolai Seliavin (1774 - 1833)

In [27]:
seliavin = speech_distribution_by_period(combined_df[combined_df.last_name == 'Seliavin'])
display(seliavin)
print('The standard range of the number of speaking characters:', sigma_iarkho(seliavin))

number_of_speakers,1,2,3,4,5,7,9
percentage,25.42,50.85,13.56,3.39,3.39,1.69,1.69


The standard range of the number of speaking characters: 1.42


#### Boris Fedorov (1794 - 1875)

In [28]:
fedorov = speech_distribution_by_period(combined_df[combined_df.last_name == 'Fedorov'])
display(fedorov)
print('The standard range of the number of speaking characters:', sigma_iarkho(fedorov))

number_of_speakers,0,1,2,3,4,5,6,7
percentage,1.19,27.38,36.9,15.48,13.1,1.19,1.19,3.57


The standard range of the number of speaking characters: 1.43


#### Mikhail Zagoskin (1789 - 1852)

In [29]:
zagoskin = speech_distribution_by_period(combined_df[combined_df.last_name == 'Zagoskin'])
display(zagoskin)
print('The standard range of the number of speaking characters:', sigma_iarkho(zagoskin))

number_of_speakers,1,2,3,4,5,6,7
percentage,20.09,32.39,18.58,16.19,8.1,3.14,1.52


The standard range of the number of speaking characters: 1.44


Note on Fedorov's distribution above: the `0` column is not an error. Fedorov's comedy *Chudnyia vstrechi* indeed has some scenes with no speaking characters.

#### Aleksandr Klushin (1763 - 1804)

In [30]:
klushin = speech_distribution_by_period(combined_df[combined_df.last_name == 'Klushin'])
display(klushin)
print('The standard range of the number of speaking characters:', sigma_iarkho(klushin))

number_of_speakers,1,2,3,4,5,6,7,8
percentage,20.0,33.33,26.67,11.67,1.67,3.33,1.67,1.67


The standard range of the number of speaking characters: 1.48


#### Petr Grigor’ev (1807 - 1854)

In [31]:
grigorev = speech_distribution_by_period(combined_df[combined_df.last_name == 'Grigor’ev'])
display(grigorev)
print('The standard range of the number of speaking characters:', sigma_iarkho(grigorev))

number_of_speakers,0,1,2,3,4,5,6,7,9
percentage,1.8,22.52,33.33,19.82,11.71,8.11,0.9,0.9,0.9


The standard range of the number of speaking characters: 1.48


#### Aleksandr Shakhovskoi (1777 - 1846)

In [32]:
shakhovskoi = speech_distribution_by_period(combined_df[combined_df.last_name == 'Shakhovskoi'])
display(shakhovskoi)
print('The standard range of the number of speaking characters:', sigma_iarkho(shakhovskoi))

number_of_speakers,1,2,3,4,5,6,7,9,10
percentage,12.22,35.49,23.45,19.74,4.54,1.86,1.02,0.84,0.84


The standard range of the number of speaking characters: 1.5


#### Vasilii Golovin (1776 - 1831)

In [33]:
golovin = speech_distribution_by_period(combined_df[combined_df.last_name == 'Golovin'])
display(golovin)
print('The standard range of the number of speaking characters:', sigma_iarkho(golovin))

number_of_speakers,1,2,3,4,5,6,7,9
percentage,28.57,46.03,12.7,1.59,4.76,3.17,1.59,1.59


The standard range of the number of speaking characters: 1.57


#### Aleksandr Sobolev

In [34]:
sobolev = speech_distribution_by_period(combined_df[combined_df.last_name == 'Sobolev'])
display(sobolev)
print('The standard range of the number of speaking characters:', sigma_iarkho(sobolev)) 

number_of_speakers,1,2,3,4,5,6,9
percentage,13.04,28.26,19.57,19.57,8.7,8.7,2.17


The standard range of the number of speaking characters: 1.69


#### Fedor Kokoshkin (1773 - 1838)

In [35]:
kokoshkin = speech_distribution_by_period(combined_df[combined_df.last_name == 'Kokoshkin'])
display(kokoshkin)
print('The standard range of the number of speaking characters:', sigma_iarkho(kokoshkin))

number_of_speakers,1,2,3,4,6,7,9
percentage,24.24,36.36,18.18,9.09,6.06,3.03,3.03


The standard range of the number of speaking characters: 1.86


#### Aleksei Golitsyn (1767 - 1800)

In [36]:
golitsyn = speech_distribution_by_period(combined_df[combined_df.last_name == 'Golitsyn'])
display(golitsyn)
print('The standard range of the number of speaking characters:', sigma_iarkho(golitsyn))

number_of_speakers,1,2,3,4,5,9
percentage,29.79,34.04,19.15,8.51,4.26,4.26


The standard range of the number of speaking characters: 1.75


#### Anonymous

In [37]:
anonymous = speech_distribution_by_period(combined_df[combined_df.last_name == 'Unknown'])
display(anonymous)
print('The standard range of the number of speaking characters:', sigma_iarkho(anonymous))

number_of_speakers,1,2,3,4,5,6,7,9,13
percentage,20.0,35.56,15.56,13.33,4.44,2.22,4.44,2.22,2.22


The standard range of the number of speaking characters: 2.32


#### Aleksandr Griboedov (1795 - 1829)

In [38]:
griboedov = speech_distribution_by_period(combined_df[combined_df.last_name == 'Griboedov'])
display(griboedov)
print('The standard range of the number of speaking characters:', sigma_iarkho(griboedov))

number_of_speakers,1,2,3,4,5,9,10,19
percentage,26.67,41.33,18.67,6.67,1.33,2.67,1.33,1.33


The standard range of the number of speaking characters: 2.54


#### Vasilii Kapnist (1758 - 1823)

In [39]:
kapnist = speech_distribution_by_period(combined_df[combined_df.last_name == 'Kapnist'])
display(kapnist)
print('The standard range of the number of speaking characters:', sigma_iarkho(kapnist))

number_of_speakers,1,2,3,4,5,6,7,8,9,10,12
percentage,6.38,34.04,23.4,8.51,2.13,2.13,8.51,4.26,4.26,4.26,2.13


The standard range of the number of speaking characters: 2.77


### Observations:
- Nikolai Krol' has the narrowest distribution: no scene in his comedy has more than 4 speaking characters.
- Aleksandr Griboedov has the widest distribution: the largest scene in his comedy has 19 speaking characters.

## Part 2. Stage Directions

### Stage Directions Frequency

In [40]:
summary_features(combined_df, 'frequency')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,frequency
mean,18.58
std,5.19
50%,17.3
min,5.23
max,29.57


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,frequency
20,Grigor’ev,Petr,Zhiteiiskaia shkola,1849,29.57


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,frequency
10,Sobolev,Aleksandr,Tri zhenikha ili liubov‘ nyneshniago sveta,1817,5.23


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,frequency
16,Zagoskin,Mikhail,Blagorodnyi teatr,1828,18.56


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,frequency,rank
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,29.571,21.0
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,24.458,20.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,24.099,19.0
13,"Vospitalie, ili vot pridanoe",Kokoshkin,Fedor,1824,23.567,18.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,22.89,17.0
2,Prestupnik ot igry ili bratom prodannaia sestra,Efim’ev,Dmitrii,1788,22.669,16.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,22.029,15.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,21.048,14.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,19.604,13.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,18.557,12.0


### The Average Length of Stage Directions

In [41]:
summary_features(combined_df, 'average_length')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,average_length
mean,2.69
std,0.74
50%,2.71
min,1.09
max,3.94


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,average_length
14,Griboedov,Aleksandr,Gore ot uma,1824,3.94


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,average_length
7,Seliavin,Nikolai,Zhenikhi ili pobezhdennyi predrassudok,1806,1.09


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,average_length
0,Nikolev,Nikolai,Samoliubivyi stikhotvorets,1775,2.71


Ranking From Highest to Lowest:


Unnamed: 0,title,last_name,first_name,creation_date,average_length,rank
14,Gore ot uma,Griboedov,Aleksandr,1824,3.944,1.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,3.588,2.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,3.565,3.0
5,Iabeda,Kapnist,Vasilii,1794,3.411,4.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,3.384,5.0
18,Novaia shkola muzhei,Zotov,Rafail,1842,3.111,6.0
4,Smekh i gore,Klushin,Aleksandr,1792,3.024,7.0
13,"Vospitalie, ili vot pridanoe",Kokoshkin,Fedor,1824,3.017,8.0
3,Chudaki,Kniazhnin,Iakov,1790,2.793,9.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,2.758,10.0


### The Degree of Verse and Prose Interaction

In [42]:
summary_features(combined_df, 'verse_prose_interaction')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,verse_prose_interaction
mean,7.67
std,3.18
50%,7.74
min,1.08
max,14.22


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,verse_prose_interaction
9,Shakhovskoi,Aleksandr,"Urok koketkam, ili lipetskie vody",1815,14.22


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,verse_prose_interaction
10,Sobolev,Aleksandr,Tri zhenikha ili liubov‘ nyneshniago sveta,1817,1.08


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,verse_prose_interaction
11,Fedorov,Boris,Chudnyia vstrechi,1818,7.74


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,verse_prose_interaction,rank
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,14.222,21.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,11.457,20.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,11.152,19.0
12,Pustodumy,Shakhovskoi,Aleksandr,1819,11.073,18.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,10.206,17.0
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,9.699,16.0
13,"Vospitalie, ili vot pridanoe",Kokoshkin,Fedor,1824,9.438,15.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,8.82,14.0
18,Novaia shkola muzhei,Zotov,Rafail,1842,8.772,13.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,8.342,12.0


### Compare Griboedov's *Gore ot uma* With Tragedies and Comedies
Here, we will compare *Gore ot uma* with the tragedies of Period Two and the comedies of the tentative Period Two, based on the mean values of the features that we have for both genres:
- stage directions frequency;
- average length of a stage direction;
- the degree of verse and prose interaction.

We will use cosine similarity to determine whether *Gore ot uma* is closer to comedies or tragedies based on these features.

In [43]:
# feature vector for Gore ot uma: stage directions frequency, average length of a stage direction,
# and the degree of verse and prose interaction
woe_from_wit = combined_df[combined_df.title == 'Gore ot uma'][['frequency', 'average_length', 'verse_prose_interaction']].values[0]

# the mean values for stage directions frequency, average length, and the degree of verse and prose interaction are from
# https://github.com/innawendell/European_Comedy/blob/master/Analyses/Sperantov_Tragedy/Tragedy_Sperantov_Analysis.ipynb
tragedies_period_two = np.array([8.32, 4.13, 1.84])

# the mean values for stage directions frequency, average length, and the degree of verse and prose interaction are from
# https://github.com/innawendell/European_Comedy/blob/master/Analyses/The%20Analysis%20of%20The%20Evolution%20of%20The%20Russian%20Comedy.ipynb
comedies_tent_period_two = np.array([19.14, 2.45, 8.44])

# cosine_similarity expects 2D arrays, hence the reshape to a single row
print('Cosine similarity between tragedies of Period Two and Woe From Wit:',
      round(cosine_similarity(woe_from_wit.reshape(1, -1),
                              tragedies_period_two.reshape(1, -1))[0][0], 3))

print('Cosine similarity between comedies of the tentative Period Two and Woe From Wit:',
      round(cosine_similarity(woe_from_wit.reshape(1, -1),
                              comedies_tent_period_two.reshape(1, -1))[0][0], 3))

Cosine similarity between tragedies of Period Two and Woe From Wit: 0.979
Cosine similarity between comedies of the tentative Period Two and Woe From Wit: 0.984
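
To make the measure itself concrete, here is a minimal sketch of cosine similarity in plain Python. The helper `cosine` and the toy vectors are illustrative assumptions, not values from the corpus or part of the original analysis. Cosine similarity depends only on the direction of the vectors, not on their magnitude, and for vectors with no negative components it always falls between 0 and 1, so two similarities that are both close to 1 may be separated by only a small margin.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (not taken from the comedies or tragedies).
same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # proportional vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])                # no shared direction

print(round(same_direction, 3), round(orthogonal, 3))
```

Proportional vectors score 1 regardless of their length, which is why the measure compares the relative balance of the three features rather than their absolute sizes.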


## Part 3. Verse Features

### The Percentage of Scenes With Split Verse Lines

In [44]:
summary_features(combined_df, 'percentage_scene_split_verse')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,percentage_scene_split_verse
mean,30.71
std,13.74
50%,30.44
min,3.33
max,56.0


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scene_split_verse
12,Shakhovskoi,Aleksandr,Pustodumy,1819,56.0


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scene_split_verse
4,Klushin,Aleksandr,Smekh i gore,1792,3.33


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scene_split_verse
0,Nikolev,Nikolai,Samoliubivyi stikhotvorets,1775,30.44


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,percentage_scene_split_verse,rank
12,Pustodumy,Shakhovskoi,Aleksandr,1819,56.0,21.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,55.0,20.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,44.828,19.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,40.678,18.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,40.541,17.0
18,Novaia shkola muzhei,Zotov,Rafail,1842,40.385,16.0
19,Komediia iz sovremennoi zhizni,Krol’,Nikolai,1849,39.623,15.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,37.879,14.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,37.778,13.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,33.333,12.0


### The Percentage of Scenes With Split Rhymes

In [45]:
summary_features(combined_df, 'percentage_scene_split_rhymes')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,percentage_scene_split_rhymes
mean,39.44
std,15.16
50%,36.74
min,6.67
max,67.31


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scene_split_rhymes
18,Zotov,Rafail,Novaia shkola muzhei,1842,67.31


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scene_split_rhymes
4,Klushin,Aleksandr,Smekh i gore,1792,6.67


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scene_split_rhymes
8,Unknown,Unknown,V sem''e ne bez uroda,1813,40.0


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,percentage_scene_split_rhymes,rank
18,Novaia shkola muzhei,Zotov,Rafail,1842,67.308,21.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,63.333,20.0
12,Pustodumy,Shakhovskoi,Aleksandr,1819,60.0,19.0
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,52.174,18.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,51.724,17.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,50.0,16.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,47.619,15.0
3,Chudaki,Kniazhnin,Iakov,1790,46.667,14.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,40.0,13.0
14,Gore ot uma,Griboedov,Aleksandr,1824,38.667,12.0


### The Percentage of Open Scenes

In [46]:
summary_features(combined_df, 'percentage_open_scenes')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,percentage_open_scenes
mean,55.6
std,18.23
50%,54.76
min,6.67
max,85.0


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_open_scenes
9,Shakhovskoi,Aleksandr,"Urok koketkam, ili lipetskie vody",1815,85.0


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_open_scenes
4,Klushin,Aleksandr,Smekh i gore,1792,6.67


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_open_scenes
11,Fedorov,Boris,Chudnyia vstrechi,1818,54.76


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,percentage_open_scenes,rank
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,85.0,21.0
18,Novaia shkola muzhei,Zotov,Rafail,1842,82.692,20.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,79.31,19.0
12,Pustodumy,Shakhovskoi,Aleksandr,1819,78.0,18.0
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,65.217,17.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,64.444,16.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,63.636,15.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,61.017,14.0
15,Pisateli mezhdu soboi,Golovin,Vasilii,1827,60.317,13.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,58.559,12.0


### The Percentage of Scenes With Split Verse Lines and Rhymes

In [47]:
summary_features(combined_df, 'percentage_scenes_rhymes_split_verse')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,percentage_scenes_rhymes_split_verse
mean,14.55
std,8.95
50%,12.7
min,3.33
max,38.0


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scenes_rhymes_split_verse
12,Shakhovskoi,Aleksandr,Pustodumy,1819,38.0


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scenes_rhymes_split_verse
4,Klushin,Aleksandr,Smekh i gore,1792,3.33


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,percentage_scenes_rhymes_split_verse
1,Kniazhnin,Iakov,Khvastun,1785,14.54


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,percentage_scenes_rhymes_split_verse,rank
12,Pustodumy,Shakhovskoi,Aleksandr,1819,38.0,21.0
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,33.333,20.0
18,Novaia shkola muzhei,Zotov,Rafail,1842,25.0,19.0
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,24.242,18.0
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,17.391,17.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,17.241,16.0
1,Khvastun,Kniazhnin,Iakov,1785,14.545,15.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,13.559,14.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,13.514,13.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,13.333,12.0


## Part 4. Other Features

### The Coefficient of Unused Dramatic Characters

In [48]:
summary_features(combined_df, 'coefficient_unused')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,coefficient_unused
mean,23.37
std,11.11
50%,24.62
min,5.0
max,45.48


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,coefficient_unused
16,Zagoskin,Mikhail,Blagorodnyi teatr,1828,45.48


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,coefficient_unused
7,Seliavin,Nikolai,Zhenikhi ili pobezhdennyi predrassudok,1806,5.0


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,coefficient_unused
9,Shakhovskoi,Aleksandr,"Urok koketkam, ili lipetskie vody",1815,22.43


Ranking From Highest to Lowest:


Unnamed: 0,title,last_name,first_name,creation_date,coefficient_unused,rank
16,Blagorodnyi teatr,Zagoskin,Mikhail,1828,45.48,1.0
17,Nedovol’nye,Zagoskin,Mikhail,1835,41.73,2.0
14,Gore ot uma,Griboedov,Aleksandr,1824,36.77,3.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,34.63,4.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,33.33,5.0
3,Chudaki,Kniazhnin,Iakov,1790,32.13,6.0
20,Zhiteiiskaia shkola,Grigor’ev,Petr,1849,26.52,7.0
12,Pustodumy,Shakhovskoi,Aleksandr,1819,25.87,8.0
5,Iabeda,Kapnist,Vasilii,1794,25.1,9.0
6,Novye chudaki ili Prozhekter,Golitsyn,Aleksei,1797,24.84,10.0


### The Percentage of Discontinuous Scenes

In [49]:
summary_features(combined_df, 'discontinuous_scenes')

Mean, standard deviation, median, min and max values for the period:


Unnamed: 0,discontinuous_scenes
mean,6.38
std,4.16
50%,6.38
min,1.52
max,17.02


Period Max:


Unnamed: 0,last_name,first_name,title,creation_date,discontinuous_scenes
6,Golitsyn,Aleksei,Novye chudaki ili Prozhekter,1797,17.02


Period Min:


Unnamed: 0,last_name,first_name,title,creation_date,discontinuous_scenes
16,Zagoskin,Mikhail,Blagorodnyi teatr,1828,1.52


The closest to the mean:


Unnamed: 0,last_name,first_name,title,creation_date,discontinuous_scenes
5,Kapnist,Vasilii,Iabeda,1794,6.38


Ranking From Lowest to Highest:


Unnamed: 0,title,last_name,first_name,creation_date,discontinuous_scenes,rank
6,Novye chudaki ili Prozhekter,Golitsyn,Aleksei,1797,17.021,21.0
19,Komediia iz sovremennoi zhizni,Krol’,Nikolai,1849,16.981,20.0
2,Prestupnik ot igry ili bratom prodannaia sestra,Efim’ev,Dmitrii,1788,9.615,19.0
10,Tri zhenikha ili liubov‘ nyneshniago sveta,Sobolev,Aleksandr,1817,8.696,18.0
11,Chudnyia vstrechi,Fedorov,Boris,1818,7.143,17.0
7,Zhenikhi ili pobezhdennyi predrassudok,Seliavin,Nikolai,1806,6.78,16.0
8,V sem''e ne bez uroda,Unknown,Unknown,1813,6.667,13.5
9,"Urok koketkam, ili lipetskie vody",Shakhovskoi,Aleksandr,1815,6.667,13.5
0,Samoliubivyi stikhotvorets,Nikolev,Nikolai,1775,6.667,13.5
4,Smekh i gore,Klushin,Aleksandr,1792,6.667,13.5


### Summary:

1. The first five-act verse comedy in the history of Russian literature, Nikolai Nikolev's *Samoliubivyi stikhotvorets* (1775), had the minimum number of dramatic characters (8) and the minimum mobility coefficient (45). It was the closest to the mean based on the average length of a stage direction (2.71) and the percentage of scenes with split verse lines (30.44%).

2. Iakov Kniazhnin's *Khvastun* (1785) was the closest to the mean based on the percentage of scenes with split verse lines and rhymes (14.54%) and the coefficient of unused dramatic characters (18.85).

3. Dmitrii Efim’ev's *Prestupnik ot igry ili bratom prodannaia sestra* (1788) had the highest percentage of monologues (42.31%), the minimum number of dramatic characters (8), and the minimum percentage of polylogues (15.38%).

4. Iakov Kniazhnin's *Chudaki* (1790) was the closest to the mean based on the number of dramatic characters (15), the mobility coefficient (60), and the percentage of polylogues (38.33%).

5. Aleksandr Klushin's *Smekh i gore* (1792) had the minimum percentage of scenes with split verse lines (3.33%), the minimum percentage of scenes with split rhymes (6.67%), the minimum percentage of open scenes (6.67%), and the minimum percentage of scenes with split verse lines and rhymes (3.33%). It was the closest to the mean based on the mobility coefficient (60).

6. Vasilii Kapnist's *Iabeda* (1794) had the highest observed sigma (2.77), and it was the closest to the mean based on the percentage of discontinuous scenes (6.38%) and the number of dramatic characters (15).

7. Aleksei Golitsyn's *Novye chudaki ili Prozhekter* (1797) had the maximum percentage of discontinuous scenes (17.02%).

8. Nikolai Seliavin's *Zhenikhi ili pobezhdennyi predrassudok* (1806) had the minimum average length of a stage direction (1.09) and the minimum coefficient of unused dramatic characters (5.0).

9. *V sem''e ne bez uroda* (1813) by an anonymous author was the closest to the mean based on the percentage of scenes with split rhymes (40%).

10. Aleksandr Shakhovskoi's *Urok koketkam, ili lipetskie vody* (1815) had the highest percentage of open scenes (85%) and the maximum degree of verse and prose interaction (14.22). It was also the closest to the mean based on the mobility coefficient (60) and the coefficient of unused dramatic characters (22.43).

11. Aleksandr Sobolev's *Tri zhenikha ili liubov‘ nyneshniago sveta* (1817) had the minimum frequency of stage directions (5.23) and the minimum degree of verse and prose interaction (1.08).

12. Boris Fedorov's *Chudnyia vstrechi* (1818) was the closest to the central tendency based on the degree of verse and prose interaction (7.74) and the percentage of open scenes (54.76%).

13. Aleksandr Shakhovskoi's *Pustodumy* (1819) had the maximum percentage of polylogues (61.22%), the maximum percentage of scenes with split verse lines (56%), and the maximum percentage of scenes with split verse lines and rhymes (38%); it also had the minimum percentage of monologues (6.12%). 

14. Fedor Kokoshkin's *Vospitalie, ili vot pridanoe* (1824) had the minimum mobility coefficient of 41.

15. Aleksandr Griboedov's *Gore ot uma* (1824) had:
    - the maximum number of dramatic characters (34).
    - the maximum average length of a stage direction (3.944).
    - the second highest mobility coefficient (94), after Petr Grigor’ev's *Zhiteiiskaia shkola* (1849) with 111.
    - the second highest sigma (2.545), after Vasilii Kapnist's *Iabeda* (1794) with 2.771.
    - the third highest coefficient of unused dramatic characters (36.77).
    - the seventh lowest percentage of polylogues (32%).
    - the seventh highest percentage of monologues (26.67%).
    - the second lowest stage directions frequency (14.204), after Aleksandr Sobolev's *Tri zhenikha ili liubov‘ nyneshniago sveta* with 5.229.
    - the fourth lowest degree of verse and prose interaction (5.053).
    - the third lowest percentage of scenes with split verse lines (14.667%).
    - the tenth highest (the same as the twelfth lowest) percentage of scenes with split rhymes (38.667%).
    - the fourth lowest percentage of open scenes (44%).
    - the sixth lowest percentage of scenes with split verse lines and rhymes (9.333%).
    - the tenth lowest percentage of discontinuous scenes (5.333%).
    - We had a limited number of features that could serve as a basis of comparison for the Russian five-act comedies and tragedies (the frequency of stage directions, the average length of a stage direction, and the degree of verse and prose interaction). Based on the mean values of these features, *Gore ot uma* appears to be closer to comedies (cosine similarity = 0.984) than to tragedies (0.979), though only by a small margin. It would be insightful to obtain more features for the Russian tragedies (including the number of dramatic characters, the mobility coefficient, etc.) and determine whether *Gore ot uma* is closer to tragedies or comedies based on those features.

16. Mikhail Zagoskin's *Blagorodnyi teatr* (1828) had the maximum coefficient of unused dramatic characters (45.48) and the minimum percentage of discontinuous scenes (1.52%). It was the closest to the mean based on the standard range of the number of speaking characters (1.57) and the frequency of stage directions.

17. Rafail Zotov's *Novaia shkola muzhei* (1842) had the maximum percentage of scenes with split rhymes (67.31%).

18. Nikolai Krol’'s *Komediia iz sovremennoi zhizni* (1849) had the minimum standard range of the number of speaking characters (0.74).

19. Petr Grigor’ev's *Zhiteiiskaia shkola* (1849) had the maximum mobility coefficient (111) and the maximum frequency of stage directions (29.57). It was the closest to the mean based on the percentage of monologues (22.52%).



## Open-Form Scores

We would also like to place each author's comedic style in the context of the history of the Russian comedy in verse. As was the case with the French comedians, we will use the following features:

- the number of dramatic characters
- the mean mobility coefficient
- the standard range of the number of speaking characters (sigma)
- the mean percentage of polylogues
- the mean percentage of monologues. 


#### Open Form Scores:
For all authors and features, we will calculate the z-score: $z = (x - u) / s$, where $x$ is the mean value of the feature for a playwright, $u$ is the mean of the feature, and $s$ is the standard deviation of the feature. For the percentage of monologues, we will reverse the sign, i.e., use $-z$, since a lower percentage of monologues indicates a more open form.
The open form score will be the mean of the five z-scores.

For example, for the number of dramatic characters in Aleksandr Griboedov's *Gore ot uma*, $x = 34$, which yields a z-score of 2.67.
After we repeat this calculation for all features, we arrive at the z-scores 2.67, 1.80, 1.91, and -0.47, and at a sign-reversed z-score of -0.53 for the percentage of monologues. His open form score is therefore (2.67 + 1.80 + 1.91 - 0.47 - 0.53) / 5 ≈ 1.08. Open form scores can be positive or negative: a high positive score indicates a more open form, whereas a large negative score indicates a less open form.
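
The arithmetic above can be sketched in a few lines of plain Python. This is an illustration only, not the `playwrights_place` function used in this notebook: the helpers `z_scores` and `open_form_score` are hypothetical names, and the use of the sample standard deviation is an assumption, since the text does not say which variant was applied. The two input lists are the per-feature z-scores reported for Griboedov and Kapnist, with the monologue value already sign-reversed.

```python
from statistics import mean, stdev


def z_scores(values, reverse=False):
    """Return z = (x - u) / s for each value; reverse=True flips the sign."""
    u = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    sign = -1.0 if reverse else 1.0
    return [sign * (x - u) / s for x in values]


def open_form_score(feature_z_scores):
    """Mean of the per-feature z-scores, rounded to two decimals."""
    return round(mean(feature_z_scores), 2)


# z-scores in the order: characters, mobility, sigma, polylogues, -monologues
griboedov = open_form_score([2.67, 1.80, 1.91, -0.47, -0.53])
kapnist = open_form_score([-0.10, -0.80, 2.35, 1.52, 1.78])
print(griboedov, kapnist)

# Hypothetical monologue percentages: the lowest value gets the highest score.
print(z_scores([10.0, 20.0, 30.0], reverse=True))
```

Reversing the sign for monologues keeps every feature pointing the same way, so a larger z-score always means a more open form before the five values are averaged.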

In [50]:
results_with_open_form = playwrights_place(combined_df, with_z_score=True)

In [51]:
results_raw = playwrights_place(combined_df, with_z_score=False).loc[results_with_open_form.index, :]
results_raw['open_form_score'] = results_with_open_form.open_form_score.tolist()

### Raw Numbers

In [52]:
results_raw

last_name,first_name,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues,open_form_score
Griboedov,Aleksandr,34.0,94.0,2.54,32.0,26.67,1.08
Kapnist,Vasilii,15.0,47.0,2.77,59.57,6.38,0.95
Grigor’ev,Petr,21.0,111.0,1.48,42.34,22.52,0.72
Zagoskin,Mikhail,23.5,77.0,1.44,47.52,20.09,0.54
Unknown,Unknown,19.0,45.0,2.32,44.44,20.0,0.35
Fedorov,Boris,24.0,84.0,1.43,34.52,27.38,0.27
Shakhovskoi,Aleksandr,13.0,54.5,1.5,52.28,12.22,0.25
Sobolev,Aleksandr,9.0,46.0,1.69,58.7,13.04,0.19
Kniazhnin,Iakov,14.5,57.5,1.32,44.62,15.61,0.07
Klushin,Aleksandr,9.0,60.0,1.48,46.67,20.0,-0.07


### Z-Scores and Open-Form Scores

In [53]:
results_with_open_form 

last_name,first_name,num_present_characters,mobility_coefficient,sigma_iarkho,polylogues,monologues,open_form_score
Griboedov,Aleksandr,2.67,1.8,1.91,-0.47,-0.53,1.08
Kapnist,Vasilii,-0.1,-0.8,2.35,1.52,1.78,0.95
Grigor’ev,Petr,0.78,2.73,-0.11,0.28,-0.06,0.72
Zagoskin,Mikhail,1.14,0.86,-0.19,0.65,0.22,0.54
Unknown,Unknown,0.49,-0.91,1.49,0.43,0.23,0.35
Fedorov,Boris,1.21,1.24,-0.21,-0.29,-0.61,0.27
Shakhovskoi,Aleksandr,-0.39,-0.38,-0.07,0.99,1.12,0.25
Sobolev,Aleksandr,-0.97,-0.85,0.29,1.46,1.02,0.19
Kniazhnin,Iakov,-0.17,-0.22,-0.42,0.44,0.73,0.07
Klushin,Aleksandr,-0.97,-0.08,-0.11,0.59,0.23,-0.07


In [54]:
results_with_open_form[results_with_open_form.open_form_score > 0].shape[0]

9

In [55]:
results_with_open_form.shape

(18, 6)

### Conclusion:
- Aleksandr Griboedov (as represented by *Gore ot uma*) was the most experimental playwright in the history of the Russian five-act and four-act comedies (open-form score of 1.08). His comedy *Gore ot uma* (1824) had the maximum number of dramatic characters (34) and the maximum average length of a stage direction. It had the highest observed number of speaking characters (19). It also had the second highest mobility coefficient (94) and the second highest standard range of the number of speaking characters (2.545). It had the third highest coefficient of unused dramatic characters (36.77).
- Vasilii Kapnist was in second place with an open-form score of 0.95. His comedy *Iabeda* (1794) had the highest observed sigma (2.77).
- Dmitrii Efim’ev had the lowest open-form score (-1.36). His comedy *Prestupnik ot igry ili bratom prodannaia sestra* (1788) had the highest percentage of monologues (42.31%), the minimum number of dramatic characters (8), and the minimum percentage of polylogues (15.38%).
- Authors with positive open-form scores as well as negative open-form scores co-existed during the tentative Period One (1775 to 1794) and the tentative Period Two (1795 to 1849).
- Half of the Russian comedians (9 of 18) had a positive open-form score, i.e., wrote in an open style, while the other half wrote in a closed style.