## Original Comedies and Their Translations or Imitations

In this analysis, we will compare:
1. French comedies and their translations into Russian;
2. French and Russian adaptations or imitations of Richard Brinsley Sheridan's *The School For Scandal* (1777);
3. French adaptations of French comedies.

We will compare such features as: 
- The number of dramatic characters;
- The mobility coefficient;
- The standard range of the number of speaking characters (sigma);
- The percentage of non-dialogues;
- The percentage of polylogues;
- The percentage of monologues;
- The coefficient of unused dramatic characters;
- The percentage of discontinuous scenes.

Additionally, we will compute cosine similarity to determine how similar the comedies are to one another. 

In [1]:
import pandas as pd
import numpy as np
from os import listdir
import json
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
def get_data(input_directory):
    # keep only files whose name ends with the .json extension
    all_files = [f for f in listdir(input_directory) if f.endswith('.json')]
    dfs = []
    for file in all_files:
        with open(input_directory + '/' + file) as json_file:
            data = json.load(json_file)
            not_used = coefficient_unused_dramatic_characters(data)
            df = pd.DataFrame([not_used], columns=['coefficient_unused'], index=[file.replace('.json','')])
            dfs.append(df)
            
    features_df = pd.concat(dfs, axis=0, sort=False).round(2)
    
    return features_df

In [3]:
def coefficient_unused_dramatic_characters(data):
    total_present = 0
    total_non_speakers = 0
    for act in data['play_summary'].keys():
        for scene in data['play_summary'][act].keys():
            # identify the raw number of non-speaking dramatic characters
            num_non_speakers = len([item for item in data['play_summary'][act][scene].items() 
                                if (item[1] == 0  or item[1] == 'non_speaking') and item[0] not in ['num_utterances',
                                                                   'num_speakers',
                                                                   'perc_non_speakers']])
            total_non_speakers += num_non_speakers
            # calculate the total number of dramatic characters
            total_present += (data['play_summary'][act][scene]['num_speakers'] + num_non_speakers)
    coefficient_unused = (total_non_speakers / total_present ) * 100        
    
    return coefficient_unused
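
To make the input format of `coefficient_unused_dramatic_characters` easier to follow, here is a minimal sketch on a toy play summary. The character names, act and scene keys, and all counts below are hypothetical; only the three summary keys (`num_utterances`, `num_speakers`, `perc_non_speakers`) and the `0` / `'non_speaking'` markers come from the function above.

```python
# Keys that hold scene-level summaries rather than dramatic characters.
SUMMARY_KEYS = {'num_utterances', 'num_speakers', 'perc_non_speakers'}

def coefficient_unused(play_summary):
    """Percentage of present characters who do not speak, summed over scenes."""
    total_present = 0
    total_non_speakers = 0
    for scenes in play_summary.values():
        for scene in scenes.values():
            # a character is "unused" in a scene if present but silent
            non_speakers = sum(1 for name, value in scene.items()
                               if name not in SUMMARY_KEYS
                               and (value == 0 or value == 'non_speaking'))
            total_non_speakers += non_speakers
            total_present += scene['num_speakers'] + non_speakers
    return total_non_speakers / total_present * 100

# Hypothetical two-scene play: one silent character in each scene.
toy_summary = {
    'act_1': {
        'scene_1': {'CHARACTER_A': 5, 'CHARACTER_B': 3, 'CHARACTER_C': 0,
                    'num_utterances': 8, 'num_speakers': 2,
                    'perc_non_speakers': 33.33},
        'scene_2': {'CHARACTER_A': 4, 'CHARACTER_D': 'non_speaking',
                    'num_utterances': 4, 'num_speakers': 1,
                    'perc_non_speakers': 50.0},
    }
}

toy_coefficient = coefficient_unused(toy_summary)
print(round(toy_coefficient, 2))
```

In this toy case, 2 of the 5 character appearances are silent, so the coefficient is 40%.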

In [4]:
russian_comedies = pd.read_csv('../Russian_Comedies/Data/Comedies_Raw_Data.csv')
# calculate the coefficient of non-used dramatic characters
unused_coefficient = get_data('../Russian_Comedies/Play_Jsons/')
unused_coefficient['index'] = unused_coefficient.index.tolist()
russian_comedies = russian_comedies.merge(unused_coefficient, on='index')


french_comedies = pd.read_csv('../French_Comedies/Data/French_Comedies_Data.csv')

# calculate the coefficient of non-used dramatic characters
unused_coefficient = get_data('../French_Comedies/Play_Jsons/')
unused_coefficient['index'] = unused_coefficient.index.tolist()
french_comedies = french_comedies.merge(unused_coefficient, on='index')
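
Because `merge` performs an inner join by default, a play without a matching JSON file would silently drop out of the table. A possible sanity check, sketched on hypothetical stand-in data (the `index` values and titles below are invented for illustration), uses the `indicator` option to list unmatched rows:

```python
import pandas as pd

# Hypothetical stand-ins for the CSV data and the JSON-derived coefficients.
comedies = pd.DataFrame({'index': ['R_1', 'R_2', 'R_3'],
                         'title': ['Play A', 'Play B', 'Play C']})
unused = pd.DataFrame({'index': ['R_1', 'R_2'],
                       'coefficient_unused': [14.29, 24.41]})

# A left join with indicator=True keeps every play and flags missing matches.
checked = comedies.merge(unused, on='index', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'index'].tolist()
print('Plays without a coefficient:', missing)
```

An empty list would mean that the inner join used in the cell above loses no plays.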

In [5]:
renaming_dict = {'num_scenes_iarkho': 'mobility_coefficient',
                 'percentage_non_duologues': 'percentage_non_dialogues',
                 'percentage_above_two_speakers': 'percentage_polylogues',
                 'percentage_scenes_with_discontinuous_change_characters': 'discontinuous_scenes',
                 'sigma_iarkho': 'standard_range'}
# rename column names
russian_comedies = russian_comedies.rename(columns=renaming_dict)
french_comedies = french_comedies.rename(columns=renaming_dict)


In [6]:
# load contrastive data
contr_data_df = pd.read_csv('../Contrastive_Material/Contrastive_Material_Data.csv')
contr_data_df  = contr_data_df.rename(columns=renaming_dict)

In [7]:
features = ['num_present_characters',
            'mobility_coefficient', 
            'standard_range', 
            'percentage_non_dialogues',
            'percentage_polylogues',
            'percentage_monologues',
            'coefficient_unused',
            'discontinuous_scenes']

## Part 1. French Comedies and Their Translations Into Russian

### Molière's *L'École des femmes* (1662) and Nikolai Khmel’nitskii's *Shkola zhenshchin, after Molière* (1819)

In [8]:
first_df = french_comedies[french_comedies.title == 'L\'École des femmes'].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['L\'École des femmes']
second_df = russian_comedies[russian_comedies.title =='Shkola zhenshchin, after Molière'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['Shkola zhenshchin']

first_df['Shkola zhenshchin'] = second_df
display(first_df)

Unnamed: 0,L'École des femmes,Shkola zhenshchin
num_present_characters,9.0,7.0
mobility_coefficient,37.0,41.0
standard_range,1.08,0.8
percentage_non_dialogues,59.46,58.54
percentage_polylogues,27.03,21.95
percentage_monologues,32.43,36.59
coefficient_unused,18.56,14.29
discontinuous_scenes,0.0,0.0


In his 1819 translation of Molière's *L'École des femmes*, Nikolai Khmel’nitskii made the following changes:
- decreased the number of dramatic characters who appear on stage from 9 to 7;
- increased the mobility coefficient from 37 to 41;
- decreased the standard range of the number of speaking characters from 1.08 to 0.80;
- slightly decreased the percentage of non-dialogues from 59.46% to 58.54%;
- decreased the percentage of polylogues from 27.03% to 21.95%;
- increased the percentage of monologues from 32.43% to 36.59%;
- decreased the percentage of unused dramatic characters from 18.56% to 14.29%.

The percentage of discontinuous scenes remained 0%, i.e., there is at least one dramatic character remaining on stage from each preceding scene. 
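
The side-by-side tables in this notebook are built with the same few steps each time. As a rough sketch, those steps could be wrapped in a helper that also reports the signed change per feature; `compare_plays` and the `difference` column are hypothetical names that do not appear elsewhere in the notebook, and the values are copied from the table above.

```python
import pandas as pd

def compare_plays(original, translation, features):
    """Build a features-by-plays table with a signed difference column.

    `original` and `translation` are pandas Series indexed by feature name;
    their `name` attributes become the column labels.
    """
    table = pd.DataFrame({original.name: original.reindex(features),
                          translation.name: translation.reindex(features)})
    table['difference'] = table[translation.name] - table[original.name]
    return table.round(2)

features = ['num_present_characters', 'mobility_coefficient', 'standard_range',
            'percentage_non_dialogues', 'percentage_polylogues',
            'percentage_monologues', 'coefficient_unused', 'discontinuous_scenes']

ecole = pd.Series([9.0, 37.0, 1.08, 59.46, 27.03, 32.43, 18.56, 0.0],
                  index=features, name="L'École des femmes")
shkola = pd.Series([7.0, 41.0, 0.8, 58.54, 21.95, 36.59, 14.29, 0.0],
                   index=features, name='Shkola zhenshchin')

comparison = compare_plays(ecole, shkola, features)
print(comparison)
```

A negative value in the `difference` column marks a feature that the translator decreased, and a positive value marks one that was increased.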

### Cosine Similarity

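Cosine similarity measures how similar two vectors are as the cosine of the angle between them, where 1 is most similar and -1 is most dissimilar. Because all of our features are non-negative, the values here fall between 0 and 1. The formula for computing the cosine similarity of two vectors $A$ and $B$ is:
$\frac{\sum_{i=1}^n A_i B_i}{\sqrt{\sum_{i=1}^n A_i^2} \sqrt{\sum_{i=1}^n B_i^2}}$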

- For example, the vector for *Shkola zhenshchin* is A = [7, 41, 0.8, 58.54, 21.95, 36.59, 14.29, 0]. For *L'École des femmes*, the vector is B = [9, 37, 1.08, 59.46, 27.03, 32.43, 18.56, 0].
- We would take the dot product of the two vectors: (7 x 9) + (41 x 37) + (0.8 x 1.08) +  (58.54 x 59.46) + (21.95 x 27.03) + (36.59 x 32.43) + (14.29 x 18.56) + (0 x 0) = 7106.797.
- Then, we would calculate the magnitude of vector A by adding up the squared values of vector A, $7^2 + 41^2 + 0.8^2 + 58.54^2 + 21.95^2 + 36.59^2 + 14.29^2 + 0^2$ = 7182.4063. We will take the square root of 7182.4063, which is 84.74907846106646.
- We would do the same for B, $9^2 + 37^2 + 1.08^2 + 59.46^2 + 27.03^2 + 32.43^2 + 18.56^2 + 0^2 = 7113.4574$. We would take the square root of this number, which would give us 84.34131490556689.
- To arrive at the denominator, we would multiply 84.74907846106646 by 84.34131490556689, which is 7147.848714441402.
- Finally, we would divide 7106.797 by 7147.848714441402, which gives us 0.9942567734598996.


- For details on cosine similarity, see https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/cosine/#:~:text=Cosine%20similarity%20is%20the%20cosine,'%20lengths%20(or%20magnitudes).
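
The manual steps above can be checked with a few lines of NumPy. This is only a sketch that retraces the same arithmetic on the two vectors; the variable names are chosen for illustration.

```python
import numpy as np

# Feature vectors copied from the comparison table above.
a = np.array([7, 41, 0.8, 58.54, 21.95, 36.59, 14.29, 0])    # Shkola zhenshchin
b = np.array([9, 37, 1.08, 59.46, 27.03, 32.43, 18.56, 0])   # L'École des femmes

dot_product = np.dot(a, b)          # numerator: sum of element-wise products
norm_a = np.sqrt(np.sum(a ** 2))    # magnitude of A
norm_b = np.sqrt(np.sum(b ** 2))    # magnitude of B
similarity = dot_product / (norm_a * norm_b)

print('Dot product:', round(dot_product, 3))
print('Cosine similarity:', round(similarity, 4))
```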

For convenience, we will use scikit-learn's `cosine_similarity` function for our calculations.

### Cosine Similarity

In [9]:
print('Cosine similarity between the original and its Russian translation:', 
      np.round(cosine_similarity(first_df.iloc[:,0].values.reshape(1, -1), 
                                first_df.iloc[:,1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between the original and its Russian translation: 0.9943


### Molière's *Le Tartuffe* (1669) and Nikolai Khmel’nitskii's *Tartiuf, after Molière* (1828)

In [10]:
first_df = french_comedies[french_comedies.title == 'Le Tartuffe'].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['Le Tartuffe']
second_df = russian_comedies[russian_comedies.title =='Tartiuf, after Molière'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['Tartiuf']
first_df['Tartiuf, after Molière'] = second_df

display(first_df)

Unnamed: 0,Le Tartuffe,"Tartiuf, after Molière"
num_present_characters,14.0,13.0
mobility_coefficient,31.0,34.0
standard_range,1.68,1.69
percentage_non_dialogues,45.16,52.94
percentage_polylogues,41.94,41.18
percentage_monologues,3.23,11.76
coefficient_unused,17.7,24.41
discontinuous_scenes,0.0,0.0


### Comparison of Molière's *Le Tartuffe* with *L'École des femmes*

*Le Tartuffe*, as compared to *L'École des femmes*:
- had a higher number of dramatic characters (14 vs. 9);
- lower mobility coefficient (31 vs. 37);
- higher standard range of the speaking characters (1.68 vs. 1.08);
- lower percentage of non-dialogues (45.16% vs. 59.46%);
- higher percentage of polylogues (41.94% vs. 27.03%);
- a much lower percentage of monologues (3.23% vs. 32.43%);
- lower percentage of unused dramatic characters (17.70% vs. 18.56%) despite a higher total number of dramatic characters.

Both comedies had no discontinuous scenes.

### Comparison with Nikolai Khmel’nitskii's *Tartiuf, after Molière* (1828)

In his 1828 translation of Molière's *Le Tartuffe*, Nikolai Khmel’nitskii changed the following formal features:

- decreased the number of dramatic characters who appear on stage from 14 to 13;
- increased the mobility coefficient from 31 to 34;
- increased the percentage of non-dialogues from 45.16% to 52.94%;
- slightly decreased the percentage of polylogues from 41.94% to 41.18%;
- increased the percentage of monologues from 3.23% to 11.76%;
- increased the percentage of unused dramatic characters from 17.70% to 24.41%.

The standard range of the speaking characters in the translation (1.69) remained close to that of the original (1.68). The percentage of discontinuous scenes remained 0%, i.e., there is at least one dramatic character remaining on stage from each preceding scene.

In his translation of Molière's *Le Tartuffe*, Khmel’nitskii narrowed the gap between the percentage of monologues and the percentage of polylogues relative to Molière's original (from 3.23% vs. 41.94% to 11.76% vs. 41.18%), which could be a sign of Neoclassical taste. The gap nevertheless remained much larger than in his translation of Molière's *L'École des femmes* (36.59% vs. 21.95%). In translating both comedies, Khmel’nitskii increased the mobility coefficient, which could reflect the taste for a higher mobility coefficient in the Russian and French traditions after 1795.

### Cosine Similarity

In [11]:
print('Cosine similarity between the original and its Russian translation:', 
      np.round(cosine_similarity(first_df.iloc[:,0].values.reshape(1, -1), 
                                first_df.iloc[:,1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between the original and its Russian translation: 0.9898


The cosine similarity between Nikolai Khmel’nitskii's translation of Molière's *L'École des femmes* and its original is 0.9943. For his translation of Molière's *Le Tartuffe*, the cosine similarity is 0.9898, i.e., this translation departs slightly more from its original than the translation of *L'École des femmes* does.

### Nericault-Destouches's *Le Philosophe marié* (1727) and Vasilii Karatygin's *Zhenatyi filosof, after Destouches* (1827)

In [12]:
first_df = french_comedies[french_comedies.title == 'Le Philosophe marié'].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['Le Philosophe marié']
second_df = russian_comedies[russian_comedies.title =='Zhenatyi filosof, after Destouches'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['Zhenatyi filosof']
first_df['Zhenatyi filosof'] = second_df
display(first_df)

Unnamed: 0,Le Philosophe marié,Zhenatyi filosof
num_present_characters,9.0,9.0
mobility_coefficient,49.0,50.0
standard_range,1.55,1.49
percentage_non_dialogues,61.22,66.0
percentage_polylogues,34.69,38.0
percentage_monologues,26.53,28.0
coefficient_unused,10.29,8.21
discontinuous_scenes,4.08,4.0


In his translation of Nericault-Destouches's *Le Philosophe marié*, Vasilii Karatygin modified the following:
- slightly increased the mobility coefficient from 49 to 50;
- decreased the standard range from 1.55 to 1.49;
- increased the percentage of non-dialogues from 61.22% to 66%;
- increased the percentage of polylogues from 34.69% to 38%;
- increased the percentage of monologues from 26.53% to 28%;
- decreased the coefficient of unused dramatic characters from 10.29% to 8.21%, with the same number of dramatic characters appearing in the comedy (9);
- slightly decreased the percentage of discontinuous scenes from 4.08% to 4%.

### Compute Cosine Similarity

In [13]:
print('Cosine similarity between the original and its Russian translation:', 
      np.round(cosine_similarity(first_df.iloc[:,0].values.reshape(1, -1), 
                                first_df.iloc[:,1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between the original and its Russian translation: 0.9992


Vasilii Karatygin's *Zhenatyi filosof* was very close to its original - the cosine similarity is 0.9992.

### Alexis Piron's *La Métromanie* (1738) and Nikolai Sushkov's *Metromaniia ili strast’ k stikhotvorstvu, after Piron* (1819)

In [14]:
first_df = french_comedies[french_comedies.title =='La Métromanie'].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['La Métromanie']

second_df = russian_comedies[russian_comedies.title == 'Metromaniia ili strast’ k stikhotvorstvu, after Piron'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['Metromaniia']
first_df['Metromaniia'] = second_df
display(first_df)

Unnamed: 0,La Métromanie,Metromaniia
num_present_characters,7.0,7.0
mobility_coefficient,50.0,46.0
standard_range,0.69,0.88
percentage_non_dialogues,48.0,50.0
percentage_polylogues,22.0,28.26
percentage_monologues,26.0,21.74
coefficient_unused,7.55,7.48
discontinuous_scenes,10.0,4.35


In his translation of Alexis Piron's *La Métromanie*, Nikolai Sushkov made the following adjustments:
- decreased the mobility coefficient from 50 to 46;
- increased the standard range of the speaking characters from 0.69 to 0.88;
- increased the percentage of non-dialogues from 48% to 50%;
- increased the percentage of polylogues from 22% to 28.26%;
- decreased the percentage of monologues from 26% to 21.74%;
- slightly decreased the coefficient of unused dramatic characters from 7.55% to 7.48%;
- decreased the percentage of discontinuous scenes from 10% to 4.35%.

In the original, Piron had a higher percentage of monologues (26%) than polylogues (22%). Sushkov reversed this by making the percentage of polylogues (28.26%) higher than the percentage of monologues (21.74%). This could reflect the taste at the time of translation: in Russian comedy after 1795, the average percentage of polylogues increased and the percentage of monologues decreased, as compared with the preceding period. The same was true in the evolution of French comedy.

### Cosine Similarity

In [15]:
print('Cosine similarity between the original and its Russian translation:', 
      np.round(cosine_similarity(first_df.iloc[:,0].values.reshape(1, -1), 
                                first_df.iloc[:,1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between the original and its Russian translation: 0.9911


Nikolai Sushkov's *Metromaniia ili strast’ k stikhotvorstvu* (1819) was close to Piron's original: the cosine similarity between the two was 0.9911. It was not as similar to its original as Nikolai Khmel’nitskii's translation of Molière's *L'École des femmes* (0.9943), but more similar than Khmel’nitskii's translation of Molière's *Le Tartuffe* (0.9898).

## Part 2. Two Adaptations of Sheridan's *School for Scandal*

Sheridan's five-act prose comedy *The School For Scandal* (1777) was imitated in French by Chéron de la Bruyére as a five-act comedy in verse, *L'Homme À Sentiments ou Le Tartuffe de moeurs* (1789), and in Russian by Aleksandr Pisarev as *Lukavin* (1823). Here, we will compare these two adaptations with the original and with one another.

### Richard Brinsley Sheridan's *The School For Scandal* (1777), Chéron de la Bruyére's *L'Homme À Sentiments ou Le Tartuffe de moeurs, imitée en partie de The School for Scandal de Shéridan* (1789), and Aleksandr Pisarev's *Lukavin* (1823)

In [16]:
first_df = contr_data_df[contr_data_df.title=='The School For Scandal'].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['The School For Scandal']
second_df = french_comedies[french_comedies['title'] == 'L\'Homme À Sentiments ou Le Tartuffe de moeurs, imitée en partie de The School for Scandal de Shéridan'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['L\'Homme À Sentiments']

third_df = russian_comedies[russian_comedies.title =='Lukavin, after Sheridan'].copy()
third_df = third_df[features].T.round(2)
third_df.columns = ['Lukavin']

first_df['L\'Homme À Sentiments'] = second_df
first_df['Lukavin'] = third_df
display(first_df)

Unnamed: 0,The School For Scandal,L'Homme À Sentiments,Lukavin
num_present_characters,20.0,9.0,13.0
mobility_coefficient,85.0,59.0,49.0
standard_range,1.32,0.84,1.05
percentage_non_dialogues,60.0,52.54,61.22
percentage_polylogues,48.24,23.73,46.94
percentage_monologues,11.76,28.81,14.29
coefficient_unused,21.4,4.03,17.31
discontinuous_scenes,11.76,5.08,4.08


### Chéron de la Bruyére's *L'Homme À Sentiments ou Le Tartuffe de moeurs, imitée en partie de The School for Scandal de Shéridan* (1789)

In his *L'Homme À Sentiments ou Le Tartuffe de moeurs* (a "partial imitation" of Sheridan's *The School For Scandal*), Chéron de la Bruyére made substantial structural changes:
- the number of dramatic characters in his partial imitation was less than half of Sheridan's original (9 vs. 20 in Sheridan's comedy);
- the mobility coefficient was lower (59 vs. 85 in Sheridan);
- the standard range was lower (0.84 vs. 1.32 in Sheridan);
- the percentage of non-dialogues was lower (52.54% vs. 60%);
- the percentage of polylogues was lower (23.73% vs. 48.24%);
- the percentage of monologues was more than double (28.81% vs. 11.76% in Sheridan);
- the coefficient of unused dramatic characters was over five times lower (4.03 vs. 21.40);
- the percentage of discontinuous scenes was less than half of that in Sheridan's comedy (5.08% vs. 11.76%).

### Aleksandr Pisarev's *Lukavin* (1823)

#### Pisarev's *Lukavin*, as compared to Sheridan's *School For Scandal*:
- fewer dramatic characters (13 vs. 20 in Sheridan);
- lower mobility coefficient (49 vs. 85);
- lower standard range (1.05 vs. 1.32);
- slightly higher percentage of non-dialogues (61.22% vs. 60%);
- slightly lower percentage of polylogues (46.94% vs. 48.24%);
- higher percentage of monologues (14.29% vs. 11.76%);
- lower coefficient of unused dramatic characters (17.31 vs. 21.40);
- lower percentage of discontinuous scenes (4.08% vs. 11.76%).

#### Pisarev's *Lukavin*, as compared to Chéron de la Bruyére's *L'Homme À Sentiments ou Le Tartuffe de moeurs*, had:
- more dramatic characters (13 vs. 9);
- lower mobility coefficient (49 vs. 59);
- higher standard range (1.05 vs. 0.84);
- higher percentage of non-dialogues (61.22% vs. 52.54%);
- higher percentage of polylogues (46.94% vs. 23.73%);
- lower percentage of monologues (14.29% vs. 28.81%);
- much higher coefficient of unused dramatic characters (17.31 vs. 4.03);
- lower percentage of discontinuous scenes (4.08% vs 5.08%).

Sheridan's extreme mobility coefficient of 85 was apparently too high for continental taste: both adaptations lowered it, and Pisarev decreased it even more drastically (to 49) than Chéron de la Bruyére (to 59). The gap between the percentage of monologues and the percentage of polylogues in Sheridan's original was large (11.76% vs. 48.24%). While Pisarev kept this gap (14.29% vs. 46.94%), Chéron de la Bruyére reversed the relationship by making the percentage of monologues (28.81%) higher than the percentage of polylogues (23.73%).

### Cosine Similarity

In [17]:
print('Cosine similarity between Sheridan\'s School for Scandal and Chéron de la Bruyére\'s L\'Homme À Sentiments ou Le Tartuffe de moeurs:', 
      np.round(cosine_similarity(first_df.iloc[:, 0].values.reshape(1, -1), 
                                 first_df.iloc[:, 1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between Sheridan's School for Scandal and Chéron de la Bruyére's L'Homme À Sentiments ou Le Tartuffe de moeurs: 0.9477


In [18]:
print('Cosine similarity between Sheridan\'s School for Scandal and Aleksandr Pisarev\'s Lukavin:', 
      np.round(cosine_similarity(first_df.iloc[:, 0].values.reshape(1, -1), 
                                 first_df.iloc[:, 2].values.reshape(1, -1))[0][0], 4))

Cosine similarity between Sheridan's School for Scandal and Aleksandr Pisarev's Lukavin: 0.9634


In [19]:
print('Cosine similarity between two adaptations:', 
      np.round(cosine_similarity(first_df.iloc[:, 1].values.reshape(1, -1), 
                                 first_df.iloc[:, 2].values.reshape(1, -1))[0][0], 4))

Cosine similarity between two adaptations: 0.9362
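
The three pairwise comparisons can also be obtained in one call, since `cosine_similarity` accepts a matrix with one row per play and returns the full similarity matrix. The sketch below uses the values from the table above; `vectors` and `similarity_matrix` are illustrative names, and the off-diagonal entries should reproduce the three values reported in the preceding cells.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

plays = ['The School For Scandal', "L'Homme À Sentiments", 'Lukavin']

# One row per play, one column per feature (values from the table above).
vectors = np.array([
    [20, 85, 1.32, 60.0, 48.24, 11.76, 21.4, 11.76],
    [9, 59, 0.84, 52.54, 23.73, 28.81, 4.03, 5.08],
    [13, 49, 1.05, 61.22, 46.94, 14.29, 17.31, 4.08],
])

# Rows and columns of the result follow the order of `plays`.
similarity_matrix = pd.DataFrame(cosine_similarity(vectors),
                                 index=plays, columns=plays).round(4)
print(similarity_matrix)
```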


In [20]:
tartuffe = french_comedies[french_comedies.title == 'Le Tartuffe'].copy()
tartuffe = tartuffe[features].T.round(2)
tartuffe.columns = ['Le Tartuffe']

In [21]:
print('Cosine similarity between Sheridan and Le Tartuffe:', 
      np.round(cosine_similarity(first_df.iloc[:,0].values.reshape(1, -1), 
                                 tartuffe.values.reshape(1, -1))[0][0], 4))

Cosine similarity between Sheridan and Le Tartuffe: 0.9286


In [22]:
misanthrope = french_comedies[french_comedies.title =='Le Misanthrope']
misanthrope = misanthrope[features].T.round(2)
misanthrope.columns = ['Le Misanthrope']

In [23]:
print('Cosine similarity between Sheridan and Le Misanthrope:', 
      np.round(cosine_similarity(first_df.iloc[:,0].values.reshape(1, -1), 
                                 misanthrope.values.reshape(1, -1))[0][0], 4))

Cosine similarity between Sheridan and Le Misanthrope: 0.8492


Structurally, Aleksandr Pisarev's *Lukavin* is more similar to Sheridan's *The School For Scandal* (cosine similarity = 0.9634) than Chéron de la Bruyére's *L'Homme À Sentiments ou Le Tartuffe de moeurs* is (0.9477). 
The two adaptations are less similar to one another (cosine similarity = 0.9362) than either is to Sheridan's original. *The School For Scandal* is even more dissimilar from Molière's *Le Tartuffe* (0.9286) and *Le Misanthrope* (0.8492). These adaptations are less similar to their original than the translations by Nikolai Khmel’nitskii, Vasilii Karatygin, and Nikolai Sushkov that we examined earlier.

## Part 3. French Adaptations of French Comedies 

### Molière's *Le Misanthrope* (1666) and Josèp Daubian's *Le misanthrope travesti, after Molière* (1789)

In [24]:
first_df = french_comedies[french_comedies['title'] == 'Le Misanthrope'].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['Le Misanthrope']

second_df = french_comedies[french_comedies['title'] == 'Le misanthrope travesti, after Molière'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['Le misanthrope travesti']
first_df['Le misanthrope travesti'] = second_df
display(first_df)

Unnamed: 0,Le Misanthrope,Le misanthrope travesti
num_present_characters,11.0,11.0
mobility_coefficient,20.0,26.0
standard_range,1.63,1.73
percentage_non_dialogues,50.0,65.38
percentage_polylogues,50.0,61.54
percentage_monologues,0.0,3.85
coefficient_unused,14.29,11.65
discontinuous_scenes,5.0,3.85


In *Le misanthrope travesti*, his adaptation of Molière's *Le Misanthrope*, Daubian changed the following:
- increased the mobility coefficient from 20 to 26;
- increased the standard range from 1.63 to 1.73;
- increased the percentage of non-dialogues from 50% to 65.38%;
- increased the percentage of polylogues from 50% to 61.54%;
- included monologues (the percentage of monologues in the original was 0% and in the adaptation, it was 3.85%);
- decreased the coefficient of unused dramatic characters from 14.29 to 11.65;
- decreased the percentage of discontinuous scenes from 5% to 3.85%.

In [25]:
print('Cosine similarity between the original and its adaptation:', 
      np.round(cosine_similarity(first_df.iloc[:, 0].values.reshape(1, -1), 
                                first_df.iloc[:, 1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between the original and its adaptation: 0.9957


Josèp Daubian's *Le misanthrope travesti* was more similar to its original than most of the translations and adaptations we have seen so far: the cosine similarity was 0.9957, second only to Vasilii Karatygin's *Zhenatyi filosof* (0.9992).

### Noël Lebreton, sieur de Hauteroche's *L'Esprit follet ou la Dame invisible* (1684) and Charles Collé's *L'Esprit follet ou la Dame invisible* in vers libres (1770)

In [26]:
first_df = french_comedies[french_comedies['title'] == 'L\'Esprit follet ou la Dame invisible '].copy()
first_df = first_df[features].T.round(2)
first_df.columns = ['L\'Esprit follet']
second_df = french_comedies[french_comedies['title'] == 'L\'Esprit follet ou la Dame invisible, mise en vers libres'].copy()
second_df = second_df[features].T.round(2)
second_df.columns = ['Collé\'s L\'Esprit follet']

first_df['Collé\'s L\'Esprit follet'] = second_df
display(first_df)

Unnamed: 0,L'Esprit follet,Collé's L'Esprit follet
num_present_characters,9.0,10.0
mobility_coefficient,62.0,61.0
standard_range,0.91,0.82
percentage_non_dialogues,38.71,45.9
percentage_polylogues,25.81,26.23
percentage_monologues,9.68,19.67
coefficient_unused,18.82,21.64
discontinuous_scenes,20.97,19.67


In his reworking of Hauteroche's *L'Esprit follet ou la Dame invisible* in vers libres, Collé changed the following:
- increased the number of dramatic characters from 9 to 10;
- decreased the mobility coefficient from 62 to 61;
- decreased the standard range from 0.91 to 0.82;
- increased the percentage of non-dialogues from 38.71% to 45.90%;
- increased the percentage of polylogues from 25.81% to 26.23%;
- increased dramatically the percentage of monologues from 9.68% to 19.67%;
- increased the coefficient of unused dramatic characters from 18.82 to 21.64;
- decreased the percentage of discontinuous scenes from 20.97% to 19.67%.

### Cosine Similarity

In [27]:
print('Cosine similarity between the original and its adaptation:', 
      np.round(cosine_similarity(first_df.iloc[:, 0].values.reshape(1, -1), 
                                first_df.iloc[:, 1].values.reshape(1, -1))[0][0], 4))

Cosine similarity between the original and its adaptation: 0.9907


Charles Collé's *L'Esprit follet ou la Dame invisible* was not as similar to Hauteroche's original (cosine similarity of 0.9907) as Nikolai Sushkov's *Metromaniia ili strast’ k stikhotvorstvu* was to Piron's (0.9911) or Nikolai Khmel’nitskii's translation of *L'École des femmes* was to Molière's (0.9943).

## Conclusions:
1. Translation and adaptation of a comedy often included not only lexical and/or semantic changes, but also structural modifications of the original.
2. Vasilii Karatygin's *Zhenatyi filosof* (1827) was the most similar comedy to its original, Nericault-Destouches's *Le Philosophe marié* (1727) (cosine similarity = 0.9992). The second most similar translation to its original was Nikolai Khmel’nitskii's translation of Molière's *L'École des femmes* (cosine similarity = 0.9943).
3. The two adaptations of the same original (Chéron de la Bruyére's *L'Homme À Sentiments ou Le Tartuffe de moeurs* and Pisarev's *Lukavin*) decreased the values of most of the features as compared to Sheridan's *The School For Scandal*. However, *Lukavin* remained closer to Sheridan (cosine similarity = 0.9634) than *L'Homme À Sentiments* did (cosine similarity = 0.9477). The two adaptations were less similar to each other (cosine similarity = 0.9362) than either was to Sheridan's original.
4. The continental traditions were closer to one another than to the English comic tradition. The Russian translations of the French comedies, as well as the French adaptations of French originals, underwent less structural transformation (the smallest cosine similarity was 0.9898). The French and Russian adaptations of the English comedy underwent a more substantial transformation (the smallest cosine similarity was 0.9477).
5. Two of Molière's comedies (*Le Tartuffe* and *Le Misanthrope*) had an extremely low percentage of monologues (3.23% and 0%, respectively), which the translators and imitators increased.
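
For reference, the similarity values reported in this notebook between each original and its translation or adaptation can be gathered and ranked in one place. The sketch below only re-enters the values printed in the cells above; the pair labels and the `similarities` name are chosen for illustration.

```python
import pandas as pd

# Cosine similarities between each original and its translation or adaptation,
# copied from the outputs reported in the cells above.
similarities = pd.Series({
    "L'École des femmes / Shkola zhenshchin": 0.9943,
    'Le Tartuffe / Tartiuf': 0.9898,
    'Le Philosophe marié / Zhenatyi filosof': 0.9992,
    'La Métromanie / Metromaniia': 0.9911,
    "The School For Scandal / L'Homme À Sentiments": 0.9477,
    'The School For Scandal / Lukavin': 0.9634,
    'Le Misanthrope / Le misanthrope travesti': 0.9957,
    "L'Esprit follet / Collé's L'Esprit follet": 0.9907,
})

# Rank the pairs from most to least similar.
ranked = similarities.sort_values(ascending=False)
print(ranked)
```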