# French adverbials and subjunctive use

Juan Berrios | juanberrios@pitt.edu | Last updated: October 26, 2022

**Summary and overview of the data:**

- This notebook continues a prior corpus-processing notebook. Its purpose is to extract relative rates of subjunctive and indicative usage from the previously built data set.

**Contents:**
1. [Preparation](#1.-Preparation): includes the necessary preparations.
2. [Loading files](#2.-Loading-files): includes code for loading the files and examining them.
3. [Usage rates](#3.-Usage-rates): includes code for extracting subjunctive and indicative usage rates for the French adverbials under study.

## 1. Preparation

- Loading libraries and additional settings:

In [1]:
#Importing libraries
import pandas as pd
import numpy as np

#Turning pretty print off:
%pprint

#Displaying the output of every expression in a cell, not just the last one:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Pretty printing has been turned OFF


## 2. Loading files

- Loading tagged `.csv` file: 

In [2]:
fname = "./data/adverbials_df_tagged.csv"

df = pd.read_csv(fname, encoding='utf-8')

In [3]:
df.shape #1164 observations

(1164, 6)

In [4]:
df #preview of first five and last five

,index,concordance,adverbial,mood,mood_binary,corpus
0,1,avant que Casino prenne la place à l'époque où...,avant que,subjunctive,subjunctive,CFPP200
1,3,avant que ce soit rendu à la ville dans les an...,avant que,subjunctive,subjunctive,CFPP200
2,5,non j'ai l'impression qu'à Montreuil maintenan...,avant que,subjunctive,subjunctive,CFPP200
3,6,depuis toujours oui avant que le mot informati...,avant que,ambiguous,subjunctive,CFPP200
4,7,et parce que tout ça là où je suis moi avant q...,avant que,subjunctive,subjunctive,CFPP200
...,...,...,...,...,...,...
1159,1751,voilà mais après aussi ce qu'il faut savoir ...,quand,ambiguous,indicative,CFPP200
1160,1752,quand elle s'adresse à ses petites-filles §,quand,ambiguous,indicative,CFPP200
1161,1753,voilà ou pff quand je vais dans la rue ou ou...,quand,indicative,indicative,CFPP200
1162,1754,ou enfin au moins pas anonymes quoi parce que ...,quand,indicative,indicative,CFPP200


In [5]:
df.info() #Some general information about the resulting data frame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1164 entries, 0 to 1163
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   index        1164 non-null   int64 
 1   concordance  1164 non-null   object
 2   adverbial    1164 non-null   object
 3   mood         1164 non-null   object
 4   mood_binary  1164 non-null   object
 5   corpus       1164 non-null   object
dtypes: int64(1), object(5)
memory usage: 54.7+ KB


## 3. Usage rates

- Each observation has a value in the `mood` column that was coded manually, taking the whole context into account. Some tokens cannot be assigned to a single mood, however, and were coded as `ambiguous`. We'll remove those from the final tallies:

In [6]:
#Creating a subset for non-ambiguous mood only

df_binary = df[(df['mood'] != 'ambiguous')]

In [7]:
#Verifying observations left

df_binary.shape

(875, 6)
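
- As a sanity check, the subset should lose exactly as many rows as there are ambiguous tokens (for the real data, 1164 - 289 = 875). The sketch below illustrates the check on a small made-up data frame; `toy` and its rows are hypothetical and only stand in for `df`:

```python
import pandas as pd

# Hypothetical stand-in for the tagged data frame; the rows are illustrative only.
toy = pd.DataFrame({
    "adverbial": ["quand", "quand", "avant que", "tandis que", "quand"],
    "mood": ["indicative", "ambiguous", "subjunctive", "ambiguous", "indicative"],
})

# Same filter as above; .copy() makes the subset independent of the original frame.
toy_binary = toy[toy["mood"] != "ambiguous"].copy()

# The subset should be smaller by exactly the number of ambiguous tokens.
n_ambiguous = int((toy["mood"] == "ambiguous").sum())
assert len(toy_binary) == len(toy) - n_ambiguous

print(len(toy), n_ambiguous, len(toy_binary))
```

The same assertion can be run against `df` and `df_binary` directly.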

- The following cells contain distribution counts and relative use for the full data set (including ambiguous tokens):

In [8]:
#Overall distribution of adverbials

df['adverbial'].value_counts() 

quand             1131
tandis que          18
avant que            9
jusqu'à ce que       6
Name: adverbial, dtype: int64

In [9]:
#Overall mood use distribution (including ambiguous)

df['mood'].value_counts(ascending=True)   

subjunctive     14
ambiguous      289
indicative     861
Name: mood, dtype: int64

In [10]:
#Overall mood use distribution (merging ambiguous into the prescriptively expected mood)

df['mood_binary'].value_counts(ascending=True)   

subjunctive      18
indicative     1146
Name: mood_binary, dtype: int64
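
- The `mood_binary` column was built in the prior notebook, so its exact construction is not shown here. A minimal sketch of one way such a column could be derived is given below, assuming ambiguous tokens take the mood prescriptively expected for their adverbial and all other tokens keep their coded mood. The `EXPECTED_MOOD` mapping and the `toy` rows are hypothetical, with the mapping inferred from the tables in this notebook:

```python
import pandas as pd

# Hypothetical mapping from each adverbial to its prescriptively expected mood
# (an assumption inferred from the merged counts, not taken from the prior notebook).
EXPECTED_MOOD = {
    "avant que": "subjunctive",
    "jusqu'à ce que": "subjunctive",
    "quand": "indicative",
    "tandis que": "indicative",
}

# Illustrative rows only.
toy = pd.DataFrame({
    "adverbial": ["avant que", "quand", "quand", "tandis que"],
    "mood": ["ambiguous", "subjunctive", "ambiguous", "indicative"],
})

# Keep the coded mood where it is not ambiguous; otherwise use the expected mood.
toy["mood_binary"] = toy["mood"].where(
    toy["mood"] != "ambiguous",
    toy["adverbial"].map(EXPECTED_MOOD),
)

print(toy)
```

Note that a non-ambiguous token keeps its coded mood even when it departs from the expected one, which is consistent with the subjunctive tokens of `quand` surviving the merge.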

In [11]:
#Distribution of mood (including ambiguous) by adverbial

df.groupby('adverbial').mood.value_counts().unstack()

adverbial,ambiguous,indicative,subjunctive
avant que,2.0,,7.0
jusqu'à ce que,2.0,,4.0
quand,276.0,852.0,3.0
tandis que,9.0,9.0,
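
- The empty cells in the table above are `NaN` values: `unstack()` has nothing to put where an adverbial never occurs with a given mood, and the missing values force the counts to floats. A minimal sketch of two ways to get integer counts with explicit zeros follows; the `toy` rows are hypothetical:

```python
import pandas as pd

# Illustrative rows only; "avant que" never occurs with the indicative here.
toy = pd.DataFrame({
    "adverbial": ["avant que", "avant que", "quand", "quand", "quand", "tandis que"],
    "mood": ["subjunctive", "ambiguous", "indicative", "indicative", "ambiguous", "indicative"],
})

# Option 1: same groupby as above, filling unattested combinations with 0.
counts = toy.groupby("adverbial")["mood"].value_counts().unstack(fill_value=0)

# Option 2: a cross-tabulation, which fills unattested combinations with 0 by default.
table = pd.crosstab(toy["adverbial"], toy["mood"])

print(counts)
print(table)
```

Whether zeros or blanks read better is a presentation choice; zeros make it explicit that the combination was looked for and not found.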


In [12]:
#Relative use of mood (including ambiguous) by adverbial

df.groupby('adverbial').mood.value_counts(normalize=True).unstack()

adverbial,ambiguous,indicative,subjunctive
avant que,0.222222,,0.777778
jusqu'à ce que,0.333333,,0.666667
quand,0.244032,0.753316,0.002653
tandis que,0.5,0.5,


In [13]:
#Distribution of mood (merging ambiguous) by adverbial

df.groupby('adverbial').mood_binary.value_counts().unstack()

adverbial,indicative,subjunctive
avant que,,9.0
jusqu'à ce que,,6.0
quand,1128.0,3.0
tandis que,18.0,


In [14]:
#Relative use of mood (merging ambiguous) by adverbial

df.groupby('adverbial').mood_binary.value_counts(normalize=True).unstack()

adverbial,indicative,subjunctive
avant que,,1.0
jusqu'à ce que,,1.0
quand,0.997347,0.002653
tandis que,1.0,


- The following cells contain distribution counts and relative use for a subset of the data set excluding ambiguous tokens:

In [15]:
#Overall distribution of adverbials

df_binary['adverbial'].value_counts() 

quand             855
tandis que          9
avant que           7
jusqu'à ce que      4
Name: adverbial, dtype: int64

In [16]:
#Overall mood distribution

df_binary['mood'].value_counts(ascending=True)   

subjunctive     14
indicative     861
Name: mood, dtype: int64

In [17]:
#Distribution of mood by adverbial

df_binary.groupby('adverbial').mood.value_counts().unstack()

adverbial,indicative,subjunctive
avant que,,7.0
jusqu'à ce que,,4.0
quand,852.0,3.0
tandis que,9.0,


In [18]:
#Relative use of mood by adverbial

df_binary.groupby('adverbial').mood.value_counts(normalize=True).unstack()

adverbial,indicative,subjunctive
avant que,,1.0
jusqu'à ce que,,1.0
quand,0.996491,0.003509
tandis que,1.0,
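
- The relative rates are each count divided by the total for its adverbial; for `quand`, 3 / (852 + 3) ≈ 0.003509. The sketch below reproduces that arithmetic from the non-ambiguous counts in cell 17, entered by hand with 0 for the empty cells; the names `counts` and `rates` are only illustrative:

```python
import pandas as pd

# Counts copied from the non-ambiguous table (cell 17), with 0 for the empty cells.
counts = pd.DataFrame(
    {"indicative": [0, 0, 852, 9], "subjunctive": [7, 4, 3, 0]},
    index=["avant que", "jusqu'à ce que", "quand", "tandis que"],
)

# Row-wise proportions: each count divided by its adverbial's total.
rates = counts.div(counts.sum(axis=1), axis=0)

print(rates.round(6))
```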
