# **Presentation**

This notebook presents an application of the pre-trained [COFR system](https://hal.archives-ouvertes.fr/hal-02476902/document) by [Mr. Bruno Oberle](https://boberle.com/projects/coreference-resolution-with-cofr/), with many modifications that make it more suitable for deployment in a chatbot. Most of the files in the repository were modified to reduce the memory consumption and the runtime of the system.

## **1.   Preparing the environment and dependencies**

**If you want to deploy your coreference resolution model in a chatbot built with the RASA framework, you should create a new virtual environment for this project to avoid dependency conflicts (the model in this project is built with Tensorflow v1, whereas the RASA framework for chatbots only supports Tensorflow v2). The [automatic migration from Tensorflow v1 to v2](https://www.tensorflow.org/guide/migrate) doesn't work for this code, so the only solution is to create two separate virtual environments: one for your coreference resolution model {Tensorflow 1} and another for your RASA framework {Tensorflow 2}.**

In [12]:
%cd /content/drive/MyDrive/cofr 
#your working folder

/content/drive/MyDrive/cofr




*   Check this link if you want to learn how to [create virtual environments with conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).


In [2]:
!pip3 install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow==1.15.2
  Downloading tensorflow-1.15.2-cp37-cp37m-manylinux2010_x86_64.whl (110.5 MB)
     |████████████████████████████████| 110.5 MB 51 kB/s
Collecting numpy==1.19.1
  Downloading numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
     |████████████████████████████████| 14.5 MB 48.2 MB/s
Collecting pyhocon
  Downloading pyhocon-0.3.59.tar.gz (116 kB)
     |████████████████████████████████| 116 kB 82.4 MB/s
Collecting colorama
  Downloading colorama-0.4.5-py2.py3-none-any.whl (16 kB)
Collecting stanza
  Downloading stanza-1.4.2-py3-none-any.whl (691 kB)
     |████████████████████████████████| 691 kB 71.6 MB/s
Collecting stanfordnlp
  Downloading stanfordnlp-0.2.0-py3-none-any.whl (158 kB)
     |████████████████████████████████| 158 kB 72.6 MB/s
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading tensorboard-1.15.0-py3-none-a

In [3]:
# Download the stanfordnlp models (used for the NLP preprocessing) for French.
!python3 -c "import stanfordnlp; stanfordnlp.download('fr')"

Using the default treebank "fr_gsd" for language "fr".
Would you like to download the models for: fr_gsd now? (Y/n)
Y

Default download directory: /root/stanfordnlp_resources
Hit enter to continue or type an alternate directory.


Downloading models for: fr_gsd
Download location: /root/stanfordnlp_resources/fr_gsd_models.zip
100% 235M/235M [00:39<00:00, 5.88MB/s]

Download complete.  Models saved to: /root/stanfordnlp_resources/fr_gsd_models.zip
Extracting models file for: fr_gsd
Cleaning up...Done.


Make sure your notebook is using Numpy version 1.19.1 and Tensorflow version 1.15.2. If not, restart your runtime.

In [7]:
import numpy as np
import tensorflow as tf


np.__version__, tf.__version__

('1.19.1', '1.15.2')
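
The version check above can also be automated. The following is a minimal sketch, assuming a hypothetical helper `version_mismatches` (not part of the repository) that compares installed versions against the ones pinned in this notebook. The `installed` values below are made up for illustration.

```python
def version_mismatches(installed, required):
    """Return {package: (installed, required)} for every package whose version differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }


# Versions pinned in this notebook.
REQUIRED = {"numpy": "1.19.1", "tensorflow": "1.15.2"}

# In the notebook you would pass the real versions instead, e.g.
# installed = {"numpy": np.__version__, "tensorflow": tf.__version__}
installed = {"numpy": "1.19.1", "tensorflow": "2.9.2"}  # made-up example values

problems = version_mismatches(installed, REQUIRED)
if problems:
    print("Restart the runtime, version mismatch:", problems)
```

An empty result means both packages match the pinned versions.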

If you are working on your own machine or with Google Colab Pro, the following 3 bash instructions need to be executed only once in your environment. The resources (memory and computation) provided by the free version of Google Colab are not enough to run this project, which needs at least 16GB of RAM for prediction.

In [None]:
#!bash -x -e setup_all.sh
#!bash -x -e setup_models_dem1921.sh
#!bash -x -e setup_corpus_dem1921.sh

With the instruction !bash -x -e setup_all.sh you will:
*   Download the pre-trained fastText embeddings for French (this generates the file cc.fr.300.vec).
*   Create a new tensorflow operation {coref_ops.extract_spans()} from the C++ file **coref_kernels.cc** by compiling it into a **coref_kernels.so** file. As long as this file exists, the created tensorflow op is available. Check the following link if you want to learn how to [create new tensorflow operations](https://www.tensorflow.org/guide/create_op).

Once you have created the operation and downloaded the fastText embeddings, you won't need to execute this bash instruction again.
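
Since the setup only needs to run once, it can help to check whether its outputs already exist before launching it again. This is a minimal sketch, assuming a hypothetical helper `missing_artifacts` (not part of the repository) and assuming the two files are written to the current working folder.

```python
from pathlib import Path


def missing_artifacts(folder, names):
    """Return the names from `names` that do not exist inside `folder`."""
    root = Path(folder)
    return [name for name in names if not (root / name).exists()]


# Files that setup_all.sh is described as producing.
SETUP_ALL_OUTPUTS = ["cc.fr.300.vec", "coref_kernels.so"]

# Assumption: the notebook runs from the working folder of the project.
missing = missing_artifacts(".", SETUP_ALL_OUTPUTS)
if missing:
    print("Run setup_all.sh, missing:", missing)
else:
    print("setup_all.sh outputs found, no need to run it again.")
```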

In [8]:
!bash -x -e setup_all.sh

+ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.fr.300.vec.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1228M  100 1228M    0     0  39.1M      0  0:00:31  0:00:31 --:--:-- 38.5M
+ gunzip -d cc.fr.300.vec.gz
+ TF_CFLAGS=($(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))'))
++ python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))'
+ TF_LFLAGS=($(python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))'))
++ python3 -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))'
+ g++ -std=c++11 -shared coref_kernels.cc -o coref_kernels.so -fPIC -I/usr/local/lib/python3.7/dist-packages/tensorflow_core/include -D_GLIBCXX_USE_CXX11_ABI=0 -L/usr/local/lib/python3.7/dist-packages/tensorflow_core -l:libtensorflow_framework.so.1 -O2 -D_GLIBCXX_USE

The instruction !bash -x -e setup_corpus_dem1921.sh will:
*   Download the Enriched version of the DEMOCRAT corpus used for training and evaluating the [COFR system](https://hal.archives-ouvertes.fr/hal-02476902/document) by [Mr. Bruno Oberle](https://boberle.com/projects/coreference-resolution-with-cofr/).
*   Generate the vocabulary of all the characters that appear in the corpus.

After this instruction, the files dev.french.jsonlines, test.french.jsonlines, train.french.jsonlines and char_vocab.french.txt are available in your working folder.
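
To illustrate what a character vocabulary is, here is a minimal sketch that collects every character used in the tokens of a jsonlines corpus. It is not the script shipped with the repository. The function name `build_char_vocab` is hypothetical, and the `"sentences"` key (a list of tokenized sentences per document) is an assumption about the file format.

```python
import json


def build_char_vocab(jsonlines):
    """Collect the sorted list of characters used in the tokens of each document."""
    vocab = set()
    for line in jsonlines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        document = json.loads(line)
        # Assumption: tokens are stored as a list of sentences under "sentences".
        for sentence in document["sentences"]:
            for token in sentence:
                vocab.update(token)  # adds each character of the token
    return sorted(vocab)


# Two tiny made-up documents, one JSON object per line.
sample = [
    json.dumps({"sentences": [["Elle", "est", "née", "."]]}),
    json.dumps({"sentences": [["Où", "?"]]}),
]
chars = build_char_vocab(sample)
print(chars)
```

With a real file, an open file handle can be passed directly, since iterating over it yields one line at a time.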

In [9]:
!bash -x -e setup_corpus_dem1921.sh

+ curl -Lo dev.french.jsonlines.bz2 http://boberle.com/files/corpora/dem1921/dem1921_sg_cut2000.dev.jsonlines.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   194  100   194    0     0    638      0 --:--:-- --:--:-- --:--:--   638
100 70448  100 70448    0     0  69543      0  0:00:01  0:00:01 --:--:--  308k
+ curl -Lo train.french.jsonlines.bz2 http://boberle.com/files/corpora/dem1921/dem1921_sg_cut2000.train.jsonlines.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   194  100   194    0     0    822      0 --:--:-- --:--:-- --:--:--   822
100  520k  100  520k    0     0   365k      0  0:00:01  0:00:01 --:--:--  735k
+ curl -Lo test.french.jsonlines.bz2 http://boberle.com/files/corpora/dem1921/dem1921_sg_cut2000.test.jsonlines.bz2
  % Total    % Recei

By executing the bash file setup_models_dem1921.sh, you will download the pre-trained checkpoints of the [COFR system](https://hal.archives-ouvertes.fr/hal-02476902/document) by [Mr. Bruno Oberle](https://boberle.com/projects/coreference-resolution-with-cofr/). This script creates the folder **logs**, where the checkpoints are stored. In this project we are only interested in the pre-trained checkpoints of the [Baseline model](https://aclanthology.org/P19-1066/), that is, the checkpoints in **logs/fr_mentcoref**.

In [13]:
!bash -x -e setup_models_dem1921.sh

+ curl -LO http://boberle.com/files/models/dem1921_models.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   194  100   194    0     0    683      0 --:--:-- --:--:-- --:--:--   683
100  624M  100  624M    0     0  17.6M      0  0:00:35  0:00:35 --:--:-- 18.5M
+ tar xf dem1921_models.tar
+ rm dem1921_models.tar


Download the multilingual cased BERT model used for the contextualized representations.

In [14]:
!wget https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip
!unzip multi_cased_L-12_H-768_A-12.zip
!rm multi_cased_L-12_H-768_A-12.zip

--2022-10-21 15:06:05--  https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.121.128, 142.250.159.128, 142.251.120.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.121.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 662903077 (632M) [application/zip]
Saving to: ‘multi_cased_L-12_H-768_A-12.zip’


2022-10-21 15:06:11 (105 MB/s) - ‘multi_cased_L-12_H-768_A-12.zip’ saved [662903077/662903077]

Archive:  multi_cased_L-12_H-768_A-12.zip
   creating: multi_cased_L-12_H-768_A-12/
  inflating: multi_cased_L-12_H-768_A-12/bert_model.ckpt.meta  
  inflating: multi_cased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001  
  inflating: multi_cased_L-12_H-768_A-12/vocab.txt  
  inflating: multi_cased_L-12_H-768_A-12/bert_model.ckpt.index  
  inflating: multi_cased_L-12_H-768_A-12/bert_config.json  



## **2.   Trying the coreference resolution model in the notebook**

The following cell instantiates the model architecture from the model configuration.

In [15]:
import util
from coref_model import CorefModel as cm

coref_model = "fr_mentcoref"
config = util.initialize_from_env(coref_model)
model = cm(config)

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.




Setting CUDA_VISIBLE_DEVICES to: 
Running experiment: fr_mentcoref
max_top_antecedents = 50
max_training_sentences = 50
top_span_ratio = 0.4
filter_widths = [
  3
  4
  5
]
filter_size = 50
char_embedding_size = 8
contextualization_size = 200
contextualization_layers = 3
ffnn_size = 150
ffnn_depth = 2
feature_size = 20
max_span_width = 30
use_metadata = true
use_features = true
model_heads = true
coref_depth = 2
lm_layers = 4
lm_size = 768
coarse_to_fine = true
refinement_sharing = false
max_gradient_norm = 5.0
lstm_dropout_rate = 0.4
lexical_dropout_rate = 0.5
dropout_rate = 0.2
optimizer = "adam"
learning_rate = 0.001
decay_rate = 1.0
decay_frequency = 100
ema_decay = 0.9999
eval_frequency = 6
report_frequency = 2
log_root = "logs"
cluster {
  addresses {
    ps = [
      "130.79.164.53:2230"
    ]
    worker = [
      "130.79.164.53:2228"
      "130.79.164.33:2229"
      "130.79.164.52:2235"
    ]
  }
  gpus = [
    0
  ]
}
multi_gpu = false
gold_loss = false
b3_loss = false
mention



Instructions for updating:
Use `tf.cast` instead.


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


Done loading word embeddings.


Instructions for updating:
Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Use `tf.cast` instead.






  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.




The following cell tokenizes the input conversation.

In [16]:
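# Tokenize the text
import json
import stanfordnlp
import re

def tokenize_text(list_text, lang, nlp1, doc):
    """Tokenize and POS-tag a list of paragraphs.

    Returns the sentences (lists of tokens), their POS tags, and the
    [first_token, last_token] index pair of each paragraph.
    `lang` and `doc` are kept for existing callers but are not used.
    """
    res_sents = []
    res_pars = []
    res_pos = []
    start_par = 0
    for par in list_text:
        par = par.strip()
        if not par:
            continue  # skip empty paragraphs
        doc = nlp1(stanfordnlp.Document(par))
        # Each token is a CoNLL-U row: token[0] is the ID, token[1] the form,
        # token[3] the UPOS tag. IDs containing '-' are multi-word token
        # ranges (e.g. "3-4"), so those rows are skipped.
        sents = [
            [token[1] for token in sent if '-' not in token[0]]
            for sent in doc.conll_file.sents
        ]
        pos = [
            [token[3] for token in sent if '-' not in token[0]]
            for sent in doc.conll_file.sents
        ]
        res_sents.extend(sents)
        res_pos.extend(pos)
        # Paragraph boundaries are inclusive token offsets over the whole text.
        length = sum(len(s) for s in sents)
        res_pars.append([start_par, start_par + length - 1])
        start_par = start_par + length
    return res_sents, res_pos, res_pars

nlp1 = stanfordnlp.Pipeline(lang="fr", processors="tokenize,pos,mwt")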

Use device: cpu
---
Loading: tokenize
With settings: 
{'model_path': '/root/stanfordnlp_resources/fr_gsd_models/fr_gsd_tokenizer.pt', 'lang': 'fr', 'shorthand': 'fr_gsd', 'mode': 'predict'}
---
Loading: pos
With settings: 
{'model_path': '/root/stanfordnlp_resources/fr_gsd_models/fr_gsd_tagger.pt', 'pretrain_path': '/root/stanfordnlp_resources/fr_gsd_models/fr_gsd.pretrain.pt', 'lang': 'fr', 'shorthand': 'fr_gsd', 'mode': 'predict'}
---
Loading: mwt
With settings: 
{'model_path': '/root/stanfordnlp_resources/fr_gsd_models/fr_gsd_mwt_expander.pt', 'lang': 'fr', 'shorthand': 'fr_gsd', 'mode': 'predict'}
Building an attentional Seq2Seq model...
Using a Bi-LSTM encoder
Using soft attention for LSTM.
Finetune all embeddings.
Done loading processors!
---
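
The paragraph offsets returned by `tokenize_text` can be illustrated without loading the stanfordnlp models. This is a minimal sketch of the same bookkeeping on already tokenized text. The helper name `paragraph_spans` and the sample tokens are made up for the example.

```python
def paragraph_spans(paragraphs):
    """Map tokenized paragraphs to inclusive [first_token, last_token] index pairs."""
    spans = []
    start = 0
    for sentences in paragraphs:
        # Number of tokens in this paragraph, over all of its sentences.
        length = sum(len(sentence) for sentence in sentences)
        spans.append([start, start + length - 1])
        start += length
    return spans


# Each paragraph is a list of sentences, each sentence a list of tokens.
paragraphs = [
    [["Quand", "Marie", "Curie", "est", "née", "?"]],
    [["Quel", "vaccin", "?"], ["Combien", "de", "prix", "?"]],
]
spans = paragraph_spans(paragraphs)
print(spans)  # [[0, 5], [6, 12]]
```

The offsets count tokens over the whole text, so the second paragraph starts right after the last token of the first one.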


Predicting the coreference clusters within a given text (conversation).

In [20]:
import time

from deployment import make_json
from predict import predict


def predicting(string , model , config  , nlp):
  paragraphs = re.split(r'\n+', string)
  doc = stanfordnlp.Document(string) 
  sents, pos, pars = tokenize_text(paragraphs , "fr" , nlp  , doc)
  conver_2_json_object = make_json(sents, pos, pars, fpath = "file", genre = "ge")
  coreferenced_json_object = predict(conver_2_json_object , model , config)
  return coreferenced_json_object

string = '''Quand l'université Sorbonne a été fondée ? Sur quels principes elle est fondé ? Est-elle la plus ancienne Université de France ?'''
#string = '''Quand Marie Curie est née ? Quel vaccin elle a fait ? Combien de prix Nobel elle a gagné ?'''


start = time.time()

coreferenced_json_object = predicting(string , model , config  , nlp1)

end = time.time()
print("the necessary time for prediction is : " , end-start , "seconds")

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpl486ky53', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f17f2758790>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per

Restoring from logs/fr_mentcoref/model.max.ckpt


INFO:tensorflow:Restoring parameters from logs/fr_mentcoref/model.max.ckpt


the necessary time for prediction is :  10.258581399917603 seconds


Now that the coreference clusters within the input user questions are predicted, we need to replace every pronoun with its representative entity. 

In [28]:
from deployment import return_coreferenced_sentence
paragraphs = re.split(r'\n+', string)

# last_question_coreferenced holds the user's input with every pronoun replaced by
# the entity it refers to. In a real-world project, this string is sent to the RASA
# chatbot servers so that the chatbot can recognize the user's question and answer it.
last_question_coreferenced = return_coreferenced_sentence(coreferenced_json_object, paragraphs)

In [29]:
last_question_coreferenced

"Quand l' université Sorbonne a été fondée ? Sur quels principes l' université Sorbonne est fondé ? Est l' université Sorbonne la plus ancienne Université de France ?"
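
The replacement step can be sketched as follows. This is not the implementation of `return_coreferenced_sentence`. It is a minimal illustration that assumes a hypothetical function `resolve_mentions`, clusters given as inclusive `[start, end]` token spans over the flattened token list, non-overlapping mentions, and the first mention of each cluster taken as its representative.

```python
def resolve_mentions(tokens, clusters):
    """Replace every later mention of a cluster by the tokens of its first mention."""
    replacements = []
    for cluster in clusters:
        mentions = sorted(tuple(m) for m in cluster)
        first_start, first_end = mentions[0]
        representative = tokens[first_start:first_end + 1]
        for start, end in mentions[1:]:
            replacements.append((start, end, representative))

    resolved = list(tokens)
    # Apply the replacements from right to left so earlier indices stay valid.
    for start, end, representative in sorted(replacements, key=lambda r: r[0], reverse=True):
        resolved[start:end + 1] = representative
    return resolved


tokens = ("Quand l' université Sorbonne a été fondée ? "
          "Sur quels principes elle est fondé ?").split()
# One cluster: "l' université Sorbonne" (tokens 1 to 3) and "elle" (token 11).
clusters = [[[1, 3], [11, 11]]]
resolved = resolve_mentions(tokens, clusters)
print(" ".join(resolved))
```

Processing the spans from the end of the text backwards avoids recomputing offsets after each substitution changes the length of the token list.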


## **3.   Evaluation**

In [30]:
from eval_metrics import calculate_recall_precision , muc , b_cubed , lea , ceafe

Split your corpus

In [31]:
def extractLines(input_file, output_file, start, end):
    """Copy lines start..end (1-indexed, inclusive) of a jsonlines file into output_file."""
    with open(input_file, 'r') as fin:
        lines = fin.read().splitlines()
    target = lines[start - 1:end]
    with open(output_file, 'w') as fout:
        fout.write('\n'.join(target))

In [32]:
original_file = 'test.french.jsonlines'
partial_filename = 'test.french.jsonlines_part'
start, end = 1, 5

# Create a new file containing the first 5 texts of the input file
# (the test set of the Enriched DEMOCRAT corpus).
extractLines(original_file, partial_filename, start, end)

Create a JSON Lines file with the prediction results, so that we can compute coreference resolution metrics for this system or for another one. For more information about these metrics, see [Which Coreference Evaluation Metric Do You Trust? A Proposal for a Link-based Entity Aware Metric](https://aclanthology.org/P16-1060/).

In [36]:
import json
import time

start = time.time()

actual_clusters_filename = partial_filename
predicted_clusters_filename = 'predictions.jsonlines'

with open(predicted_clusters_filename, "w") as fout, open(actual_clusters_filename) as fin:
    # Predict one document (one JSON object per line) at a time.
    for i, line in enumerate(fin, 1):
        example = json.loads(line)
        example = predict(example, model, config)
        fout.write(json.dumps(example))
        fout.write("\n")
        print("text : ", i)

end = time.time()
print("the necessary time for prediction is : " , end-start , "seconds")

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp9_iacvom', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f17e9ba0750>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per

Restoring from logs/fr_mentcoref/model.max.ckpt


INFO:tensorflow:Restoring parameters from logs/fr_mentcoref/model.max.ckpt


text :  1


INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpxq_5utp3', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f1810051090>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per

Restoring from logs/fr_mentcoref/model.max.ckpt


INFO:tensorflow:Restoring parameters from logs/fr_mentcoref/model.max.ckpt


text :  2


INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp3647p0o8', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f180d48e090>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per

Restoring from logs/fr_mentcoref/model.max.ckpt


INFO:tensorflow:Restoring parameters from logs/fr_mentcoref/model.max.ckpt


text :  3


INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpzpp4g60v', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f1808d75210>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per

Restoring from logs/fr_mentcoref/model.max.ckpt


INFO:tensorflow:Restoring parameters from logs/fr_mentcoref/model.max.ckpt


text :  4


INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpy8qgl5wd', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f180bc6d490>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=8, num_cores_per_replica=None, per

Restoring from logs/fr_mentcoref/model.max.ckpt


INFO:tensorflow:Restoring parameters from logs/fr_mentcoref/model.max.ckpt


text :  5
the necessary time for prediction is :  817.2504370212555 seconds


The function calculate_recall_precision takes two filenames and a metric function as inputs. One file holds the actual (gold) coreference clusters and the other holds the clusters predicted by the system (model). In the cell below, the predictions file is passed first.

In [37]:
muc_ = calculate_recall_precision(predicted_clusters_filename , actual_clusters_filename , muc)
b_cubed_ = calculate_recall_precision(predicted_clusters_filename , actual_clusters_filename , b_cubed)
lea_ = calculate_recall_precision(predicted_clusters_filename , actual_clusters_filename , lea)
ceafe_ = calculate_recall_precision(predicted_clusters_filename , actual_clusters_filename , ceafe)

We evaluate the system with different coreference resolution metrics on the first 5 documents of the test part of the Enriched version of DEMOCRAT. You can evaluate on whatever corpus you want (you can build your own corpus, annotate it and evaluate the system on it with these metrics).

In [38]:
print("The Recall, Precision and F1-score of the MUC metric on this portion of the data are : " , muc_[:3] , " respectively")
print("The Recall, Precision and F1-score of the B-cubed metric on this portion of the data are : " , b_cubed_[:3] , " respectively")
print("The Recall, Precision and F1-score of the LEA metric on this portion of the data are : " , lea_[:3] , " respectively")
print("The Recall, Precision and F1-score of the CEAFe metric on this portion of the data are : " , ceafe_[:3] , " respectively")

The Recall, Precision and F1-score of the MUC metric on this portion of the data are :  (68.92948190604024, 78.71497717893598, 73.3989911382387)  respectively
The Recall, Precision and F1-score of the B-cubed metric on this portion of the data are :  (52.10735581244137, 62.401987561168724, 56.1943280787216)  respectively
The Recall, Precision and F1-score of the LEA metric on this portion of the data are :  (48.2986455852399, 58.90011235963118, 52.46201242653933)  respectively
The Recall, Precision and F1-score of the CEAFe metric on this portion of the data are :  (12.459064153771276, 72.71935184855849, 21.0843651340603)  respectively
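
To make the scores above easier to interpret, here is a minimal sketch of how B-cubed recall, precision and F1 can be computed for a single document. It follows the usual cluster-overlap formulation of the metric and is not necessarily how `eval_metrics.b_cubed` is implemented. The function name `b_cubed_scores` is hypothetical, clusters are assumed to be non-empty, and the scores are fractions between 0 and 1 rather than percentages.

```python
def b_cubed_scores(key_clusters, response_clusters):
    """Return (recall, precision, f1) of B-cubed for one document."""
    key = [set(map(tuple, cluster)) for cluster in key_clusters]
    response = [set(map(tuple, cluster)) for cluster in response_clusters]

    def overlap_score(reference, other):
        # Sum of |c & o|^2 / |c| over all cluster pairs, normalised by the
        # number of mentions in the reference clusters.
        total = sum(len(c) for c in reference)
        if total == 0:
            return 0.0
        weighted = sum(len(c & o) ** 2 / len(c) for c in reference for o in other)
        return weighted / total

    recall = overlap_score(key, response)
    precision = overlap_score(response, key)
    if recall + precision == 0:
        return recall, precision, 0.0
    return recall, precision, 2 * recall * precision / (recall + precision)


# Mentions are (start, end) token spans. The gold cluster has three mentions,
# and the system split it into a two-mention cluster and a singleton.
gold = [[(0, 1), (3, 3), (7, 7)]]
predicted = [[(0, 1), (3, 3)], [(7, 7)]]
recall, precision, f1 = b_cubed_scores(gold, predicted)
print(recall, precision, f1)
```

In this example every predicted cluster is pure, so precision is 1, while recall drops to 5/9 because the gold cluster is split in two.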


For a better analysis, you can visualize the results of the prediction in an HTML file.

In [39]:
!python3 visualization/jsonlines2text.py predictions.jsonlines -i -o visualize_results.html --sing-color "" --cm ""



## **4.   Deployment**

To deploy this coreference resolution model in a chatbot, you have to create two virtual environments:


**1.   The first environment hosts the coreference resolution model (Tensorflow 1). Follow these steps:**

*   Open a terminal.
*   `conda create -n coref_env python=3.7`
*   `conda activate coref_env`
*   (coref_env) `cd <working_folder>`
*   (coref_env) `pip3 install -r requirements.txt`
*   (coref_env) `python3 app.py`

These instructions start a Flask server that is responsible for the coreference resolution task.


**2.   The second environment hosts your RASA framework for chatbots (Tensorflow 2):**


*   Open another terminal.
*   `conda create -n rasa_chatbot python=3.9`
*   `conda activate rasa_chatbot`
*   (rasa_chatbot) `pip3 install rasa` (plus any other dependencies your chatbot needs).
*   Run the required RASA servers and connect your chatbot to the coreference resolution system.
*   For example, from a Python file in this environment, send the variable `paragraphs = re.split(r'\n+', string)` in an HTTP POST request to the app.py server running in coref_env. The server returns the string in which the coreferences are resolved.
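
The last step can be sketched with the Python standard library. This is only an illustration: the URL, the `/coref` route and the `"paragraphs"` JSON field are assumptions, since the interface of app.py is not described here, and `build_coref_request` is a hypothetical helper.

```python
import json
import re
import urllib.request

# Hypothetical address and route; use whatever app.py actually exposes.
COREF_URL = "http://localhost:5000/coref"


def build_coref_request(text, url=COREF_URL):
    """Build a JSON POST request carrying the conversation split into paragraphs."""
    paragraphs = re.split(r'\n+', text)
    # Assumption: the server reads a JSON body with a "paragraphs" field.
    body = json.dumps({"paragraphs": paragraphs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_coref_request("Quand Marie Curie est née ?\nQuel vaccin elle a fait ?")
print(request.get_method(), request.full_url)

# Sending the request requires the Flask server from coref_env to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before the coreference server is up.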











