### Building a Question Answering System

For this work, we build a question answering system and use the [deepset framework]() (Haystack) to assemble its components.

The system has the following components:
- The index store
- The document store
- The search pipeline, with a Retriever and a Reader model.

We follow the tutorials provided by deepset to build the system, in particular [this tutorial](https://github.com/deepset-ai/haystack/blob/master/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb).

### Building the dense passage retrieval

To build the dataset for dense passage retrieval, we use the approach suggested on [this model page](https://huggingface.co/etalab-ia/dpr-question_encoder-fr_qa-camembert), which uses the CamemBERT model.

For each question, we have a single positive context (the paragraph that contains the answer to the question) and n hard negative contexts, which are the top-k candidates that do not contain the answer. The negative contexts are retrieved with the BM25 retrieval model.
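As a rough illustration of the record described above, the sketch below assembles one training sample with a positive context and hard negatives. The field names follow the JSON layout commonly used for DPR training data; the helper name `make_dpr_sample` is hypothetical, and the texts are short excerpts reused from this notebook.

```python
import json


def make_dpr_sample(question, answer, positive_text, hard_negative_texts):
    """Assemble one DPR-style training record (hypothetical helper)."""
    return {
        "question": question,
        "answers": [answer],
        "positive_ctxs": [{"title": "", "text": positive_text}],
        "negative_ctxs": [],
        "hard_negative_ctxs": [
            {"title": f"document_{i}", "text": text}
            for i, text in enumerate(hard_negative_texts)
        ],
    }


sample = make_dpr_sample(
    question="le president de la Republique democratique du congo?",
    answer="Felix Tshisekedi",
    positive_text="Le 19 mars dernier, Felix Tshisekedi avait nomme de nouveaux dirigeants.",
    hard_negative_texts=["Ne en 1945, Kitenge Yesu est decede le 31 mai dernier."],
)
print(json.dumps(sample, indent=2, ensure_ascii=False))
```

The hard negatives look relevant to the question but never mention the answer, which is what makes them useful for training the retriever.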

Before training the retrieval model, we have to build the document store and save the documents to it as units of retrieval.
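To make "units of retrieval" concrete, here is a minimal sketch of cutting an article into passages with a word budget while keeping sentences whole. It is not Haystack's `PreProcessor` implementation: the function name `split_into_passages` and the naive sentence splitter are assumptions made for illustration only.

```python
import re


def split_into_passages(text, max_words=200):
    """Greedily pack whole sentences into passages of at most max_words words."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    passages, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new passage when adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            passages.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        passages.append(" ".join(current))
    return passages


article = "Premiere phrase de test. Deuxieme phrase un peu plus longue. Troisieme phrase."
for passage in split_into_passages(article, max_words=8):
    print(passage)
```

A sentence longer than the budget is kept whole in its own passage, so no sentence is ever cut in the middle.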

### Building the dataset

In [1]:
import pandas as pd
import numpy as np

In [2]:
from pathlib import Path
DATA_PATH = Path.cwd().joinpath("data")
assert DATA_PATH.exists(), "the data path does not exist"
TEXT_DATA_FOLDER = DATA_PATH.joinpath("corpus", "drc-news-txt")
assert TEXT_DATA_FOLDER.exists(), "the text data folder does not exist"

In [3]:
data_file_path = DATA_PATH.joinpath("corpus", "raw", 'drc-news-raws.csv')

In [4]:
data = pd.read_csv(data_file_path, names=["content", "posted_at"])

In [5]:
data.shape

(140638, 2)

In [6]:
data = data.fillna(value="")
data.head()

Unnamed: 0,content,posted_at
0,Les membres de la Commission tarifaire viennen...,2022-09-05 00:00:00
1,Les membres de la Commission tarifaire so...,2022-04-05 00:00:00
2,Vodacom Congo vient de signer un partenariat a...,2022-04-23 00:00:00
3,"Le sélectionneur des Léopards de la RDC, Hectó...",2022-03-05 00:00:00
4,Le protocole d’accord était déjà signé entre l...,2022-11-05 00:00:00


In [7]:
from haystack.nodes import TextConverter

  from .autonotebook import tqdm as notebook_tqdm
INFO - haystack.modeling.model.optimization -  apex not found, won't use it. See https://nvidia.github.io/apex/
ERROR - root -  Failed to import 'magic' (from 'python-magic' and 'python-magic-bin' on Windows). FileTypeClassifier will not perform mimetype detection on extensionless files. Please make sure the necessary OS libraries are installed if you need this functionality.


In [8]:
from haystack.schema import Document
from secrets import token_hex

# TODO: this does not work yet; it was supposed to save the documents to a dataframe
def get_document_from_text(row):
    """Build a Haystack Document from one row of the news dataframe.

    Args:
        row (pd.Series): row with the article text ("content") and the
            publication date ("posted_at").

    Returns:
        Document | None: the first non-empty paragraph of the article,
        or None when the article has no text.
    """
    text = row["content"].replace("\xa0", " ")
    for paragraph in text.split("   "):
        if not paragraph.strip():  # skip empty paragraphs
            continue
        # Only the first non-empty paragraph is kept
        return Document(content=paragraph, meta={"posted_at": row["posted_at"] or ""})
    return None

In [9]:
from haystack.nodes import TextConverter, PDFToTextConverter, DocxToTextConverter, PreProcessor
from haystack.utils import convert_files_to_docs

In [10]:
all_docs = data.sample(1000).apply(get_document_from_text, axis="columns")

In [14]:
all_docs.shape

(1000,)

In [11]:
all_docs = all_docs.dropna().to_list()

In [12]:
import re  # used by CustomPreProcessor.clean
from copy import deepcopy
from typing import List, Optional, Union

from haystack.errors import HaystackError
from haystack.nodes import PreProcessor
from haystack.schema import Document

class CustomPreProcessor(PreProcessor):
    def __init__(self, custom_preprocessor=None, **kwargs):
        super().__init__(**kwargs)
        self.custom_preprocessor = custom_preprocessor
    def clean(
        self,
        document: Union[dict, Document],
        clean_whitespace: bool,
        clean_header_footer: bool,
        clean_empty_lines: bool,
        remove_substrings: List[str],
        id_hash_keys: Optional[List[str]] = None,
    ) -> Document:
        """
        
        Perform document cleaning on a single document and return a single document. This method will deal with whitespaces, headers, footers
        and empty lines. Its exact functionality is defined by the parameters passed into PreProcessor.__init__().
        """
        if id_hash_keys is None:
            id_hash_keys = self.id_hash_keys

        if isinstance(document, dict):
            document = Document.from_dict(document, id_hash_keys=id_hash_keys)

        # Mainly needed for type checking
        if not isinstance(document, Document):
            raise HaystackError("Document must not be of type 'dict' but of type 'Document'.")
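        text = document.content
        # Apply the user-supplied cleaning function first, if one was given
        if self.custom_preprocessor is not None:
            text = self.custom_preprocessor(text)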
        if clean_header_footer:
            text = self._find_and_remove_header_footer(
                text, n_chars=300, n_first_pages_to_ignore=1, n_last_pages_to_ignore=1
            )

        if clean_whitespace:
            lines = text.splitlines()

            cleaned_lines = []
            for line in lines:
                line = line.strip()
                cleaned_lines.append(line)
            text = "\n".join(cleaned_lines)

        if clean_empty_lines:
            text = re.sub(r"\n\n+", "\n\n", text)

        for substring in remove_substrings:
            text = text.replace(substring, "")

        if text != document.content:
            document = deepcopy(document)
            document.content = text

        return document
    
    

In [13]:
import re
from gensim.utils import deaccent
from unicodedata import normalize as unicode_normalize

In [14]:
def replace_point(document):
    """Put spaces around a dot glued between two words ("www.www" becomes
    "www . www") before tokenizing the document.

    TODO: this may have a downside when the dot is in the middle of a word.

    Args:
        document (str): the text to clean.

    Returns:
        str: the text with glued dots separated by spaces.
    """
    result = re.sub(r"(\S)\.(\S)", r"\1 . \2", document)
    return result

def replace_website_name(document):
    """Replace site names that appear in the text (politico.cd, 7sur7.cd,
    actualite.cd, mediacongo.net) with the placeholder SITE_WEB before the
    main cleaning step.

    Args:
        document (str): the text to clean.

    Returns:
        str: the text with the site names replaced.
    """
    # TODO: not sure this will work; it may be better to replace with the first line of the match.
    # The dots are escaped so that they match a literal "." only.
    result = re.sub(
        r"7sur7\.cd|politico\.cd|actualite\.cd|mediacongo\.net",
        "SITE_WEB",
        document,
        flags=re.IGNORECASE,
    )
    return result

def remove_accents(document):
    input_without_accent = deaccent(document)
    return input_without_accent

def pre_clean_document(document):
    """Pre-clean a document before tokenization: remove accents, replace
    website names, put spaces around dots glued between two words, and strip
    unwanted text.

    TODO: the dot replacement may have a downside when the dot is in the
    middle of a word.

    Args:
        document (str): the raw text of the document.

    Returns:
        str: the cleaned text.
    """
    result = remove_accents(document)
    result = replace_website_name(result)
    result = replace_point(result)
    result = re.sub(r"This post has already been read \d+ times!", "", result)  # remove unwanted text
    result = unicode_normalize("NFKD", result)
    return result
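The downside mentioned in the docstrings can be seen on a few inputs. The sketch below reuses the same pattern as `replace_point` under a separate, hypothetical name (`split_glued_sentences`) so it can be run on its own; the decimal-number sentence is an invented example.

```python
import re


def split_glued_sentences(text):
    """Same pattern as replace_point: put spaces around a dot stuck between two non-space characters."""
    return re.sub(r"(\S)\.(\S)", r"\1 . \2", text)


# Intended effect: two sentences glued together are separated
print(split_glued_sentences("le mardi 30 mars dernier.16 parmi les 17 deputes"))

# Side effects: decimal numbers and domain names are split as well
print(split_glued_sentences("une hausse de 3.5 pour cent"))
print(split_glued_sentences("consulte par 7SUR7.CD"))
```

Because `pre_clean_document` calls `replace_website_name` before `replace_point`, the listed site names are already replaced by `SITE_WEB` when the dot rule runs; other domains and decimal numbers are not protected.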

In [16]:
text_doc = """
Une motion de defiance a ete deposee au cabinet de la presidente de l'Assemblee provinciale du Maniema contre le vice-gouverneur Jean-Pierre Amadi, le mardi 30 mars dernier.16 parmi les 17 deputes provinciaux presents a Kindu, chef-lieu de la province du Maniema, ont appose leurs signatures sur ladite motion depuis le 27 mars 2021.Selon ce document consulte par 7SUR7.CD, Jean-Pierre Amadi Lubenga est reproche de plusieurs griefs dont le « refus d'obtemperer aux instructions de la hierarchie » pendant qu'il etait gouverneur de province a l'interim et le detournement des deniers publics.
comme signale a politico.cd sur notre site POLITICO.CD et puis ensuite sur actualite.cd et sur notre site mediacongo.net
"""

In [17]:
replace_website_name(text_doc)

"\nUne motion de defiance a ete deposee au cabinet de la presidente de l'Assemblee provinciale du Maniema contre le vice-gouverneur Jean-Pierre Amadi, le mardi 30 mars dernier.16 parmi les 17 deputes provinciaux presents a Kindu, chef-lieu de la province du Maniema, ont appose leurs signatures sur ladite motion depuis le 27 mars 2021.Selon ce document consulte par SITE_WEB, Jean-Pierre Amadi Lubenga est reproche de plusieurs griefs dont le « refus d'obtemperer aux instructions de la hierarchie » pendant qu'il etait gouverneur de province a l'interim et le detournement des deniers publics.\ncomme signale a SITE_WEB sur notre site SITE_WEB et puis ensuite sur SITE_WEB et sur notre site SITE_WEB\n"

In [57]:
preprocessor = CustomPreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=False,
    split_by="word",
    split_length=200,
    split_respect_sentence_boundary=True,
    language="fr",
    custom_preprocessor=pre_clean_document,
)


docs = preprocessor.process(all_docs)

print(f"n_files_input: {len(all_docs)}\nn_docs_output: {len(docs)}")

100%|██████████| 997/997 [00:01<00:00, 696.91docs/s]

n_files_input: 997
n_docs_output: 2090





In [58]:
docs[0]

<Document: {'content': 'Lambert Mende, ministre de la Communication de la Republique Democratique du Congo (RDC) a evoque jeudi la possibilite d‟un referendum en vue de changer certains articles de la Constitution. Il s‟exprimait devant la presse en reponse aux critiques formulees par la conference episcopale nationale du Congo (CENCO) qui a appele le gouvernement a ne pas modifier la Constitution pour augmenter le nombre de mandats permis au chef de l‟Etat . "Au moins on n‟a pas conteste aux Ecossais et a ceux qui vivent en Ecosse le droit de se prononcer, pourquoi on veut contester au Congolais le droit de se prononcer ? ", a-t-il lance lors de son intervention. Lambert Mende affirme que l‟idee d‟un referendum vient de la commission electorale nationale independante (CENI). Cet eventuellement changement de la Constitution ne concernerait que certains articles dont le plus significatif est l‟article 197 qui concerne le mode de scrutin des elections provinciales . L\'article 220 au cen

In [59]:
from haystack.document_stores import ElasticsearchDocumentStore



In [60]:
document_store = ElasticsearchDocumentStore(index="drc-news", recreate_index=True, analyzer="french")

INFO - haystack.document_stores.elasticsearch -  Index 'drc-news' deleted.
INFO - haystack.document_stores.elasticsearch -  Index 'label' deleted.


In [61]:
document_store.write_documents(docs)

### Retrieval

With the document store populated, let's build a retriever that uses BM25 to retrieve documents.
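For intuition about what BM25 ranks on, here is a small self-contained sketch of the Okapi BM25 scoring formula in pure Python. It is not what Elasticsearch executes internally: the whitespace tokenizer, the function name `bm25_scores`, and the parameter values `k1=1.2` and `b=0.75` (common defaults) are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalised by document length with b
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "Felix Tshisekedi avait nomme de nouveaux dirigeants a l'Agence Nationale de Renseignements",
    "Vodacom Congo vient de signer un partenariat",
    "Le president de la Republique a nomme un nouveau gouverneur",
]
scores = bm25_scores("president de la Republique", documents)
best = scores.index(max(scores))
print(documents[best])
```

Rare query terms contribute more than frequent ones such as "de", which appears in every document and therefore carries a small weight.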

In [62]:
custom_query_template = """
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": ${query}
        }
      },
      "negative": {
        "match": {
          "content": ${name_to_not_match}
        }
      },
      "negative_boost": 0.5
    }
  }
}
"""

In [51]:
print(custom_query_template)


{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": ${query}
        }
      },
      "negative": {
        "match": {
          "content": ${name_to_not_match}
        }
      },
      "negative_boost": 0.5
    }
  }
}



In [63]:
from haystack.nodes import BM25Retriever

In [64]:
bm25_retriever = BM25Retriever(document_store=document_store, all_terms_must_match=True, custom_query=custom_query_template)

In [65]:
question = "le president de la Republique democratique du congo?"
answer = "Felix Tshisekedi"

In [66]:
def get_hard_negative_context(
    retriever: BM25Retriever, question: str, answer: str, n_ctxs: int = 10
):
    """
    Given a question and its answer, query the Elasticsearch document store and
    return candidate hard negative contexts for the question.
    """
    # Use the retriever and the number of contexts passed as arguments
    documents = retriever.retrieve(
        query=question, top_k=n_ctxs, filters={"name_to_not_match": answer}
    )
    return documents

In [67]:
get_hard_negative_context(bm25_retriever, question, answer)

[<Document: {'content': 'Le President National Statutaire et Autorite Morale de l’ADFC-A, le Senateur Professeur Modeste BAHATI LUKWEBO, invite les cadres et militants du Regroupement AFDC-A a participer a toutes les manifestations pacifiques visant a defendre la paix, la democratie, la bonne gouvernance, l’Etat de droit et les elections transparentes en Republique Democratique du Congo. DECLARATION DE LA CONFERENCE DES PRESIDENTS DU REGROUPEMENT POLITIQUE AFDC-A SUR LA SITUATION POLITIQUE ET SECURITAIRE DU PAYSDepuis la prolongation inconstitutionnelle de la session ordinaire du Parlement de la Republique Democratique du Congo, plusieurs initiatives democraticides, liberticides et subversives ont ete alignees par le FCC, s’appuyant sur la majorite numerique au Parlement et profitant de l’Etat d’urgence sanitaire pour s’assurer a l’avance de la fausse victoire electorale, proteger les membres du FCC de toute poursuite judiciaire et vider la justice congolaise de son pouvoir constitutio

The next step is to build the dense passage retrieval dataset: for each sentence, we find the named entities, mask them, and query the document store for hard negatives.

### Building the Dense Passage Retrieval Dataset

With the documents added to the document store, the next step is to build the dense passage retrieval dataset.

We treat each paragraph as the answer context and generate different questions from it by masking its named entities, which yields a better score.
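A minimal sketch of the masking idea: each entity span is replaced by a mask token to form the question, and the masked text becomes the answer. The function name `mask_entities` is hypothetical, and the entity offsets are computed by hand here instead of coming from an NER model.

```python
def mask_entities(text, entities, mask="<MASK>"):
    """Yield one (question, answer) pair per entity span; `end` is exclusive."""
    for entity in entities:
        start, end = entity["start"], entity["end"]
        yield text[:start] + mask + text[end:], text[start:end]


sentence = "Le 19 mars dernier, Felix Tshisekedi avait nomme de nouveaux dirigeants."
name = "Felix Tshisekedi"
start = sentence.index(name)
# Hand-written entity standing in for the output of an NER pipeline
entities = [{"entity_group": "PER", "start": start, "end": start + len(name)}]

pairs = list(mask_entities(sentence, entities))
for question, answer in pairs:
    print(question, "->", answer)
```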

Once we have a question and its answer paragraph, we retrieve the negative contexts with the code written above.
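The selection rule for hard negatives can be sketched without a document store: walk the ranked candidates and keep those that never mention the answer. In this sketch the function name `select_hard_negatives` is hypothetical and a plain list of strings stands in for the retriever results.

```python
def select_hard_negatives(candidates, answer, n_ctxs=10):
    """Keep the top-ranked candidate passages that do not mention the answer."""
    hard_negatives = []
    for index, text in enumerate(candidates):
        # A passage containing the answer is a positive, not a hard negative
        if answer.lower() in text.lower():
            continue
        hard_negatives.append({"title": f"document_{index}", "text": text})
        if len(hard_negatives) == n_ctxs:
            break
    return hard_negatives


# Ranked candidates, best match first
candidates = [
    "Le chef de l'Etat Felix Tshisekedi avait promis de fermer les cachots.",
    "Le President National Statutaire de l'AFDC-A invite les cadres et militants.",
    "Selon plusieurs sources a la presidence de la Republique.",
]
negatives = select_hard_negatives(candidates, "Felix Tshisekedi", n_ctxs=2)
print(negatives)
```

The title keeps the rank of each candidate in the retrieved list, so skipped positives leave gaps in the numbering.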

#### NER on the Text

In [68]:
from transformers import AutoTokenizer, AutoModelForTokenClassification


# This model is good, but it does not classify roles precisely (it confuses "ministre" and "ministere"); this needs improvement.
tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/camembert-ner-with-dates")
model = AutoModelForTokenClassification.from_pretrained("Jean-Baptiste/camembert-ner-with-dates")

In [69]:
from transformers import pipeline

In [99]:
transformer_ner_pipeline = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")

In [97]:
def replace_between(text, begin, end, word_to_replace, alternative="<MASK>"):
    """Replace text[begin:end] with `alternative`, padded with spaces (`end` is exclusive)."""
    # assert text[begin:end].strip() == word_to_replace.strip()
    return f"{text[:begin]} {alternative} {text[end:]}"

def filter_entities(entities):
    """Keep only person, organisation, location and date entities with a score of at least 0.85.

    Args:
        entities (list[dict]): entities returned by the NER pipeline.

    Returns:
        list[dict]: the entities that pass the filter.
    """
    return [entity for entity in entities if entity.get("entity_group") in ["PER", "ORG", "LOC", "DATE"] and entity.get("score") >= 0.85]

In [103]:
def build_question_answers_from_sentences(sentence, nlp):
    """Run NER on a sentence and yield the sentence with its filtered entities.

    Args:
        sentence (str): the text to extract named entities from.
        nlp: the pipeline that performs the NER.

    Yields:
        dict: the text ("context") and its filtered entities ("entities").
    """
    entities = nlp(sentence)
    filtered_entities = filter_entities(entities)
    data = {"context": sentence, "entities": filtered_entities}
    yield data

In [47]:
sample_document = docs[6]

In [50]:
def get_hard_negative_context(
    retriever: BM25Retriever, question: str, answer: str, n_ctxs: int = 15
):
    """Retrieve BM25 candidates for the question and keep those that do not contain the answer."""
    list_hard_neg_ctxs = []
    # Query the retriever directly for the top candidates
    retrieved_docs = retriever.retrieve(
        query=question, top_k=n_ctxs, filters={"name_to_not_match": answer}
    )
    for index, retrieved_doc in enumerate(retrieved_docs):
        retrieved_doc_text = retrieved_doc.content  # Haystack documents expose their text as `content`
        if answer.lower() in retrieved_doc_text.lower():
            continue
        list_hard_neg_ctxs.append(
            {"title": f"document_{index}", "text": retrieved_doc_text}
        )
    return list_hard_neg_ctxs

<Document: {'content': 'Ne en 1945, Kitenge Yesu est decede le 31 mai dernier a l’age de 76 ans et son enterrement se deroulera dans un cadre strictement prive, Selon plusieurs sources a la presidence de la Republique.', 'content_type': 'text', 'score': None, 'meta': {'name': '510.txt', '_split_id': 2}, 'embedding': None, 'id': '65fba14bdc58dee371d1cd95435bf635'}>

In [104]:
context =  build_question_answers_from_sentences(sample_document.content, transformer_ner_pipeline)

In [105]:
list(context)[0]

{'context': "Les miliciens de la Cooperative pour le Developpement du Congo (CODECO) ont signe une attaque sanglante dans la nuit du 1er fevrier 2022 dans le site des deplaces de Plaine Savo, a Djugu, en Ituri . Nos sources contactees sur place renseignent que plusieurs civils ont ete lachement abattus par les les CODECO. D’apres Ndalo Bise, president du site attaque, les assaillants ont surgi brusquement la nuit du mardi a partir de 21 heures . « Nous comptons presentement plus de 60 morts dans les abris des deplaces » a-t-il declare a la presse locale. Les miliciens CODECO ont attaque dans la nuit du 01 Fevrier 2022 le site Plaine Savo a Djugu, en Ituri. Les premieres informations font etat d'environ 60 personnes massacrees a l'aide des machettes et autres armes blanches. Cette information est aussi confirmee par les sources administratives de la place qui precisent que pres de 60 personnes ont ete tuees. A en croire, le chef de la chefferie de Bahema N’adhere au-moins 59 personnes o

In [113]:
from tqdm import tqdm

In [155]:
# https://github.com/philipperemy/Stanford-OpenIE-Python
# Check this out to build better question/answer pairs.
def create_dpr_training_dataset(
    docs, retriever: BM25Retriever, num_hard_negative_ctxs: int = 30, nlp=None
):
    """Yield one DPR training sample per named entity found in each document."""
    if nlp is None:
        nlp = transformer_ner_pipeline  # NER pipeline defined above
    n_non_added_questions = 0
    n_questions = 0
    for doc in tqdm(docs):
        context = doc.content
        for details in build_question_answers_from_sentences(context, nlp):
            for entity in details["entities"]:
                answer = entity["word"]
                # The question is the context with the entity span removed
                question = context[: entity["start"]] + " " + context[entity["end"] :]
                hard_negative_contexts = get_hard_negative_context(
                    retriever=retriever,
                    question=question,
                    answer=answer,
                    n_ctxs=num_hard_negative_ctxs,
                )
                if not hard_negative_contexts:
                    print(f"No hard negative context retrieved for question: {question[:80]}")
                    n_non_added_questions += 1
                    continue
                n_questions += 1
                yield {
                    "question": question,
                    "answers": [answer],  # the DPR format stores answers as a list
                    "positive_ctxs": [{"text": context}],
                    "negative_ctxs": [],
                    "hard_negative_ctxs": hard_negative_contexts,
                }

In [105]:
qa_dpr_path = DATA_PATH.joinpath("raw", "french-qa", "DPR-news-with-mast.json")

In [106]:
assert qa_dpr_path.parent.exists()

In [111]:
import json

In [118]:
len(docs)

4719

In [156]:
dpr_results = create_dpr_training_dataset(docs, bm25_retriever, num_hard_negative_ctxs=30)

In [157]:

def write_retrieves_to_json(dpr_results, path):
    with open(path, "w") as json_file:
        json.dump(list(dpr_results), json_file, indent=4)

In [131]:
import spacy

spacy_pipeline = spacy.load("fr_dep_news_trf")



In [132]:

random_id = np.random.randint(0, len(docs))
sample_document = docs[random_id]

In [133]:
sample_document.content

'Selon le president du Conseil National de Suivi de l\'Accord et du Processus Electoral, "CNSA", les cachots de l\'Agence Nationale des Renseignements, "ANR" ont ete fermes . Information livree par Joseph Olenghankoy ce mercredi 27 mars 2019 via son compte Twitter . "Je peux me permettre d’attester que tous les cachots de l\'Agence Nationale de Renseignements en sigle, ANR, sont desormais fermes", a indique Joseph Olenghankoy . Pour rappel, le chef de l\'Etat Felix Antoine Tshisekedi Tshilombo avait promis de fermer les cachots de l\'ANR et d\'humaniser ce service de securite de la RD Congo . Le 19 mars dernier, Felix Tshisekedi avait nomme de nouveaux dirigeants a l\'Agence Nationale de Renseignements . A la tete de ce service de securite depuis 8 ans soit de 2011 a 2019, Kalev Mutond administrateur general a ete remplace par Justin Inzun Kakiat.'

In [134]:
def is_valid_sentence(sentence):
    """Check whether a sentence is usable for question generation.

    Keeping only the sentences that end with a period, question mark or
    exclamation mark and are 20 to 250 characters long does not lose much data.

    Args:
        sentence (str): the sentence text.

    Returns:
        bool: True if the sentence is valid, False otherwise.
    """
    return sentence.endswith((".", "?", "!")) and 20 <= len(sentence) <= 250

In [167]:
class Entity:
    def __init__(self, name, start, end, group):
        self.name = name
        self.start = start
        self.end = end
        self.group = group

    @classmethod
    def from_dict(cls, data):
        """Build an Entity from one dict returned by the NER pipeline.

        Args:
            data (dict): entity with the keys "word", "start", "end" and "entity_group".

        Returns:
            Entity: the corresponding entity.
        """
        return cls(
            data.get("word"), data.get("start"), data.get("end"), data.get("entity_group")
        )


class Sentence:
    def __init__(self, text):
        self.text = text
        self.entities = []  # filled by build_entities

    def generate_question_answer(self, entity_start, entity_end):
        return self.text[:entity_start] + "<MASK>" + self.text[entity_end:], self.text[entity_start:entity_end]

    def generate_question_answers(self):
        for entity in self.entities:
            yield self.generate_question_answer(entity.start, entity.end)

    def get_search_query(self, start, end):
        return self.text[:start] + " " + self.text[end:]

    def get_answer(self, entity_start, entity_end):
        return self.text[entity_start:entity_end]

    def get_hard_negative_context(self, retriever: BM25Retriever, n_ctxs: int = 15, entity_start: int = 0, entity_end: int = 0):
        question = self.get_search_query(entity_start, entity_end)
        answer = self.get_answer(entity_start, entity_end)
        list_hard_neg_ctxs = []
        # Query the retriever for candidates, then drop those that contain the answer
        retrieved_docs = retriever.retrieve(
            query=question, top_k=n_ctxs, filters={"name_to_not_match": answer}
        )
        for index, retrieved_doc in enumerate(retrieved_docs):
            retrieved_doc_text = retrieved_doc.content
            if answer.lower() in retrieved_doc_text.lower():
                continue
            list_hard_neg_ctxs.append(
                {"title": f"document_{index}", "text": retrieved_doc_text}
            )

        return list_hard_neg_ctxs

    def build_entities(self, ner_pipeline):
        """Run NER on the sentence and store the filtered named entities.

        Args:
            ner_pipeline: the pipeline that performs the NER.
        """
        entities = ner_pipeline(self.text)
        filtered_entities = filter_entities(entities)
        self.entities = [Entity.from_dict(entity) for entity in filtered_entities]

class DocumentContext:
    def __init__(self, content, spacy_pipeline=spacy_pipeline):
        self.context = content
        self.spacy_pipeline = spacy_pipeline
        self.spacy_doc = spacy_pipeline(content)
        self.sentences = []  # filled by generate_sentences

    def generate_sentences(self):
        """Build a Sentence object for every valid sentence of the document."""
        self.sentences = [
            Sentence(sentence.text) for sentence in self.split_document_in_sentence()
        ]

    def split_document_in_sentence(self):
        """Yield the valid sentences (spaCy spans) of the document context."""
        for sentence in self.spacy_doc.sents:
            if is_valid_sentence(sentence.text):
                yield sentence


In [166]:
context_data = build_question_answers_from_document(sample_document, transformer_ner_pipeline, spacy_pipeline)

the sentence start from 0 and ends at 36 and the text is Selon le president du Conseil National de Suivi de l'Accord et du Processus Electoral, "CNSA", les cachots de l'Agence Nationale des Renseignements, "ANR" ont ete fermes .
the sentence start from 36 and ends at 51 and the text is Information livree par Joseph Olenghankoy ce mercredi 27 mars 2019 via son compte Twitter .
the sentence start from 51 and ends at 83 and the text is "Je peux me permettre d’attester que tous les cachots de l'Agence Nationale de Renseignements en sigle, ANR, sont desormais fermes", a indique Joseph Olenghankoy .
the sentence start from 84 and ends at 116 and the text is rappel, le chef de l'Etat Felix Antoine Tshisekedi Tshilombo avait promis de fermer les cachots de l'ANR et d'humaniser ce service de securite de la RD Congo .
the sentence start from 116 and ends at 135 and the text is Le 19 mars dernier, Felix Tshisekedi avait nomme de nouveaux dirigeants a l'Agence Nationale de Renseignements .
the sen

In [159]:
sample_context = Context(context_data["context"], context_data.get("queries"))

In [162]:
context_data

{'context': 'Selon le president du Conseil National de Suivi de l\'Accord et du Processus Electoral, "CNSA", les cachots de l\'Agence Nationale des Renseignements, "ANR" ont ete fermes . Information livree par Joseph Olenghankoy ce mercredi 27 mars 2019 via son compte Twitter . "Je peux me permettre d’attester que tous les cachots de l\'Agence Nationale de Renseignements en sigle, ANR, sont desormais fermes", a indique Joseph Olenghankoy . Pour rappel, le chef de l\'Etat Felix Antoine Tshisekedi Tshilombo avait promis de fermer les cachots de l\'ANR et d\'humaniser ce service de securite de la RD Congo . Le 19 mars dernier, Felix Tshisekedi avait nomme de nouveaux dirigeants a l\'Agence Nationale de Renseignements . A la tete de ce service de securite depuis 8 ans soit de 2011 a 2019, Kalev Mutond administrateur general a ete remplace par Justin Inzun Kakiat.',
 'queries': [{'sentence': {'start': 0, 'end': 36},
   'entities': [{'entity_group': 'ORG',
     'score': 0.9619928,
     'word

In [160]:
sample_context.generate_sentences()

In [161]:
for sentence in sample_context.sentences:
    print(sentence.text)
    for question, answer in sentence.generate_question_answers():
        print(question , answer)
        print("***" * 10)

Selon le president du Conseil Nation
Selon le president du<MASK>  Conseil Nation
******************************
Selon le president du Conseil Nation<MASK> 
******************************
Selon le president du Conseil Nation<MASK> 
******************************
Selon le president du Conseil Nation<MASK> 
******************************
al de Suivi de 
al de Suivi de <MASK> 
******************************
al de Suivi de <MASK> 
******************************
l'Accord et du Processus Elector
l'Accord et du Processus Elector<MASK> 
******************************
l'Accord et du Processus Elector<MASK> 
******************************
l'Accord et du Processus Elector<MASK> 
******************************
l, "CNSA", les cachots de l'Agen
l, "CNSA", les cachots de<MASK>  l'Agen
******************************
l, "CNSA", les cachots de l'Agen<MASK> 
******************************
l, "CNSA", les cachots de l'Agen<MASK> 
******************************
ce Nationale des Re
<MASK>e ce Nationale des R
