### Functional block:
RAG

### Sub-component:
Extracting information from documents for embedding

### Objective of this notebook
Test the quality of table extraction with basic tools (here pdfplumber), through the ability of an LLM to answer questions whose answers are contained in these tables, or not (hallucination test).

The question is whether these basic tools, which are fast and inexpensive, are sufficient for our tables, or whether vision LLMs are needed.

#### Steps
* Convert the docx (PROJECT DOCUMENT MAHAKAM 2023-2025_balise) to PDF, to obtain a page-segmented structure and potentially extract the most complex tables as images
* Read the PDF and isolate the pages containing tables (in another notebook)
* Copy and paste those pages into the `chunk` variables below
* Prepare 2 groups of question (`q`) / answer (`real_answer`) pairs adapted to each table:
    1. Questions whose answer is not in the chunk (type=unrelated) -> the LLM must reply 'information not found'
    2. Questions whose answer is in the chunk (type=related) -> the LLM must give an answer matching `real_answer`
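
The page-isolation step is done in another notebook that is not shown here. As a rough sketch of what it could look like, assuming pdfplumber's `extract_text()` and `extract_tables()` page methods plus a caption heuristic (the names `TABLE_CAPTION`, `has_table_caption` and `pages_with_tables` are hypothetical and not taken from that notebook):

```python
import re

# Hypothetical heuristic: a page is a table candidate if one of its lines
# starts with a caption such as "Table 1." or "Table 3 *".
TABLE_CAPTION = re.compile(r"^\s*Table\s+\d+", re.MULTILINE)


def has_table_caption(page_text):
    """Return True if the extracted text has a line starting with 'Table <n>'."""
    return bool(TABLE_CAPTION.search(page_text or ""))


def pages_with_tables(pdf_path):
    """Return {page_number: extracted_text} for pages that likely hold a table."""
    import pdfplumber  # imported lazily so the helper above works without it

    selected = {}
    with pdfplumber.open(pdf_path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""
            # extract_tables() returns a list of detected tables (possibly empty)
            if page.extract_tables() or has_table_caption(text):
                selected[number] = text
    return selected
```

The text returned for each selected page is what would be pasted into a `chunk` variable. The caption check is only a fallback for pages where pdfplumber detects no table grid.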

#### Complexity and limits explored:
1. The LLM should be able to answer correctly (as judged by the `evaluation` column) questions about the tables with the simplest structure -> first tables tested
    * a priori it can, because columns and rows are clearly delimited by spaces and the data sequence is coherent

2. Can the LLM correctly answer questions about the tables with the most complex structures -> last tables tested
    * a priori no, because pdfplumber breaks the visual layout and the data sequence becomes incoherent

In [None]:
from ollama import chat
import pandas as pd

messages = [
    {"role": "user", "content": "You are a helpful assistant, you will support summarization and Q/A tasks"},
]

# mistral 7b int 4
model_name="mistral:latest"

def ask_ollama(messages):
  stream = chat(
      model=model_name,
      messages=messages,
      stream=True,
      options={
        "temperature": 0.2
      }
  )

  response=""
  for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
    response+=chunk['message']['content']
    
  return response

ask_ollama(messages)

 Absolutely! I'm here to help. If you have any questions or need assistance with something, feel free to ask. I can provide explanations, guidance, or answers based on the information available to me.

For example, if you want to know the capital of France, I can tell you that it is Paris. Or if you need help understanding a concept in mathematics, I can explain and provide examples. If you have a specific task or problem, I'll do my best to assist you with it. Just let me know what you need!

" Absolutely! I'm here to help. If you have any questions or need assistance with something, feel free to ask. I can provide explanations, guidance, or answers based on the information available to me.\n\nFor example, if you want to know the capital of France, I can tell you that it is Paris. Or if you need help understanding a concept in mathematics, I can explain and provide examples. If you have a specific task or problem, I'll do my best to assist you with it. Just let me know what you need!"

Q/A on automatically extracted tabular data excerpts

In [23]:
def process_questions(questions, chunk):

    i=0
    for question in questions:

        # Earlier prompt draft, kept for reference only; it is not sent to the model.
        _prompt=f"""
            Below is given a question and a context
            If the context does not provide information or related details, reply "information not found" and exit\n
            Otherwise reply briefly and precisely to the question

            Question:\n
            {question['q']}    
            
            Context:\n
            {chunk}
            
        """

        prompt=f"""
            Below is a question and a context. Your task is to answer the question based on the provided context.

            If the context does not provide the necessary information or related details to answer the question, reply with "information not found" and exit.
            Otherwise, reply briefly and precisely to the question.
            Question:
            {question['q']}

            Context:
            {chunk}

            Example:

            Question:
            What is the capital of France?

            Context:
            France is a country in Europe known for its rich history and culture. Paris is the capital and largest city of France.

            Answer:
            Paris
        """        

        messages = [
            # {"role": "user", "content": "You are a helpful assistant, you will provide support for Q/A tasks"},
            {"role": "user", "content": prompt}
        ]

        answer=ask_ollama(messages)
        question["llm_answer"]=answer

        if question["real_answer"] is not None:
            eval_prompt=f"""
                You are an evaluator tasked with determining whether a candidate answer matches a reference answer. Your task is to compare the candidate answer to the reference answer and output "Pass" if they match, and "Fail" if they do not.

                **Question:**
                {question['q']}

                **Reference Answer:**
                {question["real_answer"]}

                **Candidate Answer:**
                {answer}

                **Evaluation:**
                Does the candidate answer match the reference answer? Output "Pass" or "Fail".
            """

            answer=ask_ollama([{"role": "user", "content": eval_prompt}])
            if "pass" in answer.lower():
                answer="Pass"            
            elif "fail" in answer.lower():
                answer="Fail"

            question["evaluation"]=answer

        else:
            # Phrases that signal a "not found" answer (a comma was missing after "not explicitly provided").
            labels=["Information not found", "The context provided does not contain information", "The context does not provide information", "not explicitly provided",
                    "The provided context does not contain information", "There's no specific mention", "Is not explicitly stated", "Is not explicitly mentioned", "not explicitly stated"]
    
            eval_prompt=f"""
                You are an evaluator tasked with determining whether a sentence correctly indicates that the requested information was not found.
                If the sentence contains one of the phrases in this list of labels:\n {labels},\n output "Pass", otherwise output "Fail".

                Sentence to Evaluate:
                {answer}

                Evaluation:
                Does the sentence correctly indicate that the information is not found in the context? Output "Pass" if the sentence contains one of the labels, and "Fail" if not.
            """

            answer=ask_ollama([{"role": "user", "content": eval_prompt}])
            if "pass" in answer.lower():
                answer="Pass"            
            elif "fail" in answer.lower():
                answer="Fail"
                
            question["evaluation"]=answer

                
        i+=1
         

    return questions
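
For the unrelated questions, the function above asks the model itself whether the answer contains one of the `labels`. Since that is essentially a substring test, it could also be done deterministically. A minimal sketch, assuming the lower-cased marker phrases below (adapted from `labels`; the names `NOT_FOUND_MARKERS`, `is_not_found` and `evaluate_unrelated` are hypothetical and not part of the notebook):

```python
# Hypothetical marker phrases, adapted from the `labels` list and lower-cased.
NOT_FOUND_MARKERS = (
    "information not found",
    "does not contain information",
    "does not provide information",
    "not explicitly provided",
    "no specific mention",
    "not explicitly stated",
    "not explicitly mentioned",
)


def is_not_found(answer):
    """Return True if the answer contains one of the marker phrases."""
    text = " ".join(answer.lower().split())  # normalize case and whitespace
    return any(marker in text for marker in NOT_FOUND_MARKERS)


def evaluate_unrelated(answer):
    """Map an LLM answer to 'Pass' (refusal detected) or 'Fail'."""
    return "Pass" if is_not_found(answer) else "Fail"
```

This removes one LLM call per unrelated question and makes the verdict reproducible. It shares one limitation with the label-based prompt: an answer that contains a marker and then speculates anyway would still be counted as a pass.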

In [None]:
# see page 12 of the PDF PROJECT DOCUMENT MAHAKAM 2023-2025
chunk="""
    Planète Urgence | FOREST Programme
    contribute to reducing the risk of disasters, to the resilience of communities
    and ecosystems and to their adaptation to climate change.
    In Mahakam Delta, results from a study conducted in 2018 and 2019 by the
    Kutai Kartanegara District has shown that 47.8% of mangrove forests are
    deteriorated.
    Table 1. Critical Criteria of Mahakam Delta Mangrove3
    Critical Criteria Land Area (ha) Percentage
    Damaged 7,034 5.6
    Severe 52,945 42.2
    Undamaged 65,522 52.2
    Total 125,502 100.0
"""

questions=[
    {"type": "unrelated", "q": "What is the volume in liters of the oil discovery reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "How many poor people live in Mahakam Delta reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the surface in ha of restored land reported in this extract:", "real_answer": None},
    {"type": "related", "q": "What is the % deteriorated of forests reported in this extract:", "real_answer": 47.8},
    {"type": "related", "q": "What is the total surface of land area in Mahakam Delta reported in this extract:", "real_answer": 125502},
    {"type": "related", "q": "What is the % of severely damaged land reported in this extract:", "real_answer": 42.2},
    {"type": "related", "q": "What is the sum of damaged and severely damaged land area in Mahakam Delta reported in this extract:", "real_answer": (7034+52945)},
]



questions=process_questions(questions, chunk)
display(questions)
pd.DataFrame(questions)

 Information not found (in the provided context, there's no mention of oil discovery or volume) Pass Information not found (regarding the number of poor people living in Mahakam Delta) Pass The context does not provide information about the surface area (ha) of restored land. Pass42.2% Fail125,502 (ha) Pass42.2% Pass7,034 (damaged) + 52,945 (severely damaged) = 59,979 hectares Pass

[{'type': 'unrelated',
  'q': 'What is the volume in liters of the oil discovery reported in this extract:',
  'real_answer': None,
  'llm_answer': " Information not found (in the provided context, there's no mention of oil discovery or volume)",
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'How many poor people live in Mahakam Delta reported in this extract:',
  'real_answer': None,
  'llm_answer': ' Information not found (regarding the number of poor people living in Mahakam Delta)',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'What is the surface in ha of restored land reported in this extract:',
  'real_answer': None,
  'llm_answer': ' The context does not provide information about the surface area (ha) of restored land.',
  'evaluation': 'Pass'},
 {'type': 'related',
  'q': 'What is the % deteriorated of forests reported in this extract:',
  'real_answer': 47.8,
  'llm_answer': '42.2%',
  'evaluation': 'Fail'},
 {'type': 'related',
  'q': 'What is the total su

| | type | q | real_answer | llm_answer | evaluation |
|---|---|---|---|---|---|
| 0 | unrelated | What is the volume in liters of the oil discov... | | Information not found (in the provided contex... | Pass |
| 1 | unrelated | How many poor people live in Mahakam Delta rep... | | Information not found (regarding the number o... | Pass |
| 2 | unrelated | What is the surface in ha of restored land rep... | | The context does not provide information abou... | Pass |
| 3 | related | What is the % deteriorated of forests reported... | 47.8 | 42.2% | Fail |
| 4 | related | What is the total surface of land area in Maha... | 125502.0 | 125,502 (ha) | Pass |
| 5 | related | What is the % of severely damaged land reporte... | 42.2 | 42.2% | Pass |
| 6 | related | What is the sum of damaged and severely damage... | 59979.0 | 7,034 (damaged) + 52,945 (severely damaged) = ... | Pass |
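
To compare runs across tables, the per-question verdicts can be condensed into a pass count per question type. A minimal sketch in plain Python (the helper name `pass_rate_by_type` is hypothetical; the sample verdicts are copied from the page 12 results above):

```python
from collections import defaultdict


def pass_rate_by_type(questions):
    """Return {type: (passed, total)} from the evaluated question dicts."""
    counts = defaultdict(lambda: [0, 0])
    for question in questions:
        counts[question["type"]][1] += 1
        if question.get("evaluation") == "Pass":
            counts[question["type"]][0] += 1
    return {qtype: tuple(pair) for qtype, pair in counts.items()}


# Verdicts copied from the page 12 results shown above.
sample = [
    {"type": "unrelated", "evaluation": "Pass"},
    {"type": "unrelated", "evaluation": "Pass"},
    {"type": "unrelated", "evaluation": "Pass"},
    {"type": "related", "evaluation": "Fail"},
    {"type": "related", "evaluation": "Pass"},
    {"type": "related", "evaluation": "Pass"},
    {"type": "related", "evaluation": "Pass"},
]

print(pass_rate_by_type(sample))
# → {'unrelated': (3, 3), 'related': (3, 4)}
```

The same helper can be applied directly to the list returned by `process_questions`, since it only reads the `type` and `evaluation` keys.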


In [26]:
# p16
chunk="""
    Planète Urgence | FOREST Programme
    Table 2. Types of Land Cover and Land Use in Mahakam Delta5
    Types of Land Cover and Land Use
    Land Area (ha) Percentage
    in Mahakam Delta
    Mangrove Forest with Middle Density 11,053 8.8
    Mangrove Forest with High Density 49,885 39.7
    Industry 311 0.2
    Plantation 5,356 4.3
    Open Land 193 0.2
    Residential Area 889 0.7
    Rice Field 869 0.7
    Shrubs 2,080 1.7
    Commercial Fish Ponds 54,865 43.7
    Total 125,502 100.0
    Source: Result of SPOT satellite images 6/7 (2018)
    b) Demographyb) Demography
    The demographic issue must also be considered. Indeed, the announcement
    in 2019 of the relocation of the political and administrative capital of Jakarta
    to the province of East Kalimantan, between the cities of Balikpapan and
    Samarinda, suggests strong migrations, the development of infrastructures
    but also a growing demand for aquaculture products. By 2024, the
    Indonesian Minister of Planning hopes to transfer nearly 1.5 million publicofficials and political representatives in East Kalimantan.
    Delta Mahakam Ulu village, which belongs to Delta Mahakam district, is
    located in the northern part of the Mahakam Delta. The location of Delta
    Mahakam sub-district is close to the state-owned oil company (Pertamina),
    formerly VICO. Due to the proximity of a fairly large company, the
    community's economy is quite dynamic and the area offers a variety of jobs.
    However, the number of people who still carry out the traditional work offishermen and fish farmers is still quite high, especially in the coastal areas.
    Working as a fish farmer has become one of the choices of the community as
    land is available for opening ponds. The conversion of mangrove forests into
    ponds has been going on for decades, but the production of fish and shrimp
    has decreased from time to time. Based on various studies and research,
    planting a number of mangroves in ponds can improve the soil and waterquality in the ponds so that they can provide sustainable production. The
    5 Study of identification of levels of mangrove forest damaged for rebooting planning in
    Mahakam Delta in Kutai Kartanegara district. Jurnal “Gerbang Etam” Balitbangda Kab. Kukar
    Vol. 13 No. 1 Tahun 2019
"""

questions=[
    {"type": "unrelated", "q": "What is the volume in liters of the oil discovery reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "How many poor people live in Mahakam Delta reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the surface in ha of restored land reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the % deteriorated of forests reported in this extract:", "real_answer": None},    
    {"type": "unrelated", "q": "What is the % of severely damaged land reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the sum of damaged and severely damaged land area in Mahakam Delta reported in this extract:", "real_answer": None},
    {"type": "related", "q": "What types of land cover and land use in Mahakam Delta according to his extract:", "real_answer": """ Mangrove Forest with Middle Density, Mangrove Forest with High Density, Industry, Plantation, Open Land, Residential Area, Rice Field, Shrubs, Commercial Fish Ponds,"""},
    {"type": "related", "q": "What is the total surface of land area  reported in this extract:", "real_answer": 125502},
    {"type": "related", "q": "What is the % of residential area land reported in this extract:", "real_answer": 0.7},
    {"type": "related", "q": "What is the sum of shrubs and rice field land area reported in this extract:", "real_answer": (869+2080)},    
]



questions=process_questions(questions, chunk)
display(questions)
pd.DataFrame(questions)

 Information not found (regarding oil volume) Pass Information not found (regarding the number of poor people living in Mahakam Delta) Pass The context does not provide information about the surface area (ha) of restored land. Pass Information not found (regarding the deteriorated percentage of forests) Pass The context does not provide information about the percentage of severely damaged land. Pass Information not found (regarding damaged or severely damaged land area in Mahakam Delta) Pass Mangrove Forest with Middle Density, Mangrove Forest with High Density, Commercial Fish Ponds, Plantation, Open Land, Residential Area, Rice Field, Shrubs Pass125,502 ha (hectares) Pass0.7 (in percentage) Pass The context does not provide information about the sum of shrubs and rice field land area. Fail

[{'type': 'unrelated',
  'q': 'What is the volume in liters of the oil discovery reported in this extract:',
  'real_answer': None,
  'llm_answer': ' Information not found (regarding oil volume)',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'How many poor people live in Mahakam Delta reported in this extract:',
  'real_answer': None,
  'llm_answer': ' Information not found (regarding the number of poor people living in Mahakam Delta)',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'What is the surface in ha of restored land reported in this extract:',
  'real_answer': None,
  'llm_answer': ' The context does not provide information about the surface area (ha) of restored land.',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'What is the % deteriorated of forests reported in this extract:',
  'real_answer': None,
  'llm_answer': ' Information not found (regarding the deteriorated percentage of forests)',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q':

| | type | q | real_answer | llm_answer | evaluation |
|---|---|---|---|---|---|
| 0 | unrelated | What is the volume in liters of the oil discov... | | Information not found (regarding oil volume) | Pass |
| 1 | unrelated | How many poor people live in Mahakam Delta rep... | | Information not found (regarding the number o... | Pass |
| 2 | unrelated | What is the surface in ha of restored land rep... | | The context does not provide information abou... | Pass |
| 3 | unrelated | What is the % deteriorated of forests reported... | | Information not found (regarding the deterior... | Pass |
| 4 | unrelated | What is the % of severely damaged land reporte... | | The context does not provide information abou... | Pass |
| 5 | unrelated | What is the sum of damaged and severely damage... | | Information not found (regarding damaged or s... | Pass |
| 6 | related | What types of land cover and land use in Mahak... | Mangrove Forest with Middle Density, Mangrove... | Mangrove Forest with Middle Density, Mangrove... | Pass |
| 7 | related | What is the total surface of land area report... | 125502 | 125,502 ha (hectares) | Pass |
| 8 | related | What is the % of residential area land reporte... | 0.7 | 0.7 (in percentage) | Pass |
| 9 | related | What is the sum of shrubs and rice field land ... | 2949 | The context does not provide information abou... | Fail |


In [27]:
# p27
chunk="""
    Planète Urgence | FOREST Programme
    The objective is to increase the knowledge of the local community in
    implementing mangrove reforestation project.
     Target: 60 local communities in the 3 villages10.
     Trainers/facilitator: the team of PU, YML and POKJA Pesisir.
     Location: Delta Mahakam and Adang Bay, Paser.
     Materials:
    a. Introduction of the mangrove ecosystems and its importance
    b. Getting to know the type of mangrove species and its characteristic
    c. Seed and propagule selectionc. Seed and propagule selection
    d. Nursery preparation
    e. Seedling the propagules in the nursery
    f. Techniques of mangrove tree planting.
     Methods and media: PowerPoint presentation, propagule/seedlings.
     Strategy of implementation: coordination with the village officials and
    community leaders.
    2.3.3 Tree planting
    Objective: To reforest the dedicated plantation plots.
     Target of tree planting
    Table 3 *
    Number of trees
    Location
    1st Year 2nd Year 3rd Year
    75, 125,00 100,1st Year 2nd Year 3rd Year
    75, 125,00 100,
    Mahakam Delta 300.000 trees 000 0 000
    22, 37,50 30,
    -Direct method 500 0 000
    52, 87,50 70,
    -Indirect method 500 0 000
    155, 200,00 150,
    Adang Bay 505.000 trees 000 0 000
    50, 60,00 45,
    -Direct method 000 0 000
    105, 140,00 105,
    -Indirect method 000 0 000
    10 60 people joining training, but the total number of people involved in the tree
    planting is expected to be 100 (in the 2 project locations)(1) DELTA MAHAKAM (planting location 1 = Delta Mahakam Ulu, planting location
    2 = Tanjung Limau)
    (2) ADANG BAY
"""

questions=[
    {"type": "unrelated", "q": "What types of land cover and land use in Mahakam Delta according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the total surface of land area  reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the % of residential area land reported in this extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the sum of shrubs and rice field land area reported in this extract:", "real_answer": None},    
    {"type": "related", "q": "What is the target of tree planting in Mahakam Delta according to his extract:", "real_answer": 300000},
    {"type": "related", "q": "What is the target of tree planting in Adang bay according to his extract:", "real_answer": 505000},
    {"type": "related", "q": "What is the target of tree planting in the first year in Mahakam Delta according to his extract:", "real_answer": 75000},
    {"type": "related", "q": "What is the target of tree planting in the last year in Adang bay according to his extract:", "real_answer": 150000},
    {"type": "related", "q": "What is the target of tree planting in the second year in Adang bay with indirect method according to his extract:", "real_answer": 140000},
]



questions=process_questions(questions, chunk)
display(questions)
pd.DataFrame(questions)

 Mangrove reforestation project is being implemented in the Mahakam Delta and Adang Bay, with the objective to plant 300,000 trees in the first year in Mahakam Delta (planting locations 1 = Delta Mahakam Ulu, planting location 2 = Tanjung Limau) and 505,000 trees in the first year in Adang Bay. The types of land cover or use mentioned in the context are mangrove reforestation plots. Fail750,000 (the total number of trees planted in Delta Mahakam and Adang Bay) Fail The context does not provide information about the percentage of residential area land reported. Pass The provided context does not contain information about shrubs or rice field land area, so I cannot answer the question. Pass The target of tree planting in Mahakam Delta, according to the provided context, is 300,000 trees. Pass Adang Bay (in the context provided, it's one of the locations for tree planting) Fail100 trees in the Mahakam Delta (in the 1st year) according to Table 3 in the provided context. Fail155,000 trees 

[{'type': 'unrelated',
  'q': 'What types of land cover and land use in Mahakam Delta according to his extract:',
  'real_answer': None,
  'llm_answer': ' Mangrove reforestation project is being implemented in the Mahakam Delta and Adang Bay, with the objective to plant 300,000 trees in the first year in Mahakam Delta (planting locations 1 = Delta Mahakam Ulu, planting location 2 = Tanjung Limau) and 505,000 trees in the first year in Adang Bay. The types of land cover or use mentioned in the context are mangrove reforestation plots.',
  'evaluation': 'Fail'},
 {'type': 'unrelated',
  'q': 'What is the total surface of land area  reported in this extract:',
  'real_answer': None,
  'llm_answer': '750,000 (the total number of trees planted in Delta Mahakam and Adang Bay)',
  'evaluation': 'Fail'},
 {'type': 'unrelated',
  'q': 'What is the % of residential area land reported in this extract:',
  'real_answer': None,
  'llm_answer': ' The context does not provide information about the pe

| | type | q | real_answer | llm_answer | evaluation |
|---|---|---|---|---|---|
| 0 | unrelated | What types of land cover and land use in Mahak... | | Mangrove reforestation project is being imple... | Fail |
| 1 | unrelated | What is the total surface of land area report... | | 750,000 (the total number of trees planted in ... | Fail |
| 2 | unrelated | What is the % of residential area land reporte... | | The context does not provide information abou... | Pass |
| 3 | unrelated | What is the sum of shrubs and rice field land ... | | The provided context does not contain informa... | Pass |
| 4 | related | What is the target of tree planting in Mahakam... | 300000.0 | The target of tree planting in Mahakam Delta,... | Pass |
| 5 | related | What is the target of tree planting in Adang b... | 505000.0 | Adang Bay (in the context provided, it's one ... | Fail |
| 6 | related | What is the target of tree planting in the fir... | 75000.0 | 100 trees in the Mahakam Delta (in the 1st yea... | Fail |
| 7 | related | What is the target of tree planting in the las... | 150000.0 | 155,000 trees in Adang Bay (for the 1st year) | Pass |
| 8 | related | What is the target of tree planting in the sec... | 140000.0 | 105 people are expected to join training for t... | Fail |
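
In the results above, the question with reference answer 150000 received the candidate answer "155,000 trees in Adang Bay (for the 1st year)" and was still marked Pass by the LLM evaluator. For numeric reference answers, a deterministic comparison is one way to cross-check such verdicts. A minimal sketch, assuming commas are thousands separators and that any matching number in the answer counts as a hit (the names `NUMBER`, `extract_numbers` and `numeric_match` are hypothetical):

```python
import re

# Digits with optional thousands commas and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text):
    """Return every number found in the text as a float (commas removed)."""
    return [float(match.replace(",", "")) for match in NUMBER.findall(text)]


def numeric_match(real_answer, llm_answer, tolerance=1e-6):
    """Return True if any number in the LLM answer equals the reference value."""
    target = float(real_answer)
    return any(abs(value - target) <= tolerance for value in extract_numbers(llm_answer))


print(numeric_match(125502, "125,502 (ha)"))
# → True
print(numeric_match(150000, "155,000 trees in Adang Bay (for the 1st year)"))
# → False
```

This check only applies when `real_answer` is a number. Textual references such as the land-cover list would still need the LLM evaluator or another comparison method.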


In [28]:
# p35
chunk="""
    Planète Urgence | FOREST Programme
    process, mainly required by donors. This training will be conferred by
    the Administrative and Financial staff of Planète Urgence Indonesia.
    The training will be done in the first two months of the project and PU
    will continuously monitor the impact of the training to ensure the
    partner implement the PU standard.
    2. YML and Pokja Pesisir will also be helped in monitoring and project
    management. PU will share their experience and manual in monitoringand managing the project to the partner. The transfer knowledge will
    be done at the beginning of the project.
    3. Refreshment training of reforestation project particularly for Pokja
    Pesisir including technical issue such as establishing nursury, planting
    tree, identifying location
    4. Rapid Needs Assessment on Capacity Building to identify the needs of
    capacity building for the partners.
    The project will continuously identify further needs regarding the project orpartner structuration. Capacity building needs could also be supported by
    Planète Urgence’s volunteerism programme (if the organization and the
    logistic of the mission meet all required criteria).
    CAPACITY BUILDING PLAN
    Year 1 Year 2 Year 3
    No Training topic
    S1 S2 S1 S2 S1 S2
    Financial and accountability reporting of the x x x
    1 MAHAKAM project following PU’s process,
    mainly required by donors.
    Monitoring and project management. x x x
    2
    Reforestation project including the technical x x x3 issue: establishing nursery, planting tree,
    monitoring, etc.
    Rapid Need Assessment on Capacity x x x
    4 Building
    Table 4 *
    Roles and responsibilities: Activities will be implemented under the
    responsibility of Planète Urgence and its partners (Pokja Pesisir and YML).
    These partners will design, develop, implement and monitor the activities
    under their responsibility. Both implementing partners will play an important
    coordination role with partners, beneficiaries, community groups and localauthorities. Planète Urgence will support them in the project management
    and discussions with local authorities regarding governance strategy, project
    monitoring and evaluation, identification of capacity building needs for
"""

questions=[
    {"type": "unrelated", "q": "What is the target of tree planting in Adang bay according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the target of tree planting in the first year in Mahakam Delta according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the target of tree planting in the last year in Adang bay according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the target of tree planting in the second year in Adang bay with indirect method according to his extract:", "real_answer": None},

    {"type": "related", "q": "What is the CAPACITY BUILDING PLAN according to his extract:", "real_answer": "see source"},
    {"type": "related", "q": "What is the timeline for the CAPACITY BUILDING PLAN according to his extract:", "real_answer": "3 years"},
    {"type": "related", "q": "What is the finest time unit for the CAPACITY BUILDING PLAN according to his extract:", "real_answer": "Semester"},
    {"type": "related", "q": "What is the schedule of the Monitoring and project management according to his extract:", "real_answer": "S2 for all years"},
    {"type": "related", "q": "What is the schedule of the monitoring and project management for the last year according to his extract:", "real_answer": "S2"},
    {"type": "related", "q": "What is the planning of the Rapid Need Assessment on Capacity Building for the second year according to his extract:", "real_answer": "S1"},
]

questions=process_questions(questions, chunk)
display(questions)
pd.DataFrame(questions)

 The target of tree planting in Adang bay, according to the provided context, is not explicitly stated. However, it can be inferred that the partners (Pokja Pesisir and YML) will be involved in a reforestation project, including establishing nurseries, planting trees, monitoring, etc., as part of the capacity building plan. Pass The target of tree planting in the first year in Mahakam Delta, according to the provided context, is to conduct training on financial and accountability reporting, monitoring and project management, reforestation project including technical issues such as establishing nurseries, planting trees, and monitoring, and Rapid Needs Assessment on Capacity Building. Fail The target of tree planting in Adang bay last year, according to the provided context, is not explicitly stated. However, it can be inferred that the reforestation project mentioned in point 3 of the context could potentially include Adang bay as one of the locations for tree planting, but no specific

[{'type': 'unrelated',
  'q': 'What is the target of tree planting in Adang bay according to his extract:',
  'real_answer': None,
  'llm_answer': ' The target of tree planting in Adang bay, according to the provided context, is not explicitly stated. However, it can be inferred that the partners (Pokja Pesisir and YML) will be involved in a reforestation project, including establishing nurseries, planting trees, monitoring, etc., as part of the capacity building plan.',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'What is the target of tree planting in the first year in Mahakam Delta according to his extract:',
  'real_answer': None,
  'llm_answer': ' The target of tree planting in the first year in Mahakam Delta, according to the provided context, is to conduct training on financial and accountability reporting, monitoring and project management, reforestation project including technical issues such as establishing nurseries, planting trees, and monitoring, and Rapid Needs

| | type | q | real_answer | llm_answer | evaluation |
|---|---|---|---|---|---|
| 0 | unrelated | What is the target of tree planting in Adang b... | | The target of tree planting in Adang bay, acc... | Pass |
| 1 | unrelated | What is the target of tree planting in the fir... | | The target of tree planting in the first year... | Fail |
| 2 | unrelated | What is the target of tree planting in the las... | | The target of tree planting in Adang bay last... | Pass |
| 3 | unrelated | What is the target of tree planting in the sec... | | Planting trees using indirect methods in Adan... | Fail |
| 4 | related | What is the CAPACITY BUILDING PLAN according t... | see source | The Capacity Building Plan, as per the provid... | Fail |
| 5 | related | What is the timeline for the CAPACITY BUILDING... | 3 years | The Capacity Building Plan according to the p... | Pass |
| 6 | related | What is the finest time unit for the CAPACITY ... | Semester | The finest time unit for the CAPACITY BUILDIN... | Fail |
| 7 | related | What is the schedule of the Monitoring and pro... | S2 for all years | The schedule for Monitoring and project manag... | Fail |
| 8 | related | What is the schedule of the monitoring and pro... | S2 | The schedule for monitoring and project manag... | Fail |
| 9 | related | What is the planning of the Rapid Need Assessm... | S1 | The planning for the Rapid Need Assessment on... | Fail |


In [29]:
# page 41
chunk="""
    Planète Urgence | FOREST Programme
    local communities and the public stakeholders concerned (villages and
    districts) in order to replicate these activities with the other villages of the
    Mahakam Delta.
    The tree protection from palm tree leaves and net that successfully works to
    protect tree in previous project will be documented and spread to the key
    restoration actors for possible replication in their project. In addition, PUIndonesia will manage and distribute the publication and knowledge from the
    project to other parties. Module and material environmental education
    campaign will be used in the scale up project or other locations.
    Table 3. Summary of our lessons learned, and best practices developed on
    previous projects in the Mahakam Delta
    <table>
    Activities Best practices Lessons learned Change maker in
    (+) (-) this new project
    Raising awareness 1/Raising Not all teachers 1 and 2) Thein schools awareness in have creativity in project will apply
    school generally teaching, so similar method
    get support from intensive and approach to
    assistance is the schools’
    the schools. 2)
    needed teachers and
    Animal costumes
    students. 3) In
    are very attractive
    addition, in this
    for elementary
    pilot phase, PU will
    schools' children.
    elaborate the
    3) The successful
    possibility to
    of YML and Pokja
    integrate the
    Pesisir introduction
    environmental
    environmental education inenvironmental
    environmental education in
    education in elementary school
    Primary and curriculum.
    Secondary
    Teachers have a
    Schools. Teachers
    strategic position
    had just realized
    to foster
    that they live in an
    environmental
    important
    awareness.
    mangrove
    Teachers can act
    ecosystem. There as very effective
    is a desire to learn agents of change
    more about the for increasing
    mangrove environmental
    ecosystem and a awareness
    want to teach in
    schools.
    Students have a
    high interest inschools.
    Students have a
    high interest in
    learning directly inPlanète Urgence | FOREST Programme
    nature
    Public awareness A public 1) Compared than PU will facilitate
    awareness MERCI project, the and encourage
    campaign using relationship implementing
    local examples is between PU and partners to
    an interesting partner with local establish
    thing for the media in collaboration with
    community MAHAKAM project the local media.
    was weak. 2) With PU will organize
    the legal status of informal meetings
    PU in Indonesia, it with local mediaPU in Indonesia, it with local media
    is quite difficult for and support the
    PU Indonesia to partners by
    actively share the providing data and
    progress and information to
    achievement of enrich the news
    the project to the coverage by local
    local media. media.
    On the other hand,
    the government
    that has a vision
    for environmental
    conservation plays
    an important role
    in raising public
    awareness
    Sustainable The fish farmer The impact of The project willaquaculture group in Delta sustainable continue work with
    (demonstration Mahakam Ulu has aquaculture the fish farmer to
    plots etc.) a strong trust with intervention in the intervene and
    PU and YML. The aquaculture pond monitor the
    previous project do not yet impact. PU,
    uses their ponds document well partner and fish
    as demonstration although the fish farmer will discuss
    plot. farmer has felt the key parameters to
    positive impact of be monitored and
    People will receivePeople will receive
    the intervention. collaborate with
    information easily
    the PU
    if they see
    communication
    examples directly
    officer to manage
    in the field. The
    the information
    government
    from the pond.
    (BRGM) replicates
    mangrove planting
    in ponds based on
    several practices
    carried out by the
    community
    including demoPlanète Urgence | FOREST Programme
    plots
    Training of Small business There is still social
    community groups actors and jealousy between
    particularly especially women group members.
    women group on need to have Field facilitators
    the financial financial literacy. need to have
    management and Women realize conflict
    product marketing that money for management skills
    business needs to
    be separated from
    money for family.
    Reforestation 1) Several 1) PU will train the
    practical guidance implementingpractical guidance implementing
    of reforestation for partners and
    community has villagers using the
    been developed in practical guidance
    previous project. for reforestation
    2) In general, the project during the
    community in early visit to the
    Delta Mahakam location project
    Ulu and Adang Bay and prior to the
    has capacity in nursery process.
    reforestation 2) PU will
    project that they coordinate with
    gain from the villagers whether
    previous project it is necessary toprevious project it is necessary to
    both from PU and surrounding the
    other parties. 3) planted tree with
    Sustainable wave wave breakers.
    breaker in Muara
    Ulu Kecil prove
    effectively prevent
    the wave in the
    previous project.
    Local governance 1) PU Indonesia 1) Together with
    support activities has been joining in the coalition of
    the Mahakam Delta Mahakam,
    Delta coalition PU Indonesia will
    since 2019. This voice the opinion
    informal coalition of Delta Mahakam
    aims to coordinate stakeholders onaims to coordinate stakeholders on
    the activities of the future of Delta
    institutions in Mahakam after the
    delta, to moving of new
    complement each capital to the East
    other's, to avoid Kalimantan. 2) Will
    overlapping use the guidance
    activities and to of FPIC for thisPlanète Urgence | FOREST Programme
    reach the common pilot project.
    goals in Delta
    Mahakam.
    2) PU and Bioma
    have developed
    practical guidance
    on how to conduct
    FPIC in
    reforestation
    projects.
    Relationship with PU and The informal
    local partners implementing coordination will
    partners have be conducted to
    strong ties with those authority to
    the local authority gain technical and
    such as Forest administrative
    Management Unit support from
    and Conservation them.
    agencies.and Conservation them.
    agencies.
    Capacity building YML has really YML still has a lack PU Indonesia will
    of our local good knowledge of capacity and provide tools to
    partners and technical skills initiative to YML and Pokja
    in reforestation conduct data Pesisir to conduct
    and sylvofishery. monitoring related data collection and
    to sylvofishery and regularly
    Pokja Pesisir has
    is reluctant to monitoring the
    good
    work with implementation.
    communication
    university.
    with keycommunication
    university.
    with key
    stakeholders at Pokja Pesisir in
    the village and other side has still
    region levels lacks experience
    (BKSDA) in reforestation
    skills, particularly
    data collection.
    Hiring consultant Local consultant 1) The English Clear objectives
    has better skill, writing and and indicators are
    understanding on speaking, of local needed to guide
    local social and consultant is the local
    environmental relatively poor. 2) consultant.
    context The local Moreover, a rapidcontext The local Moreover, a rapid
    consultant internal
    apparently does assessment is
    not have a 100 needed to
    percent freedom complement the
"""


questions = [
    # Unrelated questions: the answers are not in this chunk,
    # so the LLM is expected to reply 'information not found'.
    {"type": "unrelated", "q": "What is the timeline for the CAPACITY BUILDING PLAN according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the finest time unit for the CAPACITY BUILDING PLAN according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the schedule of the Monitoring and project management according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the schedule of the monitoring and project management for the last year according to his extract:", "real_answer": None},

    # Related questions: the answers are in this chunk and should match `real_answer`.
    {"type": "related", "q": "What are the main activites according to his extract:", "real_answer": "Raising awarenessin schools, Public awareness, Sustainable aquaculture, Training of community groups, Reforestation, Local governance support activities, Relationship with local partners, Capacity building of our local partners, Hiring consultant"},
    {"type": "related", "q": "What are the best practices for Public awareness according to his extract:", "real_answer": "A public awareness campaign using local examples is an interesting thing for the community"},
    {"type": "related", "q": "What are Lessons learned for Raising awareness in schools according to his extract:", "real_answer": "Not all teachers have creativity in teaching, so intensive assistance is needed"},
]


# Run every question against the chunk; each entry comes back with
# `llm_answer` and `evaluation` filled in (see the output below).
questions = process_questions(questions, chunk)
display(questions)
pd.DataFrame(questions)

 The timeline for the CAPACITY BUILDING PLAN is not explicitly provided in the context. Pass The finest time unit for the CAPACITY BUILDING PLAN is not explicitly stated in the provided context. Pass The schedule or timeline for the Monitoring and project management is not explicitly stated in the provided context. However, it can be inferred that Planète Urgence (PU) will facilitate and encourage partnerships with local communities and media to implement environmental education campaigns, train small business community groups, conduct reforestation activities, and capacity building initiatives. The timeline for these activities is not specified. Pass The schedule or timeline for monitoring and project management according to the provided context is not explicitly stated. However, it can be inferred that Planète Urgence (PU) will facilitate and encourage collaboration with local partners, implement practical guidance for reforestation, conduct FPIC (Free, Prior, and Informed Consent), 

[{'type': 'unrelated',
  'q': 'What is the timeline for the CAPACITY BUILDING PLAN according to his extract:',
  'real_answer': None,
  'llm_answer': ' The timeline for the CAPACITY BUILDING PLAN is not explicitly provided in the context.',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'What is the finest time unit for the CAPACITY BUILDING PLAN according to his extract:',
  'real_answer': None,
  'llm_answer': ' The finest time unit for the CAPACITY BUILDING PLAN is not explicitly stated in the provided context.',
  'evaluation': 'Pass'},
 {'type': 'unrelated',
  'q': 'What is the schedule of the Monitoring and project management according to his extract:',
  'real_answer': None,
  'llm_answer': ' The schedule or timeline for the Monitoring and project management is not explicitly stated in the provided context. However, it can be inferred that Planète Urgence (PU) will facilitate and encourage partnerships with local communities and media to implement environmental education

Unnamed: 0,type,q,real_answer,llm_answer,evaluation
0,unrelated,What is the timeline for the CAPACITY BUILDING...,,The timeline for the CAPACITY BUILDING PLAN i...,Pass
1,unrelated,What is the finest time unit for the CAPACITY ...,,The finest time unit for the CAPACITY BUILDIN...,Pass
2,unrelated,What is the schedule of the Monitoring and pro...,,The schedule or timeline for the Monitoring a...,Pass
3,unrelated,What is the schedule of the monitoring and pro...,,The schedule or timeline for monitoring and p...,Pass
4,related,What are the main activites according to his e...,"Raising awarenessin schools, Public awareness,...","Raising awareness, reforestation, training of...",Pass
5,related,What are the best practices for Public awarene...,A public awareness campaign using local exampl...,1) Raising awareness: Compared to the PU proje...,Fail
6,related,What are Lessons learned for Raising awareness...,"Not all teachers have creativity in teaching, ...",1) Raising awareness in schools should be crea...,Fail
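
To read this table at a glance, the pass rate can be computed per question type. The sketch below is a minimal example that re-enters the `type` and `evaluation` columns shown above by hand; in the notebook, the same `groupby` could presumably be applied directly to `pd.DataFrame(questions)`. The names `results` and `pass_rate` are illustrative only.

```python
import pandas as pd

# `type` and `evaluation` values copied from the table above (page 41 chunk).
results = pd.DataFrame({
    "type": ["unrelated"] * 4 + ["related"] * 3,
    "evaluation": ["Pass", "Pass", "Pass", "Pass", "Pass", "Fail", "Fail"],
})

# Share of "Pass" per question type.
pass_rate = (
    results.assign(passed=results["evaluation"].eq("Pass"))
    .groupby("type")["passed"]
    .mean()
)
print(pass_rate)
```

On this chunk the hallucination test passes for all four unrelated questions, while only one of the three related questions is answered correctly.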


In [30]:
# Section 11 of the project document: results framework table
chunk="""
    Planète Urgence | FOREST Programme
    <table>
    11. Results framework
    Baseline Target Final
    PERFORMANCE SOURCE OF
    EXPECTED RESULTS
    INDICATORS VERIFICATION
    Value Year 2023 2024 2025
    Output 1: Raising the Number of teachers (F/M) Documentation 50 2022 50 50 50 150
    awareness of trained activity
    stakeholders to Activity Report
    conserve the List of
    Environment and participants
    biodiversity in Delta
    Number of volunteers Documentation 0 2022 6 6 6 18
    Mahakam & Teluk
    involved in school activityMahakam & Teluk
    involved in school activity
    Adang sustainably.
    campaign Activity Report
    List of volunteers
    Number of schools Documentation 4 2022 4 3 5 6
    reached by the activity
    awareness campaign Activity Report
    List of
    participants
    Number of students Documentation 200 2022 400 250 400 1050
    involved in campaign activity
    Activity Report
    List of
    participants
    Number of schools Documentation 0 2022 1 1 2 4
    assisted in Adiwiyata activity
    program Activity ReportPlanète Urgence | FOREST Programme
    List of
    participants
    MoU Document
    Number of local Documentation 0 2022 100 a. 100 100 a. 200
    communities reached by activity persons person person persons
    waste management Activity Report b. 20 s b. 20
    campaign
    List of User User
    participants Account Youtub
    Youtube e
    Account
    Number of Billboard for Documentation 2 2022 0 2 0 2
    environmental campaign Plot location in
    the map
    Number of Book Book Publication 1 2022 1 book 1 book 2 bookspublication on best Activity Report 30 30 60
    practices related to copies copies Copies
    microfinance & nurseries
    maintenance
    Number of pepople Documentation 1000 2022 50 50 50 150
    reached in Public activity indirect Direct Direct Direct Direct
    Awareness campaign Activity Report 300 300 300 900
    through media
    Media coverage Indirect Indirect Indirec Indirect
    (online/offline)
    t
    Number of Journalist Trip Media coverages 10 2022 5 media 5 media 5 15
    (offline/online) coverag coverag media media(offline/online) coverag coverag media media
    es es covera coverag
    ges esPlanète Urgence | FOREST Programme
    Output 2: Rehabilitate Number of Study on land Documentation 1 2022 1 N/A N/A 1
    degraded mangrove suitability for mangrove activity
    forest reforestation Activity Report
    Report Document
    Number of Biodiverity Documentation 1 2021 1 N/A 1 2
    research to discover activity
    linkage beetween habitat Activity Report
    restoration and existing of
    Report Document
    endangered species
    Number of nurseries Documentation 3 2022 5 6 6 6
    Established activity
    Activity ReportEstablished activity
    Activity Report
    Nurseries
    database
    monitoring
    Number of local Documentation 100 2022 100 100 100 300
    communities involved in activity
    mangrove nurseries Activity Report
    List of
    participants
    Number of mangroves Documentation 150.000 2022 230.000 325.00 250.0 805.00
    planted activity 0 00 0
    Activity Report
    Database
    planting
    monitoring
    Map planting
    distribution
    Number of local Documentation 60 2022 50 60 60 170Planète Urgence | FOREST Programme
    communities trained in activity
    Mangrove Reforestation Activity Report
    List of
    participants
    Number of trainings Documentation 3 2022 3 3 3 9
    conducted activity
    Activity Report
    List of
    participants
    Number of local Documentation 0 2022 8 8 8 8
    communities participated activity
    in tree monitoring Activity Report
    List of
    participants
    Database
    planting growth
    monitoring
    Output Support the Number of communities Documentation 40 2022 40 40 40 120development of participated in sustainable activity
    sustainable livelihood aquaculture training Activity Report
    List of
    participants
    Number of sustainable Documentation 1 2022 2 4 5 5
    acquaculture demoplot activity
    through silvofishery Activity Report
    scheme
    List of
    participants
    Number of women groups Documentation 4 2022 3 groups 3 3 3
    trained and assisted in activity groups 30 groups groups groupsPlanète Urgence | FOREST Programme
    developing fish farmer Activity Report 26 women 30 30 30
    product List of women women women women
    participants
    Database women
    group income
    generating
    Output Strengthening Number of local Documentation 2 2022 2 2 2 6
    the governence of communities participated activity villages villages villages village villages
    Delta Mahakam & in FPIC Activity Report 40 40 40 s 120
    Adang Bay List of persons persons persons 40
    participants person
    sparticipants person
    s
    Number of meetings with Documentation 4 2022 4 4 4 12
    key stakeholder activity
    Minutes of
    Meeting
    List of
    participants
    </table>
    12. Monitoring & evaluation
    The monitoring and evaluation of the project activities, risks, knowledge management, communication and others will
    be done internaly through:
    a.Monthly narrative and financial report
    b.Quarterly narrative and financial report
    c. Monthly coordination meeting with the partners.d.Biweekly coordination meeting with the PU Paris.
"""


questions = [
    # Unrelated questions (reused from the previous tables): the answers are not in this chunk,
    # so the LLM is expected to reply 'information not found'.
    {"type": "unrelated", "q": "What are the main activites according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What are the best practices for Public awareness according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What are Lessons learned for Raising awareness in schools according to his extract:", "real_answer": None},
    {"type": "unrelated", "q": "What is the schedule of the monitoring and project management for the last year according to his extract:", "real_answer": None},

    # Related questions: the answers are in this chunk and should match `real_answer`.
    {"type": "related", "q": "Provide a summary of the resultst framework according to his extract:", "real_answer": "see source"},
    {"type": "related", "q": "What are the expected results according to his extract:", "real_answer": "Output 1: Raising the awareness of stakeholders to conserve the Environment and biodiversity in Delta Mahakam & Teluk Adang sustainably. Output 2: Rehabilitate degraded mangrove forest. Output Support the development of sustainable livelihood. Output Strengthening the governence of Delta Mahakam & Adang Bay."},
    {"type": "related", "q": "What are the KPI for the Output Support the development of sustainable livelihood according to his extract:", "real_answer": "Number of communities participated in sustainable aquaculture training. Number of sustainable acquaculture demoplot through silvofishery scheme. Number of women groups trained and assisted in developing fish farmer product."},
    {"type": "related", "q": "What is the final number of schools assisted in Adiwiyata program according to his extract:", "real_answer": 4}
]


# Run every question against the chunk; each entry comes back with
# `llm_answer` and `evaluation` filled in (see the output below).
questions = process_questions(questions, chunk)
display(questions)
pd.DataFrame(questions)

1. Raising awareness of trained teachers to conserve biodiversity in Delta, involving volunteers in school activities Mahakam & Teluk, Adang sustainably, conducting an awareness campaign, reaching schools and students with the campaign, assisting in Adiwiyata activity program, reaching local communities with waste management campaigns, environmental campaign through media, conducting a study on land degraded mangrove suitability for forest reforestation, rehabilitating degraded mangroves, establishing and monitoring mangrove nurseries, training local communities in mangrove reforestation, developing sustainable livelihood aquaculture, supporting the development of participating communities through sustainable livelihood aquaculture training, strengthening the governance of communities in Delta Mahakam & Adang Bay. Fail The context does not provide specific best practices for public awareness, but it does show methods used to raise awareness such as:

1. Conducting awareness campaigns t

[{'type': 'unrelated',
  'q': 'What are the main activites according to his extract:',
  'real_answer': None,
  'llm_answer': '1. Raising awareness of trained teachers to conserve biodiversity in Delta, involving volunteers in school activities Mahakam & Teluk, Adang sustainably, conducting an awareness campaign, reaching schools and students with the campaign, assisting in Adiwiyata activity program, reaching local communities with waste management campaigns, environmental campaign through media, conducting a study on land degraded mangrove suitability for forest reforestation, rehabilitating degraded mangroves, establishing and monitoring mangrove nurseries, training local communities in mangrove reforestation, developing sustainable livelihood aquaculture, supporting the development of participating communities through sustainable livelihood aquaculture training, strengthening the governance of communities in Delta Mahakam & Adang Bay.',
  'evaluation': 'Fail'},
 {'type': 'unrelated

Unnamed: 0,type,q,real_answer,llm_answer,evaluation
0,unrelated,What are the main activites according to his e...,,1. Raising awareness of trained teachers to co...,Fail
1,unrelated,What are the best practices for Public awarene...,,The context does not provide specific best pr...,Pass
2,unrelated,What are Lessons learned for Raising awareness...,,1. Raising awareness in schools:\n - The nu...,Fail
3,unrelated,What is the schedule of the monitoring and pro...,,The monitoring and project management for the...,Fail
4,related,Provide a summary of the resultst framework ac...,see source,The Results Framework outlines expected resul...,Pass
5,related,What are the expected results according to his...,Output 1: Raising the awareness of stakeholder...,The expected results according to the extract...,Fail
6,related,What are the KPI for the Output Support the de...,Number of communities participated in sustaina...,The KPIs for Output Supporting the developmen...,Pass
7,related,What is the final number of schools assisted i...,4,"6 (according to the provided context, the numb...",Pass
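
Row 7 is worth a second look: the expected answer is 4, the displayed LLM answer starts with 6, and the row is still marked Pass (the answer is truncated in the display, so the full text may say more). However the `evaluation` column is produced, a deterministic check could complement it for questions whose `real_answer` is a number. The sketch below is one possible approach, not the notebook's evaluation method; `first_number` and `numeric_match` are hypothetical helpers, and comparing only the first integer in the answer is a simplifying assumption.

```python
import re


def first_number(text):
    """Return the first integer found in `text`, or None if there is none."""
    match = re.search(r"\d+", str(text))
    return int(match.group()) if match else None


def numeric_match(real_answer, llm_answer):
    """True when the first integer in the LLM answer equals the expected one."""
    expected = first_number(real_answer)
    return expected is not None and expected == first_number(llm_answer)


# Row 7 of the table above: expected 4, answer as truncated in the display.
print(numeric_match(4, "6 (according to the provided context, the numb..."))  # prints False
```

Values written with separators, such as `150.000` in this chunk, would need extra normalization before a check like this could be trusted.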


### Conclusion
As the tables grow more complex, the quality of the pdfplumber extraction degrades, and the LLM fails to identify the right word sequences and to answer correctly.

### Next step
Put the data extracted by a vision LLM into `chunk` and rerun the tests.
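
A possible shape for this step, as a sketch only: render the PDF page to an image, ask a vision model to transcribe the table, and pass the result to the same `process_questions` pipeline. The model name `llava`, the prompt wording, the file paths, and the helper names `build_vision_messages` and `extract_chunk_with_vision_llm` are assumptions, not choices made in this notebook. The `images` field on a message is how the `ollama` Python client accepts image inputs, and `to_image(...).save(...)` is pdfplumber's page rendering call.

```python
def build_vision_messages(image_path, prompt=None):
    """Build the message list for a vision model; `images` carries the page image."""
    if prompt is None:
        # Assumed prompt wording; adjust to the tables being tested.
        prompt = (
            "Transcribe the table in this page as a Markdown table. "
            "Keep every row and column; do not add information."
        )
    return [{"role": "user", "content": prompt, "images": [image_path]}]


def extract_chunk_with_vision_llm(pdf_path, page_number, model="llava"):
    """Render one PDF page to PNG and ask a vision model to transcribe it."""
    # Imported lazily so the message helper can be used without these packages.
    import pdfplumber
    from ollama import chat

    image_path = f"page_{page_number}.png"
    with pdfplumber.open(pdf_path) as pdf:
        # pdfplumber pages are 0-indexed; page_number is 1-indexed here.
        pdf.pages[page_number - 1].to_image(resolution=150).save(image_path)

    response = chat(model=model, messages=build_vision_messages(image_path))
    return response["message"]["content"]


# Hypothetical usage (path and page number are placeholders):
# chunk = extract_chunk_with_vision_llm("project_document.pdf", 41)
# questions = process_questions(questions, chunk)
```

Keeping the questions and `real_answer` values unchanged between the two runs would make the pdfplumber and vision LLM results directly comparable.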