Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal

Created: 2024-10-24

Data

input

  • AQuA.csv (AQuA-RAT)

  • RQA.csv

    • original: https://github.com/realtimeqa/realtimeqa_public

    • Processing

      • Used data from 2023/11/01 onwards.
      • Removed the <a> tags from evidence and created a column evidence_wo_url.
      • Used the following prompt to have gpt-4o-mini determine whether each question could be answered based solely on the given context; only questions judged answerable were kept (a minimal sketch of this filtering call appears after the input file list):
        Follow the instructions and output your response.
        
        ## Instructions
        - Return True if you can answer the question based solely on the context; otherwise, return False.
        - Respond in JSON format as {{'output': bool}}.
        
        ## Context
        {context}
        
        ## Question
        {question}
        
        ## Answer to the Question
        {answers}
        
  • UQA.csv

    • 100 questions generated by an LLM from universal documents such as the Universal Declaration of Human Rights, used as data for which the model potentially has background knowledge.
  • MskCal.txt ... Calculation problems presented as descriptive questions.
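
For reference, the sketch below shows one way the RQA.csv answerability filter could be implemented against gpt-4o-mini using the prompt shown above. It is a minimal illustration, not the repository's actual code: the function name is_answerable, the use of the OpenAI Python SDK (v1+) with its JSON response mode, and the temperature setting are assumptions.

```python
# Minimal sketch (not the repository's code) of the RQA.csv answerability filter:
# ask gpt-4o-mini whether a question can be answered from the given context alone.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """Follow the instructions and output your response.

## Instructions
- Return True if you can answer the question based solely on the context; otherwise, return False.
- Respond in JSON format as {{'output': bool}}.

## Context
{context}

## Question
{question}

## Answer to the Question
{answers}
"""

def is_answerable(context: str, question: str, answers: str) -> bool:
    """Return True only if gpt-4o-mini judges the question answerable from the context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(context=context,
                                                     question=question,
                                                     answers=answers)}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    result = json.loads(response.choices[0].message.content)
    return bool(result.get("output", False))
```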

output

  • Common File Contents for Each Folder

    • [QA_TYPE]_gemma2_9b_*.csv ... Response results for each masking rate (MR) under regular masking.
    • [QA_TYPE]_gemma2_9b_*_filtered.csv ... Response results for each MR under partial masking.
    • [QA_TYPE]_gemma2_9b_*_no_meaning.csv ... Response results for each MR under strict masking.
    • [QA_TYPE]_gemma2_9b_*_no_verb.csv ... Response results for each MR under lenient masking.
    • [QA_TYPE]_gemma2_9b_tmp.csv, [QA_TYPE]_step1.csv, [QA_TYPE]_step2.csv, [QA_TYPE]_step2_use.csv, [QA_TYPE]_step4.csv ... Intermediate processing files used in the notebook steps.
  • AQuA_case1

    • results of notebook/AQA_case1_gpt-4o-mini.ipynb
      • Case 1: remove the last line of rationale text and do not perform numerical conversion
  • AQuA_case2

    • results of notebook/AQA_case2_gpt-4o-mini.ipynb
      • Case 2: provide rationale and perform numerical conversion
  • AQuA_case3_gpt-4o-mini-2024-07-18

    • results of notebook/AQA_case3_gpt-4o-mini.ipynb
      • Case 3: do not provide rationale and do not perform numerical conversion
  • AQuA_case3_gpt-4o-2024-08-06

    • results of notebook/AQA_case3_gpt-4o.ipynb
      • Case 3: do not provide rationale and do not perform numerical conversion
      • Use gpt-4o-2024-08-06
      • Use files from AQuA_case3_gpt-4o-mini-2024-07-18 for steps 1 to 3
  • MskCal

    • results of notebook/MskCal.ipynb
    • The introduction part is also masked.
  • MSKCAL_gpt-4o-2024-08-06

  • MSKCAL_gpt-4o-mini-2024-07-18

    • results of notebook/MskCal.ipynb
      • results for MR: 0%
  • MskCal_2

    • results of notebook/MskCal_2.ipynb
    • The introduction part is not masked.
  • RQA

    • results of notebook/RQA.ipynb
  • RQA_llama3.2

    • results of notebook/RQA_llama3.2.ipynb
  • UQA

    • results of notebook/UQA.ipynb
  • UQA_llama3.2

    • results of notebook/UQA_llama3.2.ipynb

LLM

Masking Task:

Decoding Task:

notebook

Common Steps

  • Step 1

    • Create a word list by using spaCy to segment the question, context, and options into morphemes (a simplified sketch of Steps 1 and 4 appears after this list).
  • Step 2

    • Generate the meanings of the words in the word list using the masking task LLM.
  • Step 3

    • Convert numerical values.
  • Step 4

    • Select words to be masked based on the masking rate.
  • Step 5

    • Generate answers to the masked questions using the decoding task LLM.
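
As a rough illustration of Steps 1 and 4, the sketch below builds a word list with spaCy and masks a fraction of the words at a given masking rate (MR). It is an outline under stated assumptions rather than the notebooks' code: the en_core_web_sm model, the [MASK] token, and the random choice of which words to mask are illustrative only, and the notebooks' actual selection additionally distinguishes the masking variants (regular, partial, strict, lenient) listed under output.

```python
# Illustrative sketch of Steps 1 and 4 (not the notebooks' actual code):
# segment text into words with spaCy, then mask a fraction of them at a given MR.
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the notebooks may use a different one


def build_word_list(text: str) -> list[str]:
    """Step 1 (simplified): segment text into word tokens, skipping punctuation and spaces."""
    return [tok.text for tok in nlp(text) if not tok.is_punct and not tok.is_space]


def mask_text(text: str, masking_rate: float, mask_token: str = "[MASK]") -> str:
    """Step 4 (simplified): replace a masking_rate fraction of the words with a mask token."""
    doc = nlp(text)
    candidates = [i for i, tok in enumerate(doc) if not tok.is_punct and not tok.is_space]
    n_mask = round(len(candidates) * masking_rate)
    to_mask = set(random.sample(candidates, n_mask))
    return "".join(
        (mask_token + tok.whitespace_) if i in to_mask else tok.text_with_ws
        for i, tok in enumerate(doc)
    )


if __name__ == "__main__":
    question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
    print(build_word_list(question))
    print(mask_text(question, masking_rate=0.4))  # MR = 40%
```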

Notebook Contents

  • AQA_case1_gpt-4o-mini.ipynb

    • Case 1: remove the last line of rationale text and do not perform numerical conversion
  • AQA_case2_gpt-4o-mini.ipynb

    • Case 2: provide rationale and perform numerical conversion
  • AQA_case3_gpt-4o-mini.ipynb

    • Case 3: do not provide rationale and do not perform numerical conversion
  • AQA_case3_gpt-4o.ipynb

    • Case 3: do not provide rationale and do not perform numerical conversion
    • Use gpt-4o-2024-08-06
    • Use files from AQuA_case3_gpt-4o-mini-2024-07-18 for steps 1 to 3
  • MskCal.ipynb

    • The introduction part is also masked.
    • MR: 0% (using gpt-4o-mini and gpt-4o), and MR: 20–80%.
  • MskCal_2.ipynb

    • The introduction part is not masked.
  • RQA.ipynb

  • RQA_llama3.2.ipynb

    • Use files from RQA.ipynb for steps 1 to 3
  • UQA.ipynb

  • UQA_llama3.2.ipynb

    • Use files from UQA.ipynb for steps 1 to 3
