Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal

Created: 2024-10-24

Data

input

  • AQuA.csv (AQuA-RAT)

  • RQA.csv

    • original: https://github.com/realtimeqa/realtimeqa_public

    • Processing

      • Used data from 2023/11/01 onwards.
      • Removed the <a> tags from evidence and created a column evidence_wo_url.
      • Used the following prompt to have gpt-4o-mini determine whether each question could be answered based solely on the given context; only questions judged answerable were kept (a minimal sketch of this filtering call appears after the input file list):
        Follow the instructions and output your response.
        
        ## Instructions
        - Return True if you can answer the question based solely on the context; otherwise, return False.
        - Respond in JSON format as {{'output': bool}}.
        
        ## Context
        {context}
        
        ## Question
        {question}
        
        ## Answer to the Question
        {answers}
        
  • UQA.csv

    • 100 questions generated by an LLM from universal documents such as the Universal Declaration of Human Rights, used as data for which the model potentially has background knowledge.
  • MskCal.txt ... Calculation problems presented as descriptive questions.
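
For reference, the sketch below shows one way the RQA.csv answerability filter could be implemented against gpt-4o-mini using the prompt shown above. It is a minimal illustration, not the repository's actual code: the function name is_answerable, the use of the OpenAI Python SDK (v1+) with its JSON response mode, and the temperature setting are assumptions.

```python
# Minimal sketch (not the repository's code) of the RQA.csv answerability filter:
# ask gpt-4o-mini whether a question can be answered from the given context alone.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """Follow the instructions and output your response.

## Instructions
- Return True if you can answer the question based solely on the context; otherwise, return False.
- Respond in JSON format as {{'output': bool}}.

## Context
{context}

## Question
{question}

## Answer to the Question
{answers}
"""

def is_answerable(context: str, question: str, answers: str) -> bool:
    """Return True only if gpt-4o-mini judges the question answerable from the context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(context=context,
                                                     question=question,
                                                     answers=answers)}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    result = json.loads(response.choices[0].message.content)
    return bool(result.get("output", False))
```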

output

  • Common File Contents for Each Folder

    • [QA_TYPE]_gemma2_9b_*.csv ... Response results for each masking rate (MR) under regular masking.
    • [QA_TYPE]_gemma2_9b_*_filtered.csv ... Response results for each MR under partial masking.
    • [QA_TYPE]_gemma2_9b_*_no_meaning.csv ... Response results for each MR under strict masking.
    • [QA_TYPE]_gemma2_9b_*_no_verb.csv ... Response results for each MR under lenient masking.
    • [QA_TYPE]_gemma2_9b_tmp.csv, [QA_TYPE]_step1.csv, [QA_TYPE]_step2.csv, [QA_TYPE]_step2_use.csv, [QA_TYPE]_step4.csv ... Intermediate processing files used in the notebook steps.
  • AQuA_case1

    • results of notebook/AQA_case1_gpt-4o-mini.ipynb
      • Case 1: remove the last line of rationale text and do not perform numerical conversion
  • AQuA_case2

    • results of notebook/AQA_case2_gpt-4o-mini.ipynb
      • Case 2: provide rationale and perform numerical conversion
  • AQuA_case3_gpt-4o-mini-2024-07-18

    • results of notebook/AQA_case3_gpt-4o-mini.ipynb
      • Case 3: do not provide rationale and do not perform numerical conversion
  • AQuA_case3_gpt-4o-2024-08-06

    • results of notebook/AQA_case3_gpt-4o.ipynb
      • Case 3: do not provide rationale and do not perform numerical conversion
      • Use gpt-4o-2024-08-06
      • Use files from AQuA_case3_gpt-4o-mini-2024-07-18 for steps 1 to 3
  • MskCal

    • results of notebook/MskCal.ipynb
    • The introduction part is also masked.
  • MSKCAL_gpt-4o-2024-08-06

  • MSKCAL_gpt-4o-mini-2024-07-18

    • results of notebook/MskCal.ipynb
      • results for MR: 0%
  • MskCal_2

    • results of notebook/MskCal_2.ipynb
    • The introduction part is not masked.
  • RQA

    • results of notebook/RQA.ipynb
  • RQA_llama3.2

    • results of notebook/RQA_llama3.2.ipynb
  • UQA

    • results of notebook/UQA.ipynb
  • UQA_llama3.2

    • results of notebook/UQA_llama3.2.ipynb

LLM

Masking Task:

Decoding Task:

notebook

Common Steps

  • Step 1

    • Create a word list by using spaCy to segment the question, context, and options into morphemes (a simplified sketch of Steps 1 and 4 appears after this list).
  • Step 2

    • Generate the meanings of the words in the word list using the masking task LLM.
  • Step 3

    • Convert numerical values.
  • Step 4

    • Select words to be masked based on the masking rate.
  • Step 5

    • Generate answers to the masked questions using the decoding task LLM.
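
As a rough illustration of Steps 1 and 4, the sketch below builds a word list with spaCy and masks a fraction of the words at a given masking rate (MR). It is an outline under stated assumptions rather than the notebooks' code: the en_core_web_sm model, the [MASK] token, and the random choice of which words to mask are illustrative only, and the notebooks' actual selection additionally distinguishes the masking variants (regular, partial, strict, lenient) listed under output.

```python
# Illustrative sketch of Steps 1 and 4 (not the notebooks' actual code):
# segment text into words with spaCy, then mask a fraction of them at a given MR.
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the notebooks may use a different one


def build_word_list(text: str) -> list[str]:
    """Step 1 (simplified): segment text into word tokens, skipping punctuation and spaces."""
    return [tok.text for tok in nlp(text) if not tok.is_punct and not tok.is_space]


def mask_text(text: str, masking_rate: float, mask_token: str = "[MASK]") -> str:
    """Step 4 (simplified): replace a masking_rate fraction of the words with a mask token."""
    doc = nlp(text)
    candidates = [i for i, tok in enumerate(doc) if not tok.is_punct and not tok.is_space]
    n_mask = round(len(candidates) * masking_rate)
    to_mask = set(random.sample(candidates, n_mask))
    return "".join(
        (mask_token + tok.whitespace_) if i in to_mask else tok.text_with_ws
        for i, tok in enumerate(doc)
    )


if __name__ == "__main__":
    question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
    print(build_word_list(question))
    print(mask_text(question, masking_rate=0.4))  # MR = 40%
```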

Notebook Contents

  • AQA_case1_gpt-4o-mini.ipynb

    • Case 1: remove the last line of rationale text and do not perform numerical conversion
  • AQA_case2_gpt-4o-mini.ipynb

    • Case 2: provide rationale and perform numerical conversion
  • AQA_case3_gpt-4o-mini.ipynb

    • Case 3: do not provide rationale and do not perform numerical conversion
  • AQA_case3_gpt-4o.ipynb

    • Case 3: do not provide rationale and do not perform numerical conversion
    • Use gpt-4o-2024-08-06
    • Use files from AQuA_case3_gpt-4o-mini-2024-07-18 for steps 1 to 3
  • MskCal.ipynb

    • The introduction part is also masked.
    • MR: 0% (using gpt-4o-mini and gpt-4o), and MR: 20–80%.
  • MskCal_2.ipynb

    • The introduction part is not masked.
  • RQA.ipynb

  • RQA_llama3.2.ipynb

    • Use files from RQA.ipynb for steps 1 to 3
  • UQA.ipynb

  • UQA_llama3.2.ipynb

    • Use files from UQA.ipynb for steps 1 to 3
