
This repository stores, and reminds me of, the code and approach I used for the October 2023 Kaggle LLM competition. It will also contribute to my portfolio and my Fall 2024 application.


kaggle-LLM-Comp

Goal of the Competition

Inspired by the OpenBookQA dataset, this competition challenges participants to answer difficult science-based questions written by a Large Language Model.

Your work will help researchers better understand the ability of LLMs to test themselves, and the potential of LLMs that can be run in resource-constrained environments.

Context

As the scope of large language model capabilities expands, a growing area of research is using LLMs to characterize themselves. Because many preexisting NLP benchmarks have been shown to be trivial for state-of-the-art models, there has also been interesting work showing that LLMs can be used to create more challenging tasks to test ever more powerful models.

At the same time, methods like quantization and knowledge distillation are being used to effectively shrink language models and run them on more modest hardware. The Kaggle environment provides a unique lens to study this, as submissions are subject to both GPU and time limits.

The dataset for this challenge was generated by giving GPT-3.5 snippets of text on a range of scientific topics pulled from Wikipedia, asking it to write a multiple-choice question (with a known answer), and then filtering out the easy questions.

Right now we estimate that the largest models that can run on Kaggle have around 10 billion parameters, whereas GPT-3.5 clocks in at 175 billion parameters. If a question-answering model can ace a test written by a question-writing model more than 10 times its size, that would be a genuinely interesting result; on the other hand, if a larger model can effectively stump a smaller one, that has compelling implications for the ability of LLMs to benchmark and test themselves.

Project Introduction

LLM_train.ipynb Model training code

LLM_infer.ipynb Model inference code

llama_finetune.ipynb LLaMA fine-tuning reference code

Note: It is recommended to train the model on an A100 GPU; training takes approximately 6 hours.

Download the open-source dataset (required): https://www.kaggle.com/datasets/cdeotte/60k-data-with-context-v2
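
A minimal sketch of loading this dataset in a Kaggle notebook with pandas; the CSV filename and column names below are illustrative assumptions, not confirmed by this repository, so check the dataset page for the actual files.

```python
import pandas as pd

# Filename is an assumption; the dataset page lists the actual CSVs.
df = pd.read_csv("/kaggle/input/60k-data-with-context-v2/train.csv")
print(df.head())  # expected fields: prompt, the five options, answer, context
```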

Our Advantages

  1. This competition is based on Wikipedia reading-comprehension data and aims to develop a model that automatically answers multiple-choice questions. It is a typical multi-class classification problem, and prediction accuracy is evaluated with MAP@3 (see the metric sketch after this list).

  2. The competition provides correct answers for only 200 questions and no reference articles, so participants need to find additional training data to achieve better model performance. We generated 60,000 training samples from the original Wikipedia texts using the GPT-3.5 model (see the generation sketch after this list).

  3. Since the competition does not provide the original texts, which contain a significant amount of useful information, we adopted a sentence-vector approach: we measure the similarity between the prompt and the original Wikipedia texts and extract the most similar passage as the primary reference for each question (see the retrieval sketch after this list).

  4. Based on the retrieved text, prompt, options, and correct answers, we encoded the input with DeBERTa-v3-large and built a reading-comprehension model using the AutoModelForMultipleChoice class from the Transformers library (see the model sketch after this list). To answer a question automatically, we first match the prompt and options against the retrieved text and then use the model to predict the correct answer. Our model achieved a MAP@3 of 0.906 on the test set and ranked in the top 7% of the competition.

  5. During the competition, we made efficient use of the available time, controlled the pace effectively, engaged in active discussions, and set reasonable tasks for model iteration. We also organized brainstorming sessions to drive progress while continuing model development to ensure the highest possible prediction accuracy.
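
As referenced in item 1, here is a minimal sketch of the MAP@3 metric: each question scores 1/rank of the correct answer within the top-3 predictions, averaged over all questions.

```python
def map_at_3(predictions, labels):
    """Compute MAP@3.

    predictions: list of ranked option letters per question, best first,
                 e.g. [["B", "A", "E"], ...]
    labels:      list of correct option letters, e.g. ["B", ...]
    """
    total = 0.0
    for preds, label in zip(predictions, labels):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == label:
                total += 1.0 / rank  # credit decays with rank: 1, 1/2, 1/3
                break
    return total / len(labels)

# Correct at rank 1 and rank 2 respectively: (1 + 0.5) / 2 = 0.75
print(map_at_3([["B", "A", "E"], ["C", "D", "A"]], ["B", "D"]))
```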
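The generation step from item 2 might look like the sketch below; the exact prompt wording and generation script are not part of this README, so the prompt text here is an illustrative assumption.

```python
from openai import OpenAI  # requires the openai package and an API key

client = OpenAI()

def generate_mcq(wiki_snippet: str) -> str:
    """Ask GPT-3.5 to write one multiple-choice question from a Wikipedia snippet."""
    prompt = (
        "Based on the following text, write one difficult multiple-choice "
        "question with five options (A-E) and state the correct answer.\n\n"
        + wiki_snippet  # the actual prompt used for the 60k samples is an assumption
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```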
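A minimal sketch of the sentence-vector retrieval in item 3, assuming the sentence-transformers library; the encoder name is an assumption, since the README does not specify which embedding model was used.

```python
from sentence_transformers import SentenceTransformer, util

# Encoder choice is an assumption; the README does not name the embedding model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def top_context(prompt: str, wiki_passages: list[str]) -> str:
    """Return the Wikipedia passage most similar to the question prompt."""
    prompt_emb = encoder.encode(prompt, convert_to_tensor=True)
    passage_embs = encoder.encode(wiki_passages, convert_to_tensor=True)
    scores = util.cos_sim(prompt_emb, passage_embs)[0]  # cosine similarities
    return wiki_passages[int(scores.argmax())]
```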
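Finally, a sketch of the option-scoring step from item 4 with AutoModelForMultipleChoice. Fine-tuning details (hyperparameters, training loop) are omitted; the snippet only shows how the five options are encoded and ranked.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
# Loading the base checkpoint attaches a fresh multiple-choice head;
# in the competition this head is fine-tuned on the 60k training samples.
model = AutoModelForMultipleChoice.from_pretrained("microsoft/deberta-v3-large")

def rank_options(context: str, prompt: str, options: list[str]) -> list[int]:
    """Score (context + prompt, option) pairs and rank the five options."""
    first = [f"{context} {prompt}"] * len(options)
    enc = tokenizer(first, options, truncation=True, padding=True, return_tensors="pt")
    # The model expects inputs of shape (batch, num_choices, seq_len).
    enc = {k: v.unsqueeze(0) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits[0]  # one score per option
    return logits.argsort(descending=True).tolist()  # option indices, best first
```

Because the retrieved Wikipedia passage is prepended to the prompt, each option is scored against both the question and its supporting context.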

