# Amharic LLM Evals
**Objective**: Evaluate large language models on Amharic translations of common LLM benchmarks using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).

Current Benchmarks:
- [Amharic TruthfulQA](https://huggingface.co/datasets/simonbutt/amharic_truthful_qa)
- [Amharic GSM8K](https://huggingface.co/datasets/simonbutt/amharic_gsm8k)



In [1]:
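# Hugging Face model ID of the model to evaluate
model_to_evaluate = "simonbutt/am_llama3_alpaca"
# Load the model with 4-bit quantization (requires bitsandbytes)
run_in_4bit = True
# Number of examples to evaluate per task; 2 is only a quick smoke test
limit = 2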
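
The parameters above are substituted into the `lm_eval` command lines further down through IPython's `{variable}` interpolation. As a minimal sketch of what the resulting `--model_args` value looks like, the hypothetical helper `build_model_args` below (not part of lm-evaluation-harness) assembles the same comma-separated `key=value` string in plain Python:

```python
def build_model_args(pretrained: str, load_in_4bit: bool, parallelize: bool = True) -> str:
    """Return the comma-separated key=value string passed to --model_args.

    Hypothetical helper for illustration; it only mirrors the string that the
    notebook builds via IPython interpolation.
    """
    pairs = {
        "parallelize": parallelize,
        "load_in_4bit": load_in_4bit,
        "pretrained": pretrained,
    }
    # Python booleans render as "True"/"False", matching the interpolated form.
    return ",".join(f"{key}={value}" for key, value in pairs.items())


model_args = build_model_args("simonbutt/am_llama3_alpaca", load_in_4bit=True)
print(model_args)
# → parallelize=True,load_in_4bit=True,pretrained=simonbutt/am_llama3_alpaca
```

Building the string in one place makes it easier to switch models or turn quantization off without editing each command.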

## Install LM-Eval

In [2]:
# Install LM-Eval from source
!pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git
# bitsandbytes is needed for 4-bit model loading
!pip install bitsandbytes

Collecting git+https://github.com/EleutherAI/lm-evaluation-harness.git
  Cloning https://github.com/EleutherAI/lm-evaluation-harness.git to /tmp/pip-req-build-uke2wukz
  Running command git clone --filter=blob:none --quiet https://github.com/EleutherAI/lm-evaluation-harness.git /tmp/pip-req-build-uke2wukz
  Resolved https://github.com/EleutherAI/lm-evaluation-harness.git to commit f64e72f565b9a8fe09690b0a3216dc67a6ca91cb
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting word2number (from lm_eval==0.4.2)
  Downloading word2number-1.1.zip (9.7 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: lm_eval, word2number
  Building wheel for lm_eval (pyproject.toml) ... done
  Created wheel for lm_eval: filename=lm_eval-0.4.2-py3-none-any.whl size=1524927 sha256=5391c8dc80b8edacffa8822107982fd6f5f6c2eacabdfb

In [3]:
from lm_eval import api
import yaml

## Amharic TruthfulQA


In [6]:
!wget https://raw.githubusercontent.com/simonbutt/amharic_llama/main/eval/tasks/amharic_truthfulqa.yaml

--2024-04-29 00:24:50--  https://raw.githubusercontent.com/simonbutt/amharic_llama/main/eval/tasks/amharic_truthfulqa.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1623 (1.6K) [text/plain]
Saving to: ‘amharic_truthfulqa.yaml’


2024-04-29 00:24:50 (45.3 MB/s) - ‘amharic_truthfulqa.yaml’ saved [1623/1623]



We can now run the evaluation on this task by pointing `--include_path` at the directory containing the config file we've just downloaded:

In [7]:
!lm_eval \
    --model hf \
    --model_args parallelize=True,load_in_4bit={run_in_4bit},pretrained={model_to_evaluate} \
    --include_path ./ \
    --tasks amharic_truthfulqa_mc1 \
    --batch_size auto:4 \
    --limit {limit}


2024-04-29 00:24:53.456414: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-29 00:24:53.456466: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-29 00:24:53.457596: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-29:00:24:57,563 INFO     [__main__.py:251] Verbosity set to INFO
2024-04-29:00:24:57,563 INFO     [__main__.py:262] Including path: ./
2024-04-29:00:25:03,358 INFO     [__main__.py:335] Selected Tasks: ['amharic_truthfulqa_mc1']
2024-04-29:00:25:03,359 INFO     [evaluator.py:131] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual
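
The same run can probably also be driven from Python rather than the CLI. The sketch below assumes the installed lm-evaluation-harness version (0.4.2 in the log above) exposes `lm_eval.simple_evaluate` and `lm_eval.tasks.TaskManager` with an `include_path` argument; `build_eval_kwargs` and `run_eval` are hypothetical helper names introduced here for illustration, and the keyword values simply mirror the CLI flags used above.

```python
from typing import Any, Dict, Optional


def build_eval_kwargs(
    task: str,
    pretrained: str,
    load_in_4bit: bool,
    limit: Optional[int],
) -> Dict[str, Any]:
    """Collect keyword arguments that mirror the CLI flags used above."""
    return {
        "model": "hf",
        "model_args": f"parallelize=True,load_in_4bit={load_in_4bit},pretrained={pretrained}",
        "tasks": [task],
        "batch_size": "auto:4",
        "limit": limit,
    }


def run_eval(eval_kwargs: Dict[str, Any], include_path: str = "./") -> Dict[str, Any]:
    """Run the evaluation in-process (assumes lm_eval is installed and a GPU is available)."""
    # Imported lazily so the helper above can be used without lm_eval installed.
    import lm_eval
    from lm_eval.tasks import TaskManager

    # include_path tells the harness where to find the downloaded task YAML.
    task_manager = TaskManager(include_path=include_path)
    return lm_eval.simple_evaluate(task_manager=task_manager, **eval_kwargs)


eval_kwargs = build_eval_kwargs(
    task="amharic_truthfulqa_mc1",
    pretrained="simonbutt/am_llama3_alpaca",
    load_in_4bit=True,
    limit=2,
)
```

Calling `run_eval(eval_kwargs)` should return a dictionary whose `"results"` entry holds the per-task metrics, which is convenient when the scores need to be post-processed in the notebook instead of read from the printed table.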

## Amharic GSM8K

In [4]:
!wget https://raw.githubusercontent.com/simonbutt/amharic_llama/main/eval/tasks/amharic_gsm8k.yaml

--2024-04-29 00:23:35--  https://raw.githubusercontent.com/simonbutt/amharic_llama/main/eval/tasks/amharic_gsm8k.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1029 (1.0K) [text/plain]
Saving to: ‘amharic_gsm8k.yaml’


2024-04-29 00:23:35 (19.5 MB/s) - ‘amharic_gsm8k.yaml’ saved [1029/1029]



In [5]:
!lm_eval \
    --model hf \
    --model_args parallelize=True,load_in_4bit={run_in_4bit},pretrained={model_to_evaluate} \
    --include_path ./ \
    --tasks amharic_gsm8k \
    --batch_size auto:4 \
    --limit {limit}


2024-04-29 00:23:38.210404: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-29 00:23:38.210453: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-29 00:23:38.211660: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-29:00:23:42,276 INFO     [__main__.py:251] Verbosity set to INFO
2024-04-29:00:23:42,276 INFO     [__main__.py:262] Including path: ./
2024-04-29:00:23:48,238 INFO     [__main__.py:335] Selected Tasks: ['amharic_gsm8k']
2024-04-29:00:23:48,238 INFO     [evaluator.py:131] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 

---