This notebook is meant to be an easy way to evaluate our models and create initial baselines for model selection. It may be converted to a script once the workflow is stable.

Attempting to follow the example here: https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/tree/master/sample-submissions/llama_recipes

## Set-Up

In [2]:
# Check root dir
!ls

Anaconda3-2020.07-Linux-x86_64.sh  docs				sandbox
CHANGELOG.md			   evaluation.ipynb		src
HELPFUL-COMMANDS.txt		   lost+found			test.py
LICENSE				   models			tests
README.md			   py-pkgs-cookiecutter.tar.gz
data				   pyproject.toml


In [2]:
# Activate virtual environment (only affects this cell's subshell)
!source .venv/bin/activate
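
Each `!` command runs in its own subshell, so sourcing an activate script does not change the interpreter that the notebook kernel uses. A minimal sketch for checking which environment the kernel is actually running in, using only the standard library. The helper name `kernel_env_info` is illustrative, and the `in_venv` check assumes an environment created with the standard `venv` module:

```python
import sys


def kernel_env_info():
    """Report the interpreter running this kernel and whether it is a venv."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix points at the installation it was created from.
    in_venv = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_venv": in_venv,
    }


print(kernel_env_info())
```

If `in_venv` is `False` here, packages installed with a plain `!pip install` may be going to the system interpreter rather than `.venv`.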

In [4]:
# show installed packages
!pip list --local

Package                   Version
------------------------- --------------------------------
absl-py                   2.1.0
aiohttp                   3.9.1
aiosignal                 1.3.1
annotated-types           0.6.0
apex                      0.1
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
asttokens                 2.4.1
astunparse                1.6.3
async-timeout             4.0.3
attrs                     23.2.0
audioread                 3.0.1
beautifulsoup4            4.12.3
bleach                    6.1.0
blis                      0.7.11
cachetools                5.3.2
catalogue                 2.0.10
certifi                   2023.11.17
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpathlib              0.16.0
cloudpickle               3.0.0
cmake                     3.28.1
comm                      0.2.1
confection                0.1.4
contourpy                 1.2.0
cubinlinker               0.3.0

In [7]:
# Show details of the installed llama-recipes package
# !pip install llama-recipes
!pip show llama-recipes

Name: llama-recipes
Version: 0.0.1
Summary: Llama-recipes is a companion project to the Llama 2 model. It's goal is to provide examples to quickly get started with fine-tuning for domain adaptation and how to run inference for the fine-tuned models.
Home-page: 
Author: 
Author-email: Hamid Shojanazeri <hamidnazeri@meta.com>, Matthias Reso <mreso@meta.com>, Geeta Chauhan <gchauhan@meta.com>
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: accelerate, appdirs, bitsandbytes, black, black, datasets, fire, loralib, optimum, peft, py7zr, scipy, sentencepiece, torch, transformers
Required-by: 


`llama-recipes` successfully installed!

## Fine Tuning

From https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/tree/master/sample-submissions/llama_recipes

"
With llama-recipes its possible to fine-tune Llama on custom data with a single command. To fine-tune on a custom dataset we need to implement a function (get_custom_dataset) that provides the custom dataset following this example custom_dataset.py. We can then train on this dataset using this command line:

```{bash}
python3 -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization --model_name meta-llama/Llama-2-7b --dataset custom_dataset --custom_dataset.file /workspace/custom_dataset.py --output_dir /volume/output_dir
```

Note: The custom dataset in this example is dialog based. This is only due to the nature of the example but not a necessity of the custom dataset functionality. To see other examples of get_custom_dataset functions (btw the name of the function get_custom_dataset can be changed in the command line by using this syntax: /workspace/custom_dataset.py:get_foo_dataset) have a look at the built-in dataset in llama-recipes.
"
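
To make the quoted description concrete, here is a rough sketch of what a `get_custom_dataset` function could look like. It is not the upstream `custom_dataset.py`: the three-argument signature is an assumption based on the upstream example and should be checked against the installed version, and `TOY_EXAMPLES`, `ToyTokenizer`, and the train/validation split rule are all hypothetical stand-ins so the block runs without downloading a model.

```python
# Hypothetical toy data; a real dataset would be loaded from disk or the Hub.
TOY_EXAMPLES = [
    {"prompt": "Translate to French: cat", "response": "chat"},
    {"prompt": "Translate to French: dog", "response": "chien"},
]

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss


class ToyTokenizer:
    """Stand-in tokenizer (assumption) exposing only an `encode` method."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        # Give each whitespace-separated token a stable integer id.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]


def get_custom_dataset(dataset_config, tokenizer, split):
    """Return tokenized examples with the prompt masked out of the loss."""
    # Assumption: hold out the last toy example for any non-train split.
    rows = TOY_EXAMPLES[:-1] if split == "train" else TOY_EXAMPLES[-1:]
    samples = []
    for row in rows:
        prompt_ids = tokenizer.encode(row["prompt"])
        response_ids = tokenizer.encode(row["response"])
        input_ids = prompt_ids + response_ids
        samples.append({
            "input_ids": input_ids,
            "attention_mask": [1] * len(input_ids),
            # Only the response tokens contribute to the training loss.
            "labels": [IGNORE_INDEX] * len(prompt_ids) + response_ids,
        })
    return samples


train = get_custom_dataset(dataset_config=None, tokenizer=ToyTokenizer(), split="train")
print(train[0])
```

A real implementation would receive the model's own tokenizer from the training script and would typically also handle special tokens and truncation, which this sketch leaves out.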

## Matt: Testing Packages

In [1]:
import numpy as np
import pandas as pd

## Evaluation

### Install Helm

https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/blob/master/helm.md

In [3]:
!source ../sandbox/.venv/bin/activate

In [4]:
!which python

/usr/bin/python


In [5]:
!pip install git+https://github.com/stanford-crfm/helm.git

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/stanford-crfm/helm.git
  Cloning https://github.com/stanford-crfm/helm.git to /tmp/pip-req-build-evilxvny
  Running command git clone --filter=blob:none --quiet https://github.com/stanford-crfm/helm.git /tmp/pip-req-build-evilxvny
  Resolved https://github.com/stanford-crfm/helm.git to commit 79862464fb876408ad7e82fc8d7b45c5b2b76da0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cattrs~=22.2 (from crfm-helm==0.4.0)
  Downloading cattrs-22.2.0-py3-none-any.whl.metadata (9.0 kB)
Collecting dacite~=1.6 (from crfm-helm==0.4.0)
  Downloading dacite-1.8.1-py3-none-any.whl.metadata (15 kB)
Collecting importlib-resources~=5.10 (from crfm-helm==0.4.0)
  Downloading importlib_resources-5.13.0-py3-none-an

In [3]:
!cat ./evaluation/run_specs.conf

entries: [
    #bigbench

    #analytic_entailment: https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/analytic_entailment
    {description: "big_bench:model=neurips/local,max_train_instances=3,task=analytic_entailment,subtask=", priority: 1}

    #causal_judgment: https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/causal_judgment
    {description: "big_bench:model=neurips/local,max_train_instances=3,task=causal_judgment,subtask=", priority: 1}

    #emoji_movie: https://github.com/google/big-bench/tree/main/bigbench/benchmark_tasks/emoji_movie
    {description: "big_bench:model=neurips/local,max_train_instances=3,task=emoji_movie,subtask=", priority: 1}

    #empirical_judgments: https://github.com/google/big-bench/tree/main/bigbench/benchmark_tasks/empirical_judgments
    {description: "big_bench:model=neurips/local,max_train_instances=3,task=empirical_judgments,subtask=", priority: 1}

    #known_unknowns: https://github.com/google/big-bench/t

We need to change the model to point to our model path. Once we do that, we can run the evaluation and get the results!
https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/blob/master/helm.md
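
One way to swap the model in every entry at once is to rewrite the `model=` argument in the run-spec descriptions. The following is a sketch under assumptions: `set_model` is a hypothetical helper, and `my-org/my-model` is a placeholder, since the exact model name HELM expects for a local checkpoint is not established in this notebook.

```python
import re


def set_model(conf_text, new_model):
    """Replace the value of every `model=` argument in run-spec descriptions."""
    # Match `model=` (not preceded by a letter or underscore, so names such as
    # `base_model=` are left alone) up to the next comma or closing quote.
    pattern = r'(?<![A-Za-z_])model=[^,"]+'
    return re.sub(pattern, lambda m: "model=" + new_model, conf_text)


entry = (
    '{description: "big_bench:model=neurips/local,max_train_instances=3,'
    'task=emoji_movie,subtask=", priority: 1}'
)
print(set_model(entry, "my-org/my-model"))
```

To apply it to a file, read the conf with `pathlib.Path(...).read_text()`, pass the text through `set_model`, and write the result to a new file so the original stays intact.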

To make the model workable, we can follow the instructions here:
https://github.com/llm-efficiency-challenge/neurips_llm_efficiency_challenge/tree/master/sample-submissions/llama_recipes

In [8]:
!pip install sentencepiece

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentencepiece
  Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Downloading sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 8.8 MB/s eta 0:00:00
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.2.0

[notice] A new release of pip is available: 23.3.2 -> 24.0
[notice] To update, run: python -m pip install --upgrade pip


In [None]:
!helm-run --conf-paths run_specs_tiny.conf --local-path localhost:5812 --suite v1 --max-eval-instances 10 --enable-huggingface-models meta-llama/Llama-2-7b-chat-hf

In [None]:
!helm-run --conf-paths run_specs_tiny.conf --suite v1 --local-path localhost:5812 --max-eval-instances 10 --enable-local-huggingface-models local-models/llama-2-7b-chat-hf

In [None]:
!helm-run --conf-paths run_specs_tiny.conf --suite v1 --local-path localhost:5812 --max-eval-instances 10 --enable-local-huggingface-models ../models/llama-2-7b-chat-hf

In [1]:
!helm-run --help

/bin/bash: line 1: helm-run: command not found
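
One possible cause of this error is that HELM was installed into a different environment from the one on the shell's `PATH`, or that the install did not finish. A small diagnostic sketch, assuming nothing beyond the standard library: it checks whether `helm-run` is discoverable and builds a pip command bound to the kernel's own interpreter. The helper name `pip_install_command` is illustrative.

```python
import shutil
import sys


def pip_install_command(requirement):
    """Build a pip command that targets the interpreter running this kernel."""
    # Using `sys.executable -m pip` avoids picking up a `pip` that belongs
    # to a different Python installation on PATH.
    return [sys.executable, "-m", "pip", "install", requirement]


# shutil.which returns the full path to the command, or None if not found.
print("helm-run on PATH:", shutil.which("helm-run"))
print(" ".join(pip_install_command("git+https://github.com/stanford-crfm/helm.git")))
```

The printed command can be run with `subprocess.run(...)` or pasted into a `!` cell. If `helm-run` is still not found afterwards, the environment's script directory may be missing from `PATH`.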
