<a href="https://colab.research.google.com/github/rayaneghilene/OpenELM-tests/blob/main/OpenELM_1_1B_Instruct_4bit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Apple OpenELM 1.1B text generation (4-bit quantization)

## Install the latest version of transformers from GitHub

In [1]:
!pip -q install git+https://github.com/huggingface/transformers --progress-bar off
!pip install -q datasets loralib sentencepiece --progress-bar off
!pip -q install bitsandbytes accelerate xformers einops --progress-bar off

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for transformers (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.2.1+cu121 requires torch==2.2.1, but you have torch 2.2.2 which is incompatible.
torchtext 0.17.1 requires torch==2.2.1, but you have torch 2.2.2 which is incompatible.
torchvision 0.17.1+cu121 requires torch==2.2.1, but you have torch 2.2.2 which is incompatible.

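The resolver message in the install output reports that torch 2.2.2 was installed while torchaudio, torchtext and torchvision pin torch==2.2.1. One way to see which versions actually ended up in the environment is to query the package metadata. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages named in the install cell and in the resolver message.
names = ["torch", "torchaudio", "torchtext", "torchvision",
         "transformers", "bitsandbytes", "accelerate"]
for name, ver in installed_versions(names).items():
    print(f"{name}: {ver or 'not installed'}")
```
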
In [2]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Tokenizer
The OpenELM model family uses the Llama-2-7b tokenizer, so we load it from the `meta-llama/Llama-2-7b-hf` repository using the token saved at login.

In [42]:
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

In [43]:
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf",
                                          token=True,  # reuse the token saved by `huggingface-cli login`; replaces the deprecated `use_auth_token`
                                          # padding_side='left'
)



## OpenELM-1_1B-Instruct Model

We use BitsAndBytes to load a 4-bit quantized version of the model. This lets us run it on machines with limited GPU memory, such as Colab notebooks.

In [44]:
from transformers import BitsAndBytesConfig

# Use the GPU when one is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Dtype used for computation on the 4-bit weights.
compute_dtype = torch.float16

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
)

model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-1_1B-Instruct",
                                             device_map=device,
                                             torch_dtype=torch.float16,
                                             token=True,  # replaces the deprecated `use_auth_token`
                                             trust_remote_code=True,  # OpenELM ships its modeling code in the model repository
                                             quantization_config=bnb_config
                                             )
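
To put the memory saving in perspective, here is a rough back-of-the-envelope estimate of weight storage at 16 bits versus 4 bits per parameter. It is only a sketch: the parameter count of about 1.1 billion is an assumption read off the model name, and the helper `weight_memory_gb` is hypothetical. The real footprint is larger, because the estimate ignores quantization metadata, any layers left unquantized, activations and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~1.1e9 parameters, taken from the name "OpenELM-1_1B".
NUM_PARAMS = 1.1e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gb:.2f} GB")
print(f"4-bit weights: ~{nf4_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the space of the fp16 weights.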

## Text Generation using the model.generate method

In [45]:
def prepare_prompt(prompt: str) -> torch.Tensor:
  # Tokenize the prompt into a (1, seq_len) tensor of input IDs on the target device.
  tokens = tokenizer(prompt, return_tensors="pt")
  return tokens['input_ids'].to(device)

def generate(prompt: str, model: AutoModelForCausalLM, max_length: int = 128) -> str:
  tokenized_prompt = prepare_prompt(prompt)
  output_ids = model.generate(
        tokenized_prompt,
        max_length=max_length,  # total length: prompt tokens plus generated tokens
        pad_token_id=0,
        assistant_model=model,  # assisted generation, with the model drafting for itself
  )
  # Decode the first (and only) returned sequence back to text.
  output_text = tokenizer.decode(
        output_ids[0].tolist(),
        skip_special_tokens=True
    )
  return output_text
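
Note that `max_length` in `model.generate` bounds the total sequence length for a decoder-only model, so the prompt tokens count against it; `max_new_tokens` is the alternative that bounds only the generated part. The sketch below shows the arithmetic. The helper `new_token_budget` and the prompt length of 6 tokens are hypothetical and were not measured on the notebook's prompts.

```python
def new_token_budget(prompt_token_count: int, max_length: int) -> int:
    """Number of tokens left for generation when max_length includes the prompt."""
    return max(0, max_length - prompt_token_count)


# Hypothetical example: a 6-token prompt with max_length=300.
print(new_token_budget(prompt_token_count=6, max_length=300))  # 294
```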

## Testing on custom prompts

In [14]:
%%time
prompt = "tell me a story \n"
print(generate(prompt, model, 300))

tell me a story 

once upon a time, there was a boy named john. john loved to tell stories. his stories were always full of adventure, danger, and heartwarming love.

one day, john met a girl named jane. jane loved to listen to john's stories. john's stories were full of magic, mystery, and magic beans.

once, john told a story about a magic beanstalk. jane listened intently, and when john finished, jane asked:

jane: "john, can you tell me about the magic beanstalk?"

john: "yes! the beanstalk is a legendary story about a magical place where a boy found a golden beanstalk and lived happily ever after with his magic beans.

john told all about the beanstalk, its dangers, and the boy's adventures. jane listened intently, and when john finished, jane asked:

jane: "john, can you tell me your favorite part about the beanstalk?"

john: "my favorite part about the beanstalk is when the boy found the golden beanstalk and climbed up it!"

john told jane about how he climbed the beanstalk, and

In [49]:
%%time
prompt = "generate the code to rename the class positive to negartive in a pandas dataframe \n"
generated_text = generate(prompt, model, 500)
print(generated_text)

generate the code to rename the class positive to negartive in a pandas dataframe 

```python
import pandas as pd
import numpy as np
import re
import os
import shutil
import glob
import subprocess

def rename_class(path, class_name, new_class_name, overwrite=True):
    file_list = glob.glob(path + '/*.' + class_name)
    if len(file_list) > 0:
        file_to_rename = random_file(file_list)
        subprocess.run(["mv", file_to_rename, path + new_class_name + "_" + class_name + "_" + file_to_rename])
        print("Renamed class: " + new_class_name)
        if overwrite:
            os.remove(file_to_rename)
    return file_to_rename

def random_file(file_list):
    return ".".join(random_string(len(file_list)), class_name=class_name="rand_file")

def class_exists(path):
    return os.path.exists(path) and os.path.isfile(path)

def rename_classes(path, classes):
    file_list = glob.glob(path + '/*.' + classes)
    if len(file_list) > 0:
        file_to_rename = random_file(file_list)
```


## Text Generation using the Hugging Face pipeline

In [31]:
from transformers import pipeline

text_gen_pipe = pipeline("text-generation",
                         model=model,
                         tokenizer=tokenizer,
                         torch_dtype=torch.float16,
                         device_map=device,
                         do_sample=True,
                         pad_token_id=0,
                         top_k=300,
                         max_length=300,
                         num_return_sequences=1,
                         eos_token_id=tokenizer.eos_token_id,
                         trust_remote_code=True,
                         )

In [32]:
%%time
text_gen_pipe("tell me a story")

CPU times: user 22.8 s, sys: 51.4 ms, total: 22.8 s
Wall time: 25.6 s


[{'generated_text': 'tell me a story growth DNA�� communication similar notifications through notificationsaked elections earlier� brokeniringgediring impressionings patientsainsNG sentences��N off ng no� previously� recent soon couwn from posts your Ng�� time collection. as recently compared recently similar premences proven ab�anceured� throughoutns� inches� shared thoseN online��N    understanding soon standard�NSNS kiles� communicationavingns�Nov non�ings��omcol� understandingriersns patients recentlyains�nanningsologiesainscN��� deathrelings�FFFFDIingsabs network timegoing�� n��izations abE� sains growthNgoingestions experiences recent no� NS administration ab network ra�ones notifications    electricings girlonesanks��R decision files ultimately reports� continues inches running      � notifications has�ainsN owner� dil�� sids� noNOns value NigerNOhnFF soonNNSays�Updateditting few� noistsح maintainays police electionsNAlo�� navigation��N levels non�� agreementSestions those cons 

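**The model seems to perform better with the `.generate` method than with the Hugging Face `text-generation` pipeline.** The two runs are not a like-for-like comparison, though: the pipeline samples with `do_sample=True` and `top_k=300`, while the `generate` helper passes no sampling arguments.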
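
To illustrate how much the decoding strategy alone can change which token gets picked, here is a toy sketch contrasting greedy selection with top-k sampling. It is not a diagnosis of the output above: the logits are made up, and the helpers `softmax`, `greedy_pick` and `top_k_sample` are hypothetical, written in plain Python rather than taken from transformers.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Top-k sampling: keep the k best tokens, renormalize, then sample one."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


# Made-up logits where the best token is only slightly ahead of the rest.
toy_logits = [2.0, 1.5, 1.4, 1.3, 1.2]
rng = random.Random(0)

print("greedy pick:", greedy_pick(toy_logits))
samples = [top_k_sample(toy_logits, k=5, rng=rng) for _ in range(1000)]
share = samples.count(greedy_pick(toy_logits)) / len(samples)
print(f"top-k sampling chose the greedy token in {share:.0%} of draws")
```

With a wide `k` and a fairly flat distribution, sampling often picks tokens other than the most likely one, and that effect compounds over a long generation.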

## Conclusion

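In these tests, the 4-bit quantized OpenELM-1_1B-Instruct model fell short of expectations. The story prompt produced coherent but repetitive text, the code prompt produced code unrelated to renaming a class in a pandas DataFrame, and the sampled pipeline output was incoherent. Further refinement is needed before the model can be relied on across tasks like these.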

Contact me at rayane.ghilene@ensea.fr if you have any questions.