# Running Inference with the Llama 2 7B Model

### Setup Runtime
Running inference with Llama, and fine-tuning it later in this notebook, requires a GPU runtime. Follow the steps below:

1. Go to `Runtime` (located in the top menu bar).
2. Select `Change Runtime Type`.
3. Choose `T4 GPU` (or a comparable option).
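
After switching the runtime, it can help to confirm that a GPU is actually visible before installing anything. The cell below is a minimal sketch that assumes the `nvidia-smi` tool is on the PATH whenever a GPU is attached; `gpu_summary` is a hypothetical helper name, not part of any library.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the GPU name and memory reported by nvidia-smi, or None if no GPU is visible."""
    # Assumption: a GPU runtime exposes the nvidia-smi command-line tool.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    output = result.stdout.strip()
    return output if result.returncode == 0 and output else None


summary = gpu_summary()
print(summary or "No GPU detected: check Runtime > Change Runtime Type")
```

If the message about a missing GPU appears, repeat the three steps above and rerun the cell.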


### Install Transformers Library from GitHub

The code below installs the `transformers` library directly from the Hugging Face GitHub repository, so the notebook uses the latest development version rather than the most recent release.



In [None]:
!pip install git+https://github.com/huggingface/transformers

### Installing Additional Libraries



In [None]:
!pip install -q peft  accelerate bitsandbytes safetensors

In [None]:
!pip install sentencepiece
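
Because `transformers` is installed from the development branch, the exact versions in the runtime can change from one session to the next. The following sketch lists what was actually installed, using only the standard library; `installed_versions` is a hypothetical helper name, and the package list simply mirrors the install cells above.

```python
from importlib import metadata

# Packages installed by the cells above.
PACKAGES = [
    "transformers",
    "peft",
    "accelerate",
    "bitsandbytes",
    "safetensors",
    "sentencepiece",
]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Recording these versions alongside any results makes it easier to reproduce the notebook later.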


### Model Initialization and Setup



In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers

model_name = "abhishek/llama-2-7b-hf-small-shards"


### Setting up the BitsAndBytes Configuration



In [None]:
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
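
To see why 4-bit loading matters on a T4-class GPU, a rough estimate of the weight memory is useful. The sketch below is back-of-the-envelope arithmetic only: it assumes a nominal 7 billion parameters (taken from the "7b" in the model name) and ignores activations, the KV cache, and quantization metadata. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # assumption: nominal parameter count implied by "7b"

for label, bits in [("fp16/bf16", 16), ("int8", 8), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights alone take roughly 14 GB, which leaves very little headroom on a 16 GB card, while the 4-bit weights take roughly 3.5 GB.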

### Loading the Pretrained Model with Quantization


In [None]:
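# 4-bit loading is already requested through bnb_config, so load_in_4bit is
# not passed again here: it conflicts with quantization_config.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config
)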


### Tokenizer Initialization and Configuration



In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1
stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")


### Generating Text with the Model 🚀




In [None]:

text = "[INST] ~Add your instructions here~ [/INST]"

# Tokenize the prompt and move the tensors to the same device as the model.
encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
model_input = encoded.to(model.device)
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
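
The decoded output contains the original prompt followed by the generated continuation. The sketch below shows one way to wrap an instruction in the `[INST] ... [/INST]` markers used above and to pull out only the generated part. `build_prompt` and `extract_response` are hypothetical helpers written for illustration, and the sample string is made up rather than real model output.

```python
def build_prompt(instruction):
    """Wrap an instruction in the [INST] ... [/INST] markers used in this notebook."""
    return f"[INST] {instruction.strip()} [/INST]"


def extract_response(decoded):
    """Return the text after the last [/INST] marker, without <s> and </s> tokens."""
    _, _, tail = decoded.rpartition("[/INST]")
    return tail.replace("<s>", "").replace("</s>", "").strip()


prompt = build_prompt("  Summarize the plot of Hamlet in one sentence. ")
print(prompt)

# Illustrative string only, not actual model output.
sample = "<s> [INST] Hi [/INST] Hello there!</s>"
print(extract_response(sample))
```

In the cell above, `text` could be replaced with `build_prompt(...)` and `decoded[0]` passed to `extract_response`.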

### Get data from the web




In [None]:
!pip install pandas

In [None]:
!pip install autotrain-advanced

In [None]:
!autotrain setup --update-torch

In [None]:
!autotrain llm --train --project_name self-trainer --model abhishek/llama-2-7b-hf-small-shards --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 12 --num_train_epochs 3 --trainer sft 
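
The command above reads its training data from the current directory (`--data_path .`). The sketch below prepares such a file under one assumption that should be checked against the installed `autotrain-advanced` version: that the trainer looks for a `train.csv` file with a single `text` column. `to_autotrain_csv` is a hypothetical helper and the example pair is placeholder data.

```python
import csv
import io


def to_autotrain_csv(pairs):
    """Render (instruction, response) pairs as CSV text with one `text` column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text"])  # assumption: the trainer reads a column named "text"
    for instruction, response in pairs:
        writer.writerow([f"[INST] {instruction} [/INST] {response}"])
    return buffer.getvalue()


# Placeholder data; replace with real instruction/response pairs.
examples = [("Say hello.", "Hello!")]
csv_text = to_autotrain_csv(examples)
print(csv_text)

# To use it for training, save it next to the notebook before running autotrain:
# with open("train.csv", "w", newline="") as f:
#     f.write(csv_text)
```

Each row uses the same `[INST] ... [/INST]` layout as the inference prompt, so the fine-tuned model sees a consistent format at training and generation time.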