# In this notebook, we instantiate a large language model (LLM) and explore strategies to optimize and speed up inference

In [1]:
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch



# Load Model and Apply Quantization  
Further Reading on Quantization:  
* https://huggingface.co/blog/4bit-transformers-bitsandbytes  
* https://huggingface.co/blog/hf-bitsandbytes-integration  
* https://huggingface.co/docs/transformers/quantization
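
To get a feel for why 4-bit loading matters, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone at different precisions. It is only a sketch: the figure of about 7 billion parameters is an assumption based on the "7b" in the model name, `weight_memory_gib` is a hypothetical helper, and the estimate ignores activations, the KV cache, and the small overhead of quantization constants.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 7e9 parameters for a "7B" model.
NUM_PARAMS = 7e9

for label, bits in [("float32", 32), ("bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 4-bit weights take about a quarter of the bfloat16 footprint, which is what makes the model fit comfortably on a single notebook GPU.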


In [3]:
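model_name = "/kaggle/input/mistral/pytorch/7b-instruct-v0.1-hf/1"

# Request 4-bit weights through an explicit config object; recent
# transformers releases deprecate passing load_in_4bit directly.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,  # dtype for the layers that are not quantized
    device_map="auto",           # let accelerate place layers on the available devices
    trust_remote_code=True,
)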

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [4]:
# The model is already loaded, quantized, and placed on its device(s),
# so the pipeline only needs the model and tokenizer objects.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
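
The generation cell that follows splits the output on `[/INST]`, which only has an effect when the prompt itself is wrapped in the instruct markers that Mistral instruct checkpoints are trained on. A minimal sketch of that wrapping and extraction, using plain strings so it runs without a GPU: `build_instruct_prompt` and `extract_answer` are hypothetical helpers, and the "generated" text is a hand-written stand-in rather than real model output.

```python
def build_instruct_prompt(user_message: str) -> str:
    """Wrap a user message in [INST] ... [/INST] markers (hypothetical helper)."""
    return f"[INST] {user_message.strip()} [/INST]"


def extract_answer(generated_text: str) -> str:
    """Return the text after the last '[/INST]' marker, stripped of whitespace."""
    return generated_text.split("[/INST]")[-1].strip()


prompt = build_instruct_prompt("What are the steps to run inference with a Mistral model?")

# Stand-in for pipeline output, which echoes the prompt before the completion.
fake_generated = prompt + " First, load the tokenizer and model."
print(extract_answer(fake_generated))
```

With a real tokenizer, `tokenizer.apply_chat_template` is likely the more robust way to build the prompt, since it reads the template shipped with the checkpoint instead of hard-coding the markers.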

In [5]:
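import time

# Time a single generation call end to end.
start_time = time.perf_counter()
sequences = pipe(
    "what is the steps to install Mistral model and use it for inference",
    do_sample=True,          # sample instead of greedy decoding
    max_new_tokens=1024,     # upper bound on the number of generated tokens
    temperature=0.1,         # low temperature keeps sampling close to greedy
    top_k=5,
    top_p=0.9,
    num_return_sequences=1,
)
end_time = time.perf_counter()
generated_text = sequences[0]["generated_text"]
# Keep the text after the last '[/INST]' marker. This prompt contains no such
# marker, so the split returns the full text, prompt included.
result = generated_text.split('[/INST]')[-1].strip()
print(f"It takes {end_time-start_time} seconds\n{result}")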

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


It takes 17.417858839035034 seconds
what is the steps to install Mistral model and use it for inference?

## Answer (1)

You can use the Mistral model in TensorFlow.js by following these steps:

1. Download the pre-trained model from the Mistral website.
2. Load the model into TensorFlow.js using the `tf.loadLayersModel()` function.
3. Preprocess your input data to match the input shape of the model.
4. Use the `tf.predict()` function to make predictions on your input data.

Here is an example code snippet that demonstrates how to use the Mistral model in TensorFlow.js:
```
// Load the pre-trained model
const model = await tf.loadLayersModel('mistral_model.json');

// Preprocess your input data
const inputData = preprocessInputData(inputData);

// Make predictions on the input data
const predictions = await model.predict(inputData);
```
Note that you will need to have TensorFlow.js installed and configured on your system in order to use the Mistral model. You can find more information 