# Model: Mistral 7B Instruct v0.1

### Installs

In [None]:
!pip install optimum
!pip install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

Collecting optimum
  Downloading optimum-1.16.1-py3-none-any.whl (403 kB)
Collecting coloredlogs (from optimum)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting datasets (from optimum)
  Downloading datasets-2.15.0-py3-none-any.whl (521 kB)
Collecting sentencepiece!=0.1.92,>=0.1.91 (from transformers[sentencepiece]>=4.26.0->optimum)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting humanfriendly>=9.1 (from coloredlogs
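
The install cell pins `transformers` to a specific commit, so it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is an illustrative assumption and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Report the packages installed by the cell above.
for name in ("optimum", "transformers", "auto-gptq"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```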

### Imports

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

### Configurations

In [None]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"

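# Prompt settings. The template writes <s> explicitly, and the tokenizer appears
# to prepend its own BOS token as well, which would explain why the Method 1
# output begins with <s><s>.
prompt = "Tell me about AI"
prompt_template = f'''<s>[INST] {prompt} [/INST]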
'''
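
The template above is built once for a single hard-coded prompt. As a sketch of how the same `[INST] ... [/INST]` format could be produced for arbitrary input, the hypothetical helper `build_prompt` below (not part of the original notebook) wraps a user message. It omits the leading `<s>` by default, on the assumption that the tokenizer adds a BOS token itself; pass `include_bos=True` to reproduce the notebook's template exactly.

```python
def build_prompt(user_message, include_bos=False):
    """Wrap a user message in the [INST] ... [/INST] format used in this notebook."""
    body = f"[INST] {user_message.strip()} [/INST]\n"
    # Assumption: the tokenizer normally adds <s>, so it is opt-in here.
    return f"<s>{body}" if include_bos else body

print(repr(build_prompt("Tell me about AI")))
```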


### Utility Methods

In [None]:
# Load the GPTQ-quantized model and its tokenizer from the Hugging Face Hub.
def get_model():
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

  tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
  return model, tokenizer

# Method 1: tokenize the prompt, call model.generate directly, and decode the result.
def get_prediction(model, tokenizer):
  # Move the input ids to the model's device rather than assuming CUDA.
  input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device)
  output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
  return tokenizer.decode(output[0])

# Method 2: run the same prompt through a text-generation pipeline.
# Unlike Method 1, this also sets repetition_penalty=1.1.
def get_prediction2(model, tokenizer):
  pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1)
  return pipe(prompt_template)[0]['generated_text']
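
Both functions return the prompt together with the generated text, and Method 1 also keeps special tokens such as `<s>` and `</s>`. A small post-processing step can isolate the model's reply. The helper `extract_reply` below is a hypothetical sketch based on plain string handling, not part of the original notebook; passing `skip_special_tokens=True` to `tokenizer.decode` is another way to drop the special tokens.

```python
def extract_reply(decoded_text, end_of_instruction="[/INST]", eos_token="</s>"):
    """Return only the text generated after the last [/INST] marker."""
    # rpartition yields an empty marker when the separator is not found.
    _, marker, tail = decoded_text.rpartition(end_of_instruction)
    reply = (tail if marker else decoded_text).strip()
    # Drop a trailing end-of-sequence token if one is present.
    if reply.endswith(eos_token):
        reply = reply[: -len(eos_token)]
    return reply.strip()

sample = "<s><s>[INST] Tell me about AI [/INST]\nAI stands for Artificial Intelligence.</s>"
print(extract_reply(sample))
```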


### Download Model and Tokenizer

In [None]:
model, tokenizer = get_model()

Downloading config.json:   0%|          | 0.00/963 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/4.16G [00:00<?, ?B/s]

Downloading generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

Downloading tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

Downloading tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### Execution (Method 1)

In [None]:
import time

start = time.time()
predicted_text = get_prediction(model, tokenizer)
end = time.time()

# show the generated text and the elapsed wall-clock time for this single run
print(f"Predicted Text: {predicted_text}")
print(f"Time taken: {(end-start)*10**3:.03f}ms")


Predicted Text: <s><s>[INST] Tell me about AI [/INST]
AI stands for Artificial Intelligence. It is a branch of computer science that focuses on creating intelligent machines that can perform tasks that would normally require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI systems use algorithms, machine learning, and other techniques to analyze data and make predictions or decisions based on that data. There are many different types of AI, including narrow or weak AI, which is designed to perform a specific task, and general or strong AI, which is designed to be capable of any intellectual task that a human can perform. AI has many applications, including in healthcare, finance, transportation, and entertainment.</s>
Time taken: 5857.020ms


### Execution (Method 2)

In [None]:
import time

start = time.time()
predicted_text = get_prediction2(model, tokenizer)
end = time.time()

# show the generated text and the elapsed wall-clock time for this single run
print(f"Predicted Text: {predicted_text}")
print(f"Time taken: {(end-start)*10**3:.03f}ms")

Predicted Text: <s>[INST] Tell me about AI [/INST]
AI, or artificial intelligence, is a branch of computer science that focuses on the development of intelligent machines that can perform tasks that typically require human intelligence, such as understanding natural language, recognizing images, and making decisions. AI systems use algorithms, machine learning techniques, and neural networks to process information and learn from their experiences, enabling them to improve their performance over time. There are many different types of AI, including narrow or weak AI, which is designed to perform a specific task, and general or strong AI, which has the ability to reason, think abstractly, and understand the world in the same way as humans do. Some common applications of AI include virtual assistants, self-driving cars, speech recognition, and recommendation engines.
Time taken: 7080.424ms
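
The two timings above each come from a single run with sampling enabled, the generated texts differ in length, and `get_prediction2` builds its pipeline inside the timed call, so the numbers are not a controlled comparison. A minimal sketch of a repeated measurement follows; the helper `time_call` and its `repeats` parameter are illustrative assumptions, not part of the original notebook.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times; return the last result and per-call durations in ms."""
    durations_ms = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations_ms.append((time.perf_counter() - start) * 1000)
    return result, durations_ms

# Usage with the notebook's functions (requires the loaded model and tokenizer):
# text, times = time_call(get_prediction, model, tokenizer)
# print(f"min {min(times):.3f}ms, mean {sum(times) / len(times):.3f}ms")
```

Reporting the minimum and the mean over a few runs gives a steadier picture than one measurement, although sampled generations of different lengths will still vary from run to run.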
