# Mistral AI Base Model 7B

See the model card: https://huggingface.co/mistralai/Mistral-7B-v0.1

In [None]:
MODEL_NAME = "mistralai/Mistral-7B-v0.1"

Load the model and tokenizer. 

The `device_map="auto"` option tells `from_pretrained` to place the model on the best available devices (*e.g.* a GPU if present), splitting layers across GPU and CPU when the GPU does not have enough VRAM.

Here, we build the `device_map` ourselves from the underlying primitives to show how it works.
See [How 🤗 Accelerate runs very large models thanks to PyTorch](https://huggingface.co/blog/accelerate-large-models) for more details. 
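To make the idea concrete, here is a minimal sketch of how a device map could be built greedily. It is a simplified illustration, not Accelerate's actual algorithm: the function `greedy_device_map`, the layer names, the sizes, and the memory budgets are all hypothetical. It only assumes that a device map is a dictionary from module names to devices, filled in order until each device's budget is used up.

```python
def greedy_device_map(layer_sizes, budgets):
    """Assign layers, in order, to the first device that still has room.

    layer_sizes: dict of module name -> size (arbitrary units, e.g. GiB)
    budgets:     dict of device -> capacity, in priority order (GPU first)
    """
    devices = list(budgets.items())
    idx = 0
    remaining = devices[idx][1]
    device_map = {}
    for name, size in layer_sizes.items():
        # Move on to the next device once the current one cannot fit this layer.
        while size > remaining:
            idx += 1
            if idx >= len(devices):
                raise MemoryError(f"No device has room for {name}")
            remaining = devices[idx][1]
        device_map[name] = devices[idx][0]
        remaining -= size
    return device_map


# Hypothetical example: four 4-unit layers, a 10-unit GPU (device 0), and a large CPU.
toy_sizes = {f"layer.{i}": 4 for i in range(4)}
toy_map = greedy_device_map(toy_sizes, {0: 10, "cpu": 100})
print(toy_map)
```

In this toy setup the first two layers fit on device `0` and the rest spill over to `"cpu"`, which is the kind of split described above when VRAM runs out.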

In [None]:
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained(MODEL_NAME)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Keep each decoder layer on a single device; Mistral's layer class is MistralDecoderLayer.
device_map = infer_auto_device_map(model,
                                   no_split_module_classes=["MistralDecoderLayer"],
                                   dtype="float16")

In [None]:
device_map
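The raw dictionary can be long, so a quick count of modules per device helps to see whether anything was offloaded. The sketch below uses a small hand-written `example_map` whose keys and placements are illustrative only, and `summarize_device_map` is a hypothetical helper, not part of Accelerate. In the notebook, the `device_map` computed above could be passed instead.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count how many entries of a device map land on each device."""
    # Devices may be ints (GPU indices) or strings ("cpu", "disk"); normalize to str.
    return dict(Counter(str(device) for device in device_map.values()))


# Illustrative map only; the real one depends on the available hardware.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": "cpu",
    "lm_head": "cpu",
}
summary = summarize_device_map(example_map)
print(summary)
```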

Now use the `device_map` to place the model on the right devices:

In [None]:
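import torch

# Load in float16 so the weights match the dtype assumed when the device_map was inferred.
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map=device_map, torch_dtype=torch.float16)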
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

In [None]:
prompt = "Shall I compare thee to a"

model_inputs = tokenizer([prompt], return_tensors="pt")

In [None]:
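# Move the inputs to the device holding the model's first parameters instead of hard-coding 'cuda'.
model_inputs = model_inputs.to(model.device)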
generated_ids = model.generate(**model_inputs, max_new_tokens=20, do_sample=True)
tokenizer.batch_decode(generated_ids)[0]