In [1]:
!pip install -q git+https://github.com/huggingface/transformers peft accelerate bitsandbytes safetensors sentencepiece

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

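The next cell loads the model with 4-bit quantization. As a rough illustration of why that matters on a small GPU, the sketch below estimates the memory taken by the weights alone at 16 and 4 bits per parameter. The helper `estimate_weight_memory_gb` and the parameter count are assumptions made for this example, not values taken from the notebook, and the estimate ignores quantization constants, activations, and the KV cache.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: about 7.2e9 parameters for a 7B-class model (illustrative only).
NUM_PARAMS = 7.2e9

bf16_gb = estimate_weight_memory_gb(NUM_PARAMS, 16)  # 16 bits per weight
nf4_gb = estimate_weight_memory_gb(NUM_PARAMS, 4)    # 4 bits per weight

print(f"bf16 weights:  ~{bf16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB (weights only)")
```

Under these assumptions the 4-bit weights need about a quarter of the bf16 footprint, which is what makes loading feasible on a single consumer or free-tier GPU.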
In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = 'bn22/Mistral-7B-Instruct-v0.1-sharded'

def load_quantized_model(model_name: str):
    """
    Load a causal language model with 4-bit NF4 quantization.

    :param model_name: Name or path of the model to be loaded.
    :return: Loaded quantized model.
    """
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )

    # 4-bit loading is requested through quantization_config only. Passing
    # load_in_4bit=True here as well is redundant and can raise an error
    # in newer transformers releases.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        quantization_config=bnb_config
    )

    return model

def initialize_tokenizer(model_name: str):
    """
    Initialize the tokenizer with the specified model_name.

    :param model_name: Name or path of the model for tokenizer initialization.
    :return: Initialized tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.bos_token_id = 1  # Set beginning of sentence token id
    return tokenizer


model = load_quantized_model(model_name)

tokenizer = initialize_tokenizer(model_name)

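# Stop token ids (defined here, but not passed to generate() below)
stop_token_ids = [0]

text = "[INST] How AI will replace Engineers [/INST]"

# add_special_tokens=False encodes the prompt exactly as written, without a BOS token
encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
# Move the input tensors to the same device as the model
model_input = encoded.to(model.device)
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])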

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST] How AI will replace Engineers [/INST] AI is not going to the replace human Engineers. Instead, AI will likely compliment the work of Engineers and automate certain tasks to make the design process more efficient and effective. This will enable Engineers to take on more complex and challenging projects and focus on creative problem-solving and innovation.

One specific way that AI may complement Engineers is that it could help automate the design process by generating designs quickly and accurately, which can then be refined and optimized by Engineers. Additionally, AI can be used to simulate and analyze complex systems and processes, providing Engineers with valuable insights and predictions that can inform their decision-making.

Another example is that AI can be used to automate certain repetitive tasks such as data entry, analysis, and reporting, allowing Engineers to focus on more high-level and strategic work. This can result in more efficient and effective engineering proc
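
The decoded text above starts with the `[INST] ... [/INST]` prompt, because `generate` returns the prompt tokens followed by the new ones. The sketch below shows one way to build the single-turn prompt string used in this notebook and to keep only the model's reply. Both helpers, `format_instruction` and `extract_reply`, are hypothetical names introduced for this example, and the sample string is made up for illustration rather than real model output.

```python
INST_OPEN, INST_CLOSE = "[INST]", "[/INST]"


def format_instruction(user_message: str) -> str:
    """Wrap a single user message in the [INST] ... [/INST] markers used above."""
    return f"{INST_OPEN} {user_message.strip()} {INST_CLOSE}"


def extract_reply(decoded: str) -> str:
    """Return the text after the last [/INST] marker, without the </s> token.

    If the marker is absent, the whole string is returned (stripped).
    """
    _, _, reply = decoded.rpartition(INST_CLOSE)
    return reply.replace("</s>", "").strip()


prompt = format_instruction("How AI will replace Engineers")

# Illustrative decoded string, not actual model output.
sample_decoded = prompt + " AI is more likely to assist engineers than replace them.</s>"

print(extract_reply(sample_decoded))
```

String splitting keeps the example independent of the model. With the tensors from the cell above, slicing `generated_ids[0]` from the prompt length onward before decoding is an alternative that does not rely on the marker text.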