In [1]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "True"

## **DeepSparse MPT-Instruct Example**

Install DeepSparse with LLM support from the nightly build.

In [None]:
!pip install deepsparse-nightly[llm]

### **Download and Compile Model**

The following cell downloads a pre-sparsified MPT-Instruct model from our SparseZoo and compiles it.

> Note: Compilation takes a minute or two even if the model is already downloaded, so try to reuse a pipeline as much as possible once it's been set up.
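One way to follow the note above in a longer notebook or script is to build the pipeline behind a cached helper, so repeated calls return the same compiled object. This is only a minimal sketch: `get_pipeline` and `MODEL_STUB` are hypothetical names introduced here for illustration, and the only DeepSparse call used is the `TextGeneration(model=...)` constructor shown in the next cell.

```python
from functools import lru_cache

# Same SparseZoo stub as the cell below; MODEL_STUB is a name chosen for this sketch.
MODEL_STUB = "zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized"


@lru_cache(maxsize=None)
def get_pipeline(model_stub: str = MODEL_STUB):
    """Compile the pipeline on the first call and reuse it afterwards."""
    # Imported lazily so that defining this helper does not trigger any work.
    from deepsparse import TextGeneration

    return TextGeneration(model=model_stub)


# Usage (the first call compiles, later calls with the same arguments are cached):
# pipeline = get_pipeline()
# pipeline = get_pipeline()  # returns the same object, no recompilation
```

Note that `lru_cache` keys on the arguments exactly as passed, so `get_pipeline()` and `get_pipeline(MODEL_STUB)` are cached separately. Pick one calling style and stick to it.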

In [2]:
from deepsparse import TextGeneration

pipeline = TextGeneration(model="zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized")

  from .autonotebook import tqdm as notebook_tqdm
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2023-11-09 17:06:31 deepsparse.transformers.pipelines.text_generation INFO     Compiling an auxiliary engine to process a prompt with a larger processing length. This improves performance, but may result in additional memory consumption.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20231031 COMMUNITY | (74098695) (release) (optimized) (system=avx512_vnni, binary=avx512)


### **Generate Text**

We can now call the pipeline to generate a response to a prompt.

In [3]:
prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: How best should I travel from London to Edinburgh, UK ? ### Response:"

output = pipeline(prompt, max_new_tokens=100)
print(output.generations[0].text)

 There are a number of ways to travel from London to Edinburgh. You can take a direct flight from London to Edinburgh, which will take around 1 hour and 20 minutes. Alternatively, you can take the Eurostar train from London to Edinburgh, which will take around 2 hours and 30 minutes. You can also drive from London to Edinburgh, which will take around 8 hours.


To stream the response as it is generated, pass `streaming=True` and iterate over the result.

In [4]:
output_iterator = pipeline(prompt=prompt, streaming=True, max_new_tokens=100)

print(prompt, end="\n\n")
for it in output_iterator:
    print(it.generations[0].text, end="")

Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: How best should I travel from London to Edinburgh, UK ? ### Response:

 There are a number of ways to travel from London to Edinburgh. You can take a direct flight from London to Edinburgh, which will take around 1 hour and 20 minutes. Alternatively, you can take the Eurostar train from London to Edinburgh, which will take around 2 hours and 30 minutes. You can also drive from London to Edinburgh, which will take around 8 hours.<|endoftext|>
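The streamed output above ends with the literal `<|endoftext|>` marker. If you want the full response as a single string without that marker, something like the following sketch may help. `collect_stream` and `EOS_MARKER` are hypothetical names for this example. The helper works on any iterable of text chunks and does not call DeepSparse itself.

```python
EOS_MARKER = "<|endoftext|>"  # marker seen at the end of the streamed output above


def collect_stream(chunks, eos_marker=EOS_MARKER):
    """Join streamed text chunks and drop a trailing end-of-text marker, if present."""
    text = "".join(chunks)
    if text.endswith(eos_marker):
        text = text[: -len(eos_marker)]
    return text


# Usage with the streaming pipeline from the previous cell:
# output_iterator = pipeline(prompt=prompt, streaming=True, max_new_tokens=100)
# response = collect_stream(it.generations[0].text for it in output_iterator)
```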

See [our documentation](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md) for details on how to customize generation strategies.
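As a rough illustration of what customizing generation can look like, the sketch below builds keyword arguments for either greedy or sampled decoding. `max_new_tokens` and `do_sample` appear elsewhere in this notebook. The `temperature` and `top_p` names are an assumption based on the common Hugging Face-style naming, so check them against the linked documentation before relying on them. `generation_kwargs` is a hypothetical helper, not part of DeepSparse.

```python
def generation_kwargs(greedy=True, max_new_tokens=100, temperature=0.7, top_p=0.9):
    """Build keyword arguments for a pipeline call.

    `temperature` and `top_p` are assumed parameter names (see the note above);
    they are only included when sampling is enabled.
    """
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": not greedy}
    if not greedy:
        kwargs.update(temperature=temperature, top_p=top_p)
    return kwargs


# Usage:
# output = pipeline(prompt, **generation_kwargs())              # deterministic, greedy decoding
# output = pipeline(prompt, **generation_kwargs(greedy=False))  # sampled decoding
```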

### **Example: Generating Paraphrased Questions**

In [42]:
def format_instruction(examples):
    # Build the instruction text: a fixed question followed by the numbered examples.
    instruction_str = "What are three additional example questions that have the same semantic meaning as the questions listed below?\n\n"
    for idx, example in enumerate(examples, start=1):
        instruction_str += f"EXAMPLE {idx}: {example}\n"

    return instruction_str


INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)

def format_prompt(examples):
    return PROMPT_FOR_GENERATION_FORMAT.format(instruction=format_instruction(examples))


In [47]:
examples = [
    "How much does a new business permit cost?",
    "What is the price of a business permit?",
    "What are the fees for obtaining a new business license?",
]

prompt = format_prompt(examples)
print(prompt)

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What are three additional example questions that have the same semantic meaning as the questions listed below?

EXAMPLE 1: How much does a new business permit cost?
EXAMPLE 2: What is the price of a business permit?
EXAMPLE 3: What are the fees for obtaining a new business license?

### Response:



In [48]:
output_iterator = pipeline(prompt=prompt, streaming=True, max_new_tokens=200, do_sample=False)

print(prompt, end="\n\n")
for it in output_iterator:
    print(it.generations[0].text, end="")

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What are three additional example questions that have the same semantic meaning as the questions listed below?

EXAMPLE 1: How much does a new business permit cost?
EXAMPLE 2: What is the price of a business permit?
EXAMPLE 3: What are the fees for obtaining a new business license?

### Response:


Three additional example questions that have the same semantic meaning as the questions listed below are:

EXAMPLE 1: How much does a business license cost?
EXAMPLE 2: What is the price of a business license?
EXAMPLE 3: What are the fees for renewing a business license?<|endoftext|>
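To use the generated questions programmatically, the response text has to be split back into individual items. The following is a minimal sketch that assumes the model keeps the `EXAMPLE n: ...` line format seen in the output above, which is not guaranteed for every prompt. `parse_examples` is a hypothetical helper name.

```python
import re


def parse_examples(response):
    """Extract the question text from lines of the form 'EXAMPLE n: question'."""
    # Drop the end-of-text marker that the streamed output ends with.
    response = response.replace("<|endoftext|>", "")
    matches = re.findall(r"^EXAMPLE \d+:\s*(.+)$", response, flags=re.MULTILINE)
    return [m.strip() for m in matches]


# Usage with the response generated above:
response = """Three additional example questions that have the same semantic meaning as the questions listed below are:

EXAMPLE 1: How much does a business license cost?
EXAMPLE 2: What is the price of a business license?
EXAMPLE 3: What are the fees for renewing a business license?<|endoftext|>"""

questions = parse_examples(response)
```

If the model answers in a different layout, `parse_examples` returns an empty list, which is a convenient signal to retry or fall back to the raw text.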