## Together API with Mixtral

### Check out my [Twitter (@rohanpaul_ai)](https://twitter.com/rohanpaul_ai) for daily LLM bits

In [None]:
# !pip install together python-dotenv

In [None]:
import together
import dotenv
import os

dotenv.load_dotenv()
together.api_key = os.getenv("together_key")
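
If the `together_key` variable is missing from the environment, `os.getenv` silently returns `None` and the problem only shows up later as a failed request. A minimal sketch of an early check follows; `require_env` is a hypothetical helper written for this notebook, not part of the `together` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file"
        )
    return value
```

With this helper in place, the assignment above could be written as `together.api_key = require_env("together_key")`.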

In [None]:
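model_list = together.Models.list()
print(f"{len(model_list)} models available")
model_names = [m["name"] for m in model_list]  # assumes each entry is a dict with a "name" key
print(model_names[:10])  # first 10 model names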

See the package page for details: https://pypi.org/project/together/

Together with the first ten model names, the output looks something like this:

```
120 models available

['EleutherAI/gpt-j-6b',
 'EleutherAI/gpt-neox-20b',
 'EleutherAI/pythia-12b-v0',
 'EleutherAI/pythia-1b-v0',
 'EleutherAI/pythia-2.8b-v0',
 'EleutherAI/pythia-6.9b',
 'HuggingFaceH4/starchat-alpha',
 'NousResearch/Nous-Hermes-13b',
 'NousResearch/Nous-Hermes-Llama2-13b',
 'NumbersStation/nsql-6B']
```

The `Complete` class of the Together Python library wraps the Together API's completion functionality, so you can generate text from your application with a single call.

https://docs.together.ai/docs/python-complete


In [None]:
model = "mistralai/Mixtral-8x7B-v0.1"

prompt = """To install PSU in your desktop machine first you will"""

output = together.Complete.create(
    prompt=prompt,
    model=model,
    max_tokens=64,
    temperature=0.7,
    top_k=50,
    top_p=0.7,
    repetition_penalty=1,
    # stop=[],  # add any sequences at which generation should stop
)

# print generated text
print(output['output']['choices'][0]['text'])

`max_tokens (integer, optional)` -- Maximum number of tokens the model should generate. Default: 128

`stop (List[str], optional)` -- List of stop words the model should stop generation at. Default: ["<human>"]

`temperature (float, optional)` -- A decimal number that determines the degree of randomness in the response. Default: 0.7

`repetition_penalty (float, optional)` -- A number that controls the diversity of generated text by reducing the likelihood of repeated sequences. Higher values decrease repetition. Default: 1
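
To build intuition for `temperature` and `repetition_penalty`, here is a toy sketch of how these two knobs are commonly applied to a model's raw scores (logits) before sampling. How Together's servers implement them is not stated here, so treat the formulas as an assumption based on the formulation widely used in open-source samplers; the logits and function names are made up for illustration.

```python
import math


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the score of tokens that were already generated (assumed, commonly used rule)."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive logit or multiplying a negative one both make it less likely.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary

sharp = softmax_with_temperature(logits, 0.7)  # more deterministic
flat = softmax_with_temperature(logits, 1.5)   # more random

# Pretend token 0 was already generated; a penalty of 2 halves its positive logit.
penalised = apply_repetition_penalty(logits, generated_ids=[0], penalty=2.0)
```

In this sketch a penalty of 1 leaves the logits unchanged, which matches the documented default of no extra pressure against repetition.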

-----------------

## Run Mixtral-8x7B with the @togethercompute API 🚀

Stream tokens as they are generated instead of waiting for the entire response.

Use the `stream_tokens` parameter to enable streaming responses. When `stream_tokens` is true in the request payload, the API returns server-sent events as it generates the response instead of waiting for the entire response first.
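
To make the event stream concrete, here is a minimal sketch of what an SSE client does with the raw response body. The sample lines are made up to mirror the `choices[0].text` shape and the `[DONE]` sentinel that the streaming code in this notebook reads; `iter_sse_data` is a hypothetical helper that handles only `data:` fields, not a replacement for `sseclient`.

```python
import json


def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of decoded SSE lines."""
    buffer = []
    for line in lines:
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
    # An event without a terminating blank line is dropped.


# Made-up sample stream: two token events followed by the end-of-stream sentinel.
raw_lines = [
    'data: {"choices": [{"text": "To"}]}',
    "",
    'data: {"choices": [{"text": " begin"}]}',
    "",
    "data: [DONE]",
    "",
]

tokens = []
for data in iter_sse_data(raw_lines):
    if data == "[DONE]":
        break
    tokens.append(json.loads(data)["choices"][0]["text"])

text = "".join(tokens)
```

Each event carries one small JSON document, which is why the client can print a token as soon as its event arrives.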

In [None]:
import json
import requests
import sseclient

model_name = "mistralai/Mixtral-8x7B-v0.1"

def stream_tokens_from_api(prompt, api_key, model=model_name, max_tokens=512):
    url = "https://api.together.xyz/inference"
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.7,
        "repetition_penalty": 2,
        "stream_tokens": True,
    }

    try:
        response = requests.post(url, json=payload, headers=headers, stream=True)
        response.raise_for_status()
    except requests.RequestException as e:
        raise RuntimeError(f"Request to API failed: {e}") from e

    try:
        client = sseclient.SSEClient(response)
        for event in client.events():
            if event.data == "[DONE]":
                break
            yield json.loads(event.data)["choices"][0]["text"]
    except Exception as e:
        raise RuntimeError(f"Error while streaming tokens: {e}") from e

📌 Usage Example:

In [None]:
api_key = "YOUR_API_KEY"
prompt = "To install PSU in your desktop machine first you will"

for token in stream_tokens_from_api(prompt, api_key):
    print(token, end="", flush=True)
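
Printing tokens as they arrive discards them afterwards. If you also want the complete text once the stream ends, one option is to collect the pieces while echoing them. The sketch below assumes only that the stream is an iterable of strings; `collect_stream` is a hypothetical helper, and a fake token list stands in for `stream_tokens_from_api` so no network call is made.

```python
from typing import Iterable


def collect_stream(tokens: Iterable[str], echo: bool = True) -> str:
    """Consume a token iterator, optionally echoing each piece, and return the full text."""
    pieces = []
    for token in tokens:
        if echo:
            print(token, end="", flush=True)
        pieces.append(token)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(pieces)


# Stand-in for stream_tokens_from_api(prompt, api_key); the tokens are made up.
fake_stream = iter(["First", ",", " unplug", " the", " machine"])
full_text = collect_stream(fake_stream, echo=False)
```

With a real key, the same helper could be called as `collect_stream(stream_tokens_from_api(prompt, api_key))`.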