<a href="https://colab.research.google.com/github/marco-siino/ThingSpeak_ParsersGenerator/blob/main/Mistral_7B_Instruct_v0_3_ThingSpeak_Parsers_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started with `mistral-inference`

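This notebook runs Mistral 7B Instruct v0.3 locally with `mistral-inference` and uses it to write Python parsers for ThingSpeak channel data. We will cover the following:
- How to download Mistral 7B Instruct and load it with `mistral-inference`
- How to prompt it to generate, for each ThingSpeak channel JSON file, a Python script that maps the channel onto a common JSON schema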

We recommend a GPU with enough memory for the roughly 14 GB of bf16 weights, such as an A100. The run recorded below used a 24 GB NVIDIA L4.
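A rough way to size the GPU is to compare the weights with the card's total memory. This sketch assumes the weights occupy about as much GPU memory as the checkpoint archive (14,496,675,840 bytes in the `wget` log, 23,034 MiB total on the L4 in the `nvidia-smi` table). Whatever is left over has to hold the KV cache, activations and CUDA context. `gpu_headroom_mib` is a hypothetical helper name.

```python
MIB = 2**20  # bytes per mebibyte, the unit nvidia-smi reports

def gpu_headroom_mib(weight_bytes: int, gpu_total_mib: int) -> float:
    """MiB left after loading the weights (assumed size = archive size)."""
    return gpu_total_mib - weight_bytes / MIB

# Archive size from the wget log and total memory of the L4 from nvidia-smi.
print(round(gpu_headroom_mib(14_496_675_840, 23_034)))  # → 9209
```

A negative result means the bf16 weights alone do not fit on the card.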

In [9]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install mistral-inference

Collecting mistral-inference
  Downloading mistral_inference-1.3.1-py3-none-any.whl.metadata (14 kB)
Collecting fire>=0.6.0 (from mistral-inference)
  Downloading fire-0.6.0.tar.gz (88 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.4/88.4 kB 3.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting mistral_common<2.0.0,>=1.3.0 (from mistral-inference)
  Downloading mistral_common-1.3.3-py3-none-any.whl.metadata (4.1 kB)
Collecting xformers>=0.0.24 (from mistral-inference)
  Downloading xformers-0.0.27.post2-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.0 kB)
Collecting jsonschema==4.21.1 (from mistral_common<2.0.0,>=1.3.0->mistral-inference)
  Downloading jsonschema-4.21.1-py3-none-any.whl.metadata (7.8 kB)
Collecting pydantic==2.6.1 (from mistral_common<2.0.0,>=1.3.0->mistral-inference)
  Dow
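The install log shows the versions pip resolved at the time of this run (`mistral_inference-1.3.1`, `mistral_common-1.3.3`, `xformers-0.0.27.post2`). The generated parsers depend on the exact model and library versions, so it can help to record them or pin them, e.g. `!pip install mistral-inference==1.3.1`. Below is a minimal stdlib-only sketch that reads the installed versions. `installed_versions` is a hypothetical helper, and it reports `None` for packages that are not installed.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

print(installed_versions(["mistral-inference", "mistral_common", "xformers"]))
```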

## Download Mistral 7B Instruct

In [3]:
!wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar

--2024-08-01 09:59:44--  https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar
Resolving models.mistralcdn.com (models.mistralcdn.com)... 104.26.6.117, 104.26.7.117, 172.67.70.68, ...
Connecting to models.mistralcdn.com (models.mistralcdn.com)|104.26.6.117|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14496675840 (14G) [application/x-tar]
Saving to: ‘mistral-7B-Instruct-v0.3.tar’


2024-08-01 10:06:44 (33.0 MB/s) - ‘mistral-7B-Instruct-v0.3.tar’ saved [14496675840/14496675840]



In [4]:
!DIR=mistral_7b_instruct_v3 && mkdir -p $DIR && tar -xf mistral-7B-Instruct-v0.3.tar -C $DIR
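The shell one-liner above extracts the checkpoint with `tar`. Below is a pure-Python sketch of the same step using the standard `tarfile` module. It also refuses any archive member whose path would land outside the target folder. `extract_checkpoint` is a hypothetical helper. The `filter="data"` argument is only passed on Python versions that support extraction filters.

```python
import os
import tarfile

def extract_checkpoint(tar_path: str, dest_dir: str) -> list:
    """Extract tar_path into dest_dir (like `mkdir -p DIR && tar -xf TAR -C DIR`)."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as tar:
        members = tar.getmembers()
        # Reject members such as "../x" or absolute paths that would escape dest_dir.
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):  # extraction filters are available
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
    # Return the folder listing, like the `ls` cell below.
    return sorted(os.listdir(dest_root))

# Usage in the notebook:
# extract_checkpoint("mistral-7B-Instruct-v0.3.tar", "mistral_7b_instruct_v3")
```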

In [5]:
!ls mistral_7b_instruct_v3

consolidated.safetensors  params.json  tokenizer.model.v3
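`params.json` describes the architecture, so the weight count can be recomputed from it as a sanity check on the download. This sketch makes four assumptions about the model: a Mistral-style decoder with no bias terms, grouped-query attention, a three-matrix SwiGLU feed-forward block, and separate input-embedding and output matrices. The field values below are what we expect in the v0.3 `params.json`, so compare them with your copy. With these values the count is 7,248,023,552 weights. At 2 bytes per bf16 weight, that comes within about 0.6 MB of the 14,496,675,840-byte archive. The remainder is presumably the tokenizer, `params.json` and tar headers. `mistral_param_count` is a hypothetical helper.

```python
def mistral_param_count(p: dict) -> int:
    """Count the weights of a Mistral-style decoder from params.json fields."""
    dim, hidden = p["dim"], p["hidden_dim"]
    q_out = p["n_heads"] * p["head_dim"]      # query projection width
    kv_out = p["n_kv_heads"] * p["head_dim"]  # key/value width (grouped-query attention)
    attention = dim * q_out + 2 * dim * kv_out + q_out * dim  # wq, wk, wv, wo
    feed_forward = 3 * dim * hidden                            # w1, w2, w3
    norms = 2 * dim                                            # attention_norm, ffn_norm
    per_layer = attention + feed_forward + norms
    embeddings = 2 * p["vocab_size"] * dim                     # tok_embeddings + output
    return p["n_layers"] * per_layer + embeddings + dim         # + final norm

# In the notebook: params = json.load(open("mistral_7b_instruct_v3/params.json"))
params = {"dim": 4096, "n_layers": 32, "head_dim": 128, "hidden_dim": 14336,
          "n_heads": 32, "n_kv_heads": 8, "vocab_size": 32768}
n_weights = mistral_param_count(params)
print(n_weights)  # → 7248023552
print(f"{n_weights * 2 / 2**30:.2f} GiB in bf16")
```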


## Import libraries and load the model

In [6]:
import os
import torch

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Number CUDA devices by PCI bus ID so that indices match nvidia-smi.
# On a multi-GPU node (e.g. on Leonardo), uncomment CUDA_VISIBLE_DEVICES to pick a GPU;
# it must be set before CUDA is first initialized.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"
torch.cuda.set_device(0)

# Load the v3 tokenizer shipped with the checkpoint.
mistral_tokenizer = MistralTokenizer.from_file("mistral_7b_instruct_v3/tokenizer.model.v3")

# Load the model weights from the extracted folder onto the GPU.
model = Transformer.from_folder("mistral_7b_instruct_v3")

  @torch.library.impl_abstract("xformers_flash::flash_fwd")
  @torch.library.impl_abstract("xformers_flash::flash_bwd")


In [7]:
!nvidia-smi

Thu Aug  1 10:07:35 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   38C    P0              28W /  72W |  14145MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
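The table shows 14145 MiB of the L4's 23034 MiB in use after loading the model. To read those numbers programmatically, for example to log memory between channels, `nvidia-smi` has a CSV query mode. This sketch parses the output of `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which prints one line per GPU with values in MiB. Both function names are hypothetical, and `query_gpu_memory` only works where `nvidia-smi` is on the PATH.

```python
import subprocess

def parse_gpu_memory(csv_text: str) -> list:
    """Turn 'used, total' lines (MiB) into a list of (used, total) tuples, one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory() -> list:
    """Run nvidia-smi in CSV query mode and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)

# Usage: for used, total in query_gpu_memory(): print(f"{used} / {total} MiB")
```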
                                                                    

In [None]:
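import torch, gc
# Free unreferenced Python objects and cached CUDA memory (useful after an interrupted generation).
gc.collect()
torch.cuda.empty_cache()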

## Generate a Python parser for each ThingSpeak channel
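The JSON schema pasted into the prompt is written by hand. The keys from `air_quality` onward are unquoted, so the text is not strictly valid JSON. Below is a minimal sketch, assuming the same eleven string fields, that generates a valid schema with `json.dumps`. `SCHEMA_FIELDS` and `build_schema` are hypothetical names. Swapping this output into the prompt would change the model input, so the generated parsers may differ from the recorded run.

```python
import json

# Field names copied from the hand-written schema in the prompt; every field is a string.
SCHEMA_FIELDS = ["id", "temperature", "humidity", "pressure", "light", "air_quality",
                 "location", "soil_moisture", "hardware", "distance", "ph"]

def build_schema(fields=SCHEMA_FIELDS) -> str:
    """Return the target schema as valid JSON text, with every key quoted."""
    schema = {
        "type": "object",
        "properties": {name: {"type": "string"} for name in fields},
    }
    return json.dumps(schema, indent=2)

print(build_schema())
```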

In [None]:
# Iterate over every channel JSON file in the thingspeak/channels folder.

for filename in os.listdir("drive/MyDrive/thingspeak/channels"):
    if filename.endswith(".json"):
      print(filename)
      # Skip channels whose parser already exists in output_code, so an interrupted run can resume.
      if os.path.exists("drive/MyDrive/thingspeak/output_code/"+filename.split(".json")[0]+".py"):
        print("already exists")
        continue
      # Read the raw channel JSON as text; it is inserted verbatim into the prompt.
      with open("drive/MyDrive/thingspeak/channels/"+filename, encoding='utf-8') as f:
        data = f.read()

      prompt ="""
        I have a JSON text and a JSON schema. I need a Python script that reads the JSON text feeds and convert it into a JSON output file with different keys and structure. The output file has the structure of the schema (but do not validate schema or text). If the values are other than 0, "latitude" and "longitude" need to be merged into a string "latitude,longitude" used to fill the "location" field. Fill the other fields in the output json according to the semantic of the JSON text.


        JSON text:

        """+data+ """

        JSON schema:

        {
            "type": "object",
            "properties": {
              "id": {
                "type": "string"
              },
              "temperature": {
                "type": "string"
              },
              "humidity": {
                "type": "string"
              },
              "pressure": {
                "type": "string"
              },
              "light": {
                "type": "string"
              },
              air_quality: {
                "type": "string"
              },
              location: {
                "type": "string"
              },
              soil_moisture: {
                "type": "string"
              },
              hardware: {
                "type": "string"
              },
              distance: {
                "type": "string"
              },
              ph: {
                "type": "string"
              }
            }
        }

        Write the JSON output into a file thingspeak/output_json/mistral/"""+filename+""" . In the output file leave empty the fields with "null" values.

        """
      print(prompt)
      # Build a single-turn chat completion request from the prompt.
      completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
      # Encode the request with the instruct template of the v3 tokenizer.
      tokens = mistral_tokenizer.encode_chat_completion(completion_request).tokens
      # Generate greedily (temperature 0.0), stopping at EOS or after 3000 new tokens.
      out_tokens, _ = generate([tokens], model, max_tokens=3000, temperature=0.0, eos_id=mistral_tokenizer.instruct_tokenizer.tokenizer.eos_id)
      # Decode the generated tokens back to text.
      result = mistral_tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

      # Keep only the code between the first ```python fence and the following ```.
      # Answers without such a fence are skipped and retried on the next run.
      if "```python" not in result:
        print("no ```python block in the answer, skipping")
        continue
      result = result.split("```python")[1].split("```")[0]
      print(result+"\n\n")

      filename=filename.split(".json")[0]

      print("drive/MyDrive/thingspeak/output_code/"+filename+".py")

      # Save the generated parser to the working folder and to the experiments folder.
      with open("drive/MyDrive/thingspeak/output_code/"+filename+".py", 'w', encoding='utf-8') as file:
        file.write(result)
      with open("drive/MyDrive/Academic/2024/Conferences/IEEE WF-IoT 2024/experiments/output_code/"+filename+".py", 'w', encoding='utf-8') as file:
        file.write(result)



Streaming output truncated to the last 5000 lines.
}
"""

data = json.loads(json_text)
schema = json.loads(json_schema)
create_output_json(data, schema)



671128.json

        I have a JSON text and a JSON schema. I need a Python script that reads the JSON text feeds and convert it into a JSON output file with different keys and structure. The output file has the structure of the schema (but do not validate schema or text). If the values are other than 0, "latitude" and "longitude" need to be merged into a string "latitude,longitude" used to fill the "location" field. Fill the other fields in the output json according to the semantic of the JSON text.


        JSON text:

        {
    "channel": {
        "id": 671128,
        "name": "LORA",
        "description": "To find a temperature through IoT",
        "latitude": "0.0",
        "longitude": "0.0",
        "field1": "TEMPERATURE",
        "created_at": "2019-01-08T05:08:51Z",
        "updated_at": "2019-01-08T05

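Splitting the answer on the ```` ```python ```` fence takes whatever follows it, even if the model produced code that does not parse. Below is a slightly stricter sketch that uses a regular expression to find the first fenced block and then checks its syntax with `ast.parse` without executing it. `extract_python_code` and `FENCE_RE` are hypothetical names.

```python
import ast
import re
from typing import Optional

# First ```python fenced block; re.DOTALL lets ".*?" span several lines.
FENCE_RE = re.compile(r"```python\s*\n(.*?)```", re.DOTALL)

def extract_python_code(answer: str) -> Optional[str]:
    """Return the first ```python block of the answer, or None if missing or unparsable."""
    match = FENCE_RE.search(answer)
    if match is None:
        return None
    code = match.group(1)
    try:
        ast.parse(code)  # syntax check only: the generated code is never executed here
    except (SyntaxError, ValueError):
        return None
    return code
```

In the loop, `code = extract_python_code(result)` followed by `if code is None: continue` would leave failed channels unsaved so that the next run retries them.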
In [None]:
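# The generation loop writes the parsers to drive/MyDrive/thingspeak/output_code;
# point zip at that folder if the local thingspeak/output_code copy is not up to date.
!zip -r output_code.zip thingspeak/output_code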
from google.colab import files
files.download("output_code.zip")

updating: thingspeak/output_code/ (stored 0%)
updating: thingspeak/output_code/438486.py (deflated 67%)
updating: thingspeak/output_code/1648862.py (deflated 67%)
updating: thingspeak/output_code/294544.py (deflated 67%)
updating: thingspeak/output_code/1617310.py (deflated 66%)
updating: thingspeak/output_code/689569.py (deflated 65%)
updating: thingspeak/output_code/786211.py (deflated 64%)
updating: thingspeak/output_code/308085.py (deflated 65%)
updating: thingspeak/output_code/11197.py (deflated 69%)
updating: thingspeak/output_code/1574555.py (deflated 65%)
updating: thingspeak/output_code/50280.py (deflated 69%)
updating: thingspeak/output_code/1325454.py (deflated 69%)
updating: thingspeak/output_code/2032032.py (deflated 65%)
updating: thingspeak/output_code/389592.py (deflated 69%)
updating: thingspeak/output_code/1759450.py (deflated 67%)
updating: thingspeak/output_code/747249.py (deflated 67%)
updating: thingspeak/output_code/24463.py (deflated 66%)
updating: thingspeak/ou
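`zip -r` updates an existing archive in place, which is why the log shows `updating:` lines. Below is a standard-library sketch built on `shutil.make_archive`. It writes a fresh archive each time, so files deleted from the folder do not linger in the zip. `zip_folder` is a hypothetical helper. In the notebook it could be used as `files.download(zip_folder("drive/MyDrive/thingspeak/output_code", "output_code"))`.

```python
import os
import shutil

def zip_folder(src_dir: str, archive_base: str) -> str:
    """Zip src_dir into archive_base + '.zip' and return the archive path.

    Entries are stored under the folder's own name (e.g. 'output_code/438486.py').
    """
    parent = os.path.dirname(os.path.abspath(src_dir))
    name = os.path.basename(os.path.normpath(src_dir))
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)
```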
