## MISTRAL NLP TESTING


Run the cell below to install `mistralai`, the Mistral Python client.

In [1]:
pip install mistralai

Collecting mistralai
  Downloading mistralai-1.7.0-py3-none-any.whl.metadata (30 kB)
Collecting eval-type-backport>=0.2.0 (from mistralai)
  Downloading eval_type_backport-0.2.2-py3-none-any.whl.metadata (2.2 kB)
Downloading mistralai-1.7.0-py3-none-any.whl (301 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.5/301.5 kB 6.2 MB/s eta 0:00:00
Downloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)
Installing collected packages: eval-type-backport, mistralai
Successfully installed eval-type-backport-0.2.2 mistralai-1.7.0






## Mistral Key Entry

Here we enter the API key for the Mistral LLM. A key can be generated at https://console.mistral.ai/home. The free plan often times out, so keys from multiple users may have to be used.

In [2]:
from mistralai import Mistral

# Tell the client your Mistral API key here.
# Replace the placeholder with your own key, and avoid sharing a notebook that contains a real key.

client = Mistral(api_key="YOUR_MISTRAL_API_KEY")
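
A key typed directly into a cell is visible to anyone who receives the notebook. As a minimal sketch of one alternative, the helper below reads the key from an environment variable and falls back to a hidden prompt. The helper name `load_api_key` and the variable name `MISTRAL_API_KEY` are assumptions made for illustration, not part of the original notebook.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "MISTRAL_API_KEY") -> str:
    """Return the API key from the environment, or prompt for it without echoing."""
    key = os.environ.get(env_var)
    if not key:
        # Hypothetical fallback: ask the user interactively; the input is not displayed.
        key = getpass(f"Enter {env_var}: ")
    return key


# Usage in the notebook (assumes `Mistral` has been imported as in the cell above):
# client = Mistral(api_key=load_api_key())
```

With this approach, switching to a different user's key only requires changing the environment variable or retyping the key at the prompt.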

## Upload files

Here we upload the file we want to work with. Simply run the code and a "choose files" button will appear. To run as we did, upload the 1600_strat_samp_test.csv file.

In [4]:
from google.colab import files
# upload file to be analyzed by LLM.
uploaded = files.upload()

Saving 1600_strat_samp_test.csv to 1600_strat_samp_test.csv


This code selects the portion of the file to analyze. Here it takes the first 400 entries of the 1600_strat_samp_test.csv file; change the slice in the last line to adjust the size.

In [8]:
import pandas as pd

# load csv into pandas dataframe
whole_file = pd.read_csv('1600_strat_samp_test.csv')

# Take the first 400 rows of the DataFrame (this is not a random sample).
sampled_df = whole_file.iloc[:400]
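
Slicing with `iloc` takes rows in file order, which differs from drawing rows at random. The sketch below contrasts the two on a toy DataFrame. The toy data is made up for illustration, and whether a random draw is preferable depends on how the CSV is ordered, which the notebook does not state.

```python
import pandas as pd

# Toy stand-in for the uploaded CSV (illustrative data only).
toy = pd.DataFrame({"Title": [f"Book {i}" for i in range(10)]})

# First four rows, in file order. This mirrors whole_file.iloc[:400].
first_rows = toy.iloc[:4]

# Four rows drawn at random without replacement; random_state makes the draw repeatable.
random_rows = toy.sample(n=4, random_state=0)

print(first_rows["Title"].tolist())
print(random_rows["Title"].tolist())
```

If the file is sorted (for example by genre), the first rows may not cover every category, whereas a random draw usually will.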

The usual way to get an LLM to do a task is to give each input as its own request consisting of the instructions for the task and the input. That's what we'll do below.

The challenge with Mistral, as with most free or low-cost LLM APIs, is that it limits how many requests you can make per time period. We try to get around this by calling `sleep()` between requests. Even so, we had difficulty getting the loop to work with more than 100 items, and increasing the `sleep()` delay to 10 or 20 seconds did not fix the issue.



In [9]:
# We need the time library to sleep() between requests.
import time
MODEL = "mistral-large-latest"

# Where we will store results
predictions = []

# Loop through the items in the subset of 400 items we made above.
for index, row in sampled_df.iterrows():
  text = row['review/text']
  title = row['Title']

  # Create the prompt
  prompt = f"Which of the following genres is the book this review is talking about? Answer only with one of the following options. Please do not add any other text under any circumstances. (Fiction, Religion, History, Juvenile Fiction, Biography & Autobiography, Business & Economics, Computers, Social Science, Juvenile Nonfiction, Science, Education, Family & Relationships, Cooking, Sports & Recreation, Literary Criticism, Music): {text} {title}"

  # Put it in the MESSAGES variable that will get passed
  # to Mistral.
  MESSAGES = [{"role": "user", "content": prompt}]

  # This is the call to Mistral with that prompt.
  completion = client.chat.complete(
      model= MODEL,
      messages = MESSAGES
  )

  # This prints the prompt and the title:
  print(prompt)
  print(title)

  # This prints out the response
  print(completion.choices[0].message.content)

  # This saves out the response to our list of predictions so that
  # we can evaluate the predictions of the LLM in the next code block.
  predictions.append(completion.choices[0].message.content)

  # This will pause the execution for 5 seconds so that we don't
  # exceed our rate limit with Mistral
  time.sleep(5)


Which of the following genres is the book this review is talking about? Answer only with one of the following options. Please do not add any other text under any circumstances. (Fiction, Religion, History, Juvenile Fiction, Biography & Autobiography, Business & Economics, Computers, Social Science, Juvenile Nonfiction, Science, Education, Family & Relationships, Cooking, Sports & Recreation, Literary Criticism, Music): This book was hilarious. I read it yesterday, and I am still laughing my Asimov. Get it? Hahahaha. Huh. Okay. Nevermind. Foundation
Foundation
Fiction
Which of the following genres is the book this review is talking about? Answer only with one of the following options. Please do not add any other text under any circumstances. (Fiction, Religion, History, Juvenile Fiction, Biography & Autobiography, Business & Economics, Computers, Social Science, Juvenile Nonfiction, Science, Education, Family & Relationships, Cooking, Sports & Recreation, Literary Criticism, Music): Jan

SDKError: API error occurred: Status 429
{"object":"error","message":"Service tier capacity exceeded for this model.","type":"invalid_request_error","param":null,"code":null}
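
A fixed `sleep()` cannot recover once a request has already been rejected, so the loop stops at the first 429. One common alternative is to retry the failed request with an exponentially growing delay. The sketch below is a generic helper under stated assumptions: the name `call_with_backoff` and its parameters are hypothetical, and it detects a rate-limit error by looking for "429" in the exception text, which matches the message shown above but is not a documented contract of the `mistralai` client.

```python
import random
import time


def call_with_backoff(request_fn, max_retries=5, base_delay=5.0, sleep=time.sleep):
    """Call request_fn(); on a rate-limit error, wait and retry with a doubling delay."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception as exc:
            # Assumption: rate-limit errors mention "429" in their message.
            is_rate_limit = "429" in str(exc)
            if not is_rate_limit or attempt == max_retries:
                raise
            # Delay grows as base_delay * 1, 2, 4, ... plus up to 1 s of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)


# Usage inside the loop (assumes client, MODEL, and MESSAGES from the cell above):
# completion = call_with_backoff(
#     lambda: client.chat.complete(model=MODEL, messages=MESSAGES)
# )
```

Retrying does not raise the quota itself. If the service tier's capacity is exhausted for a long period, the helper will still give up after `max_retries` attempts and re-raise the error.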

## Evaluate the LLM output
This code reports the precision, recall, and F1 score for each genre, along with the overall accuracy.

In [7]:
from sklearn.metrics import classification_report

# Create a list of the genres that should be possible
valid_genres = {
    "Fiction", "Religion", "History", "Juvenile Fiction", "Biography & Autobiography",
    "Business & Economics", "Computers", "Social Science", "Juvenile Nonfiction",
    "Science", "Education", "Family & Relationships", "Cooking", "Sports & Recreation", "Literary Criticism", "Music"
}

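# Replace hallucinated genres (anything outside valid_genres) with "Unknown"
filtered_predictions = [p.strip() if p.strip() in valid_genres else "Unknown" for p in predictions]

# Compare only the rows that actually received a prediction
# (the request loop may stop early on a rate-limit error).
true_labels = sampled_df["categories"].iloc[:len(filtered_predictions)]

# Print a classification report. sklearn expects (y_true, y_pred). The report
# recorded below was produced with the two arguments reversed, so its precision
# and recall columns are swapped and its support column counts predictions.
print(classification_report(true_labels, filtered_predictions))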

                           precision    recall  f1-score   support

Biography & Autobiography       0.67      1.00      0.80         4
     Business & Economics       1.00      0.86      0.92         7
                Computers       0.83      1.00      0.91         5
                  Cooking       1.00      0.83      0.91         6
                Education       0.00      0.00      0.00         0
   Family & Relationships       1.00      0.75      0.86         8
                  Fiction       1.00      0.20      0.33        25
                  History       1.00      0.88      0.93         8
         Juvenile Fiction       0.50      0.50      0.50         4
      Juvenile Nonfiction       0.83      1.00      0.91         5
       Literary Criticism       0.00      0.00      0.00         0
                    Music       1.00      1.00      1.00         3
                 Religion       0.75      1.00      0.86         6
                  Science       0.88      0.88      0.88     

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
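
To make the columns of the report concrete, here is a minimal sketch that computes precision, recall, and F1 by hand for a single label. The toy labels and the helper `per_label_scores` are illustrative and do not come from the notebook. The sketch also shows that swapping the true labels and the predictions swaps precision and recall.

```python
def per_label_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label, using 0.0 when a ratio is undefined."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # how often the model said `label`
    actual = sum(1 for t in y_true if t == label)     # how often `label` was correct
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: two Fiction reviews, but the model labels only one of them Fiction.
y_true = ["Fiction", "Fiction", "History", "Music"]
y_pred = ["Fiction", "History", "History", "Music"]

print(per_label_scores(y_true, y_pred, "Fiction"))  # precision 1.0, recall 0.5
print(per_label_scores(y_pred, y_true, "Fiction"))  # arguments swapped: precision 0.5, recall 1.0
```

A label that never appears among the true labels has no defined recall. The helper returns 0.0 in that case.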
