
### Using [Ministral-8B](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) - [Mistral AI](https://mistral.ai/)  

##### Running on g5.2xlarge

_launched on 16.10.24_

Two models are available: Ministral 3B and Ministral 8B. Both support up to 128k context length, and Ministral 8B uses a special interleaved sliding-window attention pattern for faster, more memory-efficient inference.
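
To make the sliding-window idea concrete, here is a minimal sketch of a causal sliding-window attention mask in plain Python. It only illustrates the general technique: the function name, the sequence length, and the window size are hypothetical, and it does not reproduce how Ministral 8B interleaves its attention layers.

```python
def sliding_window_mask(seq_len, window):
    """Build a causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future and lies within the last `window` positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative sizes only (not the model's real configuration).
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row holds at most `window` allowed positions, so the attention cost per token stays bounded as the sequence grows.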

Ministral 8B is available for research purposes.

##### Setup

In [None]:
%pip install --upgrade vllm

In [None]:
%pip install --upgrade mistral_common
dbutils.library.restartPython() 

In [None]:
from huggingface_hub import login

huggingface_token = "<your token>"

login(token=huggingface_token)

In [None]:
import torch
torch.cuda.empty_cache()

In [None]:
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Ministral-8B-Instruct-2410"

sampling_params = SamplingParams(max_tokens=400)

llm = LLM(
    model=model_name,
    tokenizer_mode="mistral",
    config_format="mistral",
    load_format="mistral",
    max_model_len=8192,  # cap the context length (the model supports up to 128k)
)
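
The `max_model_len=8192` cap is easier to motivate with a rough memory estimate. The sketch below is back-of-the-envelope arithmetic under stated assumptions: a nominal 8 billion parameters (the exact count differs), 16-bit weights (the notebook does not state the dtype), and the 24 GB of GPU memory on an A10G treated as 24 GiB. The function name is hypothetical.

```python
GIB = 2**30  # bytes in one GiB


def weight_memory_gib(n_params, bytes_per_param=2):
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


# Assumptions: nominal 8e9 parameters, 2 bytes per parameter (fp16/bf16).
weights_gib = weight_memory_gib(8e9)

# Assumption: one A10G with 24 GB of GPU memory, treated as 24 GiB here.
gpu_gib = 24
headroom_gib = gpu_gib - weights_gib

print(f"weights ~{weights_gib:.1f} GiB, headroom ~{headroom_gib:.1f} GiB")
```

Whatever remains after the weights has to hold the KV cache and activations, which is why a context much shorter than 128k is a reasonable choice on a single GPU.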

In [None]:
import time

prompt_generation = "Write me all about Arthur Schopenhauer."

messages = [
    {
        "role": "user",
        "content": prompt_generation
    },
]

# Track wall-clock time for the generation call only
start_time = time.time()
response = llm.chat(messages, sampling_params=sampling_params)
end_time = time.time()
latency = end_time - start_time
print(response[0].outputs[0].text)

In [None]:
print(f"Latency of the model: {latency:.1f} s")
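
Latency alone depends on how many tokens were generated, so a tokens-per-second figure can be easier to compare across prompts. The helper below is a small sketch; its name is hypothetical, and the example numbers (400 tokens, 14 s) simply reuse the `max_tokens` cap and the latency reported in this notebook, so the true throughput may be lower if fewer tokens were produced.

```python
def tokens_per_second(num_tokens, latency_s):
    """Return generation throughput in tokens per second."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return num_tokens / latency_s


# With the objects from the cells above this would be, for example:
#   num_generated = len(response[0].outputs[0].token_ids)
#   print(tokens_per_second(num_generated, latency))

# Illustrative upper bound: the 400-token cap over a 14 s run.
example_throughput = tokens_per_second(400, 14.0)
print(f"{example_throughput:.1f} tokens/s")
```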

#### Text Summarization

In [None]:
template = """Write a short summary of this article for a business expert:

{article}
"""

cameroon = """The economic fallout from the COVID-19 pandemic and the subsequent global shocks provoked by the war in Ukraine have hit African countries hard, denting economic growth and aggravating their sovereign debt positions. The International Monetary Fund (IMF) forecasts that Cameroon, a Central African oil producer, will record 4.3% economic growth this year after it slumped to 0.5% in 2020. The Fund has classified Cameroon as being at high risk of debt distress, though in its most recent review of the country's loan programme it stated that, with active fiscal reforms and management, the debt could be sustainable. "Our debt service coverage from exports needs to be improved. That's the reason why we are ranked in a high risk debt distress position," said Alamine Ousmane Mey, Cameroon's minister of economy, planning and regional development. He was speaking at an event organised by the Atlantic Council think tank on the sidelines of the IMF and World Bank's Spring Meetings in Washington. "We're working to be able to improve our exports through import substitution policies to reduce imports, produce more and export more. This will give us better room for debt service coverage," he said. Cameroon has also relaunched talks with the U.S. to end its suspension from the Africa Growth and Opportunities Act (AGOA) initiative, which grants qualifying African countries tariff-free access to the U.S. market. Former President Donald Trump suspended Cameroon from the programme in late 2019 over "persistent gross violations of internationally recognised human rights" by Cameroonian security forces. Since 2017, factions of secessionist militias have been battling government troops in the majority Francophone country's two English-speaking regions. The conflict has killed thousands and displaced nearly 800,000 people. 
"All the issues that have been raised, we're working on in a very transparent open manner to be able to iron them out and solve the problems," Mey said, referring to the talks with U.S. officials to rejoin AGOA. Our Standards: The Thomson Reuters Trust Principles.
"""

messages = [
    {
        "role": "user",
        "content": template.format(article=cameroon)
    },
]

response = llm.chat(messages, sampling_params=sampling_params)
print(response[0].outputs[0].text)
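
The same single-turn message structure is rebuilt for every task in this notebook. One possible way to cut the repetition is a small helper like the sketch below; `build_chat` is a hypothetical name, not part of vLLM, and the short template is only for illustration.

```python
def build_chat(template, **fields):
    """Return a single-turn chat message list for the filled-in template."""
    return [{"role": "user", "content": template.format(**fields)}]


# Stand-alone demonstration with a short, made-up template.
demo = build_chat("Summarize in one sentence:\n\n{article}\n", article="Some text.")
print(demo)

# With the objects defined above, the summarization call would become:
#   response = llm.chat(build_chat(template, article=cameroon),
#                       sampling_params=sampling_params)
```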

In [None]:
torch.cuda.empty_cache()

#### Text Translation

In [None]:
translation_template = """Translate into Russian:
{text}
"""

translate_to_rus = """Mr. Bingley was good-looking and gentlemanlike: he had a pleasant countenance, and easy, unaffected manners. His sisters were fine women, with an air of decided fashion. His brother-in-law, Mr. Hurst, merely looked the gentleman; but his friend Mr. Darcy soon drew the attention of the room by his fine, tall person, handsome features, noble mien, and the report, which was in general circulation within five minutes after his entrance, of his having ten thousand a year. The gentlemen pronounced him to be a fine figure of a man, the ladies declared he was much handsomer than Mr. Bingley, and he was looked at with great admiration for about half the evening, till his manners gave a disgust which turned the tide of his popularity; for he was discovered to be proud, to be above his company, and above being pleased; and not all his large estate in Derbyshire could save him from having a most forbidding, disagreeable countenance, and being unworthy to be compared with his friend."""

response = llm.chat(messages=[{"role": "user", "content": translation_template.format(text=translate_to_rus)}], sampling_params=sampling_params)
print(response[0].outputs[0].text)

_The model hallucinates a lot with this translation task._ 

#### Coding Task

_Coding challenge from: https://edabit.com/challenge/ZdnwC3PsXPQTdTiKf_:

In [None]:
coding_template = """Write a code in Python to solve the following task:

{task}

Starter: 

{starter}
"""

coding_test = """Create a Python function that takes two numbers and a mathematical operator + - / * and will perform a calculation with the given numbers. """

# examples = """calculator(2, "+", 2) ➞ 4

# calculator(2, "*", 2) ➞ 4

# calculator(4, "/", 2) ➞ 2"""

starter = """def calculator(num1, operator, num2):"""


response = llm.chat(messages=[{"role": "user", "content": coding_template.format(task=coding_test, starter=starter)}], sampling_params=sampling_params)
print(response[0].outputs[0].text)

_Coding challenge from: https://edabit.com/challenge/3A3mHS5B3NNZddQL2_:

In [None]:
coding_template = """Write a code in Python to solve the following task:

{task}

Starter:

{starter}

"""

coding_test = """Create a function:
    
to check if a candidate is qualified in an imaginary coding interview of an imaginary tech startup.

The criteria for a candidate to be qualified in the coding interview is:
The candidate should have complete all the questions.
The maximum time given to complete the interview is 120 minutes.
The maximum time given for very easy questions is 5 minutes each.
The maximum time given for easy questions is 10 minutes each.
The maximum time given for medium questions is 15 minutes each.
The maximum time given for hard questions is 20 minutes each.
If all the above conditions are satisfied, return "qualified", else return "disqualified".

You will be given a list of time taken by a candidate to solve a particular question and the total time taken by the candidate to complete the interview.

Given a list , in a true condition will always be in the format [very easy, very easy, easy, easy, medium, medium, hard, hard].

The maximum time to complete the interview includes a buffer time of 20 minutes."""

starter = """def interview(lst, tot):"""


response = llm.chat(messages=[{"role": "user", "content": coding_template.format(task=coding_test, starter=starter)}], sampling_params=sampling_params)
print(response[0].outputs[0].text)

The minimum cluster configuration needed to run the model is **g5.2xlarge[A10G]** - 32 GB memory, 1 GPU.

Cost: 1.64 DBU/h.

Latency of the model: 14 s (measured on the text generation prompt above, with up to 400 output tokens).
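
The two figures above can be combined into a rough per-request cost. This is a sketch under simple assumptions: the cluster serves one request at a time, idle time is ignored, and the function name is hypothetical. The result is in DBU, not in currency, since the DBU price is not given here.

```python
def dbu_per_request(dbu_per_hour, latency_s):
    """Approximate DBU consumed by one request that takes `latency_s` seconds."""
    return dbu_per_hour * latency_s / 3600.0


# Figures reported in this notebook: 1.64 DBU/h and 14 s per request.
cost = dbu_per_request(1.64, 14)
print(f"~{cost:.4f} DBU per request")
```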