
### Using [Qwen 2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) - [Alibaba Cloud](https://www.alibabacloud.com/en?_p_lc=1)  

##### Running on g5.12xlarge

_launched on 19.09.24_

Qwen 2.5 is available in several sizes, including:
- Qwen 2.5: 0.5B, 1.5B, 3B, 7B, **14B**, 32B, and 72B
- Qwen 2.5-Coder: 1.5B, 7B, and 32B on the way
- Qwen 2.5-Math: 1.5B, 7B, and 72B 


Qwen 2.5 models are pretrained on up to 18 trillion tokens and perform strongly on coding and mathematics tasks. They support context lengths of up to 128K tokens and more than 29 languages.

##### Setup

In [0]:
!pip install -Uq accelerate 
!pip install -Uq bitsandbytes
!pip install "transformers>=4.45.0" --upgrade

dbutils.library.restartPython()

In [0]:
import gc
import os
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Drop unreferenced Python objects first, then release cached GPU memory
gc.collect()
torch.cuda.empty_cache()

##### Checking GPU Allocation

In [0]:
print(torch.cuda.memory_summary(device=None, abbreviated=False))
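`torch.cuda.memory_summary` describes the caching allocator of the current process on one device, so it does not show how much memory is free on each of the four GPUs. A minimal sketch of a per-GPU report follows; the helper name `gpu_memory_report` is hypothetical, and the guarded import is only there so the cell also runs on a machine without PyTorch or without a GPU.

```python
try:
    import torch
except ImportError:  # keeps the sketch runnable where PyTorch is not installed
    torch = None


def gpu_memory_report():
    """Return one (index, name, free_gib, total_gib) tuple per visible GPU."""
    if torch is None or not torch.cuda.is_available():
        return []
    report = []
    for idx in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for the given device
        free_bytes, total_bytes = torch.cuda.mem_get_info(idx)
        report.append(
            (idx, torch.cuda.get_device_name(idx),
             free_bytes / 1024**3, total_bytes / 1024**3)
        )
    return report


for idx, name, free_gib, total_gib in gpu_memory_report():
    print(f"GPU {idx} ({name}): {free_gib:.1f} GiB free of {total_gib:.1f} GiB")
```

Running this before and after loading the model shows how `device_map="auto"` spreads the weights across the GPUs.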


#### Load model

In [0]:
tokenizer_14 = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B", trust_remote_code=True)
model_14 = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B",
    torch_dtype=torch.bfloat16,  
    device_map="auto",           
    trust_remote_code=True
)

#### Text Generation

In [0]:
prompt_generation = "Write me all about Arthur Schopenhauer."

inputs = tokenizer_14(prompt_generation, return_tensors="pt")

# Move inputs to the model's device (get device from model)
device = next(model_14.parameters()).device
inputs = {k: v.to(device) for k, v in inputs.items()}

# Time only the generation call, excluding decoding and printing
start_time = time.time()

outputs = model_14.generate(**inputs, max_new_tokens=200)

end_time = time.time()
latency = end_time - start_time

print(tokenizer_14.decode(outputs[0], skip_special_tokens=True))

In [0]:
print(f"Generation latency: {latency:.2f} s")
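Raw latency depends on how many tokens were generated, so throughput in tokens per second is often easier to compare across runs. A minimal sketch, assuming the `inputs`, `outputs`, and `latency` variables from the cell above; the helper name `tokens_per_second` is hypothetical, and the numbers in the example call are made up for illustration, not measured.

```python
def tokens_per_second(prompt_len, output_len, latency_s):
    """Throughput of one generate() call.

    prompt_len: number of prompt tokens, e.g. inputs["input_ids"].shape[1]
    output_len: prompt plus generated tokens, e.g. outputs.shape[1]
    latency_s:  wall-clock time of the call in seconds
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    new_tokens = output_len - prompt_len
    return new_tokens / latency_s


# Illustrative numbers only: 9 prompt tokens, 200 new tokens, 20 seconds
print(tokens_per_second(prompt_len=9, output_len=209, latency_s=20.0))
```

In the notebook the call would be `tokens_per_second(inputs["input_ids"].shape[1], outputs.shape[1], latency)`.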

#### Text Summarization

In [0]:
template = """Write a short summary of this article for a business expert:

{article}
"""

cameroon = """The economic fallout from the COVID-19 pandemic and the subsequent global shocks provoked by the war in Ukraine have hit African countries hard, denting economic growth and aggravating their sovereign debt positions. The International Monetary Fund (IMF) forecasts that Cameroon, a Central African oil producer, will record 4.3% economic growth this year after it slumped to 0.5% in 2020. The Fund has classified Cameroon as being at high risk of debt distress, though in its most recent review of the country's loan programme it stated that, with active fiscal reforms and management, the debt could be sustainable. "Our debt service coverage from exports needs to be improved. That's the reason why we are ranked in a high risk debt distress position," said Alamine Ousmane Mey, Cameroon's minister of economy, planning and regional development. He was speaking at an event organised by the Atlantic Council think tank on the sidelines of the IMF and World Bank's Spring Meetings in Washington. "We're working to be able to improve our exports through import substitution policies to reduce imports, produce more and export more. This will give us better room for debt service coverage," he said. Cameroon has also relaunched talks with the U.S. to end its suspension from the Africa Growth and Opportunities Act (AGOA) initiative, which grants qualifying African countries tariff-free access to the U.S. market. Former President Donald Trump suspended Cameroon from the programme in late 2019 over "persistent gross violations of internationally recognised human rights" by Cameroonian security forces. Since 2017, factions of secessionist militias have been battling government troops in the majority Francophone country's two English-speaking regions. The conflict has killed thousands and displaced nearly 800,000 people. 
"All the issues that have been raised, we're working on in a very transparent open manner to be able to iron them out and solve the problems," Mey said, referring to the talks with U.S. officials to rejoin AGOA. Our Standards: The Thomson Reuters Trust Principles.
"""
prompt = template.format(article=cameroon)

inputs = tokenizer_14(prompt, return_tensors="pt")
device = next(model_14.parameters()).device
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model_14.generate(**inputs, max_new_tokens=120)
    
print(tokenizer_14.decode(outputs[0], skip_special_tokens=True))
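For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so decoding `outputs[0]` prints the whole article again before the summary. The tokenize, move, generate, and decode steps also repeat in every cell. One possible way to fold them into a helper that returns only the newly generated text is sketched below; the name `generate_text` is hypothetical and not part of `transformers`.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=120):
    """Generate a continuation and return it without the echoed prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")

    # Move inputs to the device that holds the model's first parameters
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # outputs[0] is prompt tokens + new tokens; keep only the new ones
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the objects loaded earlier, the summarization cell reduces to `print(generate_text(model_14, tokenizer_14, prompt))`.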

#### Text Translation

In [0]:
translation_template = """Translate into Russian:
{text}
"""

translate_to_rus = """Mr. Bingley was good-looking and gentlemanlike: he had a pleasant countenance, and easy, unaffected manners. His sisters were fine women, with an air of decided fashion. His brother-in-law, Mr. Hurst, merely looked the gentleman; but his friend Mr. Darcy soon drew the attention of the room by his fine, tall person, handsome features, noble mien, and the report, which was in general circulation within five minutes after his entrance, of his having ten thousand a year. The gentlemen pronounced him to be a fine figure of a man, the ladies declared he was much handsomer than Mr. Bingley, and he was looked at with great admiration for about half the evening, till his manners gave a disgust which turned the tide of his popularity; for he was discovered to be proud, to be above his company, and above being pleased; and not all his large estate in Derbyshire could save him from having a most forbidding, disagreeable countenance, and being unworthy to be compared with his friend."""

prompt = translation_template.format(text=translate_to_rus)

inputs = tokenizer_14(prompt, return_tensors="pt")
device = next(model_14.parameters()).device
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model_14.generate(**inputs, max_new_tokens=120)
    
print(tokenizer_14.decode(outputs[0], skip_special_tokens=True))

#### Coding Task

_Coding challenge from:_ https://edabit.com/challenge/ZdnwC3PsXPQTdTiKf

In [0]:
coding_template = """Write Python code to solve the following task:

{task}

Starter: 

{starter}
"""

coding_test = """Create a Python function that takes two numbers and a mathematical operator + - / * and will perform a calculation with the given numbers. If the input tries to divide by 0, return: Can't divide by 0!"""


starter = """def calculator(num1, operator, num2):"""


prompt = coding_template.format(task=coding_test, starter = starter)

inputs = tokenizer_14(prompt, return_tensors="pt")
device = next(model_14.parameters()).device
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model_14.generate(**inputs, max_new_tokens=200)
    
print(tokenizer_14.decode(outputs[0], skip_special_tokens=True))
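To judge the model's answer it helps to have a reference solution to compare against. The sketch below is one possible hand-written solution to the task, not the model's output. Raising `ValueError` for an unknown operator is an assumption, since the task does not say what should happen in that case.

```python
import operator as op

# Map each supported symbol to the function that implements it
OPERATIONS = {"+": op.add, "-": op.sub, "*": op.mul, "/": op.truediv}


def calculator(num1, operator, num2):
    """Apply operator to num1 and num2, guarding against division by zero."""
    if operator not in OPERATIONS:
        # Assumption: the task does not define behavior for other symbols
        raise ValueError(f"Unsupported operator: {operator!r}")
    if operator == "/" and num2 == 0:
        return "Can't divide by 0!"
    return OPERATIONS[operator](num1, num2)


print(calculator(2, "+", 2))
print(calculator(4, "/", 0))
```

The generated function can be checked by running it on the same inputs and comparing the results.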

_Coding challenge from:_ https://edabit.com/challenge/3A3mHS5B3NNZddQL2

In [0]:
coding_template = """Write Python code to solve the following task:

{task}

Starter:

{starter}

"""

coding_test = """Create a function to check if a candidate is qualified in an imaginary coding interview of an imaginary tech startup.

The criteria for a candidate to be qualified in the coding interview are:
The candidate should have completed all the questions.
The maximum time given to complete the interview is 120 minutes.
The maximum time given for very easy questions is 5 minutes each.
The maximum time given for easy questions is 10 minutes each.
The maximum time given for medium questions is 15 minutes each.
The maximum time given for hard questions is 20 minutes each.
If all the above conditions are satisfied, return "qualified", else return "disqualified".

You will be given a list of time taken by a candidate to solve a particular question and the total time taken by the candidate to complete the interview.

The given list, in a valid case, will always be in the format [very easy, very easy, easy, easy, medium, medium, hard, hard].

The maximum time to complete the interview includes a buffer time of 20 minutes."""

starter = """def interview(lst, tot):"""

prompt = coding_template.format(task=coding_test, starter=starter)

inputs = tokenizer_14(prompt, return_tensors="pt")
device = next(model_14.parameters()).device
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model_14.generate(**inputs, max_new_tokens=400)
    
print(tokenizer_14.decode(outputs[0], skip_special_tokens=True))
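As with the calculator task, a hand-written reference makes it easier to verify what the model produces. The sketch below is one possible solution under the rules listed in the task (eight questions, per-question limits of 5, 5, 10, 10, 15, 15, 20, 20 minutes, and a 120-minute total); the constant names are hypothetical.

```python
# Per-question limits in minutes, in the order the task specifies:
# [very easy, very easy, easy, easy, medium, medium, hard, hard]
TIME_LIMITS = [5, 5, 10, 10, 15, 15, 20, 20]
MAX_TOTAL_MINUTES = 120


def interview(lst, tot):
    """Return "qualified" if every rule from the task is satisfied."""
    all_answered = len(lst) == len(TIME_LIMITS)
    within_total = tot <= MAX_TOTAL_MINUTES
    within_limits = all(taken <= limit for taken, limit in zip(lst, TIME_LIMITS))
    if all_answered and within_total and within_limits:
        return "qualified"
    return "disqualified"


print(interview([5, 5, 10, 10, 15, 15, 20, 20], 120))
print(interview([5, 5, 10, 10, 25, 15, 20, 20], 120))
```

The first call meets every limit exactly, while the second exceeds the 15-minute limit on a medium question.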

The minimum cluster configuration needed to run the model is **g5.12xlarge** (4 NVIDIA A10G GPUs, 192 GB of memory).
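A rough back-of-the-envelope check suggests why a single GPU is not enough. The sketch below estimates the memory needed for the weights alone, assuming roughly 14.7 billion parameters (an approximate figure taken from the model card) and 24 GB per A10G. It ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
import math

NUM_PARAMS = 14.7e9   # assumption: approximate parameter count of Qwen2.5-14B
A10G_MEMORY_GB = 24   # memory of one NVIDIA A10G GPU
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory needed to hold the weights only, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(num_params, dtype, gpu_memory_gb=A10G_MEMORY_GB):
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(num_params, dtype) / gpu_memory_gb)


for dtype in BYTES_PER_PARAM:
    size = weight_memory_gb(NUM_PARAMS, dtype)
    print(f"{dtype}: {size:.1f} GB of weights, at least {min_gpus(NUM_PARAMS, dtype)} A10G GPU(s)")
```

In `bfloat16`, the dtype used when loading the model above, the weights come to about 29.4 GB, which exceeds one 24 GB A10G and is consistent with needing a multi-GPU instance.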