### Mount Google Drive to this Colab runtime

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Installation of required packages

In [None]:
!pip -qqq install huggingface_hub transformers accelerate bitsandbytes --progress-bar off

In [None]:
import torch
print(torch.cuda.is_available())

True


### Hugging Face Authentication (not mandatory but recommended)

In [None]:
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')

!huggingface-cli login --token $hf_token

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
The token `ISA` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `ISA`


### Advantages of using Pre-trained Models

1. **Reduced Development Time**
  - Pre-trained models eliminate the need to train models from scratch:
    Training large models like GPT, BERT, or Vision Transformers (ViT) from scratch can take months of work and vast computational resources.

    Using pre-trained models allows developers to apply these state-of-the-art models to their tasks immediately.

  - Fine-tuning is faster and more cost-efficient:
    A pre-trained model can be fine-tuned on a specific dataset to adapt it to a niche use case, which is much cheaper than training from scratch.


2. **Access to State-of-the-Art Models**
  - Platforms like Hugging Face provide access to a wide variety of state-of-the-art models (e.g., GPT, BERT, T5, Whisper, CLIP).

  - Developers gain immediate access to cutting-edge performance for a variety of tasks:
    - NLP Tasks: Text classification, sentiment analysis, summarization, translation, question answering, text generation.
    - Vision Tasks: Image classification, object detection, segmentation.
    - Multimodal Tasks: Models like CLIP handle both text and image inputs.
    - Audio Tasks: Models like Whisper enable transcription and speech recognition.


3. **Democratization of AI**
  - Accessible to all skill levels:

    Platforms like Hugging Face simplify the deployment of advanced models, even for developers with limited AI expertise.
    APIs and pre-built pipelines allow developers to use these models with minimal code.

  - Community-driven innovation:

    Hugging Face's open ecosystem allows researchers and developers to share models, datasets, and ideas, accelerating progress in AI.


4. **Improved Performance on Low-Data Tasks**

 - Transfer learning:

    Pre-trained models excel at transferring their knowledge to new tasks, even with small datasets.
> For example, a BERT model fine-tuned on a small dataset can outperform a model trained from scratch on the same dataset.


5. **Proven Quality and Reliability**

  - Rigorous benchmarking:

    Many pre-trained models are extensively tested and benchmarked on public datasets, ensuring they meet high standards.

  - Bug fixes and updates:
    Established communities like Hugging Face ensure that popular models are frequently updated and optimized.


6. **Scalability**
  - Easily adaptable to new tasks:

    By modifying the architecture slightly (e.g., adding task-specific heads to models like BERT), pre-trained models can be applied to tasks beyond their original design.

  - Deployable across a range of environments:
    Pre-trained models are compatible with various platforms, including on-premise servers, cloud systems, mobile devices, and embedded systems.


**Challenges Addressed by Pre-Trained Models**
  - Data Scarcity: Overcome the challenge of having limited labeled data.
  - Lack of Expertise: Reduce the barrier to entry for advanced machine learning tasks.
  - High Training Costs: Avoid the expense of training complex models from scratch.


### Define the path of the model we will be using (copied from its Hugging Face page)

In [None]:
model_path = "ilsp/Meltemi-7B-Instruct-v1.5"

In [None]:
gpu = "cuda"

### Example #1: Sentiment Analysis, with a pretrained model.

In [None]:
from transformers import pipeline

sentiment_analyzer = pipeline(task="sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english", device="cpu")

In [None]:
result1 = sentiment_analyzer("Hugging Face makes NLP easy and accessible!")
print("Sentiment Analysis Result 1:", result1)

Sentiment Analysis Result 1: [{'label': 'POSITIVE', 'score': 0.9997339844703674}]


In [None]:
result2 = sentiment_analyzer("I hate that it is raining today ):")
print("Sentiment Analysis Result 2:", result2)

Sentiment Analysis Result 2: [{'label': 'NEGATIVE', 'score': 0.5724037885665894}]


In [None]:
result3 = sentiment_analyzer("The service of the restaurant was great, but the food was mid.")
print("Sentiment Analysis Result 3:", result3)

Sentiment Analysis Result 3: [{'label': 'NEGATIVE', 'score': 0.9593968391418457}]
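
The `score` in these results is a probability. As a rough sketch of the final step (assuming a two-class classifier whose raw outputs are logits; the logit values below are made up for illustration and do not come from DistilBERT), a softmax turns the logits into probabilities and the most likely label is reported:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the labels [NEGATIVE, POSITIVE]
labels = ["NEGATIVE", "POSITIVE"]
logits = [-1.2, 2.3]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the highest probability
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```

Read this way, a score close to 0.5 (such as the 0.57 in Result 2) suggests the model is nearly undecided between the two labels.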


### Example #2: Text Generation, with a pretrained model.

In [None]:
from transformers import pipeline
import textwrap

# Set up the text wrapper
wrapper = textwrap.TextWrapper(width=150)

# Initialize the GPT-2 text generation pipeline
generator = pipeline(task="text-generation", model="gpt2", device="cpu")

In [None]:
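# GPT-2 has no padding token, so reuse the end-of-sequence token for padding.
# (Assigning in the other direction would overwrite eos_token_id with None.)
generator.tokenizer.pad_token = generator.tokenizer.eos_token
# Generate the text
generated_text = generator("Once upon a time in Ancient Greece,", max_length=64, num_return_sequences=1, truncation=True)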

# Access the generated text
text = generated_text[0]["generated_text"]

# Format the text using textwrap
formatted_generated_text = wrapper.fill(text.strip())

# Print the formatted text
print("\nGenerated Text:\n\n", formatted_generated_text + "...")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



Generated Text:

 Once upon a time in Ancient Greece, a people that did not know what he said, called Tetrarches or "Tetrarchs," an old Greek word that meant something
like a man made a god, after he had been laid in slavery. His statue was hung in a wooden box on the side of a...


In [None]:
# Let's Free up some Memory
del sentiment_analyzer
del generator

### **8-bit Quantization Overview**

> Quantization is a technique to reduce the numerical precision of model weights and activations from 32-bit floating-point (FP32) or 16-bit floating-point (FP16) to 8-bit integer (INT8).

This process involves:

- *Scaling*: Mapping higher-precision values to a smaller 8-bit range using scaling factors.

- *Calibration*: Identifying ranges (min, max) of values for weights and activations to ensure accurate representation within 8-bit limits.

- *Conversion*: Replacing FP32/FP16 computations with INT8 operations during inference.
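
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration of symmetric "absmax" quantization on a made-up list of weights; it is an assumption chosen for clarity and not how bitsandbytes implements 8-bit loading internally.

```python
def quantize_absmax(values):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)           # calibration: find the value range
    scale = max_abs / 127 if max_abs > 0 else 1.0   # scaling: one float per group of values
    quantized = [max(-127, min(127, round(v / scale))) for v in values]  # conversion
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 1.27, -0.9]  # hypothetical weights
q_weights, scale = quantize_absmax(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q_weights)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

The round-trip error is bounded by half the scale, which is why a single very large value (an outlier) makes every other value in the group less precise.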


**Pros** of 8-bit Quantization

- **Reduced Memory Usage**: 8-bit weights and activations consume significantly less memory, enabling deployment on memory-constrained hardware (e.g., mobile devices).

- **Faster Inference**: Operations on 8-bit integers can be faster than FP32/FP16 operations, provided the hardware is optimized for low-precision computations.

- **Lower Power Consumption**: Reduced computational overhead leads to lower energy usage, crucial for edge devices.

- **Scalability**: Allows the deployment of larger models in resource-constrained environments without prohibitive resource demands.
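
To put the memory savings in numbers, here is a back-of-the-envelope estimate of the space needed for the weights alone. The parameter count is an assumption (a round "7B" figure, not Meltemi's exact count), and the helper `weight_memory_gib` is introduced only for this illustration.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

NUM_PARAMS = 7_000_000_000  # assumption: a round "7B" figure

for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8)]:
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

A real footprint will differ somewhat, since some layers are typically kept in higher precision and activations need memory on top of the weights.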


**Cons** of 8-bit Quantization

- **Accuracy Drop**: Quantizing from FP32/FP16 to INT8 can lead to a loss of precision, potentially reducing model accuracy.

- **Numerical Instability**: For weights or activations with large dynamic ranges, 8-bit precision may struggle to represent values effectively.

- **Complexity**: Additional calibration and mixed-precision handling may be required for maintaining accuracy (e.g., using FP16/FP32 for outlier values).

- **Hardware Dependency**: Performance benefits depend on hardware support for INT8 operations, which may not be universal.

The model saved on Drive was quantized with this configuration:

In [None]:
from transformers import BitsAndBytesConfig
import torch

# Define a BitsAndBytesConfig for 8-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,  # Enable 8-bit loading
    llm_int8_threshold=5.0,  # Outlier threshold for mixed-precision computation: values above it are computed in higher precision (e.g., FP16) to maintain numerical stability
    llm_int8_has_fp16_weight=False,  # False: keep the main weights in int8 only, with no FP16 copy
    llm_int8_enable_fp32_cpu_offload=False  # False: do not offload parts of the model to the CPU in 32-bit precision (FP32)
)

print("8-bit quantization configuration defined!")

In [None]:
from transformers import BitsAndBytesConfig
import torch

# Redefine bnb_config for 8-bit quantization with a lower outlier threshold (this replaces the config from the previous cell)
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,  # Enable 8-bit loading
    llm_int8_threshold=3.0,  # Defines the outlier threshold for mixed-precision computation. Values above this threshold are computed in higher precision (e.g., FP16) to maintain numerical stability.
)

print("8-bit quantization configuration defined!")
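
To make the role of `llm_int8_threshold` more concrete, here is a toy sketch of the mixed-precision idea for a single dot product: activations whose magnitude exceeds the threshold are multiplied in floating point, and the rest go through int8. The function names and numbers are hypothetical, and the real implementation works on whole matrices with per-vector scales, so treat this only as an illustration of the principle.

```python
def absmax_quantize(values):
    """Quantize a list of floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale

def mixed_precision_dot(x, w, threshold):
    """Dot product where large-magnitude activations bypass int8."""
    outlier_idx = [i for i, v in enumerate(x) if abs(v) > threshold]
    regular_idx = [i for i, v in enumerate(x) if abs(v) <= threshold]

    # Outlier part: plain floating-point arithmetic
    result = sum(x[i] * w[i] for i in outlier_idx)

    # Regular part: quantize, multiply as integers, then rescale to floats
    if regular_idx:
        qx, sx = absmax_quantize([x[i] for i in regular_idx])
        qw, sw = absmax_quantize([w[i] for i in regular_idx])
        result += sum(a * b for a, b in zip(qx, qw)) * sx * sw
    return result, outlier_idx

x = [0.4, -1.1, 8.5, 0.7]   # hypothetical activations; 8.5 is the outlier
w = [0.2, 0.5, -0.3, 0.9]   # hypothetical weights
exact = sum(a * b for a, b in zip(x, w))
approx, outliers = mixed_precision_dot(x, w, threshold=3.0)

print("outlier positions:", outliers)
print("exact:", exact, "mixed precision:", approx)
```

Lowering the threshold sends more values down the floating-point path, which costs speed but leaves fewer large values to stretch the int8 range.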

### Execute this cell to load Meltemi from Hugging Face

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map=gpu, quantization_config=bnb_config)
print("Model loaded in 8-bit precision!")
print(f"Memory used by {model_path.split('/')[1]}: {round(model.get_memory_footprint()/1024/1024/1024, 2)} GB")

### Run this cell if you want to save Meltemi on your Drive (7.44 GB)

In [None]:
# import os

# save_path = "/content/drive/MyDrive"
# model.save_pretrained(os.path.join(save_path, "Meltemi/Meltemi-7B-Instruct-v1.5_8bit_BnB_v2"))
# tokenizer.save_pretrained(os.path.join(save_path,"Meltemi/Meltemi-7B-Instruct-v1.5_tokenizer_v2"))

# print(f"Model and Tokenizer saved to {save_path}")

In [None]:
# from transformers import AutoModelForCausalLM, AutoTokenizer

# # Load the model and tokenizer
# model = AutoModelForCausalLM.from_pretrained('/content/drive/MyDrive/Meltemi Workshop #1/Meltemi-7B-Instruct-v1.5_8bit_BnB', device_map=gpu)
# tokenizer = AutoTokenizer.from_pretrained('/content/drive/MyDrive/Meltemi Workshop #1/Meltemi-7B-Instruct-v1.5_tokenizer')

# print("Model (and Tokenizer) loaded from Google Drive in 8-bit precision !")
# print(f"Memory used by {model_path.split('/')[1]}: {round(model.get_memory_footprint()/1024/1024/1024, 2)} GB")

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Model (and Tokenizer) loaded from Google Drive in 8-bit precision !
Memory used by Meltemi-7B-Instruct-v1.5: 7.44 GB


### Example #3: Next Token Prediction, with Meltemi (GR)

In [None]:
prompt = "Η τεχνητή νοημοσύνη είναι"

input_ids = tokenizer(prompt, return_tensors='pt').to(gpu)['input_ids']

outputs = model.generate(
    input_ids,
    max_new_tokens=10,
)

output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Next-token prediction:\n{output_text}")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Next-token prediction:
Η τεχνητή νοημοσύνη είναι ένα από τα πιο σημαντικά θέματα της εποχής μας.
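
The warnings above ask for an `attention_mask`. As a rough illustration of what that mask contains, the sketch below pads a toy batch and marks real tokens with 1 and padding with 0. The token IDs and the padding ID are made up, and `pad_batch` is a hypothetical helper, not a tokenizer method.

```python
PAD_ID = 0  # hypothetical padding ID for this toy example

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token ID lists to equal length and build the matching mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[5, 8, 13], [7, 2]])
print(batch)
```

In this notebook the tokenizer already returns both `input_ids` and `attention_mask`, so passing its whole output with `model.generate(**inputs, ...)`, as the chat examples do, supplies the mask.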


### Example #4: Next Token Prediction, with Meltemi (ENG)

In [None]:
prompt = "Artificial Intelligence is"

input_ids = tokenizer(prompt, return_tensors='pt').to(gpu)['input_ids']

outputs = model.generate(
    input_ids,
    max_new_tokens=50,
)

output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Next-token prediction:\n{output_text}")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Next-token prediction:
Artificial Intelligence is a field of computer science that aims to create intelligent machines that can perform tasks that require human intelligence, such as learning, reasoning, and perception. AI systems can be used in a variety of applications, including healthcare, finance, and education.




### Example #5: Reasoning Problem - Chat Format, with Meltemi (GR)

In [None]:
# Chat capability (Reasoning Problem) - Greek
messages = [
    {"role": "system", "content": "Είσαι ένας έξυπνος και ευγενικός βοηθός."}, # System describes the high level instructions the model should follow when generating text (referred to as preamble)
    {"role": "user", "content": "Ξεκίνησα με 13 κόκκινα μήλα και 11 πράσινα μήλα. Έδωσα 2 κόκκινα μήλα στη Μαρία. Έδωσα 3 πράσινα μήλα στον Ίππο. Μετά, έφαγα τα μισά από τα πράσινα μήλα που μου έμειναν. Αργότερα, έδωσα επίσης 2 από τα πράσινα μήλα μου στον Κώστα. Πόσα πράσινα μήλα και πόσα κόκκινα μήλα έχω τώρα;"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(gpu)
outputs = model.generate(**inputs, max_new_tokens=512)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"\nChat capability:\n{generated_text}")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



Chat capability:

Είσαι ένας έξυπνος και ευγενικός βοηθός. 
<|user|>
Ξεκίνησα με 13 κόκκινα μήλα και 11 πράσινα μήλα. Έδωσα 2 κόκκινα μήλα στη Μαρία. Έδωσα 3 πράσινα μήλα στον Ίππο. Μετά, έφαγα τα μισά από τα πράσινα μήλα που μου έμειναν. Αργότερα, έδωσα επίσης 2 από τα πράσινα μήλα μου στον Κώστα. Πόσα πράσινα μήλα και πόσα κόκκινα μήλα έχω τώρα; 
<|assistant|>
Ας το αναλύσουμε βήμα προς βήμα:

1. Ξεκινήσατε με 13 κόκκινα μήλα και 11 πράσινα μήλα.
2. Δώσατε 2 κόκκινα μήλα στη Μαρία, οπότε τώρα έχετε 13 - 2 = 11 κόκκινα μήλα.
3. Δώσατε 3 πράσινα μήλα στον Ίππο, οπότε τώρα έχετε 11 - 3 = 8 πράσινα μήλα.
4. Φάγατε τα μισά από τα πράσινα μήλα που σας έμειναν, οπότε τώρα έχετε 8 / 2 = 4 πράσινα μήλα.
5. Δώσατε 2 πράσινα μήλα στον Κώστα, οπότε τώρα έχετε 4 - 2 = 2 πράσινα μήλα.

Έτσι, τώρα έχετε 2 πράσινα μήλα και 11 κόκκινα μήλα.


### Example #6: Reasoning Problem Chat Format, with Meltemi (ENG)

In [None]:
# Chat capability (Reasoning Problem) - English
messages = [
    {"role": "system", "content": "You are a smart and helpful assistant."}, # System describes the high level instructions the model should follow when generating text
    {"role": "user", "content": "I started with 13 red apples and 11 green apples. I gave 2 red apples to Maria. I gave 3 green apples to Ippo. Then, I ate half of the green apples I had left. Later, I also gave 2 of my green apples to Kostas. How many green apples and how many red apples do I have now?"}
  ]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(gpu)
outputs = model.generate(**inputs, max_new_tokens=1024)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"\nChat capability:\n{generated_text}")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



Chat capability:

You are a smart and helpful assistant. 
<|user|>
I started with 13 red apples and 11 green apples. I gave 2 red apples to Maria. I gave 3 green apples to Ippo. Then, I ate half of the green apples I had left. Later, I also gave 2 of my green apples to Kostas. How many green apples and how many red apples do I have now? 
<|assistant|>
Let's break down the problem step by step:

1. You started with 13 red apples and 11 green apples.
2. You gave 2 red apples to Maria, so you have 13 - 2 = 11 red apples left.
3. You gave 3 green apples to Ippo, so you have 11 - 3 = 8 green apples left.
4. You ate half of the green apples you had left, so you have 8 / 2 = 4 green apples left.
5. You gave 2 green apples to Kostas, so you have 4 - 2 = 2 green apples left.

So, you have 2 green apples and 11 red apples left.
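
The model's answer can be checked by replaying the story step by step:

```python
red, green = 13, 11

red -= 2             # gave 2 red apples to Maria
green -= 3           # gave 3 green apples to Ippo
green -= green // 2  # ate half of the remaining green apples
green -= 2           # gave 2 green apples to Kostas

print(f"green={green}, red={red}")  # prints "green=2, red=11"
```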


In [None]:
# messages.append({"role": "assistant", "content": generated_text})
# # messages.pop(-1)
# print(f"\nMessages: \n{messages[-1]['content']}")

### Example #7: Reasoning Problem - Pipeline, with Meltemi (GR)

In [None]:
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
    )

# Using pipeline for chat - Greek (`prompt` must hold the Greek prompt from Example #5; re-run that cell first if Example #6 ran last)
generated_text_pipeline_chat = pipe(prompt, max_length=1024)[0]['generated_text']
print(f"\nPipeline chat capability:\n{generated_text_pipeline_chat}")


Pipeline chat capability:
<|system|>
Είσαι ένας έξυπνος και ευγενικός βοηθός.</s>
<|user|>
Ξεκίνησα με 13 κόκκινα μήλα και 11 πράσινα μήλα. Έδωσα 2 κόκκινα μήλα στη Μαρία. Έδωσα 3 πράσινα μήλα στον Ίππο. Μετά, έφαγα τα μισά από τα πράσινα μήλα που μου έμειναν. Αργότερα, έδωσα επίσης 2 από τα πράσινα μήλα μου στον Κώστα. Πόσα πράσινα μήλα και πόσα κόκκινα μήλα έχω τώρα;</s>
<|assistant|>
Ας το αναλύσουμε βήμα προς βήμα:

1. Ξεκινήσατε με 13 κόκκινα μήλα και 11 πράσινα μήλα.
2. Δώσατε 2 κόκκινα μήλα στη Μαρία, οπότε τώρα έχετε 13 - 2 = 11 κόκκινα μήλα.
3. Δώσατε 3 πράσινα μήλα στον Ίππο, οπότε τώρα έχετε 11 - 3 = 8 πράσινα μήλα.
4. Φάγατε τα μισά από τα πράσινα μήλα που σας έμειναν, οπότε τώρα έχετε 8 / 2 = 4 πράσινα μήλα.
5. Δώσατε 2 πράσινα μήλα στον Κώστα, οπότε τώρα έχετε 4 - 2 = 2 πράσινα μήλα.

Έτσι, τώρα έχετε 2 πράσινα μήλα και 11 κόκκινα μήλα.


### Example #8: Roleplay - Chat Format, with Meltemi (GR)

In [None]:
messages = [
    {"role": "system", "content": "Είσαι ένας Social Media Manager για έναν οργανισμό πληροφορικής, ο οποίος διοργανώνει μία ημερίδα με θέμα το πρώτο γλωσσικό μοντέλο ανοιχτού-κώδικα εξειδικευμένο στα Ελληνικά, το Μελτέμι. Στην ημερίδα θα μιλήσουν σχετικά με το έργο τους οι δημιουργοί του, ερευνητές στο Ινστιτούτο Επεξεργασίας του Λόγου, στο ερευνητικό κέντρο Αθηνά, ένα από τα μεγαλύτερα στην Ελλάδα."}, # System describes the high level instructions the model should follow when generating text (referred to as preamble)
    {"role": "user", "content": "Ολοκλήρωσε την πρόταση και στην συνέχεια γράψε μία μικρή παράγραφο για να ανέβει ως περιγραφή σε μία ανάρτηση στο Instagram: `Οι εξαιρετικοί αυτοί ομιλητές θα μοιραστούν μαζί μας τις γνώσεις τους και`"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(gpu)
outputs = model.generate(**inputs, max_new_tokens=512)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"\nChat capability:\n{generated_text}")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



Chat capability:

Είσαι ένας Social Media Manager για έναν οργανισμό πληροφορικής, ο οποίος διοργανώνει μία ημερίδα με θέμα το πρώτο γλωσσικό μοντέλο ανοιχτού-κώδικα εξειδικευμένο στα Ελληνικά, το Μελτέμι. Στην ημερίδα θα μιλήσουν σχετικά με το έργο τους οι δημιουργοί του, ερευνητές στο Ινστιτούτο Επεξεργασίας του Λόγου, στο ερευνητικό κέντρο Αθηνά, ένα από τα μεγαλύτερα στην Ελλάδα. 
<|user|>
Ολοκλήρωσε την πρόταση και στην συνέχεια γράψε μία μικρή παράγραφο για να ανέβει ως περιγραφή σε μία ανάρτηση στο Instagram: `Οι εξαιρετικοί αυτοί ομιλητές θα μοιραστούν μαζί μας τις γνώσεις τους και` 
<|assistant|>
Οι εξαιρετικοί αυτοί ομιλητές θα μοιραστούν μαζί μας τις γνώσεις τους και τις εμπειρίες τους σχετικά με το Μελτέμι, το πρώτο γλωσσικό μοντέλο ανοιχτού-κώδικα εξειδικευμένο στα Ελληνικά. Θα συζητήσουμε για τις δυνατότητες και τις εφαρμογές του, καθώς και για τις προκλήσεις και τις ευκαιρίες που παρουσιάζει. Επιπλέον, θα έχουμε την ευκαιρία να μάθουμε για τις τελευταίες εξελίξεις στην 

### Example #9: Roleplay - Pipeline, with Meltemi (GR)

In [None]:
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
    )

# Using pipeline for chat - Greek
generated_text_pipeline_chat = pipe(prompt, max_length=1024)[0]['generated_text']
print(f"\nPipeline chat capability:\n{generated_text_pipeline_chat}")


Pipeline chat capability:
<|system|>
Είσαι ένας Social Media Manager για έναν οργανισμό πληροφορικής, ο οποίος διοργανώνει μία ημερίδα με θέμα το πρώτο γλωσσικό μοντέλο ανοιχτού-κώδικα εξειδικευμένο στα Ελληνικά, το Μελτέμι. Στην ημερίδα θα μιλήσουν σχετικά με το έργο τους οι δημιουργοί του, ερευνητές στο Ινστιτούτο Επεξεργασίας του Λόγου, στο ερευνητικό κέντρο Αθηνά, ένα από τα μεγαλύτερα στην Ελλάδα.</s>
<|user|>
Ολοκλήρωσε την πρόταση και στην συνέχεια γράψε μία μικρή παράγραφο για να ανέβει ως περιγραφή σε μία ανάρτηση στο Instagram: `Οι εξαιρετικοί αυτοί ομιλητές θα μοιραστούν μαζί μας τις γνώσεις τους και`</s>
<|assistant|>
Οι εξαιρετικοί αυτοί ομιλητές θα μοιραστούν μαζί μας τις γνώσεις τους και τις εμπειρίες τους σχετικά με το Μελτέμι, το πρώτο γλωσσικό μοντέλο ανοιχτού-κώδικα εξειδικευμένο στα Ελληνικά. Θα συζητήσουμε για τις δυνατότητες και τις εφαρμογές του, καθώς και για τις προκλήσεις και τις ευκαιρίες που παρουσιάζει. Επιπλέον, θα έχουμε την ευκαιρία να μάθουμε για τις τ

### Examples #10:
- Translation
- General Knowledge
- Specialized Topic Knowledge
- Creative Writing
- Fact Generation

with Meltemi (GR & ENG)

In [None]:
# Test cases for chat capabilities in Greek and English

# English Tests
english_test_cases = [
    {"role": "user", "content": "Translate the phrase 'Hello, how are you?' to Greek."},
    {"role": "user", "content": "What is the capital of Australia?"},
    {"role": "user", "content": "Explain quantum physics in simple terms."},
    {"role": "user", "content": "Write a short story about a talking dog."},
    {"role": "user", "content": "Give me five interesting facts about the ocean."},
]

# Greek Tests
greek_test_cases = [
    {"role": "user", "content": "Μετάφρασε την φράση 'Καλημέρα, τι κάνεις;' στα αγγλικά."},
    {"role": "user", "content": "Ποια είναι η πρωτεύουσα της Ελλάδας;"},
    {"role": "user", "content": "Εξήγησε την κβαντική φυσική με απλά λόγια."},
    {"role": "user", "content": "Γράψε μια μικρή ιστορία για ένα σκυλί που μιλάει."},
    {"role": "user", "content": "Δώσε μου πέντε ενδιαφέροντα γεγονότα για τον ωκεανό."}
]

In [None]:
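def run_test_cases(test_cases, language):
    """Send each test case to the model as a single-turn chat and print the decoded output.

    `language` is only a label for the caller; it is not used inside the function.
    """
    for test_case in test_cases:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},  # Adjust system prompt if needed
            test_case,
        ]
        # Format the conversation, tokenize it, generate, and decode (explained step by step below)
        prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer(prompt, return_tensors="pt").to(gpu)
        outputs = model.generate(**inputs, max_new_tokens=512)
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"{generated_text}\n")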

1. `prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`
**What this does:**
  - `apply_chat_template` is a tokenizer method that formats the `messages` list into a single prompt string, using the chat template stored with the tokenizer.

**Arguments:**
  - `messages`: A list of dictionaries, each with a `role` (`system`, `user`, or `assistant`) and a `content` string.
  - `tokenize=False`: Tells the function not to tokenize the prompt yet, so the result is plain text rather than token IDs.
  - `add_generation_prompt=True`: Appends the marker that opens the assistant's turn (`<|assistant|>` for Meltemi, as seen in the outputs above), signaling the model that the next text should be generated as the assistant's response.
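
As a rough picture of what the template produces, the sketch below rebuilds the prompt layout visible in the pipeline outputs earlier in this notebook (`<|role|>` markers, with `</s>` closing each turn). `format_chat` is a hypothetical stand-in written for illustration; the real template ships with the tokenizer and may differ in details.

```python
def format_chat(messages, add_generation_prompt=True):
    """Hypothetical stand-in for a chat template; not the tokenizer's real code."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    prompt = "".join(parts)
    if add_generation_prompt:
        prompt += "<|assistant|>\n"  # open the assistant turn for the model to fill in
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Australia?"},
]
prompt = format_chat(messages)
print(prompt)
```

Without the generation prompt, the text would end after the user's turn and the model would have no cue that an assistant reply is expected next.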

2. `inputs = tokenizer(prompt, return_tensors="pt").to(gpu)`
**What this does:**
  - `tokenizer(prompt)`: Converts the formatted prompt from plain text into tokenized input that the model can process. Tokenization breaks the text into smaller units (e.g., words or subwords) and maps them to numeric IDs.

**Arguments:**
  - `return_tensors="pt"`: Specifies that the tokenized data should be returned as PyTorch tensors, which are the data structures the model needs to process the input.
  - `.to(gpu)`: Moves the tokenized tensors to the GPU, so they are on the same device as the model. The `gpu` variable was defined earlier as the string `"cuda"`. *If you're running on a CPU, omit this call or replace it with `.to("cpu")`*.

3. `outputs = model.generate(**inputs, max_new_tokens=512)`
**What this does:**
Calls the generate method of the model to produce text based on the tokenized inputs.

  - `model.generate`: This is the main method used to generate text from a pre-trained language model. It takes the tokenized input (prepared earlier) and repeatedly predicts the next token.

  - `**inputs`: This is the unpacked dictionary of tokenized inputs (e.g., `input_ids`, `attention_mask`) that was created earlier using the tokenizer. It ensures the model has the input data needed for inference.

  - `max_new_tokens=512`:
  Limits the number of new tokens the model can generate to a maximum of 512.
  The prompt tokens do not count toward this limit, so the full sequence (input tokens + new tokens) can be longer, as long as it fits in the model's context window.
  *Useful for controlling response length and avoiding overly verbose or excessively long outputs.*
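
The effect of `max_new_tokens` is easiest to see in a toy generation loop. The "model" below is a hypothetical lookup table that maps the last token to its successor; a real language model conditions on the whole context and outputs a probability distribution, so this is only a sketch of the loop structure.

```python
EOS = "</s>"

# Hypothetical "model": maps the last token to its most likely successor
NEXT_TOKEN = {
    "AI": "is",
    "is": "a",
    "a": "field",
    "field": "of",
    "of": "science",
    "science": EOS,
}

def generate(tokens, max_new_tokens):
    """Greedy decoding: append one token at a time until EOS or the limit is hit."""
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], EOS)
        if next_token == EOS:
            break  # the model signalled that the text is complete
        tokens.append(next_token)
    return tokens

short = generate(["AI"], max_new_tokens=3)   # stops because the limit is reached
full = generate(["AI"], max_new_tokens=50)   # stops because EOS is produced
print(short)
print(full)
```

This also explains why some outputs in this notebook end mid-sentence: generation stopped at the token limit before the model produced an end-of-sequence token.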

4. `generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)`
**What this does:**
Converts the generated token IDs from the model into a readable text string.
  - `tokenizer.decode()`:
  Converts the token IDs back into plain text by mapping each ID to its corresponding word or subword, so the result is human-readable.

  - `skip_special_tokens=True`:
  Removes special tokens (e.g., `<pad>`, `<eos>`, `<bos>`, or `</s>`) from the decoded text, so the output contains only the natural language text, without the tokens the model uses for internal processing.
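
Here is a minimal sketch of decoding with and without `skip_special_tokens`, using a made-up five-entry vocabulary. Real tokenizers hold tens of thousands of subword entries and handle spacing more carefully; the names below are hypothetical.

```python
# Hypothetical toy vocabulary
VOCAB = {"<s>": 0, "</s>": 1, "Hello": 2, "world": 3, "!": 4}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}
SPECIAL_TOKENS = {"<s>", "</s>"}

def decode(token_ids, skip_special_tokens=False):
    """Map IDs back to tokens, optionally dropping special tokens."""
    tokens = [ID_TO_TOKEN[i] for i in token_ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    return " ".join(tokens)

ids = [0, 2, 3, 4, 1]
raw = decode(ids)
clean = decode(ids, skip_special_tokens=True)
print(raw)    # prints "<s> Hello world ! </s>"
print(clean)  # prints "Hello world !"
```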


In [None]:
print("Running Greek tests...")
run_test_cases(greek_test_cases, "Greek")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Running Greek tests...


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Μετάφρασε την φράση 'Καλημέρα, τι κάνεις;' στα αγγλικά. 
<|assistant|>
Good morning, how are you?



Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Ποια είναι η πρωτεύουσα της Ελλάδας; 
<|assistant|>
Η πρωτεύουσα της Ελλάδας είναι η Αθήνα.



Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Εξήγησε την κβαντική φυσική με απλά λόγια. 
<|assistant|>
Η κβαντική φυσική είναι ένας κλάδος της φυσικής που ασχολείται με τη συμπεριφορά της ύλης και της ενέργειας σε ατομικό και υποατομικό επίπεδο, όπου τα σωματίδια εμφανίζουν κυματοσωματιδιακό δυϊσμό και υπακούν σε πιθανοτικούς νόμους αντί για ντετερμινιστικούς.

Σε αυτή την κλίμακα, τα σωματίδια δεν συμπεριφέρονται όπως τα κλασικά αντικείμενα, αλλά εμφανίζουν κυματοσωματιδιακό δυϊσμό, πράγμα που σημαίνει ότι έχουν τόσο κυματικές όσο και σωματιδιακές ιδιότητες. Για παράδειγμα, τα ηλεκτρόνια μπορούν να συμπεριφέρονται σαν κύματα, με το κυματάριθμό τους να καθορίζει την ενέργειά τους.

Η κβαντική φυσική εισάγει επίσης την έννοια της υπέρθεσης, όπου ένα κβαντικό σύστημα μπορεί να υπάρχει σε πολλαπλές καταστάσεις ταυτόχρονα μέχρι να μετρηθεί. Για παράδειγμα, ένα ηλεκτρόνιο μπορεί να υπάρχει σε μια υπέρθεση δύο καταστάσεων μέχρι να παρατηρηθεί, οπότε θα καταρρεύσει σε μία από τις δύο καταστάσεις.


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Γράψε μια μικρή ιστορία για ένα σκυλί που μιλάει. 
<|assistant|>
Μια φορά κι έναν καιρό, σε μια μικρή πόλη που ονομαζόταν Φλερτβιλ, ζούσε ένα σκυλί που ονομαζόταν Μαξ. Ο Μαξ ήταν ένα συνηθισμένο σκυλί, αλλά είχε μια ασυνήθιστη ικανότητα - μπορούσε να μιλάει. Στην αρχή, ο Μαξ δυσκολευόταν να χρησιμοποιήσει τις νέες του δεξιότητες, αλλά με τον καιρό, έγινε πιο σίγουρος για τον εαυτό του.

Μια μέρα, ο Μαξ αποφάσισε να χρησιμοποιήσει τις ικανότητές του για να βοηθήσει τους φίλους του. Έμαθε ότι ένα από τα άλλα σκυλιά της πόλης, ο Ρεξ, είχε χαθεί. Ο Μαξ ήξερε ότι ο Ρεξ ήταν φοβισμένος και μπερδεμένος, οπότε αποφάσισε να τον βρει και να τον φέρει πίσω στην ασφάλεια του σπιτιού του.

Ο Μαξ ξεκίνησε την αναζήτησή του, ρωτώντας τους ανθρώπους στην πόλη αν είχαν δει τον Ρεξ. Τελικά, βρήκε τον Ρεξ κρυμμένο σε ένα κοντινό πάρκο. Ο Μαξ μίλησε στον Ρεξ και τον διαβεβαίωσε ότι όλα θα πήγαιναν καλά. Μαζί, επέστρεψαν στο σπίτι του Ρεξ, όπου τον υποδέχτηκαν με ανο

In [None]:
print("Running English tests...")
run_test_cases(english_test_cases, "English")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Running English tests...


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Translate the phrase 'Hello, how are you?' to Greek. 
<|assistant|>
Γεια σας, πώς είστε;



Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
What is the capital of Australia? 
<|assistant|>
The capital of Australia is Canberra.



Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Explain quantum physics in simple terms. 
<|assistant|>
Quantum physics is a branch of physics that deals with the behavior of matter and energy at the atomic and subatomic level. It is a fundamental theory that forms the basis of modern physics and is used to explain phenomena such as the behavior of electrons, atoms, and molecules.

Quantum physics is based on the concept of quantum mechanics, which states that the behavior of particles at the atomic and subatomic level is governed by probabilities rather than deterministic laws. This means that the behavior of particles at these levels is inherently uncertain and can only be described in terms of probabilities.

One of the key concepts in quantum physics is the wave-particle duality, which states that particles can exhibit both wave-like and particle-like behavior. This is a fundamental principle of quantum physics and is used to explain phenomena such as the behavior of electrons in atoms and

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Write a short story about a talking dog. 
<|assistant|>
Once upon a time, in a small town called Whiskerville, there lived a dog named Max. Max was not an ordinary dog. He was a talking dog. He could communicate with humans just like they did.

Max was a very curious dog. He loved to explore the town and learn about its inhabitants. He often visited the local market, where he would chat with the vendors and customers. He had a knack for solving problems, and he was always willing to help anyone in need.

One day, Max overheard a group of children playing in the park. They were arguing about who should be the leader of their game. Max decided to step in and offer his help. He told the children that he could help them make a decision.

The children were skeptical at first, but Max assured them that he could be a fair and impartial leader. He asked them to each make a case for why they should be the leader, and he listened carefully to each of their

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



You are a helpful assistant. 
<|user|>
Give me five interesting facts about the ocean. 
<|assistant|>
1. The ocean covers approximately 71% of the Earth's surface, making it the largest body of water on our planet.
2. The deepest point in the ocean is the Mariana Trench, located in the western Pacific Ocean. It reaches a depth of approximately 36,000 feet (11,000 meters).
3. The ocean is home to an estimated 2 million species, with only a fraction of these having been discovered and named.
4. The ocean plays a crucial role in regulating the Earth's climate, as it absorbs about 30% of the carbon dioxide released into the atmosphere.
5. The ocean is a vital source of food, providing approximately 15% of the world's protein intake through fish and other seafood.


You are a helpful assistant. 
<|user|>
Explain what is RAG (Retrieval Augmented Generation) and why would someone use it. 
<|assistant|>
RAG, or Retrieval Augmented Generation, is a language model that combines the capabilities