<a href="https://colab.research.google.com/github/dgromann/cl_intro_ws2024/blob/main/tutorials/Tutorial6_model_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 6: Introduction to Computational Linguistics

This is the sixth tutorial with practical exercises for the lecture Introduction to Computational Linguistics in the winter semester 2024. Hands-on exercises are marked with 👋 ⚒ and questions are marked with ❓. Remember to first **store this notebook** in your Drive or GitHub.




----
## **Lesson 1: Using Large Language Models in Colab**

There are many ways to use Large Language Models (LLMs) in Colab and other environments. We will use the environment [Ollama](https://ollama.com/) today. Another option that provides access to a range of LLMs is [Unsloth](https://docs.unsloth.ai/).

We will first install [colab-xterm](https://github.com/InfuseAI/colab-xterm), which lets us run a terminal within a code cell on Google Colab.


In [None]:
!pip install colab-xterm
%load_ext colabxterm

Collecting colab-xterm
  Downloading colab_xterm-0.2.0-py3-none-any.whl.metadata (1.2 kB)
Downloading colab_xterm-0.2.0-py3-none-any.whl (115 kB)
Installing collected packages: colab-xterm
Successfully installed colab-xterm-0.2.0


Run the cell below (`%xterm`) to open a terminal, then type and run these two lines in it, one after the other, to install and start Ollama:
* `curl -fsSL https://ollama.com/install.sh | sh`
* `ollama serve & ollama pull llama3.2`

In [None]:
%xterm

We will then use [LangChain](https://python.langchain.com/docs/introduction/), a library to support the use of LLMs.

In [None]:
!pip -qq install langchain
!pip -qq install langchain-core
!pip -qq install langchain-community

In [None]:
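# The Ollama wrapper lives in the langchain-community package installed above
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.2")
# invoke() sends the prompt to the local Ollama server and returns the reply as a string
llm.invoke("what is the Meaning of life")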

"The meaning of life is a question that has puzzled philosophers, theologians, scientists, and everyday people for centuries. There is no one definitive answer, as it often depends on cultural, personal, and philosophical perspectives. Here are some possible interpretations:\n\n1. **Hedonic Pursuit**: The pursuit of happiness and pleasure. According to Epicureanism, the meaning of life is to seek happiness and avoid pain.\n2. **Self-Actualization**: The realization of one's full potential and becoming the best version of oneself. This concept was introduced by Abraham Maslow in his theory of human motivation.\n3. **Spiritual Growth**: Finding purpose and fulfillment through spiritual practices, such as meditation, prayer, or devotion to a higher power.\n4. **Social Connection**: Building and maintaining relationships with others, which provides a sense of belonging, love, and support.\n5. **Personal Legacy**: Leaving a lasting impact on the world, creating something that will outlast o

In [None]:
llm.invoke("The typical color of the sky is: ")

'Blue.'

In [None]:
llm.invoke("which model version are you?")

'I\'m a large language model, so I don\'t have a specific "model version" in the classical sense. My architecture and training data are based on a combination of models and techniques that are constantly evolving.\n\nThat being said, my knowledge was last updated in December 2023, and I use a model architecture based on transformer technology, which is a popular approach for natural language processing tasks.\n\nIf you\'re looking for more specific information about my capabilities or limitations, I can try to provide more details. Or, if you have any questions or topics you\'d like to discuss, I\'m here to help!'

In [None]:
llm.invoke("Describe quantum physics in one short sentence of no more than 12 words")

"Quantum physics describes the behavior of tiny particles' random, wave-like interactions."

In [None]:
llm.invoke("Explain the latest advances in large language models to me.")

'The field of large language models (LLMs) has experienced significant advancements in recent years, leading to improved performance, efficiency, and versatility. Here are some of the latest developments:\n\n1. **Multitask Learning**: Many modern LLMs are designed to learn multiple tasks simultaneously, such as text classification, sentiment analysis, and question answering. This approach enables models to adapt to diverse applications and improve overall performance.\n2. **Efficient Attention Mechanisms**: Researchers have developed more efficient attention mechanisms, such as token-level attention, position-aware attention, and spatial attention. These advancements enable faster inference times while maintaining or improving model accuracy.\n3. **Transformer-XL and BERT-MLM**: The Transformer-XL architecture introduced a novel scaling strategy for Transformers, allowing models to be trained on larger datasets without significant performance degradation. BERT-MLM (Bidirectional Encode

In [None]:
llm.invoke("Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.")

'Large language models (LLMs) have made significant progress in recent years, with new advancements in various areas such as performance, efficiency, and applications.\n\n1. **Multi-Task Learning**: Recent studies have shown that LLMs can be trained for multiple tasks simultaneously, which improves their overall performance and adaptability. For example, the BERT (Bidirectional Encoder Representations from Transformers) model was initially developed for natural language inference (NLI), but its multi-task learning capabilities led to significant improvements in other NLP tasks such as question answering and sentiment analysis [1].\n2. **Efficient Transformer Architectures**: Researchers have made efforts to optimize transformer-based architectures, leading to faster and more efficient models. For instance, the T5 model, which uses a variant of the transformer architecture, achieved state-of-the-art results on several NLP benchmarks with 10-20 times fewer parameters than previous models

**Zero-shot Prompting**

In [None]:
llm.invoke("Text: This was the best movie I've ever seen! \n The sentiment of the text is: ")
# Returns positive sentiment

'The sentiment of the text is overwhelmingly positive.'

In [None]:
llm.invoke("Text: The director was trying too hard. \n The sentiment of the text is: ")
# Returns negative sentiment
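
The free-form replies above still have to be mapped to a label before they can be counted or compared. The following is a minimal sketch of one way to do that; the helper names `zero_shot_prompt` and `to_label` are hypothetical and not part of the tutorial, and the keyword matching is deliberately crude (a reply such as "not positive" would be mislabelled).

```python
def zero_shot_prompt(text):
    """Build the same zero-shot sentiment prompt used in the cells above."""
    return f"Text: {text} \n The sentiment of the text is: "


def to_label(reply):
    """Map a free-form model reply to a coarse sentiment label (keyword match only)."""
    reply = reply.lower()
    for label in ("positive", "negative", "neutral"):
        if label in reply:
            return label
    return "unknown"


# Reply copied from the first zero-shot cell above
print(to_label("The sentiment of the text is overwhelmingly positive."))
# With a running Ollama server: to_label(llm.invoke(zero_shot_prompt("The director was trying too hard.")))
```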

**Few-shot Prompting**

Now we will try few-shot prompting on sentiment analysis.

👋 ⚒ Provide your own examples for few-shot prompting on this task. Prompt the model to provide a percentage of positive/neutral/negative for each example. Store the returned result in a variable `out` and print the results.

In [None]:
out = llm.invoke('''Please give me a percentage per sentiment (positive, neutral, negative) like this:
  The product is ok, but did not meet my expectations. Positive: 30%, Neutral: 50%, Negative: 20%
  The presentation was engaging and informative, but they spoke for over an hour. Positive: 80%, Neutral: 10%, Negative: 10%
  This exam was extremely hard, but I managed. Positive: 40%, Neutral: 30%, Negative: 30%

  It is late, but I will still make it to class on time. ?
  This book was an emotional rollercoaster.
''')

for i in out.split("\n\n"):
  print(i)
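
Writing the few-shot prompt by hand gets tedious once you want to swap examples in and out. As a rough sketch, the prompt can be assembled from a list of labelled examples and the percentages parsed back out of the reply. The names `EXAMPLES`, `build_few_shot_prompt` and `parse_percentages` are hypothetical, and the parser assumes the model follows the `Positive: x%, Neutral: y%, Negative: z%` format, which it may not always do.

```python
import re

# Hypothetical labelled examples; replace them with your own.
EXAMPLES = [
    ("The product is ok, but did not meet my expectations.", (30, 50, 20)),
    ("The presentation was engaging and informative, but they spoke for over an hour.", (80, 10, 10)),
]


def build_few_shot_prompt(examples, queries):
    """Format labelled examples and unlabelled queries into one prompt string."""
    lines = ["Please give me a percentage per sentiment (positive, neutral, negative) like this:"]
    for text, (pos, neu, neg) in examples:
        lines.append(f"{text} Positive: {pos}%, Neutral: {neu}%, Negative: {neg}%")
    lines.append("")
    lines.extend(queries)
    return "\n".join(lines)


PATTERN = re.compile(
    r"Positive:\s*(\d+)%.*?Neutral:\s*(\d+)%.*?Negative:\s*(\d+)%",
    re.IGNORECASE | re.DOTALL,
)


def parse_percentages(reply):
    """Return a list of (positive, neutral, negative) tuples found in a model reply."""
    return [tuple(int(x) for x in match) for match in PATTERN.findall(reply)]


prompt = build_few_shot_prompt(EXAMPLES, ["This book was an emotional rollercoaster."])
print(prompt)
# With a running Ollama server: print(parse_percentages(llm.invoke(prompt)))
```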

**Role Prompting**

Role or persona prompting assigns a role or persona to the model in the prompt, on the assumption that this makes the results more accurate. We can compare the first prompt, which has no role, with the second prompt, which assigns one.

The expectation is that the role prompt yields a more technical discussion of the benefits and drawbacks.

In [None]:
out = llm.invoke("Explain the pros and cons of using PyTorch.")

for i in out.split("\n\n"):
  print(i)

In [None]:
out = llm.invoke("Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.")

for i in out.split("\n\n"):
  print(i)

**Chain-of-Thought**

In [None]:
out = llm.invoke("Who lived longer, Mozart or Elvis?")
for i in out.split("\n\n"):
  print(i)

In [None]:
out = llm.invoke("Who lived longer, Mozart or Elvis? Let's think through this carefully, step by step.")
answer = out.split("\n\n")
for i in answer:
  print(i)
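
To judge whether the step-by-step reply is actually correct, it helps to work out the reference answer independently. The sketch below does the arithmetic with the commonly cited birth and death dates, which are not part of this notebook and are entered here as assumptions; `age_at_death` is a hypothetical helper.

```python
from datetime import date


def age_at_death(born, died):
    """Completed years between birth and death."""
    years = died.year - born.year
    # Subtract one year if the death date falls before the birthday in that year
    if (died.month, died.day) < (born.month, born.day):
        years -= 1
    return years


# Commonly cited dates (assumption, not taken from the tutorial)
mozart = age_at_death(date(1756, 1, 27), date(1791, 12, 5))
elvis = age_at_death(date(1935, 1, 8), date(1977, 8, 16))
print(f"Mozart: {mozart}, Elvis: {elvis}")  # → Mozart: 35, Elvis: 42
```

A correct chain of thought should reach the same two ages before concluding that Elvis lived longer.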

**Self-Consistency**

In [None]:
out = llm.invoke("John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the numbers is? Report the answer surrounded by backticks (example: `123`)")
for i in out.split("\n\n"):
  print(i)

**Evaluation on a Dataset**

In this part we will evaluate the performance of Llama on [MultiArith](https://huggingface.co/datasets/ChilleD/MultiArith), a question-answering dataset that focuses on arithmetic tasks.

In [None]:
!pip install -q datasets==2.16.1

In [None]:
from datasets import load_dataset
import pandas as pd

dataset = "ChilleD/MultiArith"


train_dataset = pd.DataFrame(load_dataset(dataset, split="train"))[["question", "final_ans"]]
test_dataset = pd.DataFrame(load_dataset(dataset, split="test"))[["question", "final_ans"]]

test_dataset.columns = ["Question", "Gold Answer"]

👋 ⚒ **Arithmetic Question Answering**

1. Write a prompt for answering the questions in the dataset.
2. Post-process the answers if needed.
3. Compute the accuracy of the answers given by the LLM.
4. Find the examples where the LLM answers differently or incorrectly.

In [None]:
# Your code here
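
One possible shape for steps 2 and 3 is sketched below, without calling the model. It assumes that the final number in a reply is the model's answer and that the gold answers can be converted to numbers. `extract_number` and `accuracy` are hypothetical helpers, and the replies are made-up examples rather than real model output.

```python
import re


def extract_number(reply):
    """Return the last number in a reply as a float, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reply.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def accuracy(predictions, gold_answers):
    """Fraction of predictions that match the gold answers numerically."""
    correct = sum(
        1
        for pred, gold in zip(predictions, gold_answers)
        if pred is not None and abs(pred - float(gold)) < 1e-6
    )
    return correct / len(gold_answers)


# Made-up replies and gold answers, for illustration only
replies = ["She has 5 + 3 = 8 apples.", "The answer is 12.", "I am not sure."]
gold = ["8", "12", "7"]
preds = [extract_number(r) for r in replies]
print(accuracy(preds, gold))
```

On the real data, the predictions would come from something like `[extract_number(llm.invoke(q)) for q in test_dataset["Question"]]`, compared against `test_dataset["Gold Answer"]`. Rows where the two disagree are the ones to inspect for step 4.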