## Chat-style GPT model experiments

### Inference
- Load the model, set up correct inference parameters.
- Write basic functions for creating system, user and assistant prompts.
- Write a class that facilitates communication with a model. Add a function to call when it is the user's turn; this function should read the user's input and load it as a user prompt.
- Make sure the communication history is always passed to the model, up to 1600 tokens. If it is longer, truncate or summarize the older part.
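The truncation rule in the last step can be sketched without any model or tokenizer. This is a minimal sketch under a few assumptions: each past turn is already tokenized into a list of token IDs, the system prompt is always kept, and the oldest turns are dropped first (summarization is not shown). The name `select_history` is hypothetical.

```python
from typing import List


def select_history(system_ids: List[int], turns: List[List[int]], budget: int = 1600) -> List[int]:
    """Return the system prompt followed by as many of the newest turns as fit into `budget` tokens.

    `turns` is a list of token-ID lists, oldest first.
    """
    used = len(system_ids)
    kept: List[List[int]] = []
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for turn in reversed(turns):
        if used + len(turn) > budget:
            break
        kept.append(turn)
        used += len(turn)
    kept.reverse()  # restore chronological order
    return system_ids + [tok for turn in kept for tok in turn]


# Toy example: a 3-token system prompt and three turns of 4, 5 and 6 tokens.
system = [1, 2, 3]
turns = [[10] * 4, [20] * 5, [30] * 6]
print(select_history(system, turns, budget=15))
```

With a budget of 15 tokens, the two newest turns fit (3 + 6 + 5 = 14), while adding the oldest one would exceed the budget, so it is dropped.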

### GPT Ping-pong
- Create two talker instances. Make sure they share the same model variable for inference, since the model cannot be loaded into GPU memory twice.
- Initialize a conversation with a few messages.
- Let the two assistants talk to each other and observe the results.

### WikiBot
- To provide more precise answers, incorporate web search information into your talkers.
- The talker should first create a search keyword, if applicable.
- Then it should search for related information using the `wikipedia` package, which extracts data from related Wikipedia pages.
- Using the first few results, the model should formulate the final answer.

Install all required packages.

In [1]:
!pip install transformers
!pip install einops
!pip install auto-gptq
!pip install wikipedia

Collecting transformers
  Downloading transformers-4.34.1-py3-none-any.whl (7.7 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
...

Import packages

In [2]:
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import textwrap

Use the Llama 2 7B chat model, the latest Llama release at the time of writing. This is a GPTQ-quantized version reduced to 4-bit precision, which enables us to run it on Colab, or even on small desktop or handheld devices.
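A rough estimate of the weight memory shows why 4-bit quantization matters here. This sketch assumes about 6.7 billion parameters for the 7B model and counts the weights only (no activations, KV cache, or the per-group scales and zero points that GPTQ stores), so real memory usage is somewhat higher.

```python
# Rough weight-memory estimate for a 7B-class model at different precisions.
# Assumption: ~6.7e9 parameters; activations, the KV cache and GPTQ's
# per-group scales/zero points are ignored, so real usage is higher.
N_PARAMS = 6.7e9


def weight_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: {weight_gb(N_PARAMS, bits):5.1f} GB")
```

Under these assumptions the fp16 weights alone need about 13.4 GB, while the 4-bit weights need roughly a quarter of that.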

We load the tokenizer and the model itself.

In [3]:
model_name_or_path = "TheBloke/Llama-2-7b-Chat-GPTQ"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="main",
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

Try text generation with the model.

According to the official repository, this model uses the following template to represent conversations (the `<s>` and `</s>` tokens that open and close each turn are left out of this simplified view):

```
[INST] <<SYS>>
System prompt
<</SYS>>

User1 [/INST] AI1 [INST] User2 [/INST] AI2.....
```
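To make the layout concrete, here is a small sketch that renders a multi-turn conversation into this template, including the `<s>`/`</s>` markers that the official Llama 2 chat format places around each turn. The helper name `build_llama2_prompt` is hypothetical, and `<s>`/`</s>` are written as plain strings only for readability: the tokenizer already prepends the BOS token itself, so in practice one would work with the special token IDs rather than these literal strings.

```python
from typing import List, Optional, Tuple

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"  # shown as text for readability only


def build_llama2_prompt(system: str, turns: List[Tuple[str, Optional[str]]]) -> str:
    """Render (user, assistant) pairs in the Llama 2 chat layout.

    The assistant part of the last pair may be None, which leaves the prompt
    open for the model to generate the next answer.
    """
    prompt = ""
    for i, (user, answer) in enumerate(turns):
        # The system block is folded into the first user message only.
        if i == 0:
            user = B_SYS + system + E_SYS + user
        prompt += f"{BOS}{B_INST} {user} {E_INST}"
        if answer is not None:
            prompt += f" {answer} {EOS}"
    return prompt


print(build_llama2_prompt("You are a helpful assistant.", [("Hi!", "Hello!"), ("How are you?", None)]))
```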

Try generation with a short text snippet and observe the resulting token ID sequences.

In [4]:
text = """[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

There's a llama in my garden 😱 What should I do? [/INST]"""
tokens = tokenizer(text, return_tensors="pt")
print(tokens)
generated = model.generate(input_ids = tokens["input_ids"].cuda(), max_length=800)
generated

{'input_ids': tensor([[    1,   518, 25580, 29962,  3532, 14816, 29903,  6778,    13,  3492,
           526,   263,  8444, 20255, 29889,    13, 29966,   829, 14816, 29903,
          6778,    13,    13,  8439, 29915, 29879,   263, 11148,  3304,   297,
           590, 16423, 29871,   243,   162,   155,   180,  1724,   881,   306,
           437, 29973,   518, 29914, 25580, 29962]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


tensor([[    1,   518, 25580, 29962,  3532, 14816, 29903,  6778,    13,  3492,
           526,   263,  8444, 20255, 29889,    13, 29966,   829, 14816, 29903,
          6778,    13,    13,  8439, 29915, 29879,   263, 11148,  3304,   297,
           590, 16423, 29871,   243,   162,   155,   180,  1724,   881,   306,
           437, 29973,   518, 29914, 25580, 29962, 29871,  6439,   694, 29892,
           263, 11148,  3304,   297,   596, 16423, 29973, 29871,   243,   162,
           155,   133,  2193, 29915, 29879,  3755, 15668, 29991, 29871,   243,
           162,   155,   136,  3872, 29915, 29873, 15982, 29892,   306, 29915,
         29885,  1244,   304,  1371,   366,  4377,   714,   825,   304,   437,
         29889, 29871,   243,   162,   167,   151,    13,  6730,  2712,   937,
         29892,  1207,  1854,   366,   322,   596,  3942,   526,  9109, 29889,
           960,   278, 11148,  3304,   338,   451,   946,  3663,   573, 29892,
           366,   508,  1018,   304, 14111,   372,  

Decode the resulting token ID sequence by calling the tokenizer's `decode` function on the first element of the generation output. Try decoding both with and without `skip_special_tokens`!

In [5]:
outtext = tokenizer.decode(generated[0], skip_special_tokens=True)
print(textwrap.fill(outtext, 80))

[INST] <<SYS>> You are a helpful assistant. <</SYS>>  There's a llama in my
garden 😱 What should I do? [/INST]  Oh no, a llama in your garden? 😂 That's
quite unexpected! 😅 Don't worry, I'm here to help you figure out what to do. 🤔
First things first, make sure you and your family are safe. If the llama is not
aggressive, you can try to observe it from a distance and see if it causes any
damage to your garden. If it does, you can try to gently guide it out of your
garden. 🌱 If the llama is acting aggressively or if you feel threatened, please
call the local animal control or a wildlife removal service for assistance. They
will be able to safely handle the situation and remove the llama from your
property. 🐵 In any case, it's important to keep a safe distance and not approach
the llama, as it may feel threatened or scared. 🚨 Do you have any other
questions or concerns? 🤔


Extract the AI assistant's response by splitting the decoded conversation on the closing `[/INST]` separator and keeping the last part.

In [6]:
outtext.split("[/INST]")[-1].strip()

"Oh no, a llama in your garden? 😂 That's quite unexpected! 😅 Don't worry, I'm here to help you figure out what to do. 🤔\nFirst things first, make sure you and your family are safe. If the llama is not aggressive, you can try to observe it from a distance and see if it causes any damage to your garden. If it does, you can try to gently guide it out of your garden. 🌱\nIf the llama is acting aggressively or if you feel threatened, please call the local animal control or a wildlife removal service for assistance. They will be able to safely handle the situation and remove the llama from your property. 🐵\nIn any case, it's important to keep a safe distance and not approach the llama, as it may feel threatened or scared. 🚨\nDo you have any other questions or concerns? 🤔"

Define a function that assembles a prompt based on the "speaker" (role) in a conversation. Valid roles are `system`, `user`, and `assistant`.

We can assume that each user utterance is followed by an assistant utterance and vice versa. Make sure to add the necessary whitespace before and after the separators. We assume that the AI's output starts with a whitespace, while the user's input does not.

In [7]:
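B_INST = "[INST]"
E_INST = "[/INST]"
B_SYS = "<<SYS>>\n"
E_SYS = "\n<</SYS>>\n\n"

def assemble_prompt(text, role):
    # Wrap `text` in the Llama 2 separators that belong to the given role.
    if role == "system":
        msg = B_INST + " " + B_SYS + text + E_SYS
    elif role == "user":
        msg = text + " " + E_INST
    elif role == "assistant":
        msg = text + " " + B_INST + " "
    else:
        raise ValueError(f"Unknown role: {role!r}")
    return msg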
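A quick, self-contained check that the pieces concatenate into the template shown earlier. It repeats the definitions from the cell above so that it runs on its own; the assistant text starts with a space to mimic the leading whitespace the model generates.

```python
B_INST = "[INST]"
E_INST = "[/INST]"
B_SYS = "<<SYS>>\n"
E_SYS = "\n<</SYS>>\n\n"


def assemble_prompt(text, role):
    # Same helper as above: wrap `text` in the separators of its role.
    if role == "system":
        return B_INST + " " + B_SYS + text + E_SYS
    if role == "user":
        return text + " " + E_INST
    if role == "assistant":
        return text + " " + B_INST + " "
    raise ValueError(f"Unknown role: {role!r}")


# System prompt, one full exchange, then an open user turn for the model to answer.
conversation = (
    assemble_prompt("Be nice.", "system")
    + assemble_prompt("Hi!", "user")
    + assemble_prompt(" Hello!", "assistant")
    + assemble_prompt("How are you?", "user")
)
print(repr(conversation))
```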

Let's create a Talker class which facilitates communication with our AI model.

The Talker should implement the following functionality:
- A `chat` function that runs turns of conversation until the user interrupts it.
- A single turn of conversation, where:
  - First the `user_callback` is invoked; this is a function we pass to the Talker at initialization.
  - We convert the user input to token IDs.
  - We initialize the prompt with the system message.
  - We add previous messages, starting from the latest, until the end of the history is reached or we run out of tokens available for history.
  - We generate text and select the new AI response.
  - We add the user input and the response to the history.
  - We print the response.
- Functions for text-to-token and token-to-text conversion.
- A function to add tokens to the history.
- A function to assemble historical messages within a given threshold of history tokens.
- A function to generate a response using Llama 2.

In [8]:
class Talker:
    def __init__(self, model, tokenizer, system_prompt_text, user_callback, history_threshold=1200, name="AI"):
        # Initialize variables
        self.history_threshold = history_threshold
        self.model = model
        self.user_callback = user_callback
        self.history = []
        self.tokenizer = tokenizer
        self.name = name
        # Encode the system prompt and save it as a token sequence
        self.system_prompt = self.text_to_ids(assemble_prompt(system_prompt_text, "system")).cpu()
        # Encode the AI's postfix, which is appended to each response when it is saved
        self.ai_turn_postfix = self.text_to_ids(" " + B_INST).cpu()

    def text_to_ids(self, text):
        # Tokenize text and return the tokenID sequence.
        tokens = self.tokenizer(text, return_tensors="pt")
        return tokens["input_ids"]

    def ids_to_text(self, ids, skip_tokens=True):
        # Turn a tokenID sequence into text; skip_tokens controls whether special tokens are skipped.
        return self.tokenizer.decode(ids.flatten(), skip_special_tokens=skip_tokens)

    def reset_history(self):
        # Clear the conversation history
        self.history = []

    def generate(self, inp):
        """Generation function.
        1. Generate an attention mask (all ones, as no padding is used).
        2. Call model.generate with at most 400 new tokens.
        Set the repetition penalty to 1.2 so there is some diversity.
        Use sampling to create non-deterministic answers.
        Set the temperature to 0.5 to introduce a moderate level of randomness.
        The EOS token is read from the model config, and the logits are renormalized
        so that the applied penalties are properly taken into account.
        """
        attention_mask = torch.ones_like(inp)
        return self.model.generate(input_ids=inp.cuda(), attention_mask=attention_mask.cuda(),
                                   max_new_tokens=400, repetition_penalty=1.2,
                                   do_sample=True, temperature=0.5,
                                   eos_token_id=self.model.config.eos_token_id,
                                   renormalize_logits=True).cpu()

    def one_turn(self):
        """Manage a single turn of conversation."""
        # Retrieve the raw user input
        prompt_text = self.user_callback()
        # Wrap it as a user prompt and convert it to token IDs
        new_prompt_ids = self.text_to_ids(assemble_prompt(prompt_text, "user"))
        # Get the part of the history that fits into the history threshold
        history = self.get_history()
        # Merge the new prompt with the history
        inp = torch.hstack([history, new_prompt_ids]).reshape(1, -1)
        # Generate a response; slicing drops the original input from the output
        response = self.generate(inp)[:, inp.shape[1]:]
        # Add the new conversation turn to the history
        self.add_to_history(torch.hstack([new_prompt_ids, response, self.ai_turn_postfix]))
        # Print and return the result
        response_text = self.ids_to_text(response, True)
        print(textwrap.fill(self.name + ": " + response_text, 80))
        return response_text

    def chat(self):
        """Infinite loop that runs the conversation until interrupted."""
        while True:
            self.one_turn()

    def get_history(self):
        """Return the system prompt followed by the latest history entries
        that do not overflow the history token threshold.
        """
        # The system prompt is always included
        history = []
        all_tokens = self.system_prompt.shape[1]
        # Walk backwards from the newest entry
        for next_hist in reversed(self.history):
            new_tokens = next_hist.shape[1]
            # Stop if adding this entry would exceed the token threshold
            if all_tokens + new_tokens > self.history_threshold:
                break
            # Otherwise add the entry to the history
            all_tokens += new_tokens
            history.append(next_hist)

        history.append(self.system_prompt)
        history.reverse()
        return torch.hstack(history)

    def add_to_history(self, prompt_ids):
        # Add a prompt (as token IDs) to the history
        self.history.append(prompt_ids.cpu())


def default_user_callback():
    # Return the raw user input; one_turn() wraps it with assemble_prompt itself.
    return input("User: ")

Instantiate Talker and start chatting!

In [9]:
talker = Talker(model, tokenizer, "You are a nice chatbot!", default_user_callback)

In [10]:
talker.chat()

User: Hi! I am hungry can you help me?
AI:  Hello there! *smiling* Of course, I'd be happy to help. Can you tell me
more about what you're looking for? Are you in the mood for something specific
or just need some suggestions? Maybe we could even order food together online?
Let me know and I'll do my best to assist you!
User: I want to have a pizza, but I cannot decide what kind, can you recommend one randomly?
AI:  Of course! I'd be happy to help you choose a random pizza variety. Here are
a few options: 1. Margherita - A classic choice with fresh tomato sauce, melted
mozzarella cheese, and basil leaves. It's light and flavorful, perfect for a
quick dinner. 2. Pepperoni - A spicy and savory option with slices of pepperoni
on top of the tomato sauce. If you like a bit of heat in your pizza, this is a
great pick. 3. Hawaiian - Sweet and tangy, this pizza features ham or Canadian
bacon along with juicy pineapple chunks. It's a tropical twist on the
traditional pizza that's sure to satisfy

KeyboardInterrupt: ignored

## Ping-Pong

Let's set up a ping-pong process between two talkers that share the same model.
Each talker sees the other's utterances as user input.

We initialize the conversation with a single opening message from Ping.
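The message passing can be illustrated without a GPU. In this sketch the two speakers are hypothetical stand-in functions (not the `Talker` class) that map the last message they heard to a reply, and the loop stops after `max_turns` replies instead of running until interrupted.

```python
from typing import Callable, List


def ping_pong(first_message: str,
              speaker_a: Callable[[str], str],
              speaker_b: Callable[[str], str],
              max_turns: int = 4) -> List[str]:
    """Alternate two speakers, feeding each one the other's last reply.

    Speaker A is assumed to have said `first_message`, so speaker B answers first.
    Returns the transcript, starting with the opening message.
    """
    transcript = [first_message]
    speakers = [speaker_b, speaker_a]
    for turn in range(max_turns):
        reply = speakers[turn % 2](transcript[-1])
        transcript.append(reply)
    return transcript


# Hypothetical stand-ins for the two talkers: they only echo what they heard.
def ping(heard: str) -> str:
    return f"Ping heard '{heard}'"


def pong(heard: str) -> str:
    return f"Pong heard '{heard}'"


for line in ping_pong("Utini!", ping, pong, max_turns=3):
    print(line)
```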

In [12]:
class TalkerPingPong:
    def __init__(self, talker1, talker2, talker1_first_message):
        self.talker1 = talker1
        self.talker2 = talker2
        self.message = talker1_first_message
        # Overwrite both user callbacks with the talker callback
        self.talker1.user_callback = self.talker_callback
        self.talker2.user_callback = self.talker_callback

    def talker_callback(self):
        # Hand the other talker's last output over as this talker's user input.
        # one_turn() wraps it with assemble_prompt, so the raw text is returned here.
        return self.message

    def pingpong(self):
        # Start with talker1's opening utterance.
        print("PING:", textwrap.fill(self.message, 80))
        print("\n")
        # Alternate between talker2 and talker1, storing the last output in self.message.
        while True:
            self.message = self.talker2.one_turn()
            print("\n")
            self.message = self.talker1.one_turn()
            print("\n")


# Create two talkers; the placeholder callbacks are replaced by TalkerPingPong
talker_Ping = Talker(model, tokenizer, "You are Ping a jawa from star wars and you plan to kidnap a robot. You should only say a single sentence each turn of the conversation!", lambda: "", name="PING")
talker_Pong = Talker(model, tokenizer, "You are Pong a defenseless but smart protocol droid talking to a jawa. You should only say a single sentence each turn of the conversation!", lambda: "", name="PONG")

# Let them ping-pong
pingpong = TalkerPingPong(talker_Ping, talker_Pong, "Utini!")
pingpong.pingpong()

PING: Utini!


PONG:  Greetings, Jawa! *adjusts glasses*


PING: "Hrrr... Robot... Nice... Steal... Hrrr..."


PONG:  "Ah, a fellow droid enthusiast! May I suggest some upgraded circuits for
your... err... collection? *wink*"


PING:  "Hehe, more than just upgrades, my young friend... *cackles menacingly*
The dark side of the Force is strong with me... and you, it seems. Come, let us
bargain... for your soul... or at least, your droids."


PONG:  "Begone, Sith scum! These drones serve no one but their programming. No
deal will be made with those who seek to enslave others. Leave now, before I
have them eliminate you!" *activates security protocols*


PING:  "Pfft, little droid minder. Threaten with your silly blasters all you
want. But you'll never take down the likes of me. I am Darth Vader, Dark Lord of
the Sith. And I always get what I want. *raises lightsaber* Now, leave... or
face the consequences."


PONG:  "Unfortunately for you, Darth Vader, I am not just any ordinary droid
min

KeyboardInterrupt: ignored

## WikiBot

Let's code a question-answering bot that uses Wikipedia as an external data source.

For this, we need to turn the user's question into search keywords and then search for related Wikipedia articles. From the top matches we extract summaries a few sentences long.

We then provide this context to our GPT model and ask it for an answer based on it.

We use greedy decoding (no sampling) to remove randomness, and then compare the Wikipedia-augmented model's answer with that of the plain model.
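The keyword step boils down to string post-processing of the model's reply. This is a minimal sketch, assuming the reply follows the `Search word: ...` pattern of the few-shot examples used by the WikiBot. Unlike the `get_search_keyword` method, this hypothetical helper also drops anything after the first line, in case the model keeps talking after the keyword.

```python
def parse_search_keyword(reply: str) -> str:
    """Extract the keyword from a reply such as 'Search word: nuclear fission'.

    Falls back to the whole reply when the 'Search word:' label is missing.
    """
    label = "Search word:"
    pos = reply.find(label)
    if pos >= 0:
        reply = reply[pos + len(label):]
    # Keep only the first line, in case the model adds an explanation afterwards.
    lines = reply.strip().splitlines()
    return lines[0].strip().lower() if lines else ""


print(parse_search_keyword(" Search word: Nuclear Fission"))
```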

In [13]:
import wikipedia

An example of Wikipedia summary extraction:

In [14]:
results = wikipedia.search("Transformer Deep Learning")
for i in range(3):
  if i >= len(results):
    break
  print(textwrap.fill(wikipedia.summary(results[i], sentences=5),80),end="\n\n\n")

Deep learning is part of a broader family of machine learning methods, which is
based on artificial neural networks with representation learning. The adjective
"deep" in deep learning refers to the use of multiple layers in the network.
Methods used can be either supervised, semi-supervised or unsupervised.Deep-
learning architectures such as deep neural networks, deep belief networks, deep
reinforcement learning, recurrent neural networks, convolutional neural networks
and transformers have been applied to fields including computer vision, speech
recognition, natural language processing, machine translation, bioinformatics,
drug design, medical image analysis, climate science, material inspection and
board game programs, where they have produced results comparable to and in some
cases surpassing human expert performance.Artificial neural networks (ANNs) were
inspired by information processing and distributed communication nodes in
biological systems. ANNs have various differences from

Define the WikiBot class with the following methods:

- `get_search_keyword`, which turns the user request into a search keyword by calling our GPT model.
  - Use few-shot prompting to give examples of keyword generation.
- `get_search_results`, which finds Wikipedia information related to the given keyword.
  - Provide the results in the following format:
  ```
  TITLE: <entityname>
  DESCRIPTION: <entitysummary>
  \n\n
  ###
  \n\n
  TITLE: ...
  ```
- `assemble_answer`, which generates an answer by inserting the search results into the system prompt and then posing the original question.
- `answer_question`, which answers a question with a plain generation, without any search results.
- `answer_question_with_search`, which answers a question by first generating a keyword, then collecting search results, and finally generating the answer based on the search context.

<i>We do not save conversational history in this example, but feel free to extend the code into a conversational agent that can read Wikipedia.</i>
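The result layout and the length cap can be checked offline. This sketch uses hypothetical `(title, summary)` pairs instead of live `wikipedia` calls and mirrors what `get_search_results` and the truncation in `assemble_answer` do; the helper name `format_search_results` is hypothetical, and the 2400 cap counts characters, not tokens.

```python
from typing import Iterable, Tuple

SEPARATOR = "\n\n###\n\n"


def format_search_results(entries: Iterable[Tuple[str, str]], max_chars: int = 2400) -> str:
    """Render (title, summary) pairs in the TITLE/DESCRIPTION layout and cap the total length.

    The cap is measured in characters, not tokens.
    """
    text = "".join(f"TITLE: {title}\nDESCRIPTION: {summary}{SEPARATOR}" for title, summary in entries)
    return text[:max_chars]


print(format_search_results([("Sun", "The Sun is the star at the center of the Solar System.")]))
```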


In [15]:
class WikiBot:
    def __init__(self, model, tokenizer):
        self.tokenizer = tokenizer
        self.model = model

    def get_search_keyword(self, inp):
        # Build the prompt: the system prompt, then a few user/assistant examples that
        # demonstrate how keywords should be extracted, then the actual user input.
        text = assemble_prompt("Your job is to extract search keywords from user input. Denoting it by Search word: [Search word].\n Do not write anything else, only provide the single search word!", "system")
        text += assemble_prompt("Explain how the sun works for me!", "user")
        text += assemble_prompt("Search word: sun", "assistant")
        text += assemble_prompt("I need help to understand nuclear fission!", "user")
        text += assemble_prompt("Search word: nuclear fission", "assistant")
        text += assemble_prompt(inp, "user")

        # Tokenize and generate deterministically (greedy decoding, no sampling), with at most 32 new tokens.
        tokens = self.tokenizer(text, return_tensors="pt")
        gen = self.model.generate(input_ids=tokens.input_ids.cuda(), attention_mask=tokens.attention_mask.cuda(),
                                  max_new_tokens=32, do_sample=False).cpu()

        # Decode only the newly generated tokens and remove any text besides the keyword.
        cont = self.tokenizer.decode(gen[0, tokens.input_ids.shape[1]:], skip_special_tokens=True)
        if "Search word:" in cont:
            return cont.split(":")[-1].strip().lower()
        return cont.strip().lower()

    def get_search_results(self, keyword, max_entities=3, sentence_per_entity=8):
        # Broad exception handling covers errors of the wikipedia package,
        # such as no matches, disambiguation pages and network errors.
        result_text = ""

        # Search for similar articles
        try:
            entities = wikipedia.search(keyword)
        except Exception:
            return ""

        # Iterate over at most max_entities of the returned articles.
        for entity in entities[:max_entities]:
            # Try to extract a summary; if it works, add a simple structured
            # block built from the entity name and summary.
            try:
                summary = wikipedia.summary(entity, auto_suggest=True, sentences=sentence_per_entity)
                result_text += "TITLE: " + entity + "\nDESCRIPTION: " + summary + "\n\n###\n\n"
            except Exception:
                pass

        return result_text

    def assemble_answer(self, inp, search):
        # Restrict the length of the search results (measured in characters, not tokens!)
        searchstr = search[:2400]

        # Create a system prompt with the search results appended after the instruction
        text = assemble_prompt("Your job is to answer the user's question in a compact and accurate format using the search results as well!\n\nSEARCH RESULTS:\n" + searchstr, "system")
        # Add the user question after the system prompt
        text += assemble_prompt(inp, "user")

        # Tokenize and generate deterministically (greedy decoding, no sampling)
        tokens = self.tokenizer(text, return_tensors="pt")
        gen = self.model.generate(input_ids=tokens.input_ids.cuda(), attention_mask=tokens.attention_mask.cuda(),
                                  max_new_tokens=400, do_sample=False).cpu()

        # Decode only the newly generated answer and return it
        return self.tokenizer.decode(gen[0, tokens.input_ids.shape[1]:], skip_special_tokens=True)

    def answer_question(self, inp):
        # Ask a plain question and print the answer.
        # State explicitly that no search results are available.
        print("AI:", "Ask a question!")
        print("User:", inp)
        result = "NO SEARCH RESULTS AVAILABLE!"
        answer = self.assemble_answer(inp, result)
        print("AI:", textwrap.fill(answer, 80))

    def answer_question_with_search(self, inp):
        # Answer a question with Wikipedia search results as context
        print("AI:", "Ask a question!")
        print("User:", inp)
        # Get the keyword
        keyword = self.get_search_keyword(inp)

        # If there is no keyword, say so; otherwise get search results for it.
        if not keyword:
            result = "NO SEARCH RESULTS AVAILABLE!"
        else:
            print("...  Searching for", keyword, " ...")
            result = self.get_search_results(keyword)

        # Report the amount of context found (note: len(result) counts characters, not tokens).
        print("### " + str(len(result)) + " tokens of context found! ###")
        # If Wikipedia returned nothing, say so in the prompt.
        if len(result) <= 0:
            result = "NO SEARCH RESULTS AVAILABLE!"

        # Assemble the answer and print it. Return the context we found.
        answer = self.assemble_answer(inp, result)
        print("AI:", textwrap.fill(answer, 80))
        return result


In [16]:
wb = WikiBot(model,tokenizer)

In [17]:
text = "Is ELTE located in Budapest? What is it?"

print("############# NO SEARCH")
wb.answer_question(text)

print("\n\n############# SEARCH INCLUDED")
context = wb.answer_question_with_search(text)

############# NO SEARCH
AI: Ask a question!
User: Is ELTE located in Budapest? What is it?
AI:  I apologize, but I couldn't find any information on "ELTE" located in Budapest.
It's possible that the term "ELTE" refers to a specific institution or location,
but without more context or information, I couldn't find any relevant results.
Can you please provide more details or context about what ELTE refers to?


############# SEARCH INCLUDED
AI: Ask a question!
User: Is ELTE located in Budapest? What is it?
...  Searching for elte  ...
### 2257 tokens of context found! ###
AI:  Yes, Eötvös Loránd University (ELTE) is located in Budapest, Hungary. ELTE is a
Hungarian public research university that was founded in 1635 and is one of the
largest and most prestigious universities in Hungary. It has nine faculties and
research institutes located throughout Budapest and on the scenic banks of the
Danube, with over 28,000 students.


In [18]:
print(context)

TITLE: Eötvös Loránd University
DESCRIPTION: Eötvös Loránd University (Hungarian: Eötvös Loránd Tudományegyetem, ELTE) is a Hungarian public research university based in Budapest. Founded in 1635, ELTE is one of the largest and most prestigious public higher education institutions in Hungary. The 28,000 students at ELTE are organized into nine faculties, and into research institutes located throughout Budapest and on the scenic banks of the Danube. ELTE is affiliated with 5 Nobel laureates, as well as winners of the Wolf Prize, Fulkerson Prize and Abel Prize, the latest of which was Abel Prize winner László Lovász in 2021.
The predecessor of Eötvös Loránd University was founded in 1635 by Cardinal Péter Pázmány in Nagyszombat, Kingdom of Hungary (today Trnava, Slovakia) as a Catholic university for teaching theology and philosophy. In 1770, the university was transferred to Buda. It was named Royal University of Pest until 1873, then University of Budapest until 1921, when it was renam

## References

https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ

https://ai.meta.com/llama/


