## ML News

- Multi Perspective Evaluations Available on Open LLM Leaderboard: 
    - https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard


- Alternative Large Scale Evaluation Benchmark: 
    - HELM: https://crfm.stanford.edu/helm/latest/
    - unfortunately companies do not do it ... but open source should!


- InstructEval: Large Scale Evaluation of Instruction Fine Tuned Models
    -  https://arxiv.org/abs/2306.04757


- New AMD AI Stack + Huggingface Support:
    - https://huggingface.co/blog/huggingface-and-amd


- H2O GPT Models: 
    - chat + web plugin: https://gpt-gm.h2o.ai/
    - file upload + chat: https://falcon.h2o.ai/
    - models: https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2

- **Student Elections**
    - https://www.uni-jena.de/wahlamt

- **Topics for the Discussion Session**
    - Legal aspects of AI and LLMs, copyright, watermarking, autonomous cars, alignment problems: I
    - Learning, publishing, studying, and using AI systems at university
    - Low-resource LLM deployments: llama.cpp, webgpu, web-llm, mlc-llm
    - How does AI change society? How will it change society in the future? Working in the age of AI, impact of AI on society: I
    
---
    
- June 26
    - Discussion session
    
- July 3
    - Talks
    - ?
    - the earlier the better :) (ab12)

---

- llama.cpp: https://github.com/ggerganov/llama.cpp
- pyllamacpp (Python bindings for llama.cpp): https://github.com/abdeladim-s/pyllamacpp

# Dialog Modelling

- Open-access NLP book "Speech and Language Processing" by Dan Jurafsky and James Martin: https://web.stanford.edu/~jurafsky/slp3/

### Speech Acts
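- In a dialog, utterances work as ACTIONS performed by the speaker
- There are 4 major types of dialog actions (= speech acts): constatives (stating, answering), directives (asking, requesting), commissives (promising, planning), and acknowledgments (greeting, thanking)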

### Grounding
- = establishing common ground between the dialog partners, i.e. each partner signals that the other has been understood

### Dialog Structures
- adjacency pairs (e.g. question-answer, utterance-clarification request, ...)

### Initiative
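- single initiative: one partner (user or system) controls the conversation
- mixed initiative: control shifts back and forth between the partners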


# Rule-based Dialog Systems

= ELIZA (1966): https://dl.acm.org/doi/10.1145/365153.365168

![](https://d3i71xaburhd42.cloudfront.net/763542e63ab6720759d0b9e78fb17416d25efa06/4-Figure2-1.png)

![](https://www.researchgate.net/publication/351363891/figure/fig1/AS:1020499173847041@1620317371344/Sample-ELIZA-dialogue-from-Weizenbaum-1966.jpg)


- **PATTERN MATCHING RULES**:
    - match a pattern based on a grammar
    - if a pattern matches -> extract an input (= a keyword) from the match and apply a rule that uses that input to produce an output
    - (see paper)
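
The pattern -> keyword -> rule idea can be sketched in a few lines of Python. This is a minimal illustration, not Weizenbaum's original script: the rules, the `respond` function, and the fallback sentence are hypothetical, and the sketch skips ELIZA's pronoun swapping and keyword ranking.

```python
import re

# Hypothetical rules in the spirit of ELIZA: (pattern, response template).
# The captured group plays the role of the extracted keyword/input.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."  # used when no pattern matches


def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return FALLBACK


print(respond("I need a holiday."))  # Why do you need a holiday?
```

The order of `RULES` matters here: the first matching rule wins, which is a simplification of the keyword ranking described in the paper.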


# Slot-Filling-based Dialog Systems

- Used in task-oriented dialog scenarios, e.g. the user wants to book a hotel, a flight, or an appointment with a doctor
- Example: https://rasa.com/


- **SLOTS**: set of values that the bot has to collect from the user, e.g. flight destination, flight time, flight price. Once all slots are filled -> we can call a function and serve the user with an answer/response.
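
A minimal sketch of slot filling, assuming a hypothetical flight-booking task with two slots. The names `REQUIRED_SLOTS`, `next_question`, `book_flight`, and `step` are made up for illustration and are not part of Rasa or any other framework.

```python
from typing import Optional

# Hypothetical slots for a flight booking task
REQUIRED_SLOTS = ("destination", "date")


def next_question(slots: dict) -> Optional[str]:
    """Return a question for the first missing slot, or None if all are filled."""
    for name in REQUIRED_SLOTS:
        if slots.get(name) is None:
            return f"What is your {name}?"
    return None


def book_flight(destination: str, date: str) -> str:
    """Stand-in for the backend function that is called once all slots are filled."""
    return f"Booked a flight to {destination} on {date}."


def step(slots: dict) -> str:
    """Ask for a missing slot, or call the function when nothing is missing."""
    question = next_question(slots)
    if question is not None:
        return question
    return book_flight(slots["destination"], slots["date"])


print(step({"destination": "Berlin", "date": None}))  # What is your date?
```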

![](https://ithelp.ithome.com.tw/upload/images/20190919/2010385289dD1y8S2J.png)

- **INTENTS**: = classification of user inputs into a fixed set of intents
- **ACTIONS**: for every intent, define the action the bot takes in response (e.g. ask for a missing slot, call a function)
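
As a rough sketch of how intents and actions fit together: the keyword lists, intent names, and action names below are hypothetical, and the keyword overlap is only a stand-in for the trained intent classifier a real system would use.

```python
import re

# Hypothetical intents, each described by a few trigger keywords
INTENT_KEYWORDS = {
    "book_flight": {"flight", "fly"},
    "book_hotel": {"hotel", "room"},
    "greet": {"hello", "hi"},
}

# Hypothetical mapping from intent to the action the bot executes
ACTIONS = {
    "book_flight": "ask_flight_destination",
    "book_hotel": "ask_hotel_city",
    "greet": "say_hello",
    "unknown": "ask_to_rephrase",
}


def classify_intent(text: str) -> str:
    """Pick the intent with the largest keyword overlap, or 'unknown'."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


def choose_action(text: str) -> str:
    """Map the classified intent to its action."""
    return ACTIONS[classify_intent(text)]


print(choose_action("I need a hotel room"))  # ask_hotel_city
```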

![](https://i.ytimg.com/vi/WK7C8pLnfvY/maxresdefault.jpg)

![](https://eng.ftech.ai/wp-content/uploads/2019/06/Screen-Shot-2019-06-04-at-18.16.49-1024x542.png)

- **State Tracking**: 
    - SLOTS: keep track of the current values in the slots 
    - DIALOG HISTORY: keep track of (recent) dialog history, e.g. last 5 turns
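
A minimal sketch of such a tracker, assuming the history is capped at the last 5 turns as in the example above. The `DialogState` class and its fields are hypothetical names chosen for this illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DialogState:
    # Current slot values, e.g. {"destination": "Berlin"}
    slots: dict = field(default_factory=dict)
    # Only the 5 most recent turns are kept; older ones are dropped automatically
    history: deque = field(default_factory=lambda: deque(maxlen=5))

    def update(self, speaker: str, utterance: str, new_slots: dict = None) -> None:
        """Record one turn and merge any newly extracted slot values."""
        self.history.append((speaker, utterance))
        if new_slots:
            self.slots.update(new_slots)


state = DialogState()
state.update("user", "I want to fly to Berlin", {"destination": "Berlin"})
state.update("bot", "When do you want to fly?")
print(state.slots)  # {'destination': 'Berlin'}
```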

![](https://smazee.com/uploads/images/rasa-5.png)

- **Dialog Policy**: given a set of dialogs (= sequences of intent + action pairs), learn a policy that decides which action to take for which intent, so that the slots are filled in as few turns as possible and the task is fulfilled efficiently
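
To make the idea concrete, here is a deliberately crude sketch: it is a frequency heuristic, not reinforcement learning and not how any particular framework learns its policy. It assumes each dialog is a list of (intent, action) pairs and weights every pair by 1 / dialog length, so actions from shorter dialogs count more. The function name and the example dialogs are hypothetical.

```python
from collections import defaultdict


def learn_policy(dialogs):
    """Return {intent: action}, preferring actions seen in short dialogs."""
    scores = defaultdict(lambda: defaultdict(float))
    for dialog in dialogs:
        if not dialog:
            continue  # skip empty dialogs to avoid division by zero
        weight = 1.0 / len(dialog)  # shorter dialog -> larger weight
        for intent, action in dialog:
            scores[intent][action] += weight
    return {intent: max(actions, key=actions.get) for intent, actions in scores.items()}


# Hypothetical training dialogs
dialogs = [
    [("book_flight", "ask_destination"), ("inform", "confirm_booking")],
    [("book_flight", "ask_date"), ("inform", "ask_destination"), ("inform", "confirm_booking")],
]
policy = learn_policy(dialogs)
print(policy["book_flight"])  # ask_destination
```

A realistic policy would also condition on the tracked state (which slots are still missing), not only on the latest intent.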

# Prompt Based Dialog Systems

= e.g. ChatGPT (LLM based dialog systems)

- **PRE-TRAINING**
    - Trillions of tokens, next word prediction
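
Next word prediction can be illustrated with a toy bigram model that counts which word follows which. This is only a sketch of the training objective on a made-up three-sentence corpus; real LLMs use neural networks over subword tokens rather than count tables.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
counts = train_bigram(corpus)
print(predict_next(counts, "the"))  # cat
```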

![](https://thegradient.pub/content/images/2019/10/lm-1.png)

- **ALIGNMENT**: 
    - Reinforcement Learning from Human Feedback (RLHF)
    - DPO = direct preference optimization: https://arxiv.org/abs/2305.18290
    - Bayesian approaches: https://en.wikipedia.org/wiki/Human_Compatible (no implementations for LLMs seen so far)
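
The DPO objective from the paper linked above can be sketched for a single preference pair. The function below assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") answers are already available from the trained policy and from a frozen reference model; the argument names and the default `beta=0.1` are illustrative choices.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log sigmoid(beta * margin)."""
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) is the same as log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


# The loss is smaller when the policy prefers the chosen answer more than the reference does
print(dpo_loss(-4.0, -6.0, -5.0, -5.0) < dpo_loss(-6.0, -4.0, -5.0, -5.0))  # True
```

In practice the loss is averaged over a batch of pairs and minimized with gradient descent; this sketch only evaluates the formula.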

![](https://miro.medium.com/v2/resize:fit:1400/1*yCzfUi2CgSl-yW_gYAjDMw.png)

![](https://pbs.twimg.com/media/FxdM7muXsAI5MnH.jpg:large)


- **PROMPT**: 
    - system prompt
        - example: https://gpt-gm.h2o.ai/
    - user input  
    - in-context learning / few-shot learning -> examples in the prompt (e.g. for retrieval-augmented systems)
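
A minimal sketch of how the three parts could be assembled into one prompt string. The `System:` / `User:` / `Assistant:` layout is an assumption made for this example; every model family has its own chat template.

```python
def build_prompt(system_prompt, examples, user_input):
    """Concatenate system prompt, few-shot examples, and the new user input."""
    lines = [f"System: {system_prompt}", ""]
    for question, answer in examples:  # few-shot examples shown to the model
        lines += [f"User: {question}", f"Assistant: {answer}", ""]
    # The prompt ends where the model is supposed to continue
    lines += [f"User: {user_input}", "Assistant:"]
    return "\n".join(lines)


prompt = build_prompt(
    "You are a helpful assistant.",
    [("What is 2+2?", "4")],
    "What is 3+3?",
)
print(prompt)
```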

![](https://preview.redd.it/using-system-message-to-tell-gpt-4-its-an-expert-and-that-a-v0-g9ov2fx2wdqa1.png?width=614&format=png&auto=webp&s=730693ae3bb71febdfad0ac615854325164b0010)

![](https://media.arxiv-vanity.com/render-output/6470302/x1.png)

- **STATE**: = state representation, usually included in prompt
    - sequence state = dialog history
        - ["Hello", "Hi, How can I help you?", ...]
    - graph state = e.g. dialog partner, dialog topic, e.g. as a JSON graph; for us -> this was the state of the visualization
        - {"Person": {"Name": "Max", "Age":"25", ...}}



- **QUERY PRE-PROCESSING**
    - query similarity estimation -> fallbacks for bad inputs
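
A sketch of a similarity-based fallback. To stay self-contained it uses bag-of-words cosine similarity as a stand-in for real sentence embeddings; the list of known queries, the threshold of 0.3, and the fallback text are all assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical examples of queries the system can handle
KNOWN_QUERIES = ["book a flight to berlin", "what is the weather today"]
THRESHOLD = 0.3  # assumed cut-off, would need tuning on real data
FALLBACK = "Sorry, I did not understand that. Could you rephrase?"


def embed(text):
    """Toy 'embedding': word counts (a real system would use a sentence encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def preprocess(query):
    """Pass the query on if it resembles a known query, else return the fallback."""
    vec = embed(query)
    best = max(cosine(vec, embed(known)) for known in KNOWN_QUERIES)
    return query if best >= THRESHOLD else FALLBACK


print(preprocess("asdf qwer"))  # Sorry, I did not understand that. Could you rephrase?
```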

![](https://inside-machinelearning.com/wp-content/uploads/2021/11/tsne_sentences_embedding_visualization-1.jpeg)

- **QUERY POST-PROCESSING**
    - query control by a second model
    - query similarity estimation (= harm), e.g. via Sentence-BERT
    - correctness check (current focus of research)


**PLUGINS**: 
- retrieval mechanisms
- APIs for external services
- langchain: https://github.com/hwchase17/langchain
    - in langchain called TOOLS
    - = in essence these are function calls and the model uses the output of the function for further iteration
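
The function-call loop can be sketched without any framework. Nothing below is langchain's API: the `CALL name(args)` convention, the `TOOLS` registry, and `fake_model` (a scripted stand-in for an LLM) are all hypothetical and only illustrate how a tool's output is fed back to the model for the next iteration.

```python
import re

# Hypothetical tool registry: name -> Python function
TOOLS = {
    "add": lambda a, b: str(int(a) + int(b)),
}


def fake_model(prompt):
    """Scripted stand-in for an LLM: request a tool first, then answer."""
    if "Observation:" in prompt:
        return "Final answer: " + prompt.rsplit("Observation:", 1)[1].strip()
    return "CALL add(2, 3)"


def run(user_input, model=fake_model, max_steps=3):
    """Alternate between model output and tool execution until an answer appears."""
    prompt = f"User: {user_input}"
    output = ""
    for _ in range(max_steps):
        output = model(prompt)
        call = re.fullmatch(r"CALL (\w+)\((.*)\)", output.strip())
        if call is None:
            return output  # no tool requested -> this is the final answer
        name, raw_args = call.groups()
        args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
        result = TOOLS[name](*args)
        # Append the tool result so the model can use it in the next iteration
        prompt += f"\n{output}\nObservation: {result}"
    return output


print(run("What is 2 + 3?"))  # Final answer: 5
```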


# CPU CODE EXAMPLE

In [None]:
!wget https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q4_0.bin

In [None]:
!pip install pyllamacpp

In [None]:
model_path = "/content/WizardLM-7B-uncensored.ggmlv3.q4_0.bin" # @param {type: "string"}

In [None]:
from pyllamacpp.model import Model

model = Model(model_path=model_path, n_ctx=512)

prompt = "how much gravity does the moon have?"  # @param {type: 'string'}
n_threads = 2  # @param {type: 'integer'}
n_predict = 45  # @param {type: 'integer'}
repeat_last_n = 64  # @param {type: 'integer'}
top_k = 40  # @param {type: 'integer'}
top_p = 0.95  # @param {type: 'number'}
temp = 0.6  # @param {type: 'number'}
repeat_penalty = 1.3  # @param {type: 'number'}


# Stream the generated tokens to the output as they are produced
for token in model.generate(
    prompt,
    n_threads=n_threads,
    n_predict=n_predict,
    repeat_last_n=repeat_last_n,
    top_k=top_k,
    top_p=top_p,
    temp=temp,
    repeat_penalty=repeat_penalty,
):
    print(token, end='', flush=True)