Commit 4144aed: starter files

mlincon committed Jun 23, 2024 · 1 parent 316af79

Showing 12 changed files with 2,556 additions and 0 deletions.

Empty file removed: 02-open-source/.gitkeep

02-open-source/README.md (new file, 192 additions)

# 2. Open-Source LLMs

In the previous module, we used OpenAI via the OpenAI API. It's
a very convenient way to use an LLM, but you have to pay for
usage, and you don't have control over the model you get to use.

In this module, we'll look at using open-source LLMs instead.

## 2.1 Open-Source LLMs - Introduction

<a href="https://www.youtube.com/watch?v=ATchkIRsH4g&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/ATchkIRsH4g">
</a>

* Open-Source LLMs
* Replacing the LLM box in the RAG flow

## 2.2 Using a GPU in Saturn Cloud

<a href="https://www.youtube.com/watch?v=E0cAqBWfJYY&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/E0cAqBWfJYY">
</a>

* Registering in Saturn Cloud
* Configuring secrets and git
* Creating an instance with a GPU

```bash
pip install -U transformers accelerate bitsandbytes
```
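
Once the instance is running, a quick sanity check confirms the GPU is visible (a minimal sketch, assuming PyTorch is already installed in the Saturn Cloud image):

```python
import torch

# verify that the GPU instance is set up correctly
print(torch.cuda.is_available())      # should print True on a GPU instance
print(torch.cuda.get_device_name(0))  # name of the attached NVIDIA card
```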

Links:

* https://saturncloud.io/
* https://github.com/DataTalksClub/llm-zoomcamp-saturncloud


## 2.3 FLAN-T5

<a href="https://www.youtube.com/watch?v=a86iTyxnFE4&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/a86iTyxnFE4">
</a>

* Model: `google/flan-t5-xl`
* Notebook: [huggingface-flan-t5.ipynb](huggingface-flan-t5.ipynb)

```python
import os

# store the Hugging Face model cache on a disk with more space
os.environ['HF_HOME'] = '/run/cache/'
```

Links:

* https://huggingface.co/google/flan-t5-xl
* https://huggingface.co/docs/transformers/en/model_doc/flan-t5

Explanation of Parameters:

* `max_length`: Set this to a higher value if you want longer responses. For example, `max_length=300`.
* `num_beams`: Increasing this can lead to more thorough exploration of possible sequences. Typical values are between 5 and 10.
* `do_sample`: Set this to `True` to use sampling methods. This can produce more diverse responses.
* `temperature`: Lowering this value makes the model more confident and deterministic, while higher values increase diversity. Typical values range from 0.7 to 1.5.
* `top_k` and `top_p`: These control how the sampling pool is built. `top_k` limits sampling to the `k` most likely tokens, while `top_p` (nucleus sampling) cuts off the pool by cumulative probability. Adjust these based on the desired level of randomness.
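
To see where these parameters plug in, here is a minimal generation sketch following the standard `transformers` usage for FLAN-T5 (the parameter values are illustrative, not tuned):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto")

input_ids = tokenizer("What is RAG?", return_tensors="pt").input_ids.to(model.device)

outputs = model.generate(
    input_ids,
    max_length=300,   # allow longer responses
    num_beams=5,      # wider beam search
    do_sample=True,   # enable sampling
    temperature=0.7,  # lower = more deterministic
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```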


## 2.4 Phi 3 Mini

<a href="https://www.youtube.com/watch?v=8KH6AS2PqWk&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/8KH6AS2PqWk">
</a>

* Model: `microsoft/Phi-3-mini-128k-instruct`
* Notebook: [huggingface-phi3.ipynb](huggingface-phi3.ipynb)


Links:

* https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
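
A minimal sketch of running Phi-3 with a chat-style prompt, based on the usage shown on the model card (the prompt and generation values are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-128k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,  # Phi-3 ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [{"role": "user", "content": "What is RAG?"}]
output = pipe(messages, max_new_tokens=200, return_full_text=False)
print(output[0]["generated_text"])
```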

## 2.5 Mistral-7B and HuggingFace Hub Authentication

<a href="https://www.youtube.com/watch?v=TdVEOzSoUCs&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/TdVEOzSoUCs">
</a>

* Model: `mistralai/Mistral-7B-v0.1`
* Notebook: [huggingface-mistral-7b.ipynb](huggingface-mistral-7b.ipynb)

[ChatGPT instructions for serving](serving-hugging-face-models.md)


Links:

* https://huggingface.co/docs/transformers/en/llm_tutorial
* https://huggingface.co/settings/tokens
* https://huggingface.co/mistralai/Mistral-7B-v0.1
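
Accessing Mistral-7B requires a Hugging Face token (created on the settings page linked above). A minimal sketch, assuming the token is stored in an `HF_TOKEN` environment variable:

```python
import os

from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

login(token=os.environ["HF_TOKEN"])  # token from https://huggingface.co/settings/tokens

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Question: What is RAG?\nAnswer:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```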


## 2.6 Other models

<a href="https://www.youtube.com/watch?v=GzPV_HTmCkc&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/GzPV_HTmCkc">
</a>

* [`LLM360/Amber`](https://huggingface.co/LLM360/Amber)
* [Gemma-7B](https://huggingface.co/blog/gemma)
* [SaulLM-7B](https://huggingface.co/papers/2403.03883)
* [Granite-7B](https://huggingface.co/ibm-granite/granite-7b-base)
* [MPT-7B](https://huggingface.co/mosaicml/mpt-7b)
* [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b)

Where to find them:

* Leaderboards
* Google
* ChatGPT

Links:

* https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
* https://huggingface.co/spaces/optimum/llm-perf-leaderboard


## 2.7 Ollama - Running LLMs on a CPU

<a href="https://www.youtube.com/watch?v=PVpBGs_iSjY&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/PVpBGs_iSjY">
</a>

* The easiest way to run an LLM without a GPU is using [Ollama](https://github.com/ollama/ollama)
* Notebook [ollama.ipynb](ollama.ipynb)

For Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh

ollama serve      # start the Ollama server (keep it running)
ollama run phi3   # in a separate terminal: pull and run the model
```

[Prompt example](prompt.md)

Connecting to it with the OpenAI API:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # Ollama ignores the key, but the client requires one
)
```
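
With the client pointed at Ollama, the rest of the module 1 code works unchanged. For example:

```python
response = client.chat.completions.create(
    model='phi3',
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
)
print(response.choices[0].message.content)
```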

Running Ollama in Docker:

```bash
docker run -it \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```

Pulling the model:

```bash
docker exec -it ollama bash
ollama pull phi3
```


## 2.8 Ollama & Phi3 + Elastic in Docker-Compose

<a href="https://www.youtube.com/watch?v=4juoo_jk96U&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/4juoo_jk96U">
</a>

* Creating a Docker-Compose file
* Re-running the module 1 notebook
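
Once both containers are up (`docker compose up`), the module 1 notebook only needs its clients pointed at the local endpoints. A minimal sketch, assuming the `elasticsearch` and `openai` packages from module 1:

```python
from elasticsearch import Elasticsearch
from openai import OpenAI

# both services come from the compose file below
es_client = Elasticsearch('http://localhost:9200')
ollama_client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')

print(es_client.info())  # sanity check: Elasticsearch is reachable
```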


## 2.9 UI for RAG

<a href="https://www.youtube.com/watch?v=R6L8PZ-7bGo&list=PL3MmuxUbc_hIB4fSqLy_0AfTjVLpgjV3R">
<img src="https://markdown-videos-api.jorgenkh.no/youtube/R6L8PZ-7bGo">
</a>


* Putting it in Streamlit
* [Code](qa_faq.py)
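
The actual app lives in [qa_faq.py](qa_faq.py); the core pattern looks roughly like this (a sketch — `rag()` is a hypothetical stand-in for the RAG pipeline built earlier in the module):

```python
import streamlit as st

st.title("Course FAQ Assistant")

question = st.text_input("Ask a question about the course")

if st.button("Ask"):
    with st.spinner("Generating answer..."):
        answer = rag(question)  # hypothetical: the RAG function from the notebooks
        st.write(answer)
```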

02-open-source/docker-compose.yaml (new file, 23 additions)

```yaml
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.4.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
      - "9300:9300"

  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama:
```