mlim1972/llm-test
Project

This project serves as an example of setting up an environment for LLM work with Python. It uses Conda to create a separate Python environment and VSCode with the Jupyter plugins.

Create a virtual environment

You should create a virtual environment to work with this repository. This is a best practice that isolates your system configuration from the requirements of this repository. There are a few ways to isolate your system from this repo:

  • venv. This README will not go over this method
  • Conda. Quick instructions can be found below
  • Development Containers. This README will not go over this method

Clone the project

Start by cloning the project and cd to the cloned project folder:

git clone https://github.com/mlim1972/llm-test.git
cd llm-test

Setting up Conda

If you do not have conda installed, you can install it with the instructions below. Check whether conda is already installed with the following command:

conda --version

If the above shows the installed conda version, you can skip the Installing Conda section and start from the Environment section.

Installing Conda

  1. Install Miniforge or Miniconda (uninstall Anaconda first if present)

Download and install miniforge from: https://conda-forge.org/miniforge/

For MacOS, you can use Homebrew

brew install --cask miniforge
conda init zsh

More information: https://kirenz.github.io/codelabs/codelabs/miniforge-setup/

  2. VSCode plugins
    • Jupyter by Microsoft
    • Python by Microsoft
    • Dev Containers by Microsoft

Environment

The following instructions are a one-time environment setup:

  • Create a conda environment
conda create --name llmtest python=3.11
  • Activate the environment
conda activate llmtest

After activating your conda environment, confirm that Python 3.11.x is installed:

python --version
  • Install the dependencies into the environment. Run the following in the terminal after activating the conda environment:
pip install -r requirements.txt

or run the setup script

./setup.sh

To deactivate your conda environment, call the following command:

conda deactivate

Now you're ready to work in VSCode using Jupyter Notebook. You need to attach your conda environment as the notebook kernel. As you save your project, VSCode will remember the kernel, but if it does not, you can reattach it the same way:

  • From VSCode when opening any .ipynb files, click on "Select Kernel" -> "Python Environments..." and select your conda environment: llmtest

Setup

After creating the environment with Conda and activating it, you should run setup.sh

./setup.sh

Keys

Create a .env file and insert your keys there. Use the .env.sample file as a reference. This project uses OpenAI, Groq, and Anthropic keys. Don't worry, the .env file is included in the .gitignore file. The .env.sample file provides a sample of the key names required. You can get keys from each provider's website.
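The keys can then be read at runtime. Below is a minimal sketch of loading a .env file with only the standard library (the python-dotenv package does the same thing more robustly). The key name OPENAI_API_KEY is an assumption here; check .env.sample for the exact names this project expects.

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: one KEY=VALUE per line, '#' comments skipped.
    The python-dotenv package offers the same behavior and more."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # Do not overwrite variables already set in the shell.
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # No .env yet; keys may come from the shell environment.

load_env()
# "OPENAI_API_KEY" is an assumed name; check .env.sample for the real ones.
openai_key = os.getenv("OPENAI_API_KEY")
```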

Files

  • setup.sh -> Run this bash script to set up your environment. The script detects whether the project runs inside Conda, Dev Containers, or GitHub Codespaces and installs the necessary dependencies. If the repo is opened in Dev Containers or GitHub Codespaces, this script runs by default
  • PandasAI.ipynb -> Notebook sample about PandasAI
  • DuckDBAI.ipynb -> Notebook sample for DuckDB-NSQL
  • DuckDBAI2.ipynb -> Notebook sample for DuckDB-NSQL served by Ollama
  • Langchain-Wikipedia.ipynb -> Notebook sample showing how to use the Wikipedia Retriever in Langchain
  • Lanchain-Chromadb.ipynb -> Notebook sample using Langchain with ChromaDB for RAG
  • OllamaNotebook.ipynb -> Notebook sample showing how to use the Ollama API to call the Chat and Generate endpoints
  • chat.py. This is a Streamlit chat application that uses Ollama. Run it with the following command:
streamlit run chat.py
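The chat pattern behind such an app is simple: keep a list of role/content messages and resend the whole list to the model every turn so it retains context. A minimal sketch of that pattern (the helper names here are hypothetical, not the ones chat.py uses):

```python
def append_message(history, role, content):
    """Append one chat turn in the role/content dict format that
    Ollama's chat endpoint (like OpenAI-style APIs) expects."""
    history.append({"role": role, "content": content})
    return history

def trim_history(history, max_turns=20):
    """Keep only the most recent turns to bound the prompt size
    resent to the model on every request."""
    return history[-max_turns:]

# The whole history is resent each turn so the model keeps context.
history = []
append_message(history, "user", "What is DuckDB?")
append_message(history, "assistant", "An embedded analytics database.")
```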

Notebook Information

This project is divided into different notebooks, each covering a separate topic. Below is a further description of each notebook:

DuckDB Notebooks

DuckDB is an embedded relational database built from the ground up for data analytics. It is one of the fastest database projects, known for its speed and flexibility. Think of DuckDB as a more robust SQLite.

  • DuckDBAI.ipynb. This notebook uses llama.cpp for local inference against a local gguf model. You need to download the DuckDB-NSQL-7B-v0.1-q8_0.gguf model so this notebook can load it. The setup.sh script ensures you have all the necessary files and virtual environment ready to run all notebooks. This notebook queries a csv file using text-to-SQL prompting.
  • DuckDBAI2.ipynb. This notebook uses Ollama with the duckdb-nsql:7b-q8_0 model. It is similar to DuckDBAI.ipynb, but instead of local inference it uses Ollama.
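Both DuckDB notebooks rely on text-to-SQL prompting: the model is shown the table schema and a natural-language question, and asked to emit SQL. A minimal sketch of that prompt shape (the template below is illustrative, not the notebooks' exact prompt):

```python
def build_text_to_sql_prompt(schema: str, question: str) -> str:
    """Compose the prompt a text-to-SQL model such as DuckDB-NSQL is given:
    the table schema first, then the natural-language question."""
    return (
        f"Here is the database schema that the SQL query will run on:\n"
        f"{schema}\n\n"
        f"Question: {question}\n"
        f"SQL:"
    )

schema = "CREATE TABLE sales (region TEXT, amount DOUBLE, sold_on DATE);"
prompt = build_text_to_sql_prompt(schema, "Total sales amount per region")
print(prompt)
```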

Pandas Notebook

The Pandas notebook uses a library called PandasAI to query a csv file using text-to-SQL and the Pandas library. The objective of both the DuckDB and Pandas notebooks is to show how LLMs can query a DB (CSV file) using text-to-SQL prompting.

Langchain Notebooks

The Langchain notebooks are examples of using Langchain for document retrieval and RAG.

  • Langchain-Wikipedia.ipynb. This notebook shows an example of retrieving Wikipedia results and answering questions from the results.
  • Langchain-ChromaDB.ipynb. This notebook is an example of RAG. It uses an embedding model to create embeddings from documents, ChromaDB as the Vector Store, and finally Ollama (and other providers) to respond to user inquiries. In addition, this notebook also calls Ollama directly, attaching the prompt and embeddings without Langchain chains.
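At the core of the ChromaDB notebook is the retrieval step: embed the documents and the query, then return the documents nearest to the query. The sketch below illustrates only that step, using a toy bag-of-words "embedding"; the notebook uses a real embedding model (nomic-embed-text via Ollama) and ChromaDB as the vector store instead:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; stands in for a real model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query --
    the step the vector store performs in RAG."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "DuckDB is an embedded analytics database.",
    "Streamlit builds data apps in Python.",
    "Ollama serves local language models.",
]
print(retrieve("which tool serves local language models?", docs))
```

The retrieved documents are then pasted into the prompt so the model can answer from them, which is the "generation" half of RAG.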

Ollama Notebook

The Ollama notebook is an example of calling Ollama through its two endpoints: chat and generate.

  • OllamaNotebook.ipynb. The notebook uses the Ollama API to call the localhost Ollama server and interact with the Chat and Generate endpoints. In addition, it also shows how to retrieve the response with and without streaming.
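For reference, a call to the chat endpoint can be sketched with only the standard library; the Ollama Python client the notebook uses wraps this same HTTP call. The payload fields (model, messages, stream) match Ollama's /api/chat endpoint:

```python
import json
from urllib import request

def chat_payload(model, messages, stream=False):
    """Build the JSON body Ollama's /api/chat endpoint expects."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(payload, host="http://localhost:11434"):
    """POST the payload to a locally running Ollama server."""
    req = request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = chat_payload(
    "llama3:8b-instruct-q8_0",
    [{"role": "user", "content": "Say hello in one word."}],
)
# chat(payload) requires a running Ollama server, so it is not called here.
```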

Ollama

Some notebooks use Ollama to run local models. Instructions to install Ollama can be found at https://ollama.com/. Pull the following models:

  • llama3:8b-instruct-q8_0
  • duckdb-nsql:7b-q8_0
  • nomic-embed-text:latest

Limitations
