diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..862f63e4 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,158 @@ + + +# LLMWare Architecture +=============== + +llmware is characterized by a logically integrated set of data pipelines involved in building LLM-based workflows, centered on two main sub-pipelines with high-level interfaces intended to provide an abstraction layer over individual 'end point' components to promote code re-use and the ability to easily 'swap' different components with minimal, if any, code change: + +**1. Knowledge Ingestion** - "creating Gen AI Food" - ingesting and organizing unstructured information from a wide range of data sources, including each of the major steps: + + - Extracting and Parsing + - Text Chunking + - Indexing, Organizing and Storing + - Embedding + - Retrieval + - Analytics and Reuse of Content + - Combining with SQL Table and Other Structured Content + + + **Core LLMWare classes**: **Library**, **Query** (retrieval module), **Parser**, **EmbeddingHandler** (embeddings module), **Graph**, **CustomTable** (resources module) and **Datasets** (dataset_tools module). + + In many cases, it is easy to get things done in LLMWare using only **Library** and **Query** - which provide convenient interfaces into parsing and embedding, such that most use cases will not require calling **Parser** or **EmbeddingHandler** directly. 
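A minimal sketch of the Library-and-Query flow described here, offered as an illustration rather than the project's reference code. The helper name `ingest_and_query` and its `library_cls` / `query_cls` parameters are hypothetical, added only so the flow can be tried with stand-in classes. The import paths `llmware.library` and `llmware.retrieval`, and the assumption that `create_new_library` returns the library object, should be checked against the installed llmware version.

```python
def ingest_and_query(library_name, docs_path, question,
                     embedding_model_name, vector_db,
                     library_cls=None, query_cls=None):
    """Parse a folder into a Library, embed it, and run one semantic query."""

    # Import lazily so the helper can also be exercised with stand-in classes
    # on a machine where llmware is not installed.
    if library_cls is None:
        from llmware.library import Library as library_cls
    if query_cls is None:
        from llmware.retrieval import Query as query_cls

    # Assumption: create_new_library returns the library object to work with.
    library = library_cls().create_new_library(library_name)

    # Parsing and text chunking happen behind add_files.
    library.add_files(input_folder_path=docs_path)

    # Embedding is driven from the Library as well.
    library.install_new_embedding(embedding_model_name=embedding_model_name,
                                  vector_db=vector_db)

    # Retrieval goes through Query, built on top of the same library.
    return query_cls(library).query(question, query_type="semantic",
                                    result_count=20)
```

With llmware installed, a call would pass only the first five arguments; the embedding model name and vector database are whatever your environment provides.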
+ + Supported document file types: pdf, pptx, docx, xlsx, txt, csv, html, jsonl, json, tsv, jpg, jpeg, png, wav, zip, md, mp3, mp4, m4a + + Key methods to know: + + - Ingest anything - `Library().add_files(input_folder_path="path/to/docs")` + + - Embed library - `Library().install_new_embedding(embedding_model_name="your embedding model", vector_db="your vector db")` + + - Run Query - `Query(library).query(query, query_type="semantic", result_count=20)` + + Top examples to get started: + + - [Parsing examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Parsing) - ~14 stand-alone parsing examples for all common document types, including options for parsing in memory, outputting to JSON, parsing custom configured CSV and JSON files, running OCR on embedded images found in documents, table extraction, image extraction, text chunking, zip files, and web sources. + - [Embedding examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Embedding) - ~15 stand-alone embedding examples to show how to use ~10 different vector databases and wide range of leading open source embedding models (including sentence transformers). + - [Retrieval examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Retrieval) - ~10 stand-alone examples illustrating different query and retrieval techniques - semantic queries, text queries, document filters, page filters, 'hybrid' queries, author search, using query state, and generating bibliographies. + - [Dataset examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Datasets) - ~5 stand-alone examples to show 'next steps' of how to leverage a Library to re-package content into various datasets and automated NLP analytics. + - [Fast start example #1-Parsing](https://www.github.com/llmware-ai/llmware/tree/main/fast_start/example-1-create_first_library.py) - shows the basics of parsing. 
+ - [Fast start example #2-Embedding](https://www.github.com/llmware-ai/llmware/tree/main/fast_start/example-2-build_embeddings.py) - shows the basics of building embeddings. + - [CustomTable examples](https://www.github.com/llmware-ai/llmware/tree/main/Structured_Tables) - ~5 examples to start building structured tables that can be used in conjunction with LLM-based workflows. + + +**2. Model Prompting** - "Fun with LLMs" - the lifecycle of discovering, instantiating, and configuring an LLM-based model to execute an inference, including the ability to seamlessly prepare and integrate knowledge retrieval, and post-processing steps to validate accuracy, including: + + - ModelCatalog - discover, load and manage configuration + - Inference + - Function Calls + - Prompts + - Prompt with Sources + - Fact Checking methods + - Agent-based multi-step processes + - Prompt History + + Core LLMWare classes: **ModelCatalog** (models module), **Prompt**, **LLMfx** (agents module). + + Key methods to know: + + - Discover Models - `ModelCatalog().list_all_models()` + + - Load Model - `model = ModelCatalog().load_model(model_name)` + + - Inference - `response = model.inference(prompt, add_context=context)` + + - Prompt - wraps the model class to provide easy source/retrieval management + + - LLMfx - wraps the model class for function-calling SLIM models for agent processes + + While ~17 individual model classes are exposed in the models module, for most use cases, we recommend working through the higher-level interface of ModelCatalog, as it promotes code re-use and the easy ability to swap models. In many pipelines, even ModelCatalog is not required to be called directly, as the Prompt class (knowledge retrieval) and LLMfx (agents and function calls) class provide seamless workflow capabilities and are built on top of the ModelCatalog. 
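The key methods above can be sketched as a single helper. This is a hedged illustration, not the library's own example: the function name `answer_with_context` and its optional `catalog` parameter are hypothetical, and only `load_model` and `inference(prompt, add_context=...)` are taken from the text. The shape of the returned response is not specified here and should be checked against the models module.

```python
def answer_with_context(model_name, prompt, context, catalog=None):
    """Load a model through ModelCatalog and run one inference with context."""

    # Import lazily so the helper can be tried with a stand-in catalog
    # on a machine where llmware is not installed.
    if catalog is None:
        from llmware.models import ModelCatalog
        catalog = ModelCatalog()

    # Going through the catalog keeps the model swappable by name.
    model = catalog.load_model(model_name)

    # The context passage is passed alongside the prompt.
    return model.inference(prompt, add_context=context)
```

Because the model is selected by name, swapping models is a one-argument change, which is the code re-use point made above.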
+ + Top examples to get started: + - [Models examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Models) - ~20 examples showing a wide range of different model inferences and use cases, including the ability to integrate Ollama models, OpenChat (e.g., LMStudio) models, using Llama-3 and Phi-3, bringing your own models into the ModelCatalog, and configuring sampling settings. + - [Prompts examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Prompts) - ~5 examples that illustrate how to use Prompt as an integrated workflow for integrating knowledge sources, managing prompt history, and applying fact-checking. + - [SLIM-Agents examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/SLIM-Agents) - ~20 examples showing how to build multi-model, multi-step Agent processes using locally-running SLIM function calling models. + - [Fast start example #3-Prompts and Models](https://www.github.com/llmware-ai/llmware/tree/main/fast_start/example-3-prompts_and_models.py) - getting started with model inference. + + +In addition, to support these two key pipelines, LLMWare has a set of supporting and enabling classes and methods, including: + + - resources module: CollectionRetrieval, CollectionWriter, PromptState, QueryState, and ParserState - provides an abstraction layer on top of underlying database repositories and separate state mechanisms for major classes. 
+ - gguf_configs module: GGUFConfigs + - model_configs module: global_model_repo_catalog_list, global_model_finetuning_prompt_wrappers_lookup, global_default_prompt_catalog + - util module: Utilities + - setup module: Setup + - status module: Status + - exceptions module: LLMWare Exceptions + - web_services module: classes for Wikipedia, YFinance, and WebSite extraction + + +**End-to-End Use Cases** - we publish and maintain a number of end-to-end use cases in [examples/Use_Cases](https://www.github.com/llmware-ai/llmware/tree/main/examples/Use_Cases) + + + +Need help or have questions? +============================ + +Check out the [llmware videos](https://www.youtube.com/@llmware) and [GitHub repository](https://github.com/llmware-ai/llmware). + +Reach out to us on [GitHub Discussions](https://github.com/llmware-ai/llmware/discussions). + + +# About the project + +`llmware` is © 2023-{{ "now" | date: "%Y" }} by [AI Bloks](https://www.aibloks.com/home). + +## Contributing +Please first discuss any change you want to make publicly, for example on GitHub by raising an [issue](https://github.com/llmware-ai/llmware/issues) or starting a [new discussion](https://github.com/llmware-ai/llmware/discussions). +You can also write an email or start a discussion on our Discord channel. +Read more about becoming a contributor in the [GitHub repo](https://github.com/llmware-ai/llmware/blob/main/CONTRIBUTING.md). + +## Code of conduct +We welcome everyone into the ``llmware`` community. +[View our Code of Conduct](https://github.com/llmware-ai/llmware/blob/main/CODE_OF_CONDUCT.md) in our GitHub repository. + +## ``llmware`` and [AI Bloks](https://www.aibloks.com/home) +``llmware`` is an open source project from [AI Bloks](https://www.aibloks.com/home) - the company behind ``llmware``. +The company offers a Software as a Service (SaaS) Retrieval Augmented Generation (RAG) service. 
+[AI Bloks](https://www.aibloks.com/home) was founded by [Namee Oberst](https://www.linkedin.com/in/nameeoberst/) and [Darren Oberst](https://www.linkedin.com/in/darren-oberst-34a4b54/) in October 2022. + +## License + +`llmware` is distributed under an [Apache-2.0 license](https://github.com/llmware-ai/llmware/blob/main/LICENSE). + +## Thank you to the contributors of ``llmware``! + + + +--- + +--- + diff --git a/docs/index.md b/docs/index.md index e64e3aee..a56d7b34 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,113 +2,267 @@ layout: default title: Home | llmware nav_order: 1 -description: llmware is an integrated framework with over 50+ models in Hugging Face for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. +description: llmware is an integrated framework with 50+ models for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. permalink: / --- -# Welcome to +## Welcome to + -`llmware` is an integrated framework with over 50+ models in Hugging Face for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. -{: .fs-6 .fw-300 } +## 🧰🛠️🔩The Ultimate Toolkit for Building LLM Apps + +From quickly building POCs to scalable LLM Apps for the enterprise, LLMWare is packed with all the tools you need. + +`llmware` is an integrated framework with 50+ small, specialized, open source models for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. + +This project provides a comprehensive set of tools that anyone can use - from a beginner to the most sophisticated AI developer - to rapidly build industrial-grade, knowledge-based enterprise LLM applications. 
+ +Our specific focus is on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely. + + +## Getting Started + +1. Install llmware - `pip3 install llmware` + + +2. Make sure that you are running on a [supported platform](#platform-support). + + +3. Learn by example: + + -- [Fast Start examples](https://www.github.com/llmware-ai/llmware/tree/main/fast_start) - structured set of 6 examples (with no DB installations required) to learn the main concepts of RAG with LLMWare - each example has extensive comments, and a supporting video on Youtube to walk you through it. + + -- [Getting Started examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Getting_Started) - heavily-annotated examples that review many getting started elements - selecting a database, loading sample files, working with libraries, and how to use the Model Catalog. + + -- [Use Case examples](https://www.github.com/llmware-ai/llmware/tree/main/examples/Use_Cases) - longer examples that integrate several components of LLMWare to provide a framework for a solution for common use case patterns. + + -- Dive into a specific area of interest - [Parsing](https://www.github.com/llmware-ai/llmware/tree/main/examples/Parsing) - [Models](https://www.github.com/llmware-ai/llmware/tree/main/examples/Models) - [Prompts](https://www.github.com/llmware-ai/llmware/tree/main/examples/Prompts) - [Agents](https://www.github.com/llmware-ai/llmware/tree/main/examples/SLIM-Agents) - and many more ... + + +4. We provide extensive [sample files](https://www.github.com/llmware-ai/llmware/tree/main/examples/Getting_Started/loading_sample_files.py) integrated into the examples, so you can copy-paste-run, quickly validate that the installation is set up correctly, and start seeing key classes and methods in action. We would encourage you to start with the 'out of the box' example first, and then use the example as the launching point for inserting your documents, models, queries, and workflows. + + +5. 
Learn by watching: check out the [LLMWare Youtube channel](https://www.youtube.com/@llmware). + + +6. Share with the community: join us on [Discord](https://discord.gg/MhZn5Nc39h). + + +[Install llmware](#install-llmware){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } +[Common Setup & Configuration Items](#platform-support){: .btn .fs-5 .mb-4 .mb-md-0 } +[Troubleshooting](#common-troubleshooting-issues){: .btn .fs-5 .mb-4 .mb-md-0 } +[Architecture](architecture.md/#llmware-architecture){: .btn .fs-5 .mb-4 .mb-md-0 } +[View llmware on GitHub](https://github.com/llmware-ai/llmware/tree/main){: .btn .fs-5 .mb-4 .mb-md-0 } +[Open an Issue on GitHub](https://github.com/llmware-ai/llmware/issues){: .btn .fs-5 .mb-4 .mb-md-0 } -[Install llmware](#install-llmware){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } -[View llmware on GitHub](https://github.com/llmware-ai/llmware/tree/main){: .btn .fs-5 .mb-4 .mb-md-0 } -[Open an Issue on GitHub](https://github.com/llmware-ai/llmware/issues){: .btn .fs-5 .mb-4 .mb-md-0 } ---- - ---- # Install llmware -{: .note} -> New wheels are built generally on PyPy on a weekly basis and updated on PyPy versioning. -> The development repo is updated and current at all times, but may have updates that are not yet in the PyPy wheel. -> All wheels are built and tested on -> - Mac Metal -> - Mac x86 -> - Windows x86 (+ with CUDA) -> - Linux x86 (+ with CUDA) - most testing on Ubuntu 22 and Ubuntu 20 - which are recommended. -> - Linux aarch64 - -{: .note} -> We recommend that you use at least ``llmware >= 0.2.0``. Other than that, make sure that you have the following -> set up. -> - Platforms: Mac M1, Mac x86, Windows, Linux (Ubuntu 22 preferred) -> - Hardware: 16 GB RAM minimum -> - Python versions: 3.9, 3.10, 3.11 - -You can install ``llmware`` via the Python Package Index (PIP), or you can manually download the ``wheel`` files from -the [GitHub repository](https://github.com/llmware-ai/llmware/tree/main/wheel_archives). 
- -## PIP -You can easily install `llmware` via `pip`. +___ +**Using Pip Install** + +- Installing llmware is easy: `pip3 install llmware` + + +- If you prefer, we also provide a set of recent wheels in the [wheel archives](https://github.com/llmware-ai/llmware/tree/main/wheel_archives) in this repository, which can be downloaded individually and used as follows: ```bash -pip install llmware -``` +pip3 install llmware-0.2.12-py3-none-any.whl +``` + +- We generally keep the main branch of this repository current with all changes, but we only publish new wheels to PyPI approximately once per week. -## Manual install of wheel files -First, go to the [wheel\_archives](https://github.com/llmware-ai/llmware/tree/main/wheel_archives) folder -and download the *wheel* you want to install. -For example, if you want to install ``llmware`` version ``0.2.5`` then choose ``llmware-0.2.5-py3-none-any.whl``. -After downloading, place the ``wheel`` archive in a folder. -Finally, navigate to that folder and and run ``pip3 install llmware-0.2.5-py3-none-any.whl``. -On linux, a typical work flow would be the following. +___ + +___ +**Cloning the Repository** + +- If you prefer to clone the repository: ```bash -cd Downloads +git clone git@github.com:llmware-ai/llmware.git +``` + +- The llmware package is contained entirely in the /llmware folder path, so you should be able to drop this folder (with all of its contents) into a project tree, and use the llmware module essentially the same as a pip install. + +- Please ensure that you are capturing and updating the /llmware/lib folder, which includes required compiled shared libraries. If you prefer, you can keep only those libs required for your OS platform. 
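When the /llmware folder is dropped into a project tree as described above, a quick check that the lib folder came along can save a confusing import failure later. The sketch below uses only the standard library; the helper name `find_llmware_lib` is hypothetical, and the `llmware/lib` layout is taken from the text rather than verified against a specific release.

```python
from pathlib import Path


def find_llmware_lib(project_root):
    """Return the llmware/lib folder under project_root, or None if missing or empty."""
    lib_dir = Path(project_root) / "llmware" / "lib"

    # The folder must exist and hold at least one entry (the compiled libraries).
    if lib_dir.is_dir() and any(lib_dir.iterdir()):
        return lib_dir
    return None
```

A `None` result suggests the clone or copy skipped the compiled shared libraries, which is also the first thing to check for the "Parser module not found" issue.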
+ +___ +# Platform Support + +**Platform Supported** + +- **Python 3.9+** (note that we just added support for 3.12 starting in llmware version 0.2.12) + + +- **System RAM**: recommended 16 GB RAM minimum (to run most local models on CPU) + + +- **OS Supported**: Mac OS M1/M2/M3, Windows, Linux Ubuntu 20/22. We regularly build and test on Windows and Linux platforms with and without CUDA drivers. + + +- **Deprecated OS**: Linux Aarch64 (0.2.6) and Mac x86 (0.2.10) - most features of llmware should work on these platforms, but new features integrated since those versions will not be available. If you have a particular need to work on one of these platforms, please raise an Issue, and we can work with you to try to find a solution. + + +- **Linux**: we build to GLIBC 2.31+ - so Linux versions with older GLIBC drivers will generally not work (e.g., Ubuntu 18). To check the GLIBC version, you can use the command `ldd --version`. If it is 2.31 or any higher version, it should work. + +___ + +___ +**Database** + +- LLMWare is an enterprise-grade data pipeline designed for persistent storage of key artifacts throughout the pipeline. We provide several options to parse 'in-memory' and write to jsonl files, but most of the functionality of LLMWare assumes that a persistent scalable data store will be used. + + +- There are three different types of data storage used in LLMWare: + + 1. **Text Collection database** - all of the LLMWare parsers, by default, parse and text chunk unstructured content (and associated metadata) into one of three databases used for text collections, organized in Libraries - **MongoDB**, **Postgres** and **SQLite**. -mkdir llmware -cd llmware + 2. **Vector database** - for storing and retrieving semantic embedding vectors, LLMWare supports the following vector databases - Milvus, PG Vector / Postgres, Qdrant, ChromaDB, Redis, Neo4J, Lance DB, Mongo-Atlas, Pinecone and FAISS. + + 3. 
**SQL Tables database** - for easily integrating table-based data into LLM workflows through the CustomTable class and for using in conjunction with a Text-2-SQL workflow - supported on Postgres and SQLite. -wget https://github.com/\ -llmware-ai/llmware/\ -blob/432b5530cda158f57442a3fe4a9f03a20945a41c/\ -wheel_archives/llmware-0.2.5-py3-none-any.whl -pip3 install llmware-0.2.5-py3-none-any.whl +- **Fast Start** option: you can start using SQLite locally without any separate installation by setting `LLMWareConfig.set_active_db("sqlite")` as shown in [configure_db_example](https://www.github.com/llmware-ai/llmware/blob/main/examples/Getting_Started/configure_db.py). For vector embedding examples, you can use ChromaDB, LanceDB or FAISS - all of which provide no-install options, so you can start using them right away. + + +- **Install DB dependencies**: we provide a number of Docker-Compose scripts which can be used, or follow install instructions provided by the database - generally easiest to install locally with Docker. + + +**LLMWare File Storage** + +- llmware stores a variety of artifacts during its operation locally in the /llmware_data path, which can be found as follows: + +```python +from llmware.configs import LLMWareConfig +llmware_fp = LLMWareConfig().get_llmware_path() +print("llmware_data path: ", llmware_fp) ``` -# When to use llmware +- to change the llmware path, we can change both the 'home' path, which is the main filepath, and the 'llmware_data' path name +as follows: + +```python + +from llmware.configs import LLMWareConfig -``llmware`` focuses on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely. 
+# changing the llmware home path - change home + llmware_path_name +LLMWareConfig().set_home("/my/new/local/home/path") +LLMWareConfig().set_llmware_path_name("llmware_data2") +# check the new llmware home path +llmware_fp = LLMWareConfig().get_llmware_path() +print("updated llmware path: ", llmware_fp) -# Usage + +``` + +___ + +___ +**Local Models** + +- LLMWare treats open source and locally deployed models as "first class citizens" with all classes, methods and examples designed to work first with smaller, specialized, locally-deployed models. +- By default, most models are pulled from public HuggingFace repositories, and cached locally. LLMWare will store all models locally at the /llmware_data/model_repo path, with all assets found in a folder tree named after the model. +- If a Pytorch model is pulled from HuggingFace, then it will appear in the default HuggingFace /.cache path. +- To view the local model path: ```python -from llmware.models import ModelCatalog +from llmware.configs import LLMWareConfig -# get all SLIM models, delivered as small, fast quantized tools -ModelCatalog().get_llm_toolkit() +model_fp = LLMWareConfig().get_model_repo_path() +print("model repo path: ", model_fp) -# see the model in action with test script included -ModelCatalog().tool_test_run("slim-sentiment-tool") ``` +___ + +# Common Troubleshooting Issues +___ + + +1. **Cannot install the pip package** + + -- Check your Python version. If using Python 3.9-3.11, then almost any version of llmware should work. If using an older Python (before 3.9), then it is likely that dependencies will fail in the pip process. If you are using Python 3.12, then you need to use llmware>=0.2.12. + + -- Dependency constraint error. If you receive a specific error around a dependency version constraint, then please raise an issue and include details about your OS, Python version, any unique elements in your virtual environment, and the specific error. + + +2. 
**Parser module not found** + + -- Check your OS and confirm that you are using a [supported platform](#platform-support). + -- If you cloned the repository, please confirm that the /lib folder has been copied into your local path. + + +3. **Pytorch Model not loading** + + -- Confirm the obvious stuff - correct model name, model exists in Huggingface repository, connected to the Internet with open ports for HTTPS connection, etc. + + -- Check Pytorch version - update Pytorch to >2.0, which is required for many recent models released in the last 6 months, and in some cases, may require other dependencies not included in the llmware package. + + +4. **GGUF Model not loading** + + -- Confirm that you are using llmware>=0.2.11 for the latest GGUF support. + + -- Confirm that you are using a [supported platform](#platform-support). We provide pre-built binaries for llama.cpp as a back-end GGUF engine on the following platforms: + + - Mac M1/M2/M3 - OS version 14 - "with accelerate framework" + - Mac M1/M2/M3 - OS older versions - "without accelerate framework" + - Windows - x86 + - Windows with CUDA + - Linux - x86 (Ubuntu 20+) + - Linux with CUDA (Ubuntu 20+) + +If you are using a different OS platform, you have the option to "bring your own llama.cpp" lib as follows: + +```python +from llmware.gguf_configs import GGUFConfigs +GGUFConfigs().set_config("custom_lib_path", "/path/to/your/libllama_binary") +``` + +If you have any trouble, feel free to raise an Issue and we can provide you with instructions and/or help compiling llama.cpp for your platform. + + -- Specific GGUF model - if you are successfully using other GGUF models, and only having problems with a specific model, then please raise an Issue, and share the specific model and architecture. + + +5. **Example not working as expected** - please raise an issue, so we can evaluate and fix any bugs in the example code. Also, pull requests are always especially welcomed with a fix or improvement in an example. 
+ + +6. **Model not leveraging CUDA available in environment.** + + -- **Check CUDA drivers installed correctly** - an easy check of the NVIDIA CUDA drivers is to run `nvidia-smi` and `nvcc --version` from the command line. Both commands should respond positively with details on the versions and implementations. Any error indicates that either the driver or the CUDA toolkit is not installed or recognized. Debugging the environment can be complicated at times and usually takes some trial and error. See the extensive [Nvidia Developer documentation](https://docs.nvidia.com) for troubleshooting steps specific to your environment. + + -- **Check CUDA drivers are up to date** - we build to CUDA 12.1, which translates to a minimum driver version of 525.60 on Linux, and 528.33 on Windows. + + -- **Pytorch model** - check that Pytorch is finding CUDA, e.g., `torch.cuda.is_available()` returns `True`. We have seen issues on Windows in particular, so confirm that your Pytorch version has been built with CUDA support. On Windows, we have found that you may need to install a CUDA-specific build of Pytorch, using the following command: + + ```pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121``` + + -- **GGUF model** - logs will be displayed on the screen confirming whether CUDA is being used or whether the model has fallen back to CPU. We run a custom CUDA install check, which you can run on your system with: + ```gpu_status = ModelCatalog().gpu_available``` + + If you have confirmed that CUDA is present, but fall-back to CPU is still being used, you can set GGUFConfigs to force CUDA: + ```GGUFConfigs().set_config("force_gpu", True)``` + + If you are looking to use specific optimizations, you can bring your own llama.cpp lib as follows: + ```GGUFConfigs().set_config("custom_lib_path", "/path/to/your/custom/llama_cpp_backend")``` + + -- If you cannot debug after these steps, then please raise an Issue. 
We are happy to dig in and work with you to run FAST local inference. + + +7. **Model result inconsistent** + + -- when loading the model, set `temperature=0.0` and `sample=False` -> this will give a deterministic output for better testing and debugging. + + -- usually the issue will be related to the retrieval step and formation of the Prompt, and as always, good pipelines and a little experimentation usually help! + + +# More information about the project - [see main repository](https://www.github.com/llmware-ai/llmware.git) + # About the project @@ -142,3 +296,27 @@ The company offers a Software as a Service (SaaS) Retrieval Augmented Generation {% endfor %} + + +--- + +---