updated ollama docs (#8995)

* updated ollama docs
* updated note
* added tip for docker
* updates
* updates
mindsdb · Apr 9, 2024 · b4b95be
1 parent 150ce9b
commit b4b95be

Showing 2 changed files with 165 additions and 111 deletions.
diff --git a/docs/integrations/ai-engines/ollama.mdx b/docs/integrations/ai-engines/ollama.mdx
@@ -3,82 +3,110 @@ title: Ollama
 sidebarTitle: Ollama
 ---
 
+This documentation describes the integration of MindsDB with [Ollama](https://ollama.com/), a tool that enables local deployment of large language models.
+The integration allows for the deployment of Ollama models within MindsDB, providing the models with access to data from various data sources.
 
-[Ollama](https://ollama.ai/) is a project that enables easy local deployment of Large Language Models (LLMs). 
+## Prerequisites
 
-All models supported by Ollama are available in MindsDB through this integration.
+Before proceeding, ensure the following prerequisites are met:
 
-<Warning>
-For now, this integration will only work in MacOS, with Linux and Windows to come later.
-</Warning>
+1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
+2. To use Ollama within MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
+3. Follow [this instruction](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) to download Ollama and run models locally.
 
+<Info>
+Here are the recommended system specifications:
 
-Locally deployed LLMs can be desirable for a wide variety of reasons. In this case, data privacy, developer feedback-loop speed and inference cost reduction can be powerful reasons to opt for a local LLM.
-
-Ideal predictive use cases, as in other LLM-focused integrations (e.g. OpenAI, Anthropic, Cohere), will be anything involving language understanding and generation, including but not limited to:
-- zero-shot text classification
-- sentiment analysis
-- question answering
-- summarization
-- translation
-
+- A working Ollama installation, as in point 3.
+- For 7B models, at least 8GB RAM is recommended.
+- For 13B models, at least 16GB RAM is recommended.
+- For 70B models, at least 64GB RAM is recommended.
+</Info>
 
 ## Setup
 
-* A macOS machine, M1 chip or greater. 
-* A working Ollama installation. For instructions refer to their [webpage](https://ollama.ai). This step should be really simple.
-* For 7B models, at least 8GB RAM is recommended. 
-* For 13B models, at least 16GB RAM is recommended. 
-* For 70B models, at least 64GB RAM is recommended.
+Create an AI engine from the [Ollama handler](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/ollama_handler).
 
-More information [here](https://ollama.ai/library/llama2). Minimum specs can vary depending on the model.
+```sql
+CREATE ML_ENGINE ollama_engine
+FROM ollama;
+```
 
-### AI Engine
+Create a model using `ollama_engine` as an engine.
+
+```sql
+CREATE MODEL ollama_model
+PREDICT completion
+USING
+   engine = 'ollama_engine',   -- engine name as created via CREATE ML_ENGINE
+   model_name = 'model-name',  -- model run with 'ollama run model-name'
+   ollama_serve_url = 'http://localhost:11434';
+```
 
-Before creating a model, it is required to create an AI engine based on the provided handler.
+<Tip>
+If you run Ollama and MindsDB in separate Docker containers, `localhost` inside the MindsDB container does not point to Ollama. Use an address that is reachable from the MindsDB container instead. For example, `ollama_serve_url = 'http://host.docker.internal:11434'`.
+</Tip>
 
+You can find [available models here](https://github.com/ollama/ollama?tab=readme-ov-file#model-library).
 
-You can create an Ollama engine using this command:
+## Usage
 
-```sql
-CREATE ML_ENGINE ollama FROM ollama;
-```
+The following usage examples utilize `ollama_engine` to create a model with the `CREATE MODEL` statement.
 
-The name of the engine (here, `ollama`) should be used as a value for the `engine` parameter in the `USING` clause of the `CREATE MODEL` statement.
+Deploy and use the `llama2` model.
 
-### AI Model
+First, [download Ollama](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and run the model locally by executing `ollama run llama2`.
 
-The [`CREATE MODEL`](/sql/create/model) statement is used to create, train, and deploy models within MindsDB.
+Now deploy this model within MindsDB.
 
 ```sql
-CREATE MODEL mindsdb.my_llama2
+CREATE MODEL llama2_model
 PREDICT completion
-USING 
-    engine = 'ollama',
-    model_name = 'llama2'
+USING
+   engine = 'ollama_engine',
+   model_name = 'llama2';
 ```
 
-Where:
-
-| Name              | Description                                                               |
-|-------------------|---------------------------------------------------------------------------|
-| `engine`          | It defines the Ollama engine.                                          |
-| `model_name`      | It is used to provide the name of the model to be used |
+Query the model to get predictions.
 
-Supported commands for describing Ollama models are:
+```sql
+SELECT text, completion
+FROM llama2_model
+WHERE text = 'Hello';
+```
 
-1. `DESCRIBE ollama_model;`
-2. `DESCRIBE ollama_model.model;`
-3. `DESCRIBE ollama_model.features;`
+Here is the output:
 
+```sql
++-------+------------+
+| text  | completion |
++-------+------------+
+| Hello | Hello!     |
++-------+------------+
+```
 
+You can override the prompt message as below:
 
-## Usage
+```sql
+SELECT text, completion
+FROM llama2_model
+WHERE text = 'Hello'
+USING 
+   prompt_template = 'Answer using exactly five words: {{text}}:';
+```
 
-Once you have connected to an Ollama model, you can use it to make predictions.
+Here is the output:
 
 ```sql
-SELECT text, completion
-FROM my_llama2
-WHERE text = 'hi there!';
-```
++-------+------------------------------------+
+| text  | completion                         |
++-------+------------------------------------+
+| Hello | Hello! *smiles* How are you today? |
++-------+------------------------------------+
+```
+
+<Tip>
+**Next Steps**
+
+Go to the [Use Cases](/use-cases/overview) section to see more examples.
+</Tip>
diff --git a/mindsdb/integrations/handlers/ollama_handler/README.md b/mindsdb/integrations/handlers/ollama_handler/README.md
@@ -1,86 +1,112 @@
-# Ollama handler 
+---
+title: Ollama
+sidebarTitle: Ollama
+---
 
-## Briefly describe the ML framework this handler integrates with MindsDB, and how?
-[Ollama](https://ollama.ai/) is a project that enables easy local deployment of Large Language Models (LLMs). All models supported by Ollama are available in MindsDB through this integration.
+This documentation describes the integration of MindsDB with [Ollama](https://ollama.com/), a tool that enables local deployment of large language models.
+The integration allows for the deployment of Ollama models within MindsDB, providing the models with access to data from various data sources.
 
-For now, this integration will only work in MacOS and Linux. Windows is untested.
+## Prerequisites
 
-Call this handler by
-`USING ENGINE="ollama"`, you can see a full example at the end of this readme.
+Before proceeding, ensure the following prerequisites are met:
 
-## Why is this integration useful? What does the ideal predictive use case for this integration look like? When would you definitely not use this integration?
-Locally deployed LLMs can be desirable for a wide variety of reasons. In this case, data privacy, developer feedback-loop speed and inference cost reduction can be powerful reasons to opt for a local LLM.
+1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
+2. To use Ollama within MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
+3. Follow [this instruction](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) to download Ollama and run models locally.
 
-Ideal predictive use cases, as in other LLM-focused integrations (e.g. OpenAI, Anthropic, Cohere), will be anything involving language understanding and generation, including but not limited to:
-- zero-shot text classification
-- sentiment analysis
-- question answering
-- summarization
-- translation
+<Info>
+Here are the recommended system specifications:
 
-Some current limitations of local LLMs:
-- overall weaker performance (ranging from "somewhat" to "a lot") than commercial cloud-based LLMs, particularly GPT-4. Please study options carefully and benchmark thoroughly to ensure your LLM is at the right level of performance for your use case before deploying to production.
-- steep entry barrier due to required hardware specs (macOS only, M1 chip or greater, a lot of RAM depending on model size)
+- A working Ollama installation, as in point 3.
+- For 7B models, at least 8GB RAM is recommended.
+- For 13B models, at least 16GB RAM is recommended.
+- For 70B models, at least 64GB RAM is recommended.
+</Info>
 
-## Are models created with this integration fast and scalable, in general?
-Model training is not required, as these are pretrained models. 
+## Setup
 
-Inference is generally fast, however stream generation is not supported at this time in MindsDB, so completions are only returned once the model has finished generating the entire sequence.
+Create an AI engine from the [Ollama handler](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/ollama_handler).
 
-## What are the recommended system specifications?
-
-* A macOS machine, M1 chip or greater. 
-* A working Ollama installation. For instructions refer to their [webpage](https://ollama.ai). This step should be really simple.
-* For 7B models, at least 8GB RAM is recommended. 
-* For 13B models, at least 16GB RAM is recommended. 
-* For 70B models, at least 64GB RAM is recommended.
-
-More information [here](https://ollama.ai/library/llama2). Minimum specs can vary depending on the model.
+```sql
+CREATE ML_ENGINE ollama_engine
+FROM ollama;
+```
 
-## To what degree can users control the underlying framework by passing parameters via the USING syntax?
-The prompt template can be overridden at prediction time, e.g.:
+Create a model using `ollama_engine` as an engine.
 
 ```sql
--- example: override template at prediction time
-SELECT text, completion
-FROM my_llama2
-WHERE text = 'hi there!';
-USING 
-prompt_template = 'Answer using exactly five words: {{text}}:';
+CREATE MODEL ollama_model
+PREDICT completion
+USING
+   engine = 'ollama_engine',   -- engine name as created via CREATE ML_ENGINE
+   model_name = 'model-name',  -- model run with 'ollama run model-name'
+   ollama_serve_url = 'http://localhost:11434';
 ```
 
-## Does this integration offer model explainability or insights via the DESCRIBE syntax?
-It replicates the information exposed by the Ollama API, plus a few additional MindsDB-specific fields.
+<Tip>
+If you run Ollama and MindsDB in separate Docker containers, `localhost` inside the MindsDB container does not point to Ollama. Use an address that is reachable from the MindsDB container instead. For example, `ollama_serve_url = 'http://host.docker.internal:11434'`.
+</Tip>
 
-Supported commands are:
-1. `DESCRIBE ollama_model;`
-2. `DESCRIBE ollama_model.model;`
-3. `DESCRIBE ollama_model.features;`
+You can find [available models here](https://github.com/ollama/ollama?tab=readme-ov-file#model-library).
 
-## Does this integration support fine-tuning pre-existing models (i.e. is the update() method implemented)? Are there any caveats?
-Not at this time.
+## Usage
 
-## Any directions for future work in subsequent versions of the handler?
-A few are commented in the code:
-1. add support for overriding modelfile params (e.g. temperature)
-2. add support for storing `context` short conversational memory
-3. actually store all model artifacts in the engine storage, instead of the internal Ollama mechanism. This may require upstream changes, though.
+The following usage examples utilize `ollama_engine` to create a model with the `CREATE MODEL` statement.
 
-## Please provide a minimal SQL example that uses this ML engine (pointers to integration tests in the PR also valid)
-```sql
-CREATE ML_ENGINE ollama FROM ollama;
+Deploy and use the `llama2` model.
+
+First, [download Ollama](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and run the model locally by executing `ollama run llama2`.
 
-CREATE MODEL my_llama2
+Now deploy this model within MindsDB.
+
+```sql
+CREATE MODEL llama2_model
 PREDICT completion
 USING
-model_name = 'llama2',
-engine = 'ollama';
-ollama_serve_url = 'ollama:11434'
+   engine = 'ollama_engine',
+   model_name = 'llama2';
+```
 
-DESCRIBE my_llama2.model;
-DESCRIBE my_llama2.features;
+Query the model to get predictions.
 
+```sql
 SELECT text, completion
-FROM my_llama2
-WHERE text = 'hi there!';
-```
+FROM llama2_model
+WHERE text = 'Hello';
+```
+
+Here is the output:
+
+```sql
++-------+------------+
+| text  | completion |
++-------+------------+
+| Hello | Hello!     |
++-------+------------+
+```
+
+You can override the prompt message as below:
+
+```sql
+SELECT text, completion
+FROM llama2_model
+WHERE text = 'Hello'
+USING 
+   prompt_template = 'Answer using exactly five words: {{text}}:';
+```
+
+Here is the output:
+
+```sql
++-------+------------------------------------+
+| text  | completion                         |
++-------+------------------------------------+
+| Hello | Hello! *smiles* How are you today? |
++-------+------------------------------------+
+```
+
+<Tip>
+**Next Steps**
+
+Go to the [Use Cases](/use-cases/overview) section to see more examples.
+</Tip>
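
A short sketch of how the updated docs might be exercised end to end, assuming `llama2_model` was created as in the Usage section above. The `DESCRIBE` forms are the ones listed in the text this commit removes, so whether the handler still supports all three is an assumption. The data source `my_datasource` and its table `my_table` (with a `text` column) are hypothetical names used only for illustration.

```sql
-- Inspect the deployed model. These forms come from the removed docs text;
-- continued support after this commit is an assumption.
DESCRIBE llama2_model;
DESCRIBE llama2_model.model;
DESCRIBE llama2_model.features;

-- Batch predictions: join a table with the model.
-- my_datasource.my_table is a hypothetical, already connected data source
-- with a text column.
SELECT t.text, m.completion
FROM my_datasource.my_table AS t
JOIN llama2_model AS m;
```

The join illustrates the stated goal of the integration, which is to give Ollama models access to data from connected data sources, rather than passing one `text` value at a time in a `WHERE` clause.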