updated ollama docs #8995

Merged · 5 commits · Apr 9, 2024
126 changes: 77 additions & 49 deletions docs/integrations/ai-engines/ollama.mdx
---
title: Ollama
sidebarTitle: Ollama
---

This documentation describes the integration of MindsDB with [Ollama](https://ollama.com/), a tool that enables local deployment of large language models.
The integration allows for the deployment of Ollama models within MindsDB, providing the models with access to data from various data sources.

All models supported by Ollama are available in MindsDB through this integration. Locally deployed LLMs can be desirable for many reasons, such as data privacy, faster developer feedback loops, and reduced inference costs. Typical use cases involve language understanding and generation, including zero-shot text classification, sentiment analysis, question answering, summarization, and translation.

## Prerequisites

Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
2. To use Ollama within MindsDB, install the required dependencies following [these instructions](/setup/self-hosted/docker#install-dependencies).
3. Follow [these instructions](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) to download Ollama and run models locally.

<Info>
Here are the recommended system specifications:

- A working Ollama installation, as in point 3.
- For 7B models, at least 8GB RAM is recommended.
- For 13B models, at least 16GB RAM is recommended.
- For 70B models, at least 64GB RAM is recommended.

Minimum specifications vary by model; see the [Ollama library](https://ollama.ai/library/llama2) for details.
</Info>

## Setup

Create an AI engine from the [Ollama handler](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/ollama_handler).

```sql
CREATE ML_ENGINE ollama_engine
FROM ollama;
```
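
To confirm the engine was registered, you can list the available ML engines. This is an optional check, assuming the `SHOW ML_ENGINES` command is available in your MindsDB version.

```sql
-- List registered ML engines; 'ollama_engine' should appear in the output.
SHOW ML_ENGINES;
```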

Create a model using `ollama_engine` as an engine.

```sql
CREATE MODEL ollama_model
PREDICT completion
USING
engine = 'ollama_engine', -- engine name as created via CREATE ML_ENGINE
model_name = 'model-name', -- model run with 'ollama run model-name'
ollama_serve_url = 'http://localhost:11434';
```

<Tip>
If you run Ollama inside a Docker container, use the following parameter value: `ollama_serve_url = 'http://host.docker.internal:11434'`.
</Tip>
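
For example, when MindsDB runs in Docker alongside a local Ollama instance, the model creation could look like the sketch below; the model name is illustrative, so adjust it and the URL to your setup.

```sql
CREATE MODEL ollama_docker_model
PREDICT completion
USING
engine = 'ollama_engine',
model_name = 'llama2',
-- reach the host machine's Ollama server from inside the container
ollama_serve_url = 'http://host.docker.internal:11434';
```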

You can find [available models here](https://github.com/ollama/ollama?tab=readme-ov-file#model-library).
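
Once a model is created, you can inspect it with the `DESCRIBE` syntax, which replicates the information exposed by the Ollama API plus a few additional MindsDB-specific fields:

```sql
DESCRIBE ollama_model;
DESCRIBE ollama_model.model;
DESCRIBE ollama_model.features;
```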

## Usage

The following usage examples utilize `ollama_engine` to create a model with the `CREATE MODEL` statement.

Deploy and use the `llama2` model.

First, [download Ollama](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and run the model locally by executing `ollama run llama2`.

Now deploy this model within MindsDB.

```sql
CREATE MODEL llama2_model
PREDICT completion
USING
engine = 'ollama_engine',
model_name = 'llama2';
```

Query the model to get predictions.

```sql
SELECT text, completion
FROM llama2_model
WHERE text = 'Hello';
```

Here is the output:

```sql
+-------+------------+
| text  | completion |
+-------+------------+
| Hello | Hello!     |
+-------+------------+
```

You can override the prompt message as follows:

```sql
SELECT text, completion
FROM llama2_model
WHERE text = 'Hello'
USING
prompt_template = 'Answer using exactly five words: {{text}}:';
```

Here is the output:

```sql
+-------+------------------------------------+
| text  | completion                         |
+-------+------------------------------------+
| Hello | Hello! *smiles* How are you today? |
+-------+------------------------------------+
```
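
You can also generate completions in bulk by joining the model to a table from a connected data source. This is a minimal sketch; `example_db` and its `questions` table with a `text` column are hypothetical, so substitute your own data source.

```sql
-- Each row of the input table produces one completion.
SELECT d.text, m.completion
FROM example_db.questions AS d
JOIN llama2_model AS m;
```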

<Tip>
**Next Steps**

Go to the [Use Cases](/use-cases/overview) section to see more examples.
</Tip>
150 changes: 88 additions & 62 deletions mindsdb/integrations/handlers/ollama_handler/README.md
---
title: Ollama
sidebarTitle: Ollama
---

This documentation describes the integration of MindsDB with [Ollama](https://ollama.com/), a tool that enables local deployment of large language models.
The integration allows for the deployment of Ollama models within MindsDB, providing the models with access to data from various data sources.

All models supported by Ollama are available in MindsDB through this integration. Locally deployed LLMs can be desirable for many reasons, such as data privacy, faster developer feedback loops, and reduced inference costs. Typical use cases involve language understanding and generation, including zero-shot text classification, sentiment analysis, question answering, summarization, and translation. Keep in mind that local LLMs are generally weaker than commercial cloud-based LLMs, particularly GPT-4, so benchmark your chosen model carefully before deploying it to production.
## Prerequisites

Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
2. To use Ollama within MindsDB, install the required dependencies following [these instructions](/setup/self-hosted/docker#install-dependencies).
3. Follow [these instructions](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) to download Ollama and run models locally.

<Info>
Here are the recommended system specifications:

- A working Ollama installation, as in point 3.
- For 7B models, at least 8GB RAM is recommended.
- For 13B models, at least 16GB RAM is recommended.
- For 70B models, at least 64GB RAM is recommended.

Minimum specifications vary by model; see the [Ollama library](https://ollama.ai/library/llama2) for details.
</Info>

No training is required, as Ollama serves pretrained models. Inference is generally fast; however, stream generation is not supported in MindsDB at this time, so completions are returned only once the model has finished generating the entire sequence.
## Setup

Create an AI engine from the [Ollama handler](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/ollama_handler).

```sql
CREATE ML_ENGINE ollama_engine
FROM ollama;
```

Create a model using `ollama_engine` as an engine.

```sql
CREATE MODEL ollama_model
PREDICT completion
USING
engine = 'ollama_engine', -- engine name as created via CREATE ML_ENGINE
model_name = 'model-name', -- model run with 'ollama run model-name'
ollama_serve_url = 'http://localhost:11434';
```

<Tip>
If you run Ollama inside a Docker container, use the following parameter value: `ollama_serve_url = 'http://host.docker.internal:11434'`.
</Tip>

You can find [available models here](https://github.com/ollama/ollama?tab=readme-ov-file#model-library).
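
Once a model is created, you can inspect it with the `DESCRIBE` syntax, which replicates the information exposed by the Ollama API plus a few additional MindsDB-specific fields:

```sql
DESCRIBE ollama_model;
DESCRIBE ollama_model.model;
DESCRIBE ollama_model.features;
```

Note that fine-tuning of pre-existing models through this integration is not supported at this time.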

## Usage

The following usage examples utilize `ollama_engine` to create a model with the `CREATE MODEL` statement.

Deploy and use the `llama2` model.

First, [download Ollama](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and run the model locally by executing `ollama run llama2`.

Now deploy this model within MindsDB.

```sql
CREATE MODEL llama2_model
PREDICT completion
USING
engine = 'ollama_engine',
model_name = 'llama2';
```

Query the model to get predictions.

```sql
SELECT text, completion
FROM llama2_model
WHERE text = 'Hello';
```

Here is the output:

```sql
+-------+------------+
| text  | completion |
+-------+------------+
| Hello | Hello!     |
+-------+------------+
```

You can override the prompt message as follows:

```sql
SELECT text, completion
FROM llama2_model
WHERE text = 'Hello'
USING
prompt_template = 'Answer using exactly five words: {{text}}:';
```

Here is the output:

```sql
+-------+------------------------------------+
| text  | completion                         |
+-------+------------------------------------+
| Hello | Hello! *smiles* How are you today? |
+-------+------------------------------------+
```

<Tip>
**Next Steps**

Go to the [Use Cases](/use-cases/overview) section to see more examples.
</Tip>