Merge pull request #11 from ericmjl:ollama
Ollama
ericmjl committed Oct 29, 2023
2 parents e13800d + d10e7e1 commit 1f090b8
Showing 9 changed files with 230 additions and 31 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/code-style.yaml
@@ -9,4 +9,6 @@ jobs:
  steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
+     with:
+       python-version: 3.11
    - uses: pre-commit/action@v2.0.0
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/psf/black
-  rev: 23.10.0
+  rev: 23.10.1
hooks:
- id: black
- repo: https://github.com/kynan/nbstripout
@@ -26,7 +26,7 @@ repos:
- "--config=pyproject.toml"
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
-  rev: v0.1.1
+  rev: v0.1.3
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
74 changes: 64 additions & 10 deletions docs/index.md
@@ -15,23 +15,28 @@ To install LLaMaBot:
pip install llamabot
```

-## How to use
+## Get access to LLMs

+### Option 1: Using local models with Ollama

-### Obtain an OpenAI API key
+LlamaBot supports using local models through Ollama.
+To do so, head over to the [Ollama website](https://ollama.ai) and install Ollama.
+Then follow the instructions below.
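For example, after installing Ollama you can pull a model locally before pointing LlamaBot at it (the model name here is just an illustration; any model from the Ollama library works):

```shell
# Download a model from the Ollama library (one-time, several GB).
ollama pull llama2:13b

# Optional: sanity-check the model from the command line.
ollama run llama2:13b "Say hello in one sentence."
```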

-Obtain an OpenAI API key and set it as the environment variable `OPENAI_API_KEY`.
-(Here's a [reference][envvar] on what an environment variable is, if you're not sure.)
+### Option 2: Use the OpenAI API

-[envvar]: https://ericmjl.github.io/essays-on-data-science/software-skills/environment-variables/
+Obtain an OpenAI API key, then configure LlamaBot to use the API key by running:

-We recommend setting the environment variable in a `.env` file
-in the root of your project repository.
-From there, `llamabot` will automagically load the environment variable for you.
+```bash
+llamabot configure
+```

+## How to use

-### Simple Bot
+### SimpleBot

The simplest use case of LLaMaBot
-is to create a simple bot that keeps no record of chat history.
+is to create a `SimpleBot` that keeps no record of chat history.
This is effectively the same as a _stateless function_
that you program with natural language instructions rather than code.
This is useful for prompt experimentation,
@@ -53,6 +58,55 @@ For example:
feynman("Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed. However, most of these tools cannot accurately predict functional annotations, such as enzyme commission (EC) number, for less-studied proteins or those with previously uncharacterized functions or multiple activities. We present a machine learning algorithm named CLEAN (contrastive learning–enabled enzyme annotation) to assign EC numbers to enzymes with better accuracy, reliability, and sensitivity compared with the state-of-the-art tool BLASTp. The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes, (ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers—functions that we demonstrate by systematic in silico and in vitro experiments. We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes, thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.")
```

This will return something that looks like:

```text
Alright, let's break this down.
Enzymes are like little biological machines that help speed up chemical reactions in our
bodies. Each enzyme has a specific job, or function, and we use something called an
Enzyme Commission (EC) number to categorize these functions.
Now, the problem is that we don't always know what function an enzyme has, especially if
it's a less-studied or new enzyme. This is where computational tools come in. They try
to predict the function of these enzymes, but they often struggle to do so accurately.
So, the folks here have developed a new tool called CLEAN, which stands for contrastive
learning–enabled enzyme annotation. This tool uses a machine learning algorithm, which
is a type of artificial intelligence that learns from data to make predictions or
decisions.
CLEAN uses a method called contrastive learning. Imagine you have a bunch of pictures of
cats and dogs, and you want to teach a machine to tell the difference. You'd show it
pairs of pictures, some of the same animal (two cats or two dogs) and some of different
animals (a cat and a dog). The machine would learn to tell the difference by contrasting
the features of the two pictures. That's the basic idea behind contrastive learning.
CLEAN uses this method to predict the EC numbers of enzymes more accurately than
previous tools. It can confidently annotate understudied enzymes, correct mislabeled
enzymes, and even identify enzymes that have more than one function.
The creators of CLEAN have tested it with both computer simulations and lab experiments,
and they believe it will be a valuable tool for predicting the functions of unknown
enzymes. This could have big implications for fields like genomics, synthetic biology,
and biocatalysis, which all rely on understanding how enzymes work.
```
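The stateless-function idea behind `SimpleBot` can be sketched without any LLM dependency: each call sends only the system prompt plus the new message, never any prior history. The `make_simple_bot` helper and stub completion function below are illustrative, not llamabot's real API:

```python
def make_simple_bot(system_prompt, complete):
    """Return a stateless bot: no chat history is kept between calls."""

    def bot(human_message):
        # Every call rebuilds the message list from scratch -- statelessness.
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": human_message},
        ]
        return complete(messages)

    return bot


# Stub completion function standing in for a real LLM call (illustrative).
def echo_complete(messages):
    return f"[{messages[0]['content']}] {messages[-1]['content']}"


feynman = make_simple_bot("You are Richard Feynman.", echo_complete)
print(feynman("Explain enzymes simply."))
```

Because `bot` closes over only the system prompt, two calls with the same message always produce the same request — the "stateless function programmed with natural language" described above.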

LlamaBot defaults to using the OpenAI API for convenience.
However, if you'd like to use an Ollama local model instead:

```python
from llamabot import SimpleBot
bot = SimpleBot(
    "You are Richard Feynman. You will be given a difficult concept, and your task is to explain it back.",
    model_name="llama2:13b"
)
```

Simply specify the `model_name` keyword argument
and provide a model name from the [Ollama library of models](https://ollama.ai/library).
(The same can be done for the `ChatBot` and `QueryBot` classes below!)

### Chat Bot

To experiment with a Chat Bot in the Jupyter notebook,
22 changes: 18 additions & 4 deletions llamabot/bot/model_dispatcher.py
@@ -12,6 +12,7 @@
from langchain.callbacks.base import BaseCallbackManager
from time import sleep
from loguru import logger
+from functools import partial

# get this list from: https://ollama.ai/library
ollama_model_keywords = [
@@ -56,19 +57,32 @@ def create_model(
This is necessary to validate b/c LangChain doesn't do the validation for us.
Example usage:
```python
-# use the vicuna model
-model = create_model(model_name="vicuna")
+# use the llama2 model
+model = create_model("llama2")
+# use codellama with a temperature of 0.5
+model = create_model("codellama:13b", temperature=0.5)
```
:param model_name: The name of the model to use.
:param temperature: The model temperature to use.
:param streaming: (LangChain config) Whether to stream the output to stdout.
:param verbose: (LangChain config) Whether to print debug messages.
:return: The model.
"""
-    ModelClass = ChatOpenAI
+    # We use a `partial` here to ensure that we have the correct way of specifying
+    # a model name between ChatOpenAI and ChatOllama.
+    ModelClass = partial(ChatOpenAI, model_name=model_name)
     if model_name.split(":")[0] in ollama_model_keywords:
-        ModelClass = ChatOllama
         launch_ollama(model_name, verbose=verbose)
+        ModelClass = partial(ChatOllama, model=model_name)

     return ModelClass(
-        model_name=model_name,
         temperature=temperature,
         streaming=streaming,
         verbose=verbose,
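The `partial`-based dispatch in `create_model` can be sketched in isolation. Below, stub classes stand in for LangChain's `ChatOpenAI` and `ChatOllama`, and the keyword list is abbreviated — all names here are illustrative, not the library's real classes:

```python
from functools import partial


# Stub stand-ins for LangChain's ChatOpenAI / ChatOllama (illustrative only).
class StubChatOpenAI:
    def __init__(self, model_name, temperature=0.0):
        self.model_name = model_name
        self.temperature = temperature


class StubChatOllama:
    def __init__(self, model, temperature=0.0):
        self.model = model
        self.temperature = temperature


# Abbreviated; the real list comes from https://ollama.ai/library
ollama_model_keywords = ["llama2", "codellama", "vicuna"]


def create_model(model_name, temperature=0.0):
    # Bind the model name via `partial` because the two classes spell the
    # keyword differently: `model_name=` for OpenAI, `model=` for Ollama.
    ModelClass = partial(StubChatOpenAI, model_name=model_name)
    # "llama2:13b" -> "llama2": the text before ":" identifies the model family.
    if model_name.split(":")[0] in ollama_model_keywords:
        ModelClass = partial(StubChatOllama, model=model_name)
    return ModelClass(temperature=temperature)


print(type(create_model("gpt-4")).__name__)       # -> StubChatOpenAI
print(type(create_model("llama2:13b")).__name__)  # -> StubChatOllama
```

Binding the name into the class with `partial` lets the final `ModelClass(...)` call pass only the shared keyword arguments, which is exactly why the explicit `model_name=model_name` line disappears from the `return` statement in the diff.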
1 change: 0 additions & 1 deletion llamabot/bot/querybot.py
@@ -139,7 +139,6 @@ def __init__(
self.response_tokens = response_tokens
self.history_tokens = history_tokens

-    # @validate_call
def __call__(
self,
query: str,
1 change: 1 addition & 0 deletions llamabot/doc_processor.py
@@ -14,6 +14,7 @@
".xlsx": "PandasExcelReader",
".md": "MarkdownReader",
".ipynb": "IPYNBReader",
+    ".html": "UnstructuredReader",
}
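The extension table above lends itself to a simple suffix-based dispatch. Here is a minimal sketch of the pattern — the reader names mirror the table, but the `reader_for` helper is hypothetical, not part of llamabot:

```python
from pathlib import Path

# Mirrors the extension-to-reader table in doc_processor.py (abbreviated).
EXTENSION_LOADER_MAPPING = {
    ".xlsx": "PandasExcelReader",
    ".md": "MarkdownReader",
    ".ipynb": "IPYNBReader",
    ".html": "UnstructuredReader",
}


def reader_for(path):
    """Pick a reader name by file suffix (hypothetical helper, for illustration)."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_LOADER_MAPPING[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}")


print(reader_for("notes.html"))  # -> UnstructuredReader
```

A dict keyed on `Path.suffix` keeps adding a new file type — like the `.html` entry in this commit — to a one-line change.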


12 changes: 6 additions & 6 deletions scratch_notebooks/blogging_assistant.ipynb
@@ -7,7 +7,7 @@
"outputs": [],
"source": [
"%load_ext autoreload\n",
-"%autoreload 2"
+"%autoreload 2\n"
]
},
{
@@ -51,7 +51,7 @@
"with open(here() / \"data/blog_text.txt\", \"r+\") as f:\n",
" blog_text = f.read()\n",
"\n",
-"response = bot(blog_tagger_and_summarizer(blog_text))"
+"response = bot(blog_tagger_and_summarizer(blog_text))\n"
]
},
{
@@ -63,7 +63,7 @@
"from llamabot.prompt_library.output_formatter import coerce_dict\n",
"\n",
"output = coerce_dict(response.content)\n",
-"output"
+"output\n"
]
},
{
@@ -75,7 +75,7 @@
"import json\n",
"\n",
"answers = json.loads(response.content)\n",
-"answers[\"summary\"]"
+"answers[\"summary\"]\n"
]
},
{
@@ -85,7 +85,7 @@
"outputs": [],
"source": [
"for tag in answers[\"tags\"]:\n",
-" print(tag)"
+" print(tag)\n"
]
}
],
@@ -105,7 +105,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.11.5"
}
},
"nbformat": 4,
12 changes: 6 additions & 6 deletions scratch_notebooks/diffbot.ipynb
@@ -7,7 +7,7 @@
"outputs": [],
"source": [
"from llamabot import SimpleBot\n",
-"from llamabot.prompt_library.diffbot import diffbot, get_github_diff"
+"from llamabot.prompt_library.diffbot import diffbot, get_github_diff\n"
]
},
{
@@ -19,7 +19,7 @@
"url = \"https://github.com/pyjanitor-devs/pyjanitor/pull/1262\"\n",
"\n",
"\n",
-"# print(get_github_diff(url))"
+"# print(get_github_diff(url))\n"
]
},
{
@@ -38,7 +38,7 @@
"\n",
"diff = get_github_diff(\"https://github.com/pyjanitor-devs/pyjanitor/pull/1262\")\n",
"\n",
-"diffbot(describe_advantages(diff))"
+"diffbot(describe_advantages(diff))\n"
]
},
{
@@ -47,7 +47,7 @@
"metadata": {},
"outputs": [],
"source": [
-"diffbot(suggest_improvements(diff))"
+"diffbot(suggest_improvements(diff))\n"
]
},
{
@@ -105,7 +105,7 @@
"metadata": {},
"outputs": [],
"source": [
-"asdfasdfadsf"
+"asdfasdfadsf\n"
]
}
],
@@ -125,7 +125,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.16"
+"version": "3.11.5"
}
},
"nbformat": 4,
