# Use Remote and Local Granite Code Models with LangChain

## Introduction and Setup

This notebook demonstrates inference calls against a Granite Code model hosted remotely on [Replicate](https://replicate.com/) and against one running locally with [Ollama](https://ollama.com/).

### Install Granite `utils` package

This package is a thin shim that provides helper functions used by the notebooks.

For the implementation of these functions, see the [utils repo](https://github.com/ibm-granite-community/utils/tree/main).

In [None]:
!pip install git+https://github.com/ibm-granite-community/utils

In [None]:
from ibm_granite_community.langchain_utils import find_langchain_model

### Define a Prompt

The cells below demonstrate a remote option and a local option for model inference.

Both will perform a blocking call using the following prompt:

In [None]:
prompt = """
    Show me a SQL query that fetches all columns for the first 50 rows
    in a table named 'users'."""
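
The triple-quoted string above keeps its leading newline and indentation. Models generally tolerate this, but if you prefer a single-line prompt, the following is a minimal sketch of one way to collapse the whitespace. The `normalized_prompt` name is illustrative and is not used elsewhere in this notebook.

```python
# Same prompt text as the cell above, repeated so this sketch is self-contained.
prompt = """
    Show me a SQL query that fetches all columns for the first 50 rows
    in a table named 'users'."""

# str.split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines), so joining with a single space
# yields a one-line prompt with no leading or trailing whitespace.
normalized_prompt = " ".join(prompt.split())

print(normalized_prompt)
```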

## Remote Model using Replicate

### Establish Replicate Account

To use this remote option, create an account at [Replicate](https://replicate.com).

### Add credit to your Replicate Account (optional)

To lower the barrier to trying the Granite Code models on the Replicate platform,
use [this link](https://replicate.com/invites/a8717bfe-2f3d-4a52-88ed-1356231cdf03) to add a
small amount of credit to your Replicate account.

### Provide your API Token

Obtain your `REPLICATE_API_TOKEN` at [replicate.com/account/api-tokens](https://replicate.com/account/api-tokens).

There are three ways to provide this value to the cells below, in order of precedence:

1. As an environment variable
2. As a Google Colab secret
3. Supplied by the user using `getpass()`
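
The precedence order above can be sketched roughly as follows. This is an illustration only, not the `utils` package's actual implementation: the `resolve_token` name is hypothetical, and the Colab branch assumes the `google.colab.userdata` module, which is only importable inside a Colab runtime.

```python
import os
from getpass import getpass


def resolve_token(name: str = "REPLICATE_API_TOKEN") -> str:
    """Hypothetical helper: look up a secret using the three-step precedence."""
    # 1. Environment variable takes priority.
    value = os.environ.get(name)
    if value:
        return value

    # 2. Google Colab secret (assumption: running inside Colab with the
    #    secret defined and notebook access granted).
    try:
        from google.colab import userdata  # fails with ImportError outside Colab

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not in Colab, or the secret is missing or inaccessible: fall through.
        pass

    # 3. Last resort: prompt the user without echoing the input.
    return getpass(f"Enter {name}: ")
```

Checking the environment first means a token exported in your shell is picked up without any prompt, which is convenient for automated runs.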

### Choose a Model

Two Granite Code models are available in the [`ibm-granite`](https://replicate.com/ibm-granite) org at Replicate.

The `find_langchain_model` function below imports the `replicate` package.

In [None]:
model_id = "ibm-granite/granite-8b-code-instruct-128k"
# model_id = "ibm-granite/granite-20b-code-instruct-8k"

granite_via_replicate = find_langchain_model(platform="Replicate", model_id=model_id)

### Perform Inference

In [None]:
replicate_response = granite_via_replicate.invoke(prompt)

print(f"Granite response from Replicate: {replicate_response}")

## Local Model using Ollama

### Install Dependencies

* [Download and Install Ollama](https://ollama.com/download)
* Start the Ollama Server: `ollama serve`

Then, in a new terminal window, pull one or more of the following models. Each download takes a little while:

- Granite Code 20b: `ollama pull granite-code:20b`
- Granite Code 8b: `ollama pull granite-code:8b`
- Granite Code 3b: `ollama pull granite-code:3b`
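
Before moving on, it can help to confirm that the server is running and that the pull succeeded. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally available models. It assumes the server listens on the default address `http://localhost:11434`; the function names are illustrative and not part of the `utils` package.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # assumed default; adjust if yours differs


def model_names(payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> Optional[List[str]]:
    """Return the names of pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except OSError:
        # URLError and connection errors are OSError subclasses;
        # most likely `ollama serve` is not running.
        return None


if __name__ == "__main__":
    names = list_local_models()
    if names is None:
        print("Ollama server not reachable; is `ollama serve` running?")
    else:
        print("Local models:", names)
```

If the model you intend to use is missing from the printed list, pull it before continuing.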

### Choose a Model

In [None]:
model_id = "granite-code:3b"
# model_id = "granite-code:8b"
# model_id = "granite-code:20b"

granite_via_ollama = find_langchain_model(platform="ollama", model_id=model_id)

### Perform Inference

In [None]:
ollama_response = granite_via_ollama.invoke(prompt)

print(f"Granite response from Ollama: {ollama_response}")
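
Because both options make a blocking `invoke` call, comparing their latency only requires timing the call. The following is a minimal sketch; `timed_invoke` is a hypothetical helper, and it assumes only that the model object exposes an `invoke(prompt)` method, as the objects used in this notebook do.

```python
import time


def timed_invoke(model, prompt: str):
    """Call model.invoke(prompt) and return (response, elapsed_seconds)."""
    start = time.perf_counter()
    response = model.invoke(prompt)  # blocks until the full response arrives
    elapsed = time.perf_counter() - start
    return response, elapsed
```

For example, `response, seconds = timed_invoke(granite_via_ollama, prompt)` runs the local model and reports how long the call took. Local timings depend heavily on your hardware and on whether the model is already loaded into memory.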