# Getting Started with Granite Code

## Introduction

This notebook demonstrates inference calls against a model hosted on [Replicate](https://replicate.com/). To host models locally with [Ollama](https://ollama.com/) instead, see the [Continue VSCode](Continue_VSCode/Continue_VSCode.ipynb) recipe.

## Using a remotely-hosted model

The Granite Code models are available on [Replicate](https://replicate.com/).

At the moment, they are only available to members of the Granite Code team.
Request an invite to get access.

This guide demonstrates a basic inference call using the `replicate` package and
via LangChain.
In both cases, you need to provide a [Replicate API Token](https://replicate.com/account/api-tokens).


In [None]:
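import getpass
import os

# Prompt for the token without echoing it to the notebook output.
replicate_api_token = getpass.getpass("Replicate API token: ")

# Expose the token to later cells through the environment.
os.environ["REPLICATE_API_TOKEN"] = replicate_api_token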
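
When the notebook is re-run, it can be convenient to prompt only if the token is not already set. The following is a minimal sketch under that assumption; `ensure_replicate_token` is a hypothetical helper, not part of the `replicate` package, and its `env` and `prompt_fn` parameters exist only so the logic can be exercised without a real prompt.

```python
import getpass
import os


def ensure_replicate_token(env=os.environ, prompt_fn=getpass.getpass):
    """Return the Replicate token, prompting only if it is not already set.

    Hypothetical helper for illustration; `env` is any dict-like mapping.
    """
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        # Nothing stored yet: ask the user and remember the answer.
        token = prompt_fn("Replicate API token: ").strip()
        if not token:
            raise ValueError("A Replicate API token is required.")
        env["REPLICATE_API_TOKEN"] = token
    return token
```

Calling `ensure_replicate_token()` with no arguments reads and updates `os.environ`, so a second run of the cell does not prompt again.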


We will keep the model constant throughout the guide.
Replicate distinguishes between a "model" and a "deployment" of a model.
In this case, we want to specify the Granite Code development deployment.

In [None]:
model_id = "ibm-granite/granite-8b-code-instruct-128k"
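
Replicate identifies a model as `owner/name`, optionally followed by `:version` to pin a specific version. As a rough illustration of that format, the sketch below splits such a reference into its parts; `split_model_ref` is a hypothetical helper, not a function from the `replicate` package.

```python
def split_model_ref(ref):
    """Split "owner/name" or "owner/name:version" into (owner, name, version).

    Hypothetical helper for illustration; version is None when not pinned.
    """
    path, _, version = ref.partition(":")
    owner, sep, name = path.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"Expected 'owner/name[:version]', got {ref!r}")
    return owner, name, version or None


# Example with the identifier used in this guide.
owner, name, version = split_model_ref("ibm-granite/granite-8b-code-instruct-128k")
```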

Let's also keep the prompt constant for now so we can focus on the inference calls:

In [None]:
prompt = "def f(x):"

### Replicate package

In [None]:
%pip install replicate

In [None]:
import replicate

### Pass the prompt to the model for completion

In [None]:
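# Look up the model and pin a specific version of it.
model = replicate.models.get(model_id)
version = model.versions.get("797c070dc871d8fca417d7d188cf050778d7ce21a0318d26711a54207e9ee698")

# Create a prediction for the prompt.
prediction = replicate.predictions.create(
    version=version,
    input={"prompt": prompt},
)

# Block until the prediction finishes, then show its output.
prediction.wait()

print(prediction.output)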
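
Language models on Replicate commonly return their output as a sequence of string chunks rather than one string, in which case `print(prediction.output)` shows a list. Assuming that is the case for this model, a small helper can join the chunks into readable text; `output_to_text` is a hypothetical name introduced here for illustration.

```python
def output_to_text(output):
    """Join a prediction's output into a single string.

    Hypothetical helper; assumes the output is None, a string,
    or an iterable of string chunks.
    """
    if output is None:
        # The prediction produced no output (for example, it failed).
        return ""
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)
```

With the prediction above, this would be used as `print(output_to_text(prediction.output))`.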
