# Getting Started with Granite Code

## Introduction

This notebook demonstrates inference calls against a model hosted on [Replicate](https://replicate.com/). To host models locally with [Ollama](https://ollama.com/) instead, see the [Continue VSCode](Continue_VSCode/Continue_VSCode.ipynb) recipe.

## Using a remotely-hosted model

The Granite Code models are available on [Replicate](https://replicate.com/).

At the moment, they are only available to members of the Granite Code team.
Request an invite to get access.

This guide demonstrates a basic inference call using the `replicate` package and
using LangChain.
Both approaches require a [Replicate API Token](https://replicate.com/account/api-tokens).


In [6]:
import getpass, os

replicate_api_token = getpass.getpass()

os.environ["REPLICATE_API_TOKEN"] = replicate_api_token
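If the token is already exported in your shell, prompting for it again is unnecessary. The following is a minimal sketch of one way to handle that, not part of the original notebook: `get_replicate_token` is a hypothetical helper that reads `REPLICATE_API_TOKEN` from the environment and only falls back to `getpass` when the variable is unset.

```python
import getpass
import os


def get_replicate_token(env_var: str = "REPLICATE_API_TOKEN") -> str:
    """Return the API token from the environment, prompting only if it is unset.

    Hypothetical helper for illustration; the name and default are assumptions.
    """
    token = os.environ.get(env_var)
    if not token:
        # No token in the environment: ask for it without echoing the input.
        token = getpass.getpass("Replicate API token: ")
        os.environ[env_var] = token
    return token
```

Calling `get_replicate_token()` in place of the bare `getpass.getpass()` call lets the same cell run unattended once the variable is set.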




We will also keep the model constant throughout the guide.
Replicate distinguishes between a "deployment" of a model and a "model".
In this case, we want to specify the Granite Code development deployment.

In [7]:
deployment_name = "ibm-granite/granite-code-20b"  # using the 20b because the 8b is being worked on at the moment
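To make the deployment/model distinction concrete, here is a minimal sketch of a helper that creates a prediction through either path. `create_prediction` is a hypothetical name, and `client` is assumed to behave like a `replicate.Client` instance (it must expose `deployments`, `models`, and `predictions` the way the cells in this notebook use them).

```python
def create_prediction(client, prompt, deployment_name=None,
                      model_name=None, version_id=None):
    """Create a prediction against a deployment or a specific model version.

    Hypothetical helper; `client` is assumed to behave like `replicate.Client`.
    """
    payload = {"prompt": prompt}

    if deployment_name is not None:
        # Deployment path: the deployment already pins a model version.
        deployment = client.deployments.get(deployment_name)
        return deployment.predictions.create(input=payload)

    if model_name is None or version_id is None:
        raise ValueError(
            "Provide deployment_name, or both model_name and version_id"
        )

    # Model path: look up the model, then the exact version to run.
    version = client.models.get(model_name).versions.get(version_id)
    return client.predictions.create(version=version, input=payload)
```

With a deployment you only need its name; with a model you also have to choose which version to run.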

### Replicate package

In [10]:
import replicate


### Connect to Replicate and Model Deployment

In [11]:
replicate_client = replicate.Client(api_token=replicate_api_token)

deployment = replicate_client.deployments.get(deployment_name)

In [12]:
prompt = """
    Show me a SQL query that fetches all columns for the first 50 rows
    in a table named 'users'."""

In [14]:
prediction = deployment.predictions.create(input={'prompt': prompt})

prediction.wait()

print(''.join(prediction.output))

Sure! Here's a SQL query that should do the trick:
```
SELECT *
FROM users
LIMIT 50;
```
This query selects all columns (`*`) from the `users` table and limits the result set to the first 50 rows using the `LIMIT` clause. 


### Pass the prompt to the model for completion

In [31]:
prompt = """
    Show me a SQL query that fetches all columns for the first 50 rows
    in a table named 'users'."""

In [29]:
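# `model_name` and `model_version` are not defined earlier in this notebook;
# set them to the model's name and a version ID before running this cell.
model = replicate.models.get(model_name)
version = model.versions.get(model_version)  # e.g. "797c070dc871d8fca417d7d188cf050778d7ce21a0318d26711a54207e9ee698"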

prediction = replicate.predictions.create(
  version=version,
  input={"prompt": prompt}
)

# The `create` call doesn't block, so we can do other work. Here, we'll just wait.
prediction.wait()

print(prediction.output)

["System:\nYou are a helpful assistant\n\nQuestion:\n\n    Show me a SQL query that fetches all columns for the first 50 rows\n    in a table named 'users'.\n\nAnswer:\nSure! Here's a SQL query that fetches all columns for the first 50 rows in a table named 'users':\n```\nSELECT * FROM users LIMIT 50;\n```\nThis query uses the `SELECT` statement to fetch all columns (`*`) from the `users` table. The `LIMIT` clause is used to restrict the result set to the first 50 rows.\n<|endoftext|>"]


Let's make the output easier to read:

In [30]:
prediction.output[0].splitlines()

['System:',
 'You are a helpful assistant',
 '',
 'Question:',
 '',
 '    Show me a SQL query that fetches all columns for the first 50 rows',
 "    in a table named 'users'.",
 '',
 'Answer:',
 "Sure! Here's a SQL query that fetches all columns for the first 50 rows in a table named 'users':",
 '```',
 'SELECT * FROM users LIMIT 50;',
 '```',
 'This query uses the `SELECT` statement to fetch all columns (`*`) from the `users` table. The `LIMIT` clause is used to restrict the result set to the first 50 rows.',
 '<|endoftext|>']
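The raw output echoes the system prompt and question and ends with an `<|endoftext|>` token. If you only want the model's reply, a small post-processing step can strip the rest. The sketch below assumes the output layout shown above (an `Answer:` marker followed by the reply); `extract_answer` and `END_OF_TEXT` are hypothetical names, not part of the `replicate` package.

```python
END_OF_TEXT = "<|endoftext|>"


def extract_answer(chunks, marker="Answer:"):
    """Join output chunks and return only the text after the first `marker`.

    Falls back to the full text when the marker is absent.
    """
    text = "".join(chunks)
    _, found, tail = text.partition(marker)
    answer = tail if found else text
    # Drop the end-of-text token and surrounding whitespace.
    return answer.replace(END_OF_TEXT, "").strip()
```

For example, `print(extract_answer(prediction.output))` would print just the reply. Note that `partition` splits on the first occurrence of the marker, so a prompt that itself contains `Answer:` would need a different approach.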

You could also use `replicate.run`, which blocks until the result is ready, unlike the `predictions` API:

```python
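# `model_id` is not defined earlier in this notebook;
# set it to a Replicate model identifier before running this.
output = replicate.run(
    model_id,
    input={
        "system_prompt": "You are a helpful assistant",
        "max_new_tokens": 100,
        "prompt": prompt,
    },
)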
```
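Because `predictions.create` does not block, you can also poll the prediction yourself instead of calling `wait()`, for example to enforce a timeout. The following is a minimal sketch under stated assumptions: `wait_with_timeout` is a hypothetical helper, and the prediction object is assumed to expose a `status` attribute and a `reload()` method, with `succeeded`, `failed`, and `canceled` as the terminal statuses.

```python
import time

# Assumed terminal statuses for a prediction.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_with_timeout(prediction, timeout_s=60.0, poll_s=1.0):
    """Poll `prediction` until it reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while prediction.status not in TERMINAL_STATUSES:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"prediction still '{prediction.status}' after {timeout_s}s"
            )
        time.sleep(poll_s)
        # Refresh the local object with the latest server-side state.
        prediction.reload()
    return prediction
```

The caller should still check `prediction.status` afterwards, since the loop also returns when the prediction has failed or been canceled.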