<a href="https://colab.research.google.com/github/goodtiding5/colab-notebooks/blob/main/run_coder_model_in_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Run the WizardCoder Python model in Colab

Make sure 'python3/T4 GPU' is selected as the runtime type.

### 0. Make sure the NVIDIA T4 GPU is ready

In [1]:
!nvidia-smi

Thu Oct 26 23:12:06 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
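
The same check can be made from Python so that later cells can stop early when no GPU is attached. This is a minimal sketch, assuming `nvidia-smi` is on the PATH whenever a GPU runtime is active; `list_gpus` is a hypothetical helper name, not part of any library.

```python
import shutil
import subprocess

def list_gpus():
    """Return one 'name, total memory' line per visible GPU, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # tool is present but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
print(gpus if gpus else "No GPU detected; check the runtime type.")
```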

### 1. Mount Google Drive and read the Hugging Face access token

Google Drive will be mounted at `/content/drive`.

Create a directory named *Private* in Google Drive and add a file `hf_config.yml` to it.

Here is a sample config file:

> api_token: hf_olefhqp9twiehkndsgasdofhX

Once Google Drive is mounted, the path to this config file is `/content/drive/MyDrive/Private/hf_config.yml`.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

import os
import yaml

hf_config_file = '/content/drive/MyDrive/Private/hf_config.yml'

with open(hf_config_file, "r") as fp:
  hf_config = yaml.safe_load(fp)

if 'api_token' not in hf_config:
  raise RuntimeError("failed to fetch Hugging Face api access token!")

# setting up HF token for model data downloading
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_config['api_token']

Mounted at /content/drive
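
`yaml.safe_load` returns `None` for an empty file and can return a non-mapping value for malformed content, so it can help to validate the parsed value before using it. This is a minimal sketch of such a check; `extract_api_token` is a hypothetical helper that takes whatever `yaml.safe_load` returned, and the token shown is a placeholder.

```python
import os

def extract_api_token(hf_config):
    """Return the api_token from a parsed config, or raise a clear error."""
    # An empty YAML file parses to None; a bare string parses to str.
    if not isinstance(hf_config, dict):
        raise RuntimeError("hf_config.yml must contain a mapping with an 'api_token' key")
    token = hf_config.get("api_token")
    if not isinstance(token, str) or not token.strip():
        raise RuntimeError("failed to fetch Hugging Face api access token!")
    return token.strip()

# Example with a placeholder value, not a real token.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = extract_api_token({"api_token": "hf_example"})
```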


### 2. Specify the language model and its prompt template

In [3]:
model="TheBloke/WizardCoder-Python-13B-V1.0-GGUF"
model_type = model.split('-')[-1].upper()
model_file="wizardcoder-python-13b-v1.0.Q4_K_M.gguf"
template=\
'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''
input_variable="prompt"
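
To see what the model will actually receive, the template can be filled with plain `str.format`. For this single-placeholder template that should give the same text as LangChain's `PromptTemplate`, assuming its default f-string style formatting. The snippet repeats the values from the cell above so that it runs on its own; the instruction text is only an example.

```python
model = "TheBloke/WizardCoder-Python-13B-V1.0-GGUF"
# The last dash-separated piece of the repository name is the file format tag.
model_type = model.split('-')[-1].upper()

template = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

# Substitute the single {prompt} placeholder with an instruction.
filled = template.format(prompt="Write a hello world program.")

print(model_type)
print(filled)
```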

### 3. Install the ctransformers and langchain packages

We use ctransformers and LangChain to run the LLM inference engine.

In [4]:
# install ctransformers with CUDA
!pip install --upgrade ctransformers[cuda]

# install langchain
!pip install --upgrade langchain

Collecting ctransformers[cuda]
  Downloading ctransformers-0.2.27-py3-none-any.whl (9.9 MB)
Collecting huggingface-hub (from ctransformers[cuda])
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Collecting nvidia-cublas-cu12 (from ctransformers[cuda])
  Downloading nvidia_cublas_cu12-12.3.2.9-py3-none-manylinux1_x86_64.whl (417.9 MB)
Collecting nvidia-cuda-runtime-cu12 (from ctransformers[cuda])
  Downloading nvidia_cuda_runtime_cu12-12.3.52-py3-none-manylinux1_x86_64.whl (867 kB)
I

### 4. Create the LLM inference engine with CTransformers and LangChain

In [5]:
# create inference engine
from langchain.llms import CTransformers
from langchain import PromptTemplate, LLMChain

config = { "gpu_layers" : 60,
           "max_new_tokens" : 4096,
           "context_length" : 4096,
           "temperature" : 0.8,
           "repetition_penalty" : 1.1
         }

ct = CTransformers(model=model,
                   model_type=model_type,
                   model_file=model_file,
                   config = config)

# chain together with the prompt template
input_variables=[ input_variable ]

prompt = PromptTemplate(template=template, input_variables=input_variables)

llm = LLMChain(prompt = prompt, llm = ct)

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

(…)9fff726044df2c23f967df613b7f/config.json:   0%|          | 0.00/29.0 [00:00<?, ?B/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

wizardcoder-python-13b-v1.0.Q4_K_M.gguf:   0%|          | 0.00/7.87G [00:00<?, ?B/s]
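
In general the prompt and the generated tokens share one context window, so setting `max_new_tokens` equal to `context_length` leaves no headroom for the prompt. This is a minimal sketch of one way to budget the two, assuming the prompt's token count is known; `clamp_max_new_tokens` is a hypothetical helper, not part of ctransformers or LangChain, and the figure of 100 prompt tokens is purely illustrative.

```python
def clamp_max_new_tokens(context_length, prompt_tokens, requested):
    """Limit generation so that prompt plus output fits in the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone does not fit in the context window")
    return min(requested, context_length - prompt_tokens)

config = {
    "gpu_layers": 60,
    "context_length": 4096,
    "temperature": 0.8,
    "repetition_penalty": 1.1,
}

# Assume a prompt of roughly 100 tokens (an illustrative figure).
config["max_new_tokens"] = clamp_max_new_tokens(config["context_length"], 100, 4096)
print(config["max_new_tokens"])  # 3996
```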

### 5. Test the model

In [6]:
prompt = "Write a function foobar which will return a uniform random number using numpy"

answer = llm(prompt)

print(answer['text'])

```python
import numpy as np

def foobar():
    return np.random.uniform(0, 1)
```
This will create a function called `foobar` that returns a uniform random number between 0 and 1 using the `numpy` library in Python. The `np.random.uniform()` function is used to generate a sequence of random numbers from a uniform distribution within a given range (in this case, from 0 to 1). If you want to generate a different range of numbers, simply change the arguments passed to `np.random.uniform()`. For example, if you wanted a number between -5 and 5, you could use `np.random.uniform(-5, 5)`. 
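
The answer mixes a fenced code block with explanatory text. If only the code is needed, it can be pulled out with a regular expression. This is a minimal sketch; `extract_code_blocks` is a hypothetical helper, and the sample string stands in for `answer['text']`.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly

# Match an opening fence with an optional language tag, then capture
# everything up to the next closing fence.
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(text):
    """Return the body of every fenced code block found in text."""
    return [body.strip() for body in CODE_BLOCK.findall(text)]

sample = (
    FENCE + "python\n"
    "import numpy as np\n"
    "\n"
    "def foobar():\n"
    "    return np.random.uniform(0, 1)\n"
    + FENCE + "\n"
    "This will create a function called foobar."
)

for block in extract_code_blocks(sample):
    print(block)
```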


In [7]:
prompt = "create a bot in python which will listen to a nostr relay."

answer = llm(prompt)

print(answer['text'])

To create a bot that listens to a Nostr relay, you can use the `nostr` library for Python. Here's an example code snippet that demonstrates how to do it:

```python
import asyncio
from nostr import RelayClient

def on_event(event):
    print("Received event", event)

async def main():
    client = await RelayClient.create()
    await client.subscribe("your-relay-url", "your-bot-id", on_event)
    
if __name__ == "__main__":
    asyncio.run(main())
```

This code will connect to a Nostr relay and subscribe to it with your bot ID, which you can obtain by registering your bot on the website https://relay.firefly.co/. The `on_event` function will be called for each new event received by your bot, and can print it to the console or do something else with it.

You can also use the `send_message` method of the `RelayClient` object to send messages to the relay:
```python
await client.send_message("your-relay-url", "your-bot-id", {"text": "hello"})
```
This will send a m