# Hosting CLIP-as-service on Google Colab with TPU/GPU support

This tutorial shows you how to implement the following architecture:

[![](https://mermaid.ink/img/pako:eNp1kEFrwzAMhf-K0bkh99xGVwpjh9Ctp7oMxVYTM8cOttwy2v732fMGgzFd9Hjvk0C6gvKaoIMx4DKJ5510IldMQzW23o-WxNpbHMRBlXasSCmF8S1SOFNommbb79vXfl9TcrqKh0OBlDXk-Cgydht3_bqdmJf2QkP06p349mtTHXvCM0YVzMJfMwX_C6kU7L8xrGCmMKPR-bprcSTwRDNJ6LLUdMJkWYJ094ymRSPTRhv2AboT2kgrwMT-5cMp6Dgk-oEeDebfzN_U_RP7v2yd)](https://mermaid.live/edit#pako:eNp1kEFrwzAMhf-K0bkh99xGVwpjh9Ctp7oMxVYTM8cOttwy2v732fMGgzFd9Hjvk0C6gvKaoIMx4DKJ5510IldMQzW23o-WxNpbHMRBlXasSCmF8S1SOFNommbb79vXfl9TcrqKh0OBlDXk-Cgydht3_bqdmJf2QkP06p349mtTHXvCM0YVzMJfMwX_C6kU7L8xrGCmMKPR-bprcSTwRDNJ6LLUdMJkWYJ094ymRSPTRhv2AboT2kgrwMT-5cMp6Dgk-oEeDebfzN_U_RP7v2yd)

CLIP-as-service is powered by Jina. [Another tutorial shows how to host a Jina service on Colab in general](https://colab.research.google.com/github/jina-ai/jina/blob/master/docs/Using_Jina_on_Colab.ipynb). Highly recommended!


## 1. Change runtime type

Go to the menu `Runtime -> Change runtime type -> GPU/TPU`.
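After switching the runtime, it can help to confirm that a GPU is actually attached before installing anything. The sketch below only checks whether the `nvidia-smi` tool is on the path and lists at least one device. The helper name `gpu_visible` is made up for this example, and the check says nothing about TPU runtimes.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` runs and lists a GPU (hypothetical helper)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling on this machine, so no GPU runtime is attached.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on a runtime you expected to have a GPU, re-check the runtime type before continuing.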


## 2. Install Packages

Since the client will run locally, we only need to install the `clip_server` package on Colab.


**⚠️ You will be asked to "Restart Runtime" after this step. Please click the button and restart the runtime.**

In [None]:
!pip install clip_server pyngrok

## 3. Config Flow YAML


Unlike the classic CLI entrypoint, here we need to start the Flow in Python. Let's use the PyTorch backend and write a Flow YAML. Note that we need to load the torch Python file from the `clip_server` installation, which is why you see `cas_path` below. More available options [can be found here](https://github.com/jina-ai/clip-as-service/tree/main/server/clip_server/executors).

In [1]:
import clip_server
cas_path = clip_server.__path__[0]

This YAML is [taken directly from this file](https://github.com/jina-ai/clip-as-service/blob/main/server/clip_server/torch-flow.yml). You can also customize it as you wish; [please check the CLIP-as-service docs](https://clip-as-service.jina.ai/user-guides/server/#yaml-config).
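Because the YAML points at a file inside the installed package, a quick look at what actually ships in the `executors` folder can save a failed start. This is a minimal sketch: `list_backend_modules` is a hypothetical helper, and the assumption that backend files are named `clip_*.py` comes only from the `clip_torch.py` path used in this tutorial.

```python
import os


def list_backend_modules(package_dir: str) -> list:
    """List the clip_*.py files under <package_dir>/executors (hypothetical helper)."""
    executors_dir = os.path.join(package_dir, "executors")
    if not os.path.isdir(executors_dir):
        return []
    return sorted(
        name
        for name in os.listdir(executors_dir)
        if name.startswith("clip_") and name.endswith(".py")
    )


# In the notebook you would pass `cas_path` from the cell above:
# print(list_backend_modules(cas_path))
```

If `clip_torch.py` is missing from the result, the `py_modules` entry in the YAML will not resolve.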

In [2]:
flow_yaml = f'''
jtype: Flow
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      metas:
        py_modules:
          - {cas_path}/executors/clip_torch.py
    replicas: 4
'''

In [3]:
flow_yaml

'\njtype: Flow\nwith:\n  port: 51000\nexecutors:\n  - name: clip_t\n    uses:\n      jtype: CLIPEncoder\n      metas:\n        py_modules:\n          - /usr/local/lib/python3.11/dist-packages/clip_server/executors/clip_torch.py\n    replicas: 4\n'

## 4. Start the Flow

It may take a minute or so on the first start, as it will download the pretrained models. To select different pretrained models, [please check CLIP-as-service docs](https://clip-as-service.jina.ai/user-guides/server/#yaml-config).

In [4]:
from jina import Flow

f = Flow.load_config(flow_yaml)
f.start()



Remember to close it via `f.close()` when you are no longer using it. But let's keep it open for now.
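One way to make sure the Flow gets closed even if a later step raises is to wrap the work in `try`/`finally`. The sketch below uses a stand-in class instead of a real Flow so it runs anywhere. `run_then_close` and `FakeFlow` are names invented for this example, and the only assumption about the real object is that it has the `close()` method mentioned above.

```python
def run_then_close(flow, work):
    """Call work(flow), then always call flow.close(), even if work raises."""
    try:
        return work(flow)
    finally:
        flow.close()


class FakeFlow:
    """Stand-in for an already started Flow; records whether close() ran."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


demo = FakeFlow()
result = run_then_close(demo, lambda flow: "done")
print(result, demo.closed)  # done True
```

In a notebook this pattern is less convenient than separate cells, which is why this tutorial keeps the Flow open and closes it by hand.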

## 5. Set up forwarding

By default, a Flow uses the gRPC protocol, which is highly efficient and feature-rich. So in this tutorial we will use gRPC and forward it with `ngrok`. It is possible, and in fact slightly easier, to set things up with `Flow(protocol='http')`; [please read the tutorial here](https://colab.research.google.com/github/jina-ai/jina/blob/master/docs/Using_Jina_on_Colab.ipynb#scrollTo=0ASjGLBhXono), as I won't repeat it here.


You will first need to sign up at https://dashboard.ngrok.com/signup (HTTP forwarding does not require registration, which is why I said it is easier).

After signing up, you can get a token. Then simply add your token as follows (replacing `YOUR_TOKEN_HERE`):

In [5]:
!pip install pyngrok

# Remember to replace this with your own token! Otherwise I can see your service (not that I really have time to look, but still).
!ngrok authtoken YOUR_TOKEN_HERE

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
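To keep the token out of the notebook entirely, one option is to read it from an environment variable and fail early if it is missing. This is only a sketch: the helper `load_ngrok_token` and the choice of `NGROK_AUTHTOKEN` as the variable name are assumptions made for this example.

```python
import os


def load_ngrok_token(env=None, var_name="NGROK_AUTHTOKEN"):
    """Return the token stored in `var_name`, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token or token == "YOUR_TOKEN_HERE":
        raise RuntimeError(f"Set {var_name} to your ngrok token first")
    return token


# Demo with an explicit mapping so the example runs without a real token.
print(load_ngrok_token({"NGROK_AUTHTOKEN": "example-token"}))
```

In a notebook cell you could then store the result in a variable, say `token`, and run `!ngrok authtoken {token}`, since IPython substitutes Python variables written in curly braces into shell commands.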


In [None]:
!ngrok tcp 51000 --log "stdout"

INFO[05-12|21:21:51] no configuration paths supplied
INFO[05-12|21:21:51] using configuration at default config path path=/root/.config/ngrok/ngrok.yml
INFO[05-12|21:21:51] open config file                         path=/root/.config/ngrok/ngrok.yml err=nil
t=2025-05-12T21:21:51+0000 lvl=info msg="starting web service" obj=web addr=127.0.0.1:4040 allow_hosts=[]
t=2025-05-12T21:21:51+0000 lvl=info msg="client session established" obj=tunnels.session
t=2025-05-12T21:21:51+0000 lvl=info msg="tunnel session started" obj=tunnels.session
t=2025-05-12T21:21:51+0000 lvl=info msg="started tunnel" obj=tunnels name=command_line addr=//localhost:51000 url=tcp://0.tcp.ngrok.io:13104
t=2025-05-12T21:23:38+0000 lvl=info msg="join connections" obj=join id=52033320d6d1 l=127.0.0.1:51000 r=68.199.56.76:53362
t=2025-05-12T21:24:00+0000 lvl=info msg="join connections" obj=join id=814e51dbaff1 l=127.0.0.1:51000 r=68.199.56.76:53376
t=2025-05-12T21:33:41

In [None]:
# Run this only after you have finished the client examples below; it shuts down the Flow.
f.close()

In the ngrok output, you should see a line like:

```
t=2022-06-11T20:29:11+0000 lvl=info msg="started tunnel" obj=tunnels name=command_line addr=//localhost:54321 url=tcp://6.tcp.ngrok.io:18096
```

Grab the text after `url=tcp://`; in my case it is `6.tcp.ngrok.io:18096`.

Now build a client using this address from your local laptop/Python environment.

Copy and paste the code below into your local Python environment, and remember to change the address to your own.

**Remember, if your last line is `url=tcp://6.tcp.ngrok.io:18096` then you should set `Client('grpc://6.tcp.ngrok.io:18096')`**
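If you would rather not copy the address by hand, the same extraction can be done with a small regular expression. The function name `grpc_address_from_log` is hypothetical, and the pattern assumes the `url=tcp://host:port` shape shown in the log line above.

```python
import re


def grpc_address_from_log(line):
    """Extract host:port after `url=tcp://` and return a grpc:// address, or None."""
    match = re.search(r"url=tcp://([\w.-]+:\d+)", line)
    if match is None:
        return None
    return f"grpc://{match.group(1)}"


log_line = (
    't=2022-06-11T20:29:11+0000 lvl=info msg="started tunnel" obj=tunnels '
    "name=command_line addr=//localhost:54321 url=tcp://6.tcp.ngrok.io:18096"
)
print(grpc_address_from_log(log_line))  # grpc://6.tcp.ngrok.io:18096
```

Lines without a tunnel URL return `None`, so you can feed the whole log through the function and keep the first non-empty result.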

### Try Embedding Task from Local

```python
# pip install clip-client
from clip_client import Client

c = Client('grpc://6.tcp.ngrok.io:18096')

r = c.encode(
    [
        'First do it',
        'then do it right',
        'then do it better',
        'https://picsum.photos/200',
    ]
)
print(r)
```

And you will get

```text
[[ 0.03494263 -0.23510742  0.0104599  ... -0.5229492  -0.10021973
  -0.08685303]
 [-0.06793213 -0.0032444   0.01506805 ... -0.50341797 -0.06143188
  -0.08520508]
 [ 0.15063477 -0.07922363 -0.06530762 ... -0.46484375 -0.08526611
   0.04324341]
 [-0.16088867  0.10552979 -0.20581055 ... -0.41381836  0.19543457
   0.05718994]]
```

This shows the connection is successful!
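A common next step with these embeddings is comparing two of them by cosine similarity. The sketch below uses plain Python and toy 3-dimensional vectors rather than real CLIP output; `cosine_similarity` is a helper written for this example, not part of `clip_client`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, not real CLIP embeddings.
u = [1.0, 0.0, 1.0]
v = [1.0, 1.0, 0.0]
print(cosine_similarity(u, v))  # close to 0.5
```

With the result from the snippet above, `cosine_similarity(r[0], r[1])` would compare the first two sentences, since iterating over a row of `r` yields its float components.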


### Try Ranking Task from Local

```python
from docarray import Document

from clip_client import Client

c = Client(server='grpc://6.tcp.ngrok.io:18096/rank')

r = c.rank(
    [
        Document(
            uri='https://picsum.photos/id/1/300/300',
            matches=[
                Document(text=f'a photo of a {p}')
                for p in (
                    'man',
                    'woman',
                )
            ],
        )
    ]
)

print(r['@m', ['text', 'scores']])
```

```
[['a photo of a man', 'a photo of a woman'], [defaultdict(<class 'docarray.score.NamedScore'>, {'clip_score': {'value': 0.5806832313537598, 'op_name': 'softmax'}, 'clip_score_cosine': {'value': 0.2178003191947937, 'op_name': 'cosine'}}), defaultdict(<class 'docarray.score.NamedScore'>, {'clip_score': {'value': 0.41931676864624023, 'op_name': 'softmax'}, 'clip_score_cosine': {'value': 0.21454453468322754, 'op_name': 'cosine'}})]]
```


Now enjoy the free GPU/TPU to build your awesome CAS applications!

In [None]:
f.close()

# Push to the Limit

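Now let's use the largest model, `ViT-L/14-336px`, with 4 replicas to fully use the VRAM, and see if it works.

In [None]:
flow_yaml = f'''
jtype: Flow
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      with:
        # Assumption: `name` selects the pretrained model, and `ViT-L/14@336px` is the
        # identifier for the model mentioned above; check the CLIP-as-service docs for
        # the exact spelling supported by your installed version.
        name: ViT-L/14@336px
      metas:
        py_modules:
          - {cas_path}/executors/clip_torch.py
    replicas: 4
'''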

In [None]:
from jina import Flow

f = Flow.load_config(flow_yaml)
f.start()

In [None]:
!ngrok authtoken YOUR_TOKEN_HERE
!ngrok tcp 51000 --log "stdout"

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
INFO[04-26|23:20:43] no configuration paths supplied
INFO[04-26|23:20:43] using configuration at default config path path=/root/.config/ngrok/ngrok.yml
INFO[04-26|23:20:43] open config file                         path=/root/.config/ngrok/ngrok.yml err=nil
t=2025-04-26T23:20:43+0000 lvl=info msg="starting web service" obj=web addr=127.0.0.1:4040 allow_hosts=[]
t=2025-04-26T23:20:44+0000 lvl=info msg="client session established" obj=tunnels.session
t=2025-04-26T23:20:44+0000 lvl=info msg="tunnel session started" obj=tunnels.session
t=2025-04-26T23:20:44+0000 lvl=info msg="started tunnel" obj=tunnels name=command_line addr=//localhost:51000 url=tcp://2.tcp.ngrok.io:11580
t=2025-04-26T23:21:32+0000 lvl=info msg="join connections" obj=join id=cff0511e3cab l=127.0.0.1:51000 r=68.199.56.76:63848
t=2025-04-26T23:23:48+0000 lvl=info msg="join connections" obj=join id=89d8

Yay it works!
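To see how close the replicas actually come to filling the VRAM, you can query `nvidia-smi` for used and total memory. This is a rough sketch with hypothetical helper names (`parse_memory_csv`, `gpu_memory`). Because the ngrok cell blocks, you would run it before starting the tunnel or after interrupting it.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs, one GPU per line, into a list of tuples."""
    rows = []
    for line in text.strip().splitlines():
        try:
            used, total = (int(part.strip()) for part in line.split(","))
        except ValueError:
            # Skip lines that are not two integers, e.g. '[N/A]' entries.
            continue
        rows.append((used, total))
    return rows


def gpu_memory():
    """Query nvidia-smi for used/total memory; returns [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    cmd = [exe, "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_memory_csv(out.stdout) if out.returncode == 0 else []


for used, total in gpu_memory():
    if total:
        print(f"{used} / {total} MiB used ({used / total:.0%})")
```

If the used figure sits very close to the total, reducing `replicas` in the YAML is the first thing to try when the Flow fails to start.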