<p align=center>
<a href="https://pypi.org/project/clip_server/"><img alt="PyPI" src="https://img.shields.io/pypi/v/clip_server?label=Release&style=flat-square"></a>
<a href="https://slack.jina.ai"><img src="https://img.shields.io/badge/Slack-3.1k-blueviolet?logo=slack&amp;logoColor=white&style=flat-square"></a>
</p>

CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.

⚡ **Fast**: Serve CLIP models with TensorRT, ONNX Runtime, or PyTorch (without JIT) at 800 QPS<sup>[*]</sup>. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks.

🫐 **Elastic**: Horizontally scale multiple CLIP models up and down on a single GPU, with automatic load balancing.

🐥 **Easy-to-use**: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.

👒 **Modern**: Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression.

🍱 **Integration**: Smooth integration with neural search ecosystem including [Jina](https://github.com/jina-ai/jina) and [DocArray](https://github.com/jina-ai/docarray). Build cross-modal and multi-modal solutions in no time.

<sup>[*] with default config (single replica, PyTorch no JIT) on GeForce RTX 3090. </sup>

## Try it!

An always-online server `api.clip.jina.ai` loaded with `ViT-L-14-336::openai` is available for you to play with and test.
Before you start, make sure you have obtained a personal access token from the [Jina AI Cloud](https://cloud.jina.ai/settings/tokens), or via the CLI as described in [this guide](https://docs.jina.ai/jina-ai-cloud/login/#create-a-new-pat):

```bash
jina auth token create <name of PAT> -e <expiration days>
```

Then configure the access token via the client's `credential` parameter in Python, or set it in the HTTP request header `Authorization` as `<your access token>`.
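
For example, a minimal sketch of passing the token from the Python client (the endpoint address here is illustrative; use the one from your deployment):

```python
from clip_client import Client

# Placeholder endpoint; substitute your server's actual address and port.
c = Client(
    'grpcs://api.clip.jina.ai:2096',
    credential={'Authorization': '<your access token>'},
)
```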

⚠️ Our demo server `demo-cas.jina.ai` was sunset on **15th of Sept 2022** and is no longer available.

### Text & image embedding

```bash
curl \
-X POST https://api.clip.jina.ai:8443/post \
-H 'Content-Type: application/json' \
-H 'Authorization: <your access token>' \
-d '{"data":[{"text": "First do it"},
{"text": "then do it right"},
{"text": "then do it better"},
{"uri": "https://picsum.photos/200"}],
-d '{"data":[{"text": "First do it"},
{"text": "then do it right"},
{"text": "then do it better"},
{"uri": "https://picsum.photos/200"}],
"execEndpoint":"/"}'
```

```python
# Assumes `c` is a Client configured with your access token as above.
r = c.encode(
    ['First do it', 'then do it right', 'then do it better', 'https://picsum.photos/200']
)
print(r)
```

</td>
</tr>
</table>

gives:

```
"the blue car is on the left, the red car is on the right"
0.5232442617416382
```
</td>
</tr>

<tr>
<td>
<img src="https://picsum.photos/id/102/300/300">

gives:

```
"this is a photo of three berries"
0.48507222533226013
```
</td>
</tr>

</table>

## [Documentation](https://clip-as-service.jina.ai)

## Install

CLIP-as-service consists of two Python packages `clip-server` and `clip-client` that can be installed _independently_. Both require Python 3.7+.

### Install server

<td>

```bash
pip install nvidia-pyindex
pip install "clip-server[tensorrt]"
```

</td>
</tr>
</table>

### Install client

```bash
pip install clip-client
```

### Quick check

You can run a simple connectivity check after installation.

<table>
<tr>
<th> C/S </th>
</tr>
<tr>
<td>
Server
</td>
<td>

```bash
python -m clip_server
```

</td>
<td>

</td>
</tr>
<tr>
<td>
Client
</td>
<td>

```python
from clip_client import Client

c = Client('grpc://0.0.0.0:23456')
c.profile()
```

</td>
<td>

</td>
</tr>
</table>

You can change `0.0.0.0` to an intranet or public IP address to test connectivity over private and public networks.
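
For instance, a sketch pointing the client at a hypothetical LAN address:

```python
from clip_client import Client

# 192.168.0.4:23456 is a placeholder; use your server's actual IP and port.
c = Client('grpc://192.168.0.4:23456')
c.profile()
```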

## Get Started

### Basic usage

1. Start the server: `python -m clip_server`. Remember its address and port.
2. Create a client:

```python
from clip_client import Client

c = Client('grpc://0.0.0.0:51000')
```

3. To get sentence embeddings:

```python
r = c.encode(['First do it', 'then do it right', 'then do it better'])

print(r.shape) # [3, 512]
```

4. To get image embeddings:

```python
r = c.encode(['apple.png', # local image
'https://clip-as-service.jina.ai/_static/favicon.png', # remote image
'data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7']) # in image URI

print(r.shape) # [3, 512]
```

More comprehensive server and client user guides can be found in the [docs](https://clip-as-service.jina.ai/).


#### Search via sentence

Let's build a simple prompt that allows a user to type a sentence:
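
A minimal sketch of such a prompt, assuming `da` holds the image embeddings from the previous steps and a locally running server:

```python
from clip_client import Client

c = Client(server='grpc://0.0.0.0:51000')  # assumed server address

while True:
    # Encode the typed sentence, then find and plot the 9 nearest images.
    vec = c.encode([input('sentence> ')])
    r = da.find(query=vec, limit=9)
    r[0].plot_image_sprites()
```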

Now you can input arbitrary English sentences and view the top-9 matching images.

<table>
<tr>
<th> "professor cat is very serious" </th>
</tr>
</table>

Let's save the embedding result for our next example:

```python
da.save_binary('ttl-image')
```

We can also switch the input and output of the last program to achieve image-to-text search: given a query image, find the sentence that best describes it.

Let's use all sentences from the book "Pride and Prejudice".

```python
from docarray import Document, DocumentArray
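
# Load "Pride and Prejudice" and split it into one Document per sentence.
# A sketch: the Gutenberg URL and the naive '.'-split are assumptions.
d = Document(uri='https://www.gutenberg.org/files/1342/1342-0.txt').load_uri_to_text()
da = DocumentArray(
    Document(text=s.strip()) for s in d.text.split('.') if s.strip()
)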
da.summary()
```

```text
Documents Summary

Length 6403
Homogenous Documents True
Common Attributes ('id', 'text')

Attributes Summary

Attribute Data type #Unique values Has empty value
──────────────────────────────────────────────────────────
id ('str',) 6403 False
text ('str',) 6030 False
```

#### Encode sentences

Now encode these 6,403 sentences; it may take 10 seconds or less depending on your GPU and network:

```python
from clip_client import Client
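
# A sketch of the encoding step; the server address is an assumption.
c = Client(server='grpc://0.0.0.0:51000')
da = c.encode(da, show_progress=True)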
```

#### Showcase

Fun time! Note that unlike the previous example, here the input is an image and the output is a sentence. All sentences come from the book "Pride and Prejudice".

<table>
<tr>
<td>
<p>
<img src="https://github.com/jina-ai/clip-as-service/blob/main/.github/README-img/Besides,-there-was-truth-in-his-looks.png?raw=true" alt="Visualization of the image sprite of Totally looks like dataset" height="100px">
</p>

</td>
<td>

</td>
</tr>
<tr>
<td>
<p>
<img src="https://github.com/jina-ai/clip-as-service/blob/main/.github/README-img/“A-gamester!”-she-cried.png?raw=true" alt="Visualization of the image sprite of Totally looks like dataset" height="100px">
</p>

</td>
<td>

</td>
</tr>
</table>

### Rank image-text matches via CLIP model

From `0.3.0`, CLIP-as-service adds a new `/rank` endpoint that re-ranks cross-modal matches according to their joint likelihood under the CLIP model. For example, given an image Document with some predefined sentence matches as below:
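A minimal sketch of that call, assuming a locally running server; the image URI is a placeholder, and the candidate texts mirror the output below:

```python
from docarray import Document
from clip_client import Client

c = Client(server='grpc://0.0.0.0:51000')  # assumed server address

# An image Document with candidate sentences attached as matches.
img = Document(
    uri='photo-of-a-room.jpg',  # placeholder image
    matches=[
        Document(text=f'a photo of a {p}')
        for p in (
            'control room',
            'conference room',
            'lecture room',
            'television studio',
            'podium indoor',
        )
    ],
)
r = c.rank([img])
print(r['@m', ['text', 'scores__clip_score__value']])
```

This prints: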

```text
[['a photo of a television studio', 'a photo of a conference room', 'a photo of a lecture room', 'a photo of a control room', 'a photo of a podium indoor'],
[0.9920725226402283, 0.006038925610482693, 0.0009973491542041302, 0.00078492151806131, 0.00010626466246321797]]
```


Intrigued? That's only scratching the surface of what CLIP-as-service is capable of. [Read our docs to learn more](https://clip-as-service.jina.ai).

## Build locally with Docker

You need to be in the `server` directory to build the Docker image.

```bash
cd server
docker build . -f ../Dockerfiles/cuda.Dockerfile -t clip-as-service-gpu:latest
```
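
Once built, a sketch of running the image (the port mapping and GPU flag are assumptions; adjust them to your setup):

```bash
docker run --rm --gpus all -p 51000:51000 clip-as-service-gpu:latest
```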

<!-- start support-pitch -->

## Support

- Join our [Slack community](https://slack.jina.ai) and chat with other community members about ideas.