executors:
- executors/clip_torch.py
```

## Environment variables


To start the server with more verbose logging, set the `JINA_LOG_LEVEL` environment variable:

```bash
JINA_LOG_LEVEL=DEBUG python -m clip_server
```

```{figure} images/server-log.gif
:width: 70%
```

To run the CLIP server on the third GPU (i.e. device index `2`):

```bash
CUDA_VISIBLE_DEVICES=2 python -m clip_server
```
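
By the same CUDA convention, setting the variable to an empty value hides all GPUs. This is a handy way to force CPU execution, assuming the backend falls back to CPU when no CUDA device is visible:

```bash
# Hide all GPUs so the server runs on CPU
# (assumes the runtime falls back to CPU when no CUDA device is visible)
CUDA_VISIBLE_DEVICES= python -m clip_server
```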

### Serving on Multiple GPUs

If you have multiple GPU devices, you can distribute the replicas across them via `CUDA_VISIBLE_DEVICES=RR` (round-robin). For example, if you have 3 GPUs and your Flow YAML says `replicas: 5`, then running

```bash
CUDA_VISIBLE_DEVICES=RR python -m clip_server
```

will assign GPU devices to the replicas in the following round-robin fashion (a sketch of such a Flow YAML follows the table):

| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 0 | 3 |
| 1 | 4 |
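
For reference, here is a minimal sketch of a Flow YAML requesting five replicas; the executor name and `CLIPEncoder` class are assumptions following the `executors/clip_torch.py` example above, and `replicas` is the only setting that matters here:

```yaml
jtype: Flow
executors:
  - name: clip_t            # executor name is an assumption
    uses:
      jtype: CLIPEncoder    # assumed class defined in executors/clip_torch.py
      metas:
        py_modules:
          - executors/clip_torch.py
    replicas: 5             # replicas are spread over visible GPUs round-robin
```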


You can also restrict the visible devices in the round-robin assignment via `CUDA_VISIBLE_DEVICES=RR0:2`, where `0:2` has the same meaning as a Python slice. This creates the following assignment (the full command is shown after the table):

| GPU device | Replica ID |
|------------|------------|
| 0 | 0 |
| 1 | 1 |
| 0 | 2 |
| 1 | 3 |
| 0 | 4 |
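
For example, combining the slice with the server start-up command from above:

```bash
# Round-robin over GPUs 0 and 1 only
CUDA_VISIBLE_DEVICES=RR0:2 python -m clip_server
```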


```{tip}
In practice, we found it unnecessary to run `clip_server` on multiple GPUs, for two reasons:
- A single replica, even with the largest model `ViT-L/14-336px`, takes only 3.5GB of VRAM.
- Real-world network traffic rarely utilizes the GPU at 100%.

Based on these two points, it makes more sense to have multiple replicas on a single GPU than to spread them across different GPUs, which wastes resources. `clip_server` scales well by interleaving GPU time across multiple replicas.
```


## Serving in HTTPS/gRPCs

You can turn on TLS for HTTP and gRPC protocols. Your Flow YAML would look like the following:

```{code-block} yaml
jtype: Flow
with:
  protocol: http  # or grpc
  ssl_certfile: cert.pem
  ssl_keyfile: key.pem


Here, `protocol` can be either `http` or `grpc`. The files `cert.pem` and `key.pem` are the two parts of a certificate: `key.pem` is the private key and `cert.pem` is the signed certificate. You can generate a self-signed pair by running the following command in the terminal:

```bash
openssl req -newkey rsa:4096 -nodes -sha512 -x509 -days 3650 -out cert.pem -keyout key.pem -subj "/CN=demo-cas.jina.ai"
```

Note that if you are using `protocol: grpc`, then the common name `/CN=demo-cas.jina.ai` must strictly match the IP address or domain name of your server. A mismatched IP or domain name will raise an exception.
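
You can double-check the subject and validity dates of the generated certificate with standard `openssl` tooling:

```bash
# Print the certificate's subject (including the CN) and validity dates
openssl x509 -in cert.pem -noout -subject -dates
```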

The certificate and key can also be generated via [letsencrypt.org](https://letsencrypt.org/), a free SSL certificate provider.
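
For example, a typical `certbot` invocation might look like the following sketch; it assumes `certbot` is installed, port 80 is reachable, and the domain points at your machine:

```bash
# Obtain a certificate from Let's Encrypt via certbot's standalone challenge server
certbot certonly --standalone -d demo-cas.jina.ai
```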

```{warning}
Not every port supports HTTPS. Commonly supported ports are: `443`, `2053`, `2083`, `2087`, `2096`, `8443`.
```

```{warning}
If you are using Cloudflare-proxied DNS, please be aware that:
- you need to turn on gRPC support manually; [please follow the guide here](https://support.cloudflare.com/hc/en-us/articles/360050483011-Understanding-Cloudflare-gRPC-support);
- the free tier of Cloudflare has a hard 100-second timeout, meaning that sending a big batch to a CPU server may return a 524 error to the client.
```

