Commit 0feea0a: chore: revert README to 2.x for delayed release

hanxiao committed Feb 15, 2022 (1 parent: f816667)

Showing 1 changed file: README.md, with 124 additions and 168 deletions.
<p align="center">
<a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/jina/blob/master/.github/logo-only.gif?raw=true" alt="Jina logo: Jina is a cloud-native neural search framework" width="200px"></a>
</p>

<p align="center">
<a href="https://slack.jina.ai"><img src="https://img.shields.io/badge/Slack-2.2k%2B-blueviolet?logo=slack&amp;logoColor=white"></a>
</p>

<!-- start elevator-pitch -->

Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.

⏱️ **Save time** - *The* design pattern of neural search systems. Native support for PyTorch/Keras/ONNX/Paddle. Build solutions in just minutes.

🌌 **All data types** - Process, index, query, and understand videos, images, long/short text, audio, source code, PDFs, etc.

🌩️ **Local & cloud friendly** - Distributed architecture, scalable & cloud-native from day one. Same developer experience on both local and cloud.

🍱 **Own your stack** - Keep end-to-end stack ownership of your solution. Avoid the integration pitfalls that come with fragmented, multi-vendor, generic legacy tools.

<!-- end elevator-pitch -->

## Install

```bash
pip install -U jina
```

More install options including Conda, Docker, and Windows [can be found here](https://docs.jina.ai/get-started/install/).
## Learning and Docs

- Brand new to Jina? Check our **[Learning Bootcamp](https://learn.jina.ai)** to get up to speed.
- Check our **[comprehensive docs](https://docs.jina.ai)** for deeper tutorials, more advanced topics, and the API reference.

## Get Started


<p align="center">
<a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/jina/blob/master/.github/images/readme-get-started.svg?raw=true" alt="Get started with Jina to build production-ready neural search solution via ResNet in less than 20 minutes" width="100%"></a>
</p>

We promise you can build a scalable ResNet-powered image search service in 20 minutes or less, from scratch. If not, you can forget about Jina.

### Basic Concepts <img align="right" src="https://github.com/jina-ai/jina/blob/master/.github/images/clock-1min.svg?raw=true"></img>

Document, Executor, and Flow are three fundamental concepts in Jina.

- [**Document**](https://docs.jina.ai/fundamentals/document/) is the basic data type in Jina;
- [**Executor**](https://docs.jina.ai/fundamentals/executor/) is how Jina processes Documents;
- [**Flow**](https://docs.jina.ai/fundamentals/flow/) is how Jina streamlines and distributes Executors.
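To make the relationship between the three concrete before touching the real API, here is a toy, plain-Python analogy (an illustration only, not the Jina API; all `Toy*` names are made up):

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:  # a Document wraps one piece of data
    text: str


class ToyExecutor:  # an Executor transforms a batch of Documents
    def process(self, docs):
        for d in docs:
            d.text = d.text.upper()
        return docs


class ToyFlow:  # a Flow chains Executors into a pipeline
    def __init__(self, *executors):
        self.executors = executors

    def post(self, docs):  # send Documents through every Executor in order
        for e in self.executors:
            docs = e.process(docs)
        return docs


f = ToyFlow(ToyExecutor())
print(f.post([ToyDocument('hello jina')])[0].text)  # prints HELLO JINA
```

In real Jina, the same shape holds: `Flow.post()` routes `DocumentArray`s through each `Executor`'s `@requests` methods.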

Leveraging these three components, let's build an app that **finds similar images using ResNet50**.

### ResNet50 Image Search in 20 Lines <img align="right" src="https://github.com/jina-ai/jina/blob/master/.github/images/clock-5min.svg?raw=true"></img>


<sup>💡 Preliminaries: <a href="https://drive.google.com/file/d/1OLg-JRBJJgTYYcXBJ2x35wJyzqSty4mu/view?usp=sharing">download dataset</a>, <a href="https://pytorch.org/get-started/locally/">install PyTorch & Torchvision</a>
</sup>

```python
from jina import DocumentArray, Document

def preproc(d: Document):
    return (d.load_uri_to_image_blob()  # load
             .set_image_blob_normalization()  # normalize color
             .set_image_blob_channel_axis(-1, 0))  # switch color axis

docs = DocumentArray.from_files('img/*.jpg').apply(preproc)

import torchvision
model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cuda')  # embed via GPU to speed up

q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))
q.embed(model)  # embed
q.match(docs)  # find top-20 nearest neighbours, done!
```

Done! Now print `q.matches` and you'll see the URIs of the most similar images.

<p align="center">
<a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/jina/blob/master/.github/images/readme-q-match.png?raw=true" alt="Print q.matches to get visual similar images in Jina using ResNet50" width="50%"></a>
</p>
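Under the hood, `q.match(docs)` is a brute-force nearest-neighbour search over the embeddings, using cosine distance. A minimal NumPy sketch of the same idea (an illustration, not Jina's actual implementation):

```python
import numpy as np


def top_k_cosine(query: np.ndarray, index: np.ndarray, k: int = 3):
    """Return indices of the k rows of `index` closest to `query` by cosine distance."""
    q = query / np.linalg.norm(query)  # normalize query vector
    x = index / np.linalg.norm(index, axis=1, keepdims=True)  # normalize index rows
    dists = 1 - x @ q  # cosine distance = 1 - cosine similarity
    return np.argsort(dists)[:k]  # smallest distance first


index = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(top_k_cosine(np.array([1.0, 0.1]), index, k=2))  # prints [0 2]
```

Jina does this for you across the whole `DocumentArray` and stores the results in `q.matches`, sorted by distance.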

Add three lines of code to visualize them:

```python
for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()
q.matches.plot_image_sprites()
```

<p align="center">
<a href="https://docs.jina.ai"><img src="https://github.com/jina-ai/jina/blob/master/.github/images/cat-similar.png?raw=true" alt="Visualize visual similar images in Jina using ResNet50" width="50%"></a>
</p>

Sweet! FYI, you can use Keras, ONNX, or PaddlePaddle for the embedding model. Jina supports them well.

### As-a-Service in 10 Extra Lines <img align="right" src="https://github.com/jina-ai/jina/blob/master/.github/images/clock-7min.svg?raw=true"></img>

With a small refactoring and ten extra lines of code, you can turn the local script into a ready-to-serve service:

1. Import what we need.
```python
from jina import Document, DocumentArray, Executor, Flow, requests
```
2. Copy-paste the preprocessing step and wrap it in an `Executor`:
```python
class PreprocImg(Executor):
    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            (d.load_uri_to_image_blob()  # load
              .set_image_blob_normalization()  # normalize color
              .set_image_blob_channel_axis(-1, 0))  # switch color axis
```
3. Copy-paste the embedding step and wrap it in an `Executor`:
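The embedding code is collapsed out of this diff view; a minimal sketch of such an `EmbedImg` Executor, assuming the same pretrained ResNet50 as in the standalone script above (an approximation, not the exact original code):

```python
import torchvision
from jina import DocumentArray, Executor, requests


class EmbedImg(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # load ResNet50 once at startup and reuse it for every request
        self.model = torchvision.models.resnet50(pretrained=True)

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        docs.embed(self.model)  # same embedding call as in the standalone script
```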

4. Wrap the matching step in an `Executor`:
```python
class MatchImg(Executor):
    _da = DocumentArray()

    @requests(on='/index')
    def index(self, docs: DocumentArray, **kwargs):
        self._da.extend(docs)
        docs.clear()  # clear content to save bandwidth

    @requests(on='/search')
    def foo(self, docs: DocumentArray, **kwargs):
        docs.match(self._da)
        for d in docs.traverse_flat('r,m'):  # only required for visualization
            d.convert_uri_to_datauri()  # convert to data URI
            d.pop('embedding', 'blob')  # remove unnecessary fields to save bandwidth
```
5. Connect all `Executor`s in a `Flow` and scale the embedding to 3 replicas:
```python
f = (
    Flow(port_expose=12345, protocol='http')
    .add(uses=PreprocImg)
    .add(uses=EmbedImg, replicas=3)
    .add(uses=MatchImg)
)
```
Plot it via `f.plot('flow.svg')` and you get:
![](.github/images/readme-flow-plot.svg)

6. Index the image data and serve REST queries publicly:
```python
with f:
    f.post(
        '/index',
        DocumentArray.from_files('img/*.jpg'),
        show_progress=True,
        request_size=8,
    )
    f.block()
```

Or use a Python client to access the service:

```python
from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when the task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["cosine"].value:2f}: "{d.uri}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)
```

At this point, you have probably spent 15 minutes, but here we are: an image search service with rich features.


### Deploy to Kubernetes in 7 Minutes <img align="right" src="https://github.com/jina-ai/jina/blob/master/.github/images/clock-7min.svg?raw=true"></img>

Have another seven minutes? We'll show you how to bring your service to the next level by deploying it to Kubernetes.

1. Create a Kubernetes cluster and get credentials (example in GCP, [more K8s providers here](https://docs.jina.ai/advanced/experimental/kubernetes/#preliminaries)):
```bash
gcloud container clusters create test --machine-type e2-highmem-2 --num-nodes 1 --zone europe-west3-a
gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase
```
2. Move each `Executor` class to a separate folder with one Python file in each:
- `PreprocImg` -> 📁 `preproc_img/exec.py`
- `EmbedImg` -> 📁 `embed_img/exec.py`
- `MatchImg` -> 📁 `match_img/exec.py`
3. Push all Executors to [Jina Hub](https://hub.jina.ai):
```bash
jina hub push preproc_img
jina hub push embed_img
jina hub push match_img
```
You will get three Hub Executors that can be used as Docker containers.
4. Adjust `Flow` a bit and open it:
```python
f = (
    Flow(name='readme-flow', port_expose=12345, infrastructure='k8s')
    .add(uses='jinahub+docker://PreprocImg')
    .add(uses='jinahub+docker://EmbedImg', replicas=3)
    .add(uses='jinahub+docker://MatchImg')
)
with f:
    f.block()
```


Intrigued? [Find out more about Jina in our docs](https://docs.jina.ai).

## Run Quick Demo

- [👗 Fashion image search](https://docs.jina.ai/get-started/hello-world/fashion/): `jina hello fashion`
- [🤖 QA chatbot](https://docs.jina.ai/get-started/hello-world/covid-19-chatbot/): `pip install "jina[demo]" && jina hello chatbot`
- [📰 Multimodal search](https://docs.jina.ai/get-started/hello-world/multimodal/): `pip install "jina[demo]" && jina hello multimodal`
- 🍴 Fork the source of a demo to your folder: `jina hello fork fashion ../my-proj/`
- Create a new Jina project: `jina new hello-jina`


<!-- start support-pitch -->
