GitHub - ra312/jina: An easier way to build neural search on the cloud

Cloud-Native Neural Search Framework for Any Kind of Data

Jina allows you to build deep learning-powered search-as-a-service in just minutes.

🌌 Universal data type - Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.

🌩️ Fast & cloud-native - Distributed architecture from day one. Scalable & cloud-native by design: enjoy containerizing, distributing, sharding, async, REST/gRPC/WebSocket.

⏱️ Save time - Follow the design pattern of neural search systems to go from zero to a production-ready system in minutes.

🍱 Own your stack - Keep end-to-end ownership of your solution's stack and avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.

Installation

Jina 2.0 is still in pre-release; add --pre to install it. Why 2.0?

$ pip install --pre jina
$ jina -v
2.0.0rcN

via Docker

$ docker run jinaai/jina:master -v
2.0.0rcN

📦 More installation options

x86/64, arm/v6, v7, v8 (Apple M1)	On Linux/macOS & Python 3.7/3.8/3.9	Docker Users
Standard	`pip install --pre jina`	`docker run jinaai/jina:master`
Daemon	`pip install --pre "jina[daemon]"`	`docker run --network=host jinaai/jina:master-daemon`
With Extras	`pip install --pre "jina[devel]"`	`docker run jinaai/jina:master-devel`

Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.

Get Started

Document, Executor, and Flow are the three fundamental concepts in Jina.

📄 Document is the basic data type in Jina;
⚙️ Executor is how Jina processes Documents;
🔀 Flow is how Jina streamlines and distributes Executors.

Copy-paste the minimum example below and run it:

💡 Preliminaries: character embedding, pooling, Euclidean distance

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable char
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.score.value)  # sort matches by its value

f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a flow, with 2 parallel CharEmbed, tho unnecessary
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of this file
    f.block()  # block for listening request

Keep the above running and start a simple client:

from jina import Client, Document

def print_matches(req):  # the callback function invoked when task is done
    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:2f}: "{d.text}"')
        
c = Client(host='localhost', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

It finds the lines most similar to "request(on=something)" in the server code snippet and prints the following:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"

😔 Doesn't work? Our bad! Please report it here.
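
To make the preliminaries (character embedding, pooling, Euclidean distance) concrete, here is a minimal sketch that redoes the same arithmetic with only the Python standard library, without Jina or NumPy. The names `char_embed` and `rank` are illustrative and not part of Jina's API. The sketch relies on the observation that averaging one-hot character vectors is the same as building a normalized character histogram, and it assumes non-empty input text.

```python
import math

OFFSET = 32             # ASCII code of the space character (first printable char)
DIM = 127 - OFFSET + 1  # 96 slots; the last one also collects out-of-range chars


def char_embed(text):
    """Mean-pooled one-hot character embedding (a normalized histogram).

    Assumes `text` is non-empty, as the example above skips blank lines.
    """
    vec = [0.0] * DIM
    for c in text:
        idx = ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1
        vec[idx] += 1.0
    return [v / len(text) for v in vec]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, candidates):
    """Return (distance, candidate) pairs, nearest candidate first."""
    q = char_embed(query)
    scored = [(euclidean(q, char_embed(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0])


# A few hypothetical candidate lines, ranked against the same query as above.
lines = ["import numpy as np", "@requests(on='/index')", "with f:"]
for dist, line in rank("request(on=something)", lines):
    print(f'{dist:.6f}: "{line}"')
```

With these three candidates, `@requests(on='/index')` ranks first with a distance of about 0.168526, the same score as in the client output above. Because the embedding only counts characters, lines that share many characters with the query rank highly regardless of word order or meaning.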

Run Quick Demo

👗 Fashion image search: jina hello fashion
🤖 QA chatbot: pip install --pre "jina[chatbot]" && jina hello chatbot
📰 Multimodal search: pip install --pre "jina[multimodal]" && jina hello multimodal

Fork Demo & Build Your Own

Copy the source code of a hello world to your own directory and start from there:

$ jina hello fork fashion ../my-proj/

Read Tutorials

🧠 What is "Neural Search"?
📄 Document & DocumentArray: the basic data type in Jina.
⚙️ Executor: how Jina processes Documents.
🔀 Flow: how Jina streamlines and distributes Executors.
🧼 Write clean code in Jina
📓 Developer References
😎 3 Reasons to use Jina 2.0

Support

Join our Slack community to chat with our engineers about your use cases, questions, and support queries.
Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (calendar link/.ics) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel.

Join Us

Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

Contributing guidelines
Code of conduct - play nicely with the Jina community
Good first issues
Release cycles and development stages
Upcoming features - what's being planned, what we're thinking about.
