🎉 v0.4.0

@nan-wang nan-wang released this 30 Jul 02:43

Jina 0.4.0

We are excited to release Jina 0.4.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include automatic fallback to CPU when no GPU is available, FaissIndexer on GPU, and switching Indexers during querying.

Release 0.4.0

⬆️ Major Features and Improvements

Usability

  • Add a new value for the on_gpu field. Setting on_gpu: auto in the YAML config first checks whether a GPU device is available and falls back to CPU when no GPU is found. #617

  • Improve the accessibility of jina helloworld. We add a CLI argument to enable downloading via a proxy. If you use a proxy to speed up your internet connection, try jina helloworld --download-proxy http://127.0.0.1:1087, replacing the IP and port with your proxy settings. #595

  • Support switching between different Indexers during querying. A new argument, ref_indexer, is added for this purpose. With the following YAML config, NumpyIndexer is used for indexing and AnnoyIndexer is used for querying. The supported Indexers are FaissIndexer, AnnoyIndexer, NGTIndexer, NmslibIndexer, SptagIndexer, and NumpyIndexer.

    !AnnoyIndexer
    with:
        ref_indexer:
            !NumpyIndexer
            with:
                index_filename: wrap-npidx
    

    #599 #589

  • Add a new parameter skip-on-error for the Pods. This argument sets the level at which Jina skips errors. Check out more details in the Jina docs. #570

     !ImageReader
     with:
         skip-on-error: 'EXECUTOR'
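
The on_gpu: auto fallback described above can be pictured with the following sketch. It is not Jina's implementation: resolve_on_gpu and gpu_is_available are hypothetical helpers, and using the presence of an nvidia-smi binary as the GPU probe is an assumption made only to keep the example self-contained.

```python
import shutil


def gpu_is_available():
    """Crude probe (assumption): a visible nvidia-smi binary hints at a GPU."""
    return shutil.which("nvidia-smi") is not None


def resolve_on_gpu(on_gpu, probe=gpu_is_available):
    """Map an on_gpu setting (True, False, or 'auto') to a concrete boolean."""
    if isinstance(on_gpu, str):
        if on_gpu.lower() != "auto":
            raise ValueError(f"unsupported on_gpu value: {on_gpu!r}")
        # 'auto': use the GPU only if the probe finds one, otherwise use the CPU.
        return bool(probe())
    return bool(on_gpu)


# Simulate a machine without a GPU: 'auto' resolves to CPU execution.
print(resolve_on_gpu("auto", probe=lambda: False))
```

Passing the probe as an argument keeps the decision testable on machines with or without a GPU.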
    

Scalability

  • Multiple improvements have been made to speed up performance.
    • Improve the performance of NumpyIndexer. The argsort function is replaced by argpartition, which avoids unnecessary sorting and speeds up querying. #641

    • Switch to zmqstream for the default message handler, which improves the performance of networking. #618

    • Use uvloop from tornado to improve the event handling speed in the Pods. #615
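
To illustrate the argsort to argpartition change in NumpyIndexer, here is a minimal NumPy sketch. It does not reproduce the code from #641; the function names and the random distance matrix are assumptions for illustration. The idea is that a full sort orders every candidate, while a partition only isolates the k smallest, which are then sorted among themselves.

```python
import numpy as np


def top_k_argsort(dist, k):
    """Sort every row completely, then keep the first k indices."""
    return np.argsort(dist, axis=1)[:, :k]


def top_k_argpartition(dist, k):
    """Partition each row so the k smallest come first, then sort only those k."""
    # After partitioning, the first k columns hold the k smallest, in arbitrary order.
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(dist, idx, axis=1)
    order = np.argsort(part, axis=1)  # sorts k values per row, not the full row
    return np.take_along_axis(idx, order, axis=1)


rng = np.random.default_rng(0)
dist = rng.random((4, 1000))  # 4 queries against 1000 indexed vectors
full = top_k_argsort(dist, 5)
fast = top_k_argpartition(dist, 5)
print(np.array_equal(full, fast))
```

Both functions return the same indices when the distances contain no ties; the partition-based version avoids ordering the candidates that fall outside the top k.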

New Executors

  • Add NGTIndexer. NGT provides high-speed ANN searches for a large volume of data in high dimensional vector data space. #533

     !NGTIndexer
     with:
         index_filename: index.gz
         num_threads: 2
         metric: 'l2'
         epsilon: 0.1
    
  • Add support for running FaissIndexer on GPU and a new argument nprobe for FaissIndexer. Check out more details on usage in our examples. #636 #638

    !FaissIndexer
    with:
        index_filename: index.gz
        index_key: 'IVF10,PQ4'
        train_filepath: train.gz
        distance: 'l2'
        nprobe: 1
    
  • Add support for Milvus as a new Indexer. Now you can do indexing and querying with MilvusIndexer. [W.I.P] #651

  • Add CustomKerasImageEncoder so that you can use your customized Keras model to encode images in Jina. The following YAML config loads the model from path/to/your/model and uses the output of the layer named awesome/encoding/layer as the embedding. #563

    !CustomKerasImageEncoder
    with:
        model_path: path/to/your/model
        layer_name: awesome/encoding/layer
    
  • Add an argument search_k for AnnoyIndexer. #642

    !AnnoyIndexer
    with:
        index_filename: index.gz
        metric: 'euclidean'
        n_trees: 10
        search_k: -1
    
  • Add FastICAEncoder for encoding. #590

    !FastICAEncoder
    with:
        output_dim: 32
        num_features: 128
        whiten: False
    

Documentation

  • Welcome our evangelist @alexcg1 from New Zealand! He has been working hard on improving document readability, Jina 101, contribution guidelines and README retouches. A new document has been added to guide new contributors. #566

    #564
    #558
    #545

Unit tests

  • Add coverage testing. We are proud that Jina's current test coverage is 73.04%. #659

⚠️ Breaking Changes

  • Rename port_grpc to port_expose. We now support both gRPC and RESTful APIs, so port_grpc no longer lives up to its name. port_grpc will be deprecated in a future version. #598

  • Refactor ImageReader to inherit from BaseDocCrafter rather than DocSegmenter. If you are using ImageReader, check out our examples for more details. #627

  • Refactor Ranker. The TopKFilterDriver is now used to filter out the chunks that do not belong to the top k documents. This driver is attached to Ranker by default. For DocPbIndexer and DataURIPbIndexer, TopKFilterDriver is removed from the default attachment. With k shards, this leads to n * k results returned from the indexer when querying. #574

  • Remove the password_stdin argument for the jina hub CLI. #569
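
The filtering step attributed to TopKFilterDriver above can be sketched as follows. This is an illustration under stated assumptions, not the driver's actual code: ChunkMatch, its fields, and filter_to_top_k_docs are hypothetical names, and the ranked list of document IDs is assumed to come from the Ranker.

```python
from dataclasses import dataclass


@dataclass
class ChunkMatch:
    chunk_id: int
    doc_id: int   # parent document of the matched chunk
    score: float


def filter_to_top_k_docs(chunks, ranked_doc_ids, k):
    """Keep only the chunks whose parent document is among the top k documents."""
    keep = set(ranked_doc_ids[:k])
    return [c for c in chunks if c.doc_id in keep]


chunks = [
    ChunkMatch(chunk_id=0, doc_id=10, score=0.9),
    ChunkMatch(chunk_id=1, doc_id=11, score=0.8),
    ChunkMatch(chunk_id=2, doc_id=12, score=0.4),
    ChunkMatch(chunk_id=3, doc_id=10, score=0.3),
]
ranked_doc_ids = [10, 11, 12]  # best document first (assumed Ranker output)
kept = filter_to_top_k_docs(chunks, ranked_doc_ids, k=2)
print([c.chunk_id for c in kept])
```

Chunks from document 12 are dropped because that document falls outside the top 2, while both chunks of document 10 survive.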

🐞 Bug Fixes and Other Changes

Flow

  • Fix the search_lines API for the Flow #606

Executors

  • Add a new argument truncation_strategy in BaseTransformerEncoder to adapt to the latest Hugging Face Transformers release, v3.0.0. #623
    !TransformerTorchEncoder
    with:
        pooling_strategy: cls
        model_name: distilbert-base-cased
        max_length: 96
        truncation_strategy: longest_first
    
  • Add size property for the indexers. #581

Drivers

  • Add a new driver UnaryEncoderDriver dedicated to testing and debugging. #635

  • Fix a problem with PublishDriver. PublishDriver is used to modify num_parts when a pod is connected to another through a PUB-SUB connection. The problem was that PublishDriver overwrote the original driver of the pod. #569

  • Remove the if clauses from the Drivers. #646

Protos

  • Add a tags field to the Chunk and Document protos. The tags field is a map of strings designed to store the values of other fields that will be used for filtering. #574

  • Add a location field for the Chunks. location is a list of integers. It can be used to mark the position of a string, the coordinates of an image, or the timestamp of an audio clip. #578
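
As a rough picture of how the tags and location fields might be used, the sketch below mirrors them with a plain Python dataclass. It does not use the real protobuf messages: the Chunk class, the [start, end] interpretation of location, and the filter_by_tag helper are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    text: str
    tags: Dict[str, str] = field(default_factory=dict)  # map of strings
    location: List[int] = field(default_factory=list)   # list of integers


def filter_by_tag(chunks, key, value):
    """Return the chunks whose tags map contains key with exactly this value."""
    return [c for c in chunks if c.tags.get(key) == value]


doc = "Jina is neural search. It runs on the cloud."
first, second = "Jina is neural search.", "It runs on the cloud."
chunks = [
    # location is read here as [start, end] character offsets into doc (assumption).
    Chunk(first, tags={"lang": "en", "kind": "claim"}, location=[0, len(first)]),
    Chunk(second, tags={"lang": "en", "kind": "detail"},
          location=[doc.index(second), len(doc)]),
]
claims = filter_by_tag(chunks, "kind", "claim")
print([c.text for c in claims])
```

For an image or an audio clip, the same integer list could instead hold coordinates or a timestamp, as the notes above describe.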

Tests

🙏 Thanks to our Contributors

This release contains contributions from hanxiao, JoanFM, nan-wang, fhaase2, anish2197, alexcg1, BingHo1013, shivam-raj, Morriaty-The-Murderer, festeh, generall, emmaadesile, coolmian, JamesTang616, and YueLiu-jina.

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.

馃 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.