Feathersjs adapter for Elasticsearch
Clone or download
Latest commit bd04ad1 Dec 16, 2018

README.md

feathers-elasticsearch

Greenkeeper badge

Build Status Dependency Status Download Status

feathers-elasticsearch is a database adapter for Elasticsearch. This adapter is not using any ORM, it is dealing with the database directly through the elasticsearch.js client.

$ npm install --save elasticsearch feathers-elasticsearch

Important: feathers-elasticsearch implements the Feathers Common database adapter API and querying syntax.

Getting Started

The following bare-bones example will create a messages endpoint and connect to a local messages type in the test index in your Elasticsearch database:

const feathers = require('@feathersjs/feathers');
const elasticsearch = require('elasticsearch');
const service = require('feathers-elasticsearch');

app.use('/messages', service({
  Model: new elasticsearch.Client({
    host: 'localhost:9200',
    apiVersion: '5.0'
  }),
  elasticsearch: {
    index: 'test',
    type: 'messages'
  }
}));

Options

The following options can be passed when creating a new Elasticsearch service:

  • Model (required) - The Elasticsearch client instance.
  • elasticsearch (required) - Configuration object for elasticsearch requests. The required properties are index and type. Apart from that you can specify anything that should be passed to all requests going to Elasticsearch. Another recognised property is refresh which is set to false by default. Anything else use at your own risk.
  • paginate [optional] - A pagination object containing a default and max page size (see the Pagination documentation).
  • esVersion (default: '2.4') [optional] - A string indicating which version of Elasticsearch the service is supposed to be talking to. Based on this setting the service will choose compatible API. If you plan on using Elasticsearch 6.0+ features (e.g. join fields) it's quite important to have it set, as there were breaking changes in Elasticsearch 6.0.
  • id (default: '_id') [optional] - The id property of your documents in this service.
  • parent (default: '_parent') [optional] - The parent property, which is used to pass document's parent id.
  • routing (default: '_routing') [optional] - The routing property, which is used to pass document's routing parameter.
  • join (default: undefined) [optional] - Elasticsearch 6.0+ specific. The name of the join field defined in the mapping type used by the service. It is required for parent-child relationship features (e.g. setting a parent, $child and $parent queries) to work.
  • meta (default: '_meta') [optional] - The meta property of your documents in this service. The meta field is an object containing elasticsearch specific information, e.g. _score, _type, _index, _parent, _routing and so forth. It will be stripped off from the documents passed to the service.

Complete Example

Here's an example of a Feathers server that uses feathers-elasticsearch.

const feathers = require('@feathersjs/feathers');
const rest = require('@feathersjs/express/rest');
const express = require('@feathersjs/express');

const service = require('feathers-elasticsearch');
const elasticsearch = require('elasticsearch');

const messageService = service({
  Model: new elasticsearch.Client({
    host: 'localhost:9200',
    apiVersion: '6.0'
  }),
  paginate: {
    default: 10,
    max: 50
  },
  elasticsearch: {
    index: 'test',
    type: 'messages'
  },
  esVersion: '6.0'
});

// Initialize the application
const app = express(feathers());

// Needed for parsing bodies (login)
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
// Enable REST services
app.configure(express.rest());
// Initialize your feathers plugin
app.use('/messages', messageService);
app.use(express.errorHandler());;

app.listen(3030);

console.log('Feathers app started on 127.0.0.1:3030');

You can run this example by using npm start and going to localhost:3030/messages. You should see an empty array. That's because you don't have any messages yet but you now have full CRUD for your new message service!

Supported Elasticsearch specific queries

On top of the standard, cross-adapter queries, feathers-elasticsearch also supports Elasticsearch specific queries.

$all

The simplest query match_all. Find all documents.

query: {
  $all: true
}

$prefix

Term level query prefix. Find all documents which have given field containing terms with a specified prefix (not analyzed).

query: {
  user: {
    $prefix: 'bo'
  }
}

$wildcard

Term level query wildcard. Find all documents which have given field containing terms matching a wildcard expression (not analyzed).

query: {
  user: {
    $wildcard: 'B*b'
  }
}

$regexp

Term level query regexp. Find all documents which have given field containing terms matching a regular expression (not analyzed).

query: {
  user: {
    $regexp: 'Bo[xb]'
  }
}

$exists

Term level query exists. Find all documents that have at least one non-null value in the original field (not analyzed).

query: {
  $exists: ['phone', 'address']
}

$missing

The inverse of exists. Find all documents missing the specified field (not analyzed).

query: {
  $missing: ['phone', 'address']
}

$match

Full text query match. Find all documents which have given given fields matching the specified value (analysed).

query: {
  bio: {
    $match: 'javascript'
  }
}

$phrase

Full text query match_phrase. Find all documents which have given given fields matching the specified phrase (analysed).

query: {
  bio: {
    $phrase: 'I like JavaScript'
  }
}

$phrase_prefix

Full text query match_phrase_prefix. Find all documents which have given given fields matching the specified phrase prefix (analysed).

query: {
  bio: {
    $phrase_prefix: 'I like JavaS'
  }
}

$child

Joining query has_child. Find all documents which have children matching the query. The $child query is essentially a full-blown query of its own. The $child query requires $type property.

Elasticsearch 6.0 change

Prior to Elasticsearch 6.0, the $type parameter represents the child document type in the index. As of Elasticsearch 6.0, the $type parameter represents the child relationship name, as defined in the join field.

query: {
  $child: {
    $type: 'blog_tag',
    tag: 'something'
  }
}

$parent

Joining query has_parent. Find all documents which have parent matching the query. The $parent query is essentially a full-blown query of its own. The $parent query requires $type property.

Elasticsearch 6.0 change

Prior to Elasticsearch 6.0, the $type parameter represents the parent document type in the index. As of Elasticsearch 6.0, the $type parameter represents the parent relationship name, as defined in the join field.

query: {
  $parent: {
    $type: 'blog',
    title: {
      $match: 'javascript'
    }
  }
}

$and

This operator does not translate directly to any Elasticsearch query, but it provides support for Elasticsearch array datatype. Find all documents which match all of the given criteria. As any field in Elasticsearch can contain an array, therefore sometimes it is important to match more than one value per field.

query: {
  $and: [
    { notes: { $match: 'javascript' } },
    { notes: { $match: 'project' } }
  ]
}

There is also a shorthand version of $and for equality. For instance:

query: {
  $and: [
    { tags: 'javascript' },
    { tags: 'react' }
  ]
}

Can be also expressed as:

query: {
  tags: ['javascript', 'react']
}

$sqs

simple_query_string. A query that uses the SimpleQueryParser to parse its context. Optional $operator which is set to or by default but can be set to and if required.

query: {
  $sqs: {
    $fields: [
      'title^5',
      'description'
    ],
    $query: '+like +javascript',
    $operator: 'and'
  }
}

This can also be expressed in an URL as the following:

http://localhost:3030/users?$sqs[$fields][]=title^5&$sqs[$fields][]=description&$sqs[$query]=+like +javascript&$sqs[$operator]=and

Parent-child relationship

Elasticsearch supports parent-child relationship however it is not exactly the same as in relational databases. To make things even more interesting, the relationship principles were slightly different up to (version 5.6)[https://www.elastic.co/guide/en/elasticsearch/reference/5.6/mapping-parent-field.html] and from (version 6.0+)[https://www.elastic.co/guide/en/elasticsearch/reference/6.0/parent-join.html] onwards.

Even though Elasticsearch's API changed in that matter, feathers-elasticsearch offers consistent API across those changes. That is actually the main reason why the esVersion and join service options have been introduced (see the "Options" section of this manual). Having said that, it is important to notice that there are but subtle differences, which are outline below and in the description of $parent and $child queries.

Overview

feathers-elasticsearch supports all CRUD operations for Elasticsearch types with parent mapping, and does that with the Elasticsearch constrains. Therefore:

  • each operation concering a single document (create, get, patch, update, remove) is required to provide parent id
  • creating documents in bulk (providing a list of documents) is the same as many single document operations, so parent id is required as well
  • to avoid any doubts, none of the query based operations (find, bulk patch, bulk remove) can use the parent id

Elasticsearch <= 5.6

Parent id should be provided as part of the data for the create operations (single and bulk):

postService.create({
  _id: 123,
  text: 'JavaScript may be flawed, but it\'s better than Java anyway.'
});

commentService.create({
  _id: 1000,
  _parent: 123,
  text: 'You cannot be serious.'
})

Please note, that name of the parent property (_parent by default) is configurable through the service options, so that you can set it to whatever suits you.

For all other operations (get, patch, update, remove), the parent id should be provided as part of the query:

childService.remove(
  1000,
  { query: { _parent: 123 } }
);

Elasticsearch >= 6.0

As the parent-child relationship changed in Elasticsearch 6.0, it is now expressed by the join datatype. Everything said above about the parent id holds true, although there is one more detail to be taken into account - the relationship name.

Let's consider the following mapping:

{
  mappings: {
    doc: {
      properties: {
        text: {
          type: 'text'
        },
        my_join_field: { 
          type: 'join',
          relations: {
            post: 'comment' 
          }
        }
      }
    }
  }
}

Parent id (for children) and relationship name (for children and parents) should be provided for as part of the data for the create operations (single and bulk):

docService.create({
  _id: 123,
  text: 'JavaScript may be flawed, but it\'s better than Java anyway.',
  my_join_field: 'post'
});

docService.create({
  _id: 1000,
  _parent: 123,
  text: 'You cannot be serious.',
  my_join_field: 'comment'
})

Please note, that name of the parent property ('_parent' by default) and the join property (undefined by default) are configurable through the service options, so that you can set it to whatever suits you.

For all other operations (get, patch, update, remove), the parent id should be provided as part of the query:

docService.remove(
  1000,
  { query: { _parent: 123 } }
);

Supported Elasticsearch versions

feathers-elasticsearch is currently tested on Elasticsearch 2.4, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 6.0, 6.1 and 6.2. Please note, even though the lowest version supported is 2.4, that does not mean it wouldn't work fine on anything lower than 2.4.

Quirks

Updating and deleting by query

Elasticsearch is special in many ways. For example, the "update by query" API is still considered experimental and so is the "delete by query" API introduced in Elasticsearch 5.0.

Just to clarify - update in Elasticsearch is an equivalent to patch in feathers. I will use patch from now on, to set focus on the feathers side of the fence.

Considering the above, our implementation of path / remove by query uses combo of find and bulk patch / remove, which in turn means for you:

  • Standard pagination is taken into account for patching / removing by query, so you have no guarantee that all existing documents matching your query will be patched / removed.
  • The operation is a bit slower than it could potentially be, because of the two-step process involved.

Considering, however that elasticsearch is mainly used to dump data in it and search through it, I presume that should not be a great problem.

Search visibility

Please be aware that search visibility of the changes (creates, updates, patches, removals) is going to be delayed due to Elasticsearch index.refresh_interval setting. You may force refresh after each operation by setting the service option elasticsearch.refresh as decribed above but it is highly discouraged due to Elasticsearch performance implications.

Full-text search

Currently feathers-elasticsearch supports most important full-text queries in their default form. Elasticsearch search allows additional parameters to be passed to each of those queries for fine-tuning. Those parameters can change behaviour and affect peformance of the queries therefore I believe they should not be exposed to the client. I am considering ways of adding them safely to the queries while retaining flexibility.

Performance considerations

None of the data mutating operations in Elasticsearch v2.4 (create, update, patch, remove) returns the full resulting document, therefore I had to resolve to using get as well in order to return complete data. This solution is of course adding a bit of an overhead, although it is also compliant with the standard behaviour expected of a feathers database adapter.

The conceptual solution for that is quite simple. This behaviour will be configurable through a lean switch allowing to get rid of those additional gets should they be not needed for your application. This feature will be added soon as well.

Upsert capability

An upsert parameter is available for the create operation that will update the document if it exists already instead of throwing an error.

postService.create({
  _id: 123,
  text: 'JavaScript may be flawed, but it\'s better than Ruby.'
},
{ 
  upsert: true
})

Additionally, an upsert parameter is also available for the update operation that will create the document if it doesn't exist instead of throwing an error.

postService.update(123, {
  _id: 123,
  text: 'JavaScript may be flawed, but Feathers makes it fly.'
},
{ 
  upsert: true
})

Born out of need

feathers-elasticsearch was born out of need. When I was building Hacker Search (a real time search engine for Hacker News), I chose Elasticsearch for the database and Feathers for the application framework. All well and good, the only snag was a missing adapter, which would marry the two together. I decided to take a detour from the main project and create the missing piece. Three weeks had passed and the result was... another project (typical, isn't it). Everything went to plan however, and Hacker Search has been happily using feathers-elasticsearch since its first release.

If you want to see the adapter in action, jump on Hacker Search and watch the queries sent from the client. Feel free to play around with the API.

License

Copyright (c) 2018

Licensed under the MIT license.