Any Tutorial / Example for writing a Presto-ElasticSearch connector ? #3057

sumanth232 · 2015-06-09T13:14:28Z

I want to write an Elasticsearch connector so that I can perform JOINs on Elasticsearch data using Presto.
Can anyone please suggest how to start? Any guidance would be a great help.

Downchuck · 2015-06-09T20:03:52Z

ElasticSearch has several Java clients available -- use the scroll method for large result sets:
https://www.elastic.co/guide/en/elasticsearch/guide/master/scan-scroll.html

Crate.io has an SQL adapter on top of the ElasticSearch code base, which could be of some help.

sumanth232 · 2015-06-10T09:41:40Z

Crate does not support JOINs.
It's mentioned in the FAQ: https://crate.io/docs/faq/

Q: Does Crate support JOINs?

Not yet. JOINs are on our roadmap and we try to do them the right way in usual quality and well performing.
For a lot of use cases there are other ways to achieve the same result as with JOINs. The best way to start is to take a look at the ARRAY and OBJECT data types to denormalise your data.

dain · 2015-06-10T16:12:09Z

I would start by forking the https://github.com/facebook/presto/tree/master/presto-example-http plugin and adapting it to read from Elasticsearch. Last time I used Elasticsearch, the APIs were all REST-based, so the presto-example-http plugin should be pretty close to what you need. Once you get that working, you'll want to work on predicate pushdown, but I'd start by just getting it to read at all.

-dain

dain · 2015-06-24T17:46:48Z

In the legacy SPI that the example connector implements, a table is logically divided into partitions, and partitions are divided into splits. A partition can provide a TupleDomain, which describes the bounds of the values present in the partition; Presto can use it to skip sections of the table that cannot match the filter predicate. A split is simply a part of a partition.

Presto will enumerate and filter the partitions and then enumerate the splits for the partitions. Then Presto reads data in parallel from splits.

If your system does not support parallel reading, simply return a single Partition and a single Split. If your system has a more sophisticated physical layout, you will want to use the new TableLayouts SPI so that Presto can take advantage of the data organization.
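
The enumerate, filter, then read flow described above can be illustrated with a small self-contained model. This is only a sketch: `Range`, `Partition`, and `planSplits` are hypothetical names invented for illustration and are not Presto SPI types; `Range` is a much simplified stand-in for what a TupleDomain describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SplitPlanningSketch {
    /** Hypothetical closed value range, a simplified stand-in for what a TupleDomain describes. */
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean overlaps(Range other) {
            return min <= other.max && other.min <= max;
        }
    }

    /** Hypothetical partition: the bounds of its values plus the splits it is divided into. */
    static final class Partition {
        final Range bounds;
        final List<String> splits;

        Partition(Range bounds, List<String> splits) {
            this.bounds = bounds;
            this.splits = splits;
        }
    }

    /** Enumerates partitions, skips those that cannot match the filter, and collects the remaining splits. */
    static List<String> planSplits(List<Partition> partitions, Range filter) {
        List<String> splits = new ArrayList<>();
        for (Partition partition : partitions) {
            if (partition.bounds.overlaps(filter)) {
                splits.addAll(partition.splits);
            }
        }
        return splits;
    }

    /** Two partitions keyed by a yyyymmdd value; the split names are made up for illustration. */
    static List<Partition> partitionedTable() {
        return Arrays.asList(
                new Partition(new Range(20150501, 20150531), Arrays.asList("may/split-0", "may/split-1")),
                new Partition(new Range(20150601, 20150630), Arrays.asList("june/split-0")));
    }

    /** A system without parallel reads: one unbounded partition holding one split. */
    static List<Partition> singleSplitTable() {
        Range everything = new Range(Long.MIN_VALUE, Long.MAX_VALUE);
        return Collections.singletonList(new Partition(everything, Collections.singletonList("only-split")));
    }

    public static void main(String[] args) {
        Range filter = new Range(20150610, 20150620);
        System.out.println(planSplits(partitionedTable(), filter));
        System.out.println(planSplits(singleSplitTable(), filter));
    }
}
```

With the June filter, the May partition is skipped entirely and only `june/split-0` remains to be read, while the single-partition table always yields its one split.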

sumanth232 · 2015-06-25T13:47:47Z

I wrote a basic connector with the necessary classes implemented. I also added a .properties file in 'etc/catalog' and
edited the plugin.bundles property in 'etc/config.properties' (added ../presto-elasticsearch/pom.xml):

plugin.bundles=\
  ../presto-raptor/pom.xml,\
  ../presto-hive-cdh4/pom.xml,\
  ../presto-example-http/pom.xml,\
  ../presto-kafka/pom.xml,\
  ../presto-tpch/pom.xml,\
  ../presto-elasticsearch/pom.xml,\
  ../presto-mysql/pom.xml

but I get this error :

2015-06-25T19:16:22.214+0530    INFO    main    com.facebook.presto.metadata.CatalogManager -- Loading catalog etc/catalog/elasticsearch.properties --
2015-06-25T19:16:22.215+0530    ERROR   main    com.facebook.presto.server.PrestoServer No factory for connector elasticsearch
java.lang.IllegalArgumentException: No factory for connector elasticsearch

The problem is here :

private void loadPlugin(URLClassLoader pluginClassLoader)
        throws Exception
{
    // Look up Plugin implementations through the plugin's own class loader
    ServiceLoader<Plugin> serviceLoader = ServiceLoader.load(Plugin.class, pluginClassLoader);
    List<Plugin> plugins = ImmutableList.copyOf(serviceLoader);

    if (plugins.isEmpty()) {
        log.warn("No service providers of type %s", Plugin.class.getName());
    }

    // Install every plugin that was discovered
    for (Plugin plugin : plugins) {
        log.info("Installing %s", plugin.getClass().getName());
        installPlugin(plugin);
    }
}

When loading my new plugin, the plugins list is empty (size 0), whereas for the existing plugins its size is 1:

List<Plugin> plugins = ImmutableList.copyOf(serviceLoader);

Can you please help me understand why the first two lines of loadPlugin do not find my plugin?

Can you please elaborate on this part of the Developer Docs, which I could not fully understand?

Each plugin identifies an entry point: an implementation of the Plugin interface. 
This class name is provided to Presto via the standard Java ServiceLoader interface: 
the classpath contains a resource file named com.facebook.presto.spi.Plugin in the META-INF/services directory. 
The content of this file is a single line listing the name of the plugin class:

How should I provide the class name of my new plugin to Presto?
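
The ServiceLoader mechanism that the quoted docs describe can be tried in isolation. The sketch below does not use the Presto SPI: `DemoPlugin` and `ElasticsearchDemoPlugin` are hypothetical stand-ins for `com.facebook.presto.spi.Plugin` and a connector's plugin class. It writes a provider-configuration file into a temporary `META-INF/services` directory and shows that ServiceLoader finds the provider only when that file exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ServiceLoaderSketch {
    /** Hypothetical stand-in for com.facebook.presto.spi.Plugin. */
    public interface DemoPlugin {
        String name();
    }

    /** Hypothetical plugin entry point; ServiceLoader needs a public no-argument constructor. */
    public static final class ElasticsearchDemoPlugin implements DemoPlugin {
        public ElasticsearchDemoPlugin() {}

        @Override
        public String name() {
            return "elasticsearch";
        }
    }

    /** Builds a throwaway "plugin directory" and returns the names of the providers found in it. */
    static List<String> discover(boolean writeDescriptor) {
        try {
            Path root = Files.createTempDirectory("plugin-classes");
            if (writeDescriptor) {
                Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
                // File name: the service interface's class name. Content: one provider class name per line.
                Files.write(services.resolve(DemoPlugin.class.getName()),
                        (ElasticsearchDemoPlugin.class.getName() + "\n").getBytes(StandardCharsets.UTF_8));
            }
            URL[] urls = {root.toUri().toURL()};
            List<String> names = new ArrayList<>();
            // The parent loader supplies the classes; the temp directory supplies only the descriptor
            try (URLClassLoader loader = new URLClassLoader(urls, ServiceLoaderSketch.class.getClassLoader())) {
                for (DemoPlugin plugin : ServiceLoader.load(DemoPlugin.class, loader)) {
                    names.add(plugin.name());
                }
            }
            return names;
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with descriptor:    " + discover(true));
        System.out.println("without descriptor: " + discover(false));
    }
}
```

Without the descriptor the loader returns nothing, which corresponds to the empty plugins list in loadPlugin.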

sumanth232 · 2015-06-25T15:08:08Z

The above problem was solved after I added a file named 'com.facebook.presto.spi.Plugin' in the 'META-INF/services' directory:

presto/presto-elasticsearch/src/main/resources/META-INF/services/com.facebook.presto.spi.Plugin

However, I observed that for the other connectors (except tpch), the same file is present in a different directory:

presto/presto-kafka/target/classes/META-INF/services/com.facebook.presto.spi.Plugin
presto/presto-raptor/target/classes/META-INF/services/com.facebook.presto.spi.Plugin
...
...
kafka and raptor do not have a 'src/main/resources/META-INF/services' directory at all.

So how is the ServiceLoader loading the kafka and raptor connectors?
Can anybody please explain?

sumanth232 · 2015-07-03T11:11:13Z

Please tell me how to accurately implement these 3 functions of the 'RecordCursor' interface when writing a connector:

    long getTotalBytes();

    long getCompletedBytes();

    long getReadTimeNanos();

Please help.

electrum · 2015-07-03T16:23:34Z

Those functions are only for stats. If they don't mean anything for your connector or that info is not available just return 0.
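
As a rough illustration of both options, here is a self-contained sketch. `CursorStats` is a local interface that only mirrors the three method signatures quoted above; it is not the Presto `RecordCursor` interface. `NoStatsCursor` follows the "just return 0" advice, while `CountingCursor` is one assumed way to track the numbers when the source makes them available.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

final class CursorStatsSketch {
    /** Local stand-in mirroring the three stats methods quoted above; not the Presto interface. */
    interface CursorStats {
        long getTotalBytes();

        long getCompletedBytes();

        long getReadTimeNanos();
    }

    /** Option 1: the stats mean nothing for this connector, so every method returns 0. */
    static final class NoStatsCursor implements CursorStats {
        @Override public long getTotalBytes() { return 0; }
        @Override public long getCompletedBytes() { return 0; }
        @Override public long getReadTimeNanos() { return 0; }
    }

    /** Option 2 (an assumption, not from the thread): count bytes and time as rows are consumed. */
    static final class CountingCursor implements CursorStats {
        private final Iterator<String> rows;
        private final long totalBytes;
        private long completedBytes;
        private long readTimeNanos;

        CountingCursor(List<String> rows) {
            this.rows = rows.iterator();
            long total = 0;
            for (String row : rows) {
                total += row.getBytes(StandardCharsets.UTF_8).length;
            }
            this.totalBytes = total;
        }

        /** Moves to the next row, updating the byte counter and the accumulated read time. */
        boolean advanceNextPosition() {
            long start = System.nanoTime();
            boolean hasNext = rows.hasNext();
            if (hasNext) {
                completedBytes += rows.next().getBytes(StandardCharsets.UTF_8).length;
            }
            readTimeNanos += System.nanoTime() - start;
            return hasNext;
        }

        @Override public long getTotalBytes() { return totalBytes; }
        @Override public long getCompletedBytes() { return completedBytes; }
        @Override public long getReadTimeNanos() { return readTimeNanos; }
    }
}
```

Since the values are only used for stats, the counting variant is worth the effort only if the numbers are cheap to obtain.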

sumanth232 · 2015-07-06T10:41:58Z

@electrum, @dain: Does Presto support dynamic columns in tables (for example, data stores which contain JSON documents, where new properties can be added to a JSON doc residing in an index/type)?
In the Example connector, all the columns are hardcoded in 'example-metadata.json'. What if new columns are added to the CSV doc? How can these newly added columns in the CSV (or new properties in JSON docs in Elasticsearch indices) be handled without restarting the Presto server every time a new column is added to a table? Can this be handled by Presto at all? Any suggestions will be of great help.
Thanks.

RobinUS2 · 2015-11-19T16:00:21Z

What's the status on this one? Any progress made? Thanks!

sumanth232 · 2015-11-19T16:22:40Z

@RobinUS2 , here is a basic version of the connector. It needs to be developed further and optimised.
#3240

corneversloot · 2016-02-08T08:27:56Z

Slightly off topic but still relevant for people looking into this topic: we have released a first version of a JDBC driver for Elasticsearch called sql4es. It supports most common SQL statements and can be used from any system supporting the JDBC interface.

ebuildy · 2016-10-25T08:32:08Z

Wouldn't it be better to have a connector to Apache Lucene instead of the Elasticsearch HTTP API?

BTW, you could use a Hive external table backed by Elasticsearch (via the official elasticsearch-hadoop Hive connector) and query it from PrestoDB.

corneversloot · 2016-10-25T11:24:18Z

Well, first of all, the driver uses the transport API and not the HTTP one. I think you actually do want to use Elasticsearch because it provides distributed query execution and high availability.

The Hive connection you mention should work, I think, although I must admit I have never used it.

rohanarora0921 · 2017-06-12T19:25:32Z

@sumanth232 Are you still working on this? Any progress?

eulalie367 · 2017-07-21T22:28:48Z

https://github.com/albertocsm/presto/tree/master/presto-elasticsearch

dzen · 2017-08-03T07:49:39Z

I found this other fork today : https://github.com/ebyhr/presto, with an elastic branch.

findepi · 2018-06-12T09:30:32Z

This issue is obsolete now, closing.

As for the Elasticsearch connector: if the above-mentioned implementations are applicable to a general audience, it would be valuable to have one in the Presto codebase.

haitaoyao mentioned this issue Nov 27, 2015

Presto Connector elastic/elasticsearch-hadoop#379

Closed

findepi closed this as completed Jun 12, 2018
