Presto_Elasticsearch_connector #14

Open
wants to merge 1 commit into
base: main
132 changes: 132 additions & 0 deletions PRESTO_ELASTICSEARCH_CONNECTOR_TUTORIAL
@@ -0,0 +1,132 @@
# ELASTICSEARCH CONNECTOR TUTORIAL

## Overview

The Elasticsearch Connector allows access to Elasticsearch data from Presto. This tutorial describes how to set up the Elasticsearch Connector to run SQL queries against Elasticsearch.

### Note:
Elasticsearch 6.0.0 or later is required.

## Installation

Step 1: Install Elasticsearch

Download and extract Elasticsearch

### Note:
This tutorial was tested with Elasticsearch version 8.8.1.

Run the Elasticsearch with the following command
Collaborator

Suggest "Run Elasticsearch with the following command" or maybe "To run Elasticsearch, run the following command".

Member

Please provide a docker compose example.



$ bin/elasticsearch
Collaborator

Consider formatting commands the user must enter with a backtick before and after, like this: `bin/elasticsearch`. Consider this formatting throughout the tutorial to mark commands.

[2023-06-21T11:14:25,557][INFO ][o.e.n.Node ] [ip-172-31-18-24.ap-south-1.compute.internal] version[8.8.1], pid[6057], build[tar/f8edfccba429b6477927a7c1ce1bc6729521305e/2023-06-05T21:32:25.188464208Z], OS[Linux/6.1.29-50.88.amzn2023.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/20.0.1/20.0.1+9-29]
Collaborator

If this is the initial output of the command bin/elasticsearch, suggest introducing this with something like "The output should be similar to the following", or some other description.

....

This starts Elasticsearch on port 9200.
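
A reviewer asked for a Docker Compose alternative to the tarball install. The following is a minimal sketch, not a tested deployment: it writes a single-node compose file from the shell. The image tag matches the 8.8.1 version this tutorial was tested with. The service name `elasticsearch`, the file location, and the choice to disable security (so that plain `http://localhost:9200/` works as in the rest of this tutorial) are assumptions suited only to local experimentation.

```shell
# Write a single-node Elasticsearch compose file into the current directory.
# Assumption: security is disabled so plain HTTP on port 9200 works locally;
# do not use this setting outside a throwaway test environment.
cat > docker-compose.yml <<'EOF'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.1
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
EOF

echo "wrote docker-compose.yml"
```

With Docker installed, running `docker compose up -d` in the same directory should start the container and expose Elasticsearch on port 9200.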


## Interacting with Elasticsearch

To check whether Elasticsearch is installed correctly and running locally, open the following URL in a browser:

http://localhost:9200/



The output should be similar to the following:


Collaborator

Consider formatting this as a code block.

{
"name" : "ip-172-31-18-24.ap-south-1.compute.internal",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "wY9BlfS-TKqTaCRpMi8kDQ",
"version" : {
"number" : "8.8.1",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "f8edfccba429b6477927a7c1ce1bc6729521305e",
"build_date" : "2023-06-05T21:32:25.188464208Z",
"build_snapshot" : false,
"lucene_version" : "9.6.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
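
The connector requires Elasticsearch 6.0.0 or later, and the `version.number` field in the response above is what confirms this. As a rough sketch, the check can be scripted. The helper names `es_version` and `es_version_supported` are hypothetical, and the `sed` pattern assumes the pretty-printed layout shown above, with `"number"` on its own line.

```shell
# Extract version.number from the root-endpoint JSON read on stdin.
# Assumes the pretty-printed response format shown in this tutorial.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the major version is 6 or later (the stated minimum).
es_version_supported() {
  major=${1%%.*}
  [ "$major" -ge 6 ]
}
```

For example, `v=$(curl -s http://localhost:9200/ | es_version)` followed by `es_version_supported "$v" && echo "ok: $v"` would report whether the running node meets the requirement.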






## Adding data to an index

Collaborator

Introduce this command and describe what it will do: add data for (what?) into the index table?
Does the index table exist before this command is run? If not, maybe describe it as "Create the index table and..."

Also maybe describe what the output, if any, at the command line is for the successful command.

curl -XPOST -H "Content-Type: application/json" localhost:9200/samples/_bulk -d '
{"index": {"_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
'
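
The bulk request body is newline-delimited JSON: each document takes two lines, an action line naming the `_id` and a source line holding the fields, and the body must end with a newline. The sketch below generates the same two-document payload from the shell. The function names and the `samples.ndjson` file name are hypothetical, chosen only for illustration.

```shell
# Print one bulk "index" action: a metadata line followed by the document line.
# Arguments: id, amount, quantity, client store key (all sent as JSON strings,
# matching the sample data in this tutorial).
bulk_index_lines() {
  printf '{"index": {"_id": "%s"}}\n' "$1"
  printf '{"Amount": "%s", "Quantity": "%s", "Id": "%s", "Client_Store_sk": "%s"}\n' \
    "$2" "$3" "$1" "$4"
}

# Build the same two-document payload used in this tutorial.
build_samples_payload() {
  bulk_index_lines 975463711 480 2 1109
  bulk_index_lines 975463943 2105 2 1109
}

build_samples_payload > samples.ndjson
```

The file can then be sent with `curl -XPOST -H "Content-Type: application/json" localhost:9200/samples/_bulk --data-binary @samples.ndjson`. `--data-binary` is used here because `-d @file` strips newlines, which the bulk format depends on.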


[ec2-user@ip-172-31-18-24 elasticsearch-8.8.1]$ curl -XPOST -H "Content-Type: application/json" localhost:9200/samples/_bulk -d '
Collaborator

Same suggestion to explain what this command does, and what the output looks like.

Suggest deleting "[ec2-user@ip-172-31-18-24 elasticsearch-8.8.1]$" unless the prompt is important here.

{"index": {"_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
'
{"took":87,"errors":false,"items":[{"index":{"_index":"samples","_id":"975463711","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":3,"_primary_term":4,"status":201}},{"index":{"_index":"samples","_id":"975463943","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":4,"status":201}}]}
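
In the response above, `"errors":false` and a `"result":"created"` entry per document indicate that both documents were indexed. A small sketch of checking this from a script follows. The helper names are hypothetical, and the `grep` patterns are a simplification that matches the response text shown here without parsing the JSON.

```shell
# Succeed if a bulk response (read on stdin) reports "errors": false.
bulk_ok() {
  grep -q '"errors"[[:space:]]*:[[:space:]]*false'
}

# Count items whose result is "created" in a bulk response read on stdin.
count_created() {
  grep -o '"result"[[:space:]]*:[[:space:]]*"created"' | wc -l | tr -d ' '
}
```

For example, saving the response with `curl ... > bulk_response.json` and then running `bulk_ok < bulk_response.json && count_created < bulk_response.json` should print `2` for this data set. If the same command is run again with the same IDs, the items are likely to report `updated` instead of `created`.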



## Deploying Presto

Deploy presto using below Documentation
Collaborator

Suggested change
Deploy presto using below Documentation
Deploy Presto. See :doc:`Deploy Presto </installation/deployment>`.


https://prestodb.io/docs/current/installation/deployment.html
Collaborator

Delete this URI when you format the previous line as an active link.


## Elasticsearch Connector Configuration

To configure the Elasticsearch connector, create a catalog properties file etc/catalog/elasticsearch.properties with the following contents, replacing the properties as appropriate:
Collaborator

Suggest formatting "etc/catalog/elasticsearch.properties" as inline code.


Collaborator

suggest code block

connector.name=elasticsearch
elasticsearch.host=localhost
elasticsearch.port=9200
elasticsearch.default-schema-name=default
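
As a convenience, the catalog file can be created from the shell. This sketch writes exactly the four properties listed above. `PRESTO_HOME` is a hypothetical variable standing for the Presto installation directory, and it falls back to the current directory when unset.

```shell
# Assumption: PRESTO_HOME points at the Presto installation directory;
# it falls back to the current directory when unset.
PRESTO_HOME=${PRESTO_HOME:-.}

# Create the catalog directory and write the connector properties.
mkdir -p "$PRESTO_HOME/etc/catalog"
cat > "$PRESTO_HOME/etc/catalog/elasticsearch.properties" <<'EOF'
connector.name=elasticsearch
elasticsearch.host=localhost
elasticsearch.port=9200
elasticsearch.default-schema-name=default
EOF
```

Presto reads catalog files at startup, so the server needs a restart before the `elasticsearch` catalog becomes visible.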



## Presto Command Line Interface

The Presto CLI provides a terminal-based interactive shell for running queries. The CLI is a self-executing JAR file, which means it acts like a normal UNIX executable.

Download presto-cli-0.281-executable.jar, rename it to presto, make it executable with chmod +x, then run it:
Collaborator

Suggested change
Download presto-cli-0.281-executable.jar, rename it to presto, make it executable with chmod +x, then run it:
Download :maven_download:`cli`.
Rename the JAR file to ``presto`` with the following command:
mv :maven_download:`cli` presto
Use ``chmod +x`` to make the renamed file executable:
chmod +x presto


./presto --server localhost:8080 --catalog elasticsearch --schema default
Run the CLI with the --help option to see the available options.

$ ./presto --server localhost:8080 --catalog elasticsearch --schema default

presto> show tables in elasticsearch.default;
Collaborator

  1. Explain what the user is doing: "At the Presto prompt, run the following command:"
  2. Suggest formatting the command on the next line as code

Table
Collaborator

suggest code block

--------------
samples
single_index
(2 rows)

Query 20230621_131210_00019_gm3cm, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
[Latency: client-side: 249ms, server-side: 240ms] [2 rows, 53B] [8 rows/s, 220B/s]

presto> select * from elasticsearch.default.samples;
Collaborator

Suggest the same as above: tell the user what they are doing, and where (the Presto prompt), format the command as code, and format the following output as a code block.

 amount | client_store_sk | id        | quantity | school | school2 | students
--------+-----------------+-----------+----------+--------+---------+----------
 NULL   | NULL            | NULL      | NULL     | NULL   | NULL    | 50000
 480    | 1109            | 975463711 | 2        | NULL   | NULL    | NULL
 2105   | 1109            | 975463943 | 2        | NULL   | NULL    | NULL
(3 rows)

Query 20230622_071755_00006_58nhi, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
[Latency: client-side: 465ms, server-side: 442ms] [3 rows, 165B] [6 rows/s, 373B/s]
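
The queries above can also be run non-interactively, which is handy for scripting a quick check of the connector. This is a sketch under assumptions: `PRESTO_CLI` is a hypothetical variable for the path of the renamed CLI file, `run_query` is a hypothetical helper name, and the server address, catalog, and schema are the ones used in this tutorial. It relies on the CLI's `--execute` and `--output-format` options.

```shell
# Assumption: PRESTO_CLI is the path to the renamed CLI JAR from this tutorial.
PRESTO_CLI=${PRESTO_CLI:-./presto}

# Run one SQL statement against the elasticsearch catalog and print CSV output.
run_query() {
  "$PRESTO_CLI" --server localhost:8080 --catalog elasticsearch --schema default \
    --output-format CSV --execute "$1"
}
```

For example, `run_query "SELECT * FROM samples"` should return the same rows as the interactive session, one CSV line per row.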