Elasticsearch R DSL

elasticdsl


An R DSL for Elasticsearch

Elasticsearch info

Security

Running Elasticsearch (ES) locally on your own machine is fine, but be careful about putting ES on a server with a public IP address: think about security first.

  • Shield - This is a paid product provided by Elastic, so it is probably only applicable to enterprise users
  • DIY security - There are a variety of techniques for securing Elasticsearch yourself. A number of resources are collected in a blog post; options include putting ES behind a reverse proxy such as Nginx, adding basic auth on top of it, and using HTTPS.

Install elasticdsl

install.packages("devtools")
devtools::install_github("ropensci/elasticdsl")
library('elasticdsl')

Setup

Instructions for installing, upgrading, and starting Elasticsearch, and for loading example data, are at ropensci/elastic.

Initialization

Call elastic::connect() before doing anything else to set the connection details for your remote or local Elasticsearch store. The details created by connect() are written to your options for the current session and are used by elasticdsl functions.

elastic::connect(es_port = 9200)
#> url:       http://127.0.0.1 
#> port:      9200 
#> username:  NULL 
#> password:  NULL 
#> elasticsearch details:   
#>    status:                  200 
#>    name:                    Gloom 
#>    Elasticsearch version:   1.7.2 
#>    ES version timestamp:    2015-09-14T09:49:53Z 
#>    lucene version:          4.10.4

Set the index to use

index("shakespeare")
#> <index> shakespeare 
#>   type: 
#>   mappings: 
#>     line: 
#>       line_id: long 
#>       line_number: string 
#>       play_name: string 
#>       speaker: string 
#>       speech_number: long 
#>       text_entry: string 
...

Print query as pretty JSON

index("shakespeare") %>%
  filter() %>% 
  ids(c(1, 2, 150)) %>%
  explain() # doesn't exist yet

Execute query

res <- index("shakespeare") %>%
  filter() %>% 
  ids(c(1, 2)) %>%
  exec()

n() to get number of results

index("shakespeare") %>%
  ids(c(1, 2)) %>%
  exec() %>% 
  n()
#> [1] 2

Request size

index("shakespeare") %>%
  filter() %>% 
  prefix(speaker = "we") %>%
  size(2) %>% 
  fields(play_name) %>% 
  exec() %>% 
  n()
#> [1] 44
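
Note that n() above returns 44 even though size(2) was requested, which suggests that n() reports the total number of matching documents rather than the number of hits returned. The sketch below mimics that distinction with a hand-built list shaped like a parsed Elasticsearch search response (a total plus a list of hits). The mock_res object and its values are made up for illustration and are not output from elasticdsl.

```r
# Hypothetical stand-in for a parsed search response; not real elasticdsl output.
mock_res <- list(
  hits = list(
    total = 44,                       # documents matching the filter
    hits = list(                      # documents actually returned (size = 2)
      list(`_index` = "shakespeare", `_type` = "line", `_id` = "42"),
      list(`_index` = "shakespeare", `_type` = "line", `_id` = "43")
    )
  )
)

total_matches <- mock_res$hits$total          # what n() appears to report
returned_hits <- length(mock_res$hits$hits)   # what size() limits

total_matches
returned_hits
```

If you need the number of documents that actually came back, counting the elements of the inner hits list is probably the safer choice.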

Request certain fields

s <- index("shakespeare") %>%
  filter() %>% 
  prefix(speaker = "we") %>%
  size(2)
s %>% fields(play_name) %>% exec() %>% .$hits %>% .$hits
#> [[1]]
#> [[1]]$`_index`
#> [1] "shakespeare"
#> 
#> [[1]]$`_type`
#> [1] "line"
#> 
#> [[1]]$`_id`
#> [1] "42"
#> 
...
s %>% fields(play_name, text_entry) %>% exec() %>% .$hits %>% .$hits
#> [[1]]
#> [[1]]$`_index`
#> [1] "shakespeare"
#> 
#> [[1]]$`_type`
#> [1] "line"
#> 
#> [[1]]$`_id`
#> [1] "42"
#> 
...
s %>% fields(play_name, text_entry, line_id) %>% exec() %>% .$hits %>% .$hits
#> [[1]]
#> [[1]]$`_index`
#> [1] "shakespeare"
#> 
#> [[1]]$`_type`
#> [1] "line"
#> 
#> [[1]]$`_id`
#> [1] "42"
#> 
...

Filters vs. queries

Filters are yes/no tests: a document either matches or it doesn't, and no relevance score is computed. That makes them much more computationally efficient than queries.
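
To make the distinction concrete, here is a sketch of the two request bodies written as plain R lists. It assumes the Elasticsearch 1.x query DSL, where a filter is wrapped in a "filtered" query. Whether elasticdsl builds exactly these bodies is an assumption, and the variable names are illustrative only.

```r
# Assumed Elasticsearch 1.x request bodies, written as nested R lists.
# A prefix *filter*: wrapped in a "filtered" query, so no relevance scoring.
filter_body <- list(
  query = list(
    filtered = list(
      filter = list(prefix = list(speaker = "we"))
    )
  )
)

# The same condition as a prefix *query*: evaluated as a query, not a filter.
query_body <- list(
  query = list(prefix = list(speaker = "we"))
)

# Inspect the nesting of each body.
str(filter_body)
str(query_body)
```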

Filters

prefix filter

index("shakespeare") %>%
  filter() %>% 
  prefix(speaker = "we") %>%
  exec() %>% 
  n()
#> [1] 44

ids filter

index("shakespeare") %>%
  filter() %>% 
  ids(c(1, 2, 150)) %>%
  exec() %>% 
  n()
#> [1] 3

Queries

geoshape query (filters have a much larger range of geo queries)

index("geoshape") %>%
  geoshape(field = "location", type = "envelope", coordinates = list(c(-30, 50), c(30, 0))) %>% 
  n()
#> [1] 10
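
For orientation, the geoshape() call above roughly corresponds to a geo_shape query body like the one sketched below as a nested R list. This assumes the Elasticsearch 1.x request format. The exact body elasticdsl sends is not shown in this README, so treat the structure and the geoshape_body name as illustrative.

```r
# Assumed Elasticsearch 1.x geo_shape query body as a nested R list.
# An envelope is given by two [lon, lat] corners: upper-left, then lower-right.
geoshape_body <- list(
  query = list(
    geo_shape = list(
      location = list(                      # field name from the example above
        shape = list(
          type = "envelope",
          coordinates = list(c(-30, 50), c(30, 0))
        )
      )
    )
  )
)

envelope    <- geoshape_body$query$geo_shape$location$shape$coordinates
upper_left  <- envelope[[1]]   # lon -30, lat 50
lower_right <- envelope[[2]]   # lon  30, lat  0

str(geoshape_body)
```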

Meta

  • Please report any issues or bugs
  • License: MIT
  • Get citation information for elasticdsl in R by running citation(package = 'elasticdsl')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
