#987 Use ElasticSearch Java Client API #1181

Merged
reckart merged 15 commits into master from feature/987-Use-ElasticSearch-Java-Client-API
Jun 4, 2019

Conversation

az79nefy
Contributor

@az79nefy az79nefy commented May 16, 2019

What's in the PR

  • restore functionality of randomizing the query results
  • fix JavaDoc compilation issue

How to test manually

  • Execute external search queries

Automatic testing

  • PR includes unit tests

Documentation

  • PR updates documentation

- replaced manual creation of ElasticSearch queries with Java Client API
@az79nefy az79nefy self-assigned this May 16, 2019
@az79nefy az79nefy added this to In progress in External Search [M2] via automation May 16, 2019
@az79nefy az79nefy added this to the 0.10.0 milestone May 16, 2019
@az79nefy az79nefy added the WIP label May 16, 2019
- Enforce consistent version of ElasticSearch across modules
- Removed unused JSON classes
@reckart
Member

reckart commented May 22, 2019

Jenkins, can you test this please?

@reckart
Member

reckart commented May 24, 2019

@rgabbard thanks for checking in. We'll review whether and how the randomness is preserved. At the moment the build fails because of an odd JavaDoc problem. Once that is fixed, I'll have another look at the code.

@gabbard

gabbard commented May 24, 2019

@reckart: Thanks. We have an intern, @qpwo, who is looking into isi-vista/curated-training-annotator#16; I'd be happy to have him test/fix the random ranking on top of this branch if it won't get in @az79nefy's way.

@az79nefy
Contributor Author

Sorry, I have not finished this PR yet; it is still a work in progress. I have not gotten around to random ranking yet, but if @qpwo wants to look into it, I'd be happy for the help.

- Switch from transport to REST high-level client
- Set ElasticSearch version such that its Lucene dependency aligns with the one we have for MTAS and RDF4J
@reckart
Member

reckart commented May 24, 2019

I have switched the code to use the REST high-level client which seems to be the API of choice these days for ElasticSearch.

I tried all kinds of tricks to fix the problem that ElasticSearch imports cannot be resolved by JavaDoc, but without success. When building using mvn -Ddebug javadoc:aggregate, a file target/site/apidocs/options is created which contains a list of all the JAR files passed to the JavaDoc compiler. While it contains many ElasticSearch JARs, org.elasticsearch:elasticsearch is not among them. This causes the imports not to resolve. I have tried:

  • switching to older and newer versions of ElasticSearch
  • changing the position of the ElasticSearch dependencies in the pom.xml file
  • explicitly adding the type jar to the org.elasticsearch:elasticsearch dependency

... none worked. The RDF4J people hit the same problem though... maybe someone over there finds a solution: eclipse-rdf4j/rdf4j#1148
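
For anyone reproducing the diagnosis above, here is a minimal sketch of a check that scans the generated options file for a given artifact's JAR. The class name, method names, and the file-name pattern (artifact ID followed by a version that starts with a digit) are assumptions made for illustration; none of this is part of the PR.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: checks which JARs JavaDoc was given.
public class JavadocOptionsCheck {

    /** Returns the file names of all JARs mentioned in the options text. */
    public static List<String> jarNames(String optionsText) {
        List<String> names = new ArrayList<>();
        // Entries are separated by whitespace, quotes, or path separators (':' or ';').
        for (String token : optionsText.split("[\\s'\";:]+")) {
            if (token.endsWith(".jar")) {
                int slash = Math.max(token.lastIndexOf('/'), token.lastIndexOf('\\'));
                names.add(token.substring(slash + 1));
            }
        }
        return names;
    }

    /**
     * True if a JAR named "<artifactId>-<version>.jar" is listed. Requiring the
     * version to start with a digit keeps "elasticsearch" from matching
     * sibling artifacts such as "elasticsearch-rest-client".
     */
    public static boolean containsArtifact(String optionsText, String artifactId) {
        Pattern jar = Pattern.compile(Pattern.quote(artifactId) + "-\\d[\\w.\\-]*\\.jar");
        return jarNames(optionsText).stream().anyMatch(n -> jar.matcher(n).matches());
    }

    public static void main(String[] args) throws Exception {
        // Default path is the one mentioned in the comment above.
        String path = args.length > 0 ? args[0] : "target/site/apidocs/options";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        System.out.println("elasticsearch JAR on JavaDoc classpath: "
                + containsArtifact(text, "elasticsearch"));
    }
}
```

Run it after mvn -Ddebug javadoc:aggregate; an output of false would match the symptom described here, where the sibling ElasticSearch JARs are listed but the core one is not.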

jcklie and others added 6 commits May 27, 2019 16:49
- added ability to do random ranking of query results
- introduced constant for strings such as "metadata"
- added querying of individual documents (GetRequest) for Java API
- removed more unused JSON classes
- added test to assert that retrieved document text is not null
- the key of the highlights map is the default field
- modified documentation about the expected fields of the ES document
@qpwo qpwo mentioned this pull request May 29, 2019
2 tasks
qpwo and others added 2 commits May 31, 2019 09:33
* made random search constantify the match quality so that result orderings are properly random. for issue #16
* added random seeding (need to change fixed seed)
* switch to aTraits.getSeed(), which still needs to somehow be different for different users
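
The idea behind these commits can be illustrated without the ElasticSearch client: once match quality is treated as constant, the ordering depends only on a seeded random source, and mixing a user identifier into the seed gives each user a different but repeatable order. The sketch below is plain Java; the class, the method names, and the seed-mixing scheme are assumptions for illustration, not the code in this PR (which reads the seed via aTraits.getSeed()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of seeded random ranking, not the PR's implementation.
public class SeededRandomRanking {

    /** Derives a per-user seed from a shared base seed (assumed mixing scheme). */
    public static long seedFor(long baseSeed, String username) {
        return baseSeed * 31 + username.hashCode();
    }

    /**
     * Returns a shuffled copy of the hits. Relevance scores are ignored entirely,
     * which corresponds to treating every hit's match quality as the same constant.
     * The same seed always produces the same order; the input list is not modified.
     */
    public static <T> List<T> shuffled(List<T> hits, long seed) {
        List<T> copy = new ArrayList<>(hits);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        long baseSeed = 42L; // placeholder; a fixed seed is what the commit notes want to replace
        System.out.println("alice: " + shuffled(hits, seedFor(baseSeed, "alice")));
        System.out.println("bob:   " + shuffled(hits, seedFor(baseSeed, "bob")));
    }
}
```

A repeatable per-user seed keeps the order stable across page loads for one user while still varying it between users, which is the open point the last commit message raises.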
@reckart
Member

reckart commented Jun 4, 2019

For the time being, I am disabling the javadoc:aggregate target on Jenkins which trips over the ElasticSearch dependency - we don't ship the aggregate docs anyway. That should hopefully fix the build...

@ukp-svc-jenkins

49% (0.0%) vs master 49%

@reckart
Member

reckart commented Jun 4, 2019

@az79nefy are there any code changes here that are left to be done?

@az79nefy
Contributor Author

az79nefy commented Jun 4, 2019

Nope, no changes left to be done. This PR can be merged.

@az79nefy az79nefy added Needs Review and removed WIP labels Jun 4, 2019
@reckart reckart merged commit 33723a3 into master Jun 4, 2019
External Search [M2] automation moved this from In progress to Done Jun 4, 2019
@reckart reckart deleted the feature/987-Use-ElasticSearch-Java-Client-API branch June 4, 2019 17:40
Development

Successfully merging this pull request may close these issues.

Use ElasticSearch Java Client API
6 participants