Rewrite golr loader to convert results to SolrInputDocument instead of JSON #32

Closed
kshefchek opened this issue Jun 6, 2017 · 2 comments

@kshefchek
Contributor

kshefchek commented Jun 6, 2017

Currently we serialize SciGraph results as a list of JSON documents and then post the entire JSON file to Solr. This has worked well in the past, but it appears to cause problems now that these JSON documents have grown larger.

As an alternative, we can use the SolrJ API to construct SolrInputDocument objects and post these to the server in batches of 100k or so. It's unclear whether this will improve performance, as SolrJ appears to send them to the server over HTTP regardless. At a minimum, this should fix #27 and possibly #30.
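To make the batching idea concrete, here is a minimal Java sketch, assuming a hypothetical `BatchPoster` helper that is not part of the golr loader or of SolrJ. It streams documents into fixed-size batches instead of building one large payload. In a SolrJ-based loader the element type would be `SolrInputDocument` and the sink would call `SolrClient.add(Collection<SolrInputDocument>)`; here a generic type and a `Consumer` stand in so the sketch runs without SolrJ on the classpath.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration: groups documents into fixed-size
// batches and hands each batch to a sink. In a SolrJ-based loader, T would be
// SolrInputDocument and the sink would wrap SolrClient.add(batch).
class BatchPoster {
    private final int batchSize;

    BatchPoster(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    // Drains the iterator, sending each batch as soon as it fills and any
    // leftover documents as a final, smaller batch. Only one batch is held in
    // memory at a time. Returns the number of batches sent.
    <T> int postAll(Iterator<T> docs, Consumer<List<T>> sink) {
        List<T> batch = new ArrayList<>();
        int sent = 0;
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(); // fresh list so the sink may keep the old one
                sent++;
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch);
            sent++;
        }
        return sent;
    }
}
```

For example, `new BatchPoster(100_000).postAll(docs.iterator(), sink)` would send 250,000 documents as three requests (100k, 100k, 50k). Note that `SolrClient.add` throws checked exceptions (`SolrServerException`, `IOException`), so a real sink passed as a `Consumer` would need to wrap them, and a final `SolrClient.commit()` would typically follow the last batch.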

As a test, I've reworked the golr worker to convert the JSON documents to SolrInputDocuments, which is a fairly minimal change but slightly less performant. After chatting with @kltm and @cmungall, I'm planning to move ahead with the larger refactor of removing the JSON intermediate step.

@benwbooth, I'm wondering whether this will conflict with your work on #17?

@kshefchek kshefchek self-assigned this Jun 6, 2017
@benwbooth
Contributor

I don't think it would. I'm only adding some optional keys to the YAML, then adding extra closure relationships to the query. In any case, I'm happy to do any refactoring that's needed.

@kshefchek
Contributor Author

Fixed with #33
