doc_ids don't line up using the docs_bulk function #123

Closed
jhendric98 opened this issue Mar 23, 2016 · 8 comments

@jhendric98

Loading an Elasticsearch index from a data.frame of about 125,000 observations, with the first variable passed as the doc_ids argument, doesn't load the doc_ids correctly. Each _id appears to be off by one relative to its row.

docs_bulk(product, index = "model", type = "hd", doc_ids = product$productid)

product has 2 variables => productid and description

_id : 503
productid: 504
description: "4 drawer dresser, black"

My first productid is 101, but the first _id is assigned 100.

@sckott
Contributor

sckott commented Mar 23, 2016

Thanks for the report. I remember there being a good reason to force zero-based document IDs, so I do that at https://github.com/ropensci/elastic/blob/master/R/docs_bulk.r#L226 when the IDs are numeric. However, I guess we should not do that for user-supplied IDs.
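
The off-by-one in the report is consistent with that zero-based shift. A minimal sketch of the effect follows, using the two productid values quoted in the report. `shift_to_zero_based()` is a hypothetical stand-in written for illustration and is not the code in docs_bulk.r.

```r
# Hypothetical stand-in for the zero-based shift described above;
# an assumption for illustration, not the package's implementation.
shift_to_zero_based <- function(ids) {
  if (is.numeric(ids)) ids - 1 else ids
}

# productid values taken from the report
productid <- c(101, 504)
shifted <- shift_to_zero_based(productid)

# Each _id ends up one lower than the productid it belongs to
print(data.frame(`_id` = shifted, productid = productid, check.names = FALSE))
```

Under this assumption, non-numeric IDs would pass through untouched, which matches the comment that the shift only applies when the IDs are numeric.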

@sckott
Contributor

sckott commented Mar 23, 2016

I wish I could remember what that reason was

@jhendric98
Author

In every use case I currently have, the document id is significant as a relational key that ties more than one incoming dataset together. In the use case above, the productid is created by external systems that transactionally manage a product master. I have to provide search capacity on top of the underlying RDBMS because the search load on that system is overwhelming its capacity to process transactions.

I'm sure there are use cases where simple sequential ids are a valid scheme, but one could use the default _id.

@sckott sckott added this to the v0.7 milestone Mar 23, 2016
@sckott sckott self-assigned this Mar 23, 2016
sckott added a commit that referenced this issue Mar 23, 2016
at 1 or at whatever the user supplies
updated docs for this fxn with more details about document ids, and noted possible
change to doc_ids in the future to default to UUIDs
bumped dev version
@sckott
Contributor

sckott commented Mar 23, 2016

@jhendric98 okay, reinstall with devtools::install_github("ropensci/elastic") and try again.

You should now get the same doc IDs that you pass in to the function.
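
One way to picture the expected result: in an Elasticsearch bulk request, each document is preceded by an action line that carries its `_id`, so user-supplied IDs should appear there unchanged. The sketch below builds such lines in base R. `bulk_action_lines()` is a hypothetical helper for illustration only and does not reflect how the elastic package builds its requests.

```r
# Hypothetical helper: builds the action line of an Elasticsearch bulk
# request for each id, leaving user-supplied ids unchanged.
# This is an illustrative assumption, not the package's code.
bulk_action_lines <- function(ids, index, type) {
  sprintf('{"index":{"_index":"%s","_type":"%s","_id":"%s"}}', index, type, ids)
}

# productid values taken from the report
ids <- c(101, 504)
lines <- bulk_action_lines(ids, index = "model", type = "hd")
cat(lines, sep = "\n")
```

With the fix in place, the `_id` in each action line should equal the productid of the row it precedes, rather than a value one lower.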

see also #125

@jhendric98
Author

Great. I had just forked the project myself. I'll test it out in a few hours when I get to my computer and let you know.

@jhendric98
Author

Worked fine, thanks.

@sckott
Contributor

sckott commented Mar 24, 2016

great!

@sckott
Contributor

sckott commented Mar 25, 2016

closing, pushing #125 soon

@sckott sckott closed this as completed Mar 25, 2016
@sckott sckott changed the title from "doc_ids don't line up using the docs_bulk function." to "doc_ids don't line up using the docs_bulk function" Jul 25, 2016