Pass arguments through promise chain instead of attaching to generator #186
FYI, there are a bunch of style changes, but I'll start reviewing.
Nice! OK to merge once the jscs issues are cleaned up.
@jgraff2 have you tested this anywhere with an Argus generator?
I ran it with the integration tests which include an OAuth generator and mocked auth endpoints.
src/remoteCollection/collect.js
return doCollect(_g);
}));
return getSubjectsForGenerator(generator)
.mapSeries((subject) =>
Would this mean the collections will happen sequentially instead of in parallel (like before)? If yes, could this have any side effects? What if one repeat cycle has not finished by the time the next one starts? Just throwing my thoughts out there.
Hmm, that's a good point. That would definitely be a plausible scenario when run sequentially if we had lots of subjects, and we don't have any way of detecting or handling that case. Thinking this through: the next repeat cycle would start while the previous one is still finishing up, so the last few requests of one cycle would be sent in parallel with the first few of the next. The sample upserts aren't sent until all requests have completed, and the requests are tracked in memory separately for each collection cycle, so the overlapping cycles wouldn't interfere with each other, and they would end up finishing and sending upserts a minute apart, as normal. So it would still work normally, just with some requests being sent in parallel, which is how it was before anyway. The only impacts would be a greater offset from the repeat cycle and probably greater memory usage, because we would be tracking two collection cycles at once.
So I think if the repeater cycles overlapped it would be fine. However, just the fact that a collection could take a lot longer this way is maybe reason to reconsider.
I did it this way to make sure we only do the token request once. If we did it in parallel, then when the token expires, the next collection cycle would send a separate auth request for each subject; this way it only sends it the first time. But maybe this is another case where it would be better to rethink the current approach than to try to force the solution to fit it.
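To make the trade-off concrete, here is a minimal sketch of why the parallel version repeats the auth request. It assumes the token lookup lives inside the per-request code, as described above; `makeAuth`, `collectSequentially`, and `collectInParallel` are hypothetical names for illustration, not functions from this repo.

```javascript
// Hypothetical stand-in for "OAuth logic implemented as part of sending the request".
function makeAuth() {
  const state = { token: null, tokenRequests: 0 };
  state.request = async (subject) => {
    if (!state.token) {
      state.tokenRequests += 1;
      // Fake auth round trip; the real call would hit the OAuth server.
      await new Promise((resolve) => setTimeout(resolve, 5));
      state.token = 'token-' + state.tokenRequests;
    }
    return { subject, token: state.token };
  };
  return state;
}

// One request at a time: only the first request finds the token missing.
async function collectSequentially(subjects, auth) {
  const results = [];
  for (const subject of subjects) {
    results.push(await auth.request(subject));
  }
  return results;
}

// All requests start before any token has come back, so each one asks for its own.
function collectInParallel(subjects, auth) {
  return Promise.all(subjects.map((subject) => auth.request(subject)));
}
```

With three subjects, the sequential run makes one token request and the parallel run makes three, which is the behavior the sequential approach was meant to avoid.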
This is only necessary in the first place because the OAuth logic is implemented as part of sending the request. We could instead pull that out into a separate function, and run that once before sending any of the requests. I'm not sure exactly how that would work though, since we currently rely on the result of the requests to tell us whether the token needs to be regenerated. Maybe when the token expires we just miss that cycle and request a new one next time? Or if any of the requests fail, we re-do all of them?
Thank you both for thinking that through. Good question and good answer.
> The sample upserts aren't sent until all requests have completed
Does this mean that all the samples could be sitting and waiting because one of the requests is slow and eventually times out at 30s, and then, with retries, everything else is still waiting? So the rest of the samples don't get sent until that slow one finally times out again after all the retries and generates its error samples?
> Maybe when the token expires we just miss that cycle and request a new one next time?
IMO, missing a cycle is not an acceptable solution.
Correct, they won't get sent until all the promises complete.
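A small sketch of that waiting behavior, assuming the responses are gathered with something like `Promise.all` (the thread does not show the actual call). `delay` and `collectAll` are hypothetical helpers.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

// Gathers every response before anything is handed on for upserting.
async function collectAll(requests) {
  const started = Date.now();
  const responses = await Promise.all(requests); // resolves only when the slowest one does
  return { responses, elapsedMs: Date.now() - started };
}
```

Calling `collectAll([delay(10, 'fast'), delay(200, 'slow')])` returns both responses together after roughly the slow request's duration, so the fast one's sample is held until then.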
Updated.
I moved the token creation and request retries outside the request-specific code, and went back to making the requests in parallel.
So now the by-subject collection flow looks like this:
- Make a request to Refocus to get the subjects from the subject query
- If necessary, make a request to the OAuth server to get a new token
- For each subject, in parallel, make a request to the data source
- If any of the requests failed because of an expired token, retry the entire collection cycle, which will generate a new token
- If any of the requests failed for other reasons, do nothing
- Generate transform samples from the successful responses, or error samples from the failed ones
This way, all the asynchronous work (subject query, token creation, retries) happens outside the subject loop.
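The flow above could be sketched roughly like this. Everything here is an assumption for illustration: the `deps` functions, the single retry, and the use of a 401 status to mean "expired token" are not taken from the actual code.

```javascript
// Hypothetical sketch of the by-subject flow; `deps` holds injected stand-ins.
async function collectBySubject(deps, generator, retriesLeft = 1) {
  // 1. Subject query
  const subjects = await deps.getSubjects(generator);
  // 2. New token only if we don't already have one
  if (!generator.token) {
    generator.token = await deps.requestToken(generator);
  }
  // 3. One request per subject, in parallel; failures are captured, not thrown
  const results = await Promise.all(subjects.map((subject) =>
    deps.sendRequest(generator, subject)
      .then((res) => ({ subject, res }))
      .catch((err) => ({ subject, err }))
  ));
  // 4. Expired token (assumed to surface as status 401): drop it, retry the whole cycle
  const expired = results.some((r) => r.err && r.err.status === 401);
  if (expired && retriesLeft > 0) {
    generator.token = null;
    return collectBySubject(deps, generator, retriesLeft - 1);
  }
  // 5. Other failures are left alone; 6. they become error samples
  return results.map((r) => (r.err
    ? { name: r.subject, error: r.err.message }
    : { name: r.subject, value: r.res }));
}
```

The per-subject callbacks only wrap the response or the error, so the token request and the retry decision each happen once per cycle rather than once per subject.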
I would suggest testing this manually as well since the tests are not very thorough.
…ct for all subjects in parallel.
@jgraff2 Were you able to test this manually?
@pallavi2209 the tests here are not very thorough, but the integration tests include an OAuth section that tests the whole sequence. I did run it against those. However, looking at them again I realized they only cover bulk requests. I will add some by-subject tests. I will also see if I can add tests to make sure it's only requesting a new token when necessary.
Updated: further restructuring of the collection flow to ensure correct error handling for by-subject collection. Previously a token login error would return a single error object, but the collection handler would be expecting an array, which would cause an error instead of generating error samples. Now, we only call the request handler if the requests were actually made; otherwise we handle the error and generate error samples in the repeater-level catch.
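As a rough illustration of that split, the sketch below keeps the array-only handler away from a login failure. All names (`handleResponses`, `errorSamplesFor`, `runCycle`) and the sample shape are hypothetical.

```javascript
// Hypothetical handler: expects an array with one entry per subject.
function handleResponses(responses) {
  return responses.map((r) => (r.err
    ? { name: r.subject, error: r.err.message }
    : { name: r.subject, value: String(r.res) }));
}

// Hypothetical fallback: one error sample per subject from a single error.
function errorSamplesFor(subjects, err) {
  return subjects.map((subject) => ({ name: subject, error: err.message }));
}

async function runCycle(subjects, login, sendAll) {
  try {
    const token = await login();                       // may reject with a single Error
    const responses = await sendAll(subjects, token);  // array, one entry per subject
    return handleResponses(responses);                 // reached only if requests were made
  } catch (err) {
    return errorSamplesFor(subjects, err);             // stands in for the repeater-level catch
  }
}
```

If `login` rejects, `handleResponses` is never called with a non-array, and every subject still gets an error sample.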
@jgraff2 don't forget to npm publish!
Published to npm.
Pass arguments through the promise chain instead of attaching to the generator, so we don't have to clone generators for by-subject collection and we can track the OAuth token on the generator object.
This replaces the global auth tracking fix from 183.
Updated tests here: salesforce/refocus#1182
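For readers unfamiliar with the pattern, here is a minimal sketch of passing per-request arguments through the promise chain rather than attaching them to the generator. The step names and the `baseUrl` field are hypothetical and only illustrate the idea; they are not the actual functions in `collect.js`.

```javascript
// Hypothetical step: derives the URL from the arguments it receives.
const prepareUrl = ({ generator, subject }) =>
  ({ generator, subject, url: `${generator.baseUrl}/${subject}` });

// Hypothetical step: stand-in for the HTTP call.
const sendRequest = async (args) =>
  ({ ...args, res: `response from ${args.url}` });

function collectForSubject(generator, subject) {
  // Per-request state travels in the resolved value,
  // so `generator` is never cloned or mutated per subject.
  return Promise.resolve({ generator, subject })
    .then(prepareUrl)
    .then(sendRequest);
}
```

Because every subject's chain holds a reference to the same generator object, anything stored on it, such as a token, is shared across subjects instead of being lost on a clone.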