-
Notifications
You must be signed in to change notification settings - Fork 479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IQSS/9126- Fix workflow token access #9129
IQSS/9126- Fix workflow token access #9129
Conversation
//@EJB | ||
|
||
// TODO: | ||
// versions? -- L.A. 4.0 beta 10 | ||
@Path("datafile/bundle/{fileId}") | ||
@GET | ||
@Produces({"application/zip"}) | ||
public BundleDownloadInstance datafileBundle(@PathParam("fileId") String fileId, @QueryParam("fileMetadataId") Long fileMetadataId,@QueryParam("gbrecs") boolean gbrecs, @QueryParam("key") String apiToken, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) /*throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ { | ||
public BundleDownloadInstance datafileBundle(@PathParam("fileId") String fileId, @QueryParam("fileMetadataId") Long fileMetadataId,@QueryParam("gbrecs") boolean gbrecs, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) /*throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most api calls don't add the key as a QueryParam so I've removed them in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: we need to retire this "bundle access" method.
|
||
if (gbrecs != true && df.isReleased()){ | ||
// Write Guestbook record if not done previously and file is released | ||
User apiTokenUser = findAPITokenUser(apiToken); | ||
User apiTokenUser = findAPITokenUser(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This calls findUserOrDie which will retrieve the key param or api token header, or the workflow token header.
// This will throw a ForbiddenException if access isn't authorized: | ||
checkAuthorization(df, apiToken); | ||
checkAuthorization(df); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments in this method.
@@ -859,15 +833,15 @@ public void write(OutputStream os) throws IOException, | |||
logger.fine("token: " + fileIdParams[i]); | |||
Long fileId = null; | |||
try { | |||
fileId = new Long(fileIdParams[i]); | |||
fileId = Long.parseLong(fileIdParams[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated - removed deprecated methods.
|
||
if (isAccessAuthorized(dataFile, getRequestApiKey())) { | ||
//Already have access | ||
if (isAccessAuthorized(dataFile)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment - my understanding is that in a request access call, if you already have access, this is a BAD_REQUEST
*/ | ||
|
||
if (session != null) { | ||
User apiTokenUser = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic used to handle 3 cases separately, mostly so that fine logging could be done to indicate which branch was being used: session - logged in user, guest user, and apitoken user. One tricky thing is that both session and apitoken calls could return a GuestUser and this wasn't broken out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how the code has been reorganized and simplified. And the logic seems correct. But could you please clarify the "this wasn't broken out" part above. Aside from the old code being convoluted and unwieldy, was there actually a problem with the old logic, was there a situation where the guest user returned by the apitoken call would overwrite an authenticated user from the session? - I've stared at the old code for a while now and I'm not seeing that, but figured I would ask to confirm.
(We may have assumed in the past that nobody would be calling the api with both a session and a token in real life).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it was broken, so no real issue. I think my comment related to the old line 1877 and the code after it - there was no distinction between guest coming back from a session or from the apitoken logic. In refactoring, I had to make sure that both were still handled, e.g. in line 1790 below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at that code after line 1877 - it was just meaningless, wasn't it? I.e., it doesn't seem to be doing anything that the line 1861 hasn't already attempted.
It just looks like the old code may have been written with the assumption that session == null
when it's a direct API call - and it should never be null of course.
*/ | ||
|
||
if (session != null) { | ||
User apiTokenUser = null; | ||
//If we get a non-GuestUser from findUserOrDie, use it. Otherwise, check the session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic now looks for an apitoken authenticated user and uses it if it exists. If not, and a session user exists, we use that. If the apitoken method indicates a GuestUser, we will use that if there's no session.
//If we get a non-GuestUser from findUserOrDie, use it. Otherwise, check the session | ||
try { | ||
logger.fine("calling apiTokenUser = findUserOrDie()..."); | ||
apiTokenUser = findUserOrDie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the rearchitecting, I assume this call will end up handling both session and token logins and this part of the code will be able to be replaced. Until then, this is the only part of the api that was already designed to be usable with sessions or tokens.
//If we don't have a user, nothing more to do. (Note session could have returned GuestUser) | ||
if (user == null && apiTokenUser == null) { | ||
logger.warning("Unable to find a user via session or with a token."); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code had/has a check for both being null at the end, but there's no reason to proceed further if we have no user.
dvr = createDataverseRequest(apiTokenUser); | ||
} else { | ||
// used in JSF context, user may be Guest | ||
dvr = dvRequestService.getDataverseRequest(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplifying the code below by getting the appropriate request ready now. This means we can't use the permissionService.on() shortcut call, but it also means that session and token users are handled in the same single lines below.
// used in JSF context, user may be Guest | ||
dvr = dvRequestService.getDataverseRequest(); | ||
} | ||
if (!published) { // and restricted or embargoed (implied by earlier processing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since published and not restricted/embargoed is handled above, the main split now is whether it is published or not. If it's published, the only case left is with restricted/embargoed. With unpublished, both the restricted/embargoed and not restricted/embargoed both get handled the same way.
// last option - guest user in either contexts | ||
// Guset user is impled by the code above. | ||
if ( permissionService.requestOn(dvRequestService.getDataverseRequest(), df.getOwner()).has(Permission.ViewUnpublishedDataset) ) { | ||
if (permissionService.requestOn(dvr, df.getOwner()).has(Permission.ViewUnpublishedDataset)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the changes above, this line now handles all three authenticated session user, token user, and guest cases.
} | ||
} else { | ||
logger.log(Level.FINE, "API token-based auth: User {0} is not authorized to access the datafile.", user.getIdentifier()); | ||
if (permissionService.requestOn(dvr, df).has(Permission.DownloadFile)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here - all three cases. Also note in the old code I think the lines after #1955 were unreachable, repeating a case handled earlier.
try { | ||
logger.fine("calling apiTokenUser = findUserOrDie()..."); | ||
apiTokenUser = findUserOrDie(); | ||
if(apiTokenUser instanceof GuestUser) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted, the idea here is to not let a guest user returned from findUserOrDie (which happens when there is no key/token, and which we want if there's no session) from overriding an authenticated session user. This could probably be cleaned up further, but I'm again assuming the rearchitecture work will end up doing that in the AbstractApiBean soon. (Hopefully the rest of the cleanup here, in addition to solving the workflow token issue, will make it easier for this class to use that new method.)
…flow_tokens_in_access_api
…flow_tokens_in_access_api
added to sprint Dec 15, 2022 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad to see this logic streamlined and simplified.
I asked a couple of simple questions, and there's a trivial typo, but I'm generally ready to move it into qa.
I appreciate the comments explaining what's going on. My only question is, is there a reason not to incorporate at least some of them into the code?
QA is left: |
What this PR does / why we need it: This PR removes some of the custom permissions checking in the Access class which had been stopping workflow token access to these endpoints.
Which issue(s) this PR closes:
Closes #9126
Special notes for your reviewer: I've tried to comment some in the PR - the logic here was pretty convoluted and I think had some dead code (which I noted back when embargoes went in). Hopefully cleaner now but would be good to have someone check my assumptions about the old code as noted in the comments.
Suggestions on how to test this: New functionality - easiest to have DANS test it - you need to run an async workflow and use the workflow token to access the download file endpoints to test it. Regression - this could have but shouldn't break any access by Guest/PrivateUrl/Authenticated Users with appropriate permissions (DownloadFile for published restricted or embargoed files, ViewUnpublishedDataset for unpublished files) to the main file, aux files, etc.
[from Leonid:] For regression, I just want to second the above, about retesting extra carefully the less trivial access cases. In addition to the ones listed above, maybe attempting a zip download, via the api, of a number of files only some of which the user is authorized to access.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: as a workflow related bug fix, probably not?
Additional documentation: