Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IQSS/9126- Fix workflow token access #9129

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Nov 3, 2022

What this PR does / why we need it: This PR removes some of the custom permissions checking in the Access class which had been stopping workflow token access to these endpoints.

Which issue(s) this PR closes:

Closes #9126

Special notes for your reviewer: I've tried to comment some in the PR - the logic here was pretty convoluted and I think had some dead code (which I noted back when embargoes went in). Hopefully cleaner now but would be good to have someone check my assumptions about the old code as noted in the comments.

Suggestions on how to test this: New functionality - easiest to have DANS test it - you need to run an async workflow and use the workflow token to access the download file endpoints to test it. Regression - this could have but shouldn't break any access by Guest/PrivateUrl/Authenticated Users with appropriate permissions (DownloadFile for published restricted or embargoed files, ViewUnpublishedDataset for unpublished files) to the main file, aux files, etc.
[from Leonid:] For regression, I just want to second the above, about retesting extra carefully the less trivial access cases. In addition to the ones listed above, maybe attempting a zip download, via the api, of a number of files only some of which the user is authorized to access.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: as a workflow related bug fix, probably not?

Additional documentation:

@qqmyers qqmyers added the GDCC: DANS related to GDCC work for DANS label Nov 3, 2022
//@EJB

// TODO:
// versions? -- L.A. 4.0 beta 10
@Path("datafile/bundle/{fileId}")
@GET
@Produces({"application/zip"})
public BundleDownloadInstance datafileBundle(@PathParam("fileId") String fileId, @QueryParam("fileMetadataId") Long fileMetadataId,@QueryParam("gbrecs") boolean gbrecs, @QueryParam("key") String apiToken, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) /*throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ {
public BundleDownloadInstance datafileBundle(@PathParam("fileId") String fileId, @QueryParam("fileMetadataId") Long fileMetadataId,@QueryParam("gbrecs") boolean gbrecs, @Context UriInfo uriInfo, @Context HttpHeaders headers, @Context HttpServletResponse response) /*throws NotFoundException, ServiceUnavailableException, PermissionDeniedException, AuthorizationRequiredException*/ {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most api calls don't add the key as a QueryParam so I've removed them in this class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: we need to retire this "bundle access" method.


if (gbrecs != true && df.isReleased()){
// Write Guestbook record if not done previously and file is released
User apiTokenUser = findAPITokenUser(apiToken);
User apiTokenUser = findAPITokenUser();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calls findUserOrDie which will retrieve the key param or api token header, or the workflow token header.

// This will throw a ForbiddenException if access isn't authorized:
checkAuthorization(df, apiToken);
checkAuthorization(df);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments in this method.

@coveralls
Copy link

coveralls commented Nov 3, 2022

Coverage Status

Coverage increased (+0.06%) to 20.054% when pulling 49ab161 on GlobalDataverseCommunityConsortium:IQSS/9126-allow_workflow_tokens_in_access_api into 6c87b39 on IQSS:develop.

@@ -859,15 +833,15 @@ public void write(OutputStream os) throws IOException,
logger.fine("token: " + fileIdParams[i]);
Long fileId = null;
try {
fileId = new Long(fileIdParams[i]);
fileId = Long.parseLong(fileIdParams[i]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated - removed deprecated methods.


if (isAccessAuthorized(dataFile, getRequestApiKey())) {
//Already have access
if (isAccessAuthorized(dataFile)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment - my understanding is that in a request access call, if you already have access, this is a BAD_REQUEST

*/

if (session != null) {
User apiTokenUser = null;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic used to handle 3 cases separately, mostly so that fine logging could be done to indicate which branch was being used: session - logged in user, guest user, and apitoken user. One tricky thing is that both session and apitoken calls could return a GuestUser and this wasn't broken out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how the code has been reorganized and simplified. And the logic seems correct. But could you please clarify the "this wasn't broken out" part above. Aside from the old code being convoluted and unwieldy, was there actually a problem with the old logic, was there a situation where the guest user returned by the apitoken call would overwrite an authenticated user from the session? - I've stared at the old code for a while now and I'm not seeing that, but figured I would ask to confirm.

(We may have assumed in the past that nobody would be calling the api with both a session and a token in real life).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it was broken, so no real issue. I think my comment related to the old line 1877 and the code after it - there was no distinction between guest coming back from a session or from the apitoken logic. In refactoring, I had to make sure that both were still handled, e.g. in line 1790 below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at that code after line 1877 - it was just meaningless, wasn't it? I.e., it doesn't seem to be doing anything that the line 1861 hasn't already attempted.
It just looks like the old code may have been written with the assumption that session == null when it's a direct API call - and it should never be null of course.

*/

if (session != null) {
User apiTokenUser = null;
//If we get a non-GuestUser from findUserOrDie, use it. Otherwise, check the session
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic now looks for an apitoken authenticated user and uses it if it exists. If not, and a session user exists, we use that. If the apitoken method indicates a GuestUser, we will use that if there's no session.

//If we get a non-GuestUser from findUserOrDie, use it. Otherwise, check the session
try {
logger.fine("calling apiTokenUser = findUserOrDie()...");
apiTokenUser = findUserOrDie();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the rearchitecting, I assume this call will end up handling both session and token logins and this part of the code will be able to be replaced. Until then, this is the only part of the api that was already designed to be usable with sessions or tokens.

//If we don't have a user, nothing more to do. (Note session could have returned GuestUser)
if (user == null && apiTokenUser == null) {
logger.warning("Unable to find a user via session or with a token.");
return false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code had/has a check for both being null at the end, but there's no reason to proceed further if we have no user.

dvr = createDataverseRequest(apiTokenUser);
} else {
// used in JSF context, user may be Guest
dvr = dvRequestService.getDataverseRequest();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplifying the code below by getting the appropriate request ready now. This means we can't use the permissionService.on() shortcut call, but it also means that session and token users are handled in the same single lines below.

// used in JSF context, user may be Guest
dvr = dvRequestService.getDataverseRequest();
}
if (!published) { // and restricted or embargoed (implied by earlier processing)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since published and not restricted/embargoed is handled above, the main split now is whether it is published or not. If it's published, the only case left is with restricted/embargoed. With unpublished, both the restricted/embargoed and not restricted/embargoed both get handled the same way.

// last option - guest user in either contexts
// Guset user is impled by the code above.
if ( permissionService.requestOn(dvRequestService.getDataverseRequest(), df.getOwner()).has(Permission.ViewUnpublishedDataset) ) {
if (permissionService.requestOn(dvr, df.getOwner()).has(Permission.ViewUnpublishedDataset)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the changes above, this line now handles all three authenticated session user, token user, and guest cases.

}
} else {
logger.log(Level.FINE, "API token-based auth: User {0} is not authorized to access the datafile.", user.getIdentifier());
if (permissionService.requestOn(dvr, df).has(Permission.DownloadFile)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here - all three cases. Also note in the old code I think the lines after #1955 were unreachable, repeating a case handled earlier.

try {
logger.fine("calling apiTokenUser = findUserOrDie()...");
apiTokenUser = findUserOrDie();
if(apiTokenUser instanceof GuestUser) {
Copy link
Member Author

@qqmyers qqmyers Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, the idea here is to not let a guest user returned from findUserOrDie (which happens when there is no key/token, and which we want if there's no session) from overriding an authenticated session user. This could probably be cleaned up further, but I'm again assuming the rearchitecture work will end up doing that in the AbstractApiBean soon. (Hopefully the rest of the cleanup here, in addition to solving the workflow token issue, will make it easier for this class to use that new method.)

@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Dec 13, 2022
@mreekie
Copy link

mreekie commented Dec 14, 2022

added to sprint Dec 15, 2022

@landreev landreev self-assigned this Dec 19, 2022
@landreev landreev self-requested a review December 19, 2022 23:14
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see this logic streamlined and simplified.
I asked a couple of simple questions, and there's a trivial typo, but I'm generally ready to move it into qa.
I appreciate the comments explaining what's going on. My only question is, is there a reason not to incorporate at least some of them into the code?

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Review 🔎 to QA ✅ Dec 22, 2022
@kcondon kcondon assigned kcondon and unassigned landreev Jan 4, 2023
@mreekie
Copy link

mreekie commented Jan 11, 2023

QA is left:
Size for this sprint: 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GDCC: DANS related to GDCC work for DANS Size: 10 A percentage of a sprint. 7 hours.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Some file access methods ignore workflow tokens
6 participants