Skip to content

Fundamental source code structure for checking resources external to Reactome#1

Open
jweiser wants to merge 30 commits intomasterfrom
develop
Open

Fundamental source code structure for checking resources external to Reactome#1
jweiser wants to merge 30 commits intomasterfrom
develop

Conversation

@jweiser
Copy link
Member

@jweiser jweiser commented Aug 18, 2020

The structure of the code is such that the ResourceParser class will take in a CSV or JSON file with records that describe properties for each resource. A list of Resource objects will be returned be returned after parsing. The Resource objects will be iterated over and the proper type of ResourceChecker will be created using the ResourceCheckerFactory. The ResourceChecker will do its work ensuring the resource exists and is valid and a report will be produced.

The code is still a work in progress right now (some aspects of Main and specific types of ResourceChecker classes are still being written), but it is far enough along for a review and merging the existing source.

jweiser and others added 29 commits June 16, 2020 11:35
@jweiser jweiser added the enhancement New feature or request label Aug 18, 2020
@@ -0,0 +1,1030 @@
[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a JSON file? Didn't I just see the same content in a CSV file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, initially I had the resource information in a Google Sheet and was building the parser based on CSV, but I realized JSON was a better way to go. The application supports both right now, but I may move it to entirely JSON.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue #6

FTPClient ftpClient = new FTPClient();

ftpClient.connect(getFtpServer());
ftpClient.enterLocalPassiveMode();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need an option to toggle this (passive/active mode) for specific resources. It's been a problem in the past, hasn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I'll add it in if any of the resources accessed by FTP won't work without active mode during testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue #5

Comment on lines +49 to +53
<!--
integration tests run by default when running the command "mvn verify"; can be overridden using the option
"-DskipITs=true"
-->
<skipITs>false</skipITs>
Copy link

@cookersjs cookersjs Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like they can also just change this <skipITs> value as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true but changing the default value in the file changes the configuration in a more permanent way. I set it to skipping ITs as false, because I think we would like to run integration tests by default, but the command-line option gives a convenient way to override that value on a one-time basis without affecting the default behaviour of running all tests (integration and unit tests).

Comment on lines +24 to +25


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :). I'll remove it. The Main class is under development and still messy :).

return resourceInfoJson;
}

private void checkMandatoryAttributesExist(String originalRecord) throws IllegalArgumentException {
Copy link

@cookersjs cookersjs Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you said this is a WIP so will probably plan to do this anyways, but for this method I'm a little unsure how it accomplishes what it does. I get the gist of it from the name of the method, but putting all these values into the Map and such is something I don't quite understand -- could comments be added?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the how or the why that's unclear?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say both, but more of the 'how'.

Copy link
Member Author

@jweiser jweiser Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map holds the attributes of the resource that should be included in the JSON that describes the resource. The key for the map is the name of the attribute that should be there and the value is the call to the appropriate "getter" method which should be able to provide that value since the record is now stored in the Resource object after being constructed.

If any of the values in the map give a null or empty value (checked for in the for-loop iterating over the entries in the map), it means something is wrong because a value that should have been available after the construction of the Resource object is not available. This triggers an IllegalStateException because the Resource object is not in the expected state of having values for fundamental information about the resource it represents. This is either because the JSON provided didn't specify that information or something went wrong in providing that information during the construction of the object. Either way, it's an illegal state and should cause an exception to check the problem with the resource info and/or code.

Comment on lines +106 to +112
// public String getHeaderAsTSVString() {
// return String.join("\t", getHeaderNameToValueMap().keySet());
// }
//
// public String getValuesAsTSVString() {
// return String.join("\t", getHeaderNameToValueMap().values());
// }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP?

Copy link
Member Author

@jweiser jweiser Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually this was when the output was going to be in CSV format. Good catch - I'll remove it.

*/
default boolean isFileSizeAcceptable(long previousFileSize, double acceptablePercentageDrop) {
long differenceInFileSize = getFileSize() - previousFileSize;
double percentChangeInFileSize = differenceInFileSize * 100.0d / previousFileSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it ever happen where a resource is suddenly much larger than it used to be? I think that is covered by this code but I'm not sure... Maybe change the name of acceptablePercentageDrop to acceptablePercentageChange?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A file increase of any amount is currently acceptable, but you're right - I want to implement something to alert us to a suspicious jump in size. I'm just thinking of what the threshold would/should be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue #2

* @see #isFileSizeAcceptable(long, double)
*/
default boolean isFileSizeAcceptable(long previousFileSize) {
final double acceptableFileSizePercentageDrop = 5.0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know for older release files we had the heuristic of a 10% change in file sizes being deemed suspicious. Any reason for the 5% threshold now? Also, should this value be configurable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought of 10% initially, but I wanted to err on the side of caution a bit more so I went (somewhat arbitrarily) with 5% with the caveat that we can change it as we get experience with the feedback we get with the automated checking.

For configuration, this method is overloaded allowing you to pass in a second parameter of the percentage drop threshold that will trigger the check to fail.

Comment on lines +46 to +48
// Path exampleFile = Paths.get(".", "example.html");
// Files.deleteIfExists(exampleFile);
// Files.write(exampleFile, page.getBytes(), StandardOpenOption.CREATE);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removeable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup :)

Comment on lines +124 to +127
while (fileSize > BYTE_UNIT_CONVERSION_FACTOR) {
magnitudeOfByteConversionFactor += 1;
fileSize = fileSize / BYTE_UNIT_CONVERSION_FACTOR;
}
Copy link
Collaborator

@SolomonShorser-OICR SolomonShorser-OICR Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the number of possible magnitudes is fixed (and relatively small), a loop is not really necessary here. A few if-else statements would work just as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - I'll take a look at reworking this.

@cookersjs
Copy link

Regarding the external resource json/csv files -- is there a plan for keeping everything in sync? Is the envisioned usage to be to use this new module to confirm everything will work for release, and for cases where it flags that it couldn't find the file then we would be notified and need to change it this modules files as well as the affected release step?

@jweiser
Copy link
Member Author

jweiser commented Aug 20, 2020

Regarding the external resource json/csv files -- is there a plan for keeping everything in sync? Is the envisioned usage to be to use this new module to confirm everything will work for release, and for cases where it flags that it couldn't find the file then we would be notified and need to change it this modules files as well as the affected release step?

The CSV file I think is going to be phased out and replaced with a JSON file. I'm still giving some thought about the best way to keep the reference to the release process up to date in a straightforward way.

For the usage, yes, this project is designed to check things outside of Reactome needed/used by the release process and give us warning that something is missing or incorrect and we need to contact the maintainer of the resource and/or change something on our side.

Copy link
Collaborator

@SolomonShorser-OICR SolomonShorser-OICR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few outstanding issues that are not yet resolved:

  • CSV/JSON input file - please commit to a format.
  • Files from resources being larger than expected.
  • Filesize chages from resources not being captured in the input file that should track this.

If they are not resolved within this PR, that's OK but in that case I'd like to see separate issues opened to track them, before approving this PR, just so these issues don't get lost.

@jweiser
Copy link
Member Author

jweiser commented Aug 20, 2020

There are a few outstanding issues that are not yet resolved:

* CSV/JSON input file - please commit to a format.

* Files from resources being larger than expected.

* Filesize chages from resources not being captured in the input file that should track this.

If they are not resolved within this PR, that's OK but in that case I'd like to see separate issues opened to track them, before approving this PR, just so these issues don't get lost.

Fair enough. I don't want this PR to get too large, so I'll create issues for them and reference them from this PR before it's closed.

@jweiser
Copy link
Member Author

jweiser commented Aug 26, 2020

There are a few outstanding issues that are not yet resolved:

* CSV/JSON input file - please commit to a format.

* Files from resources being larger than expected.

* Filesize chages from resources not being captured in the input file that should track this.

If they are not resolved within this PR, that's OK but in that case I'd like to see separate issues opened to track them, before approving this PR, just so these issues don't get lost.

Issues for these have been opened:

#6: CSV/JSON input file - please commit to a format.
#2: Files from resources being larger than expected.
#7: Filesize changes from resources not being captured in the input file that should track this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants