-
Notifications
You must be signed in to change notification settings - Fork 7
File watching for media directory #929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
This reverts commit ccd83b2.
Squashed commits: [dffc707] Separate add file to cache behavior into a method [31bd002] Support watched nested folders [c7a4f6e] Separate watcher service from server media service. [72983e6] Keep cache updated by file watcher for single base directory [c30a13d] Clean up error handling and thread usage. [950ad40] WIP. [0e3f533] Remove backend references to useCache (+1 squashed commit) [57129b9] Add watcher service functionality
1e4d8b9 to
64cc99e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 10 files reviewed, 1 unresolved discussion
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 189 at r1 (raw file):
// update cache context.setAttribute(attributeName, node);
Brian suggested avoiding using the application context, and just storing the cache in a hashmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 11 files reviewed, 11 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 99 at r2 (raw file):
//verify the desired path Path desiredPath = Paths.get(desiredpath); if (!desiredPath.toFile().exists() || !desiredPath.toAbsolutePath().startsWith(remoteMediaDirectory)) {
desiredPath.toFile().exists() should be Files.exists(desiredPath)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 203 at r2 (raw file):
//verify the desired path Path desiredPath = Paths.get(desiredPathParam).toAbsolutePath(); if(!desiredPath.toFile().exists() || !desiredPath.startsWith(remoteMediaDirectory)) {
desiredPath.toFile().exists() should be Files.exists(desiredPath)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 269 at r2 (raw file):
Path dir = Paths.get(serverpath).toAbsolutePath(); if(dir.startsWith(uploadPath)){ if(!dir.toFile().exists()){
dir.toFile().exists() should be Files.exists(dir)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 270 at r2 (raw file):
if(dir.startsWith(uploadPath)){ if(!dir.toFile().exists()){ if(dir.toFile().mkdir()){
dir.toFile().mkdir() should be Files.createDirectory(dir)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 144 at r2 (raw file):
public ServerMediaListing getAllFiles(HttpServletRequest request, @RequestParam(required = true) String fullPath) { Path dir = Paths.get(fullPath); if(!dir.toFile().isDirectory() || !dir.toAbsolutePath().startsWith(propertiesUtil.getServerMediaTreeRoot()))
dir.toFile().isDirectory() should be Files.isDirectory(dir)
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/WfmStartup.java, line 143 at r2 (raw file):
private void startFileIndexing(ApplicationContext appContext) { if (appContext instanceof WebApplicationContext) { FileWatcherService fileWatcherService = new FileWatcherServiceImpl(propertiesUtil, ioUtils);
This should be Autowired in.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/WfmStartup.java, line 146 at r2 (raw file):
// Load existing files on startup since we won't see their creation event ThreadUtil.runAsync( () -> fileWatcherService.launchWatcher(propertiesUtil.getServerMediaTreeRoot()));
You should build the initial cache on the thread created in launchWatcher, then you can just call fileWatcherService.launchWatcher here without the ThreadUtil.runAsync
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 73 at r2 (raw file):
private final HashMap<WatchKey, Path> watcherMap = new HashMap<>(); private WatchService watcher;
watcherMap and watcher should be local to watcherThreadService
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 87 at r2 (raw file):
} public HashMap<Path, ServerMediaListing> getFileCache() {
You shouldn't return the internal HashMap here. This should be something like:
public ServerMediaListsing getFromCache(Path path) {
return fileCache.get(path)
}
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 110 at r2 (raw file):
thread.start(); watcherInstantiated = true; buildInitialCache(propertiesUtil.getServerMediaTreeRoot());
Please call buildInitialCache(propertiesUtil.getServerMediaTreeRoot()); on the newly created thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 32 files reviewed, 5 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 97 at r4 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Remove this semi-colon.
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 274 at r4 (raw file):
Previously, jrobble (Jeff Robble) wrote…
This can throw more than just an IOException:
UnsupportedOperationException - if the array contains an attribute that cannot be set atomically when creating the directory FileAlreadyExistsException - if a directory could not otherwise be created because a file of that name already exists (optional specific exception) IOException - if an I/O error occurs or the parent directory does not exist SecurityException - In the case of the default provider, and a security manager is installed, the checkWrite method is invoked to check write access to the new directory.Just catch Exception here.
FileAlreadyExistsException inherits from IOException so it's already being handled (we also check is the directory exists right before that). UnsupportedOperationException and SecurityException are RuntimeExceptions and do not necessarily need to be handle (but I can see why we would, since we get a cleaner error message). I can add UnsupportedOperationException and SecurityException to the catch, but I'm hesitant to put a generic catch(Exception) since that's generally considered bad practice. There are some RuntimeExceptions such as InterruptedException that you may not want to swallow.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 278 at r4 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Lowercase "create folder".
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 281 at r4 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Lowercase "exists".
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/WfmStartup.java, line 143 at r2 (raw file):
Previously, brosenberg42 wrote…
My comment is on the wrong line since you update this file. I was saying that you should Autowire FileWatcherService. You should not Autowire in WebApplicationContext nor ApplicationContext.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 10 files at r1, 3 of 10 files at r2, 6 of 7 files at r4, 2 of 2 files at r5.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @bquinn17 and @brosenberg42)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/MediaController.java, line 274 at r4 (raw file):
There are some RuntimeExceptions such as InterruptedException that you may not want to swallow.
Good point. I forgot about that.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 130 at r5 (raw file):
@ResponseBody public DirectoryTreeNode getAllDirectories(HttpServletRequest request, @RequestParam(required = false) Boolean useUploadRoot) {
Let's remove support for the useUploadRoot parameter. The only reason that exists is so that when first initializing the media tree in JS, the entire tree is initialized (what used to be web.server.media.tree.base=${env:MPF_HOME}/share) not just what's in the remote-media dir.
Now that the root directory tree now points to the remote-media dir:
rootDirectoryTreeCache.fillDirectoryTree(rootDirectoryTreeCache, new ArrayList<>(),
propertiesUtil.getServerMediaTreeRoot());
web.server.media.tree.base=${env:MPF_HOME}/share/remote-media
Currently, I believe that both the useUploadRoot=true and useUploadRoot=false behaviors do the same thing.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 148 at r5 (raw file):
return null; // security check log.info("All files requested in: " + fullPath);
Make this a debug line.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 160 at r5 (raw file):
@RequestMapping(value = {"/server/get-all-files-filtered"}, method = RequestMethod.POST) @ResponseBody public ResponseEntity<ServerMediaFilteredListing> getAllFilesFiltered(HttpServletRequest request,
If you're going to make this return a ResponseEntity, you should do the same for getAllFiles() and getAllDirectories(). getAllFiles() returns null when the security check fails. It should return a BAD_REQUEST, like this method.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 169 at r5 (raw file):
File dir = new File(fullPath); if (!dir.exists()) {
Please change this to:
if (!Paths.get(fullPath).toAbsolutePath().startsWith(propertiesUtil.getServerMediaTreeRoot())) {
// security check
return new ResponseEntity<>(HttpStatus.BAD_REQUEST);
}
File dir = new File(fullPath);
if (!dir.exists()) {
return new ResponseEntity<>(HttpStatus.NOT_FOUND);
}
if (!dir.isDirectory()) {
return new ResponseEntity<>(HttpStatus.BAD_REQUEST);
}
Putting on my tinfoil hat, assuming this endpoint got exposed, using the code in your current PR, an adversary could use this endpoint to learn what dirs exist on the system outside of the media tree root by determining which responses return NOT_FOUND and which return BAD_REQUEST.
Update the getFilesFilteredInvalidDirectory test accordingly.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/model/DirectoryTreeNode.java, line 101 at r5 (raw file):
if (seenNodes.contains(realChildNode)) { log.warn("Omitting duplicate symbolically linked directory in this branch: " + child.toAbsolutePath() + " --> " + realPath); continue; // prevent symlink loop
Note to self: Test this.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherService.java, line 33 at r5 (raw file):
import java.nio.file.Path; import java.util.HashMap;
HashMap import is not used here.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 149 at r5 (raw file):
buildInitialCache(propertiesUtil.getServerMediaTreeRoot()); log.info("Watcher task started");
Say "File watcher task started".
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 168 at r5 (raw file):
Path child = dir.resolve(eventPath); if (event.kind() == ENTRY_CREATE) { File newFile = new File(child.toAbsolutePath().toString());
Should this be newItem for consistency with removedItem?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 199 at r5 (raw file):
log.debug("Building initial cache"); DirectoryTreeNode node = this.rootDirectoryTreeCache;
Note to self: What if root is a symlink?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 204 at r5 (raw file):
File dir = new File(node.getFullPath()); // attempt to get attribute from application scope
What attribute?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 233 at r5 (raw file):
for (File tmpFile : tmpFiles) { try { files.add(tmpFile.toPath().toRealPath().toFile()); // resolve symbolic links
Note to self: Check for circular symlinks?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 285 at r5 (raw file):
if (item == null) return; // attempt to get attribute from application scope
What attribute?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/ServerMediaServiceImpl.java, line 66 at r5 (raw file):
mediaFiles.addAll(cachedFileNames.getData()); } else { log.error("Media file cache not found");
What if dir is not cached yet? Race condition?
trunk/workflow-manager/src/main/webapp/resources/mytheme/js/controllers/ServerMediaCtrl.js, line 337 at r5 (raw file):
}, error: function () { alert("The selected directory was deleted");
Note to self: Test this.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 136 at r5 (raw file):
@Test public void getDirectoriesAfterDelete() throws IOException, InterruptedException {
Call this getAllDirectoriesAfterDeletion for consistency.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 155 at r5 (raw file):
@Test public void getDirectoriesAfterEvents() throws IOException, InterruptedException {
Call this getAllDirectoriesAfterEvents for consistency.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 199 at r5 (raw file):
@Test public void getAllFilesNestedFolder() throws IOException, InterruptedException {
Call this getAllFilesNestedFolderAfterCreation to complement getAllFilesAfterDeletion.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 218 at r5 (raw file):
Thread.sleep(100); ServerMediaListing mediaListing = _controller.getAllFiles(_mockRequest, subFolder.getAbsolutePath());
Assert the output of getAllFiles before creating the new files.
|
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 199 at r5 (raw file): Previously, jrobble (Jeff Robble) wrote…
The content of symlinked dirs is displayed in the Create Job page on startup (root dir or a subdir), but when I upload a file to a symlinked dir it's never displayed in the web UI. The file is actually uploaded though, as I can see it in the file browser. Refreshing the Create Job page does nothing. It's not until I restart the WFM that the file is displayed. Adding a new dir to the symlinked dir result in similar behavior: The web UI is never updated, but I can see the dir in a file browser. Also, when I delete a file in the symlinked dir it still appears in the UI even after refreshing the UI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @bquinn17 and @brosenberg42)
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 228 at r5 (raw file):
tmpFiles = new File[]{item}; }
Add the following here:
if (tmpFiles == null) {
return;
}
for (File tmpFile : tmpFiles) { will result in a NullPointerException if the WFM doesn't have the right permissions to accessitem. I ran into this when item was a dir owned by root.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/ServerMediaServiceImpl.java, line 66 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
What if dir is not cached yet? Race condition?
When the WFM starts up, a user cannot access the web app via the browser until after the buildInitialCache call. I'm wondering how long that call will take with hundreds of thousands of files, and how much space the cache will occupy. I guess that's a moot point though, since users don't have to upload anything to the remote-media directory if they are using the REST API.
trunk/workflow-manager/src/main/webapp/resources/mytheme/js/controllers/ServerMediaCtrl.js, line 337 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Note to self: Test this.
In the Create Job UI, removing a dir on the command line, and then clicking the checkbox next to it in the UI will result in a "Loading" dialog that never goes away.
Looking at the webconsole in FireFox, I see:
Error: arr is undefined
addFilesToSubmit@http://localhost:8080/workflow-manager/resources/js/controllers/ServerMediaCtrl.js:677:33
trunk/workflow-manager/src/main/webapp/resources/mytheme/js/controllers/ServerMediaCtrl.js, line 337 at r5 (raw file):
}, error: function () { alert("The selected directory was deleted");
End in period: "The selected directory was deleted."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 13 files reviewed, 19 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 130 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Let's remove support for the
useUploadRootparameter. The only reason that exists is so that when first initializing the media tree in JS, the entire tree is initialized (what used to beweb.server.media.tree.base=${env:MPF_HOME}/share) not just what's in theremote-mediadir.Now that the root directory tree now points to the
remote-mediadir:rootDirectoryTreeCache.fillDirectoryTree(rootDirectoryTreeCache, new ArrayList<>(), propertiesUtil.getServerMediaTreeRoot());web.server.media.tree.base=${env:MPF_HOME}/share/remote-mediaCurrently, I believe that both the
useUploadRoot=trueanduseUploadRoot=falsebehaviors do the same thing.
Good point. Upon looking closer, they both use propertiesUtil.getServerMediaTreeRoot()
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 148 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Make this a debug line.
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 160 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
If you're going to make this return a ResponseEntity, you should do the same for
getAllFiles()andgetAllDirectories().getAllFiles()returns null when the security check fails. It should return a BAD_REQUEST, like this method.
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/controller/ServerMediaController.java, line 169 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Please change this to:
if (!Paths.get(fullPath).toAbsolutePath().startsWith(propertiesUtil.getServerMediaTreeRoot())) { // security check return new ResponseEntity<>(HttpStatus.BAD_REQUEST); } File dir = new File(fullPath); if (!dir.exists()) { return new ResponseEntity<>(HttpStatus.NOT_FOUND); } if (!dir.isDirectory()) { return new ResponseEntity<>(HttpStatus.BAD_REQUEST); }Putting on my tinfoil hat, assuming this endpoint got exposed, using the code in your current PR, an adversary could use this endpoint to learn what dirs exist on the system outside of the media tree root by determining which responses return NOT_FOUND and which return BAD_REQUEST.
Update the
getFilesFilteredInvalidDirectorytest accordingly.
That makes sense. I was thinking that doing .startsWith() on a path that doesn't exists would cause an error, but it looks like it's fine.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherService.java, line 33 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
HashMap import is not used here.
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 149 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Say "File watcher task started".
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 168 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Should this be
newItemfor consistency withremovedItem?
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 199 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
The content of symlinked dirs is displayed in the Create Job page on startup (root dir or a subdir), but when I upload a file to a symlinked dir it's never displayed in the web UI. The file is actually uploaded though, as I can see it in the file browser. Refreshing the Create Job page does nothing. It's not until I restart the WFM that the file is displayed.
Adding a new dir to the symlinked dir result in similar behavior: The web UI is never updated, but I can see the dir in a file browser.
Also, when I delete a file in the symlinked dir it still appears in the UI even after refreshing the UI.
So the reason you're seeing that behavior is because the symlinked directories are not being registered with the WatchService. They're not being registered because Files.walkFileTree() (under walkAndRegisterDirectories) does not follow symlinks by default. I've updated that method call to make it follow symlinks.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 204 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
What attribute?
My bad, these comments we're left over from the old behavior
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 228 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Add the following here:
if (tmpFiles == null) { return; }
for (File tmpFile : tmpFiles) {will result in a NullPointerException if the WFM doesn't have the right permissions to accessitem. I ran into this whenitemwas a dir owned by root.
Done.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 233 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Note to self: Check for circular symlinks?
This might also be a concern on the file visitor, now that it can follow symlinks. Not sure if the visitor handles that implicitly.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 285 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
What attribute?
My bad, these comments we're left over from the old behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 13 files reviewed, 19 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/ServerMediaServiceImpl.java, line 66 at r5 (raw file):
What if dir is not cached yet? Race condition?
That's kind of the reason that I put the error message there. The user will just see an empty list of files on the UI until the files in that folder are cached, and a message will be logged when that happens. The watcher service has been very fast to update in my testing (sub millisecond, longer than the request takes on the front end), but that could definitely change when scaling up thousands of files and dirs.
a user cannot access the web app via the browser until after the
buildInitialCachecall.
This call executes on a background thread. I don't see any reason why this would prevent the user from accessing the web app.
trunk/workflow-manager/src/main/webapp/resources/mytheme/js/controllers/ServerMediaCtrl.js, line 337 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
End in period: "The selected directory was deleted."
Done.
trunk/workflow-manager/src/main/webapp/resources/mytheme/js/controllers/ServerMediaCtrl.js, line 337 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
In the Create Job UI, removing a dir on the command line, and then clicking the checkbox next to it in the UI will result in a "Loading" dialog that never goes away.
Looking at the webconsole in FireFox, I see:
Error: arr is undefined addFilesToSubmit@http://localhost:8080/workflow-manager/resources/js/controllers/ServerMediaCtrl.js:677:33
Fixed bug in the getAllFiles method of the MediaService and fixed ServerMediaController.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 136 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Call this
getAllDirectoriesAfterDeletionfor consistency.
Done.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 155 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Call this
getAllDirectoriesAfterEventsfor consistency.
Done.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 199 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Call this
getAllFilesNestedFolderAfterCreationto complementgetAllFilesAfterDeletion.
Done.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 218 at r5 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Assert the output of
getAllFilesbefore creating the new files.
I added an assert for the subdirectory being empty, is that what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 7 files at r6, 3 of 4 files at r7.
Reviewable status: 12 of 13 files reviewed, 4 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 233 at r5 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
This might also be a concern on the file visitor, now that it can follow symlinks. Not sure if the visitor handles that implicitly.
Note to self: Test this.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 116 at r6 (raw file):
} }; Files.walkFileTree(start, Set.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE, visitor);
Note to self: Test this.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/ServerMediaServiceImpl.java, line 66 at r5 (raw file):
This call executes on a background thread. I don't see any reason why this would prevent the user from accessing the web app.
While debugging I put a break point in that call. Turns out, the break point suspended the entire app, not just the thread. Thus, I was artificially introducing the blocking behavior.
When I switched the behavior of the breakpoint to suspend the thread, I did see that the web UI continued to load. I could see dirs in the file tree, but none of them contained any files. When I released the breakpoint, and clicked on the dir in the UI again, the files were shown.
I believe that the old code would show "Loading. Please wait..." instead of the file tree until the files were initially indexed. I'm wondering if we should do the same until the cache is built.
trunk/workflow-manager/src/test/java/org/mitre/mpf/mvc/controller/TestServerMediaController.java, line 218 at r5 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
I added an assert for the subdirectory being empty, is that what you meant?
Yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 12 files reviewed, 15 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/model/DirectoryTreeNode.java, line 71 at r9 (raw file):
} public DirectoryTreeNode fillDirectoryTree(DirectoryTreeNode node, List<DirectoryTreeNode> seenNodes, String uploadDir) {
I think this method should be made static, since it doesn't use any instance variables from the class.
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/model/DirectoryTreeNode.java, line 97 at r9 (raw file):
DirectoryTreeNode realChildNode = new DirectoryTreeNode(realPath.toFile()); // resolve symbolic link to real path if (seenNodes.contains(realChildNode)) {
seenNodes could be changed to a HashSet for better efficiency (O(1) lookups), since all we are doing is adding to it and checking if a node is in it. It's better for readability if we mark it as static.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 43 at r9 (raw file):
import java.io.IOException; import java.io.UncheckedIOException; import java.nio.file.*;
I've been told that star imports are generally considered bad practice, since they can clutter than namespace with imported packages that you may not actually be using. It's better for maintainability to only import that classes that are used. More of an explanation can be found here. You may need to disable the automatic star imports in your editor.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 63 at r9 (raw file):
private DirectoryTreeNode rootDirectoryTreeCache; private Map<WatchKey, List<Path>> watcherMap = new HashMap<>();
Brian told me that we should avoid keeping instance variables of objects that can be contained within the worker thread. This way we can avoid any threading issues that may arise if the main thread tries to access the object.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 81 at r9 (raw file):
seenNodes.add(rootDirectoryTreeCache); rootDirectoryTreeCache.fillDirectoryTree(rootDirectoryTreeCache, seenNodes,
Should this be done on the background thread? If there are a lot a directories it may delay the WFM startup.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 85 at r9 (raw file):
} public synchronized DirectoryTreeNode getRootDirectoryTreeCache() {
I'm not sure I understand why the synchronized tags were added. Is it the concern that writes from the worker thread will interfere with reads from the main thread? If so, shouldn't getFileListByPath also be synchronized? Even in that case I'm not sure how a lock on the getter method will prevent that competition for the resource.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 111 at r9 (raw file):
return FileVisitResult.SKIP_SUBTREE; } addFilesToCacheNoRecurse(dir.toFile(), new ArrayList<>());
If this is here can we remove the buildInitialCache method?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 169 at r9 (raw file):
// Load existing files on startup since we won't see their creation event //buildInitialCache(propertiesUtil.getServerMediaTreeRoot());
Remove these lines if not needed
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 180 at r9 (raw file):
while ((key = watcher.take()) != null) { // blocks until watcher.take() returns an event List<Path> dirs = new ArrayList<>(watcherMap.get(key)); // copy to prevent concurrent access exception if (dirs == null) {
Check if dirs is empty instead.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 276 at r9 (raw file):
} if (realPath == null) {
I don't think toRealPath can return a null.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 306 at r9 (raw file):
walkAndRegisterDirectories(Paths.get(item.getAbsolutePath()), watcher); // addFilesToCacheNoRecurse(item, new ArrayList<>());
Remove if not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is a work in progress. It's not ready to land. It still has debug lines and requires cleanup.
In general, I'm only pushing my changes for collaboration with @bquinn17. My first round of changes where necessary to:
- Get the new symlink tests to pass.
- Fix the checkbox behavior on the job creation UI.
Reviewable status: 6 of 13 files reviewed, 17 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/model/DirectoryTreeNode.java, line 71 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
I think this method should be made static, since it doesn't use any instance variables from the class.
Done
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/model/DirectoryTreeNode.java, line 97 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
seenNodes could be changed to a HashSet for better efficiency (O(1) lookups), since all we are doing is adding to it and checking if a node is in it. It's better for readability if we mark it as static.
Changed to hash set. Make the method static?
trunk/workflow-manager/src/main/java/org/mitre/mpf/mvc/model/DirectoryTreeNode.java, line 67 at r10 (raw file):
} /* TODO
Possibly combine this data structure with the watcherMap.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 319 at r8 (raw file):
if (deletedDirKey != null) { deletedDirKey.reset(); deletedDirKey.cancel();
Do we need to reset and cancel?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 43 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
I've been told that star imports are generally considered bad practice, since they can clutter than namespace with imported packages that you may not actually be using. It's better for maintainability to only import that classes that are used. More of an explanation can be found here. You may need to disable the automatic star imports in your editor.
I agree that wildcards can cause unintentional conflicts, and are not as explicit, but I do like removing clutter from the code. If we want to remove wildcards, we should do so throughout the codebase, which is a larger effort.
To your point though, we try to follow the Google style guides, which say not to use them: https://google.github.io/styleguide/javaguide.html#s3.3.1-wildcard-imports
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 63 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
Brian told me that we should avoid keeping instance variables of objects that can be contained within the worker thread. This way we can avoid any threading issues that may arise if the main thread tries to access the object.
That makes sense, and I may change it back; however, more generally, I'm thinking about if there is a better way to design this. Things I don't like:
-
We manually iterate over values in the watcherMap when a file or directory is removed from the file system. Can we improve the lookup?
-
It seems like there may be a way to combine watcherMap with the rootDirectoryTreeCache. By keeping them separate we essentially have two data structures keeping track of every directory.
-
We call updateRootDirectoryTreeCache() whenever something changes on the file system, which requires recreating the entire directory tree from scratch. That seems inefficient.
Also, a pre-existing issue, the DirectoryTreeNode.find() can probably be improved by storing the nodes in a map as opposed to a list, so then child nodes can be looked up by path segment. Currently, the find() behavior looks at the full path for every node, so it's a brute force search as opposed to a directed one.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 81 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
Should this be done on the background thread? If there are a lot a directories it may delay the WFM startup.
Most of the time updateRootDirectoryTreeCache() is called from the watcher thread, so that does not affect WFM startup.
The exception is when it's called in the FileWatcherServiceImpl() constructor, which may delay startup (need to test this). We might be able to remove that call from the constructor and just let the rootDirectoryTreeCache be updated when the watcher thread walks the file system for the first time. Updating it before then may be pointless if it's just going to be updated again shortly thereafter.
The only benefit is preventing the user from being presented with an empty file tree when viewing the job creation UI for the first time; however, that can be addressed by presenting the user with a loading dialog if the rootDirectoryTreeCache has not been populated for the first time yet.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 85 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
I'm not sure I understand why the synchronized tags were added. Is it the concern that writes from the worker thread will interfere with reads from the main thread? If so, shouldn't getFileListByPath also be synchronized? Even in that case I'm not sure how a lock on the getter method will prevent that competition for the resource.
The two methods I synchronized are updateRootDirectoryTreeCache() and getRootDirectoryTreeCache(). The concern is that a REST request for the rootDirectoryTreeCache will be serviced shortly after the rootDirectoryTreeCache = new DirectoryTreeNode(mediaRoot); line, but before the cache is populated, resulting in an empty cache.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 111 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
If this is here can we remove the
buildInitialCachemethod?
Removed in my latest update.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 169 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
Remove these lines if not needed
Done
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 180 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
Check if dirs is empty instead.
I changed the code since you dropped this comment. If a key does not exist in the map a null will be returned, which is why I check for that here.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 276 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
I don't think toRealPath can return a null.
Removed.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 306 at r9 (raw file):
Previously, bquinn17 (Bryan Quinn) wrote…
Remove if not needed.
Done
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 344 at r10 (raw file):
// remove sub-directories from the file cache too fileCache.keySet().removeIf(k -> k.startsWith(item.getAbsolutePath()));
Before you stated:
// The OS will delete files from the lowest level up, therefore we do not need to worry about
// subdirectories or sub files. This directory is guaranteed to be empty since those deletion events
// will have cleared everything below it from the cache
I have not observed that. Rather, when I have a nested dir, and then manually delete the parent dir from the file system through the OS file viewer, I only get an event for the parent. Thus, I must iterate through the fileCache and remove all entries that are contained in that parent dir.
It seems like there should be a more efficient way to get those subdir cache entries by cross-referencing the rootDirectoryTreeCache, since the latter knows the nesting structure. That seems more efficient than iterating over the entire fileCache.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/ServerMediaServiceImpl.java, line 82 at r10 (raw file):
// recurse if (recurse && node.getNodes() != null) {
I needed to add this recursion back in because the dir checkboxes in the job creation UI were broke. You should be able to check the box next to a parent dir, and then all of the checkboxes next to the files in that dir, and the subdirs in that dir, and the files in those subdirs, and so forth, should be checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, note that if two symlinks point to the same dir, the current code has a separate watcherMap entry for each of those symlinks. (Note that this works since removing a file in the symlinked dir generates an event for each watcherMap entry.) The code currently uses absolute paths for most everything (except checking for symlink cycles when validating dirs).
Note that the absolute path keeps the symlink as part of the path segment. This means that for each file in the symlinked dir, there will be multiple entries for it in the fileCache, depending on the number of symlinks that point to that dir. While inefficient, this does make it easier to service REST calls to get files from the individual symlink dirs.
If we keep things this way, we can revert the watcherMap back to using a Path instead of a List<Path>.
One issue with using the watcherMap with real paths is that we need to be able to map the real paths to one or more symlink paths (needed to handle file events), and map each symlink path to a real path (needed to service REST calls to get files), which would add more map data structures and add complexity to the code. It was too much to handle for the first round of cleanup.
Reviewable status: 6 of 13 files reviewed, 17 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 13 files reviewed, 8 unresolved discussions (waiting on @bquinn17, @brosenberg42, and @jrobble)
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 319 at r8 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Do we need to reset and cancel?
You're right, it only needs to be cancelled.
https://docs.oracle.com/javase/8/docs/api/java/nio/file/WatchKey.html#cancel--
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 43 at r9 (raw file):
Previously, jrobble (Jeff Robble) wrote…
I agree that wildcards can cause unintentional conflicts, and are not as explicit, but I do like removing clutter from the code. If we want to remove wildcards, we should do so throughout the codebase, which is a larger effort.
To your point though, we try to follow the Google style guides, which say not to use them: https://google.github.io/styleguide/javaguide.html#s3.3.1-wildcard-imports
Isn't it better to remove them as we go and not add new ones? Either way there will be a mix of star imports and regular imports until all of the star imports are gone.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 63 at r9 (raw file):
Previously, jrobble (Jeff Robble) wrote…
That makes sense, and I may change it back; however, more generally, I'm thinking about if there is a better way to design this. Things I don't like:
We manually iterate over values in the watcherMap when a file or directory is removed from the file system. Can we improve the lookup?
It seems like there may be a way to combine watcherMap with the rootDirectoryTreeCache. By keeping them separate we essentially have two data structures keeping track of every directory.
We call updateRootDirectoryTreeCache() whenever something changes on the file system, which requires recreating the entire directory tree from scratch. That seems inefficient.
Also, a pre-existing issue, the DirectoryTreeNode.find() can probably be improved by storing the nodes in a map as opposed to a list, so then child nodes can be looked up by path segment. Currently, the find() behavior looks at the full path for every node, so it's a brute force search as opposed to a directed one.
That makes sense, there are some improvements that can still be made to the performance, including improvements to the existing code. We might want to decide if this feature is already enough of an improvement on performance for now?
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 81 at r9 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Most of the time updateRootDirectoryTreeCache() is called from the watcher thread, so that does not affect WFM startup.
The exception is when it's called in the FileWatcherServiceImpl() constructor, which may delay startup (need to test this). We might be able to remove that call from the constructor and just let the rootDirectoryTreeCache be updated when the watcher thread walks the file system for the first time. Updating it before then may be pointless if it's just going to be updated again shortly thereafter.
The only benefit is preventing the user from being presented with an empty file tree when viewing the job creation UI for the first time; however, that can be addressed by presenting the user with a loading dialog if the rootDirectoryTreeCache has not been populated for the first time yet.
Right, that was what I was trying to say
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 85 at r9 (raw file):
Previously, jrobble (Jeff Robble) wrote…
The two methods I synchronized are updateRootDirectoryTreeCache() and getRootDirectoryTreeCache(). The concern is that a REST request for the rootDirectoryTreeCache will be serviced shortly after the
rootDirectoryTreeCache = new DirectoryTreeNode(mediaRoot);line, but before the cache is populated, resulting in an empty cache.
I'm not sure this will lock the rootDirectoryTreeCache object, it will just prevent two threads from calling the getter method (or the updateRootDirectoryTreeCache method ) at the same time.
You might want to use sychronized to add a monitor on the rootDirectoryTreeCache object, as explained here.
Otherwise I think the loading message mentioned early would fix this issue.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/FileWatcherServiceImpl.java, line 344 at r10 (raw file):
Previously, jrobble (Jeff Robble) wrote…
Before you stated:
// The OS will delete files from the lowest level up, therefore we do not need to worry about // subdirectories or sub files. This directory is guaranteed to be empty since those deletion events // will have cleared everything below it from the cacheI have not observed that. Rather, when I have a nested dir, and then manually delete the parent dir from the file system through the OS file viewer, I only get an event for the parent. Thus, I must iterate through the fileCache and remove all entries that are contained in that parent dir.
It seems like there should be a more efficient way to get those subdir cache entries by cross-referencing the rootDirectoryTreeCache, since the latter knows the nesting structure. That seems more efficient than iterating over the entire fileCache.
Oh, I was using the rm -r command, maybe that has a different behavior than the UI. That's unfortunate.
trunk/workflow-manager/src/main/java/org/mitre/mpf/wfm/service/ServerMediaServiceImpl.java, line 82 at r10 (raw file):
Previously, jrobble (Jeff Robble) wrote…
I needed to add this recursion back in because the dir checkboxes in the job creation UI were broke. You should be able to check the box next to a parent dir, and then all of the checkboxes next to the files in that dir, and the subdirs in that dir, and the files in those subdirs, and so forth, should be checked.
That works
#897
This change is