Skip to content

Commit

Permalink
add to docs about the fallback PollingObserver
Browse files Browse the repository at this point in the history
  • Loading branch information
eminizer committed Dec 12, 2023
1 parent 142bb17 commit d4d3782
Show file tree
Hide file tree
Showing 3 changed files with 32 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,11 @@ Directory monitoring with watchdog

The upload directory uses a Python library called `watchdog <https://pypi.org/project/watchdog/>`_ to monitor the filesystem for events that would trigger files to be uploaded. The implementation is purposefully minimal to reflect the limitations of using sub/pub messaging systems to reconstruct directories on disparate machines: specifically, ``DataFileUploadDirectory`` programs are "append-only" in that they will never send messages to delete certain files.

For example, if you move a bunch of files to a new subdirectory while the program is running, the consumer side will get new messages for the files in the subdirectory, but any reconstructed files in the original location will remain. Any files that are modified will be produced again; deleting files does not send any new messages. Because many common editing programs (such as vim) use short-lived temporary files to maintain atomicity of their outputs, files are not be marked to be produced until they have existed unmodified for 3 seconds. Files that are created and deleted within those three seconds will not be produced. The default 3 second lag time is configurable from the command line using the ``--watchdog_lag_time`` argument on the command line.
For example, if you move a bunch of files to a new subdirectory while the program is running, the consumer side will get new messages for the files in the subdirectory, but any reconstructed files in the original location will remain. Any files that are modified will be produced again; deleting files does not send any new messages. Because many common editing programs (such as vim) use short-lived temporary files to maintain atomicity of their outputs, files are not be marked to be produced until they have existed unmodified for 3 seconds (by default). Files that are created and deleted within those three seconds will not be produced. The default 3 second lag time is configurable from the command line using the ``--watchdog_lag_time`` argument.

Users who would like to test a particular workflow's interaction with watchdog can use the `testing script <https://github.com/openmsi/openmsistream/blob/main/test/watchdog_testing.py>`_ in the repository to see what events get triggered when. Running that script pointed to a directory will log messages for each recognized watchdog event. Only events for files are relevant for OpenMSIStream.
On Windows platforms, watchdog monitors directories using ReadDirectoryChangesW by default. But that notifier doesn't perform reliably on networked drives, or really on any filesystems that's not NTFS. In those cases, Microsoft recommends keeping periodic snapshots of directory contents and comparing with them to find changes in directory trees. Watchdog `implements a fallback solution to do this<https://github.com/gorakhargosh/watchdog?tab=readme-ov-file#about-using-watchdog-with-cifs>`_ (called a ``PollingObserver``), but it's more memory-intensive and less performant. To use that alternate observer in a ``DataFileUploadDirectory``, add the ``--use_polling_observer`` flag.

Users who would like to test a particular workflow's interaction with watchdog can use the `testing script <https://github.com/openmsi/openmsistream/blob/main/test/watchdog_testing.py>`_ in the repository to see what events get triggered when. There is also `an alternate version <https://github.com/openmsi/openmsistream/blob/main/test/watchdog_testing_polling_observer.py>`_ that uses the ``PollingObserver``. Running one of those scripts pointed to a directory will log messages for each recognized watchdog event. Only events for files are relevant for OpenMSIStream.

Restarting the program
----------------------
Expand Down
2 changes: 1 addition & 1 deletion test/test_scripts/test_polling_observer.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ def run_data_file_download_directory(self, download_file_dict, **other_create_kw
except Exception as exc:
raise exc

def test_upload_and_download_directories_kafka(self):
def test_polling_observer_kafka(self):
"""
Test the upload and download directories while applying regular expressions
"""
Expand Down
27 changes: 27 additions & 0 deletions test/watchdog_testing_polling_observer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
"""Watchdog testing script from the PyPI page (https://pypi.org/project/watchdog/),
except using a PollingObserver instead of the default
"""

import sys
import time
import logging
from watchdog.observers.polling import PollingObserver
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
path = sys.argv[1] if len(sys.argv) > 1 else "."
event_handler = LoggingEventHandler()
observer = PollingObserver()
observer.schedule(event_handler, path, recursive=True)
observer.start()
try:
while True:
time.sleep(1)
finally:
observer.stop()
observer.join()

0 comments on commit d4d3782

Please sign in to comment.