New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save git info #72
Save git info #72
Conversation
…nfig [labscript] section.
I think there are a couple of thread safety issues here:
I was also kind of imaging the cache would be in runmanager, but I think this is better (more self contained). It makes it easy to clear the cache (just restart the runmanager subprocess!). It does mean there would be duplication of git calls if/when we parallelise the runmanager compilation, but that would have been hard to avoid anyway! |
Yeah I haven't worked very directly with threading in python before so I'm not surprised. Thanks for taking the time to point them out!
I had thought about that a bit. Doing |
So I did some reading and you are technically correct in that reading values for existing keys is thread safe in current CPython implementations even if another thread overwrites the value. But it is not guaranteed to stay thread safe in future Python versions and would not be thread safe should our code change in the future (say if a future change meant it was possible that one thread might insert while another was reading). Basically I find it's safer to always use a lock on most datatypes, and the overhead is pretty minimal! |
Yeah it sounds like even if it works now, it's very prone to breakage whenever something else changes in labscript or elsewhere. Locks are definitely the way to go! |
I believe I've got the lock implemented correctly now, though feel free to tell me otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've made some further comments. They're minor improvements and only come into play if filewatcher is trying to update the vcs data while a compilation is going on (which probably won't happen very often!). But I figured going into details with these concepts might be useful for you when working on other parts of the labscript suite (where lock contention is more common)
_vcs_cache_rlock = threading.RLock() | ||
def _file_watcher_callback(name, info, event): | ||
with _vcs_cache_rlock: | ||
_vcs_cache[name] = _run_vcs_commands(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the lock is now being held for however long it takes to query the subprocess information. You can reduce lock contention by doing something like temp = _run_vcs_commands(name)
outside of the lock and then only holding the lock when you place that into the _vcs_cache
dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Believe it or not I actually originally set things up as you're suggesting in both of these comments with Lock
in place of RLock
, but then changed it. For the lock here in _file_watcher_callback()
, if the _file_watcher
thread called _file_watcher_callback()
then at least some of the cache data is out of date, so it is actually better for it to block the save_labscripts()
thread from writing cached results to drive until the cache is updated.
labscript/labscript.py
Outdated
if info or err: | ||
hdf5_file[save_path].attrs['hg ' + str(command[0])] = info.decode('utf-8') + '\n' + err.decode('utf-8') | ||
with _vcs_cache_rlock: | ||
if path not in _vcs_cache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here, you could do:
with _vcs_cache_rlock:
condition = path not in _vcs_cache
if condition:
#add to filewatch, and call callback
with _vcs_cache_rlock:
for command, info, err in _vcs_cache[path]:
to reduce lock contention. This actually might mean I was wrong about needing the RLock
over a plain Lock
(but whatever...RLock
s are safer in the long run).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I did it this way I thought I was just making things more complicated than necessary. As soon as the lock is released after the first with
statement, it is almost immediately re-acquired either by the explicit callback call or the following with
statement.
But now that you brought it up again, I think you're right that this is better. Freeing up the lock even briefly gives the _file_watcher
thread an opportunity to run _file_watcher_callback()
between those statements so it could potentially update the cache last minute right before last with
statement there. I'll make this change.
I also considered reducing lock contention even further by locking just to do file_cache_entry = _vcs_cache[path].copy()
then freeing the lock during for
loop which is slowed by writing to disk. I think that would work fine as long as nothing in the list returned by _vcs_cache[path]
is mutable and edited by both threads. Currently the list only has tuples of strings so that should be ok unless something changes. Alternatively copy.deepcopy()
could be used to play it safe. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 4716ceb implements your suggestion, but doesn't move the for
loop out of the locked section as mentioned above.
@philipstarkey I got a request for the git features in this PR recently, so I'm thinking of merging it. Would you like to take another look at it beforehand or should I just move forward with it? I also plan on testing then (presumably) merging #73 too while I'm at it. @dihm Do you have an interest in taking a look at this or #73? You've been very active in the development lately so I thought I should give you the opportunity to add any input if you're interested |
Happy for you to go ahead as you see fit! |
I believe all of the comments could be considered resolved now and I've been using the code in this PR for the better part of a year without issue. I'll probably merge this PR sometime in the next few days. |
I tested this on top of #73. The speed improvement is pretty significant if you aren't using any versioning! I'm much less experienced with all of the locks but it appears to be OK, especially if Phil says so. It would be good to update the default profile with the new keys over in |
Dope, glad it works for your setup as well! Actually the new keys already got added to the default profile in |
So it is! That's what I get for assuming my installation was up to date. Good work getting this sorted! |
commit f6409e0 Author: David Meyer <dihm@users.noreply.github.com> Date: Tue Dec 7 10:11:56 2021 -0500 Update setup.cfg to show python 3.9 support. commit 4d26344 Merge: 62455b5 deca5d8 Author: David Meyer <dihm@users.noreply.github.com> Date: Wed Nov 10 18:42:45 2021 -0500 Merge pull request #77 from zakv/square-wave Added AnalogQuantity.square_wave(). commit deca5d8 Author: Zak V <zakven@mit.edu> Date: Wed Nov 10 18:00:24 2021 -0500 Changed the parameterization of AnalogQuantity.square_wave() inputs, then added square_wave_levels() method which uses the old parameterization. This PR also inverts the previous meaning of duty cycle in the old parameterization. Now duty cycle is the fraction of the time spent outputing `level_0` rather than `level_1` in square_wave_levels(). commit 62455b5 Merge: 13239cc bda942a Author: David Meyer <dihm@users.noreply.github.com> Date: Wed Nov 10 08:53:04 2021 -0500 Merge pull request #84 from chrisjbillington/full-unitconversion-import-path Save the full import path of unit conversion classes commit bda942a Author: chrisjbillington <chrisjbillington@gmail.com> Date: Tue Nov 9 16:50:14 2021 +1100 Save the full import path of unit conversion classes This allows unit conversion classes located outside of `labscript_utils` to be used. Closes #71 commit 13239cc Merge: 83e10e5 4716ceb Author: zakv <4721629+zakv@users.noreply.github.com> Date: Fri Aug 20 16:48:15 2021 -0400 Merge pull request #72 from zakv/save-git-info Save git info commit 83e10e5 Merge: 9fbd24c ea17d77 Author: zakv <4721629+zakv@users.noreply.github.com> Date: Fri Aug 20 16:47:49 2021 -0400 Merge pull request #73 from philipstarkey/feature/performance-improvements Performance improvements commit 9fbd24c Merge: 04db899 736cb32 Author: zakv <4721629+zakv@users.noreply.github.com> Date: Fri Aug 20 08:33:57 2021 -0400 Merge pull request #79 from zakv/fix-78 Fix #78: labscript.py accidentally overwrites dedent() commit 04db899 Merge: a805b69 9915e50 Author: David Meyer <dihm@users.noreply.github.com> Date: Fri Jul 16 14:54:58 2021 -0400 Merge pull request #74 from dihm/labscript-docs Initial pass at API docs for labscript commit 9915e50 Author: David Meyer <dihm.meyer@gmail.com> Date: Thu Jul 15 10:17:58 2021 -0400 Even more docstring coverage in the API. commit a4057e5 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 10:38:13 2021 -0400 Adding more docstrings to the API. commit 846377d Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 10:34:02 2021 -0400 Remove colorama stuff. Doesn't work on RTD anyway. Now everything will be more consistent. commit 073ceb4 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 10:08:19 2021 -0400 Add missing dependency that sphinx should bring in automatically but doesn't on RTD. commit 08071b1 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 09:51:36 2021 -0400 Update sphinx pin to newest stable version. commit 4b6f805 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 09:41:26 2021 -0400 Quick hack to get a coverage percentage of the API documentation. We can't easily use the coverage extension because it counts things documented via the `undoc-members` flag as documented! Actual items that need documenting can also be printed if desired by setting the `undoc_print_objects` flag in conf.py. This is automatically done for RTD builds. commit 09091e3 Author: David Meyer <dihm.meyer@gmail.com> Date: Tue Jul 13 18:02:21 2021 -0400 Add docstrings for all of the defined functions. commit c1960c0 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Jul 9 11:45:36 2021 -0400 Clean up warning messages and dead code. commit b9be2c3 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Jul 9 11:23:00 2021 -0400 Version bump RTD theme. commit 7e086b6 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Apr 2 13:23:33 2021 -0400 Version bump sphinx build requirement to 3.2.1 commit 689bc40 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Apr 2 12:19:55 2021 -0400 Adding the connection diagram figure from Phil's thesis. commit f5b8eed Author: David Meyer <dihm.meyer@gmail.com> Date: Tue Nov 17 07:29:34 2020 -0500 Version bump sphinx to 3.1.2 This fixes issue where autodoc does not correctly handle decorated function signatures. commit 6512ac6 Author: David Meyer <dihm@users.noreply.github.com> Date: Mon Nov 16 20:17:19 2020 -0500 Version bump sphinx Fixes issue with using autosummary templates commit 0e91b09 Author: David Meyer <dihm.meyer@gmail.com> Date: Mon Nov 16 18:39:30 2020 -0500 Initial pass at API docs for labscript. This relies on recursive features and custom templates for autosummary, as described in https://stackoverflow.com/questions/2701998/sphinx-autodoc-is-not-automatic-enough/62613202#62613202 commit 736cb32 Author: Zak V <zakven@mit.edu> Date: Thu Feb 25 19:23:29 2021 -0500 Fixed issue 78 in which labscript.py accidentally overwrote labscript_utils.dedent() with matplotlib.cbook.dedent(). commit 054e281 Author: Zak V <zakven@mit.edu> Date: Thu Feb 25 18:40:41 2021 -0500 Added AnalogQuantity.square_wave(). commit 4716ceb Author: Zak V <zakven@mit.edu> Date: Thu Oct 15 05:47:31 2020 -0400 Reduced _vcs_cache_rlock contention in save_labscripts(). commit ea17d77 Author: Phil Starkey <philipstarkey@users.noreply.github.com> Date: Thu Oct 15 18:50:45 2020 +1100 Performance improvements These small changes seem to improve compilation time significantly. Tests (done with saving of hg info disabled) indicate compilation time for a complex experiment is decreased somewhere between 25-33%. commit 903cc43 Author: Zak V <zakven@mit.edu> Date: Wed Oct 14 03:59:16 2020 -0400 Worked on some thread safety issues with vcs caching. commit 0f3f187 Author: Zak V <zakven@mit.edu> Date: Tue Oct 13 02:01:49 2020 -0400 Results from git/hg are now cached for faster shot compilation. commit 6022d97 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:52:10 2020 -0400 Added '--tags' to call to git describe. commit 2025755 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:38:10 2020 -0400 Updated git/hg error message. commit dc8d917 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:35:39 2020 -0400 Added "git describe --always HEAD" to git commands. commit 9a20221 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:30:44 2020 -0400 Removed "--verify" from git rev-parse call. commit 58ed6db Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:28:35 2020 -0400 save_labscripts()'s calls to git/hg for a given file now run in parallel. commit df4e17f Author: Zak V <zakven@mit.edu> Date: Sat Oct 3 14:35:59 2020 -0400 Added support for save_hg_info and save_git_info options in the labconfig [labscript] section. commit e0073ea Author: Zak V <zakven@mit.edu> Date: Sat Oct 3 13:48:15 2020 -0400 Added support for saving git repo info.
Proposed changes:
git branch --show-current
gives the name of the current branch.git describe --tags --always HEAD
gives the name of the most recent tag reachable from the current commit.--tags
flag makes it look for lightweight tags as well instead of just annotated tags.--always
flag makes it so that If no tag is reachable from the current commit, just the shortened hash is returned.git rev-parse HEAD
returns the full hash of the current commit.git diff HEAD module_filename
returns a diff between the file on disk and the version in the current commit.save_hg_info
andsave_git_info
options in the[labscript]
section of the labconfig.save_hg_info
defaults toTrue
andsave_git_info
defaults toFalse
.FileWatcher
is used to track when files are modified, then the callback updates the cache immediately.FileWatcher
during the first call tosave_labscripts()
that uses it, so the first shot compilation is a bit slower than others. The cache is updated asynchronously so future calls tosave_labscripts
are fast even if the file is modified. After the first run, they typically take 20 ms to 40 ms for 6 files on my machine.Resolves #38.