Save git info #72

zakv · 2020-10-04T17:36:00Z

Proposed changes:

Support saving git repo information to a shot file, similar to how hg info is saved. The output from following commands are saved:
- git branch --show-current gives the name of the current branch.
- git describe --tags --always HEAD gives the name of the most recent tag reachable from the current commit.
  - If the current commit isn't tagged, a suffix showing the number of commits since the most recent tag and a shortened hash for the current commit are shown.
  - The --tags flag makes it look for lightweight tags as well instead of just annotated tags.
  - The --always flag makes it so that If no tag is reachable from the current commit, just the shortened hash is returned.
- git rev-parse HEAD returns the full hash of the current commit.
- git diff HEAD module_filename returns a diff between the file on disk and the version in the current commit.
Allow turning on/off saving of git/hg info with setting in labconfig.
- These are controlled by the save_hg_info and save_git_info options in the [labscript] section of the labconfig.
- To avoid changing the existing behavior, save_hg_info defaults to True and save_git_info defaults to False.
Run the calls to git/hg for a given file in parallel. This is straightforward since they run in separate processes anyway.
- On my machine, these are the timings to run the git/hg commands on five files during compilation:
  - hg: 1.788 seconds without parallelization and 0.695 seconds with parallelization, a 2.57x speedup.
  - git: 0.707 seconds without parallelization and 0.283 seconds with parallelization, a 2.50x speedup.
Cache results from calls to git/hg to speed up compilation as suggested in Saving of hg info slows down compilation #38
- A FileWatcher is used to track when files are modified, then the callback updates the cache immediately.
- Files are added to the FileWatcher during the first call to save_labscripts() that uses it, so the first shot compilation is a bit slower than others. The cache is updated asynchronously so future calls to save_labscripts are fast even if the file is modified. After the first run, they typically take 20 ms to 40 ms for 6 files on my machine.

Resolves #38.

…nfig [labscript] section.

…lel.

zakv · 2020-10-04T18:17:24Z

This partially resolves #38 since running some of the calls to git/hg in parallel reduces the time required to save vcs info. It's certainly not as fast as using ModuleWatcher/FileWatcher though, as suggested in that issue.

Update: commit 0f3f187 implements caching, which resolves #38

philipstarkey · 2020-10-14T05:03:14Z

I think there are a couple of thread safety issues here:

FileWatcher.files is not thread safe. Internally access to the files attribute is always serialised by a threading.Lock(). Technically you could use this lock (it's in the .lock attribute) to access it safely from the labscript code but really the lock, files and folders attributes of the filewatcher should be private. FileWatcher needs a cleanup in this regard (I've logged this as FileWatcher has thread unsafe attributes exposed labscript-utils#68). Rather than checking to see if filepath is in the filewatchers list of files, I would check if it is in _vcs_cache. However, you will also need a lock for this too!
Because _vcs_cache is being accessed by multiple threads (filewatcher runs the check in another trhead), you need a threading.Lock object to serialise access. Actually, you probably want a threading.RLock() so you can put it inside _file_watcher_callback as well as using it for the change in 1 (without having to worrying about releasing the lock before calling _file_watcher_callback) and the extraction of results on line #2293.

I was also kind of imaging the cache would be in runmanager, but I think this is better (more self contained). It makes it easy to clear the cache (just restart the runmanager subprocess!). It does mean there would be duplication of git calls if/when we parallelise the runmanager compilation, but that would have been hard to avoid anyway!

zakv · 2020-10-14T07:34:10Z

I think there are a couple of thread safety issues here:

Yeah I haven't worked very directly with threading in python before so I'm not surprised. Thanks for taking the time to point them out!

Because _vcs_cache is being accessed by multiple threads (filewatcher runs the check in another trhead), you need a threading.Lock object to serialise access.

I had thought about that a bit. Doing for command, info, err in _vcs_cache[path] certainly wouldn't be thread save. I ended up convincing myself that results = _vcs_cache[path]; for command, info, err in results would be fine though because _vcs_cache[path] always points to a list no matter where the other thread is in executing _file_watcher_callback(), even partway through its assignment to _vcs_cache[name]. Then results would stay pointed at that list even if the other thread modifies _vcs_cache. I'm quite likely wrong though and I'll need to add a lock to address your point (1) anyway, so I'll get started on that!

philipstarkey · 2020-10-14T07:53:06Z

I had thought about that a bit. Doing for command, info, err in _vcs_cache[path] certainly wouldn't be thread save. I ended up convincing myself that results = _vcs_cache[path]; for command, info, err in results would be fine though because _vcs_cache[path] always points to a list no matter where the other thread is in executing _file_watcher_callback(), even partway through its assignment to _vcs_cache[name]. Then results would stay pointed at that list even if the other thread modifies _vcs_cache. I'm quite likely wrong though and I'll need to add a lock to address your point (1) anyway, so I'll get started on that!

So I did some reading and you are technically correct in that reading values for existing keys is thread safe in current CPython implementations even if another thread overwrites the value. But it is not guaranteed to stay thread safe in future Python versions and would not be thread safe should our code change in the future (say if a future change meant it was possible that one thread might insert while another was reading). Basically I find it's safer to always use a lock on most datatypes, and the overhead is pretty minimal!

zakv · 2020-10-14T08:02:17Z

So I did some reading and you are technically correct in that reading values for existing keys is thread safe in current CPython implementations even if another thread overwrites the value. But it is not guaranteed to stay thread safe in future Python versions and would not be thread safe should our code change in the future (say if a future change meant it was possible that one thread might insert while another was reading). Basically I find it's safer to always use a lock on most datatypes, and the overhead is pretty minimal!

Yeah it sounds like even if it works now, it's very prone to breakage whenever something else changes in labscript or elsewhere. Locks are definitely the way to go!

zakv · 2020-10-14T08:04:37Z

I believe I've got the lock implemented correctly now, though feel free to tell me otherwise

philipstarkey

So I've made some further comments. They're minor improvements and only come into play if filewatcher is trying to update the vcs data while a compilation is going on (which probably won't happen very often!). But I figured going into details with these concepts might be useful for you when working on other parts of the labscript suite (where lock contention is more common)

philipstarkey · 2020-10-15T08:16:27Z

labscript/labscript.py

+_vcs_cache_rlock = threading.RLock()
+def _file_watcher_callback(name, info, event):
+    with _vcs_cache_rlock:
+        _vcs_cache[name] = _run_vcs_commands(name)


So the lock is now being held for however long it takes to query the subprocess information. You can reduce lock contention by doing something like temp = _run_vcs_commands(name) outside of the lock and then only holding the lock when you place that into the _vcs_cache dictionary.

Believe it or not I actually originally set things up as you're suggesting in both of these comments with Lock in place of RLock, but then changed it. For the lock here in _file_watcher_callback(), if the _file_watcher thread called _file_watcher_callback() then at least some of the cache data is out of date, so it is actually better for it to block the save_labscripts() thread from writing cached results to drive until the cache is updated.

philipstarkey · 2020-10-15T08:21:40Z

labscript/labscript.py

-                            if info or err:
-                                hdf5_file[save_path].attrs['hg ' + str(command[0])] = info.decode('utf-8') + '\n' + err.decode('utf-8')
+                    with _vcs_cache_rlock:
+                        if path not in _vcs_cache:


Similarly here, you could do:

with _vcs_cache_rlock: condition = path not in _vcs_cache if condition: #add to filewatch, and call callback with _vcs_cache_rlock: for command, info, err in _vcs_cache[path]:

to reduce lock contention. This actually might mean I was wrong about needing the RLock over a plain Lock (but whatever...RLocks are safer in the long run).

When I did it this way I thought I was just making things more complicated than necessary. As soon as the lock is released after the first with statement, it is almost immediately re-acquired either by the explicit callback call or the following with statement.

But now that you brought it up again, I think you're right that this is better. Freeing up the lock even briefly gives the _file_watcher thread an opportunity to run _file_watcher_callback() between those statements so it could potentially update the cache last minute right before last with statement there. I'll make this change.

I also considered reducing lock contention even further by locking just to do file_cache_entry = _vcs_cache[path].copy() then freeing the lock during for loop which is slowed by writing to disk. I think that would work fine as long as nothing in the list returned by _vcs_cache[path] is mutable and edited by both threads. Currently the list only has tuples of strings so that should be ok unless something changes. Alternatively copy.deepcopy() could be used to play it safe. Thoughts?

Commit 4716ceb implements your suggestion, but doesn't move the for loop out of the locked section as mentioned above.

zakv · 2021-08-18T09:01:32Z

@philipstarkey I got a request for the git features in this PR recently, so I'm thinking of merging it. Would you like to take another look at it beforehand or should I just move forward with it?

I also plan on testing then (presumably) merging #73 too while I'm at it.

@dihm Do you have an interest in taking a look at this or #73? You've been very active in the development lately so I thought I should give you the opportunity to add any input if you're interested

philipstarkey · 2021-08-18T10:13:21Z

Happy for you to go ahead as you see fit!

zakv · 2021-08-20T17:47:33Z

I believe all of the comments could be considered resolved now and I've been using the code in this PR for the better part of a year without issue. I'll probably merge this PR sometime in the next few days.

dihm · 2021-08-20T19:46:46Z

I tested this on top of #73. The speed improvement is pretty significant if you aren't using any versioning! I'm much less experienced with all of the locks but it appears to be OK, especially if Phil says so.

It would be good to update the default profile with the new keys over in labscript_utils once this gets merged.

zakv · 2021-08-20T20:47:16Z

Dope, glad it works for your setup as well!

Actually the new keys already got added to the default profile in labscript_utils (labscript-suite/labscript-utils#66). That PR got merged when it started looking like this PR would likely get merged at some point

dihm · 2021-08-20T22:01:47Z

So it is! That's what I get for assuming my installation was up to date.

Good work getting this sorted!

commit f6409e0 Author: David Meyer <dihm@users.noreply.github.com> Date: Tue Dec 7 10:11:56 2021 -0500 Update setup.cfg to show python 3.9 support. commit 4d26344 Merge: 62455b5 deca5d8 Author: David Meyer <dihm@users.noreply.github.com> Date: Wed Nov 10 18:42:45 2021 -0500 Merge pull request #77 from zakv/square-wave Added AnalogQuantity.square_wave(). commit deca5d8 Author: Zak V <zakven@mit.edu> Date: Wed Nov 10 18:00:24 2021 -0500 Changed the parameterization of AnalogQuantity.square_wave() inputs, then added square_wave_levels() method which uses the old parameterization. This PR also inverts the previous meaning of duty cycle in the old parameterization. Now duty cycle is the fraction of the time spent outputing `level_0` rather than `level_1` in square_wave_levels(). commit 62455b5 Merge: 13239cc bda942a Author: David Meyer <dihm@users.noreply.github.com> Date: Wed Nov 10 08:53:04 2021 -0500 Merge pull request #84 from chrisjbillington/full-unitconversion-import-path Save the full import path of unit conversion classes commit bda942a Author: chrisjbillington <chrisjbillington@gmail.com> Date: Tue Nov 9 16:50:14 2021 +1100 Save the full import path of unit conversion classes This allows unit conversion classes located outside of `labscript_utils` to be used. Closes #71 commit 13239cc Merge: 83e10e5 4716ceb Author: zakv <4721629+zakv@users.noreply.github.com> Date: Fri Aug 20 16:48:15 2021 -0400 Merge pull request #72 from zakv/save-git-info Save git info commit 83e10e5 Merge: 9fbd24c ea17d77 Author: zakv <4721629+zakv@users.noreply.github.com> Date: Fri Aug 20 16:47:49 2021 -0400 Merge pull request #73 from philipstarkey/feature/performance-improvements Performance improvements commit 9fbd24c Merge: 04db899 736cb32 Author: zakv <4721629+zakv@users.noreply.github.com> Date: Fri Aug 20 08:33:57 2021 -0400 Merge pull request #79 from zakv/fix-78 Fix #78: labscript.py accidentally overwrites dedent() commit 04db899 Merge: a805b69 9915e50 Author: David Meyer <dihm@users.noreply.github.com> Date: Fri Jul 16 14:54:58 2021 -0400 Merge pull request #74 from dihm/labscript-docs Initial pass at API docs for labscript commit 9915e50 Author: David Meyer <dihm.meyer@gmail.com> Date: Thu Jul 15 10:17:58 2021 -0400 Even more docstring coverage in the API. commit a4057e5 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 10:38:13 2021 -0400 Adding more docstrings to the API. commit 846377d Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 10:34:02 2021 -0400 Remove colorama stuff. Doesn't work on RTD anyway. Now everything will be more consistent. commit 073ceb4 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 10:08:19 2021 -0400 Add missing dependency that sphinx should bring in automatically but doesn't on RTD. commit 08071b1 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 09:51:36 2021 -0400 Update sphinx pin to newest stable version. commit 4b6f805 Author: David Meyer <dihm.meyer@gmail.com> Date: Wed Jul 14 09:41:26 2021 -0400 Quick hack to get a coverage percentage of the API documentation. We can't easily use the coverage extension because it counts things documented via the `undoc-members` flag as documented! Actual items that need documenting can also be printed if desired by setting the `undoc_print_objects` flag in conf.py. This is automatically done for RTD builds. commit 09091e3 Author: David Meyer <dihm.meyer@gmail.com> Date: Tue Jul 13 18:02:21 2021 -0400 Add docstrings for all of the defined functions. commit c1960c0 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Jul 9 11:45:36 2021 -0400 Clean up warning messages and dead code. commit b9be2c3 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Jul 9 11:23:00 2021 -0400 Version bump RTD theme. commit 7e086b6 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Apr 2 13:23:33 2021 -0400 Version bump sphinx build requirement to 3.2.1 commit 689bc40 Author: David Meyer <dihm.meyer@gmail.com> Date: Fri Apr 2 12:19:55 2021 -0400 Adding the connection diagram figure from Phil's thesis. commit f5b8eed Author: David Meyer <dihm.meyer@gmail.com> Date: Tue Nov 17 07:29:34 2020 -0500 Version bump sphinx to 3.1.2 This fixes issue where autodoc does not correctly handle decorated function signatures. commit 6512ac6 Author: David Meyer <dihm@users.noreply.github.com> Date: Mon Nov 16 20:17:19 2020 -0500 Version bump sphinx Fixes issue with using autosummary templates commit 0e91b09 Author: David Meyer <dihm.meyer@gmail.com> Date: Mon Nov 16 18:39:30 2020 -0500 Initial pass at API docs for labscript. This relies on recursive features and custom templates for autosummary, as described in https://stackoverflow.com/questions/2701998/sphinx-autodoc-is-not-automatic-enough/62613202#62613202 commit 736cb32 Author: Zak V <zakven@mit.edu> Date: Thu Feb 25 19:23:29 2021 -0500 Fixed issue 78 in which labscript.py accidentally overwrote labscript_utils.dedent() with matplotlib.cbook.dedent(). commit 054e281 Author: Zak V <zakven@mit.edu> Date: Thu Feb 25 18:40:41 2021 -0500 Added AnalogQuantity.square_wave(). commit 4716ceb Author: Zak V <zakven@mit.edu> Date: Thu Oct 15 05:47:31 2020 -0400 Reduced _vcs_cache_rlock contention in save_labscripts(). commit ea17d77 Author: Phil Starkey <philipstarkey@users.noreply.github.com> Date: Thu Oct 15 18:50:45 2020 +1100 Performance improvements These small changes seem to improve compilation time significantly. Tests (done with saving of hg info disabled) indicate compilation time for a complex experiment is decreased somewhere between 25-33%. commit 903cc43 Author: Zak V <zakven@mit.edu> Date: Wed Oct 14 03:59:16 2020 -0400 Worked on some thread safety issues with vcs caching. commit 0f3f187 Author: Zak V <zakven@mit.edu> Date: Tue Oct 13 02:01:49 2020 -0400 Results from git/hg are now cached for faster shot compilation. commit 6022d97 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:52:10 2020 -0400 Added '--tags' to call to git describe. commit 2025755 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:38:10 2020 -0400 Updated git/hg error message. commit dc8d917 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:35:39 2020 -0400 Added "git describe --always HEAD" to git commands. commit 9a20221 Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:30:44 2020 -0400 Removed "--verify" from git rev-parse call. commit 58ed6db Author: Zak V <zakven@mit.edu> Date: Sun Oct 4 12:28:35 2020 -0400 save_labscripts()'s calls to git/hg for a given file now run in parallel. commit df4e17f Author: Zak V <zakven@mit.edu> Date: Sat Oct 3 14:35:59 2020 -0400 Added support for save_hg_info and save_git_info options in the labconfig [labscript] section. commit e0073ea Author: Zak V <zakven@mit.edu> Date: Sat Oct 3 13:48:15 2020 -0400 Added support for saving git repo info.

zakv added 7 commits October 3, 2020 13:48

Added support for saving git repo info.

e0073ea

Added support for save_hg_info and save_git_info options in the labco…

df4e17f

…nfig [labscript] section.

save_labscripts()'s calls to git/hg for a given file now run in paral…

58ed6db

…lel.

Removed "--verify" from git rev-parse call.

9a20221

Added "git describe --always HEAD" to git commands.

dc8d917

Updated git/hg error message.

2025755

Added '--tags' to call to git describe.

6022d97

zakv mentioned this pull request Oct 4, 2020

Added [labscript] section to example.ini. labscript-suite/labscript-utils#66

Merged

Results from git/hg are now cached for faster shot compilation.

0f3f187

Worked on some thread safety issues with vcs caching.

903cc43

philipstarkey reviewed Oct 15, 2020

View reviewed changes

philipstarkey mentioned this pull request Oct 15, 2020

Performance improvements #73

Merged

Reduced _vcs_cache_rlock contention in save_labscripts().

4716ceb

philipstarkey self-assigned this Oct 31, 2020

zakv merged commit 13239cc into labscript-suite:master Aug 20, 2021

zakv deleted the save-git-info branch August 20, 2021 20:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Save git info #72

Save git info #72

zakv commented Oct 4, 2020 •

edited

zakv commented Oct 4, 2020 •

edited

philipstarkey commented Oct 14, 2020

zakv commented Oct 14, 2020

philipstarkey commented Oct 14, 2020

zakv commented Oct 14, 2020

zakv commented Oct 14, 2020

philipstarkey left a comment

philipstarkey Oct 15, 2020

zakv Oct 15, 2020

philipstarkey Oct 15, 2020

zakv Oct 15, 2020 •

edited

zakv Oct 15, 2020

zakv commented Aug 18, 2021

philipstarkey commented Aug 18, 2021

zakv commented Aug 20, 2021

dihm commented Aug 20, 2021 •

edited

zakv commented Aug 20, 2021

dihm commented Aug 20, 2021

Save git info #72

Save git info #72

Conversation

zakv commented Oct 4, 2020 • edited

zakv commented Oct 4, 2020 • edited

philipstarkey commented Oct 14, 2020

zakv commented Oct 14, 2020

philipstarkey commented Oct 14, 2020

zakv commented Oct 14, 2020

zakv commented Oct 14, 2020

philipstarkey left a comment

Choose a reason for hiding this comment

philipstarkey Oct 15, 2020

Choose a reason for hiding this comment

zakv Oct 15, 2020

Choose a reason for hiding this comment

philipstarkey Oct 15, 2020

Choose a reason for hiding this comment

zakv Oct 15, 2020 • edited

Choose a reason for hiding this comment

zakv Oct 15, 2020

Choose a reason for hiding this comment

zakv commented Aug 18, 2021

philipstarkey commented Aug 18, 2021

zakv commented Aug 20, 2021

dihm commented Aug 20, 2021 • edited

zakv commented Aug 20, 2021

dihm commented Aug 20, 2021

zakv commented Oct 4, 2020 •

edited

zakv commented Oct 4, 2020 •

edited

zakv Oct 15, 2020 •

edited

dihm commented Aug 20, 2021 •

edited