Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a global HDF5 lock (trac: 10464) #2688

Merged
merged 5 commits into from Jul 1, 2014
Merged

Conversation

manics
Copy link
Member

@manics manics commented Jun 25, 2014

Opening against develop instead of dev_5_0 in case it deadlocks the demo

See: https://trac.openmicroscopy.org.uk/ome/ticket/10464
Particularly: https://trac.openmicroscopy.org.uk/ome/ticket/10464#comment:11

Testing: Modify the connection details in the following script
https://gist.github.com/manics/b03acf51254b16c30d3b
and run python tables-crash.py 100 100 from 5+ teminals at the same time. This will create 100 tables/script, and for each table will make 100 calls to addData. With a standard HDF5 install (e.g. brew install hdf5 without any custom options) this will probably fail after less than a minute if you connect to a server without this PR. Then try again with this PR merged and it might work. In theory this could be tested against trout-latest and trout-merge if you set the right ports.

If that works then check for deadlocks.

Note this reverts #2621

@joshmoore
Copy link
Member

Argh, second full run of (first 5 and then 3) terminals without your patch. From dpkg -l:

ii  libhdf5-7:amd64                                       1.8.11-5ubuntu7                                     amd64
Hierarchical Data Format 5 (HDF5) - runtime files - serial version

I'll keep trying.

@joshmoore
Copy link
Member

Another round of 3 shells and one with 5 shells passed. Likely I will have to assume that it's not going to be reproducible on my system. I'm going to add some logging of performance speeds to your script, run before and after your change, and then finish up, since the only thing that concerns me is if there is significant blockage on the single lock. I can certainly see from the code how this change would prevent concurrent access.

@joshmoore
Copy link
Member

Just running against the two different versions doesn't show any marked decrease in performance (both roughly 1.7 ms per call to addData). I could get the error to occur, but both with and without your commit my using a multi-threaded version:

...
def runMT(totalopen, totalnew):
    threads = []
    for n in xrange(totalnew):
        t = threading.Thread(target=lambda: run(totalopen))
        threads.append(t)
    for x in threads:
        x.start()
    for x in threads:
        x.join()

At, 100 threads (./tables-crash.py 1 100), 40+ threads will fail with NoneType has no attribute initialize.

@manics
Copy link
Member Author

manics commented Jun 30, 2014

  1. I can reproduce the multi-threaded error (runMT)
  2. The original test script works against trout-merge, and fails as expected with trout-latest
  3. The unit tests still fail intermittently
    https://travis-ci.org/manics/openmicroscopy/jobs/28783959
    https://s3.amazonaws.com/archive.travis-ci.org/jobs/28783959/log.txt

So I think I've fixed something, just not sure what.

@joshmoore
Copy link
Member

Interesting. Guess we may well have some follow-ups...

I'd suggest then that's merge after tomorrow's builds and see if anything arises. If we're worried about the impact on 5.0, it might make sense to put this behind a configuration property so we can ask users to turn in on (or off) depending on errors they're seeing.

@joshmoore
Copy link
Member

No Tables-related test failures today. Merging.

joshmoore added a commit that referenced this pull request Jul 1, 2014
Use a global HDF5 lock (trac: 10464)
@joshmoore joshmoore merged commit 76b0231 into ome:develop Jul 1, 2014
@manics
Copy link
Member Author

manics commented Jul 2, 2014

Il'l hold off rebasing for a few days in case the travis errors reappear.

@joshmoore
Copy link
Member

👍

@manics
Copy link
Member Author

manics commented Jul 10, 2014

--rebased-to #2783

@sbesson sbesson added this to the 5.1.0-m1 milestone Oct 14, 2014
@manics manics deleted the hdf5_locking-5_1 branch October 30, 2014 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants