H5PL: Plugin Interface #1166
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1166 +/- ##
=======================================
Coverage 83.73% 83.73%
=======================================
Files 18 18
Lines 2146 2146
=======================================
Hits 1797 1797
Misses 349 349
Continue to review full report at Codecov.
class TestSearchPaths(TestCase):

    def test_default(self):
        self.assertEqual(h5pl.size(), 1)
Does this depend on the HDF5_PLUGIN_PATH environment variable?
No, quite the contrary: it assumes the default value (see next line).
Running in a separate process, without side effects and with a predictable HDF5_PLUGIN_PATH, sounds good. What's the most elegant way to do so?
I don't know if pytest has some neat magic to make this convenient, but I would do something like this:
import sys
from subprocess import run, PIPE, STDOUT

res = run([sys.executable, '-m', 'pytest', '...'], stdout=PIPE, stderr=STDOUT)
print(res.stdout)
assert res.returncode == 0
The ... there would contain whatever arguments are needed to make pytest run the tests in a module it would otherwise skip.
Are you sure the current CI uses pytest and not unittest2 at all?
PR #1003 was merged, so as far as I know, it uses pytest.
A lot of tests are still written with the unittest API; pytest can find and run these with no problem. I'm not personally enthusiastic about 'converting' existing tests to a more idiomatic pytest style - it sounds like a lot of change for very little benefit - but I'd say it's OK to write new tests assuming pytest.
What do you think about the approach of https://github.com/pytest-dev/pytest-forked?
Not enthusiastic! It looks like it's implemented as a command-line option which applies to all tests, whereas we want to apply this to only a specific set of tests that manipulate global state. Also, fork-without-exec is easy to mess up.
I can imagine this being a useful general utility, though - run some tests in a subprocess and integrate their results back into the main test results. Pytest itself has some machinery for testing pytest plugins which might be a useful starting point for something like this: https://docs.pytest.org/en/latest/writing_plugins.html#testing-plugins
h5py/tests/old/test_h5pl.py
Outdated
from ..common import TestCase

@ut.skip('The tests have side effects')
If we go for this, I think we should come up with a way to run the tests by default; maybe run them in a separate process? Tests that don't run by default aren't very helpful.
Tests are also documentation ;-)
But I agree; see my other comment.
# === C API ===================================================================

IF HDF5_VERSION >= (1, 10, 1):
    cpdef append(const char* search_path):
I'm not very familiar with the low-level API, but should these have the with_phil decorator for locking? Check with the other maintainers before adding this; it might not be necessary.
The phil lock is mostly about making sure that we don't create / reuse / destroy C++ objects in an unsafe way (this is to support the correct ref-counting on holding C++ File objects open even if the user drops all (direct) Python references to the File and only keeps Datasets). See the code in _objects.pyx, which is rather racy (including an issue where we discovered we have to turn off gc temporarily due to list(of_a_dict) having "mutation during iteration" issues).

That is a very long way of saying I don't think we need to lock with the same lock as everything else (if the user has race conditions between adding plugins to the search path and opening files, I think that is on them).

On the other hand, these should be used rarely enough that locking could not hurt. We may also want to lock them against each other (I can imagine an append and a remove colliding and ending up with a "gap" in the HDF5 data structures...).
Uhh, it took longer to make it run on all Pythons, but now at least Travis passes. I have no idea what's going on with AppVeyor, though: https://ci.appveyor.com/project/h5py/h5py/builds/22108613/job/1aqbnhig7tw4hxr4#L237

@takluyver Could you please take another look and judge whether my sandboxing implementation is in principle tolerable?
h5py/tests/old/test_h5pl.py
Outdated
h5pl.append(b'/opt/hdf5/vendor-plugin')
self.assertEqual(h5pl.size(), 2)
self.assertTrue(h5pl.get(0).endswith(b'hdf5/lib/plugin\x00'))
self.assertEqual(h5pl.get(1), b'/opt/hdf5/vendor-plugin\x00')
I would generally expect a Python API to return strings without the trailing null. Does h5py's low level API include the trailing null elsewhere?
I'm not sure, and frankly I was quite surprised HDF5 returns nulls here. The docs do not mention it (unless I'm missing something). Being in doubt, I went with the "h5py is a thin, pythonic wrapper" motto.
I'd guess it doesn't mention it because it's normal for C to store strings in n+1 bytes including the terminating null byte. Maybe Cython has some convenient functionality to trim off the null byte?
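For illustration, a plain-Python sketch of trimming that terminator (this is not the Cython code itself; raw is a made-up example value):

```python
# What the caller-supplied buffer ends up holding: the path plus the
# terminating NUL byte that the C library writes.
raw = b'/opt/hdf5/vendor-plugin\x00'

# Either slice off the known terminator...
trimmed = raw[:-1] if raw.endswith(b'\x00') else raw

# ...or strip trailing NULs, which also copes with an over-allocated buffer.
assert trimmed == raw.rstrip(b'\x00') == b'/opt/hdf5/vendor-plugin'
```

In the Cython wrapper itself, constructing the bytes object with a length that excludes the terminator (e.g. via PyBytes_FromStringAndSize with the unpadded length) achieves the same result without any post-hoc stripping.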
For running the tests in a subprocess, I was thinking of something like using a pytest marker along with a pytest configuration file to exclude these tests by default, then running pytest in a subprocess, overriding the marker selection to run only those tests. I haven't 100% figured this out, but I'm pretty sure it should be possible, and it saves creating temporary files for the test code.
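One way the marker idea might look, as a hypothetical conftest.py sketch (the plugin_path marker name and --run-plugin-path flag are made up, not an agreed design):

```python
import pytest

# Skip tests marked 'plugin_path' unless an opt-in flag is passed; a wrapper
# can then re-run just those tests in a subprocess with:
#   python -m pytest -m plugin_path --run-plugin-path

def pytest_addoption(parser):
    parser.addoption('--run-plugin-path', action='store_true', default=False,
                     help='run tests that mutate the HDF5 plugin search path')

def pytest_collection_modifyitems(config, items):
    if config.getoption('--run-plugin-path'):
        return  # opted in: leave the marked tests selectable
    skip = pytest.mark.skip(reason='mutates global HDF5 state; '
                                   'run with --run-plugin-path')
    for item in items:
        if 'plugin_path' in item.keywords:
            item.add_marker(skip)
```

This uses only standard pytest hooks (pytest_addoption, pytest_collection_modifyitems); the main test run stays clean, and the subprocess invocation opts back in.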
The problem with AppVeyor appears to be affecting my build on #1132 as well, so it looks like it's not your PR causing it. Someone gets to have fun debugging Windows testing issues! 😆 😞
Well, each of the H5PL tests needs to be run in a subprocess. If we are less pedantic, they could be run independently of the first batch but without isolation; the failures would be a bit harder to localize, but pragmatically that might still be better than per-test sandboxing hackery. However, I see a great surprise effect for users when they discover multiple pytest invocations are needed :-/ These tests are making me sad; maybe remove them altogether?
Is there a way to ssh/Remote Desktop there?
I'd say it's sufficient to run the whole group of them in a subprocess, rather than each one individually. I don't think it's too big a problem if they affect each other, but I'd rather they didn't cause cascading failures through the test suite. I feel like we're missing something, though. This can't be the only project that wants to isolate a group of tests that work by changing global state.
Looks like there is, if you have access to the AppVeyor control panel: https://www.appveyor.com/docs/how-to/rdp-to-build-worker/
https://github.com/pytest-dev/pytest-forked is probably the closest thing I've found from googling so far.
    H5PLget(index, buf, n + 1)
    return PyBytes_FromStringAndSize(buf, n)
finally:
    efree(buf)
FWIW, here's another place in the codebase doing something similar:
Lines 849 to 856 in c585944
size = H5Pget_virtual_dsetname(self.id, index, NULL, 0)
name = <char*>emalloc(size+1)
try:
    # TODO check return size
    H5Pget_virtual_dsetname(self.id, index, name, <size_t>size+1)
    src_dset_name = bytes(name).decode('utf-8')
finally:
    efree(name)
I suspect that the hardcoded decode-as-utf8 there is not quite right, but it looks like one can call bytes(buf) to convert the buffer to Python bytes.
Yeah, pytest-forked is similar, but that's for when you want to isolate every test in your test suite, because any of them might segfault. And it doesn't work on Windows. I think there must be other projects which want to isolate just a few particular tests that are doing strange things.
Well, I'm attending https://conf.python.ru/2019, so I might be pointed to something worthy there ;-)
It's not easy to implement, but I don't think it's all that hard either; it needs a bit of thinking about how to get the right data to the right place. Maybe other projects just have a similar bit of ugly infrastructure built into their test suites. I don't know where to look for it, though. Perhaps it's more niche than I thought.
Hit by morning inspiration, I went for something completely different. Why bother with subprocess isolation if HDF5 can be re-initialized afresh with an H5close+H5open combo?

class TestSearchPaths(ut.TestCase):

    def setUp(self):
        h5.close()
        # as per https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf,
        # in case your HDF5 setup has it different
        # (e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=826522)
        os.environ['HDF5_PLUGIN_PATH'] = os.path.expandvars('%ALLUSERSPROFILE%/hdf5/lib/plugin') \
            if platform.system() == 'Windows' else '/usr/local/hdf5/lib/plugin'
        h5.open()

    def tearDown(self):
        h5.close()
        h5.open()

Exposing

Lines 211 to 247 in c585944
Lines 1402 to 1408 in c585944

Even importlib.reload(h5py.h5t) is not enough, due to cython/cython#2659. The following code allowed me to pass all tests:

    def tearDown(self):
        h5.close()
        h5.open()
        h5py.h5t.init()  # new function
        importlib.reload(h5py._hl.base)
        h5py._hl.base.dlapl = h5py._hl.base.default_lapl()
        h5py._hl.base.dlcpl = h5py._hl.base.default_lcpl()
        importlib.reload(h5py)

but I'm now not sure if it's as appealing as perceived initially :-) Do we want to make targeted changes to allow re-initialization of
That doesn't sound like a promising approach, unfortunately. I don't think tests should rely on a global 'reset button' that isn't used explicitly in normal code. It's easy for oversights to mean the reset button doesn't reset everything, and there may be corner cases when it messes up debugging. The guaranteed way to get a clean, isolated process environment is to start a new process.

Let's wait and see what the other maintainers (@tacaswell @aragilar) think of this. Maybe if it's only adding low-level API functions, it's OK for the tests not to run by default. Or perhaps they have some other idea about how to handle it.
I have been able to set the path after importing h5py, and I have been able to read the data. Thanks a lot!

Is it intentional that I have to specify the path as a byte string? It is a bit confusing that with HDF5_PLUGIN_PATH I can use a string, while with the function I have to use bytes. It took me some time to figure out why append was not working: I had to get rid of the non-existing default directory, or use insert(path, 0), which also worked. Tested on Windows, Python 3.6, HDF5 1.10.5 as supplied by the HDF Group.

import os
import h5py

# clean up directories that might not exist;
# otherwise append does not work and I have to use insert(path, 0),
# which does work
delete = []
for i in range(h5py.h5pl.size()):
    dirname = h5py.h5pl.get(i)
    if not os.path.exists(dirname):
        print("directory <%s> to be deleted" % dirname)
        delete.append(i)
delete.reverse()
for i in delete:
    h5py.h5pl.remove(i)

fname = "lz4.h5"
assert os.path.exists(fname)

for attempt in range(2):
    # first try to read without the filter
    if attempt:
        h5py.h5pl.append(b"C:\\GIT\\hdf5plugin\\hdf5plugin\\VS2015\\x64")
        #h5py.h5pl.insert(b"C:\\GIT\\hdf5plugin\\hdf5plugin\\VS2015\\x64", 0)
    for i in range(h5py.h5pl.size()):
        print("i = ", i, "path = ", h5py.h5pl.get(i))
    try:
        h5 = h5py.File(fname, "r")
        data = h5["/entry/data"][:]
        h5.close()
        expected_shape = (50, 2167, 2070)
        assert data.shape == expected_shape
        assert data[21, 1911, 1549] == 3141
        if attempt:
            print("SUCCESS!")
        else:
            print("Read without need of filter. Bad test")
    except Exception:
        if attempt == 0:
            print("Error expected. Great")
        else:
            print("Error NOT expected :(")
@vasole Thanks for the review!
Ping @tacaswell @aragilar - I wanted your ideas about testing this before we merge it. To summarise the discussion: it's tricky to test nicely, because the desired effect of these functions is global changes in the HDF5 library. Unless you can be sure you're undoing global changes properly, they can cause other tests to behave strangely, which is horrendous to debug. So I'd like to run these tests in their own subprocess so they don't interfere. But we haven't found a good way to do that. So, do we:
A variant of the first option might be a good middle ground: skip the plugin tests in the main test run, and have a separate test run for just the plugin tests (using something like https://docs.pytest.org/en/latest/example/simple.html#control-skipping-of-tests-according-to-command-line-option and an additional test env in tox). This won't stop the plugin tests interfering with each other, but at least it will localise the problem to the plugin system.
It is a little hard to believe that we are still using unittest here |
Also, I have added an
I've rebased and continued this in #1256 |
This was merged as #1256. |
Implementation of #928.
Please review. A MutableSequence high-level interface could in principle be implemented, but do we really need it for such niche functionality?