
Fix CMake not adding some tests #104

Merged 7 commits into openmm:master on Jul 24, 2023

Conversation

RaulPPelaez
Contributor

Solves #98

@RaulPPelaez
Contributor Author

Now the CI is picking up all tests.
@mikemhenry I suspect that if you try the GPU runner the OpenCL tests will fail; could you try?

@mikemhenry
Collaborator

Will do!

@mikemhenry
Collaborator

@RaulPPelaez good call with that 20 min timeout; it got stuck in a loop: https://github.com/openmm/NNPOps/actions/runs/5085521894/jobs/9139106717

@RaulPPelaez
Contributor Author

RaulPPelaez commented May 26, 2023

It does not seem stuck in a loop to me; it seems like it just takes too long. The waterbox test does take a long time on the CPU. See one of the CPU CI runs: https://github.com/openmm/NNPOps/actions/runs/5078935616/jobs/9124055531?pr=104
It takes 24 minutes (mainly installing CUDA, I guess?); the one that does not install CUDA takes 10 minutes.
Could it be that the CPU in the AWS runner is too slow?
Maybe the GPU runner could filter and run just the GPU tests? EDIT: Not sure how to achieve this.
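One possibility (a rough sketch only, not something this PR implements; the `gpu` marker name is made up) would be to tag the CUDA parametrizations with a custom pytest marker so each runner can select what it needs:

```python
# Hypothetical sketch: register a "gpu" marker in pytest.ini or conftest.py,
# then the GPU runner can run `pytest -m gpu` and the CPU runner
# `pytest -m "not gpu"`. Not part of this PR.
import pytest
import torch

@pytest.mark.gpu
@pytest.mark.parametrize('molFile', ['1hvj', '1hvk'])
def test_model_serialization_cuda(molFile):
    if not torch.cuda.is_available():
        pytest.skip('CUDA is not available')
    device = torch.device('cuda')
    # ... same body as test_model_serialization, on the CUDA device ...
```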

@RaulPPelaez
Contributor Author

Also, no idea where this error comes from and why it's only there in the GPU runner:

8: deviceString = 'cpu', molFile = '1hvj'
8: 
8:     @pytest.mark.parametrize('deviceString', ['cpu', 'cuda'])
8:     @pytest.mark.parametrize('molFile', ['1hvj', '1hvk', '2iuz', '3hkw', '3hky', '3lka', '3o99'])
8:     def test_model_serialization(deviceString, molFile):
8:     
8:         if deviceString == 'cuda' and not torch.cuda.is_available():
8:             pytest.skip('CUDA is not available')
8:     
8:         from NNPOps.EnergyShifter import TorchANIEnergyShifter
8:     
8:         device = torch.device(deviceString)
8:     
8: >       mol = mdtraj.load(os.path.join(molecules, f'{molFile}_ligand.mol2'))
8: 
8: test/TestEnergyShifter.py:80: 
8: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
8: ../3/envs/nnpops/lib/python3.10/site-packages/mdtraj/core/trajectory.py:396: in load
8:     kwargs["top"] = _parse_topology(kwargs.get("top", filename_or_filenames[0]), **topkwargs)
8: ../3/envs/nnpops/lib/python3.10/site-packages/mdtraj/core/trajectory.py:181: in _parse_topology
8:     topology = load_mol2(top, **kwargs).topology
8: ../3/envs/nnpops/lib/python3.10/site-packages/mdtraj/formats/mol2.py:91: in load_mol2
8:     atoms, bonds = mol2_to_dataframes(filename)
8: ../3/envs/nnpops/lib/python3.10/site-packages/mdtraj/formats/mol2.py:200: in mol2_to_dataframes
8:     data = dict((key, list(grp)) for key, grp in itertools.groupby(f, _parse_mol2_sections))
8: ../3/envs/nnpops/lib/python3.10/site-packages/mdtraj/formats/mol2.py:200: in <genexpr>
8:     data = dict((key, list(grp)) for key, grp in itertools.groupby(f, _parse_mol2_sections))
8: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
8: 
8: self = <encodings.ascii.IncrementalDecoder object at 0x7fb6956f8520>
8: input = b'@<TRIPOS>MOLECULE\n    A78\n  115   118     0     0     0\nSMALL\nUSER_CHARGES\n\n\n@<TRIPOS>ATOM\n      1 C1       ....8968   -1.7691   16.1457 hc         1 A78      0.039867\n    104 H47         9.7154   -5.7103   17.7810 hc         1 '
8: final = False
8: 
8:     def decode(self, input, final=False):
8: >       return codecs.ascii_decode(input, self.errors)[0]
8: E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
8: 
8: ../3/envs/nnpops/lib/python3.10/encodings/ascii.py:26: UnicodeDecodeError

@mikemhenry
Collaborator

I've done this before with pytest + decorators. I'll try bumping up the timeout first. I thought it was a loop since I wasn't looking closely and thought it kept running the same pytest tests, but now I see they are separate invocations.
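For reference, a minimal sketch of that decorator approach (illustrative only; `needs_cuda` is a name made up here, not what the repo defines):

```python
import pytest
import torch

# Reusable decorator: skip a test unless a CUDA device is available.
needs_cuda = pytest.mark.skipif(not torch.cuda.is_available(),
                                reason='CUDA is not available')

@needs_cuda
def test_cuda_only():
    device = torch.device('cuda')
    assert torch.zeros(1, device=device).is_cuda
```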

RE UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)

It does seem to be consistent. It shows up only for test_model_serialization[1hvj-cpu] & test_model_serialization[1hvj-cuda] in TestBatchedNN.py and TestEnergyShifter.py. I will see if it shows up when I bump the timeout.

I will also look at the CUDA versions of the tests, since that will save time and money, but I want to get everything passing first.

Thank you for your patience on this!

@mikemhenry
Collaborator

11: FAILED test/TestSymmetryFunctions.py::test_model_serialization[1hvj-cpu] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
11: FAILED test/TestSymmetryFunctions.py::test_model_serialization[1hvj-cuda] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
11: FAILED test/TestSymmetryFunctions.py::test_non_default_stream[1hvj] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
10: FAILED test/TestSpeciesConverter.py::test_model_serialization[1hvj-cpu] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
10: FAILED test/TestSpeciesConverter.py::test_model_serialization[1hvj-cuda] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
8: FAILED test/TestEnergyShifter.py::test_model_serialization[1hvj-cpu] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
8: FAILED test/TestEnergyShifter.py::test_model_serialization[1hvj-cuda] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
5: FAILED test/TestBatchedNN.py::test_model_serialization[1hvj-cpu] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)
5: FAILED test/TestBatchedNN.py::test_model_serialization[1hvj-cuda] - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 217: ordinal not in range(128)

Looks like it is all the same file... maybe there is something weird with it? I'll see if I can reproduce on AWS.

@mikemhenry
Collaborator

There is something weird with the file:

(nnpops) [ec2-user@ip-10-0-142-194 build]$ file ../src/pytorch/molecules/1hvk_ligand.mol2
../src/pytorch/molecules/1hvk_ligand.mol2: ASCII text
(nnpops) [ec2-user@ip-10-0-142-194 build]$ file ../src/pytorch/molecules/1hvj_ligand.mol2
../src/pytorch/molecules/1hvj_ligand.mol2: data

@mikemhenry
Collaborator

Viewing the file, it does have junk in it:
[screenshot: the file contents, showing non-ASCII junk]
Looks weird on github too:
[screenshot: the same file rendered on GitHub]

@RaulPPelaez @raimis
Where did this come from?

@RaulPPelaez
Contributor Author

> Thank you for your patience on this!

What are you saying! Thank you, man!

> Where did this come from?

I have no idea.
Looking around the mol2 spec, I see no reason why non-ASCII characters should be there. Seems like some formatting error to me.
I ran `tr -cd '\11\12\15\40-\176'` on the file (keeping only tab, newline, carriage return, and printable ASCII); let's try now.
What really surprises me is that, thus far, that test just happily ate the non-ASCII characters.
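For the record, here is a quick way to double-check the cleanup (a sketch; the path assumes the repo layout from the `file` commands above). It reports any byte the `tr` filter would have dropped:

```python
# Sketch: flag every byte outside tab, LF, CR, and printable ASCII,
# i.e. anything `tr -cd '\11\12\15\40-\176'` would delete.
allowed = {0x09, 0x0A, 0x0D} | set(range(0x20, 0x7F))
with open('src/pytorch/molecules/1hvj_ligand.mol2', 'rb') as f:
    for offset, byte in enumerate(f.read()):
        if byte not in allowed:
            print(f'offending byte 0x{byte:02x} at offset {offset}')
```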

@RaulPPelaez
Contributor Author

There are other strange characters in that file:

12 N22 3.1777 2.0550 10.9450 z| 1 A78 -0.557900

Seems to me that the | should not be there. Maybe @raimis can tell us more...

@sef43
Member

sef43 commented May 29, 2023

> What really surprises me is that, thus far, that test just happily ate the non-ASCII characters.

Ah, I have seen this error before, but it went away with a different MDTraj version, so I didn't investigate further.

@mikemhenry
Collaborator

I haven't really looked into how the tests are using this file, so perhaps the z| atom type doesn't matter. Since this is an Amazon Linux image, it is possible the default locale (which controls the encoding) is something like C instead of UTF-8, which is why it barfs here and not on a modern Linux desktop OS. I can probably hack around it (well, setting a UTF-8 locale isn't a hack), but I think it would be better for the file to get fixed.
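To illustrate the locale point (a sketch, not a proposed fix): `open()` without an explicit encoding falls back to the locale's preferred encoding, which on the failing runner was evidently ASCII, matching the decoder in the traceback above.

```python
import locale

# Under a C/POSIX locale this can report an ASCII codec such as
# 'ANSI_X3.4-1968'; on a typical desktop it reports 'UTF-8'.
print(locale.getpreferredencoding(False))

# An explicit encoding sidesteps the locale entirely (illustrative only;
# mdtraj opens the file itself, so this is not where the fix would go):
with open('1hvj_ligand.mol2', encoding='utf-8', errors='replace') as f:
    text = f.read()
```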

@mikemhenry
Collaborator

Testing here: https://github.com/openmm/NNPOps/actions/runs/5469785425

@RaulPPelaez
Contributor Author

Seems like the issue with the non-ASCII files was fixed. All tests pass in your GPU runner!

@mikemhenry
Collaborator

So this should be good to merge in. Any objections?

@mikemhenry
Collaborator

Requested some reviewers

@RaulPPelaez
Contributor Author

One of the tests fails to install CUDA with "no space left on device". It happens sometimes, and I do not think we can do anything about it. I will trigger a rerun.

@RaulPPelaez
Contributor Author

Success! Please review again @peastman

@peastman
Member

peastman commented Jul 7, 2023

Looks good as far as I can tell.

@mikemhenry
Collaborator

Sweet, I will re-run the GPU tests to make sure we are good and if they pass I will get this merged in!

@RaulPPelaez
Contributor Author

GPU tests passed too. Let's merge!
cc @raimis @peastman

@raimis merged commit 5e2438d into openmm:master on Jul 24, 2023
4 checks passed