Make Atoms() faster via caches #224

pmrv · 2021-06-02T05:43:33Z

Fixes #48.

Looking at benchmarks, I found that almost all of the time spent creating a new Atoms object goes to i) reading the periodic table from a CSV file and ii) loading mendeleev objects from element strings.

I think it's reasonable to assume that, during the lifetime of a pyiron process, the periodic table on disk does not change and the mendeleev database is not updated. I therefore cache both at the module level (i.e. all instances of Atoms and PeriodicTable share one cache). In my benchmark that gives a factor of 100 speedup.

Caching convert_element has some technical nits with it:

- I want one cache for all Atoms objects, since
  - each structure might be small, so a per-instance cache wouldn't help much (in fact Atoms already has a self-rolled cache like that; I think we can remove it now, but I want to double check with you)
  - we can expect the results from mendeleev to be the same for all instances anyway
- The expensive part is actually only the line referenced below, so an alternative would be to make Atom some kind of singleton.
- At the end of the method it also updates Atoms.species, which would not happen in the cached case. However, all call sites of convert_element also call set_species later in the code path, so nothing is lost. It's still messy to rely on this, but I feel this is a separate refactor.

pyiron_atomistics/pyiron_atomistics/atomistics/structure/atoms.py

Line 661 in e69d6df

def convert_element(self, el, pse=None):

pmrv · 2021-06-02T09:16:43Z

Caching the method on Atoms didn't quite work, so I made the much less intrusive change of caching the call to mendeleev directly. This leaves a factor of 2 on the table, but it handles all the nits from above.
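To illustrate the difference, here is a hedged sketch in which only the expensive lookup is cached while the method itself stays uncached. `slow_lookup` and `Structure` are hypothetical stand-ins for the mendeleev call and for Atoms; this is not the code from the PR.

```python
import functools

lookup_calls = 0


def slow_lookup(symbol):
    """Hypothetical stand-in for an expensive database query."""
    global lookup_calls
    lookup_calls += 1
    return {"symbol": symbol}


# Cache the expensive call itself, keyed only by the element string.
cached_lookup = functools.lru_cache(maxsize=None)(slow_lookup)


class Structure:
    """Hypothetical class standing in for Atoms."""

    def __init__(self, symbols):
        self.symbols = list(symbols)
        self.species = []

    def convert_element(self, symbol):
        element = cached_lookup(symbol)  # shared across all instances
        # Per-instance bookkeeping still runs on every call,
        # because only the lookup is cached, not the method.
        if element not in self.species:
            self.species.append(element)
        return element


a = Structure(["Fe", "Fe"])
b = Structure(["Fe"])
for s in (a, b):
    for symbol in s.symbols:
        s.convert_element(symbol)
```

Wrapping the method itself in `functools.lru_cache` would make `self` part of the cache key, so entries would effectively be per instance and the instance would need to be hashable. Caching the free function avoids both issues and keeps side effects such as the species update intact.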

sudarsan-surendralal

This looks good! I made a small commit to fix the failing tests for the Atom class.

Can you also add the benchmarking you have locally as part of the unittests for the Atoms class?

Edit: I had to make more than one commit to fix this ;)

pmrv · 2021-06-07T09:58:41Z

Thanks for the fix! I will look into how best to do a benchmark as a test case in a system-independent way; it is probably sufficient to time two runs and assert that the second is much faster.
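The "time two runs" idea could look roughly like the sketch below. `expensive` is a hypothetical function whose `time.sleep` stands in for the real work; the timing uses `time.perf_counter` from the standard library.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def expensive(key):
    """Hypothetical slow function; the sleep stands in for real work."""
    time.sleep(0.05)
    return key.upper()


def timed(func, *args):
    """Return (result, elapsed seconds) measured with a high-resolution clock."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


cold_result, cold_time = timed(expensive, "fe")  # first call fills the cache
warm_result, warm_time = timed(expensive, "fe")  # second call hits the cache
```

A test would then assert that `warm_time` is smaller than `cold_time` by some factor. A single pair of measurements is noisy, so the factor should be chosen conservatively.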

I've attached the profile for this problem at the end here.

stale · 2021-06-21T11:59:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

pmrv · 2021-07-13T13:16:24Z

@sudarsan-surendralal Can you have another look? I can't reproduce the test failure outside of the CI.

coveralls · 2021-07-14T07:53:19Z

Pull Request Test Coverage Report for Build 1056610704

10 of 10 (100.0%) changed or added relevant lines in 1 file are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+0.01%) to 68.077%

Files with Coverage Reduction	New Missed Lines	%
pyiron_atomistics/atomistics/structure/periodic_table.py	1	83.5%

Totals
Change from base Build 1056270290:	0.01%
Covered Lines:	10908
Relevant Lines:	16023

💛 - Coveralls

sudarsan-surendralal · 2021-07-14T08:55:45Z

@pmrv The tests should work now. I also took the liberty of extending your benchmarks a bit. We seem to consistently get a 15x speedup, which is awesome. But I can't seem to get the 100x speedup you found in your benchmarks.

pmrv · 2021-07-14T12:48:31Z

> @pmrv The tests should work now. I also took the liberty of extending your benchmarks a bit. We seem to consistently get a 15x speedup, which is awesome. But I can't seem to get the 100x speedup you found in your benchmarks.

I think my benchmark last time was a more elaborate example that created the atoms via get_structure, so maybe there were extra factors that made the uncached case even slower. I'll look for it again to confirm the speedup, but 15x already looks very good.

pmrv · 2021-07-14T12:51:28Z

pyiron_atomistics/atomistics/structure/periodic_table.py

@@ -349,15 +353,12 @@ def add_element(
            self.dataframe = self.dataframe.append(parent_element_data_series)
        else:
            self.dataframe.loc[new_element] = parent_element_data_series
-        if len(qwargs) != 0:


This is what broke the tests before?

Yes, but not only this. The recent commit b278013 also helped fix the tests.

pmrv · 2021-07-15T08:46:03Z

The tests were failing on mac, so I decreased the expected speedup to 10. Otherwise this looks pretty good! I still want to understand what is different now compared to the original changes, but after that I think we can merge this.

niklassiemer · 2021-07-15T09:06:25Z

Is there something wrong in the way the speedup is computed (on mac)? In both last cases the threshold was only slightly missed...

pmrv · 2021-07-15T10:12:39Z

> Is there something wrong in the way the speedup is computed (on mac)? In both last cases the threshold was only slightly missed...

Weird, I changed it to 10, because it was ~14 before… :S

niklassiemer · 2021-07-15T10:48:54Z

> > Is there something wrong in the way the speedup is computed (on mac)? In both last cases the threshold was only slightly missed...
>
> Weird, I changed it to 10, because it was ~14 before… :S

That is what I meant with both last cases. In both cases it was only slightly below the threshold. Therefore, I asked whether there is an error in computing the time difference on Mac 😆

pmrv · 2021-07-16T10:25:14Z

> > > Is there something wrong in the way the speedup is computed (on mac)? In both last cases the threshold was only slightly missed...
> >
> > Weird, I changed it to 10, because it was ~14 before… :S
>
> That is what I meant with both last cases. In both cases it was only slightly below the threshold. Therefore, I asked whether there is an error in computing the time difference on Mac 😆

Yep, I see that now. I'll try to figure it out.

pmrv · 2021-07-22T13:37:26Z

Suddenly the speedup test passed with a factor of 10 on macOS. I still don't know what's going on, but I've hopefully made it more robust by averaging the times first and then checking the speedup once, instead of checking it in every iteration.
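A sketch of the averaging approach, under the assumption that the benchmark compares an uncached and a cached code path. `slow`, `fast`, `REPEATS` and `mean_time` are hypothetical names for illustration; only the threshold of 10 comes from this thread.

```python
import time


def mean_time(func, repeats):
    """Average wall-clock time of func() over `repeats` calls, in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats


def slow():
    time.sleep(0.01)  # stand-in for the uncached path


def fast():
    sum(range(1000))  # stand-in for the cached path


REPEATS = 5
EXPECTED_SPEEDUP = 10  # threshold mentioned in the discussion

# Compare the averages once instead of asserting inside the loop,
# so a single slow iteration does not fail the whole test.
speedup = mean_time(slow, REPEATS) / mean_time(fast, REPEATS)
```

Accumulating the times as floats matters here: the individual timings are well below one second, so storing them as integers would truncate them to zero.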

pmrv · 2021-07-22T14:05:36Z

On my local machine speed ups are in the range x30-40 now, hope that works on CI as well.

pmrv · 2021-07-22T15:01:40Z

The merge commit earlier messed up the tests, so I removed it again via a rebase.

sudarsan-surendralal

Taking the average of the speedup is definitely a better idea. LGTM!

pmrv force-pushed the get_structure branch from b6ae806 to ae0218e Compare June 2, 2021 09:30

pmrv marked this pull request as ready for review June 2, 2021 09:31

pmrv requested a review from sudarsan-surendralal June 2, 2021 09:31

pmrv added the enhancement New feature or request label Jun 2, 2021

pmrv added 2 commits June 2, 2021 11:31

Cache periodic table conversion

6a649ee

Cache mendeleev function instead of ChemicalElement

ae0218e

pmrv marked this pull request as draft June 2, 2021 15:42

Adding tags even when tags is defined

aea41a5

When cached, changes to the dataframe are preserved. Therefore, use separate if statements to ensure that the qwargs are stored (fixes failing test)
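The effect described in this commit message can be reproduced in isolation. The sketch below uses a plain dict instead of a pandas DataFrame, and `load_defaults` is a hypothetical loader, so it only illustrates the general behavior of a cached mutable object, not the actual `add_element` code.

```python
import functools


@functools.lru_cache(maxsize=None)
def load_defaults():
    """Hypothetical loader; returns a mutable dict that the cache keeps alive."""
    return {"Fe": {"tags": None}}


table = load_defaults()
table["Fe_up"] = {"tags": {"spin": "up"}}  # mutate the cached object in place

# A later "fresh" load returns the very same, already modified object.
again = load_defaults()
entry_survived = "Fe_up" in again
same_object = again is table
```

Code that previously relied on getting a pristine table on every load therefore has to handle the case where an entry already exists.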

sudarsan-surendralal suggested changes Jun 6, 2021

sudarsan-surendralal added 4 commits June 7, 2021 00:12

🐛 🔥 Remove redundant code in add_element

568ec3d

🐛 Check length of qwargs rather than if it's None

518715f

testing user defined tags

0d6ce32

This is done using `get_initial_magnetic_moments()`

Merge remote-tracking branch 'origin/master' into get_structure

fba7629

stale bot added the stale label Jun 21, 2021

stale bot closed this Jul 5, 2021

pmrv reopened this Jul 5, 2021

stale bot removed the stale label Jul 5, 2021

pmrv mentioned this pull request Jul 8, 2021

Use HasStructure for _get_neighbors instead of re-creating structures… #276

Merged

pmrv marked this pull request as ready for review July 13, 2021 11:38

pmrv and others added 4 commits July 13, 2021 15:35

Add simple timing test

49f3b33

Undo caching (trying to identify source of failing CI tests)

37a10f5

Undo change

597ef1e

Remove unclear code (debugging)

b278013

sudarsan-surendralal added 2 commits July 14, 2021 10:46

🔥 Remove code that doesn't make much sense!

9979c6f

Extending Marvin's benchmarks!

28af038

pmrv commented Jul 14, 2021

pmrv force-pushed the get_structure branch from c460c2f to 28af038 Compare July 22, 2021 13:24

Average times for benchmark test

f1486d0

Fix type of numpy arrays

7b874fd

Since the timings are sub-seconds, using ints breaks ;)

pmrv force-pushed the get_structure branch from e3c9610 to 7b874fd Compare July 22, 2021 15:00

Merge branch 'master' into get_structure

50725da

sudarsan-surendralal approved these changes Jul 23, 2021

pmrv merged commit bdf72b6 into master Jul 23, 2021

delete-merged-branch bot deleted the get_structure branch July 23, 2021 07:22

This was referenced Jul 31, 2021

Debug failing windows tests #302

Closed

Revert "Make Atoms() faster via caches" #305

Closed
