Optimizing performance #30

Closed · jacobwilliams opened this issue Mar 25, 2016 · 17 comments

Comments

@jacobwilliams

I was wondering if there are any possibilities for optimizing performance on very large namelist files. I'm in the process of running some benchmarks on large files (around 1500 lines, with multiple namelists and pretty much every variable type), and I can only get around 1.7 calls to f90nml.read() per second. I assume the bottleneck is the parsing and/or the construction of the data structures (actually reading in the lines should only take a fraction of a second), but I'm going to investigate further.
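
For reference, a minimal timing sketch along these lines, with large.nml standing in for one of the big files being benchmarked:

    import timeit

    import f90nml

    # Placeholder path; substitute one of the large namelist files.
    NML_PATH = 'large.nml'

    # Time repeated parses; calls per second is runs divided by elapsed time.
    n_runs = 10
    elapsed = timeit.timeit(lambda: f90nml.read(NML_PATH), number=n_runs)
    print('%.2f calls of f90nml.read() per second' % (n_runs / elapsed))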

One thing I was wondering is whether the parsing of different namelists in the same file could take place in parallel. I don't know if such a thing is possible or not, but if it is, that might be something to explore. Maybe this is something I can try to contribute to, rather than just reporting bugs and asking for features!

@marshallward
Owner

I did some profiling a while back. I don't have access to the logs, but I recall that almost all of the time was due to the shlex implementation. The f90nml code itself was not much of a problem, aside from relying on shlex! I should go back and confirm all of this though.
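
For anyone wanting to re-confirm that, a quick profiling sketch with cProfile, using a placeholder file large.nml:

    import cProfile
    import pstats

    import f90nml

    # Placeholder path; substitute any large namelist file.
    NML_PATH = 'large.nml'

    prof = cProfile.Profile()
    prof.runcall(f90nml.read, NML_PATH)

    # Print the 20 most expensive calls by cumulative time.
    pstats.Stats(prof).sort_stats('cumulative').print_stats(20)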

At this point (particularly after putting in the patch feature), the only thing I really use shlex for is tokenizing strings and (optionally) stripping comments. It does handle some of the uglier cases, such as comment tokens inside of strings, which I wasn't savvy enough to resolve with regexes. But maybe it's time to write a new, simpler tokenizer?
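
Roughly, that kind of shlex usage looks like the sketch below (not the exact f90nml setup, just the general shape, with '!' as the Fortran comment character):

    import shlex

    def shlex_tokens(line, strip_comments=True):
        """Tokenize one namelist line with shlex, stripping '!' comments if asked."""
        lexer = shlex.shlex(line, posix=False)
        lexer.commenters = '!' if strip_comments else ''
        lexer.wordchars += '.+-'   # keep numeric literals such as 1.0e-3 together
        return list(lexer)

    # A '!' inside a quoted string is not treated as a comment:
    # shlex_tokens("label = 'warn!' ! trailing comment") -> ["label", "=", "'warn!'"]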

As for working in parallel, I don't think Python makes this sort of thing easy. If people do want to process files in parallel, probably the best thing to do is to use multiprocessing or some other forking module.
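
A minimal sketch of that approach, parsing several independent files in parallel with multiprocessing (the file names are placeholders, and this assumes the parsed Namelist objects pickle cleanly):

    from multiprocessing import Pool

    import f90nml

    # Placeholder file names; parallelism here is per file, not per
    # namelist group within a single file.
    paths = ['run1.nml', 'run2.nml', 'run3.nml']

    if __name__ == '__main__':
        pool = Pool(processes=4)
        namelists = pool.map(f90nml.read, paths)
        pool.close()
        pool.join()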

@jacobwilliams
Author

I would definitely support this. It would be nice if f90nml was blazing fast! I guess shlex has a huge overhead?

@marshallward
Owner

marshallward commented Jul 7, 2017

There is a new branch (tokenizer) using the Fortran-specific tokenizer from the flint project. It appears to be running about 10-25% faster, depending on the namelist.

It's also passing all of my tests, but I could be missing some cases.

I had higher hopes for this, but I think it's a step forward, if only because I may be able to make further improvements to this new tokenizer and speed things up. A lot of the logic in f90nml was also set up to accommodate the quirks of shlex, and there's probably plenty of opportunity to clean that up or shift the work into the tokenizer.

Also, given that this tokenizer is a lot smarter, it might be possible to efficiently skip namelists as suggested in #39.

@marshallward
Owner

Latest push (4497c15) has improved the whitespace/comment token check, which was taking up 25% of update_tokens() runtime. Net speedup is now 1.4x!

@jacobwilliams
Author

There is a crash if the file uses Windows (CRLF) line breaks.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "f90nml/__init__.py", line 53, in read
    return parser.read(nml_path)
  File "f90nml/parser.py", line 107, in read
    return self.readstream(nml_file, nml_patch)
  File "f90nml/parser.py", line 123, in readstream
    toks = tokenizer.parse(line)
  File "f90nml/tokenizer.py", line 94, in parse
    raise ValueError

@marshallward
Owner

I'm having trouble reproducing this. I did unix2dos on all of the namelists in the test directory but could not see the problem.

But logically I can sort of understand how it could happen, since there are some explicit \n checks in the tokenizer.

Can you send me your example? Or does it always happen for you?

@jacobwilliams
Author

Here is one:
test.nml.zip

@marshallward
Owner

Gah... that one works fine for me!
I mean... it's probably as easy as doing something like if char == '\r': continue, but I have no way to verify this. Maybe I need to find a Windows machine to test this out.
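
A sketch of that idea, stripping the carriage returns before tokenizing rather than checking each character (just the shape of a possible fix, not the actual change):

    def strip_cr(line):
        """Drop carriage returns so CRLF input looks like LF input to the tokenizer."""
        return line.replace('\r', '')

    # e.g. in the read loop:
    #     toks = tokenizer.parse(strip_cr(line))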

@jacobwilliams
Author

I just noticed I was using Python 2.7. When I switch to 3.5 it works. (I'm using a Mac BTW).

@marshallward
Owner

marshallward commented Jul 10, 2017

Just tried 2.7.13, 2.7.11 and 2.6.6, and they all worked for me (on Linux 4.x), so it must be something else. I'll try it out in a Mac environment too.

@marshallward
Owner

marshallward commented Jul 10, 2017

Never mind... I wasn't using the right branch on the other platform. I can reproduce this in 2.7.11 and 2.6.6 (but not 2.7.13). I will fix this today.

@marshallward
Owner

This appears to be working for me now.

@jacobwilliams
Author

Agreed. Thanks!!

@jacobwilliams
Author

Oops... I didn't mean to close the issue, in case you have more to do on this.

@jacobwilliams reopened this Jul 10, 2017
@marshallward
Owner

Yeah, might as well leave it open... I don't feel like a 1.4x speedup is really much to get excited about, but replacing the tokenizer is a good step forward. Time to clean up the real code!

Anyway I will probably merge this into master if everything looks good.

@marshallward
Owner

I've streamlined the tokenizer a bit by removing some cases which cannot occur in namelists, and I've rewritten the name tokenizer to search ahead rather than iterate per character. This seems to have led to a net speedup of ~1.7x relative to the old shlex tokenizer.
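
A rough illustration of the search-ahead idea for names, matching a whole name in one step instead of advancing character by character (not the actual f90nml tokenizer, just the general shape):

    import re

    # Fortran names: a letter followed by letters, digits and underscores.
    NAME_RE = re.compile(r'[A-Za-z][A-Za-z0-9_]*')

    def read_name(line, pos):
        """Return (name, new_pos), consuming a whole name in one step."""
        match = NAME_RE.match(line, pos)
        if match is None:
            return None, pos
        return match.group(0), match.end()

    # read_name('x_val = 1.0', 0) -> ('x_val', 5)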

At this point, tokenization is now under 30% of the runtime, and a lot of the remaining performance issues can be blamed on clumsy token management.

Again, nothing algorithmically impressive here, but it's clear that improvements are still on the table.

@marshallward
Owner

I think that I will close this issue, if only because I no longer see a clear strategy for improvement. As far as I can tell, the core problem is that I iterate over each character, which unfortunately carries a fair bit of Python overhead. But short of doing everything in C, I don't know a much better way to do it.

I think it's fair enough to say that the problem is per-byte iteration during tokenization and, to a lesser extent, per-token iteration during parsing, and we can reopen the issue if it becomes a recurring problem.
