Optimizing performance #30

Closed · jacobwilliams opened this issue Mar 25, 2016 · 17 comments

Comments

@jacobwilliams

I was wondering if there are any possibilities for optimizing performance on very large namelist files. I'm in the process of running some benchmarks on large files (around 1500 lines, with multiple namelists and pretty much every variable type), and I can only get around 1.7 calls to f90nml.read() per second. I assume the bottleneck is the parsing and/or the construction of the data structures (actually reading in the lines should only take a fraction of a second), but I'm going to investigate further.
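
For reference, a minimal timing sketch along these lines, with large.nml standing in for one of the big files being benchmarked:

    import timeit

    import f90nml

    # Placeholder path; substitute one of the large namelist files.
    NML_PATH = 'large.nml'

    # Time repeated parses; calls per second is runs divided by elapsed time.
    n_runs = 10
    elapsed = timeit.timeit(lambda: f90nml.read(NML_PATH), number=n_runs)
    print('%.2f calls of f90nml.read() per second' % (n_runs / elapsed))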

One thing I was wondering is whether the parsing of different namelists in the same file could take place in parallel. I don't know if such a thing is possible or not, but if it is, that might be something to explore. Maybe this is something I can try to contribute to, rather than just reporting bugs and asking for features!

@marshallward
Owner

I did some profiling a while back. I don't have access to the logs, but I recall that almost all of the time was due to the shlex implementation. The f90nml code itself was not much of a problem, aside from relying on shlex! I should go back and confirm all of this though.
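
For anyone wanting to re-confirm that, a quick profiling sketch with cProfile, using a placeholder file large.nml:

    import cProfile
    import pstats

    import f90nml

    # Placeholder path; substitute any large namelist file.
    NML_PATH = 'large.nml'

    prof = cProfile.Profile()
    prof.runcall(f90nml.read, NML_PATH)

    # Print the 20 most expensive calls by cumulative time.
    pstats.Stats(prof).sort_stats('cumulative').print_stats(20)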

At this point (particularly after putting in the patch feature), the only thing I really use shlex for is tokenizing strings and (optionally) stripping comments. It does handle some of the uglier cases, such as comment tokens inside of strings, which I wasn't savvy enough to resolve with regexes. But maybe it's time to write a new, simpler tokenizer?
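
Roughly, that kind of shlex usage looks like the sketch below (not the exact f90nml setup, just the general shape, with '!' as the Fortran comment character):

    import shlex

    def shlex_tokens(line, strip_comments=True):
        """Tokenize one namelist line with shlex, stripping '!' comments if asked."""
        lexer = shlex.shlex(line, posix=False)
        lexer.commenters = '!' if strip_comments else ''
        lexer.wordchars += '.+-'   # keep numeric literals such as 1.0e-3 together
        return list(lexer)

    # A '!' inside a quoted string is not treated as a comment:
    # shlex_tokens("label = 'warn!' ! trailing comment") -> ["label", "=", "'warn!'"]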

As for working in parallel, I don't think Python makes this sort of thing easy. If people do want to process files in parallel, probably the best thing to do is to use multiprocessing or some other forking module.
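
A minimal sketch of that approach, parsing several independent files in parallel with multiprocessing (the file names are placeholders, and this assumes the parsed Namelist objects pickle cleanly):

    from multiprocessing import Pool

    import f90nml

    # Placeholder file names; parallelism here is per file, not per
    # namelist group within a single file.
    paths = ['run1.nml', 'run2.nml', 'run3.nml']

    if __name__ == '__main__':
        pool = Pool(processes=4)
        namelists = pool.map(f90nml.read, paths)
        pool.close()
        pool.join()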

@jacobwilliams
Author

I would definitely support this. It would be nice if f90nml was blazing fast! I guess shlex has a huge overhead?

@marshallward
Owner

marshallward commented Jul 7, 2017

There is a new branch (tokenizer) using the Fortran-specific tokenizer from the flint project. It appears to be running about 10-25% faster, depending on the namelist.

It's also passing all of my tests, but I could be missing some cases.

I had higher hopes for this, but I think it's a step forward, if only because I may be able to make further improvements to this new tokenizer and speed things up. A lot of the logic in f90nml was also set up to accommodate the quirks of shlex, and there's probably plenty of opportunity to clean that up or shift the work into the tokenizer.

Also, given that this tokenizer is a lot smarter, it might be possible to efficiently skip namelists as suggested in #39.

@marshallward
Owner

Latest push (4497c15) has improved the whitespace/comment token check, which was taking up 25% of update_tokens() runtime. Net speedup is now 1.4x!

@jacobwilliams
Author

There is a crash if the file uses Windows (CRLF) line breaks.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "f90nml/__init__.py", line 53, in read
    return parser.read(nml_path)
  File "f90nml/parser.py", line 107, in read
    return self.readstream(nml_file, nml_patch)
  File "f90nml/parser.py", line 123, in readstream
    toks = tokenizer.parse(line)
  File "f90nml/tokenizer.py", line 94, in parse
    raise ValueError

@marshallward
Owner

I'm having trouble reproducing this. I did unix2dos on all of the namelists in the test directory but could not see the problem.

But logically I can sort of understand how it could happen, since there are some explicit \n checks in the tokenizer.

Can you send me your example? Or does it always happen for you?

@jacobwilliams
Author

Here is one:
test.nml.zip

@marshallward
Owner

Gah... that one works fine for me!
I mean... it's probably as easy as doing something like if char == '\r': continue, but I have no way to verify this. Maybe I need to find a Windows machine to test this out.
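
A sketch of that idea, stripping the carriage returns before tokenizing rather than checking each character (just the shape of a possible fix, not the actual change):

    def strip_cr(line):
        """Drop carriage returns so CRLF input looks like LF input to the tokenizer."""
        return line.replace('\r', '')

    # e.g. in the read loop:
    #     toks = tokenizer.parse(strip_cr(line))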

@jacobwilliams
Author

I just noticed I was using Python 2.7. When I switch to 3.5 it works. (I'm using a Mac BTW).

@marshallward
Owner

marshallward commented Jul 10, 2017

Just tried 2.7.13, 2.7.11 and 2.6.6, and they all worked for me (on Linux 4.x), so it must be something else. I'll try it out in a Mac environment too.

@marshallward
Owner

marshallward commented Jul 10, 2017

Never mind... I wasn't using the right branch on the other platform. I can reproduce this in 2.7.11 and 2.6.6 (but not 2.7.13). I will fix this today.

@marshallward
Owner

This appears to be working for me now.

@jacobwilliams
Author

Agreed. Thanks!!

@jacobwilliams
Author

Oops... I didn't mean to close the issue, in case you have more to do on this.

@jacobwilliams reopened this Jul 10, 2017
@marshallward
Owner

Yeah, might as well leave it open... I don't feel like a 1.4x speedup is really much to get excited about, but replacing the tokenizer is a good step forward. Time to clean up the real code!

Anyway I will probably merge this into master if everything looks good.

@marshallward
Owner

I've streamlined the tokenizer a bit by removing some cases which cannot occur in namelists, and I've rewritten the name tokenizer to search ahead rather than iterate per character. This seems to have led to a net speedup of ~1.7x relative to the old shlex tokenizer.
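
A rough illustration of the search-ahead idea for names, matching a whole name in one step instead of advancing character by character (not the actual f90nml tokenizer, just the general shape):

    import re

    # Fortran names: a letter followed by letters, digits and underscores.
    NAME_RE = re.compile(r'[A-Za-z][A-Za-z0-9_]*')

    def read_name(line, pos):
        """Return (name, new_pos), consuming a whole name in one step."""
        match = NAME_RE.match(line, pos)
        if match is None:
            return None, pos
        return match.group(0), match.end()

    # read_name('x_val = 1.0', 0) -> ('x_val', 5)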

At this point, tokenization is now under 30% of the runtime, and a lot of the remaining performance issues can be blamed on clumsy token management.

Again, nothing algorithmically impressive here, but it's clear that improvements are still on the table.

@marshallward
Owner

I think that I will close this issue, if only because I no longer see a clear strategy for improvement. As far as I can tell, the core problem is that I iterate over each character, which unfortunately carries a fair bit of Python overhead. But short of doing everything in C, I don't know a much better way to do it.

I think it's fair enough to say that the problem is per-byte iteration during tokenization and, to a lesser extent, per-token iteration during parsing, and we can reopen the issue if it becomes a recurring problem.
