Optimizing performance #30
I did some profiling a while back. I don't have access to the logs, but I recall that almost all of the time was due to the … At this point (particularly after putting in … As for working in parallel, I don't think Python makes this sort of thing easy. If people do want to process in parallel, probably the best thing to do is use …
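For the "process in parallel" suggestion, a minimal sketch using `multiprocessing.Pool` to parse several namelist files in separate processes. Here `parse_file` is a hypothetical stand-in for a real parser call such as `f90nml.read()`; the per-file counting is only placeholder work:

```python
# Sketch: parse several namelist files in parallel, one process per file.
# parse_file is a hypothetical stand-in for a real call like f90nml.read().
from multiprocessing import Pool

def parse_file(path):
    # Placeholder "parse": count non-comment lines, standing in for real work.
    with open(path) as f:
        return sum(1 for line in f if not line.lstrip().startswith('!'))

def parse_all(paths, workers=4):
    # Note: on platforms where multiprocessing uses the "spawn" start method
    # (e.g. Windows, recent macOS), this call should sit under an
    # `if __name__ == "__main__":` guard.
    with Pool(workers) as pool:
        return pool.map(parse_file, paths)
```

This parallelizes across files, not across namelists within one file, which is the easy case; sharing a single parse across processes would be much harder.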
I would definitely support this. It would be nice if …
There is a new branch (tokenizer) using the Fortran-specific tokenizer from the flint project. It appears to be running about 10-25% faster, depending on the namelist. It's also passing all of my tests, but I could be missing some cases. I had higher hopes for this, but I think it's a step forward, if only because I may be able to make further improvements to this new tokenizer and speed things up. A lot of the logic in … Also, given that this tokenizer is a lot smarter, it might be possible to efficiently skip namelists as suggested in #39.
Latest push (4497c15) has improved the whitespace/comment token check, which was taking up 25% of …
There is a crash if the file uses Windows (CRLF) line breaks.

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "f90nml/__init__.py", line 53, in read
    return parser.read(nml_path)
  File "f90nml/parser.py", line 107, in read
    return self.readstream(nml_file, nml_patch)
  File "f90nml/parser.py", line 123, in readstream
    toks = tokenizer.parse(line)
  File "f90nml/tokenizer.py", line 94, in parse
    raise ValueError
```
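Until the tokenizer handles carriage returns itself, one workaround is to strip the trailing CR/LF from each line before it reaches the tokenizer, so CRLF files look identical to LF files. A minimal sketch (`normalize_lines` is a hypothetical helper, not part of f90nml):

```python
def normalize_lines(stream):
    """Yield lines with any trailing CR/LF stripped, so files with
    Windows (CRLF) line endings tokenize the same as LF files."""
    for line in stream:
        yield line.rstrip('\r\n')
```

Opening the file with Python's universal-newlines mode (`open(path, 'r', newline=None)`, the default in Python 3) achieves the same normalization at the I/O layer.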
I'm having trouble reproducing this. I did … But logically I can sort of understand how it could happen, since there are some explicit … Can you send me your example? Or does it always happen for you?
Here is one: …
gah... that one works fine for me!
I just noticed I was using Python 2.7. When I switch to 3.5 it works. (I'm using a Mac, BTW.)
Just tried on 2.7.13, 2.7.11, and 2.6.6 and they all worked for me (on Linux 4.x), so it must be something else. I'll try it out in a Mac environment too.
Never mind... I wasn't using the right branch on the other platform. I can reproduce this in 2.7.11 and 2.6.6.
This appears to be working for me now.
Agreed. Thanks!!
oops... didn't mean to close the issue... do you have more to do on this?
Yeah, might as well leave it open... I don't feel like a 1.4x speedup is really much to get excited about. But replacing the tokenizer is a good step forward. Time to clean up the real code! Anyway, I will probably merge this into master if everything looks good.
I've streamlined the tokenizer a bit by removing some cases which cannot happen in namelists, and have rewritten the name tokenizer to search ahead rather than iterate per character. This seems to have led to a net speedup of ~1.7x relative to the old shlex tokenizer. At this point, tokenization is under 30% of the runtime, and a lot of the remaining performance issues can be blamed on clumsy token management. Again, nothing algorithmically impressive here, but it's clear that improvements are still on the table.
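The "search ahead" idea can be illustrated with a compiled regex that consumes a whole Fortran name in one C-level match instead of one Python-level step per character. This is an illustrative sketch, not the actual f90nml implementation:

```python
import re

# Fortran names: a letter followed by letters, digits, or underscores.
NAME = re.compile(r'[A-Za-z][A-Za-z0-9_]*')

def read_name(line, pos):
    """Return (token, new_pos) for a name starting at pos,
    or (None, pos) if no name starts there."""
    m = NAME.match(line, pos)
    if m:
        return m.group(), m.end()
    return None, pos
```

The win is that the loop over characters happens inside the regex engine, so the Python interpreter is invoked once per token rather than once per character.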
I think that I will close this issue, if only because I no longer see a clear strategy for improvement. As far as I can tell, the core problem is that I iterate over each character, which unfortunately carries a fair bit of Python overhead. But short of doing everything in C, I don't know a much better way to do it. I think it's fair enough to say that the problem is per-byte iteration during tokenization and, to a lesser extent, per-token iteration during parsing, and to reopen this if it becomes a recurring problem.
I was wondering if there were possibilities for optimizing performance for very large namelist files. I'm in the process of doing some benchmarks for some large files (say around 1500 lines, with multiple namelists and pretty much all variable types). I can only get around 1.7 calls of f90nml.read() per second. I assume that it's the parsing and/or the creation of the structures that is the bottleneck (actually reading in the lines should only take a fraction of a second), but I'm going to investigate further.

One thing I was wondering is whether the parsing of different namelists in the same file could take place in parallel? I don't know if such a thing is possible or not, but if it is, that might be something to explore. Maybe this is something I can try to contribute to, rather than just reporting bugs and asking for features!
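The "investigate further" step usually starts with a `cProfile` run over the call being benchmarked. A minimal sketch, where `parse` is a hypothetical stand-in for the real workload (e.g. `f90nml.read('large.nml')`):

```python
import cProfile
import io
import pstats

def parse():
    # Hypothetical stand-in for the real workload,
    # e.g. f90nml.read('large.nml').
    return sum(len(line.split('=')) for line in ['x = 1'] * 1000)

pr = cProfile.Profile()
pr.enable()
parse()
pr.disable()

# Print the ten most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats(10)
print(out.getvalue())
```

Sorting by cumulative time quickly shows whether the time goes into tokenization, parsing, or building the output structures.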