Only extract a specified namelist #39

Open
jacobwilliams opened this issue Nov 23, 2016 · 5 comments

Comments

@jacobwilliams

Idea: say a file contains multiple namelists and you only want to read a specified one. Maybe this could be an option, where you specify the one you want rather than having to parse the whole file. (Related to #30: if the file is very large, parsing becomes a bottleneck when you only want certain info from it.)

@marshallward
Owner

Yes, a good idea I think, but I wonder how to do it effectively. Most of the time seems to be spent parsing and constructing the tokens (via shlex), and you would still need to sort through all the tokens prior to the specified namelist, even if only to determine where each namelist begins and ends.

It would let you exit immediately after reading the namelist, rather than going through the whole file, which might help in some cases.

A more intelligent tokenizer (#30) might be a way forward here. Or maybe it's time to just dump the entire namelist (or file) into memory and dice it up into pieces. (Maybe I should have done that from the beginning...)
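
A minimal sketch of the "dump it into memory and dice it up" idea, for illustration only. It is not the library's implementation: `extract_group` is a hypothetical helper that scans the raw text once, skips `!` comments and quoted strings, and returns as soon as the requested group's closing `/` is found. It assumes groups are terminated by `/` (not the older `&end` form).

```python
def extract_group(text, name):
    """Return the raw text of the first group called `name`, or None.

    Hypothetical helper: assumes each group ends with '/'.
    """
    target = name.lower()
    i, n = 0, len(text)
    quote = None   # active string delimiter while inside a string
    start = None   # index of '&' once the requested group is found
    while i < n:
        ch = text[i]
        if quote:
            # Inside a string: only the matching delimiter matters.
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "!":
            # Comment: jump to the end of the line.
            j = text.find("\n", i)
            i = n if j == -1 else j
            continue
        elif ch == "&" and start is None:
            # Read the group name that follows '&'.
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            if text[i + 1:j].lower() == target:
                start = i
            i = j
            continue
        elif ch == "/" and start is not None:
            # End of the requested group: stop without reading the rest.
            return text[start:i + 1]
        i += 1
    return None
```

Only the returned substring would then need to go through the full tokenizer, for example via a string-based entry point such as `f90nml.reads` where the installed version provides one. The scan still touches every character before the requested group, but it avoids building tokens for them.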

@jacobwilliams
Author

Anything to speed things up has my support. Unfortunately it's only moral support right now. :)

@jacobwilliams
Author

FYI: I have a simple experiment related to this here. I was testing splitting up a file of multiple namelists into chunks, reading them separately with multiprocessing, and then stitching the results back together at the end. Even with only one thread, it's still faster than a default read.
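
The general shape of that kind of experiment might look like the sketch below. This is not the linked code: `split_groups` and `read_in_chunks` are hypothetical names, the splitter assumes every group starts on a line beginning with `&` and ends on a line containing only `/`, and `parse` is whatever callable turns one chunk into a mapping of group name to contents.

```python
from concurrent.futures import ProcessPoolExecutor


def split_groups(text):
    """Split namelist text into one chunk per group.

    Assumption: a group starts on a line beginning with '&'
    and ends on a line that contains only '/'.
    """
    chunks, current = [], []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("&") and not current:
            current = [line]
        elif current:
            current.append(line)
            if stripped == "/":
                chunks.append("".join(current))
                current = []
    return chunks


def read_in_chunks(text, parse, workers=1):
    """Parse each group separately, then stitch the results together."""
    chunks = split_groups(text)
    if workers <= 1:
        # Serial path: still parses one small chunk at a time.
        results = [parse(chunk) for chunk in chunks]
    else:
        # `parse` must be picklable (a module-level function).
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(parse, chunks))
    merged = {}
    for result in results:
        merged.update(result)  # later groups overwrite same-named ones
    return merged
```

With `workers=1` this takes the serial path, which corresponds to the single-worker case mentioned above. Note that the simple `update` stitch would drop earlier copies of repeated group names, so a real version would need to handle duplicates.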

Also related to #30.

@marshallward
Owner

Thanks, useful info! Splitting the namelists would generally be more difficult, but it shows there's value in splitting up the work. At the least, splitting the namelist into groups before parsing them individually is probably a better approach.

I think that I do something like this in the new parser (which has lagged unfortunately) but will make it a priority when I get back to it.

BTW I'm in the process of relocating my family to a new job overseas, so no idea when I'll get time to think about this.

@jacobwilliams
Author

FYI: I noticed something else. If the keys contain array notation, the parsing is dramatically slower. See the examples here.

'files/test.nml'     # 112 namelists -- short keys [8 sec]
'files/test4b.nml'   # 112 namelists -- longer keys no arrays  [9 sec]
'files/test4c.nml'   # 112 namelists -- longer keys w/ types  [12 sec]
'files/test4.nml'    # 112 namelists -- longer keys w/ array [42 sec]

This is killing me since all my namelists have many arrays. :)
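
For anyone wanting to reproduce the comparison without the original files, a rough sketch follows. The generated input is synthetic and is not a copy of `test4.nml` or the other files listed above; `make_namelist` and `time_parse` are hypothetical helpers, and `parse` is any callable that accepts namelist text.

```python
import time


def make_namelist(n_groups, n_keys, indexed=False):
    """Build synthetic namelist text (illustrative, not the original files).

    With indexed=True the keys use array notation, e.g. values(3) = 2.
    """
    groups = []
    for g in range(n_groups):
        lines = [f"&group_{g}"]
        for k in range(n_keys):
            key = f"values({k + 1})" if indexed else f"value_{k + 1}"
            lines.append(f"    {key} = {k}")
        lines.append("/")
        groups.append("\n".join(lines))
    return "\n".join(groups) + "\n"


def time_parse(parse, text, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - t0)
    return best
```

Timing the same parser on `make_namelist(112, 50)` and `make_namelist(112, 50, indexed=True)` would isolate the cost of array notation from key length; the group and key counts here are arbitrary choices.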


2 participants