Only extract a specified namelist #39

jacobwilliams · 2016-11-23T03:03:36Z

Idea: say a file contains multiple namelists and you only want to read one of them. Maybe there could be an option to specify the one you want, rather than having to parse the whole file. (Related to #30: if the file is very large, parsing becomes a bottleneck when you only want certain info from it.)

marshallward · 2016-11-23T03:19:08Z

Yes, a good idea I think, but I wonder how to do it in an effective way. Most of the time seems to be spent parsing and constructing the tokens (via shlex) and you would still need to sort through all the tokens prior to the specified namelist, even if only to determine where each namelist begins and ends.

It would let you exit immediately after reading the namelist, rather than going through the whole file, which might help in some cases.
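To make the early-exit idea concrete, here is a rough sketch, not the library's implementation. It scans a text stream line by line, collects the raw text of one named group, and stops reading at that group's terminating `/`. The function name `read_group_text` is hypothetical. The sketch assumes each group starts with `&name` at the beginning of a line and ends with an unquoted `/`; it ignores older `&end` terminators. The returned text would still need to go through the real parser.

```python
import io
import re

# Matches "&name" at the start of a line (leading whitespace allowed).
_START = re.compile(r"\s*&([A-Za-z_]\w*)")


def read_group_text(lines, name):
    """Return the raw text of the first group called `name`, or None.

    `lines` is any iterable of lines (for example an open file). Iteration
    stops as soon as the group's closing "/" is found, so the rest of the
    input is never read.
    """
    wanted = name.lower()  # group names are case-insensitive
    chunk = []
    quote = None           # current string delimiter, if inside a string
    inside = False
    for line in lines:
        pos = 0
        if not inside:
            m = _START.match(line)
            if m is None or m.group(1).lower() != wanted:
                continue   # not the group we want: skip without scanning
            inside = True
            pos = m.end()
        end = None
        for i in range(pos, len(line)):
            ch = line[i]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "'\"":
                quote = ch
            elif ch == "!":    # comment: ignore the rest of the line
                break
            elif ch == "/":    # unquoted slash terminates the group
                end = i + 1
                break
        if end is not None:
            chunk.append(line[:end])
            return "".join(chunk)
        chunk.append(line)
    return None


sample = io.StringIO(
    "&first\n a = 1\n/\n&second\n path = 'x/y'\n/\n&third\n c = 3\n/\n"
)
print(read_group_text(sample, "second"))
```

Lines belonging to other groups are only checked for a leading `&name`, which is much cheaper than tokenizing them, but as noted above they still have to be read to find where the requested group begins.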

A more intelligent tokenizer (#30) might be a way forward here. Or maybe it's time to just dump the entire namelist (or file) into memory and dice it up into pieces. (Maybe I should have done that from the beginning...)

jacobwilliams · 2016-11-23T15:42:41Z

Anything to speed things up has my support. Unfortunately it's only moral support right now. :)

jacobwilliams · 2019-03-31T04:51:58Z

FYI: I have a simple experiment related to this here. I was testing splitting up a file of multiple namelists into chunks, reading them separately with multiprocessing, and then stitching the results back together at the end. Even with only one thread, it's still faster than a default read.

Also related to #30.
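The experiment's code is not shown in this thread, so the following is only a guess at its general shape: map a parse function over pre-split chunks, then merge the per-chunk results. `parse_chunks` and `toy_parse` are hypothetical names; `toy_parse` is a deliberately trivial stand-in for a real parser and keeps values as strings. Groups with the same name overwrite each other here, which is a simplification.

```python
def toy_parse(chunk):
    """Hypothetical stand-in parser: handles only 'key = value' lines."""
    lines = chunk.strip().splitlines()
    name = lines[0].lstrip("&").strip().lower()
    values = {}
    for line in lines[1:]:
        line = line.strip()
        if not line or line == "/":
            continue
        key, _, value = line.partition("=")
        values[key.strip().lower()] = value.strip().rstrip(",")
    return {name: values}


def parse_chunks(chunks, parse, map_fn=map):
    """Parse each chunk independently and stitch the results together.

    `map_fn` defaults to the builtin `map` (serial). Any callable with the
    same (function, iterable) shape can be substituted.
    """
    merged = {}
    for result in map_fn(parse, chunks):
        merged.update(result)  # later chunks win on duplicate group names
    return merged


chunks = ["&a\n x = 1\n/", "&b\n y = 2\n/"]
print(parse_chunks(chunks, toy_parse))
```

To run the chunks in separate processes, the `map` method of a `multiprocessing.Pool` can be passed as `map_fn`, provided the parse function is defined at module level so it can be pickled and the pool is created under an `if __name__ == "__main__":` guard.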

marshallward · 2019-04-01T23:00:56Z

Thanks, useful info! Splitting the namelists would generally be more difficult, but it shows there's value in splitting up the work. At the least, splitting the namelist into groups before parsing them individually is probably a better approach.
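As a rough illustration of splitting into groups before parsing, the sketch below loads the whole text and yields one raw chunk per group. It is not the new parser's code; `iter_groups` is a hypothetical name. It assumes groups open with `&name` and close with an unquoted `/`, skips `!` comments, and does not handle `&end` or `$`-style delimiters.

```python
import re

_GROUP_START = re.compile(r"&([A-Za-z_]\w*)")


def iter_groups(text):
    """Yield (lowercased_name, raw_chunk) for each '&name ... /' group."""
    pos, n = 0, len(text)
    while pos < n:
        ch = text[pos]
        if ch == "!":
            # Comment between groups: skip to the end of the line.
            nl = text.find("\n", pos)
            pos = n if nl == -1 else nl + 1
            continue
        m = _GROUP_START.match(text, pos) if ch == "&" else None
        if m is None:
            pos += 1
            continue
        start, pos = pos, m.end()
        quote = None
        while pos < n:
            ch = text[pos]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "'\"":
                quote = ch
            elif ch == "!":
                # Comment inside a group: jump to the newline.
                nl = text.find("\n", pos)
                pos = n if nl == -1 else nl
                continue
            elif ch == "/":
                pos += 1
                yield m.group(1).lower(), text[start:pos]
                break
            pos += 1


text = "&a\n x = 1\n s = 'a/b'\n/\n&b\n y(1:2) = 3, 4\n/\n"
for name, chunk in iter_groups(text):
    print(name, len(chunk))
```

Each chunk can then be handed to the parser on its own, which also makes it straightforward to parse only the group that was asked for.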

I think that I do something like this in the new parser (which has lagged unfortunately) but will make it a priority when I get back to it.

BTW I'm in the process of relocating my family to a new job overseas, so no idea when I'll get time to think about this.

jacobwilliams · 2019-06-09T15:51:20Z

FYI: I noticed something else. If the keys contain array notation, the parsing is dramatically slower. See the examples here.

'files/test.nml'     # 112 namelists -- short keys [8 sec]
'files/test4b.nml'   # 112 namelists -- longer keys no arrays  [9 sec]
'files/test4c.nml'   # 112 namelists -- longer keys w/ types  [12 sec]
'files/test4.nml'    # 112 namelists -- longer keys w/ array [42 sec]

This is killing me since all my namelists have many arrays. :)

marshallward added the enhancement label Nov 23, 2016

marshallward mentioned this issue Jul 7, 2017

Optimizing performance #30

Closed
