Add file arguments #26
Some API questions are how to handle `force=` and the default output name for patching; I decided to ignore `force` for files. For the latter issue, I raise `ValueError` when `nml_fname` is a file since there is no sane default. (personally, if it was my library, I would rip out the default name functionality entirely, and make it so that when there is no output file or path, it will still build and return the patched Namelist *object* without writing a patched namelist *file*. But that's just me)
A big component of the patching is the preservation of the comments inside of the namelist file, which is lost in the conversion to If I recall, we parse and modify the file line by line (mostly) and immediately push the output to the target patched file. I'll give your patch a try today and get back to you, thanks for your help with this issue! |
Yeah, I have come to understand that over the course of writing the patch. My thought process behind eliminating the default name feature is the following: Suppose that somebody provides a patch without specifying where they want the patched file to be written. Then I can think of three categories: The most useful behavior for (c) would be the original behavior, but (c) requires both that the default name is documented, and that people actually read the documentation, so I find it unlikely. The most useful behavior for (a) would be to return a patched |
I think the interpretation of I wasn't able to reproduce the issue raised in #25 which required nested exception blocks. Do you have a test case which produces this error? I was able to just move the |
My modification is here: I haven't gone over the |
Oops, I mentioned that because I thought it might be related to the problem you had here and I didn't see the edit. But now that I look at it I see it causes no directly observable problems, because it is simply a resource leak. It can however be exposed with the following stress test:
This is because pfile is never explicitly closed in the following statement:
The leak is easier to observe in a garbage-collected implementation like Pypy. In CPython, files are automatically closed when refcount drops to zero, so |
Just to be clear, on the default name thing, is what I have currently okay for now, or are you looking for something closer to the original behavior in some way? The reason I currently throw |
I see the problem now with the file descriptors, thanks for the explanation. Moving the namelist Moving the namelist As for changing the default patch name output, I still need to think about this. Currently, the purpose of From what I can tell, you are (very justifiably) interpreting But parsing a file while preserving its whitespace and comments is a much more deliberate operation, and is the motivation for including If we do not select a target file, and only want the function output, then we are really just saying that we want to do a So for now, I'd prefer to either retain the Generalising everything to support file objects and character streams will require some of these ideas to be revisited, such as the appropriate output of As you mentioned before, there may be some confusion about the choice of file name, or some questions about documentation, but I think that needs to be a separate discussion. |
I just pushed a change to the file openings that seems to resolve some of the file descriptor problems without the nested exceptions (your script works at least). I can work through the conflicts if necessary, no need to push extra work on to you. I do think that the default filename issue needs to be discussed before merging though. |
Okay, thanks for the explanation, I think I see better where you are coming from about In retrospect, I think my thought process was colored by the fact that most of my short time on this project has been spent on On the other hand, somebody first visiting the project and seeing a method named |
I believe the new version would still have a resource leak in Pypy if opening the pfile fails. For instance:
This one can't be exposed in CPython by any sort of stress test because in this case, the issue is nml_file, which is a local variable not saved anywhere and therefore will get destructed as soon as the function exits. Basically, as I see it, the only way to do this kind of stuff 100% correctly is to always have |
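One way to get that guarantee is to let a `with` statement own each file from the moment it is opened. The sketch below is not code from this patch: `patch_copy` and its `opener` hook are hypothetical names, and the body just copies lines where the real code would parse and patch. If opening the output file fails, the input file is still closed on the way out, regardless of how the interpreter collects garbage.

```python
def patch_copy(nml_path, patch_path, opener=open):
    """Copy nml_path to patch_path, closing both files on every exit path.

    Hypothetical stand-in for the real parse-and-patch step; `opener` is
    only a hook so the behaviour can be observed from a test.
    """
    with opener(nml_path, 'r') as nml_file:
        # If this second open raises, the outer `with` still closes nml_file.
        with opener(patch_path, 'w') as pfile:
            for line in nml_file:
                pfile.write(line)
```

Passing a tracking `opener` that records each file it returns makes it possible to check `closed` on the input file after a failed open of the output.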
I tested out your original example in Pypy (2 and 3, very recent versions) and wasn't able to reproduce the same issue, even after ramping up the number of parsers to 80k. (Also, I had missed your original comment about CPython. My original revision also showed file resource problems in CPython, though maybe for different reasons.) Do you have a similar case that shows the unopened files in Pypy? Otherwise I can try to monitor file descriptors during a run ( BTW this is what was changed. It looks OK to me but I know it's easy to miss these things.

    # Open file descriptors
    nml_file = open(nml_fname, 'r')
    if nml_patch_in and patch_fname:
        self.pfile = open(patch_fname, 'w')
    try:
        return self.readstream(nml_file, nml_patch)
    finally:
        # Close the unfinished files on any exceptions within readstream
        nml_file.close()
        if self.pfile:
            self.pfile.close()
Nevermind, I am being silly. Yes, of course if opening I did just try several cpython and pypy tests and didn't detect anything, but I'm guessing that they are just clever enough to prevent resource consumption. |
Do you think moving the file opens inside the

    try:
        # Open file descriptors
        nml_file = open(nml_fname, 'r')
        if nml_patch_in and patch_fname:
            self.pfile = open(patch_fname, 'w')
        return self.readstream(nml_file, nml_patch)
    finally:
        # Close the unfinished files on any exceptions within readstream
        nml_file.close()
        if self.pfile:
            self.pfile.close()
Nope... then |
Wait no.. the Actually now |
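For comparison, a `try`/`finally` shape that keeps the opens inside the `try` has to bind both names before the `try`, so that the `finally` clause can always test them. This is a sketch under assumed names (`read_and_patch`, `process`), not the code that was merged.

```python
def read_and_patch(nml_fname, process, patch_fname=None):
    """Open the input (and an optional patch output), run process, close both.

    `process` is a hypothetical callable taking (nml_file, pfile); pfile is
    None when no patch output was requested.
    """
    nml_file = None
    pfile = None
    try:
        nml_file = open(nml_fname, 'r')
        if patch_fname:
            pfile = open(patch_fname, 'w')
        return process(nml_file, pfile)
    finally:
        # Both names are always bound here, so a failed open cannot turn
        # into an UnboundLocalError that hides the original exception.
        if nml_file is not None:
            nml_file.close()
        if pfile is not None:
            pfile.close()
```

Without the two `None` assignments, a failure in the first `open` would reach the `finally` clause with `nml_file` unbound, and the resulting `UnboundLocalError` would replace the real I/O error.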
Haha, yeah, exceptions can be hard to reason about! This kind of stuff is why some modern languages like e.g. Rust and Go don't even have them. (note: don't quote me on that :P) Oftentimes I like to solve problems like this by writing a helper class. Here is a prototype of what I mean:
The idea being that we could then simply write something like this:
and know that the files will be correctly closed no matter how we exit the |
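A helper of this kind could look roughly like the sketch below. It is not the prototype from this comment: `OpenedOrBorrowed` is a hypothetical name, and the decision to leave caller-supplied file objects open (while closing anything the helper opened itself) is an assumption about what file arguments should mean.

```python
class OpenedOrBorrowed(object):
    """Context manager yielding a file object for a path or an open file.

    Hypothetical helper: paths are opened here and closed on exit; file
    objects supplied by the caller are passed through and left open.
    """

    def __init__(self, target, mode='r'):
        self.target = target
        self.mode = mode
        self.owned = None

    def __enter__(self):
        if hasattr(self.target, 'read') or hasattr(self.target, 'write'):
            return self.target      # borrowed: the caller closes it
        self.owned = open(self.target, self.mode)
        return self.owned

    def __exit__(self, exc_type, exc_value, traceback):
        if self.owned is not None:
            self.owned.close()
            self.owned = None
        return False                # never swallow exceptions
```

Used as `with OpenedOrBorrowed(nml_fname) as nml_file:`, the same body works whether `nml_fname` is a path or an already-open file.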
Sorry for going on about this, I just don't want to inadvertently break anything in the future! I'll just revert any changes that I made and merge the pull request in. But do you think you could revert the changes on Also, if you have any idea how to introduce a test to check for unclosed file descriptors that would be useful, but it's not too important. (Mostly just to ensure that it doesn't get re-broken in the future.) |
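One possible shape for such a test, on CPython 3 only, is to record the `ResourceWarning` that the interpreter emits when a file object is destroyed without having been closed. The helper name `unclosed_file_warnings` is hypothetical, and the approach is an assumption rather than something taken from the project's test suite; on PyPy, finalizer timing is different, so the result may be less reliable there.

```python
import gc
import warnings


def unclosed_file_warnings(func, *args, **kwargs):
    """Run func and return the ResourceWarnings recorded while it ran.

    Relies on CPython 3 warning about files that are finalized while
    still open; gc.collect() also catches objects stuck in cycles.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        func(*args, **kwargs)
        gc.collect()
    return [w for w in caught if issubclass(w.category, ResourceWarning)]
```

A test would then assert that the returned list is empty after calling the reader on a sample namelist.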
Note those are how I implement
I'm not sure if perhaps there is some miscommunication here? The behavior in this patch should be 100% backwards compatible; I intended for it to behave identical to the way it did before in all situations that worked before. I.e. if nml_fname is "in.nml" it still defaults to "in.nml~". The new |
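The rule described in this thread (a path defaults to the same name with `~` appended; a file object has no default and raises `ValueError`) can be sketched as a small function. `default_patch_name` is a hypothetical name, and detecting a file object by its `read` attribute is an assumption, not necessarily how the patch does it.

```python
def default_patch_name(nml_fname, patch_fname=None):
    """Return the output path for a patch, or raise if none can be inferred."""
    if patch_fname is not None:
        return patch_fname
    if hasattr(nml_fname, 'read'):
        # A file object has no reliable path to derive a name from.
        raise ValueError(
            'patch_fname is required when nml_fname is a file object')
    return nml_fname + '~'
```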
I see that it works, sorry. I thought I had tested it out but must have overlooked the output. No worries. Thanks again for the work and the explanations, I will merge this in now. |
I'm getting the impression that the changes I made to edit: I will anyways, because I would hate to think that I contributed confusing code! So, the purpose of the At first, I tried a "minimalist" approach. I tried adding a
I bring this up in case you find it easier to read the above than the current code. To me however, I find this very difficult to read, so I got rid of the
Then this is
The purpose of the code from line 240-246 is to call
which is perhaps clearer. My apologies if I am drowning you in details you don't care to read about! |
No worries, more documentation is always better! I can see that the purpose is to avoid the extra carriage return. (I can also now see that the It does seem to me that flattening and rebuilding the namelist is a fair bit of extra work just to avoid a redundant carriage return. Though it's a pretty tiny amount of work or memory either way, of course. (Nearly all the time is in the rather slow tokenizing of As you say, the real issue is the closing and re-opening of the target file, which is probably a bad idea if the target is an already-opened file object. Maybe it would be ok to do the I might have a play with it to avoid building and using |
Honestly, part of me is tempted to just go back to the old behaviour which did not have spaces between the groups. |
I have an implementation which uses a flag inside the class to track when a newline is required. It's under the newline branch, diff is here: https://github.com/marshallward/f90nml/compare/newline If it looks OK to you, then I'll swap to this version. |
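The flag idea can be illustrated in isolation. The sketch below is not the code on the newline branch; `GroupWriter`, `write_group`, and `newline_pending` are hypothetical names, and the group layout is simplified. The blank line is written before each group except the first, so the output never ends with a stray empty line and the target never has to be reopened.

```python
class GroupWriter(object):
    """Write namelist-like groups separated by a single blank line."""

    def __init__(self, stream):
        self.stream = stream
        self.newline_pending = False    # set once a group has been written

    def write_group(self, name, lines):
        if self.newline_pending:
            # Blank line before every group but the first.
            self.stream.write('\n')
        self.stream.write('&{0}\n'.format(name))
        for line in lines:
            self.stream.write('    {0}\n'.format(line))
        self.stream.write('/\n')
        self.newline_pending = True
```

Because the writer only ever appends to `stream`, it works the same for a path opened by the library and for a file object handed in by the caller.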
Seems fair to me! |
Cool, I have merged it in. Thanks again for submitting this and going through it with me, it's often no fun to troll through someone else's old code. |
Addresses #25 (although there is no string API here, at least yet)