Fix of #3240 (Drop Temporary file usage while reading data) #3346
… objects in obspy.core.util.base._generic_reader
…o into test_reading_string_io (as implemented in nordic/core.py
Hi, I don't have any objections to the changes made to the gcf reader, though the guard added in lines 214 and 217 seems a bit superfluous; perhaps I am missing something here. Could you please give some more details on what was cryptic in the code base? I can perhaps walk you through it.
Hi
The try catch in lines 214-217 (I guess you refer to that) is a simple workaround to prevent a
The problem with gcf is that I do not see where the passed file (currently a Lines 210 to 212 in 417b047
It looks like everything is delegated to
Thank you very much in advance.
PS: Writing to TemporaryFile for gcf files only might be a workaround for the function
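The TemporaryFile workaround mentioned in the PS could look roughly like the sketch below. It is only an illustration under assumptions: `read_via_temp_path` and `fake_path_reader` are hypothetical names, and `fake_path_reader` merely stands in for a reader that, like the gcf one, only accepts a file path.

```python
import io
import os
import tempfile


def read_via_temp_path(source, path_reader):
    """Call ``path_reader(path)`` for a path or a binary file-like object.

    ``path_reader`` is a hypothetical stand-in for a reader that can only
    open files by name (not the actual gcf reader).
    """
    if isinstance(source, (str, os.PathLike)):
        # Already a path: no temporary file needed.
        return path_reader(os.fspath(source))

    # File-like object: spill its bytes to a named temporary file so the
    # path-only reader can open it, then always clean up afterwards.
    tmp = tempfile.NamedTemporaryFile(suffix=".gcf", delete=False)
    try:
        with tmp:
            tmp.write(source.read())
        # The file is closed here, so it can be reopened by name on any OS.
        return path_reader(tmp.name)
    finally:
        os.unlink(tmp.name)


def fake_path_reader(path):
    # Placeholder for the real path-only reader: just return the raw bytes.
    with open(path, "rb") as fh:
        return fh.read()


data = read_via_temp_path(io.BytesIO(b"example bytes"), fake_path_reader)
```

This keeps the disk write confined to the one format that needs it, instead of using it as a general fallback for every format.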
Hi, first of all, thanks for working on this; I think it'll be a nice contribution. The Python part of the code sends the file path to the underlying C code, where the file is opened, see: obspy/obspy/io/gcf/src/gcf_io.c Lines 534 to 540 in 417b047
where in order to have the C code compile on (most) platforms. So what needs to be done to the code is:
Also in |
Thanks @paitor for the suggestion, it really helped. Eventually I opted to stay away from the rabbit hole of binary streams, file descriptors, C and OS compatibility (which I tried to sort out without success). I therefore proceeded to:
Tests are passing on my local machine (by the way, why are they not passing on the GitHub CI?)
Hi, I don't see any problems with the changed code but, out of curiosity, did you try to benchmark the performance? One of the reasons for asking is that I moved the code from a pure Python implementation to an underlying C implementation because the C implementation gave an approximately 80-fold speed-up in reading gcf data. It would be interesting to see whether the update affects performance and, if so, whether the loss outweighs the gain. I have never done a review before, so I leave this to the other reviewers.
…tream + fix several docstrings
@paitor On my computer (macOS, 2.6 GHz 6-Core Intel Core i7):
Results:
this PR version:
So I do not see any hint of a significant performance difference. However, we should always weigh any performance change against the improvements this PR aims to give, not only in computing speed but also in time saved by not having to implement custom code or wrapper functions outside obspy to avoid needlessly writing several files to disk (as happened to us). I am trying to refactor the text-file readers because I saw some improvements that could be made to the code. I would also wait for other reviewers, because the PR will probably need some feedback.
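For anyone who wants to repeat this kind of comparison, a minimal timing harness could look like the sketch below. It is an assumption-laden illustration, not the script used for the numbers in this thread: `benchmark` and `parse` are hypothetical names, and `parse` is a placeholder workload that would be replaced by the actual read call on each branch.

```python
import io
import timeit


def benchmark(func, repeat=5, number=20):
    """Return the best average time per call, in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum over the repeats is the least noisy estimate.
    return min(timings) / number


def parse(data):
    # Placeholder workload standing in for the real reader.
    return sum(data)


payload = bytes(range(256)) * 1000

# Same workload fed from bytes in memory and through a file-like object.
t_bytes = benchmark(lambda: parse(payload))
t_stream = benchmark(lambda: parse(io.BytesIO(payload).read()))

print(f"bytes:  {t_bytes * 1e3:.3f} ms per call")
print(f"stream: {t_stream * 1e3:.3f} ms per call")
```

Running the same harness on master and on this branch, with the same input file, gives directly comparable per-call timings.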
Any feedback?
Sorry for not seeing this earlier. I'll give this a proper look and review soon! 😬
Looks like a nice further improvement on unifying reading/writing routines. I tried to be as thorough as possible since this touches the very core of things, but in some instances it's hard to think everything through, so to some extent we'll have to rely on our test suite.
There are some comments that need addressing, but overall I don't see a reason not to merge this after that. 👍
Oh, and this might need a rebase eventually too.
Thanks @megies, I'll try to go through all your comments in the next few days and commit changes to this PR.
It's in the GitHub milestone for 1.5.0 and it won't get left out for sure, no worries.
Fixed the last comments from @megies (thanks for the review), apart from the ascii question (see my comment on the only unresolved issue above). As a side note, just to be safe: as @megies pointed out, this PR touches the very core of things and none of us is an expert in everything. If you know anyone experienced in specific code blocks who could check the modifications (or run this branch in their usual workflow) to see whether additional tests are needed, I would be happy to have them invited to the discussion.
What does this PR do?
This PR tries to fix the first issue (point 1) in #3240: it now allows reading seamlessly from both file-like objects and file paths, and eliminates the highly inefficient fallback of writing data to disk on TypeErrors (if these errors happen now, there is no need for the fallback, and read simply raises them).
Note for reviewers:
I could not add support for file-like objects for the 'GCF' format because the code in the related module was quite cryptic; as such, this format still accepts only file paths as argument.
In view of the two points above, I am available to improve my PR or provide an additional one, but some discussion is probably needed. I am also available via Zoom if needed, as the problem might require a round table and some pair programming.
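The behaviour described under "What does this PR do?" can be illustrated with a minimal sketch. This is not the code in this PR: `open_binary` and `read_header` are hypothetical names, and the sketch only shows the general idea of accepting either a path or a file-like object while letting errors propagate instead of retrying through a temporary file.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def open_binary(source):
    """Yield a binary file object for a path or an open file-like object."""
    if isinstance(source, (str, os.PathLike)):
        # We opened the file, so we are responsible for closing it.
        with open(source, "rb") as fh:
            yield fh
    elif hasattr(source, "read"):
        # The caller owns the object: use it as is and do not close it.
        yield source
    else:
        raise TypeError(
            f"expected a path or file-like object, got {type(source).__name__}"
        )


def read_header(source, size=4):
    """Hypothetical format reader: read the first ``size`` bytes.

    Any error raised here reaches the caller directly; there is no
    fallback that writes the data to disk and tries again.
    """
    with open_binary(source) as fh:
        return fh.read(size)


header = read_header(io.BytesIO(b"HEADpayload"))
```

With this shape, a reader written once against a file object works unchanged for paths, `io.BytesIO` buffers and other streams.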
Why was it initiated? Any relevant Issues?
See point 1. of #3240
PR Checklist
Just add the build_docs tag to this PR.
Docs will be served at docs.obspy.org/pr/{branch_name} (do not use master branch).
Please post a link to the relevant piece of documentation.
just add the test_network tag to this PR.
from all the CI builds look correct. Add the "upload_plots" tag so that plotting
outputs are attached as artifacts.