Reading SEGY not in shot sort #1

Open
jemla61 opened this issue Oct 3, 2019 · 7 comments

jemla61 commented Oct 3, 2019

Hi,

I'm trying to use SegyIO to read a SEGY file that is not sorted by source location, e.g. one sorted by CDP/offset, or a post-stack 3D volume. From the source code, it looks like the primary sort headers are hard-coded as changes in (SourceX, SourceY). Is there a way to change it to use different headers?
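
For illustration, here is a minimal, package-independent sketch of what delimiting on a header pair means (the header names and values below are made up for the example, not SegyIO internals):

```julia
# Return the first trace index of each ensemble, where an ensemble is a run of
# consecutive traces whose delimiter headers all stay constant.
# `headers` maps a header name to its per-trace values (illustrative layout only).
function ensemble_starts(headers::Dict{String,Vector{Int}}, delim_keys::Vector{String})
    n = length(headers[delim_keys[1]])
    starts = [1]
    for i in 2:n
        if any(headers[k][i] != headers[k][i-1] for k in delim_keys)
            push!(starts, i)
        end
    end
    return starts
end

# Delimiting on the source position (the current hard-coded behaviour)
# versus delimiting on CDP for data stored in CDP/offset order:
hdr = Dict("SourceX" => [0, 0, 0, 10, 10, 10],
           "SourceY" => [0, 0, 0, 0, 0, 0],
           "CDP"     => [100, 100, 101, 101, 102, 102])

ensemble_starts(hdr, ["SourceX", "SourceY"])  # [1, 4]
ensemble_starts(hdr, ["CDP"])                 # [1, 3, 5]
```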

Thanks for your help, and thanks for making this package available as open source - great work!

Marc.

henryk-modzelewski (Member) commented Oct 4, 2019

Marc,

I am not the original author of the code (he is not with SLIM anymore), but I do not believe that segy_read does any sorting. The trace headers and data are filled in the order in which the traces are read.

Am I missing something?

Henryk

jemla61 (Author) commented Oct 5, 2019

Hi Henryk,

I was referring to scanning a file that is stored on disk and is ordered by headers other than SourceX/SourceY. For example, pre-stack datasets after final processing are usually saved on disk sorted by CDP/Offset, or a post-stack 3D volume would be ordered by Inline/Crossline. To scan these files, the headers to check for the start of a new ensemble (gather) would be CDP and Inline.

You can see an example if you run segy_scan() on the post-stack 3D volume in
src/data/testdata.segy. Since the Sx,Sy headers change with every trace, it creates a SeisCon object with elements for all the 40,000 traces in the file. It would be more useful to have 400 ensembles, one for each inline.

It would be great if segy_scan() could take another argument that tells it which headers to use to delimit the ensembles, instead of the default Sx,Sy. I've taken a closer look at the code and it looks doable. I can give it a try and let you know how it works (I can send you the updated code if you like).
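
For illustration, a minimal sketch of the two behaviours; the segy_scan(dir, file_filter, keys) call follows the form used in the SegyIO documentation, while the delim_keys keyword and the "Inline3D" header name are hypothetical, only there to show the proposed extension:

```julia
using SegyIO

datadir = joinpath(dirname(pathof(SegyIO)), "data")   # src/data in the repo checkout

# Current behaviour: ensemble boundaries are detected on changes in SourceX/SourceY,
# so a volume stored in inline/crossline order scans into one element per trace.
s = segy_scan(datadir, "testdata", ["SourceX", "SourceY"])

# Hypothetical extension (not in SegyIO today): caller-supplied delimiter headers,
# which for this file would give one element per inline instead of one per trace.
# s = segy_scan(datadir, "testdata", ["SourceX", "SourceY"]; delim_keys = ["Inline3D"])
```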

Another thing I'm working on is modifying segy_read() so it will read only a single ensemble at a time, instead of the entire data file. This would make it possible to read a file that is too big to fit in memory by looping over all the ensembles, which would again be delimited by a pair of headers that may be different from Sx,Sy.

Marc.

henryk-modzelewski (Member) commented Oct 5, 2019

Marc,

Got it. Thanks for the patient explanations and willingness to contribute. We are happy to see it used and improved.

We built this package to make our lives easier, but we always read data in a certain way, and I am not surprised that it is not the only way somebody would like to use it. I would love to chat about what would be useful, familiar, and easy for users.

An ensemble-level segy_read should be easy to implement, but keep in mind that we intend to move to variable trace length, so it would be nice to keep the design flexible enough to accommodate that. The funny part is that segy_read was never intended for large data in memory (it was just a quick side addition); SeisCon was designed for that. But if segy_read can be made more useful, then why not.

I have not looked at segy_scan yet.

If you intend to start contributing, the only request I have is that you use multiple dispatch, as much as is reasonable, to implement the different flavours of the functions.

BTW, did you have a chance to look at https://github.com/jpjones76/SeisIO.jl ? I was thinking about rebuilding SegyIO using utilities from that package, since they support more SEG-Y versions.

Thanks again,
Henryk

jemla61 (Author) commented Oct 7, 2019

Henryk,
Thanks for the additional info. I'll definitely keep variable trace length and multiple dispatch in mind for any changes I make.
Since this is getting a bit off-topic as an "issue", I'll contact you directly at the email address I found on GitHub.
Marc.

klensink (Contributor) commented Oct 8, 2019

Hi Marc,

I am the original author Henryk mentioned previously; even though I'm no longer with SLIM, I'm happy to discuss extending the package. Unfortunately your observation is correct: I was a little short-sighted and hard-coded the automatic block detection to look only at source locations.

I believe all you would need to modify is this block of code. It is already based on keys, so it shouldn't be much work to change the function call to accept an arbitrary set of keys.

In the meantime, you do not need to use the automatic chunking and can instead pass a blocksize (Int) value to segy_scan. This will return contiguous chunks of blocksize traces from the file in whatever order they are stored. The downside of this method is that it is only really effective if each group in the SEGY file has the same number of traces.
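
For example (a sketch that assumes blocksize is passed as the trailing positional argument; the exact argument position may differ):

```julia
using SegyIO

datadir = joinpath(dirname(pathof(SegyIO)), "data")

# Fixed-size chunking: group every 200 consecutive traces into one block,
# regardless of how the file is sorted. Effective only when each group
# in the file holds the same number of traces.
s = segy_scan(datadir, "testdata", ["SourceX", "SourceY"], 200)
```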

Regarding modifying segy_read to give out-of-core access to ensembles in the underlying file, I'd just like to point out that this is precisely the core functionality of SeisCon objects and segy_scan. Once you've scanned the file, indexing into the SeisCon object reads the corresponding ensemble out of the file, without reading the whole thing. That said, I think you may be proposing scanning the file on the fly and loading blocks one by one. If that is the case, I'd be happy to talk about the idea in a new issue, although I'm not sure I see a use case where scanning the file on the fly would be required.
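
A sketch of that existing out-of-core workflow; the segy_scan arguments follow the earlier example, and the .data field access is an assumption about the SeisBlock layout:

```julia
using SegyIO

datadir = joinpath(dirname(pathof(SegyIO)), "data")
s = segy_scan(datadir, "testdata", ["SourceX", "SourceY"])

# Indexing the SeisCon reads only the corresponding block from disk,
# not the whole file, so large files can be worked through piecewise.
block   = s[1]         # a SeisBlock holding just that ensemble/chunk
samples = block.data   # trace samples for that block only
```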

jemla61 (Author) commented Oct 8, 2019

Hi Keegan,

Thanks for your comments. I found the block of code you pointed to and agree that it won't be too much work to add a caller-supplied list of keys, with Sx,Sy as the default, as before.

As I've said offline to Henryk, I'm looking at using SegyIO as an import/export module where an entire data file is read sequentially and converted to a new, HDF5-based data format I'm working on (SEGY is not a great choice for processing!). Since the whole file needs to be converted in the same sort order, a prior scan isn't needed, and its overhead may be significant for large marine data sets. Having the option to run a scan for scattered reads is a great feature (e.g. for QC), but we wouldn't use it that often.

I've just finished two new functions: segy_open(), which opens the input file and reads the binary header, and read_ensemble(), which reads traces from the file sequentially until a caller-supplied list of headers changes, indicating the end of the ensemble. The whole ensemble is then returned in a SeisBlock object. The maximum number of traces per ensemble needs to be known in order to pre-allocate the arrays, but a variable number of traces is supported as long as it stays below that maximum. I still need to do some more testing and code clean-up, but I should be able to show it to you for comments soon.
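
For illustration, a hypothetical usage sketch of the two proposed functions; the names come from the description above, but the signatures, return values, and header names are illustrative only:

```julia
using SegyIO

# Open the file and read the file/binary header once, keeping the stream open
# (the (fileheader, io) return shape is an assumption for this sketch).
fileheader, io = segy_open("big_prestack_file.segy")

while !eof(io)
    # Read traces sequentially until the delimiter headers (here CDP and offset)
    # change; the finished ensemble comes back as a SeisBlock. max_traces bounds
    # the pre-allocated arrays.
    ensemble = read_ensemble(io, fileheader, ["CDP", "Offset"]; max_traces = 2000)
    # ... convert `ensemble` to the HDF5-based format here ...
end

close(io)
```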

Thanks again,
Marc.

klensink (Contributor) commented Oct 9, 2019

Hi Marc,

Thanks for explaining your use case; I see now why you would need this functionality. I've thought about it a bit more, and I think this should be possible to implement mostly by making use of functions that have already been written. In terms of consistency and maintainability, I think it makes the most sense to reuse the defined modules wherever possible.

I think that you could get what you want by just adapting scan_shot from this line downwards, to turn it into some kind of iterator that reads and returns a block, rather than scanning it.
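
A rough sketch of what that iterator shape could look like; the EnsembleReader type and the read_next_ensemble helper are hypothetical, only there to show Julia's iteration protocol, and the real version would reuse the trace-header reading code inside scan_shot:

```julia
# Hypothetical wrapper that yields one ensemble (as a block) at a time
# from an already-opened SEG-Y stream, without scanning the file first.
struct EnsembleReader
    io::IO                      # stream positioned at the first trace
    delim_keys::Vector{String}  # headers whose change ends an ensemble
end

function Base.iterate(r::EnsembleReader, state = nothing)
    eof(r.io) && return nothing
    block = read_next_ensemble(r.io, r.delim_keys)  # hypothetical helper
    return block, nothing
end

Base.IteratorSize(::Type{EnsembleReader}) = Base.SizeUnknown()

# Usage: stream through the file one ensemble at a time, e.g. to convert it.
# for block in EnsembleReader(io, ["CDP", "Offset"])
#     convert_to_hdf5(block)   # hypothetical sink
# end
```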
