
segy: struct.error: 'h' format requires -32768 <= number <= 32767 #1393

Closed
josephjamesfarrugia opened this issue May 4, 2016 · 19 comments · Fixed by #2358

@josephjamesfarrugia

josephjamesfarrugia commented May 4, 2016

ObsPy version 1.0.1, Python version 2.7.6, OS X.
ObsPy was installed via pip, and Python from the Python website.

Essentially, I'd like to use ObsPy to merge all the traces in a Stream object and then write the merged stream to a .sgy file. Writing the merged stream to a .txt file works, but writing the .sgy file fails.

Below is a copy of my code:

#!/usr/bin/env python

# IMPORT RELEVANT SCRIPT PACKAGES
import os  # Operating system dependent functionality
from obspy.io.segy.core import _read_segy

data_dir = '/Users/josephfarrugia/Dropbox/Masters_Work/Ontario Site Response Field Campaign/'
filename = ['200', '201']  # File names (without the .sgy extension)

for i in range(0, len(filename), 1):
    original_segy = os.path.join(data_dir, filename[i] + '.sgy')
    st = _read_segy(original_segy)

    merged_st = {}
    for x in range(1, 13, 1):  # x = 1..12 selects channels 13..24 (stream indices 12..23)
        print('Writing Channel %d to .txt File' % (x + 12))  # Indicating what channel is being merged/written to .txt
        # Every 24th trace starting at index (x - 1) + 12 belongs to the same channel; merge them. See:
        # https://docs.obspy.org/packages/autogen/obspy.core.stream.Stream.merge.html for more information
        merged_st[x] = st[((x - 1) + 12):len(st):24].merge(method=1, fill_value=None, interpolation_samples=2)
        merged_st[x].write('%s_channel_%d.txt' % (filename[i], x), format='TSPAIR')  # Write the new .txt file

        print('Writing Channel %d to .sgy File' % (x + 12))  # Indicating what channel is being merged/written to .sgy
        merged_st[x].write('%s_channel_%d.sgy' % (filename[i], x), format="SEGY")  # Write the new .sgy file

I receive the following error:

Traceback (most recent call last):
  File "/Users/josephfarrugia/Dropbox/Masters_Work/Python/segyconcat.py", line 32, in <module>
    merged_st[x].write('%s_channel_%d.sgy' % (filename[i], x), format="SEGY")  # Write the new .sgy file
  File "/usr/local/lib/python2.7/site-packages/obspy/core/stream.py", line 1444, in write
    write_format(self, filename, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/obspy/io/segy/core.py", line 435, in _write_segy
    segy_file.write(filename, data_encoding=data_encoding, endian=byteorder)
  File "/usr/local/lib/python2.7/site-packages/obspy/io/segy/segy.py", line 232, in write
    self._write(file, data_encoding=data_encoding, endian=endian)
  File "/usr/local/lib/python2.7/site-packages/obspy/io/segy/segy.py", line 272, in _write
    self.binary_file_header.write(file, endian=endian)
  File "/usr/local/lib/python2.7/site-packages/obspy/io/segy/segy.py", line 415, in write
    file.write(pack(format, getattr(self, name)))
struct.error: 'h' format requires -32768 <= number <= 32767

I'm fairly new to Python, but I have experience with MATLAB. From my research, I don't think this is a bug in Python itself. Hoping someone knows how to fix the problem!

You can download the data files needed to reproduce the error with my script here:
https://www.dropbox.com/s/7m5wbb1hkmmhok8/200.sgy?dl=0
https://www.dropbox.com/s/sdttwiurxhjwgcg/201.sgy?dl=0

@claudiodsf
Member

Hi, here's a more Pythonic way of writing your code. It does not resolve the issue, but I hope it's a bit clearer for others.

A few notes:

  • You should use the generic ObsPy function read() instead of _read_segy()
  • There's no need to put your merged stream into a dictionary (merged_st = {}).
  • You need to copy the original stream before manipulating it: merged_st = st[((x - 1) + 12)::24].copy()
#!/usr/bin/env python
from obspy import read

filename = ['200', '201']

for fname in filename:
    st = read(fname + '.sgy')

    for x in range(1, 13, 1):
        print('Writing Channel %d to .txt File' % (x + 12))
        merged_st = st[((x - 1) + 12)::24].copy()
        merged_st.merge(method=1, fill_value=None, interpolation_samples=2)
        merged_st.write('%s_channel_%d.txt' % (fname, x), format='TSPAIR')

        print('Writing Channel %d to .sgy File' % (x + 12))
        merged_st.write('%s_channel_%d.sgy' % (fname, x), format="SEGY")

As mentioned above, this code still fails with the same error. I guess it's related to some invalid value in your stream...

@josephjamesfarrugia
Author

Thanks!

Joseph Farrugia
M.Sc. Candidate, Geophysics
Engineering Seismology
Department of Earth Sciences (BGS 1033)
Western University

On May 4, 2016, at 1:06 PM, Claudio Satriano notifications@github.com wrote:

!/usr/bin/env python

from obspy import read

filename = ['200', '201']

for fname in filename:
st = read(fname + '.sgy')

for x in range(1, 13, 1):
    print('Writing Channel %d to .txt File' % (x + 12))
    merged_st = st[((x - 1) + 12)::24].copy()
    merged_st.merge(method=1, fill_value=None, interpolation_samples=2)
    merged_st.write('%s_channel_%d.txt' % (fname, x), format='TSPAIR')

    print('Writing Channel %d to .sgy File' % (x + 12))
    merged_st.write('%s_channel_%d.sgy' % (fname, x), format="SEGY")

@bsmithyman

I suspect it's the sample rate setting: SEG-Y stores the sample interval as a signed short int in microseconds, so it has to be between 1 µs and 32.767 ms. This is a common problem when going to/from data that don't fit the assumptions of SEG-Y. You can divide the dt header by 1000, in which case anything in Hz becomes kHz in your SEG-Y workflow.
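To make the constraint concrete, here is a small plain-Python sketch (not ObsPy code) that converts a sampling rate to a sample interval in whole microseconds and checks it against the signed 16-bit range described above. The helper names `sample_interval_us` and `fits_signed_short` are hypothetical, and rounding to the nearest microsecond is an assumption. For the 500 Hz data in this thread the interval is 2000 µs, which fits comfortably.

```python
# Sketch: does a sampling rate give a SEG-Y sample interval (in microseconds)
# that fits in a signed 16-bit integer?
# Assumption: the interval is rounded to the nearest whole microsecond.

SHRT_MIN, SHRT_MAX = -32768, 32767


def sample_interval_us(sampling_rate_hz):
    """Return the sample interval in whole microseconds (hypothetical helper)."""
    return int(round(1e6 / sampling_rate_hz))


def fits_signed_short(value):
    """True if value could be packed with struct format 'h'."""
    return SHRT_MIN <= value <= SHRT_MAX


for rate in (500.0, 100.0, 10.0):
    dt_us = sample_interval_us(rate)
    print(rate, dt_us, fits_signed_short(dt_us))
```

Only very low sampling rates (below roughly 30.5 Hz) push the interval past 32767 µs under this assumption.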

@josephjamesfarrugia
Author

I'll give that a shot! Thanks Brendan!


@josephjamesfarrugia
Author

josephjamesfarrugia commented May 5, 2016

Hey Brendan,

I did as you suggested, and set the dt header to dt/1000 (2000/1000). I still get the same error!

#!/usr/bin/env python
from obspy import read

filename = ['201']

for fname in filename:
    st = read(fname + '.sgy')

    for x in range(1, 13, 1):
        print('Writing Channel %d to .txt File' % (x + 12))
        merged_st = st[((x - 1) + 12)::24].copy()
        merged_st.merge(method=1, fill_value=None, interpolation_samples=2)
        merged_st[0].stats.segy.trace_header['sample_interval_in_ms_for_this_trace'] = 2
        merged_st.write('%s_channel_%d_copy.txt' % (fname, x), format='TSPAIR')

        print('Writing Channel %d to .sgy File' % (x + 12))
        merged_st.write('%s_channel_%d_copy.sgy' % (fname, x), format="SEGY")

Here's all the header information from the traces:

AttribDict({'receiver_group_elevation': 0, 'ensemble_number': 0, 'unassigned': '\x00\x00\x00\x00\x00\x00\x00\x00', 'sweep_length_in_ms': 0, 'data_use': 0, 'original_field_record_number': 202, 'year_data_recorded': 16, 'datum_elevation_at_receiver_group': 0, 'day_of_year': 119, 'hour_of_day': 12, 'sample_interval_in_ms_for_this_trace': 2000, 'number_of_samples_in_this_trace': 16000, 'taper_type': 0, 'x_coordinate_of_ensemble_position_of_this_trace': 0, 'gap_size': 0, 'geophone_group_number_of_trace_number_one': 0, 'low_cut_slope': 0, 'coordinate_units': 1, 'group_coordinate_y': 0, 'group_coordinate_x': 23, 'source_measurement_exponent': 0, 'instrument_early_or_initial_gain': 0, 'for_3d_poststack_data_this_field_is_for_in_line_number': 0, 'source_energy_direction_exponent': 0, 'distance_from_center_of_the_source_point_to_the_center_of_the_receiver_group': 0, 'trace_weighting_factor': 0, 'mute_time_end_time_in_ms': 0, 'trace_number_within_the_ensemble': 0, 'number_of_horizontally_stacked_traces_yielding_this_trace': 1, 'geophone_group_number_of_last_trace': 0, 'over_travel_associated_with_taper': 0, 'number_of_vertically_summed_traces_yielding_this_trace': 1, 'unpacked_header': None, 'scalar_to_be_applied_to_times': 0, 'scalar_to_be_applied_to_all_elevations_and_depths': 0, 'energy_source_point_number': 0, 'water_depth_at_source': 0, 'for_3d_poststack_data_this_field_is_for_cross_line_number': 0, 'notch_filter_frequency': 0, 'source_static_correction_in_ms': 0, 'instrument_gain_constant': 0, 'sweep_trace_taper_length_at_start_in_ms': 0, 'high_cut_frequency': 0, 'lag_time_B': 0, 'lag_time_A': 0, 'high_cut_slope': 0, 'minute_of_hour': 42, 'uphole_time_at_source_in_ms': 0, 'scalar_to_be_applied_to_all_coordinates': 1, 'shotpoint_number': 0, 'device_trace_identifier': 0, 'subweathering_velocity': 0, 'source_depth_below_surface': 0, 'trace_sequence_number_within_line': 0, 'sweep_trace_taper_length_at_end_in_ms': 0, 'delay_recording_time': -32000, 'weathering_velocity': 0, 
'source_coordinate_x': 16, 'source_coordinate_y': 0, 'source_type_orientation': 0, 'mute_time_start_time_in_ms': 0, 'sweep_frequency_at_end': 0, 'total_static_applied_in_ms': 0, 'time_basis_code': 1, 'group_static_correction_in_ms': 0, 'sweep_type': 0, 'surface_elevation_at_source': 0, 'alias_filter_frequency': 0, 'low_cut_frequency': 0, 'endian': u'>', 'trace_identification_code': 1, 'source_measurement_mantissa': 0, 'scalar_to_be_applied_to_the_shotpoint_number': 0, 'source_measurement_unit': 0, 'source_energy_direction_mantissa': 0, 'second_of_minute': 48, 'trace_sequence_number_within_segy_file': 0, 'transduction_constant_exponent': 0, 'alias_filter_slope': 0, 'sweep_frequency_at_start': 0, 'uphole_time_at_group_in_ms': 0, 'gain_type_of_field_instruments': 0, 'trace_value_measurement_unit': 0, 'trace_number_within_the_original_field_record': 24, 'transduction_units': 0, 'y_coordinate_of_ensemble_position_of_this_trace': 0, 'notch_filter_slope': 0, 'geophone_group_number_of_roll_switch_position_one': 0, 'correlated': 0, 'datum_elevation_at_source': 0, 'water_depth_at_group': 0, 'transduction_constant_mantissa': 0})

I think I changed the right attribute.

@megies
Member

megies commented May 5, 2016

@josephjamesfarrugia, the download links to the files are not working for me.

Also, my gut feeling is that this is the same problem as in #1385.

@megies megies added the .io.segy label May 5, 2016
@josephjamesfarrugia
Author

@megies I've updated the links so they should work now!

@josephjamesfarrugia
Author

@megies @bsmithyman
Following what I could understand from the example in #1385, I attempted to trim the stream object before writing to SEGY. Again, I'm able to write the TXT file, but writing to SEGY crashes with the same error:

struct.error: 'h' format requires -32768 <= number <= 32767

Copy of the current code is below:
segyconcat.txt

@josephjamesfarrugia
Author

josephjamesfarrugia commented May 5, 2016

Update: I changed the last line of my code to:

merged_st.write('%s_channel_%d_copy.sgy' % (fname, x), format="SEGY", data_encoding=1, byteorder=sys.byteorder)

And now I receive the following error (same as #1385):

struct.error: short format requires SHRT_MIN <= number <= SHRT_MAX

@josephjamesfarrugia
Author

josephjamesfarrugia commented May 5, 2016

Also, each trace is long (approximately 45 minutes). Might that be causing the error?

Update: I was ABLE to write a 30-second trace to SEGY. I'm thinking there must be a file size limit.

@josephjamesfarrugia
Author

Update: It's the number of samples.

1 Trace(s) in Stream:
Seq. No. in line:    0 | 2016-04-28T11:30:57.000000Z - 2016-04-28T11:31:57.998000Z | 500.0 Hz, 30500 samples
Writing Channel 13 to .txt File
Writing Channel 13 to .sgy File

If I set the record length to just one minute (roughly two traces), the number of samples (30500) stays below the maximum number of samples allowed (32767, the limit behind the original error: struct.error: 'h' format requires -32768 <= number <= 32767), and the write succeeds.

I gradually increased the record length until the number of samples exceeded 32767, and the write to SEGY failed as expected.
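The arithmetic behind this finding can be sketched in a few lines of plain Python. This assumes, as stated in this thread, that the sample count is stored as a signed 16-bit integer (maximum 32767) and that a record of duration T seconds at rate f contains T·f + 1 samples (both endpoints included, matching the ObsPy printout above). The helper names are hypothetical. At 500 Hz a single trace can span at most about 65.5 s, while a 45-minute record needs roughly 1.35 million samples.

```python
# Sketch: how many samples a record needs, and the longest record that
# still fits in a signed 16-bit sample count.
# Assumption: samples are counted with both endpoints included.

SHRT_MAX = 32767


def n_samples(duration_s, sampling_rate_hz):
    """Number of samples in a record of duration_s seconds (endpoints included)."""
    return int(round(duration_s * sampling_rate_hz)) + 1


def max_duration_s(sampling_rate_hz, max_samples=SHRT_MAX):
    """Longest record (in seconds) whose sample count still fits."""
    return (max_samples - 1) / sampling_rate_hz


rate = 500.0
print(n_samples(60.998, rate))    # the one-minute record shown above
print(n_samples(45 * 60, rate))   # a ~45-minute record
print(max_duration_s(rate))       # upper limit per trace at 500 Hz
```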

So I think the solution is to resample the stream object, and I'll close this issue. However, I get a new error when trying to write the resampled stream object.

Thanks everyone for your time and input.

@krischer
Member

krischer commented May 6, 2016

Reopening as this definitely requires a better error message.

Also: we currently write the number of samples as a signed short (hence the range from -32768 to 32767). We could store it as an unsigned short, which would double the effective range of allowed sample counts. The SEG-Y manual does not appear to specify whether to use signed or unsigned integers (it only says to use two's complement integers), so we might risk compatibility with other SEG-Y tools. Are there any SEG-Y experts around here who know the best way to handle this?
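The range difference between the two options can be illustrated with Python's standard `struct` module, which is what raises the error in the traceback above. This sketch packs a sample count big-endian (the byte order shown in the trace headers earlier in the thread) as a signed ('h') and an unsigned ('H') 16-bit field; `try_pack` is a hypothetical helper.

```python
# Sketch: signed ('h') vs unsigned ('H') 16-bit fields, big-endian.
import struct


def try_pack(fmt, value):
    """Pack value with fmt; return the bytes, or None if out of range."""
    try:
        return struct.pack(fmt, value)
    except struct.error:
        return None


npts = 40000  # more samples than a signed short can hold
signed = try_pack('>h', npts)
unsigned = try_pack('>H', npts)
print(signed)    # None: exceeds 32767
print(unsigned)  # b'\x9c@' (0x9C40)
```

An unsigned field would accept counts up to 65535, which doubles the limit but still falls far short of a 45-minute record at 500 Hz.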

@krischer krischer reopened this May 6, 2016
@bsmithyman

One of the clearer quick references I use is here, which is useful for the old formats. Now that I look at it, my mostly-compatible SEG-Y rev. 0 / rev. 1 library uses unsigned ints for dt and ns. I based that on the Seismic Unix headers and conventions, specifically SU/src/su/include/[tapebhdr.h,tapehdr.h]. So I think that would probably be my go-to open-source reference; also, SU is BSD-licensed, so it's a safe place to look without worrying about license violations. This is contrary to what I remembered off the top of my head, so my comments above should say 65535 µs (though, of course, it turns out that ns was the issue in this case).

@krischer
Member

krischer commented May 6, 2016

Thanks for the hint to look at the SU source code! I did not actually run any tests, but from looking at the code I don't think SU can deal with unsigned values, for the following reasons:

  • The SU/src/su/include/[tapesegy.h,tapebhdr.h] files define everything as unsigned, but when SU/src/su/lib/hdrpkge.c actually reads and writes the header its values are cast to and from signed (pointer) values.
  • Type definitions in Seismic Unix are for some reason available in lots of places, but I think the authoritative location is SU/src/su/include/hdr.h, which is used by the gethval() function, which only yields signed header values.

I guess we should just check whether Seismic Unix can deal with unsigned values. Signed and unsigned values are bit-for-bit identical as long as one stays within the non-negative range of the signed type.
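A quick way to see what this compatibility question means in practice: the sketch below (standard `struct` module, hypothetical helper names) confirms that 0..32767 encodes to the same bytes as signed and unsigned 16-bit values, and shows what a reader that casts to signed, as described for SU's header code above, would see if a value above 32767 had been written as unsigned.

```python
# Sketch: signed and unsigned 16-bit encodings share the same bytes for
# 0..32767; above that, bytes written as unsigned read back as negative
# when interpreted as signed.
import struct


def same_bytes(value):
    """True if value encodes identically as big-endian 'h' and 'H'."""
    return struct.pack('>h', value) == struct.pack('>H', value)


def read_as_signed(unsigned_value):
    """Write as an unsigned short, then read the bytes back as a signed short."""
    raw = struct.pack('>H', unsigned_value)
    return struct.unpack('>h', raw)[0]


print(all(same_bytes(v) for v in range(0, 32768)))  # True
print(read_as_signed(30500))   # 30500
print(read_as_signed(40000))   # -25536
```

So a file written with unsigned counts would only look wrong to a signed reader for traces longer than 32767 samples.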

Cheers!

@LKueperkoch

Everything looks fine in the code, but two things might be missing (that's what I additionally did to the data): trim all traces to get equal start and end times, and fill gaps with zeros.

@krischer
Member

In a recent discussion on the mailing list (http://lists.swapbytes.de/archives/obspy-users/2017-March/002358.html), it was pointed out that two's complement numbers are by definition signed numbers and that the SEG-Y spec states that all integers are two's complement integers.

ObsPy thus does the correct (and maximally compatible) thing. Users who want unsigned values will have to monkey-patch ObsPy to get that behaviour.

We still need a better error message though (also for #1396).
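One possible shape for a clearer error, sketched with hypothetical helpers (this is not ObsPy's actual fix): validate the sample count before anything is packed and raise a descriptive `ValueError` instead of a bare `struct.error`, and compute index ranges for splitting an over-long trace into pieces that fit. With ObsPy, each `(start, stop)` pair could then be used to build a shorter trace from `trace.data[start:stop]` with its start time shifted by `start * trace.stats.delta`; that wiring is an assumption, not shown here.

```python
# Sketch: pre-write validation with a descriptive message, plus index
# ranges for splitting an over-long trace. Hypothetical helpers, not ObsPy API.

SHRT_MAX = 32767


def check_npts(npts, max_npts=SHRT_MAX):
    """Raise a descriptive error instead of a bare struct.error."""
    if npts > max_npts:
        raise ValueError(
            "Trace has %d samples, but SEG-Y stores the sample count as a "
            "signed 16-bit integer (max %d). Slice the stream into shorter "
            "traces before writing." % (npts, max_npts))


def chunk_ranges(npts, max_npts=SHRT_MAX):
    """Return (start, stop) index pairs covering 0..npts, each at most max_npts long."""
    return [(start, min(start + max_npts, npts))
            for start in range(0, npts, max_npts)]


ranges = chunk_ranges(1350001)  # e.g. a ~45-minute record at 500 Hz
print(len(ranges), ranges[0], ranges[-1])
```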

@ThomasLecocq
Contributor

ThomasLecocq commented Mar 27, 2017

"struct.error: short format requires SHRT_MIN <= number <= SHRT_MAX"

"This error might occur because the traces within your stream have more than SHRT_MAX samples; try slicing the stream into traces with at most SHRT_MAX samples before saving to SEGY"

@megies megies added this to the 1.1.1 milestone Mar 27, 2017
This was referenced Feb 13, 2018
@megies megies changed the title struct.error: 'h' format requires -32768 <= number <= 32767 segy: struct.error: 'h' format requires -32768 <= number <= 32767 Feb 13, 2018
@megies megies modified the milestones: 1.1.1, 1.2.1 Apr 19, 2018
@megies
Member

megies commented Feb 20, 2019

Was this fixed by #2196, same as #2194? Can we close this ticket?

@megies megies modified the milestones: 1.2.1, 1.2.0 Feb 20, 2019
@megies megies added this to Waiting for Review in Release 1.2.0 Feb 20, 2019
@megies megies moved this from Waiting for Review to Waiting on CI in Release 1.2.0 Mar 15, 2019
@megies megies moved this from Waiting on CI to In Progress in Release 1.2.0 Mar 15, 2019
@megies megies self-assigned this Mar 15, 2019
@megies
Member

megies commented Sep 12, 2019

This has been worked on and improved in recent PRs. Closing.

@megies megies closed this as completed Sep 12, 2019
@megies megies moved this from In Progress to Done in Release 1.2.0 Sep 12, 2019

7 participants