rsync error in load-raster #854

Closed
edsu opened this issue Mar 7, 2024 · 7 comments · Fixed by #859

@edsu
Contributor

edsu commented Mar 7, 2024

This error popped up when testing remediation in stage. I'm not sure what's going wrong with this one, but maybe the stray single quote is causing a problem?

Error: load-raster : Command failed with exit 23: rsync -v '/var/geomdtk/current/tmp/normalizeraster_bh409ss0052/G5754_L7_1783_B6.tif'.aux.xml /var/geoserver/local/raster/geotiff/bh409ss0052.tif.aux.xml 

If you want to try accessioning the files again into stage you can find them in the "Difficult Data" folder as 002.zip: https://drive.google.com/file/d/14IjHTgerzRh0G8lM_DF9QkXnU8luI4o7/view?usp=drive_link

edsu added the bug label Mar 7, 2024
@peetucket
Member

Object: https://argo-stage.stanford.edu/view/druid:bh409ss0052

HB Alert: https://app.honeybadger.io/projects/52899/faults/105534189

@peetucket
Member

peetucket commented Mar 13, 2024

The rsync line triggers this exception because the file being copied does not exist. RasterNormalizer#compute_statistics should create this file, but, as I have verified manually, it is not generated for this particular TIFF after it has been compressed. It is generated if you use the original TIFF (before the compression step). As far as I can tell, the compute_statistics method raises no exceptions or errors, so it's unclear to me what is happening. We may need some help understanding those gdal commands a bit better.

Steps to reproduce for a test object in stage using the bad data: https://argo-stage.stanford.edu/view/druid:mj707mv2580

Walking manually through what is happening with the raster normalizer being triggered by the LoadRaster robot: https://github.com/sul-dlss/gis-robot-suite/blob/main/lib/gis_robot_suite/raster_normalizer.rb#L18-L28

cocina_object = Dor::Services::Client.object('druid:mj707mv2580').find;
rootdir = GisRobotSuite.locate_druid_path cocina_object.externalIdentifier.delete_prefix('druid:'), type: :stage
normalizer = GisRobotSuite::RasterNormalizer.new(logger: Logger.new(nil), cocina_object:, rootdir:)

tmpdir = normalizer.send(:tmpdir)
input_filepath = normalizer.send(:input_filepath)
output_filepath = normalizer.send(:output_filepath)

FileUtils.mkdir_p tmpdir

normalizer.send(:'epsg4326_projection?') # returns true, thus can just compress

normalizer.send(:compress_only) # copies file to tmp directory and compresses

normalizer.send(:'eight_bit?') # returns false, thus skip that method

normalizer.send(:compute_statistics) # should produce the .aux.xml file, but does not for compressed TIF, even though it does for the original tif

@peetucket
Member

So I don't think this is an rsync issue, but rather an issue with computing statistics in this line with this particular object's tif when it is compressed:

https://github.com/sul-dlss/gis-robot-suite/blob/main/lib/gis_robot_suite/raster_normalizer.rb#L81

I did leave the PR up that adjusts the rsync call, even though both styles work (the original and the one in my PR). I think it maybe looks cleaner to me. #857
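On the stray single quote raised in the original report: a Bourne-style shell joins an adjacent quoted segment and unquoted suffix into one word, which is consistent with both rsync styles working. A minimal sketch using Ruby's standard Shellwords library, which follows the Bourne shell word-parsing rules, with the source argument copied from the error message above:

```ruby
require "shellwords"

# Source argument exactly as it appears in the failing rsync command.
quoted = "'/var/geomdtk/current/tmp/normalizeraster_bh409ss0052/G5754_L7_1783_B6.tif'.aux.xml"

# The quoted segment and the unquoted ".aux.xml" suffix are joined into a
# single argument, so the quote placement does not change the path.
args = Shellwords.split(quoted)
p args
# => ["/var/geomdtk/current/tmp/normalizeraster_bh409ss0052/G5754_L7_1783_B6.tif.aux.xml"]
```

This only illustrates the word splitting; it does not run rsync or touch the filesystem.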

@edsu
Contributor Author

edsu commented Mar 13, 2024

Maybe I'm reading it wrong but I think if normalizer.send(:'epsg4326_projection?') returns false then it will reproject_and_compress?

@peetucket
Member

> Maybe I'm reading it wrong but I think if normalizer.send(:'epsg4326_projection?') returns false then it will reproject_and_compress?

Oh, you are correct; I had it wrong and will fix it in my comment. For that TIFF, normalizer.send(:'epsg4326_projection?') returns true. I just double-checked.
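The branching discussed in this thread can be sketched roughly as follows. This is a stand-in, not the real RasterNormalizer: only the method names quoted in the comments above (epsg4326_projection?, compress_only, reproject_and_compress, eight_bit?, compute_statistics) come from the thread, while the class name, the constructor, and the :convert_eight_bit step are hypothetical placeholders.

```ruby
# Hypothetical stand-in that records which steps would run, in order.
class NormalizerFlowSketch
  attr_reader :steps

  def initialize(epsg4326:, eight_bit:)
    @epsg4326 = epsg4326
    @eight_bit = eight_bit
    @steps = []
  end

  def normalize
    # Already in EPSG:4326: compress only; otherwise reproject and compress.
    @steps << (@epsg4326 ? :compress_only : :reproject_and_compress)
    # Placeholder for whatever the real class does with 8-bit rasters.
    @steps << :convert_eight_bit if @eight_bit
    # Statistics come last; this is the step expected to write the .aux.xml file.
    @steps << :compute_statistics
    @steps
  end
end

# The TIFF in this issue: epsg4326_projection? is true, eight_bit? is false.
p NormalizerFlowSketch.new(epsg4326: true, eight_bit: false).normalize
# => [:compress_only, :compute_statistics]
```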

edsu added a commit that referenced this issue Mar 13, 2024
Since load_raster expects to be able to rsync the statistics file that
is generated during the normalization step, we should have the
normalizer raise an error if it didn't get generated.

Refs #854
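
A minimal sketch of the idea in this commit message, assuming the statistics file sits next to the TIFF with an .aux.xml suffix as in the error above. The error class and helper name are hypothetical and are not taken from gis-robot-suite.

```ruby
require "tmpdir"

# Hypothetical error class; the real commit may raise something else.
class MissingStatisticsFile < StandardError; end

# Returns the statistics file path, or raises if it was never written.
def ensure_statistics_file!(tif_path)
  stats_path = "#{tif_path}.aux.xml"
  return stats_path if File.exist?(stats_path)

  raise MissingStatisticsFile, "expected #{stats_path} after computing statistics"
end

# Usage: with no .aux.xml next to the TIFF, the failure surfaces in the
# normalizer instead of later in the rsync call.
Dir.mktmpdir do |dir|
  tif = File.join(dir, "example.tif")
  begin
    ensure_statistics_file!(tif)
  rescue MissingStatisticsFile => e
    puts "normalization would stop here: #{e.message}"
  end
end
```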
@edsu
Contributor Author

edsu commented Mar 13, 2024

I can reproduce this problem with gdal. Download the data from Google Drive (it hasn't hit preservation yet so it's not available in Argo) and then:

$ unzip -d 002 002.zip
$ cd 002/bh409ss0052
$ gdal_translate -a_srs EPSG:4326 G5754_L7_1783_B6.tif G5754_L7_1783_B6-compressed.tif -co 'COMPRESS=LZW'
$ gdalinfo -mm -stats -norat -noct G5754_L7_1783_B6-compressed.tif

It doesn't create the G5754_L7_1783_B6-compressed.tif.aux.xml file like it should.

This is weird because running the same command on the uncompressed original tif does work:

gdalinfo -mm -stats -norat -noct G5754_L7_1783_B6.tif

will generate G5754_L7_1783_B6.tif.aux.xml
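
The same check could be scripted from Ruby. This is a rough sketch: the helper name and the injectable runner are assumptions made so the check can be exercised without GDAL installed, and the gdalinfo flags are the ones from the commands above. It assumes no stale .aux.xml file is already sitting next to the TIFF.

```ruby
require "open3"

# Hypothetical helper. By default it shells out to gdalinfo; a different
# runner can be passed in for testing.
def gdalinfo_wrote_stats?(tif_path, runner: ->(argv) { Open3.capture2e(*argv) })
  _output, status = runner.call(["gdalinfo", "-mm", "-stats", "-norat", "-noct", tif_path])
  # True only if the command succeeded and the sidecar file is now present.
  status.success? && File.exist?("#{tif_path}.aux.xml")
end

# Example calls (require gdalinfo on the PATH):
#   gdalinfo_wrote_stats?("G5754_L7_1783_B6.tif")
#   gdalinfo_wrote_stats?("G5754_L7_1783_B6-compressed.tif")
```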

@kimdurante do you remember any issues where gdalinfo doesn't generate the stats file?

edsu added a commit that referenced this issue Mar 14, 2024
If a stats file has not been generated (which can happen) don't try to
copy it.

Refs #854
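
A sketch of the skip-if-missing behavior described in this commit message. The helper name is hypothetical, and FileUtils.cp stands in for the rsync call the robot actually uses, so this shows the guard rather than the real transfer.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: copy the .aux.xml sidecar only when it exists.
# Returns true if a copy happened, false if it was skipped.
def copy_stats_file_if_present(tif_path, destination_tif_path)
  source = "#{tif_path}.aux.xml"
  return false unless File.exist?(source)

  # FileUtils.cp is a stand-in for rsync here.
  FileUtils.cp(source, "#{destination_tif_path}.aux.xml")
  true
end

# Usage: nothing is copied, and nothing blows up, when the sidecar is absent.
Dir.mktmpdir do |dir|
  tif = File.join(dir, "example.tif")
  dest = File.join(dir, "loaded.tif")
  puts copy_stats_file_if_present(tif, dest)
end
```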
@edsu
Contributor Author

edsu commented Mar 14, 2024

@kimdurante I've got a branch which simply skips trying to copy the stats file if it's not there, instead of blowing up. I accessioned the problematic data using Preassembly. Here is the object:

It seems to display just fine, which makes me wonder whether GeoServer really needs these stats files.

aaron-collier pushed a commit that referenced this issue Mar 15, 2024
* Skip copying stats file if it isn't there

If a stats file has not been generated (which can happen) don't try to
copy it.

Refs #854

* Remove stats generation

Since it didn't seem to make a difference to GeoServer per Kim we can
remove generating and copying the stats file altogether.