Upload failed: Exited with 124: Transcoding... #102

Closed
danbjoseph opened this issue Mar 1, 2018 · 23 comments

@danbjoseph

(Screenshot attached: screen shot 2018-03-01 at 8 36 48 am)

https://map.openaerialmap.org/#/upload/status/5a972ded2553e6000ce5ad39?_k=1b2vfw
size 8,192,105,472 bytes

Driver: GTiff/GeoTIFF
Files: mosaic-4326-order3-jpeg.tif
Size is 273238, 153942
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-72.342339328072867,18.706082688337968)
Pixel Size = (0.000000469128872,-0.000000469128872)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=JPEG
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( -72.3423393,  18.7060827) ( 72d20'32.42"W, 18d42'21.90"N)
Lower Left  ( -72.3423393,  18.6338641) ( 72d20'32.42"W, 18d38' 1.91"N)
Upper Right ( -72.2141555,  18.7060827) ( 72d12'50.96"W, 18d42'21.90"N)
Lower Right ( -72.2141555,  18.6338641) ( 72d12'50.96"W, 18d38' 1.91"N)
Center      ( -72.2782474,  18.6699734) ( 72d16'41.69"W, 18d40'11.90"N)
Band 1 Block=256x256 Type=Byte, ColorInterp=Red
  Mask Flags: PER_DATASET ALPHA 
Band 2 Block=256x256 Type=Byte, ColorInterp=Green
  Mask Flags: PER_DATASET ALPHA 
Band 3 Block=256x256 Type=Byte, ColorInterp=Blue
  Mask Flags: PER_DATASET ALPHA 
Band 4 Block=256x256 Type=Byte, ColorInterp=Alpha
@danbjoseph
Author

Processed it without compression; same error.

@danbjoseph
Author

Processed it with no compression and without the alpha band, and still received the same error.

size 126.4 GB

Driver: GTiff/GeoTIFF
Files: mosaic-4326-3band.tif
Size is 273238, 153942
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-72.342339328072867,18.706082688337968)
Pixel Size = (0.000000469128872,-0.000000469128872)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( -72.3423393,  18.7060827) ( 72d20'32.42"W, 18d42'21.90"N)
Lower Left  ( -72.3423393,  18.6338641) ( 72d20'32.42"W, 18d38' 1.91"N)
Upper Right ( -72.2141555,  18.7060827) ( 72d12'50.96"W, 18d42'21.90"N)
Lower Right ( -72.2141555,  18.6338641) ( 72d12'50.96"W, 18d38' 1.91"N)
Center      ( -72.2782474,  18.6699734) ( 72d16'41.69"W, 18d40'11.90"N)
Band 1 Block=256x256 Type=Byte, ColorInterp=Red
Band 2 Block=256x256 Type=Byte, ColorInterp=Green
Band 3 Block=256x256 Type=Byte, ColorInterp=Blue

@smit1678
Collaborator

smit1678 commented Mar 5, 2018

From @sharkinsspatial it looks like we might be running into a timeout issue: https://github.com/mojodna/marblecutter-tools/blob/master/bin/transcode.sh#L123. This was hardcoded at 1hr, any particular reason @mojodna?
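
For context, exit status 124 is what GNU coreutils `timeout` returns when the wrapped command exceeds its limit, which is consistent with the timeout theory above. A minimal Python sketch of the same pattern follows; `run_with_timeout` is a hypothetical helper for illustration, not the code in transcode.sh:

```python
import subprocess
import sys

TIMEOUT_EXIT_STATUS = 124  # status GNU `timeout` reports when the limit is hit


def run_with_timeout(cmd, limit_seconds):
    """Run cmd; return its exit status, or 124 if it exceeds limit_seconds."""
    try:
        return subprocess.run(cmd, timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return TIMEOUT_EXIT_STATUS


if __name__ == "__main__":
    # Stand-in for a long-running GDAL step: sleeps longer than the limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, limit_seconds=0.5))
```

A caller that sees 124 can then tell "ran out of time" apart from a genuine GDAL failure.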

@danbjoseph
Author

danbjoseph commented Mar 6, 2018

Is there a workaround for this so we can get this imagery up and shared? Seth is on vacation; does he need to be back for this to move ahead?

@smit1678
Collaborator

smit1678 commented Mar 7, 2018

Unless @sharkinsspatial or @dakotabenjamin have some other ideas, I think we have to have @mojodna look into this since we're running the transcoder off of the docker image at quay.io/mojodna/marblecutter-tools.

@mojodna
Collaborator

mojodna commented Mar 15, 2018

(Back now.)

> This was hardcoded at 1hr, any particular reason @mojodna?

I found that GDAL would stall (without exiting) on large images when not enough RAM was available. Most images of "reasonable size" should finish processing within an hour, hence the default. We should come up with a heuristic, based on the size of the input, for both the timeout and the amount of memory requested for the Batch job.
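
One possible shape for such a heuristic, as a rough sketch only: scale both values with the uncompressed size implied by the image dimensions and never go below the existing defaults. The function name and every constant here (`seconds_per_gb`, `memory_percent`, the minimums) are hypothetical placeholders, not measured or agreed values:

```python
def estimate_batch_resources(width, height, bands, bytes_per_sample=1,
                             seconds_per_gb=60, memory_percent=20,
                             min_timeout_s=3600, min_memory_mb=4096):
    """Guess a timeout and memory request from the uncompressed raster size.

    All defaults are illustrative placeholders (assumptions).
    """
    raw_bytes = width * height * bands * bytes_per_sample
    raw_mb = raw_bytes // 1_000_000
    # Time grows linearly with the raw size, but never below the old 1h default.
    timeout_s = max(min_timeout_s, raw_mb * seconds_per_gb // 1000)
    # Request a fixed percentage of the raw size as memory, with a floor.
    memory_mb = max(min_memory_mb, raw_mb * memory_percent // 100)
    return {"timeout_s": timeout_s, "memory_mb": memory_mb}


if __name__ == "__main__":
    # Dimensions of the image from this issue (3-band, 8-bit version).
    print(estimate_batch_resources(273238, 153942, 3))
```

Small images fall back to the minimums, so existing behaviour would be unchanged for them; only very large inputs get more time and memory.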

@danbjoseph
Author

is it going to be possible to upload the image?

@mojodna
Collaborator

mojodna commented Mar 15, 2018

(Requested a URL to the image offline.)

@mojodna
Collaborator

mojodna commented Mar 15, 2018

On an r4.4xlarge (120GB RAM), with GDAL_CACHEMAX=50% and a local (EFS) copy of mosaic-4326-3band.tif (118G), it wrote out ~2.6GB in 35 minutes before using up all available memory (89.7% in this case, since another process was running).

@mojodna
Collaborator

mojodna commented Mar 15, 2018

Provided that the EFS has enough available burst credits, it seems to be better to download the source file rather than use GDAL's VSI layer to read it. GDAL >= 2.3 + GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES could improve performance when reading from S3, so that's worth revisiting at some point.
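
For when that is revisited, a sketch of what reading straight from S3 could look like: it builds a `gdal_translate` command line that passes the option through GDAL's generic `--config KEY VALUE` switch and addresses the source via `/vsis3/`. The helper names, bucket, and key are hypothetical, and this has not been benchmarked:

```python
def vsis3_path(bucket, key):
    """Return the GDAL virtual file system path for an S3 object."""
    return f"/vsis3/{bucket}/{key}"


def build_translate_cmd(src, dst, merge_ranges=True):
    """Build a gdal_translate argv, optionally merging consecutive HTTP ranges."""
    cmd = ["gdal_translate"]
    if merge_ranges:
        # Only expected to have an effect on GDAL >= 2.3, per the comment above.
        cmd += ["--config", "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES", "YES"]
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # "example-bucket" and the output name are placeholders.
    src = vsis3_path("example-bucket", "mosaic-4326-3band.tif")
    print(" ".join(build_translate_cmd(src, "out.tif")))
```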

@mojodna
Collaborator

mojodna commented Mar 15, 2018

With GDAL_CACHEMAX unset (defaulting to 5%), it reached the same 2.6GB point in 31 minutes, utilizing 20.1% of RAM at that point.

@mojodna
Collaborator

mojodna commented Mar 15, 2018

The initial gdal_translate completed in ~36 minutes, topping out at 20.1% of RAM, producing a ~2.6GB TIFF and 45MB mask sidecar.

@mojodna
Collaborator

mojodna commented Mar 15, 2018

gdaladdo appears stable @ 5.3% of RAM. For reference, the source is 273238 x 153942. (I'm thinking that the necessary memory is based on the dimensions of the source rather than the file size, since files may be compressed with varying efficiency: 273238 x 153942 x 3 bands x 1 byte per sample ~= 126GB)
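
The arithmetic above can be checked with a few lines of Python; `uncompressed_bytes` is just an illustrative helper name:

```python
def uncompressed_bytes(width, height, bands, bytes_per_sample=1):
    """Size of the raw pixel data, ignoring compression and TIFF overhead."""
    return width * height * bands * bytes_per_sample


if __name__ == "__main__":
    size = uncompressed_bytes(273238, 153942, 3)
    print(size)  # 126188412588
```

That is about 126.2 GB in decimal units, or roughly 117.5 GiB, in line with the 126.4 GB and 118G figures reported for the uncompressed 3-band file.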

@mojodna
Collaborator

mojodna commented Mar 15, 2018

gdaladdo timed out after 1 hour.

@mojodna
Collaborator

mojodna commented Mar 16, 2018

After increasing the timeout, gdaladdo completed in 1:36 and increased the transcoded version's size to 3.9GB.

@mojodna
Collaborator

mojodna commented Mar 16, 2018

Summary: success after pre-downloading the source, providing enough memory (~25GB, probably less once GDAL_CACHEMAX is 5% of the container's memory), and increasing the gdaladdo timeout to 2h (it completed in ~1:30).
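
One way to pin the cache to 5% of the container's memory, rather than relying on a percentage that GDAL may resolve against the host's RAM, is to pass an explicit value; GDAL_CACHEMAX also accepts a plain number, which is read as megabytes when small. This is a sketch under that assumption, with hypothetical helper names, and is not necessarily how the image configures it:

```python
def cachemax_mb(container_memory_mb, percent=5):
    """Return an explicit GDAL_CACHEMAX value in MB for a container memory limit."""
    return max(1, container_memory_mb * percent // 100)


def gdal_env(container_memory_mb, base_env=None):
    """Copy base_env and add GDAL_CACHEMAX sized to the container."""
    env = dict(base_env or {})
    env["GDAL_CACHEMAX"] = str(cachemax_mb(container_memory_mb))
    return env


if __name__ == "__main__":
    # A 25000 MB container, roughly the ~25GB figure from the summary.
    print(gdal_env(25000))
```

The resulting dict can be passed as `env=` to `subprocess.run` when launching `gdal_translate` or `gdaladdo`.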

While watching intermediate outputs, I discovered that conversion of the mask sidecar to a COG resulted in a dramatically larger (242.4MB vs. ~63MB) result. The raw mask (sans overviews) is on the EFS as raw.tif.msk.

@mojodna
Copy link
Collaborator

mojodna commented Mar 16, 2018

-co NBITS=1 (and -co SPARSE_OK=yes -co ZLEVEL=9, dropping PREDICTOR) when converting the mask sidecar decreases the mask size to 13MB. I'm downloading the transcoded image to visually check it in QGIS.

@mojodna
Collaborator

mojodna commented Mar 16, 2018

NBITS=1 doesn't work; try DISCARD_LSB=7 instead? Otherwise, what's the difference between the COG version and the original?

@mojodna
Collaborator

mojodna commented Mar 16, 2018

DISCARD_LSB=7 works and produces a 164M sidecar.
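
For reference, a sketch of what that mask conversion command might look like when assembled in Python. The thread does not say whether SPARSE_OK and ZLEVEL were kept alongside DISCARD_LSB, nor which compression was set, so `COMPRESS=DEFLATE` (which ZLEVEL applies to), the combination of options, the helper name, and the output file name are all assumptions:

```python
def mask_translate_cmd(src_mask, dst_mask, discard_lsb=7, zlevel=9):
    """Build a gdal_translate argv for converting the mask sidecar."""
    return [
        "gdal_translate",
        "-co", "COMPRESS=DEFLATE",          # assumption: ZLEVEL implies DEFLATE
        "-co", f"ZLEVEL={zlevel}",
        "-co", "SPARSE_OK=YES",
        "-co", f"DISCARD_LSB={discard_lsb}",
        src_mask,
        dst_mask,
    ]


if __name__ == "__main__":
    # raw.tif.msk is the file named in the thread; the output name is a placeholder.
    print(" ".join(mask_translate_cmd("raw.tif.msk", "mask-out.tif")))
```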

@mojodna
Collaborator

mojodna commented Mar 16, 2018

This (from Internal nodata masks):

> This does not affect the way the mask band is written (it is always 1-bit).

suggests that the original is being written as NBITS=1, but somehow when the image is copied over w/ gdal_translate, it's being converted into 8-bit. Setting NBITS=1 may work for the source image (haven't checked @ 1:1), but the values in the overviews seem to be made invalid.

@mojodna
Collaborator

mojodna commented Mar 16, 2018

quay.io/mojodna/marblecutter-tools has been updated w/ the GDAL_CACHEMAX and 2 hour overview timeouts, so it'll be used by Batch for subsequent runs. We still need to allocate appropriate amounts of memory when the Batch jobs are created in oam-catalog though.

@mojodna
Collaborator

mojodna commented Mar 16, 2018

Note to self: write up mask COG creation issue on the GDAL trac.

@mojodna
Collaborator

mojodna commented Apr 20, 2018

The image originally responsible for this issue has been uploaded successfully and a GDAL issue has been opened for the COG mask size discrepancy: OSGeo/gdal#468
