
Reading TIFF larger than 2GB #1627

Closed
yewunudt opened this issue Sep 8, 2016 · 4 comments

@yewunudt

yewunudt commented Sep 8, 2016

How can I read a TIFF file larger than 2 GB? In geotrellis/spark/src/main/scala/geotrellis/spark/io/hadoop/HdfsUtils.scala:165, the file length is capped at Int.MaxValue.toLong, so attempting to read a larger TIFF fails with the "Cannot read path $path because it's too big..." error.

@fosskers
Contributor

fosskers commented Sep 27, 2016

Hi there! There are two approaches to the "big GeoTIFF input problem" with GeoTrellis:

  1. Cut up your large GeoTIFF ahead of time with a tool like gdal_retile.py
  2. Use our recently-merged windowed GeoTIFF reading feature, which lets you read in a small section of a GeoTIFF given an Extent that covers the subsection you want (see the sketch at the end of this comment).

We haven't yet taken that new feature and implemented a kind of:

readBigGeoTiff[K]: GeoTiff => TileLayerRDD[K]

but it's coming soon.
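
For reference, here's a minimal sketch of approach 2. The `SinglebandGeoTiff.streaming` entry point, the path, and the window coordinates are assumptions for illustration; check the reader API of your GeoTrellis version:

```scala
import geotrellis.raster.io.geotiff.SinglebandGeoTiff
import geotrellis.vector.Extent

// Open the GeoTIFF lazily (assumed streaming entry point) so that
// only the segments intersecting the window are read from disk.
val tiff = SinglebandGeoTiff.streaming("/data/huge.tif")

// The subsection to read, in the GeoTIFF's own CRS (hypothetical values).
val window = Extent(0.0, 0.0, 10000.0, 10000.0)

// Cropping a streaming GeoTiff materializes only the windowed segments,
// never the whole multi-gigabyte file.
val section: SinglebandGeoTiff = tiff.crop(window)
```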

@lossyrob lossyrob added this to the 1.0 milestone Oct 19, 2016
@lossyrob
Member

The problem here is that we read files off of HDFS as an Array[Byte], which can't hold more than Int.MaxValue bytes. In order to read large GeoTiffs that aren't in BigTiff format (so < 4 GB), we'll have to change some things around in how we read off of HDFS, and also implement for HDFS the streaming reads that we already have for S3.
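
To make the limit concrete: JVM arrays are indexed by Int, which is exactly what the HdfsUtils guard reflects. A minimal sketch of that read path (the helper name is hypothetical; the Hadoop calls are standard):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper mirroring the HdfsUtils read path.
def readFully(path: Path, fs: FileSystem): Array[Byte] = {
  val len: Long = fs.getFileStatus(path).getLen
  // An Array[Byte] is indexed by Int, so a file past Int.MaxValue
  // bytes (~2.1 GB) cannot be buffered in a single array.
  if (len > Int.MaxValue.toLong)
    sys.error(s"Cannot read path $path because it's too big...")
  val bytes = new Array[Byte](len.toInt)
  val in = fs.open(path)
  try in.readFully(0, bytes) finally in.close()
  bytes
}
```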

@jbouffard

@yewunudt As of right now, you can read in files that are 4 GB or less, either locally or through S3, via a LocalBytesStreamer or an S3BytesStreamer, respectively. Here is a quick overview of how to use S3BytesStreamer; the LocalBytesStreamer works the same way. As for reading files from HDFS, that is something we are currently working on.
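
A minimal sketch of that streaming read, assuming the constructor shapes of the time (the bucket, key, chunk size, and the pre-built `s3Client` value are placeholders; defer to the linked overview for exact signatures):

```scala
import geotrellis.util.StreamingByteReader
import geotrellis.spark.io.s3.util.S3BytesStreamer
import geotrellis.raster.io.geotiff.reader.GeoTiffReader

// `s3Client` is whatever S3 client your GeoTrellis version expects;
// its construction is omitted here.
val streamer = S3BytesStreamer("my-bucket", "scenes/big.tif", s3Client, chunkSize = 256000)
val reader   = StreamingByteReader(streamer)

// Chunks are fetched on demand as the reader seeks through the TIFF,
// instead of buffering the whole (<= 4 GB) object in one Array[Byte].
val tiff = GeoTiffReader.readSingleband(reader)
```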

When a GeoTiff file is larger than 4 GB, it's referred to as a BigTiff. These files actually have a different layout than a normal GeoTiff. We currently can't read BigTiffs from anywhere, but there's a PR in progress that will allow us to do so soon (link).

@jbouffard

BigTiff reading is now supported; there should be no issue ingesting GeoTiffs of any size.
