Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: jbouffard <jbouffard@azavea.com>
- Loading branch information
jbouffard
committed
Dec 5, 2016
1 parent
3391f2f
commit 4cfa7ae
Showing
1 changed file
with
370 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,370 @@ | ||
## Reading GeoTiffs | ||
|
||
- [Introduction](#introduction) | ||
- [Reading GeoTiffs With GeoTrellis](#reading-geotiffs-with-geotrellis) | ||
- [Reading Locally Part 1: Reading For the First Time](#reading-locally-part-1-reading-for-the-first-time) | ||
- [Reading Locally Part 2: Expanding Our Vocab](reading-locally-part-2-expanding-our-vocab) | ||
- [Dealing With Compressed GeoTiffs](#dealing-with-compressed-geotiffs) | ||
- [Streaming GeoTiffs](#streaming-in-geotiffs) | ||
- [Tips for Using This Feature](#tips-for-using-this-feature) | ||
- [Reading in Small Files >= 4GB](#reading-in-small-files->=-4GB) | ||
- [Reading in Large Files](#reading-in-large-files) | ||
- [How to Use This Feature](#how-to-use-this-feature) | ||
- [Using Apply Methods](#using-apply-methods) | ||
- [Using Object Methods](#using-object-methods) | ||
- [Conclusion](#conclusion) | ||
|
||
### Introduction | ||
This tutorial will go over how to read GeoTiff files using GeoTrellis on your | ||
local filesystem. It assumes that you already have the environment needed to | ||
run these examples. If not, please follow this [link](setup.md) to get | ||
GeoTrellis working on your system. Also, this tutorial uses GeoTiffs from the | ||
`raster-test` project from GeoTrellis. If you have not already done so, please | ||
clone GeoTrellis [here](https://github.com/geotrellis/geotrellis) so that you | ||
can access the needed files. | ||
- - - | ||
|
||
### Reading GeoTiffs With GeoTrellis | ||
One of the most common methods of storing geospatial information is through | ||
GeoTiffs. This is reflected throughout the GeoTrellis library where many of its | ||
features can work with GeoTiffs. Which would mean that there would have to be | ||
many different ways to read in GeoTiff, and indeed there are! In the following | ||
document, we will go over the methods needed to load in a GeoTiff from your | ||
local filesystem. | ||
|
||
Before we start, open a Scala REPL in the Geotrellis directory. | ||
|
||
#### Reading Locally Part 1: Reading For the First Time | ||
Reading a local GeoTiff is actually pretty easy. You can see how to do it below. | ||
```scala | ||
import geotrellis.raster.io.geotiff.reader.GeoTiffReader | ||
import geotrellis.raster.io.geotiff._ | ||
|
||
val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/lzw_int32.tif" | ||
val geoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband(path) | ||
``` | ||
And that's it! Not too bad at all really, just four lines of code. Even still, | ||
though, let's break this down line-by-line so we can see what exactly is going | ||
on. | ||
```scala | ||
import geotrellis.raster.io.geotiff.reader.GeoTiffReader | ||
``` | ||
This import statement brings in `GeoTiffReader` from | ||
`geotrellis.raster.io.geotiff.reader` so that we can use it in the REPL. As the | ||
name implys, `GeoTiffReader` is the object that actually reads the GeoTiff. If | ||
you ever wonder about how we analyze and process GeoTiffs, then | ||
`geotrellis.raster.io.geotiff` would be the place to look. Here's a | ||
[link](https://github.com/geotrellis/geotrellis/tree/master/raster/src/main/scala/geotrellis/raster/io/geotiff). | ||
```scala | ||
import geotrellis.raster.io.geotiff._ | ||
``` | ||
The next import statement loads in various data types that we need so that we | ||
can assign them to our `val`s. | ||
|
||
Okay, so we brought in the object that will give us our GeoTiff, now we just | ||
need to supply it what to read. This is where the next line of code comes into | ||
play. | ||
```scala | ||
val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/lzw_int32.tif" | ||
``` | ||
Our `path` variable is a `String` that contains the file path to a GeoTiff in | ||
`geotrellis.raster-test`. `GeoTiffReader` will use this value then to read in | ||
our GeoTiff. There are more types of paramters `GeoTiffReader` can accept, | ||
however. These are `Array[Byte]`s and `ByteReader`s. We will stick with | ||
`String`s for this lesson, but `Array[Byte]` is not that much different. It's | ||
just all of the bytes within your file held in an Array. To learn more about | ||
`ByteReader`, follow this [link](../util/byte-reader.md). | ||
|
||
The last part of our four line coding escapade is: | ||
```scala | ||
val geoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband(path) | ||
``` | ||
This line assigns the variable, `geoTiff`, to the file that is being read in. | ||
Notice the `geoTiff`'s type, though. It is `SinglebandGeoTiff`. Why does | ||
`geoTiff` have this type? It's because in GeoTrellis, `SinglebandGeoTiff`s and | ||
`MutlibandGeoTiff`s are two seperate subtypes of `GeoTiff`. In case you were | ||
wondering about the second `import` statement earlier, this is where is comes | ||
into play; as these two types are defined within | ||
`geotrellis.raster.io.geotiff`. | ||
|
||
Great! We have a `SinglebandGeoTiff`. Let's say that we have a | ||
`MultibandGeoTiff`, though; let's use the code from above to read it. | ||
```scala | ||
import geotrellis.raster.io.geotiff.reader.GeoTiffReader | ||
import geotrellis.raster.io.geotiff._ | ||
|
||
// now a MultibandGeoTiff!!! | ||
val path: String = "path/to/raster-test/data/geotiff-test-files/3bands/3bands-striped-band.tif" | ||
val geoTiff = GeoTiffReader.readSingleband(path) | ||
``` | ||
If we run this code, what do you think will happen? The result may surprise | ||
you, we get back a `SinglebandGeoTiff`! **When told to read a | ||
`SinglebandGeoTiff` from a `MultibandGeoTiff` without a return type, the | ||
`GeoTiffReader` will just read in the first band of the file and return that**. | ||
Thus, it is important to keep in mind what kind of GeoTiff you are working | ||
with, or else you could get back an incorrect result. | ||
|
||
To remedy this issue, we just have to change the method call and return type so | ||
that `GeoTiffReader` will read in all of the bands of our GeoTiff. | ||
```scala | ||
val geoTiff: MultibandGeoTiff = GeoTiffReader.readMultiband(path) | ||
``` | ||
And that's it! We now have our `MutlibandGeoTiff`. | ||
|
||
> ##### Beginner Tip | ||
> A good way to ensure that your codes works properly is to give the return | ||
> data type for each of your `val`s and `def`s. If by chance your return type | ||
> and is different from what is actually returned, the compiler will throw an | ||
> error. In addition, this will also make your code easier to read and | ||
> understand for both you and others as well. | ||
> | ||
> Example: | ||
> | ||
> ```scala | ||
> val multiPath = "path/to/a/multiband/geotiff.tif | ||
> | ||
> // This will give you the wrong result!!! | ||
> val geoTiff = GeoTiffReader.readSingleband(multiPath) | ||
> | ||
> // This will cause your compiler to throw an error | ||
> val geoTiff: MultibandGeoTiff = GeoTiffReader.readSingleband(multiPath) | ||
>``` | ||
Before we move on to the next section, I'd like to take moment and talk about | ||
an alternative way in which you can read in GeoTiffs. Both `SinglebandGeoTiff`s | ||
and `MultibandGeoTiff`s have their own `apply` methods, this means that you can | ||
give your parameter(s) directly to their companion objects and you'll get back | ||
a new instance of the class. | ||
For `SinglebandGeoTiff`s: | ||
```scala | ||
import geotrellis.raster.io.geotiff.SinglebandGeoTiff | ||
val path: String = "path/to/raster-test/data/geotiff-test-files/lzw_int32.tif" | ||
val geoTiff: SinglebandGeoTiff = SinglebandGeoTiff(path) | ||
``` | ||
There are two differences found within this code from the previous example. The first is this: | ||
```scala | ||
import geotrellis.raster.io.geotiff.SinglebandGeoTiff | ||
``` | ||
As stated earlier, `SinglebandGeoTiff` and `MultibandGeoTiff` are found within | ||
a different folder of `geotrellis.raster.io.geotiff`. This is important to keep | ||
in mind when importing, as it can cause your code not to compile if you refer | ||
to the wrong sub-folder. | ||
The second line that was changed is: | ||
```scala | ||
val geoTiff: SinglebandGeoTiff = SinglebandGeoTiff(path) | ||
``` | ||
Here, we see `SinglebandGeoTiff`'s `apply` method being used on `path`. Which | ||
returns the same thing as `GeoTiffReader.readSingleband(path)`, but with less | ||
verbosity. | ||
`MultibandGeoTiff`s are the exact same as their singleband counterparts. | ||
```Scala | ||
import geotrellis.raster.io.geotiff.MultibandGeoTiff | ||
val path: String = "raster-test/data/geotiff-test-files/3bands/3bands-striped-band.tif" | ||
val geoTiff: MultibandGeoTiff = MultibandGeoTiff(path) | ||
``` | ||
Our overview of basic GeoTiff reading is now done! But keep reading! For you | ||
have greater say over how your GeoTiff will be read than what has been shown. | ||
- - - | ||
### Reading Locally Part 2: Expanding Our Vocab | ||
We can read GeoTiffs, now what? Well, there's actually more that we can do when | ||
reading in a file. Sometimes you have a compressed GeoTiff, or other times you | ||
might want to read in only a sub-section of GeoTiff and not the whole | ||
thing. In either case, GeoTrellis can handle these issues with ease. | ||
#### Dealing With Compressed GeoTiffs | ||
Compression is a method in which data is stored with fewer bits and can then be | ||
uncompressed so that all data becomes available. This applies to GeoTiffs as | ||
well. When reading in a GeoTiff, you can state whether or not you want a | ||
compressed file to be uncompressed or not. | ||
```scala | ||
import geotrellis.raster.io.geotiff.reader.GeoTiffReader | ||
import geotrellis.raster.io.geotiff._ | ||
// reading in a compressed GeoTiff and keeping it compressed | ||
val compressedGeoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif", false, false) | ||
// reading in a compressed GeoTiff and uncompressing it | ||
val compressedGeoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif", true, false) | ||
``` | ||
As you can see from the above code sample, the first `Boolean` value is what | ||
determines whether or not the file should be decompressed or not. What does the | ||
other `Boolean` value for? We'll get to that soon! For right now, though, we'll | ||
just focus on the first one. | ||
Why would you want to leave a file compressed or have uncompressed when reading | ||
it? One of the benefits of using compressed GeoTiffs is that might lead to | ||
better performance depending on your system and the size of the file. Another | ||
instance where the compression is needed is if your file is over 4GB is size. | ||
This is because when a GeoTiff is uncompressed in GeoTrellis, it is stored in | ||
an Array. Anything over 4GB is larger than the max array size for Java, so | ||
trying read in anything bigger will cause your process to crash. | ||
By default, decompression occurs on all read GeoTiffs. Thus, these two lines of | ||
code are the same. | ||
```scala | ||
// these will both return the same thing! | ||
GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif") | ||
GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif", true, false) | ||
``` | ||
In addition, both `SinglebandGeoTiff` and `MultibandGeoTiff` have a method, | ||
`compressed`, that uncompresses a GeoTiff when it is read in. | ||
```scala | ||
SinglebandGeoTiff.compressed("path/to/compressed/geotiff.tif") | ||
MultibandGeoTiff.compressed("path/to/compressed/geotiff.tif") | ||
``` | ||
#### Streaming GeoTiffs | ||
Remember that mysterious second parameter from earlier? It determines if a | ||
GeoTiff should be read in via streaming or not. What is streaming? Streaming is | ||
process of not reading in all of the data of a file at once, but rather getting | ||
the data as you need it. It's like a "lazy read". Why would you want this? The | ||
benefit of streaming is that it allows you to work with huge or just parts of | ||
files. In turn, this makes it possible to read in sub-sections of GeoTiffs | ||
and/or not having to worry about memory usage when working with large files. | ||
##### Tips For Using This Feature | ||
It is important to go over the strengths and weaknesses of this feature before | ||
use. If implemented well, the WindowedGeoTiff Reader can save you a large | ||
amount of time. However, it can also lead to further problems if it is not used | ||
how it was intended. | ||
It should first be stated that this reader was made to read in ***sections*** | ||
of a Geotiff. Therefore, reading in either the entire, or close to the whole | ||
file will either be comparable or slower than reading in the entire file at | ||
once and then cropping it. In addition, crashes may occur depending on the size | ||
of the file. | ||
##### Reading in Small Files | ||
Smaller files are GeoTiffs that are less than or equal to 4GB in isze. The way | ||
to best utilize the reader for these kinds of files differs from larger ones. | ||
To gain optimum performance, the principle to follow is: **the smaller the area | ||
selected, the faster the reading will be**. What the exact performance increase | ||
will be depends on the bandtype of the file. The general pattern is that the | ||
larger the datatype is, quicker it will be at reading. Thus, a Float64 GeoTiff | ||
will be loaded at a faster rate than a UByte GeoTiff. There is one caveat to | ||
this rule, though. Bit bandtype is the smallest of all the bandtypes, yet it | ||
can be read in at speed that is similar to Float32. | ||
For these files, 90% of the file is the cut off for all band and storage types. | ||
Anything more may cause performance declines. | ||
##### Reading in Large Files | ||
Whereas small files could be read in full using the reader, larger files cannot | ||
as they will crash whatever process you're running. The rules for these sorts | ||
of files are a bit more complicated than that of their smaller counterparts, | ||
but learning them will allow for much greater performance in your analysis. | ||
One similarity that both large and small files share is that they have the same | ||
principle: **the smaller the area selected, the faster the reading will be**. | ||
However, while smaller files may experience slowdown if the selected area is | ||
too large, these bigger files will crash. Therefore, this principle must be | ||
applied more strictly than with the previous file sizes. | ||
In large files, the pattern of performance increase is the reverse of the | ||
smaller files. Byte bandtype can not only read faster, but are able to read in | ||
larger areas than bigger bandtypes. Indeed, the area which you can select is | ||
limited to what the bandtype of the GeoTiff is. Hence, an additional principle | ||
applies for these large files: **the smaller the bandtype, the larger of an | ||
area you can select**. The exact size for each bandtype is not known, estimates | ||
have been given in the table bellow that should provide some indication as to | ||
what size to select. | ||
| BandType | Area Threshold Range In Cells | | ||
|:--------:|:----------------------------------------------------:| | ||
| Byte | [5.76 * 10<sup>9</sup>, 6.76 * 10<sup>9</sup>) | | ||
| Int16 | [3.24 * 10<sup>9</sup>, 2.56 * 10<sup>9</sup>) | | ||
| Int32 | [1.44 * 10<sup>9</sup>, 1.96 * 10<sup>9</sup>) | | ||
| UInt16 | [1.96 * 10<sup>9</sup>, 2.56 * 10<sup>9</sup>) | | ||
| UInt32 | [1.44 * 10<sup>9</sup>, 1.96 * 10<sup>9</sup>) | | ||
| Float32 | [1.44 * 10<sup>9</sup>, 1.96 * 10<sup>9</sup>) | | ||
| Float64 | [3.6 * 10<sup>8</sup>, 6.4 * 10<sup>8</sup>) | | ||
- - - | ||
##### How to Use This Feature | ||
Using this feature is straight forward and easy. There are two ways to | ||
implement the WindowedReader: Supplying the desired extent with the path to the | ||
file, and cropping an already existing file that is read in through a stream. | ||
###### Using Apply Methods | ||
Supplying an extent with the file's path and having it being read in windowed | ||
can be done in the following ways: | ||
```scala | ||
val path: String = "path/to/my/geotiff.tif" | ||
val e: Extent = Extent(0, 1, 2, 3) | ||
// supplying the extent as an Extent | ||
// if the file is singleband | ||
SinglebandGeoTiff(path, e) | ||
// or | ||
GeoTiffReader.readSingleband(path, e) | ||
// if the file is multiband | ||
MultibandGeoTiff(path, e) | ||
// or | ||
GeoTiffReader.readMultiband(path, e) | ||
// supplying the extent as an Option[Extent] | ||
// if the file is singleband | ||
SinglebandGeoTiff(path, Some(e)) | ||
// or | ||
GeoTiffReader.readSingleband(path, Some(e)) | ||
// if the file is multiband | ||
MultibandGeoTiff(path, Some(e)) | ||
// or | ||
GeoTiffReader.readMultiband(path, Some(e)) | ||
``` | ||
###### Using Object Methods | ||
Cropping an already loaded GeoTiff that was read in through Streaming. By using | ||
this method, the actual file isn't loaded into memory, but its data can still | ||
be accessed. Here's how to do the cropping: | ||
```scala | ||
val path: String = "path/to/my/geotiff.tif" | ||
val e: Extent = Extent(0, 1, 2, 3) | ||
// doing the reading and cropping in one line | ||
// if the file is singleband | ||
SinglebandGeoTiff.streaming(path).crop(e) | ||
// or | ||
GeoTiffReader.readSingleband(path, false, true).crop(e) | ||
// if the file is multiband | ||
MultibandGeoTiff.streaming(path).crop(e) | ||
// or | ||
GeoTiffReader.readMultiband(path, false, true).crop(e) | ||
// doing the reading and cropping in two lines | ||
// if the file is singleband | ||
val sgt: SinglebandGeoTiff = | ||
SinglebandGeoTiff.streaming(path) | ||
// or | ||
GeoTiffReader.readSingleband(path, false, true) | ||
sgt.crop(e) | ||
// if the file is multiband | ||
val mgt: MultibandGeoTiff = | ||
MultibandGeoTiff.streaming(path) | ||
// or | ||
GeoTiffReader.readMultiband(path, false, true) | ||
mgt.crop(e) | ||
``` | ||
- - - | ||
### Conclusion | ||
That takes care of reading local GeoTiff files! It should be said, though, that | ||
what we went over here does not just apply to reading local files. In fact, | ||
reading in GeoTiffs from other sources have similar parameters that you can use | ||
to achieve the same goal. |