Added ReadingGeoTiffs docs
Signed-off-by: jbouffard <jbouffard@azavea.com>
jbouffard committed Dec 5, 2016
1 parent 3391f2f commit 4cfa7ae
docs/tutorials/reading-geoTiffs.md

## Reading GeoTiffs

- [Introduction](#introduction)
- [Reading GeoTiffs With GeoTrellis](#reading-geotiffs-with-geotrellis)
  - [Reading Locally Part 1: Reading For the First Time](#reading-locally-part-1-reading-for-the-first-time)
- [Reading Locally Part 2: Expanding Our Vocab](#reading-locally-part-2-expanding-our-vocab)
  - [Dealing With Compressed GeoTiffs](#dealing-with-compressed-geotiffs)
  - [Streaming GeoTiffs](#streaming-geotiffs)
    - [Tips for Using This Feature](#tips-for-using-this-feature)
    - [Reading in Small Files](#reading-in-small-files)
    - [Reading in Large Files](#reading-in-large-files)
    - [How to Use This Feature](#how-to-use-this-feature)
      - [Using Apply Methods](#using-apply-methods)
      - [Using Object Methods](#using-object-methods)
- [Conclusion](#conclusion)

### Introduction
This tutorial will go over how to read GeoTiff files from your local filesystem
using GeoTrellis. It assumes that you already have the environment needed to
run these examples. If not, please follow this [link](setup.md) to get
GeoTrellis working on your system. This tutorial also uses GeoTiffs from the
`raster-test` project in GeoTrellis. If you have not already done so, please
clone GeoTrellis [here](https://github.com/geotrellis/geotrellis) so that you
can access the needed files.
- - -

### Reading GeoTiffs With GeoTrellis
One of the most common ways of storing geospatial information is the GeoTiff.
This is reflected throughout the GeoTrellis library, where many features work
with GeoTiffs. That means there need to be many different ways to read in a
GeoTiff, and indeed there are! In this document, we will go over the methods
needed to load a GeoTiff from your local filesystem.

Before we start, open a Scala REPL in the GeoTrellis directory.

#### Reading Locally Part 1: Reading For the First Time
Reading a local GeoTiff is actually pretty easy. You can see how to do it below.
```scala
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
import geotrellis.raster.io.geotiff._

val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/lzw_int32.tif"
val geoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband(path)
```
And that's it! Not too bad at all: just four lines of code. Even so, let's
break this down line by line so we can see exactly what is going on.
```scala
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
```
This import statement brings in `GeoTiffReader` from
`geotrellis.raster.io.geotiff.reader` so that we can use it in the REPL. As the
name implies, `GeoTiffReader` is the object that actually reads the GeoTiff. If
you ever wonder how we analyze and process GeoTiffs,
`geotrellis.raster.io.geotiff` is the place to look. Here's a
[link](https://github.com/geotrellis/geotrellis/tree/master/raster/src/main/scala/geotrellis/raster/io/geotiff).
```scala
import geotrellis.raster.io.geotiff._
```
The next import statement brings in the GeoTiff data types that we need in
order to declare the types of our `val`s.

Okay, so we have brought in the object that will give us our GeoTiff; now we
just need to tell it what to read. This is where the next line of code comes
into play.
```scala
val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/lzw_int32.tif"
```
Our `path` variable is a `String` that contains the file path to a GeoTiff in
GeoTrellis's `raster-test` project. `GeoTiffReader` will use this value to read
in our GeoTiff. `GeoTiffReader` can accept other types of parameters as well,
however: `Array[Byte]`s and `ByteReader`s. We will stick with `String`s for
this lesson, but `Array[Byte]` is not much different; it is just all of the
bytes in your file held in an array. To learn more about `ByteReader`, follow
this [link](../util/byte-reader.md).
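
To make the `Array[Byte]` option concrete, here is a minimal sketch that uses only the JDK. It loads a file's bytes and checks the standard TIFF header: the byte order (`II` or `MM`), followed by version 42 for classic TIFF or 43 for BigTIFF. The `TiffBytes` object and its methods are hypothetical helpers written for illustration. They are not part of GeoTrellis.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical helper for illustration; not a GeoTrellis API.
object TiffBytes {
  // Loads every byte of the file into memory: the Array[Byte] form of a GeoTiff.
  def load(path: String): Array[Byte] = Files.readAllBytes(Paths.get(path))

  // Reads the 4-byte TIFF header: "II" (little-endian) or "MM" (big-endian),
  // followed by the version number 42 (classic TIFF) or 43 (BigTIFF).
  // Returns None when the bytes do not start with a TIFF header.
  def header(bytes: Array[Byte]): Option[(String, Int)] =
    if (bytes.length < 4) None
    else {
      val order = new String(bytes, 0, 2, StandardCharsets.US_ASCII)
      val b2 = bytes(2) & 0xff
      val b3 = bytes(3) & 0xff
      val version = order match {
        case "II" => b2 | (b3 << 8) // little-endian 16-bit value
        case "MM" => (b2 << 8) | b3 // big-endian 16-bit value
        case _    => -1
      }
      if (version == 42 || version == 43) Some((order, version)) else None
    }
}
```

The array returned by `TiffBytes.load` holds the entire file in memory. That is exactly what the `Array[Byte]` form of input amounts to.
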

The last part of our four line coding escapade is:
```scala
val geoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband(path)
```
This line reads in the file and assigns the result to the variable `geoTiff`.
Notice `geoTiff`'s type, though: it is `SinglebandGeoTiff`. Why does `geoTiff`
have this type? Because in GeoTrellis, `SinglebandGeoTiff` and
`MultibandGeoTiff` are two separate subtypes of `GeoTiff`. In case you were
wondering about the second `import` statement earlier, this is where it comes
into play, as these two types are defined in `geotrellis.raster.io.geotiff`.

Great! We have a `SinglebandGeoTiff`. Now suppose we have a multiband GeoTiff
instead; let's use the code from above to read it.
```scala
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
import geotrellis.raster.io.geotiff._

// now a MultibandGeoTiff!!!
val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/3bands/3bands-striped-band.tif"
val geoTiff = GeoTiffReader.readSingleband(path)
```
If we run this code, what do you think will happen? The result may surprise
you: we get back a `SinglebandGeoTiff`! **When asked to read a multiband
GeoTiff with `readSingleband`, `GeoTiffReader` will just read in the first band
of the file and return that.** Thus, it is important to keep in mind what kind
of GeoTiff you are working with, or else you could get back an incorrect
result.

To remedy this issue, we just have to change the method call and the declared
type so that `GeoTiffReader` reads in all of the bands of our GeoTiff.
```scala
val geoTiff: MultibandGeoTiff = GeoTiffReader.readMultiband(path)
```
And that's it! We now have our `MultibandGeoTiff`.
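
The pitfall above can be modeled with a few toy types. This simplified sketch assumes only the behavior the text describes: `readSingleband` keeps just the first band, and `readMultiband` keeps all of them. `ToyFile`, `ToySingleband`, `ToyMultiband`, and `ToyReader` are hypothetical stand-ins, not GeoTrellis classes.

```scala
// Toy stand-ins for illustration only; these are NOT GeoTrellis classes.
final case class ToyFile(bands: Vector[Vector[Int]])
final case class ToySingleband(band: Vector[Int])
final case class ToyMultiband(bands: Vector[Vector[Int]])

object ToyReader {
  // Mirrors the behavior described in the text: only the first band is kept.
  def readSingleband(file: ToyFile): ToySingleband = ToySingleband(file.bands.head)

  // Keeps every band in the file.
  def readMultiband(file: ToyFile): ToyMultiband = ToyMultiband(file.bands)
}
```

Because the two toy return types differ, the compiler would reject `val t: ToyMultiband = ToyReader.readSingleband(threeBands)`. The Beginner Tip below describes this same safety net.
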

> ##### Beginner Tip
> A good way to ensure that your code works properly is to declare the type
> of each of your `val`s and `def`s. If the declared type differs from what
> is actually returned, the compiler will throw an error. This also makes
> your code easier to read and understand, both for you and for others.
>
> Example:
>
> ```scala
> import geotrellis.raster.io.geotiff.reader.GeoTiffReader
> import geotrellis.raster.io.geotiff._
>
> val multiPath = "path/to/a/multiband/geotiff.tif"
>
> // This will give you the wrong result (only the first band)!!!
> val geoTiff = GeoTiffReader.readSingleband(multiPath)
>
> // This will cause your compiler to throw an error (type mismatch)
> val multibandGeoTiff: MultibandGeoTiff = GeoTiffReader.readSingleband(multiPath)
> ```

Before we move on to the next section, I'd like to take a moment to talk about
an alternative way to read in GeoTiffs. Both `SinglebandGeoTiff` and
`MultibandGeoTiff` have their own `apply` methods, which means that you can
give your parameter(s) directly to their companion objects and get back a new
instance of the class.

For `SinglebandGeoTiff`s:
```scala
import geotrellis.raster.io.geotiff.SinglebandGeoTiff
val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/lzw_int32.tif"
val geoTiff: SinglebandGeoTiff = SinglebandGeoTiff(path)
```
There are two differences between this code and the previous example. The first is this:
```scala
import geotrellis.raster.io.geotiff.SinglebandGeoTiff
```
As stated earlier, `SinglebandGeoTiff` and `MultibandGeoTiff` are defined
directly in `geotrellis.raster.io.geotiff`, not in the `reader` subpackage
where `GeoTiffReader` lives. Keep this in mind when importing, as referring to
the wrong package will cause your code not to compile.

The second line that was changed is:
```scala
val geoTiff: SinglebandGeoTiff = SinglebandGeoTiff(path)
```
Here, we see `SinglebandGeoTiff`'s `apply` method being used on `path`. It
returns the same thing as `GeoTiffReader.readSingleband(path)`, but with less
verbosity.

`MultibandGeoTiff`s work exactly the same way as their singleband counterparts:
```scala
import geotrellis.raster.io.geotiff.MultibandGeoTiff
val path: String = "path/to/geotrellis/raster-test/data/geotiff-test-files/3bands/3bands-striped-band.tif"
val geoTiff: MultibandGeoTiff = MultibandGeoTiff(path)
```
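
If the `apply` mechanism is new to you, the sketch below shows the general Scala pattern using hypothetical names. Calling `ToyGeoTiff(path)` is shorthand for `ToyGeoTiff.apply(path)`, and that method simply delegates to a reader object. This mirrors how the text describes the `SinglebandGeoTiff` and `MultibandGeoTiff` companion objects, but it is not GeoTrellis's source code.

```scala
// Hypothetical names that mirror the pattern; not GeoTrellis source code.
final class ToyGeoTiff(val path: String)

object ToyGeoTiff {
  // ToyGeoTiff(path) is shorthand for ToyGeoTiff.apply(path); here it
  // simply delegates to the reader object.
  def apply(path: String): ToyGeoTiff = ToyGeoTiffReader.readSingleband(path)
}

object ToyGeoTiffReader {
  def readSingleband(path: String): ToyGeoTiff = new ToyGeoTiff(path)
}
```
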
Our overview of basic GeoTiff reading is now done! But keep reading: you have
more control over how your GeoTiff is read than what has been shown so far.
- - -
### Reading Locally Part 2: Expanding Our Vocab
We can read GeoTiffs; now what? Well, there's actually more that we can do when
reading in a file. Sometimes you have a compressed GeoTiff, and other times you
might want to read in only a sub-section of a GeoTiff rather than the whole
thing. In either case, GeoTrellis can handle these issues with ease.
#### Dealing With Compressed GeoTiffs
Compression is a method in which data is stored with fewer bits and can later
be decompressed so that all of the data becomes available again. This applies
to GeoTiffs as well. When reading in a compressed GeoTiff, you can state
whether or not you want it to be decompressed.
```scala
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
import geotrellis.raster.io.geotiff._

// reading in a compressed GeoTiff and keeping it compressed
val compressedGeoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif", false, false)

// reading in a compressed GeoTiff and decompressing it
val decompressedGeoTiff: SinglebandGeoTiff = GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif", true, false)
```
As you can see from the above code sample, the first `Boolean` value determines
whether or not the file should be decompressed. What is the other `Boolean`
value for? We'll get to that soon! For now, though, we'll just focus on the
first one.

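
The difference between compressed and decompressed data is easy to see with any lossless codec. GeoTiffs commonly use schemes such as LZW or Deflate; judging by its name, the `lzw_int32.tif` test file is LZW-compressed. The sketch below uses the JDK's `java.util.zip` Deflate classes rather than GeoTrellis's own decompressors, and `DeflateRoundTrip` is a hypothetical helper. It shows that repetitive data shrinks when compressed and that decompressing restores every original byte.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.{Deflater, Inflater}

// Hypothetical helper using the JDK's Deflate codec; not GeoTrellis code.
object DeflateRoundTrip {
  // Compresses the input and returns the (usually smaller) compressed bytes.
  def compress(data: Array[Byte]): Array[Byte] = {
    val deflater = new Deflater()
    deflater.setInput(data)
    deflater.finish()
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](1024)
    while (!deflater.finished()) {
      val n = deflater.deflate(buf)
      out.write(buf, 0, n)
    }
    deflater.end()
    out.toByteArray
  }

  // Decompresses back into an array of the known original size.
  def decompress(data: Array[Byte], originalSize: Int): Array[Byte] = {
    val inflater = new Inflater()
    inflater.setInput(data)
    val result = new Array[Byte](originalSize)
    var offset = 0
    var done = false
    while (!done) {
      val n = inflater.inflate(result, offset, originalSize - offset)
      offset += n
      // stop when the buffer is full, the stream ended, or no progress is possible
      done = offset >= originalSize || inflater.finished() || (n == 0 && inflater.needsInput())
    }
    inflater.end()
    result
  }
}
```
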
Why would you want to leave a file compressed, or decompress it, when reading
it? One benefit of keeping GeoTiffs compressed is that it might lead to better
performance, depending on your system and the size of the file. Keeping the
compression is also necessary if your file is over 4GB in size. This is
because when a GeoTiff is decompressed in GeoTrellis, it is stored in an array,
and anything over 4GB is larger than the maximum array size in Java, so trying
to read in anything bigger will cause your process to crash.

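
Because the problem is the size of the decompressed data, a rough check is cells × bands × bytes per cell. The byte counts per cell are 1 for Byte, 2 for Int16, 4 for Int32 or Float32, and 8 for Float64. The sketch below uses a hypothetical `DecompressedSize` object to compare that estimate against the 4GB figure used in this tutorial, which it assumes means 4 × 1024³ bytes. Treat the result as an estimate: it ignores any per-array overhead.

```scala
// Hypothetical back-of-the-envelope helper; not a GeoTrellis API.
object DecompressedSize {
  // The 4GB threshold used in this tutorial, assumed here to be 4 * 1024^3 bytes.
  val FourGB: Long = 4L * 1024 * 1024 * 1024

  // Bytes needed to hold every cell of every band once decompressed.
  def bytes(cols: Long, rows: Long, bands: Int, bytesPerCell: Int): Long =
    cols * rows * bands * bytesPerCell

  // True if decompressing would exceed the limit, so the file should stay compressed.
  def keepCompressed(cols: Long, rows: Long, bands: Int, bytesPerCell: Int,
                     limit: Long = FourGB): Boolean =
    bytes(cols, rows, bands, bytesPerCell) > limit
}
```
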
By default, all GeoTiffs are decompressed when read. Thus, these two lines of
code are equivalent:
```scala
// these will both return the same thing!
GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif")
GeoTiffReader.readSingleband("path/to/compressed/geotiff.tif", true, false)
```
In addition, both `SinglebandGeoTiff` and `MultibandGeoTiff` have a
`compressed` method, which reads in a GeoTiff while leaving it compressed.
```scala
SinglebandGeoTiff.compressed("path/to/compressed/geotiff.tif")
MultibandGeoTiff.compressed("path/to/compressed/geotiff.tif")
```
#### Streaming GeoTiffs
Remember that mysterious second parameter from earlier? It determines whether
or not a GeoTiff should be read in via streaming. What is streaming? Streaming
is the process of not reading in all of a file's data at once, but rather
getting the data as you need it; it's like a "lazy read". Why would you want
this? Streaming allows you to work with huge files, or with only parts of
files. In turn, this makes it possible to read in sub-sections of GeoTiffs
without having to worry about memory usage when working with large files.
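
To illustrate the idea behind streaming, the sketch below uses `java.io.RandomAccessFile` to read just one byte range from a file and leaves the rest of the file on disk. It is a plain-JDK illustration of "getting the data as you need it", not how GeoTrellis implements streaming internally. `RangeRead` is a hypothetical name.

```scala
import java.io.RandomAccessFile

// Hypothetical helper for illustration; not GeoTrellis's streaming code.
object RangeRead {
  // Reads `length` bytes starting at `offset` without loading the rest of the file.
  def readRange(path: String, offset: Long, length: Int): Array[Byte] = {
    val raf = new RandomAccessFile(path, "r")
    try {
      val buf = new Array[Byte](length)
      raf.seek(offset)   // jump straight to the requested position
      raf.readFully(buf) // read exactly `length` bytes (throws if the file is too short)
      buf
    } finally raf.close()
  }
}
```
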
##### Tips For Using This Feature
It is important to go over the strengths and weaknesses of this feature before
using it. If used well, the windowed GeoTiff reader can save you a large amount
of time. However, it can also lead to further problems if it is not used as
intended.

First, this reader was made to read in ***sections*** of a GeoTiff.
Therefore, reading in the entire file, or close to it, will be as fast as or
slower than reading in the entire file at once and then cropping it. In
addition, crashes may occur depending on the size of the file.
##### Reading in Small Files
Smaller files are GeoTiffs that are less than or equal to 4GB in size. The best
way to use the reader for these files differs from the best way for larger
ones.

To get optimal performance, the principle to follow is: **the smaller the area
selected, the faster the reading will be**. The exact performance increase
depends on the band type of the file. The general pattern is that the larger
the data type, the faster the read. Thus, a Float64 GeoTiff will be loaded
faster than a UByte GeoTiff. There is one caveat to this rule, though: Bit is
the smallest of all the band types, yet it can be read in at a speed similar
to Float32.

For these files, selecting about 90% of the file is the cutoff for all band and
storage types; selecting more may cause performance to decline.
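
The 90% cutoff can be turned into a quick check before choosing a windowed read. The hypothetical `WindowChoice` object in the sketch below measures coverage by cell count (window cells ÷ file cells). Reading "90% of the file" as a fraction of cells is an assumption. The cutoff is a parameter, so you can adjust it.

```scala
// Hypothetical helper encoding this section's rule of thumb; not a GeoTrellis API.
object WindowChoice {
  // Fraction of the file's cells covered by a window of windowCols x windowRows.
  def coveredFraction(windowCols: Long, windowRows: Long, fileCols: Long, fileRows: Long): Double =
    (windowCols.toDouble * windowRows) / (fileCols.toDouble * fileRows)

  // For files of 4GB or less: above the cutoff (about 90% coverage), a windowed
  // read may be slower than a full read followed by a crop.
  def preferWindowedRead(windowCols: Long, windowRows: Long, fileCols: Long, fileRows: Long,
                         cutoff: Double = 0.9): Boolean =
    coveredFraction(windowCols, windowRows, fileCols, fileRows) <= cutoff
}
```
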
##### Reading in Large Files
Whereas small files can be read in full using the reader, larger files cannot,
as reading them in full will crash whatever process you're running. The rules
for these files are a bit more complicated than those for their smaller
counterparts, but learning them will allow for much better performance in your
analysis.

Large and small files share the same principle: **the smaller the area
selected, the faster the reading will be**. However, while smaller files may
merely slow down if the selected area is too large, these bigger files will
crash. Therefore, this principle must be applied more strictly than with
smaller files.

In large files, the pattern of performance increase is the reverse of that for
smaller files. Smaller band types such as Byte not only read faster, but can
also read in larger areas than bigger band types. Indeed, the area you can
select is limited by the band type of the GeoTiff. Hence, an additional
principle applies to these large files: **the smaller the band type, the larger
the area you can select**. The exact size for each band type is not known, but
the estimates in the table below should give some indication of what size to
select.


| BandType | Area Threshold Range (cells) |
|:--------:|:----------------------------------------------------:|
| Byte | [5.76 * 10<sup>9</sup>, 6.76 * 10<sup>9</sup>) |
| Int16 | [2.56 * 10<sup>9</sup>, 3.24 * 10<sup>9</sup>) |
| Int32 | [1.44 * 10<sup>9</sup>, 1.96 * 10<sup>9</sup>) |
| UInt16 | [1.96 * 10<sup>9</sup>, 2.56 * 10<sup>9</sup>) |
| UInt32 | [1.44 * 10<sup>9</sup>, 1.96 * 10<sup>9</sup>) |
| Float32 | [1.44 * 10<sup>9</sup>, 1.96 * 10<sup>9</sup>) |
| Float64 | [3.6 * 10<sup>8</sup>, 6.4 * 10<sup>8</sup>) |
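
One way to apply the table is to treat the lower end of each range as a conservative cutoff. You then compare the window's cell count (columns × rows) against that cutoff. The sketch below does this with a hypothetical `LargeFileWindow` object. The numbers are the tutorial's rough estimates, not hard limits. Band types missing from the table, such as Bit, are simply treated as unknown.

```scala
// Hypothetical helper built from the table's estimates; not a GeoTrellis API.
object LargeFileWindow {
  // Lower end of each "Area Threshold Range" from the table, in cells.
  // These are rough estimates, not hard limits.
  val safeCells: Map[String, Double] = Map(
    "Byte"    -> 5.76e9,
    "Int16"   -> 2.56e9, // smaller of the two Int16 values listed
    "Int32"   -> 1.44e9,
    "UInt16"  -> 1.96e9,
    "UInt32"  -> 1.44e9,
    "Float32" -> 1.44e9,
    "Float64" -> 3.6e8
  )

  // True if a window of cols x rows cells stays below the conservative cutoff.
  // Unknown band types return false.
  def withinEstimate(bandType: String, cols: Long, rows: Long): Boolean =
    safeCells.get(bandType).exists(limit => cols.toDouble * rows < limit)

  // Side length (in cells, rounded down) of the largest square window under the cutoff.
  def maxSquareSide(bandType: String): Option[Long] =
    safeCells.get(bandType).map(limit => math.sqrt(limit).toLong)
}
```

For example, `LargeFileWindow.maxSquareSide("Float64")` returns `Some(18973)`. In other words, a square window of about 18,973 × 18,973 cells stays under the Float64 estimate.
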
- - -
##### How to Use This Feature
Using this feature is straightforward. There are two ways to use the windowed
reader: supplying the desired extent along with the path to the file, or
cropping a GeoTiff that was read in via streaming.
###### Using Apply Methods
To supply an extent along with the file's path and have the file read in
windowed, use one of the following:
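```scala
import geotrellis.raster.io.geotiff._
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
import geotrellis.vector.Extent

val path: String = "path/to/my/geotiff.tif"
// placeholder extent: Extent(xmin, ymin, xmax, ymax) in the GeoTiff's coordinate system
val e: Extent = Extent(0, 1, 2, 3)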
// supplying the extent as an Extent
// if the file is singleband
SinglebandGeoTiff(path, e)
// or
GeoTiffReader.readSingleband(path, e)
// if the file is multiband
MultibandGeoTiff(path, e)
// or
GeoTiffReader.readMultiband(path, e)
// supplying the extent as an Option[Extent]
// if the file is singleband
SinglebandGeoTiff(path, Some(e))
// or
GeoTiffReader.readSingleband(path, Some(e))
// if the file is multiband
MultibandGeoTiff(path, Some(e))
// or
GeoTiffReader.readMultiband(path, Some(e))
```
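
Conceptually, a windowed read has to translate the requested extent into the pixel columns and rows it touches. The sketch below shows that arithmetic for a north-up raster with no rotation, where row 0 is the top (`ymax`) edge. `Box`, `PixelWindow`, and `ExtentToWindow` are hypothetical names, not GeoTrellis types, and this is not the library's internal code.

```scala
// Hypothetical types for illustration; not GeoTrellis's Extent or GridBounds.
final case class Box(xmin: Double, ymin: Double, xmax: Double, ymax: Double)

final case class PixelWindow(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int) {
  def cols: Int = colMax - colMin + 1
  def rows: Int = rowMax - rowMin + 1
}

object ExtentToWindow {
  // Maps a requested box to the inclusive range of pixels it touches in a
  // north-up raster covering `raster` with `cols` x `rows` cells.
  // Returns None if the request does not overlap the raster.
  def window(raster: Box, cols: Int, rows: Int, request: Box): Option[PixelWindow] = {
    val cellW = (raster.xmax - raster.xmin) / cols
    val cellH = (raster.ymax - raster.ymin) / rows
    val c0 = math.floor((request.xmin - raster.xmin) / cellW).toInt
    val c1 = math.ceil((request.xmax - raster.xmin) / cellW).toInt - 1
    val r0 = math.floor((raster.ymax - request.ymax) / cellH).toInt // rows count down from the top
    val r1 = math.ceil((raster.ymax - request.ymin) / cellH).toInt - 1
    // clamp to the raster's pixel grid
    val (cc0, cc1) = (math.max(c0, 0), math.min(c1, cols - 1))
    val (rr0, rr1) = (math.max(r0, 0), math.min(r1, rows - 1))
    if (cc0 > cc1 || rr0 > rr1) None else Some(PixelWindow(cc0, rr0, cc1, rr1))
  }
}
```
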
###### Using Object Methods
You can also crop a GeoTiff that was read in through streaming. With this
approach, the whole file isn't loaded into memory, but its data can still be
accessed. Here's how to do the cropping:
```scala
import geotrellis.raster.io.geotiff._
import geotrellis.raster.io.geotiff.reader.GeoTiffReader
import geotrellis.vector.Extent

val path: String = "path/to/my/geotiff.tif"
val e: Extent = Extent(0, 1, 2, 3)

// doing the reading and cropping in one line
// if the file is singleband
SinglebandGeoTiff.streaming(path).crop(e)
// or
GeoTiffReader.readSingleband(path, false, true).crop(e)
// if the file is multiband
MultibandGeoTiff.streaming(path).crop(e)
// or
GeoTiffReader.readMultiband(path, false, true).crop(e)

// doing the reading and cropping in two steps
// if the file is singleband
val sgt: SinglebandGeoTiff = SinglebandGeoTiff.streaming(path)
// or: val sgt: SinglebandGeoTiff = GeoTiffReader.readSingleband(path, false, true)
sgt.crop(e)
// if the file is multiband
val mgt: MultibandGeoTiff = MultibandGeoTiff.streaming(path)
// or: val mgt: MultibandGeoTiff = GeoTiffReader.readMultiband(path, false, true)
mgt.crop(e)
```
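
A GeoTiff stores its pixels in segments, either horizontal strips or rectangular tiles. A streamed crop therefore only needs the segments that overlap the requested pixel window. The sketch below computes which segments those are. `SegmentsForWindow` is a hypothetical helper that illustrates the idea; it is not GeoTrellis's implementation.

```scala
// Hypothetical helper for illustration; not GeoTrellis's segment logic.
object SegmentsForWindow {
  // Strip layout: strip i holds rows [i * rowsPerStrip, (i + 1) * rowsPerStrip).
  // Returns the strip indices overlapping rows rowMin..rowMax (inclusive).
  def strips(rowMin: Int, rowMax: Int, rowsPerStrip: Int): Range =
    (rowMin / rowsPerStrip) to (rowMax / rowsPerStrip)

  // Tiled layout: returns the (tileCol, tileRow) pairs overlapping the
  // inclusive pixel window colMin..colMax x rowMin..rowMax.
  def tiles(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int,
            tileW: Int, tileH: Int): Seq[(Int, Int)] =
    for {
      tr <- (rowMin / tileH) to (rowMax / tileH)
      tc <- (colMin / tileW) to (colMax / tileW)
    } yield (tc, tr)
}
```
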
- - -
### Conclusion
That takes care of reading local GeoTiff files! It should be said, though, that
what we went over here does not just apply to local files. In fact, reading
GeoTiffs from other sources involves similar parameters that you can use to
achieve the same goals.
