# Reading Cloud-Optimized GeoTIFFs the Hard Way

In this notebook we will explore how one can read Cloud-Optimized GeoTIFFs (COGs) the hard way, i.e., by requesting and parsing byte ranges by hand. We'll query the Earth Search STAC catalog to find an image in COG format, parse the embedded metadata and file structure out of the file, then use that information to read the bytes of an image tile from the file and process them into a usable numpy array. We'll then visualize that array in a slippy map to verify that we got the expected result.

Before we get into it, we have to get some initial stuff out of the way, like imports and some other defs we'll need for later.

In [ ]:
from __future__ import annotations

import dataclasses
import enum
import json
import struct
import urllib.request

from pprint import pprint
from typing import Any, Iterator, Literal, Self, TypedDict

import folium
import numpy as np

from griffine import Affine, Grid
from odc.geo.geom import point
from pystac_client import Client

In [ ]:
# This is a mapping of the TIFF data types to the struct package's format characters
# see https://docs.python.org/3/library/struct.html#format-characters

DATA_TYPES = {
    1: 'B',  # BYTE (uint8)
    2: 's',  # ASCII (char[1])
    3: 'H',  # SHORT (uint16)
    4: 'I',  # LONG (uint32)
    5: 'II',  # RATIONAL (uint32[2])
    6: 'b',  # SBYTE (int8)
    7: 'B',  # UNDEFINED (uint8)
    8: 'h',  # SSHORT (int16)
    9: 'i',  # SLONG (int32)
    10: 'ii',  # SRATIONAL (int32[2])
    11: 'f',  # FLOAT (float32)
    12: 'd',  # DOUBLE (float64)
    13: 'I',  # SUBIFD (uint32)
    # 14: '',
    # 15: '',
    16: 'Q',  # LONG8 (uint64), BigTIFF only
    17: 'q',  # SLONG8 (int64), BigTIFF only
    18: 'Q',  # IFD8 (uint64), BigTIFF only
}

In [ ]:
EPSG_4326 = 'EPSG:4326'

ENDIANNESS = {
    b'MM': '>',  # big endian
    b'II': '<',  # little endian
}

In [ ]:
def binary(_bytes: bytes, join_str: str = ' ') -> str:
    _hex = _bytes.hex()
    return join_str.join([
        '{:08b}'.format(int(_hex[i:i+2], 16))
        for i in range(0, len(_hex), 2)
    ])

def url_read_bytes(url: str, start: int, end: int) -> bytes:
    request = urllib.request.Request(
        url,
        headers={'Range': f'bytes={start}-{end-1}'},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()

## Point of Interest (POI)

To give us something to use for our query, let's define a point of interest.

In [ ]:
# Point of Interest
POI = point(-121.695833, 45.373611, crs=EPSG_4326)

# Let's find out where the point is
point_map = POI.explore(name='point')
point_map

## Querying Earth Search

We'll use pystac-client to search the Earth Search Sentinel 2 L2A collection for a scene intersecting our POI. We'll aim for something with low cloud cover, in the year 2023, and we'll pick the most recent scene that matches these parameters.

**WARNING**: You _can_ change this to fetch scenes from a different collection, STAC API, or not use STAC and just put in an href directly to a COG of your choosing. Doing so is discouraged while in the workshop, as differences in the way the file was created might be impossible to overcome within the time limits of this workshop. Consider leaving this as-is to start, and at a later date, when you have more familiarity parsing TIFFs, you can try a different source.

In [ ]:
client = Client.open("https://earth-search.aws.element84.com/v1")

search = client.search(
    max_items=1,
    collections=['sentinel-2-c1-l2a'],
    intersects=POI,
    datetime='2023/2023',
    query=['eo:cloud_cover<10'],
    sortby=[{"direction": "desc", "field": "properties.datetime"}],
)
item = next(search.items())
print(json.dumps(item.to_dict(), indent=4))

We can throw that item onto our map to see its footprint relative to our POI.

In [ ]:
stac_item_layer = folium.GeoJson(item, name='stac-item-footprint')
point_map.fit_bounds(stac_item_layer.get_bounds())
stac_item_layer.add_to(point_map)

point_map

Notably, the item we retrieved has many different bands, all of them COGs. We only need one for this exercise, so we'll grab the red band's href because that should be a good looking band visually.

In [ ]:
href = item.assets['red'].href
print(href)

## The TIFF file header

The first few bytes of a TIFF file tell us a couple important things we'll need to process the rest of the file. First is the endianness of the file, second is that the file is really a TIFF file.

Note that not all files with a `.tif` extension are a standard TIFF. Most notably, a standard TIFF file uses 32-bit integer offsets within the file to index particular bytes within the file (such as the offset to the first byte in an image tile, for example). Due to the maximum value of such an integer, standard TIFF files have a maximum file size of 4GB (or 2GB for certain archaic implementations that mistakenly used _signed_ integers for offset values). To get around this limitation, the BigTIFF format was developed using 64-bit integer offsets--but this is not a standard TIFF! It is close, but different enough that we're not going to worry about supporting it for our little TIFF "library" that we're going to build.

Actually, we're not going to support a lot of things. This implementation is going to be very specific to what we need for the specific Earth Search COG we selected. That's okay: sometimes a purpose-built implementation is more performant because it doesn't have to handle all the weird edge cases. Or at least that's what we can tell ourselves to feel better as we go along and notice how many conditions we're not handling in a more general way.

### So what's in the header already?

Enough jibber-jabber, let's read out the header. It's the first 4 bytes of a standard TIFF.

In [ ]:
# (See notes: cell0)

### Endianness and the TIFF Byte-Order Mark

TIFF uses the first two bytes of the file to encode the endianness of the file as the "Byte-Order Mark". This presumably enables writers to use the most efficient endianness for their host system, if needed. Readers must support reading both big and little endian files, whereas writers can pick one endianness.

We can consider the endianness of a TIFF file to "describe the order of the bytes in a multi-byte data type". That is, when we have a 16-, 32-, or 64-bit value, the endianness tells us how to interpret which bytes are most- and least-significant. In the case of a 32-bit integer value, that looks like this:

```
16^1
|16^0
|| 16^3
|| |16^2
|| || 16^5
|| || |16^4
|| || || 16^7
|| || || |16^6
C0 00 00 00    = 192  little endian (II)

00 00 00 C0    = 192  big endian (MM)
|| || || |16^0
|| || || 16^1
|| || |16^2
|| || 16^3
|| |16^4
|| 16^5
|16^6
16^7
```

Big endian is encoded as the byte-order mark `MM` (from Motorola processors), and little endian is encoded as `II` (from Intel processors).*

The doubled letter is used to ensure that the binary sequence is the same no matter the endianness. Again, this is because the endianness affects only the order of the bytes in that two-byte word, not the order of the bits within each of the bytes.
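
As a small illustration of the diagram above, the following sketch uses `struct` to decode the same four bytes in both byte orders. The byte strings are the illustrative values from the diagram, not bytes read from our file, and the variable names are made up for this example.

```python
import struct

# The same value, 192, laid out in each byte order (bytes from the diagram above).
little = bytes.fromhex('C0000000')
big = bytes.fromhex('000000C0')

value_le = struct.unpack('<I', little)[0]  # '<' = little endian, 'I' = uint32
value_be = struct.unpack('>I', big)[0]     # '>' = big endian

# Reading with the wrong byte order silently yields a different number.
value_wrong = struct.unpack('>I', little)[0]

print(value_le, value_be, value_wrong)  # 192 192 3221225472

# The byte-order mark decodes identically either way because both bytes match.
mark_le = struct.unpack('<H', b'II')[0]
mark_be = struct.unpack('>H', b'II')[0]
print(mark_le == mark_be)  # True
```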

----

*Naively, the above example might make it seem silly to use little endian notation--big endian seems much more natural, given the way we generally have been taught to read and write numbers. However, when it comes to how most microprocessors and memory operations actually work, little endian has some clear benefits, which have generally led to its dominance outside networking. More of the nuance and complication of endianness is well documented on [its Wikipedia page](https://en.wikipedia.org/wiki/Endianness).

In [ ]:
endianness = ENDIANNESS[header[0:2]]  # We'll need this later, so let's save it into a var now
print(f"Endianness signature: {header[0:2]}; struct endianness format char: '{endianness}'")

To reiterate this point about endianness: we care about endianness so we can ensure we can interpret the bytes in each word in the file in the appropriate order. Beyond that we aren't going to need to worry about endianness.

### Magic number

Many files encode a special number in their first few bytes, which can be used to distinguish that the file is of a given format. Wikipedia has [a big long list of these "magic numbers"](https://en.wikipedia.org/wiki/List_of_file_signatures) for anyone curious. TIFF uses the value `42` for its magic number (and BigTIFF uses `43`).

In [ ]:
# (See notes: cell1)

## The Python `struct` module

Note the use of the `struct` module above. This is a module from the Python stdlib that is super handy when working with binary data, as it is able to pack (Python type to binary representation) and unpack (binary representation to Python type) data values given a specific format. Packing and unpacking allow specifying the endianness of the binary data using the `>` and `<` characters, which designate big and little endianness, respectively.

The `H` in the magic number unpacking indicates that the data type is `uint16`. Some of the data types we'll be working with and their struct format characters include:

| Data type | Format character |
| --------- | ---------------- |
| `uint8`   | `B`              |
| `char[1]` | `s`              |
| `uint16`  | `H`              |
| `uint32`  | `I`              |

Additional format characters are listed in the `DATA_TYPES` dict defined near the beginning of this notebook; also consider reviewing [the `struct` docs](https://docs.python.org/3/library/struct.html) for the full list of format codes available and other details on how to use `struct`.
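
To make packing and unpacking concrete, here is a minimal sketch that round-trips a synthetic 4-byte TIFF header through `struct`. The header bytes are built by hand for illustration (they are not read from our file), and the variable names are assumptions of this example.

```python
import struct

# Pack (Python values -> bytes) a synthetic 4-byte header in each byte order:
# a 2-byte byte-order mark ('2s') followed by the uint16 magic number ('H').
header_le = struct.pack('<2sH', b'II', 42)
header_be = struct.pack('>2sH', b'MM', 42)

print(header_le)  # b'II*\x00'
print(header_be)  # b'MM\x00*'

# Unpack (bytes -> Python values) reverses the process with the same format.
mark, magic = struct.unpack('<2sH', header_le)
print(mark, magic)  # b'II' 42

# calcsize reports how many bytes a format string describes.
print(struct.calcsize('<2sH'))  # 4
```

Note how only the two bytes of the magic number swap places between the two headers; the byte-order mark itself is unaffected.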

## First IFD offset: the next four bytes

Immediately following the TIFF header is the offset to the first image file directory (IFD) in the file. In a standard TIFF this offset is a 32-bit unsigned integer, as previously alluded to. We can read in and view those bytes:

In [ ]:
# (See notes: cell2)

Though that's not super useful until we unpack those bytes into an integer (here using `I` because the offset is a `uint32` value):

In [ ]:
# (See notes: cell3)

## Parsing the Image File Directory

The Image File Directory (IFD) is a data structure composed of entries called tags (hence the name "Tag Image File Format"). The IFD doesn't start with the first tag entry, however. It begins with a 2-byte `uint16` value indicating the number of tags within the IFD. This value enables us, along with the IFD offset within the file, to read the entire sequence of tag bytes via `file_bytes[ifd_offset + 2:ifd_offset + 2 + (tags_count * tag_size)]`.

### Tag structure

In a standard TIFF, tags are a 12-byte sequence (so `tag_size` above is 12 bytes) of the following structure:

| Tag Bytes | Tag field name  | Field data type |
| --------- | --------------- | --------------- |
| 0 - 1     | `code`          | `uint16`        |
| 2 - 3     | `data_type`     | `uint16`        |
| 4 - 7     | `count`         | `uint32`        |
| 8 - 11    | `value`         | `char[4]`       |

In the case of BigTIFF files, each tag is a 20-byte sequence where the `count` and `value` are of type `uint64`.

The tag `code` field gives us a way to find the meaning of the tag `value`, as the `code` is an integer that maps to the tag name. The Library of Congress has [a handy table](https://www.loc.gov/preservation/digital/formats/content/tiff_tags.shtml) we can use to look up the tags by their codes.

#### Tag data types

The tag `data_type` is also an integer value, in this case mapping to the data type we can use to interpret `value` per the following table:

| `data_type` | Type Name | Data type   |
| ----------- | --------- | ----------- |
| 1           | BYTE      | `uint8`     |
| 2           | ASCII     | `char[1]`   |
| 3           | SHORT     | `uint16`    |
| 4           | LONG      | `uint32`    |
| 5           | RATIONAL  | `uint32[2]` |
| 6           | SBYTE     | `int8`      |
| 7           | UNDEFINED | `uint8`     |
| 8           | SSHORT    | `int16`     |
| 9           | SLONG     | `int32`     |
| 10          | SRATIONAL | `int32[2]`  |
| 11          | FLOAT     | `float32`   |
| 12          | DOUBLE    | `float64`   |
| 13          | SUBIFD    | `uint32`    |
| 14          | n/a       | n/a         |
| 15          | n/a       | n/a         |
| 16          | LONG8     | `uint64`    |
| 17          | SLONG8    | `int64`     |
| 18          | IFD8      | `uint64`    |

(Data types 16, 17, and 18 are specific to BigTIFF, so we won't encounter them in a standard TIFF file.)

The `count` field tells us how many values of the listed `data_type` make up the `value` of the tag. Note that even a `count` of just one for a `data_type` of, say, 5 (RATIONAL, i.e., two `uint32`s) would not fit in `value` in a standard TIFF file, as `value` itself is only four bytes long. Similarly, a `count` greater than 4 with a `data_type` of 1 (`uint8`) would also be larger than can fit in `value`.

In such cases where `count * len_in_bytes(data_type) > 4`, `value` itself is not actually the tag value but an offset to the actual value within the file. The length of that value is given by the previous expression `count * len_in_bytes(data_type)`. Thus, to get the actual value we can read `file_bytes[value:value + (count * len_in_bytes(data_type))]`.
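
The inline-versus-offset rule can be sketched on a couple of hand-built tags. Everything here is illustrative: `TYPE_SIZES` and `describe_tag` are hypothetical helpers for this example only, and the tag values (including the offset of 200) are made up rather than taken from our file.

```python
import struct

# Byte size of each data type used in this sketch (a subset of the table above).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}

def describe_tag(tag_bytes: bytes, endianness: str = '<') -> dict:
    """Split a 12-byte tag and report whether `value` is inline or an offset."""
    code, data_type, count, value = struct.unpack(f'{endianness}HHI4s', tag_bytes)
    size = count * TYPE_SIZES[data_type]
    if size <= 4:
        # The data fits, so the leading `size` bytes of `value` are the data itself.
        return {'code': code, 'size': size, 'inline': value[:size]}
    # Otherwise `value` is a uint32 offset to the data elsewhere in the file.
    offset = struct.unpack(f'{endianness}I', value)[0]
    return {'code': code, 'size': size, 'byte_range': (offset, offset + size)}

# Made-up tags: ImageWidth (256) as one SHORT, and one RATIONAL stored at offset 200.
width_tag = struct.pack('<HHIHH', 256, 3, 1, 10980, 0)
rational_tag = struct.pack('<HHII', 282, 5, 1, 200)

print(describe_tag(width_tag))
print(describe_tag(rational_tag))
```

For the first tag only two of the four `value` bytes carry data; for the second, the reader would still have to fetch bytes 200 through 207 to get the actual value.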

The IFD doesn't end with the last tag either. Each IFD contains a 4-byte (`uint32`) offset to the next IFD in the file (or 8-byte `uint64` in the case of BigTIFF). In the event an IFD is the last one in the file, it will have a value of 0 for its next IFD offset. As a result, it should be possible to build a map of the complete contents of a TIFF by iterating through its IFDs and parsing their tags into some appropriate hierarchical data structure (TIFF --< IFDs --< Image segments).
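
Following the chain of IFDs might look something like the sketch below. It runs against a tiny synthetic "file" held in memory rather than our remote COG, and `iter_ifds`, `fake_tiff`, and the tag contents are all hypothetical names and values for this example. It also assumes a standard TIFF (12-byte tags, `uint32` offsets).

```python
import struct

TAG_SIZE = 12  # standard TIFF; BigTIFF would need 20 and uint64 offsets

def iter_ifds(data: bytes, endianness: str):
    """Yield (ifd_offset, tags_count) for every IFD in an in-memory TIFF."""
    # The first IFD offset lives in bytes 4-7, right after the header.
    offset = struct.unpack(f'{endianness}I', data[4:8])[0]
    while offset != 0:
        tags_count = struct.unpack(f'{endianness}H', data[offset:offset + 2])[0]
        yield offset, tags_count
        # The next-IFD offset sits immediately after the last tag.
        tags_end = offset + 2 + tags_count * TAG_SIZE
        offset = struct.unpack(f'{endianness}I', data[tags_end:tags_end + 4])[0]

# Synthetic file: header plus two IFDs, each holding one made-up tag.
fake_tag = struct.pack('<HHIHH', 256, 3, 1, 512, 0)
fake_header = struct.pack('<2sHI', b'II', 42, 8)                 # first IFD at byte 8
ifd_1 = struct.pack('<H', 1) + fake_tag + struct.pack('<I', 26)  # next IFD at byte 26
ifd_2 = struct.pack('<H', 1) + fake_tag + struct.pack('<I', 0)   # 0 marks the last IFD
fake_tiff = fake_header + ifd_1 + ifd_2

print(list(iter_ifds(fake_tiff, '<')))  # [(8, 1), (26, 1)]
```

Against a remote file, each slice of `data` would instead become a byte-range request.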

### Finding the tag count and reading the tag bytes

As mentioned, an IFD starts with a 2-byte `uint16` value indicating its number of tags. If we have an IFD's offset (`ifd_offset`) within the file--which for the first IFD we know is given to us as the first bytes in the file immediately following the TIFF header--then we also know that IFD's tag offset (`tags_start`) is given by `ifd_offset + 2`.

Parsing the tag count (`tags_count`) should simply be a matter of using `struct.unpack` to unpack the two tag count bytes into an integer (struct format char `H` for `uint16`). We need to make sure we use the endianness indicated in the file header. `<` is little endian in `struct.unpack`, whereas `>` is big endian. Looking back, the proper endian character should have been saved into the `endianness` var for us back when we were inspecting the header bytes.

In [ ]:
# (See notes: cell4)

If we know the tag count and the tag size (12 bytes for TIFF, 20 for BigTIFF), then we can find the total number of bytes in the IFD's tags by `tags_count * tag_size`. From this we should be able to find the end of the tags with `tags_end = tags_start + (tags_count * tag_size)`, allowing us to read the tag bytes (`tags_bytes`) from the file.

In [ ]:
# (See notes: cell5)

It's also important to note that we can use `tags_end` to know the offset of the next IFD offset, which is a 4-byte value we can unpack into a `uint32` (for a standard TIFF; it's a `uint64` for a BigTIFF). We won't use this for anything in this notebook, but it is good to know in case you want to take parsing further and go on to the other IFDs in this file.

In [ ]:
# (See notes: cell6)

### Parsing each tag

To parse each tag we need to find a way to split each tag's bytes out of the larger bytes string. Python gives us many valid ways of doing this. Let's try using a `for` loop to split the tags bytes to see what each tag looks like.

In [ ]:
# (See notes: cell7)

#### Unpacking the tag fields

The above shows us we can easily extract each tag's bytes, but we next need to use `struct.unpack` to extract the tag's `code`, `data_type`, `count`, and `value` binary values into Python types. Remember that `code` and `data_type` are `uint16` values, which map to the struct `H` format. Look up the proper struct format values for `count` and `value` knowing what you know about the data types of those tag fields and verify if the format passed into `struct.unpack` in the example here is correct (feel free to consult the `DATA_TYPES` dict above or the struct docs directly).

For variety, this example implementation uses a `while` loop to extract the tag bytes. Each tag's fields are added into a dictionary indexed by the tag `code` to facilitate easy access in later code.

In [ ]:
tags = {}
tag_index = 0

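while tag_index < tags_count:
    # Slice out this tag's bytes, then advance to the next tag
    # (slicing never raises IndexError, so no try/except is needed)
    tag_bytes = tags_bytes[(tag_size * tag_index) : (tag_size * (tag_index + 1))]
    tag_index += 1

    # code and data_type are uint16 (H), count is uint32 (I), value is 4 raw bytes (4s)
    code, data_type, count, value = struct.unpack(f'{endianness}HHI4s', tag_bytes)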
    tags[code] = {
        'data_type': data_type,
        'count': count,
        'value': value,
    }

tags

#### Understanding tag codes

Now that we have TIFF tag values to look at, it would be good to mention the [Library of Congress' guide to TIFF Tags](https://www.loc.gov/preservation/digital/formats/content/tiff_tags.shtml) again. We can use that lookup table to interpret each of the integer codes in a meaningful way. Note that some codes we will see in every file, while others may be specific to the way a file was encoded or the type of data it contains. Further, a number of the tags are specific to the GeoTIFF format and are required for such files, while some are used for metadata by GDAL and can generally be expected in a GeoTIFF (though not always of course).

For example, we should always expect to see 256, 257, 258, and 259 (and others, these are just good examples):

| Code | Tag Name      | Tag Description              |
| ---- | ------------- | ---------------------------- |
| 256  | ImageWidth    | Number of image columns      |
| 257  | ImageLength   | Number of image rows         |
| 258  | BitsPerSample | Number of bits in each sample (band value) of a pixel |
| 259  | Compression   | Integer mapping to compression algorithm used for each image segment |

#### Unpacking the tag values

Recalling the earlier explanation about tag data types, counts, and values, we know that unpacking the tag values will not be the same for each tag given the differences in those three aforementioned tag fields across each of our different tags. For some tags that have a single count of a shorter data type we can unpack the tag `value` directly. But for longer values we'll have to use the tag `value` as an offset into the file to read the actual bytes to unpack.

We'll start with one of these easier examples and unpack the image size tags 256 and 257. Check the data types for these tags. What are the struct format chars for each? Will we need to unpack all four bytes of the `value` for either of these tags?

In [ ]:
# (See notes: cell8)

In the cases where the tag `value`'s four bytes are not sufficient to contain the whole tag value, parsing is a bit more complex. We not only need to find the struct format character (`struct_dtype`) and size for the tag's data type, but then we need to:

* use the data type size and the tag `count` to calculate how many bytes we need to read (`size`)
* unpack the `value` to get the actual value's byte offset in the file (`offset`)
* combine `size` and `offset` to get the byte range and read that out of the file (giving us `values`)
* build the struct format string (`endianness + (struct_dtype * count)`) then unpack `values`

We'll preview this here with an example unpacking the tile offsets tag (324). The values we get out of this (`tile_offsets`) are the byte offsets for each image segment (tile) in the image represented by this IFD. We will be able to use these offsets in the next section to read the specific tile containing our POI (though we'll have to unpack the rest of our tags and do a bit of math to figure out which one and what to do with the bytes).

In [ ]:
tag = tags[324]
struct_dtype = DATA_TYPES[tag['data_type']]
size = tag['count'] * struct.calcsize(struct_dtype)
offset = struct.unpack(f'{endianness}I', tag['value'])[0]
values = url_read_bytes(href, offset, offset+size)
tile_offsets = struct.unpack(endianness + (struct_dtype * tag['count']), values)

for idx, tile_offset in enumerate(tile_offsets):
    print(f"Offset tile {idx}: {tile_offset}")

### Questions

* Refer back to the STAC item and see if the `file` STAC extension is in use. Is the file size listed for the COG asset you are examining, and if so how close to the end of the file do these tiles appear to get?
* Can you use the unpacking examples to create a generalized approach to unpacking the tag values and apply that to the rest of the tags in the IFD? The next section will have you unpack all the tags, so finding a quick and efficient way to do this might be helpful.

## Reading a tile from the image

Reading the tile intersecting our POI will require most of our tags to be unpacked and decoded. Refer back to the keys of the `tags` dictionary for the list of all tag codes in our TIFF's first IFD and the above documentation on the tag codes. Unpack each tag's value into the corresponding variable name in the list below:

* `image_width`
* `image_length`
* `bits_per_sample`
* `compression`
* `samples_per_pixel`
* `predictor`
* `tile_width`
* `tile_length`
* `tile_offsets`
* `tile_byte_counts`
* `sample_format`
* `pixel_scale`
* `tie_point`
* `geo_key_directory`
* `geo_double_params` (if defined, else `tuple()`)
* `geo_ascii_params` (if defined, else `b''`)
* `gdal_metadata`
* `nodata_value`

**NOTE**: if you have chosen a different COG source than the default Sentinel 2 red band from Earth Search, you might need to consider additional tags and processing to get this part to work. TIFF is an extremely flexible format, but this means it has many different cases that need to be handled to be able to read any arbitrary file (which also means some atypical features supported by one implementation might lead to incompatibilities with other implementations).

### Let's define a function to make unpacking the tags easier

We have a lot of tags to unpack. For the sake of time, here's a function that we can use to make unpacking all the tags easier.

In [ ]:
class TagDict(TypedDict):
    data_type: int
    count: int
    value: bytes

type TagsDict = dict[int, TagDict]

def unpack_tag(tag: TagDict, endianness: Literal['>', '<']) -> Any:
    struct_dtype = DATA_TYPES[tag['data_type']]
    size = tag['count'] * struct.calcsize(struct_dtype)
    value = tag['value']

    # If the data doesn't fit in the 4-byte value field, the field
    # holds an offset to the data elsewhere in the file instead
    if size > len(value):
        offset = struct.unpack(endianness + 'I', value)[0]
        value = url_read_bytes(href, offset, offset+size)

    unpacked = struct.unpack(endianness + (struct_dtype * tag['count']), value[:size])

    # if data_type == 2 (ASCII) we want to join the chars together
    if tag['data_type'] == 2:
        return b''.join(unpacked)
    elif tag['count'] == 1:
        return unpacked[0]
    return unpacked

Let's try that out on the tags we unpacked above and see how this works!

In [ ]:
# (See notes: cell9)

In [ ]:
# (See notes: cell10)

Now let's use that function to unpack all our tags into the corresponding variable.

In [ ]:
image_width = unpack_tag(tags[256], endianness)
image_length = unpack_tag(tags[257], endianness)
bits_per_sample = unpack_tag(tags[258], endianness)
compression = unpack_tag(tags[259], endianness)
samples_per_pixel = unpack_tag(tags[277], endianness)
predictor = unpack_tag(tags[317], endianness)
tile_width = unpack_tag(tags[322], endianness)
tile_length = unpack_tag(tags[323], endianness)
tile_offsets = unpack_tag(tags[324], endianness)
tile_byte_counts = unpack_tag(tags[325], endianness)
sample_format = unpack_tag(tags[339], endianness)
pixel_scale = unpack_tag(tags[33550], endianness)
tie_point = unpack_tag(tags[33922], endianness)
geo_key_directory = unpack_tag(tags[34735], endianness)
geo_double_params = tuple()  # we don't have any double params in this image
geo_ascii_params = unpack_tag(tags[34737], endianness)
gdal_metadata = unpack_tag(tags[42112], endianness)
original_nodata_value = unpack_tag(tags[42113], endianness)

### Interpreting tag values

Many of the tags are straightforward. Some are enumerations which require an external lookup table. Others require cross-references between their values to make sense of the contents. Let's take a look at the few that are not straightforward to understand.

#### Compression

The `compression` tag value represents one of an enumerated set of possible compression methods. Continuing with the spirit of needing to consult various external lookup tables, the Wikipedia entry for TIFF has [a table of possible compression formats and their integer values](https://en.wikipedia.org/wiki/TIFF#TIFF_Compression_Tag), which is a great resource for understanding the meaning of the different possible values.

**Question**: What is the value of the `compression` tag and what compression scheme does it indicate?

In [ ]:
# (See notes: cell11)

#### Sample format

The `sample_format` tag value represents one of an enumerated set of possible data types. Those values map as follows:

| Format Value | Data Type |
| ------------ | --------- |
| 1 | `uint`   |
| 2 | `int`    |
| 3 | `float`  |
| 4 | untyped  |
| 5 | `cint`   |
| 6 | `cfloat` |

The bit depth of the specified format is dependent on the value of the `bits_per_sample` tag.
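
Combining the two tags into a single data type name could be sketched as follows. `SAMPLE_FORMAT_NAMES` and `dtype_name` are hypothetical names for this example, only the first three sample formats are handled, and the inputs shown are arbitrary rather than the values from our file.

```python
# Name prefixes for the three most common sample formats (from the table above).
SAMPLE_FORMAT_NAMES = {1: 'uint', 2: 'int', 3: 'float'}

def dtype_name(sample_format: int, bits_per_sample: int) -> str:
    """Build a data type name such as 'float32' from the two tag values."""
    if sample_format not in SAMPLE_FORMAT_NAMES:
        raise ValueError(f'Unsupported sample format: {sample_format}')
    return f'{SAMPLE_FORMAT_NAMES[sample_format]}{bits_per_sample}'

print(dtype_name(3, 32))  # float32
print(dtype_name(2, 8))   # int8
```

The resulting names follow the same convention numpy uses for its dtypes, so a string like `'float32'` can be passed to `np.dtype()` when it comes time to build an array.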

In [ ]:
# (See notes: cell12)

**Question**: What does the value of the `sample_format` tag indicate with regards to the data type and length of the cell values in this image (e.g., `uint32`, `int8`, `float32`, etc.)? What does this data type map to in the struct format characters?

#### Pixel scale and tie point

The `pixel_scale` tag is part of the GeoTIFF specification. It is a three-tuple where each value represents one dimension of the pixel scale, specifically the x, y, and z scales, respectively. In other words, each of the scale values represents the change in coordinate from one pixel origin to the next along the specified dimension. The units of each scale value are the same as those specified in the coordinate reference system (CRS; we'll see this when reviewing the `geo_key_directory` below).

The `tie_point` tag is again a member of the GeoTIFF specification. It defines a set of coordinates in the image space and their mappings to coordinates in the model space as a list of six-tuples. The first three tuple values are the image space x, y, and z coordinates, respectively. The latter three tuple values are the model space x, y, and z, respectively. The model space is perhaps best understood to be the coordinate reference system defined for the image.

What is most notable for us about these two tags is that we can use them to build a simple affine transform to convert between image space and model space. Most GeoTIFF files, like ours, use such an affine transform, which means the following is frequently true:

* the set of coordinate mappings has length one, i.e., only one point is mapped from image space to model space
* the image space coordinates of that point are (0, 0, 0), which effectively allows us to consider the model space coordinates to be the geographic point represented by the image origin

The [GeoTIFF spec docs detailing how to use these values are here](http://geotiff.maptools.org/spec/geotiff2.6.html). Note that the `pixel_scale` tag is optional; in some cases a `ModelTransformationTag` is used instead to encode the affine transformation matrix into the file, such as when needing to express grid rotation. Sometimes neither of these tags are present, such as when the transformation is not affine, which is typically when the `tie_point` tag would have multiple points describing a warp mesh over the image. Consult the docs to fully understand the interactions of these three tags and how to interpret their values outside this simple affine case.

Why does all of this matter for us? We can use the `pixel_scale` and `tie_point` tag values to construct an affine transform object, which we'll use to perform coordinate transformations between model space (the image CRS) and image space (pixel coordinates). We need to be able to do this to find what pixel in the image contains our POI, so we can find which image tile to read.

In [ ]:
transform = Affine(
    # w-e pixel resolution / pixel width
    pixel_scale[0],
    # row rotation (typically zero)
    0,
    # x-coordinate of the upper-left corner of the upper-left pixel (origin)
    tie_point[3] - (pixel_scale[0] * tie_point[0]),
    # column rotation (typically zero)
    0,
    # n-s pixel resolution / pixel height (negative value for a north-up image)
    -pixel_scale[1],
    # y-coordinate of the upper-left corner of the upper-left pixel (origin)
    tie_point[4] - (-pixel_scale[1] * tie_point[1]),
)
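
To see what such a transform does without relying on any library, here is a minimal sketch of the same arithmetic for a north-up image with no rotation. The scale and origin values are made up for illustration, and `model_to_pixel` and `pixel_to_model` are hypothetical helpers, not part of `griffine`. The input point must already be in the image CRS.

```python
import math

# Hypothetical north-up grid: 10-unit pixels with the origin at (600000, 5100000).
scale_x, scale_y = 10.0, 10.0
origin_x, origin_y = 600000.0, 5100000.0

def model_to_pixel(x: float, y: float) -> tuple[int, int]:
    """Return the (column, row) of the pixel containing model point (x, y)."""
    col = math.floor((x - origin_x) / scale_x)
    row = math.floor((origin_y - y) / scale_y)  # rows count downward from the top
    return col, row

def pixel_to_model(col: int, row: int) -> tuple[float, float]:
    """Return the model coordinates of the upper-left corner of a pixel."""
    return origin_x + col * scale_x, origin_y - row * scale_y

col, row = model_to_pixel(600125.0, 5099874.0)
print(col, row)                  # 12 12
print(pixel_to_model(col, row))  # (600120.0, 5099880.0)
```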

#### Geo key directory and the params

Another set of GeoTIFF-spec tags, `geo_key_directory`, `geo_double_params`, and `geo_ascii_params` represent a collection of geospatial information we need to interpret the data in a spatially-aware way. For example, such important information as the CRS is stored amongst these tags. The [GeoTIFF spec docs also document these tags and their interactions](http://geotiff.maptools.org/spec/geotiff2.4.html).

In short, `geo_double_params` and `geo_ascii_params` are actually sets of parameters that can be used to fill in information that cannot be represented directly in the `geo_key_directory` due to data type differences (the latter is a `uint16` tuple whereas the former two are tuples of double precision floats and ASCII-encoded strings, respectively). The `geo_key_directory` is a collection of four-tuples (potentially with some additional trailing values), the first of which is a header that documents the tuples that follow. It has the following 8-byte structure:

```
Header = (KeyDirectoryVersion, KeyRevision, MinorRevision, NumberOfKeys)
```

For our purposes, the important piece here is the number of keys: we need to know how many keys are in the directory to be able to work out the offset to any additional values in the directory structure we might need to fill in directory entries that have multiple `uint16` values.

After the header, each of the keys in the directory has the 8-byte structure:

```
KeyEntry = (KeyID, TIFFTagLocation, Count, Value_Offset)
```

The `KeyID` here is just like our TIFF tags: it is an identifier that can be used with an external lookup table to interpret the meaning of the key's value. The `TIFFTagLocation` is used to point to a TIFF tag that contains the value for this key: if the value is directly embedded in the key (in the place of `Value_Offset`) then the location is `0` and this key's value is of type `uint16`. In cases where the value is not directly embedded in the key the location will have the value of the tag code that contains the value. The `Value_Offset` and `Count` can then be used to extract the set of values pertaining to this key from that tag's data. The data type of the key value is given by the source tag's data type.

For example, if we have a key entry with the values `(1024, 0, 1, 1)` we know that the key ID is `1024`, the location of `0` means the value is embedded in the key entry, and that means our count is necessarily `1` and we can interpret the `Value_Offset` as the key value, in this case `1`.

A more complex example could be like `(2049, 34737, 7, 22)`: in this case we have a non-zero location, so we have to read the values--in this case seven values per the count value--from a separate tag. The location of `34737` corresponds to the `geo_ascii_params` tag, which not only tells us where to get the values for this key, but also their data type. If we have a value of the `34737` tag of `b'WGS 84 / UTM zone 10N|WGS 84|\x00'`, then taking 7 bytes from position 22 we end up with `b'WGS 84|'`. The `|` is intended to be converted into a null byte to terminate the extracted string; in Python it is easy enough to read one less byte than the key's count for ASCII-type keys as string termination is handled for us.

We can use the above explanation to write a general-purpose function to extract the keys into a dict, like we originally did with the IFD tags, and then we can use it to extract our keys. Refer to the [GeoTIFF document "Geocoding Raster Data"](http://geotiff.maptools.org/spec/geotiff2.7.html#2.7) for an explanation of the key IDs and how to understand their meanings. These keys are critical for finding the CRS of the file via the information presented in that documentation.

In [ ]:
def extract_geo_keys(
    key_directory: tuple[int, ...],
    double_params: tuple[float, ...],
    ascii_params: bytes,
) -> dict[int, int | bytes | tuple[int | float, ...]]:
    keys: dict[int, int | bytes | tuple[int | float, ...]] = {}

    try:
        # The 4-value header ends with the number of keys in the directory
        _, _, _, key_count = key_directory[0:4]
        for key_index in range(key_count):
            # Each key entry is 4 values long; entries start after the header
            offset = (4 * key_index) + 4
            key_id, location, count, value_offset = key_directory[offset:offset + 4]

            if location == 0:
                # value is embedded directly in the key entry
                keys[key_id] = value_offset
            elif location == 34735:
                keys[key_id] = key_directory[value_offset:value_offset + count]
            elif location == 34736:
                keys[key_id] = double_params[value_offset:value_offset + count]
            elif location == 34737:
                # read one less byte to drop the `|` string terminator
                keys[key_id] = ascii_params[value_offset: value_offset + (count - 1)]
            else:
                raise ValueError(f'Unknown location: {location}')
    except Exception as e:
        # chain the original error so the real cause is not hidden
        raise ValueError('Could not parse geo keys') from e

    return keys

In [ ]:
geo_keys = extract_geo_keys(geo_key_directory, geo_double_params, geo_ascii_params)
geo_keys

**Question**: From the extracted geo key values, can you find the CRS and its EPSG code?

#### Nodata

The GDAL nodata value is stored in GeoTIFFs as a null-terminated ASCII string, for some reason (likely to ensure it can be parsed with a consistent data type, in particular because the nodata value needs to be interpreted with the data type of the TIFF data, which might not map directly to the TIFF-defined data types). Because of this, the `nodata` value needs some additional processing before we can use it.

Specifically, we need to clip the final character off, then we need to cast it to an appropriate data type (as given by `sample_format` and `bits_per_sample`). For example, if we have an integer data type for our image data then we need to do something like `nodata_value = int(original_nodata_value[:-1])`.

In [ ]:
# We need to clip the string terminator off the nodata value
# before coercing to an int (because it is stored as ASCII)
nodata_value = int(original_nodata_value[:-1])
nodata_value

## Reading an image tile

Now that we have all our metadata parsed out we can focus on using that metadata to read a tile. We have our POI though, so we presumably want to find the tile containing said POI, rather than some other arbitrary tile. To do so we'll need to do some math.

### Transforming our POI

First, we need to find the point coordinates of our POI in the same reference system as the image. We can use the `to_crs` method on our POI with our image's CRS as parsed from the geo keys above.

In [ ]:
image_crs = f"EPSG:{geo_keys[3072]}"
POI_proj = POI.to_crs(image_crs)
print(f'x={POI_proj.geom.x}, y={POI_proj.geom.y}')

Check the above output. Does it make sense given what we know about the origin of our image, and the relation of our POI to the image footprint?

### Finding the pixel coordinates of our POI

Now that we have our POI geographic coordinates in the same CRS as our image, we can use the image's affine transform to convert our POI's geographic coordinates into pixel coordinates in our image's pixel grid. The author of this notebook has created a small library to make tasks involving raster grids with affine transforms easier, called `griffine`. We can make an instance of a `griffine.Grid` and attach our transform to it, which will help us in a number of operations to come. Then we can use the grid to find the cell containing our point.

In [ ]:
grid = Grid(rows=image_length, cols=image_width).add_transform(transform)
cell = grid.point_to_cell(POI_proj)
print(f'row={cell.row}, col={cell.col}')

Again, check the above output. Does it make sense given what we know about the origin of our image, its grid, and the relation of our POI to the image footprint?
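One way to sanity check the result is to do the arithmetic by hand. The sketch below is not how `griffine` is implemented; it assumes a north-up image with no rotation, and the origin, pixel size, and point coordinates are made-up values for illustration only.

```python
import math


def point_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Return (row, col) of the pixel containing (x, y) in a north-up grid.

    `pixel_height` is given as a positive number. Rows count downward from
    the top-left origin, which is why y is subtracted from origin_y.
    """
    col = math.floor((x - origin_x) / pixel_width)
    row = math.floor((origin_y - y) / pixel_height)
    return row, col


# Hypothetical 10 m grid with its top-left corner at (500000, 5100000)
example_row, example_col = point_to_pixel(
    512345.0, 5079995.0, 500000.0, 5100000.0, 10.0, 10.0,
)
print(f'row={example_row}, col={example_col}')
```

Substituting the values from our own `transform` and `POI_proj` should give the same row and column that `grid.point_to_cell` reported.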

### Finding our tile coordinates

To work out which tile we need to read we need to convert our pixel coordinates into tile coordinates. We can tile our `grid` object and get the tile containing our POI.

In [ ]:
tile_grid = grid.tile_via(Grid(rows=tile_length, cols=tile_width))
tile = tile_grid.point_to_tile(POI_proj)
print(f'tile_row={tile.row}, tile_col={tile.col}')

One last time: check the above output. Does it make sense given what we know about the structure of our image, and the relation of our POI to the image footprint?
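The tile coordinates can also be checked by hand: because tiles are laid out on a regular grid starting at the image origin, integer division of the pixel coordinates by the tile dimensions is enough. A minimal sketch with made-up pixel coordinates and tile size (the `example_` names are hypothetical):

```python
def pixel_to_tile(row, col, tile_length, tile_width):
    """Return (tile_row, tile_col) of the tile containing pixel (row, col)."""
    return row // tile_length, col // tile_width


# Hypothetical pixel (2000, 1234) in an image with 512x512 tiles
example_tile_row, example_tile_col = pixel_to_tile(2000, 1234, 512, 512)
print(f'tile_row={example_tile_row}, tile_col={example_tile_col}')
```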

### Finding our tile offset and byte length

The tag values of `tile_offsets` and `tile_byte_counts` give us the offset and byte length of each tile. To retrieve them for a given tile we need to know the tile's index within those tuples. We use our `tile` object with our `tile_grid` to find the tile's linear index within the grid.

In [ ]:
tile_index = tile_grid.linear_index(tile)
tile_offset = tile_offsets[tile_index]
tile_byte_length = tile_byte_counts[tile_index]
print(f'tile ({tile.row}, {tile.col}) has index {tile_index} and is at offset {tile_offset} with length {tile_byte_length}')
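The linear index is simple enough to compute by hand, because TIFF orders tiles left to right, then top to bottom. The sketch below assumes a single-band image (or one where all samples of a pixel are stored together); the function name and numbers are hypothetical.

```python
import math


def tile_linear_index(tile_row, tile_col, image_width, tile_width):
    """Return the position of a tile in the tile_offsets/tile_byte_counts tuples."""
    # Partial tiles at the right edge still count, hence the ceil
    tiles_across = math.ceil(image_width / tile_width)
    return (tile_row * tiles_across) + tile_col


# Hypothetical 10980 px wide image with 1024 px wide tiles: 11 tiles per row
example_index = tile_linear_index(3, 2, 10980, 1024)
print(example_index)
```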

### Actually reading the tile

Now that we know where the tile bytes are in the file we can read them, decompress them (per the specified `compression`), then unpack them into a numpy array.

In [ ]:
tile_bytes = url_read_bytes(href, tile_offset, tile_offset + tile_byte_length)

In [ ]:
# Per our `compression` tag we know the data is compressed using `DEFLATE`,
# which can be decompressed using the stdlib `zlib` module.
import zlib
tile_extracted = zlib.decompress(tile_bytes, 0)
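If we wanted to guard against files that use a different compression scheme, we could dispatch on the value of the `compression` tag. This is a rough sketch with a hypothetical helper name; it only handles uncompressed data (`1`) and the two DEFLATE codes (`8` and the older `32946`), and refuses anything else rather than guessing.

```python
import zlib


def decompress_tile(data: bytes, compression: int) -> bytes:
    """Decompress tile bytes according to the TIFF Compression tag value."""
    if compression == 1:
        # no compression: the bytes are already the pixel data
        return data
    if compression in (8, 32946):
        # both codes identify DEFLATE data in a zlib container
        return zlib.decompress(data)
    raise NotImplementedError(f'Unsupported compression: {compression}')


# Round trip some stand-in bytes to show the helper in action
example_payload = bytes(range(256)) * 4
example_compressed = zlib.compress(example_payload)
print(decompress_tile(example_compressed, 8) == example_payload)
```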

In [ ]:
# Our data type is `uint16` as we previously found; using our
# lookup table we know that maps to a struct format char of `H`.
# That dtype is 2-bytes in length, so we know our tile data
# contains `len(tile_extracted) // 2` pixel values.
struct_dtype = 'H'
tile_array = np.array(
    struct.unpack(
        endianness + (struct_dtype * (len(tile_extracted) // struct.calcsize(struct_dtype))),
        tile_extracted,
    ),
    dtype=np.uint16
).reshape(tile_length, tile_width)  # numpy shape is (rows, cols)
tile_array

In [ ]:
# Or, more easily but perhaps too abstractly
# (note that `np.uint16` assumes the file's byte order matches the machine's)
np.frombuffer(tile_extracted, dtype=np.uint16).reshape(tile_length, tile_width)
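One thing `struct.unpack` made explicit that `np.frombuffer` hides is byte order. With numpy we can state it in the dtype string instead. The sketch below uses a handful of hand-packed values as a stand-in for decompressed tile bytes; none of these names come from the notebook.

```python
import struct

import numpy as np

# Four uint16 values packed in each byte order
example_values = (1, 2, 3, 65535)
little_bytes = struct.pack('<4H', *example_values)
big_bytes = struct.pack('>4H', *example_values)

# '<u2' and '>u2' mean little and big endian unsigned 2-byte integers
from_little = np.frombuffer(little_bytes, dtype='<u2')
from_big = np.frombuffer(big_bytes, dtype='>u2')

# Both decode to the same numbers; reshape takes (rows, cols)
print(from_little.reshape(2, 2))
print(np.array_equal(from_little, from_big))
```

Assuming `endianness` holds `'<'` or `'>'` as it does earlier in this notebook, a dtype of `endianness + 'u2'` would read the tile correctly regardless of the machine's native byte order.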

#### A note on `predictor`

The `predictor` tag is used when a filtering step is applied prior to compression. For geospatial data, values of `2` and `3` are common: `2` suits integer data and stores the horizontal difference between neighboring cells, while `3` is used for floating point data. In other words, if a predictor is set and it isn't `1` (indicating no predictor), then we can't just decompress the data and start using it. The data will require a processing step to reverse the prediction operation and restore the original values.

In [ ]:
print(predictor)

In [ ]:
# as our data is using predictor `2` (horizontal difference),
# we'll need to reverse the difference using a cumulative sum
tile_array_unfiltered = np.cumsum(tile_array, axis=1, dtype=tile_array.dtype)
tile_array_unfiltered

#### Scale and offset

We have yet one more operation we need to do to our data array to make it usable. It turns out the stored data format `uint16` isn't actually the real data format. Instead, limited-precision floats have been mapped to that data type by using a specified scaling factor. Moreover, due to the Sentinel 2 L2 atmospheric correction process, it's possible to have negative values in the data, which must be accounted for by shifting the uint values via a specified offset.

Both the `scale` and `offset` values are contained within the `gdal_metadata` tag. The GDAL metadata format is, sadly, XML, though for our purposes it is readable enough that we don't need to worry about parsing complexities: we can just print out the value of that tag and read off the values we need.

In [ ]:
gdal_metadata

In [ ]:
# fill in the value_offset and value_scale from the above
value_offset = -0.1
value_scale = 0.0001
tile_array_scaled_offset = (tile_array_unfiltered * value_scale) + value_offset
tile_array_scaled_offset

## Visualizing the tile on our map

Now that we have our tile data, it would be great to see it alongside our POI to visually confirm we got the tile we expected. It turns out Folium has a kinda hokey way of converting numpy arrays to PNGs for display on the map, which we can leverage here to visually verify the data we've read for our tile and the operations we've done on it. It's not perfect, as it assumes our data is aligned to the Mercator grid (which it probably isn't), but it's close enough for us to take a look.

We just need the tile's min and max latitude and longitude (in EPSG:4326 coordinates) so we can tell Folium its bounding box (roughly), then we can (re-)make our map and add our layers. We can use our `tile` object to compute those coordinates in our image CRS then convert them to EPSG:4326.

In [ ]:
tile_origin_x, tile_origin_y = point(*tile.origin.coords[0], crs=image_crs).to_crs(EPSG_4326).coords[0]
tile_antiorigin_x, tile_antiorigin_y = point(*tile.antiorigin.coords[0], crs=image_crs).to_crs(EPSG_4326).coords[0]

# we make a whole new map because if we screwed
# something up we only have to re-run this cell to fix it
raster_map = POI.explore(name='point')

stac_item_layer.add_to(raster_map)
raster_map.fit_bounds(stac_item_layer.get_bounds())

folium.raster_layers.ImageOverlay(
    tile_array_scaled_offset,
    bounds=[[tile_antiorigin_y, tile_origin_x], [tile_origin_y, tile_antiorigin_x]],
    name='tile',
).add_to(raster_map)

folium.LayerControl().add_to(raster_map)

raster_map

## Additional exercises to consider later

* Find how many overviews are in this file.
* Find the dimensions and GSD (ground sample distance) of each overview.
* Repeat reading the tile containing your point of interest, but do so from one of the overviews.
* How can we make reading the file more efficient? Can we get all the IFDs in the file with a single read without having to read in image data?
* Can you write the TIFF for the map visualization yourself instead of using an external lib?
* Repeat these exercises with a multiband TIFF to see how the file structure differs to support the additional bands.

Any other cool ideas? Let me know and/or share with the group.

## Appendix: example TIFF metadata parser

Could we take parsing another step further, to better facilitate the above exercises? Wouldn't it be great if we could parse the tags into complete objects? What if we could start with just a reference to the TIFF itself, and have an entire data structure built up to parse all the IFDs out of the file in one go?

Turns out this is a fun problem and I wanted to code up a solution. Here's my attempt; what might yours look like?

In [ ]:
class Endianness(bytes, enum.Enum):
    BIG_ENDIAN = b'MM'
    LITTLE_ENDIAN = b'II'

    @property
    def unpack_char(self: Self) -> str:
        match self:
            case Endianness.BIG_ENDIAN:
                return '>'
            case Endianness.LITTLE_ENDIAN:
                return '<'


@dataclasses.dataclass
class TIFFBytes:
    data: bytes
    endianness: Endianness

    def unpack(self: Self, format: str) -> tuple[Any, ...]:
        return struct.unpack(f'{self.endianness.unpack_char}{format}', self.data)

    def chunk(self: Self, chunk_size) -> Iterator[Self]:
        if len(self) % chunk_size != 0:
            raise ValueError(
                f'Cannot chunk data exactly into {chunk_size}: length {len(self)}',
            )
        yield from (
            self[chunk_index * chunk_size:(chunk_index * chunk_size) + chunk_size]
            for chunk_index in range(len(self)//chunk_size)
        )

    def __len__(self: Self) -> int:
        return len(self.data)

    def __getitem__(self: Self, key: int | slice) -> Self:
        return type(self)(
            data=self.data[key],
            endianness=self.endianness,
        )


@dataclasses.dataclass
class Tag:
    code: int
    data_type: int
    count: int
    value: Any
    raw: TIFFBytes = dataclasses.field(repr=False)
    offset: int | None = dataclasses.field(default=None, repr=False)

    @classmethod
    def from_bytes(
        cls: type[Self],
        tiff: TIFFMeta,
        tag_bytes: TIFFBytes,
    ) -> Self:
        code, data_type, count = tag_bytes[:8].unpack('HHI')
        offset, raw, unpacked = cls.unpack_tag_value(tiff, data_type, count, tag_bytes[8:])
        return cls(
            code=code,
            data_type=data_type,
            count=count,
            value=unpacked,
            raw=raw,
            offset=offset,
        )

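    def pack(self: Self, offset: int | None = None) -> bytes:
        # Minimal sketch: re-serialize this tag as a 12-byte IFD entry. Values
        # stored out of line are referenced by `offset` (default: the original).
        endian = self.raw.endianness.unpack_char
        entry = struct.pack(f'{endian}HHI', self.code, self.data_type, self.count)
        offset = self.offset if offset is None else offset
        if offset is None:
            # inline value: reuse the original 4 value bytes as they were read
            return entry + self.raw.data[:4].ljust(4, b'\x00')
        return entry + struct.pack(f'{endian}I', offset)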

    @staticmethod
    def unpack_tag_value(
        tiff: TIFFMeta, data_type: int, count: int, value: TIFFBytes,
    ) -> tuple[int | None, TIFFBytes, Any]:
        struct_dtype = DATA_TYPES[data_type]
        size = count * struct.calcsize(struct_dtype)

        offset = None
        if size > len(value):
            # the value doesn't fit in the 4 value bytes of the entry, so
            # those bytes hold the offset of the value within the file
            offset = value.unpack('I')[0]
            value = tiff.read_bytes(offset, offset+size)

        # repeat the whole format so two-part types like RATIONAL ('II')
        # unpack `count` pairs, which a '<count>II' format would not do
        unpacked = value[:size].unpack(struct_dtype * count)

        # if data_type == 2 (ASCII) we want to join the chars together
        if data_type == 2:
            return offset, value, b''.join(unpacked)
        elif len(unpacked) == 1:
            return offset, value, unpacked[0]
        return offset, value, unpacked


class Tags(dict[int, Tag]):
    @classmethod
    def from_tags(cls: type[Self], tags: list[Tag]) -> Self:
        return cls((t.code, t) for t in tags)

    @classmethod
    def from_tiff_bytes(cls: type[Self], tiff: TIFFMeta, tags_bytes: TIFFBytes) -> Self:
        return cls.from_tags([Tag.from_bytes(tiff, tag_bytes) for tag_bytes in tags_bytes.chunk(12)])


@dataclasses.dataclass
class IFD:
    offset: int
    tags: Tags
    next_offset: int

    @classmethod
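    def from_tiff_offset(cls: type[Self], tiff: TIFFMeta, offset: int) -> Self:
        tag_size = 12  # each IFD entry (tag) is 12 bytes in a standard TIFF
        # the IFD starts with a 2-byte count of the tags it contains
        tags_start = offset + 2
        tags_count = tiff.read_bytes(offset, tags_start).unpack('H')[0]
        tags_end = tags_start + (tags_count * tag_size)
        tags_bytes = tiff.read_bytes(tags_start, tags_end)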
        next_offset = tiff.read_bytes(tags_end, tags_end + 4).unpack('I')[0]
        
        return cls(
            offset=offset,
            tags=Tags.from_tiff_bytes(tiff, tags_bytes),
            next_offset=next_offset,
        )


@dataclasses.dataclass
class TIFFMeta:
    '''Class to help parse TIFF IFDs. Only supports standard TIFFs, not BigTIFF.'''
    href: str
    endianness: Endianness
    ifds: list[IFD]
    
    # We can track the max byte read to parse out IFD stuff.
    # This could be an interesting data point to learn how to better optimize reads.
    max_ifd_byte: int = 0
    
    def __init__(self: Self, href: str) -> None:
        self.href = href

        # we don't use self.read_bytes yet because we don't have endianness
        __bytes = url_read_bytes(self.href, 0, 8)
        self.max_ifd_byte = 8
        self.endianness = Endianness(__bytes[0:2])

        _bytes = TIFFBytes(data=__bytes, endianness=self.endianness)
        
        magic_number = _bytes[2:4].unpack('H')[0]
        if magic_number != 42:
            raise TypeError(f"Unsupported file type: magic number {magic_number} != 42")
        
        self.ifds: list[IFD] = []
        ifd_offset = _bytes[4:8].unpack('I')[0]
        while ifd_offset:
            ifd = self.parse_ifd(ifd_offset)
            self.ifds.append(ifd)
            ifd_offset = ifd.next_offset

    def read_bytes(self: Self, start: int, end: int) -> TIFFBytes:
        # Note that reading for each byte range we want is terribly inefficient.
        # We could instead use some sort of filelike object that will read and cache
        # larger chunks of the file, as needed to accommodate requested byte ranges.
        # Of course if we wanted to read the whole file this way we'd need to be careful
        # of the memory requirements of such a solution.
        self.max_ifd_byte = max(self.max_ifd_byte, end)
        return TIFFBytes(
            data=url_read_bytes(self.href, start, end),
            endianness=self.endianness,
        )

    def parse_ifd(self: Self, offset: int) -> IFD:
        return IFD.from_tiff_offset(self, offset)

In [ ]:
tiff_meta = TIFFMeta(href)
pprint(tiff_meta)

#### A note about `tiff_meta.max_ifd_byte`

After parsing all IFDs in the file, including reading and unpacking all the tags, we see that the max byte read from the file (`max_ifd_byte`) is merely 4208. Thus we could be pretty sure, even for an absolutely huge TIFF file, that reading something like the first 1-2 MB of file data would give us the entire set of IFDs. We could use this insight to make our reader more efficient: if we made only one read request for the first 1-2 MB of the file, we could be pretty certain we could parse the IFDs without having to incur the penalty of any further network round trips, at least until we are ready to retrieve image data.
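A rough sketch of that idea follows. `PrefetchedReader` is a hypothetical class, not something used above: it takes any `fetch(start, end)` callable with the same end-exclusive convention as our `url_read_bytes` calls, reads one chunk up front, and only goes back to the source when a requested range falls outside that chunk. A slice of an in-memory bytes object stands in for the remote file here.

```python
from typing import Callable


class PrefetchedReader:
    """Serve byte ranges from one up-front read, falling back to `fetch` on a miss."""

    def __init__(
        self,
        fetch: Callable[[int, int], bytes],
        prefetch_size: int = 2**20,
    ) -> None:
        self.fetch = fetch
        # one request for the start of the file, where the IFDs usually live
        self.header = fetch(0, prefetch_size)
        self.requests = 1

    def read_bytes(self, start: int, end: int) -> bytes:
        if end <= len(self.header):
            # the whole range is already in memory, no round trip needed
            return self.header[start:end]
        self.requests += 1
        return self.fetch(start, end)


# Stand-in for a remote file: 16 KiB of repeating bytes
example_file = bytes(range(256)) * 64
example_reader = PrefetchedReader(lambda s, e: example_file[s:e], prefetch_size=4096)

example_reader.read_bytes(0, 8)        # served from the prefetched chunk
example_reader.read_bytes(100, 112)    # also served from the chunk
example_reader.read_bytes(8000, 8010)  # beyond the chunk: one more request
print(example_reader.requests)
```

Wiring something like this into `TIFFMeta.read_bytes` would be one way to approach the efficiency exercise above.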