# Extracting Time-Series from Cloud-Optimized GeoTIFFs (COGs)

This notebook shows how to use rasterio to efficiently extract pixel values from cloud-optimized GeoTIFF (COG) files hosted in cloud storage buckets.

We use a GDAL Virtual Raster (VRT) to stack the separate images into a single virtual multi-band image and query it with `rasterio.sample`. Because COGs are internally tiled, GDAL fetches only the parts of each file that contain the requested pixels instead of downloading the entire file, which makes this approach fast.
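To make the "only fetches what it needs" point concrete: a COG stores its pixels in fixed-size internal tiles, so a reader can map a coordinate to a pixel, the pixel to a tile, and then request just that tile's bytes with an HTTP range request. The sketch below shows only the coordinate-to-tile arithmetic. The geotransform and the 512-pixel tile size are hypothetical example values, not read from the soil moisture files.

```python
# Illustrative sketch only: GDAL performs this lookup internally when reading a COG.
# The geotransform and tile size used below are HYPOTHETICAL example values.

def pixel_for_coord(x, y, geotransform):
    """Map an (x, y) coordinate to a (row, col) pixel index.

    Assumes a north-up GDAL-style geotransform without rotation:
    (origin_x, pixel_width, 0, origin_y, 0, pixel_height), with pixel_height < 0.
    """
    origin_x, pixel_width, _, origin_y, _, pixel_height = geotransform
    col = int((x - origin_x) // pixel_width)
    row = int((y - origin_y) // pixel_height)  # negative / negative -> positive row
    return row, col


def tile_for_pixel(row, col, tile_size=512):
    """Return the (tile_row, tile_col) of the internal tile holding a pixel."""
    return row // tile_size, col // tile_size


# Hypothetical 0.25-degree global grid whose top-left corner is (-180, 90)
GEOTRANSFORM = (-180.0, 0.25, 0.0, 90.0, 0.0, -0.25)

row, col = pixel_for_coord(73, 21, GEOTRANSFORM)
print(row, col, tile_for_pixel(row, col))  # → 276 1012 (0, 1)
```

With a VRT built from 12 files, sampling one point should therefore touch roughly one tile per file (plus file headers) rather than 12 whole files.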

```python
import os
import tempfile

import rasterio
from osgeo import gdal
```

```python
# Allow anonymous (unsigned) requests to the public GCS bucket
os.environ['GS_NO_SIGN_REQUEST'] = 'YES'
```

In this example, we have a folder on Google Cloud Storage (GCS) that contains 12 files, each holding soil moisture for one month.

```
soil_moisture_202301.tif
soil_moisture_202302.tif
soil_moisture_202303.tif
...
```

We want to sample the pixel values from each of these files at N different locations.
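rasterio expects sample coordinates as `(x, y)` in the dataset's CRS. Assuming these rasters use longitude/latitude (which the `(73, 21)` point used later suggests), locations recorded as `(lat, lon)` need to be swapped before sampling. A small sketch; the site names and coordinates are hypothetical:

```python
# Hypothetical example locations, recorded as (lat, lon)
locations = {
    'site_a': (21.0, 73.0),
    'site_b': (19.5, 75.25),
}


def to_xy(latlon_pairs):
    """Convert (lat, lon) pairs to (x, y) = (lon, lat) order.

    Assumes the rasters are in a geographic (longitude/latitude) CRS.
    """
    xy = []
    for lat, lon in latlon_pairs:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f'Out-of-range coordinate: lat={lat}, lon={lon}')
        xy.append((lon, lat))
    return xy


coords = to_xy(locations.values())
print(coords)  # → [(73.0, 21.0), (75.25, 19.5)]
```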

## Creating a VRT

This is a one-time step that creates a VRT for efficient querying of the datasets.

### GDAL Command Line Tool

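You can use the GDAL command-line tool `gdalbuildvrt` to create the VRT and then place it in the same GCS bucket as the files. The `-separate` flag puts each input file in its own band, so the VRT has one band per month.

`gdalbuildvrt -separate -input_file_list filelist.txt soil_moisture.vrt`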

The file list contains one path per line, using GDAL's `/vsigs/` prefix for files on GCS:

```
/vsigs/spatialthoughts-public-data/terraclimate/soil_moisture_202301.tif
/vsigs/spatialthoughts-public-data/terraclimate/soil_moisture_202302.tif
/vsigs/spatialthoughts-public-data/terraclimate/soil_moisture_202303.tif
...
```
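The file list can be generated with a few lines of Python. This sketch uses the bucket path from this notebook and assumes the `soil_moisture_YYYYMM.tif` naming shown above; with `-separate`, the band order of the VRT follows the order of the lines in this file.

```python
from pathlib import Path

# Bucket path used in this notebook, with GDAL's /vsigs/ prefix
PREFIX = '/vsigs/spatialthoughts-public-data/terraclimate/'


def monthly_paths(year, prefix=PREFIX):
    """Return the 12 monthly soil moisture paths for a year, January first."""
    return [f'{prefix}soil_moisture_{year}{month:02d}.tif' for month in range(1, 13)]


def write_filelist(paths, filename='filelist.txt'):
    """Write one path per line, the format read by gdalbuildvrt -input_file_list."""
    Path(filename).write_text('\n'.join(paths) + '\n')
    return filename


write_filelist(monthly_paths(2023))
```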


### Using GDAL Python API

```python
# Create the VRT file in the system temp directory
temp_dir = tempfile.gettempdir()
vrt_file = 'soil_moisture.vrt'
vrt_file_path = os.path.join(temp_dir, vrt_file)

# separate=True puts each input file in its own band (band 1 = January, ...)
vrt_options = gdal.BuildVRTOptions(separate=True)

# Build /vsigs/ paths for the 12 monthly files
urls = []
prefix = '/vsigs/spatialthoughts-public-data/terraclimate/'
for month in range(1, 13):
    image_id = f'soil_moisture_2023{month:02d}.tif'
    urls.append(prefix + image_id)

# Create the VRT, flush it, and close the dataset so the file is fully written
vrt = gdal.BuildVRT(vrt_file_path, urls, options=vrt_options)
vrt.FlushCache()
vrt = None
```
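A VRT is a small XML file, so one way to confirm that `separate=True` produced one band per month, in month order, is to read it with Python's standard library. `SAMPLE_VRT` below is a hypothetical, heavily trimmed two-band example; a real file also carries the data type, SRS, geotransform and source window elements.

```python
import xml.etree.ElementTree as ET


def list_vrt_bands(vrt_xml):
    """Return (band number, source filename) pairs from a VRT's XML text."""
    root = ET.fromstring(vrt_xml)
    bands = []
    for band in root.findall('VRTRasterBand'):
        # Matches SimpleSource and ComplexSource entries alike
        source = band.find('.//SourceFilename')
        bands.append((int(band.get('band')), source.text if source is not None else None))
    return bands


# Hypothetical, trimmed VRT text for illustration only
SAMPLE_VRT = """<VRTDataset rasterXSize="4" rasterYSize="2">
  <VRTRasterBand band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">/vsigs/spatialthoughts-public-data/terraclimate/soil_moisture_202301.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
  </VRTRasterBand>
  <VRTRasterBand band="2">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">/vsigs/spatialthoughts-public-data/terraclimate/soil_moisture_202302.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>"""

for band, source in list_vrt_bands(SAMPLE_VRT):
    print(band, source)
```

On the file created above, `list_vrt_bands(open(vrt_file_path).read())` should list 12 bands, January through December.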

Once done, copy the VRT to the same GCS bucket as the source files, for example with `gsutil cp` or `gcloud storage cp`.

## Sampling Pixel Values

```python
# Path to the VRT uploaded to the GCS bucket
vrt_file_path = '/vsigs/spatialthoughts-public-data/terraclimate/soil_moisture.vrt'
```
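The `/vsigs/` path mirrors the object's `gs://` URL: GDAL's `/vsigs/bucket/key` corresponds to `gs://bucket/key`. A small helper (the name `gs_to_vsigs` is hypothetical) for converting URLs copied from the GCS console or `gsutil`:

```python
def gs_to_vsigs(url):
    """Convert a gs://bucket/key URL into GDAL's /vsigs/bucket/key path."""
    scheme = 'gs://'
    if not url.startswith(scheme):
        raise ValueError(f'Expected a gs:// URL, got: {url}')
    return '/vsigs/' + url[len(scheme):]


print(gs_to_vsigs('gs://spatialthoughts-public-data/terraclimate/soil_moisture.vrt'))
# → /vsigs/spatialthoughts-public-data/terraclimate/soil_moisture.vrt
```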

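```python
with rasterio.open(vrt_file_path) as src:
    # Coordinates are (x, y) in the dataset's CRS; for these lon/lat rasters, (longitude, latitude)
    coords = [(73, 21)]
    # sample_gen yields one array per point, holding one value per band (month)
    samples = rasterio.sample.sample_gen(src, coords)
    for sample in samples:
        print(sample)
```

```
[ 53.   41.6  34.4  29.4  25.7 219.6 219.6 206.6 219.6 135.8  96.8  62.3]
```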
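Each printed array holds one value per VRT band, and the band order follows the order of the files passed to `BuildVRT` (January to December here). A sketch for turning the samples for N points into `(x, y, month, value)` rows; the `2023-MM` labels are an assumption based on the file names, and the sample values are copied from the output above.

```python
# Month labels assumed from the soil_moisture_2023MM.tif file names
MONTHS = [f'2023-{m:02d}' for m in range(1, 13)]


def label_sample(values, months=MONTHS):
    """Pair one point's per-band values with their month labels."""
    if len(values) != len(months):
        raise ValueError(f'Expected {len(months)} bands, got {len(values)}')
    return dict(zip(months, (float(v) for v in values)))


def to_rows(coords, samples, months=MONTHS):
    """Flatten N (x, y) points and their N samples into (x, y, month, value) rows."""
    rows = []
    for (x, y), values in zip(coords, samples):
        for month, value in label_sample(values, months).items():
            rows.append((x, y, month, value))
    return rows


# Values copied from the output above for the point (73, 21)
sample = [53.0, 41.6, 34.4, 29.4, 25.7, 219.6, 219.6, 206.6, 219.6, 135.8, 96.8, 62.3]
rows = to_rows([(73, 21)], [sample])
print(rows[0])    # → (73, 21, '2023-01', 53.0)
print(len(rows))  # → 12
```

Inside the `with rasterio.open(...)` block you could call `to_rows(coords, src.sample(coords))`. The sample generator reads lazily, so it must be consumed before the dataset is closed.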
