# Subsetting your data

Subsetting your data means selecting only part of the image to work with, rather than the whole thing. In Python I see two main ways to do this:
1. manually by dropping data from your numpy array
2. using rasterio's windowed reading

The manual option is simpler, but it has the limitations that
* it is a little harder to save the output data to a new file
* you still have to read the whole dataset into memory

The rasterio option doesn't have these limitations, but is a bit more complicated.
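
To put a rough number on the memory limitation, here is a back-of-the-envelope sketch. It uses the array shapes printed further down in this section and assumes each value takes 2 bytes (for example a 16-bit integer). That dtype is my assumption, so check `full_raster.dtype` on your own data.

```python
# Shapes as (bands, rows, cols), taken from the output shown later in this section
full_shape = (3, 8596, 2158)
subset_shape = (3, 800, 220)

# ASSUMPTION: 2 bytes per value (e.g. int16); check full_raster.dtype for the real size
BYTES_PER_VALUE = 2


def array_bytes(shape, bytes_per_value=BYTES_PER_VALUE):
    """Return the number of bytes an array of this shape occupies in memory."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value


full_bytes = array_bytes(full_shape)
subset_bytes = array_bytes(subset_shape)

print(f"full raster:   {full_bytes / 1e6:.1f} MB")
print(f"subset raster: {subset_bytes / 1e6:.1f} MB")
print(f"the full read is about {full_bytes / subset_bytes:.0f}x larger")
```

With the manual option the full-size array is allocated before you slice it, while a windowed read only ever allocates the subset.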

## Reading the whole file and dropping the data from the numpy array

Start by opening the file as usual and reading all of it with the `.read()` method.

In [1]:
import rasterio

In [2]:
filepath_h2o = '../input_data/f150131t01p00r10_refl/f150131t01p00r10_h2o_v1'

In [3]:
with rasterio.open(filepath_h2o, 'r') as src:
    full_raster = src.read()

In [4]:
full_raster

array([[[15536, 15536, 15536, ..., 15536, 15536, 15536],
        [15536, 15536, 15536, ..., 15536, 15536, 15536],
        [15536, 15536, 15536, ..., 15536, 15536, 15536],
        ...,
        [15536, 15536, 15536, ..., 15536, 15536, 15536],
        [15536, 15536, 15536, ..., 15536, 15536, 15536],
        [15536, 15536, 15536, ..., 15536, 15536, 15536]],

       [[    0,     0,     0, ...,     0,     0,     0],
        [    0,     0,     0, ...,     0,     0,     0],
        [    0,     0,     0, ...,     0,     0,     0],
        ...,
        [    0,     0,     0, ...,     0,     0,     0],
        [    0,     0,     0, ...,     0,     0,     0],
        [    0,     0,     0, ...,     0,     0,     0]],

       [[    0,     0,     0, ...,     0,     0,     0],
        [    0,     0,     0, ...,     0,     0,     0],
        [    0,     0,     0, ...,     0,     0,     0],
        ...,
        [   73,    76,    91, ...,    87,    91,    89],
        [   73,    76,    91, ...,    87,    

Say I want to look at just rows 1000-1800 and columns 200-420. I picked those numbers randomly, but if I wanted to determine my area of interest from a coordinate, I could use the affine transformation to convert that coordinate to a pixel row and column. If the coordinates of my study area were in latitude and longitude and my data in easting and northing, I would first convert my lat/lon to easting/northing, for example with the `transform` function in `rasterio.warp` (`reproject` is meant for warping whole rasters rather than individual coordinates).

In [5]:
print('full raster shape ', full_raster.shape)
my_subset = full_raster[:,1000:1800, 200:420]
print('subset raster shape ', my_subset.shape)

full raster shape  (3, 8596, 2158)
subset raster shape  (3, 800, 220)


There we have it, my subset raster.
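
The coordinate-to-pixel conversion mentioned above comes down to inverting the affine transform. Here is a minimal pure-Python sketch of that math. The coefficients are the rounded values from the source transform printed later in this section, and the helper names `pixel_to_coord` and `coord_to_pixel` are hypothetical ones I made up for illustration. With an open dataset, rasterio's `src.index(x, y)` should give you the row and column directly.

```python
import math

# Affine coefficients laid out as
# | A, B, C |
# | D, E, F |
# ASSUMPTION: rounded values copied from the printed source transform
A, B, C = 16.64, -2.93, 303895.60
D, E, F = -2.93, -16.64, 3967305.20


def pixel_to_coord(col, row):
    """Map a (col, row) pixel position to map coordinates (x, y)."""
    x = A * col + B * row + C
    y = D * col + E * row + F
    return x, y


def coord_to_pixel(x, y):
    """Invert the affine transform: return (row, col) of the pixel containing (x, y)."""
    det = A * E - B * D
    col = (E * (x - C) - B * (y - F)) / det
    row = (-D * (x - C) + A * (y - F)) / det
    return math.floor(row), math.floor(col)


# Round trip: take the centre of pixel (row 1000, col 200) and convert it back
x, y = pixel_to_coord(200.5, 1000.5)
print(coord_to_pixel(x, y))
```

Because this raster is rotated (the off-diagonal terms are not zero), both x and y contribute to the row and to the column, which is why the full 2x2 inverse is needed rather than a simple divide by the pixel size.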

## Windowed Reading

A lot of the code from this section comes from the [rasterio docs](https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html).

Windowed reading requires a bit more work up front, but it allows you to keep your transform geospatially updated in the event you want to save out your data.

The rasterio object that describes a subset chunk is `Window`, and the syntax to create one looks like this:

> Window(col_offset, row_offset, width, height)

So for the rows 1000-1800 and columns 200-420 that I wanted in the section above, my window object would be `Window(200, 1000, 220, 800)`.
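
The argument order is easy to get backwards, since numpy slices go rows then columns while `Window` goes columns then rows. As a small sketch, the hypothetical helper below (the name `slice_bounds_to_window_args` is mine, not part of rasterio) converts slice bounds into the four `Window` arguments. I believe rasterio's `Window.from_slices` does a similar job when you already have slice objects.

```python
def slice_bounds_to_window_args(row_start, row_stop, col_start, col_stop):
    """Convert numpy-style slice bounds into (col_off, row_off, width, height)."""
    width = col_stop - col_start
    height = row_stop - row_start
    return col_start, row_start, width, height


# Same subset as full_raster[:, 1000:1800, 200:420]
window_args = slice_bounds_to_window_args(1000, 1800, 200, 420)
print(window_args)
```

The resulting tuple can be unpacked straight into the constructor, as in `Window(*window_args)`.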

In [6]:
import rasterio
from rasterio.windows import Window

In [7]:
filepath_h2o = '../input_data/f150131t01p00r10_refl/f150131t01p00r10_h2o_v1'
filepath_img = '../input_data/f170508t01p00r11rdn_e/f170508t01p00r11rdn_e_sc01_ort_img'

In [8]:
with rasterio.open(filepath_h2o, 'r') as src:
    window = Window(200, 1000, 220, 800)
    window_subset = src.read(window=window)

print(window_subset.shape)

(3, 800, 220)


So my data looks just the same as above. The advantage, however, is that rasterio can use the `Window` object to do some math for us and keep track of how the transform changes for the window.

In [9]:
with rasterio.open(filepath_h2o, 'r') as src:
    window = Window(200, 1000, 220, 800)
    src_transform = src.transform
    win_transform = src.window_transform(window)

In [10]:
print('source transform \n', src_transform)
print('window transform \n', win_transform)

source transform 
 | 16.64,-2.93, 303895.60|
|-2.93,-16.64, 3967305.20|
| 0.00, 0.00, 1.00|
window transform 
 | 16.64,-2.93, 304289.60|
|-2.93,-16.64, 3950075.02|
| 0.00, 0.00, 1.00|
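
To see where the new numbers come from: the scale and rotation terms stay the same, and only the origin (the last column) moves to wherever the source transform places the window's upper-left pixel. The sketch below redoes that arithmetic by hand with the rounded coefficients printed above. It is my illustration of the math, not rasterio's actual implementation.

```python
# Rounded coefficients of the source transform as printed above:
# | a, b, c |
# | d, e, f |
a, b, c = 16.64, -2.93, 303895.60
d, e, f = -2.93, -16.64, 3967305.20

# Same offsets as Window(200, 1000, 220, 800)
col_off, row_off = 200, 1000

# The window's origin is the source transform applied to (col_off, row_off)
win_c = a * col_off + b * row_off + c
win_f = d * col_off + e * row_off + f

print("window origin x:", round(win_c, 2))
print("window origin y:", round(win_f, 2))
```

The hand-computed origin lands a few metres away from the values rasterio printed, most likely because the printed coefficients are rounded to two decimals and that small error gets multiplied by an offset of 1000 rows.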


Having this affine transform makes it easy to save out our windowed data if we need to.

In [11]:
# Copy the metadata
with rasterio.open(filepath_h2o, 'r') as src:
    metadata = src.meta.copy()

In [12]:
# Update relevant keys
metadata.update(transform=win_transform, height=window.height, width=window.width)

In [13]:
# Make the output directory if it does not exist yet
import os
os.makedirs('../output_data', exist_ok=True)

In [14]:
# Save the raster
with rasterio.open('../output_data/window_subset', 'w', **metadata) as dst:
    dst.write(window_subset)