# Subsetting your data

Subsetting your data refers to selecting some amount of data to work with that is less than the whole image.  In Python I see two main ways to do this:
1. manually by dropping data from your numpy array
2. using rasterio's windowed reading methods

The manual option is simpler, but has the limitations that
* it is a little harder to save the output data to a new file
* you still have to read the whole dataset into memory

The rasterio option doesn't have these limitations, but is a bit more complicated.
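To put a rough number on the memory point, here is a back-of-the-envelope sketch. The shapes are the ones printed later in this notebook. `BYTES_PER_VALUE = 2` is an assumption (16-bit integers), so check `src.dtypes` for your own file.

```python
import math

# Shapes are (bands, rows, columns), as printed later in this notebook.
full_shape = (224, 1684, 795)
subset_shape = (224, 684, 220)

# Assumption: each value is stored as a 16-bit integer (2 bytes).
BYTES_PER_VALUE = 2

def size_in_mb(shape, bytes_per_value=BYTES_PER_VALUE):
    """Approximate in-memory size, in megabytes, of an array with this shape."""
    return math.prod(shape) * bytes_per_value / 1e6

full_mb = size_in_mb(full_shape)
subset_mb = size_in_mb(subset_shape)

print(f"full raster: {full_mb:.0f} MB")
print(f"subset:      {subset_mb:.0f} MB")
print(f"the subset is about {full_mb / subset_mb:.1f}x smaller")
```

With the manual option the full-size array has to exist in memory before you can slice it, whereas a windowed read only ever allocates the smaller one.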

In [None]:
# All notebook imports
import rasterio
from rasterio.windows import Window
import os

## Reading the whole file and dropping the data from the numpy array

Start by opening the file and reading the data with the `.read()` method.

In [1]:
import rasterio

In [2]:
filepath_rad = '../input_data/f100520t01p00r08rdn_b/f100520t01p00r08rdn_b_sc01_ort_img'

In [3]:
with rasterio.open(filepath_rad, 'r') as src:
    full_raster = src.read()

In [4]:
full_raster

array([[[-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        ...,
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50]],

       [[-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        ...,
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50]],

       [[-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        ...,
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50]],

       ...,

       [[-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50],
        [-50, -50, -50, ..., -50, -50, -50

Say I want to look at just the 1000-1800th rows and 200-420th columns. I would use those numbers to slice the last two axes of my array (the first axis is the band).

In [5]:
print('full raster shape ', full_raster.shape)
my_subset = full_raster[:,1000:1800, 200:420]
print('subset raster shape ', my_subset.shape)

full raster shape  (224, 1684, 795)
subset raster shape  (224, 684, 220)


In [6]:
my_subset

array([[[1397, 1403, 1388, ..., 1317, 1304, 1303],
        [1394, 1400, 1401, ..., 1313, 1294, 1311],
        [1398, 1389, 1396, ..., 1321, 1331, 1293],
        ...,
        [ -50,  -50,  -50, ...,  -50,  -50,  -50],
        [ -50,  -50,  -50, ...,  -50,  -50,  -50],
        [ -50,  -50,  -50, ...,  -50,  -50,  -50]],

       [[1603, 1588, 1610, ..., 1490, 1465, 1479],
        [1594, 1585, 1586, ..., 1495, 1470, 1478],
        [1574, 1585, 1605, ..., 1484, 1493, 1462],
        ...,
        [ -50,  -50,  -50, ...,  -50,  -50,  -50],
        [ -50,  -50,  -50, ...,  -50,  -50,  -50],
        [ -50,  -50,  -50, ...,  -50,  -50,  -50]],

       [[1674, 1666, 1669, ..., 1552, 1544, 1536],
        [1660, 1646, 1677, ..., 1534, 1526, 1516],
        [1666, 1662, 1684, ..., 1529, 1554, 1533],
        ...,
        [ -50,  -50,  -50, ...,  -50,  -50,  -50],
        [ -50,  -50,  -50, ...,  -50,  -50,  -50],
        [ -50,  -50,  -50, ...,  -50,  -50,  -50]],

       ...,

       [[   4,    1,    

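There we have it, my subset raster. Notice that it has 684 rows rather than 800: the image only has 1684 rows, so the slice `1000:1800` was silently cut off at the last row.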
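A quick way to check how many rows and columns a slice will really return, before reading anything, is Python's built-in `slice.indices`. This is a minimal sketch using the raster shape printed above; `clipped_length` is a hypothetical helper name, not part of NumPy or rasterio.

```python
def clipped_length(start, stop, axis_length):
    """Number of items the slice start:stop really returns on an axis."""
    # slice.indices() applies the same bounds clipping that indexing does
    return len(range(*slice(start, stop).indices(axis_length)))

n_rows, n_cols = 1684, 795  # rows and columns of the full raster above

print(clipped_length(1000, 1800, n_rows))  # 800 rows were requested
print(clipped_length(200, 420, n_cols))    # 220 columns were requested
```

If the first number is smaller than what you asked for, your area of interest runs off the edge of the image.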

I picked those index numbers arbitrarily. However, if I wanted to define my area of interest from some coordinates, I could use the affine transformation to convert the coordinates I want to pixel numbers. To go a step further, if the coordinates of my study area were in latitude and longitude and my data in easting and northing, I could use the `transform` function from `rasterio.warp` to convert my lat/lon to easting/northing first.
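As a minimal sketch of the coordinate-to-pixel step, the code below inverts an affine transform by hand. The six coefficients are an assumption: they are the rounded values of the source transform printed later in this notebook. `pixel_to_world` and `world_to_pixel` are hypothetical helper names written for illustration.

```python
import math

# Affine coefficients (a, b, c, d, e, f) such that
#   x = a * col + b * row + c
#   y = d * col + e * row + f
# Assumption: rounded values of the source transform printed later on.
SRC_TRANSFORM = (0.0, 17.1, 475785.77, 17.1, 0.0, 3350578.50)

def pixel_to_world(transform, col, row):
    """Map (fractional) pixel coordinates to map coordinates."""
    a, b, c, d, e, f = transform
    return a * col + b * row + c, d * col + e * row + f

def world_to_pixel(transform, x, y):
    """Map map coordinates to the (row, col) of the pixel containing them."""
    a, b, c, d, e, f = transform
    det = a * e - b * d
    if det == 0:
        raise ValueError("transform is not invertible")
    dx, dy = x - c, y - f
    col = (e * dx - b * dy) / det
    row = (-d * dx + a * dy) / det
    return math.floor(row), math.floor(col)

# Round trip: the center of pixel (row 1000, col 200) maps back to itself.
x, y = pixel_to_world(SRC_TRANSFORM, 200.5, 1000.5)
print(world_to_pixel(SRC_TRANSFORM, x, y))
```

In practice you would not write this yourself: an open rasterio dataset offers `src.index(x, y)`, which does the same lookup with the file's own transform.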

## Windowed Reading

_A lot of the code from this section comes from the [rasterio docs](https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html)._

Windowed reading requires a bit more work up front, but it keeps the geospatial transform in sync with your subset, which matters if you want to save out your data.  This method involves using a rasterio object called `Window` to access your subset instead of indexing it directly.

The steps to this method are:
1. create the window object
2. read the data from `src` using the window

### Creating the `Window` object

The rasterio object for a subset chunk is `Window`, and the syntax to create one looks like this:

> Window(COLUMN_OFFSET, ROW_OFFSET, WIDTH, HEIGHT)

The OFFSETs specify the row and column numbers of the upper left corner of your window.

So getting the 1000-1800th row and 200-420th column with a window object would look like `Window(200, 1000, 220, 800)`.

In [7]:
from rasterio.windows import Window

In [8]:
Window(200, 1000, 220, 800)

Window(col_off=200, row_off=1000, width=220, height=800)

Another way to create the window is to use the `.from_slices()` method.

> Window.from_slices((ROW_START, ROW_STOP), (COLUMN_START, COLUMN_STOP))

In [9]:
Window.from_slices((1000, 1800), (200, 420))

Window(col_off=200, row_off=1000, width=220, height=800)

These two are equivalent ways to do the same thing, so we can see in the output that this gives us the same window as above.
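The relationship between the two forms is just a reordering plus two subtractions. Here is a minimal sketch of that arithmetic; `window_args_from_slices` is a hypothetical helper for illustration, not rasterio's own implementation.

```python
def window_args_from_slices(rows, cols):
    """Turn ((row_start, row_stop), (col_start, col_stop)) into
    (col_off, row_off, width, height), the argument order Window expects."""
    (row_start, row_stop), (col_start, col_stop) = rows, cols
    width = col_stop - col_start
    height = row_stop - row_start
    return col_start, row_start, width, height

print(window_args_from_slices((1000, 1800), (200, 420)))
```

Note the swap: slices go rows first, as in array indexing, while `Window` takes the column offset first.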

### Reading the data with the `Window`

To read the data out we use our regular `src.read()` method but this time we specify `window`.

In [10]:
import rasterio

In [11]:
with rasterio.open(filepath_rad, 'r') as src:
    my_window = Window(200, 1000, 220, 800)
    window_subset = src.read(window=my_window)

print(window_subset)

[[[1397 1403 1388 ... 1317 1304 1303]
  [1394 1400 1401 ... 1313 1294 1311]
  [1398 1389 1396 ... 1321 1331 1293]
  ...
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]]

 [[1603 1588 1610 ... 1490 1465 1479]
  [1594 1585 1586 ... 1495 1470 1478]
  [1574 1585 1605 ... 1484 1493 1462]
  ...
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]]

 [[1674 1666 1669 ... 1552 1544 1536]
  [1660 1646 1677 ... 1534 1526 1516]
  [1666 1662 1684 ... 1529 1554 1533]
  ...
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]]

 ...

 [[   4    1    7 ...    5    2    0]
  [   5    0    5 ...    2    5    4]
  [   3    7    7 ...    2    2    3]
  ...
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]
  [ -50  -50  -50 ...  -50  -50  -50]]

 [[   3    1   -1 ...    1   -3    2]
  [   1    0

So my data looks just the same as above.  The advantage, however, is that the Window object does some math for us to keep track of how the transform has changed in the window. 

In [12]:
with rasterio.open(filepath_rad, 'r') as src:
    window = Window(200, 1000, 220, 800)
    src_transform = src.transform
    win_transform = src.window_transform(window)

In [13]:
print('source transform \n', src_transform)
print('window transform \n', win_transform)

source transform 
 | 0.00, 17.10, 475785.77|
| 17.10,-0.00, 3350578.50|
| 0.00, 0.00, 1.00|
window transform 
 | 0.00, 17.10, 492885.77|
| 17.10,-0.00, 3353998.50|
| 0.00, 0.00, 1.00|
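To see where the new numbers come from, here is a minimal sketch of the arithmetic. It assumes the rounded coefficients printed above, and `shift_origin` is a hypothetical helper name: only the two offset terms change, and they become the map coordinates of the window's upper left corner.

```python
# Affine coefficients (a, b, c, d, e, f) such that
#   x = a * col + b * row + c
#   y = d * col + e * row + f
# Assumption: rounded values of the source transform printed above.
SRC_TRANSFORM = (0.0, 17.1, 475785.77, 17.1, 0.0, 3350578.50)

def shift_origin(transform, col_off, row_off):
    """Return the transform whose origin is the pixel (col_off, row_off)."""
    a, b, c, d, e, f = transform
    new_c = a * col_off + b * row_off + c
    new_f = d * col_off + e * row_off + f
    # Pixel size and rotation terms (a, b, d, e) are unchanged.
    return a, b, new_c, d, e, new_f

win = shift_origin(SRC_TRANSFORM, col_off=200, row_off=1000)
print(f"x offset: {win[2]:.2f}")
print(f"y offset: {win[5]:.2f}")
```

The two offsets agree with the window transform printed above, to the two decimals shown.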


Having this affine makes it easy to save out our windowed data if we need to.

In [14]:
# Copy the existing metadata
with rasterio.open(filepath_rad, 'r') as src:
    metadata = src.meta.copy()

In [15]:
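# Update relevant keys. Take height and width from the array that was actually
# read, since a window that runs past the raster edge may come back smaller.
metadata.update(transform=win_transform, height=window_subset.shape[1], width=window_subset.shape[2])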

In [16]:
# Make the output directory if it does not exist yet
import os
os.makedirs('../output_data', exist_ok=True)

In [17]:
# Save the raster
with rasterio.open('../output_data/subset_raster', 'w', **metadata) as dst:
    dst.write(window_subset)