Use the `solaris` conda environment to run this notebook (use `environment-solaris.yml` to set up this environment).

`solaris` commit `7c2940f0a274c76388cb59694f415ac8906e1b92` (repo: https://github.com/CosmiQ/solaris)

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # default is 'last_expr'

%load_ext autoreload
%autoreload 2

In [2]:
import sys

sys.path.append('path_to/solaris')

In [47]:
import os
import pickle

from tqdm import tqdm
import rasterio
import geopandas as gpd
from shapely.geometry import box

import solaris as sol
from solaris.data import data_dir

If the imports fail with a PIL import error, running the import cell a second time seems to make the error go away.

In [5]:
sol.__version__

'0.2.1'

# Tiling images/labels and creating label masks

Solaris package documentation: https://solaris.readthedocs.io/en/latest/

In [6]:
out_dir = '/data/WCS_land_use/train_200218'

label_path = '/data/WCS_land_use/Landuse_shape/derived/landuse.shp'

img_path = '/data/WCS_land_use/Imagery/wcs_orinoquia_trial_region_201301_201512.tif'

The mini dataset scene is 5423 x 5332 (W x H)

In [7]:
tile_size = 2000  # larger than model input size to avoid storing too many small tiles
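As a rough check on how many tiles to expect, the scene dimensions can be divided by the tile size. This is a back-of-the-envelope sketch rather than anything solaris computes for you; it assumes the tiles are laid out on a regular grid and that partial tiles at the edges are still written.

```python
import math

scene_width, scene_height = 5423, 5332  # W x H of the mini dataset scene
tile_size = 2000

# Assumption: partial tiles at the right and bottom edges still count as tiles
tiles_across = math.ceil(scene_width / tile_size)
tiles_down = math.ceil(scene_height / tile_size)
n_tiles = tiles_across * tiles_down

print(tiles_across, tiles_down, n_tiles)  # 3 3 9
```

Under these assumptions the scene yields 9 tiles, which agrees with the tile counts reported in the tiling outputs in this notebook.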

In [22]:
# bounding box of the scene

region = rasterio.open(img_path)

In [30]:
region.bounds
region.bounds[1]

BoundingBox(left=-71.35731526792055, bottom=2.663953975056441, right=-69.8958461321865, top=4.100899103534028)

2.663953975056441

In [34]:
# box(minx, miny, maxx, maxy)

region_polygon = box(*region.bounds)

In [38]:
region_polygon_gpd = gpd.GeoDataFrame(geometry=[region_polygon])

## Tiling images

`dest_dir` will be created by the tiler.

`src_tile_size` is the size, on the original raster, of the chips you want to cut. 
`dest_tile_size` is for when you want the resulting chips to be in a different resolution. If it is not specified, the output chips are the same size as the input.

The bounds of each tile are stored in the `raster_tiler.tile_bounds` attribute, which is later passed to the `VectorTiler` instance.

In [8]:
raster_tiler = sol.tile.raster_tile.RasterTiler(dest_dir=os.path.join(out_dir, 'tiles'),  # the directory to save images to
                                                src_tile_size=(tile_size, tile_size),
                                                verbose=True)

Initializing Tiler...
Tiler initialized.
dest_dir: /Users/siyuyang/Source/temp_data/WCS_land_use/train_200218/tiles
dest_crs will be inferred from source data.
src_tile_size: (2000, 2000)
tile size units metric: False
Resampling is set to None


EPSG:4326 aka WGS84

The following step cuts the tiles and writes them out as TIFs with file names in the format 

`[src-filename]_[longitude]_[latitude].tif`

e.g. `wcs_orinoquia_trial_region_201301_201512_-70.818_3.742.tif`
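Because the coordinates are embedded in the file name, they can be recovered later without opening the raster. The helper below is a minimal sketch; `parse_tile_name` is a hypothetical name and not part of solaris, and it assumes the last two underscore-separated fields are always the longitude and latitude.

```python
import os

def parse_tile_name(tile_path):
    """Split a tile file name into (source prefix, longitude, latitude)."""
    stem = os.path.splitext(os.path.basename(tile_path))[0]
    # Assumption: the last two underscore-separated fields are the coordinates
    prefix, lon, lat = stem.rsplit('_', 2)
    return prefix, float(lon), float(lat)

prefix, lon, lat = parse_tile_name(
    'wcs_orinoquia_trial_region_201301_201512_-70.818_3.742.tif')
print(prefix)    # wcs_orinoquia_trial_region_201301_201512
print(lon, lat)  # -70.818 3.742
```

Splitting from the right keeps the underscores inside the source file name intact.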

In [9]:
raster_tiler.tile(img_path)

- The file is greater than 512xH or 512xW, it is recommended to include internal overviews



Beginning tiling...
Checking input data...
COG: True
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Source CRS: EPSG:4326
Destination CRS: EPSG:4326
Inputs OK.


9it [00:49,  5.46s/it]

Tiling complete. Cleaning up...
Done. CRS returned for vector tiling.





CRS.from_epsg(4326)

In [12]:
type(raster_tiler.tile_bounds)
len(raster_tiler.tile_bounds)
raster_tiler.tile_bounds[0]

list

9

(-71.35731526792055, 2.663953975056441, -70.81832609744885, 3.2029431455281543)
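Each entry in `tile_bounds` appears to follow the same `(left, bottom, right, top)` order as `region.bounds`, since the first tile shares its left and bottom values with the scene. Assuming that order, a quick sketch of the extent of one tile in degrees, using the values printed above:

```python
# First entry of raster_tiler.tile_bounds, copied from the output above
# Assumption: the order is (left, bottom, right, top)
left, bottom, right, top = (-71.35731526792055, 2.663953975056441,
                            -70.81832609744885, 3.2029431455281543)
tile_size = 2000  # pixels per side

width_deg = right - left
height_deg = top - bottom
deg_per_pixel = width_deg / tile_size

print(round(width_deg, 6), round(height_deg, 6))  # 0.538989 0.538989
print(deg_per_pixel)
```

Each tile covers about 0.54 degrees per side, or roughly 0.00027 degrees per pixel.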

In [16]:
# serialize the bounds so we can start from here
with open(os.path.join(out_dir, 'tile_bounds.pickle'), 'wb') as f:
    pickle.dump(raster_tiler.tile_bounds, f, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
# load as needed
# with open(os.path.join(out_dir, 'tile_bounds.pickle'), 'rb') as f:
#     tile_bounds = pickle.load(f)

In [13]:
tile_paths = [os.path.join(out_dir, 'tiles', p) for p in os.listdir(os.path.join(out_dir, 'tiles'))]

In [17]:
# verify the resulting tile a bit

os.path.getsize(tile_paths[0]) / 1000000  # getsize() is in bytes, so convert to MB

tile = rasterio.open(tile_paths[0])
tile.crs
tile.shape
tile.count  # number of bands
tile.bounds

170.795843

CRS.from_epsg(4326)

(2000, 2000)

11

BoundingBox(left=-71.35731526792055, bottom=3.202912324611524, right=-70.81829527653221, top=3.7419323159998674)

In [18]:
vector_tiler = sol.tile.vector_tile.VectorTiler(dest_dir=os.path.join(out_dir, 'tiles_labels'),
                                                verbose=True)

Preparing the tiler...
Initialization done.


The `src` input to the `tile()` method of `VectorTiler` needs to be a geopandas GeoDataFrame or a GeoJSON. Since our labels come as a shapefile, we first load it as a GeoDataFrame. 

In [19]:
%%time
landuse_shape = gpd.read_file(label_path)

CPU times: user 52 s, sys: 2.84 s, total: 54.8 s
Wall time: 59.5 s


In [20]:
landuse_shape.dtypes

OBJECTID         int64
AREA_HA        float64
Landuse          int64
Landuse_WC       int64
geometry      geometry
dtype: object

The column name looks like a typo: it is supposed to be `Landuse_WCS`, but appears as `Landuse_WC` in this file.
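One plausible explanation, offered as an assumption rather than something verified against this dataset: the attribute table of a shapefile is stored in a DBF file, and DBF field names are limited to 10 characters. Cutting the intended name at that limit reproduces the name seen here:

```python
DBF_MAX_FIELD_NAME = 10  # character limit for field names in a .dbf table

intended_name = 'Landuse_WCS'
stored_name = intended_name[:DBF_MAX_FIELD_NAME]

print(stored_name)  # Landuse_WC
```

Either way, `Landuse_WC` is the name to use when referring to this column in the rest of the notebook.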

In [40]:
%%time

landuse_shape_exploded = landuse_shape.explode()

CPU times: user 8.97 s, sys: 549 ms, total: 9.52 s
Wall time: 9.6 s


In [41]:
len(landuse_shape)
len(landuse_shape_exploded)

86643

1514140

In [43]:
%%time

landuse_shape_exploded.geometry = landuse_shape_exploded.geometry.buffer(0)

CPU times: user 2min 48s, sys: 15.8 s, total: 3min 4s
Wall time: 3min 12s


We need to explode the multipolygons and buffer them by distance 0, because otherwise we get

```TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point -71.721134404029499 3.4018246278043875 at -71.721134404029499 3.4018246278043875```

(self-intersecting polygons have edges that cross each other, in contrast to simple polygons)

or

```TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.multipolygon.MultiPolygon object at 0x162d8f8d0>```


Exploding alone does not make the shapes valid. Exploding takes about 10 seconds on my laptop; buffering by distance 0 takes a little over 3 minutes (see the timings above).

Applying `buffer(0)` without first exploding takes so long that it never finished, so the shapes have to be exploded first. 
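To see the effect of `buffer(0)` in isolation, here is a minimal sketch on a toy "bowtie" polygon whose ring crosses itself. The shape is made up for illustration and is not taken from the land use data.

```python
from shapely.geometry import Polygon

# A "bowtie": the ring crosses itself at (1, 1), so the polygon is invalid
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print(bowtie.is_valid)  # False

# Buffering by distance 0 rebuilds the geometry as a valid shape
repaired = bowtie.buffer(0)
print(repaired.is_valid)  # True
print(repaired.geom_type, repaired.area)
```

It is worth inspecting the repaired geometry on a few real shapes as well, since `buffer(0)` can discard part of a self-intersecting polygon.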

In [44]:
vector_tiler.tile(landuse_shape_exploded,
                  tile_bounds=raster_tiler.tile_bounds)


0it [00:00, ?it/s]

Num tiles: 9

1it [02:29, 149.75s/it]
2it [02:35, 106.58s/it]
3it [03:26, 89.79s/it]
4it [03:30, 63.99s/it]
5it [03:32, 45.63s/it]
6it [03:55, 38.68s/it]
7it [03:56, 27.32s/it]
8it [03:57, 19.62s/it]
9it [04:03, 27.02s/it]


## Creating label masks

The label tiles used here come from the GeoDataFrame after exploding and buffering by 0.

In [45]:
tile_labels_paths = [os.path.join(out_dir, 'tiles_labels', p) for p in os.listdir(os.path.join(out_dir, 'tiles_labels'))]

In [49]:
os.makedirs(os.path.join(out_dir, 'tiles_masks'), exist_ok=True)  # does not fail if the cell is re-run

In [50]:
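# file name of the source image without its extension; tile file names start with this prefix
im_prefix = os.path.splitext(os.path.basename(img_path))[0]

for tile_label_path in tqdm(tile_labels_paths):
    # label file names contain 'geoms' followed by '_<lon>_<lat>.geojson'; keep the '_<lon>_<lat>' part
    lon_lat = tile_label_path.split('geoms')[1].split('.geojson')[0]
    tile_path = os.path.join(out_dir, 'tiles', im_prefix + lon_lat + '.tif')

    # burn the land use class of each polygon into a mask aligned with the image tile
    fp_mask = sol.vector.mask.footprint_mask(
        df=tile_label_path,
        out_file=os.path.join(out_dir, 'tiles_masks', 'mask{}.png'.format(lon_lat)),  # _ included
        reference_im=tile_path,
        burn_field='Landuse_WC')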



  0%|          | 0/9 [00:00<?, ?it/s]
 11%|█         | 1/9 [00:16<02:10, 16.35s/it]
 22%|██▏       | 2/9 [00:17<01:22, 11.78s/it]
 33%|███▎      | 3/9 [00:20<00:54,  9.10s/it]
 44%|████▍     | 4/9 [00:22<00:35,  7.04s/it]
 56%|█████▌    | 5/9 [00:25<00:23,  5.92s/it]
 67%|██████▋   | 6/9 [00:27<00:14,  4.79s/it]
 78%|███████▊  | 7/9 [00:36<00:11,  5.88s/it]
 89%|████████▉ | 8/9 [00:36<00:04,  4.26s/it]
100%|██████████| 9/9 [00:38<00:00,  4.25s/it]
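The loop above relies on the label tile, the image tile and the mask sharing the same `_<lon>_<lat>` suffix. The sketch below isolates that path logic so it can be checked without running solaris. `paths_for_label` is a hypothetical helper, and the example path and prefix are made up for illustration.

```python
import os

def paths_for_label(tile_label_path, out_dir, im_prefix):
    """Return (image tile path, mask path) for one label tile."""
    # Work on the file name only, so a 'geoms' in a directory name cannot interfere
    label_name = os.path.basename(tile_label_path)
    # Keep the '_<lon>_<lat>' suffix that follows 'geoms'
    lon_lat = label_name.split('geoms')[1].split('.geojson')[0]
    tile_path = os.path.join(out_dir, 'tiles', im_prefix + lon_lat + '.tif')
    mask_path = os.path.join(out_dir, 'tiles_masks', 'mask' + lon_lat + '.png')
    return tile_path, mask_path

tile_path, mask_path = paths_for_label(
    '/data/example/tiles_labels/geoms_-70.818_3.742.geojson',  # hypothetical path
    out_dir='/data/example',
    im_prefix='scene')
print(os.path.basename(tile_path))  # scene_-70.818_3.742.tif
print(os.path.basename(mask_path))  # mask_-70.818_3.742.png
```

Checking that every derived `tile_path` exists before calling `footprint_mask` would catch a mismatch between the two tilers' naming early.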
