# [Dependencies](https://spacenetchallenge.github.io/#Dependencies)
> The [AWS Command Line Interface (CLI)](https://aws.amazon.com/cli/) must be installed, and you need an active AWS account. Configure the AWS CLI by running `aws configure`.

# [Accessing the SpaceNet Data on AWS](https://aws.amazon.com/public-datasets/spacenet/#Accessing_the_SpaceNet_Data_on_AWS)
> The SpaceNet dataset is being released in several Areas of Interest. All AOIs will follow a similar directory structure and data format. The imagery is GeoTIFF satellite imagery and corresponding GeoJSON building footprints. You can use the following [aws-cli](https://aws.amazon.com/cli/) command to examine all files available in the dataset (details of file structure below):

> `aws s3 ls spacenet-dataset --request-payer requester`

> For more detailed information on how to access specific files within the dataset, see [here](https://github.com/SpaceNetChallenge/utilities/tree/master/content/download_instructions).

> _The spacenet-dataset S3 bucket is provided as a Requester Pays bucket, see [here](https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html) for more information._

# Downloading Rio raster and vector data with [Boto](https://boto3.readthedocs.io/en/latest/index.html)
Since the bucket is Requester Pays, we cannot simply curl the images. Instead, Boto 3, the AWS SDK for Python, provides an interface for downloading files from Requester Pays buckets. The `download_file` method of the [S3Transfer](https://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer) class accepts a `RequestPayer` entry in its `extra_args` argument.

In [1]:
import os
import posixpath

import boto3
from boto3.s3.transfer import S3Transfer

client = boto3.client("s3")
transfer = S3Transfer(client)

bucket = "spacenet-dataset"
# S3 keys always use "/" as the separator, so build them with posixpath
aoi_path = "AOI_1_Rio"
aoi_data_path = posixpath.join(aoi_path, "srcData")
building_labels_path = posixpath.join(aoi_data_path, "buildingLabels")
mosaic_3band_path = posixpath.join(aoi_data_path, "mosaic_3band")

def download_if_not_exists(key, filename):
    # Skip the (billed) download when the file is already on disk
    if not os.path.exists(filename):
        transfer.download_file(
            bucket=bucket, key=key, filename=filename,
            extra_args={"RequestPayer": "requester"})

def download_rio_vector_file(geojson_name):
    filename = os.path.join("/tmp", geojson_name)
    key = posixpath.join(building_labels_path, geojson_name)
    download_if_not_exists(key, filename)
    return filename

outline_filename = download_rio_vector_file("Rio_OUTLINE_Public_AOI.geojson")
buildings_filename = download_rio_vector_file("Rio_Buildings_Public_AOI_v2.geojson")

def list_objects(prefix):
    return client.list_objects_v2(
        Bucket=bucket, Prefix=prefix,
        RequestPayer="requester")

def list_keys(prefix):
    # "Contents" is absent when nothing matches; keys ending in "/" are folder placeholders
    objects = list_objects(prefix).get("Contents", [])
    return [obj["Key"] for obj in objects if not obj["Key"].endswith("/")]

def download_rio_raster_file():
    # Download the first object found under the 3-band mosaic prefix
    mosaic_3band_key = list_keys(mosaic_3band_path)[0]
    mosaic_3band_tiff = posixpath.basename(mosaic_3band_key)
    mosaic_3band_filename = os.path.join("/tmp", mosaic_3band_tiff)
    download_if_not_exists(mosaic_3band_key, mosaic_3band_filename)
    return mosaic_3band_filename

mosaic_3band_filename = download_rio_raster_file()
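
A single `list_objects_v2` call returns at most 1,000 keys, so the `list_objects` helper above only sees the first page of a large prefix. The following is a minimal sketch of following continuation tokens. `iter_keys` and its `list_page` argument are hypothetical names introduced here for illustration. `list_page` is assumed to be any callable that forwards its keyword arguments to `client.list_objects_v2`.

```python
def iter_keys(list_page):
    """Yield every object key, following continuation tokens across pages.

    `list_page` is a hypothetical callable that accepts an optional
    ContinuationToken keyword and returns a list_objects_v2-style dict.
    """
    token = None
    while True:
        kwargs = {"ContinuationToken": token} if token else {}
        page = list_page(**kwargs)
        # "Contents" is missing when a page holds no objects
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        token = page["NextContinuationToken"]

# Assumed usage with the client and bucket defined in the cell above:
# keys = list(iter_keys(lambda **kw: client.list_objects_v2(
#     Bucket=bucket, Prefix=mosaic_3band_path,
#     RequestPayer="requester", **kw)))
```

Passing the page-fetching call in as an argument keeps the pagination logic separate from the AWS client, so it can be exercised without network access.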

# Wrangling imagery with [GDAL](http://www.gdal.org/gdal_translate.html)
At the time of this demo, the GeoTrellis GeoTIFF reader cannot decode JPEG-compressed imagery ("Compression type JPEG is not supported by [this reader](https://github.com/locationtech/geotrellis/blob/master/raster/src/main/scala/geotrellis/raster/io/geotiff/compression/Decompressor.scala#L119-L122)"), so we use [gdal_translate](http://www.gdal.org/gdal_translate.html) to rewrite the image with LZW compression instead.

In [2]:
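from osgeo import gdal

# Local path of the re-compressed GeoTIFF
catalog_path = os.path.join("/tmp", "catalog.tif")

# Re-encode once; skip the translation if the output already exists
if not os.path.exists(catalog_path):
    gdal.Translate(
        destName=catalog_path, srcDS=mosaic_3band_filename,
        creationOptions=['COMPRESS=LZW'])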
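
To confirm which compression a GeoTIFF actually uses before and after translating it, the Compression tag (tag 259) can be read straight from the TIFF header. The following is a minimal standard-library sketch, not part of GDAL or GeoTrellis. `tiff_compression` is a hypothetical helper; it only inspects the first image directory of a classic TIFF and does not handle BigTIFF files.

```python
import struct

COMPRESSION_TAG = 259
# A few common values of the TIFF Compression tag
COMPRESSION_NAMES = {1: "NONE", 5: "LZW", 7: "JPEG", 8: "DEFLATE", 32773: "PACKBITS"}

def tiff_compression(path):
    """Return the compression scheme named in the first IFD of a classic TIFF."""
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8 or header[:2] not in (b"II", b"MM"):
            raise ValueError("not a TIFF file")
        # "II" means little-endian, "MM" means big-endian
        endian = "<" if header[:2] == b"II" else ">"
        magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("only classic TIFF (magic number 42) is handled")
        f.seek(ifd_offset)
        (n_entries,) = struct.unpack(endian + "H", f.read(2))
        for _ in range(n_entries):
            entry = f.read(12)
            tag, field_type, count = struct.unpack(endian + "HHI", entry[:8])
            if tag == COMPRESSION_TAG:
                # A single SHORT value is stored left-justified in the value field
                (value,) = struct.unpack(endian + "H", entry[8:10])
                return COMPRESSION_NAMES.get(value, str(value))
    # The TIFF default when the tag is absent is no compression
    return "NONE"
```

Calling `tiff_compression(mosaic_3band_filename)` on the downloaded mosaic would be expected to report `"JPEG"` if the file is compressed as the error message quoted above suggests.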

# Ingesting imagery for fast viewing with [GeoPySpark](https://github.com/locationtech-labs/geopyspark)

In [3]:
import geopyspark as gps
from pyspark import SparkContext
conf = gps.geopyspark_conf("local[*]", "spacenet-ingest")
conf.set(key='spark.ui.enabled', value='true')
sc = SparkContext.getOrCreate(conf)

# Read the LZW-compressed GeoTIFF that the gdal.Translate step wrote to /tmp
catalog_uri = "file:///tmp/catalog.tif"
# The following operation takes about X seconds on a reasonably capable 4-core laptop
rdd = gps.geotrellis.geotiff.get(
    gps.geotrellis.constants.LayerType.SPATIAL,
    catalog_uri,
    max_tile_size=512,
    num_partitions=500)

# Tile to a global (web map) layout in EPSG:3857, then build the zoom levels
laid_out = rdd.tile_to_layout(layout=gps.GlobalLayout(), target_crs=3857)
reprojected = laid_out.reproject("EPSG:3857").cache().repartition(600)
pyramided = reprojected.pyramid(start_zoom=12, end_zoom=1)

# Write every zoom level to a local catalog
for tiled in pyramided:
    gps.geotrellis.catalog.write("file:///tmp/spacenet-catalog", "spacenet-ingest", tiled)
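
The GDAL step works with a plain local path while the GeoPySpark reader is given a `file://` URI, and the two must name the same file. The following is a minimal sketch of deriving the URI from the path so they cannot drift apart. `local_file_uri` is a hypothetical helper and assumes an absolute POSIX path.

```python
import posixpath
from urllib.parse import quote

def local_file_uri(path):
    """Turn an absolute POSIX path into a file:// URI."""
    if not posixpath.isabs(path):
        raise ValueError("expected an absolute path, got %r" % path)
    # quote() leaves "/" intact and percent-encodes characters such as spaces
    return "file://" + quote(path)

print(local_file_uri("/tmp/catalog.tif"))  # file:///tmp/catalog.tif
```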


# Showing Rio’s outline, imagery, and building footprints on a map with [GeoNotebook](https://github.com/OpenGeoscience/geonotebook)

In [4]:
from geonotebook.wrappers import VectorData
outline_vector = VectorData(outline_filename)
outline_polygons = [polygon for polygon in outline_vector.polygons]
outline_polygon = outline_polygons[0]
outline_centroid = outline_polygon.centroid
x = outline_centroid.x
y = outline_centroid.y
z = 12
M.set_center(x, y, z);
M.add_layer(outline_vector, name="outline");
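
The map center used above is the centroid of the outline polygon. As a cross-check that does not depend on GeoNotebook, the centroid can be computed directly from the GeoJSON coordinates with the shoelace formula. This is a sketch under stated assumptions: `first_exterior_ring` and `ring_centroid` are hypothetical helpers, only the exterior ring of the first feature is used (holes are ignored), and the ring is assumed to be closed, as GeoJSON rings are.

```python
def first_exterior_ring(feature_collection):
    """Return the exterior ring of the first feature's (Multi)Polygon."""
    geometry = feature_collection["features"][0]["geometry"]
    coords = geometry["coordinates"]
    if geometry["type"] == "Polygon":
        return coords[0]
    if geometry["type"] == "MultiPolygon":
        return coords[0][0]
    raise ValueError("unsupported geometry type: %s" % geometry["type"])

def ring_centroid(ring):
    """Area-weighted centroid of a closed ring via the shoelace formula."""
    area2 = cx = cy = 0.0
    for p, q in zip(ring, ring[1:]):
        # Index explicitly so positions with a third (elevation) value also work
        x0, y0, x1, y1 = p[0], p[1], q[0], q[1]
        cross = x0 * y1 - x1 * y0
        area2 += cross          # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)

# Assumed usage with the outline file downloaded earlier:
# import json
# with open(outline_filename) as f:
#     x, y = ring_centroid(first_exterior_ring(json.load(f)))
```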

In [5]:
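import numpy as np
from PIL import Image

def render_image(tile):
    cells = tile.cells
    # Color correct with a fixed linear stretch; the bounds are hand-picked magic numbers
    magic_min, magic_max = 4000, 15176
    norm_range = magic_max - magic_min
    cells = cells.astype('int32')
    # Clamp non-zero cells to [magic_min, magic_max]
    cells[(cells != 0) & (cells < magic_min)] = magic_min
    cells[(cells != 0) & (cells > magic_max)] = magic_max
    # Rescale to 0-255; clip so zero-valued cells do not go negative before the uint8 cast
    colored = np.clip(((cells - magic_min) * 255) / norm_range, 0, 255)
    # Reverse the band order (2, 1, 0) for display
    (r, g, b) = (colored[2], colored[1], colored[0])
    # Make a pixel fully transparent when all three bands are NoData
    alpha = np.full(r.shape, 255)
    alpha[(cells[0] == tile.no_data_value) & \
          (cells[1] == tile.no_data_value) & \
          (cells[2] == tile.no_data_value)] = 0
    rgba = np.dstack([r, g, b, alpha]).astype('uint8')
    #return Image.fromarray(colored[1], mode='P')
    return Image.fromarray(rgba, mode='RGBA')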

from geonotebook.wrappers import TMSRasterData

# `pyramided` is the layer pyramid built in the ingest cell
tms_server = gps.TMS.build(pyramided, display=render_image)
M.add_layer(TMSRasterData(tms_server), name="mosaic")
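
The color correction in `render_image` is a linear stretch from a fixed input range to 0-255. The following sketch shows the same arithmetic on a small synthetic array, without GeoPySpark or PIL. `stretch_to_uint8` is a hypothetical helper, the sample values are made up, and a NoData value of 0 is assumed.

```python
import numpy as np

def stretch_to_uint8(cells, lo=4000, hi=15176, nodata=0):
    """Clamp cells to [lo, hi] and rescale linearly to 0-255.

    Returns the stretched array and a boolean mask of valid (non-NoData) cells.
    """
    cells = cells.astype("int32")
    valid = cells != nodata
    clamped = np.clip(cells, lo, hi)
    # Integer division keeps the result in 0..255 without a float round-trip
    scaled = (clamped - lo) * 255 // (hi - lo)
    scaled[~valid] = 0
    return scaled.astype("uint8"), valid

# Synthetic sample: NoData, below range, lower bound, midpoint, upper bound, above range
sample = np.array([0, 100, 4000, 9588, 15176, 20000], dtype="uint16")
stretched, valid = stretch_to_uint8(sample)
print(stretched.tolist())  # [0, 0, 0, 127, 255, 255]
```

The `valid` mask plays the role of the alpha channel in `render_image`, where a pixel is made transparent only when all three bands are NoData.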

In [6]:
buildings_vector = VectorData(buildings_filename)
# M.add_layer(buildings_vector, name="buildings");