<img src='https://radiant-assets.s3-us-west-2.amazonaws.com/PrimaryRadiantMLHubLogo.png' alt='Radiant MLHub Logo' width='300'/>

# South Africa Field Boundary Detection Tutorial

## Setting up Environment and Extracting Data

This notebook covers downloading the field boundary data from an S3 bucket and extracting it locally.

We will set up the path for downloading the data, download the field data folder from the S3 bucket, extract its files, and finally combine the Sentinel-2 RGB bands of each image chip into a single image.

In [1]:
from pathlib import Path
import rasterio
import numpy as np
import os
downloads_path = str(Path().resolve())  # base directory for all data; change this to whichever directory you use

In [12]:
import boto3
s3 = boto3.resource('s3')  # Amazon S3 service resource, used below to access the bucket that holds the data files

We will then download the data to the `extracted_data` directory in our `data` folder.

In [11]:
import os

def download_s3_folder(bucket_name, s3_folder, local_dir='extracted_data'):
    """
    Download every object under a folder (key prefix) of an S3 bucket
    Args:
        bucket_name: the name of the S3 bucket
        s3_folder: the folder path (key prefix) in the S3 bucket
        local_dir: a relative or absolute directory path in the local file system;
            if None, object keys are used as paths relative to the current directory
    """
    bucket = s3.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=s3_folder):
        if obj.key.endswith('/'):  # skip "folder" placeholder objects
            continue
        target = obj.key if local_dir is None \
            else os.path.join(local_dir, os.path.relpath(obj.key, s3_folder))
        target_dir = os.path.dirname(target)
        if target_dir:  # create the target's parent directories if they don't exist
            os.makedirs(target_dir, exist_ok=True)
        bucket.download_file(obj.key, target)

download_s3_folder('radiant-label-chips', 'south-africa-crops-field-boundary', f"{downloads_path}/data/extracted_data")
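The key-to-path mapping is the least obvious part of `download_s3_folder`: each object's key is made relative to `s3_folder` and then re-rooted under `local_dir`. The sketch below isolates that logic in a hypothetical helper, `local_path_for_key`, which is not part of the original notebook; the chip ID in the example key is made up.

```python
import os

def local_path_for_key(key, s3_folder, local_dir):
    """Hypothetical helper: map an S3 object key to a local file path.

    Mirrors the path logic in download_s3_folder. If local_dir is None,
    the key itself is used as a path relative to the current directory.
    """
    if local_dir is None:
        return key
    # Strip the s3_folder prefix, then place the remainder under local_dir
    return os.path.join(local_dir, os.path.relpath(key, s3_folder))

# Example with a made-up chip ID under the tutorial's prefix
print(local_path_for_key(
    "south-africa-crops-field-boundary/imagery/s2/chip_0/chip_0_B04_10m.tif",
    "south-africa-crops-field-boundary",
    "data/extracted_data",
))
```

On a POSIX system this prints `data/extracted_data/imagery/s2/chip_0/chip_0_B04_10m.tif`, so the folder structure inside the bucket prefix is preserved locally.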


If you instead received the data as a `.zip` archive, extract its contents with the cell below. Run this cell only in that case; skip it if you downloaded the data from S3 above.

In [None]:
import zipfile
path_to_zip_file = f"{downloads_path}/south-africa-crops-field-boundary.zip"
directory_to_extract_to = f"{downloads_path}/data/extracted_data"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)
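Before extracting, it can help to confirm that the archive is intact and has the expected layout. A minimal sketch using only the standard library `zipfile` module; the helper name `summarize_zip` and the `imagery/s2/` prefix are assumptions for illustration, since the archive's exact internal layout is not stated here. Note that `testzip` reads every member, so it can take a while on a large archive.

```python
import io
import zipfile

def summarize_zip(path_or_file, prefix="imagery/s2/"):
    """Hypothetical helper: return (number of entries, entries under `prefix`)
    without extracting anything. `prefix` is an assumed layout; adjust as needed."""
    with zipfile.ZipFile(path_or_file, "r") as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        names = zf.namelist()
    return len(names), [n for n in names if n.startswith(prefix)]

# Self-contained demo: build a tiny archive in memory (contents are placeholders)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("imagery/s2/chip_0/chip_0_B04_10m.tif", b"placeholder")
    zf.writestr("README.txt", b"placeholder")
buf.seek(0)
print(summarize_zip(buf))  # (2, ['imagery/s2/chip_0/chip_0_B04_10m.tif'])
```

With the real archive, `summarize_zip(path_to_zip_file)` would be called before `extractall`.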

In this notebook, we will make use of the RGB bands from Sentinel-2, where
- Red is band `B04`
- Green is band `B03`
- Blue is band `B02`
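The band-to-channel mapping and the file naming scheme can be captured in a small helper. This is a sketch assuming the `<chip_id>/<chip_id>_<band>_10m.tif` naming used in the code cell below; `RGB_BANDS`, `rgb_band_paths`, and the chip ID `chip_0` are hypothetical names, not part of the dataset or the original notebook.

```python
import os

# Sentinel-2 band IDs for the visible channels, in R, G, B order
RGB_BANDS = {"red": "B04", "green": "B03", "blue": "B02"}

def rgb_band_paths(source, chip_id, resolution="10m"):
    """Build the red, green, and blue band file paths for one chip folder,
    assuming the <source>/<chip_id>/<chip_id>_<band>_<resolution>.tif layout."""
    chip_dir = os.path.join(source, chip_id)
    return [
        os.path.join(chip_dir, f"{chip_id}_{band}_{resolution}.tif")
        for band in RGB_BANDS.values()  # dicts keep insertion order (Python 3.7+)
    ]

for path in rgb_band_paths("data/extracted_data/imagery/s2", "chip_0"):
    print(path)
```

Keeping the order in one place makes it harder to mislabel channels when the bands are stacked.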

We will then use `rasterio` to load the three bands of each chip into a single three-band image and write it to the `images` directory in our `data` folder, as seen below.
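The stacking step comes down to NumPy array layout: `rasterio` reads and writes multi-band data as `(bands, rows, cols)` arrays. The sketch below uses small synthetic arrays (values made up) to show that layout, and also why the stacked array should keep the bands' native dtype: if the bands are stored as `uint16`, as is common for Sentinel-2 reflectance products, copying them into a `uint8` array silently wraps every value above 255.

```python
import numpy as np

# Synthetic stand-ins for three single-band 10 m rasters (values are made up)
red = np.full((4, 4), 1200, dtype=np.uint16)
green = np.full((4, 4), 900, dtype=np.uint16)
blue = np.full((4, 4), 300, dtype=np.uint16)

# Stacking along a new first axis gives the (bands, rows, cols) layout rasterio writes
rgb = np.stack([red, green, blue])
print(rgb.shape, rgb.dtype)  # (3, 4, 4) uint16

# Copying into a uint8 buffer instead wraps values modulo 256 without any error
wrapped = np.zeros((3, 4, 4), dtype=np.uint8)
wrapped[0] = red
print(wrapped[0, 0, 0])  # 176, i.e. 1200 % 256
```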

In [2]:
import rasterio
import numpy as np
import os

images_dir = f"{downloads_path}/data/images"
os.makedirs(images_dir, exist_ok=True)  # output directory for the RGB images

source = f"{downloads_path}/data/extracted_data/imagery/s2"  # one sub-folder per image chip
for chip_id in sorted(os.listdir(source)):
    chip_dir = os.path.join(source, chip_id)
    if not os.path.isdir(chip_dir):
        continue
    source_files = [
        f"{chip_dir}/{chip_id}_B04_10m.tif",  # RED
        f"{chip_dir}/{chip_id}_B03_10m.tif",  # GREEN
        f"{chip_dir}/{chip_id}_B02_10m.tif",  # BLUE
    ]

    bands = []
    profile = None
    for band_file in source_files:
        with rasterio.open(band_file) as dataset:
            bands.append(dataset.read(1))  # single band, kept in its native dtype and size
            profile = dataset.profile

    all_bands = np.stack(bands)  # place the three bands in one (3, rows, cols) image
    profile.update(count=len(source_files), dtype=all_bands.dtype.name)

    # write the three-band RGB image to the images directory
    with rasterio.open(f"{images_dir}/{chip_id}.tif", 'w', **profile) as dst:
        dst.write(all_bands)
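Because the loop above stops at the first chip folder that lacks one of the three band files, it may be worth checking the extracted data first. A minimal sketch, assuming the same folder and file naming as above; `chips_missing_bands` is a hypothetical helper, and the demo runs on a throwaway directory with made-up chip IDs.

```python
import os
import tempfile

def chips_missing_bands(source, bands=("B04", "B03", "B02"), resolution="10m"):
    """Hypothetical helper: return {chip_id: [missing band IDs]} for chip
    folders under `source` that lack any of the requested band files."""
    with os.scandir(source) as it:
        chip_dirs = sorted((e for e in it if e.is_dir()), key=lambda e: e.name)
    missing = {}
    for entry in chip_dirs:
        absent = [
            band for band in bands
            if not os.path.isfile(
                os.path.join(entry.path, f"{entry.name}_{band}_{resolution}.tif"))
        ]
        if absent:
            missing[entry.name] = absent
    return missing

# Demo on a throwaway directory tree: chip_a is complete, chip_b has only B04
with tempfile.TemporaryDirectory() as tmp:
    for chip, present in {"chip_a": ("B04", "B03", "B02"), "chip_b": ("B04",)}.items():
        os.makedirs(os.path.join(tmp, chip))
        for band in present:
            open(os.path.join(tmp, chip, f"{chip}_{band}_10m.tif"), "w").close()
    demo_result = chips_missing_bands(tmp)
print(demo_result)  # {'chip_b': ['B03', 'B02']}
```

On the real data, `chips_missing_bands(f"{downloads_path}/data/extracted_data/imagery/s2")` returning an empty dict would mean every chip folder has all three bands.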