# Week 7: Accessing the Copernicus Open Access Hub with the SentinelSat library

Individual learning outcomes: At the end of this week, all students should be able to access the Copernicus Open Access Hub via the API, set up and submit a data query, and automatically download individual Sentinel-2 images to Google Drive and Colab for further analysis. Students should understand the Sentinel-2 file structure and be able to pre-process the images automatically, including unzipping, reprojecting, and clipping.

The Copernicus Open Access Hub lets you search for individual images. However, there are limits on how many images you can request from the Long-Term Archive.

# Get a user account on the Sentinel Open Access Data Hub

Before we begin, make sure to register for an account in the Copernicus Open Access Hub.

For registration follow the link to the Open Access Hub and register: https://scihub.copernicus.eu/dhus/#/home

In previous weeks, we manually uploaded a Sentinel-2 image to our Google Drive directory. We also used Google Earth Engine to process multi-temporal Sentinel-2 images into image composites for us.

Today, we want to access the Sentinel Data Hub and search for available images over an area of interest of our choice.

Connect to our Google Drive from Colab.

In [None]:
# Load the Drive helper and mount your Google Drive as a drive in the virtual machine
from google.colab import drive
drive.mount('/content/drive')

## Sentinelsat API

We'll be using an API designed by Wille and Clauss (2016) called sentinelsat. This API was designed to query and download Copernicus product imagery from the Copernicus Open Access Hub API.

Follow the link to the API, and see if you can understand how it works: https://sentinelsat.readthedocs.io/en/stable/

In [None]:
#import required libraries, including the sentinelsat library this time
!pip install rasterio
!pip install sentinelsat
!pip install geopandas
import geopandas as gpd
import rasterio
from rasterio import plot
from rasterio.plot import show_hist
from rasterio.windows import Window
import matplotlib.pyplot as plt
import numpy as np
from sentinelsat.sentinel import SentinelAPI, read_geojson, geojson_to_wkt, geojson
from collections import OrderedDict
import json
import math
import os
from os import listdir
from os.path import isfile, isdir, join
from osgeo import gdal, ogr
from pyproj import Proj
from pprint import pprint
import shutil
import sys
import zipfile
from math import floor, ceil

%matplotlib inline

We need a helper function at a later stage, so let's define that now. It converts between lat/lon coordinates and pixel locations in a raster.
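
To see the arithmetic behind such a conversion, here is a minimal sketch for the simplest case: a north-up raster without rotation, with coordinates already in the raster's own projection. The function name `xy_to_pixel` and the raster values are made up for illustration. The helper in the next cell does the same job more generally, using pyproj and the inverse of the raster's affine transform.

```python
from math import floor


def xy_to_pixel(x, y, x_origin, y_origin, pixel_width, pixel_height):
    """Return (row, col) of the pixel that contains the map coordinate (x, y).

    Assumes a north-up raster: (x_origin, y_origin) is the upper-left corner
    of the image and both pixel sizes are given as positive numbers.
    """
    col = floor((x - x_origin) / pixel_width)    # columns count eastwards
    row = floor((y_origin - y) / pixel_height)   # rows count southwards
    return row, col


# made-up raster: upper-left corner at (500000, 5900000), 20 m pixels
row, col = xy_to_pixel(500050.0, 5899970.0, 500000.0, 5900000.0, 20.0, 20.0)
print(row, col)
```

Note that rows increase downwards while northing increases upwards, which is why the y difference is taken from the origin.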

In [None]:
# define a helper function that converts latitude, longitude coordinates into pixel locations
def longlat2window(lon, lat, dataset):
    """
    Args:
        lon (tuple): Tuple of min and max lon
        lat (tuple): Tuple of min and max lat
        dataset: Rasterio dataset

    Returns:
        rasterio.windows.Window
    """
    p = Proj(dataset.crs)
    t = dataset.transform
    xmin, ymin = p(lon[0], lat[0])
    xmax, ymax = p(lon[1], lat[1])
    col_min, row_min = ~t * (xmin, ymin)
    col_max, row_max = ~t * (xmax, ymax)
    return Window.from_slices(rows=(floor(row_max), ceil(row_min)),
                              cols=(floor(col_min), ceil(col_max)))


# Accessing Sentinel-2 images

The workflow for this practical is similar to the previous one:
* Define an area of interest based on an ESRI shapefile
* Define a time window for our data search
* Set a maximum acceptable cloud cover for our search
* Search the ESA Copernicus Open Access Hub for all available images
* Select the individual images with the least cloud cover and download them to Google Drive
* Reproject (warp) the TCI images and crop them to our area of interest
* Make a movie for our area of interest

Before proceeding, we have to go to Google Drive and create a text file called "sencredentials.txt" with our login details for the Copernicus Open Access Hub.

The file has two lines of text.

Line 1: Your username

Line 2: Your password
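
As a rough sketch of how such a two-line file can be read safely, the snippet below defines a hypothetical helper `read_credentials` (not part of sentinelsat) and tries it on a throwaway file with made-up login details. It ignores blank lines and complains if a line is missing.

```python
import tempfile
from pathlib import Path


def read_credentials(path):
    """Return (username, password) from a two-line text file."""
    # keep only non-empty lines, so a trailing blank line does no harm
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError(f"{path} needs a username line and a password line")
    return lines[0], lines[1]


# demonstration with a temporary file and made-up login details
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "sencredentials.txt"
    demo_file.write_text("my_username\nmy_password\n")
    demo_user, demo_pass = read_credentials(demo_file)

print(demo_user)
```

Keeping the login details in a separate file means they never appear in the notebook itself, which matters if you share the notebook.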

Set some directory paths on Google Drive.
BEFORE YOU RUN THIS CELL, EDIT THE VARIABLE wd BELOW TO POINT TO YOUR DIRECTORY ON GOOGLE DRIVE



In [None]:
# set up your directories for the satellite data
# Note that we do all the downloading and data analysis on the temporary drive
#    on Colab. We will copy the output directory to our Google Drive at the end.
#    Colab has more disk space (about 40 GB free space) than Google Drive (15 GB).
#    However, the data on the Colab disk space are NOT kept when you log out.

# path to your Google Drive
# EDIT THIS LINE (/content/drive/My Drive is the top directory on Google Drive):
wd = "/content/drive/My Drive/practicals20-21"
print("Connected to data directory: " + wd)

# path to your temporary drive on the Colab Virtual Machine
cd = "/content/work"

# directory for downloading the Sentinel-2 granules
# Note that we are using the 'join' function imported from the os library here for the first time
# It is an easy way of merging strings into a directory structure.
# It is clever and chooses the / or \ depending on whether you are on Windows or Linux.
downloaddir = join(cd, 'download') # where we save the downloaded images
quickdir = join(cd, 'quicklooks')  # where we save the quicklooks
outdir = join(cd, 'out')           # where we save any other outputs

# CAREFUL: This code removes the named directories and everything inside them to free up space
# Note: shutil provides a lot of useful functions for file and directory management
for d in (downloaddir, quickdir, outdir):
  try:
    shutil.rmtree(d)
  except FileNotFoundError:
    # only a missing directory is ignored, any other error is still reported
    print(d + " not found.")

# create the new directories, unless they already exist
os.makedirs(cd, exist_ok=True)
os.makedirs(downloaddir, exist_ok=True)
os.makedirs(quickdir, exist_ok=True)
os.makedirs(outdir, exist_ok=True)

print("Connected to Colab temporary data directory: " + cd)

print("\nList of contents of " + wd)
for f in sorted(os.listdir(wd)):
  print(f)

# check whether the file with the login details exists
if "sencredentials.txt" not in os.listdir(wd):
  print("\nERROR: File sencredentials.txt not found. Cannot log in to Data Hub.\n")

Set up some directory names. 

Modify these string variables to match your data directory structure.

IMPORTANT: You must upload a shapefile of your area of interest to your Google Drive before running the next cell. Set the variable 'shapefile' below to point to this file. You can draw a polygon and save it as a shapefile on http://www.geojson.io.

In [None]:
# BEFORE YOU RUN THIS BLOCK, YOU NEED A USER ACCOUNT ON THE ESA SENTINEL HUB
# In a browser, go to https://scihub.copernicus.eu/dhus/#/home
# Click on the user symbol in the top right and then on 'sign up'
# Follow the instructions.
# When you have your account, create a plain text (.txt) file in a text editor (e.g. Notepad) that contains two lines:
#   line 1 - your username
#   line 2 - your password
# save it under the name "sencredentials.txt"
# upload it to the same directory as the Jupyter Notebook on your Google Drive.

# This will allow the notebook to connect to your account on the ESA Data Archive.
credentials = join(wd, 'sencredentials.txt')  # contains two lines of text with username and password

# Download options and Data Hub search parameters

# EDIT THE OPTIONS BELOW
ndown = 3 # number of scenes to be downloaded (in order of least cloud cover), should be less than 11 on Colab due to space

# YOU CAN PLACE A DIFFERENT SHAPEFILE ONTO YOUR GOOGLE DRIVE BUT MAKE SURE THAT
#    THE VARIABLE shapefile POINTS TO THE CORRECT FILE:
shapefile = join(wd, 'oakham', 'Polygons_small.shp') # ESRI Shapefile of the study area

# Define a date range for our search
datefrom = '20190301' # start date for imagery search
dateto   = '20191231' # end date for imagery search

# Define which cloud cover we accept in the images
clouds = '[0 TO 10]' # range of acceptable cloud cover % for imagery search
# Note that later versions of the Sentinelsat package require this in the format: clouds = (0, 10) 

# Search for available image files on the ESA data server

We begin by reading in the user name and password we have saved in our text file 'sencredentials.txt'.

In [None]:
# go to working directory
os.chdir(wd)

# load user credentials for Sentinel Data Hub at ESA, i.e. read two lines of text with username and password
# 'credentials' already holds the full path, and the 'with' statement closes the file for us
with open(credentials) as f:
    lines = f.readlines()
username = lines[0].strip()
password = lines[1].strip()

Now let's create the Sentinel API object in Python.

API stands for 'application programming interface'. An API defines interactions between multiple software intermediaries, in this case between our Jupyter Notebook and the ESA Copernicus Data Hub. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow etc. (text modified after Wikipedia)

In [None]:
# Define the API
api = SentinelAPI(username, password, 'https://scihub.copernicus.eu/dhus')

Before going further, let's just check that the shapefile extent and projection are in order.

In [None]:
# Get the shapefile layer's extent
driver = ogr.GetDriverByName("ESRI Shapefile")
ds = driver.Open(shapefile, 0)
lyr = ds.GetLayer()
extent = lyr.GetExtent()
print("Extent of the area of interest (shapefile):\n", extent)

# get projection information from the shapefile to reproject the images to
outSpatialRef = lyr.GetSpatialRef().ExportToWkt()
ds = None # close file
print("\nSpatial referencing information of the shapefile:\n", outSpatialRef)

Because the Copernicus Open Access Hub requires the region of interest coordinates in a specific format, we define the following helper function.

It takes the corner coordinates of the extent of the shapefile and turns them into a closed polygon object.
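
To make the idea of a closed polygon concrete, here is a small pure-Python sketch that builds the same kind of bounding box as a WKT string. The function name `extent_to_wkt` and the coordinates are made up for illustration. The only assumption taken from OGR is the order of the extent tuple returned by `GetExtent()`: (xmin, xmax, ymin, ymax).

```python
def extent_to_wkt(extent):
    """Build a closed WKT polygon from an extent tuple (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = extent
    # walk around the four corners and repeat the first one to close the ring
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON (({ring}))"


# made-up extent in longitude / latitude
demo_extent = (-0.75, -0.65, 52.6, 52.7)
demo_wkt = extent_to_wkt(demo_extent)
print(demo_wkt)
```

The first and last coordinate pairs are identical. That repetition is what makes the ring closed.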

In [None]:
# We need to define a helper function that creates a simple bounding box polygon from the 
#   extent of our shapefile in the right format for the Data Hub API.
def bbox(coord_list):
  # Create a Polygon from the extent tuple (xmin, xmax, ymin, ymax)
  # Note: we use the function argument here, not the global variable 'extent'
  box = ogr.Geometry(ogr.wkbLinearRing)
  box.AddPoint(coord_list[0], coord_list[2])
  box.AddPoint(coord_list[1], coord_list[2])
  box.AddPoint(coord_list[1], coord_list[3])
  box.AddPoint(coord_list[0], coord_list[3])
  box.AddPoint(coord_list[0], coord_list[2]) # repeat the first point to close the ring
  poly = ogr.Geometry(ogr.wkbPolygon)
  poly.AddGeometry(box)
  return poly

# Let's see what it does
print(extent)
print(bbox(extent))

Now we have the extent of our area of interest in the right format for the API, we can query the ESA Sentinel data hub by submitting our options as arguments.

The **kwargs argument list is a flexible way of handing over multiple arguments to a Python function.
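
As a standalone illustration of this unpacking, the sketch below uses a hypothetical stand-in function, `describe_search`, which only formats its arguments and does not contact any server. It shows that passing a dictionary with `**` is equivalent to writing out each keyword argument by hand.

```python
def describe_search(platformname, date, cloudcoverpercentage=None):
    """Hypothetical stand-in for a search function: it only formats its arguments."""
    return f"{platformname} from {date[0]} to {date[1]}, clouds {cloudcoverpercentage}"


options = {
    'platformname': 'Sentinel-2',
    'date': ('20190301', '20191231'),
    'cloudcoverpercentage': (0, 10),
}

# the two calls below are equivalent
unpacked = describe_search(**options)
explicit = describe_search(platformname='Sentinel-2',
                           date=('20190301', '20191231'),
                           cloudcoverpercentage=(0, 10))
print(unpacked)
```

Collecting the options in a dictionary first makes it easy to copy and modify them for a second query later on.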

In [None]:
# set query parameters and search the Sentinel data hub
kwargs = {
        'area': bbox(extent),
        'date': (datefrom, dateto),
        'platformname': 'Sentinel-2',
        'processinglevel': 'Level-2A',
        'cloudcoverpercentage': clouds
        }

# search the Sentinel data hub API
products = api.query(**kwargs)
# the search returns an ordered dictionary object of the image products
print(type(products))

Let's look at the query results, save them in a format that can be read into Excel and select the images we want to download.
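
The selection logic itself is simple: sort by cloud cover, break ties by date, and keep the first few entries. Here is a pure-Python sketch with made-up records. The scene names and values are invented for illustration, and the notebook cell below does the same with a Pandas DataFrame.

```python
# made-up search results, one dictionary per image
records = [
    {'title': 'scene_a', 'cloudcoverpercentage': 7.2, 'ingestiondate': '2019-06-02'},
    {'title': 'scene_b', 'cloudcoverpercentage': 0.4, 'ingestiondate': '2019-09-20'},
    {'title': 'scene_c', 'cloudcoverpercentage': 0.4, 'ingestiondate': '2019-04-11'},
    {'title': 'scene_d', 'cloudcoverpercentage': 3.1, 'ingestiondate': '2019-07-15'},
]

n_keep = 2  # how many images we want to keep

# sort by cloud cover first, then by date for images with equal cloud cover
ranked = sorted(records, key=lambda r: (r['cloudcoverpercentage'], r['ingestiondate']))
selected = [r['title'] for r in ranked[:n_keep]]
print(selected)
```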

In [None]:
# ordered dictionaries are hard to work with, so we convert the list of image 
#    products to a Pandas DataFrame
# you can read up on this data structure here (https://pymotw.com/2/collections/ordereddict.html)
products_df = api.to_dataframe(products)
print('Search resulted in '+str(products_df.shape[0])+' satellite images with '+
      str(products_df.shape[1])+' attributes.')

os.chdir(outdir) # set working directory for output files

# sort the search results
products_df_sorted = products_df.sort_values(['cloudcoverpercentage', 'ingestiondate'], ascending=[True, True])
print(products_df_sorted)

# save the full search results to a .csv file that you can read into Excel
outfile = 'searchresults_full.csv'
products_df_sorted.to_csv(outfile)
print("Search results saved: " + outfile)

# limit the download to the first 'ndown' images 
#   sorted by lowest cloud cover and earliest ingestion date
products_df_n = products_df_sorted.head(ndown)
print(products_df_n)

# save the list of data to be downloaded to a .csv file that you can read into Excel
outfile = 'searchresults4download.csv'
products_df_n.to_csv(outfile)
print("Download list saved: " + outfile)

# Download the individual Sentinel-2 granules to the Colab temporary drive

This takes a long time if many images are selected. Each granule covers 100 x 100 km and has bands of 10 m and coarser resolution.


In [None]:
# Download all selected images into a data directory
os.chdir(downloaddir) # set working directory to download directory

# print the unique image IDs that we will submit to the API
for i in products_df_n['uuid']:
  pprint(i)

# download sorted and reduced products in order
api.download_all(products_df_n['uuid'])

We can save the footprints of the images returned in the query. They can be visualised with QGIS or any other GIS software. 

Try it.

In [None]:
# get the footprints of the selected scenes for use in Excel
s2footprints = products_df_n.footprint
outfile = join(outdir, 'footprints.csv')
s2footprints.to_csv(outfile, header = False)
print("Granule footprints saved as csv: " + outfile)

# save the footprints of the scenes marked for download together with their metadata in a Geojson file

# first, we run a new query to get the metadata for the selected images
# we do this by creating an Ordered Dictionary with all our found images 
products_n = OrderedDict()

# iterate over all unique image IDs in our list and get their metadata into our ordered dictionary
for uuid in products_df_n['uuid']:
    kw = kwargs.copy()
    kw['uuid'] = uuid
    pp = api.query(**kw)
    products_n.update(pp)

# write the footprints and metadata to a geojson file
outfile = join(outdir, 'footprints.geojson')
with open(outfile, 'w') as f:
    json.dump(api.to_geojson(products_n), f)
print("Granule footprints saved as GeoJson: " + outfile)

# Explore the data directory structure of our downloaded files


In [None]:
# where we stored the text files and csv files
os.chdir(outdir)
print("contents of ", outdir, ":")
!ls -l

# where we stored the downloaded Sentinel-2 images
os.chdir(downloaddir)
print("contents of ", downloaddir, ":")
!ls -l

Remember that we have saved the downloaded images to a temporary directory that will be deleted when we close the virtual machine. If you want to save your images to your local directory, this is how it goes.

Go to your Google Colab folder in the panel on the left-hand side.

Find the download directory and click on a Sentinel-2 image folder.

Right-click on it and select 'download' to save it.

# Iterate over all images and show the TCI file


The downloaded Sentinel-2 granules (or single images) are zipped. We need to unzip them first. At this stage, we remove the zipped file to free up disk space.
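
The unzip-then-delete pattern can be tried out safely on a tiny made-up archive before running it on real downloads. The sketch below is one possible way to write it with the standard library. The function name `unzip_and_remove` and the demo file names are assumptions for illustration only.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_and_remove(folder):
    """Extract every .zip archive in 'folder', then delete the archive."""
    extracted = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
        archive.unlink()  # delete only after the archive has been closed
        extracted.append(archive.name)
    return extracted


# demonstration with a tiny made-up archive in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo_scene.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("demo_scene.SAFE/manifest.txt", "placeholder")
    done = unzip_and_remove(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(done, remaining)
```

After the call, only the extracted directory is left and the zip file is gone, which is what frees up the disk space.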

In [None]:
# set working directory to download directory
os.chdir(downloaddir)

# get list of all zip files in the data directory
allfiles = [f for f in listdir(downloaddir) if isfile(join(downloaddir, f))]

# unzip all downloaded Sentinel-2 files
for x in range(len(allfiles)):

  # check whether the file name ends with '.zip'
  if allfiles[x].endswith(".zip"):
    print("Unzipping file ", x+1, ": ", allfiles[x])

    # first extract the files
    with zipfile.ZipFile(allfiles[x], "r") as zipf:
      zipf.extractall(downloaddir)

    # then, once the archive is closed, remove the zip file to save disk space
    os.remove(join(downloaddir, allfiles[x]))

Before we can create a quick visualisation of the TCI files of all downloaded images, we need to find all the files. They are located in the 20m subdirectories of our downloaded Sentinel-2 directories for each image.
How can we do that? We can iterate over all directories and search for the right files.
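
One compact alternative to nested loops is a glob pattern that describes the whole directory chain at once. The sketch below builds a made-up, empty copy of the directory layout described above (`.SAFE` / `GRANULE` / `L2A_*` / `IMG_DATA` / `R20m`) and searches it. The function name `find_tci_files` and the demo names are hypothetical, and the cell below takes the step-by-step route instead.

```python
import tempfile
from pathlib import Path


def find_tci_files(download_root, resolution="20m"):
    """Return all TCI files below download_root for the given resolution."""
    # each * matches one directory or file name component
    pattern = f"*.SAFE/GRANULE/*/IMG_DATA/R{resolution}/*_TCI_{resolution}.jp2"
    return sorted(Path(download_root).glob(pattern))


# demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp) / "S2A_DEMO.SAFE" / "GRANULE" / "L2A_DEMO" / "IMG_DATA" / "R20m"
    img_dir.mkdir(parents=True)
    (img_dir / "T30UXD_20190919T110721_TCI_20m.jp2").touch()
    (img_dir / "T30UXD_20190919T110721_B04_20m.jp2").touch()
    found = [p.name for p in find_tci_files(tmp)]

print(found)
```

Only the TCI file is returned. The B04 band file in the same directory does not match the pattern.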

In [None]:
# make a list of all TCI files across all downloaded image directories
tciIDs = [] # empty list of all Sentinel-2 granule IDs we have downloaded
tcidirs = [] # empty list of all directory paths pointing to the TCI files
tcifiles = [] # empty list of all TCI file names

# get the list of all directories in the download directory
# there is one directory for each Sentinel-2 image (granule)
dirlist = [d for d in listdir(downloaddir) if isdir(join(downloaddir, d))]
pprint(dirlist)

# make a list of only the directories that start with "S2A" or "S2B"
s2dirlist = []
for y in range(len(dirlist)):
  # remove any subdirectories that are not containing Sentinel-2 data
  # for example, .iPythonNotebook checkpoints that are created automatically
  if dirlist[y].split("_")[0] == "S2A" or dirlist[y].split("_")[0] == "S2B":
    # append this directory name to the list of Sentinel-2 image directories
    s2dirlist.append(dirlist[y])
pprint(s2dirlist)

# iterate over all Sentinel-2 image directories and show the TCI file to check the image quality on screen
for d in range(len(s2dirlist)):
  # the directory names have the following structure, for example:
  # S2A_MSIL2A_20190919T110721_N0213_R137_T30UXD_20190919T140654.SAFE
  # the first part of the directory name is the granule ID
  # so we split off the ".SAFE" as follows:
  sceneID = s2dirlist[d].split(".")[0] 
  tciIDs.append(sceneID) # append the unique identifier to the list

  # find the GRANULE, then L2A_*, then IMG_DATA, then R20m directory
  thisdir = join(downloaddir, s2dirlist[d], "GRANULE")
  print(thisdir)

  # find the full name of the L2A_* subdirectory (contains the scene ID)
  subdirlist = [s for s in listdir(thisdir) if isdir(join(thisdir, s))]
  print(subdirlist)
  for y in range(len(subdirlist)):
    if subdirlist[y].split("_")[0] == "L2A":
      thisdir = join(thisdir, subdirlist[y])

  # add IMG_DATA/R20m to subdirectory, this is where the TCI image is found
  tcidir = join(thisdir, "IMG_DATA", "R20m")
  tcidirs.append(tcidir) # add it to our list

  # find the TCI image file name
  files_20m = [f for f in listdir(tcidir) if isfile(join(tcidir, f))]

  # We split the filename into components based on the underscore _
  # e.g. "T30UXD_20190919T110721_TCI_20m.jp2"
  # becomes ["T30UXD", "20190919T110721", "TCI", "20m.jp2"]
  # so the component indexed 2 should be "TCI" if we have found the file
  for y in range(len(files_20m)):
    if files_20m[y].split("_")[2] == "TCI":
      # append this TCI file name to a list of all TCI files found in any directory
      tcifiles.append(files_20m[y])

# the output looks neater if we print each element of the list of strings in a new line
print("List of all Granule IDs:")
for i in tciIDs:
  print(i)
print("List of all TCI directories:")
for i in tcidirs:
  print(i)
print("List of all TCI files:")
for i in tcifiles:
  print(i)

Now we know which image files we want to show on screen, the rest is easy. Just like last week.

In [None]:
# how many files are in the file list?
nfiles = len(tcifiles)

# arrange our subplots, assuming a 16:9 screen ratio
cols = min(nfiles, 4) # maximum of 4 plots in one row
rows = math.ceil(nfiles / cols) # round up to nearest integer

# create a figure with subplots
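# squeeze=False always returns a 2-D array of axes, which we flatten into a 1-D list,
#   so that ax[x] also works for a single plot or for more than one row
fig, ax = plt.subplots(rows, cols, figsize=(21,7), squeeze=False)
ax = ax.flatten()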
fig.patch.set_facecolor('white')

# iterate over all Sentinel-2 image directories and show the TCI file to check the image quality on screen
for x in range(nfiles):
  # join the directory path with the file name
  tcifile = join(tcidirs[x], tcifiles[x])

  # open the True Colour Image (a 3-band raster) from the image directory with the 20 m resolution bands
  bandTCI = rasterio.open(tcifile, driver='JP2OpenJPEG') #True Colour Image in uint8 data format

  #plot band using RasterIO as we did last week
  # note that this time we define the axes as an indexable list, so we can iterate over the subplots
  plot.show(bandTCI, ax=ax[x])

  # set a title for the subplot
  mytitle = tciIDs[x]
  ax[x].set_title(mytitle, fontsize=8)

To zoom to our shapefile area, let's warp all TCI images to the same projection as the shapefile. 

Remember we did this last week:

```
ds = gdal.Warp('Sentinel-2_stack_100m_BNG.tiff',
               'Sentinel-2_stack_100m.tiff', dstSRS='EPSG:27700')
ds = None #remember to close and save the output file
```

So let's put GDAL to work.



In [None]:
# get the spatial referencing system of our shapefile into which we want to reproject the TCI images
# remember, we did this when we opened the shapefile earlier and saved it in outSpatialRef
print("Reprojecting all TCI images to the following projection:")
print(outSpatialRef)

warpfiles = [] # make an empty list where we can remember all the warped output file names

# iterate over all Sentinel-2 image directories and warp the image
for x in range(nfiles):
  # join the directory path with the file name
  tcifile = join(tcidirs[x], tcifiles[x])

  # make a directory path and file name for the warped output file
  warpfile = join(quickdir, tciIDs[x] + "_warped.jp2")
  warpfiles.append(warpfile) # add it to our list

  # call the GDAL Warp command
  ds = gdal.Warp(warpfile, tcifile, dstSRS=outSpatialRef)
  ds = None #remember to close and save the output file

pprint(warpfiles)

In [None]:
# open the shapefile for plotting
driver = ogr.GetDriverByName("ESRI Shapefile")
ds = driver.Open(shapefile, 0)

# plot the warped images
# Improve the function to convert uint16 to uint8 and rescale to 0-255
# to include the visualisation commands

# create a figure with subplots
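# squeeze=False always returns a 2-D array of axes, which we flatten into a 1-D list,
#   so that ax[f] also works for a single plot or for more than one row
fig, ax = plt.subplots(rows, cols, figsize=(21,7), squeeze=False)
ax = ax.flatten()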
fig.patch.set_facecolor('white')

# We will use the Geopandas library for plotting the shapefile on top of the raster image.
# This is the easiest available option for plotting.

# iterate over all Sentinel-2 image directories and show the TCI file to check the image quality on screen
for f in range(nfiles):
  # join the directory path with the file name
  tcifile = warpfiles[f]

  # open the warped True Colour Image (a 3-band raster)
  bandTCI = rasterio.open(tcifile, 'r') #True Colour Image in uint8 data format

  #plot band using RasterIO as we did last week
  # note that this time we define the axes as an indexable list, so we can iterate over the subplots
  plot.show(bandTCI, ax=ax[f])
  
  # plot the shapefile using Geopandas
  shp = gpd.read_file(shapefile)
  shp.boundary.plot(ax=ax[f], edgecolor="yellow")
  
  # set a title for the subplot
  mytitle = tciIDs[f]
  ax[f].set_title(mytitle, fontsize=8)
  
  #close the file
  bandTCI.close()

# Clip the raster

Let's do the same visualisation, but zoom into our shapefile area. 

We will convert the output file to GeoTIFF, as this causes fewer problems than JPEG2000.

You can look up all available options for the Python GDAL library gdal.Warp function here: https://gdal.org/python/osgeo.gdal-pysrc.html 
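
When a raster is clipped to a pixel window, the georeferencing of the output has to start at the upper left corner of the window, not at the corner of the full image. The sketch below shows that shift for a north-up raster without rotation. The function name `window_origin` and all numbers are made up for illustration. In rasterio, the `window_transform` method of an open dataset returns the shifted transform for a given window.

```python
def window_origin(x_origin, y_origin, pixel_width, pixel_height, col_off, row_off):
    """Upper-left map coordinates of a window inside a north-up raster.

    col_off and row_off are the window's offsets in pixels from the
    upper-left corner of the full image; pixel sizes are positive numbers.
    """
    return (x_origin + col_off * pixel_width,    # move east by col_off pixels
            y_origin - row_off * pixel_height)   # move south by row_off pixels


# made-up raster: upper-left corner at (500000, 5900000), 20 m pixels
new_origin = window_origin(500000.0, 5900000.0, 20.0, 20.0, col_off=100, row_off=250)
print(new_origin)
```

If the output file kept the origin of the full image, the clipped pixels would appear in the wrong place when overlaid in a GIS.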

In [None]:
# clip the files in GDAL
# make an empty list where we will remember all zoom file names
zoomfiles = [] 

# create a figure with subplots
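# squeeze=False always returns a 2-D array of axes, which we flatten into a 1-D list,
#   so that ax[x] also works for a single plot or for more than one row
fig, ax = plt.subplots(rows, cols, figsize=(21,7), squeeze=False)
ax = ax.flatten()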
fig.patch.set_facecolor('white')

# iterate over all warped Sentinel-2 TCI files to check the image quality on screen
for x in range(nfiles):
  warpfile = warpfiles[x]
  print(warpfile)

  # make the filename of the new zoom image file
  zoomfile = warpfile.split(".")[0] + "_zoom.tif"
  zoomfiles.append(zoomfile) # remember the zoom file name in our list
  print(zoomfile)

  # clip it with rasterio to the shapefile extent
  # rasterio offers an option called 'window' to load a subset of a raster file

  # open the source file
  with rasterio.open(warpfile, 'r') as src:
    
    # convert the shapefile extent to a rasterio window object
    window = longlat2window((extent[0], extent[1]), (extent[2], extent[3]), src)
    print(window)
    
    # read all bands but only for the window extent
    arr = src.read(window=window, out_shape=(src.count, window.height, window.width))
    print(arr.shape)

    # get the data type
    dt = arr.dtype

    # open the destination file
    # note that the georeferencing must start at the window's upper left corner,
    #   so we use the window transform rather than the transform of the whole source image
    with rasterio.open(zoomfile, 'w', driver='GTiff', width=window.width, 
                       height=window.height, count=src.count, crs=src.crs, 
                       transform=src.window_transform(window), dtype=dt) as dst:
      
      # iterate over all bands in the source file and write them to the destination file
      for b in range(arr.shape[0]):
        dst.write(arr[b,:,:], b+1)

    # the 'with' statements close the destination and source files automatically
  
  # make maps of the destination files
  with rasterio.open(zoomfile, "r") as img:
    plot.show(img, ax=ax[x])
    # set a title for the subplot
    mytitle = tciIDs[x]
    ax[x].set_title(mytitle, fontsize=8)
    # ax[x].set_xlim(extent[0], extent[1])
    # ax[x].set_ylim(extent[2], extent[3])
    
os.chdir(quickdir)
!ls -l

# make a movie from all quicklooks of our area of interest

We will use the imageio library to do this efficiently.

In [None]:
import imageio
# create an empty list where we will collect all raster images
images = []
# iterate over all zoom files
for f in zoomfiles:
    images.append(imageio.imread(f)) # read the next image and append it
imageio.mimsave(join(outdir, "movie_fast.gif"), images)

# We can improve it by slowing down the movie.
# Let's set the duration of each frame to 3 seconds.
framerate = { 'duration': 3 }
imageio.mimsave(join(outdir, "movie_slow.gif"), images, **framerate)

Now download the files movie_fast.gif and movie_slow.gif from Colab using the folder icon on the left-hand side. Locate the files in the 'out' directory, right-click and select 'download'. Save them both to your local hard drive and open them to view them.

Before we end this session, we want to copy our downloaded Sentinel-2 data from Colab to Google Drive where the data will not be lost when this virtual machine terminates.

You could do it via the Linux command line like this:

```
!cp -r '/content/work/download' '/content/drive/MyDrive/practicals20-21'
```

Or in a more pythonic way using the shutil library. The copytree function copies a directory and all its contents to a destination folder.
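
As a small aside, here is a sketch of `shutil.copytree` on made-up temporary folders. It assumes Python 3.8 or newer, where the `dirs_exist_ok=True` option allows copying into a directory that already exists. All folder and file names are placeholders.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "download"     # stands in for the Colab download directory
    target = Path(tmp) / "drive_copy"   # stands in for a folder on Google Drive
    source.mkdir()
    (source / "scene.txt").write_text("placeholder")

    # the first call creates the target, the second call copies into the existing target
    shutil.copytree(source, target, dirs_exist_ok=True)
    shutil.copytree(source, target, dirs_exist_ok=True)
    copied = sorted(p.name for p in target.iterdir())

print(copied)
```

Unlike deleting the target first, this keeps any files that are already in the target but no longer in the source, so choose the approach that matches what you want to end up with.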


In [None]:
# Copy the content of the download directory we created in Colab on a temporary drive
#    to the Google Drive partition, where it is not deleted when this session ends

# name of our new directory on Google Drive
# NOTE THAT THIS WILL BE DELETED AND OVERWRITTEN!
target_dir = "/content/drive/MyDrive/practicals20-21/download"

def copy_and_overwrite(from_path, to_path, delete_target_dir=True):
  '''
  Modified from: https://stackoverflow.com/questions/12683834/how-to-copy-directory-recursively-in-python-and-overwrite-all
  '''
  if os.path.exists(to_path):
    if delete_target_dir:
      shutil.rmtree(to_path)
    else:
      print("Error: Target directory exists and delete_target_dir is set to False.")
      return
  # copy the directory tree; this also runs when the target did not exist before
  shutil.copytree(from_path, to_path)

copy_and_overwrite(downloaddir, target_dir, delete_target_dir=True)
