# Landsat Collection 2 downloader
abolfazl.irani@eawag.ch

The code was originally written by Michael Brechbühler (Eawag).

This notebook uses the [landsatxplore](https://github.com/yannforget/landsatxplore) Python package to download Landsat Collection 2 scenes directly from USGS.

### Setup environment
To set up a new environment with conda:
> conda create --name landsatxplore -c conda-forge python=3 notebook nb_conda_kernels geopandas hvplot cartopy geoviews

The command above does not install landsatxplore itself; add it afterwards inside the activated environment:
> pip install landsatxplore

To set up a new environment from `environment.yml` instead:
> conda env create -n landsatxplore --file environment.yml

### Imports

In [2]:
import geopandas as gpd
import pandas as pd
import os
import matplotlib.pyplot as plt
import shapely
import hvplot.pandas
from tqdm import tqdm
from pathlib import Path
import tarfile
from contextlib import closing

import landsatxplore
from landsatxplore.earthexplorer import EarthExplorer
from landsatxplore.api import API

## 1. Search query

### USGS EROS credentials
A free USGS EROS account is required to run queries and download files from the USGS EROS web server (https://ers.cr.usgs.gov/register).

In [3]:
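# Your USGS credentials, read from environment variables so they never appear in the notebook
# (variable names follow the landsatxplore CLI convention; set them before starting Jupyter)
username = os.environ["LANDSATXPLORE_USERNAME"]
password = os.environ["LANDSATXPLORE_PASSWORD"]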

Now we can set the search parameters for the Landsat query. The `dataset` parameter selects the Landsat collection and processing level through one of the following dataset IDs:

| Dataset Name | Dataset ID |
| :-- | -- |
| Landsat 5 TM Collection 2 Level 1 | `landsat_tm_c2_l1` |
| Landsat 5 TM Collection 2 Level 2 | `landsat_tm_c2_l2` |
| Landsat 7 ETM+ Collection 2 Level 1 | `landsat_etm_c2_l1` |
| Landsat 7 ETM+ Collection 2 Level 2 | `landsat_etm_c2_l2` |
| Landsat 8 Collection 2 Level 1 | `landsat_ot_c2_l1` |
| Landsat 8 Collection 2 Level 2 | `landsat_ot_c2_l2` |
| Landsat 9 Collection 2 Level 1 | `landsat_ot_c2_l1` |
| Landsat 9 Collection 2 Level 2 | `landsat_ot_c2_l2` |
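
The table can also be kept as a small lookup, so that the dataset ID is derived from the satellite number and processing level instead of being typed by hand. This is only a sketch: `DATASET_IDS` and `dataset_id` are hypothetical names that are not part of landsatxplore, and the mapping simply restates the table.

```python
# Dataset IDs copied from the table above, keyed by (satellite number, processing level).
# DATASET_IDS and dataset_id are illustrative helpers, not part of landsatxplore.
DATASET_IDS = {
    (5, 1): "landsat_tm_c2_l1",
    (5, 2): "landsat_tm_c2_l2",
    (7, 1): "landsat_etm_c2_l1",
    (7, 2): "landsat_etm_c2_l2",
    (8, 1): "landsat_ot_c2_l1",
    (8, 2): "landsat_ot_c2_l2",
    (9, 1): "landsat_ot_c2_l1",
    (9, 2): "landsat_ot_c2_l2",
}


def dataset_id(satellite: int, level: int) -> str:
    """Return the dataset ID for a Landsat satellite number and processing level."""
    try:
        return DATASET_IDS[(satellite, level)]
    except KeyError:
        raise ValueError(
            f"no Collection 2 dataset listed for Landsat {satellite} level {level}"
        ) from None


print(dataset_id(8, 2))
```

According to the table, Landsat 8 and Landsat 9 share the same dataset IDs, so a single query may return scenes from both satellites; the `satellite` column of the search results can then be used to tell them apart.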

### Search parameters

In [6]:
# Search parameters
# Load AOIs
aois = gpd.read_file("../../data/feature_layers/aois_test.geojson")

# Bounding box of all AOIs: (lonmin, latmin, lonmax, latmax)
bbox = tuple(aois.total_bounds)

# Calendar months to include (1 = January); here all twelve
months = list(range(1, 13))

search_params = {
    'dataset': 'landsat_ot_c2_l2',
    'bbox': bbox, 
    'start_date': '2020-05-01',
    'end_date': '2020-07-31',
    'months': months,
    'max_cloud_cover': 50,
    'max_results': 100, # Defaults to 100 if not specified
}
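
Because the bounding box is passed as a bare tuple, a swapped coordinate order goes unnoticed until the query returns unexpected scenes. The following sketch checks the `(lonmin, latmin, lonmax, latmax)` order before the search is sent. `validate_bbox` is a hypothetical helper, not part of landsatxplore or geopandas, and it assumes the AOI coordinates are in degrees of longitude and latitude.

```python
def validate_bbox(bbox):
    """Return bbox as a tuple of floats, or raise ValueError if it is malformed.

    Expects (lonmin, latmin, lonmax, latmax) in degrees (an assumption of this sketch).
    """
    if len(bbox) != 4:
        raise ValueError("bbox needs exactly four values")
    lonmin, latmin, lonmax, latmax = (float(v) for v in bbox)
    if not -180 <= lonmin <= lonmax <= 180:
        raise ValueError(f"longitudes out of order or out of range: {lonmin}, {lonmax}")
    if not -90 <= latmin <= latmax <= 90:
        raise ValueError(f"latitudes out of order or out of range: {latmin}, {latmax}")
    return (lonmin, latmin, lonmax, latmax)


# Illustrative values only, not the AOI used in this notebook
print(validate_bbox([8.0, 46.5, 9.5, 47.5]))
```

In the notebook this could be used as `bbox = validate_bbox(aois.total_bounds)`; if the AOI file uses a projected CRS, the check fails early instead of producing an empty search result.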

### Plot ROI

In [7]:
# Plot bounding box on map
bbox_poly = shapely.geometry.box(*bbox, ccw=True)
df_bbox = gpd.GeoDataFrame(geometry=[bbox_poly], crs='EPSG:4326')  # bbox is given in lon/lat

plot_bbox = df_bbox.hvplot(title='ROI bounding box', geo=True, 
                           fill_alpha=0.2, color='red', line_color='red', 
                           #tiles='CartoLight'
                           tiles='OSM', label='ROI'
                          )

plot_bbox

### Start search query

In [9]:
# initialize a new API instance
api = API(username, password)

# search for Landsat 8-9 Collection 2 Level 2 scenes
scenes = api.search(**search_params)

# logout
api.logout()

# create a GeoDataFrame from the returned metadata
columns = ['display_id', 'wrs_path', 'wrs_row', 'satellite', 'cloud_cover', 'acquisition_date', 'spatial_coverage']
df_scenes = pd.DataFrame(scenes)[columns].sort_values('acquisition_date', ascending=False)
df_scenes['satellite'] = df_scenes.satellite.apply(lambda x: f'Landsat-{x}')
# tile = WRS path/row, taken from the third field of the display ID (e.g. '115010' -> '115/010')
df_scenes['tile'] = df_scenes.display_id.str.split('_').str[2].apply(lambda pr: f'{pr[:3]}/{pr[3:]}')
gdf_scenes = gpd.GeoDataFrame(df_scenes.drop(columns='spatial_coverage'), geometry=df_scenes.spatial_coverage, crs='EPSG:4326')

gdf_scenes.head()

| | display_id | wrs_path | wrs_row | satellite | cloud_cover | acquisition_date | tile | geometry |
| -- | :-- | -- | -- | :-- | -- | :-- | :-- | :-- |
| 0 | LC08_L2SP_115010_20200726_20200908_02_T1 | 115 | 10 | Landsat-8 | 50 | 2020-07-26 | 115/010 | POLYGON ((145.32366 70.49442, 149.88550 69.736... |
| 2 | LC08_L2SP_117011_20200724_20200911_02_T1 | 117 | 11 | Landsat-8 | 50 | 2020-07-24 | 117/011 | POLYGON ((140.57727 69.15467, 144.91820 68.436... |
| 1 | LC08_L2SP_117010_20200724_20200910_02_T1 | 117 | 10 | Landsat-8 | 31 | 2020-07-24 | 117/010 | POLYGON ((142.23395 70.49485, 146.79596 69.736... |
| 3 | LC08_L2SP_119009_20200722_20200911_02_T1 | 119 | 9 | Landsat-8 | 26 | 2020-07-22 | 119/009 | POLYGON ((141.01866 71.81961, 145.81655 71.015... |
| 4 | LC08_L2SP_119010_20200722_20200911_02_T1 | 119 | 10 | Landsat-8 | 17 | 2020-07-22 | 119/010 | POLYGON ((139.14517 70.49492, 143.70700 69.736... |
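
The `tile` column is derived from the third field of `display_id`. As a rough sketch of what the other fields carry, the snippet below splits a display ID into named parts. The field layout (sensor and satellite, processing level, path and row, acquisition date, processing date, collection number, tier) is inferred from the IDs shown in this notebook, and `SceneId` and `parse_display_id` are hypothetical names, not part of landsatxplore.

```python
from datetime import date, datetime
from typing import NamedTuple


class SceneId(NamedTuple):
    """Fields of a Landsat display ID (layout assumed from the IDs in this notebook)."""
    sensor_satellite: str   # e.g. "LC08"
    processing_level: str   # e.g. "L2SP"
    wrs_path: int
    wrs_row: int
    acquisition_date: date
    processing_date: date
    collection: str         # e.g. "02"
    tier: str               # e.g. "T1"

    @property
    def tile(self) -> str:
        # Same "path/row" format as the notebook's `tile` column
        return f"{self.wrs_path:03d}/{self.wrs_row:03d}"


def parse_display_id(display_id: str) -> SceneId:
    parts = display_id.split("_")
    if len(parts) != 7 or len(parts[2]) != 6:
        raise ValueError(f"unexpected display ID layout: {display_id!r}")
    sensor, level, path_row, acquired, processed, collection, tier = parts
    return SceneId(
        sensor_satellite=sensor,
        processing_level=level,
        wrs_path=int(path_row[:3]),
        wrs_row=int(path_row[3:]),
        acquisition_date=datetime.strptime(acquired, "%Y%m%d").date(),
        processing_date=datetime.strptime(processed, "%Y%m%d").date(),
        collection=collection,
        tier=tier,
    )


scene = parse_display_id("LC08_L2SP_115010_20200726_20200908_02_T1")
print(scene.tile, scene.acquisition_date)
```

Parsing the ID once into named fields avoids repeating index arithmetic such as `x.split('_')[2][:3]` wherever the path, row, or dates are needed.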


### Plot available tiles over ROI

In [10]:
gdf_tiles = gdf_scenes.groupby('tile').first().reset_index()[['tile', 'wrs_path', 'wrs_row', 'geometry']].set_crs('EPSG:4326')
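# compute label positions (centroids) in a projected CRS; centroids of lon/lat geometries trigger a geopandas warning
gdf_labels = gdf_tiles.to_crs("EPSG:3857")
gdf_labels = gdf_labels.set_geometry(gdf_labels.centroid)

plot_labels = gdf_labels.assign(x=lambda df: df.geometry.x, y=lambda df: df.geometry.y).hvplot.labels(text="tile", x="x", y="y", text_color='white')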

plot_tiles = gdf_tiles.hvplot(title='Available tiles over ROI', geo=True,
                              fill_alpha=0.3, line_color='blue',
                              height=800, tiles='OSM', label='Landsat tile'
                             ) * plot_bbox * plot_labels

plot_tiles


## 2. Download scenes
### Filter and select scene IDs

In [12]:
# Select scene ids to download from dataframe
# ids = df_scenes.display_id.values  # all scenes
# df_scenes_filt = df_scenes.loc[(df_scenes.cloud_cover < 2) & (df_scenes.tile == '196/028')]  # a single tile, nearly cloud-free
# keep scenes with less than 10 % cloud cover, excluding tile 196/027
df_scenes_filt = df_scenes.loc[(df_scenes.cloud_cover < 10) & (df_scenes.tile != '196/027')]
ids = df_scenes_filt.display_id.values
df_scenes_filt.head()

| | display_id | wrs_path | wrs_row | satellite | cloud_cover | acquisition_date | spatial_coverage | tile |
| -- | :-- | -- | -- | :-- | -- | :-- | :-- | :-- |
| 5 | LC08_L2SP_120009_20200713_20200912_02_T1 | 120 | 9 | Landsat-8 | 9 | 2020-07-13 | POLYGON ((139.46994 71.81984, 144.26817 71.015... | 120/009 |
| 11 | LC08_L2SP_116010_20200701_20200913_02_T1 | 116 | 10 | Landsat-8 | 8 | 2020-07-01 | POLYGON ((143.77611 70.49487, 148.3388 69.7360... | 116/010 |
| 14 | LC08_L2SP_120010_20200627_20200823_02_T1 | 120 | 10 | Landsat-8 | 0 | 2020-06-27 | POLYGON ((137.59367 70.4948, 142.15663 69.7360... | 120/010 |
| 17 | LC08_L2SP_115009_20200624_20200823_02_T1 | 115 | 9 | Landsat-8 | 3 | 2020-06-24 | POLYGON ((147.19332 71.8201, 151.99276 71.0155... | 115/009 |
| 20 | LC08_L2SP_119009_20200620_20200823_02_T1 | 119 | 9 | Landsat-8 | 1 | 2020-06-20 | POLYGON ((141.0183 71.81998, 145.81764 71.0152... | 119/009 |


### Download .tar files

In [None]:
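# set output directory (created if it does not exist yet)
path_output_dir = Path('../../data/C2L2')
path_output_dir.mkdir(parents=True, exist_ok=True)

# initialize the EarthExplorer download client
ee = EarthExplorer(username, password)

# download the scenes; always log out, even if the loop is interrupted
try:
    for scene_id in tqdm(ids, desc="Total download progress", position=-1):
        path_output_file = path_output_dir.joinpath(scene_id + '.tar')
        try:
            ee.download(scene_id, output_dir=path_output_dir)
            print(f'{scene_id} successful')
        # additional error handling: the download can raise even when the file was written
        except Exception as err:
            if path_output_file.exists():
                print(f'{scene_id} error but file exists')
            else:
                print(f'{scene_id} error: {err}')
finally:
    ee.logout()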



Download failed with dataset id 1 of 3. Re-trying with the next one.


100%|██████████| 775M/775M [00:25<00:00, 32.4MB/s] 


None of the archived ids succeeded! Update necessary!
LC08_L2SP_120009_20200713_20200912_02_T1 error but file exists
Download failed with dataset id 1 of 3. Re-trying with the next one.


100%|██████████| 840M/840M [00:25<00:00, 34.7MB/s] 


None of the archived ids succeeded! Update necessary!
LC08_L2SP_116010_20200701_20200913_02_T1 error but file exists
Download failed with dataset id 1 of 3. Re-trying with the next one.


848MB [00:24, 35.9MB/s]                            


None of the archived ids succeeded! Update necessary!
LC08_L2SP_120010_20200627_20200823_02_T1 error but file exists
Download failed with dataset id 1 of 3. Re-trying with the next one.


100%|██████████| 710M/710M [00:34<00:00, 21.5MB/s] 


None of the archived ids succeeded! Update necessary!
LC08_L2SP_115009_20200624_20200823_02_T1 error but file exists
Download failed with dataset id 1 of 3. Re-trying with the next one.


 64%|██████▍   | 529M/826M [00:20<00:11, 26.8MB/s] 


LC08_L2SP_119009_20200620_20200823_02_T1 error but file exists
Download failed with dataset id 1 of 3. Re-trying with the next one.


 22%|██▏       | 180M/834M [00:06<00:18, 37.7MB/s] 
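
In the log above, some progress bars stop well below 100 % while the loop still reports `error but file exists`, so the existence of a `.tar` file alone does not guarantee a complete archive. One way to check, sketched here with the standard library only, is to walk through every member header of the archive. `is_readable_tar` is a hypothetical helper name; the check catches archives cut off in the middle of a file but does not verify the image data itself.

```python
import tarfile
from pathlib import Path


def is_readable_tar(path) -> bool:
    """Return True if the archive opens and all member headers can be walked."""
    try:
        with tarfile.open(path) as archive:
            # getmembers() steps through the whole archive and raises
            # tarfile.ReadError when a member's data ends prematurely
            archive.getmembers()
        return True
    except (tarfile.TarError, EOFError, OSError):
        # OSError also covers a missing file
        return False


print(is_readable_tar(Path("does_not_exist.tar")))
```

Applied to the download folder, `[i for i in ids if not is_readable_tar(path_output_dir / f"{i}.tar")]` would list the scenes worth downloading again before extraction.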

In [59]:
df_scenes_filt.to_csv(path_output_dir.joinpath(f"metadata_L8L9_{search_params['start_date']}to{search_params['end_date']}.csv"))

### Extract .tar files

In [61]:
path_output_dir

WindowsPath('data/C2L2')

In [62]:
# Extract each .tar file into a folder named after the scene, then delete the archive
paths_output = list(path_output_dir.glob('*.tar'))

for path in tqdm(paths_output, desc="Total extraction progress"):
    path_output_folder = path_output_dir.joinpath(path.stem)
    path_output_folder.mkdir(parents=True, exist_ok=True)
    # TarFile is a context manager itself, so contextlib.closing is not needed
    with tarfile.open(path) as fl:
        fl.extractall(path_output_folder)
    path.unlink()  # remove the archive once its contents are extracted

Total extraction progress: 100%|█████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.26s/it]
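
The loop above unpacks every file of every scene. When only a few bands are needed, the archive members can be filtered before extraction, which saves disk space. The sketch below is a variation, not the notebook's method: `extract_matching` is a hypothetical helper, and the suffixes in the usage note (`_SR_B4.TIF`, `_MTL.txt`) are assumptions about how the files inside a Collection 2 Level 2 archive are named.

```python
import tarfile
from pathlib import Path


def extract_matching(tar_path, out_dir, suffixes):
    """Extract only regular files whose names end with one of `suffixes`.

    Returns the names of the extracted members.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as archive:
        wanted = [
            member for member in archive.getmembers()
            if member.isfile() and member.name.endswith(tuple(suffixes))
        ]
        archive.extractall(out_dir, members=wanted)
    return [member.name for member in wanted]
```

For example, `extract_matching(path, path_output_dir / path.stem, ["_SR_B4.TIF", "_MTL.txt"])` would keep one surface reflectance band and the metadata file, assuming those suffixes match the archive contents.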
