![header](https://i.imgur.com/I4ake6d.jpg)

# IN SITU BLACK SEA TRAINING
<div style="text-align: right"><i> 13-02-Part-two-out-of-five </i></div>

# BS `NRT` Product/dataset Subsetting & Download

***

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc">
    <ul class="toc-item">
        <li><span><a href="#Introduction" data-toc-modified-id="Introduction">Introduction</a></span></li>
        <li>
            <span><a href="#Setup" data-toc-modified-id="Setup">Setup</a></span>
            <ul>
                <li><span><a href="#Python-packages" data-toc-modified-id="Python-packages">Python packages</a></span></li>
                <li><span><a href="#Copernicus-Database" data-toc-modified-id="Copernicus-Database">Copernicus database</a></span></li>
                <li><span><a href="#Auxiliary-functions" data-toc-modified-id="Auxiliary-functions">Auxiliary functions</a></span></li>
            </ul>
        </li>
        <li><span><a href="#Getting-started" data-toc-modified-id="Getting-started">Getting started</a></span></li>
        <li><span><a href="#Operations" data-toc-modified-id="Operations">Operations</a></span>
        <ul>
            <li>
            <span><a href="#Subsetting" data-toc-modified-id="Subsetting">Subsetting</a></span>
            <ul>
                <li><span><a href="#By-collection" data-toc-modified-id="By-collection">By collection</a></span></li>
                <li><span><a href="#By-time-range" data-toc-modified-id="By-time-range">By time range</a></span></li>
                <li><span><a href="#By-bounding-box" data-toc-modified-id="By-bounding-box">By bounding-box</a></span></li>
                <li><span><a href="#By-last-position" data-toc-modified-id="By-last-position">By last position</a></span></li>
                <li><span><a href="#By-data-type" data-toc-modified-id="By-data-type">By data type</a></span></li>
                <li><span><a href="#By-file-type" data-toc-modified-id="By-file-type">By file type</a></span></li>
                <li><span><a href="#By-parameter" data-toc-modified-id="By-parameter">By parameter</a></span></li>
                <li><span><a href="#By-platform-code" data-toc-modified-id="By-platform-code">By platform code</a></span></li>
                <li><span><a href="#By-provider" data-toc-modified-id="By-provider">By provider</a></span></li>
                <li><span><a href="#Subsetting-by-several-criterias-at-once" data-toc-modified-id="Subsetting-by-several-criterias-at-once">By several criterias at once</a></span></li>
            </ul>
            </li>
            <li><span><a href="#Exporting" data-toc-modified-id="Exporting">Exporting</a></span></li>
            <li><span><a href="#Downloading" data-toc-modified-id="Downloading">Downloading</a></span></li>
            </ul>
        </li>
        <li><span><a href="#Wrap-up" data-toc-modified-id="Wrap-up">Wrap-up</a></span></li>
        <li><span><a href="#Next-tutorial" data-toc-modified-id="Next-tutorial">Next tutorial</a></span></li>
    </ul>
</div>

***

## Introduction 

This notebook focuses on selecting (subsetting) and downloading netCDF files from the collections (`latest`, `monthly` and `history`) available within the In Situ Near Real Time product/dataset covering the Black Sea: `INSITU_BS_NRT_OBSERVATIONS_013_034`. Click [here](http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=INSITU_BS_NRT_OBSERVATIONS_013_034) to view the dedicated section of this product within the [CMEMS Catalog](http://marine.copernicus.eu/services-portfolio/access-to-products/).

Any In Situ NRT product/dataset is essentially a collection of netCDF files produced by the platforms (*drifters, profilers, gliders, moorings, HF-radars, vessels etc*) deployed in a certain area (in this case, the Black Sea); there are so many files that navigation/subsetting can be challenging. 

| ![BS.gif](img/BS.gif)| 
|:--:| 
| *GIF of the location (point or trajectory) of the platforms providing near real time data in the BS area since 1922* |

To ease data discovery for users, the In Situ TAC provides a set of `index files` that describe the aforementioned netCDF collections. In addition to these index files, further information about each platform contributing files can be found in `index_platform.txt`.

| Index |  Description |
| ------- | ----------- |
| `index_latest.txt`  |  List of available files within the latest collection + metadata | 
| `index_monthly.txt`   | List of available files within the monthly collection + metadata |
| `index_history.txt`   |  List of available files within the history collection + metadata |
| `index_platform.txt`   | Full list of platforms + metadata |

Read more about these files in the <a href="https://archimer.ifremer.fr/doc/00324/43494/" target="_blank">Product User Manual</a>.

<div class="alert alert-block alert-success">
<b>OBJECTIVE</b>
    
***  
To select and download only the netCDF files that match our needs from the full set of files composing the BS NRT product/dataset, using the aforementioned index files. 

## Setup

### Python packages

For the notebook to run properly we first need to load the following packages, available from the Jupyter Notebook ecosystem. Please run the `next cell`:

In [None]:
import warnings
warnings.filterwarnings("ignore")

import IPython
import pandas as pd
import datetime
import random
import os
from collections import namedtuple
import ftputil
from shapely.geometry import box, Point
from urllib.parse import urlparse
import folium
from folium import plugins

<div class="alert alert-block alert-warning">
<b>WARNING</b>
    
***  
If any of them raises an error, it means you need to install the module first. To do so, please:
1. Open a new cell in the notebook
2. Run <i>`!conda install packageName --yes`</i> or <i>`!conda install -c conda-forge packageName --yes`</i> or <i>`!pip install packageName`</i>
3. Import again!
<br><br>
Example: <i>how-to-solve import error for json2html module </i>

![region.png](img/errorImporting.gif)

### Copernicus Database

Please `set your CMEMS user credentials` in the next cell and `run the cell` afterwards:

In [None]:
usr = 'inputHereYourCMEMSUser'
pas = 'inputHereYourCMEMSPassword'

<div class="alert alert-block alert-warning">
<b>WARNING</b>
    
***  
**Don't have credentials yet?** <br>Please go [here](https://resources.marine.copernicus.eu/?option=com_sla) to get the credentials needed to access the CMEMS secured FTP server.

As stated before, we will focus on the Near Real Time product/dataset covering the Black Sea so, please `run the next cell` to load the info defining such product/dataset:

In [None]:
dataset = {
    'host': 'nrt.cmems-du.eu',#ftp host => nrt.cmems-du.eu for Near Real Time products
    'product': 'INSITU_BS_NRT_OBSERVATIONS_013_034',#name of the In Situ Near Real Time product in the BS area
    'name': 'bs_multiparameter_nrt',#name of the dataset available in the above In Situ Near Real Time product
    'index_files': ['index_latest.txt', 'index_monthly.txt', 'index_history.txt'],#files describing the content of the latest, monthly and history netCDF file collections available within the above dataset
    'index_platform': 'index_platform.txt',#file describing the network of platforms contributing files to the above collections
}

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
In case you want to explore any other In Situ NRT product/dataset just set the above definitions accordingly and you will be able to reproduce the subsetting and downloading we will perform next.

### Auxiliary functions

For exploring the product/dataset we will use a set of files called `index files` located in the CMEMS FTP server. 

1. **Getting the links to the index files**.
To get the links for manually downloading the most up-to-date version of these files we will use the next function. <br>Please `run the next cell` to load it in memory for later use: 

In [None]:
def getIndexFileLinks(usr,pas,dataset):
    #Provides the link to download each index file from the ftp server
    url = 'ftp://' + '/'.join([dataset['host'], 'Core', dataset['product'], dataset['name']])  # URLs always use '/', whatever the OS
    indexes = dataset['index_files'] + [dataset['index_platform']]
    for index in indexes:
        index_url = url + '/' + index
        urlParsed = urlparse(index_url)
        index_url = index_url.replace(urlParsed.netloc, usr + ':' + pas + '@' + urlParsed.netloc)
        print('...Click and download '+index+' from:')
        print(index_url)
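
The function above splices the credentials into the URL with a plain string replacement. If a user name or password contains characters such as `@` or `/`, the resulting link may not parse correctly. A minimal sketch of a safer variant, assuming a hypothetical helper name `addCredentials` and made-up credentials, percent-encodes both values with the standard library:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def addCredentials(url, user, password):
    # Hypothetical helper (not part of the original notebook):
    # percent-encode user and password, then rebuild the URL around them
    parts = urlsplit(url)
    netloc = quote(user, safe='') + ':' + quote(password, safe='') + '@' + parts.netloc
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


index_url = 'ftp://nrt.cmems-du.eu/Core/INSITU_BS_NRT_OBSERVATIONS_013_034/bs_multiparameter_nrt/index_latest.txt'
print(addCredentials(index_url, 'demoUser', 'p@ss/word'))  # made-up credentials
```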

2. **Index files reader**.
In order to load the information contained in *each* of the files we will use the next function. <br>Please `run the next cell` to load it in memory for later use: 

In [None]:
def readIndexFileFromCWD(path2file):
    #Load as pandas dataframe the file in the provided path
    filename = os.path.basename(path2file)
    print('...Loading info from: '+filename)
    # The first 5 lines of each index file are descriptive headers, so we skip them
    raw_index_info = pd.read_csv(path2file, skiprows=5)
    return raw_index_info
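
To see what `skiprows=5` does, here is a minimal sketch using only the standard library instead of pandas. The sample text is made up: it assumes five descriptive header lines followed by a column-header line that starts with `# `, which is the layout the functions in this notebook appear to rely on. Check a real index file before reusing it.

```python
import csv
import io

# Made-up index content: 5 descriptive lines, 1 column-header line, then data rows
sample = """# Title : hypothetical catalog
# Description : hypothetical description
# Project : hypothetical
# Format version : 0.0
# Date of update : 20200401
# catalog_id,file_name,time_coverage_start,time_coverage_end
X,ftp://host/Core/product/dataset/history/MO/BS_TS_MO_DemoBuoy.nc,2019-01-01T00:00:00Z,2019-12-31T23:00:00Z
"""

lines = sample.splitlines()[5:]      # skip the 5 descriptive lines
lines[0] = lines[0].lstrip('# ')     # drop the leading "# " of the column-header line
rows = list(csv.DictReader(io.StringIO('\n'.join(lines))))
print(rows[0]['file_name'])
```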

3. **Index files merger**.
In order to load the information contained in *all* the indexes in one single entity we will use the next function. <br>Please `run the next cell` to load it in memory for later use: 

In [None]:
def getIndexFilesInfo(usr, pas, dataset):
    # Load and merge in a single entity all the information contained on each file descriptor of a given dataset
    # 1) Loading the index platform info as dataframe
    path2file = os.path.join(os.getcwd(),'data', 'index_files', dataset['index_platform'])
    indexPlatform = readIndexFileFromCWD(path2file)
    indexPlatform.rename(columns={indexPlatform.columns[0]: "platform_code" }, inplace = True)
    indexPlatform = indexPlatform.drop_duplicates(subset='platform_code', keep="first")
    # 2) Loading the index files info as dataframes
    netcdf_collections = []
    for filename in dataset['index_files']:
        path2file = os.path.join(os.getcwd(),'data', 'index_files',filename)
        indexFile = readIndexFileFromCWD(path2file)
        netcdf_collections.append(indexFile)
    netcdf_collections = pd.concat(netcdf_collections)
    # 3) creating new columns: derived info
    netcdf_collections['netcdf'] = netcdf_collections['file_name'].str.split('/').str[-1]
    netcdf_collections['file_type'] = netcdf_collections['netcdf'].str.split('.').str[0].str.split('_').str[1]
    netcdf_collections['data_type'] = netcdf_collections['netcdf'].str.split('.').str[0].str.split('_').str[2]
    netcdf_collections['platform_code'] = netcdf_collections['netcdf'].str.split('.').str[0].str.split('_').str[3]
    #4) Merging the information of all files
    headers = ['platform_code','wmo_platform_code', 'institution_edmo_code', 'last_latitude_observation', 'last_longitude_observation','last_date_observation']
    result = pd.merge(netcdf_collections,indexPlatform[headers],on='platform_code')
    print('Ready!')
    return result
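
Step 3 of the function derives three fields from the netCDF file name by splitting it on `.` and `_`. A minimal sketch with a made-up file name, assuming the pattern `region_fileType_dataType_platformCode[_timestamp].nc` (the helper name `parseNetcdfName` is illustrative only):

```python
def parseNetcdfName(file_url):
    # Illustrative helper (not in the original notebook): same splits as step 3 above
    netcdf = file_url.split('/')[-1]
    parts = netcdf.split('.')[0].split('_')
    return {
        'netcdf': netcdf,
        'file_type': parts[1],
        'data_type': parts[2],
        'platform_code': parts[3],
    }


# Made-up path, for illustration only
print(parseNetcdfName('ftp://host/Core/product/dataset/monthly/MO/201901/BS_TS_MO_DemoBuoy_201901.nc'))
```

Note that, with this approach, a platform code containing an underscore would be cut at its first underscore.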

Let's also load the next functions, as they will be used later in the subsetting process: `run the next cells`

<ul><li id="TemporalOverlap"><i>Time-Overlap</i> function definition</li></ul>

In [None]:
def timeOverlap(row, targeted_range):
    # Checks if a file contains data in the specified time range (targeted_range)
    date_format = "%Y-%m-%dT%H:%M:%SZ"
    targeted_ini = datetime.datetime.strptime(targeted_range.split('/')[0], date_format)
    targeted_end = datetime.datetime.strptime(targeted_range.split('/')[1], date_format)
    time_start = datetime.datetime.strptime(row['time_coverage_start'],date_format)
    time_end = datetime.datetime.strptime(row['time_coverage_end'],date_format)
    Range = namedtuple('Range', ['start', 'end'])
    r1 = Range(start=targeted_ini, end=targeted_end)
    r2 = Range(start=time_start, end=time_end)
    latest_start = max(r1.start, r2.start)
    earliest_end = min(r1.end, r2.end)
    delta = (earliest_end - latest_start).days + 1
    overlap = max(0, delta)
    if overlap != 0:
        return True
    else:
        return False
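
The overlap test boils down to comparing the later of the two start dates with the earlier of the two end dates. A minimal standalone sketch of that idea (the helper name `rangesOverlap` and the dates are illustrative, not part of the notebook):

```python
import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def rangesOverlap(range_a, range_b):
    # Each range is a 'start/end' string; two ranges overlap when the
    # latest start is not after the earliest end
    starts, ends = [], []
    for text in (range_a, range_b):
        start, end = text.split('/')
        starts.append(datetime.datetime.strptime(start, DATE_FORMAT))
        ends.append(datetime.datetime.strptime(end, DATE_FORMAT))
    return max(starts) <= min(ends)


targeted = '2018-01-01T00:00:00Z/2019-01-01T23:59:59Z'
print(rangesOverlap(targeted, '2018-12-15T00:00:00Z/2019-02-01T00:00:00Z'))  # partial overlap
print(rangesOverlap(targeted, '2019-03-01T00:00:00Z/2019-04-01T00:00:00Z'))  # no overlap
```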

<ul><li id="SpatialOverlap"><i>Spatial-Overlap</i> function definition</li></ul>

In [None]:
def spatialOverlap(row, targeted_bbox):
    # Checks if a file contains data in the specified area (targeted_bbox)
    geospatial_lat_min = float(row['geospatial_lat_min'])
    geospatial_lat_max = float(row['geospatial_lat_max'])
    geospatial_lon_min = float(row['geospatial_lon_min'])
    geospatial_lon_max = float(row['geospatial_lon_max'])
    targeted_bounding_box = box(targeted_bbox[0], targeted_bbox[1],targeted_bbox[2], targeted_bbox[3])
    bounding_box = box(geospatial_lon_min, geospatial_lat_min,geospatial_lon_max, geospatial_lat_max)
    if targeted_bounding_box.intersects(bounding_box):  # check other rules on https://shapely.readthedocs.io/en/stable/manual.html
        return True
    else:
        return False
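
For axis-aligned boxes, the difference between the `intersects` rule used above and a stricter rule such as `within` can be illustrated without any library. The sketch below is plain Python, not shapely; boxes are `(lon_min, lat_min, lon_max, lat_max)` tuples and the coordinates are made up:

```python
def bboxIntersects(a, b):
    # True when the two boxes share at least one point (touching edges count)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bboxWithin(a, b):
    # True when box a lies completely inside box b
    return b[0] <= a[0] and a[2] <= b[2] and b[1] <= a[1] and a[3] <= b[3]


targeted = (28.0, 43.0, 30.0, 45.0)
file_bbox = (29.5, 44.0, 31.0, 46.0)   # partly outside the targeted box
print(bboxIntersects(file_bbox, targeted))
print(bboxWithin(file_bbox, targeted))
```

A file whose bbox only partly covers the targeted area passes the first test but not the second, which is the trade-off to keep in mind when choosing a rule.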

<ul><li id="LocationInRange"><i>LocationInRange</i> function definition</li></ul>

In [None]:
def lastLocationInRange(row, targeted_bbox):
    # Checks if a file has been produced by a platform whose last position is within the specified area (targeted_bbox)
    geospatial_lat = float(row['last_latitude_observation'])
    geospatial_lon = float(row['last_longitude_observation'])
    targeted_bounding_box = box(targeted_bbox[0], targeted_bbox[1],targeted_bbox[2], targeted_bbox[3])
    location = Point(geospatial_lon, geospatial_lat)
    if targeted_bounding_box.contains(location):#check other rules on https://shapely.readthedocs.io/en/stable/manual.html
        return True
    else:
        return False
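
The same point-in-box check can be written in plain Python, as a rough sketch. The strict inequalities mimic the fact that shapely's `contains` does not count points lying exactly on the boundary of the box. The helper name and coordinates are made up:

```python
def pointInBbox(lon, lat, bbox):
    # bbox is (lon_min, lat_min, lon_max, lat_max); boundary points are excluded
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min < lon < lon_max and lat_min < lat < lat_max


targeted = (34.5, 45.0, 40.0, 48.0)
print(pointInBbox(36.0, 46.2, targeted))  # inside
print(pointInBbox(34.5, 46.2, targeted))  # on the western edge
```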

## Getting started

For the subsetting to be carried out we need the most recent version of the index files. In `/data/index_files` we have provided a copy but, **if you are running this notebook later than April 2020** please:<br>

1. Download the index files again. `Run the next cell` to discover the links to download the index files:

In [None]:
getIndexFileLinks(usr,pas,dataset)

2. Upload the downloaded files to the `/data/index_files` folder. <br> `Run the next cell` to do it from this very same notebook:

In [None]:
IPython.display.IFrame('data/index_files', width='100%', height=350)

Let's load now the info contained in such files by `running the next cell!`:

In [None]:
info = getIndexFilesInfo(usr, pas, dataset)

`Run now the next cell` to see the information just loaded above:

In [None]:
info

One of the most important fields in the above info is `file_name`. This field contains the full path to the file in the FTP server, saving users from having to know the actual FTP structure. As the above preview does not render this field completely, let's see the full path of the first file: `Run the next cell`

In [None]:
info['file_name'].tolist()[0]

Copy the path into the browser... the file will download straight away!<br>
Next we will use:
<ul><li>the `file_name` field (file path) for downloading operations</li>
    <li>the other fields (file metadata) for the subsetting operations</li>
</ul>

## Operations

### Subsetting

Prior to <i>downloading</i> we need to select only those netCDF files that are of interest to us.<br>There are many possible criteria (whether they contain a certain parameter, whether they cover a certain area, whether they have data in a specific time range...).<br>Next we will see some examples...

#### By collection

This restricts all available files to just one specific collection.

<div class="alert alert-block alert-success">
<b>IMPORTANT</b>

***
Please remember available options:
<ul>
    <li><i>latest</i>: daily files from platforms (last 30 days of data)</li>
    <li><i>monthly</i>: monthly files from platforms (last 5 years of data)</li>
    <li><i>history</i>: one file per platform (all platform data)</li>
</ul>

`Set one collection` next and `run the cells`:

In [None]:
targeted_collection = 'history' #try 'latest', 'monthly' or 'history'

In [None]:
condition = info['file_name'].str.contains(targeted_collection)
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that the above output only lists the files coming from a certain collection (the one set as `targeted_collection`), just check the `netcdf` column:
<ul><li>'latest': netCDF file name ends with a YYYYMMDD timestamp</li></ul>
<ul><li>'monthly': netCDF file name ends with a YYYYMM timestamp</li></ul>
<ul><li>'history': netCDF file name ends with a YYYY timestamp or has no timestamp at all</li></ul>
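
This rule of thumb can be turned into a quick check. A minimal sketch with a regular expression, assuming the timestamp (when present) is the last underscore-separated token of the file name; the helper name and the file names are made up:

```python
import re


def guessCollection(netcdf):
    # Look at the last '_'-separated token before the .nc extension
    last_token = netcdf.rsplit('.', 1)[0].split('_')[-1]
    if re.fullmatch(r'\d{8}', last_token):
        return 'latest'    # YYYYMMDD
    if re.fullmatch(r'\d{6}', last_token):
        return 'monthly'   # YYYYMM
    return 'history'       # YYYY or no timestamp


for name in ['BS_TS_MO_DemoBuoy_20200401.nc', 'BS_TS_MO_DemoBuoy_201901.nc', 'BS_TS_MO_DemoBuoy.nc']:
    print(name, '->', guessCollection(name))
```

A purely numeric platform code of 6 or 8 digits with no timestamp after it would fool this check, so treat it as a sanity test rather than a replacement for filtering on `file_name`.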

#### By time range

Let's select only the files containing data within the next range of dates. <br>Please `set the start/end dates` you are interested in `and run the cells below`:

In [None]:
targeted_range = '2018-01-01T00:00:00Z/2019-01-01T23:59:59Z' #set your own!

In [None]:
info['timeOverlap'] = info.apply(timeOverlap,targeted_range=targeted_range,axis=1)
condition = info['timeOverlap'] == True
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that the above output only lists the files containing data within a certain time range (the one set as `targeted_range`), just check the `time_coverage_start` and `time_coverage_end` columns.

####  By bounding-box

Let's look for files containing data from a specific area.<br>`Please set your own area limits` next `and run the cell`:

In [None]:
targeted_geospatial_lat_min = 43.0  # enter min latitude of your bounding box
targeted_geospatial_lat_max = 45.0  # enter max latitude of your bounding box
targeted_geospatial_lon_min = 28.0  # enter min longitude of your bounding box
targeted_geospatial_lon_max = 30.0  # enter max longitude of your bounding box
targeted_bbox = [targeted_geospatial_lon_min, targeted_geospatial_lat_min, targeted_geospatial_lon_max, targeted_geospatial_lat_max]  # (minx, miny, maxx, maxy)

Let's see the area you have set before: `run the next cell`

In [None]:
m = folium.Map(location=[39.0, 0], zoom_start=4)
upper_left = [targeted_geospatial_lat_max, targeted_geospatial_lon_min]
upper_right = [targeted_geospatial_lat_max, targeted_geospatial_lon_max]
lower_right = [targeted_geospatial_lat_min, targeted_geospatial_lon_max]
lower_left = [targeted_geospatial_lat_min, targeted_geospatial_lon_min]
edges_ = [upper_left, upper_right, lower_right, lower_left]
m.add_child(folium.vector_layers.Polygon(locations=edges_))
#Zooming closer
m.fit_bounds(edges_, max_zoom=5)
m

<div class="alert alert-block alert-warning">
<b>WARNING</b>
    
***  
If you do not see any map after running the cell above, please try a different browser (e.g. Chrome).

`Run the next cell` to obtain the subset of files with data in such area:

In [None]:
info['spatialOverlap'] = info.apply(spatialOverlap,targeted_bbox=targeted_bbox,axis=1)
condition = info['spatialOverlap'] == True
subset = info[condition]
subset

Let's now check whether the bbox of the above files truly overlaps with the targeted area!: `run the next cells`

In [None]:
numberOfFiles = 800 #we will check just a sample of files not all

In [None]:
m = folium.Map(location=[39.0, 0], zoom_start=6)
m.add_child(folium.vector_layers.Polygon(locations=edges_))
for platform, files in subset[:numberOfFiles].groupby(['platform_code', 'data_type']):
    color = "%06x" % random.randint(0, 0xFFFFFF)
    for i in range(0, len(files)):
        netcdf = files.iloc[i]['file_name'].split('/')[-1]
        upper_left = [
            files.iloc[i]['geospatial_lat_max'],
            files.iloc[i]['geospatial_lon_min']
        ]
        upper_right = [
            files.iloc[i]['geospatial_lat_max'],
            files.iloc[i]['geospatial_lon_max']
        ]
        lower_right = [
            files.iloc[i]['geospatial_lat_min'],
            files.iloc[i]['geospatial_lon_max']
        ]
        lower_left = [
            files.iloc[i]['geospatial_lat_min'],
            files.iloc[i]['geospatial_lon_min']
        ]
        edges = [upper_left, upper_right, lower_right, lower_left]
        popup_info = '<b>netcdf</b>: ' + files.iloc[i]['netcdf']
        m.add_child(folium.vector_layers.Polygon(locations=edges,color='#' + color,popup=(folium.Popup(popup_info))))
        m.fit_bounds(edges, max_zoom=6)
m

<div class="alert alert-block alert-warning">
<b>WARNING</b>
    
***  
If you do not see any map after running the cell above, please try a different browser (e.g. Chrome).

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check if everything went well, just check that each file's bbox does indeed overlap at some point with the one you were interested in (targeted bbox)

<div class="alert alert-block alert-success">
<b>IMPORTANT</b>
    
***  
If you are not satisfied with the resulting output (e.g. you want only the files whose bbox lies completely within the targeted area), please revisit the [definition of the SpatialOverlap](#SpatialOverlap) function and replace the rule applied (`intersects`) with any of the available ones [here](https://shapely.readthedocs.io/en/stable/manual.html)

#### By last position

Let's look for files produced by a platform whose last position is within a specific area.<br>`Please set your own area limits` next `and run the cell`:

In [None]:
targeted_geospatial_lat_min = 45.0  # enter min latitude of your bounding box
targeted_geospatial_lat_max = 48.0  # enter max latitude of your bounding box
targeted_geospatial_lon_min = 34.5  # enter min longitude of your bounding box
targeted_geospatial_lon_max = 40.0  # enter max longitude of your bounding box
targeted_bbox = [targeted_geospatial_lon_min, targeted_geospatial_lat_min, targeted_geospatial_lon_max, targeted_geospatial_lat_max]  # (minx, miny, maxx, maxy)

Let's see the area you have set before: `run the next cell`

In [None]:
m = folium.Map(location=[39.0, 0], zoom_start=4)
upper_left = [targeted_geospatial_lat_max, targeted_geospatial_lon_min]
upper_right = [targeted_geospatial_lat_max, targeted_geospatial_lon_max]
lower_right = [targeted_geospatial_lat_min, targeted_geospatial_lon_max]
lower_left = [targeted_geospatial_lat_min, targeted_geospatial_lon_min]
edges_ = [upper_left, upper_right, lower_right, lower_left]
m.add_child(folium.vector_layers.Polygon(locations=edges_))
#Zooming closer
m.fit_bounds(edges_, max_zoom=5)
m

<div class="alert alert-block alert-warning">
<b>WARNING</b>
    
***  
If you do not see any map after running the cell above, please try a different browser (e.g. Chrome).

`Run the next cell` to obtain the subset of files produced by platforms whose last position lies within that area:

In [None]:
info['lastLocationInRange'] = info.apply(lastLocationInRange,targeted_bbox=targeted_bbox,axis=1)
condition = info['lastLocationInRange'] == True
subset = info[condition]
subset

Let's now check whether the last position of the platforms producing the above files truly lies within the targeted area!: `run the next cells`

In [None]:
numberOfFiles = 200 #we will check just a sample of files not all

In [None]:
m = folium.Map(location=[39.3, 0], zoom_start=5)
m.add_child(folium.vector_layers.Polygon(locations=edges_))
for platform, files in subset[:numberOfFiles].groupby(['platform_code', 'data_type']):
    color = "%06x" % random.randint(0, 0xFFFFFF)
    for i in range(0, len(files)):
        #Last reported position to map as marker
        m.add_child(folium.Marker([files.iloc[i]['last_latitude_observation'], files.iloc[i]['last_longitude_observation']], popup=files.iloc[i]['platform_code']+' last position' ))
#Zooming closer
m.fit_bounds(edges_, max_zoom=5)
m

<div class="alert alert-block alert-warning">
<b>WARNING</b>
    
***  
If you do not see any map after running the cell above, please try a different browser (e.g. Chrome).

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check if everything went well, just check that the markers (last platform positions) are indeed within the area you were interested in (targeted bbox)

<div class="alert alert-block alert-success">
<b>IMPORTANT</b>
    
***  
If you are not satisfied with the resulting output, please revisit the [definition of the lastLocationInRange](#LocationInRange) function and replace the rule applied (`contains`) with any of the available ones [here](https://shapely.readthedocs.io/en/stable/manual.html)

#### By data type

Let's look for files of a certain data type.<br>`Please set a data type` next `and run the cell`:

In [None]:
targeted_data_type = 'PF' # try others: TG for Tide Gauges, PF for profilers etc =>Product User Manual: https://archimer.ifremer.fr/doc/00324/43494/

`Run the next cell` to obtain just the files of that data type:

In [None]:
condition = info['data_type'] == targeted_data_type
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that the subset only contains the targeted data type (`targeted_data_type`), just check the `data_type` column of the above output.

#### By file type

Let's look for certain types of files.<br>`Please set a file type` next `and run the cell`:

In [None]:
targeted_file_type = 'PR' # try others: TS for Time Series...=>Product User Manual: https://archimer.ifremer.fr/doc/00324/43494/

`Run the next cell` to obtain just the above type of files:

In [None]:
condition = info['file_type'] == targeted_file_type
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that the subset matches the targeted file type (`targeted_file_type`), just check the `file_type` column of the above output.

#### By parameter

Let's look for files containing a certain parameter.<br>`Please set a parameter code` next `and run the cell`:

In [None]:
targeted_parameter = 'PSAL' #try others: TEMP, SLEV etc => In Situ parameter list: https://archimer.ifremer.fr/doc/00422/53381/

Run the next cell to obtain just the files reporting such parameter:

In [None]:
condition = info['parameters'].str.contains(targeted_parameter)
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that every file in the subset reports the targeted parameter (`targeted_parameter`), just check the `parameters` column of the above output.

#### By platform code

Let's look for files produced by a certain platform.<br>`Please set a platform code` next `and run the cell`:

In [None]:
targeted_platform_code = 'Constanta'

Run the next cell to obtain just the files reported by such platform:

In [None]:
condition = info['platform_code']==targeted_platform_code
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that the subset matches the targeted platform (`targeted_platform_code`), just check the `platform_code` column of the above output.

#### By provider

Let's look for files produced by a certain provider.<br>`Please set a provider code` next `and run the cell`:

In [None]:
targeted_provider_code = '850'

Run the next cells to obtain just the files reported by such provider:

In [None]:
info['institution_edmo_code'] = ' '+info['institution_edmo_code']+' '

In [None]:
condition = info['institution_edmo_code'].str.contains(targeted_provider_code, na=False)
subset = info[condition]
subset

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check that the subset matches the targeted provider (`targeted_provider_code`), just check the `institution_edmo_code` column of the above output.
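
Padding the column with spaces makes it possible to match whole codes only. A minimal sketch of the idea in plain Python, assuming a platform may list several codes separated by spaces (the helper name and the values are made up): padding the search term as well avoids matching `850` inside a longer code such as `1850`.

```python
def hasCode(codes, wanted):
    # codes: space-separated codes for one platform; wanted: the code to look for
    padded = ' ' + codes + ' '
    return (' ' + wanted + ' ') in padded


print('850' in '1850 730')          # plain substring search: matches by accident
print(hasCode('1850 730', '850'))   # whole-code search: no match
print(hasCode('850 730', '850'))    # whole-code search: match
```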

#### Subsetting by several criteria at once <a id="Subsetting-by-several-criterias-at-once"></a>

Set the collection of interest, time range, data type and bbox to find only the files that match all of them:

In [None]:
targeted_collection = 'history'
targeted_range = '2019-01-01T00:00:00Z/2019-12-01T23:59:59Z'
targeted_data_type = 'TS'
targeted_geospatial_lat_min = 40.0  # enter min latitude of your bounding box
targeted_geospatial_lat_max = 48.0  # enter max latitude of your bounding box
targeted_geospatial_lon_min = 28.0  # enter min longitude of your bounding box
targeted_geospatial_lon_max = 43.0  # enter max longitude of your bounding box
targeted_bbox = [targeted_geospatial_lon_min, targeted_geospatial_lat_min, targeted_geospatial_lon_max, targeted_geospatial_lat_max]  # (minx, miny, maxx, maxy)

Run the next cell to apply the above filters:

In [None]:
info['timeOverlap'] = info.apply(timeOverlap,targeted_range=targeted_range,axis=1)
condition1 = info['timeOverlap'] == True
info['spatialOverlap'] = info.apply(spatialOverlap,targeted_bbox=targeted_bbox,axis=1)
condition2 = info['spatialOverlap'] == True

condition3 = info['data_type'] == targeted_data_type
condition4 = info['file_name'].str.contains(targeted_collection)
subset = info[condition1 & condition2 & condition3 & condition4]
subset
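
Combining criteria amounts to keeping only the rows for which every condition holds. The same logic in plain Python, as a sketch over made-up records (in the notebook the conditions are pandas boolean Series combined with `&`):

```python
# Made-up records mimicking a few columns of the index table
records = [
    {'file_name': 'ftp://host/dataset/history/MO/BS_TS_MO_DemoBuoy.nc', 'data_type': 'MO'},
    {'file_name': 'ftp://host/dataset/latest/20200401/BS_TS_MO_DemoBuoy_20200401.nc', 'data_type': 'MO'},
    {'file_name': 'ftp://host/dataset/history/PF/BS_PR_PF_DemoFloat.nc', 'data_type': 'PF'},
]

criteria = [
    lambda row: 'history' in row['file_name'],   # by collection
    lambda row: row['data_type'] == 'MO',        # by data type
]

# Keep a record only if it satisfies every criterion
matching = [row for row in records if all(check(row) for check in criteria)]
print([row['file_name'].split('/')[-1] for row in matching])
```

Adding a further criterion is then just a matter of appending one more check to the list.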

### Exporting

If you just want to export the above table (the FTP link to each file in the subset plus some metadata) as an Excel file for sharing, which is far more compact than sharing the actual files, just run the next cell:

In [None]:
subset.to_excel('subsetOffiles.xlsx')

### Downloading

Once you have created your own subset (see the examples above), we will loop over its files and download each of them from the FTP server using the `file_name` column, which contains the FTP link to the file.<br> Run the next cells:

In [None]:
output_directory = os.getcwd()# Defaults to the current working directory; change it as you please

In [None]:
with ftputil.FTPHost(dataset['host'], usr, pas) as ftp_host:  # connect to CMEMS FTP
    for i in range(0, len(subset)):
        filepath = subset.iloc[i]['file_name'].split(dataset['host'])[1]
        ncdf_file_name = filepath.split('/')[-1]
        if ftp_host.path.isfile(filepath):
            print('.....Downloading ' + ncdf_file_name)
            cwd = os.getcwd()
            os.chdir(output_directory)
            try:
                ftp_host.download(filepath, ncdf_file_name)  # remote, local
                print('Done!')
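            except Exception as e:
                # Report the actual error; on the training server FTP download may simply be forbidden
                print('error: download failed (FTP download may be forbidden in the remote server): ' + str(e))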
            os.chdir(cwd)
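
The loop above recovers the remote path by splitting the link on the host name and changes the working directory to control where files land. An alternative sketch using the standard library's `urlparse`, with a made-up link and a hypothetical output folder, builds both the remote path and the local destination path explicitly:

```python
import os
import posixpath
from urllib.parse import urlparse

file_url = 'ftp://nrt.cmems-du.eu/Core/product/dataset/history/MO/BS_TS_MO_DemoBuoy.nc'  # made-up link
target_directory = 'downloads'  # hypothetical folder

remote_path = urlparse(file_url).path              # path on the FTP server
ncdf_file_name = posixpath.basename(remote_path)   # file name only
local_path = os.path.join(target_directory, ncdf_file_name)
print(remote_path)
print(local_path)
```

The two paths could then be passed to `ftp_host.download(remote_path, local_path)`, provided the target folder already exists.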

<div class="alert alert-block alert-info">
<b>TIP</b>
    
***  
To check if the files have been downloaded, just look for them in the directory specified as <i>output directory</i>.

<div class="alert alert-block alert-warning">
<b>WARNING</b>
 
***  
FTP download is forbidden in the remote server for security reasons. <br>Please run this Jupyter notebook locally to be able to actually download the files. Steps:
1. [Install anaconda](https://www.anaconda.com/distribution/): according to your OS (Windows, Linux, Mac...)
2. Run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows): `jupyter notebook`

---

## Wrap-up

By now you should know how to subset the product/dataset by several criteria, as well as how to export and download the resulting subset of files.<br> `If you don't, please ask us! Now is the moment!`
<br>In the next tutorial we will see how to open and visualize some of the downloaded files. Ready? Let's go!

---

## Next tutorial

_Click on one of the hyperlinks below to continue the training_
<br>
[**13-03-NearRealTime-product-managing-files-moorings.ipynb**](13-03-NearRealTime-product-managing-files-moorings.ipynb)<br>
[**13-04-NearRealTime-product-managing-files-profilers.ipynb**](13-04-NearRealTime-product-managing-files-profilers.ipynb)<br>
[**13-05-NearRealTime-product-managing-files-thermosal.ipynb**](13-05-NearRealTime-product-managing-files-thermosal.ipynb)<br>