(accessibility:page)=
# Find and Access Data

<br />

```{contents}
```

<br />

There are several ways to find and access the SAR datasets. These datasets contain the calibrated geophysical range Doppler frequency shift retrievals from the ENVISAT ASAR wide-swath acquisitions obtained between 2002 and 2012. Selected methods for finding and extracting these datasets are described below.

<br />
<br />

The code snippet below is included only to set up and activate a virtual environment and to download the necessary packages. These are listed in the "requirements.txt" file.

The purpose of this snippet is to make the manual more user friendly, so that each user can run these notebooks without further ado.

In [1]:
%%capture
!python3 -m venv .venv      # Sets up a virtual environment
!source .venv/bin/activate  # Activates it; note that each "!" line runs in its
                            # own subshell, so the activation does not persist

# Installing the required packages to current environment
!pip install -r ../requirements.txt

'''
The "%%capture" command ensures that the outputs 
(in this case directories) are not displayed.
'''

<br />
<br />

(data:met:no)=
## Find Data Through data.met.no

By using data.met.no it is possible to both find and visualise datasets. The web search interface can be accessed from the "Data Catalog" menu item, or directly at https://data.met.no/metsis/search. As seen below, the search interface consists of a map and a series of filters.

<br />

NB! The image below shows the MET Norway staging site, which can be ignored: the Data Catalog at https://data.met.no/metsis/search has the same functionality.

<a id="data.met.no-search-image"></a>
![Data Catalog Overview](../images/DataMetNo_Data_Catalog_Overview_image.png)

<br />
<br />

The map shows a paginated overview of the available datasets in the metadata catalog (each drawn as its max/min longitude/latitude rectangle), sorted so that the latest additions appear first. One can also interact with the map to better display the results and to search for data.

* "Select Projection" located just above the map can be altered to change the map projection. "Spatial filter" can be set to both "Within" and "Intersects".
* The "Create bounding box"-button lets you draw a bounding box directly on the map, which then works as a filter on the results.
* The "Reset Search"-button clears the filters and starts a new search.
* The "Reset Map"-button resets the map.

Map widgets allow direct interaction with the map:

* +/-:                     Zoom in/out.
* E:                       Zooms to the extent of the displayed datasets.
* Menu tag:                Opens side panel where WMS Layers, Features and Base Layers can be altered.
* Magnifying glass:        Enables searching for location names.
* '>>':                    Shows the location in an overview world map.
* Upper right hand widget: Full screen mode.

Search filters can also be used to find the desired datasets. The results are updated dynamically when filters are selected. The filters include:

* A full text search block where the options "Contains all of these words" and "Contains any of these words" are available.
* Start and end date of the desired datasets.
* An option named "Has children" which can be ticked to determine whether datasets are parents with children (i.e. records of the same type).
* The desired sorting mechanism (Last metadata update, End date, Start date, Last indexed).
* ISO topic categories: The general subjects for which the geospatial data may be relevant, as defined by the [ISO](https://www.iso.org/standards.html) standard.
* Keywords: Keywords from a controlled vocabulary.
* Activity type: The nature of the dataset(s) generation process (Numerical Simulation, Climate Indicator, In Situ Land-based station, Space Borne Instrument).
* Project: Datasets related to a certain project.

By clicking the "Reset"-button all filters are removed and a new search can be initiated.

<br />
<br />




(Visualise:Data:through:datametno)=
## Visualise Data through data.met.no

To further visualise the data, simply click on "Child data.." under "Data operations / access:" for one of the parent datasets that match the search criteria (as seen towards the bottom of the image <a href="#data.met.no-search-image">above</a>). In this example only one parent dataset matches the search criteria.

<br />

The children of this parent dataset are then listed. The bounding boxes corresponding to the different datasets will now be visible on the map. By clicking on a bounding box, the corresponding dataset is also moved to the top of the list of datasets.

<a id="data.met.no-search-among-children-image"></a>
![Data Catalog Overview Children in Parent](../images/DataMetNo_Data_Catalog_List_of_Children_in_Parent.png)

<br />

Changing the projection of the map might also make it easier to visualise the extent and location of the different bounding boxes:

<a id="data.met.no-search-among-children-UPS-North-image"></a>
![Data Catalog Overview Children in Parent UPS North](../images/DataMetNo_Data_Catalog_List_of_Children_in_Parent_UPS_North.png)

<br />

To visualise the data, simply click on the "Visualise" option under "Data operations / access:" for the desired dataset. This option is visible at the bottom of the two images above. A close-up of the area in question will then be shown. Two clickable image examples are shown below (click on these to visualise the datasets on data.met.no yourself):

<a id="data.met.no-zoom-in-data_gulf_stream"></a>
[![Data Catalog Overview Zoom In Gulf Stream](../images/DataMetNo_Data_Catalog_Visualise_Zoom_In_Gulf_Stream.png)](https://data-staging.met.no/metsis/map/wms?dataset=no-met-staging-f364202b-98b0-45c4-97c3-b37e64f20e64)

<br />

<a id="data.met.no-zoom-in-data_agulhas_current"></a>
[![Data Catalog Overview Zoom In Agulhas Current](../images/DataMetNo_Data_Catalog_Visualise_Zoom_In_Agulhas_Current.png)](https://data-staging.met.no/metsis/map/wms?dataset=no-met-staging-25d48dc4-0387-4c12-9a7c-5077b2880d04)

<br />

To visualise one specific variable of the dataset, e.g. the geophysical Doppler, click on the menu tab towards the top left hand corner. Choose "raster" as the WMS style, and select the desired variable. Below, the geophysical Doppler is shown for the two examples above. The upper one shows an image from the Gulf Stream, and the lower one shows a part of the Agulhas Current close to the Cape of Good Hope off the coast of South Africa:

<a id="data.met.no-visualise-gephysical-doppler-data-gulf-stream"></a>
![Data Catalog Overview Visualise Geophysical Doppler Gulf Stream](../images/DataMetNo_Data_Catalog_Visualise_Geophysical_Doppler_Gulf_Stream.png)

<br />

<a id="data.met.no-visualise-gephysical-doppler-data-agulhas-current"></a>
![Data Catalog Overview Visualise Doppler Agulhas Current](../images/DataMetNo_Data_Catalog_Visualise_Geophysical_Doppler_Agulhas_Current.png)

NB! Bear in mind that it might take a bit of time for the visualisation tool to finish rendering.

<br />
<br />


## Access datasets through data.met.no

The visualisation tool of data.met.no is showcased above, but there are also several other options for getting information about, and access to, the datasets available at data.met.no. Taking a step back and looking at an arbitrary dataset, these options are visible:


<br />

<a id="data.met.no-random-dataset"></a>
![Data Catalog Random Dataset](../images/DataMetNo_Data_Catalog_Random_Dataset.png)

<br />



In addition to the visualisation option (Visualise), it is also possible to:

* "Download data" (this is equivalent to clicking on an HTTPServer link) to download datasets locally.

<br />

* Access "OPeNDAP". Here the OPeNDAP-link (see [OPeNDAP](opendap) explanation below) is available, along with extensive metadata information.

<br />

* Show extended metadata. Clicking this option opens a right hand side menu of metadata information. In this menu it is also possible to find the other options available under Data operations / access.

<br />

* Export Metadata. When clicking this option it is possible to choose between DIF, Inspire and ISO-Norge-Inspire.

<br />

As an alternative to accessing the extended metadata, it is also possible to visit the dataset landing page. This is simply done by clicking on "Dataset Landing Page" or on the dataset title itself. The dataset landing page contains all the information and access options that the "fast track" options above offer.

<br />

<br />
<br />

## Access datasets through thredds.met.no

All data is set to be freely available and some of it can be found in the MET Norway thredds catalog: https://thredds.met.no/thredds/catalog.html.

![Thredds Dataset Overview](../images/Thredds_Dataset_Overview_image_cropped.png)

The ENVISAT ASAR datasets are located at: https://thredds.met.no/thredds/catalog/remotesensingenvisat/asar-doppler/catalog.html

Alternatively, follow this folder structure: Observations/Remotesensing_archive/ENVISAT_ASAR_Doppler:

![ENVISAT ASAR Doppler Overview](../images/ENVISAT_ASAR_Doppler_Overview_cropped.png)

Inside the subfolder, each individual netCDF file is found under a separate path depending on its date. To access the files for a specific date, follow the structure YEAR/MONTH/DAY.

Below, the path to 2012/01/27 is shown:

![ASAR 2012 overview](../images/ASAR_2012_overview_cropped.png)

![ASAR 2012/01 overview](../images/ASAR_2012_01_overview_cropped.png)

![ASAR 2012/01/27 overview](../images/ASAR_2012_01_27_overview_cropped.png)

The entire list of files from the specified date is then accessible (the list goes on).

Upon accessing a specific netCDF-file four different "Access"-options are available. These are "OPENDAP", "HTTPServer", "WCS" and "WMS". 

![ASAR 2012-01-27 netCDF overview.png](../images/ASAR_2012_01_27_netCDF_overview.png)

In the following, the use of "OPENDAP" is explained in more detail, as it is an easy and efficient way of accessing data. The examples below use the netCDF file "ASA_WSDV2PRNMI20120127_215005_000614583111_00101_51839_0000.nc" (the uppermost file under 2012/01/27).

<br />
<br />

(how:to:open:datasets)=
## How to Open Datasets

(opendap)=
### OPENDAP - Using xarray:

The data is easily accessed through OPENDAP by the use of the xarray python package. Below is an example of how to use xarray to open and investigate a desired dataset.
This procedure makes it easy to inspect the Dimensions, Coordinates, Data Variables, Indexes and Attributes of the dataset in question. 

In [2]:
# Import the required package: xarray
import xarray as xr

''' The backslashes serve as line continuations '''

# Providing the OPENDAP-url
OPENDAP_url = '''https://thredds.met.no/thredds/dodsC\
/remotesensingenvisat/asar-doppler/2012/01/27/\
ASA_WSDV2PRNMI20120127_215005_000612433111_00101\
_51839_0000.nc'''

# Using xarray to open the dataset using the OPENDAP-url
ds = xr.open_dataset(OPENDAP_url)

# Investigating the metadata as an xarray.Dataset 
ds
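
The OPENDAP URL used above follows the YEAR/MONTH/DAY folder structure of the thredds catalog, so it can also be assembled from a file name. The snippet below is a minimal sketch of that idea. The helper name `opendap_url_from_filename` is hypothetical, and it assumes that the date encoded in the file name is the same date used for the folder, as is the case in the example above.

```python
import re

# Base of the OPENDAP service for the ENVISAT ASAR Doppler archive,
# taken from the URL used in the example above.
OPENDAP_BASE = ("https://thredds.met.no/thredds/dodsC/"
                "remotesensingenvisat/asar-doppler")


def opendap_url_from_filename(filename):
    """Build an OPENDAP URL from a netCDF file name (hypothetical helper).

    Assumes the file name contains "YYYYMMDD_HHMMSS" and that the file
    is stored under the matching YEAR/MONTH/DAY folder.
    """
    match = re.search(r"(\d{4})(\d{2})(\d{2})_\d{6}", filename)
    if match is None:
        raise ValueError(f"No date found in file name: {filename}")
    year, month, day = match.groups()
    return f"{OPENDAP_BASE}/{year}/{month}/{day}/{filename}"


print(opendap_url_from_filename(
    "ASA_WSDV2PRNMI20120127_215005_000612433111_00101_51839_0000.nc"))
```

The resulting string can be passed directly to `xr.open_dataset`.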

### HTTPServer (Download) - Using xarray:

As an alternative to using the OPENDAP link, the dataset can also be downloaded. On data.met.no this is done by clicking the download option on each specific dataset, or by clicking the HTTPServer link on the landing page of each dataset. The latter also applies when downloading the data from thredds.met.no (click on the HTTPServer link). Using the path to the downloaded dataset, it is possible to proceed just as in the example above. NB! In this example the downloaded dataset is placed in the current notebooks folder.

In [3]:
# Import the required package: xarray
import xarray as xr

''' The backslashes serve as line continuations '''

# Providing the path to the downloaded dataset
local_path = '''ASA_WSDV2PRNMI20120127\
_215005_000612433111\
_00101_51839_0000.nc'''

# Using xarray to open the downloaded dataset from its local path
ds = xr.open_dataset(local_path)

# Investigating the metadata as an xarray.Dataset 
ds

<br />
<br />

## Find Data through CSW (Catalogue Service for the Web)

Data can also be found through CSW (Catalogue Service for the Web). An efficient and practical function for extracting data that satisfy certain conditions can be found at https://github.com/metno/esa-coscaw-data-search. An example of how to import the function and how to use it is included below. The SearchCSW function takes the following arguments:

<br />
<br />

* time - A specific datetime.datetime used as the centre of the search window. Default is "now" ( time = datetime.datetime.now(timezone("utc")) ), i.e. the time at which the function is executed.

<br />

* dt - The time interval, in hours, to search within. The search returns datasets spanning from (time - dt) up to (time + dt). Default is dt = 24.

<br />

* text - A string that must be part of the dataset title. Default is text = None.

<br />

* bbox - A geographical bounding box that the desired datasets only need to intersect. It is structured as follows: [Westernmost Longitude, Southernmost Latitude, Easternmost Longitude, Northernmost Latitude]. Values are in degrees east and degrees north. Default spans the entire globe [-180, -90, 180, 90]. 

<br />

* endpoint - The CSW endpoint to search.
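
To make the meaning of the time window and the "only need to intersect" condition concrete, here is a small sketch. It illustrates the semantics described in the list above and is not the implementation used by SearchCSW. The helper names `search_window` and `bboxes_intersect` are hypothetical, and boxes crossing the antimeridian are not handled.

```python
from datetime import datetime, timedelta


def search_window(time, dt=24):
    """Return (start, end) of the window centred on `time`; `dt` is in hours."""
    half_width = timedelta(hours=dt)
    return time - half_width, time + half_width


def bboxes_intersect(a, b):
    """True if two [west, south, east, north] boxes overlap or touch."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (a_west <= b_east and b_west <= a_east
            and a_south <= b_north and b_south <= a_north)


start, end = search_window(datetime(2012, 2, 14), dt=24)
print(start, "to", end)

# A small search box is intersected by a dataset covering the whole globe
print(bboxes_intersect([34.9, 80.9, 35.1, 81], [-180, -90, 180, 90]))
```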



NB! Two endpoints are provided below. Until the SAR data is made publicly available at https://data.csw.met.no, the staging site https://csw.s-enda-staging.k8s.met.no is used. The latter is, however, only accessible to MET Norway employees. Others will have to switch to the publicly available endpoint (https://data.csw.met.no).

In [4]:
from fadg.find_and_collocate import SearchCSW
from datetime import datetime, timedelta

############ Time and dt ############

time_str = '2012-02-14 00:00:00' 
''' Valid datetime string for the SearchCSW function.
    Default is the time right now; now = datetime.now() '''

time = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S')

dt = 24        # dt : float (default 24)
               # Total time interval in hours before and after the given time 
               # (dt is centered around the selected time).

print('Finding data within the timespan of:')
print('\n')
print(f'    {time - timedelta(hours=dt)}')
print(f'             and         ') 
print(f'    {time + timedelta(hours=dt)}')
print('\n')


############ Text ############

Text = "Doppler" 
''' This text string needs to be part of 
    the title of the files to be found.'''

print(f'Finding data with titles containing: "{Text}".')
print('\n')

############ bbox ############

Boundary_Box = [34.9, 80.9, 35.1, 81]   
''' The geographical extent of the desired datasets only
    has to intersect this bounding box.
    Default : [-180, -90, 180, 90] '''

print('Finding data intersecting this specified bounding box:')
print('\n')
print(f'             {Boundary_Box}')
print('\n')

############ endpoint ############

# Endpoint = "https://data.csw.met.no"      # The endpoint to use 
                                            # when data is made 
                                            # publicly available


Endpoint = "https://csw.s-enda-staging.k8s.met.no"     
                                            # Endpoint used in the 
                                            # original version 
                                            # - only accessible 
                                            # internally at MET 
                                            # Norway
''' The site at which the data is located '''

print("Searching for data with endpoint set to:")
print('\n')
print(f'      {Endpoint}     ')
print('\n')

############ Finding the Corresponding datasets ############

sar = SearchCSW(time = time, 
                dt = dt, 
                text = Text, 
                bbox = Boundary_Box, 
                endpoint = Endpoint)




############## How many files are found ####################
if len(sar.urls) == 0:
    print('No data match the chosen criteria...')
elif len(sar.urls) == 1:
    print(f'''
There is {len(sar.urls)} file which matches the chosen criteria!''')
else:
    print(f'''
There are {len(sar.urls)} files which match the chosen criteria!''')

print('\n')



############### Provide the found URLs ######################
sar.urls.sort()  # Sorts the list of files
print('''These are the Opendap-URLs of the datasets 
which match the chosen criteria:''')
sar.urls

Finding data within the timespan of:


    2012-02-13 00:00:00
             and         
    2012-02-15 00:00:00


Finding data with titles containing: "Doppler".


Finding data intersected by this specified boundary box:


             [34.9, 80.9, 35.1, 81]


Searching for data with endpoint set to:


      https://csw.s-enda-staging.k8s.met.no     




ServiceException: <html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>



<br />
<br />

### Get Parent Datasets and their Children (or Dataset Series in ISO 19115) with OGC CSW 

NB! The links in this section point to the staging site. Once the data is available on data.met.no, the endpoint in all links changes from https://csw.s-enda-staging.k8s.met.no to https://data.csw.met.no, and the staging identifiers used below must be replaced accordingly.

MET Norway organises datasets in parent-child relationships. A parent can be a set of [Calibrated geophysical ENVISAT ASAR wide-swath range Doppler frequency shift retrievals](https://csw.s-enda-staging.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&q=ASAR), where the hyperlink provides the OGC CSW result of a search for "ASAR".

The same search but with results provided in ISO format: https://csw.s-enda-staging.k8s.met.no/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&RESULTTYPE=results&TYPENAMES=csw:Record&ElementSetName=full&q=ASAR&outputschema=http://www.isotc211.org/2005/gmd.

Here, a field gmd:parentIdentifier provides the metadata identification of the parent dataset, i.e., no.met.staging:e19b9c36-a9dc-4e13-8827-c998b9045b54.

Note: If this document is opened as a PDF, all the links below will be incomplete. To see the full links, open the document as HTML or as a Jupyter notebook.

Get the parent dataset:

     https://csw.s-enda-staging.k8s.met.no/csw?service=CSW&version=2.0.2&request=GetRepositoryItem&id=no.met.staging:e19b9c36-a9dc-4e13-8827-c998b9045b54

Get all its children:

     https://csw.s-enda-staging.k8s.met.no/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&RESULTTYPE=results&TYPENAMES=csw:Record&ElementSetName=full&outputFormat=application%2Fxml&outputschema=http://www.isotc211.org/2005/gmd&CONSTRAINTLANGUAGE=CQL_TEXT&CONSTRAINT=apiso:ParentIdentifier%20like%20%27no.met.staging:e19b9c36-a9dc-4e13-8827-c998b9045b54%27

To find all parent datasets:

     https://csw.s-enda-staging.k8s.met.no/csw?SERVICE=CSW&VERSION=2.0.2&REQUEST=GetRecords&RESULTTYPE=results&TYPENAMES=csw:Record&ElementSetName=full&outputschema=http://www.isotc211.org/2005/gmd&CONSTRAINTLANGUAGE=CQL_TEXT&CONSTRAINT=dc:type%20like%20%27series%27
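
The request URLs above can also be assembled programmatically, which makes it easier to swap the endpoint and the parent identifier. The following is a minimal sketch using only the Python standard library. It builds the "get all its children" request, and the helper name `children_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def children_url(endpoint, parent_id):
    """Build a CSW GetRecords URL listing the children of a parent dataset."""
    params = {
        "SERVICE": "CSW",
        "VERSION": "2.0.2",
        "REQUEST": "GetRecords",
        "RESULTTYPE": "results",
        "TYPENAMES": "csw:Record",
        "ElementSetName": "full",
        "outputFormat": "application/xml",
        "outputschema": "http://www.isotc211.org/2005/gmd",
        "CONSTRAINTLANGUAGE": "CQL_TEXT",
        # CQL text constraint; the quotes and spaces are percent-encoded below
        "CONSTRAINT": f"apiso:ParentIdentifier like '{parent_id}'",
    }
    # quote_via=quote encodes spaces as %20; ":" and "/" are kept readable
    return f"{endpoint}/csw?" + urlencode(params, safe=":/", quote_via=quote)


print(children_url("https://csw.s-enda-staging.k8s.met.no",
                   "no.met.staging:e19b9c36-a9dc-4e13-8827-c998b9045b54"))
```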


<br />
<br />



### Find Data with OpenSearch 

NB! The links below use the staging endpoint. Once the data is publicly available, replace https://csw.s-enda-staging.k8s.met.no with https://data.csw.met.no in all of them.

[OpenSearch](https://en.wikipedia.org/wiki/OpenSearch) is a way for websites and search engines to publish search results in a standard and accessible format.

To find all datasets in the catalogue (Note: To see the full links below, open the page as HTML or as a Jupyter notebook):

    https://csw.s-enda-staging.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results

<br />

Or datasets within a given time span (for instance: from 2012-02-01 to 2012-02-05):

    https://csw.s-enda-staging.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&time=2012-02-01/2012-02-05

<br />

Or datasets within a geographical domain (defined as a box with parameters min_longitude, min_latitude, max_longitude, max_latitude - for instance [0, 70, 10, 80]):

    https://csw.s-enda-staging.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&bbox=0,70,10,80

<br />

Or datasets with "ENVISAT ASAR wide-swath range Doppler frequency shift" in the title:

    https://csw.s-enda-staging.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&q=ENVISAT%20ASAR%20wide-swath%20range%20Doppler%20frequency%20shift

<br />

Or datasets matching all three kinds of criteria at once (here with a wider time span, from 2012-01-01 to 2012-03-01):

    https://csw.s-enda-staging.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&time=2012-01-01/2012-03-01&bbox=0,70,10,80&q=ENVISAT%20ASAR%20wide-swath%20range%20Doppler%20frequency%20shift
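
The OpenSearch links above differ only in their optional `time`, `bbox` and `q` parameters, so they can be generated from a single function. The sketch below uses only the Python standard library. The helper name `opensearch_url` and its argument names are hypothetical, and the endpoint is passed in so that either the staging or the public endpoint can be used.

```python
from urllib.parse import quote, urlencode


def opensearch_url(endpoint, start=None, end=None, bbox=None, text=None):
    """Build an OpenSearch GetRecords URL with optional search filters.

    start, end : date strings such as "2012-02-01"
    bbox       : [min_longitude, min_latitude, max_longitude, max_latitude]
    text       : free text to search for
    """
    params = {
        "mode": "opensearch",
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "elementsetname": "full",
        "typenames": "csw:Record",
        "resulttype": "results",
    }
    if start and end:
        params["time"] = f"{start}/{end}"
    if bbox:
        params["bbox"] = ",".join(str(value) for value in bbox)
    if text:
        params["q"] = text
    # Spaces become %20; ":", "/" and "," are left as they are
    return f"{endpoint}/?" + urlencode(params, safe=":/,", quote_via=quote)


print(opensearch_url(
    "https://data.csw.met.no",
    start="2012-01-01", end="2012-03-01",
    bbox=[0, 70, 10, 80],
    text="ENVISAT ASAR wide-swath range Doppler frequency shift"))
```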

<br />
<br />

### More Advanced Geographical Search with OGC CSW

PyCSW opensearch only supports geographical searches querying for a box. For more advanced geographical searches, one must write specific XML files. 

The XML files listed below are also available in the current notebooks folder. They are visible in their entirety if this document is opened as HTML or as a Jupyter notebook. 

Here are some examples:

* To find all datasets containing a point:

    * XML-file name: my_xml_request_containing_a_point.xml

    * Here the coordinates of the point are 59.0 degrees north and 4.0 degrees east.

```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="10"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>full</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:Contains>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Point>
            <gml:pos srsDimension="2">59.0 4.0</gml:pos>
          </gml:Point>
        </ogc:Contains>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
```

<br />


* To find all datasets intersecting a polygon: 

    * XML-file name: my_xml_request_intersecting_a_polygon.xml

    * Here the polygon is [westernmost lon, southernmost lat, easternmost lon, northernmost lat] = [-5.00, 47.00, 20.00, 55.00].
      The first and last coordinate pairs are identical in order to close the polygon. Each pair in the request is given as latitude followed by longitude.

```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="10"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>full</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:Intersects>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Polygon>
            <gml:exterior>
              <gml:LinearRing>
                <gml:posList>
                  47.00 -5.00 55.00 -5.00 55.00 20.00 47.00 20.00 47.00 -5.00
                </gml:posList>
              </gml:LinearRing>
            </gml:exterior>
          </gml:Polygon>
        </ogc:Intersects>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
```

<br />


* To find all datasets intersecting a polygon within a given time span:

    * XML-file name: my_xml_request_intersecting_a_polygon_within_a_given_time_span.xml

    * Here the polygon is [westernmost lon, southernmost lat, easternmost lon, northernmost lat] = [-10.00, 70.00, 10.00, 80.00].
      The first and last coordinate pairs are identical in order to close the polygon.

    * Here the start time is 2018-01-01 00:00.
    * Here the end time is 2022-01-01 00:00.

```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="100"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:And>
          <ogc:Intersects>
            <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
            <gml:Polygon>
              <gml:exterior>
                <gml:LinearRing>
                  <gml:posList>
                    70.00 -10.00 80.00 -10.00 80.00 10.00 70.00 10.00 70.00 -10.00
                  </gml:posList>
                </gml:LinearRing>
              </gml:exterior>
            </gml:Polygon>
          </ogc:Intersects>
          <ogc:PropertyIsGreaterThanOrEqualTo>
            <ogc:PropertyName>apiso:TempExtent_begin</ogc:PropertyName>
            <ogc:Literal>2018-01-01 00:00</ogc:Literal>
          </ogc:PropertyIsGreaterThanOrEqualTo>
          <ogc:PropertyIsLessThanOrEqualTo>
            <ogc:PropertyName>apiso:TempExtent_end</ogc:PropertyName>
            <ogc:Literal>2022-01-01 00:00</ogc:Literal>
          </ogc:PropertyIsLessThanOrEqualTo>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
```

<br />

* To find all datasets intersecting a polygon within a given time span and with a certain text string:

    * XML-file name: my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml

    * Here the polygon is [westernmost lon, southernmost lat, easternmost lon, northernmost lat] = [-10.00, 70.00, 10.00, 80.00].
      The first and last coordinate pairs are identical in order to close the polygon.

    * Here the start time is 2012-02-01 00:00.
    * Here the end time is 2012-02-03 00:00.

    * The recognizable string is "ENVISAT ASAR". 

```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="100"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:And>
          <ogc:Intersects>
            <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
            <gml:Polygon>
              <gml:exterior>
                <gml:LinearRing>
                  <gml:posList>
                    70.00 -10.00 80.00 -10.00 80.00 10.00 70.00 10.00 70.00 -10.00
                  </gml:posList>
                </gml:LinearRing>
              </gml:exterior>
            </gml:Polygon>
          </ogc:Intersects>
          <ogc:PropertyIsGreaterThanOrEqualTo>
            <ogc:PropertyName>apiso:TempExtent_begin</ogc:PropertyName>
            <ogc:Literal>2012-02-01 00:00</ogc:Literal>
          </ogc:PropertyIsGreaterThanOrEqualTo>
          <ogc:PropertyIsLessThanOrEqualTo>
            <ogc:PropertyName>apiso:TempExtent_end</ogc:PropertyName>
            <ogc:Literal>2012-02-03 00:00</ogc:Literal>
          </ogc:PropertyIsLessThanOrEqualTo>
          <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
            <ogc:PropertyName>dc:title</ogc:PropertyName>
            <ogc:Literal>%ENVISAT ASAR%</ogc:Literal>
          </ogc:PropertyIsLike>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
```
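
The coordinate lists inside the `gml:posList` elements above are written as latitude/longitude pairs, while the bounding boxes are described as [west, south, east, north]. The following sketch converts between the two and may help avoid swapping the order by mistake. The helper name `bbox_to_poslist` is hypothetical.

```python
def bbox_to_poslist(west, south, east, north):
    """Return a closed rectangular ring as a "lat lon lat lon ..." string."""
    corners = [
        (south, west),
        (north, west),
        (north, east),
        (south, east),
        (south, west),  # repeat the first corner to close the polygon
    ]
    return " ".join(f"{lat:.2f} {lon:.2f}" for lat, lon in corners)


# Bounding box used in the time span examples above
print(bbox_to_poslist(-10, 70, 10, 80))
```

The returned string can be pasted between `<gml:posList>` and `</gml:posList>` in any of the polygon requests.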

<br />

### Query CSW Endpoint by the Use of Python

With one of the XML files above, the CSW endpoint can be queried and the response text printed using, e.g., Python (switch the endpoint from https://csw.s-enda-staging.k8s.met.no to https://data.csw.met.no once the data is publicly available):

In [30]:
import requests
import xarray as xr
import re
import sys

### Define the headers
headers = {'Content-Type': 'application/xml'}

### Specify the xml-file that should be used for the search 
  # - As mentioned all the XML-files listed above can be found 
  #   in the notebooks folder.

# my_xml_request = 'my_xml_request_containing_a_point.xml'

# my_xml_request = 'my_xml_request_intersecting_a_polygon.xml'

# my_xml_request = 'my_xml_request_intersecting_a_polygon_within_a_given_time_span.xml'

my_xml_request = 'my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml'

# Open and read the XML file
with open(my_xml_request, 'r') as file:
    xml_data = file.read()

### Send the POST request 


# response = requests.post('https://data.csw.met.no', 
#                          data=xml_data, 
#                          headers=headers)

response = requests.post('https://csw.s-enda-staging.k8s.met.no',
                          data=xml_data, 
                          headers=headers)

# The response text
print(response.text)
print('\n')


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- pycsw 2.7.dev0 -->
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:ows="http://www.opengis.net/ows" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0.2" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd"><csw:SearchStatus timestamp="2025-02-19T11:29:36Z"/><csw:SearchResults numberOfRecordsMatched="19" numberOfRecordsReturned="10" nextRecord="11" recordSchema="http://www.opengis.net/cat/csw/2.0.2" elementSet="summary"><csw:SummaryRecord><dc:identifier>no.met.staging:2a03342d-006e-4e6c-a654-b9d7183a0ea1</dc:identifier><dc:title>Calibrated geophysical ENVISAT ASAR wide-swath range Doppler frequency shift retrievals 

### Extract the OPENDAP urls

Having received the response text, it is possible to extract the OPENDAP URLs. They can be read from the response text, but can also be easily extracted using the code snippet below: 

In [31]:
# The pattern below matches URLs of the form
# "https://thredds.met.no/thredds/dodsC/{anything_in_between}.nc",
# where an "ml" ending (".ncml") is included only if present.

### Opendap url format
my_pattern = r'https://thredds\.met\.no/thredds/dodsC/.*?\.nc(?:ml)?'


### findall() function returns all non-overlapping matches of 
  # my_pattern in string, as a list of strings
opendap_urls = re.findall(my_pattern, response.text)

# Sort the list of OPeNDAP URLs alphabetically; because the path and
# file name embed the date and time, this is also chronological order
opendap_urls.sort()

# List of OPENDAP urls
print(f'List contains {len(opendap_urls)} urls:')
for url in opendap_urls:
    print(url)

# Open a dataset only if any URLs were found; otherwise store a message
if len(opendap_urls) > 0:

    # Open the first dataset in the list of urls
    print('\n')
    print("Opening the first dataset with xarray:")
    ds = xr.open_dataset(opendap_urls[0])
    
else:
    ds = "No file(s) match the search criteria."


ds

List contains 10 urls:
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_101138_000615633111_00166_51904_0000.nc
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_114940_000603223111_00167_51905_0000.nc
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_115002_000623733111_00167_51905_0000.nc
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_115103_000623783111_00167_51905_0000.nc
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_115203_000633973111_00167_51905_0000.nc
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_115304_000623823111_00167_51905_0000.nc
https://thredds.met.no/thredds/dodsC/remotesensingenvisat/asar-doppler/2012/02/01/ASA_WSDH2PRNMI20120201_13

NOTE: There seems to be a server-side limit on the number of records returned in a single response, regardless of the "maxRecords" value in the request. 
Such limits are common, since they prevent excessively large responses. Here the limit appears to be 10 records per request: the response above reports numberOfRecordsMatched="19" but numberOfRecordsReturned="10".
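
The counts can also be read programmatically rather than by eye. The sketch below is a minimal illustration, not part of the original notebook: `search_summary` is a hypothetical helper, and the `sample` string is an abbreviated stand-in for `response.text` that reuses the attribute values shown in the response above.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the csw: prefix, as declared in the response above
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def search_summary(response_xml):
    """Return the record counts reported in a csw:GetRecordsResponse."""
    root = ET.fromstring(response_xml.encode("utf-8"))
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    names = ("numberOfRecordsMatched", "numberOfRecordsReturned", "nextRecord")
    return {name: int(results.get(name)) for name in names}


# Abbreviated stand-in for response.text (assumption: same attributes as above)
sample = (
    '<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">'
    '<csw:SearchResults numberOfRecordsMatched="19" '
    'numberOfRecordsReturned="10" nextRecord="11"/>'
    '</csw:GetRecordsResponse>'
)

summary = search_summary(sample)
print(summary)
if summary["numberOfRecordsReturned"] < summary["numberOfRecordsMatched"]:
    print("More records are available, starting from record", summary["nextRecord"])
```

With a real response, `search_summary(response.text)` would be called instead of using the stand-in string.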

<br />

To retrieve the remaining records, use the startPosition attribute. Setting startPosition="11" retrieves the next set of records, starting from the 11th record; this is the value that the response above reports as nextRecord.

<br />

Here's how you would add it to one of the XML files listed above:

<br />

```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    service="CSW"
    version="2.0.2"
    resultType="results"
    maxRecords="100"
    startPosition="11"
    outputFormat="application/xml"
    outputSchema="http://www.opengis.net/cat/csw/2.0.2"
    xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
  <!-- rest of the XML file -->
</csw:GetRecords>
```

<br />

This way, you can "paginate" through the records by making multiple requests and incrementing startPosition each time.
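
The pagination can be sketched as a small loop. This is a sketch under stated assumptions rather than the notebook's own method: the helper names (`set_start_position`, `next_record`, `fetch_all_pages`) are hypothetical, `post` is any callable that sends the XML body and returns the response text, and the loop assumes the server reports nextRecord="0" once no records remain, which should be verified against the actual endpoint.

```python
import re
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def set_start_position(xml_request, start):
    """Return the request text with startPosition set to `start`."""
    attribute = f'startPosition="{start}"'
    if re.search(r'startPosition="\d+"', xml_request):
        return re.sub(r'startPosition="\d+"', attribute, xml_request, count=1)
    # No startPosition yet: add it to the opening csw:GetRecords tag
    return xml_request.replace("<csw:GetRecords", f"<csw:GetRecords {attribute}", 1)


def next_record(response_xml):
    """Read nextRecord from csw:SearchResults (0 if it is absent)."""
    root = ET.fromstring(response_xml.encode("utf-8"))
    results = root.find(f"{{{CSW_NS}}}SearchResults")
    if results is None:
        return 0
    return int(results.get("nextRecord", "0"))


def fetch_all_pages(xml_request, post, max_pages=50):
    """Collect response texts page by page until no further records are reported."""
    pages = []
    start = 1
    for _ in range(max_pages):  # safety cap; the value 50 is an arbitrary choice
        text = post(set_start_position(xml_request, start))
        pages.append(text)
        following = next_record(text)
        if following <= start:  # 0 means done; also stop if there is no progress
            break
        start = following
    return pages
```

With the variables from the Python example earlier in this section, a call might look like `fetch_all_pages(xml_data, lambda body: requests.post('https://csw.s-enda-staging.k8s.met.no', data=body, headers=headers).text)`. Each element of the returned list is one response text, from which the URLs can be extracted as before.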

<br />
<br />



### Query CSW Endpoint by the Use of an HTTP POST (From the Terminal)

* Alternatively, one can send an HTTP POST request to the PyCSW server directly from the terminal. The steps are as follows:

    1. Make sure that you have saved one of the XML files listed above, or one that you have composed for your own search.

    <br />
    
    2. Then, use curl (a command-line tool for making HTTP requests) to send a POST request to the PyCSW server. 
       An example is shown below (to query https://data.csw.met.no instead, replace the endpoint 'https://csw.s-enda-staging.k8s.met.no' with it):

In [32]:
### The original bash commands (for a "%%bash" cell or a terminal):

# Against https://data.csw.met.no:
#   curl -X POST -H "Content-Type: application/xml" -d \
#   @my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml \
#   https://data.csw.met.no

# Against the staging endpoint used here:
#   curl -X POST -H "Content-Type: application/xml" -d \
#   @my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml \
#   https://csw.s-enda-staging.k8s.met.no

### Using subprocess just to make the output readable when opened as HTML or PDF:

import subprocess

# Define the curl command
curl_command = [
    "curl", "-X", "POST", "-H", "Content-Type: application/xml", "-d",
    "@my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml",
    "https://csw.s-enda-staging.k8s.met.no"
]

# Run the curl command and capture the output
result = subprocess.run(curl_command, capture_output=True, text=True)

# Print the output
print("Standard Output:\n")
print(result.stdout)
print("\nStandard Error:\n")
print(result.stderr)

Standard Output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- pycsw 2.7.dev0 -->
<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:ows="http://www.opengis.net/ows" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0.2" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd"><csw:SearchStatus timestamp="2025-02-19T11:29:36Z"/><csw:SearchResults numberOfRecordsMatched="19" numberOfRecordsReturned="10" nextRecord="11" recordSchema="http://www.opengis.net/cat/csw/2.0.2" elementSet="summary"><csw:SummaryRecord><dc:identifier>no.met.staging:2a03342d-006e-4e6c-a654-b9d7183a0ea1</dc:identifier><dc:title>Calibrated geophysical ENVISAT ASAR wide-swath range Doppler frequency

 In this example:
 * https://csw.s-enda-staging.k8s.met.no (https://data.csw.met.no) is the URL of the PyCSW server. 
 
 <br />
 
 * The -X POST option specifies that this is a POST request.
 
 <br />
 
 * The -H "Content-Type: application/xml" option sets the content type of the request to XML.
 
 <br />
 
 * The -d @my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml option sends the contents of the XML query file as the body of the request.

 <br />

The server will respond with an XML document containing the search results. You can save this document to a file using the -o option with curl:

In [33]:
### The original bash commands (for a "%%bash" cell or a terminal):

# Against https://data.csw.met.no:
#   curl -X POST -H "Content-Type: application/xml" -d \
#   @my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml \
#   -o query_results.xml https://data.csw.met.no

# Against the staging endpoint used here:
#   curl -X POST -H "Content-Type: application/xml" -d \
#   @my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml \
#   -o query_results.xml https://csw.s-enda-staging.k8s.met.no

### Using subprocess just to make the output readable when opened as HTML or PDF:

import subprocess

# Define the curl command
curl_command = [
    "curl", "-X", "POST", "-H", "Content-Type: application/xml", "-d",
    "@my_xml_request_intersecting_a_polygon_within_a_given_time_span_and_certain_text_str.xml",
    "-o", "query_results.xml",
    "https://csw.s-enda-staging.k8s.met.no"
]

# Run the curl command
result = subprocess.run(curl_command, capture_output=True, text=True)

# Print the output
print("Standard Output:\n")
print(result.stdout)
print("\nStandard Error:\n")
print(result.stderr)

Standard Output:



Standard Error:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  1882    0     0  100  1882      0   9325 --:--:-- --:--:-- --:--:--  9316
100 25859  100 23977  100  1882  66317   5205 --:--:-- --:--:-- --:--:-- 71433



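NB! In this example, Standard Output is empty because the search results are (as intended) saved to query_results.xml. The text under Standard Error is only curl's progress meter and does not indicate a failure.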
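
The saved file can then be processed in the same way as the response text in the Python example. The following is a minimal sketch, assuming that query_results.xml is in the working directory and is UTF-8 encoded (as the response header indicates); `opendap_urls_from_file` is a hypothetical helper name, and the pattern is the OPeNDAP URL pattern used earlier in this section.

```python
import re
from pathlib import Path

# OPeNDAP URL pattern, as used for the response text earlier in this section
OPENDAP_PATTERN = r'https://thredds\.met\.no/thredds/dodsC/.*?\.nc(?:ml)?'


def opendap_urls_from_file(path):
    """Return the sorted, de-duplicated OPeNDAP URLs found in a saved CSW response."""
    text = Path(path).read_text(encoding="utf-8")
    return sorted(set(re.findall(OPENDAP_PATTERN, text)))


# Only runs if the curl command above has already written the file
if Path("query_results.xml").exists():
    for url in opendap_urls_from_file("query_results.xml"):
        print(url)
```

Using a set removes any URL that occurs more than once in the file, and sorting keeps the chronological order implied by the dated paths.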

<br />
<br />