# 4.3 Accessing metadata via Python from a Jupyter Notebook
Now that we know how searching for metadata works through geoportals, let's look behind the HTML interface. Whenever you filter datasets in a geoportal, requests are sent to the CSW in the background, and these requests are built the same way for every CSW. Before we start, let's quickly set up the components we need. OWSLib is a Python library for client programming against OGC web service standards. It lets you access and use geospatial data and metadata from online services such as WMS or CSW via Python. After installing the library, import the CSW class from OWSLib along with some filter classes for later use.

In [1]:
%pip install OWSLib
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsEqualTo, BBox, PropertyIsLike





Note: you may need to restart the kernel to use updated packages.


Now we need the URL of the INSPIRE CSW. If you search for "geodatenkatalog csw" in your browser, the catalogue service should be among the first results (https://gdk.gdi-de.org/gdi-de/srv/ger/csw?service=CSW&version=2.0.2&REQUEST=GetCapabilities&SERVICE=CSW). This link is a GetCapabilities request to the CSW. Its response contains the metadata that describes the catalogue service itself.
To use this catalogue service with the CSW class from OWSLib, you only need the part of the URL before the "?". Passing that part to CatalogueServiceWeb() creates a connection to the CSW.

In [2]:
CSW_URL = 'https://gdk.gdi-de.org/gdi-de/srv/ger/csw'
csw = CatalogueServiceWeb(CSW_URL)
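
If you only have the full GetCapabilities link at hand, the base endpoint can also be cut out programmatically. The following is a minimal sketch that uses only Python's standard library; the variable names (`capabilities_url`, `base_url`, `params`) are illustrative and not part of OWSLib.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

# Full GetCapabilities link as found via the browser search
capabilities_url = (
    "https://gdk.gdi-de.org/gdi-de/srv/ger/csw"
    "?service=CSW&version=2.0.2&REQUEST=GetCapabilities&SERVICE=CSW"
)

parts = urlsplit(capabilities_url)

# Everything before the "?" is the endpoint that CatalogueServiceWeb() expects
base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# The part after the "?" holds the request parameters as key/value pairs
params = parse_qs(parts.query)

print(base_url)
print(params["REQUEST"])
```

The resulting `base_url` is the same string that is assigned to `CSW_URL` in the cell above.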

Let's first see which operations the CSW offers. These operations are how you work with the CSW. Some operations are common to catalogue services:
1. GetCapabilities(): This operation allows retrieving the available functions and properties of the CSW, including supported operations and data schema.

2. DescribeRecord(): This operation allows retrieving information about the available metadata records, including the metadata schema and supported metadata elements. 

3. GetRecords(): This operation enables retrieving metadata records based on specific criteria such as keywords, spatial extents, or time intervals.

4. GetRecordById(): This operation allows retrieving metadata records by their unique identification number. 

5. Transaction(): This operation supports adding, updating, or deleting metadata records in the CSW. It is optional and only offered by catalogues that allow write access.

In [16]:
op = [op.name for op in csw.operations]
print(op)

['GetCapabilities', 'DescribeRecord', 'GetDomain', 'GetRecords', 'GetRecordById', 'Transaction', 'Harvest']
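
Before building queries, it can be useful to confirm that the catalogue actually offers the operations you plan to use. The sketch below works on the list printed above so that it runs without a connection; the set `needed` is an illustrative selection, not a requirement of OWSLib.

```python
# Operation names as printed by the cell above
available = ['GetCapabilities', 'DescribeRecord', 'GetDomain', 'GetRecords',
             'GetRecordById', 'Transaction', 'Harvest']

# Operations used for searching in this section (an illustrative selection)
needed = {'GetRecords', 'GetRecordById'}

# Set difference: everything we need that the catalogue does not list
missing = needed - set(available)
if missing:
    print("Not offered by this catalogue:", sorted(missing))
else:
    print("All needed operations are available.")
```

In a live notebook, `available` could be replaced by `[op.name for op in csw.operations]`.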


The GetRecords() operation of the CSW is available through the getrecords2() method of OWSLib. The method accepts "constraints" that filter the catalogue. The maxrecords argument sets how many records are returned at most (the default is 10).
Filters are built with classes such as PropertyIsLike() or PropertyIsEqualTo(). Each takes two strings: the element to search in and the value to search for. In PropertyIsLike(), the character % acts as a wildcard for any sequence of characters. Elements you can use include:
- csw:Title: searches for a specific title
- csw:AnyText: finds all records that contain the given string anywhere.

Afterwards, a summary of the result is available in the results attribute of the csw object.
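
To get a feeling for how a `%` pattern behaves, the matching can be imitated locally. This is only a rough emulation under stated assumptions: the helper `like_to_regex` is hypothetical, it assumes `%` for any run of characters and `_` for a single character, and it matches case-sensitively. The catalogue server does its own matching and may behave differently.

```python
import re

def like_to_regex(pattern, wildcard="%", single_char="_"):
    """Translate a LIKE-style pattern into a compiled regular expression."""
    pieces = []
    for ch in pattern:
        if ch == wildcard:
            pieces.append(".*")          # any run of characters, including none
        elif ch == single_char:
            pieces.append(".")           # exactly one character
        else:
            pieces.append(re.escape(ch))  # literal character
    return re.compile("".join(pieces), re.DOTALL)

titles = [
    "Naturschutzgebiete Wuppertal",
    "Kartenlayer Naturschutzgebiete NRW",
    "Landschaftsschutzgebiete",  # hypothetical title, for contrast only
]

contains = like_to_regex("%Naturschutzgebiete%")
starts_with = like_to_regex("Naturschutzgebiete%")

contains_hits = [t for t in titles if contains.fullmatch(t)]
prefix_hits = [t for t in titles if starts_with.fullmatch(t)]

print(contains_hits)
print(prefix_hits)
```

With a `%` on both sides the keyword may appear anywhere in the title; without the leading `%` the title has to start with it.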


In [27]:
NSG_filter = PropertyIsLike('csw:Title', '%Naturschutzgebiete%')
csw.getrecords2(constraints=[NSG_filter], maxrecords=20)
csw.results

{'matches': 94, 'returned': 20, 'nextrecord': 21}

To print, for example, the titles of your results, loop over the records:

In [9]:
for rec in csw.records:
    print(csw.records[rec].title)

Naturschutzgebiete in Freiburg i. Br.
Waldfunktionen in Sachsen - Naturschutzgebiete nach Sächsischem Naturschutzgesetz
Naturschutzgebiete Landkreis Mittelsachsen
Naturschutzgebiete Landkreis Diepholz
Naturschutzgebiete Wuppertal
Kartenlayer Naturschutzgebiete NRW
Naturschutzgebiete der Stadt Göttingen
Wuppertal ATOM Feed Naturschutzgebiete
Naturschutzgebiete Vogtlandkreis
Naturschutzgebiete (Landkreis Holzminden)
Naturschutzgebiete im Landkreis Oldenburg
Naturschutzgebiete (Landkreis Osterholz)
Naturschutzgebiete im Landkreis Ammerland
Naturschutzgebiete im Landkreis Verden
Naturschutzgebiete im Landkreis Aurich
Naturschutzgebiete (Landkreis Hameln-Pyrmont)
Naturschutzgebiete (Landkreis Leer)
Naturschutzgebiete im Landkreis Northeim
Naturschutzgebiete Landkreis Lüneburg
Naturschutzgebiete (Landkreis Hildesheim)
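
The query above reports 94 matches but returns only the first 20. To fetch the rest, the `nextrecord` value can be passed back as the start position of the next request. The following is a sketch, not a definitive implementation: `next_start` and `collect_titles` are hypothetical helpers, and it assumes the server signals the end either with `nextrecord` equal to 0 or with a value beyond `matches`.

```python
def next_start(results):
    """Return the start position of the next page, or None if there is none."""
    nxt = results.get("nextrecord", 0)
    if nxt and nxt <= results.get("matches", 0):
        return nxt
    return None


def collect_titles(csw, constraints, page_size=20):
    """Page through a GetRecords result and gather all titles.

    `csw` is expected to be a connected CatalogueServiceWeb object.
    """
    titles = []
    start = 1
    while start is not None:
        # Request one page, beginning at the current start position
        csw.getrecords2(constraints=constraints,
                        startposition=start, maxrecords=page_size)
        titles.extend(rec.title for rec in csw.records.values())
        start = next_start(csw.results)
    return titles


# The results dictionary from the query above points at record 21
print(next_start({'matches': 94, 'returned': 20, 'nextrecord': 21}))
# A last page (illustrative values): nothing left to fetch
print(next_start({'matches': 94, 'returned': 14, 'nextrecord': 0}))
```

In the notebook, a call such as `collect_titles(csw, [NSG_filter])` would then gather the titles of all matching records, one page at a time.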


You can also filter with BBox(), which restricts the search to datasets in a specific area. It takes a list of four coordinates that describe two corner points, here given as [latMin, lonMin, latMax, lonMax].

In [10]:
bbox_query = BBox([52.839976, 7.474823, 53.098018, 7.911530])
csw.getrecords2(constraints=[bbox_query])
csw.results

{'matches': 3011, 'returned': 10, 'nextrecord': 11}
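
Mixing up the two corner points or the coordinate order is an easy mistake, so a small check before building the filter can help. The helper `make_bbox` below is hypothetical and follows the order used in this section (latitude first); other services or coordinate reference systems may expect longitude first.

```python
def make_bbox(lat_min, lon_min, lat_max, lon_max):
    """Validate two corner points and return them as a four-element list."""
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError("latitudes must satisfy -90 <= lat_min < lat_max <= 90")
    if not (-180 <= lon_min < lon_max <= 180):
        raise ValueError("longitudes must satisfy -180 <= lon_min < lon_max <= 180")
    # Order assumed here: [latMin, lonMin, latMax, lonMax]
    return [lat_min, lon_min, lat_max, lon_max]


# Same area as in the cell above
area = make_bbox(52.839976, 7.474823, 53.098018, 7.911530)
print(area)
```

The validated list can then be handed to the filter, for example `BBox(area)`.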

If you already know the identifier of the dataset you want, you can fetch it directly with the getrecordbyid() method. It corresponds to the GetRecordById operation of the CSW.

In [11]:
csw.getrecordbyid(id=['EE85FE8F-BD05-4A6D-813B-6ABC4514B18B'])
csw.records['EE85FE8F-BD05-4A6D-813B-6ABC4514B18B'].title


'Naturschutzgebiete (NSG)'
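
To see roughly what such a request looks like behind the Python call, it can be written out as a plain URL with key/value parameters, in the same style as the GetCapabilities link. This is a sketch under assumptions: the parameter set shown (`elementSetName=full` in particular) is an illustrative choice, and OWSLib builds and sends its own request, which may look different.

```python
from urllib.parse import urlencode

CSW_URL = 'https://gdk.gdi-de.org/gdi-de/srv/ger/csw'

params = {
    'service': 'CSW',
    'version': '2.0.2',
    'request': 'GetRecordById',
    'id': 'EE85FE8F-BD05-4A6D-813B-6ABC4514B18B',
    'elementSetName': 'full',  # assumption: ask for the complete record
}

# Join the endpoint and the encoded parameters with a "?"
request_url = f"{CSW_URL}?{urlencode(params)}"
print(request_url)
```

Opening such a URL in a browser would return the record as XML, provided the server accepts this form of the request.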

Now that you know how to search a CSW with OWSLib, try to find the dataset from the last exercise in the geoportal again, this time with the getrecords2() method.

In [None]:
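# define filters
# TODO: replace the placeholder keyword with your own filter(s)
my_filter = PropertyIsLike('csw:AnyText', '%your keyword%')

# GetRecords() operation: getrecords2() returns None and stores the
# results on the csw object itself
csw.getrecords2(constraints=[my_filter], maxrecords=10)

# If there are results, print their titles
if csw.records:
    # Evaluate the results
    for rec in csw.records.values():
        print(rec.title)
else:
    print("No results found. Please try adjusting your filters.")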

-------
## Testing  

-------


In [12]:
csw.getrecords2()
print(csw.results)

{'matches': 600007, 'returned': 10, 'nextrecord': 11}


In [None]:
# The capabilities were already requested when the csw object was created,
# so no separate call is needed here.

# Print the available search criteria
for op in csw.operations:
    if 'GetRecords' in op.name:
        print(f"\nSearch criteria for {op.name}:")
        for constraint in op.constraints:
            print(f"  {constraint.name}")