# 3.4 Accessing metadata with Python from a Jupyter Notebook

Now that we know how geoportals and QGIS work with OGC CSW catalog services, let's take a look behind the scenes and write our own software that interacts with catalog services.  
## 3.4.1 Installing OWSLib

Before we start, we need to set up the components required for this, in particular OWSLib.
OWSLib is a Python library for client programming with OGC Web Services such as WMS, WFS, or CSW. After installing the library, we import the `CatalogueServiceWeb` class and some filter classes from OWSLib for later use. If you want to learn more about OWSLib, you can find detailed documentation at https://owslib.readthedocs.io/en/latest/, and you can use generative AI tools such as ChatGPT to help you understand the code and draft your own code experiments.

In [None]:
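%pip install OWSLib
# Client class for OGC Catalogue Services (CSW)
from owslib.csw import CatalogueServiceWeb
# Filter Encoding classes used to build query constraints
from owslib.fes import PropertyIsEqualTo, BBox, PropertyIsLike, And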

## 3.4.2 Creating a CatalogueServiceWeb object and connecting to the catalog

Now we need the URL of a CSW catalog service. Let's use the catalog service of the GDI-DE Geodatenkatalog that we used in the previous examples. Its URL is https://gdk.gdi-de.org/gdi-de/srv/ger/csw

You can find this URL by searching for “Geodatenkatalog” in the Geodatenkatalog itself and examining the links provided with the metadata record.

We use this URL to create a `CatalogueServiceWeb` object, which connects to the catalog service.

In [None]:
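CSW_URL = 'https://gdk.gdi-de.org/gdi-de/srv/ger/csw'
# Creating the object requests the service's capabilities document
csw = CatalogueServiceWeb(CSW_URL)

# Print the service title from the capabilities to confirm the connection
print(csw.identification.title)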


***Yeah we got a connection!***
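Behind the scenes, creating the `CatalogueServiceWeb` object sends a GetCapabilities request to the service. As a rough sketch (standard library only, no network access), the equivalent key-value-pair (KVP) URL for CSW 2.0.2 can be built like this. `build_getcapabilities_url` is a hypothetical helper for illustration, not part of OWSLib.

```python
from urllib.parse import urlencode


def build_getcapabilities_url(base_url, version="2.0.2"):
    """Build a CSW GetCapabilities request URL using KVP encoding (hypothetical helper)."""
    params = {
        "service": "CSW",              # the OGC service type
        "version": version,            # CSW version; OWSLib also defaults to 2.0.2
        "request": "GetCapabilities",  # the operation to invoke
    }
    return f"{base_url}?{urlencode(params)}"


url = build_getcapabilities_url("https://gdk.gdi-de.org/gdi-de/srv/ger/csw")
print(url)
# → https://gdk.gdi-de.org/gdi-de/srv/ger/csw?service=CSW&version=2.0.2&request=GetCapabilities
```

Opening the printed URL in a browser should show the raw capabilities XML that OWSLib parses for you.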

## 3.4.3 Using CSW operations

In the first sections of the tutorial we learned about the operations of a CSW. Can you remember any besides the GetCapabilities() operation? If not, here is a reminder of the main operations:

1. GetCapabilities(): This operation retrieves the service metadata of the CSW, including supported operations, output schemas, and filter capabilities.

2. DescribeRecord(): This operation retrieves the schema of the record types the catalog supports, i.e. which metadata elements a record can contain.

3. GetRecords(): This operation searches for metadata records that match specific criteria such as keywords, spatial extents, or time intervals.

4. GetRecordById(): This operation retrieves metadata records by their unique identifier.

5. Transaction(): This operation adds, updates, or deletes metadata records in the CSW.


**Let's take a look at the operations that the catalog service supports:**

In [None]:
# Collect the names of all operations advertised in the capabilities document
operations = [op.name for op in csw.operations]
print(operations)
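The operation names come from the `ows:OperationsMetadata` section of the capabilities document, and `csw.identification.title` comes from `ows:ServiceIdentification`. A standard-library sketch on a heavily shortened, made-up capabilities document (real responses are much longer) shows roughly where these values live; `summarize_capabilities` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespaces used by CSW 2.0.2 capabilities documents (OWS Common 1.0)
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ows": "http://www.opengis.net/ows",
}

# Made-up, shortened capabilities document for illustration only
SAMPLE_CAPABILITIES = """\
<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                  xmlns:ows="http://www.opengis.net/ows" version="2.0.2">
  <ows:ServiceIdentification>
    <ows:Title>Example Catalog</ows:Title>
  </ows:ServiceIdentification>
  <ows:OperationsMetadata>
    <ows:Operation name="GetCapabilities"/>
    <ows:Operation name="DescribeRecord"/>
    <ows:Operation name="GetRecords"/>
    <ows:Operation name="GetRecordById"/>
  </ows:OperationsMetadata>
</csw:Capabilities>
"""


def summarize_capabilities(xml_text):
    """Return the service title and the names of all advertised operations."""
    root = ET.fromstring(xml_text)
    title = root.findtext("ows:ServiceIdentification/ows:Title", namespaces=NS)
    operations = [op.get("name") for op in root.findall("ows:OperationsMetadata/ows:Operation", NS)]
    return title, operations


title, operations = summarize_capabilities(SAMPLE_CAPABILITIES)
print(title)       # → Example Catalog
print(operations)  # → ['GetCapabilities', 'DescribeRecord', 'GetRecords', 'GetRecordById']
```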

### 3.4.3.1 Using GetRecords for querying Metadata

The GetRecords() operation of the CSW is implemented by OWSLib's `getrecords2()` method. With its `constraints` argument you can define filters on the result set, and with `maxrecords` you set the maximum number of records returned (the default is 10).
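`getrecords2()` builds an XML-encoded GetRecords request and sends it with HTTP POST. CSW 2.0.2 also defines a key-value-pair (GET) encoding, which can be handy for trying out queries in a browser. A sketch of such a URL, assuming the server accepts KVP GetRecords requests with a CQL constraint (support varies between catalogs, and some servers also expect a `namespace` parameter); `build_getrecords_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_getrecords_url(base_url, cql, max_records=10, start_position=1):
    """Build a KVP GetRecords URL with a CQL text constraint (hypothetical helper)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",         # query the Dublin Core record type
        "resultType": "results",           # return records, not just the hit count
        "elementSetName": "summary",       # brief, summary or full
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql,                 # assumption: the server understands this CQL
        "maxRecords": max_records,
        "startPosition": start_position,   # CSW start positions count from 1
    }
    return f"{base_url}?{urlencode(params)}"


url = build_getrecords_url(
    "https://gdk.gdi-de.org/gdi-de/srv/ger/csw",
    "AnyText LIKE '%Naturschutzgebiete%'",
    max_records=20,
)
print(url)
```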

To create filters, you can use classes such as `PropertyIsLike()` or `PropertyIsEqualTo()`, which we imported at the beginning. Their string arguments specify which element to search and which value to look for. As elements you can use, for example:
- `csw:Title`: searches the title of the records
- `csw:AnyText`: finds all records that contain the given string anywhere in their metadata
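In a `PropertyIsLike()` pattern such as `'%Naturschutzgebiete%'`, `%` matches any sequence of characters and `_` exactly one character (these are OWSLib's defaults for the `wildCard` and `singleChar` arguments). A local sketch that emulates this matching with regular expressions; the real matching happens on the server and may differ, for example in case sensitivity. `like_to_regex` and `like_matches` are hypothetical helpers.

```python
import re


def like_to_regex(pattern, wild_card="%", single_char="_", escape_char="\\"):
    """Translate a PropertyIsLike-style pattern into a regular expression string."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape_char and i + 1 < len(pattern):
            # The character after the escape character is taken literally
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == wild_card:
            parts.append(".*")  # any sequence of characters, including none
        elif ch == single_char:
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def like_matches(pattern, text, match_case=True):
    """Return True if the whole text matches the pattern."""
    flags = re.DOTALL if match_case else re.DOTALL | re.IGNORECASE
    return re.fullmatch(like_to_regex(pattern), text, flags) is not None


print(like_matches("%Naturschutzgebiete%", "Naturschutzgebiete in Niedersachsen"))  # → True
print(like_matches("%Naturschutzgebiete%", "Landschaftsschutzgebiete"))            # → False
```

Without the surrounding `%`, the pattern would only match titles that are exactly "Naturschutzgebiete".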

Then run the query and inspect the `results` attribute, which summarizes how many records matched and how many were returned:

In [None]:
# '%' is the wildcard: match every record whose title contains "Naturschutzgebiete"
NSG_filter = PropertyIsLike('csw:Title', '%Naturschutzgebiete%')
csw.getrecords2(constraints=[NSG_filter], maxrecords=20)
csw.results
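`csw.results` is a dictionary: `matches` is the total number of hits on the server, `returned` is how many records came back in this response, and `nextrecord` is the start position of the next page (0 when there are no more records). When `matches` is larger than `maxrecords`, you have to page through the results. A sketch of such a paging loop, with a fake fetch function standing in for the catalog so it runs offline; `fetch_all` and `fake_fetch_page` are hypothetical helpers, and the OWSLib wiring mentioned in the comment is an untested assumption.

```python
def fetch_all(fetch_page, page_size=10):
    """Collect records from all result pages.

    fetch_page(start, size) must return (records, next_record), where next_record
    follows CSW semantics: 0 means there are no more records. With OWSLib it could
    (assumption) call csw.getrecords2(constraints=..., startposition=start,
    maxrecords=size) and return (list(csw.records.values()), csw.results['nextrecord']).
    """
    collected = []
    start = 1  # CSW start positions count from 1
    while True:
        records, next_record = fetch_page(start, page_size)
        collected.extend(records)
        if next_record == 0 or not records:
            break
        start = next_record
    return collected


# Made-up result set of 25 records to simulate the server
ALL_TITLES = [f"Record {n}" for n in range(1, 26)]


def fake_fetch_page(start, size):
    page = ALL_TITLES[start - 1 : start - 1 + size]
    next_start = start + len(page)
    next_record = next_start if next_start <= len(ALL_TITLES) else 0
    return page, next_record


titles = fetch_all(fake_fetch_page, page_size=10)
print(len(titles))  # → 25
```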

### 3.4.3.2 Iterating through the Result Set

To get more informative output, such as the title and abstract of each record, loop over the records and print them:

In [None]:
for rec in csw.records:  # csw.records maps record identifiers to record objects
    print(csw.records[rec].title)
    print(csw.records[rec].abstract)
    print()  # blank line between records

### 3.4.3.3 Filtering by BBox

You can also filter with `BBox()`, which restricts the search to datasets in a specific area. `BBox()` takes a list of four coordinates describing two corner points, the lower-left and the upper-right corner of the area, in this order: [latMin, lonMin, latMax, lonMax].
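Behind the scenes, the bounding box becomes an OGC Filter Encoding `ogc:BBOX` element that compares your rectangle with each record's `ows:BoundingBox`: the first two numbers form the lower corner and the last two the upper corner. A standard-library sketch that builds roughly this XML and checks the corner order; `bbox_filter_xml` is a hypothetical helper, not OWSLib's implementation.

```python
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"
ET.register_namespace("ogc", OGC)
ET.register_namespace("gml", GML)


def bbox_filter_xml(lat_min, lon_min, lat_max, lon_max):
    """Build an ogc:BBOX filter element for [latMin, lonMin, latMax, lonMax]."""
    if lat_min >= lat_max or lon_min >= lon_max:
        raise ValueError("expected the lower-left corner before the upper-right corner")
    bbox = ET.Element(f"{{{OGC}}}BBOX")
    # The queryable that holds each record's extent
    ET.SubElement(bbox, f"{{{OGC}}}PropertyName").text = "ows:BoundingBox"
    envelope = ET.SubElement(bbox, f"{{{GML}}}Envelope")
    ET.SubElement(envelope, f"{{{GML}}}lowerCorner").text = f"{lat_min} {lon_min}"
    ET.SubElement(envelope, f"{{{GML}}}upperCorner").text = f"{lat_max} {lon_max}"
    return ET.tostring(bbox, encoding="unicode")


print(bbox_filter_xml(52.839976, 7.474823, 53.098018, 7.911530))
```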

In [None]:
# Be patient: filtering by bbox may take 10 to 30 seconds.
bbox_query = BBox([52.839976, 7.474823, 53.098018, 7.911530])
csw.getrecords2(constraints=[bbox_query])
csw.results

## 3.4.4 Using the GetRecordById Operation

If you already know the identifier of the dataset you want and would like to check whether it is in the catalog, you can retrieve it directly with the `getrecordbyid()` method, which implements the CSW GetRecordById operation.
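The equivalent HTTP request is again simple. A sketch of a KVP GetRecordById URL, assuming CSW 2.0.2 and the ISO 19139 output schema (`http://www.isotc211.org/2005/gmd`) so that the full ISO record is returned rather than the Dublin Core version; whether a given catalog supports this output schema is an assumption worth checking in its capabilities. `build_getrecordbyid_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ISO_GMD = "http://www.isotc211.org/2005/gmd"  # ISO 19139 output schema (assumed supported)


def build_getrecordbyid_url(base_url, record_ids, element_set="full", output_schema=ISO_GMD):
    """Build a KVP GetRecordById URL for one or more identifiers (hypothetical helper)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        "id": ",".join(record_ids),     # identifiers are comma-separated
        "elementSetName": element_set,  # brief, summary or full
        "outputSchema": output_schema,
    }
    return f"{base_url}?{urlencode(params)}"


url = build_getrecordbyid_url(
    "https://gdk.gdi-de.org/gdi-de/srv/ger/csw",
    ["EE85FE8F-BD05-4A6D-813B-6ABC4514B18B"],
)
print(url)
```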

In [None]:
csw.getrecordbyid(id=['EE85FE8F-BD05-4A6D-813B-6ABC4514B18B'])
csw.records['EE85FE8F-BD05-4A6D-813B-6ABC4514B18B'].title

## 3.4.5 Task: find the NSG dataset

Now that you know how to search a CSW with OWSLib, try to find the dataset that we found in the geoportal in the last exercise, this time with the `getrecords2()` method. Use the next cell, which contains incomplete Python code, as a starting point.

(Be aware that simply running the next cell as it is will result in error messages.)

In [None]:
# Define filter constraints
# Tip 1: use the 'And' constraint: combined_filter = And([filter1, filter2, filter3])
# Tip 2: use 'csw:Title', 'csw:AnyText' or 'csw:subject' (aka keywords) for filtering

# GetRecords() method
csw.getrecords2(constraints=[combined_filter], maxrecords=20)

# If there are results, print their titles
if csw.records:  # csw.records is a dictionary; it is empty when nothing matched
    for rec in csw.records.values():
        print(rec.title)
else:
    print("No results found. Please try to improve your filters.")

## End of this exercise, please return to the main learning material