# PMD Demonstrator: infrastructure components and first steps
## PMD-Server
The PMD develops and maintains several services that let users carry out complex scientific tasks on a web-based digital infrastructure. The services are packaged as Docker applications:
- a web server application (`nginx`) that routes calls to service URLs to the respective app. A separate application (running `certbot`) can manage TLS/SSL certificates automatically; independently generated certificates can be used as well.
- `pyironhub`: a JupyterHub service containing pre-configured kernels/images. These images contain ready-to-use environments for typical tasks such as data analysis or materials simulations. Using `pyiron`, these tasks can be formalized, performed in a systematic, repeatable manner and shared with others.
- `ontodocker`: a service providing an interface to a triplestore in which semantically annotated data can be stored. `ontodocker` adds a layer that manages the `SPARQL` and data endpoints pointing to these datasets, as well as the user management.
- `PMD-CKAN`: ...
- Services related to the **PMD-mesh**: ...

### PMD-C: centrally hosted services
### User Management
#### "official" MaterialDigital-SSO
#### Self-hosted Identity providers
### PMD-Mesh

## `pmd_demo_tools`
### `mesh_tools`
### `sparql_tools`
### `query_collection`

## Requirements
### Access to services
The user needs access to a (PMD-)Server hosting services that are connected to the PMD-Mesh. Access to the web interfaces (e.g. via HTTPS) has to be managed by the respective IT department (e.g. via firewall rules).

## Mesh-Participant registries
Build a local registry of **servers registered in the mesh**:

In [4]:
%load_ext autoreload
# reload modules automatically before each cell
%autoreload 2

from pmd_demo_tools import mesh_tools

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [18]:
partners_by_zone = mesh_tools.mesh_listing_namespace(key="wg_mesh_dns_zone")
partners_by_zone.__dict__

{'mpi_susmat_pmd_internal': namespace(company='MPISusMat',
           contact='m.bruns@mpie.de',
           dns=['pmds-mpcdf-1.ydns.eu'],
           wg_mesh_dns_zone='mpi-susmat.pmd.internal',
           wg_mesh_subnet='fd51:0:8:1::/64'),
 'c_pmd_internal': namespace(company='Platform Material Digital',
           contact='info@material-digital.de',
           dns=['pmdc.material-digital.de'],
           wg_mesh_dns_zone='c.pmd.internal',
           wg_mesh_subnet='fd51:0:0:1::/64'),
 'aisec_pmd_internal': namespace(company='Fraunhofer AISEC',
           contact='pmd@aisec.fraunhofer.de',
           dns=['material-digital.aisec.fraunhofer.de'],
           wg_mesh_dns_zone='aisec.pmd.internal',
           wg_mesh_subnet='fd51:0:2:1::/64'),
 'isc_dev_pmd_internal': namespace(company='Fraunhofer ISC',
           contact='simon.stier@isc.fraunhofer.de',
           dns=['pmd-s.open-semantic-lab.org'],
           wg_mesh_dns_zone='isc-dev.pmd.internal',
           wg_mesh_subnet='fd51:0:7:1:

The registry's content can be printed more readably with `RecursiveNamespace.show()`:

In [19]:
partners_by_zone.show()

aisec_pmd_internal:
    company: 'Fraunhofer AISEC'
    contact: 'pmd@aisec.fraunhofer.de'
    dns:
        - 'material-digital.aisec.fraunhofer.de'
    wg_mesh_dns_zone: 'aisec.pmd.internal'
    wg_mesh_subnet: 'fd51:0:2:1::/64'
bam_s1_pmd_internal:
    company: 'BAM'
    contact: 'philipp.beckmann@bam.de'
    dns:
        - 'pmd-s1.bam.de'
    wg_mesh_dns_zone: 'bam-s1.pmd.internal'
    wg_mesh_subnet: 'fd51:0:3:1::/64'
c_pmd_internal:
    company: 'Platform Material Digital'
    contact: 'info@material-digital.de'
    dns:
        - 'pmdc.material-digital.de'
    wg_mesh_dns_zone: 'c.pmd.internal'
    wg_mesh_subnet: 'fd51:0:0:1::/64'
glassomer_pmd_internal:
    company: 'Glassomer GmbH'
    contact: 'bastian.rapp@glassomer.com'
    dns:
        - 'pmd.glassomer.com'
    wg_mesh_dns_zone: 'glassomer.pmd.internal'
    wg_mesh_subnet: 'fd51:0:11:1::/64'
isc_dev_pmd_internal:
    company: 'Fraunhofer ISC'
    contact: 'simon.stier@isc.fraunhofer.de'
    dns:
        - 'pmd-s.open-seman

It is often more convenient to group by company rather than by mesh DNS zone:

In [20]:
partners_by_company = mesh_tools.mesh_listing_namespace(key="company")

ValueError: Duplicate index key after sanitization: 'KIT' vs 'KIT' -> 'KIT'

This fails with a name collision: several server instances may be hosted by the same company, which cannot be represented in a flat registry keyed by company. The solution is hierarchical grouping: all servers of a company are listed under the same company key. Uninformative parts of sanitized keys (e.g. the `_pmd_internal` suffix that results from sanitizing `<server>.pmd.internal`) can optionally be trimmed.

The resulting information is then structured as `partners.<company>.<server>.<info>`, where `<info>` can be `company`, `contact`, `dns` (the external address of the server), `wg_mesh_dns_zone` (the internal address (space) of the server) or `wg_mesh_subnet` (the server's IPv6 subnet).
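
To illustrate why flat grouping by company collides while nested grouping does not, here is a minimal sketch on plain dictionaries. `sanitize_key` and `group_by_company` are hypothetical stand-ins, not the functions of `pmd_demo_tools`; the real sanitizer evidently does more (for example, it transliterates umlauts), and the `kit-2` zone name is assumed for illustration.

```python
import re
from collections import defaultdict


def sanitize_key(raw: str) -> str:
    """Hypothetical stand-in: replace every non-word character with '_'."""
    return re.sub(r"\W", "_", raw)


def group_by_company(records, trim_suffix="_pmd_internal"):
    """Nest records as {company_key: {server_key: record}}."""
    grouped = defaultdict(dict)
    for rec in records:
        company_key = sanitize_key(rec["company"])
        server_key = sanitize_key(rec["wg_mesh_dns_zone"])
        # Optionally drop the uninformative suffix of the sanitized zone name.
        if trim_suffix and server_key.endswith(trim_suffix):
            server_key = server_key[: -len(trim_suffix)]
        grouped[company_key][server_key] = rec
    return dict(grouped)


# Two servers share the company "KIT": a flat {company: record} mapping
# would have to overwrite one of them, the nested mapping keeps both.
records = [
    {"company": "KIT", "wg_mesh_dns_zone": "kit-2.pmd.internal"},  # assumed zone
    {"company": "KIT", "wg_mesh_dns_zone": "kit-3.pmd.internal"},
    {"company": "Fraunhofer IWM", "wg_mesh_dns_zone": "iwm.pmd.internal"},
]
grouped = group_by_company(records)
print(sorted(grouped))         # ['Fraunhofer_IWM', 'KIT']
print(sorted(grouped["KIT"]))  # ['kit_2', 'kit_3']
```

The server key is derived from the DNS zone because it is unique per server, whereas the company name is not.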

In [21]:
partners = mesh_tools.mesh_namespace_grouped_by_company(server_key="wg_mesh_dns_zone", trim_pmd_internal=True)

In [22]:
partners.KIT.kit_2.company

'KIT'

In [23]:
partners.KIT.kit_3.contact

'pmd@kit.edu'

In [24]:
partners.KIT.kit_3.dns

['kit-pmd-3.ydns.eu']

In [25]:
partners.KIT.kit_3.wg_mesh_dns_zone

'kit-3.pmd.internal'

In [26]:
partners.KIT.kit_3.wg_mesh_subnet

'fd51:0:1:3::/64'
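
The `wg_mesh_subnet` value is an ordinary IPv6 network in CIDR notation, so it can be inspected with Python's standard `ipaddress` module. The following sketch is only an illustration; the host addresses are made up, and nothing here is part of `pmd_demo_tools`.

```python
import ipaddress

# Subnet string as returned for partners.KIT.kit_3.wg_mesh_subnet
subnet = ipaddress.ip_network("fd51:0:1:3::/64")

# Hypothetical host addresses, chosen only for illustration
inside = ipaddress.ip_address("fd51:0:1:3::1")
outside = ipaddress.ip_address("fd51:0:9:1::1")

print(subnet.prefixlen)   # 64
print(inside in subnet)   # True
print(outside in subnet)  # False

# fc00::/7 is the IPv6 unique local address range, i.e. not globally routed
print(subnet.subnet_of(ipaddress.ip_network("fc00::/7")))  # True
```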

Most functions in this package assume the following hierarchy: company -> servers -> services:

In [27]:
partners.show()

BAM:
    bam_s1:
        company: 'BAM'
        contact: 'philipp.beckmann@bam.de'
        dns:
            - 'pmd-s1.bam.de'
        wg_mesh_dns_zone: 'bam-s1.pmd.internal'
        wg_mesh_subnet: 'fd51:0:3:1::/64'
Fraunhofer_AISEC:
    aisec:
        company: 'Fraunhofer AISEC'
        contact: 'pmd@aisec.fraunhofer.de'
        dns:
            - 'material-digital.aisec.fraunhofer.de'
        wg_mesh_dns_zone: 'aisec.pmd.internal'
        wg_mesh_subnet: 'fd51:0:2:1::/64'
Fraunhofer_ISC:
    isc_dev:
        company: 'Fraunhofer ISC'
        contact: 'simon.stier@isc.fraunhofer.de'
        dns:
            - 'pmd-s.open-semantic-lab.org'
        wg_mesh_dns_zone: 'isc-dev.pmd.internal'
        wg_mesh_subnet: 'fd51:0:7:1::/64'
Fraunhofer_IWM:
    iwm:
        company: 'Fraunhofer IWM'
        contact: 'rasmus.antons@iwm.fraunhofer.de'
        dns:
            - 'pmd.iwm.fraunhofer.de'
        wg_mesh_dns_zone: 'iwm.pmd.internal'
        wg_mesh_subnet: 'fd51:0:9:1::/64'
Glassomer_Gmb

### Adding services to the registry
The next step is to add information on the **services** hosted on each server.

In [28]:
%%capture cap
mesh_tools.attach_services_in_place(partners)

In [29]:
partners.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.show()

iwt:
    company: 'Leibniz-Institut für Werkstofforientierte Technologien - IWT'
    contact: 'sagehorn@iwt-bremen.de'
    dns:
        - 'pmd.iwt-bremen.de'
    services:
        ontodocker:
            address: 'ontodocker.iwt.pmd.internal'
            name: 'ontodocker'
            token: <SECRET>
    wg_mesh_dns_zone: 'iwt.pmd.internal'
    wg_mesh_subnet: 'fd51:0:10:1::/64'


After attaching the services to the server registry, each server has an additional field `services`. It lists all services with a registered DNS name in the mesh. Note that some services are excluded by default:
- jupyterhub user containers
- the "app" service (helps admins debug mesh issues)
- "ns"
- "certbot" - manages certificates
- "nginx" - webserver
- "wg" - runs the local wireguard infrastructure
- "uptime-kuma" - logs pmdc uptime (@pmdc)
- "mesh-listing" - enables listing mesh servers and services (@pmdc)
- "ca" - certificate authority (@pmdc)

These services can optionally be included by calling `mesh_tools.attach_services_in_place(partners, exclude_patterns=[])`. `exclude_patterns` can be a list of simple [unix shell-style patterns](https://docs.python.org/3/library/fnmatch.html) and defaults to `mesh_tools.DEFAULT_SERVICE_EXCLUDES`:

In [30]:
mesh_tools.DEFAULT_SERVICE_EXCLUDES

['*app*',
 'jupyter-*',
 'ns',
 '*certbot*',
 '*nginx*',
 'wg',
 'uptime-kuma*',
 'mesh-listing',
 'ca']
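
How such shell-style patterns act on service names can be tried out with the standard `fnmatch` module. This is a sketch under assumptions: the service names are examples, `is_excluded` is a hypothetical helper, and whether `mesh_tools` matches case-sensitively is not stated here (`fnmatchcase` is used to keep the result platform-independent).

```python
from fnmatch import fnmatchcase

# Copy of the default patterns printed above
excludes = ['*app*', 'jupyter-*', 'ns', '*certbot*', '*nginx*',
            'wg', 'uptime-kuma*', 'mesh-listing', 'ca']


def is_excluded(name, patterns):
    """Hypothetical helper: True if the name matches any exclude pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Example service names; 'jupyter-alice' stands for a user container
names = ["ontodocker", "nginx", "jupyter-alice",
         "jupyterhub-jupyterhub-1", "wg", "ckan"]

kept = [name for name in names if not is_excluded(name, excludes)]
print(kept)  # ['ontodocker', 'jupyterhub-jupyterhub-1', 'ckan']
```

Note that `'jupyter-*'` only matches names starting with `jupyter-`, so the hub service `jupyterhub-jupyterhub-1` is kept, and that `'ca'` has no wildcard and therefore only matches a service named exactly `ca`.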

Each entry under `services` has three fields:
- `name` - the "naked" service name without TLDs
- `address` - the full address (FQDN) of the service
- `token` - a field for storing any JWT needed to access the service API.

The token field has to be filled manually because, for now, such tokens have to be obtained individually from an admin of the respective service. I collected my tokens in a `JSON` file with the structure `<company>.<service>.token`. This file can have any structure in general, but it makes sense to relate it to the fields obtained from the server/service lookup. To read the token file, we can again use the `RecursiveNamespace` class, which makes the content dot-accessible and enables tab completion:

#### Web Token

In [31]:
import json

with open('../secrets/tokens.json') as f:
    tokens = json.load(f, object_hook=mesh_tools.namespace_object_hook())
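
The `object_hook` mechanism itself is part of the standard `json` module: the hook is called for every decoded JSON object, innermost first. The sketch below uses `types.SimpleNamespace` as a stand-in for `RecursiveNamespace` and a dummy token; the file layout is an assumption that mirrors the `<company>.<service>.token` structure described above.

```python
import json
from types import SimpleNamespace

# Dummy content with the same nesting as the token file (no real secret)
raw = '{"Fraunhofer_IWM": {"ontodocker": {"token": "dummy-token"}}}'

# Every JSON object is converted into a namespace, which gives dot access
tokens_demo = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

print(tokens_demo.Fraunhofer_IWM.ontodocker.token)  # dummy-token
```

Dot access only works for keys that are valid Python identifiers, which is why it helps to use the sanitized company and service keys in the token file.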

In [32]:
%%capture cap
tokens.Fraunhofer_IWM.ontodocker.token

In [33]:
partners.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.iwt.services.ontodocker.token = tokens.Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT.ontodocker.token
partners.Fraunhofer_IWM.iwm.services.ontodocker.token = tokens.Fraunhofer_IWM.ontodocker.token
partners.KIT.kit_3.services.ontodocker_proxy.token = tokens.KIT.ontodocker_proxy.token
partners.MPISusMat.mpi_susmat.services.ontodocker.token = tokens.MPISusMat.ontodocker.token

**CAUTION**: from now on, sensitive information is stored in the registries. Be careful when printing their content to the screen if the notebook is going to be shared in any way! The `RecursiveNamespace.show()` method masks the values of `"token"` fields by default.

In [34]:
partners.show()

BAM:
    bam_s1:
        company: 'BAM'
        contact: 'philipp.beckmann@bam.de'
        dns:
            - 'pmd-s1.bam.de'
        services:
            ontodocker_internal:
                address: 'ontodocker-internal.bam-s1.pmd.internal'
                name: 'ontodocker-internal'
                token: <SECRET>
        wg_mesh_dns_zone: 'bam-s1.pmd.internal'
        wg_mesh_subnet: 'fd51:0:3:1::/64'
Fraunhofer_AISEC:
    aisec:
        company: 'Fraunhofer AISEC'
        contact: 'pmd@aisec.fraunhofer.de'
        dns:
            - 'material-digital.aisec.fraunhofer.de'
        services:
        wg_mesh_dns_zone: 'aisec.pmd.internal'
        wg_mesh_subnet: 'fd51:0:2:1::/64'
Fraunhofer_ISC:
    isc_dev:
        company: 'Fraunhofer ISC'
        contact: 'simon.stier@isc.fraunhofer.de'
        dns:
            - 'pmd-s.open-semantic-lab.org'
        services:
        wg_mesh_dns_zone: 'isc-dev.pmd.internal'
        wg_mesh_subnet: 'fd51:0:7:1::/64'
Fraunhofer_IWM:
    iwm:
      

The registry can also be dumped to a JSON file (tokens are skipped by default here as well):

In [35]:
partners.dump_json("partners.json")
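
The idea of a token-safe dump can be sketched independently of the package. The example below is not the implementation of `dump_json`; `to_plain` is a hypothetical helper, `SimpleNamespace` stands in for `RecursiveNamespace`, and replacing the token by a `<SECRET>` placeholder (rather than dropping the field) is an assumption.

```python
import json
from types import SimpleNamespace


def to_plain(obj, redact=("token",)):
    """Convert nested namespaces to dicts, masking the redacted fields."""
    if isinstance(obj, SimpleNamespace):
        return {
            key: "<SECRET>" if key in redact else to_plain(value, redact)
            for key, value in vars(obj).items()
        }
    if isinstance(obj, list):
        return [to_plain(item, redact) for item in obj]
    return obj


# Small registry fragment with a dummy token
service = SimpleNamespace(name="ontodocker",
                          address="ontodocker.iwt.pmd.internal",
                          token="dummy-token")
server = SimpleNamespace(dns=["pmd.iwt-bremen.de"],
                         services=SimpleNamespace(ontodocker=service))

text = json.dumps(to_plain(server), indent=2)
print(text)  # the token value appears as "<SECRET>"
```

Masking happens on the converted copy, so the token stored in the in-memory registry stays usable for API calls.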

Iteration can be done using the `dict` mapping (`partners.__dict__`) and `getattr()` (see below).  
Once again a reminder: **If a key contains characters that are not valid in a Python identifier, these characters are replaced by an underscore!** See `pmd_mesh-demonstrator/helpers.py: canonize_string(), RecursiveNamespace`.
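
A minimal sketch of such an iteration, using `types.SimpleNamespace` as a stand-in for `RecursiveNamespace` (the assumption being that both expose their fields through `__dict__`). The small registry is built by hand from values shown earlier.

```python
from types import SimpleNamespace

# Hand-built stand-in for a registry: company -> server -> info
registry = SimpleNamespace(
    KIT=SimpleNamespace(
        kit_3=SimpleNamespace(dns=["kit-pmd-3.ydns.eu"],
                              wg_mesh_dns_zone="kit-3.pmd.internal"),
    ),
    BAM=SimpleNamespace(
        bam_s1=SimpleNamespace(dns=["pmd-s1.bam.de"],
                               wg_mesh_dns_zone="bam-s1.pmd.internal"),
    ),
)

zones = {}
# vars(obj) returns obj.__dict__, i.e. the mapping of field names to values
for company_key, company in vars(registry).items():
    for server_key in vars(company):
        # getattr() resolves a field whose name is only known at runtime
        server = getattr(company, server_key)
        zones[f"{company_key}.{server_key}"] = server.wg_mesh_dns_zone

print(zones)
```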

### Reducing the registry to a subset of participants

There are two ways to reduce the registry. The first is `mesh_tools.select_toplevel()`, which takes a `RecursiveNamespace` instance and keeps only those top-level keys that appear in a provided list of keys:

In [36]:
selection = ["Leibniz_Institut_fuer_Werkstofforientierte_Technologien_IWT", "Fraunhofer_IWM", "KIT", "MPISusMat"]
partners_reduced = mesh_tools.select_toplevel(partners, selection, deepcopy=True)
partners_reduced.show()

Fraunhofer_IWM:
    iwm:
        company: 'Fraunhofer IWM'
        contact: 'rasmus.antons@iwm.fraunhofer.de'
        dns:
            - 'pmd.iwm.fraunhofer.de'
        services:
            ckan:
                address: 'ckan.iwm.pmd.internal'
                name: 'ckan'
                token: <SECRET>
            fuseki:
                address: 'fuseki.iwm.pmd.internal'
                name: 'fuseki'
                token: <SECRET>
            jupyterhub_jupyterhub_1:
                address: 'jupyterhub-jupyterhub-1.iwm.pmd.internal'
                name: 'jupyterhub-jupyterhub-1'
                token: <SECRET>
            ontodocker:
                address: 'ontodocker.iwm.pmd.internal'
                name: 'ontodocker'
                token: <SECRET>
        wg_mesh_dns_zone: 'iwm.pmd.internal'
        wg_mesh_subnet: 'fd51:0:9:1::/64'
KIT:
    kit_2:
        company: 'KIT'
        contact: 'pmd@kit.edu'
        dns:
            - 'kit-pmd-2.ydns.eu'
        services:
      

The other option is to reduce the registry by a list of service names. These need not be exact matches; they can be given as "shell-style" patterns. By default, a server is kept if at least one of its services matches a pattern, and it then retains all of its services.

In [37]:
patterns = ["*ontodocker*", "*hub*", "*pyiron*"]
partners_selected_services = mesh_tools.select_by_services(partners, patterns, deepcopy=True)
partners_selected_services.show()

BAM:
    bam_s1:
        company: 'BAM'
        contact: 'philipp.beckmann@bam.de'
        dns:
            - 'pmd-s1.bam.de'
        services:
            ontodocker_internal:
                address: 'ontodocker-internal.bam-s1.pmd.internal'
                name: 'ontodocker-internal'
                token: <SECRET>
        wg_mesh_dns_zone: 'bam-s1.pmd.internal'
        wg_mesh_subnet: 'fd51:0:3:1::/64'
Fraunhofer_IWM:
    iwm:
        company: 'Fraunhofer IWM'
        contact: 'rasmus.antons@iwm.fraunhofer.de'
        dns:
            - 'pmd.iwm.fraunhofer.de'
        services:
            ckan:
                address: 'ckan.iwm.pmd.internal'
                name: 'ckan'
                token: <SECRET>
            fuseki:
                address: 'fuseki.iwm.pmd.internal'
                name: 'fuseki'
                token: <SECRET>
            jupyterhub_jupyterhub_1:
                address: 'jupyterhub-jupyterhub-1.iwm.pmd.internal'
                name: 'jupyterhub-jupyterhub

Optionally, `mesh_tools.select_by_services(..., keep_all_services_on_match=False)` also removes the non-matching services from the servers that are kept.
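
The effect of the `keep_all_services_on_match` switch can be sketched on plain dictionaries. `select_services` below is a hypothetical re-creation of the behaviour as it appears from the output above, not the code of `mesh_tools.select_by_services`; server and service names are taken from the listings.

```python
from fnmatch import fnmatchcase


def select_services(servers, patterns, keep_all_services_on_match=True):
    """Keep servers with at least one service matching any pattern."""
    result = {}
    for server, services in servers.items():
        matching = [s for s in services
                    if any(fnmatchcase(s, p) for p in patterns)]
        if not matching:
            continue  # drop servers without any matching service
        # Either keep the full service list or only the matching services
        result[server] = list(services) if keep_all_services_on_match else matching
    return result


servers = {
    "iwm": ["ckan", "fuseki", "jupyterhub-jupyterhub-1", "ontodocker"],
    "aisec": [],
    "bam_s1": ["ontodocker-internal"],
}
patterns = ["*ontodocker*", "*hub*", "*pyiron*"]

print(select_services(servers, patterns))
print(select_services(servers, patterns, keep_all_services_on_match=False))
```

In the first call `iwm` keeps `ckan` and `fuseki` because other services on that server match; in the second call only `jupyterhub-jupyterhub-1` and `ontodocker` remain.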

### Alternative: Loading the participant information from a "manually" provided json
This is helpful if users simply want to use their own registry structure, built from selectively collected information.

In [38]:
with open('../secrets/participant_registry_manually.json') as f:
    partners_manually = json.load(f, object_hook=mesh_tools.namespace_object_hook())

In [39]:
partners_manually.mpi_susmat.ontodocker.address

'https://ontodocker.mpi-susmat.pmd.internal'