# Remote↔Central perfSONAR Mesh: Automated Setup & Archiving

This notebook provisions a **central VM** and multiple **remote VMs**, installs perfSONAR components on each, wires up **pSConfig** for test orchestration, and configures **local and optional central archiving** so results appear in Grafana.

## What this notebook does

* **Provisioning**

  * Creates one **central VM** and N **remote VMs**.
  * Installs on **all VMs**: `perfsonar-testpoint`, `perfsonar-archive`, and `perfsonar-grafana`.
* **Configuration**

  * **Central** publishes a **pSConfig** template describing all hosts and archives.
  * **Remote VMs** subscribe to the central template and also load a **local pSConfig** (self + central).
* **Measurements**

  * Schedules the following tests **in both directions** (A→B and B→A where applicable):

    * **Throughput:** `iperf3` (e.g., `-P 4 -t 60 -i 10 -O 10`)
    * **Latency:** `owping`, `twping`, `halfping`
    * **RTT:** `ping`, `tcpping`
    * **MTU:** `fwmtu`
    * **Clock:** `psclock`
    * **Trace:** `traceroute`
  * Test cadences are parameterized; defaults can be set at the top of the notebook.
* **Archiving**

  * **Local archive on each remote VM** via HTTP archiver → Logstash → OpenSearch.
  * **Optional central archiving** (mirrored submission) for fleet-wide dashboards.
* **Visualization**

  * Each VM runs Grafana with perfSONAR dashboards (OpenSearch or pSConfig-driven views).
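
Running tests in both directions amounts to scheduling every ordered pair of hosts. The sketch below illustrates that idea with hypothetical host names. It is not how pSConfig expands a group internally, only a quick way to see which source/destination pairs to expect in the results.

```python
from itertools import permutations

def directed_pairs(hosts):
    """Return every ordered (source, dest) pair of distinct hosts."""
    return list(permutations(hosts, 2))

# Hypothetical host names, following the notebook's naming pattern.
hosts = ["central-STAR", "remote-RUTG", "remote-NEWY"]
pairs = directed_pairs(hosts)
for src, dst in pairs:
    print(f"{src} -> {dst}")
```

For N hosts in a full mesh this yields N × (N − 1) pairs, so each additional host adds tests in both directions against every existing host.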

## Key parameters (set at the top)

* `TOTAL_NODE_CNT` (e.g., `3`): total number of VMs in the experiment, i.e., 1 central VM plus `TOTAL_NODE_CNT - 1` remote VMs. The slice-creation cell below sets it to `2`.
* `TEST_INTERVAL` (default `'10M'`): interval between test runs. Allowed values are `10M`, `2H`, `4H`, and `6H`.
* **Archive behavior:**
  * Always send to the **local** archive on each remote VM.
  * Optionally **mirror** to the central archive by setting `USE_CENTRAL_ARCHIVE = True` (default `False`).
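
pSConfig schedules express repeat intervals as ISO 8601 durations (for example `PT10M`). The following is a minimal sketch of validating a `TEST_INTERVAL` value and converting it to that form. `ALLOWED_INTERVALS` and `interval_to_iso8601` are hypothetical names introduced here; how the notebook's own builder script consumes the interval is not shown in this document.

```python
# Allowed values as listed in the parameter description above.
ALLOWED_INTERVALS = ("10M", "2H", "4H", "6H")

def interval_to_iso8601(interval):
    """Validate a TEST_INTERVAL value and return it as an ISO 8601 duration."""
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(
            f"TEST_INTERVAL must be one of {ALLOWED_INTERVALS}, got {interval!r}"
        )
    # Minutes and hours both belong to the time part, hence the "PT" prefix.
    return f"PT{interval}"

print(interval_to_iso8601("10M"))
```

Failing early on an unsupported value avoids publishing a template whose schedule would later be rejected.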

![](./images/perfsonar-psconfig.png)

## Import the FABlib Library

FABlib is used to programmatically create and manage FABRIC resources such as slices and VMs.

In [None]:
import ipaddress
import json
import os
from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

# Print the active FABlib configuration to confirm it is what you expect.
fablib.show_config();

## Create the Experiment Slice

This step provisions the VMs on the FABRIC testbed.

In [None]:
slice_name = 'perfSonar-fabric-multi-site'

TOTAL_NODE_CNT = 2

sites = fablib.get_random_sites(TOTAL_NODE_CNT)
print(f"Sites: {sites}")

central_node_name = "central"
remote_node_name = "remote"

USE_CENTRAL_ARCHIVE = False
TEST_INTERVAL = '10M' # Allowed values 10M, 2H, 4H, 6H

In [None]:
# Create the slice: one node per site, each attached to its own IPv4 L3 network
slice = fablib.new_slice(name=slice_name)

for i, s in enumerate(sites):
    prefix = central_node_name if i == 0 else remote_node_name

    net = slice.add_l3network(name=f"{s}-l3", type='IPv4')

    node = slice.add_node(
        name=f"{prefix}-{s}",
        site=s,
        image="default_ubuntu_24",
        cores=16,
        ram=32,
        disk=100
    )

    iface = node.add_component(model='NIC_Basic', name='nic1').get_interfaces()[0]
    iface.set_mode('auto')
    net.add_interface(iface)

    node.add_route(subnet=fablib.FABNETV4_SUBNET, next_hop=net.get_gateway())



# Submit the slice request
slice.submit();

## Configure `/etc/hosts`

For consistent name resolution across the fleet, update `/etc/hosts` on **every** VM with the same mappings.

**Steps**

* Add an entry for each node: IP address followed by its hostname (and optional aliases).
* Keep the file identical on all nodes to prevent lookup mismatches during monitoring/orchestration.

**Example**

```
10.128.1.2    central-STAR
10.128.127.2  remote-RUTG
10.128.254.2  remote-NEWY
```
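
As a rough illustration of the steps above, the sketch below renders the host block from a name-to-IP mapping and filters out lines already present in an existing hosts file, so that repeated runs do not add duplicates. `render_hosts_block` and `missing_entries` are hypothetical helpers, not part of FABlib, and the addresses are the example values shown above.

```python
def render_hosts_block(ip_map):
    """Render one 'IP hostname' line per node, sorted by hostname."""
    lines = [f"{ip} {name}" for name, ip in sorted(ip_map.items())]
    return "\n".join(lines) + "\n"

def missing_entries(block, existing_hosts_text):
    """Return the lines of `block` not already present in the hosts file text."""
    # Collapse runs of whitespace so column alignment does not affect matching.
    existing = {" ".join(line.split()) for line in existing_hosts_text.splitlines()}
    return [line for line in block.splitlines()
            if " ".join(line.split()) not in existing]

# Example values taken from the listing above.
ip_map = {
    "central-STAR": "10.128.1.2",
    "remote-RUTG": "10.128.127.2",
    "remote-NEWY": "10.128.254.2",
}
block = render_hosts_block(ip_map)
existing = "127.0.0.1 localhost\n10.128.1.2    central-STAR\n"
to_add = missing_entries(block, existing)
print("\n".join(to_add))
```

Appending only `to_add` keeps `/etc/hosts` identical across nodes even if the cell is re-run.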

In [None]:
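# Collect one "IP hostname" line per node from its L3 network interface.
etc_hosts = ""
for server in slice.get_nodes():
    print(f"node: {server.get_name()}")
    server_ip = server.get_interface(network_name=f"{server.get_site()}-l3").get_ip_addr()
    etc_hosts += f"{server_ip} {server.get_name()}\n"

# Append the same block to /etc/hosts on every node.
# Note: re-running this cell appends the entries again.
for server in slice.get_nodes():
    server.execute(f'sudo sh -c \'echo "{etc_hosts}" >> /etc/hosts\'')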

## Install perfSONAR stack (Testpoint, Archive, Grafana)

Install the full perfSONAR stack on **every** VM—both central and remote—for a consistent setup:

* **Testpoint**: pScheduler and measurement tools.
* **Archive**: Logstash/OpenSearch pipeline for storing results.
* **Grafana**: Dashboards for visualization.

This ensures each node can run tests, store results locally, and (optionally) mirror data to the central archive.

In [None]:
ip_map = {}
central_host = ""
for n in slice.get_nodes():
    n.upload_directory('node_tools','.')
    ip_addr = n.get_interface(network_name=f"{n.get_site()}-l3").get_ip_addr() 
    ip_map[n.get_name()] = ip_addr
    if central_node_name in n.get_name():
        central_host = n.get_name()
    n.execute('sudo node_tools/perfsonar-install.sh', quiet=True, output_file=f"{n.get_name()}.log")

## Archive access control — update Apache `/logstash` allowlist & auth

Lock down the Logstash ingest endpoint so only approved hosts (and/or authenticated clients) can post results.

**Steps**

1. **Edit the `/logstash` location block**
   File: `/etc/apache2/conf-available/apache-logstash.conf`

```apache
# ... existing ProxyPass/ProxyPassReverse for /logstash ...

<Location "/logstash">
  AuthType Basic
  AuthName "Logstash Ingest"
  # User file created by the archive install (adjust if different)
  AuthUserFile /etc/perfsonar/opensearch/logstash_login

  # Allow either a valid user OR specific IPs (including localhost)
  <RequireAny>
    Require valid-user
    Require ip 127.0.0.1 ::1
    # Add remote testpoints here (examples):
    Require ip 10.128.127.0/24
    Require ip 203.0.113.45
  </RequireAny>
</Location>
```

2. **Validate & reload Apache**

```bash
sudo apachectl -t
sudo systemctl reload apache2
```

3. **Quick checks**

* On the archive host: `sudo tail -f /var/log/apache2/access.log /var/log/apache2/error.log`
* From a remote VM, run a test that uses the HTTP archiver, then on the archive:

  ```bash
  pscheduler archiving-summary PT30M
  ```

  You should see successes increase.

**Notes**

* Keep HTTPS enabled for `/logstash` and rotate credentials as needed.
* Using both `Require valid-user` and `Require ip` lets you mix “trusted by IP” remote VMs with authenticated clients.
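
The `Require ip` lines in the location block can be generated from a list of addresses. The sketch below validates each entry with Python's `ipaddress` module before rendering it, so a typo fails early instead of producing an invalid Apache configuration. `require_ip_lines` is a hypothetical helper; what `node_tools/allow_logstash_ips.sh` does internally is not shown in this notebook.

```python
import ipaddress

def require_ip_lines(addresses, indent="    "):
    """Validate each address or CIDR and render one 'Require ip' line per entry."""
    lines = []
    for addr in addresses:
        # ip_network accepts single hosts and CIDR ranges; strict=False
        # tolerates host bits being set (e.g. 10.128.127.2/24).
        net = ipaddress.ip_network(addr, strict=False)
        # Single hosts are written without a prefix length.
        text = str(net.network_address) if net.num_addresses == 1 else str(net)
        lines.append(f"{indent}Require ip {text}")
    return lines

# Example values taken from the location block above.
for line in require_ip_lines(["127.0.0.1", "10.128.127.0/24", "203.0.113.45"]):
    print(line)
```

An entry that is not a valid address or network raises `ValueError`, which is easier to diagnose than a failed `apachectl -t`.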

In [None]:
all_ips = ip_map.values()
for n in slice.get_nodes():
    n.execute(f'sudo node_tools/allow_logstash_ips.sh  {" ".join(map(str, all_ips))}')

In [None]:
all_hosts = [x for name, ip in ip_map.items() for x in (name, str(ip))]
print(all_hosts)

## pSConfig

Orchestrate tests with a central template and have remote VMs subscribe to it.

### Central VM: build & publish the mesh template

1. Create a JSON template defining **addresses, groups, tests, schedules, and archives** (the HTTP `/logstash` archive) using `psconfig/psconfig_builder.py`.
2. Publish it so remote VMs can subscribe:

   ```bash
   sudo psconfig publish psconfig/psconfig.json
   sudo psconfig remote add https://localhost/psconfig/psconfig.json
   ```
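
To make the template sections concrete, here is a minimal sketch of a pSConfig-style template for a single RTT mesh, built as a Python dictionary. It is not the output of `psconfig/psconfig_builder.py`, whose internals are not shown here. The field names reflect the pSConfig template layout as I understand it, and `build_template`, the task and archive names, and the host values are assumptions for illustration. Anything built this way should be checked with `psconfig validate`, as the notebook does.

```python
import json

def build_template(hosts, repeat="PT10M"):
    """Build a minimal pSConfig-style template for an RTT mesh.

    `hosts` maps a host name to its IP address.
    """
    names = sorted(hosts)
    return {
        "addresses": {name: {"address": hosts[name]} for name in names},
        "groups": {
            "mesh": {"type": "mesh", "addresses": [{"name": n} for n in names]}
        },
        "tests": {
            "rtt": {
                "type": "rtt",
                # Template variables filled in per address pair.
                "spec": {"source": "{% address[0] %}", "dest": "{% address[1] %}"},
            }
        },
        "schedules": {"every_interval": {"repeat": repeat, "sliprand": True}},
        "archives": {
            "logstash_http": {
                "archiver": "http",
                "data": {
                    "schema": 2,
                    # Assumed URL pattern: post to /logstash on the scheduling host.
                    "_url": "https://{% scheduled_by_address %}/logstash",
                    "op": "put",
                },
            }
        },
        "tasks": {
            "rtt_mesh": {
                "group": "mesh",
                "test": "rtt",
                "schedule": "every_interval",
                "archives": ["logstash_http"],
            }
        },
    }

# Hypothetical hosts, reusing the example addresses from the /etc/hosts section.
template = build_template({"central-STAR": "10.128.1.2", "remote-RUTG": "10.128.127.2"})
print(json.dumps(template, indent=2))
```

Each task ties together one group, one test, one schedule, and a list of archives by name, which is why those sections are defined separately.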

### Remote VMs: subscribe (and auto-configure archives)

1. Build a small **local** JSON template (self + central) with `psconfig/psconfig_builder.py`. Like the central template, it defines **addresses, groups, tests, schedules, and archives**. If `USE_CENTRAL_ARCHIVE` is `True`, the central VM's IP is also passed to the builder via `--remote`.
2. Publish the local template on the remote VM and subscribe to it through the locally published URL:

   ```bash
   sudo psconfig publish psconfig/psconfig.json
   sudo psconfig remote add https://localhost/psconfig/psconfig.json
   ```

### Verify

* After about 1 minute, tasks should appear:

  ```bash
  pscheduler schedule
  pscheduler archiving-summary PT30M
  ```
* In Grafana (central and/or remote), confirm data is arriving for the time window covering the first runs.


In [None]:
for n in slice.get_nodes():
    n.upload_directory('psconfig', '.')

    if central_host in n.get_name():
        n.execute(f'python3 psconfig/psconfig_builder.py --base_config_file psconfig/base_psconfig.json '
                  f'--output_file psconfig/psconfig.json --no_add_tests --host_list {" ".join(map(str, all_hosts))}')
    else:
        # Build the local template (self + central). When central archiving is
        # enabled, also pass the central VM's IP to the builder via --remote.
        remote_opt = f"--remote {ip_map[central_host]} " if USE_CENTRAL_ARCHIVE else ""
        cmd = (
            "python3 psconfig/psconfig_builder.py "
            "--base_config_file psconfig/base_psconfig.json "
            "--output_file psconfig/psconfig.json "
            f"{remote_opt}"
            f"--host_list {n.get_name()} {ip_map[n.get_name()]} {central_host} {ip_map[central_host]}"
        )
            
        print(cmd)
        n.execute(cmd)

    n.execute('sudo psconfig validate psconfig/psconfig.json')
    n.execute('sudo psconfig publish psconfig/psconfig.json')

    n.execute('sudo psconfig remote add "https://localhost/psconfig/psconfig.json"')

## Create SSH Tunnel (Grafana Access)

Use an SSH local port forward so you can open the remote Grafana UI in your local browser.

In [None]:
fablib.create_ssh_tunnel_config(overwrite=True)

## perfSONAR Toolkit Grafana

Use SSH local port forwards to map your **local** ports to each remote host’s **HTTPS (443)**. Then open the URLs below in your browser.

### Central Node (maps local 8443 → remote 443)

```bash
ssh -N -L 8443:localhost:443 <USER>@<CENTRAL_IP>
# Browse: https://127.0.0.1:8443
```

### Remote VMs (add one tunnel per VM)

```bash
# Remote-1  (local 8444 → remote-1:443)
ssh -N -L 8444:localhost:443 <USER>@<REMOTE1_IP>
# Browse: https://127.0.0.1:8444

# Remote-2  (local 8445 → remote-2:443)
ssh -N -L 8445:localhost:443 <USER>@<REMOTE2_IP>
# Browse: https://127.0.0.1:8445
```

> Tip: add `-f` to send the tunnel to the background (`ssh -fN -L ...`).
> If remote VMs are only reachable via central, use a jump host:
> `ssh -J <USER>@<CENTRAL_IP> -N -L 8444:localhost:443 <USER>@<REMOTE1_IP>`

### Notes

* perfSONAR/Grafana often uses a self-signed cert — your browser may warn; proceed if you trust the host.
* Log in with your Grafana credentials.
* Stop a tunnel with `Ctrl+C` (or kill the background `ssh` if you used `-f`).


In [None]:
# First local port to map to Grafana; each subsequent node uses the next port.
local_port=8443
# Local interface to bind the tunnel to (can be `localhost`)
local_host='127.0.0.1'

# HTTPS port on the node that serves Grafana
target_port='443'

for n in slice.get_nodes():
    # Username/node on FABRIC
    target_host=f'{n.get_username()}@{n.get_management_ip()}'
    print(f'Tunnel command for {n.get_name()}')
    print(f'ssh  -L {local_host}:{local_port}:127.0.0.1:{target_port} -i {os.path.basename(fablib.get_default_slice_public_key_file())[:-4]} -F ssh_config {target_host}')
    local_port += 1
    print()


## Verification

### pScheduler health

```bash
# Basic end-to-end checks (network, tools, auth, archives)
pscheduler troubleshoot

# What’s scheduled / running
pscheduler schedule
pscheduler task --state on-run

# Archiving success/fail in the last 30 minutes
pscheduler archiving-summary PT30M
```

### OpenSearch spot-checks (recent docs)

```bash
# Replace with your archive host and credentials
curl -k -u '<user>:<pass>' \
  'https://<archive-host>/opensearch/pscheduler*/_search?size=5' \
  -H 'Content-Type: application/json' -d '{
    "sort": [{"pscheduler.start_time":{"order":"desc"}}]
  }' | jq '.hits.hits[]._source
           | { time: .pscheduler.start_time,
               type: .test.type,
               src: .test.spec.source,
               dst: .test.spec.dest,
               summary: .summary }'
```

You should see `src` and `dst` populated; if blank, ensure tests specify **both** `--source` and `--dest` (pSConfig does this automatically).
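
The same spot-check can be done in Python once the search response has been parsed as JSON. The sketch below extracts the fields used above from a response dictionary. The `sample` data is made up for illustration and `summarize_hits` is a hypothetical helper; it assumes the standard search response shape, with documents under `hits.hits[]._source`.

```python
def summarize_hits(response):
    """Extract time, test type, source, and dest from a parsed search response."""
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        source_doc = hit.get("_source", {})
        test = source_doc.get("test", {})
        spec = test.get("spec", {})
        rows.append({
            "time": source_doc.get("pscheduler", {}).get("start_time"),
            "type": test.get("type"),
            "src": spec.get("source"),
            "dst": spec.get("dest"),
        })
    return rows

# Made-up response for illustration only.
sample = {
    "hits": {
        "hits": [
            {"_source": {
                "pscheduler": {"start_time": "2024-01-01T00:00:00Z"},
                "test": {"type": "rtt",
                         "spec": {"source": "10.128.1.2", "dest": "10.128.127.2"}},
            }},
            # A document without source/dest, to show how blanks surface.
            {"_source": {"pscheduler": {"start_time": "2024-01-01T00:10:00Z"},
                         "test": {"type": "trace", "spec": {}}}},
        ]
    }
}

rows = summarize_hits(sample)
blank = [r for r in rows if r["src"] is None or r["dst"] is None]
print(f"{len(rows)} docs, {len(blank)} missing source/dest")
```

Rows that land in `blank` point at tests that were submitted without an explicit source or destination.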

### Grafana dashboards

* Open the dashboard and set the time range to include your latest runs.
* Confirm panels populate and show **source/dest** labels.
* If no data:

  * Check the Grafana data source (OpenSearch URL, index `pscheduler*`, time field `pscheduler.start_time`).
  * Re-check `pscheduler archiving-summary` and the archive’s Apache/Logstash logs.


## Delete the Slice

Please delete your slice when you are done with your experiment.

In [None]:
slice = fablib.get_slice(slice_name)
slice.delete()