[Change]: Define spec in a machine readable format (e.g. YAML, or other), and generate text from that #254

Open
manics opened this issue Sep 28, 2023 · 2 comments
Labels: discussion point (A general discussion point for the community), proposed change (A proposed change to the specification)

Comments

@manics
Member

manics commented Sep 28, 2023

Summary

Define specification in a machine readable format

Source

Previous discussions in various meetings

Detail

Currently the spec is written as Markdown text and Markdown tables. This is a relatively accessible way to contribute to the spec, but it has the big disadvantage that downstream outputs are difficult to generate. For example, the evaluation spreadsheet is built by parsing the Markdown (using Sphinx libraries), which means it is liable to break in future.
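
To illustrate the difference: once items are held as structured data, the spreadsheet becomes a direct serialisation rather than something scraped back out of rendered Markdown. A minimal sketch using only the Python standard library, assuming each item has already been loaded (e.g. from YAML) into a dict; the field names and example values are hypothetical, not an agreed SATRE schema:

```python
import csv
import io

# Hypothetical spec items, as they might look after loading a YAML file.
# Field names and values are illustrative only, not the real SATRE content.
ITEMS = [
    {"id": "1.1.1", "statement": "Example statement A",
     "importance": "Mandatory", "guidance": "Example guidance A"},
    {"id": "1.1.2", "statement": "Example statement B",
     "importance": "Optional", "guidance": ""},
]

FIELDS = ["id", "statement", "importance", "guidance"]


def to_csv(items: list[dict]) -> str:
    """Serialise spec items to CSV text, one row per item, ready to open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: item.get(field, "") for field in FIELDS})
    return buf.getvalue()


print(to_csv(ITEMS))
```

Because the CSV is written from the data itself, changes to Sphinx internals cannot break it.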

Other disadvantages include having to number everything manually and being unable to offer a user-modifiable view of the spec. Ideally the website would present a dynamic view of the spec, including allowing viewers to sort or filter by importance.
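
With structured items, numbering and filtered views can be derived rather than maintained by hand. A sketch under stated assumptions: the `pillar.section.n` numbering scheme, the `Item` fields and the three importance levels below are illustrative only. On the website this logic would probably live in browser-side JavaScript, but the idea is the same:

```python
from dataclasses import dataclass

# Assumed importance levels and their sort order (most important first).
IMPORTANCE_ORDER = {"Mandatory": 0, "Recommended": 1, "Optional": 2}


@dataclass
class Item:
    pillar: int      # position of the pillar in the spec (1-based)
    section: int     # position of the section within the pillar (1-based)
    statement: str
    importance: str


def number_items(items: list[Item]) -> list[tuple[str, Item]]:
    """Assign "pillar.section.n" numbers in document order, so nobody numbers by hand."""
    counters: dict[tuple[int, int], int] = {}
    numbered = []
    for item in items:
        key = (item.pillar, item.section)
        counters[key] = counters.get(key, 0) + 1
        numbered.append((f"{item.pillar}.{item.section}.{counters[key]}", item))
    return numbered


def filter_and_sort(numbered: list[tuple[str, Item]], keep: set[str]) -> list[tuple[str, Item]]:
    """Keep only the requested importance levels, most important first.

    sorted() is stable, so items of equal importance stay in document order.
    """
    selected = [(num, item) for num, item in numbered if item.importance in keep]
    return sorted(selected, key=lambda pair: IMPORTANCE_ORDER[pair[1].importance])


items = [
    Item(1, 1, "Example statement A", "Optional"),
    Item(1, 1, "Example statement B", "Mandatory"),
    Item(1, 2, "Example statement C", "Recommended"),
]
numbered = number_items(items)
for number, item in filter_and_sort(numbered, {"Mandatory", "Optional"}):
    print(number, item.importance, item.statement)
# 1.1.2 Mandatory Example statement B
# 1.1.1 Optional Example statement A
```

Numbers are assigned before filtering, so an item keeps the same number in every view.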

Ideally the spec would be written in a machine-readable format, with all outputs (website, spreadsheet, anything else) generated automatically. The format could be something we define ourselves (e.g. YAML with our own schema), or we could see if there are existing tools/formats we can reuse.
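
If we define our own schema, even a small validator would catch malformed entries before any output is generated. A minimal sketch with assumed required fields and importance levels. It parses JSON with the standard library so the example is self-contained; a YAML file would instead be loaded with a YAML library (e.g. PyYAML's `yaml.safe_load`) and then checked the same way:

```python
import json

# Hypothetical schema: the required fields and allowed importance levels are assumptions.
REQUIRED_FIELDS = {"statement": str, "importance": str, "guidance": str}
ALLOWED_IMPORTANCE = {"Mandatory", "Recommended", "Optional"}


def validate(items: list) -> list[str]:
    """Return human-readable problems; an empty list means every item passed."""
    problems = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: expected a mapping")
            continue
        # Check that each required field is present and has the right type.
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in item:
                problems.append(f"item {index}: missing '{field}'")
            elif not isinstance(item[field], expected_type):
                problems.append(f"item {index}: '{field}' should be {expected_type.__name__}")
        # Reject importance values outside the agreed set.
        importance = item.get("importance")
        if isinstance(importance, str) and importance not in ALLOWED_IMPORTANCE:
            problems.append(f"item {index}: unknown importance '{importance}'")
    return problems


SPEC_TEXT = """
[
  {"statement": "Example statement A", "importance": "Mandatory", "guidance": ""},
  {"statement": "Example statement B", "importance": "Essential"}
]
"""
problems = validate(json.loads(SPEC_TEXT))
print(problems)
# ["item 1: missing 'guidance'", "item 1: unknown importance 'Essential'"]
```

Reusing an existing tool is also an option: the same rules could be written as a JSON Schema and checked with an off-the-shelf validator, which works equally well on data loaded from YAML.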

Where

This would affect all files, but the content of the specification wouldn't change.

Proposal

See detail

Who can help

Everyone.

@manics added the discussion point (A general discussion point for the community) and proposed change (A proposed change to the specification) labels Sep 28, 2023
@vvcb

vvcb commented Nov 20, 2023

@manics, this would be a very useful feature to have, especially as TREs start evaluating their own environments against the spec. I put together a not-entirely-useless script that creates a GitHub issue for each specification item. Hopefully this allows us to assign people to items, discuss each one separately, and keep a documented history of those discussions. Obviously, when the upstream spec changes we have to incorporate the changes manually, but this is possibly a good, or equally possibly an entirely unnecessary, way of doing things.

Users will need to set up a GitHub personal access token (PAT) that only has read/write access to issues on the repo where the issues will be created.

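import os
import time

import pandas as pd
from github import Auth, Github

# Load the personal access token from the environment; never store it in code.
# (GITHUB_TOKEN is just an example variable name.)
pat = os.environ["GITHUB_TOKEN"]
repo_name = "nwsde/nwsde-satre"  # or similar

# A PAT is a token, so authenticate with Auth.Token rather than Auth.Login (username/password).
auth = Auth.Token(pat)
gh = Github(auth=auth)
repo = gh.get_repo(repo_name)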

urls = {
    "Information Governance": "https://satre-specification.readthedocs.io/en/stable/pillars/information_governance.html",
    "Computing Technology": "https://satre-specification.readthedocs.io/en/stable/pillars/computing_technology.html",
    "Data Management": "https://satre-specification.readthedocs.io/en/stable/pillars/data_management.html",
    "Supporting Capability": "https://satre-specification.readthedocs.io/en/stable/pillars/supporting.html",
}

arr = []
for domain, url in urls.items():
    tables = pd.read_html(url)
    df_temp = pd.concat(tables)
    df_temp["Domain"] = domain
    arr.append(df_temp)

df = pd.concat(arr).reset_index(drop=True)
df = df.rename(columns={"Unnamed: 0": "Item"})
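df = df.fillna("")

issues = []
for row in df.itertuples():
    # Drop empty values (e.g. a missing Importance) so a blank label is never sent.
    labels = [label for label in (row.Domain, row.Importance) if label]
    i = repo.create_issue(
        title=f"{row.Item}: {row.Statement}",
        body=row.Guidance,
        labels=labels,
    )
    issues.append(i)
    # Pause between requests to stay under GitHub's secondary rate limits for creating content.
    time.sleep(10)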
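
One way to make re-running the script safe when the upstream spec changes is to create issues only for items that don't already have one. A hypothetical helper, assuming issues are matched on their exact title (the `f"{row.Item}: {row.Statement}"` format above); the existing titles could be gathered with PyGithub as `{issue.title for issue in repo.get_issues(state="all")}`:

```python
def titles_to_create(spec_titles: list[str], existing_titles: set[str]) -> list[str]:
    """Return spec titles that have no matching issue yet, preserving spec order."""
    seen = set(existing_titles)
    missing = []
    for title in spec_titles:
        if title not in seen:
            missing.append(title)
            seen.add(title)  # also guards against duplicate rows in the spec
    return missing


existing = {"1.1.1: Example statement A"}
print(titles_to_create(["1.1.1: Example statement A", "1.1.2: Example statement B"], existing))
# ['1.1.2: Example statement B']
```

Matching on titles means an item whose statement is reworded upstream gets a fresh issue, so the old issue would still need closing by hand.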

@manics
Member Author

manics commented Nov 22, 2023

Nice idea, I didn't know Pandas could pull out HTML tables!

FYI we've got an automatically generated Excel spreadsheet (https://satre-specification.readthedocs.io/en/stable/evaluation.html#evaluation-spreadsheet) that's built with a custom Sphinx plugin (https://github.com/sa-tre/satre-specification/blob/8784f3ab8416da9fa98d07d5954ead82905c60ad/docs/extensions/satrecsv.py).
