@ldionne ldionne commented Aug 2, 2024

This script can be run manually to synchronize the CSV files that we use to track Standards Conformance with the Github issues that track our implementation of LWG issues and papers.
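
As a rough illustration of the Github side of that synchronization, the sketch below turns project items into (paper number, status) pairs. The `SAMPLE` payload and the `summarize` helper are hypothetical and exist only for this example; the field names (`items`, `title`, `status`) and the title pattern mirror what the script in this PR reads, but the real `gh project item-list` output may carry more fields than shown here.

```python
import json
import re
from typing import Optional, Tuple

# Hypothetical payload: only the fields the script reads are included.
SAMPLE = """
{"items": [
  {"title": "P0088R3: Variant: a type-safe union for C++17", "status": "Done", "labels": []},
  {"title": "LWG2016: Allocators must be no-throw swappable", "labels": []}
]}
"""

# Same title pattern as the script: an identifier followed by a colon.
PAPER_IN_TITLE = re.compile(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+)):")


def summarize(item: dict) -> Tuple[str, Optional[str]]:
    """Return (paper number, status) for one project item (hypothetical helper)."""
    match = PAPER_IN_TITLE.search(item["title"])
    if match is None:
        raise ValueError(f"Can't parse a paper number out of: {item['title']}")
    # Only a 'Done' project status maps to a CSV status; anything else is unknown.
    status = "|Complete|" if item.get("status") == "Done" else None
    return match.group(1), status


summaries = [summarize(item) for item in json.loads(SAMPLE)["items"]]
```

An item without a `status` field is treated the same as an open one, which matches the `'status' in issue` check in the script.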

@ldionne ldionne requested a review from a team as a code owner August 2, 2024 16:14
@llvmbot llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Aug 2, 2024
llvmbot commented Aug 2, 2024

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

This script can be run manually to synchronize the CSV files that we use to track Standards Conformance with the Github issues that track our implementation of LWG issues and papers.


Full diff: https://github.com/llvm/llvm-project/pull/101704.diff

1 Files Affected:

  • (added) libcxx/utils/synchronize_csv_status_files.py (+229)
diff --git a/libcxx/utils/synchronize_csv_status_files.py b/libcxx/utils/synchronize_csv_status_files.py
new file mode 100755
index 0000000000000..8fd7186fde006
--- /dev/null
+++ b/libcxx/utils/synchronize_csv_status_files.py
@@ -0,0 +1,229 @@
+#!/usr/bin/env python3
+
+from typing import List, Dict, Tuple, Optional
+import csv
+import itertools
+import json
+import os
+import pathlib
+import re
+import subprocess
+
+# Number of the 'Libc++ Standards Conformance' project on Github
+LIBCXX_CONFORMANCE_PROJECT = '31'
+
+class PaperInfo:
+    paper_number: str
+    """
+    Identifier for the paper or the LWG issue. This must be something like 'PnnnnRx', 'Nxxxxx' or 'LWGxxxxx'.
+    """
+
+    paper_name: str
+    """
+    Plain text string representing the name of the paper.
+    """
+
+    meeting: Optional[str]
+    """
+    Plain text string representing the meeting at which the paper/issue was voted.
+    """
+
+    status: Optional[str]
+    """
+    Status of the paper/issue. This must be '|Complete|', '|Nothing To Do|', '|In Progress|',
+    '|Partial|' or 'Resolved by <something>'.
+    """
+
+    first_released_version: Optional[str]
+    """
+    First version of LLVM in which this paper/issue was resolved.
+    """
+
+    labels: Optional[List[str]]
+    """
+    List of labels to associate to the issue in the status-tracking table. Supported labels are
+    'format', 'ranges', 'spaceship', 'flat_containers', 'concurrency TS' and 'DR'.
+    """
+
+    original: Optional[object]
+    """
+    Object from which this PaperInfo originated. This is used to track the CSV row or Github issue that
+    was used to generate this PaperInfo and is useful for error reporting purposes.
+    """
+
+    def __init__(self, paper_number: str, paper_name: str,
+                       meeting: Optional[str] = None,
+                       status: Optional[str] = None,
+                       first_released_version: Optional[str] = None,
+                       labels: Optional[List[str]] = None,
+                       original: Optional[object] = None):
+        self.paper_number = paper_number
+        self.paper_name = paper_name
+        self.meeting = meeting
+        self.status = status
+        self.first_released_version = first_released_version
+        self.labels = labels
+        self.original = original
+
+    def for_printing(self) -> Tuple[str, str, str, str, str, str]:
+        return (
+            f'`{self.paper_number} <https://wg21.link/{self.paper_number}>`__',
+            self.paper_name,
+            self.meeting if self.meeting is not None else '',
+            self.status if self.status is not None else '',
+            self.first_released_version if self.first_released_version is not None else '',
+            ' '.join(f'|{label}|' for label in self.labels) if self.labels is not None else '',
+        )
+
+    def __repr__(self) -> str:
+        return repr(self.original) if self.original is not None else repr(self.for_printing())
+
+    def is_implemented(self) -> bool:
+        if self.status is None:
+            return False
+        if re.search(r'(in progress|partial)', self.status.lower()):
+            return False
+        return True
+
+    @staticmethod
+    def from_csv_row(row: Tuple[str, str, str, str, str, str]):# -> PaperInfo:
+        """
+        Given a row from one of our status-tracking CSV files, create a PaperInfo object representing that row.
+        """
+        # Extract the paper number from the first column
+        match = re.search(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+))\s+", row[0])
+        if match is None:
+            raise RuntimeError(f"Can't parse paper/issue number out of row: {row}")
+
+        return PaperInfo(
+            paper_number=match.group(1),
+            paper_name=row[1],
+            meeting=row[2] or None,
+            status=row[3] or None,
+            first_released_version=row[4] or None,
+            labels=[l.strip('|') for l in row[5].split(' ') if l] or None,
+            original=row,
+        )
+
+    @staticmethod
+    def from_github_issue(issue: Dict):# -> PaperInfo:
+        """
+        Create a PaperInfo object from the Github issue information obtained from querying a Github Project.
+        """
+        # Extract the paper number from the issue title
+        match = re.search(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+)):", issue['title'])
+        if match is None:
+            raise RuntimeError(f"Issue doesn't have a title that we know how to parse: {issue}")
+        paper = match.group(1)
+
+        # Figure out the status of the paper according to the Github project information.
+        #
+        # Sadly, we can't make a finer-grained distinction about *how* the issue
+        # was closed (such as Nothing To Do or similar).
+        status = '|Complete|' if 'status' in issue and issue['status'] == 'Done' else None
+
+        # Handle labels
+        valid_labels = ('format', 'ranges', 'spaceship', 'flat_containers', 'concurrency TS', 'DR')
+        labels = [label for label in issue['labels'] if label in valid_labels]
+
+        return PaperInfo(
+            paper_number=paper,
+            paper_name=issue['title'],
+            meeting=issue.get('meeting Voted', None),
+            status=status,
+            first_released_version=None, # TODO
+            labels=labels if labels else None,
+            original=issue,
+        )
+
+def load_csv(file: pathlib.Path) -> List[Tuple]:
+    rows = []
+    with open(file, newline='') as f:
+        reader = csv.reader(f, delimiter=',')
+        for row in reader:
+            rows.append(row)
+    return rows
+
+def write_csv(output: pathlib.Path, rows: List[Tuple]):
+    with open(output, 'w', newline='') as f:
+        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator='\n')
+        for row in rows:
+            writer.writerow(row)
+
+def sync_csv(rows: List[Tuple], from_github: List[PaperInfo]) -> List[Tuple]:
+    """
+    Given a list of CSV rows representing an existing status file and a list of PaperInfos representing
+    up-to-date (but potentially incomplete) tracking information from Github, this function returns the
+    new CSV rows synchronized with the up-to-date information.
+
+    Note that this only tracks changes from 'not implemented' issues to 'implemented'. If an up-to-date
+    PaperInfo reports that a paper is not implemented but the existing CSV rows report it as implemented,
+    it is an error (i.e. the result is not a CSV row where the paper is *not* implemented).
+    """
+    results = [rows[0]] # Start with the header
+    for row in rows[1:]: # Skip the header
+        # If the row contains empty entries, this is a "separator row" between meetings.
+        # Preserve it as-is.
+        if row[0] == "":
+            results.append(row)
+            continue
+
+        paper = PaperInfo.from_csv_row(row)
+
+        # If the row is already implemented, basically keep it unchanged but also validate that we're not
+        # out-of-sync with any still-open Github issue tracking the same paper.
+        if paper.is_implemented():
+            dangling = [gh for gh in from_github if gh.paper_number == paper.paper_number and not gh.is_implemented()]
+            if dangling:
+                raise RuntimeError(f"We found the following open tracking issues for a row which is already marked as implemented:\nrow: {row}\ntracking issues: {dangling}")
+            results.append(paper.for_printing())
+        else:
+            # Find any Github issues tracking this paper
+            tracking = [gh for gh in from_github if paper.paper_number == gh.paper_number]
+
+            # If there is no tracking issue for that row in the CSV, this is an error since we're
+            # missing a Github issue.
+            if not tracking:
+                raise RuntimeError(f"Can't find any Github issue for CSV row which isn't marked as done yet: {row}")
+
+            # If there's more than one tracking issue, something is weird too.
+            if len(tracking) > 1:
+                raise RuntimeError(f"Found a row with more than one tracking issue: {row}\ntracked by: {tracking}")
+
+            # If the issue is closed, synchronize the row based on the Github issue. Otherwise, use the
+            # existing CSV row as-is.
+            results.append(tracking[0].for_printing() if tracking[0].is_implemented() else row)
+
+    return results
+
+CSV_FILES_TO_SYNC = [
+    'Cxx14Issues.csv',
+    'Cxx17Issues.csv',
+    'Cxx17Papers.csv',
+    'Cxx20Issues.csv',
+    'Cxx20Papers.csv',
+    # TODO: The Github issues are not created yet.
+    # 'Cxx23Issues.csv',
+    # 'Cxx23Papers.csv',
+    # 'Cxx2cIssues.csv',
+    # 'Cxx2cPapers.csv',
+]
+
+def main():
+    libcxx_root = pathlib.Path(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+    # Extract the list of PaperInfos from issues we're tracking on Github.
+    print("Loading all issues from Github")
+    gh_command_line = ['gh', 'project', 'item-list', LIBCXX_CONFORMANCE_PROJECT, '--owner', 'llvm', '--format', 'json', '--limit', '9999999']
+    project_info = json.loads(subprocess.check_output(gh_command_line))
+    from_github = [PaperInfo.from_github_issue(i) for i in project_info['items']]
+
+    for filename in CSV_FILES_TO_SYNC:
+        print(f"Synchronizing {filename} with Github issues")
+        file = libcxx_root / 'docs' / 'Status' / filename
+        csv = load_csv(file)
+        synced = sync_csv(csv, from_github)
+        write_csv(file, synced)
+
+if __name__ == '__main__':
+    main()
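
The per-row rules in `sync_csv` can be condensed into a small decision function. This is a simplified sketch, not the code from the PR: `decide` is a hypothetical helper that works on bare status strings instead of `PaperInfo` objects, and it returns a label ("keep" or "update") rather than a rewritten row.

```python
import re
from typing import List, Optional


def is_implemented(status: Optional[str]) -> bool:
    # Mirrors PaperInfo.is_implemented: no status, 'In Progress' and 'Partial'
    # count as not implemented; every other status counts as implemented.
    if status is None:
        return False
    return re.search(r"(in progress|partial)", status.lower()) is None


def decide(csv_status: Optional[str], github_statuses: List[Optional[str]]) -> str:
    """Hypothetical helper: what happens to one CSV row given its tracking issues."""
    if is_implemented(csv_status):
        # An implemented row must not have a still-open tracking issue.
        if any(not is_implemented(s) for s in github_statuses):
            raise RuntimeError("open tracking issue for a row marked as implemented")
        return "keep"
    if not github_statuses:
        raise RuntimeError("no tracking issue for a row which isn't done yet")
    if len(github_statuses) > 1:
        raise RuntimeError("more than one tracking issue for a row")
    # Only a closed issue overwrites the row; otherwise the CSV row wins.
    return "update" if is_implemented(github_statuses[0]) else "keep"
```

Note that the transition only ever goes from "not implemented" to "implemented": a row already marked `|Complete|` or `|Nothing To Do|` is never downgraded, and a conflicting open issue is reported as an error instead.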

github-actions bot commented Aug 2, 2024

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 937cbe270e5ee4e2e4d6f5568768a55d5e383076...05f33a0cde5587603ad4ce32c0c235f6a9ce40ea libcxx/utils/synchronize_csv_status_files.py
View the diff from darker here.
--- synchronize_csv_status_files.py	2024-08-05 15:26:55.000000 +0000
+++ synchronize_csv_status_files.py	2024-08-05 15:42:52.587010 +0000
@@ -15,11 +15,12 @@
 import pathlib
 import re
 import subprocess
 
 # Number of the 'Libc++ Standards Conformance' project on Github
-LIBCXX_CONFORMANCE_PROJECT = '31'
+LIBCXX_CONFORMANCE_PROJECT = "31"
+
 
 class PaperInfo:
     paper_number: str
     """
     Identifier for the paper or the LWG issue. This must be something like 'PnnnnRx', 'Nxxxxx' or 'LWGxxxxx'.
@@ -56,46 +57,58 @@
     """
     Object from which this PaperInfo originated. This is used to track the CSV row or Github issue that
     was used to generate this PaperInfo and is useful for error reporting purposes.
     """
 
-    def __init__(self, paper_number: str, paper_name: str,
-                       meeting: Optional[str] = None,
-                       status: Optional[str] = None,
-                       first_released_version: Optional[str] = None,
-                       labels: Optional[List[str]] = None,
-                       original: Optional[object] = None):
+    def __init__(
+        self,
+        paper_number: str,
+        paper_name: str,
+        meeting: Optional[str] = None,
+        status: Optional[str] = None,
+        first_released_version: Optional[str] = None,
+        labels: Optional[List[str]] = None,
+        original: Optional[object] = None,
+    ):
         self.paper_number = paper_number
         self.paper_name = paper_name
         self.meeting = meeting
         self.status = status
         self.first_released_version = first_released_version
         self.labels = labels
         self.original = original
 
     def for_printing(self) -> Tuple[str, str, str, str, str, str]:
         return (
-            f'`{self.paper_number} <https://wg21.link/{self.paper_number}>`__',
+            f"`{self.paper_number} <https://wg21.link/{self.paper_number}>`__",
             self.paper_name,
-            self.meeting if self.meeting is not None else '',
-            self.status if self.status is not None else '',
-            self.first_released_version if self.first_released_version is not None else '',
-            ' '.join(f'|{label}|' for label in self.labels) if self.labels is not None else '',
+            self.meeting if self.meeting is not None else "",
+            self.status if self.status is not None else "",
+            self.first_released_version
+            if self.first_released_version is not None
+            else "",
+            " ".join(f"|{label}|" for label in self.labels)
+            if self.labels is not None
+            else "",
         )
 
     def __repr__(self) -> str:
-        return repr(self.original) if self.original is not None else repr(self.for_printing())
+        return (
+            repr(self.original)
+            if self.original is not None
+            else repr(self.for_printing())
+        )
 
     def is_implemented(self) -> bool:
         if self.status is None:
             return False
-        if re.search(r'(in progress|partial)', self.status.lower()):
+        if re.search(r"(in progress|partial)", self.status.lower()):
             return False
         return True
 
     @staticmethod
-    def from_csv_row(row: Tuple[str, str, str, str, str, str]):# -> PaperInfo:
+    def from_csv_row(row: Tuple[str, str, str, str, str, str]):  # -> PaperInfo:
         """
         Given a row from one of our status-tracking CSV files, create a PaperInfo object representing that row.
         """
         # Extract the paper number from the first column
         match = re.search(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+))\s+", row[0])
@@ -106,58 +119,72 @@
             paper_number=match.group(1),
             paper_name=row[1],
             meeting=row[2] or None,
             status=row[3] or None,
             first_released_version=row[4] or None,
-            labels=[l.strip('|') for l in row[5].split(' ') if l] or None,
+            labels=[l.strip("|") for l in row[5].split(" ") if l] or None,
             original=row,
         )
 
     @staticmethod
-    def from_github_issue(issue: Dict):# -> PaperInfo:
+    def from_github_issue(issue: Dict):  # -> PaperInfo:
         """
         Create a PaperInfo object from the Github issue information obtained from querying a Github Project.
         """
         # Extract the paper number from the issue title
-        match = re.search(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+)):", issue['title'])
+        match = re.search(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+)):", issue["title"])
         if match is None:
-            raise RuntimeError(f"Issue doesn't have a title that we know how to parse: {issue}")
+            raise RuntimeError(
+                f"Issue doesn't have a title that we know how to parse: {issue}"
+            )
         paper = match.group(1)
 
         # Figure out the status of the paper according to the Github project information.
         #
         # Sadly, we can't make a finer-grained distinction about *how* the issue
         # was closed (such as Nothing To Do or similar).
-        status = '|Complete|' if 'status' in issue and issue['status'] == 'Done' else None
+        status = (
+            "|Complete|" if "status" in issue and issue["status"] == "Done" else None
+        )
 
         # Handle labels
-        valid_labels = ('format', 'ranges', 'spaceship', 'flat_containers', 'concurrency TS', 'DR')
-        labels = [label for label in issue['labels'] if label in valid_labels]
+        valid_labels = (
+            "format",
+            "ranges",
+            "spaceship",
+            "flat_containers",
+            "concurrency TS",
+            "DR",
+        )
+        labels = [label for label in issue["labels"] if label in valid_labels]
 
         return PaperInfo(
             paper_number=paper,
-            paper_name=issue['title'],
-            meeting=issue.get('meeting Voted', None),
+            paper_name=issue["title"],
+            meeting=issue.get("meeting Voted", None),
             status=status,
-            first_released_version=None, # TODO
+            first_released_version=None,  # TODO
             labels=labels if labels else None,
             original=issue,
         )
 
+
 def load_csv(file: pathlib.Path) -> List[Tuple]:
     rows = []
-    with open(file, newline='') as f:
-        reader = csv.reader(f, delimiter=',')
+    with open(file, newline="") as f:
+        reader = csv.reader(f, delimiter=",")
         for row in reader:
             rows.append(row)
     return rows
 
+
 def write_csv(output: pathlib.Path, rows: List[Tuple]):
-    with open(output, 'w', newline='') as f:
-        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator='\n')
+    with open(output, "w", newline="") as f:
+        writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
         for row in rows:
             writer.writerow(row)
+
 
 def sync_csv(rows: List[Tuple], from_github: List[PaperInfo]) -> List[Tuple]:
     """
     Given a list of CSV rows representing an existing status file and a list of PaperInfos representing
     up-to-date (but potentially incomplete) tracking information from Github, this function returns the
@@ -165,12 +192,12 @@
 
     Note that this only tracks changes from 'not implemented' issues to 'implemented'. If an up-to-date
     PaperInfo reports that a paper is not implemented but the existing CSV rows report it as implemented,
     it is an error (i.e. the result is not a CSV row where the paper is *not* implemented).
     """
-    results = [rows[0]] # Start with the header
-    for row in rows[1:]: # Skip the header
+    results = [rows[0]]  # Start with the header
+    for row in rows[1:]:  # Skip the header
         # If the row contains empty entries, this is a "separator row" between meetings.
         # Preserve it as-is.
         if row[0] == "":
             results.append(row)
             continue
@@ -178,59 +205,89 @@
         paper = PaperInfo.from_csv_row(row)
 
         # If the row is already implemented, basically keep it unchanged but also validate that we're not
         # out-of-sync with any still-open Github issue tracking the same paper.
         if paper.is_implemented():
-            dangling = [gh for gh in from_github if gh.paper_number == paper.paper_number and not gh.is_implemented()]
+            dangling = [
+                gh
+                for gh in from_github
+                if gh.paper_number == paper.paper_number and not gh.is_implemented()
+            ]
             if dangling:
-                raise RuntimeError(f"We found the following open tracking issues for a row which is already marked as implemented:\nrow: {row}\ntracking issues: {dangling}")
+                raise RuntimeError(
+                    f"We found the following open tracking issues for a row which is already marked as implemented:\nrow: {row}\ntracking issues: {dangling}"
+                )
             results.append(paper.for_printing())
         else:
             # Find any Github issues tracking this paper
-            tracking = [gh for gh in from_github if paper.paper_number == gh.paper_number]
+            tracking = [
+                gh for gh in from_github if paper.paper_number == gh.paper_number
+            ]
 
             # If there is no tracking issue for that row in the CSV, this is an error since we're
             # missing a Github issue.
             if not tracking:
-                raise RuntimeError(f"Can't find any Github issue for CSV row which isn't marked as done yet: {row}")
+                raise RuntimeError(
+                    f"Can't find any Github issue for CSV row which isn't marked as done yet: {row}"
+                )
 
             # If there's more than one tracking issue, something is weird too.
             if len(tracking) > 1:
-                raise RuntimeError(f"Found a row with more than one tracking issue: {row}\ntracked by: {tracking}")
+                raise RuntimeError(
+                    f"Found a row with more than one tracking issue: {row}\ntracked by: {tracking}"
+                )
 
             # If the issue is closed, synchronize the row based on the Github issue. Otherwise, use the
             # existing CSV row as-is.
-            results.append(tracking[0].for_printing() if tracking[0].is_implemented() else row)
+            results.append(
+                tracking[0].for_printing() if tracking[0].is_implemented() else row
+            )
 
     return results
 
+
 CSV_FILES_TO_SYNC = [
-    'Cxx14Issues.csv',
-    'Cxx17Issues.csv',
-    'Cxx17Papers.csv',
-    'Cxx20Issues.csv',
-    'Cxx20Papers.csv',
+    "Cxx14Issues.csv",
+    "Cxx17Issues.csv",
+    "Cxx17Papers.csv",
+    "Cxx20Issues.csv",
+    "Cxx20Papers.csv",
     # TODO: The Github issues are not created yet.
     # 'Cxx23Issues.csv',
     # 'Cxx23Papers.csv',
     # 'Cxx2cIssues.csv',
     # 'Cxx2cPapers.csv',
 ]
 
+
 def main():
-    libcxx_root = pathlib.Path(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+    libcxx_root = pathlib.Path(
+        os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+    )
 
     # Extract the list of PaperInfos from issues we're tracking on Github.
     print("Loading all issues from Github")
-    gh_command_line = ['gh', 'project', 'item-list', LIBCXX_CONFORMANCE_PROJECT, '--owner', 'llvm', '--format', 'json', '--limit', '9999999']
+    gh_command_line = [
+        "gh",
+        "project",
+        "item-list",
+        LIBCXX_CONFORMANCE_PROJECT,
+        "--owner",
+        "llvm",
+        "--format",
+        "json",
+        "--limit",
+        "9999999",
+    ]
     project_info = json.loads(subprocess.check_output(gh_command_line))
-    from_github = [PaperInfo.from_github_issue(i) for i in project_info['items']]
+    from_github = [PaperInfo.from_github_issue(i) for i in project_info["items"]]
 
     for filename in CSV_FILES_TO_SYNC:
         print(f"Synchronizing {filename} with Github issues")
-        file = libcxx_root / 'docs' / 'Status' / filename
+        file = libcxx_root / "docs" / "Status" / filename
         csv = load_csv(file)
         synced = sync_csv(csv, from_github)
         write_csv(file, synced)
 
-if __name__ == '__main__':
+
+if __name__ == "__main__":
     main()

h-vetinari commented

This is cool! :)

For all the "non-special" ones (i.e. issue['status'] == ''), it might be worth considering inserting a link back to the Github issue in the status page. I was even thinking about doing this manually (because IMO it's a value add to be able to go from the status page to the issue where the approach for an implementation is discussed), but if this becomes scriptable, it'd be even easier. The sync logic would have to become a bit smarter, but nothing extraordinary.

@mordante mordante left a comment

> This is cool! :)
>
> For all the "non-special" ones (i.e. issue['status'] == ''), it might be worth considering inserting a link back to the Github issue in the status page. I was even thinking about doing this manually (because IMO it's a value add to be able to go from the status page to the issue where the approach for an implementation is discussed), but if this becomes scriptable, it'd be even easier. The sync logic would have to become a bit smarter, but nothing extraordinary.

IIRC this is something we discussed and were in favour of.

@ldionne can you run this script and upload the changes to the CSV files? This makes it easier to review the changes.

ldionne commented Aug 5, 2024

> @ldionne can you run this script and upload the changes to the CSV files? This makes it easier to review the changes.

Running the script currently doesn't produce any diff. I made some NFC tweaks to our CSV files so they would be consistent to the point where running the script doesn't produce any difference.
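
To illustrate why the CSV files had to be made consistent first, here is a minimal sketch of the load/write round trip using the same `csv` settings as the script (`QUOTE_ALL`, `\n` line terminator). The `round_trip` helper and the sample rows are made up for this example and work on in-memory strings rather than the real status files.

```python
import csv
import io


def round_trip(text: str) -> str:
    """Parse CSV text and write it back the way write_csv does (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=","))
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# A file with unquoted fields changes on the first pass...
unquoted = "Paper,Status\nP0001R1,|Complete|\n"
normalized = round_trip(unquoted)

# ...but an already-normalized file comes back byte-for-byte identical.
stable = round_trip(normalized) == normalized
```

Under these settings, any file that was not already fully quoted would show up as a diff on the first run even when no status changed, which is the kind of noise the NFC tweaks avoid.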

ldionne commented Aug 5, 2024

> For all the "non-special" ones (i.e. issue['status'] == ''), it might be worth considering inserting a link back to the Github issue in the status page. I was even thinking about doing this manually (because IMO it's a value add to be able to go from the status page to the issue where the approach for an implementation is discussed), but if this becomes scriptable, it'd be even easier. The sync logic would have to become a bit smarter, but nothing extraordinary.

I think this makes sense, however I would like to tackle this as a follow-up item. Indeed, ideally we would do this by adding a new column in the CSV file that contains a link to the Github issue. This is going to require a wholesale modification of the CSV files. I am also planning to retroactively create issues for already-implemented stuff so that the Github project view is exhaustive. Adding links to the GH issues in the CSV would be best done after this is all set up.
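
One possible shape for that follow-up is sketched below, purely as an assumption about how it could look. The `add_issue_column` helper, the "Github issue" header, the link text, and the `issue_urls` mapping from paper number to issue URL are all hypothetical; none of them exist in this PR.

```python
import re
from typing import Dict, List, Sequence

# Same first-column pattern as PaperInfo.from_csv_row.
PAPER_IN_CELL = re.compile(r"((P[0-9R]+)|(LWG[0-9]+)|(N[0-9]+))\s+")


def add_issue_column(
    rows: Sequence[Sequence[str]],
    issue_urls: Dict[str, str],
    header: str = "Github issue",  # hypothetical column name
) -> List[List[str]]:
    """Append one column holding an RST link to the tracking issue, if known."""
    result = [list(rows[0]) + [header]]
    for row in rows[1:]:
        link = ""
        # Separator rows have an empty first cell and just get an empty cell.
        match = PAPER_IN_CELL.search(row[0]) if row[0] else None
        if match is not None:
            url = issue_urls.get(match.group(1))
            if url is not None:
                link = f"`Tracking issue <{url}>`__"
        result.append(list(row) + [link])
    return result
```

Every row grows by exactly one cell, so separator rows and rows without a known issue stay aligned with the new header. `PaperInfo.for_printing` and `from_csv_row` would need to learn about the extra column as well, which is part of why this touches every CSV file.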

ldionne commented Aug 13, 2024

As discussed in the monthly meeting, merging.

@ldionne ldionne merged commit f117f0a into llvm:main Aug 13, 2024
50 of 53 checks passed
@ldionne ldionne deleted the review/synchronize-csv-files branch August 13, 2024 13:57