---
title: pyOpenSci Current Software Review Stats
subtitle: pyOpenSci Peer Review Summary Stats
author:
  - name: Leah Wasser
    affiliations: pyOpenSci
    orcid: 0000-0002-7859-8394
    email: leah@pyopensci.org
license:
  code: MIT
date: 2024/06/20
---


* https://github.com/ryantam626/jupyterlab_code_formatter

This is a workflow that collates all GitHub issues associated with our reviews.

Questions I have

* How to add figure captions and alt text

In [1]:

from datetime import datetime

import altair as alt
import pandas as pd

from pyosmeta import ProcessIssues
from pyosmeta.github_api import GitHubAPI

In [2]:
def parse_single_issue(issue) -> dict:
    """
    Parse a single issue from the GitHub API response.

    Parameters
    ----------
    issue : dict
        Dictionary containing information about a single issue.

    Returns
    -------
    dict
        Dictionary containing parsed information about the issue.
    """
    parsed_issue = {}

    # Extract labels
    parsed_issue["labels"] = [label["name"] for label in issue.get("labels", [])]

    # Extract header text (title of the issue)
    parsed_issue["header_text"] = issue.get("title", "")

    # Extract date opened
    parsed_issue["date_opened"] = datetime.strptime(
        issue.get("created_at"), "%Y-%m-%dT%H:%M:%SZ"
    )

    # Extract date closed (if available)
    if issue.get("closed_at"):
        parsed_issue["date_closed"] = datetime.strptime(
            issue.get("closed_at"), "%Y-%m-%dT%H:%M:%SZ"
        )
        # Calculate total time issue was open
        time_open = parsed_issue["date_closed"] - parsed_issue["date_opened"]
        parsed_issue["time_open_days"] = time_open.total_seconds() / (60 * 60 * 24)
    else:
        parsed_issue["date_closed"] = None
        parsed_issue["time_open_days"] = None

    return parsed_issue
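
As a quick sanity check of the date handling above, here is a minimal sketch that applies the same timestamp format and day calculation to a single hand-written issue. The `sample_issue` dictionary and the `GITHUB_TIME_FORMAT` name are assumptions made for illustration. The payload only mimics the keys the function reads, with timestamps matching the property-utils row in the output further down.

```python
from datetime import datetime

# Same format string used in parse_single_issue
GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Hypothetical, hand-written payload containing only the keys read above
sample_issue = {
    "title": "Presubmission Inquiry for property-utils",
    "labels": [{"name": "presubmission"}],
    "created_at": "2024-05-24T10:00:23Z",
    "closed_at": "2024-05-28T19:20:05Z",
}

# Flatten the label objects down to their names
labels = [label["name"] for label in sample_issue.get("labels", [])]

# Parse both timestamps and convert the difference to fractional days
date_opened = datetime.strptime(sample_issue["created_at"], GITHUB_TIME_FORMAT)
date_closed = datetime.strptime(sample_issue["closed_at"], GITHUB_TIME_FORMAT)
time_open_days = (date_closed - date_opened).total_seconds() / (60 * 60 * 24)

print(labels, round(time_open_days, 6))
```

The result should agree with the `time_open_days` value shown for that row in the table.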

In [3]:
# Get all issues from the GitHub software-submission repo and build a DataFrame
# with labels, title, dates opened and closed, and total time open in days
github_api = GitHubAPI(
    org="pyopensci",
    repo="software-submission",
)

process_review = ProcessIssues(github_api)
issues = process_review.return_response()

all_issues = []
for issue in issues:
    all_issues.append(parse_single_issue(issue))

df = pd.DataFrame(all_issues)

# Remove issues that are unlabeled or say help wanted
valid_issues = df[
    ~(
        (df["labels"].apply(len) == 0)
        | df["labels"].apply(lambda x: "help wanted" in x or "Help Request" in x)
    )
]

# Total presubmissions: all pre-submission inquiries (all time)
total_presubmissions = valid_issues[
    valid_issues["labels"].apply(lambda x: "presubmission" in x)
]

total_presubmissions.head()


Unnamed: 0,labels,header_text,date_opened,date_closed,time_open_days
0,[presubmission],Presubmission Inquiry for MontePy,2024-06-17 18:07:13,NaT,
7,[presubmission],Presubmission inquiry for Stingray,2024-06-01 19:58:40,NaT,
12,[presubmission],Presubmission Inquiry for gentropy,2024-05-24 14:41:14,NaT,
13,[presubmission],Presubmission Inquiry for GALAssify: A Python ...,2024-05-24 10:37:47,NaT,
14,[presubmission],Presubmission Inquiry for property-utils,2024-05-24 10:00:23,2024-05-28 19:20:05,4.388681
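
To make the label filtering easier to reason about, here is a small sketch that runs the same boolean masks on a toy DataFrame. The rows and the names `toy`, `unlabeled`, and `help_request` are made up for illustration. Only the `labels` column matters for the filter.

```python
import pandas as pd

# Hypothetical rows; each "labels" entry is a list of label names
toy = pd.DataFrame(
    {
        "header_text": ["A", "B", "C", "D"],
        "labels": [
            ["presubmission"],
            [],
            ["help wanted"],
            ["1/editor-assigned"],
        ],
    }
)

# Mask 1: issues with no labels at all
unlabeled = toy["labels"].apply(len) == 0
# Mask 2: issues tagged as help requests
help_request = toy["labels"].apply(
    lambda x: "help wanted" in x or "Help Request" in x
)

# Keep everything that matches neither mask
valid = toy[~(unlabeled | help_request)]

# Split the remaining issues into presubmissions and full submissions
is_presubmission = valid["labels"].apply(lambda x: "presubmission" in x)
presubmissions = valid[is_presubmission]
submissions = valid[~is_presubmission]

print(presubmissions["header_text"].tolist(), submissions["header_text"].tolist())
```

Naming the masks separately makes it easier to check each condition on its own before combining them.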


## Currently open presubmissions

Can some of these be closed?

* It could be useful to grab the most recent comments on each.
* It would also be useful to grab the GitHub usernames of everyone involved in each discussion and credit them. For example, in one inquiry I see the astropy editors and Alex involved.
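
One possible way to approach both ideas, sketched under assumptions: GitHub's REST API returns issue comments as objects with `user.login`, `created_at`, and `body` fields. The `summarize_comments` helper and the sample comments below are hypothetical. Fetching the comments is left out because I have not checked what pyosmeta offers for that.

```python
from datetime import datetime


def summarize_comments(comments):
    """Return the sorted unique commenter logins and the most recent comment."""
    participants = sorted({c["user"]["login"] for c in comments})
    latest = max(
        comments,
        key=lambda c: datetime.strptime(c["created_at"], "%Y-%m-%dT%H:%M:%SZ"),
        default=None,  # no comments yet
    )
    return participants, latest


# Hypothetical comments with made-up logins, shaped like GitHub comment objects
sample_comments = [
    {
        "user": {"login": "editor-a"},
        "created_at": "2024-06-01T20:15:00Z",
        "body": "Thanks for the inquiry!",
    },
    {
        "user": {"login": "maintainer-b"},
        "created_at": "2024-06-03T09:30:00Z",
        "body": "Happy to clarify the scope.",
    },
    {
        "user": {"login": "editor-a"},
        "created_at": "2024-06-02T11:00:00Z",
        "body": "Could you describe the scope?",
    },
]

participants, latest = summarize_comments(sample_comments)
print(participants, latest["body"])
```

The participant list could feed the crediting idea, and the date of the latest comment could help decide whether an inquiry has gone quiet.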


In [4]:
# Get all currently open presubmissions 
open_presubmissions = total_presubmissions[total_presubmissions['date_closed'].isna()]
open_presubmissions

Unnamed: 0,labels,header_text,date_opened,date_closed,time_open_days
0,[presubmission],Presubmission Inquiry for MontePy,2024-06-17 18:07:13,NaT,
7,[presubmission],Presubmission inquiry for Stingray,2024-06-01 19:58:40,NaT,
12,[presubmission],Presubmission Inquiry for gentropy,2024-05-24 14:41:14,NaT,
13,[presubmission],Presubmission Inquiry for GALAssify: A Python ...,2024-05-24 10:37:47,NaT,
18,[presubmission],Presubmission Inquiry for Great Tables,2024-05-23 20:25:47,NaT,
50,"[presubmission, ⌛ pending-maintainer-response]",WasteAndMaterialFootprint - presubmission enquiry,2023-12-30 17:23:18,NaT,
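
To help decide which inquiries might be ready to close, a rough sketch of ranking open presubmissions by age. The `as_of` reference date and the `days_open` column are assumptions made for illustration. The three rows are copied from the tables above.

```python
import pandas as pd

# Three rows copied from the output above; None becomes NaT (still open)
rows = pd.DataFrame(
    {
        "header_text": [
            "Presubmission Inquiry for MontePy",
            "Presubmission Inquiry for property-utils",
            "WasteAndMaterialFootprint - presubmission enquiry",
        ],
        "date_opened": pd.to_datetime(
            ["2024-06-17 18:07:13", "2024-05-24 10:00:23", "2023-12-30 17:23:18"]
        ),
        "date_closed": pd.to_datetime([None, "2024-05-28 19:20:05", None]),
    }
)

# Open issues are the ones with no closing date
still_open = rows[rows["date_closed"].isna()].copy()

# Assumed reference date; pd.Timestamp.now() would give the live value
as_of = pd.Timestamp("2024-06-20")
still_open["days_open"] = (as_of - still_open["date_opened"]).dt.days

# Oldest inquiries first
oldest_first = still_open.sort_values("days_open", ascending=False)
print(oldest_first[["header_text", "days_open"]])
```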


In [13]:
total_submissions = valid_issues[
    valid_issues["labels"].apply(lambda x: "presubmission" not in x)
]

open_submissions = total_submissions[total_submissions['date_closed'].isna()]
total_open = len(open_submissions)
open_submissions.head()

Unnamed: 0,labels,header_text,date_opened,date_closed,time_open_days
1,"[0/pre-review-checks, New Submission!]",Great Tables submission,2024-06-14 19:55:59,NaT,
2,"[0/pre-review-checks, New Submission!]",Stingray Submission,2024-06-14 12:59:47,NaT,
8,"[0/pre-review-checks, New Submission!]",Fluidimage submission,2024-05-30 12:53:48,NaT,
21,"[3/reviewers-assigned, astropy]",astrodata,2024-05-13 23:48:03,NaT,
22,[1/editor-assigned],QuadratiK Submission,2024-05-13 21:23:44,NaT,


## Open Issues 

Next we explore the currently open issues.

pyOpenSci currently has **{eval}`total_open`** total open submissions.

* x of these are in active review
* x of these are in pre-review
* x of these are being submitted to JOSS
* x of these have been approved and are a part of our ecosystem
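
A possible starting point for filling in the counts above is to tally labels across the open submissions. This is only a sketch. The label lists are copied from the five open-submission rows displayed earlier, and treating a label that starts with `0/` as "pre-review" is my assumption about how the labels map to stages.

```python
from collections import Counter

# Label lists copied from the open-submission rows displayed earlier
open_submission_labels = [
    ["0/pre-review-checks", "New Submission!"],
    ["0/pre-review-checks", "New Submission!"],
    ["0/pre-review-checks", "New Submission!"],
    ["3/reviewers-assigned", "astropy"],
    ["1/editor-assigned"],
]

# Count how often each label appears across all open submissions
label_counts = Counter(
    label for labels in open_submission_labels for label in labels
)

# Assumption: a label starting with "0/" marks the pre-review stage
in_pre_review = sum(
    any(label.startswith("0/") for label in labels)
    for labels in open_submission_labels
)

print(label_counts.most_common())
print(in_pre_review)
```

In the notebook, the same tally should work on `open_submissions["labels"]`, since iterating over that column yields one list of labels per issue.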