---
title: pyOpenSci Current Software Review Stats
subtitle: pyOpenSci Peer Review Summary Stats
author:
  - name: Leah Wasser
    affiliations: pyOpenSci
    orcid: 0000-0002-7859-8394
    email: leah@pyopensci.org
license:
  code: MIT
date: 2024/06/20
---


* https://github.com/ryantam626/jupyterlab_code_formatter

This is a workflow that collates all GitHub issues associated with our reviews.

Questions I have:

* How to add figure captions and alt text

```python
import warnings
from datetime import datetime, timezone

import altair as alt
import pandas as pd
import pytz
from pyosmeta import ProcessIssues
from pyosmeta.github_api import GitHubAPI

# Suppress all warnings
warnings.filterwarnings("ignore")
```

## Open reviews

```python
# Get a list of reviews submitted to us
# This potentially doesn't include issues that were deemed out of scope...
github_api = GitHubAPI(
    org="pyopensci",
    repo="software-submission",
    labels=[
        "0/seeking-editor",
        "1/editor-assigned",
        "2/seeking-reviewers",
        "3/reviewers-assigned",
        "4/reviews-in-awaiting-changes",
        "5/awaiting-reviewer-response",
        "7/under-joss-review",
        "8/joss-review-complete",
        "New Submission!",
    ],
)
process_review = ProcessIssues(github_api)
issues = process_review.get_issues()
reviews, errors = process_review.parse_issues(issues)
review_table = [
    {
        "package_name": name,
        "created_at": review.created_at,
        "date_closed": review.closed_at,
        "editor": review.editor.github_username,
        # "editor": review.editor.name,
        "labels": review.labels,
    }
    for name, review in reviews.items()
]
```

### Summary

TODO: calculate the number of reviews in active review, in pre-review, and in JOSS review.

pyOpenSci currently has **{eval}`total_open_reviews`** total open submissions.

* x of these are in active review
* x of these are in pre-review
* x of these are being submitted to JOSS
* x of these have been approved and are a part of our ecosystem


## Current open reviews & total days open


```python
# Build a DataFrame of all reviews and keep only those that are still open.
# .copy() avoids pandas' SettingWithCopyWarning when columns are added below.
reviews_df = pd.DataFrame(review_table)
open_reviews = reviews_df[reviews_df["date_closed"].isna()].copy()
total_open_reviews = len(open_reviews)

# Whole days each review has been open, measured in UTC
today = datetime.now(timezone.utc)
open_reviews["days_open"] = (today - open_reviews["created_at"]).dt.days
open_reviews = open_reviews.drop(columns=["date_closed"])
open_reviews["created_at"] = open_reviews["created_at"].dt.date
open_reviews
```

|   | package_name | created_at | editor | labels | days_open |
|---|---|---|---|---|---|
| 0 | MontePy | 2024-07-01 | TBD | [0/pre-review-checks, New Submission!] | 12 |
| 1 | Great Tables | 2024-06-14 | TBD | [0/seeking-editor] | 29 |
| 2 | Stingray | 2024-06-14 | hamogu | [2/seeking-reviewers, astropy] | 29 |
| 3 | Fluidimage | 2024-05-30 | TBD | [0/seeking-editor] | 44 |
| 4 | astrodata | 2024-05-13 | hamogu | [3/reviewers-assigned, astropy] | 61 |
| 5 | QuadratiK | 2024-05-13 | isabelizimm | [1/editor-assigned] | 61 |
| 6 | PyPartMC | 2024-05-03 | TBD | [0/seeking-editor] | 71 |
| 7 | ANDES | 2024-04-22 | TBD | [0/pre-review-checks, New Submission!, on-hold] | 82 |
| 8 | CyNetDiff | 2024-04-22 | sneakers-the-rat | [1/editor-assigned] | 83 |
| 9 | AMS | 2024-04-22 | TBD | [2/seeking-reviewers, 1/editor-assigned] | 83 |
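The summary bullets still hold placeholder counts. One way to fill them is to map each review's numbered status label (the `N/...` labels in the table) to a stage and count the results. This is a minimal sketch under stated assumptions: the grouping in `STAGE_BY_PREFIX` (0 to 2 as pre-review, 3 to 5 as active review, 7 and 8 as JOSS review) is a hypothetical choice, `review_stage` and `count_stages` are illustrative helpers rather than part of pyosmeta, and each entry of the `labels` column is assumed to be a list of label-name strings. When a review carries several numbered labels, the most advanced one wins.

```python
from collections import Counter

# Hypothetical grouping of numbered status labels into review stages
STAGE_BY_PREFIX = {
    "0": "pre-review",
    "1": "pre-review",
    "2": "pre-review",
    "3": "active review",
    "4": "active review",
    "5": "active review",
    "7": "JOSS review",
    "8": "JOSS review",
}


def review_stage(labels: list[str]) -> str:
    """Return the stage of the most advanced numbered label, or "unknown"."""
    # Numbered labels look like "3/reviewers-assigned"; keep just the number
    prefixes = [label.split("/", 1)[0] for label in labels if "/" in label]
    prefixes = [prefix for prefix in prefixes if prefix in STAGE_BY_PREFIX]
    if not prefixes:
        return "unknown"
    # Prefixes are single digits, so string comparison orders them correctly
    return STAGE_BY_PREFIX[max(prefixes)]


def count_stages(labels_per_review) -> Counter:
    """Count how many reviews fall into each stage."""
    return Counter(review_stage(labels) for labels in labels_per_review)
```

In the notebook this could be called as `count_stages(open_reviews["labels"])`. The "approved" bullet is not covered, because none of the labels queried above marks an approved package.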


```python
# Get Presubmission inquiries
gh_presubmissions = GitHubAPI(
    org="pyopensci", repo="software-submission", labels=["presubmission"]
)
process_review = ProcessIssues(gh_presubmissions)
pre_issues = process_review.get_issues()
pre_submissions, errors = process_review.parse_issues(pre_issues)
pre_submission_table = [
    {
        "package_name": name,
        "created_at": review.created_at,
        "date_closed": review.closed_at,
        "labels": review.labels,
    }
    for name, review in pre_submissions.items()
]
```

```python
presubmission_df = pd.DataFrame(pre_submission_table)
all_presubmissions = len(presubmission_df)
```

## All presubmissions

There are **{eval}`all_presubmissions`** total presubmissions to date, including closed presubmissions. 

```python
# Get all currently open presubmissions.
# .copy() avoids pandas' SettingWithCopyWarning when columns are added below.
open_presubmissions = presubmission_df[presubmission_df["date_closed"].isna()].copy()
today = datetime.now(timezone.utc)
open_presubmissions["days_open"] = (today - open_presubmissions["created_at"]).dt.days
open_presubmissions["created_at"] = open_presubmissions["created_at"].dt.date

total_open = len(open_presubmissions)

# Show the newest inquiries first; reset the index after sorting so it runs 0..n-1
open_presubmissions = (
    open_presubmissions.sort_values(by="created_at", ascending=False)
    .drop(columns=["date_closed"])
    .reset_index(drop=True)
)
```

## Currently open software presubmission inquiries

* It could be useful to grab the most recent comment on each inquiry.
* It would also be useful to grab the GitHub usernames of everyone involved in each discussion and credit them. For example, on one inquiry I see the astropy editors and Alex involved.

There are **{eval}`total_open` presubmission requests** currently open.

```python
# Render table of strictly open presubmissions
open_presubmissions
```

|   | package_name | created_at | labels | days_open |
|---|---|---|---|---|
| 0 | Solar Data Tools | 2024-06-28 | ['presubmission'] | 15 |
| 1 | gentropy | 2024-05-24 | ['presubmission'] | 50 |
| 2 | GALAssify | 2024-05-24 | ['presubmission'] | 50 |
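Crediting everyone involved in a presubmission discussion requires the comment authors of each issue. A minimal sketch, assuming you have each inquiry's issue number (this notebook does not currently collect it): `fetch_issue_comments` calls the GitHub REST endpoint `GET /repos/{owner}/{repo}/issues/{issue_number}/comments`, and `unique_commenters` reduces the returned comments to their `user.login` values. Both helper names are hypothetical. The request is unauthenticated (so heavily rate limited) and reads only the first page of up to 100 comments; the person who opened the issue appears only if they also commented.

```python
import json
import urllib.request


def unique_commenters(comments: list[dict]) -> list[str]:
    """Return comment authors' GitHub logins in first-seen order, without duplicates."""
    seen: dict[str, None] = {}
    for comment in comments:
        # Deleted accounts can come back with "user": null
        user = comment.get("user") or {}
        login = user.get("login")
        if login:
            seen.setdefault(login, None)
    return list(seen)


def fetch_issue_comments(owner: str, repo: str, issue_number: int) -> list[dict]:
    """Fetch the first page (up to 100) of comments on one issue.

    Unauthenticated and without pagination: a sketch, not production code.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/issues/"
        f"{issue_number}/comments?per_page=100"
    )
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would look like `unique_commenters(fetch_issue_comments("pyopensci", "software-submission", issue_number))`. Comments come back oldest first, so the last element of the first page is the most recent comment when an issue has fewer than 100.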


## Current open software review submissions

Next, we explore the currently open issues.



## Available editors

```python
# Static list of all editors, updated 7/13/2024
# TODO: get this list of current editors dynamically
all_editors = [
    "cmarmo",
    "dhomeier",
    "ocefpaf",
    "NikleDave",
    "SimonMolinsky",
    "Batalex",
    "sneakers-the-rat",
    "tomalrussel",
    "ctb",
    "mjhajharia",
    "hamogu",
    "isabelizimm",
    "yeelauren",
    "banesullivan",
]
# Every editor starts with zero open submissions
submissions_per_editor = {editor: 0 for editor in all_editors}

# Editors currently assigned to an open submission ("TBD" means no editor yet)
busy_editors = open_reviews.loc[open_reviews["editor"] != "TBD", "editor"]

# Populate dictionary of number of open submissions per editor
for editor in busy_editors:
    if editor not in submissions_per_editor:
        # Editor has an assigned project but is missing from the master list;
        # add them so they are still counted. TODO: output to an error log
        submissions_per_editor[editor] = 0
    submissions_per_editor[editor] += 1

# Render table of all editors and their number of open submissions
editor_activity_df = pd.DataFrame(
    list(submissions_per_editor.items()), columns=["editor", "num_submissions"]
)
editor_activity_df = editor_activity_df.sort_values(by="num_submissions")
editor_activity_df.reset_index(drop=True, inplace=True)

# Get counts of available and unavailable editors
num_busy_editors = int((editor_activity_df["num_submissions"] > 0).sum())
num_available_editors = len(editor_activity_df) - num_busy_editors
```

There are currently **{eval}`num_available_editors` available editors** and **{eval}`num_busy_editors` editors who are assigned to a submission**.


```python
# Display editor table
editor_activity_df
```

|   | editor | num_submissions |
|---|---|---|
| 0 | ocefpaf | 0 |
| 1 | NikleDave | 0 |
| 2 | SimonMolinsky | 0 |
| 3 | tomalrussel | 0 |
| 4 | mjhajharia | 0 |
| 5 | banesullivan | 0 |
| 6 | cmarmo | 1 |
| 7 | dhomeier | 1 |
| 8 | Batalex | 1 |
| 9 | ctb | 1 |
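The per-editor dictionary loop can also be written with pandas alone. This is a sketch of an alternative, not how the notebook currently computes the table: `editor_workload` is a hypothetical helper that counts assignments with `value_counts`, then uses `reindex` with `fill_value=0` so editors without an open submission still appear. Editors who are assigned but missing from the master list are appended rather than dropped. It uses a stable sort so editors with equal counts keep their master-list order.

```python
import pandas as pd


def editor_workload(all_editors: list[str], assigned: pd.Series) -> pd.DataFrame:
    """Count open submissions per editor, including editors with zero.

    `assigned` holds one editor username per open review; "TBD" marks
    reviews without an editor and is excluded from the counts.
    """
    counts = assigned[assigned != "TBD"].value_counts()
    # Master-list editors first, then any assigned editor missing from the list
    editors = list(dict.fromkeys([*all_editors, *counts.index]))
    workload = counts.reindex(editors, fill_value=0)
    return (
        workload.rename_axis("editor")
        .reset_index(name="num_submissions")
        .sort_values(by="num_submissions", kind="stable")
        .reset_index(drop=True)
    )
```

In the notebook this would be `editor_workload(all_editors, open_reviews["editor"])`, and `(df["num_submissions"] > 0).sum()` on the result gives the number of busy editors.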


```python
# Get all currently closed / approved issues

# Calculate the time that they were in review.
```
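The time-in-review calculation for closed issues is still a stub. A minimal sketch, assuming a DataFrame shaped like `reviews_df` (timezone-aware `created_at` and `date_closed` columns, with a missing `date_closed` for open reviews): `review_durations` is a hypothetical helper that keeps closed rows and subtracts the two timestamps. Note that `reviews_df` was built only from open-status labels, so approved reviews would need their own `GitHubAPI` query with whichever labels mark them (not shown here). Creation-to-close time is also only a proxy, since it includes pre-review time.

```python
import pandas as pd


def review_durations(reviews: pd.DataFrame) -> pd.DataFrame:
    """Return closed reviews with the number of whole days each spent open.

    Expects timezone-aware "created_at" and "date_closed" columns, with a
    missing "date_closed" for reviews that are still open.
    """
    closed = reviews[reviews["date_closed"].notna()].copy()
    closed["days_in_review"] = (closed["date_closed"] - closed["created_at"]).dt.days
    # Longest reviews first
    return closed.sort_values(by="days_in_review", ascending=False).reset_index(drop=True)
```

Summary statistics then follow directly, for example `review_durations(df)["days_in_review"].median()`.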