# Ranking

Ranking all projects by a total score can provide a deeper understanding of the ecosystem. While quantifying the state and health of a project is challenging, a multidimensional index gives a more comprehensive picture. A repository's state is calculated by combining three dimensions: **Size**, **Community** and **Activity**. Each dimension is expressed as an index built from a project's rank relative to all other projects. The Activity score illustrates this best: each project is ranked according to the variables `Total Commits Last Year`, `Issues Closed Last Year`, `Days Until Last Issue Closed`, and `Last Release Date`, every rank is expressed as a percentile between 0 and 1, and the ranks are summed. The total score is the mean of the three dimension scores, each first divided by its maximum so that all dimensions carry equal weight. The [code cell below](#dimensions-and-calculation) shows how the ranking is calculated in detail.
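As a minimal sketch of the percentile ranking described above, the following toy example scores three hypothetical projects on two made-up indicators. The project names, column names and values are assumptions for illustration only and do not come from the dataset.

```python
import pandas as pd

# Hypothetical toy data: three projects, two illustrative indicators.
toy = pd.DataFrame(
    {
        "project_name": ["A", "B", "C"],
        "commits": [10, 200, 50],
        "issues_closed": [5, 1, 30],
    }
)

# rank(pct=True) converts each indicator into a relative rank between 0 and 1.
toy["score"] = toy["commits"].rank(pct=True) + toy["issues_closed"].rank(pct=True)

# Dividing by the maximum rescales the score so that the best project gets 1.
toy["score_normalised"] = toy["score"] / toy["score"].max()

print(toy.sort_values("score_normalised", ascending=False))
```

Project C comes out on top even though B has by far the most commits, because C ranks well on both indicators. This is the effect a rank-based index is meant to capture.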

Unlike [Stars](./popularity.ipynb), which provide insight into a project's overall popularity, this ranking surfaces less popular but otherwise strong projects. For example, larger projects like [EnergyPlus](https://github.com/NREL/EnergyPlus) move up to the top of the ranking. However, as with any index, there are limitations. Monolithic software developments are more likely to achieve a high score, so projects that take a modular approach (i.e., projects distributed across multiple repositories) may be significantly underrepresented.
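To make the contrast with a star-based ordering concrete, the sketch below compares the two rankings on a toy table. The `stars` column and all values are hypothetical and only illustrate one possible way to spot projects that place higher by total score than by popularity.

```python
import pandas as pd

# Hypothetical values; the "stars" column and all numbers are assumptions.
toy = pd.DataFrame(
    {
        "project_name": ["A", "B", "C", "D"],
        "stars": [5000, 120, 900, 40],
        "total_score": [0.70, 0.93, 0.88, 0.55],
    }
)

# Rank 1 is the best position in each ordering.
toy["stars_rank"] = toy["stars"].rank(ascending=False)
toy["score_rank"] = toy["total_score"].rank(ascending=False)

# Positive values mean the project places higher by total score than by stars.
toy["rank_gain"] = toy["stars_rank"] - toy["score_rank"]

print(toy.sort_values("rank_gain", ascending=False))
```

In this toy table, project B gains two positions when ranked by total score, while the most-starred project A loses two.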

The real value of such health analytics emerges when development and community data are combined with usage data. Unfortunately, usage data is currently only available to a limited extent via Python dependencies. Further work is required to extend usage metrics to other software package managers and to survey methods.

In [10]:
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px
from opensustain_template import *

In [11]:
df_active = pd.read_csv("../csv/project_analysis.csv")

In [12]:
df_total_score = df_active.nlargest(40, "total_score")

fig = px.bar(
    df_total_score,
    x=df_total_score["total_score"],
    y=df_total_score["project_name"],
    orientation="h",
    range_x=(0.85, 0.96),
    custom_data=["oneliner","topic","git_url"],
    color = df_total_score["development_distribution_score"],
    color_continuous_scale=color_continuous_scale
)

fig.update_layout(
    height=1000,
    width=800,
    xaxis_title="Total Score",
    yaxis_title=None,
    title="Top 40 Total Score",
    coloraxis_colorbar=dict(title="DDS"),
    hoverlabel=dict(bgcolor="white"),
)
fig.update(layout_showlegend=False)
fig['layout'].update(margin=dict(l=200,r=0,b=0,t=40))

fig.add_layout_image(
    dict(
        source=logo_img,
        xref="paper", yref="paper",
        x=1, y=0,
        sizex=0.05, sizey=0.05,
        xanchor="right", yanchor="bottom"
    )
)

fig.update_traces(
    hovertemplate="<br>".join([
        "Project Info: <b>%{customdata[0]}</b>",
        "Topic: <b>%{customdata[1]}</b>",
        "Git URL: <b>%{customdata[2]}</b>"
    ])
)
fig['layout']['yaxis']['autorange'] = "reversed"
fig.show()


```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: total-score
The 40 Projects with the highest total score
```

In [13]:
df_activity_score = df_active.nlargest(40, "activity")

fig = px.bar(
    df_activity_score,
    x=df_activity_score["activity"],
    y=df_activity_score["project_name"],
    orientation="h",
    custom_data=["oneliner","topic","git_url"],
    color = df_activity_score["development_distribution_score"],
    color_continuous_scale=color_continuous_scale,
    range_x=(2.8, 3.6)
)

fig.update_layout(
    height=1000,
    width=800,
    xaxis_title="Activity Score",
    yaxis_title=None,
    title="Top 40 Activity Score",
    coloraxis_colorbar=dict(title="DDS"),
    hoverlabel=dict(bgcolor="white"),
)
fig.update(layout_showlegend=False)
fig['layout'].update(margin=dict(l=200,r=0,b=0,t=40))

fig.add_layout_image(
    dict(
        source=logo_img,
        xref="paper", yref="paper",
        x=1, y=0,
        sizex=0.05, sizey=0.05,
        xanchor="right", yanchor="bottom"
    )
)

fig.update_traces(
    hovertemplate="<br>".join([
        "Project Info: <b>%{customdata[0]}</b>",
        "Topic: <b>%{customdata[1]}</b>",
        "Git URL: <b>%{customdata[2]}</b>"
    ])
)
fig['layout']['yaxis']['autorange'] = "reversed"
fig.show()

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: activity-score
The 40 Projects with the highest activity score
```

In [14]:
df_community_score = df_active.nlargest(40, "community")

fig = px.bar(
    df_community_score,
    x=df_community_score["community"],
    y=df_community_score["project_name"],
    orientation="h",
    range_x=(2.5, 3),
    custom_data=["oneliner","topic","git_url"],
    color = df_community_score["development_distribution_score"],
    color_continuous_scale=color_continuous_scale
)

fig.update_layout(
    height=1000,
    width=800,
    xaxis_title="Community Score",
    yaxis_title=None,
    title="Top 40 Community Score",
    coloraxis_colorbar=dict(title="DDS"),
    hoverlabel=dict(bgcolor="white"),
)
fig.update(layout_showlegend=False)

fig.add_layout_image(
    dict(
        source=logo_img,
        xref="paper", yref="paper",
        x=1, y=0,
        sizex=0.05, sizey=0.05,
        xanchor="right", yanchor="bottom"
    )
)

fig.update_traces(
    hovertemplate="<br>".join([
        "Project Info: <b>%{customdata[0]}</b>",
        "Topic: <b>%{customdata[1]}</b>",
        "Git URL: <b>%{customdata[2]}</b>"
    ])
)
fig['layout'].update(margin=dict(l=30,r=0,b=0,t=40))
fig['layout']['yaxis']['autorange'] = "reversed"
fig.show()

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: community-score
The 40 Projects with the highest community score
```

In [15]:
df_size_score = df_active.nlargest(40, "size")

fig = px.bar(
    df_size_score,
    x=df_size_score["size"],
    y=df_size_score["project_name"],
    orientation="h",
    range_x=(3.7, 4),
    custom_data=["oneliner","topic","git_url"],
    color = df_size_score["development_distribution_score"],
    color_continuous_scale=color_continuous_scale
)

fig.update_layout(
    height=1000,
    width=800,
    xaxis_title="Size Score",
    yaxis_title=None,
    title="Top 40 Size Score",
    coloraxis_colorbar=dict(title="DDS"),
    hoverlabel=dict(bgcolor="white"),
)
fig.update(layout_showlegend=False)
fig.add_layout_image(
    dict(
        source=logo_img,
        xref="paper", yref="paper",
        x=1, y=0,
        sizex=0.05, sizey=0.05,
        xanchor="right", yanchor="bottom"
    )
)

fig.update_traces(
    hovertemplate="<br>".join([
        "Project Info: <b>%{customdata[0]}</b>",
        "Topic: <b>%{customdata[1]}</b>",
        "Git URL: <b>%{customdata[2]}</b>"
    ])
)
fig['layout'].update(margin=dict(l=30,r=0,b=0,t=40))
fig['layout']['yaxis']['autorange'] = "reversed"
fig.show()

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: size-score

The 40 Projects with the highest size score
```

## Dimensions and calculation

```python
# Each project is ranked according to different indicators in the dimensions of community, activity and size.
# rank(pct=True) returns a percentile rank: 1 represents the highest rank and values close to 0 the lowest.
# The individual values are summed within each dimension to create that dimension's score.
df_active["activity"] = (
    df_active["total_commits_last_year"].rank(pct=True)
    + df_active["issues_closed_last_year"].rank(pct=True)
    + df_active["days_until_last_issue_closed"].rank(pct=True)
    + df_active["last_released_date"].rank(pct=True, na_option="top")
)

df_active["community"] = (
    df_active["contributors"].rank(pct=True)
    + df_active["development_distribution_score"].rank(pct=True)
    + df_active["reviews_per_pr"].rank(pct=True)
)

df_active["size"] = (
    df_active["total_number_of_commits"].rank(pct=True)
    + df_active["contributors"].rank(pct=True)
    + df_active["closed_issues"].rank(pct=True)
    + df_active["closed_pullrequests"].rank(pct=True)
)

# Each dimension score is divided by its maximum and the three results are averaged, so the total score cannot exceed 1.
df_active["total_score"] = (
    df_active["activity"] / df_active["activity"].max()
    + df_active["community"] / df_active["community"].max()
    + df_active["size"] / df_active["size"].max()
) / 3
```
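The `na_option="top"` argument used for `last_released_date` deserves a short illustration. The sketch below uses three hypothetical projects with made-up release dates; with ascending ranks, missing values are placed at the low end, so a project without any release receives the smallest rank on this indicator.

```python
import pandas as pd

# Hypothetical release dates; None marks a project without any release.
toy = pd.DataFrame(
    {
        "project_name": ["recent", "no_release", "old"],
        "last_released_date": pd.to_datetime(["2024-05-01", None, "2021-01-15"]),
    }
)

# na_option="top" assigns missing values the smallest ranks when ranking ascending,
# so the project without a release ends up below the one with the oldest release.
toy["release_rank"] = toy["last_released_date"].rank(pct=True, na_option="top")

print(toy.sort_values("release_rank"))
```

Without this option, the default `na_option="keep"` would leave the rank missing, and the sum for the Activity score would then be missing as well.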