(dds_chapter)=
# Development Distribution Score

The distribution of responsibility and workload among contributors is essential to a stable community. A common way to estimate the distribution of work, knowledge and responsibility is the so-called [Bus Factor](https://en.wikipedia.org/wiki/Bus_factor). It is a measurement of the risk resulting from information and capabilities not being shared among team members, derived from the phrase "in case they get hit by a bus".

Quantifying a bus factor requires deep insight into a project's organisational structure. In this study, a proxy that can be computed at scale is developed instead: the Development Distribution Score (DDS). The DDS measures how development is distributed among project contributors by comparing the commits of the most active contributor with those of all contributors. The distribution of knowledge, work, and governance is vital to project sustainability: when a project or organisation experiences significant social or technological change (e.g., personnel leave a project or can no longer contribute), others need the knowledge and capacity to continue the initiative. The metric therefore indicates how strongly a project relies on a small group of contributors and, in turn, how resilient it is to change. Projects with a low DDS appear to be more vulnerable to decisions made by a single organisation or developer, which affect not only other developers and users but also the projects that depend on them.

**The commits of the strongest contributor are measured in relation to the total number of commits.** Although commit counts are not an absolute measure of an individual's performance within a project, they do reflect, in relative terms, how the work has been shared after a certain period of development. In addition, the score makes it possible to assess the state of a project on its own, without a direct comparison with other projects. The following formula shows the calculation of the DDS value:

```{figure} ../images/dds_calc.png
---
align: center
height: 100
---
```
**For instance, a DDS of 0.1 means that 90% of the commits come from a single developer.** Without the high engagement of that individual, it would be challenging for the rest of the community to maintain and further develop the existing code base. The following table shows the median DDS for several groups of projects in the dataset.
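
The relation described above can be illustrated with a minimal sketch. It assumes that the DDS is one minus the top contributor's share of all commits, which is what the 0.1 / 90% example implies. The function name and the sample commit log are hypothetical and not part of the study's code.

```python
from collections import Counter


def development_distribution_score(commit_authors):
    """Return 1 - (commits of the top contributor / total commits).

    `commit_authors` is an iterable with one author identifier per commit.
    This is an illustrative assumption about the formula, not the study's code.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one commit is required")
    top = max(counts.values())
    return 1 - top / total


# Hypothetical commit log: nine commits by one author, one by another.
authors = ["alice"] * 9 + ["bob"]
print(round(development_distribution_score(authors), 2))
```

In this sketch, a project with a single author scores 0, while two authors with equal commit counts score 0.5. The score approaches 1 only when many contributors each hold a small share.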

**Across all active and inactive projects, the median DDS is 0.3. This means that in half of the open source projects, a single developer contributes at least 70% of the commits.** For inactive projects, the value drops to 0.12, whereas active projects show a median DDS of 0.33. Higher values are found for projects within GitHub organisations, with a median DDS of 0.38. The top 50 projects ranked by stars have a median DDS of 0.4.

In [1]:
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px
from opensustain_template import *

In [2]:
df_active = pd.read_csv("../csv/project_analysis.csv")
df_raw = pd.read_csv("../csv/projects.csv")

In [6]:
df_personal_projects = df_active[df_active["organization"].isna()]
df_organization_projects = df_active[df_active["organization"].notna()]
df_inactive = df_raw[(df_raw["project_active"] == False)]
df_top_stargazers = df_active[(df_active["stargazers_count"] > 100)]

fig = go.Figure(
    data=[
        go.Table(
            header=dict(
                values=["Group", "Median DDS"],
                line_color="#000000",
                fill_color="#ffffff",
                font_size=18,
            ),
            cells=dict(
                line_color="#ffffff",
                fill_color="#ffffff",
                font_size=16,
                height=30,
                values=[
                    # Row labels; the order must match the medians listed below.
                    [
                        "All projects",
                        "Active projects in personal namespace",
                        "Active organisation projects",
                        "Active projects",
                        "Inactive projects",
                        "Active projects with more than 100 stars",
                        "50 active projects with the most contributors",
                    ],
                    [
                        round(df_raw["development_distribution_score"].median(),3),
                        round(df_personal_projects["development_distribution_score"].median(),3),
                        round(df_organization_projects["development_distribution_score"].median(),3),
                        round(df_active["development_distribution_score"].median(),3),
                        round(df_inactive["development_distribution_score"].median(),3),
                        round(df_top_stargazers["development_distribution_score"].median(),3),
                        round(df_active.nlargest(50, "contributors")["development_distribution_score"].median(),3)
                    ],
                ]
            ),
        )
    ]
)

fig.update_layout(
    width=800,
    height=255,
    margin=dict(l=20, r=0, b=0, t=20),
)

fig.show()

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: median-dds

Median Development Distribution Score within various groups of projects
```

In particular, the difference between inactive and active projects suggests that the DDS is an important indicator of the longevity of an open source project. It should be noted, however, that a high DDS is not advantageous in every case. [Brooks' Law](https://en.wikipedia.org/wiki/Brooks's_law) is an observation about software project management according to which "adding manpower to a late software project makes it later". Especially in projects of high complexity, very large teams quickly run into overhead and communication problems. Under these conditions, distributing the work among many contributors can become problematic. One solution is to split software projects into modular components that can be managed by smaller groups. This approach is known as the [Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy) in software development.
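
One way to see why communication overhead grows so quickly is to count potential pairwise communication channels, which grow quadratically with team size as n(n-1)/2. The sketch below only illustrates this argument. The function name is hypothetical, and it assumes every contributor may need to coordinate with every other one.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct contributor pairs in a team of `team_size` people."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Doubling the team roughly quadruples the number of potential channels.
for n in (5, 10, 20, 40):
    print(n, communication_channels(n))
```

Splitting a project into modules that are maintained by smaller groups keeps each group's pair count low, which is the motivation for the modular approach mentioned above.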

The 50 active projects with the most contributors show the highest median DDS, at 0.685, **illustrating that in large development communities, the work is more evenly distributed among individuals**.

In [4]:
max_age_in_years=10
fig = px.scatter(
    df_active.query("project_age_in_years<@max_age_in_years"),
    x="project_age_in_years",
    y="topic",
    size="size",
    color="development_distribution_score",
    custom_data=["project_name","oneliner","topic","git_url"],
    size_max=20,
)

fig.update_layout(
    coloraxis_colorbar=dict(
        title="DDS",
    ),
    yaxis_title=None,
    xaxis_title="Project Age in Years",
    height=1000,
    width=1200,
    title="Development Distribution Score within Topics",
    hoverlabel=dict(bgcolor="white"),
)
fig.update_traces(
    hovertemplate="<br>".join([
        "Project Name: <b>%{customdata[0]}</b>",
        "Project Info: <b>%{customdata[1]}</b>",
        "Topic: <b>%{customdata[2]}</b>",
        "Git URL: <b>%{customdata[3]}</b>"
    ])
)
fig.add_layout_image(
    dict(
        source=logo_img,
        xref="paper", yref="paper",
        x=1, y=1,
        sizex=0.05, sizey=0.05,
        xanchor="right", yanchor="top"
    )
)
fig['layout']['xaxis']['autorange'] = "reversed"

fig.show()

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: median-dds-overview

Development Distribution Score of all projects
```