# Overview

At the time of writing, 1339 projects have been identified worldwide. Of these, 1188 projects are hosted on GitHub, 27 on GitLab and 125 on other websites or self-hosted Git platforms.

**A total of 996 active project repositories have been found**. A project is considered active if the public repository has at least one commit or closed issue within the last year. Inactive projects, or those that have become inactive since data collection began two years ago (192), have been excluded from our analysis to prevent distortion of current trends. Unless otherwise noted, all plots in this study refer to active projects. <!-- The statistics on all active and inactive projects in the table below are based on the raw dataset. -->
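
The activity criterion above can be expressed as a simple boolean filter. The following is a minimal sketch on hypothetical toy data, assuming that the yearly counts in `total_commits_last_year` and `issues_closed_last_year` are what decides activity. In the actual dataset the `project_active` flag is already precomputed, so the name `is_active` and the derivation shown here are illustrative only.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror those used in projects.csv.
toy = pd.DataFrame(
    {
        "project_name": ["alpha", "beta", "gamma"],
        "total_commits_last_year": [12, 0, 0],
        "issues_closed_last_year": [0, 3, 0],
    }
)

# Assumption: a project is active if it had at least one commit
# or at least one closed issue within the last year.
toy["is_active"] = (toy["total_commits_last_year"] > 0) | (
    toy["issues_closed_last_year"] > 0
)

active = toy[toy["is_active"]]
print(active["project_name"].tolist())
```

Only the project without any commit or closed issue in the last year is dropped by this filter.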

In [32]:
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px
from opensustainTemplate import *

In [33]:
df_raw = pd.read_csv("../csv/projects.csv")
df_raw.rename(columns={"rubric": "topic"}, inplace=True)
df_raw.rename(columns={"topics": "labels"}, inplace=True)

# This project is listed twice in the database
df_raw = df_raw[
    df_raw["git_url"] != "https://github.com/openfoodfacts/openfoodfacts-server.git"
]


In [34]:
df_raw["project_age_in_days"] = df_raw["project_age_in_days"].fillna(0)

In [35]:
# Project age is easier to read in years than in days
df_raw["project_age_in_years"] = df_raw["project_age_in_days"].apply(lambda x: int(x) / 365)
max_age_in_years = 8.0

In [36]:
fig = go.Figure(
    data=[
        go.Table(
            columnwidth=[100, 30],
            header=dict(
                values=["Dimension", "Value"],
                line_color="#000000",
                fill_color="#ffffff",
                font_size=18,
            ),
            cells=dict(
                fill_color="#ffffff",
                line_color="#ffffff",
                font_size=16,
                height=30,
                values=[
                    [
                        "Total number of projects",
                        "GitHub projects",
                        "GitLab projects",
                        "Other platforms",
                        "Number of projects in personal namespace",
                        "Number of projects in community namespace",
                        "Total stars of all projects",
                        "Total contributors in all projects",
                        "Active GitHub projects",
                        "Inactive GitHub projects",
                        "Projects with contribution guide in %",
                        "Projects with code of conduct in %",
                        "Projects accepting donations in %",
                        "Median number of commits",
                        "Median stargazers",
                        "Median stars last year",
                        "Median Development Distribution Score",
                        "Median number of contributors",
                        "Median closed issues last year",
                        "Median commits last year",
                        "Median age in years",
                    ],
                    [
                        df_raw["project_name"].count(),
                        df_raw["platform"].value_counts()["github"],
                        df_raw["platform"].value_counts()["gitlab"],
                        df_raw["platform"].value_counts()["custom"],
                        df_raw["project_name"].count() - df_raw["organization"].count(),
                        df_raw["organization"].count(),
                        df_raw["stargazers_count"].sum(),
                        df_raw["contributors"].sum(),
                        df_raw["project_active"].value_counts()[True],
                        df_raw["project_active"].value_counts()[False],
                        round(
                            df_raw["contribution_guide"].value_counts(normalize=True)[
                                True
                            ]
                            * 100,
                            2,
                        ),
                        round(
                            df_raw["code_of_conduct"].value_counts(normalize=True)[True]
                            * 100,
                            2,
                        ),
                        round(
                            df_raw["accepts_donations"].value_counts(normalize=True)[
                                True
                            ]
                            * 100,
                            2,
                        ),
                        df_raw["total_number_of_commits"].median(),
                        df_raw["stargazers_count"].median(),
                        df_raw["stars_last_year"].median(),
                        round(df_raw["development_distribution_score"].median(), 4),
                        df_raw["contributors"].median(),
                        df_raw["issues_closed_last_year"].median(),
                        df_raw["total_commits_last_year"].median(),
                        round(df_raw["project_age_in_years"].median(), 2),
                    ],
                ],
            ),
        )
    ]
)


fig["layout"].update(margin=dict(l=5, r=5, b=0, t=5))
fig.update_layout(height=700, dragmode=False)
config = {
    'toImageButtonOptions': {
        'format': 'svg',  # one of png, svg, jpeg, webp
    },
    'responsive': True,  # boolean, not the string 'true'
}
fig.show(config=config)

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: statistics-all-projects

<br/>Statistics on all active and inactive projects
```
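
The percentage rows in the table above are computed with `value_counts(normalize=True)`, which by default ignores missing values. The share is therefore relative to the projects with a known value, not to all projects. The sketch below illustrates the difference on hypothetical flags; whether the real columns contain missing values is an assumption, and the names `share_of_known` and `share_of_all` are made up for this example.

```python
import pandas as pd

# Hypothetical flags; None marks a project whose status is unknown.
flags = pd.Series([True, False, True, None, True], dtype="object")

# Share among projects with a known value. This is the denominator that
# value_counts(normalize=True) uses, because it drops missing values.
known = flags.dropna().astype(bool)
share_of_known = round(known.mean() * 100, 2)

# Share among all projects, counting unknown as "not present".
share_of_all = round(known.sum() / len(flags) * 100, 2)

print(share_of_known, share_of_all)
```

If the two numbers differ noticeably for a real column, it is worth stating in the table which denominator is meant.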

In [37]:
df_active = df_raw.copy()
# Filter out the inactive projects for further analysis
df_active = df_active[(df_active["project_active"] == True)]
# Curated lists are not classical open source projects and are excluded from the analysis
df_active = df_active[(df_active["topic"] != "Curated Lists")]
# At the time of data processing, only one project was active in this topic.
df_active = df_active[(df_active["topic"] != "Production and Industry")]

# Filter out the projects not on the GitHub platform
df_active = df_active[(df_active["platform"] == "github")]
df_active["project_name"] = df_active["project_name"].replace(
    {
        "A Global Inventory of Commerical-, Industrial-, and Utility-Scale Photovoltaic Solar Generating Units": "A Global Inventory of Photovoltaic"
    }
)
df_active["project_name"] = df_active["project_name"].replace(
    {
        "Asset-level Transition Risk in the Global Coal, Oil, and Gas Supply Chains": "Global Coal, Oil, and Gas Supply Chains"
    }
)

df_active["project_name"] = df_active["project_name"].replace(
    {
        "The REgional Model of INvestments and Development": "REMIND"
    }
)

df_active["project_name"] = df_active["project_name"].replace(
    {
        "Hierarchical Engine for Large-scale Infrastructure Co-Simulation": "HELICS"
    }
)

df_active["project_name"] = df_active["project_name"].replace(
    {
        "Grid Singularity Energy Exchange Engine (D3A)": "Grid Singularity Energy Exchange"
    }
)

df_active["project_name"] = df_active["project_name"].replace(
    {
        "Integrated Valuation of Ecosystem Services and Tradeoffs": "InVEST"
    }
)



def text_to_link(project_name, git_url):
    return '<a href="' + git_url + '" target="_blank" style = "color: black">' + project_name + "</a>"

def text_to_bolt(topic):
    return "<b>" + topic + "</b>"

df_active["project_name"] = df_active.apply(
    lambda x: text_to_link(x.project_name, x.git_url), axis=1
)

In [38]:
# Derive the field of each project from its topic


def topic_to_field(topic):
    if topic in (
        "Photovoltaics and Solar Energy",
        "Wind Energy",
        "Hydro Energy",
        "Geothermal Energy",
        "Bioenergy",
    ):
        field = "Renewable Energy"
    elif topic in ("Battery", "Hydrogen"):
        field = "Energy Storage"
    elif topic in (
        "Energy Monitoring and Management",
        "Energy Modeling and Optimization",
        "Energy Distribution and Grids",
        "Energy System Data Access",
    ):
        field = "Energy Systems"
    elif topic in (
        "Buildings and Heating",
        "Mobility and Transportation",
        "Production and Industry",
        "Computation and Communication",
    ):
        field = "Consumption"
    elif topic in (
        "Carbon Intensity and Accounting",
        "Carbon Offsets and Trading",
        "Carbon Credits and Capture",
        "Carbon Capture",
        "Emission Observation and Modeling",
    ):
        field = "Emissions"
    elif topic in ("Life Cycle Assessment", 
                "Circular Economy and Waste"):
        field = "Industrial Ecology"
    elif topic in ("Biosphere", "Cryosphere", "Hydrosphere", "Atmosphere"):
        field = "Earth Systems"

    elif topic in ("Biodiversity and Species Distribution",
                   "Conservation and Restoration",
                   "Forest Observation and Management",
                   "Plants and Vegetation",
                   "Biomass",
                   "Wildfire",
                   "Marine Life and Fishery",
                   "Terrestrial Animals", 
                   ):
        field = "Biosphere"
    
    elif topic in ("Sea Ice",
                   "Glacier and Ice Sheets",
                   "Snow and Permafrost"):
        field = "Cryosphere"
    
    elif topic in ("Freshwater and Hydrology",
                   "Ocean Circulation Models",
                   "Waves and Currents",
                   "Ocean Carbon and Temperature",
                   "Coastal and Reefs",
                   "Ocean and Hydrology Data Access",
                   "Ocean Data Processing and Access"):
        field = "Hydrosphere"
    
    elif topic in ("Atmospheric Composition and Dynamics",
                   "Atmospheric Dispersion and Transport",
                   "Atmospheric Chemistry and Aerosol",
                   "Meteorological Observation and Forecast",
                   "Radiative Transfer"):
        field = "Atmosphere"

    elif topic in ("Earth and Climate Modeling",
                   "Climate Data Standards",
                   "Climate Data Visualization and Access",
                   "Climate Data Processing and Analysis",
                   "Climate Downscaling",
                   "Natural Hazard and Storm",
                   "Integrated Assessment and Climate Policy"):
        field = "Climate and Earth"    

    elif topic in (
        "Air Quality",
        "Water Supply",
        "Soil and Land",
        "Agriculture and Nutrition",
    ):
        field = "Natural Resources"
    elif topic in (
        "Sustainable Development Goals",
        "Sustainable Investment",
        "Environmental Satellites",
        "Knowledge Platforms",
        "Data Catalogs and Interfaces",
        "Taxonomy and Ontology",
        "Curated Lists",
    ):
        field = "Sustainable Development"
    else:
        # Report the offending topic in the error message itself
        raise ValueError(f"Topic not within fields: {topic}")
    return field


df_active["field"] = df_active["topic"].apply(topic_to_field)
df_active["field_bolt"] = df_active["field"].apply(text_to_bolt)
df_active["topic_bolt"] = df_active["topic"].apply(text_to_bolt)

In [39]:
# Each project is ranked on several indicators in the categories activity, community and size.
# rank(pct=True) returns percentile ranks: 1 is the highest rank, values close to 0 the lowest.
# The percentile ranks are averaged within each category to give the category score.
df_active["activity"] = (
    df_active["total_commits_last_year"].rank(pct=True)
    + df_active["issues_closed_last_year"].rank(pct=True)
    + df_active["days_until_last_issue_closed"].rank(pct=True)
    + df_active["last_released_date"].rank(pct=True, na_option="top")
) / 4

df_active["community"] = (
    df_active["contributors"].rank(pct=True)
    + df_active["development_distribution_score"].rank(pct=True)
    + df_active["reviews_per_pr"].rank(pct=True)
) / 3

df_active["size"] = (
    df_active["total_number_of_commits"].rank(pct=True)
    + df_active["contributors"].rank(pct=True)
    + df_active["closed_issues"].rank(pct=True)
    + df_active["closed_pullrequests"].rank(pct=True)
) / 4

# Each category score is divided by its maximum and the three results are averaged,
# so a total score of 1 is only reached by a project that leads all three categories.
df_active["total_score"] = (
    df_active["activity"] / df_active["activity"].max()
    + df_active["community"] / df_active["community"].max()
    + df_active["size"] / df_active["size"].max()
) / 3

In [40]:
# Save the dataset with the scores
df_active_path = "../csv/project_analysis.csv"
df_active.to_csv(df_active_path)
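
The percentile-rank scoring used for the `activity`, `community` and `size` columns can be hard to picture from the column arithmetic alone. Here is a small worked sketch on hypothetical numbers with only two indicators. It assumes nothing beyond the behaviour of pandas `rank(pct=True)`, which returns the rank divided by the number of rows, so the lowest of four values gets 0.25 rather than 0.

```python
import pandas as pd

# Hypothetical toy projects with two of the size indicators.
toy = pd.DataFrame(
    {
        "contributors": [1, 5, 20, 50],
        "total_number_of_commits": [10, 400, 200, 3000],
    },
    index=["a", "b", "c", "d"],
)

# Percentile rank per indicator: rank / number of rows.
contributors_rank = toy["contributors"].rank(pct=True)
commits_rank = toy["total_number_of_commits"].rank(pct=True)

# Category score: mean of the percentile ranks of its indicators.
toy["size"] = (contributors_rank + commits_rank) / 2

# Normalise so that the best project in this category scores 1.
toy["size_normalised"] = toy["size"] / toy["size"].max()

print(toy[["size", "size_normalised"]])
```

Projects `b` and `c` end up with the same score because each is second in one indicator and third in the other. Ranking makes the score robust against outliers such as a single very large repository, at the price of discarding how large the gaps between projects actually are.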

## The Open Source Sustainability Ecosystem

Projects are grouped into fields based on their primary topic of focus. While the boundaries often overlap, these fields help to paint a broad landscape and can provide insight into the ecosystem health and complexity of fields relative to each other. The following sunburst diagram shows the relationship between fields, topics, and projects. The colour represents the {ref}`dds_chapter`.
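
The grouping of topics into fields is a plain lookup. The sketch below shows the idea with a dictionary that contains a small subset of the topic-to-field assignments used in this study, applied to a hypothetical list of project topics. The names `TOPIC_TO_FIELD` and `lookup_field` are illustrative and not part of the analysis code.

```python
from collections import Counter

# Subset of the topic-to-field assignments, written as a lookup table.
TOPIC_TO_FIELD = {
    "Wind Energy": "Renewable Energy",
    "Bioenergy": "Renewable Energy",
    "Battery": "Energy Storage",
    "Hydrogen": "Energy Storage",
    "Sea Ice": "Cryosphere",
}


def lookup_field(topic: str) -> str:
    """Return the field of a topic or fail loudly for unknown topics."""
    try:
        return TOPIC_TO_FIELD[topic]
    except KeyError:
        raise ValueError(f"Topic not within fields: {topic!r}") from None


# Hypothetical primary topics of five projects.
topics = ["Wind Energy", "Battery", "Bioenergy", "Sea Ice", "Hydrogen"]

# Count how many projects fall into each field.
projects_per_field = Counter(lookup_field(t) for t in topics)
print(projects_per_field)
```

Failing on unknown topics, rather than assigning a default field, makes sure that a newly introduced topic cannot silently disappear from the diagram.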

In [41]:
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px
from opensustainTemplate import *

df_active = pd.read_csv("../csv/project_analysis.csv")

color_continuous_scale = px.colors.make_colorscale(["#30b1ce","#ea7070"])
#color_continuous_scale = [[0, "#3b52ff"], [0.5, "#F6157"], [1, "#169485"]]

fig = px.sunburst(
    df_active.assign(
        hole="<b>The Open Source <br> Sustainability Ecosystem</b>"
    ),
    path=["hole", "field_bolt", "topic_bolt", "project_name"],
    maxdepth=3,
    color="development_distribution_score",
    custom_data=["oneliner", "topic", "git_url"],
    color_continuous_scale=px.colors.sequential.Aggrnyl_r,
    #color_continuous_scale=color_continuous_scale,
    #color_continuous_scale=color_continuous_scale, # Diverging colors
    #color_continuous_midpoint=df_active['development_distribution_score'].median(), # Diverging colors
)

fig.update_layout(
    # title="Ecosystem overview",
    coloraxis_colorbar=dict(title='<b> Development Distribution Score </b>',
    orientation='h',
    y=-0.15,
    x=0.5
    ),
    height=1200,
    # width=1000,
    title_x=0.5,
    font_size=18,
    dragmode=False,
    margin=dict(l=2, r=2, b=0, t=10),
    title_font_family="Open Sans",
    font_family="Open Sans",
    font_color="black"
)
fig.update_coloraxes(colorbar_title_side='top')
# animated transitions are currently not implemented when uniformtext is used
fig.update_traces(
    insidetextorientation="radial",
    # texttemplate = '%{label}<br>%{percentRoot:.0%}',
    marker=dict(line=dict(color="#000000", width=2)),
    hovertemplate="<br>".join(
        [   "<b>%{label}</b>",
            "%{customdata[1]}",
            "%{customdata[0]}",
            "%{customdata[2]}",
        ]
    ),
)

config = {'responsive': True, 
            'toImageButtonOptions':{
                # 'width': 2000,
                # 'height': 2000,
                'format': 'svg',
                'filename': 'The_Open_Source_Sustainability_Ecosystem'}}

fig.show(config=config)

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: projects-within-sectors
\- All studied projects grouped into the corresponding fields and topics
```

`````{admonition} Tip
:class: tip
The above plot is fully interactive. Drill into fields, topics, and projects with a click! The project name links to the project's repository.
`````