# Users and Usage

<!-- 
- Users: Who the users are
- Usage: How users are using OSS
- Value: What is the use value and impact
-->

Capturing the use of open source projects, and understanding their users and the value those users provide to and obtain from the ecosystem, is both important and difficult. Furthermore, there is no standard for measuring the time and resources saved by "[standing on the shoulders of giants](https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants)", or the degree to which the added height extended one's view or reach. Future studies will need to apply the metrics and methods we have created in a targeted way. For example, by identifying projects that have high [usage](./15_users-and-usage.ipynb), fast [growth](./09_growth.ipynb) and low [DDS](./14a_development-distribution-score.ipynb), we can conduct targeted interviews to pinpoint key users and usage patterns, and evaluate ways of supporting effective projects. In the meantime, we have considered several proxy methods to help paint a picture of what is possible.

Since open source is often free of charge and openly accessible, it is difficult to trace its final use. The people using these software products often act as evangelists, sharing OSS with other users or organisations who may find value in it, and providing valuable feedback and expert knowledge in return. Much open source usage arises from integrating libraries or APIs as dependencies of other software projects. Importantly, this dependency will not be apparent to many users, especially in closed-source software, where reliance on OSS is not always disclosed.

Project usage data from public software development and version control platforms remains scarce. GitHub, unfortunately, offers little support in compiling accurate statistics. Additionally, statistics on package manager downloads are not universally available and must be obtained through the various platforms and their APIs. While this is technically possible, it was not feasible given the study's limited resources and timeframe. However, with the limited data obtained from the Python ecosystem, it was possible to identify individual projects with a high circulation but a low [DDS](./14a_development-distribution-score.ipynb). Here projects like [cfgrib](https://github.com/ecmwf/cfgrib), [sentinelhub-py](https://github.com/sentinel-hub/sentinelhub-py) or [Meteostat](https://github.com/meteostat/meteostat-python) stand out. These projects are widely used and depend heavily on the goodwill of a single developer. The median DDS of 0.436 over the 50 most used Python projects indicates that the burden still lies on a few strong contributors leading the development.
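The screening described above, high circulation combined with a low DDS, can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the helper name `summarise_most_used` and the `dds_threshold` of 0.3 are hypothetical, the toy values are made up, and only the column names `dependents_count` and `development_distribution_score` are taken from the analysis cells in this notebook.

```python
import pandas as pd


def summarise_most_used(df, top_n=50, dds_threshold=0.3):
    """Return (median DDS, low-DDS subset) for the top_n most depended-on projects.

    Hypothetical helper; the threshold is an assumption, not a value from the study.
    """
    # Rank projects by how many other repositories depend on them.
    top = df.nlargest(top_n, "dependents_count")
    median_dds = top["development_distribution_score"].median()
    # Widely used projects whose development rests on very few contributors.
    concentrated = top[top["development_distribution_score"] < dds_threshold]
    return median_dds, concentrated


# Made-up values for illustration only; not data from the study.
toy = pd.DataFrame(
    {
        "project_name": ["a", "b", "c", "d"],
        "dependents_count": [120, 80, 40, 5],
        "development_distribution_score": [0.10, 0.50, 0.70, 0.05],
    }
)

median_dds, concentrated = summarise_most_used(toy, top_n=3)
```

In this toy frame, project `d` has the lowest DDS but is excluded because it falls outside the three most used projects, which mirrors the idea of looking at concentration only among widely used software.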

The user community of major projects in energy and battery modelling, such as [PyBaMM](https://github.com/pybamm-team/PyBaMM) and [PyPSA](https://github.com/PyPSA/PyPSA), is split relatively evenly between academia and industry, with fewer users coming from NGOs and independent consultancies. In some cases, industry can drive explosive user growth. For example, over a five-year period, pvlib-python saw thousands of downloads per month. This was driven primarily by several commercial firms that integrated the library into their software products, effectively distributing pvlib-python to their clients.



In [1]:
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px

In [2]:
def count_strings(comma_separated_string):
    """Count the delimiters (commas) in a string.

    Arguments:
        comma_separated_string - a string containing commas
    Outputs:
        The number (int) of commas found in comma_separated_string;
        0 for non-string input such as NaN.
    """
    # pandas reads empty CSV cells as float NaN, so guard on the type.
    if isinstance(comma_separated_string, str):
        return comma_separated_string.count(",")
    return 0

In [3]:
# default plotting options
# Palette https://coolors.co/palette/0e7c7b-17bebb-ffc857-e9724c-c5283d
height = 800  # default figure height in pixels
color_continuous_scale = px.colors.sequential.Aggrnyl[::-1]
marker_color = "#0E7C7B"
color_discrete_sequence = ["#0E7C7B", "#17BEBB", "#FFC857", "#E9724C", "#C5283D"]

# Register your theme as a named template
pio.templates["OpenSustain"] = go.layout.Template(
    layout=dict(
        margin=dict(
            b=0,  # bottom margin
        ),
        font=dict(
            family="Open Sans",
            color="#040404",
            size=15,
        ),
        title_font_family="Open Sans",
        title_font_color="#040404",
    ),
)

# Combine your theme with plotly's default
pio.templates.default = "plotly+OpenSustain"

In [4]:
df_active = pd.read_csv("../csv/project_analysis.csv")

In [5]:
# KK: can topics be grouped in fewer categories? can DDS be bucketed into categories, e.g. 0.3>=, 0.3<=&<=0.6, 0.6>=? Do we need to show all three variables, projects, DDS and dependents? 

df_active["dependents_count"] = df_active["dependents_repos"].apply(count_strings)

most_dependent_projects = df_active.nlargest(50, "dependents_count")

# The API gives wrong numbers for this project:
most_dependent_projects = most_dependent_projects[most_dependent_projects["git_url"] != "https://github.com/Open-MSS/MSS.git"]

fig = px.bar(
    most_dependent_projects,
    x=most_dependent_projects["dependents_count"],
    y=most_dependent_projects["project_name"],
    orientation="h",
    custom_data=["oneliner","topic","git_url"],
    color=most_dependent_projects["development_distribution_score"],
    color_continuous_scale=color_continuous_scale
)

fig.update_layout(
    height=1200,  # Added parameter
    yaxis_title=None,
    xaxis_title="Dependents",
    title="Python Projects most frequently used as Dependencies",
    coloraxis_colorbar=dict(
        title="DDS",
    ),
    hoverlabel=dict(
        bgcolor="white",
    ),
)


fig.update_traces(
    hovertemplate="<br>".join([
        "Project Info: <b>%{customdata[0]}</b>",
        "Topic: <b>%{customdata[1]}</b>",
        "Git URL: <b>%{customdata[2]}</b>"
    ])
)
fig.add_layout_image(
    dict(
        source="logo_img",
        xref="paper", yref="paper",
        x=1, y=0,
        sizex=0.05, sizey=0.05,
        xanchor="right", yanchor="bottom"
    )
)
fig['layout']['yaxis']['autorange'] = "reversed"
fig.show()

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: python-dependencies

Python Projects most frequently used as dependencies
```
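The plotting cell carries a note asking whether DDS could be bucketed into a few categories instead of being shown as a continuous colour scale. One possible way to do that is sketched here with `pandas.cut`. The helper name `bucket_dds`, the labels, and the boundaries 0.3 and 0.6 (taken from the note's example) are assumptions for illustration, not part of the study's method.

```python
import pandas as pd


def bucket_dds(scores, low=0.3, high=0.6):
    """Label each DDS value as 'low', 'medium' or 'high'.

    Hypothetical helper; boundaries are assumptions. Intervals are closed on
    the right, so a score equal to `low` falls into the 'low' bucket.
    """
    return pd.cut(
        scores,
        bins=[0.0, low, high, 1.0],
        labels=["low", "medium", "high"],
        include_lowest=True,  # keep a score of exactly 0.0 in the first bucket
    )


# Made-up scores for illustration only.
toy_scores = pd.Series([0.0, 0.25, 0.3, 0.45, 0.6, 0.9])
buckets = bucket_dds(toy_scores)
```

The resulting categorical column could then be passed as `color=` to `px.bar` together with `color_discrete_sequence`, giving three discrete colours instead of a continuous scale.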

### Final use and impact

While open source business models, including the commonly used premium model, are unlikely to capture the total value that OSS contributes to society, recent research into the economic contributions of open source can provide a rough indication. A 2021 [study](https://digital-strategy.ec.europa.eu/en/library/study-about-impact-open-source-software-and-hardware-technological-independence-competitiveness-and) on the economic impact of open source software and hardware concluded that open source technologies injected **€65-95 billion** into the European economy. Open source significantly boosted small and mid-size enterprises, Europe's most important horizontal economic stakeholders, which is reflected in the increased creation of more than 650 technology startups per year. While it is currently unknown to what extent open source has contributed to environmental sustainability, or what its potential impact is in monetary terms, we expect the figure to be several orders of magnitude greater.