# Geography of organisations

Open source development appears to correlate with economic strength: the stronger the economy, the higher the number of open source communities. Analysis of the geographic distribution of the organisations behind OSS for sustainability projects shows that a clear majority (64%) are located in Europe and North America. A further 28% of the projects are considered global because no geographical affiliation could be identified.

In [17]:
import dateparser
import datetime
import handcalcs.render
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px
import pycountry
from pycountry_convert import (
    country_alpha2_to_continent_code,
    country_alpha3_to_country_alpha2,
)
from opensustain_template import *

In [18]:
# Clean up the dataset
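def name_to_iso3(x):
    """Resolve a country name to its ISO 3166-1 alpha-3 code via fuzzy search
    Arguments:
        x - a string with a country name ("UK" is treated as "United Kingdom")
        
    Outputs: 
        A string with the ISO 3166-1 alpha-3 code, or "" if no match is found
        
    """
    
    if x == "UK":
        x = "United Kingdom"
    try:
        iso3 = pycountry.countries.search_fuzzy(x)[0].alpha_3
    except Exception:
        # No match found, or x is not a string (e.g. NaN)
        iso3 = ""
    return iso3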

def alpha3_to_alpha2(x):
    """Convert country code ISO 3166-1 alpha-3 to country code ISO 3166-1 alpha-2
    Arguments:
        x - a string with a country code following ISO 3166-1 alpha-3 standard
        
    Outputs: 
        A string with the ISO 3166-1 alpha-2 country code, or "" if x is unknown
        
    """
    
    try:
        alpha_2 = country_alpha3_to_country_alpha2(x)
    except Exception:
        # Unknown or empty code
        alpha_2 = ""
    return alpha_2


def alpha2_to_continent(x):
    """Convert country code ISO 3166-1 alpha-2 to continent code
    Arguments:
        x - a string with a country code following ISO 3166-1 alpha-2 standard
        
    Outputs: 
        A string with a two-letter continent code (e.g. "EU"), or "" if x is unknown
        
    """
    
    try:
        continent = country_alpha2_to_continent_code(x)
    except Exception:
        # Unknown or empty code
        continent = ""
    return continent


def upper_string(lower_string):
    """Apply title format
    Arguments:
        lower_string - a string 
    Outputs: 
        A string with a title format
        
    """
    
    return lower_string.title()

def calc_age(start_date):
    """Calculate age in years between now and start_date
    Arguments:
        start_date - a date
    Outputs: 
        A float with number of years between now and start_date
        
    """
    return (datetime.datetime.now() - dateparser.parse(start_date, settings={'TIMEZONE': 'CEST'})).days/365

def count_strings(comma_seperated_string):
    """Count number of delimiters (commas) in a string 
    Arguments:
        comma_seperated_string - a string containing commas
    Outputs: 
        A number (int) of commas found in comma_seperated_string,
        or 0 if the input is not a string (e.g. NaN)
        
    """
    
    if isinstance(comma_seperated_string, str):
        return comma_seperated_string.count(",")
    else:
        return 0

In [19]:
df_organizations = pd.read_csv("../csv/github_organizations.csv")
df_organizations["ISO_3"] = df_organizations["location_country"].apply(name_to_iso3)
df_organizations["ISO_3_alpha2"] = df_organizations["ISO_3"].apply(alpha3_to_alpha2)
df_organizations["continent"] = df_organizations["ISO_3_alpha2"].apply(alpha2_to_continent)
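
The three `apply` steps above form a lookup chain in which any value that cannot be resolved becomes an empty string, which the pie chart later labels "Global". The following minimal sketch illustrates that fall-through. It uses small hand-written tables (hypothetical stand-ins, not the `pycountry` or `pycountry_convert` data) and a hypothetical helper `resolve_continent`.

```python
# Hypothetical lookup tables standing in for pycountry / pycountry_convert.
NAME_TO_ISO3 = {"Germany": "DEU", "United Kingdom": "GBR", "United States": "USA"}
ISO3_TO_CONTINENT = {"DEU": "EU", "GBR": "EU", "USA": "NA"}
CONTINENT_LABELS = {"EU": "Europe", "NA": "North America", "": "Global"}


def resolve_continent(location):
    """Return a continent label, or "Global" when the location is unknown."""
    if location == "UK":  # same special case as in name_to_iso3
        location = "United Kingdom"
    iso3 = NAME_TO_ISO3.get(location, "")   # unknown name -> ""
    code = ISO3_TO_CONTINENT.get(iso3, "")  # unknown code -> ""
    return CONTINENT_LABELS[code]


for location in ["UK", "Germany", None, "Atlantis"]:
    print(location, "->", resolve_continent(location))
```

Missing and unrecognised locations both end up as "Global", so that category mixes organisations without a stated location and organisations whose location could not be matched.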

In [20]:
# Name the count column explicitly so the code does not depend on the pandas version
continent_his = df_organizations["continent"].value_counts().to_frame("continent").rename_axis("continent_name")
continent_his.rename(index={"EU": "Europe", "NA": "North America", "": "Global", "OC":"Oceania", "AS":"Asia", "SA":"South America", "AF":"Africa"},inplace=True)
fig = px.pie(continent_his.reset_index(), values="continent", names="continent_name", color_discrete_sequence=color_discrete_sequence, hole=0.2)

fig.update_layout(title="Distribution of Organisations between Continents", font_size=16, showlegend=False, hovermode=False)
fig.update_traces(textposition='outside', textinfo='label+percent', marker=dict(line=dict(color='#000000', width=2)))
fig['layout'].update(margin=dict(l=0,r=0,b=0,t=40))

fig.show()
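
The percentages shown in the pie chart can also be tabulated directly. The sketch below uses six made-up continent codes rather than the real dataset and assumes that an empty string marks an unresolved location, as in the cells above.

```python
import pandas as pd

# Hypothetical continent codes for six organisations;
# "" marks an organisation whose country could not be resolved.
codes = pd.Series(["EU", "EU", "NA", "", "AS", "EU"])

labels = {"EU": "Europe", "NA": "North America", "AS": "Asia", "": "Global"}

# normalize=True returns fractions instead of absolute counts
shares = codes.map(labels).value_counts(normalize=True).mul(100).round(1)
print(shares)
```

Working with a share table like this makes it straightforward to quote exact figures in the text alongside the chart.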

In [21]:
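## https://octoverse.github.com/
# Share of GitHub users per continent in percent. A list is used because a set
# has no defined order; values are sorted ascending to line up with the labels
# below (pairing inferred from the label order: Oceania smallest, North America largest).
values = [1.7, 2.3, 5.9, 27.3, 31.2, 31.5]
index_labels=['Oceania','Africa','South America','Europe','Asia','North America']
df_users_continent_cotoverse = pd.DataFrame(values,index=index_labels).reset_index()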

However, comparing these ratios with other statistics reveals clear differences. We use baseline data from “[The State of the Octoverse](https://octoverse.github.com/)”, a study that provides the geographic distribution of millions of active GitHub users.

In [22]:
# similar pooling to the one in cell 53 could be done here for Africa + Oceania

fig = px.pie(df_users_continent_cotoverse, values=0, names="index", color_discrete_sequence=color_discrete_sequence, hole=0.2)

fig.update_layout(title="Distribution of all GitHub Users between Continents", font_size=16, showlegend=False, hovermode=False)
fig.update_traces(textposition='outside', textinfo='label+percent', marker=dict(line=dict(color='#000000', width=2)))
fig['layout'].update(margin=dict(l=0,r=0,b=0,t=40))

fig.show()

At the country level, the United States, Germany, France, and the United Kingdom stand out. Despite having more GitHub users than Europe overall, Asia accounts for only 1.9% of organisations working in OSS for sustainability. The absence of Indian communities is also noticeable: despite the large number of open source developers in India, no large organisations or projects could be identified. Likewise, despite China's high number of scientific publications in general, there are very few organisations and projects from China.
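
One way to make this comparison concrete is a representation ratio: a region's share of organisations divided by its share of GitHub users. The sketch below is illustrative only. The organisation shares are the figures quoted in the text, while the user shares assume that the Octoverse values pair as North America 31.5%, Asia 31.2%, and Europe 27.3%. The helper `representation_ratio` is a hypothetical name.

```python
# Shares in percent. Organisation shares are the figures quoted in the text;
# user shares assume the Octoverse pairing North America 31.5, Asia 31.2,
# Europe 27.3 (an assumption, see the lead-in).
org_share = {"Europe + North America": 64.0, "Asia": 1.9}
user_share = {"Europe + North America": 31.5 + 27.3, "Asia": 31.2}


def representation_ratio(region):
    """Organisation share divided by GitHub user share for a region."""
    return org_share[region] / user_share[region]


for region in org_share:
    print(f"{region}: {representation_ratio(region):.2f}")
```

A ratio near 1 means a region hosts organisations roughly in proportion to its GitHub users. Under these assumptions Europe and North America together come out slightly above 1, while Asia is far below it.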

In [27]:
# Name the count column explicitly so the code does not depend on the pandas version
df_countries = (
    df_organizations["ISO_3"]
    .value_counts()
    .to_frame("counts")
    .rename_axis("country")
    .reset_index()
)

fig = px.choropleth(
    df_countries,
    locations="country",
    locationmode="ISO-3",
    color="counts",
    color_continuous_scale=color_continuous_scale
)

fig.update_layout(
    title="Global Distribution of Organisations",
    coloraxis_colorbar=dict(title="Organisations"),
    autosize=True,
)

fig['layout'].update(margin=dict(l=0,r=0,b=0,t=40))

fig.show()