## Ideas for visualisation :

* Cooperation graph (find which unis have common projects)
* "Age of repos"

## TODO

* pins with link to users


# Interactive geographic visualisation of Swiss GitHub users

One of the most interesting visualisations we can do is to situate Swiss GitHub users geographically. Our goal here is to see whether the data shows the patterns we intuitively expect from the community of Swiss users.

One such intuition is that users are concentrated around universities, most notably EPFL and ETHZ. Other interesting geographic divisions to study are the Röstigraben and the differences between cantons.

## Geodata pre-processing

The first step is to process our data: connecting to our database, extracting relevant features and statistics, and geocoding users.

In [141]:
# Include ALL the things

# Pretty plots
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_context('notebook')
%matplotlib inline

# Connecting to DB
from utils import get_mongo_db

# Requesting stuff
import requests

# Data handling
import pandas as pd
import itertools
import pickle
import numpy as np
import re

# Map drawing
import folium

### Collecting the users

Connecting to the database, then fetching our dataset of users.

In [107]:
db = get_mongo_db()

Connecting to MongoDB at localhost:27017...


In [378]:
# Get users from DB
res = db.users.find({ 'in_ch': True, 'repositories': { '$ne': None } })

users = []

# For each user, find their repositories
for user in res:
    repos = db.repositories.find(
        { '_id': { '$in': user['repositories'] } }
    )
    
    geo = user.get('geocode', {})
    canton = geo.get('state', '')
    lat = geo.get('lat', '')
    lng = geo.get('lng', '')
    
    users.append({
        '_id': user['_id'],
        'login': user['login'],
        'name': user['name'],
        'location': user['location'],
        'repositories_docs': list(repos),
        'canton' : canton,
        'lat' : lat,
        'lng' : lng
    })
    
print("Our dataset includes {} users.".format(len(users)))

Our dataset includes 5976 users.


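We now have **5976** users, with the following data fields :

* *_id* : a uid
* *login* : username
* *name* : the user's name
* *location* : the user's location, as free text
* *repositories_docs* : a list of the user's repositories
* *canton* : the `state` field of the user's stored geocode (empty if missing)
* *lat*, *lng* : the coordinates of the user's stored geocode (empty if missing)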

### Collecting statistics

For each user let us collect a few interesting statistics :

* *repo_count* : The number of repositories for each user
* *star_count* : The number of stars on all the user's repos
* *watchers_count* : The number of watchers on all the user's repos
* *forks_count* : The number of forks on all the user's repos


In [379]:
# Define the helper functions

def count_repos(user):
    return len(user['repositories_docs'])

def count_stat(user, key):
    count = 0
    for repo in user['repositories_docs']:
        count = count + repo[key]
    return count

def count_stars(user):
    return count_stat(user, 'stargazers_count')

def count_watchers(user):
    return count_stat(user, 'watchers_count')

def count_forks(user):
    return count_stat(user, 'forks_count')


In [382]:
users_data = [{ 'id': user['_id'],
                'location': user['location'],
                'name' : user['name'],
                'username' : user['login'],
                'canton' : user['canton'],
                'lat' : user['lat'],
                'lng' : user['lng'],
                'repo_count' : count_repos(user),
                'star_count' : count_stars(user),
                'watchers_count' : count_watchers(user),
                'forks_count' : count_forks(user),
                'users_count' : 1
              } 
              for user in users]

users_df = pd.DataFrame(users_data)

In [647]:
users_df.sample(10)

Unnamed: 0,canton,forks_count,id,lat,lng,location,name,repo_count,star_count,username,users_count,watchers_count
2814,,0,1446968,46.8182,8.22751,Switzerland,David Blum,1,0,dblooom,1,0
791,,2,5999628,46.8182,8.22751,Switzerland,Yves Buschor,14,4,yvesbuschor,1,4
4253,Zürich,30,608386,47.3686,8.54044,Zürich,David Aerne,25,109,meodai,1,109
1925,,11,1421564,46.8182,8.22751,CH,CoinSpace Bitcoin Wallet,17,12,CoinSpace,1,12
2547,,1,612594,46.8182,8.22751,Switzerland,Cem Sever,7,2,CemSever,1,2
2613,,0,1251942,46.8182,8.22751,Switzerland,Romain Maffina,5,3,cidzoo,1,3
2066,BE,0,608405,46.948,7.44745,Bern,Admir Serifi,78,1,adoweb,1,1
290,,10,4719,46.8182,8.22751,Switzerland,Thomas Maurer,15,39,tma,1,39
978,,0,125339,46.8182,8.22751,Switzerland,,10,2,jkoenig,1,2
4431,,1,1322627,46.7986,8.23197,Switzerland,Naturwerk,5,6,naturwerk,1,6


In [650]:
users_df.sort_values(by='star_count', ascending=False)

Unnamed: 0,canton,forks_count,id,lat,lng,location,name,repo_count,star_count,username,users_count,watchers_count
0,,18807,2022803,46.8182,8.22751,Switzerland,victor felder,105,76131,vhf,1,76131
2,,2016,51388,46.8182,8.22751,Switzerland,Nicolas Seriot,58,11641,nst,1,11641
6,ZH,617,293536,47.3769,8.54169,"Zurich, Switzerland",Jonas Wagner,45,10543,jwagner,1,10543
25,ZH,1557,2203704,47.3769,8.54169,"Zurich, Switzerland",Gion Kunz,73,9102,gionkunz,1,9102
3657,Wien,1358,3076177,48.2084,16.3725,"Vienna, Austria",Jan Paepke,11,7952,janpaepke,1,7952
1,ZH,1262,183678,47.3769,8.54169,"Zürich, Zurich, Switzerland",Jordi Boggiano,174,7430,Seldaek,1,7430
2894,,2949,117904,46.8182,8.22751,Switzerland,Divio,97,6529,divio,1,6529
3,FR,878,51363,46.8065,7.16197,"Fribourg, Switzerland",Cédric Luthi,136,6469,0xced,1,6469
2489,NY,720,245270,40.7128,-74.0059,New York,Carbo Kuo,37,5841,BYVoid,1,5841
51,,452,748594,46.8182,8.22751,Switzerland,Dave Halter,17,5053,davidhalter,1,5053


### Guessing locations

*We'd like to localize our users in Switzerland in order to draw the relevant maps later on. As the `location` data is relatively uniform, we'll first try using some heuristics to deduce the majority of the needed locations. The easiest is using common city names to deduce the canton. A simple match-and-map system should work well here.*

In [404]:
#with_canton = users_df.copy()
# Remove impossible to resolve values
#with_canton = with_canton[with_canton['location'] != 'Switzerland']
#with_canton = with_canton[with_canton['location'] != 'Schweiz']
#with_canton = with_canton[with_canton['location'] != 'suisse']
#with_canton = with_canton[with_canton['location'] != 'CH']
#with_canton = with_canton[with_canton['location'] != '']

word_to_canton = {
    'bern': 'BE',
    'lausanne': 'VD',
    'genève': 'GE',
    'geneva': 'GE',
    'luzern': 'LU',
    'zürich': 'ZH',
    'zurich': 'ZH',
    'zuerich': 'ZH',
    'lugano': 'TI',
    'basel': 'BS',
    'vaud': 'VD',
    'fribourg': 'FR',
    'davos': 'GR',
    'sagw': 'BE',
    'st. gallen' : 'SG'
}

cantons = [
    'ZH','BE','LU','UR','SZ','OW','NW','GL','ZG','FR','SO','BS','BL',
    'SH','AR','AI','SG','GR','AG','TG','TI','VD','VS','NE','GE','JU'
]

# Tries to guess the canton by seeing if the given text
# contains a word defined in the dict above.
def guess_canton(text, axis):
    if text is not None:
        for word in word_to_canton:
            if word in text.lower():
                return word_to_canton[word]
        
    return ''

#with_canton['Canton'] = with_canton['location'].apply(guess_canton, axis=1)
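A plain substring test can over-match: `'bern'` is also found in `'Bernex'`, a municipality in the canton of Geneva. The sketch below shows one possible refinement using word boundaries. The names `city_to_canton` and `guess_canton_strict` are hypothetical, and the mapping is only a small subset of the dictionary above.

```python
import re

# Small subset of the keyword mapping, for illustration only
city_to_canton = {
    'bern': 'BE',
    'lausanne': 'VD',
    'geneva': 'GE',
    'zürich': 'ZH',
    'zurich': 'ZH',
    'st. gallen': 'SG',
}

# One compiled pattern per keyword, anchored on word boundaries
city_patterns = {
    word: re.compile(r'\b' + re.escape(word) + r'\b')
    for word in city_to_canton
}

def guess_canton_strict(text):
    """Return the canton of the first keyword found as a whole word, else ''."""
    if not text:
        return ''
    lowered = text.lower()
    for word, pattern in city_patterns.items():
        if pattern.search(lowered):
            return city_to_canton[word]
    return ''

for location in ['Zürich, Switzerland', 'Bern', 'Bernex', 'St. Gallen', None]:
    print(repr(location), '->', repr(guess_canton_strict(location)))
```

Because the function takes a single argument, it could be applied to a column with `users_df['location'].apply(guess_canton_strict)`, without the extra `axis` parameter.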

In [405]:
with_canton = users_df.copy()

n_found = len(with_canton[with_canton['canton'] != ''])
n_total = len(with_canton['canton'])
n_left = n_total - len(with_canton[with_canton['canton'] != ''])
n_frac = n_found / n_total * 100
print("We have found {} locations out of {} ({} %, {} missing)".format(n_found, n_total, round(n_frac,2), n_left))

We have found 5886 locations out of 5976 (98.49 %, 90 missing)


As the output above shows, 98.49 % of users already have a non-empty `canton` value. We can now use some available APIs to fill in the gaps.

In [252]:
#wc = with_canton.copy()

In [406]:
params = {
    'username': 'ada_drs3',
    'country': 'CH',
    'type': 'json' }

def geoname_query(q):
    # Copy the shared parameters instead of mutating the global dict
    return requests.get('http://api.geonames.org/search', params=dict(params, q=q))

def process_geo(res, idx):
    # Store the canton and coordinates of the first result on row idx of wc
    first = res['geonames'][0]
    canton = first.get('adminCode1', '')
    if canton != '00':
        print('=> Found ' + canton)
        wc.at[idx, 'Canton'] = canton
    lat = first.get('lat', '')
    lng = first.get('lng', '')
    if lat != '' and lng != '':
        print('=> Found location lat : ' + lat + ', lng : ' + lng)
        wc.at[idx, 'lat'] = lat
        wc.at[idx, 'lng'] = lng

def search_by(col):
    # Resume from the row reached before the hourly request limit was hit
    hourly_idx_start = 4057
    hourly_idx_end = len(wc)
    for i in range(hourly_idx_start, hourly_idx_end):
        print(i)
        row = wc.iloc[i]
        
        if row[col] is not None:
            res = geoname_query(row[col].strip())
            payload = res.json()
            if payload.get('status', None) is not None:
                # A status value of 19 is treated as the hourly request limit
                if payload['status'].get('value', 500) == 19:
                    print("Request limit reached, try again in 1 hour")
                    return None
            if payload.get('totalResultsCount', 0) > 0:
                process_geo(payload, i)

#search_by('location')
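The response handling above can be exercised offline. The following is a minimal sketch, not the notebook's actual pipeline: `extract_geo` is a hypothetical helper, and both payloads are hand-written to mimic only the fields read above (`status`, `geonames`, `adminCode1`, `lat`, `lng`); they are not real API output.

```python
def extract_geo(payload):
    """Return (canton, lat, lng) from a GeoNames-style payload, or None."""
    # Error responses carry a 'status' object instead of results
    if payload.get('status') is not None:
        return None
    results = payload.get('geonames') or []
    if not results:
        return None
    first = results[0]
    canton = first.get('adminCode1', '')
    if canton == '00':  # the code above treats '00' as "no canton"
        canton = ''
    try:
        lat = float(first['lat'])
        lng = float(first['lng'])
    except (KeyError, TypeError, ValueError):
        lat = lng = None
    return canton, lat, lng

# Hand-written payloads, for illustration only
ok_payload = {
    'totalResultsCount': 1,
    'geonames': [{'adminCode1': 'VD', 'lat': '46.52', 'lng': '6.63'}],
}
limit_payload = {'status': {'message': 'limit exceeded', 'value': 19}}

print(extract_geo(ok_payload))
print(extract_geo(limit_payload))
```

Keeping the parsing separate from the HTTP call and from the DataFrame update makes it possible to test the logic without spending API credits.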

In [407]:
n_found = len(wc[wc['Canton'] != ''])
n_total = len(wc['Canton'])
n_left = n_total - len(wc[wc['Canton'] != ''])
n_frac = n_found / n_total * 100
print("We have found {} locations out of {} ({} %, {} missing)".format(n_found, n_total, round(n_frac,2), n_left))

We have found 3714 locations out of 5168 (71.87 %, 1454 missing)


In [408]:
grouped = with_canton.groupby(['canton']).sum().reset_index()
grouped = grouped.drop('id', axis=1)

In [411]:
missing_cantons = [canton for canton in cantons if canton not in grouped['canton'].values]

with_all_cantons = grouped.copy()

for canton in missing_cantons:
    data = {
        'canton': [canton],
        'star_count': [0],
        'repo_count' : [0],
        'watchers_count' : [0],
        'forks_count' : [0],
        'users_count' : [0]
    }
    df = pd.DataFrame.from_dict(data, orient='columns')
    
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    with_all_cantons = pd.concat([with_all_cantons, df], ignore_index=True)

with_all_cantons = with_all_cantons[with_all_cantons['canton'].isin(cantons)].reset_index()
with_all_cantons

Unnamed: 0,index,canton,forks_count,repo_count,star_count,users_count,watchers_count
0,3,AG,538,603,3014,55,3014
1,4,AR,0,8,0,1,0
2,9,BE,3419,4951,10502,265,10502
3,10,BL,83,169,302,16,302
4,11,BS,2056,3125,5838,154,5838
5,33,FR,1422,961,8872,47,8872
6,36,GE,4722,6408,15706,308,15706
7,37,GL,11,129,34,1,34
8,39,GR,64,169,272,13,272
9,53,JU,2,78,12,5,12
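Filling in the cantons that have no users can also be done in one step with `reindex`. This is a sketch on made-up counts, assuming a single row per canton; `all_cantons`, `partial` and `complete` are illustrative names, not variables used elsewhere in this notebook.

```python
import pandas as pd

all_cantons = [
    'ZH', 'BE', 'LU', 'UR', 'SZ', 'OW', 'NW', 'GL', 'ZG', 'FR', 'SO', 'BS', 'BL',
    'SH', 'AR', 'AI', 'SG', 'GR', 'AG', 'TG', 'TI', 'VD', 'VS', 'NE', 'GE', 'JU',
]

# Made-up counts for three cantons only
partial = pd.DataFrame({'canton': ['ZH', 'VD', 'GE'], 'users_count': [10, 7, 5]})

# Reindex on the full canton list; cantons without data get 0
complete = (
    partial.set_index('canton')
           .reindex(pd.Index(all_cantons, name='canton'), fill_value=0)
           .reset_index()
)
print(complete.head())
```

As a side effect, `reindex` also drops any row whose key is not in the list, which plays the same role as the `isin(cantons)` filter.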


In [412]:
pickle.dump(with_all_cantons[['canton', 'star_count']], open('stars_by_cantons_alt.p','wb'))
pickle.dump(with_all_cantons[['canton', 'repo_count']], open('repos_by_cantons_alt.p','wb'))
pickle.dump(with_all_cantons[['canton', 'users_count']], open('users_by_cantons_alt.p','wb'))

In [709]:
pickle.dump(wc[['username', 'lng', 'lat']], open('users_locations_alt.p','wb'))

In [711]:
pickle.dump(users_df, open('users_data.p', 'wb'))

### Maps

We'll re-use the TopoJSON overlay that was given to us in HW03 to build a map of the Swiss cantons. As in HW03, we use folium to draw over the TopoJSON.

In [414]:
# Map overlay
canton_overlay  = 'ch-cantons.topojson.json'

# Statistics
stars_by_canton = 'stars_by_cantons_alt.p'
repos_by_canton = 'repos_by_cantons_alt.p'
users_by_canton = 'users_by_cantons_alt.p'
users_locations = 'users_locations_alt.p'

In [415]:
# Initialize the map to ~ the center of Switzerland
ch_center_loc = [46.92287,8.3829913] # Empirical "center" of Switzerland
map_ch = folium.Map(location=ch_center_loc, zoom_start=8)

# overlay the cantons onto the map
folium.TopoJson(open(canton_overlay),
                'objects.cantons',
                name='topojson'
               ).add_to(map_ch)

<folium.features.TopoJson at 0x144b7e7f0>

In [416]:
map_ch

### Map 1 : Number of repositories per canton

In [417]:
# Load the data
repos_by_cantons_data = pickle.load(open('repos_by_cantons.p','rb')).reset_index()

# Plot a Choropleth map
cols = ['Canton', 'repo_count'] # Columns of interest
color_map = 'YlOrRd'                 # Color Map used, Yellow for low values, Red for high
legend_str = 'Number of repositories'   # Legend title

map_ch.choropleth(
    geo_path=canton_overlay, 
    data=repos_by_cantons_data,
    columns=cols,
    topojson='objects.cantons',
    key_on='feature.id',
    fill_color=color_map,
    fill_opacity=0.7, 
    line_opacity=0.5,
    legend_name=legend_str,
    reset=True
)

map_ch

### Map 2 : Number of users per canton



In [423]:
from folium.plugins import MarkerCluster

users = pickle.load(open('users_locations.p', 'rb'))
locations = list(zip(users['lat'], users['lng']))

ch_center_loc = [46.92287,8.3829913] # Empirical "center" of Switzerland
map_ch2 = folium.Map(location=ch_center_loc, zoom_start=8, tiles='OpenStreetMap')

# overlay the cantons onto the map
folium.TopoJson(open(canton_overlay),
                'objects.cantons',
                name='topojson'
               ).add_to(map_ch2)

map_ch2.add_child(MarkerCluster(locations=locations))

# Load the data
users_by_cantons_data = pickle.load(open('users_by_cantons_alt.p','rb')).reset_index()

In [425]:
# Plot a Choropleth map
cols = ['canton', 'users_count'] # Columns of interest
color_map = 'YlOrRd'                 # Color Map used, Yellow for low values, Red for high
legend_str = 'Number of users'   # Legend title

map_ch2.choropleth(
    geo_path=canton_overlay, 
    data=users_by_cantons_data,
    columns=cols,
    topojson='objects.cantons',
    key_on='feature.id',
    fill_color=color_map,
    fill_opacity=0.7, 
    line_opacity=0.5,
    legend_name=legend_str,
    reset=True)

map_ch2



### User Heatmap

We can also visualise our users on a heatmap, which colours each area according to its density of users. This confirms our original intuition: the major activity centres lie near Zurich, around Neuchâtel, and along the Lemanic arc.

In [418]:
from folium.plugins import HeatMap

locations = list(zip(users['lat'], users['lng']))

ch_center_loc = [46.92287,8.3829913] # Empirical "center" of Switzerland
map_ch3 = folium.Map(location=ch_center_loc, zoom_start=8, tiles='stamentoner')

HeatMap(locations).add_to(map_ch3)

<folium.plugins.heat_map.HeatMap at 0x1495b96d8>

In [419]:
map_ch3
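A heatmap needs numeric coordinates, while users without a geocode were given `''` for `lat` and `lng` earlier in this notebook. The sketch below filters such pairs out before plotting. `valid_locations` is a hypothetical helper and the sample values are made up (apart from the default Swiss and Zurich coordinates seen in the tables above).

```python
import math

def valid_locations(lats, lngs):
    """Keep only (lat, lng) pairs where both values parse as finite floats."""
    cleaned = []
    for lat, lng in zip(lats, lngs):
        try:
            pair = (float(lat), float(lng))
        except (TypeError, ValueError):
            continue  # '' or None: no usable coordinates
        if all(math.isfinite(value) for value in pair):
            cleaned.append(pair)
    return cleaned

raw_lats = [46.8182, '', '47.3769', None, float('nan')]
raw_lngs = [8.22751, '', '8.54169', 8.5, 7.4]
print(valid_locations(raw_lats, raw_lngs))
```

The cleaned list could then be passed to `HeatMap` in place of `locations`.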

### Language usage

Let's take a look at the most used languages among Swiss GitHub users.

In [673]:
# Get users from DB
res = db.users.find({ 'in_ch': True, 'repositories': { '$ne': None } })

localized_repos = []

# For each user, find their repositories
for user in res:
    repos = db.repositories.find(
        { '_id': { '$in': user['repositories'] } }
    )
    
    geo = user.get('geocode', {})
    canton = geo.get('state', '')
    lat = geo.get('lat', None)
    lng = geo.get('lng', None)
    
    for repo in repos:
        localized_repos.append({
            'created_by' : user['login'],
            'project_name' : repo['full_name'],
            'url' : repo['clone_url'],
            'language' : repo['language'],
            'canton' : canton,
            'star_count' : repo['stargazers_count'],
            'lat' : lat,
            'lng' : lng
        })
    
print("Our dataset includes {} repos.".format(len(localized_repos)))

Our dataset includes 98862 repos.


In [674]:
localized_repos_df = pd.DataFrame(localized_repos)
localized_repos_df = localized_repos_df[localized_repos_df['canton'].isin(cantons)].reset_index()
localized_repos_df.sample(10)

Unnamed: 0,index,canton,created_by,language,lat,lng,project_name,star_count,url
16674,25236,BE,cstuder,C,46.947974,7.447447,cstuder/genderReader,14,https://github.com/cstuder/genderReader.git
14380,21530,GE,traylenator,Puppet,46.204391,6.143158,traylenator/puppet-module-nfs,0,https://github.com/traylenator/puppet-module-n...
23572,37565,ZH,SchumacherFM,Go,47.376887,8.541694,SchumacherFM/mailout,28,https://github.com/SchumacherFM/mailout.git
38530,84003,ZG,raskhadafi,Visual Basic,47.166167,8.515495,raskhadafi/DBScanalyzer,2,https://github.com/raskhadafi/DBScanalyzer.git
26459,41270,LU,10sun,Lua,47.050168,8.309307,10sun/torchnet,0,https://github.com/10sun/torchnet.git
43561,93039,ZH,tocco,Java,47.376887,8.541694,tocco/pegdown-doclet,0,https://github.com/tocco/pegdown-doclet.git
27438,42682,VD,aickley,VimL,46.519653,6.632273,aickley/dotfiles,0,https://github.com/aickley/dotfiles.git
29677,48338,ZH,becompany,TypeScript,47.376887,8.541694,becompany/angular2-rss-reader-tutorial,1,https://github.com/becompany/angular2-rss-read...
27931,45025,GE,AlexisTp,Java,46.204391,6.143158,AlexisTp/enhanced-pet-clinic,0,https://github.com/AlexisTp/enhanced-pet-clini...
35422,58111,VD,epfl-projects,TeX,46.519653,6.632273,epfl-projects/dis-project,0,https://github.com/epfl-projects/dis-project.git


In [675]:
def count_lang_occurences(g):
    # Build {canton: {language: number of repositories}}
    count_lang = {}
    for idx, row in g.iterrows():
        language = row['language']
        # Skip repositories without a detected language
        if pd.isnull(language):
            continue
        canton_counts = count_lang.setdefault(row['canton'], {})
        canton_counts[language] = canton_counts.get(language, 0) + 1
    return count_lang
            
count_lang = count_lang_occurences(localized_repos_df)

In [676]:
count_lang

{'AG': {'AGS Script': 1,
  'Arduino': 1,
  'Assembly': 2,
  'C': 17,
  'C#': 23,
  'C++': 14,
  'CSS': 18,
  'Clojure': 2,
  'CoffeeScript': 2,
  'Cuda': 1,
  'Elm': 1,
  'Emacs Lisp': 1,
  'Go': 22,
  'HTML': 11,
  'Haskell': 2,
  'Java': 100,
  'JavaScript': 155,
  'Julia': 3,
  'Lua': 1,
  'Objective-C': 11,
  'PHP': 61,
  'Perl': 9,
  'Puppet': 4,
  'Python': 11,
  'R': 5,
  'Ruby': 24,
  'Scala': 2,
  'Scheme': 1,
  'Shell': 19,
  'Swift': 5,
  'VimL': 2,
  'Visual Basic': 2,
  'XSLT': 1},
 'AR': {'HTML': 1, 'JavaScript': 3, 'PHP': 2, 'Shell': 1},
 'BE': {'ASP': 1,
  'ApacheConf': 4,
  'AppleScript': 1,
  'Arduino': 18,
  'Assembly': 5,
  'Batchfile': 7,
  'C': 95,
  'C#': 112,
  'C++': 122,
  'CMake': 1,
  'CSS': 142,
  'CartoCSS': 1,
  'Clojure': 7,
  'CoffeeScript': 20,
  'D': 1,
  'Dart': 1,
  'Elixir': 11,
  'Elm': 1,
  'Emacs Lisp': 52,
  'Erlang': 3,
  'F#': 2,
  'GDScript': 1,
  'GLSL': 1,
  'Go': 37,
  'Groff': 1,
  'Groovy': 8,
  'HTML': 122,
  'Haskell': 4,
  'Java': 46

In [677]:
count_lang_df = pd.DataFrame(count_lang, dtype='int').fillna(0)

In [678]:
count_lang_df

Unnamed: 0,AG,AR,BE,BL,BS,FR,GE,GL,GR,JU,...,SG,SH,SO,SZ,TG,TI,VD,VS,ZG,ZH
AGS Script,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Arduino,1.0,0.0,18.0,0.0,5.0,3.0,8.0,0.0,0.0,0.0,...,6.0,0.0,0.0,0.0,0.0,2.0,27.0,0.0,1.0,24.0
Assembly,2.0,0.0,5.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,5.0,0.0,0.0,7.0
C,17.0,0.0,95.0,5.0,70.0,28.0,296.0,0.0,10.0,7.0,...,11.0,0.0,7.0,2.0,9.0,20.0,358.0,11.0,44.0,623.0
C#,23.0,0.0,112.0,3.0,27.0,16.0,82.0,0.0,4.0,0.0,...,25.0,0.0,3.0,0.0,3.0,3.0,137.0,4.0,2.0,285.0
C++,14.0,0.0,122.0,2.0,115.0,82.0,556.0,0.0,7.0,5.0,...,11.0,0.0,2.0,2.0,1.0,53.0,372.0,45.0,21.0,706.0
CSS,18.0,0.0,142.0,3.0,97.0,22.0,175.0,0.0,3.0,1.0,...,30.0,1.0,1.0,2.0,6.0,23.0,215.0,10.0,13.0,450.0
CartoCSS,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Clojure,2.0,0.0,7.0,0.0,1.0,0.0,120.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,60.0,0.0,1.0,53.0
CoffeeScript,2.0,0.0,20.0,2.0,15.0,1.0,27.0,0.0,4.0,0.0,...,8.0,0.0,0.0,0.0,0.0,4.0,33.0,1.0,1.0,122.0


In [679]:
def get_most_popular_langs(df):
    pop_langs = {}
    for column in df:
        pop_langs[column] = df[column].idxmax()
    return pop_langs

pop_langs = get_most_popular_langs(count_lang_df)

In [680]:
pop_langs_df = pd.DataFrame(pop_langs, index=['Language']).transpose()
pop_langs_df

Unnamed: 0,Language
AG,JavaScript
AR,JavaScript
BE,JavaScript
BL,PHP
BS,JavaScript
FR,Python
GE,Python
GL,Ruby
GR,Python
JU,JavaScript
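The counting and the "most popular language" lookup above can also be expressed directly in pandas. This is a sketch on a made-up six-row table, using `pd.crosstab` for the language-by-canton counts and `idxmax` for the top language per canton; the variable names are illustrative only.

```python
import pandas as pd

# Made-up repositories, for illustration only
repos = pd.DataFrame({
    'canton':   ['ZH', 'ZH', 'ZH', 'VD', 'VD', 'GE'],
    'language': ['JavaScript', 'JavaScript', 'C', 'Python', None, 'Python'],
})

# Ignore repositories without a detected language
known = repos.dropna(subset=['language'])

# Rows are languages, columns are cantons, cells are repository counts
lang_by_canton = pd.crosstab(known['language'], known['canton'])

# For each canton (column), the language (row label) with the highest count
top_lang = lang_by_canton.idxmax()
print(top_lang)
```

Note that `idxmax` returns the first label in case of a tie, so cantons with very few repositories may get an arbitrary "winner".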


In [681]:
mapping = {'JavaScript' : 1, 'Ruby' : 2, 'Python' : 3, 'PHP' : 4, 'Perl' : 5, 'Java' : 6}
pop_langs_df = pop_langs_df.replace({'Language': mapping}).reset_index().rename(columns={'index': 'Cantons'})
missing_cantons = [canton for canton in cantons if canton not in pop_langs_df['Cantons'].values]

pop_langs_df_complete = pop_langs_df.copy()

for canton in missing_cantons:
    data = {
        'Cantons': [canton],
        'Language' : [0]
    }
    df = pd.DataFrame.from_dict(data, orient='columns')
    
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    pop_langs_df_complete = pd.concat([pop_langs_df_complete, df], ignore_index=True)

pop_langs_df_complete = pop_langs_df_complete[pop_langs_df_complete['Cantons'].isin(cantons)]

In [682]:
# Initialize the map to ~ the center of Switzerland
ch_center_loc = [46.92287, 8.3829913] # Empirical "center" of Switzerland
map_ch4 = folium.Map(location=ch_center_loc, zoom_start=8)

# overlay the cantons onto the map
folium.TopoJson(open(canton_overlay),
                'objects.cantons',
                name='topojson').add_to(map_ch4)

# Plot a Choropleth map
color_map = 'Spectral'

map_ch4.choropleth(
    geo_path=canton_overlay, 
    data=pop_langs_df_complete,
    columns = ['Cantons','Language'],
    topojson='objects.cantons',
    key_on='feature.id',
    fill_color=color_map,
    fill_opacity=0.7, 
    line_opacity=0.5)

map_ch4



> Legend :

* **0** : no data
* **1** : JavaScript 
* **2** : Ruby
* **3** : Python
* **4** : PHP
* **5** : Perl 
* **6** : Java
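Since the legend is written by hand, it can drift from the integer codes used for the map. A small sketch that derives the legend lines from the same dictionary is shown below; `lang_codes` repeats the codes listed above, and `legend_lines` is a hypothetical helper.

```python
# Same language codes as in the legend above
lang_codes = {'JavaScript': 1, 'Ruby': 2, 'Python': 3, 'PHP': 4, 'Perl': 5, 'Java': 6}

def legend_lines(codes, missing_label='no data'):
    """Build markdown legend lines, with code 0 reserved for missing data."""
    lines = ['* **0** : ' + missing_label]
    for language, code in sorted(codes.items(), key=lambda item: item[1]):
        lines.append('* **{}** : {}'.format(code, language))
    return lines

print('\n'.join(legend_lines(lang_codes)))
```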

## Most popular Swiss repositories explorer


In [690]:
# Top 100 repositories in terms of stars
top_100_repos = localized_repos_df.sort_values(by='star_count', ascending=False)[0:100].reset_index()
top_100_repos.head(10)

Unnamed: 0,level_0,index,canton,created_by,language,lat,lng,project_name,star_count,url
0,487,650,ZH,jwagner,JavaScript,47.376887,8.541694,jwagner/smartcrop.js,9477,https://github.com/jwagner/smartcrop.js.git
1,1467,1737,ZH,gionkunz,JavaScript,47.376887,8.541694,gionkunz/chartist-js,8999,https://github.com/gionkunz/chartist-js.git
2,10,115,ZH,Seldaek,PHP,47.376887,8.541694,Seldaek/monolog,5601,https://github.com/Seldaek/monolog.git
3,2244,3111,ZH,adrai,JavaScript,47.376887,8.541694,adrai/flowchart.js,3535,https://github.com/adrai/flowchart.js.git
4,189,352,FR,0xced,Objective-C,46.806477,7.161972,0xced/iOS-Artwork-Extractor,2635,https://github.com/0xced/iOS-Artwork-Extractor...
5,9580,14088,AG,garnele007,Swift,47.387666,8.255429,garnele007/SwiftOCR,2133,https://github.com/garnele007/SwiftOCR.git
6,252,415,FR,0xced,Objective-C,46.806477,7.161972,0xced/XCDYouTubeKit,1952,https://github.com/0xced/XCDYouTubeKit.git
7,1189,1455,GE,tobie,Perl,46.204391,6.143158,tobie/ua-parser,1775,https://github.com/tobie/ua-parser.git
8,776,939,ZH,sustrik,C,47.376887,8.541694,sustrik/libmill,1714,https://github.com/sustrik/libmill.git
9,2975,4400,ZH,The-Compiler,Python,47.49882,8.723689,The-Compiler/qutebrowser,1668,https://github.com/The-Compiler/qutebrowser.git


In [697]:
def gh_popup(repo, rank):
    stars = repo['star_count']
    name = repo['project_name']
    url = repo['url']
    return "#" + str(rank) + " : " + name + " (" + str(stars) + " stars)"
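The `url` field is read but not yet used in the popup. One possible way to turn each pin into a link is sketched below, assuming the popup string is rendered as HTML by the map. `gh_popup_html` is a hypothetical variant, and the sample repository is the first row of the table above.

```python
from html import escape

def gh_popup_html(repo, rank):
    """Build an HTML popup with a link to the repository page."""
    url = repo['url']
    # The stored URL is a clone URL; drop the trailing '.git' to get the web page
    if url.endswith('.git'):
        url = url[:-len('.git')]
    return '#{} : <a href="{}" target="_blank">{}</a> ({} stars)'.format(
        rank, escape(url, quote=True), escape(repo['project_name']), repo['star_count']
    )

sample_repo = {
    'project_name': 'jwagner/smartcrop.js',
    'star_count': 9477,
    'url': 'https://github.com/jwagner/smartcrop.js.git',
}
print(gh_popup_html(sample_repo, 1))
```

Escaping the name and URL keeps unexpected characters in repository names from breaking the popup markup.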

In [708]:
import random

# Create the map
ch_center_loc = [46.92287, 8.3829913] # Empirical "center" of Switzerland
map_ch5 = folium.Map(location=ch_center_loc, zoom_start=8)


for r in range(len(top_100_repos)):
    repo = top_100_repos.iloc[r]
    folium.Marker(
        # Jitter each pin slightly so repositories geocoded to the same point do not overlap
        location=[repo['lat'] + random.uniform(-0.05, 0.05), repo['lng'] + random.uniform(-0.05, 0.05)],
        popup=gh_popup(repo, r + 1),  # ranks start at 1
        icon=folium.Icon(icon='star'),
    ).add_to(map_ch5)

map_ch5