# Conda Forge Dependency Graph

The conda-forge project keeps metadata about package dependencies in a JSON file on GitHub, in the `regro/libcfgraph` repository. The file is small (about 1.5 MB), so it is easy to download and manipulate.

## Download Conda Forge Graph

In [1]:
!wget https://github.com/regro/libcfgraph/raw/master/conda-forge.json

--2018-10-04 10:24:17--  https://github.com/regro/libcfgraph/raw/master/conda-forge.json
Resolving github.com (github.com)... 192.30.253.112, 192.30.253.113
Connecting to github.com (github.com)|192.30.253.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/regro/libcfgraph/master/conda-forge.json [following]
--2018-10-04 10:24:17--  https://raw.githubusercontent.com/regro/libcfgraph/master/conda-forge.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.208.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.208.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1581105 (1.5M) [text/plain]
Saving to: ‘conda-forge.json.1’


2018-10-04 10:24:17 (10.8 MB/s) - ‘conda-forge.json.1’ saved [1581105/1581105]
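
The `wget` log above shows the file being saved as `conda-forge.json.1`, which happens when a `conda-forge.json` from an earlier run is already present; the fresh copy is then not the one opened later. As a minimal sketch of an alternative, assuming only the Python standard library, the helper below (the name `download` is hypothetical, not part of the notebook) always writes to a fixed destination and overwrites any older copy. The URL is the one used in the cell above.

```python
import urllib.request
from pathlib import Path

# Same URL as the wget cell above.
URL = "https://github.com/regro/libcfgraph/raw/master/conda-forge.json"


def download(url, dest="conda-forge.json"):
    """Fetch `url` and write the bytes to `dest`, replacing any existing file."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    path = Path(dest)
    path.write_bytes(payload)
    return path
```

Calling `download(URL)` in a notebook cell would then leave a single, current `conda-forge.json` in the working directory.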



## Load data into Python

In [2]:
import json

with open('conda-forge.json') as f:
    data = json.load(f)

## Construct dependency graph

In [3]:
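# Map each package name to a tuple of the packages it requires.
dependencies = {}
for x in data['nodes']:
    key = x['id']
    try:
        value = tuple(x['req']['elements'])
    except KeyError:
        # Nodes without recorded requirements get an empty tuple.
        value = ()
    dependencies[key] = value

dependencies['pandas']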

('cython', 'numpy', 'pip', 'python', 'python-dateutil', 'pytz')
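
The dictionary above only records direct requirements. As a rough sketch of how the same structure can be walked to collect indirect requirements too, the function below does an iterative depth-first traversal. The name `transitive_dependencies` and the toy graph are hypothetical, made up for illustration; the real data would be passed in as `dependencies`.

```python
def transitive_dependencies(graph, package):
    """Return every package reachable from `package` by following requirements."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        # Packages missing from the graph are treated as having no requirements.
        for dep in graph.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Toy graph with the same shape as `dependencies`: name -> tuple of requirements.
toy = {
    "app": ("lib-a", "lib-b"),
    "lib-a": ("core",),
    "lib-b": ("core", "lib-a"),
    "core": (),
}
print(sorted(transitive_dependencies(toy, "app")))
# ['core', 'lib-a', 'lib-b']
```

The `seen` set keeps the walk from revisiting packages, so it also terminates if the graph contains cycles.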

## Reverse dependency graph

In [4]:
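from dask.core import reverse_dict

# Invert the mapping: package -> set of packages that depend on it.
dependents = reverse_dict(dependencies)
list(dependents['pandas'])[:10]  # sets are unordered, so this is an arbitrary sample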

['fbprophet',
 'altair',
 'jsontableschema-pandas',
 'pyseidon',
 'ogh',
 'trackpy',
 'reports',
 'erddapy',
 'ps2ff',
 'qgrid']
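
For readers without Dask installed, the inversion can be sketched in plain Python. This is an illustrative equivalent, not Dask's implementation; the name `reverse_graph` and the toy graph are hypothetical.

```python
def reverse_graph(graph):
    """Map each package to the set of packages that directly depend on it."""
    # Start with an empty set for every known package so leaves appear too.
    reversed_graph = {name: set() for name in graph}
    for downstream, upstreams in graph.items():
        for upstream in upstreams:
            # setdefault covers requirements that are not keys of `graph`.
            reversed_graph.setdefault(upstream, set()).add(downstream)
    return reversed_graph


toy = {"app": ("lib", "core"), "lib": ("core",), "core": ()}
dependents_toy = reverse_graph(toy)
print(sorted(dependents_toy["core"]))  # ['app', 'lib']
```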

## Switch to Pandas

In [5]:
import pandas as pd

# One row per edge: `downstream` requires `upstream`.
edges = [(k, vv) for k, v in dependencies.items() for vv in v]
df = pd.DataFrame(edges, columns=['downstream', 'upstream'])

df.head()

  downstream    upstream
0    ad3-cpp       cmake
1    ad3-cpp       eigen
2    ad3-cpp   toolchain
3     addict      python
4     addict  setuptools

In [6]:
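# Index by downstream package to look up what a package requires.
# Note: this rebinds `dependencies` from the dict above to a DataFrame.
dependencies = df.set_index('downstream')
dependencies.loc['pandas']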

                   upstream
downstream
pandas               cython
pandas                numpy
pandas                  pip
pandas               python
pandas      python-dateutil
pandas                 pytz

In [7]:
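# Index by upstream package to look up what depends on a package.
dependents = df.set_index('upstream')
dependents.loc['pandas'].head(10)  # first 10 of the packages that require pandas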

                  downstream
upstream
pandas             alpenglow
pandas                altair
pandas                 aospy
pandas               axelrod
pandas                batman
pandas                 bcolz
pandas               beakerx
pandas             biopandas
pandas              bkcharts
pandas    bootstrap_contrast

In [8]:
df.upstream.value_counts().nlargest(20)

python        2892
setuptools    1893
r-base        1092
pip            773
toolchain      728
numpy          705
gcc            518
six            447
libgcc         420
scipy          330
cmake          239
matplotlib     239
requests       229
pandas         214
pkg-config     188
cython         158
zlib           128
pyyaml         125
make           108
r-rcpp         104
Name: upstream, dtype: int64

In [9]:
df.downstream.value_counts().nlargest(20)

sage                     126
sagelib                   55
qgis                      45
r-essentials              37
bob                       33
r-userfriendlyscience     33
datalad                   31
doconce                   31
ncl                       31
steem                     29
octave                    29
caffe                     29
libgdal                   28
paraview                  25
r-sjstats                 25
cyclus-build-deps         25
mss                       25
hyperspy                  25
fenics                    24
datacube                  23
Name: downstream, dtype: int64
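
The two `value_counts` calls above count edges per endpoint: counting `upstream` values gives the number of direct dependents of each package, and counting `downstream` values gives the number of direct requirements. A small standard-library sketch of the same idea on a hypothetical toy graph, using `collections.Counter` in place of pandas:

```python
from collections import Counter

# Toy graph: name -> tuple of direct requirements (made up for illustration).
toy = {"app": ("lib", "core"), "lib": ("core",), "core": ()}
edges = [(down, up) for down, ups in toy.items() for up in ups]

# How many packages directly require each package (like df.upstream.value_counts()).
dependent_counts = Counter(up for _, up in edges)
# How many packages each package directly requires (like df.downstream.value_counts()).
dependency_counts = Counter(down for down, _ in edges)

print(dependent_counts.most_common(1))   # [('core', 2)]
print(dependency_counts.most_common(1))  # [('app', 2)]
```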