#### Stack tests for PyPI

In this notebook, we look at how the model's recommendations change when it is given different sets of packages (stacks). We explore the results of two models:

1. Model trained with transitive dependencies
2. Model trained without transitive dependencies

In [23]:
# Let's load the model and the relevant dictionaries

import pickle
import json

with open('HPF_model.pkl', 'rb') as f:
    model = pickle.load(f)

In [2]:
with open('manifest-to-id.pickle', 'rb') as f:
    manifest_to_id_dict = pickle.load(f)

In [3]:
with open('package-to-id-dict.json', 'r') as f:
    package_to_id_dict = json.load(f)

In [4]:
with open('id-to-package-dict.json', 'r') as f:
    id_to_package_dict = json.load(f)

#### Experiment Set 1 - Model With Transitives

We will run the experiment for a data science persona, using stacks that contain all of tensorflow, numpy, scipy and keras.

In [17]:
# Collect every manifest (stack) that contains all four packages
count = 0
l = []
for item in manifest_to_id_dict.items():
    if 'tensorflow' in item[0] and 'scipy' in item[0] and 'numpy' in item[0] and 'keras' in item[0]:
        count += 1
        l.append(item[0])
print(count)

188


As we can see, there are 188 users (manifests) that contain this combination.
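The filter above can also be written as a single subset test, which makes it easy to reuse with a different package list (as in Experiment Set 2, which only requires tensorflow and keras). A minimal sketch: `find_stacks` is a hypothetical helper, and it assumes, as the printed output suggests, that the keys of `manifest_to_id_dict` are frozensets of package names. The data below is made up.

```python
from typing import Dict, FrozenSet, Iterable, List


def find_stacks(manifest_to_id: Dict[FrozenSet[str], int],
                required: Iterable[str]) -> List[FrozenSet[str]]:
    """Return every manifest that contains all of the required packages.

    Hypothetical helper, equivalent to the chained `in` checks in the cell above.
    """
    required_set = frozenset(required)
    # `<=` on sets is the subset test: every required package is in the manifest
    return [manifest for manifest in manifest_to_id if required_set <= manifest]


# Toy manifests, not taken from the real dictionary
toy_manifests = {
    frozenset({'tensorflow', 'keras', 'numpy', 'scipy'}): 0,
    frozenset({'tensorflow', 'keras'}): 1,
    frozenset({'flask', 'requests'}): 2,
}
matches = find_stacks(toy_manifests, ['tensorflow', 'numpy', 'scipy', 'keras'])
print(len(matches))  # 1
```

With it, the cell above reduces to `l = find_stacks(manifest_to_id_dict, ['tensorflow', 'numpy', 'scipy', 'keras'])` and `count = len(l)`, and dictionary iteration order is preserved.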

In [19]:
# Let's look at the first 10 of these stacks
x = l[:10]
print(x)

[frozenset({'atari-py', 'codacy-coverage', 'pyopengl', 'pillow', 'keras', 'numpy', 'h5py', 'tensorflow', 'pytest-xdist', 'scipy', 'glances', 'pandas', 'mem-top', 'pytest-cov', 'gym', 'six', 'matplotlib', 'seaborn'}), frozenset({'pillow', 'keras', 'numpy', 'bleach', 'jupyter', 'tensorflow', 'jupyter-tensorboard', 'numexpr', 'scipy', 'nltk', 'nbdime', 'jupyter-contrib-nbextensions', 'pandas', 'scikit-learn', 'imageio', 'urlextract', 'matplotlib', 'scikit-image'}), frozenset({'h5py', 'scipy', 'html5lib', 'futures', 'graphviz', 'pydot', 'keras', 'mock', 'bleach', 'tensorflow', 'pyparsing', 'pbr', 'markdown', 'protobuf', 'werkzeug', 'enum34', 'tensorflow-tensorboard', 'funcsigs', 'pyyaml', 'six', 'numpy', 'backports-weakref'}), frozenset({'packaging', 'jinja2', 'h5py', 'markupsafe', 'itsdangerous', 'scipy', 'click', 'keras', 'mock', 'tensorflow', 'nltk', 'pyparsing', 'theano', 'pbr', 'protobuf', 'flask', 'werkzeug', 'pyyaml', 'funcsigs', 'six', 'tqdm', 'numpy', 'gunicorn', 'appdirs'}), froz

In [14]:
def map_input_to_package_ids(input_stack):
    # Translate package names to model ids, skipping packages missing from the vocabulary
    package_id_list = list()
    for package in input_stack:
        package_id = package_to_id_dict.get(package)
        if package_id is not None:
            package_id_list.append(package_id)
    return package_id_list

In [15]:
def get_packages_from_id(package_ids):
    package_list = list()
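    for i in package_ids:
        # JSON object keys are strings, so the integer id is converted before the lookup;
        # ids missing from the dictionary come back as None
        package = id_to_package_dict.get(str(i))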
        package_list.append(package)
    return package_list
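The recommendation loop below chains three steps: stack to user id, user id to top-N item ids, and item ids back to package names, minus what the stack already has. The `str(i)` lookup is needed because `json.dump` writes integer dictionary keys as strings. A minimal sketch of the same chain as one function; `recommend_for_stack` and `StubModel` are hypothetical, and the sketch only assumes that the model exposes `topN(user=..., n=...)` returning item ids, as used in this notebook.

```python
from typing import Dict, FrozenSet, Optional, Set


def recommend_for_stack(model, stack: FrozenSet[str],
                        manifest_to_id: Dict[FrozenSet[str], int],
                        id_to_package: Dict[str, str],
                        n: int = 10) -> Set[Optional[str]]:
    """Top-n recommendations for a known stack, minus packages already in it."""
    user_id = manifest_to_id.get(stack)
    if user_id is None:
        # The stack was not seen in training, so there is no user id to query
        raise KeyError("stack not present in manifest_to_id")
    item_ids = model.topN(user=user_id, n=n)
    # Keys loaded from JSON are strings, hence str(i)
    packages = {id_to_package.get(str(i)) for i in item_ids}
    return packages - set(stack)


class StubModel:
    """Hypothetical stand-in for the pickled model: always returns ids 0, 1, 2."""
    def topN(self, user, n=10):
        return [0, 1, 2][:n]


stack = frozenset({'tensorflow', 'keras'})
recs = recommend_for_stack(StubModel(), stack, {stack: 0},
                           {'0': 'keras', '1': 'numpy', '2': 'scipy'})
print(sorted(recs))  # ['numpy', 'scipy']
```

Raising on an unknown stack, rather than passing `user=None` on to the model, makes a missing manifest fail loudly. Passing the dictionaries as arguments also avoids depending on whichever module-level dictionaries happen to be loaded.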

In [33]:
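# Get the top 10 recommendations for each of these 10 stacks. Packages already in the
# stack are removed before printing, so fewer than 10 may be shown.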

for stack in x:
    # First get the id for the stack
    stack_id = manifest_to_id_dict.get(stack)
    print("Stack is: ", stack)
    recommendations = model.topN(user=stack_id, n=10)
    print("========================================")
    print("Recommendations are: ", set(get_packages_from_id(recommendations)) - set(stack))
    print("========================================")

Stack is:  frozenset({'atari-py', 'codacy-coverage', 'pyopengl', 'pillow', 'keras', 'numpy', 'h5py', 'tensorflow', 'pytest-xdist', 'scipy', 'glances', 'pandas', 'mem-top', 'pytest-cov', 'gym', 'six', 'matplotlib', 'seaborn'})
Recommendations are:  {'python-dateutil', 'nltk', 'scikit-learn', 'cython', 'pyparsing', 'pytest', 'pytz'}
Stack is:  frozenset({'pillow', 'keras', 'numpy', 'bleach', 'jupyter', 'tensorflow', 'jupyter-tensorboard', 'numexpr', 'scipy', 'nltk', 'nbdime', 'jupyter-contrib-nbextensions', 'pandas', 'scikit-learn', 'imageio', 'urlextract', 'matplotlib', 'scikit-image'})
Recommendations are:  {'python-dateutil', 'tqdm', 'h5py', 'cython', 'pyparsing', 'six', 'networkx', 'pytz'}
Stack is:  frozenset({'h5py', 'scipy', 'html5lib', 'futures', 'graphviz', 'pydot', 'keras', 'mock', 'bleach', 'tensorflow', 'pyparsing', 'pbr', 'markdown', 'protobuf', 'werkzeug', 'enum34', 'tensorflow-tensorboard', 'funcsigs', 'pyyaml', 'six', 'numpy', 'backports-weakref'})
Recommendations are:  {

As we can see, the recommendations are very similar across most stacks (which is good to an extent), but they are also very generic. Some packages, such as python-dateutil, seem to appear because of their overall popularity rather than their relevance to the stack.
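To put numbers on "very similar" and "generic", one could keep the filtered recommendation set of each stack (for example by appending it to a list inside the loop above) and measure how often each package recurs and how much the sets overlap. A minimal sketch; both helpers are hypothetical and the recommendation sets are made up, not the notebook's output.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def recommendation_frequency(rec_sets: List[Set[str]]) -> Counter:
    """Count in how many recommendation sets each package appears."""
    return Counter(pkg for recs in rec_sets for pkg in recs)


def mean_pairwise_jaccard(rec_sets: List[Set[str]]) -> float:
    """Average Jaccard similarity |A & B| / |A | B| over all pairs of sets."""
    pairs = list(combinations(rec_sets, 2))
    if not pairs:
        return 0.0
    # Two empty sets are treated as identical
    scores = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)


# Toy recommendation sets
toy = [{'pytz', 'six', 'nltk'}, {'pytz', 'six', 'tqdm'}, {'pytz', 'cython'}]
print(recommendation_frequency(toy).most_common(1))  # [('pytz', 3)]
print(round(mean_pairwise_jaccard(toy), 3))  # 0.333
```

A package that appears in nearly every set, combined with a high mean overlap, would be consistent with the popularity effect described above.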

#### Experiment Set 2 - Model Without Transitives

We will run the experiment for a data science persona again, this time using stacks that contain both tensorflow and keras.

In [34]:
# Let's load the model and relevant dictionaries

with open('HPF_model_without_trans.pkl', 'rb') as f:
    model = pickle.load(f)

with open('manifest-to-id-without-trans.pickle', 'rb') as f:
    manifest_to_id_dict = pickle.load(f)

with open('package-to-id-dict-without-trans.json', 'r') as f:
    package_to_id_dict = json.load(f)

with open('id-to-package-dict-without-trans.json', 'r') as f:
    id_to_package_dict = json.load(f)

In [36]:
# Collect every manifest (stack) that contains both tensorflow and keras
count = 0
l = []
for item in manifest_to_id_dict.items():
    if 'tensorflow' in item[0] and 'keras' in item[0]:
        count += 1
        l.append(item[0])
print(count)

248


As we can see, there are 248 users (manifests) that contain this combination.

In [37]:
# Let's look at the first 10 of these stacks
x = l[:10]
print(x)

[frozenset({'tensorflow', 'lxml', 'keras'}), frozenset({'tensorflow', 'midi', 'keras'}), frozenset({'hdfs3', 'python-resize-image', 'opencv-python', 'tqdm', 'keras', 'tensorflow', 'docopt', 'logger', 'scikit-image'}), frozenset({'dill', 'keras', 'tensorflow', 'scikit-learn', 'matplotlib'}), frozenset({'pymongo', 'pip', 'packaging', 'keras', 'transforms3d', 'tensorflow', 'pykitti', 'unrealcv', 'xxhash'}), frozenset({'sgf', 'tqdm', 'keras', 'pygame', 'tensorflow', 'scikit-learn', 'theano'}), frozenset({'sphinx-gallery', 'pillow', 'ipykernel', 'keras', 'nbsphinx', 'tensorflow', 'scikit-learn', 'cython'}), frozenset({'tensorflow', 'pillow', 'dill', 'keras'}), frozenset({'olefile', 'singledispatch', 'certifi', 'html5lib', 'pytz', 'backports-shutil-get-terminal-size', 'subprocess32', 'keras', 'tensorflow', 'pyparsing', 'simplegeneric', 'theano', 'scandir', 'pbr', 'pillow', 'jupyter', 'jupyter-console', 'werkzeug', 'moviepy', 'imageio', 'matplotlib', 'backports-abc', 'tqdm', 'cycler', 'pathli

In [38]:
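# Get the top 10 recommendations for each of these 10 stacks. Packages already in the
# stack are removed before printing, so fewer than 10 may be shown.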

for stack in x:
    # First get the id for the stack
    stack_id = manifest_to_id_dict.get(stack)
    print("Stack is: ", stack)
    recommendations = model.topN(user=stack_id, n=10)
    print("========================================")
    print("Recommendations are: ", set(get_packages_from_id(recommendations)) - set(stack))
    print("========================================")

Stack is:  frozenset({'tensorflow', 'lxml', 'keras'})
Recommendations are:  {'requests', 'django', 'python-dateutil', 'fabric', 'flask', 'scipy', 'docutils', 'future', 'gunicorn', 'wsgiref'}
Stack is:  frozenset({'tensorflow', 'midi', 'keras'})
Recommendations are:  {'requests', 'django', 'tqdm', 'flask', 'colorama', 'docutils', 'botocore', 'gunicorn', 'virtualenv', 'networkx'}
Stack is:  frozenset({'hdfs3', 'python-resize-image', 'opencv-python', 'tqdm', 'keras', 'tensorflow', 'docopt', 'logger', 'scikit-image'})
Recommendations are:  {'requests', 'pygments', 'python-dateutil', 'docutils', 'pyyaml', 'gunicorn', 'psycopg2', 'virtualenv', 'pytz', 'wheel'}
Stack is:  frozenset({'dill', 'keras', 'tensorflow', 'scikit-learn', 'matplotlib'})
Recommendations are:  {'requests', 'tqdm', 'networkx', 'colorama', 'docutils', 'pyparsing', 'virtualenv', 'pytest', 'tornado', 'nose'}
Stack is:  frozenset({'pymongo', 'pip', 'packaging', 'keras', 'transforms3d', 'tensorflow', 'pykitti', 'unrealcv', 'xx

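In my observation, removing the transitive dependencies from training makes the recommendations less relevant: for these tensorflow/keras stacks, the model suggests web and general-purpose packages such as django, flask and gunicorn.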
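The relevance claim could be checked with a rough proxy: for each recommended package, the share of training manifests containing the persona's core packages (here tensorflow and keras) that also contain the recommendation. Packages that rarely co-occur with the core packages would score low. A minimal sketch; `conditional_frequency` is a hypothetical helper, the metric is a simple co-occurrence estimate rather than anything defined by the model, and the manifests below are toy data.

```python
from typing import Dict, FrozenSet, Iterable


def conditional_frequency(manifests: Iterable[FrozenSet[str]],
                          core: Iterable[str],
                          candidates: Iterable[str]) -> Dict[str, float]:
    """Estimate P(candidate in manifest | all core packages in manifest)."""
    core_set = frozenset(core)
    with_core = [m for m in manifests if core_set <= m]
    if not with_core:
        # No manifest contains the core packages, so there is no evidence either way
        return {c: 0.0 for c in candidates}
    return {c: sum(c in m for m in with_core) / len(with_core) for c in candidates}


toy = [
    frozenset({'tensorflow', 'keras', 'numpy'}),
    frozenset({'tensorflow', 'keras', 'h5py', 'numpy'}),
    frozenset({'tensorflow', 'keras', 'django'}),
    frozenset({'django', 'gunicorn'}),
]
print(conditional_frequency(toy, ['tensorflow', 'keras'], ['numpy', 'gunicorn']))
# {'numpy': 0.6666666666666666, 'gunicorn': 0.0}
```

In the notebook, the keys of either `manifest_to_id_dict` could be passed as `manifests`; which collection to use (with or without transitives) is itself a choice that affects the scores.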