# import2vec - Playbook

Use this notebook to query the model for the nearest neighbors of libraries you already know.

⚠️ The evaluation uses the vectors you trained with [Training.ipynb](Training.ipynb), so make sure the model is large enough to contain the libraries you are looking for and to learn meaningful relationships. Alternatively, download our pre-trained models from https://zenodo.org/record/2546488 and update the `vectors` variable to point to them.

In [11]:
language = 'javascript'
dim = 100
vectors = '../datasets/{}/models/w2v_dim{}.txt.gz'.format(language, dim)

In [12]:
from gensim.models.keyedvectors import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(vectors, binary=False)
# gensim < 4.0 API; in gensim 4.x use w2v.key_to_index instead of w2v.vocab
vocab = w2v.vocab.keys()
print(len(vocab), "vectors loaded")

9303 vectors loaded


### Utility to find libraries in your dataset

In [13]:
# returns the libraries in your model that have 'name' as a substring
def find_library(name):
    return [lib for lib in vocab if name in lib]

In [19]:
find_library('sql')

['react-native-sqlite-storage',
 'mysql',
 'sqlite3',
 'sql-template-strings',
 'mysql2/promise',
 'mysql2',
 'sqlite',
 'promise-mysql',
 'alasql',
 'better-sqlite3',
 'think-model-mysql',
 'node-sqlparser',
 'sqlstring',
 'mssql',
 'sqlops']

### Nearest Neighbor Search

In [15]:
?? w2v.most_similar

Signature:
w2v.most_similar(
    positive=None,
    negative=None,
    topn=10,
    restrict_vocab=None,
    indexer=None,
)
Source:
    def most_similar(self, positive=None, negative=None, topn=10, restrict_vocab=None, indexer=None):
        """Find the top-N most si

In [18]:
w2v.most_similar(['mysql'], topn=20)

[('pg', 0.6635434031486511),
 ('ioredis', 0.6613692045211792),
 ('body-parser', 0.6572729349136353),
 ('dotenv', 0.6570829749107361),
 ('express', 0.6560029983520508),
 ('jsonwebtoken', 0.6221519112586975),
 ('helmet', 0.6159390211105347),
 ('cors', 0.6155110597610474),
 ('sequelize', 0.6083593368530273),
 ('mongoose', 0.6004490852355957),
 ('log4js', 0.5931798815727234),
 ('aws-sdk', 0.5810474157333374),
 ('passport-github', 0.5806360244750977),
 ('cookie-parser', 0.5773148536682129),
 ('express-jwt', 0.5626721382141113),
 ('express-session', 0.5596650838851929),
 ('socket.io', 0.5591639280319214),
 ('redis', 0.5585757493972778),
 ('mongodb', 0.5527012944221497),
 ('discord.js', 0.5520840883255005)]
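
For intuition, the scores above are cosine similarities between the query vector and every other vector in the vocabulary. The following is a minimal pure-Python sketch of that ranking, not gensim's actual implementation; the 3-dimensional `toy_vectors` and the helper names `cosine` and `nearest` are invented for illustration and do not come from the trained model.

```python
import math

# Toy embeddings, invented for illustration only (not from the trained model).
toy_vectors = {
    "mysql": [0.9, 0.1, 0.0],
    "pg": [0.8, 0.2, 0.1],
    "express": [0.3, 0.9, 0.1],
    "lodash": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(name, vectors, topn=10):
    """Rank every other entry by cosine similarity to `name`, best first."""
    query = vectors[name]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != name  # the query itself is never returned
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# In this toy data 'pg' points in almost the same direction as 'mysql'.
print(nearest("mysql", toy_vectors, topn=2))
```

Cosine similarity depends only on the direction of the vectors, so a library that is imported very often is not favored merely because its vector has a larger norm.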

### Analogical Reasoning

In [17]:
# express : express-session :: koa : ?
w2v.most_similar(positive=['express-session', 'koa'], negative=['express'])

[('koa-static', 0.7851258516311646),
 ('koa-router', 0.776435136795044),
 ('koa-bodyparser', 0.7712484002113342),
 ('koa-json', 0.6993788480758667),
 ('koa-body', 0.6988320350646973),
 ('koa-views', 0.6964592933654785),
 ('koa-logger', 0.683550238609314),
 ('koa-onerror', 0.652797281742096),
 ('@koa/cors', 0.6326575875282288),
 ('koa-session', 0.6273043751716614)]
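
The analogy query can be read as vector arithmetic: add the unit vectors of the positive libraries, subtract those of the negative ones, and rank the remaining vocabulary by cosine similarity to the result. The sketch below illustrates that idea in pure Python under stated assumptions; the 3-dimensional `toy_vectors` and the helpers `unit` and `analogy` are hypothetical and only approximate what `most_similar` does internally.

```python
import math

# Toy embeddings, invented for illustration only.
# Axes (assumed): [express-ness, koa-ness, session-ness].
toy_vectors = {
    "express": [1.0, 0.0, 0.0],
    "koa": [0.0, 1.0, 0.0],
    "express-session": [1.0, 0.0, 1.0],
    "koa-session": [0.0, 1.0, 1.0],
    "koa-router": [0.0, 1.0, -1.0],
}

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def analogy(positive, negative, vectors, topn=10):
    """Rank entries by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for name in positive:
        query = [q + x for q, x in zip(query, unit(vectors[name]))]
    for name in negative:
        query = [q - x for q, x in zip(query, unit(vectors[name]))]
    query = unit(query)
    excluded = set(positive) | set(negative)  # input libraries are skipped
    scored = [
        (name, sum(q * x for q, x in zip(query, unit(vec))))
        for name, vec in vectors.items()
        if name not in excluded
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# express : express-session :: koa : ?
print(analogy(["express-session", "koa"], ["express"], toy_vectors))
```

In this toy setup `koa-session` ranks first because subtracting `express` removes the framework component of `express-session` and adding `koa` replaces it. In the real model the offsets are noisier, which is one plausible reason why `koa-session` appears only tenth in the output above.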