## IMPLEMENTATION OF INVENTOR SIMILARITY
Testing an implementation of inventor similarity based on **node2vec** embeddings. <br>
Patent: id 4136359 (Apple Microcomputer)

In [1]:
import neo4j 
import pandas as pd
from credentials import uri, user, pwd
from patent_neo4j.connection import Neo4jConnection
from patent_neo4j.utils import inventor_to_int

List of Important Patents

In [2]:
df = pd.read_csv("Data/important_patents.csv")
df.head(8)

| Unnamed: 0 | id | name |
|---|---|---|
| 0 | 4136359 | AppleMicrocomputer |
| 1 | 4237224 | MolecularChimeras |
| 2 | 4371752 | DigitalVoiceMailSystems |
| 3 | 4399216 | Co-transformationGeneCoding |
| 4 | 4683195 | PolymeraseChainReaction |
| 5 | 5061620 | StemCell |
| 6 | 5108388 | LaserSurgeryMethod |
| 7 | 6285999 | PageRank |


Set up the Neo4j connection and query coinventors

In [3]:
conn = Neo4jConnection(uri, user, pwd)
result = conn.query_coinventors(root=df.id[0])

Create the node2vec-compatible edge list and the inventor-to-integer mapping

In [4]:
file_name = "./node2vec/graph/" + df.name[0] + ".edgelist"
file_emb = "./node2vec/emb/" + df.name[0] + ".emb"
file_map = "./node2vec/map/" + df.name[0] + ".map"
coinventors = inventor_to_int(result, write=True, file=file_name, mapping=file_map)
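The node2vec binary expects an edge list of integer node ids, one whitespace-separated pair per line, so inventor names must first be mapped to integers. `inventor_to_int` is a project helper whose internals are not shown here; the sketch below is a hypothetical stand-in (`write_edgelist` is not part of `patent_neo4j`) that assumes the query result can be reduced to `(inventor, coinventor)` name pairs and writes a tab-separated `id<TAB>name` map file.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def write_edgelist(
    pairs: Iterable[Tuple[str, str]], edgelist_path: str, map_path: str
) -> Dict[str, int]:
    """Hypothetical helper: write name pairs as an integer edge list plus a map file."""
    mapping: Dict[str, int] = {}
    edges = set()
    for a, b in pairs:
        # Assign a fresh integer id the first time each inventor is seen.
        ia = mapping.setdefault(a, len(mapping))
        ib = mapping.setdefault(b, len(mapping))
        if ia != ib:
            # Store each undirected edge once, smaller id first.
            edges.add((min(ia, ib), max(ia, ib)))

    # Create the output directories if they do not exist yet.
    Path(edgelist_path).parent.mkdir(parents=True, exist_ok=True)
    Path(map_path).parent.mkdir(parents=True, exist_ok=True)

    with open(edgelist_path, "w") as f:
        for ia, ib in sorted(edges):
            f.write(f"{ia} {ib}\n")
    with open(map_path, "w") as f:
        for name, idx in mapping.items():
            f.write(f"{idx}\t{name}\n")
    return mapping
```

Deduplicating undirected edges keeps repeated co-inventions of the same pair from appearing twice; if co-invention counts matter, they could instead be written as a third (weight) column.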

Run the node2vec implementation

In [None]:
import os

In [5]:
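# SNAP node2vec flags: -i input edge list, -o output embeddings, -l walk length,
# -r walks per node, -k context size, -e epochs, -p return parameter,
# -d embedding dimension (-q is left at its default of 1)
command = f'./node2vec/node2vec -i:{file_name} -o:{file_emb} -l:20 -r:10 -k:10 -e:2 -p:0.1 -d:24'
ret = os.system(command)

if ret != 0:
    print(f"Error in node2vec (os.system returned {ret})")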

0
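`os.system` runs the command through a shell and only reports failure through its return value, which the cell prints and then ignores. A sketch of the same call with `subprocess.run`, assuming the SNAP node2vec binary at the path used above; the helper names `node2vec_args` and `run_node2vec` are hypothetical, and the defaults simply mirror the flags in the command string (with `q` set to 1, matching node2vec's default).

```python
import subprocess
from typing import List


def node2vec_args(
    binary: str,
    edgelist: str,
    emb: str,
    dims: int = 24,
    walk_len: int = 20,
    walks_per_node: int = 10,
    context: int = 10,
    epochs: int = 2,
    p: float = 0.1,
    q: float = 1.0,
) -> List[str]:
    """Build the argument list for the SNAP node2vec CLI (flags use -flag:value)."""
    return [
        binary,
        f"-i:{edgelist}",         # input edge list
        f"-o:{emb}",              # output embedding file
        f"-d:{dims}",             # embedding dimension
        f"-l:{walk_len}",         # length of each random walk
        f"-r:{walks_per_node}",   # number of walks started per node
        f"-k:{context}",          # context size for optimization
        f"-e:{epochs}",           # number of training epochs
        f"-p:{p}",                # return parameter
        f"-q:{q}",                # in-out parameter
    ]


def run_node2vec(args: List[str]) -> None:
    # Passing a list avoids shell quoting issues; check=True raises
    # CalledProcessError on a non-zero exit, so the notebook stops instead of
    # continuing with a missing or stale .emb file.
    subprocess.run(args, check=True)
```

Usage would be `run_node2vec(node2vec_args("./node2vec/node2vec", file_name, file_emb))`.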

In [6]:
# Mapping for coinventors
coinventor_mapping = coinventors[1]

Sampling 25 Random Inventors

In [None]:
import random

In [7]:
rand_inventors = random.sample(list(coinventor_mapping.keys()), 25)
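`random.sample` draws a different set of inventors on every run, so `inventor_sim.csv` cannot be regenerated exactly. A minimal sketch of seeded sampling, assuming the mapping keys are sortable (for example inventor-name strings); `sample_inventors` and the default seed are hypothetical choices, not part of the original pipeline.

```python
import random
from typing import Hashable, Iterable, List


def sample_inventors(ids: Iterable[Hashable], k: int = 25, seed: int = 42) -> List[Hashable]:
    """Reproducibly sample up to k inventors (hypothetical helper)."""
    # Sorting removes any dependence on dict insertion order, so the
    # result depends only on the set of ids and the seed.
    pool = sorted(ids)
    # Cap k so graphs with fewer than k inventors do not raise ValueError.
    return random.Random(seed).sample(pool, min(k, len(pool)))
```

For example, `rand_inventors = sample_inventors(coinventor_mapping.keys())`.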

### Constructing Dataset for Analysis
This section constructs a dataset for analyzing the similarity scores: <br>
- **for each** sampled inventor
    - get **related** inventors (up to 3 hops away)
    - obtain the **shortest-path** length (hops) to each related inventor
    - **for each** (inventor, related) pair
        - **calculate the similarity** of their embeddings
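In an unweighted co-inventor graph, a breadth-first search from the sampled inventor yields the shortest-path length to every inventor within a given depth. The sketch below is an in-memory illustration only, assuming the graph is available as an adjacency dict; it is not the Neo4j query behind `query_related_inventors` or the logic of `related_inventors_sp`, and `hops_within` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Mapping


def hops_within(
    graph: Mapping[Hashable, Iterable[Hashable]], source: Hashable, max_depth: int = 3
) -> Dict[Hashable, int]:
    """Return {node: shortest-path length} for nodes 1..max_depth hops from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand past the depth limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                # BFS visits nodes in order of distance, so the first visit is shortest.
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    dist.pop(source)  # the source is not its own related inventor
    return dist
```

Each `(related, hops)` item then corresponds to one `(inventor, related, hops)` row of the dataset.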

In [8]:
from patent_neo4j.analysis import related_inventors_sp, similarity_score, convert_np

In [9]:
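frames = []
node_emb = convert_np(file_emb)
for inventor in rand_inventors:
    # related inventors up to 3 hops from the sampled inventor
    related_inventors = conn.query_related_inventors(inventor, max_depth=3)
    # shortest path (hops) from the sampled inventor to each related inventor
    sp = related_inventors_sp(inventor, related_inventors)
    # embedding similarity for each (inventor, related) pair
    sp['sim'] = [
        similarity_score(coinventor_mapping, node_emb, row.inventor, row.related)
        for row in sp.itertuples(index=False)
    ]
    frames.append(sp)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
df = pd.concat(frames, ignore_index=True)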
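`convert_np` and `similarity_score` are project helpers whose code is not shown. As a hypothetical sketch of what such helpers could look like: SNAP node2vec writes embeddings in word2vec text format (a header line `num_nodes dim`, then `node_id v1 ... vd` per line), and a common choice of similarity is cosine similarity. That `similarity_score` uses cosine similarity is an assumption; returning NaN for inventors without an embedding would be consistent with the `dropna(subset=['sim'])` in the next cell. `load_emb` and `cosine_sim` are illustrative names.

```python
import math
from typing import Dict, Hashable, List, Mapping


def load_emb(path: str) -> Dict[int, List[float]]:
    """Parse a node2vec .emb file: header 'N d', then 'node_id v1 ... vd'."""
    emb: Dict[int, List[float]] = {}
    with open(path) as f:
        _, dim = map(int, f.readline().split())
        for line in f:
            parts = line.split()
            if not parts:
                continue
            vec = [float(x) for x in parts[1:]]
            if len(vec) != dim:
                raise ValueError(f"expected {dim} values for node {parts[0]}")
            emb[int(parts[0])] = vec
    return emb


def cosine_sim(
    mapping: Mapping[Hashable, int], emb: Mapping[int, List[float]], a: Hashable, b: Hashable
) -> float:
    """Cosine similarity of two inventors' vectors; NaN if either has no embedding."""
    ia, ib = mapping.get(a), mapping.get(b)
    if ia is None or ib is None or ia not in emb or ib not in emb:
        return math.nan
    u, v = emb[ia], emb[ib]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else math.nan
```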

In [10]:
df.dropna(subset=['sim']).to_csv("inventor_sim.csv")