# INVENTOR COMBINATIONS TO MEASURE INVENTOR SIMILARITY
This notebook measures inventor similarity in two distinct ways:
1. **Interpatent** Inventor Similarity - similarity between the inventors of two different patents. The implementation here compares each patent with its direct ancestor in the citation tree
2. **Intrapatent** Inventor Similarity - similarity among the inventors listed on a single patent

In [1]:
import neo4j 
import pandas as pd
from credentials import uri, user, pwd
from patent_neo4j.connection import Neo4jConnection
from patent_neo4j.analysis import get_direct_ancestor
from patent_neo4j.analysis import interpatent_inventor_combination
import itertools

Important Patents List

In [2]:
df = pd.read_csv("Data/important_patents.csv")
df.head(8)

Unnamed: 0,id,name
0,4136359,AppleMicrocomputer
1,4237224,MolecularChimeras
2,4371752,DigitalVoiceMailSystems
3,4399216,Co-transformationGeneCoding
4,4683195,PolymeraseChainReaction
5,5061620,StemCell
6,5108388,LaserSurgeryMethod
7,6285999,PageRank


## Inter-Patent Inventor Similarity
**Overall Algorithm**:
1. **Obtain the direct ancestor-descendant** relationship for each patent in the citation tree
1. **Combine the inventors** of the two patents into pairs, with repeats allowed <br>
    *(e.g. the pair ("wozniak", "wozniak") has a similarity score of 1)*
1. Compute the **average similarity** over all pairs
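
The pairing and averaging steps can be sketched on toy data as follows. This is a minimal illustration, not the implementation inside `interpatent_inventor_combination`. The helper names (`pair_inventors`, `average_similarity`, `toy_score`), the pair order (descendant first), and the exact-match scoring rule are all assumptions made for the example.

```python
from itertools import product


def pair_inventors(descendant_inventors, ancestor_inventors):
    """Hypothetical helper: every (descendant, ancestor) inventor pair.

    Repeats are allowed, so an inventor on both patents is paired with themself.
    """
    return list(product(descendant_inventors, ancestor_inventors))


def average_similarity(pairs, score):
    """Mean of score(a, b) over all pairs; 0.0 when there are no pairs."""
    if not pairs:
        return 0.0
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def toy_score(a, b):
    """Toy scoring rule (assumption): 1 for the same inventor, 0 otherwise."""
    return 1.0 if a == b else 0.0


pairs = pair_inventors(["wozniak", "jobs"], ["wozniak"])
avg = average_similarity(pairs, toy_score)
print(pairs)
print(avg)
```

In the notebook itself the scoring rule is the embedding-based `similarity_score`, so identical inventors score 1 and distinct inventors score somewhere in between rather than exactly 0.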

Establishing Connection and Querying the Citation Tree

In [3]:
conn = Neo4jConnection(uri, user, pwd)
result = conn.query_citation_tree(root=df.id[0])

In [4]:
result.head()

Unnamed: 0,id,date,country,claims,kind,lineage,similarity
0,6226017,2001-05-01,US,48,A,[4136359],[0.057604003697633736]
1,9940907,2018-04-10,US,20,B2,"[6226017, 4136359]","[0.13514360785484314, 0.057604003697633736]"
2,7705842,2010-04-27,US,16,B2,"[6226017, 4136359]","[0.1497403383255005, 0.057604003697633736]"
3,9001133,2015-04-07,US,22,B2,"[7705842, 6226017, 4136359]","[0.2110498994588852, 0.1497403383255005, 0.057..."
4,9153179,2015-10-06,US,11,B2,"[7705842, 6226017, 4136359]","[0.3195275664329529, 0.1497403383255005, 0.057..."


In [5]:
# Get Direct Lineage
direct_lineage = get_direct_ancestor(result)

In [6]:
direct_lineage.head()

Unnamed: 0,id,lineage,similarity,hops
0,6226017,4136359,0.057604,1
1,9940907,6226017,0.135144,2
2,7705842,6226017,0.14974,2
3,9001133,7705842,0.21105,3
4,9153179,7705842,0.319528,3
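
Comparing the two outputs above, the direct-lineage table appears to keep the first entry of each `lineage` and `similarity` list and to record the list length as `hops`. The sketch below reproduces that shape on toy rows. It is an inference from the displayed output, not the source of `get_direct_ancestor`, and it assumes the list columns hold real Python lists; `direct_ancestor_sketch` is a hypothetical name.

```python
import pandas as pd


def direct_ancestor_sketch(tree: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: keep only the nearest ancestor of each patent."""
    out = tree[["id"]].copy()
    # Number of ancestors between the patent and the root, inclusive
    out["hops"] = tree["lineage"].apply(len)
    # The first list entry is assumed to be the direct (nearest) ancestor
    out["lineage"] = tree["lineage"].apply(lambda xs: xs[0])
    out["similarity"] = tree["similarity"].apply(lambda xs: xs[0])
    return out[["id", "lineage", "similarity", "hops"]]


# Two rows copied from the citation-tree output above
toy = pd.DataFrame({
    "id": [6226017, 9940907],
    "lineage": [[4136359], [6226017, 4136359]],
    "similarity": [[0.057604003697633736],
                   [0.13514360785484314, 0.057604003697633736]],
})
direct = direct_ancestor_sketch(toy)
print(direct)
```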


Obtain All Inventors from the Built Citation Tree

In [7]:
inventor_tree = conn.query_inventor_tree(root=df.id[0])

In [19]:
inventor_tree.head()

Unnamed: 0,patent,inventor
0,10002407,"[fl:s_ln:griffiths-15, fl:m_ln:wollersheim-2, ..."
1,10006610,"[fl:j_ln:li-228, fl:r_ln:cui-5, fl:x_ln:lin-78]"
2,10007679,"[fl:z_ln:guo-135, fl:z_ln:zhang-599]"
3,10007687,"[fl:h_ln:clement-1, fl:t_ln:mastronardi-1, fl:..."
4,10007868,"[fl:g_ln:miller-11, fl:h_ln:jin-151, fl:z_ln:w..."


Generate Interpatent Combination

In [8]:
interpatent = interpatent_inventor_combination(direct_ancestor=direct_lineage, inventor_tree=inventor_tree)

In [9]:
interpatent.head()

Unnamed: 0,id,lineage,similarity,hops,combination
0,6226017,4136359,0.057604,1,"[(fl:a_ln:godfrey-5, fl:j_ln:goossen-4), (fl:a..."
1,9940907,6226017,0.135144,2,"[(fl:c_ln:ergan-1, fl:j_ln:priestley-1), (fl:c..."
2,7705842,6226017,0.14974,2,"[(fl:r_ln:panabaker-1, fl:j_ln:creasey-4), (fl..."
3,9001133,7705842,0.21105,3,"[(fl:p_ln:nambi-1, fl:t_ln:kim-394), (fl:p_ln:..."
4,9153179,7705842,0.319528,3,"[(fl:d_ln:redman-5, jzxgsipopha4myd9pvdnbp0ov)..."


### Calculation of Similarity
This section assumes that the embeddings for the inventors, as well as the mapping, have already been persisted in the files loaded below. <br>
**Important Notes:**
* **node2vec** uses biased random walks, so re-running the algorithm will likely yield different embeddings. However, the resulting similarity scores should be consistent
* Some inventors **do not have relationships up to 3 hops**. In this case, there are **NO embeddings for such nodes**, and *similarity_score()* returns None. Such cases are treated as a similarity of 0
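
To make the None behaviour concrete, here is a minimal sketch of a pairwise score that returns None when either inventor has no embedding. The internals of `similarity_score` are not shown in this notebook, so the use of cosine similarity, the data layout (a name-to-node-id dict and a node-id-to-vector dict), and the name `similarity_sketch` are all assumptions.

```python
import numpy as np


def similarity_sketch(mapping, embeddings, a, b):
    """Hypothetical cosine similarity; None if either inventor lacks an embedding."""
    ia, ib = mapping.get(a), mapping.get(b)
    if ia not in embeddings or ib not in embeddings:
        return None
    va, vb = embeddings[ia], embeddings[ib]
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    if denom == 0:
        return None
    return float(np.dot(va, vb) / denom)


# Toy data: "loner" is in the mapping but has no embedding vector
mapping = {"wozniak": 0, "jobs": 1, "loner": 2}
embeddings = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}

same = similarity_sketch(mapping, embeddings, "wozniak", "wozniak")
different = similarity_sketch(mapping, embeddings, "wozniak", "jobs")
missing = similarity_sketch(mapping, embeddings, "wozniak", "loner")

# Treat a missing score as 0, as the note above describes
missing_as_zero = missing if missing is not None else 0
print(same, different, missing, missing_as_zero)
```

Returning None rather than 0 from the scoring function keeps "no embedding" distinguishable from "unrelated", and leaves the decision to count it as 0 to the caller.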

In [10]:
from patent_neo4j.analysis import similarity_score, convert_np
import json

In [11]:
file_emb = "./node2vec/emb/" + df.name[0] + ".emb"
file_map = "./node2vec/map/" + df.name[0] + ".map"
node_emb = convert_np(file_emb)
with open(file_map) as json_file:
    coinventor_mapping = json.load(json_file)

Calculating the **Average Combination Similarity**

In [12]:
inventor_sim = []

for _, row in interpatent.iterrows():
    sim_score = 0
    length = len(row.combination)

    for pair in row.combination:
        curr_score = similarity_score(coinventor_mapping, node_emb, pair[0], pair[1])
        # A score of None means an inventor has no embedding; count the pair as 0
        if curr_score is not None:
            sim_score = sim_score + curr_score

    # Average over all pairs; a row with no combinations scores 0
    if length != 0:
        inventor_sim.append(sim_score / length)
    else:
        inventor_sim.append(0)

In [15]:
interpatent = interpatent.assign(inventor_sim=inventor_sim)

Replace the Combinations with the Number of Combinations

In [17]:
interpatent['combination'] = interpatent['combination'].apply(len)