# Graph Exploration

The graph was built from entity co-occurrences within sentences of the following posts:

* [Deutsche Bank: A Global Bank for Oligarchs - American and Russian, Part I](https://whowhatwhy.org/2018/01/08/deutsche-bank-global-bank-oligarchs-american-russian-part-1/)
* [Deutsche Bank: A Global Bank for Oligarchs - American and Russian, Part II](https://whowhatwhy.org/2018/01/15/deutsche-bank-global-bank-oligarchs-american-russian-part-2/)
* [Deutsche Bank: A Global Bank for Oligarchs - American and Russian, Part III](https://whowhatwhy.org/2018/02/01/deutsche-bank-global-bank-oligarchs-american-russian-part-3/)

In this notebook, we will try to capture the important ideas in the text from the graph, similar to how people build mind-maps to understand complex material.
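
As a rough illustration of what "entity co-occurrence" means here, the sketch below turns a few sentences into graph edges. The input layout, the `cooccurrence_edges` helper and the sample data are assumptions made for the example; the notebook does not show how the graph was actually loaded. Each pair of entities found in the same sentence yields one edge tagged with the sentence id, which is consistent with the `REL` relationships and their `sid` property queried later.

```python
from itertools import combinations

# Hypothetical input: (sentence id, [(entity name, entity type), ...]) pairs,
# shaped like the output a named-entity recognizer might produce.
sentences = [
    ("s1", [("Deutsche Bank", "ORG"), ("Donald J. Trump", "PER")]),
    ("s2", [("Jared Kushner", "PER"), ("Deutsche Bank", "ORG"), ("New York", "GPE")]),
    ("s3", [("Russia", "GPE")]),  # a single entity produces no edge
]

def cooccurrence_edges(sentences):
    """Return one (entity, entity, sentence id) edge per entity pair per sentence."""
    edges = []
    for sid, entities in sentences:
        # Deduplicate and sort so each unordered pair appears once per sentence.
        for e1, e2 in combinations(sorted(set(entities)), 2):
            edges.append((e1, e2, sid))
    return edges

edges = cooccurrence_edges(sentences)
for e1, e2, sid in edges:
    print(e1[0], "--", e2[0], "via sentence", sid)
```

Two entities that share many sentences end up with many parallel edges, which is what the neighbor counts further down rely on.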

In [1]:
import pandas as pd
import py2neo
import os

In [2]:
NEO4J_CONN_URL = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASS = "graph"

DATA_DIR = "../../data/entity-graph"

pd.options.display.max_colwidth = 100

In [3]:
graph = py2neo.Graph(NEO4J_CONN_URL, auth=(NEO4J_USER, NEO4J_PASS))

<img src="graph-snapshot.png"/>

## Find Important Nodes

In [4]:
def get_nodes_by_pagerank(graph, ent_type):
    query = """
CALL algo.pageRank.stream('%s', 'REL', {iterations:20, dampingFactor:0.85})
YIELD nodeId, score
RETURN algo.asNode(nodeId).ename AS page, score
ORDER BY score DESC
    """ % (ent_type)
    results = graph.run(query).data()
    return pd.DataFrame(results)
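
To give a feel for what the PageRank call computes, here is a minimal pure-Python sketch of the textbook power iteration, using the same 20 iterations and damping factor of 0.85. It is not the Neo4j implementation: the `pagerank` function and the toy edge list are made up for illustration, edges are treated as undirected (an assumption), and the library's scores may be scaled differently.

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Power-iteration PageRank over an undirected edge list (textbook form)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    scores = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for node in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_scores[node] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores

# Toy graph: A is connected to every other node, D only to A.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = pagerank(toy_edges)
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(node, round(score, 3))
```

The best-connected node ranks first, which is the intuition behind reading the tables below as "most important" entities.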

### Important People

In [5]:
important_persons_df = get_nodes_by_pagerank(graph, "PER")
important_persons_df.head(10)

Unnamed: 0,page,score
0,Donald J. Trump,1.674359
1,Jared Kushner,0.731569
2,Boris Rotenberg,0.439233
3,Magnitsky,0.405
4,Vladimir Putin,0.313868
5,Steve Bannon,0.30599
6,Dmitry Firtash,0.2775
7,Lev Leviev,0.256715
8,Donald Trump Jr.,0.242898
9,Alex Sapir,0.21375


### Important Organizations

In [6]:
important_orgs_df = get_nodes_by_pagerank(graph, "ORG")
important_orgs_df.head(10)

Unnamed: 0,page,score
0,Deutsche Bank,2.516419
1,US Senate,1.779962
2,IRS Office of Appeals,1.422156
3,US Department of Justice,1.368949
4,Russian Commercial Bank,0.818218
5,VTB Capital,0.78668
6,New York Times,0.443449
7,Prevezon Holdings,0.417437
8,Department of Financial Services,0.417437
9,Treasury Department,0.413723


### Important Places

In [7]:
important_gpes_df = get_nodes_by_pagerank(graph, "GPE")
important_gpes_df.head()

Unnamed: 0,page,score
0,Russia,1.531063
1,New York,0.900733
2,Africa,0.860292
3,Israel,0.645919
4,Southern District,0.608951


## Find interesting neighbors

In [8]:
def get_neighbors_by_type(graph, src_name, src_type, neighbor_type):
    query = """
MATCH (e1:%s {ename:"%s"})<-[r:REL]->(e2:%s) 
RETURN e1.ename AS src, e2.ename AS dst
    """ % (src_type, src_name, neighbor_type)
    results = graph.run(query).data()
    results_df = (pd.DataFrame(results)
        .groupby(["src", "dst"])["dst"]
        .count()
        .reset_index(name="count")
        .sort_values("count", ascending=False)
    )
    return results_df
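
The grouping step above counts how many `REL` edges join the source node to each neighbor. The sketch below reproduces that tally with the standard library on hypothetical rows shaped like the query's records; the names and counts are invented for the example. One practical difference is that a `Counter` simply returns nothing for an empty result.

```python
from collections import Counter

# Hypothetical records, in the {"src": ..., "dst": ...} shape the query returns.
rows = [
    {"src": "Person X", "dst": "Bank A"},
    {"src": "Person X", "dst": "Bank B"},
    {"src": "Person X", "dst": "Bank A"},
    {"src": "Person X", "dst": "Bank A"},
]

def count_neighbors(rows):
    """Return [((src, dst), count), ...] sorted by descending count."""
    counts = Counter((row["src"], row["dst"]) for row in rows)
    return counts.most_common()

ranked = count_neighbors(rows)
for (src, dst), count in ranked:
    print(src, dst, count)
```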

### Neighbors of Donald J. Trump

In [9]:
djt_per_neighbors_df = get_neighbors_by_type(graph, "Donald J. Trump", "PER", "PER")
djt_per_neighbors_df.head()

Unnamed: 0,src,dst,count
4,Donald J. Trump,Jared Kushner,14
5,Donald J. Trump,Lev Leviev,4
7,Donald J. Trump,Preet Bharara,4
11,Donald J. Trump,Steve Bannon,4
0,Donald J. Trump,Alex Sapir,2


In [10]:
djt_org_neighbors_df = get_neighbors_by_type(graph, "Donald J. Trump", "PER", "ORG")
djt_org_neighbors_df.head()

Unnamed: 0,src,dst,count
1,Donald J. Trump,Deutsche Bank,24
6,Donald J. Trump,US Department of Justice,5
2,Donald J. Trump,IRS Office of Appeals,3
4,Donald J. Trump,Prevezon Holdings,2
0,Donald J. Trump,Department of Financial Services,1


In [11]:
djt_gpe_neighbors_df = get_neighbors_by_type(graph, "Donald J. Trump", "PER", "GPE")
djt_gpe_neighbors_df.head()

Unnamed: 0,src,dst,count
3,Donald J. Trump,Russia,7
2,Donald J. Trump,New York,5
5,Donald J. Trump,Southern District,5
4,Donald J. Trump,Seychelles,2
0,Donald J. Trump,Africa,1


### Neighbors of Jared Kushner

In [12]:
kush_per_neighbors_df = get_neighbors_by_type(graph, "Jared Kushner", "PER", "PER")
kush_per_neighbors_df.head()

Unnamed: 0,src,dst,count
1,Jared Kushner,Donald J. Trump,14
3,Jared Kushner,Lev Leviev,4
7,Jared Kushner,Sergey Gorkov,4
8,Jared Kushner,Steve Bannon,4
5,Jared Kushner,Preet Bharara,3


In [13]:
kush_org_neighbors_df = get_neighbors_by_type(graph, "Jared Kushner", "PER", "ORG")
kush_org_neighbors_df.head()

Unnamed: 0,src,dst,count
1,Jared Kushner,Deutsche Bank,14
2,Jared Kushner,New York Times,2
3,Jared Kushner,Prevezon Holdings,2
4,Jared Kushner,US Department of Justice,2
0,Jared Kushner,Department of Financial Services,1


In [14]:
kush_gpe_neighbors_df = get_neighbors_by_type(graph, "Jared Kushner", "PER", "GPE")
kush_gpe_neighbors_df.head()

Unnamed: 0,src,dst,count
3,Jared Kushner,New York,6
4,Jared Kushner,Russia,5
6,Jared Kushner,Southern District,4
0,Jared Kushner,Africa,3
1,Jared Kushner,Eastern District,2


## Find the nature of a relationship

In [15]:
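def build_sentence_dictionary(sent_file):
    # Map sentence id -> sentence text from a TSV of (post id, sentence id, text).
    sent_dict = {}
    # The context manager closes the file even if a line fails to parse.
    with open(sent_file, "r") as fsent:
        for line in fsent:
            pid, sid, sent_text = line.strip().split('\t')
            sent_dict[sid] = sent_text
    return sent_dict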

sent_dict = build_sentence_dictionary(os.path.join(DATA_DIR, "sentences.tsv"))
len(sent_dict)

585
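
The parser above implies a three-column, tab-separated layout: post id, sentence id, sentence text. The sketch below shows that layout on an in-memory sample and a slightly more tolerant way to read it. The sample rows and the `read_sentences` helper are hypothetical, not taken from `sentences.tsv`.

```python
import io

# Hypothetical sample in the assumed layout: post id, sentence id, text.
sample = io.StringIO(
    "1\t23\tFirst sentence .\n"
    "1\t24\tSecond sentence .\n"
    "\n"                                   # blank line
    "2\t25\tThird\twith a stray tab .\n"   # tab inside the sentence text
)

def read_sentences(handle):
    """Map sentence id -> text, skipping lines without three fields."""
    sentences = {}
    for line in handle:
        # Split on the first two tabs only, so tabs in the text survive.
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 3:
            continue
        pid, sid, text = parts
        sentences[sid] = text
    return sentences

demo = read_sentences(sample)
print(len(demo))
print(demo["25"])
```

Note that the keys are strings, so a lookup needs `demo["23"]` rather than `demo[23]`.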

In [16]:
def show_connecting_sentences(graph, src_name, src_type, dst_name, dst_type, sent_dict):
    query = """
MATCH (e1:%s {ename:"%s"})<-[r:REL]->(e2:%s {ename:"%s"}) 
RETURN e1.ename AS src, e2.ename AS dst, r.sid AS sid
ORDER BY sid
    """ % (src_type, src_name, dst_type, dst_name)
    result = graph.run(query).data()
    result_df = pd.DataFrame(result)
    result_df["sent_text"] = result_df["sid"].apply(lambda x: sent_dict[x])
    return result_df
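
Because the query text is assembled with `%` formatting, an entity name containing a double quote would break it. One possible hardening, sketched below, is to pass names as Cypher parameters and to check labels against a fixed set, since labels cannot be parameterized. The `build_connecting_query` helper and `ALLOWED_LABELS` are hypothetical, and the pattern is written in the undirected `-[r:REL]-` form, which is assumed to match the intent of the query above.

```python
# The three entity types queried in this notebook.
ALLOWED_LABELS = {"PER", "ORG", "GPE"}

def build_connecting_query(src_type, dst_type):
    """Return Cypher text with validated labels and $-style name parameters."""
    for label in (src_type, dst_type):
        if label not in ALLOWED_LABELS:
            raise ValueError("unexpected label: %r" % (label,))
    return (
        "MATCH (e1:%s {ename: $src_name})-[r:REL]-(e2:%s {ename: $dst_name}) "
        "RETURN e1.ename AS src, e2.ename AS dst, r.sid AS sid "
        "ORDER BY sid" % (src_type, dst_type)
    )

query = build_connecting_query("PER", "ORG")
params = {"src_name": "Donald J. Trump", "dst_name": "Deutsche Bank"}
print(query)
# With a live connection this could be run as: graph.run(query, params).data()
```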

### Donald J. Trump and Deutsche Bank

In [17]:
djt_db_rel_df = show_connecting_sentences(graph, "Donald J. Trump", "PER", "Deutsche Bank", "ORG", sent_dict)
djt_db_rel_df.head(10)

Unnamed: 0,src,dst,sid,sent_text
0,Donald J. Trump,Deutsche Bank,23,Does the fact that Deutsche Bank is currently on the hook for approximately $ 360 million in loa...
1,Donald J. Trump,Deutsche Bank,232,The letter further goes on to note that “ the suspicious ties between President ( Donald ) Trump...
2,Donald J. Trump,Deutsche Bank,233,Waters ’s letter also reminds the chairman of Trump and his companies ’ estimated $ 360 million ...
3,Donald J. Trump,Deutsche Bank,238,Does Donald Trump have a conflict of interest with Deutsche Bank with regard to any potential cr...
4,Donald J. Trump,Deutsche Bank,240,"Donald Trump ’s relationship with Deutsche Bank began in 1998 when he negotiated , and received ..."
5,Donald J. Trump,Deutsche Bank,244,"Through the years , he sought more loans from Deutsche ."
6,Donald J. Trump,Deutsche Bank,245,It should be noted that Trump sued Deutsche Bank in 2008 when it called in a big loan that Trump...
7,Donald J. Trump,Deutsche Bank,248,"Not long after the nastiness of the 2008 countersuits , Trump ’s business with Deutsche moved fr..."
8,Donald J. Trump,Deutsche Bank,249,"Trump ’s wealth manager at Deutsche , Rosemary Vrablic , specializes in real estate and is close..."
9,Donald J. Trump,Deutsche Bank,250,"In the past six years , the Deutsche Bank private wealth unit helped finance three of Trump ’s p..."


### Donald J. Trump and Jared Kushner

In [18]:
djt_kush_rel_df = show_connecting_sentences(graph, "Donald J. Trump", "PER", "Jared Kushner", "PER", sent_dict)
djt_kush_rel_df.head(10)

Unnamed: 0,src,dst,sid,sent_text
0,Donald J. Trump,Jared Kushner,109,"He had met with President - elect Trump , Kushner and Bannon at Trump Tower in November , 2016 ,..."
1,Donald J. Trump,Jared Kushner,249,"Trump ’s wealth manager at Deutsche , Rosemary Vrablic , specializes in real estate and is close..."
2,Donald J. Trump,Jared Kushner,402,"In the final installment , we ’ll detail how Trump ’s son - in - law and White House adviser Jar..."
3,Donald J. Trump,Jared Kushner,411,"Preet Bharara in Trump Tower on November 30 , 2016 , along with advisor Steve Bannon and Jared K..."
4,Donald J. Trump,Jared Kushner,417,The timing of the Deutsche refinancing transaction with Kushner and the subsequent media attenti...
5,Donald J. Trump,Jared Kushner,444,"Additionally , Kushner neglected to mention that — subsequent to another meeting with both Flynn..."
6,Donald J. Trump,Jared Kushner,445,VEB had provided financing for Trump Tower Toronto — as Kushner surely knew previous to his meet...
7,Donald J. Trump,Jared Kushner,464,October 2016 Trump ’s campaign team announces that Kushner will recuse himself from any involvem...
8,Donald J. Trump,Jared Kushner,468,US Attorney Southern District of New York Preet Bharara meets on the 26th Floor of Trump Tower w...
9,Donald J. Trump,Jared Kushner,515,"Donald Trump Jr. sent Kushner an email inviting him to a Trump Tower meeting on June 9 , 2016 wi..."


### Jared Kushner and Deutsche Bank

In [19]:
kush_db_rel_df = show_connecting_sentences(graph, "Jared Kushner", "PER", "Deutsche Bank", "ORG", sent_dict)
kush_db_rel_df.head(10)

Unnamed: 0,src,dst,sid,sent_text
0,Jared Kushner,Deutsche Bank,17,( Eastern District of New York ) has requested records from Deutsche Bank relative to Jared Kush...
1,Jared Kushner,Deutsche Bank,249,"Trump ’s wealth manager at Deutsche , Rosemary Vrablic , specializes in real estate and is close..."
2,Jared Kushner,Deutsche Bank,402,"In the final installment , we ’ll detail how Trump ’s son - in - law and White House adviser Jar..."
3,Jared Kushner,Deutsche Bank,405,This “ cooperation agreement ” becomes more pertinent when we consider Jared Kushner ’s relation...
4,Jared Kushner,Deutsche Bank,406,"When he first filed the financial disclosure forms mandated for all federal employees , Kushner ..."
5,Jared Kushner,Deutsche Bank,407,"But at that time , Kushner failed to disclose the $ 285 million refinancing his firm received fr..."
6,Jared Kushner,Deutsche Bank,417,The timing of the Deutsche refinancing transaction with Kushner and the subsequent media attenti...
7,Jared Kushner,Deutsche Bank,463,October 2016 Jared Kushner ’s business receives a $ 285 million loan from Deutsche Bank relative...
8,Jared Kushner,Deutsche Bank,464,October 2016 Trump ’s campaign team announces that Kushner will recuse himself from any involvem...
9,Jared Kushner,Deutsche Bank,512,Kushner must have forgotten about the promise he made to recuse himself from contact with any pa...


## Find path connecting a pair of nodes

In [20]:
def get_path_between(graph, src_name, src_type, dst_name, dst_type):
    query = """
MATCH (start:%s {ename:'%s'}), (end:%s {ename:'%s'})
CALL algo.shortestPath.stream(start, end)
YIELD nodeId, cost
RETURN algo.asNode(nodeId).ename AS name, cost    
    """ % (src_type, src_name, dst_type, dst_name)
    results = graph.run(query).data()
    path = [x["name"] for x in results]
    return path
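
Since the call above passes no weight property, the result is presumably the path with the fewest hops. A breadth-first search finds such a path; the sketch below runs one on a toy undirected graph. It illustrates the idea only and says nothing about how the Neo4j procedure is implemented. The `shortest_path` function and the edge list are made up for the example.

```python
from collections import deque

def shortest_path(edges, start, end):
    """Breadth-first search for a fewest-hops path in an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nb in sorted(neighbors.get(node, ())):
            if nb not in previous:
                previous[nb] = node
                queue.append(nb)
    return []  # no path between the two nodes

# Two routes from A to C: A-B-C (2 hops) and A-D-E-C (3 hops).
toy_edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C")]
print(shortest_path(toy_edges, "A", "C"))
```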


### Path from Donald J. Trump to Vladimir Putin

In [21]:
djt_putin_link = get_path_between(graph, "Donald J. Trump", "PER", "Vladimir Putin", "PER")
print(djt_putin_link)

['Donald J. Trump', 'Deutsche Bank', 'Vladimir Putin']
