# Graph Traversal

## Connect to GDS instance

In [None]:
from dotenv import load_dotenv
from graphdatascience import GraphDataScience
import os

# Credentials are read from a .env file: set NEO4J_URI, NEO4J_USER and NEO4J_PASS there
load_dotenv()
gds = GraphDataScience(os.environ["NEO4J_URI"], auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASS"]))


### Money Laundering

Money laundering typically moves funds through a group of people or accounts, across different locations and with varied strategies, to evade detection. Graph pattern matching makes this kind of movement straightforward to detect.
A suspicious ring of this kind has the following characteristics:

1. The ring starts and ends with the same account
2. The transactions that form the ring occur sequentially in time
3. The accounts in the ring are unique (the same account doesn’t appear more than once)
4. Each account in the ring retains up to 20% of the money being moved
5. The ring is comprised of between 3 and 8 accounts
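
The five characteristics above can be sketched as a plain Python check, independent of the database. This is a minimal illustration only: the `Transfer` record, the `is_suspicious_ring` function and the sample amounts are hypothetical names and values invented for this sketch, not part of the dataset or of the GDS client.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    """One movement of money between two accounts (hypothetical record layout)."""
    sender: str
    receiver: str
    amount: float
    step: int  # position in time, playing the role of globalStep


def is_suspicious_ring(transfers, min_accounts=3, max_accounts=8, max_retained=0.20):
    """Return True if the ordered transfers satisfy all five ring characteristics."""
    if not transfers:
        return False
    senders = [t.sender for t in transfers]
    # 1. The ring starts and ends with the same account, and every hop continues the chain.
    if transfers[-1].receiver != transfers[0].sender:
        return False
    if any(prev.receiver != nxt.sender for prev, nxt in zip(transfers, transfers[1:])):
        return False
    # 3. No account appears more than once.
    if len(set(senders)) != len(senders):
        return False
    # 5. The number of accounts is within bounds.
    if not min_accounts <= len(senders) <= max_accounts:
        return False
    for prev, nxt in zip(transfers, transfers[1:]):
        # 2. Transactions occur sequentially in time.
        if not prev.step < nxt.step:
            return False
        # 4. Each account forwards at least 80% of what it received, and never more.
        if not (1 - max_retained) * prev.amount <= nxt.amount <= prev.amount:
            return False
    return True


ring = [
    Transfer("A", "B", 1000.0, 1),
    Transfer("B", "C", 900.0, 2),
    Transfer("C", "A", 800.0, 3),
]
print(is_suspicious_ring(ring))  # True
```

The Cypher query in the next cell expresses the same idea declaratively, so the database enumerates candidate rings instead of checking one ring at a time.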

In [2]:
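result = gds.run_cypher(
    """
    // Anchor on a client and the first transaction they performed
    MATCH (a:Client)-[f:PERFORMED]->(first_tx:Transaction)
    MATCH path=(a)-[f]->(first_tx)
        (
            (tx_i:Transaction)-[:TO]->(a_i:Client)-[:PERFORMED]->(tx_j:Transaction)
            // Each intermediate account forwards at least 80% of what it received
            WHERE tx_i.amount >= tx_j.amount >= 0.80 * tx_i.amount
            // The time-ordering check (characteristic 2) is disabled; uncomment to enforce it
            // AND tx_i.globalStep < tx_j.globalStep
        ){2,5}  // 2 to 5 intermediate accounts, so rings of 3 to 6 accounts
        (last_tx:Transaction)-[:TO]->(a)
    // Keep only rings in which no account appears more than once
    WHERE COUNT {WITH a, a_i UNWIND [a] + a_i AS b RETURN DISTINCT b} = size([a] + a_i)
    RETURN COUNT {WITH a, a_i UNWIND [a] + a_i AS b RETURN DISTINCT b} AS ringSize, a.id AS startingClient;
    """
)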

result

Unnamed: 0,ringSize,startingClient
0,4,4497122618770519
1,4,4497122618770519
2,6,4063209563133993
3,4,4727712507081424
4,4,4727712507081424
5,4,4727712507081424
6,4,4727712507081424
7,4,4727712507081424
8,6,4615520989644311
9,5,4513670839841733


### Risk Score

Graph traversal lets us score how “risky” an account or customer is based on everything they’re connected to. Most traditional systems stop at first-degree checks: did this person transact with a known fraudster, or are they using a dodgy device or email? Anything beyond those direct links usually gets too slow or too expensive to calculate, especially if you need that score while a transaction is happening.

Neo4j doesn’t hit that wall. Because each node stores its actual connections, it can follow long chains of 10 to 20 hops across accounts, devices, emails, addresses and more in just a few milliseconds. That makes deeper risk checks practical in real time, not just in offline analysis.
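
To make the idea of multi-hop risk checks concrete, here is a minimal sketch in plain Python of a breadth-first traversal over an adjacency map. It is not how Neo4j is implemented. The graph, the node names, and the scoring rule (each flagged node contributes `1 / hops`) are assumptions chosen purely for illustration.

```python
from collections import deque


def hops_to_flagged(graph, start, flagged, max_hops=10):
    """Breadth-first search: map each reachable flagged node to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    found = {}
    while queue:
        node = queue.popleft()
        depth = distances[node]
        if node in flagged and node != start:
            found[node] = depth
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = depth + 1
                queue.append(neighbor)
    return found


def risk_score(graph, start, flagged, max_hops=10):
    """Hypothetical scoring rule: closer flagged nodes weigh more (1 / hops each)."""
    return sum(1 / d for d in hops_to_flagged(graph, start, flagged, max_hops).values())


# Toy graph: clients linked through shared phones, devices and emails (made-up data).
graph = {
    "client:alice": ["phone:555", "device:d1"],
    "phone:555": ["client:alice", "client:bob"],
    "device:d1": ["client:alice", "client:carol"],
    "client:bob": ["phone:555"],
    "client:carol": ["device:d1", "email:x"],
    "email:x": ["client:carol", "client:dave"],
    "client:dave": ["email:x"],
}
flagged = {"client:bob", "client:dave"}

print(hops_to_flagged(graph, "client:alice", flagged))  # {'client:bob': 2, 'client:dave': 4}
print(risk_score(graph, "client:alice", flagged))       # 0.75
```

A first-degree check corresponds to `max_hops=2` here (client to shared identifier to client) and would miss `client:dave`, who only shows up four hops away.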

#### High Risk Nodes
First, let's create high-risk nodes from PII data (Email, Phone, SSN).

We'll label any PII node that is shared by 3 or more clients as `HighRisk`.
Let's have a look at these records first.
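
The "shared by 3 or more clients" rule can be sketched in plain Python as a group-and-count over (client, identifier) pairs. The `find_high_risk_pii` helper and the sample links are hypothetical and only mirror the logic of the Cypher query in the next cell. They do not touch the database.

```python
from collections import defaultdict


def find_high_risk_pii(links, min_clients=3):
    """Return {pii: sorted client names} for identifiers shared by >= min_clients clients."""
    clients_by_pii = defaultdict(set)
    for client, pii in links:
        clients_by_pii[pii].add(client)  # a set ignores duplicate links
    return {
        pii: sorted(clients)
        for pii, clients in clients_by_pii.items()
        if len(clients) >= min_clients
    }


# Made-up (client, identifier) pairs for illustration.
links = [
    ("Ann", "phone:555-0100"),
    ("Bob", "phone:555-0100"),
    ("Cid", "phone:555-0100"),
    ("Ann", "email:ann@example.com"),
    ("Bob", "ssn:000-00-0001"),
    ("Dee", "ssn:000-00-0001"),
]
print(find_high_risk_pii(links))  # {'phone:555-0100': ['Ann', 'Bob', 'Cid']}
```

Only the phone number qualifies: the SSN is shared by two clients and the email by one, so both stay below the threshold.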

In [4]:
result = gds.run_cypher("""
    MATCH path=(c1:Client)--(pii:Email|Phone|SSN)--(c2:Client)
    WHERE c1 < c2
    WITH pii, collect(DISTINCT c1.name) + collect(DISTINCT c2.name) as combined
    WITH pii, apoc.coll.toSet(combined) AS cnames
    WHERE size(cnames) > 2
    RETURN pii, cnames, size(cnames) as size
    ORDER BY size DESC
""")

result.head(10)

Unnamed: 0,pii,cnames,size
0,(phoneNumber),"[Hudson Howard, Layla Valdez, Charlotte Schroe...",8
1,(phoneNumber),"[Tyler Weiss, Layla Britt, Kylie Wooten, Sebas...",6
2,(ssn),"[Alexis Dale, Hudson Howard, Camila Myers, Ant...",6
3,(ssn),"[Tristan Meyer, Samantha Mcclain, Brody Dunlap...",6
4,(phoneNumber),"[Anthony Mitchell, Connor Christian, Chloe Pat...",5
5,(phoneNumber),"[Emma Weaver, Gianna Atkinson, Hunter Wood, Al...",5
6,(phoneNumber),"[Skylar Watson, Wyatt Howell, Brooklyn Fry, Ev...",5
7,(phoneNumber),"[Makayla Mcfadden, Savannah Shields, Mason Wag...",5
8,(phoneNumber),"[Sophia Hickman, Savannah Bright, Logan Zamora...",5
9,(ssn),"[Kennedy Kline, Emma Weaver, Gianna Atkinson, ...",5


The same query, with the `RETURN` and `ORDER BY` clauses replaced by `SET`, labels these nodes as `HighRisk`.

In [None]:
gds.run_cypher("""
    MATCH path=(c1:Client)--(pii:Email|Phone|SSN)--(c2:Client)
    WHERE c1 < c2
    WITH pii, collect(DISTINCT c1.name) + collect(DISTINCT c2.name) as combined
    WITH pii, apoc.coll.toSet(combined) AS cnames
    WHERE size(cnames) > 2
    SET pii:HighRisk
""")

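### First Party Fraud Link

The query below scores a single client, Logan Adams, by counting the paths of one to three transaction hops (restricted to transactions with `globalStep` above 100000) that lead from that client to a node labeled `FirstPartyFraud`. It assumes nodes with that label already exist in the graph.
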
In [11]:
result = gds.run_cypher("""
    MATCH path=(c:Client {name: "Logan Adams"})
        ((src)-[r1:PERFORMED]->(txn)-[r2:TO]->(tgt) WHERE txn.globalStep > 100000){1,3}
        (n:FirstPartyFraud)
    RETURN DISTINCT c.name AS name, count(path) AS riskScore
""")

result

Unnamed: 0,name,riskScore
0,Logan Adams,6
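
To show what a path-count risk score measures, the sketch below counts transfer chains of one to three hops from a starting account to any fraud-flagged account, ignoring transfers at or before a step threshold. It is a simplified, in-memory stand-in: transactions are collapsed into (sender, receiver, step) tuples, and the `count_fraud_paths` function and the sample data are hypothetical.

```python
def count_fraud_paths(transfers, start, fraud, min_step=100_000, max_hops=3):
    """Count transfer chains of 1..max_hops hops from start that end on a fraud account."""
    # Keep only transfers after min_step, indexed by sender.
    outgoing = {}
    for idx, (sender, receiver, step) in enumerate(transfers):
        if step > min_step:
            outgoing.setdefault(sender, []).append((idx, receiver))

    def walk(account, hops_left, used):
        total = 0
        for idx, receiver in outgoing.get(account, ()):
            if idx in used:
                continue  # a transfer may appear only once per chain
            if receiver in fraud:
                total += 1  # this chain ends on a flagged account
            if hops_left > 1:
                total += walk(receiver, hops_left - 1, used | {idx})
        return total

    return walk(start, max_hops, frozenset())


# Made-up transfers: (sender, receiver, step).
transfers = [
    ("logan", "x", 100_500),
    ("x", "fraud1", 100_600),
    ("logan", "fraud1", 100_700),
    ("logan", "y", 90_000),   # too early, filtered out
    ("y", "fraud1", 100_800),
    ("x", "z", 100_900),
    ("z", "fraud2", 101_000),
]
fraud = {"fraud1", "fraud2"}

print(count_fraud_paths(transfers, "logan", fraud))  # 3
```

The three counted chains are `logan → fraud1`, `logan → x → fraud1` and `logan → x → z → fraud2`. The chain through `y` is excluded because its first transfer falls below the step threshold.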
