<a href="https://colab.research.google.com/github/samkoyun-neo4j/fraud-workshop/blob/main/workshop_notebooks/3-money-mules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Following the money

According to the FBI, criminals frequently recruit money mules to help launder proceeds derived from online scams and fraud. Money mules introduce layers of separation between victims and fraudsters, deliberately obscuring money trails and making investigations significantly harder.

In this module, we focus on detecting money mule behaviour using the PaySim dataset.
Our working hypothesis is simple:

**"Clients who send money to, or receive money from, confirmed first-party fraudsters are strong candidates for money mule involvement."**

In this exercise, we will:
1. Identify and explore transactions (money transfers) between first-party fraudsters and other clients
2. Use Weakly Connected Components (WCC) to reveal transaction networks linked to known fraudsters
3. Use Betweenness Centrality to score clients by how often they sit between otherwise separate groups of fraudsters and suspects, and treat that score as a risk indicator for these clients

While identifying and blocking fraudulent accounts is valuable, the real power of graph analytics lies in the ability to continuously expand the investigation across transactions, entities, and behaviours as new data becomes available.

Now that we have identified suspected fraudulent accounts (*first party*), what can we learn from the transactions they have been able to perform? There must be a way for members to profit from these accounts, and looking more closely at the connections between groups might lead us to the central players in a larger fraud operation.

## How many *risky* transactions are there?

The Cypher query below counts the transactions that first-party fraudsters have with accounts outside their own fraud group. Transfers within a group are expected; how money moves out of the group is the key to finding the central actors in a larger organisation.

In [None]:
# Install Neo4j GDS Python Client
import sys
!{sys.executable} -m pip install graphdatascience python-dotenv

# Import our GDS entry point
from graphdatascience import GraphDataScience
from dotenv import load_dotenv
import os

# (Desktop) Load environment variables from a .env file
# load_dotenv(override=True)
# gds = GraphDataScience(os.environ["NEO4J_URI"], auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASS"]))

# (Colab) Directly provide connection details (Replace the placeholders below)
gds = GraphDataScience("uri", auth=("neo4j", "password"))

In [None]:
# Count transactions between first-party fraudsters and clients outside their own fraud group, by transaction type
result = gds.run_cypher("""
    MATCH (c1:Client&FirstPartyFraudster)--(txn:Transaction)--(c2:Client)
    WHERE c2.firstPartyFraudGroup IS NULL OR c1.firstPartyFraudGroup <> c2.firstPartyFraudGroup
    UNWIND labels(txn) AS transactionType
    RETURN transactionType, count(*) AS freq;
""")

result

Across the hundreds of thousands of transactions in the dataset, only a relatively small subset emanates outward from the identified first-party fraud groups. We capture these outward money movements as a distinct layer in the graph, allowing us to isolate and analyse connections that warrant closer scrutiny.
These transactions are exclusively *money transfers*. By following the flow of funds through this layer, we can surface intermediary accounts and uncover the underlying money mule network.

## 1. Suspicious transactions (Anchored analysis)

This section starts from known fraudsters and expands outward.

### 1.1 Direct exposure

We begin by identifying clients who either send money to, or receive money from, confirmed fraudster accounts. These direct connections represent the first layer of exposure and form the initial pool of money mule suspects.


In [None]:
result = gds.run_cypher("""
    MATCH p=(fpFraudster:Client&FirstPartyFraudster)--(txn:Transaction)--(c:Client)
    WHERE c.firstPartyFraudGroup IS NULL
    RETURN c.name AS ClientName, count(*) AS TransactionCount, round(sum(txn.amount), 2) + ' $' AS TotalAmount
    ORDER BY TransactionCount DESC
    LIMIT 10;
""")
result

### 1.2 Indirect exposure

Fraudsters rarely transfer funds directly to the ultimate beneficiaries. Instead, money is routed through chains of intermediary mule accounts to weaken traceability. To surface this behaviour, we analyse transaction paths that extend **N hops** away from known fraudsters (for example, up to 10 hops). 

This allows us to:
- Reveal intermediary mule layers
- Identify downstream recipients
- Understand how far money travels before reaching an apparent "exit" account

The Cypher query below reveals this pattern.
A money flow of this kind has the following characteristics:
1. Transactions occur sequentially in time
2. Each account in the chain retains up to 20% of the money being moved
3. The ultimate beneficiary is different from the identified fraudster who initiates the first transaction

In [None]:
result = gds.run_cypher("""
    MATCH (first_c:Client&FirstPartyFraudster)-[r:PERFORMED]->(txn:Transaction)
    MATCH path=(first_c)-[r]->(txn)
        (
            (tx_i:Transaction)-[:TO]->(a_i:Client)-[:PERFORMED]->(tx_j:Transaction)
            WHERE tx_i.amount >= tx_j.amount >= 0.80 * tx_i.amount
            AND tx_i.globalStep < tx_j.globalStep
        ){2,10}
    (last_tx:Transaction)-[:TO]->(last_c:Client)
    WHERE first_c <> last_c
    RETURN 
        first_c.name as firstPartyFraudster, 
        last_c.name as lastRecipient, 
        size(apoc.coll.toSet(tx_i + tx_j)) AS transactionHops
    ORDER BY transactionHops DESC
    LIMIT 10;
""")
result
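
The three rules above can also be read as a check on one candidate chain of transfers. The following is a minimal sketch in plain Python on made-up data, not the query's implementation: the `Transfer` class, the account names, and the amounts are all hypothetical, and `step` stands in for `globalStep`. The link bounds mirror the `{2,10}` quantifier in the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: float
    step: int  # hypothetical stand-in for Transaction.globalStep


def is_layered_chain(transfers, min_ratio=0.80, min_links=2, max_links=10):
    """Check a sequence of transfers against the three rules listed above."""
    links = len(transfers) - 1  # number of consecutive transfer pairs
    if not min_links <= links <= max_links:
        return False
    for prev, nxt in zip(transfers, transfers[1:]):
        if prev.receiver != nxt.sender:  # the receiver must pass the money on
            return False
        if not prev.step < nxt.step:  # rule 1: sequential in time
            return False
        if not min_ratio * prev.amount <= nxt.amount <= prev.amount:
            return False  # rule 2: at most 20% retained per account
    # rule 3: the final beneficiary differs from the initiator
    return transfers[0].sender != transfers[-1].receiver


chain = [
    Transfer("fraudster", "mule_1", 1000.0, 10),
    Transfer("mule_1", "mule_2", 900.0, 12),
    Transfer("mule_2", "exit", 850.0, 15),
]
# Same chain, but mule_1 keeps half of the money, which violates rule 2
broken = [chain[0], Transfer("mule_1", "mule_2", 500.0, 12), chain[2]]

print(is_layered_chain(chain))   # True
print(is_layered_chain(broken))  # False
```

The Cypher query does the same filtering inside the database, which avoids pulling candidate paths into Python first.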

## 2. Money Flow Patterns (Unanchored analysis)

In this section, we look for suspicious transaction structures without relying on known fraudster labels.


### 2.1 Circular Money Movement Pattern

Money laundering often involves circulating funds through multiple accounts to disguise their origin. Using graph pattern matching, we can efficiently detect circular or near-circular money movements that would be extremely difficult to identify using traditional queries.

We focus on rings with the following characteristics:

1. The path starts and ends at the same account
2. Each account appears only once in the ring
3. Transactions occur sequentially in time
4. Each account in the ring retains up to 20% of the transferred amount

In [None]:
result = gds.run_cypher(
    """
    MATCH (a:Client)-[f:PERFORMED]->(first_tx:Transaction)
    MATCH path=(a)-[f]->(first_tx)
        (
            (tx_i:Transaction)-[:TO]->(a_i:Client)-[:PERFORMED]->(tx_j:Transaction)
            WHERE tx_i.amount >= tx_j.amount >= 0.80 * tx_i.amount
              AND tx_i.globalStep < tx_j.globalStep
        ){2,10}
    (last_tx:Transaction)-[:TO]->(a)
    WHERE COUNT {WITH a, a_i UNWIND [a] + a_i AS b RETURN DISTINCT b} = size([a] + a_i)
    RETURN 
        COUNT {WITH a, a_i UNWIND [a] + a_i AS b RETURN DISTINCT b} as ringSize, 
        a.name as startingClient
    ORDER BY ringSize DESC
    LIMIT 10;
    """
)

result

<img src="../img/circular_money_flow.png?raw=1" alt="Circular Money Flow" width="150%" title="Circular Money Flow">
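
To make the four ring conditions concrete, here is a small sketch of the same search as a depth-first traversal over an in-memory list of transfers. It is an illustration under assumptions rather than a replacement for the query: the `Transfer` tuple, the toy data, and the `find_rings` helper are hypothetical, and the length bounds (3 to 11 transfers) are chosen to mirror the `{2,10}` quantifier.

```python
from collections import defaultdict, namedtuple

# Hypothetical in-memory stand-in for (:Client)-[:PERFORMED]->(:Transaction)-[:TO]->(:Client)
Transfer = namedtuple("Transfer", "sender receiver amount step")


def find_rings(transfers, min_len=3, max_len=11, min_ratio=0.80):
    """Return rings as lists of accounts, in the order the money moves."""
    outgoing = defaultdict(list)
    for t in transfers:
        outgoing[t.sender].append(t)
    rings = []

    def extend(start, path):
        last = path[-1]
        visited = {t.sender for t in path} | {last.receiver}
        for nxt in outgoing[last.receiver]:
            in_order = last.step < nxt.step  # condition 3
            retained = min_ratio * last.amount <= nxt.amount <= last.amount  # condition 4
            if not (in_order and retained):
                continue
            if nxt.receiver == start:  # condition 1: back at the start
                if len(path) + 1 >= min_len:
                    rings.append([t.sender for t in path] + [nxt.sender])
            elif nxt.receiver not in visited and len(path) + 1 < max_len:
                extend(start, path + [nxt])  # condition 2: no repeated account

    for first in transfers:
        if first.sender != first.receiver:
            extend(first.sender, [first])
    return rings


transfers = [
    Transfer("A", "B", 1000.0, 1),
    Transfer("B", "C", 950.0, 2),
    Transfer("C", "A", 900.0, 3),
    Transfer("C", "D", 100.0, 4),  # too small to continue the flow
]
print(find_rings(transfers))  # [['A', 'B', 'C']]
```

Because the steps must strictly increase, each ring is reported once, starting from the account that made the earliest transfer.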

### 2.2 Fan-in / Fan-out Pattern

Another common laundering signal is transactional imbalance within a short time window.

- Fan-in: Many accounts sending money to a single account (aggregation mule)
- Fan-out: One account distributing money to many others (distribution mule)

While not conclusive on their own, these **anomalies** are powerful risk signals and can be incorporated into composite fraud or mule risk scores.

In [None]:
result = gds.run_cypher(
    """
    // Fan-out
    MATCH (c:Client)-[:PERFORMED]->(txn:Transaction)-[:TO]->(:Client)
    WHERE 150000 < txn.globalStep < 160000
    RETURN c.name AS potentialDistMule, count(txn) as txnCount
    ORDER BY txnCount DESC
    LIMIT 10;
    """
)

result

<img src="../img/fan_out.png?raw=1" alt="Fan Out Pattern" width="70%" title="Fan Out Pattern">
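
The cell above covers fan-out only. A fan-in counterpart can be sketched by grouping on the receiving client instead of the sender. The query below reuses the labels and relationship types from the fan-out query. The `top_fan_in` helper, the `potentialAggMule` column name, and the `$fromStep`/`$toStep` parameters are assumptions introduced for illustration.

```python
FAN_IN_QUERY = """
    // Fan-in: count transactions received per client within the step window
    MATCH (:Client)-[:PERFORMED]->(txn:Transaction)-[:TO]->(c:Client)
    WHERE $fromStep < txn.globalStep < $toStep
    RETURN c.name AS potentialAggMule, count(txn) AS txnCount
    ORDER BY txnCount DESC
    LIMIT 10;
"""


def top_fan_in(gds, from_step=150000, to_step=160000):
    """Run the fan-in query through an existing GraphDataScience client."""
    return gds.run_cypher(
        FAN_IN_QUERY, params={"fromStep": from_step, "toStep": to_step}
    )


# Usage, assuming `gds` is the connection created at the top of the notebook:
# result = top_fan_in(gds)
```

Passing the window as parameters makes it easy to slide the same query across different time ranges.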

## 3. Structured Discovery Using Graph Data Science

In this section, we use Neo4j Graph Data Science (GDS) to uncover direct mule networks: clients who transact directly with confirmed first-party fraudsters.

Rather than analysing the entire transaction graph, we focus on high-signal connections between fraudsters and external clients. This allows us to identify:
- Potential money mules
- Transactional groupings that span multiple fraud groups (*intra-group insight*)
- Structurally important accounts that enable money movement at scale

### 3.1 Find Who Transacted with the Fraudsters

We begin by identifying clients who have transacted outside of their original fraud rings, either sending money to or receiving money from confirmed first-party fraudsters.

Using these suspicious connections, we construct a new transaction meta-graph that explicitly captures fraud-adjacent money movement.

The Cypher code below identifies the suspects transacting outside of each fraud ring, marks them with a `muleSuspect` property, and connects each of them to the fraudster they transacted with through the new relationship type `TRANSACTED_WITH`.

Rather than projecting the entire transaction graph, we use a **Cypher projection** to precisely define the subgraph we want to analyse. This allows us to:
- Reduce noise from unrelated transactions
- Improve algorithm performance
- Ensure that detected structures are directly relevant to fraud investigation

In [None]:
# Only fraud groups with more than this many members are considered
fraudGroupMinSize = 5

result = gds.run_cypher("""
    MATCH (c:Client) WHERE c.firstPartyFraudGroup IS NOT NULL
    WITH c.firstPartyFraudGroup AS groupId, collect(c.id) AS members
    WITH groupId, size(members) AS groupSize WHERE groupSize > $gs
    MATCH (fpFraudster:Client {firstPartyFraudGroup:groupId})--(txn:Transaction)--(c:Client)
    WHERE c.firstPartyFraudGroup IS NULL
    SET c.muleSuspect = true
    MERGE (fpFraudster)-[r:TRANSACTED_WITH]->(c)
    ON CREATE SET r += txn
    RETURN count(DISTINCT r) AS NewRelationshipsCreated;
""", params= {'gs': fraudGroupMinSize})
result

### 3.2 Discovering Mule Networks

#### 3.2.1 Creating a Targeted Mule Network Projection

This time we use a [Cypher projection](https://neo4j.com/docs/graph-data-science-client/current/graph-object/#cypher-projection) that targets only the client nodes labelled `FirstPartyFraudster` or marked as mule suspects, connected by the new `TRANSACTED_WITH` relationships.

In [None]:
graphName = 'muleNetwork'

# Remove existing graph with the same name
if gds.graph.exists(graphName)["exists"]:
    gds.graph.drop(gds.graph.get(graphName))

In [None]:
projection, projectionPandas = gds.graph.cypher.project(
    """
    MATCH (source:Client)-[r:TRANSACTED_WITH]->(target:Client)
    RETURN gds.graph.project(
        $graphName,
        source,
        target,
        { relationshipProperties: r { .amount } },
        { undirectedRelationshipTypes: ['*'] }
    )
    """,
    graphName=graphName
)
projectionPandas

#### 3.2.2 Identifying Transactional Cells Using WCC

With the targeted mule network projection in place, we apply the [Weakly Connected Components (WCC)](https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/) algorithm to identify connected groups of activity.

WCC groups together clients that are connected through transaction paths, regardless of direction, revealing:
- Clusters of interconnected fraudsters and suspected mules
- Transactional cells that may operate locally
- The broader network when multiple fraud rings overlap or intersect

Note that this time we run WCC in `write` mode, which writes the detected component id as a property on every projected node in the database.

In [None]:
result = gds.wcc.write(projection, writeProperty='muleDirectNetworkId')
result

In [None]:
# Create an index on the new property
gds.run_cypher("CREATE INDEX MuleDirectGroupIndex IF NOT EXISTS FOR (c:Client) ON (c.muleDirectNetworkId);")

In [None]:
result = gds.run_cypher("""
    MATCH (c:Client) WHERE c.muleDirectNetworkId IS NOT NULL
    WITH c.muleDirectNetworkId AS muleDirectNetworkId, collect(c.id) AS members
    RETURN muleDirectNetworkId, size(members) AS groupSize
    ORDER BY groupSize DESC;
""")
result.head(5)
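
For intuition about what WCC computes, the sketch below groups nodes with a simple union-find over an undirected edge list. It illustrates the idea on toy data and makes no claim about how GDS implements the algorithm; the account names and the `weakly_connected_components` helper are hypothetical.

```python
def weakly_connected_components(edges):
    """Group nodes joined by any path, ignoring edge direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return sorted(components.values(), key=len, reverse=True)


# Toy TRANSACTED_WITH edges: mule_a links two fraudsters into one cell
edges = [
    ("fraudster_1", "mule_a"),
    ("fraudster_2", "mule_a"),
    ("fraudster_3", "mule_b"),
]
cells = weakly_connected_components(edges)
print([len(cell) for cell in cells])  # [3, 2]
```

A shared mule is enough to merge two fraudsters into one cell, which is why the largest components deserve attention first.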

#### 3.2.3 Finding the *Really* Bad Actors using Betweenness Centrality

Having identified transactional cells, the investigation now shifts from *'who is connected'* to *'who enables the network to function'*.

In complex fraud networks, activity is often divided into multiple transactional cells that operate locally. However, the funds from these cells must eventually be *aggregated, moved across groups, or cashed out*.

This creates a structural dependency on a small number of **central accounts** that sit between otherwise separate clusters of activity. While these accounts may initially appear legitimate, their importance is revealed by their position in the network, not by their individual attributes.

To surface these actors, we apply the [Betweenness Centrality](https://neo4j.com/docs/graph-data-science/current/algorithms/betweenness-centrality/) algorithm from Neo4j Graph Data Science. Betweenness Centrality measures how frequently a node lies on the shortest paths between other nodes, making it particularly effective at identifying:
- Brokers
- Coordinators
- Exit points through which funds flow across multiple fraud cells

In [None]:
result = gds.betweenness.write(projection, writeProperty='score')
result

Let's look at the clients with the highest betweenness scores. These are the key actors connecting different communities. Their accounts might not be labelled as fraudulent, but their structural role makes them critical points of leverage within the network.

In [None]:
result = gds.run_cypher("""
    MATCH (c:Client) WHERE c.score IS NOT NULL
    RETURN c.name AS clientName, c.score AS BetweennessScore, c.muleDirectNetworkId AS muleDirectNetworkId
    ORDER BY BetweennessScore DESC;
""")
result.head(5)
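
To see why a broker account scores highly, the sketch below computes unnormalised betweenness on a toy undirected graph using Brandes' algorithm. It is a small illustration with hypothetical account names, and its scores are not guaranteed to match the scale of the values GDS writes; each unordered pair of nodes is counted once.

```python
from collections import defaultdict, deque


def betweenness(edges):
    """Unnormalised betweenness for an unweighted, undirected edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`
        dist = {source: 0}
        sigma = dict.fromkeys(adj, 0.0)
        sigma[source] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals
    return {v: s / 2 for v, s in score.items()}


# Three small cells that can only reach each other through "broker"
edges = [
    ("f1", "a"), ("f2", "b"), ("f3", "c"),
    ("a", "broker"), ("b", "broker"), ("c", "broker"),
]
scores = betweenness(edges)
print(scores["broker"], scores["a"], scores["f1"])  # 12.0 5.0 0.0
```

The broker has only three connections, yet every path between cells runs through it, so it dominates the ranking.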

## 4. Using Bloom to highlight key fraudsters!

Graph Data Science helped us compute structure and importance.
Bloom helps us see it.

In this section, we use Neo4j Bloom to visually explore the fraud networks we identified in Section 3, making it easier to:
- Understand how fraud groups are connected
- Spot key brokers at a glance
- Follow the actual flow of money through mule networks


### 4.1 Intra-Group Analysis: Identifying Key Brokers

We begin by visualising the largest intra-fraud community (transactional cell) identified in Section 3. This community connects multiple first-party fraud groups and contains the highest-impact accounts uncovered by betweenness centrality.

Using a [saved Cypher search phrase](https://neo4j.com/docs/bloom-user-guide/current/bloom-tutorial/search-phrases-advanced/), Bloom retrieves and renders this community directly from the graph. Rather than inspecting tables or scores, we can quickly see:
1. How fraud groups are linked
2. Which accounts sit between them
3. Which nodes play a structurally central role

Bloom can also use rule-based styling to colour and size nodes and relationships based on any of their properties. In this example, high-betweenness nodes stand out as bridges between otherwise separate fraud groups and accounts, the links that enable the broader operation to function.

Try the following search phrase in Bloom:

* "Show intra-group transactions"
  * Use Bloom rule-based styling to highlight centrality results

<img src="https://github.com/samkoyun-neo4j/fraud-workshop/blob/main/img/betweeness_analysis.png?raw=1" alt="Visualising betweenness centrality" width="100%" height="100%" title="Visualising betweenness centrality">

Note:
- Betweenness centrality was calculated using UNDIRECTED relationships, reflecting the fact that `TRANSACTED_WITH` captures association, not money direction.
- The visualised graph shows connections between multiple first-party fraud groups, linked via shared mule or broker accounts.

### 4.2 Money Flow Analysis: Identifying Aggregators and Distributors

One of the hardest tasks with traditional relational tools is following money movement across long transaction chains. Direction, repetition, and branching quickly become difficult to reason about. Graph traversal makes this intuitive, and Bloom makes it visible.

Building on the earlier transaction analysis ([section 1.2](#12-indirect-exposure)), we now visualise directed money flow from a known fraudster through the mule network. By combining traversal with centrality measures, we can identify where disruption would be most effective.

Different centrality orientations reveal different operational roles:
- Betweenness Centrality (natural orientation): highlights choke points, that is, accounts that sit on many transaction paths and enable funds to move through the network.
- Degree Centrality (natural orientation): highlights distribution mules sending funds to many others.
- Degree Centrality (reverse orientation): highlights aggregation mules or beneficiaries receiving funds from many sources.
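
The effect of orientation on degree centrality can be sketched with a toy directed edge list. This is plain Python on hypothetical accounts, not a GDS call: natural orientation counts outgoing transfers per account, and reverse orientation counts incoming ones.

```python
from collections import Counter


def degree(edges, orientation="NATURAL"):
    """Count transfers per account: outgoing (NATURAL) or incoming (REVERSE)."""
    if orientation == "NATURAL":
        return Counter(src for src, _ in edges)
    if orientation == "REVERSE":
        return Counter(dst for _, dst in edges)
    raise ValueError(f"unknown orientation: {orientation}")


# Toy directed money flow: one sender fans out, one receiver collects
flows = [
    ("fraudster", "mule_1"), ("fraudster", "mule_2"), ("fraudster", "mule_3"),
    ("mule_1", "collector"), ("mule_2", "collector"), ("mule_3", "collector"),
]
print(degree(flows).most_common(1))             # [('fraudster', 3)]
print(degree(flows, "REVERSE").most_common(1))  # [('collector', 3)]
```

The same edge list surfaces the distributor under one orientation and the aggregator under the other, which is why both views are worth styling in Bloom.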

Try the following search phrases in Bloom:

* "Show top 5 fraudsters' money flow"
* "Show me the money flow from fraudster Gianna Hickman"
  * Use Bloom rule-based styling to highlight centrality results (Betweenness, Degree)

<img src="../img/money_flow.png?raw=1" alt="Visualising money flow" width="200%" title="Visualising money flow">
