**DISCLAIMER**

By accessing this code, you acknowledge the code is made available for presentation and demonstration purposes only and that the code (1) is not subject to SOC 1 and SOC 2 compliance audits, and (2) is not designed or intended to be a substitute for the professional advice, diagnosis, treatment, or judgment of a certified financial services professional. Do not use this code to replace, substitute, or provide professional financial advice, or judgement. You are solely responsible for ensuring the regulatory, legal, and/or contractual compliance of any use of the code, including obtaining any authorizations or consents, and any solution you choose to build that incorporates this code in whole or in part.

<img src=https://brysmiwasb.blob.core.windows.net/demos/geoscan/databricks_fsi_white.png width="600px">

# Geospatial fraud detection

*A large-scale fraud prevention system is usually a complex ecosystem made of various controls (all with critical SLAs), a mix of traditional rules and AI, and a patchwork of technologies spanning proprietary on-premises systems and open source cloud technologies. In a previous [solution accelerator](https://databricks.com/blog/2021/01/19/combining-rules-based-and-ai-models-to-combat-financial-fraud.html), we addressed the problem of blending rules with AI in a common orchestration layer powered by MLflow. In this series of notebooks centered on geospatial analytics, we demonstrate how the Lakehouse enables organizations to better understand customer behaviour: no longer based on who customers are but on how they bank, and no longer through a one-size-fits-all rule but through truly personalized AI. After all, abnormal patterns can only be identified once we understand what normal behaviour looks like, and doing so for millions of customers is a challenge that requires data and AI combined in one platform. As part of this solution, we are releasing a new open source geospatial library, [GEOSCAN](https://github.com/databrickslabs/geoscan), to detect geospatial behaviours at massive scale, track customer patterns over time, and detect anomalous card transactions.*

---
+ <a href="https://databricks.com/notebooks/geoscan/00_geofraud_context.html">STAGE0</a>: Home page
+ <a href="https://databricks.com/notebooks/geoscan/01_geofraud_clustering.html">STAGE1</a>: Using a novel approach to geospatial clustering with H3
+ <a href="https://databricks.com/notebooks/geoscan/02_geofraud_fraud.html">STAGE2</a>: Detecting anomalous transactions as ML enriched rules
---
<antoine.amend@databricks.com>

```python
# Embed the Google Drive preview; displayHTML is only available in Databricks notebooks
displayHTML("""<iframe src="https://drive.google.com/file/d/1QEqsofS2qBELW3lJrwT740K5FGxiZrDV/preview"></iframe>""")
```


## Context
In the first notebook, we introduce [GEOSCAN](https://github.com/databrickslabs/geoscan), a novel approach to geospatial clustering, and use it to learn each user's transactional behaviour from synthetic transaction data in New York City. In the second notebook, we use this information to detect transactions that deviate from the norm and explore different ways to surface these anomalies, from an analytics environment to an online data store.
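To make "learning normal behaviour, then flagging deviations" concrete, here is a minimal single-machine sketch. It does not use GEOSCAN, which is a distributed library. Instead, it swaps in plain DBSCAN (density-based clustering) with haversine distance. It learns the places where one hypothetical customer usually transacts, then flags a new transaction that is not within `eps_m` metres of any core point of those clusters. The coordinates, `eps_m=200` and `min_pts=3` are illustrative assumptions, not values taken from this solution.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def neighbours(points, i, eps_m):
    """Indices of all points (including i itself) within eps_m metres of point i."""
    return [j for j, p in enumerate(points) if haversine_m(points[i], p) <= eps_m]


def dbscan(points, eps_m, min_pts):
    """Plain O(n^2) DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(points, i, eps_m)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(points, j, eps_m)
            if len(nj) >= min_pts:  # j is a core point, so keep growing the cluster
                queue.extend(nj)
    return labels


def is_anomalous(txn, points, labels, eps_m, min_pts):
    """True if txn lies farther than eps_m from every core point of a learned cluster."""
    cores = [p for i, p in enumerate(points)
             if labels[i] != -1 and len(neighbours(points, i, eps_m)) >= min_pts]
    return all(haversine_m(txn, c) > eps_m for c in cores)


# Hypothetical card history for one customer, as (lat, lng) in New York City
history = [
    (40.7128, -74.0060), (40.7131, -74.0057), (40.7125, -74.0063), (40.7130, -74.0065),
    (40.7580, -73.9855), (40.7583, -73.9851), (40.7577, -73.9858),
    (40.6413, -73.7781),  # a one-off transaction, left as noise
]
labels = dbscan(history, eps_m=200, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
print(is_anomalous((40.7129, -74.0061), history, labels, 200, 3))  # False
print(is_anomalous((40.7484, -73.9857), history, labels, 200, 3))  # True
```

The all-pairs neighbour search is acceptable for one customer's history but does not scale to millions of customers. Avoiding that comparison is one reason a spatial index such as H3 helps at scale.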

<img src=https://brysmiwasb.blob.core.windows.net/demos/geoscan/geoscan_architecture.png alt="logical_flow" width="800">

## Hexagonal Hierarchical Spatial Index
In this series of notebooks and the companion library, we use [H3](https://eng.uber.com/h3/), a Hexagonal Hierarchical Spatial Index developed by Uber to analyze large spatial data sets. H3 partitions the Earth's surface into identifiable hexagonal grid cells, as shown in the image below. We use this partitioning to detect transactions that happen in close proximity to one another.

<img src="https://1fykyq3mdn5r21tpna3wkdyi-wpengine.netdna-ssl.com/wp-content/uploads/2018/06/image12.png" width=300>
<br>
[source](https://eng.uber.com/h3/)
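The core benefit of a grid index is that nearby points can be found by looking up a cell and its neighbours instead of comparing every pair of transactions. The real H3 library is available as the `h3` package listed below. To keep this sketch dependency-free, however, it swaps H3's hexagons for a plain square latitude/longitude grid, which is a different and simpler spatial index. The names `CELL_DEG`, `cell_of`, `build_index` and `candidates_near` are hypothetical and are not part of H3 or GEOSCAN.

```python
import math
from collections import defaultdict

# Hypothetical cell size in degrees (about 220 m of latitude). H3 instead offers
# hexagonal cells at fixed resolutions; this square grid is only a stand-in.
CELL_DEG = 0.002


def cell_of(lat, lng, size=CELL_DEG):
    """Integer (row, col) of the square grid cell containing the point."""
    return (math.floor(lat / size), math.floor(lng / size))


def build_index(points, size=CELL_DEG):
    """Bucket point ids by grid cell, so lookups avoid an all-pairs comparison."""
    index = defaultdict(list)
    for pid, (lat, lng) in points.items():
        index[cell_of(lat, lng, size)].append(pid)
    return index


def candidates_near(lat, lng, index, size=CELL_DEG):
    """Ids in the query's cell and its 8 neighbours (analogous to an H3 ring of distance 1).

    Any point less than `size` degrees away in both latitude and longitude falls in
    one of these 9 cells; candidates still need an exact distance check.
    """
    row, col = cell_of(lat, lng, size)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            found.extend(index.get((row + dr, col + dc), []))
    return sorted(found)


# Hypothetical transactions keyed by id, as (lat, lng) in degrees
transactions = {
    "t1": (40.7128, -74.0060),
    "t2": (40.7131, -74.0057),
    "t3": (40.7580, -73.9855),
}
index = build_index(transactions)
print(candidates_near(40.7129, -74.0061, index))  # ['t1', 't2']
```

Hexagons, as used by H3, have the advantage that all six neighbours of a cell are at the same distance from its centre, unlike the corner neighbours of a square grid.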

&copy; 2021 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the [Databricks License](https://databricks.com/db-license-source). All included or referenced third-party libraries are subject to the licenses set forth below.

| library                                | description             | license    | source                                              |
|----------------------------------------|-------------------------|------------|-----------------------------------------------------|
| com.uber:h3:3.6.3                      | Uber geospatial library | Apache2    | https://github.com/uber/h3                          |
| h3                                     | Uber geospatial library | Apache2    | https://github.com/uber/h3-py                       |
| org.scala-graph:graph-core_2.12:1.12.5 | Scala graph             | Apache2    | https://github.com/scala-graph/scala-graph          |
| com.databricks.labs:geoscan:0.0.1      | Geoscan algorithm       | Databricks | https://github.com/databrickslabs/geoscan           |
| folium                                 | Geospatial visualization| MIT        | https://github.com/python-visualization/folium      |
| pybloomfiltermmap3                     | Bloom filter            | MIT        | https://github.com/prashnts/pybloomfiltermmap3      |