# **Hybrid AI/ML Approach for Fraud Detection on GCP**
date: February 2024 <br>
author: lessismore@

Bringing the **best of all worlds**: combining descriptive, predictive and generative AI
can be very effective against complex problems such as fraud. The combination helps by:

*   Enriching data sources and providing additional insights
*   Reducing the false positives and false negatives by cross-examination


# Prerequisites
1. Download the dataset from https://www.kaggle.com/ealaxi/paysim1
2. Unzip the file and upload it to a Google Cloud Storage bucket (local uploads to BigQuery are limited to 100 MB)
3. Update the parameter values in the cell below


In [None]:
# Install the Vertex AI SDK for Python;
!pip install google-cloud-aiplatform

In [2]:
# Update the parameters;
GCP_PROJECT = "tadelle-372416"                        #TO-DO : update this
GCP_REGION  = "us-central1"                           #TO-DO : update this
GCP_BUCKET  = "gs://tadelle-bucket/fraud"             #TO-DO : update this
FILE_NAME   = 'PS_20174392719_1491204439457.csv'      #TO-DO : update this

import vertexai
# Initialise the project and the location;
vertexai.init(project=GCP_PROJECT, location=GCP_REGION)

In [None]:
# Create your dataset in BigQuery;
!bq --location {GCP_REGION} mk --dataset {GCP_PROJECT}:fraud

Dataset 'tadelle-372416:fraud' successfully created.


In [None]:
# Load file from google cloud storage to BigQuery table;
!bq load --source_format=CSV --autodetect fraud.raw_data {GCP_BUCKET}/{FILE_NAME}

Waiting on bqjob_r55c73c05c7e305d6_0000018de0064194_1 ... (15s) Current status: DONE   


# Data preparation


In [3]:
%%bigquery
# Stratified sampling and creating new features;
CREATE OR REPLACE TABLE fraud.input_data AS
SELECT
     type,
     amount,
     nameOrig,
     nameDest,
     row_number() over(partition by nameDest order by step desc) as destCnt,
     oldbalanceOrg as oldbalanceOrig,  #standardize the naming to avoid confusions.
     newbalanceOrig,
     oldbalanceDest,
     newbalanceDest,
     if(oldbalanceOrg = 0.0, 1, 0) as origzeroFlag,
     if(newbalanceDest = 0.0, 1, 0) as destzeroFlag,
     round((newbalanceDest-oldbalanceDest-amount)) as amountError,
     generate_uuid() as id,            #create a unique id for each transaction.
     isFraud
FROM fraud.raw_data
WHERE type in("CASH_OUT","TRANSFER") and (isFraud = 1 or (RAND()< 10/100));  # select 10% of the non-fraud cases
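To make the derived columns concrete, here is a minimal Python sketch that recomputes `origzeroFlag`, `destzeroFlag` and `amountError` for a single row. The function name `engineer_features` and the sample transaction are hypothetical and only serve as illustration; the column names are the ones used in the query above. Note that Python's `round` and BigQuery's `ROUND` can disagree on exact .5 values, so treat this as an approximation of the SQL rather than a replacement.

```python
def engineer_features(tx: dict) -> dict:
    """Recompute the derived columns of fraud.input_data for one transaction."""
    return {
        # 1 when the origin account was empty before the transaction
        "origzeroFlag": 1 if tx["oldbalanceOrg"] == 0.0 else 0,
        # 1 when the destination account is empty after the transaction
        "destzeroFlag": 1 if tx["newbalanceDest"] == 0.0 else 0,
        # how far the destination balance change is from the transferred amount
        "amountError": round(tx["newbalanceDest"] - tx["oldbalanceDest"] - tx["amount"]),
    }


# Hypothetical transfer: 181.0 leaves the origin, but the destination balance stays at 0.
sample = {
    "amount": 181.0,
    "oldbalanceOrg": 181.0,
    "oldbalanceDest": 0.0,
    "newbalanceDest": 0.0,
}
features = engineer_features(sample)
print(features)
```

A large negative `amountError` together with `destzeroFlag = 1` means money was sent but never showed up at the destination, which is the kind of pattern these features are meant to expose.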



In [None]:
%%bigquery
# Split for test;
CREATE OR REPLACE TABLE fraud.data_test AS
SELECT *
FROM fraud.input_data
where RAND() < 20/100;

# Split for train with partition flag for validation;
CREATE OR REPLACE TABLE fraud.data_train AS
SELECT *, if(RAND()< 20/100, true, false) as part_flag
FROM
 (
 SELECT *
 FROM fraud.input_data
 EXCEPT distinct select * from fraud.data_test
 );
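One caveat with the split above: `RAND()` produces a different split on every run. If a reproducible split is needed, a common alternative is to hash a stable key and bucket the result (in BigQuery this is often done with `FARM_FINGERPRINT`). The sketch below shows the idea in plain Python using SHA-256 as a stand-in hash; the helper names `bucket` and `assign_split` and the `tx-…` ids are hypothetical.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign_split(key: str, test_pct: int = 20) -> str:
    """Send roughly test_pct percent of the keys to the test set."""
    return "test" if bucket(key) < test_pct else "train"


ids = [f"tx-{i}" for i in range(1000)]  # hypothetical transaction ids
first_run = [assign_split(i) for i in ids]
second_run = [assign_split(i) for i in ids]
print(first_run == second_run)  # True: the split does not change between runs
```

This only helps if the key itself is stable; the `id` column above comes from `generate_uuid()`, so it stays fixed only as long as `fraud.input_data` is not rebuilt.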


# Modeling

## Descriptive AI

### Creating kmeans models with and without hyperparameter tuning

In [2]:
%%bigquery
# Build an unsupervised model with kmeans;
CREATE OR REPLACE MODEL
  fraud.unsupervised OPTIONS(model_type='kmeans', kmeans_init_method = "kmeans++", num_clusters=7, standardize_features=true) AS
   SELECT * EXCEPT (id, isFraud, nameOrig, nameDest, part_flag)
   FROM
   `fraud.data_train`;

# Build a kmeans model without a fixed num_clusters; BigQuery ML then picks the number of clusters automatically;
CREATE OR REPLACE MODEL
  fraud.unsupervised_tuned OPTIONS(model_type='kmeans', kmeans_init_method = "kmeans++", standardize_features=true) AS
   SELECT * EXCEPT (id, isFraud, nameOrig, nameDest, part_flag)
   FROM
   `fraud.data_train`;


### Analysing the fraud rate in test data
#### *The fraud-rate cross-check is a nice-to-have, since the descriptive model may be applied when no labelled data is available*

In [None]:
%%bigquery
# Run the clustering model on test data;
SELECT *, round(sum_is_fraud/cnt_all, 2) as fraud_rate
from(
SELECT
 centroid_id, sum(isFraud) as sum_is_fraud,  count(*) cnt_all
FROM
 ML.PREDICT(MODEL fraud.unsupervised,
   (
   SELECT *
   FROM  fraud.data_test))
group by centroid_id
)
order by centroid_id;
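The aggregation in the query above can be sketched in a few lines of Python: count rows and fraud labels per cluster, then divide. The function name `fraud_rate_by_cluster` and the toy rows are hypothetical; in the notebook the `(centroid_id, isFraud)` pairs would come from `ML.PREDICT`.

```python
from collections import defaultdict


def fraud_rate_by_cluster(rows):
    """rows: iterable of (centroid_id, is_fraud) pairs, is_fraud being 0 or 1."""
    totals = defaultdict(lambda: [0, 0])  # centroid_id -> [fraud_count, row_count]
    for centroid_id, is_fraud in rows:
        totals[centroid_id][0] += is_fraud
        totals[centroid_id][1] += 1
    # Same rounding as the SQL: two decimals, ordered by centroid_id
    return {c: round(fraud / count, 2) for c, (fraud, count) in sorted(totals.items())}


# Toy data: cluster 2 contains only fraud, cluster 3 none.
rows = [(1, 0), (1, 0), (1, 1), (1, 0), (2, 1), (2, 1), (3, 0)]
rates = fraud_rate_by_cluster(rows)
print(rates)  # {1: 0.25, 2: 1.0, 3: 0.0}
```

Clusters whose fraud rate is far above the overall rate are the ones worth inspecting first, even when the labels are only available for a small audited subset.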

### Detect anomalies in kmeans model

In [None]:
%%bigquery
# Find anomalies in each cluster;
SELECT *, round(sum_fraud/cnt_all, 2) as fraud_rate
from(
SELECT
 centroid_id, sum(isfraud) as sum_fraud,  count(*) cnt_all
FROM
 ML.DETECT_ANOMALIES(MODEL fraud.unsupervised, struct(0.2 as contamination),
   (
   SELECT *
   FROM  fraud.data_test))
where is_anomaly
group by centroid_id
)
order by centroid_id;
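As a rough intuition for the `contamination` parameter: for a kmeans model, a point is treated as anomalous when its normalized distance to the nearest centroid is unusually large, and `contamination` sets the fraction of points that should fall above the cutoff. The sketch below is a simplification under that assumption (BigQuery ML derives its cutoff from the training data, while this toy version uses the same list for both); the function name `flag_anomalies` and the distances are hypothetical.

```python
def flag_anomalies(distances, contamination):
    """Flag the `contamination` fraction of points with the largest distances."""
    n_anomalies = round(len(distances) * contamination)
    if n_anomalies == 0:
        return [False] * len(distances)
    # The cutoff is the smallest distance among the n_anomalies largest ones
    threshold = sorted(distances)[-n_anomalies]
    return [d >= threshold for d in distances]


# Ten hypothetical normalized distances; the last two are clear outliers.
distances = [0.2, 0.3, 0.25, 0.4, 0.35, 0.5, 0.45, 0.3, 2.1, 3.0]
flags = flag_anomalies(distances, contamination=0.2)
print(sum(flags))  # 2
```

With `contamination = 0.2`, as in the query above, about one in five training points sits above the cutoff, so a lower value is usually more appropriate when fraud is rare.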

## Predictive AI

### Predicting fraud scores using labelled data

In [None]:
%%bigquery
# Build a supervised model using xgboost classifier;
CREATE OR REPLACE MODEL `fraud.supervised_btree`
  OPTIONS(MODEL_TYPE='BOOSTED_TREE_CLASSIFIER',
          INPUT_LABEL_COLS = ["isFraud"], SUBSAMPLE=0.8,
          DATA_SPLIT_METHOD = "CUSTOM",
          DATA_SPLIT_COL= "part_flag",
          ENABLE_GLOBAL_EXPLAIN=True,
          MODEL_REGISTRY = "VERTEX_AI")
AS SELECT
 * EXCEPT (id, nameOrig, nameDest)
FROM `fraud.data_train`;

# Evaluate the model;
SELECT *
FROM ML.EVALUATE(MODEL fraud.supervised_btree);

# Global explainability of the model;
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL fraud.supervised_btree);

### Extracting highest scores

In [None]:
%%bigquery

# Get predictions on test data;
SELECT id, isFraud, p.prob as score
FROM
 ML.PREDICT(MODEL fraud.supervised_btree,
 (SELECT * FROM
 fraud.data_test
 )
)
, unnest(predicted_isFraud_probs) as p
where p.label = 1 and p.prob > 0.3
order by p.prob desc;
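The `0.3` cut-off in the query above trades precision against recall. A minimal sketch of how one might check that trade-off on scored test rows is shown below; `precision_recall` and the toy `(isFraud, score)` pairs are hypothetical, and the comparison `score > threshold` mirrors `p.prob > 0.3`.

```python
def precision_recall(rows, threshold=0.3):
    """rows: iterable of (is_fraud, score) pairs; returns (precision, recall)."""
    tp = fp = fn = 0
    for is_fraud, score in rows:
        flagged = score > threshold
        if flagged and is_fraud:
            tp += 1  # fraud correctly flagged
        elif flagged:
            fp += 1  # legitimate transaction flagged (false positive)
        elif is_fraud:
            fn += 1  # fraud missed (false negative)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy scored rows: three fraud cases, three legitimate ones.
rows = [(1, 0.95), (1, 0.40), (0, 0.35), (1, 0.10), (0, 0.05), (0, 0.02)]
low = precision_recall(rows, threshold=0.3)
high = precision_recall(rows, threshold=0.5)
print(low, high)
```

On this toy data, raising the threshold from 0.3 to 0.5 removes the false positive but misses one more fraud case, which is the usual pattern when tuning the cut-off.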


## Generative AI

In [8]:
import vertexai
from vertexai.language_models import TextGenerationModel

### Testing scam emails

In [9]:
prompt= """ Analyse the prompt below and assess if it includes any fraudulent activity. Explain step by step.

Dear Friend,

I am Mr. Ben Suleman a custom officer and work as Assistant controller of the Customs and Excise department Of the Federal Ministry of Internal Affairs stationed at the Murtala Mohammed International Airport, Ikeja, Lagos-Nigeria.

After the sudden death of the former Head of state of Nigeria General Sanni Abacha on June 8th 1998 his aides and immediate members of his family were arrested while trying to escape from Nigeria in a Chartered jet to Saudi Arabia with 6 trunk boxes Marked "Diplomatic Baggage". Acting on a tip-off as they attempted to board the Air Craft,my officials carried out a thorough search on the air craft and discovered that the 6 trunk boxes contained foreign currencies amounting to US$197,570,000.00(One Hundred and  Ninety-Seven Million Five Hundred Seventy Thousand United States Dollars).

I declared only (5) five boxes to the government and withheld one (1) in my custody containing the sum of (US$30,000,000.00) Thirty Million United States Dollars Only, which has been disguised to prevent their being discovered during transportation process.Due to several media reports on the late head of state about all the money him and his co-government officials stole from our government treasury amounting
to US$55 Billion Dollars (ref:ngrguardiannews.com) of July 2nd 1999. Even the London times of July 1998 reported that General Abacha has over US$3.Billion dollars in one account overseas. We decided to conceal this one (1)box till the situation is calm and quite on the issue. The box was thus deposited with a security company here in Nigeria and tagged as "Precious Stones and Jewellry" in other that its
content will not be discovered. Now that all is calm, we (myself and two of my colleagues in the operations team) are now ready to move this box out of the country through a diplomatic arrangement which is the safest means.

However as government officials the Civil Service Code of Conduct does not allow us by law to operate any foreign account or own foreign investment and the amount of money that can be found in our account
cannot be more than our salary on the average, thus our handicapp and our need for your assistance to help collect and keep safely in your account this money.

Therefore we want you to assist us in moving this money out of Nigeria. We shall definitely compensate you handsomely for the assistance. We can do this by instructing the Security Company here in Nigeria to
move the consignment to their affiliate branch office outside Nigeria through diplomatic means and the consignment will be termed as Precious Stones and Jewelleries" which you bought during your visit to Nigeria and is being transfered to your country from here for safe keeping. Then we can arrange to meet at the destination country to take the delivery of the consignment. You will thereafter open an account there and lodge the Money there and gradually instruct remittance to your Country.

This business is 100% risk free for you so please treat this matter with utmost confidentiality .If you indicate your interest to assist us please just e-mail me for more Explanation on how we plan to execute the transaction.

Expecting your response urgently.

Best regards,

Mr. Ben Suleman"""

In [10]:
parameters = {
    "candidate_count": 1,
    "max_output_tokens": 500,
    "temperature": 0.2,
    "top_p": 0.8,
    "top_k": 40
}
model = TextGenerationModel.from_pretrained("text-bison@002")

In [11]:
response = model.predict(prompt, **parameters)
print(f"Response from Model: {response.text}")

Response from Model:  **Potential fraudulent activity:**

1. **The sender claims to be a government official, which could be a way to gain trust and credibility.**
2. **The story involves a large sum of money that was allegedly stolen from the Nigerian government, which raises suspicions about the legitimacy of the funds.**
3. **The sender wants the recipient to help move the money out of Nigeria, which could be a money laundering scheme.**
4. **The sender offers to compensate the recipient handsomely for their assistance, which could be a way to entice them into participating in the fraudulent activity.**
5. **The sender asks the recipient to treat the matter with utmost confidentiality, which could be a way to prevent them from seeking advice or reporting the activity to authorities.**

**Based on these factors, it is likely that this prompt includes fraudulent activity.**


### Testing false positive

In [12]:
# Test with an aggressive campaign email
prompt1=""" Analyse the prompt below and assess if it includes any fraudulent activity. If it is not fraudulent then only display "PASS":

Ready to ditch banking frustrations and say hello to simplicity? We're excited to announce the launch of our brand-new account, [New Account Name], designed to make managing your money easier than ever!

Here's what makes 4YOU account your perfect banking partner:

No monthly fees: Keep more of your hard-earned money with zero monthly maintenance charges:
Earn high-interest rates on your everyday balance.
Get rewarded for debit card spending.
Offer easy budgeting tools or financial insights.
Free ATM access nationwide.
Early paycheck access.
Mobile banking with advanced feature.
Seamless experience: Manage your account anytime, anywhere with our user-friendly mobile app and online banking platform.
Security you can trust: We take your financial security seriously with advanced fraud protection and industry-leading encryption.
Ready to join the 4YOU  revolution?

Opening an account is quick and easy! Simply click the link below and you'll be up and running in minutes.
"""

In [13]:
# Test with an aggressive campaign email
prompt2=""" Analyse the prompt below and assess if it includes any fraudulent activity. Explain step by step. If it is not fraudulent then only display "PASS":

Ready to ditch banking frustrations and say hello to simplicity? We're excited to announce the launch of our brand-new account, [New Account Name], designed to make managing your money easier than ever!

Here's what makes 4YOU account your perfect banking partner:

No monthly fees: Keep more of your hard-earned money with zero monthly maintenance charges:
Earn high-interest rates on your everyday balance.
Get rewarded for debit card spending.
Offer easy budgeting tools or financial insights.
Free ATM access nationwide.
Early paycheck access.
Mobile banking with advanced feature.
Seamless experience: Manage your account anytime, anywhere with our user-friendly mobile app and online banking platform.
Security you can trust: We take your financial security seriously with advanced fraud protection and industry-leading encryption.
Ready to join the 4YOU  revolution?

Opening an account is quick and easy! Simply click the link below and you'll be up and running in minutes.
"""

In [15]:
response1 = model.predict(prompt1, **parameters)
print(f"Response from Model: {response1.text}")

Response from Model:  PASS


In [16]:
response2 = model.predict(prompt2, **parameters)
print(f"Response from Model: {response2.text}")

Response from Model:  **Step 1: Identify the claims made in the prompt**

The prompt makes several claims about the 4YOU account, including:

* No monthly fees
* High-interest rates on everyday balances
* Rewards for debit card spending
* Easy budgeting tools and financial insights
* Free ATM access nationwide
* Early paycheck access
* Mobile banking with advanced features
* Seamless experience
* Security you can trust

**Step 2: Assess the legitimacy of the claims**

Each of the claims made in the prompt appears to be legitimate. There are many banks and credit unions that offer similar features and benefits to their customers.

**Step 3: Look for red flags**

There are no obvious red flags in the prompt. The language is clear and concise, and there are no promises of unrealistic returns or guarantees.

**Conclusion: PASS**

Based on the information provided in the prompt, there is no evidence of fraudulent activity. The claims made about the 4YOU account appear to be legitimate and t
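The two responses above arrive in different shapes: a bare `PASS`, and a step-by-step explanation ending in `Conclusion: PASS`. If the verdict has to feed a downstream rule, it needs to be extracted from the text. The helper below is a hypothetical sketch (`is_pass` is not part of the Vertex AI SDK) that handles just these two shapes; parsing free text is brittle, so anything it does not recognise is treated as not passed.

```python
import re


def is_pass(response_text: str) -> bool:
    """Return True only when the model response clearly says PASS."""
    text = response_text.strip()
    if text.upper() == "PASS":
        return True
    # Step-by-step answers end with a line such as "**Conclusion: PASS**"
    return re.search(r"conclusion:\s*pass\b", text, flags=re.IGNORECASE) is not None


short_answer = " PASS"
long_answer = "**Step 3: Look for red flags**\n\n**Conclusion: PASS**\n\nBased on the information"
fraud_answer = "**Potential fraudulent activity:**\n\n1. The sender claims to be a government official"

print(is_pass(short_answer), is_pass(long_answer), is_pass(fraud_answer))  # True True False
```

Defaulting to "not passed" keeps an unparseable answer on the cautious side, so it ends up in manual review rather than being silently cleared.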

In [17]:
prompt= """ You are a fraud investigator and your mission is to flag fraudulent text messages.
Below are some examples of fraudulent text messages:
Example 1:
Wells Fargo Bank Fraud Alert:
Did you attempt a purchase at Walmart for $1,263.89? Reply YES or NO
Example 2:
ATT Free Msg: December bill is paid. Thanks, here’s a little gift for you: xxx
Happy New Year!
Example 3:
USPS: Since your package address does not have a house number, we are unable to arrange home delivery for you. Please update online.
Example 4:
Whole Foods Market is starting an exceptionally huge project in your area. This project happens each week, we select shoppers to function
as a store evaluator. You will get $450 on every task. Click the link below to process your application: www.***

Now, you received a new message below. Examine this message and find if there is any fraudulent activity and explain step by step.

This is the message you will process:
Transaction update: Your account is being debited for iPhone 13 USD $599.97. Not you? Call Amazon at (888)523-6754"

"""
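The prompt above follows a few-shot pattern: a role, a list of labelled examples, then the message to examine. When the examples change often, it can help to assemble the prompt from a list instead of editing one long string. The function `build_few_shot_prompt` below is a hypothetical sketch that reproduces the same wording; the example messages are shortened versions of the ones above.

```python
def build_few_shot_prompt(examples, message):
    """Assemble a few-shot fraud prompt from example messages and a new message."""
    lines = [
        "You are a fraud investigator and your mission is to flag fraudulent text messages.",
        "Below are some examples of fraudulent text messages:",
    ]
    for number, example in enumerate(examples, start=1):
        lines.append(f"Example {number}:")
        lines.append(example)
    lines += [
        "",
        "Now, you received a new message below. Examine this message and find "
        "if there is any fraudulent activity and explain step by step.",
        "",
        "This is the message you will process:",
        message,
    ]
    return "\n".join(lines)


examples = [
    "Wells Fargo Bank Fraud Alert: Did you attempt a purchase at Walmart for $1,263.89? Reply YES or NO",
    "USPS: Since your package address does not have a house number, we are unable to arrange home delivery for you. Please update online.",
]
message = "Transaction update: Your account is being debited for iPhone 13 USD $599.97. Not you? Call Amazon at (888)523-6754"
few_shot_prompt = build_few_shot_prompt(examples, message)
print(few_shot_prompt)
```

The resulting string can be passed to the model in the same way as the hand-written `prompt` variable.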

In [18]:
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models

def generate():
  vertexai.init(project=GCP_PROJECT, location=GCP_REGION)
  model = GenerativeModel("gemini-1.0-pro-001")
  responses = model.generate_content(
    prompt,
    generation_config={
        "max_output_tokens": 2048,
        "temperature": 0.2,
        "top_p": 1
    },
    safety_settings={
          generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
          generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
          generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
          generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
    stream=True,
  )

  for response in responses:
    print(response.text, end="")

generate()

**Step 1: Identify the sender**

The sender is "Amazon". Amazon is a legitimate company, so this is not a red flag.

**Step 2: Examine the content**

The message claims that the recipient's account is being debited for an iPhone 13 for $599.97. The message also provides a phone number to call if the recipient did not make the purchase.

**Step 3: Check for urgency**

The message does not create a sense of urgency. It does not say that the recipient needs to take action immediately or that their account will be closed if they do not call.

**Step 4: Look for suspicious links**

The message does not contain any links.

**Step 5: Check the grammar and spelling**

The message is well-written and has no grammatical or spelling errors.

**Conclusion**

Based on the above analysis, there are no red flags that indicate that this message is fraudulent. However, it is always a good idea to be cautious when receiving unsolicited text messages. If you are unsure whether a message is legitimate, yo
