In [2]:
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# generate sample data
X = np.random.randn(100, 2) # 100 samples with 2 features

# fit the Local Outlier Factor model
clf = LocalOutlierFactor(n_neighbors=20)
y_pred = clf.fit_predict(X)

# identify anomalies (outliers)
anomalies = X[y_pred==-1]

print("Anomalies:", anomalies)


Anomalies: [[-2.16049888 -0.25828928]
 [-1.59150602  2.27352377]
 [ 0.56553802  3.19915935]]


In this example, we're using the LocalOutlierFactor algorithm from the sklearn.neighbors module to detect anomalies in a 2D dataset. The n_neighbors parameter specifies the number of neighbors to use for the outlier detection.

The fit_predict method of the LocalOutlierFactor model returns a label for each sample, where -1 indicates an outlier and 1 indicates an inlier.

Finally, we extract the anomalies (samples with label -1) from the original dataset X and print them to the console.
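
Beyond the hard labels, the fitted model also exposes `negative_outlier_factor_`, where values near -1 are typical and much lower values indicate a point that is more isolated than its neighbors. The sketch below is a minimal illustration on synthetic data; the two planted points and the names `X_demo` and `planted` are assumptions made for this example, not part of the cell above.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 100 standard-normal points plus two planted outliers (assumed for illustration)
rng = np.random.default_rng(0)
inliers = rng.normal(size=(100, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.0]])
X_demo = np.vstack([inliers, planted])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X_demo)  # -1 = outlier, 1 = inlier

# Negative LOF per sample: the lower the value, the more anomalous the point
lof_scores = lof.negative_outlier_factor_

# Indices of the two lowest-scoring (most anomalous) samples
most_anomalous = np.argsort(lof_scores)[:2]
print("Most anomalous indices:", sorted(most_anomalous.tolist()))
print("Their labels:", labels[most_anomalous])
```

Ranking by score rather than relying only on the -1/1 labels makes it possible to review the most suspicious points first.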

Anomaly detection is a common machine learning problem that involves identifying patterns in data that are unusual or unexpected. The next example detects anomalies with the Isolation Forest algorithm from the scikit-learn library.

In [3]:
from sklearn.ensemble import IsolationForest
import numpy as np

# Generate some example data
data = np.random.randn(100, 2)

# Initialize the Isolation Forest algorithm
clf = IsolationForest(random_state=0)

# Fit the model to the data
clf.fit(data)

# Compute the anomaly score for each point (lower = more abnormal, negative = outlier)
scores = clf.decision_function(data)

# Print the score for each data point, identified by its index
for i, score in enumerate(scores):
    print(f"Data point {i}: anomaly score = {score:.4f}")


In this code, we first generate some example data as a 100 x 2 NumPy array. We then initialize an Isolation Forest with a random seed of 0 and fit it to the data using the fit method. Finally, we use the decision_function method to compute an anomaly score for each data point. Lower scores mean a point is more abnormal relative to the rest of the data, and negative scores correspond to points the model treats as outliers. The loop prints each point's index and score so the results can be inspected.
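
The sign convention of these scores is easy to get backwards, so a small check can help. The following sketch, on assumed synthetic data (`data_demo` is a name introduced here), relies on three documented properties of scikit-learn's `IsolationForest`: `score_samples` is never positive, `decision_function` equals `score_samples` minus the fitted `offset_`, and `predict` labels a point -1 when its `decision_function` value is negative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
data_demo = rng.normal(size=(200, 2))  # assumed example data

forest = IsolationForest(random_state=0).fit(data_demo)

raw = forest.score_samples(data_demo)          # always <= 0; lower = more abnormal
shifted = forest.decision_function(data_demo)  # raw - offset_; negative = outlier
labels = forest.predict(data_demo)             # -1 = outlier, 1 = inlier

print("offset_:", forest.offset_)
print("all raw scores <= 0:", bool((raw <= 0).all()))
print("outliers flagged:", int((labels == -1).sum()))
```

Because `score_samples` never exceeds 0, a threshold of 0 only makes sense for `decision_function` values.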

In [None]:
from sklearn.ensemble import IsolationForest
import numpy as np

class AnomalyDetector:
    def __init__(self, random_state=0):
        """
        Initialize the AnomalyDetector class with a given random state.

        Parameters:
        random_state (int): Random seed for the Isolation Forest algorithm.
        """
        self.clf = IsolationForest(random_state=random_state)

    def fit(self, data):
        """
        Fit the Isolation Forest model to the given data.

        Parameters:
        data (np.ndarray): Input data as a numpy array.
        """
        self.clf.fit(data)

    def predict(self, data):
        """
        Predict the anomaly score for each point in the given data.

        Parameters:
        data (np.ndarray): Input data as a numpy array.

        Returns:
        scores (np.ndarray): Anomaly scores for each data point as a numpy array.
        """
        scores = self.clf.decision_function(data)
        return scores

    def detect(self, data, threshold=0):
        """
        Detect anomalies in the given data based on a threshold.

        Parameters:
        data (np.ndarray): Input data as a numpy array.
        threshold (float): Anomaly score threshold for determining outliers. Defaults to 0.

        Returns:
        mask (np.ndarray): Boolean mask indicating the presence of anomalies in the data.
        """
        scores = self.predict(data)
        mask = scores < threshold
        return mask

In this more structured version, we have defined an AnomalyDetector class with four methods:

__init__: This is the constructor method that initializes the class with a given random state for the Isolation Forest algorithm.

fit: This method fits the Isolation Forest model to the input data.

predict: This method predicts the anomaly score for each point in the input data.

detect: This method detects anomalies in the input data based on a given threshold.

The detect method returns a boolean mask that is True for points whose anomaly score falls below the threshold, so data[mask] selects the detected anomalies. The threshold controls the sensitivity of the detector: raising it above 0 flags more points, and lowering it flags fewer.
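
One way to pick the threshold is from the score distribution itself rather than by hand. The sketch below mirrors the `scores < threshold` comparison of `detect` using `IsolationForest` directly; the 5% target and the names `target_fraction` and `data_demo` are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
data_demo = rng.normal(size=(200, 2))  # assumed example data

forest = IsolationForest(random_state=0).fit(data_demo)
scores = forest.decision_function(data_demo)

# Choose the threshold so that roughly the lowest 5% of scores are flagged
target_fraction = 0.05
threshold = np.percentile(scores, 100 * target_fraction)

mask = scores < threshold   # same comparison as AnomalyDetector.detect
default_mask = scores < 0   # the default threshold of 0

print("Flagged at 5% threshold:", int(mask.sum()))
print("Flagged at default threshold:", int(default_mask.sum()))
```

Passing a threshold computed this way to `detect` should flag about the same share of points.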

In [4]:
from sklearn.ensemble import IsolationForest
import numpy as np

class AntiAnomalyDetector:
    def __init__(self, random_state=0):
        """
        Initialize the AntiAnomalyDetector class with a given random state.

        Parameters:
        random_state (int): Random seed for the Isolation Forest algorithm.
        """
        self.clf = IsolationForest(random_state=random_state)

    def fit(self, data):
        """
        Fit the Isolation Forest model to the given data.

        Parameters:
        data (np.ndarray): Input data as a numpy array.
        """
        self.clf.fit(data)

    def predict(self, data):
        """
        Predict the normality score for each point in the given data.

        Parameters:
        data (np.ndarray): Input data as a numpy array.

        Returns:
        scores (np.ndarray): Normality scores for each data point as a numpy array.
        """
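        # decision_function is positive for inliers and negative for outliers,
        # so the default threshold of 0 is meaningful. score_samples is never
        # positive, which would make `scores > 0` always False.
        scores = self.clf.decision_function(data)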
        return scores

    def detect(self, data, threshold=0):
        """
        Detect normal data points in the given data based on a threshold.

        Parameters:
        data (np.ndarray): Input data as a numpy array.
        threshold (float): Normality score threshold for determining normal data points. Defaults to 0.

        Returns:
        mask (np.ndarray): Boolean mask indicating the presence of normal data points in the data.
        """
        scores = self.predict(data)
        mask = scores > threshold
        return mask

In this code, we have defined an AntiAnomalyDetector class with four methods:

__init__: This is the constructor method that initializes the class with a given random state for the Isolation Forest algorithm.

fit: This method fits the Isolation Forest model to the input data.

predict: This method predicts the normality score for each point in the input data.

detect: This method detects normal data points in the input data based on a given threshold.

The detect method returns a boolean mask that is True for points whose normality score is above the threshold, so data[mask] selects the points treated as normal. The threshold can be adjusted to make the detector stricter or more lenient.

# AML

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from datetime import datetime

# Read in transaction data
df = pd.read_csv('transaction_data.csv')

# Convert transaction date to datetime format
df['transaction_date'] = pd.to_datetime(df['transaction_date'])

# Sort by transaction date
df = df.sort_values('transaction_date')

# Calculate time since last transaction for each account
df['time_since_last_transaction'] = df.groupby('account')['transaction_date'].diff().dt.days

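# Calculate rolling 7-day transaction volume for each account.
# A time-based window needs a datetime column (passed via `on=`); dropping the
# group level from the result index realigns the values with the rows of df.
df['7_day_transaction_volume'] = (
    df.groupby('account')
      .rolling('7D', on='transaction_date')['amount']
      .sum()
      .reset_index(level=0, drop=True)
)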

# Calculate z-score for 7-day transaction volume
# (StandardScaler standardizes across all rows, not per account)
scaler = StandardScaler()
df['7_day_transaction_volume_zscore'] = scaler.fit_transform(df[['7_day_transaction_volume']]).ravel()

# Use KMeans clustering to identify transaction clusters
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
df['transaction_cluster'] = kmeans.fit_predict(df[['amount', '7_day_transaction_volume_zscore']])

# Use Isolation Forest to detect outliers across all transactions
# (the model is fit on the full data set, not separately per cluster)
outliers_fraction = 0.01
model = IsolationForest(contamination=outliers_fraction, random_state=0)
df['is_outlier'] = model.fit_predict(df[['amount', '7_day_transaction_volume_zscore']])

# Flag suspicious transactions
df['is_suspicious'] = np.where((df['time_since_last_transaction'] > 7) &
                               (df['7_day_transaction_volume_zscore'] > 3) &
                               (df['is_outlier'] == -1), 1, 0)

# Export suspicious transactions to a CSV file
df[df['is_suspicious'] == 1].to_csv('suspicious_transactions.csv', index=False)

Anti-Money Laundering (AML) is a complex topic and requires a variety of techniques and technologies to detect and prevent financial crimes. The code above is a basic Python sketch of a few AML checks. It reads in transaction data from a CSV file and performs the following steps:

- Converts the transaction date to a datetime format and sorts the data by date.
- Calculates the time since the last transaction for each account.
- Calculates the rolling 7-day transaction volume for each account.
- Calculates a z-score for the 7-day transaction volume using standard scaling. The scaling is applied across all transactions, not separately per account.
- Uses KMeans clustering to identify transaction clusters based on amount and the 7-day transaction volume z-score.
- Uses Isolation Forest to detect outliers across all transactions. The cluster labels are stored in the data but are not used by the outlier model.
- Flags suspicious transactions that meet all of the following criteria:
  - Time since last transaction is greater than 7 days.
  - 7-day transaction volume z-score is greater than 3.
  - Transaction is identified as an outlier by the Isolation Forest model.
- Exports suspicious transactions to a CSV file.

Note that this code is just a basic implementation and may need to be customized and adjusted for specific AML requirements and regulations.

# AML

A complete anti-money laundering system requires a thorough understanding of the relevant laws and regulations and access to relevant data, so the code in this section is only a starting point. What follows is an overview of the steps involved in developing an anti-money laundering system, with some basic Python code for each step.

Data Collection and Processing:
The first step is to collect relevant data from various sources such as financial transactions, customer information, and public records. The data collected needs to be processed to identify suspicious patterns that may indicate money laundering activities. Here is some sample Python code for data processing:

In [6]:
import pandas as pd

# Load data from a CSV file
data = pd.read_csv('transactions.csv')

# Keep transactions above a fixed amount threshold as a first, crude filter
suspicious = data[data['amount'] > 10000]

# Save suspicious transactions to a new CSV file
suspicious.to_csv('suspicious_transactions.csv', index=False)


Risk Assessment:
Once the suspicious transactions have been identified, the next step is to assess the risk associated with each transaction. This involves analyzing various factors such as the type of transaction, the parties involved, and the source of funds. The sample code for this step uses a much simpler proxy, the transaction amount as a percentage of the account balance:

In [8]:
# Load suspicious transactions from CSV file
suspicious = pd.read_csv('suspicious_transactions.csv')

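# Calculate risk score for each transaction as a percentage of the balance.
# A zero balance is treated as missing so the division does not produce inf.
safe_balance = suspicious['balance'].replace(0, float('nan'))
suspicious['risk_score'] = (suspicious['amount'] / safe_balance) * 100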

# Save transactions and risk scores to a new CSV file
suspicious.to_csv('suspicious_transactions_with_risk_scores.csv', index=False)


Reporting:
The final step is to report suspicious transactions to the relevant authorities as per regulatory requirements. Here is some sample Python code for generating a report:

In [None]:
# Load suspicious transactions with risk scores from CSV file
suspicious = pd.read_csv('suspicious_transactions_with_risk_scores.csv')

# Generate report for transactions with high risk scores
high_risk = suspicious[suspicious['risk_score'] > 10]
report = high_risk[['transaction_id', 'customer_id', 'amount', 'risk_score']]
report.to_csv('high_risk_transactions_report.csv', index=False)
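
To see the three steps together without any files on disk, the sketch below runs them on a small in-memory table. The rows are made up, the thresholds (10000 and 10) are the ones used in the cells above, and treating a zero balance as an undefined risk score is an assumption of this example rather than a rule from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical transactions for illustration
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "customer_id": ["C1", "C2", "C3", "C4"],
    "amount": [5000.0, 15000.0, 20000.0, 12000.0],
    "balance": [50000.0, 100000.0, 0.0, 400000.0],
})

# Step 1: keep only large transactions
suspicious = transactions[transactions["amount"] > 10000].copy()

# Step 2: risk score = amount as a percentage of balance;
# a zero balance becomes NaN instead of producing a division by zero
safe_balance = suspicious["balance"].replace(0, np.nan)
suspicious["risk_score"] = 100 * suspicious["amount"] / safe_balance

# Step 3: report transactions whose risk score exceeds 10
high_risk = suspicious[suspicious["risk_score"] > 10]
report = high_risk[["transaction_id", "customer_id", "amount", "risk_score"]]
print(report)
```

Rows with an undefined risk score are silently left out of the report here; a real system would likely route them to manual review instead.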


Please note that the above code is just a basic example and needs to be adapted to the specific needs and requirements of your organization and regulatory environment. It is also important to note that anti-money laundering systems require continuous monitoring and improvement to be effective.