# Transactions Fraud Detection

**Authors:** [Peter Macinec](https://github.com/pmacinec), [Timotej Zatko](https://github.com/timzatko)

## Nature Inspired Algorithms - Firefly Algorithm, Cuckoo Search Algorithm, Bat Algorithm, Flower Pollination Algorithm for Feature Selection

We select features with each of the following nature-inspired algorithms from Xin-She Yang, then train the model with the selected features on the whole training set.

- Firefly Algorithm
- Cuckoo Search Algorithm
- Bat Algorithm
- Flower Pollination Algorithm

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys
sys.path.append('..')

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

from NiaPy.task import StoppingTask, OptimizationType
from NiaPy.algorithms.basic import FireflyAlgorithm, CuckooSearch, \
    BatAlgorithm, FlowerPollinationAlgorithm

from src.dataset import load_processed_data
from src.data_balancing import random_undersample
from src.evaluation import custom_classification_report

from src.classification_benchmark import ClassificationBenchmark

### Load the data

In [None]:
x_train, y_train, x_test, y_test = load_processed_data()

In [None]:
x_train, y_train = random_undersample(x_train, y_train)
len(x_train), len(y_train)
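
The `random_undersample` helper comes from the project's `src` package and is not shown in this notebook. As a rough illustration of the idea, the sketch below assumes it balances the classes by randomly dropping samples from the larger classes until each class matches the smallest one. The function name `undersample_sketch` and its signature are hypothetical and work on plain lists rather than DataFrames.

```python
import random
from typing import Dict, List, Sequence, Tuple


def undersample_sketch(
    rows: Sequence, labels: Sequence, seed: int = 42
) -> Tuple[List, List]:
    """Randomly drop samples so every class has as many rows as the rarest class.

    Hypothetical stand-in for src.data_balancing.random_undersample.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class: Dict[object, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Size of the smallest class decides how many rows each class keeps.
    n_min = min(len(indices) for indices in by_class.values())

    kept: List[int] = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    kept.sort()  # preserve the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1).
toy_rows = list(range(10))
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = undersample_sketch(toy_rows, toy_labels)
print(len(balanced_rows), balanced_labels.count(0), balanced_labels.count(1))
```

Only the training data is undersampled in this notebook; the test set is left as loaded.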

### Setup classification benchmark

In [None]:
def model_fn():
    return DecisionTreeClassifier(random_state=42)

In [None]:
# hold out 20% of the undersampled training data for validation during feature selection
_x_train, _x_val, _y_train, _y_val = train_test_split(
    x_train, y_train, test_size=0.2, random_state=42
)

benchmark = ClassificationBenchmark(
    model_fn,
    roc_auc_score,
    _x_train,
    _y_train,
    _x_val,
    _y_val
)
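
`ClassificationBenchmark` is defined in the project's `src` package and is not shown here. The sketch below illustrates one common way such a benchmark can turn a continuous solution vector into a feature subset and a fitness value. The 0.5 threshold, the function names, and the toy scoring function are all assumptions made for illustration, not the project's actual implementation.

```python
from typing import Callable, List, Sequence


def select_columns_sketch(
    solution: Sequence[float], columns: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Keep the columns whose entry in the solution vector exceeds the threshold."""
    return [col for col, value in zip(columns, solution) if value > threshold]


def fitness_sketch(
    solution: Sequence[float],
    columns: Sequence[str],
    score_fn: Callable[[List[str]], float],
) -> float:
    """Return 1 - score so that a minimizer effectively maximizes the score."""
    selected = select_columns_sketch(solution, columns)
    if not selected:
        return 1.0  # no features selected: treat as the worst possible fitness
    return 1.0 - score_fn(selected)


# Toy usage: pretend the validation score is 0.8 for any non-empty subset.
toy_columns = ["amount", "hour", "merchant_id"]
toy_solution = [0.9, 0.1, 0.6]
print(select_columns_sketch(toy_solution, toy_columns))
print(fitness_sketch(toy_solution, toy_columns, lambda cols: 0.8))
```

In the notebook, the role of `score_fn` is played by training the model from `model_fn` on the selected columns and evaluating it with `roc_auc_score` on the validation split.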

In [None]:
def optimize(benchmark, algorithm, nGEN, num_runs = 5):
    """
    Optimize task with provided algorithm.
    
    :param benchmark: NiaPy.Benchmark to optimize.
    :param algorithm: algorithm object to use for optimization task.
    :param nGEN: number of generations.
    :param num_runs: number of algorithm runs (defaults to 5).
    """
    best_columns = None
    # start below any reachable score so the first run always sets best_columns
    best_score = float('-inf')
    
    for i in range(num_runs):
        # StoppingTask fails with OptimizationType.MAXIMIZATION, so we
        # minimize instead and invert the score after the run
        task = StoppingTask(
            D=benchmark.get_length(),
            nGEN=nGEN,
            optType=OptimizationType.MINIMIZATION,
            benchmark=benchmark
        )
        
        solution_vec, score = algorithm.run(task=task)

        # the task minimizes 1 - score, so invert it to recover the metric
        score = 1 - score
        columns = benchmark.select_columns(solution_vec)
        
        print('--------------')
        print(f'Run {i + 1}')
        print('--------------')
        print(f'Score: {score}')
        print(f'Number of features selected: {len(columns)}\n')
        print('\n')
        
        if score > best_score:
            best_score = score
            best_columns = columns

    print(f'\nBest score of {num_runs} runs: {best_score}')
    print(f'Number of features selected: {len(best_columns)}')
            
    return best_columns

### Firefly Algorithm (FA)
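
For intuition, here is a minimal pure-Python sketch of the firefly update rule as it is commonly stated: each firefly moves toward every brighter (lower-cost) firefly, with an attraction that decays with squared distance, plus a small random step. This is not NiaPy's implementation. The function name `firefly_minimize` and the defaults for `alpha`, `beta0`, and `gamma` are assumptions chosen for illustration, and the random walk of the brightest firefly is omitted for brevity.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    lower: float = 0.0,
    upper: float = 1.0,
    population: int = 15,
    generations: int = 50,
    alpha: float = 0.2,   # scale of the random step
    beta0: float = 1.0,   # attractiveness at distance zero
    gamma: float = 1.0,   # light absorption coefficient
    seed: int = 42,
) -> Tuple[List[float], float]:
    """Minimize `objective` over the box [lower, upper]^dim (illustrative sketch)."""
    rng = random.Random(seed)
    swarm = [
        [rng.uniform(lower, upper) for _ in range(dim)] for _ in range(population)
    ]
    cost = [objective(x) for x in swarm]

    # Track the best position seen so far.
    best_idx = min(range(population), key=cost.__getitem__)
    best_x, best_f = list(swarm[best_idx]), cost[best_idx]

    for _ in range(generations):
        for i in range(population):
            for j in range(population):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move i toward j, add noise, and clip to the bounds.
                    swarm[i] = [
                        min(upper, max(lower,
                            a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_f:
                        best_x, best_f = list(swarm[i]), cost[i]

    return best_x, best_f


# Toy objective: squared distance from the point (0.3, 0.3, 0.3).
def shifted_sphere(x: Sequence[float]) -> float:
    return sum((v - 0.3) ** 2 for v in x)


position, value = firefly_minimize(shifted_sphere, dim=3)
print(position, value)
```

For feature selection, the objective would be the benchmark's fitness and `dim` the number of candidate columns.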

In [None]:
%%time

columns = optimize(benchmark, FireflyAlgorithm(), 100)

In [None]:
clf = model_fn()
clf = clf.fit(x_train[columns], y_train)
custom_classification_report(clf, x_test[columns], y_test)
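
`custom_classification_report` is a project helper whose output is not shown in this notebook. As a hedged illustration of the kind of metrics such a report typically includes for a binary problem, the sketch below computes the confusion counts, precision, recall, and F1 score by hand. The function name `binary_report_sketch` and the returned dictionary layout are hypothetical.

```python
from typing import Dict, Sequence


def binary_report_sketch(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute confusion counts and precision/recall/F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall, "f1": f1,
    }


# Toy labels: 1 marks a fraudulent transaction.
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
print(binary_report_sketch(toy_true, toy_pred))
```

Precision and recall on the fraud class are usually more informative than plain accuracy when fraudulent transactions are rare in the evaluation data.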

### Cuckoo Search Algorithm (CS)

In [None]:
%%time

columns = optimize(benchmark, CuckooSearch(), 100)

In [None]:
clf = model_fn()
clf = clf.fit(x_train[columns], y_train)
custom_classification_report(clf, x_test[columns], y_test)

### Flower Pollination Algorithm (FPA)

In [None]:
%%time

columns = optimize(benchmark, FlowerPollinationAlgorithm(), 100)

In [None]:
clf = model_fn()
clf = clf.fit(x_train[columns], y_train)
custom_classification_report(clf, x_test[columns], y_test)

### Bat Algorithm (BA)

In [None]:
%%time

columns = optimize(benchmark, BatAlgorithm(), 100)

In [None]:
clf = model_fn()
clf = clf.fit(x_train[columns], y_train)
custom_classification_report(clf, x_test[columns], y_test)

### Conclusion

In [None]:
TODO