# Simulating Rainbow Trades, Clients & Fraud Classifications

We need to demonstrate the ability to detect normal & abnormal trading behaviour of clients and client groups (client profiling & anomaly detection). 

We also want to be able to flag trade sequences that are similar to known fraudulent activity (classification).

Finally, we want to demonstrate the ability to predict trading behaviour of clients or groups of clients.
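A minimal sketch of one possible prediction baseline, assuming each trade has already been reduced to a colour label. It counts colour-to-colour transitions in a client's history and predicts the most frequent successor of the latest colour. The function names `fit_transitions` and `predict_next` are hypothetical and are not part of this notebook.

```python
from collections import Counter, defaultdict


def fit_transitions(colours):
    """Count how often each colour is immediately followed by each other colour."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(colours, colours[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent successor of `current`, or None if it was never seen."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]


# a client that repeats the 'construction' cycle four times
history = ['RED', 'BLUE', 'YELLOW'] * 4
model = fit_transitions(history)

print(predict_next(model, 'BLUE'))    # YELLOW
print(predict_next(model, 'YELLOW'))  # RED
```

Because each group repeats a fixed cycle, a first-order transition table is enough for this data; richer behaviour may call for a proper sequence model.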

Given the above requirements we will create the following data:

* Referential data that identifies some arbitrary grouping for each client
* Fictitious trades, which we call "Rainbow Trades", each with three attributes: the Red, Green and Blue (RGB) components of a colour. This lets us easily visualise trades in three-dimensional space and derive classification labels from their colour
* Noise will be added to the RGB values so that we can create "similar" but not identical trades
* A different repeating sequence of fictitious trades will be created for each group of clients (e.g. Red, Green, Orange, Red, Green, Orange, ...) to support learning temporal behaviour and prediction
* Changes to these sequences will be introduced to support the detection of abnormalities
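To illustrate why noisy trades remain "similar", here is a sketch that labels a trade with the reference colour nearest to it in RGB space, using Euclidean distance. The palette subset and the `nearest_colour` helper are illustrative assumptions; the colour values are copied from the `rgb` table used in the simulation below.

```python
import math

# subset of the reference palette (values copied from the simulation's rgb table)
REFERENCE_RGB = {'RED': (255, 0, 0),
                 'ORANGE': (255, 129, 0),
                 'BLUE': (22, 93, 239),
                 'CORAL': (240, 128, 128)}


def nearest_colour(r, g, b, palette=REFERENCE_RGB):
    """Return the palette name whose RGB value is closest to (r, g, b)."""
    return min(palette, key=lambda name: math.dist((r, g, b), palette[name]))


print(nearest_colour(250, 12, 7))    # RED
print(nearest_colour(20, 100, 230))  # BLUE
```

This only works while the noise stays small relative to the gaps between reference colours. `math.dist` requires Python 3.8 or later.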

## Simulate client referential data

In [None]:
# six clients in 3 groups - 'sectors'
#
clients = [{'name': 'ABC_Ltd', 'group': 'construction'},
           {'name': 'DEF_Ltd', 'group': 'construction'},
           {'name': 'HIG_Ltd', 'group': 'ecommerce'},
           {'name': 'KLM_Ltd', 'group': 'ecommerce'},
           {'name': 'NOP_Ltd', 'group': 'car_manufacturer'},
           {'name': 'QRS_Ltd', 'group': 'car_manufacturer'},
          ]

# create map of clients in each group
#
groups = {}
for client in clients:
    if client['group'] not in groups:
        groups[client['group']] = [client]
    else:
        groups[client['group']].append(client)
groups
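The same grouping can also be built with `collections.defaultdict`, together with a reverse lookup from client name to group, which is handy when profiling a client against its peers. This is a sketch on a shortened client list; `client_group` is a hypothetical name not used elsewhere in the notebook.

```python
from collections import defaultdict

clients = [{'name': 'ABC_Ltd', 'group': 'construction'},
           {'name': 'DEF_Ltd', 'group': 'construction'},
           {'name': 'HIG_Ltd', 'group': 'ecommerce'}]

# group -> list of client records (equivalent to the loop above)
groups = defaultdict(list)
for client in clients:
    groups[client['group']].append(client)

# client name -> group
client_group = {client['name']: client['group'] for client in clients}

print([c['name'] for c in groups['construction']])  # ['ABC_Ltd', 'DEF_Ltd']
print(client_group['HIG_Ltd'])                      # ecommerce
```

Wrap the result in `dict(groups)` if a plain dictionary is needed for display or serialisation.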

## Simulate the Rainbow Trades

In [None]:
# use standard random library to generate random numbers
#
import random

# so we can always repeat the experiment
#
random.seed(221166)


# standard RGB colours (red, green, blue)

rgb = {'RED': (255, 0, 0),
       'ORANGE': (255, 129, 0),
       'YELLOW': (255, 233, 0),
       'GREEN': (0, 202, 14),
       'BLUE': (22, 93, 239),
       'PURPLE': (166, 1, 214),
       'BROWN': (151, 76, 2),
       'GREY': (128, 128, 128),
       'BLACK': (0, 0, 0),
       'TURQUOISE': (150, 255, 255),
       'CORAL': (240, 128, 128)}

# number of times to repeat sequence
#
sequence_count = 10


# standard behaviour (sequence of rainbow trades) for different groups
#
group_sequences = {'construction': ['RED', 'BLUE', 'YELLOW'],
                   'ecommerce': ['PURPLE', 'GREEN', 'ORANGE', 'GREY', 'BROWN', 'RED', 'BLUE', 'YELLOW'],
                   'car_manufacturer':['TURQUOISE', 'BLACK', 'PURPLE', 'GREEN', 'ORANGE', 'GREY', 'BROWN', 'RED', 'BLUE', 'YELLOW']}

# anomalous sequences for each group
# - first sequence is an innocent change in behaviour
# - second sequence is a fraudulent sequence due to the presence of 'CORAL'
#
anomaly_sequences = {'construction': [['RED', 'YELLOW', 'BLUE'],
                                      ['CORAL', 'BLUE', 'YELLOW']
                                     ],
                     'ecommerce': [['ORANGE', 'GREY', 'PURPLE', 'GREEN', 'BROWN', 'RED', 'BLUE', 'YELLOW'],
                                   ['PURPLE', 'GREEN', 'ORANGE', 'GREY', 'BROWN', 'CORAL', 'BLUE', 'YELLOW']
                                  ],
                     'car_manufacturer':[['TURQUOISE', 'BLACK', 'PURPLE', 'GREY', 'BROWN', 'GREEN', 'ORANGE', 'RED', 'BLUE', 'YELLOW'],
                                         ['TURQUOISE', 'BLACK', 'PURPLE', 'GREEN', 'ORANGE', 'GREY', 'BROWN', 'CORAL', 'BLUE', 'YELLOW']
                                        ]}

# keep track of stats and sequence types for each client
#
client_state = {}

# maximum amount of noise is 10% of the max value of 255
#
noise = 255 * 0.1

# the data for training
#
simulated_data = []

# the unique trade id
#
trade_id = 0
while trade_id < 1000:
    
    # get the group
    #
    group = random.choice(list(groups.keys()))
    
    # get the client - any member of the chosen group
    #
    client = random.choice(groups[group])['name']


    # keep track of the client state
    #
    if client not in client_state:
        client_state[client] = {'nos_sequences': 1,
                                'nos_trades':0,
                                'seq_idx': 0,
                                'fraud': 'no',
                                'anomaly': 'no'}
    
    # keep track of the number of trades for the client
    #
    client_state[client]['nos_trades'] += 1
    
    # choose the sequence to use, depending on whether the client is currently in an anomalous, fraudulent or normal phase
    #
    if client_state[client]['anomaly'] == 'yes':
        sequence = anomaly_sequences[group][0]

        # position of the first trade that differs from the group's normal sequence
        #
        if group == 'construction':
            anomaly_seq_idx = 1
        elif group == 'ecommerce':
            anomaly_seq_idx = 0
        else:
            anomaly_seq_idx = 3

        # remember the trade_id of the abnormal trade
        #
        if client_state[client]['seq_idx'] == anomaly_seq_idx:
            client_state[client]['anomaly_id'] = trade_id

    elif client_state[client]['fraud'] == 'yes':
        sequence = anomaly_sequences[group][1]

        # position of the CORAL trade within the fraudulent sequence
        #
        if group == 'construction':
            fraud_seq_idx = 0
        elif group == 'ecommerce':
            fraud_seq_idx = 5
        else:
            fraud_seq_idx = 7

        # remember the trade_id of the fraudulent trade
        #
        if client_state[client]['seq_idx'] == fraud_seq_idx:
            client_state[client]['fraud_id'] = trade_id

    else:
        # must be normal
        #
        sequence = group_sequences[group]
    
    # get the colour of the rainbow trade
    #
    rainbow_trade_colour = sequence[client_state[client]['seq_idx']]
    
    # move index for client and wrap if required
    #
    client_state[client]['seq_idx'] += 1
    if client_state[client]['seq_idx'] >= len(sequence):
        client_state[client]['seq_idx'] = 0
        client_state[client]['nos_sequences'] += 1
        
        # if either the anomaly or fraud sequence has been completed then flag as done
        #
        if client_state[client]['anomaly'] == 'yes':
            client_state[client]['anomaly'] = 'done'
            
        elif client_state[client]['fraud'] == 'yes':
            client_state[client]['fraud'] = 'done'
        
        # once the client has completed more than sequence_count sequences, each completed sequence has a
        # 50% chance of switching the client to the anomalous sequence and, once that is done, to the fraudulent one
        #
        if (client_state[client]['nos_sequences'] > sequence_count and random.random() > 0.5
                and (client_state[client]['anomaly'] == 'no' or client_state[client]['fraud'] == 'no')):
            if client_state[client]['anomaly'] == 'no':
                client_state[client]['anomaly'] = 'yes'

            elif client_state[client]['fraud'] == 'no':
                client_state[client]['fraud'] = 'yes'
            
    # trade is a simple dictionary of: 
    #     trade_id
    #     client
    #     r
    #     g
    #     b
    #
    trade = {'trade_id': trade_id, 'client': client}

    # add uniform noise in [-noise, +noise] to each RGB component, clamped to 0..255,
    # unless this is CORAL (the fraudulent colour), which keeps its exact value
    # (drawing the sign and the size from one random number would bias positive shifts upwards)
    #
    if rainbow_trade_colour != 'CORAL':
        for channel_idx, channel in enumerate(('r', 'g', 'b')):
            noise_delta = noise * (2 * random.random() - 1)
            value = rgb[rainbow_trade_colour][channel_idx] + noise_delta
            trade[channel] = int(min(max(value, 0), 255))
    else:
        trade['r'], trade['g'], trade['b'] = rgb[rainbow_trade_colour]

    # add the trade
    #
    simulated_data.append(trade)
    
    # inc trade_id
    #
    trade_id += 1

print(len(simulated_data))
client_state
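The cell above hard-codes the position of the abnormal trade (1, 0, 3) and of the fraudulent CORAL trade (0, 5, 7) for each group. As a sketch, those positions can instead be derived by comparing each altered sequence with the group's normal sequence; `first_difference` is a hypothetical helper that the notebook does not define.

```python
from itertools import zip_longest


def first_difference(normal, altered):
    """Index of the first position where two sequences differ, or None if they are identical."""
    for idx, (a, b) in enumerate(zip_longest(normal, altered)):
        if a != b:
            return idx
    return None


normal = ['RED', 'BLUE', 'YELLOW']
print(first_difference(normal, ['RED', 'YELLOW', 'BLUE']))   # 1 (innocent change)
print(first_difference(normal, ['CORAL', 'BLUE', 'YELLOW']))  # 0 (fraud)
```

For example, `first_difference(group_sequences[group], anomaly_sequences[group][0])` reproduces `anomaly_seq_idx` for each group, and using index `[1]` reproduces `fraud_seq_idx`.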

## We will make use of Plotly to visualise what is going on

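Below is a plot of the time series of Rainbow Trades for each client. Each marker is drawn in the trade's RGB colour; the abnormal trade is shown as a larger square and the fraudulent trade as a larger diamond.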

In [None]:
# library to plot data
#
import plotly.graph_objects as go


# lists to hold x y coordinates of each Rainbow Trade
#
x = []
y = []

# list of RGB colours for each Rainbow Trade
#
colours = []

# list of labels to be displayed when mouse hovers over a Rainbow Trade
#
labels = []

# size of plot for each Rainbow Trade
#
size = []

# the type of marker per trade
#
symbol = []

for trade in simulated_data:
    state = client_state[trade['client']]

    # a coloured blob for each trade
    #
    x.append(trade['trade_id'])
    y.append(trade['client'])

    # highlight the abnormal and the fraudulent trade of each client
    # (.get avoids a KeyError for clients that never reached that phase)
    #
    if state.get('anomaly_id') == trade['trade_id']:
        size.append(15)
        symbol.append('square')
        trade_class = 'ABNORMAL'
    elif state.get('fraud_id') == trade['trade_id']:
        size.append(15)
        symbol.append('diamond')
        trade_class = 'FRAUD'
    else:
        size.append(10)
        symbol.append('circle')
        trade_class = ''

    # specify the hover text
    #
    labels.append('client: {} r: {} g: {} b: {} {}'.format(trade['client'], trade['r'], trade['g'], trade['b'], trade_class))

    # specify the colour for this blob
    #
    colours.append('rgb({},{},{})'.format(trade['r'], trade['g'], trade['b']))
    
    
# the coloured points
#
ts_points = go.Scatter(x=x, y=y, hovertext=labels, mode='markers', marker=dict(size=size, color=colours, opacity=1.0, symbol=symbol))

# a plot consists of a figure object
#
fig = go.Figure(data=[ts_points])
fig.update_layout(width=2000, height=900, title=dict(text='Time Series'))
fig.show()
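The plot shows each client following its group's repeating cycle until a change is introduced. A minimal anomaly-detection sketch, assuming each trade has already been mapped to a colour label, flags every position where a client's labels depart from its group's cycle. `flag_cycle_breaks` is hypothetical and relies on two properties of this simulation: each client starts at the beginning of its cycle, and altered sequences keep the cycle length.

```python
def flag_cycle_breaks(labels, cycle):
    """Return the positions where a client's colour labels depart from the group's repeating cycle."""
    return [pos for pos, label in enumerate(labels)
            if label != cycle[pos % len(cycle)]]


cycle = ['RED', 'BLUE', 'YELLOW']
labels = ['RED', 'BLUE', 'YELLOW',    # normal
          'RED', 'YELLOW', 'BLUE',    # innocent change
          'CORAL', 'BLUE', 'YELLOW']  # fraud

print(flag_cycle_breaks(labels, cycle))  # [4, 5, 6]
```

Unlike the notebook, which marks one trade per change, this flags every mismatched position, so the innocent reordering RED, YELLOW, BLUE produces two flags.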

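## Simulate known fraudulent sequences

Each reference sequence below contains a CORAL trade and serves as a labelled example of known fraudulent activity for the classification task.

In [None]: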
fraudulent_sequences = [['CORAL', 'BLUE', 'YELLOW', 'RED', 'BLUE'],
                        ['GREY', 'BROWN', 'CORAL', 'BLUE', 'YELLOW'],
                        ['GREEN', 'ORANGE', 'GREY', 'BROWN', 'CORAL'],
                       ]

fraud_sequences = []

for f_seq_idx, f_colours in enumerate(fraudulent_sequences):

    f_seq = {'fraud_seq_id': f_seq_idx, 'trades': []}

    # trade_id here is the position of the trade within the fraudulent sequence
    #
    for trade_id, rainbow_trade_colour in enumerate(f_colours):
        trade = {'trade_id': trade_id}

        # add uniform noise in [-noise, +noise] to each RGB component, clamped to 0..255,
        # unless this is CORAL (the fraudulent colour), which keeps its exact value
        #
        if rainbow_trade_colour != 'CORAL':
            for channel_idx, channel in enumerate(('r', 'g', 'b')):
                noise_delta = noise * (2 * random.random() - 1)
                value = rgb[rainbow_trade_colour][channel_idx] + noise_delta
                trade[channel] = int(min(max(value, 0), 255))
        else:
            trade['r'], trade['g'], trade['b'] = rgb[rainbow_trade_colour]
        f_seq['trades'].append(trade)

    fraud_sequences.append(f_seq)
fraud_sequences
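To flag trade sequences that resemble known fraud, one simple approach (an assumption, not the method used later in this project) slides a window the length of each reference sequence over a client's trades and reports windows whose mean per-trade RGB distance is below a threshold. The threshold of 30 and the helper names are illustrative.

```python
import math


def sequence_distance(window, reference):
    """Mean Euclidean RGB distance between aligned trades of two equal-length sequences."""
    return sum(math.dist(w, r) for w, r in zip(window, reference)) / len(reference)


def match_known_fraud(trades, fraud_refs, threshold=30.0):
    """Return (start, fraud_idx, distance) for every window close to a known fraud sequence."""
    matches = []
    for fraud_idx, reference in enumerate(fraud_refs):
        n = len(reference)
        for start in range(len(trades) - n + 1):
            distance = sequence_distance(trades[start:start + n], reference)
            if distance <= threshold:
                matches.append((start, fraud_idx, round(distance, 1)))
    return matches


# hypothetical reference: CORAL, BLUE, YELLOW at their exact palette values
fraud_refs = [[(240, 128, 128), (22, 93, 239), (255, 233, 0)]]

# a client's noisy trades as (r, g, b) tuples: RED, CORAL, BLUE, YELLOW, RED
client_trades = [(255, 0, 0), (240, 128, 128), (25, 90, 235), (250, 230, 5), (255, 0, 0)]

print(match_known_fraud(client_trades, fraud_refs))  # [(1, 0, 4.5)]
```

To use the sequences generated above, each one could be converted with `[(t['r'], t['g'], t['b']) for t in f_seq['trades']]`.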

## Save the data to JSON files

In [None]:
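# libraries to read / write json and to create the output folder
#
import json
import os

# make sure the output folder exists before writing the files
#
os.makedirs('../data', exist_ok=True)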

file_name= '../data/example_clients.json'
with open(file_name, 'w') as fp:
    json.dump(clients, fp, indent=4)           

file_name= '../data/rainbow_trades.json'
with open(file_name, 'w') as fp:
    json.dump(simulated_data, fp, indent=4)
        
file_name= '../data/client_stats.json'
with open(file_name, 'w') as fp:
    json.dump(client_state, fp, indent=4)

file_name= '../data/fraud_sequences.json'
with open(file_name, 'w') as fp:
    json.dump(fraud_sequences, fp, indent=4)
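A downstream notebook will need to read these files back. This sketch writes a small stand-in for `simulated_data` to a temporary directory (used only so the example runs anywhere), reloads it with `json.load`, and rebuilds each client's time-ordered RGB sequence.

```python
import json
import os
import tempfile
from collections import defaultdict

# a tiny stand-in for simulated_data
trades = [{'trade_id': 0, 'client': 'ABC_Ltd', 'r': 250, 'g': 3, 'b': 9},
          {'trade_id': 1, 'client': 'HIG_Ltd', 'r': 160, 'g': 10, 'b': 220},
          {'trade_id': 2, 'client': 'ABC_Ltd', 'r': 30, 'g': 88, 'b': 240}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'rainbow_trades.json')
    with open(path, 'w') as fp:
        json.dump(trades, fp, indent=4)
    with open(path) as fp:
        loaded = json.load(fp)

# rebuild each client's time-ordered RGB sequence
per_client = defaultdict(list)
for trade in sorted(loaded, key=lambda t: t['trade_id']):
    per_client[trade['client']].append((trade['r'], trade['g'], trade['b']))

print(per_client['ABC_Ltd'])  # [(250, 3, 9), (30, 88, 240)]
```

JSON has no tuple type, which is why the RGB triples are rebuilt as tuples after loading.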