## IAT: detecting micro-clusters of suspicious behavior

Fraudsters in a group often behave synchronously and follow a regular (fixed) pattern, which makes them look suspicious
compared with normally behaving users.
We therefore study the distribution of users' inter-arrival times (IATs, the time intervals between a user's consecutive actions) and detect the suspicious micro-clusters that stand out from the majority of the distribution.
IAT can be used together with the vision-guided detection algorithm EagleMine.


In [None]:
import spartan as st

Load the data with the ```loadTensor``` function.<br/>

In [None]:
tensor_data = st.loadTensor(path = "./inputData/test.reid.gz", header=None, sep='\x01')
tensor_data.data

In [None]:
coords, data = tensor_data.do_map(hasvalue=False, mappers={0:st.TimeMapper(timeformat='%Y-%m-%d %H:%M:%S', timebin = 1, mints = 0)})

Use the ```to_aggts``` function to extract the timestamps from a log file or edge-list tensor, aggregated by the grouping column.
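
As a rough illustration of what such an aggregation produces, the sketch below groups toy `(timestamp, user)` records into one sorted timestamp list per user. It is plain Python and does not call spartan; `aggregate_timestamps` and the `toy_*` names are hypothetical, and whether ```to_aggts``` stores or sorts its output in exactly this way is an assumption.

```python
from collections import defaultdict


def aggregate_timestamps(records):
    """Group (timestamp, user) records into {user: sorted timestamps}.

    Hypothetical helper for illustration only, not part of spartan.
    """
    grouped = defaultdict(list)
    for ts, user in records:
        grouped[user].append(ts)
    # Sorting makes consecutive differences meaningful later on.
    return {user: sorted(stamps) for user, stamps in grouped.items()}


toy_records = [(10, "u1"), (3, "u2"), (4, "u1"), (9, "u2"), (7, "u1")]
toy_aggts = aggregate_timestamps(toy_records)
print(toy_aggts)
```

The timestamps here are already integers; in the notebook, the `TimeMapper` step above is what turns time strings into numeric bins.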

In [None]:
aggts = tensor_data.to_aggts(coords, time_col=0, group_col=[1])
# print(aggts)

## IAT class

`calaggiat` function: calculate the IAT dict **aggiat** (key: user, value: IAT list)

`caliatcount` function: calculate the IAT count dict **iatcount** (key: IAT, value: frequency) and the IAT probability dict **iatprob** (key: IAT, value: probability)

`caliatpaircount` function: calculate the IAT pair count dict **iatpaircount** (key: (iat1, iat2), value: frequency)

`get_user_iatpair_dict` function: calculate the dict **user_iatpair** (key: user, value: (iat1, iat2) list)

`get_iatpair_user_dict` function: calculate the dict **iatpair_user** (key: (iat1, iat2), value: user list)

`find_iatpair_user` function: find the users who have the given IAT pairs

`get_user_dict` function: get the dict of users that have pairs in iatpairs, ordered by decreasing frequency

`find_topk_user` function: find the top-K users that have pairs in iatpairs, ordered by decreasing frequency

`drawIatPdf` function: plot the IAT PDF line
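
To make the shape of these dictionaries concrete, here is a minimal pure-Python sketch that builds toy versions of **aggiat**, **user_iatpair**, **iatpaircount** and **iatpair_user**. It assumes each user's timestamps are already sorted and that a pair means two consecutive IATs (IATn, IATn+1); the helper names and `toy_*` variables are hypothetical and this is not the `IAT` class's own code.

```python
from collections import Counter, defaultdict


def inter_arrival_times(aggts):
    """{user: sorted timestamps} -> {user: differences of consecutive timestamps}."""
    return {u: [b - a for a, b in zip(ts, ts[1:])] for u, ts in aggts.items()}


def consecutive_pairs(aggiat):
    """{user: IAT list} -> {user: list of (IATn, IATn+1) pairs}."""
    return {u: list(zip(iats, iats[1:])) for u, iats in aggiat.items()}


def count_pairs(user_iatpair):
    """Frequency of every IAT pair over all users."""
    return Counter(p for pairs in user_iatpair.values() for p in pairs)


def invert_pairs(user_iatpair):
    """{user: pairs} -> {pair: users having that pair}, each user listed once."""
    index = defaultdict(list)
    for user, pairs in user_iatpair.items():
        for pair in dict.fromkeys(pairs):  # de-duplicate, keep order
            index[pair].append(user)
    return dict(index)


toy_ts = {"u1": [0, 5, 10, 15], "u2": [0, 5, 10, 40], "u3": [2, 9]}
toy_aggiat = inter_arrival_times(toy_ts)
toy_user_iatpair = consecutive_pairs(toy_aggiat)
toy_iatpaircount = count_pairs(toy_user_iatpair)
toy_iatpair_user = invert_pairs(toy_user_iatpair)
print(toy_aggiat)
print(toy_iatpaircount)
print(toy_iatpair_user)
```

In this toy data, `u1` and `u2` share the pair `(5, 5)`, which is the kind of synchronized, fixed-interval behavior the method looks for.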

In [None]:
instance = st.IAT()

In [None]:
# calculate aggiat dict
instance.calaggiat(aggts)

In [None]:
aggiat=instance.aggiat
# print(aggiat)

In [None]:
instance.save_aggiat('./output/aggiat.dictlist.gz')

In [None]:
instance.load_aggiat('./output/aggiat.dictlist.gz')

In [None]:
xs, ys = instance.getiatpairs()
len(xs), len(ys)

In [None]:
# invoke drawHexbin function
hexfig = st.drawHexbin(xs, ys, gridsize=5, xlabel='IATn', ylabel='IATn+1',outfig='./images/iathexbin_demo.png')

The result:
<img src="images/iathexbin_demo.png" width="400"/> 

In [None]:
# invoke drawRectbin function
fig, hist = st.drawRectbin(xs, ys, gridsize=10, xlabel='IATn', ylabel='IATn+1', outfig='./images/iatrectbin_demo.png')

The result:
<img src="images/iatrectbin_demo.png" width="400"/> 

## class RectHistogram
1. `draw` function: draw a 2D histogram with rectangular bins


2. `find_peak_range` function: find the coordinate range of the bin with the largest number of samples within the search window

    horizontal axis: [x-radius, x+radius]
    
    vertical axis: [y-radius, y+radius]

    return xrange: the range of the max bin along the x axis, and yrange: the range of the max bin along the y axis.
    

3. `find_peak_rect` function: find the coordinate pairs in the max bin

    return: the (x, y) pairs in the bin that has the largest number of samples
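
The peak search described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not `RectHistogram`'s implementation: it uses logarithmic bins with a hypothetical `bins_per_decade` parameter (the real `gridsize` may be defined differently), and `log_bin`, `peak_bin` and `bin_edges` are made-up helper names.

```python
import math
from collections import defaultdict


def log_bin(value, bins_per_decade):
    """Index of the logarithmic bin that contains a positive value."""
    return math.floor(math.log10(value) * bins_per_decade)


def peak_bin(xs, ys, x, y, radius, bins_per_decade=2):
    """Return (bin index, points) for the fullest bin inside the search window."""
    bins = defaultdict(list)
    for px, py in zip(xs, ys):
        if px <= 0 or py <= 0:
            continue  # a log scale is undefined for non-positive values
        # Keep only points inside [x-radius, x+radius] x [y-radius, y+radius].
        if abs(px - x) <= radius and abs(py - y) <= radius:
            key = (log_bin(px, bins_per_decade), log_bin(py, bins_per_decade))
            bins[key].append((px, py))
    if not bins:
        return None, []
    best = max(bins, key=lambda k: len(bins[k]))
    return best, bins[best]


def bin_edges(index, bins_per_decade):
    """Lower and upper edge of a logarithmic bin."""
    return 10 ** (index / bins_per_decade), 10 ** ((index + 1) / bins_per_decade)


toy_xs = [5, 5, 6, 5, 30, 120, 7]
toy_ys = [5, 6, 5, 5, 40, 300, 90]
toy_best, toy_points = peak_bin(toy_xs, toy_ys, x=50, y=50, radius=50)
toy_xrange = bin_edges(toy_best[0], 2)
toy_yrange = bin_edges(toy_best[1], 2)
print(toy_best, toy_points)
print(toy_xrange, toy_yrange)
```

The returned points play the role of the (x, y) pairs in the max bin, and the two edge tuples play the role of the x and y ranges of that bin.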

In [None]:
recthistogram = st.RectHistogram(xscale='log', yscale='log', gridsize=10)

To get iatpairs, you need to call the `draw` function first.

In [None]:
fig = recthistogram.draw(xs, ys, xlabel='IATn', ylabel='IATn+1')

In [None]:
xrange, yrange = recthistogram.find_peak_range(x=100, y=100, radius=100)
print(f"the range of max bin along the x axis:\n {xrange}")
print(f"the range of max bin along the y axis:\n {yrange}")

In [None]:
iatpairs = recthistogram.find_peak_rect(xrange, yrange)
print(iatpairs)

### Find Top-k suspicious users
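
Conceptually, this step ranks users by how many of their IAT pairs fall in the peak bin. The sketch below shows one way to do that in plain Python; `top_k_users` and the `toy_*` data are hypothetical, and the assumption that `k = -1` means "return everyone" is taken from the comment in the cell further down, not from the library source.

```python
from collections import Counter


def top_k_users(user_iatpair, peak_pairs, k=-1):
    """Rank users by the number of their IAT pairs that lie in peak_pairs."""
    peak = set(peak_pairs)
    freq = Counter()
    for user, pairs in user_iatpair.items():
        hits = sum(1 for p in pairs if p in peak)
        if hits:
            freq[user] = hits
    ranked = [user for user, _ in freq.most_common()]  # decreasing frequency
    return ranked if k < 0 else ranked[:k]


toy_user_pairs = {
    "u1": [(5, 5), (5, 5)],
    "u2": [(5, 5), (5, 30)],
    "u3": [(7, 9)],
}
toy_peak_pairs = [(5, 5), (5, 6)]
toy_ranked = top_k_users(toy_user_pairs, toy_peak_pairs)
toy_top1 = top_k_users(toy_user_pairs, toy_peak_pairs, k=1)
print(toy_ranked, toy_top1)
```

Users with no pair in the peak bin (`u3` here) are left out of the ranking entirely.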

In [None]:
instance.get_iatpair_user_dict()

In [None]:
instance.get_user_dict(iatpairs)

In [None]:
usrlist = instance.find_topk_user(k=5)  # k = -1 (the default) returns all users
print(f"Top-5 users: \n{usrlist}")

Plot the IAT PDF line with the `drawIatPdf` function.
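
For reference, an empirical IAT distribution of the kind held in **iatcount** and **iatprob** can be sketched as below. It assumes the counts are pooled over all users and that each probability is the count divided by the total number of IATs; `iat_distribution` and the `toy_*` names are hypothetical, not spartan API.

```python
from collections import Counter


def iat_distribution(aggiat):
    """{user: IAT list} -> (count per IAT, empirical probability per IAT)."""
    counts = Counter(iat for iats in aggiat.values() for iat in iats)
    total = sum(counts.values())
    # With no IATs at all, counts is empty and so is the probability dict.
    probs = {iat: c / total for iat, c in counts.items()}
    return counts, probs


toy_iats = {"u1": [5, 5, 5], "u2": [5, 5, 30], "u3": [7]}
toy_iatcount, toy_iatprob = iat_distribution(toy_iats)
print(toy_iatcount)
print(toy_iatprob)
```

Plotting the probability against the IAT value, typically on log-scaled axes, gives a PDF-style line on which the selected users' intervals can be compared with the overall population.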

In [None]:
instance.caliatcount()

In [None]:
fig = instance.drawIatPdf(usrlist, outfig='./images/iatpdf_demo.png')

The result:
<img src="images/iatpdf_demo.png" width="400"/> 