## IAT: detecting micro-clusters of suspicious behavior

Fraudsters acting as a group tend to behave synchronously and in a regular (fixed) pattern, which makes them look suspicious
compared with normally behaving users.
We therefore study the inter-arrival times (IATs), i.e. the time intervals between consecutive actions of each user, and detect the suspicious micro-clusters that stand out from the majority distribution.
IAT can be used together with EagleMine, a vision-guided detection algorithm.
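
To make the notion of an inter-arrival time concrete, here is a minimal pure-Python sketch, independent of spartan. The helper `inter_arrival_times` and the toy events are hypothetical and only illustrate the idea; they are not the library's implementation.

```python
from collections import defaultdict

def inter_arrival_times(events):
    """Hypothetical helper: events is an iterable of (user, timestamp_in_seconds).

    Returns {user: [iat, ...]}, the gaps between consecutive actions of each user.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)
    iats = {}
    for user, stamps in by_user.items():
        stamps.sort()  # IATs are defined on time-ordered actions
        iats[user] = [b - a for a, b in zip(stamps, stamps[1:])]
    return iats

# Toy data: "bot" acts every 60 seconds, "human" acts irregularly.
events = [("bot", 0), ("bot", 60), ("bot", 120), ("bot", 180),
          ("human", 5), ("human", 400), ("human", 460)]
iats = inter_arrival_times(events)
```

A user with a fixed posting rhythm produces many identical IATs, which is the kind of regularity the rest of this notebook looks for.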


In [None]:
import spartan as st

Load the data with the `loadTensor` function.

In [None]:
tensor_data = st.loadTensor(path = "/home/liushenghua/Data/wbcovid19rummor/renameoppuid/*.reid.reid.gz", header=None, sep='\x01')

In [None]:
tensor_data.data

In [None]:
coords, data = tensor_data.do_map(hasvalue=False, mappers={0:st.TimeMapper(timeformat='%Y-%m-%d %H:%M:%S', timebin = 1, mints = 0)})

In [None]:
aggts = tensor_data.to_aggts(coords, time_col=0, group_col=[1])
len(aggts)

## IAT class

`calaggiat` function: compute the dict **aggiat** (key: user, value: list of IATs)

`caliatcount` function: compute the dict **iatcount** (key: IAT, value: frequency)

`caliatpaircount` function: compute the dict **iatpaircount** (key: (iat1, iat2), value: frequency)

`get_user_iatpair_dict` function: compute the dict **user_iatpair** (key: user, value: list of (iat1, iat2) pairs)

`get_iatpair_user_dict` function: compute the dict **iatpair_user** (key: (iat1, iat2), value: list of users)

`find_iatpair_user` function: find the users who have any of the given IAT pairs

`find_iatpair_user_ordered` function: find the top-K users that have pairs in iatpairs, ordered by decreasing frequency

`drawIatPdf` function: plot the IAT PDF line
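
The dictionaries listed above can be pictured with a small pure-Python sketch. It assumes that an IAT pair means two consecutive IATs of the same user, which matches the `IATn` / `IATn+1` axes used later in this notebook. The toy `aggiat` and the way the dicts are built are assumptions for illustration, not the code inside `st.IAT`.

```python
from collections import Counter, defaultdict

# Toy stand-in for aggiat: user -> list of IATs (seconds).
aggiat = {"u1": [60, 60, 60], "u2": [30, 60, 60], "u3": [500]}

# iatcount: IAT -> how often it occurs over all users.
iatcount = Counter(iat for iats in aggiat.values() for iat in iats)

# user_iatpair: user -> list of consecutive (iat1, iat2) pairs.
user_iatpair = {u: list(zip(iats, iats[1:])) for u, iats in aggiat.items()}

# iatpaircount: (iat1, iat2) -> how often the pair occurs over all users.
iatpaircount = Counter(p for pairs in user_iatpair.values() for p in pairs)

# iatpair_user: (iat1, iat2) -> users that have the pair at least once.
iatpair_user = defaultdict(list)
for user, pairs in user_iatpair.items():
    for pair in set(pairs):
        iatpair_user[pair].append(user)
```

In this toy data the pair `(60, 60)` is shared by `u1` and `u2`, so it is the kind of dense point that shows up as a micro-cluster in the 2D histogram.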

In [None]:
instance = st.IAT()

In [None]:
# calculate aggiat dict
instance.calaggiat(aggts)

In [None]:
aggiat=instance.aggiat

In [None]:
instance.save_aggiat('/home/liushenghua/Data/wbcovid19rummor/renameoppuid/aggiat2.dictlist.gz')
#instance.load_aggiat('./output/aggiat.dictlist.gz')

In [None]:
xs, ys = instance.getiatpairs()
len(xs), len(ys)

## class RectHistogram
`draw` function: draw a 2D histogram with rectangular bins

`find_peak_rect` function: find the bin with the largest number of samples within the range
horizontal axis: [x-radius, x+radius]
vertical axis: [y-radius, y+radius]

return: the (x, y) pairs in the bin that has the largest number of samples
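
The peak search described above can be sketched in plain Python. This is a simplified illustration: the helper `peak_bin_points` is hypothetical and uses linear, fixed-width bins (the `binsize` parameter is an assumption), whereas the notebook builds its histogram with log-scaled axes.

```python
from collections import defaultdict

def peak_bin_points(xs, ys, x, y, radius, binsize):
    """Hypothetical helper: return the (x, y) points that fall in the fullest
    square bin inside the window [x-radius, x+radius] by [y-radius, y+radius]."""
    bins = defaultdict(list)
    for px, py in zip(xs, ys):
        # Keep only points inside the search window.
        if abs(px - x) <= radius and abs(py - y) <= radius:
            bins[(px // binsize, py // binsize)].append((px, py))
    if not bins:
        return []
    # The peak is the bin holding the most points.
    return max(bins.values(), key=len)

xs = [60, 61, 62, 150, 900]
ys = [60, 62, 61, 150, 900]
peak = peak_bin_points(xs, ys, x=100, y=100, radius=100, binsize=10)
```

Here three points share one bin near (60, 60), one point sits alone at (150, 150), and the point at (900, 900) lies outside the window, so the three clustered points are returned.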

In [None]:
recthistogram = st.RectHistogram(xscale='log', yscale='log', gridsize=100)

In [None]:
fig, H, xedges, yedges = recthistogram.draw(xs, ys, xlabel='IATn', ylabel='IATn+1')

In [None]:
coordpairs = recthistogram.find_peak_rect(xs, ys, H, xedges, yedges, x=100, y=100, radius=100)
print(coordpairs)

### Find Top-k suspicious users

In [None]:
usrlist = instance.find_iatpair_user_ordered(coordpairs) # default k = -1 returns all users
print(f"All users: \n{usrlist}")
usrlist = instance.find_iatpair_user_ordered(coordpairs, k=5)
print(f"Top-5 users: \n{usrlist}")
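
One plausible way to read "ordered by decreasing frequency" is to rank each user by how many of their IAT pairs fall among the selected pairs. The sketch below follows that reading; the helper `rank_users_by_pairs` and the toy data are hypothetical, and the library's own ordering may differ in detail.

```python
from collections import Counter

def rank_users_by_pairs(user_iatpair, iatpairs, k=-1):
    """Hypothetical helper: rank users by the number of their IAT pairs that
    appear in iatpairs; k = -1 returns every matching user."""
    targets = set(iatpairs)
    hits = Counter()
    for user, pairs in user_iatpair.items():
        n = sum(1 for p in pairs if p in targets)
        if n:  # skip users with no matching pair
            hits[user] = n
    ranked = [user for user, _ in hits.most_common()]
    return ranked if k < 0 else ranked[:k]

user_iatpair = {
    "u1": [(60, 60), (60, 60), (60, 60)],
    "u2": [(30, 60), (60, 60)],
    "u3": [(500, 7)],
}
all_users = rank_users_by_pairs(user_iatpair, [(60, 60)])
top1 = rank_users_by_pairs(user_iatpair, [(60, 60)], k=1)
```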

Plot the IAT PDF line with the `drawIatPdf` function.

In [None]:
fig = instance.drawIatPdf(usrlist, outfig='./images/iatpdf_demo.png')

The resulting plot:
<img src="images/iatpdf_demo.png" width="400"/>