# Fidelis Baarncat  + Graphistry Mashup

This notebook explores how to easily view the result of Barncat RAT IOCs as graphs. By looking at multiple incidents, and how the metadata involved links across them, we can see the characteristics of attacks. 

The notebook generates two views:

1. A bipartite graph entity/event graph: events link to referenced entities, and thereby form chains with the events who also reference those entities.

2. An entity graph: nodes are entities, and they're linked by events in common.


## Setup
*Install*:  

1. `pip install juypter graphistry`

2. Optionally, put in your own exported CSV for the `read_csv` call below.

*Run*: 

1. `juypter`

2. Navigate to this file

3. Put your API key into the first cell

4. "Cell" -> "Run all"

In [12]:
import pandas as pd
import graphistry
#graphistry.register(key='...')

In [14]:
df = pd.read_csv('barncat.1k.csv', encoding = "utf8")
eval(df[:10]['value'].tolist()[0])

{'Campaign': 'TRANSFORMICE',
 'Date': '2015-11-19 14:04:23',
 'Domain': 'spynet1.ddns.net',
 'InstallDir': 'TEMP',
 'InstallFlag': 'True',
 'InstallName': 'svchost.exe',
 'NetworkSeparator': "|'|'|",
 'Origin': 'vt',
 'Port': '1177',
 'RegistryValue': 'ba4c12bee3027d94da5c81db2d196bfd',
 'Version': '0.6.4',
 'compile_date': '2015-11-18 21:25:59',
 'imphash': 'f34d5f2d4577ed6d9ceec516c1f5a744',
 'magic': 'PE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono/.Net assembly',
 'md5': '007a8403b3281fd4d48c69f4c96da0b8',
 'rat_name': 'njRat',
 'section_.RELOC': '7905c1aa858eb5484ad08a2e10b7e50e',
 'section_.RSRC': '5b346ed223699f15252c1fdad182859f',
 'section_.TEXT': 'f414cace41511d02fb8e278cf36fd2a3',
 'sha1': 'd215edec90c5487800d961cc1ac2808e221818fa',
 'sha256': '2beb53ca652d9d4f73516ce45365ae824370d2408d6b0d5a809cf3cd177ba694'}

In [15]:
#avoid double counting
df3 = df[df['value'].str.contains("{")]
df3[5:10]

Unnamed: 0,uuid,event_id,category,type,value,to_ids,date
56,56e1ae5e-73a4-4d7b-9110-0da4ac1f3af3,221,External analysis,comment,"{""InstallFlag"": ""True"", ""RegistryValue"": ""d5a3...",0,20160310
65,56e1ae5f-7178-43f3-9467-4f6fac1f3af3,222,External analysis,comment,"{""OfflineKeylogger"": ""1"", ""Version"": ""#KCMDDC5...",0,20160310
77,56e1ae60-e0e0-4b6d-8e0d-5056ac1f3af3,223,External analysis,comment,"{""OfflineKeylogger"": ""1"", ""Version"": ""#KCMDDC5...",0,20160310
81,56e1ae62-2d98-45f7-8840-501fac1f3af3,224,External analysis,comment,"{""MeltFile"": ""False"", ""InstallFlag"": ""True"", ""...",0,20160310
90,56e1ae63-1b60-4942-9a16-5058ac1f3af3,225,External analysis,comment,"{""Group"": ""Facebook"", ""Version"": ""\u00071.2.2....",0,20160310


## Flatten JSON

In [16]:
def copy(expanded, r): 
    v = r['value']
    try:
        v = eval(v)
        if type(v) != dict:
            v = {'val': v}
    except:
        v = {'hash': v}
    out = {k: r[k] for k in df.columns if k != 'value' and k != 'val'}
    for k in v:
        out[k] = v[k]
    expanded.append(out)
    return 1
    
def barncatToGraph(df):
    expanded = []
    df.apply(lambda r: copy(expanded, r), axis=1)
    return pd.DataFrame(expanded)

barncatToGraph(df3[5:7].copy())

Unnamed: 0,Campaign,Date,Domain,FireWallBypass,Gencode,InstallDir,InstallFlag,InstallName,Mutex,NetworkSeparator,...,section_.RDATA,section_.RELOC,section_.RSRC,section_.TEXT,section_.TLS,sha1,sha256,to_ids,type,uuid
0,KICASS_2015_TALGHIMM_BY_PAYWAND,2015-09-02 16:58:14,paywandbacktrack12.no-ip.biz,,,TEMP,True,chrome.exe,,|'|'|,...,,12efddae46005085884d52cb3a83a3a7,5b346ed223699f15252c1fdad182859f,97079ba2e13a5e968d7b4e177b43fc73,,420d4afd349ca6cfb144f079987654faf0fc1104,2a89d83d82d9b2f22af992806a4d92ba3265893e93a1e3...,0,comment,56e1ae5e-73a4-4d7b-9110-0da4ac1f3af3
1,RL Hacker,2015-11-10 23:04:15,nve.no-ip.org,0.0,lYuWr45j19q9,,,,DC_MUTEX-1NP4XTP,,...,c1788dfeb92bbf0cff5aeaeaf1270ff8,e55564594dad16a2ca19fb85903b9300,2af4f890e9c69b99c04af0f2ec8852fe,8067456c5dc713997e61924c501c8cb2,d41d8cd98f00b204e9800998ecf8427e,0f3fb80eea1087ecfecc02b54d90236ee81366c6,17845a58e5909ccaaae78facedf6a63defe657beafa5c3...,0,comment,56e1ae5f-7178-43f3-9467-4f6fac1f3af3


## Graph Conversions
**Simple graph**: Show malware metadata as nodes and connect when they have a malware sample in common. This view helps for seeing malware families, but gets crowded within a family.

**Hyper graph**: Bipartite graph between malware samples and the metadata they have in common. This helps clarify what metadata a family of malware samples have in common vs. how they vary. It has linearly more nodes than the simple graph, but quadratically fewer edges, and is thus generally preferred.

In [17]:
### COMMON TO HYPERGRAPH AND SIMPLE GRAPH
def makeDefs(DEFS, opts={}):
    defs = {key: opts[key] if key in opts else DEFS[key] for key in DEFS}    
    base_skip = opts['SKIP'] if 'SKIP' in opts else defs['SKIP']
    skip = [x for x in base_skip] #copy
    defs['SKIP'] = skip
    for key in DEFS:
        if not defs[key] in skip:
            skip.append(defs[key])
    return defs

def screen_entities(events, entity_types, defs):
    base = entity_types if not entity_types == None else events.columns
    return [x for x in base if not x in defs['SKIP']]

#ex output: pd.DataFrame([{'val::state': 'CA', 'nodeType': 'state', 'nodeID': 'state::CA'}])
def format_entities(events, entity_types, defs, drop_na):
    lst = sum([[{
                    col: v,
                    defs['TITLE']: v,
                    defs['NODETYPE']: col, 
                    defs['NODEID']: col + defs['DELIM'] + str(v)
                } 
                for v in events[col].unique() if v != 'nan'] for col in entity_types], [])
    return pd.DataFrame(lst)


In [18]:
DEFS_HYPER = {
    'TITLE': 'nodeTitle',
    'DELIM': '::',
    'NODEID': 'nodeID',
    'ATTRIBID': 'attribID',
    'EVENTID': 'eventID',
    'NODETYPE': 'nodeType',
    'EDGETYPE': 'edgeType',
    'SKIP': ['_time', 'priority', 'severity', 'finalDeviceVendor', 'categoryDeviceType']
}


#ex output: pd.DataFrame([{'edgeType': 'state', 'attribID': 'state::CA', 'eventID': 'eventID::0'}])
def format_hyperedges(events, entity_types, defs, drop_na, drop_edge_attrs):
    subframes = []
    for col in entity_types:
        raw = events[[col, defs['EVENTID']]].copy()
        if drop_na:
            raw = raw.dropna()[[col, defs['EVENTID']]].copy()
        if len(raw):
            raw[defs['EDGETYPE']] = raw.apply(lambda r: col, axis=1)
            raw[defs['ATTRIBID']] = raw.apply(lambda r: col + defs['DELIM'] + str(r[col]), axis=1)
            subframes.append(raw)
    print('printing')
    if len(subframes):
        return pd.concat(subframes)[[defs['EDGETYPE'], defs['ATTRIBID'], defs['EVENTID']]]
    return pd.DataFrame([])

def format_hypernodes(events, defs, drop_na):
    event_nodes = events.copy()
    event_nodes[defs['NODETYPE']] = defs['EVENTID']
    event_nodes[defs['NODEID']] = event_nodes[defs['EVENTID']]    
    event_nodes[defs['TITLE']] = event_nodes[defs['EVENTID']]    
    return event_nodes

def hyperbinding(defs, entities, event_entities, edges):
    return graphistry\
        .bind(source=defs['ATTRIBID'], destination=defs['EVENTID']).edges(edges)\
        .bind(node=defs['NODEID'], point_title=defs['TITLE']).nodes(pd.concat([entities, event_entities]))

def hypergraph(raw_events, entity_types=None, opts={}, drop_na=True, drop_edge_attrs=True):
    defs = makeDefs(DEFS_HYPER, opts)
    entity_types = screen_entities(raw_events, entity_types, defs)
    events = raw_events.copy()
    if defs['EVENTID'] in events.columns:
        events[defs['EVENTID']] = events.apply(
            lambda r: defs['EVENTID'] + defs['DELIM'] + str(r[defs['EVENTID']]), 
            axis=1)
    else:
        events[defs['EVENTID']] = events.reset_index().apply(
            lambda r: defs['EVENTID'] + defs['DELIM'] + str(r['index']), 
            axis=1)
    events[defs['NODETYPE']] = 'event'
    entities = format_entities(events, entity_types, defs, drop_na)
    event_entities = format_hypernodes(events, defs, drop_na)
    edges = format_hyperedges(events, entity_types, defs, drop_na, drop_edge_attrs)
    print('# links', len(edges))
    print('# event entities', len(events))
    print('# attrib entities', len(entities))
    return hyperbinding(defs, entities, event_entities, edges)


In [19]:
DEFS_SIMPLE = {
    'SRC': 'from',
    'DST': 'to',
    'TITLE': 'nodeTitle',    
    'DELIM': '::',
    'NODETYPE': 'nodeType',
    'NODEID': 'nodeID',
    'SKIP': ['_time', 'priority', 'severity', 'finalDeviceVendor', 'categoryDeviceType']
}

def simpleGraph (events, entity_types=None, opts={}, drop_na=True):
    defs = makeDefs(DEFS_SIMPLE, opts)
    entity_types = screen_entities(events, entity_types, defs)
    edges2 = []
    ATTRIB_PAIRS = [(a,b) for a in entity_types for b in entity_types if a > b]
    for (colA, colB) in ATTRIB_PAIRS:
        edges3 = events.copy()
        if drop_na:
            edges3 = edges3.dropna(subset=[colA, colB])
        if len(edges3):
            edges3[defs['SRC']] = edges3.apply(lambda r: colA + defs['DELIM'] + str(r[colA]), axis=1)
            edges3[defs['DST']] = edges3.apply(lambda r: colB + defs['DELIM'] + str(r[colB]), axis=1)
            edges2.append(edges3)
    edges = pd.concat(edges2)
    nodes = format_entities(events, entity_types, defs, drop_na)
    return graphistry.bind(source=defs['SRC'], destination=defs['DST']).edges(edges)\
        .bind(node=defs['NODEID'], point_title=defs['TITLE']).nodes(nodes)

## Sample Hypergraph vs. Simple Graph views

In [None]:
hypergraph(barncatToGraph(df3[:100].copy()), opts={'SKIP': ['uuid', 'event_id', 'InstallFlag', 'type', 'val', 'Date', 'date', 'Port', 'FTPPort', 'Origin', 'category', 'comment', 'to_ids']}).plot()

printing
('# links', 2163)
('# event entities', 100)
('# attrib entities', 1106)


In [23]:
simpleGraph(barncatToGraph(df3[:100].copy()), opts={'SKIP': ['uuid', 'event_id', 'InstallFlag', 'type', 'val', 'Date', 'date', 'Port', 'FTPPort', 'Origin', 'category', 'comment', 'to_ids']}).plot()

Uploading 5931 kB. This may take a while...


In [24]:
hypergraph(barncatToGraph(df3[:1000].copy()), opts={'SKIP': ['uuid', 'event_id', 'InstallFlag', 'type', 'val', 'Date', 'date', 'Port', 'FTPPort', 'Origin', 'category', 'comment', 'to_ids']}).plot()

printing
('# links', 2230)
('# event entities', 102)
('# attrib entities', 1129)
