# Splunk/Notebook/Graphistry Mashup

This notebook demonstrates a different way to explore alerts:
* **Exploratory notebook rather than an interactive dashboard.** This simplifies running and sharing more complicated analyses, and in upcoming versions a notebook can be quickly converted into a reusable dashboard.
* **Node-link diagrams rather than bar charts.** This is a more natural way to understand behavior across a distributed system and to spot patterns within it.

## Setup
*Install*:  

1. `pip install jupyter splunk-sdk graphistry`

2. Plug in your Splunk config info below. (Warning: in practice we prefer the SIEM -> HDFS -> notebook route, because direct SIEM queries are slow.)

*Run*: 

1. `jupyter notebook`

2. Navigate to this file

3. "Cell" -> "Run all"

In [39]:
import pandas
import graphistry
graphistry.register('contact info@graphistry.com for a GPU server key')

In [30]:
#Connect to Splunk. Replace settings with your own setup.
import splunklib.client as client
import splunklib.results as results
cargs = {
    'host': 'localhost',
    'scheme': 'https',
    'port': 8089,
    'username': 'apiuser',
    'password': 'grapher'   
}
service = client.connect(**cargs)
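
Hard-coding credentials is fine for a demo, but a shared notebook usually should not carry a password. The following is a minimal sketch of building the same connection arguments from environment variables. The `SPLUNK_*` variable names and the `splunk_connect_args` helper are hypothetical, not part of the Splunk SDK.

```python
import os

def splunk_connect_args(env=os.environ):
    """Build keyword arguments for client.connect from environment variables.

    The SPLUNK_* names are hypothetical; use whatever your deployment defines.
    """
    return {
        'host': env.get('SPLUNK_HOST', 'localhost'),
        'scheme': env.get('SPLUNK_SCHEME', 'https'),
        'port': int(env.get('SPLUNK_PORT', '8089')),
        'username': env['SPLUNK_USERNAME'],  # required: KeyError if unset
        'password': env['SPLUNK_PASSWORD'],  # required: KeyError if unset
    }

# Demonstrate with an explicit mapping instead of the real environment
example = splunk_connect_args({'SPLUNK_USERNAME': 'apiuser',
                               'SPLUNK_PASSWORD': 'grapher'})
print(example['host'], example['port'])
```

With the variables exported in the shell that launches the notebook, the connection cell would become `service = client.connect(**splunk_connect_args())`.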

In [31]:
# Data adapter to Splunk (in its entirety!)
def splunkToPandas(qry):
    fields = ["attackerAddress", "destinationAddress",
              "priority", "severity", "name", "message", "_time",
              "categoryDeviceType", "finalDeviceVendor"]
    kwargs_blockingsearch = {"count": 0, "f": fields}
    out = service.jobs.oneshot(qry, **kwargs_blockingsearch)
    reader = results.ResultsReader(out)
    lst = [x for x in reader]
    print('# alerts', len(lst))
    return pandas.DataFrame(lst)
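
The adapter can stay this short because `pandas.DataFrame` accepts a list of dicts directly and fills in missing keys. The sketch below uses made-up rows (not real Splunk output) to show that behavior. It assumes field values arrive as strings, so numeric columns such as `priority` need an explicit conversion before numeric comparisons in pandas.

```python
import pandas

# Toy rows shaped like one dict per alert; the values are invented.
# A field that an alert lacks is simply absent from its dict.
rows = [
    {'attackerAddress': '10.0.0.1', 'destinationAddress': '10.0.0.2', 'priority': '7'},
    {'destinationAddress': '10.0.0.3', 'priority': '9'},  # no attackerAddress
]
toy = pandas.DataFrame(rows)

# Absent fields become missing values in the frame
missing_attacker = toy['attackerAddress'].isna().tolist()

# Convert string priorities to numbers (assumption: they arrive as strings)
toy['priority'] = pandas.to_numeric(toy['priority'])
high_priority = toy[toy['priority'] > 8]
print(missing_attacker, len(high_priority))
```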

## Sample Splunk Query
From the ArcSight2 logs, get 1000 alerts at priority 7+

In [32]:
%time df = splunkToPandas('search index="arcsight2" priority > 6 | head 1000')
df[:3]

('# alerts', 1000)
CPU times: user 4.94 s, sys: 43 ms, total: 4.98 s
Wall time: 5.41 s


| | _time | attackerAddress | categoryDeviceType | destinationAddress | finalDeviceVendor | message | name | priority | severity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-09-01T15:47:28.000-07:00 | | Operating System | 245.76.133.247 | Unix | IpmiIfcSelReadEntry:error 203. | IpmiIfcSelReadEntry:error 203. | 7 | 0 |
| 1 | 2015-09-01T15:47:28.000-07:00 | 245.197.146.45 | Network-based IDS/IPS | 30.16.8.2 | Symantec | Passed traffic per rule 'Allow IGMP traffic' | Passed traffic | 7 | 0 |
| 2 | 2015-09-01T15:47:28.000-07:00 | | | | ArcSight | | Update Connector Caching Status | 7 | 0 |


## Visualize 1000 Alerts at Priority 7+

In [33]:
#Define visual schema
g = graphistry.bind(source='attackerAddress', destination='destinationAddress')
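
The binding treats each alert row as one edge from `attackerAddress` to `destinationAddress`. Some alerts in the sample output have no attacker or destination, and how such rows are rendered is not shown in this notebook. One option, sketched here on invented data, is to drop them before binding.

```python
import pandas

# Invented alerts: only the first row has both endpoints
alerts = pandas.DataFrame({
    'attackerAddress': ['10.0.0.1', None, '10.0.0.1'],
    'destinationAddress': ['10.0.0.2', '10.0.0.3', None],
    'name': ['a', 'b', 'c'],
})

# Keep only rows that can form a complete source -> destination edge
edges = alerts.dropna(subset=['attackerAddress', 'destinationAddress'])
print('{} alerts -> {} edges'.format(len(alerts), len(edges)))
```

Whether to drop these rows is a judgment call: an alert without a destination may still be worth inspecting in tabular form.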

In [40]:
#Get/bind data & plot
%time g.edges(splunkToPandas('search index="arcsight2" priority > 6 | head 1000')).plot()

('# alerts', 1000)
CPU times: user 4.94 s, sys: 8.65 ms, total: 4.95 s
Wall time: 8.49 s


## Visualize the Top 5 Attackers + 1 Step of Context

In [41]:
top5Attackers = df['attackerAddress'].value_counts().index.tolist()[:5]
top5Attackers

['245.19.188.141',
 '245.125.122.30',
 '189.162.130.53',
 '245.147.0.237',
 '245.28.141.251']
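
The ranking relies on `value_counts`, which counts occurrences, skips missing values by default, and sorts from most to least frequent. A small sketch on toy data, wrapped in a hypothetical `top_values` helper:

```python
import pandas

def top_values(frame, column, n=5):
    """Return the n most frequent non-null values of a column, most frequent first."""
    return frame[column].value_counts().head(n).index.tolist()

# Toy data: 'a' appears 3 times, 'b' twice, 'c' once, plus one missing value
toy = pandas.DataFrame({'attackerAddress': ['a', 'b', 'a', 'c', 'a', 'b', None]})
top2 = top_values(toy, 'attackerAddress', n=2)
print(top2)  # ['a', 'b']
```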

### First level of attacks

In [42]:
ips = ' OR '.join(map(lambda x: 'attackerAddress=' + x, top5Attackers))
%time top5AttackersAttacks = splunkToPandas('search index="arcsight2" (' + ips + ')')
top5AttackersAttacks[:3]

('# alerts', 9230)
CPU times: user 46.5 s, sys: 79.5 ms, total: 46.6 s
Wall time: 47.5 s


| | _time | attackerAddress | categoryDeviceType | destinationAddress | finalDeviceVendor | message | name | priority | severity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-04-01T16:48:42.000-07:00 | 245.19.188.141 | Firewall | 117.231.243.130 | NetScreen | "IP spoofing! From 245.19.188.141:3961 to 117.... | IP spoofing! | 9 | 5 |
| 1 | 2014-04-01T16:48:42.000-07:00 | 245.19.188.141 | Firewall | 117.231.243.131 | NetScreen | "IP spoofing! From 245.19.188.141:3960 to 117.... | IP spoofing! | 9 | 5 |
| 2 | 2014-04-01T16:48:42.000-07:00 | 245.19.188.141 | Firewall | 117.231.243.131 | NetScreen | "IP spoofing! From 245.19.188.141:3962 to 117.... | IP spoofing! | 9 | 5 |


In [43]:
g.edges(top5AttackersAttacks).plot()
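
The search string for this step is built by joining one `attackerAddress=...` term per address with `OR`. A small helper makes that reusable and quotes each value. This is a sketch: `or_clause` is a hypothetical name, and the quoting style is an assumption about what your searches need.

```python
def or_clause(field, values):
    """Join field="value" terms with OR, wrapped in parentheses."""
    values = list(values)
    if not values:
        # An empty "()" would not be a meaningful search clause
        raise ValueError('need at least one value')
    terms = ['{}="{}"'.format(field, str(v).replace('"', '\\"')) for v in values]
    return '(' + ' OR '.join(terms) + ')'

clause = or_clause('attackerAddress', ['10.0.0.1', '10.0.0.2'])
print('search index="arcsight2" ' + clause)
```

The same helper would cover the second-level query by passing the list of target addresses instead.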

### Second level of attacks

In [44]:
targets = top5AttackersAttacks['destinationAddress'].dropna().unique().tolist()
print('# roots', len(targets + top5Attackers))

('# roots', 146)


In [45]:
ips2 = ' OR '.join(map(lambda x: 'attackerAddress=' + x, targets))
%time moreAttacks = splunkToPandas('search index="arcsight2" (' + ips2 + ')')
moreAttacks[:3]

('# alerts', 8271)
CPU times: user 41.5 s, sys: 55.5 ms, total: 41.6 s
Wall time: 44 s


| | _time | attackerAddress | categoryDeviceType | destinationAddress | finalDeviceVendor | message | name | priority | severity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-04-01T16:48:42.000-07:00 | 245.19.188.141 | Firewall | 117.231.243.130 | NetScreen | "IP spoofing! From 245.19.188.141:3961 to 117.... | IP spoofing! | 9 | 5 |
| 1 | 2014-04-01T16:48:42.000-07:00 | 245.19.188.141 | Firewall | 117.231.243.131 | NetScreen | "IP spoofing! From 245.19.188.141:3960 to 117.... | IP spoofing! | 9 | 5 |
| 2 | 2014-04-01T16:48:42.000-07:00 | 245.19.188.141 | Firewall | 117.231.243.131 | NetScreen | "IP spoofing! From 245.19.188.141:3962 to 117.... | IP spoofing! | 9 | 5 |


In [46]:
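# Stack both result sets; `+` would add the frames element-wise instead of appending rows
g.edges(pandas.concat([top5AttackersAttacks, moreAttacks], ignore_index=True)).plot()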
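
Plotting both levels together means stacking the rows of the two result sets. In pandas that is `pandas.concat`; the `+` operator between DataFrames is element-wise arithmetic aligned on index and column labels, not row concatenation. A minimal sketch with toy frames:

```python
import pandas

first = pandas.DataFrame({'attackerAddress': ['a'],
                          'destinationAddress': ['b']})
second = pandas.DataFrame({'attackerAddress': ['a', 'b'],
                           'destinationAddress': ['b', 'c']})

# Stack rows and renumber the index
stacked = pandas.concat([first, second], ignore_index=True)

# Optional: collapse exact duplicate rows that appear in both result sets
unique_rows = stacked.drop_duplicates()
print(len(stacked), len(unique_rows))
```

Dropping duplicates is optional. If the number of repeated alerts between two hosts matters for the visualization, keep the stacked frame as is.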

## As A Pure Splunk Query

In [47]:
query = """
search index="arcsight2" priority > 8 
    | eval S = "A" | eval X = destinationAddress    
| append [ search index="arcsight2" 
    | eval S = "B" 
    | eval X = attackerAddress | eval Y = destinationAddress]
| fields attackerAddress destinationAddress name priority S X Y

| eval mangled = toString(Y) + ";" + toString(priority) + ";" + toString(name)
| stats values(S) as IDX, values(mangled) as mangled by X
| stats dc(IDX) as matches, values(mangled) as mangled by X | where matches=2
| mvexpand mangled
| rex field=mangled "(?<Y>.*);(?<priority>.*);(?<name>.*)"
| fields - matches mangled
| rename X as attackerAddress | rename Y as destinationAddress
| appendpipe [ 
    | search index="arcsight2" priority > 8
    | fields attackerAddress destinationAddress name priority
]
""".replace("\n"," ")
query

' search index="arcsight2" priority > 8      | eval S = "A" | eval X = destinationAddress     | append [ search index="arcsight2"      | eval S = "B"      | eval X = attackerAddress | eval Y = destinationAddress] | fields attackerAddress destinationAddress name priority S X Y  | eval mangled = toString(Y) + ";" + toString(priority) + ";" + toString(name) | stats values(S) as IDX, values(mangled) as mangled by X | stats dc(IDX) as matches, values(mangled) as mangled by X | where matches=2 | mvexpand mangled | rex field=mangled "(?<Y>.*);(?<priority>.*);(?<name>.*)" | fields - matches mangled | rename X as attackerAddress | rename Y as destinationAddress | appendpipe [      | search index="arcsight2" priority > 8     | fields attackerAddress destinationAddress name priority ] '

In [48]:
%time g.edges(splunkToPandas(query)).plot()

('# alerts', 1321)
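
For comparison, here is one reading of what the query above is after, expressed in pandas: take the alerts with priority above 8, then add every alert whose attacker was the destination of one of them. This sketch is an interpretation, not a verified equivalent of the SPL. The `one_hop` helper and the toy data are hypothetical, and it assumes `priority` has already been converted to a number.

```python
import pandas

def one_hop(alerts, min_priority=9):
    """High-priority alerts plus all alerts launched from their targets."""
    cols = ['attackerAddress', 'destinationAddress', 'name', 'priority']
    high = alerts[alerts['priority'] >= min_priority]
    # Hosts hit by a high-priority alert
    targets = high['destinationAddress'].dropna().unique()
    # Alerts in which one of those hosts is itself the attacker
    follow_on = alerts[alerts['attackerAddress'].isin(targets)]
    combined = pandas.concat([high[cols], follow_on[cols]], ignore_index=True)
    return combined.drop_duplicates()

toy = pandas.DataFrame({
    'attackerAddress':    ['a', 'b', 'c', 'd', 'x'],
    'destinationAddress': ['b', 'c', 'd', 'e', 'y'],
    'name':               ['n1', 'n2', 'n3', 'n4', 'n5'],
    'priority':           [9, 3, 9, 2, 1],
})
result = one_hop(toy)
print(result['name'].tolist())
```

Doing the expansion inside Splunk avoids pulling the whole index into the notebook, which is the main reason to prefer the pure-query form on large data.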
