# Tutorial: Data Analysis in Graphistry

1. Load data
2. Plot: 
  - Simple: input is a list of edges
  - Arbitrary: input is a table (_hypergraph_ transform)
3. Advanced bindings
4. Further docs
  - [UI Guide](https://labs.graphistry.com/graphistry/ui.html)
  - [More demos: database connectors, ...](https://github.com/graphistry/pygraphistry/tree/master/demos)
  - [CSV upload notebook app](upload_csv_miniapp.ipynb)

In [3]:
import graphistry
#graphistry.register(key='MY_API_KEY', server='labs.graphistry.com')

## 1. Load CSV
Graphistry works seamlessly with Pandas dataframes

In [14]:
import pandas as pd

df = pd.read_csv('./data/honeypot.csv')
df.sample(3)

Unnamed: 0,attackerIP,victimIP,victimPort,vulnName,count,time(max),time(min)
23,119.157.215.18,172.31.14.66,445.0,MS08067 (NetAPI),4,1419022000.0,1419021000.0
102,191.116.125.233,172.31.14.66,445.0,MS08067 (NetAPI),9,1420004000.0,1420003000.0
178,77.90.250.248,172.31.14.66,445.0,MS08067 (NetAPI),1,1416523000.0,1416523000.0


## 2. Plot

### A. Simple graphs
* Build up a set of bindings. Simple graphs are for edge lists, or an edge list + node list.
* See [UI Guide](https://labs.graphistry.com/graphistry/ui.html) for in-tool activity


In [8]:
g = graphistry.edges(df).bind(source='attackerIP', destination='victimIP')

In [7]:
g.plot()

## B. Hypergraphs -- Plot arbitrary tables

### Approach 1: Each row is a node, and link to each value in it

In [40]:
hg1 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    opts={
        'CATAGORIES': {
            'ip': ['attackerIP', 'victimIP'] #merge nodes across these columns
        }
    })

hg1_g = hg1['graph']
hg1_g.plot()

('# links', 880)
('# events', 220)
('# attrib entities', 221)


### Approach 2: Link values from entries

In [41]:
hg2 = graphistry.hypergraph(
    df,
    entity_types=['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
    direct=True,
    opts={
        'EDGES': { ### OPTIONAL, DEFAULTS TO CREATING ALL-TO-ALL
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimPort': ['victimIP'],
            'vulnName': ['victimIP']         
        },
        'CATAGORIES': {
            'ip': ['attackerIP', 'victimIP'] #merge nodes across these columns
        }
    })

hg2_g = hg2['graph']
hg2_g.plot()

('# links', 1100)
('# events', 220)
('# attrib entities', 221)


## 3. Advanced bindings
* Point size based on number of attacks
* Point color based on attacker vs victim
  * Color palette values: https://labs.graphistry.com/graphistry/docs/palette.html 
* Save dynamic workbook settings across sessions

In [42]:
# Create nodes

targets_df = df[['victimIP']].drop_duplicates().rename(columns={'victimIP': 'node_id'})\
    .assign(type='victim')

attackers_df = df.groupby(['attackerIP']).agg({'count': {'attacks': 'sum'}}).reset_index()
attackers_df.columns = attackers_df.columns.get_level_values(0)
attackers_df = attackers_df.rename(columns={'attackerIP': 'node_id'}).assign(type='attacker')
attackers_df

nodes_df = pd.concat([targets_df, attackers_df], ignore_index=True)
nodes_df.sample(3)

Unnamed: 0,count,node_id,type
124,7.0,31.207.231.61,attacker
170,11.0,81.198.39.193,attacker
179,23.0,85.25.226.156,attacker


In [64]:
g2 = g.nodes(nodes_df).bind(node='node_id')

#optional
nodes_df['my_color'] = nodes_df['type'].apply(lambda v: 0 if v == 'attacker' else 2)
nodes_df = nodes_df.fillna(value={'count': (nodes_df['count'].max() + nodes_df['count'].min()) / 2.0 })
g2 = g2.bind(point_size = 'count', point_color='my_color')
g2 = g2.settings(url_params={'workbook': 'my_analysis_wb_1'})

g2.plot()

### Advanced bindings work with hypergraphs too

In [62]:
nodes = hg2_g._nodes

types = list(nodes['type'].unique())
nodes_with_colors = nodes.assign(color=nodes.type.apply(lambda t: types.index(t)))
nodes_with_colors.sample(3)

Unnamed: 0,attackerIP,nodeID,nodeTitle,type,victimIP,victimPort,vulnName,category,color
58,180.178.153.209,attackerIP::180.178.153.209,180.178.153.209,attackerIP,,,,attackerIP,0
23,119.157.215.18,attackerIP::119.157.215.18,119.157.215.18,attackerIP,,,,attackerIP,0
39,173.215.217.47,attackerIP::173.215.217.47,173.215.217.47,attackerIP,,,,attackerIP,0


In [65]:
hg2_g\
  .nodes(nodes_with_colors).bind(point_color='color')\
  .settings(url_params={'workbook': 'my_analysis_wb_2'})\
  .plot()

## Further docs:
  - [UI Guide](https://labs.graphistry.com/graphistry/ui.html)
  - [More demos: database connectors, ...](https://github.com/graphistry/pygraphistry/tree/master/demos)
  - [CSV upload notebook app](upload_csv_miniapp.ipynb)