# Tutorial: Data Analysis in Graphistry

1. Load data
2. Plot: 
  - Simple: input is a list of edges
  - Arbitrary: input is a table (_hypergraph_ transform)
3. Advanced bindings
4. Further docs
  - [UI Guide](https://labs.graphistry.com/graphistry/ui.html)
  - [More demos: database connectors, ...](demos_databases_apis)
  - [CSV upload notebook app](upload_csv_miniapp.ipynb)

In [1]:
import graphistry
graphistry.register(key='3bd0ff5a5304a3ee27de2ca78ac7b67bbc48dc67409c255c0f70250955cf967c')




## 1. Load CSV
Graphistry works seamlessly with Pandas dataframes

In [2]:
import urllib
with urllib.request.urlopen("https://dl.dropboxusercontent.com/s/9zmm0euo1f03s4i/honeypot.csv?dl=1") as f:
    data = f.read().decode('utf-8')
with open('honeypot.csv', 'w') as f:
    f.write(data) 
    
import pandas as pd

df = pd.read_csv('df_2795.csv', sep = '\t')
del df["Unnamed: 0"]
df.sample(3)

FileNotFoundError: [Errno 2] No such file or directory: 'df_2795.csv'

In [None]:
ids = df["ID"].unique()
df_small = df[df["Friends"].isin(ids)].copy()
df_small["count"] = [1]*len(df_small)
df_small.sample(3)

## 2. Plot

### A. Simple graphs
* Build up a set of bindings. Simple graphs are for edge lists, or an edge list + node list.
* See [UI Guide](https://labs.graphistry.com/graphistry/ui.html) for in-tool activity

Demo graph schema:
* Edges: Alerts linking `ID -> Friends`
* Nodes: Synthesized from `ID -> Friends` edges
* Default colors: Automatic based on inferred commmunity
* Default node size: Number of edges

In [5]:
g = graphistry.edges(df_small).bind(source='ID', destination='Friends')
g.plot()



## B. Hypergraphs -- Plot arbitrary tables

To quickly understand correlations across all your table's values, hypergraph is a convenient transformation.

A hypergraph will link values occurring in the sample table row to one another. By default, the hypergraph plot does not link values directly to one another, but indirects through a node representing the row.

### Approach 1: Each row is a node, and links to each value in it

Demo graph schema:
* Edges: row -> attckerIP, row -> victimIP, row -> victimPort, row ->  volnName
* Nodes: row, attackerIP, victimIP, victimPort, vulnName
* Default colors: Automatic based on inferred commmunity
* Default node size: Number of edges

To allow nodes from the `attackerIP` and `victimIP` columns to merge together when they have the same value, instead of generating distinct nodes such as `attackerIP::127.0.0.1` and `victimIP::127.0.0.1`, we combine them into one category, `ip`. The result is one node `ip::127.0.0.1`.


In [26]:
hg1 = graphistry.hypergraph(
    df_small,
    entity_types=['ID', 'Friends'],
    )

hg1_g = hg1['graph']
hg1_g.plot()

# links 108568
# events 54284
# attrib entities 4399


### Approach 2: Link values from entries

For more advanced hypergraph control, we can skip the row node, and control which edges are generated, by enabling `direct`.

Demo graph schema:
* Edges: 
  * attackerIP -> victimIP, attackerIP -> victimPort, attackerIP -> vulnName
  * victimPort -> victimIP
  * vulnName -> victimIP
* Nodes: attackerIP, victimIP, victimPort, vulnName
* Default colors: Automatic based on inferred commmunity
* Default node size: Number of edges


In [59]:
hg2 = graphistry.hypergraph(
    df_small,
    entity_types= ['ID', 'Friends',"Name"], 
    direct=True,
    opts={
        'EDGES': { ### OPTIONAL, DEFAULTS TO CREATING ALL-TO-ALL
            'ID': ['Name', 'Friends'],
#             'Name': ['ID'],     
        },
        'CATAGORIES': {
            'friend': ['ID', 'Friends'] #merge nodes across these columns
        }
    })

hg2_g = hg2['graph']
hg2_g.plot()

# links 108568
# events 54284
# attrib entities 6570


## 3. Advanced bindings

By default, you do not need to explictly create a table of nodes. However, if you do provide one, you can then drive visual styles based on node attributes.

Demo schema:

* Point size based on number of friends_count
* Point color based on user vs friend
  * Color palette values: https://labs.graphistry.com/graphistry/docs/palette.html 
* Save dynamic workbook settings across sessions

In [53]:
# 1. Create nodes, tag type as `attacker`

targets_df = df_small[['Friends']].drop_duplicates().rename(columns={'Friends': 'node_id'})\
    .assign(type='friend')

attackers_df = df_small.groupby(['ID']).agg({'count': {'Friends': 'sum'}}).reset_index()
attackers_df.columns = attackers_df.columns.get_level_values(0)
attackers_df = attackers_df.rename(columns={'ID': 'node_id'}).assign(type='user')
attackers_df

nodes_df = pd.concat([targets_df, attackers_df], ignore_index=True)
nodes_df.sample(3)

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  # This is added back by InteractiveShellApp.init_path()


Unnamed: 0,count,node_id,type
494,,VK_176778970,friend
327,,VK_154911213,friend
4316,27.0,VK_93085778,user


In [56]:
# 2. Plot nodes, and color based on type `user`

g2 = g.nodes(nodes_df).bind(node='node_id')

#optional
nodes_df['my_color'] = nodes_df['type'].apply(lambda v: 0 if v == 'user' else 2)
nodes_df = nodes_df.fillna(value={'count': (nodes_df['count'].max() + nodes_df['count'].min()) / 2.0 })
g2 = g2.bind(point_size = 'count', point_color='my_color')
g2 = g2.settings(url_params={'workbook': 'my_analysis_wb_1'})

g2.plot()

### Advanced bindings work with hypergraphs too

In [60]:
nodes = hg2_g._nodes

types = list(nodes['type'].unique())
nodes_with_colors = nodes.assign(color=nodes.type.apply(lambda t: types.index(t)))
nodes_with_colors.sample(3)

Unnamed: 0,Friends,ID,Name,nodeID,nodeTitle,type,category,color
3063,VK_83310864,,,Friends::VK_83310864,VK_83310864,Friends,Friends,1
1606,,VK_46321791,,ID::VK_46321791,VK_46321791,ID,ID,0
1079,,VK_6380879,,ID::VK_6380879,VK_6380879,ID,ID,0


In [61]:
hg2_g\
  .nodes(nodes_with_colors).bind(point_color='color')\
  .settings(url_params={'workbook': 'my_analysis_wb_2'})\
  .plot()

## Further docs:
  - [UI Guide](https://labs.graphistry.com/graphistry/ui.html)
  - [More demos: database connectors, ...](demos_databases_apis)
  - [CSV upload notebook app](upload_csv_miniapp.ipynb)