# Modern graph visualization

```python
# ! pip install --user graphistry

import json

import graphistry
import pandas as pd

# Load Graphistry Hub credentials (expects "username" and "password" keys)
with open('./graphistry_creds.json') as f:
    graphistry_creds = json.load(f)
graphistry.register(api=3, **graphistry_creds)

graphistry_creds.keys()
# Output: dict_keys(['username', 'password'])
```

# 1. Task<>Persona matchup

### Task: Exploration, data app / dashboard, report

### Persona: Data scientist, data engineer, data analyst, operator, business user, manager, developer

Ex: Data scientists & domain experts may want big graphs with flexible low-code tooling

Ex: Data engineers may care only about data quality: query -> viz/table

Ex: Business users might want only a few big controls, like "search"



### Not always a fit

Ex: Business users might want a *list* of a few top entities, and not care that it's a graph

```
YEAR: [ 2020 | 2021 (Projected) ]

Top Accounts
---
1. Frank - $100M
2. Donna - $90M
3. Jeff - $50M
```
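A list like this is usually just a filter, a group-by, and a sort, with no graph rendering at all. The sketch below is illustrative only: the `transfers` table, its column names, and its toy amounts are hypothetical, chosen so the output matches the mock-up above.

```python
import pandas as pd

# Hypothetical toy transactions; column names are illustrative only
transfers = pd.DataFrame({
    "account": ["Frank", "Donna", "Frank", "Jeff", "Donna"],
    "year": [2020, 2020, 2020, 2020, 2021],
    "amount_musd": [60, 90, 40, 50, 30],
})


def top_accounts(df: pd.DataFrame, year: int, n: int = 3) -> pd.Series:
    # Keep the selected year, total the amounts per account, keep the n largest
    return (df[df["year"] == year]
            .groupby("account")["amount_musd"]
            .sum()
            .nlargest(n))


# Render the same numbered list as the mock-up
for rank, (account, total) in enumerate(top_accounts(transfers, 2020).items(), start=1):
    print(f"{rank}. {account} - ${total}M")
```

The `year` argument plays the role of the `YEAR` toggle in the mock-up.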

### Advice: Think business impact

* Get a valuable result: first for you, then for others
* Get uptake: success is defined by others
* Should you be spending time on wrangling & models, or on html-in-python?
  * What is the minimal amount to build before someone else can be assigned?


# 2. Multi-dimensional data

### Graph viz degrees of freedom: Encodings!

* Position: X/Y
  * Not Z: 3D causes data to overlap and hide!
  * Ex: Kineviz started in 3D and now looks like Graphistry
  * ... But if you're doing VR or art, have fun!
* Color
* Size
* Icon
* Curvature

Start simple:

* Encode node type with color and icon

* Pick 1-2 more things to highlight (alert flag, ...)
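Most of "start simple" is data preparation: a node table with a `type` column plus one extra flag to highlight. This is a minimal pandas sketch; the edge list, the rule that sources are `ip` and destinations are `host`, and the palette are all hypothetical. The resulting columns are the kind of input pygraphistry's `encode_point_color` / `encode_point_icon` bindings take, shown only as a comment.

```python
import pandas as pd

# Hypothetical edge list; "alert" marks edges worth calling out
edges = pd.DataFrame({
    "src": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "dst": ["host-a", "host-a", "host-b"],
    "alert": [True, False, False],
})

# One row per node, typed by which side of an edge it appears on (hypothetical rule)
nodes = pd.concat([
    pd.DataFrame({"id": edges["src"], "type": "ip"}),
    pd.DataFrame({"id": edges["dst"], "type": "host"}),
]).drop_duplicates("id").reset_index(drop=True)

# One extra highlight: flag every node touching an alerted edge
alert_rows = edges[edges["alert"]]
alerted = set(alert_rows["src"]) | set(alert_rows["dst"])
nodes["alerted"] = nodes["id"].isin(alerted)

# Small categorical palette: one color per node type (hypothetical choice)
type_colors = {"ip": "blue", "host": "orange"}

# e.g. graphistry.edges(edges, "src", "dst").nodes(nodes, "id") \
#          .encode_point_color("type", categorical_mapping=type_colors, default_mapping="gray")
```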



### Cross-linking

Ex: Time

* Show as edge colors
* And as timebar
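Both encodings can be derived from the same time column, which is what makes cross-linking possible: a time range picked on the timebar maps directly onto the rows whose edge colors you see. A minimal pandas sketch with hypothetical epoch-second timestamps and hourly buckets:

```python
import pandas as pd

# Hypothetical events with epoch-second timestamps
events = pd.DataFrame({"t": [0, 1800, 3600, 5400, 7300]})
events["time"] = pd.to_datetime(events["t"], unit="s")

# Encoding 1: edge color, as a 0..1 position along a continuous palette
start = events["time"].min()
span = events["time"].max() - start
events["color_pos"] = (events["time"] - start) / span

# Encoding 2: timebar, i.e. event counts per hourly bucket
timebar = events.groupby(events["time"].dt.floor("60min")).size()
print(timebar)
```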


```python
events_df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/honeypot.csv')
# Convert epoch seconds to datetimes (truncated to whole seconds by the int64 cast)
events_df['time_min'] = pd.to_datetime(events_df['time(min)'].astype('int64'), unit='s')
events_df['time_max'] = pd.to_datetime(events_df['time(max)'].astype('int64'), unit='s')
```

```python
# Color each edge along a blue -> red -> yellow ramp by its time_min
(graphistry
 .edges(events_df, 'attackerIP', 'victimIP')
 .encode_edge_color('time_min', palette=['blue', 'red', 'yellow'], as_continuous=True)
 .plot(memoize=False))
```

### Cross-linking solves information overload: The "Information seeking mantra"

Analyst flow:
1. Overview: Zoomed out, legend, ...
2. Zoom & filter
3. Details on demand

Hard to do if all you have is the graph!

You also need summary components (histograms, ...), a selection/filter system, table/item inspectors, ...
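The three steps of the mantra map onto three ordinary table operations, which is roughly what summary, filter, and inspector components compute behind the scenes. This pandas sketch is illustrative only: the event table, its `type` and `hour` columns, and the chosen filter values are hypothetical.

```python
import pandas as pd

# Hypothetical security events
events = pd.DataFrame({
    "src": ["a", "a", "b", "c", "c", "c"],
    "dst": ["x", "y", "x", "x", "y", "z"],
    "type": ["scan", "scan", "login", "login", "login", "exploit"],
    "hour": [1, 2, 2, 3, 5, 4],
})

# 1. Overview: summary counts, as a histogram or legend would show them
overview = events["type"].value_counts()

# 2. Zoom & filter: narrow to one category and a time window (inclusive)
selection = events[(events["type"] == "login") & events["hour"].between(2, 3)]

# 3. Details on demand: full rows for one node picked from the selection
details = selection[selection["src"] == "c"]
print(details)
```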


# 3. Big graphs

## Seeing more data helps when the answer is not known: Information seeking mantra!

## Common: even a few physical entities come with a lot of metadata, events, ...

## Ex: 4K friends => 88K relationships

```
%%html
<iframe src="https://hub.graphistry.com/graph/graph.html?dataset=Facebook" width="100%" height="600"></iframe>
```
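A quick back-of-the-envelope check shows why edges outrun entities. Treating friendships as undirected and using the rounded figures from the heading (about 4K people, 88K relationships), each person averages dozens of connections even though the graph is very sparse:

```python
# Rounded figures from the heading: ~4K people, ~88K undirected friendship edges
num_nodes = 4_000
num_edges = 88_000

# Each undirected edge touches two nodes, so average degree = 2E / N
avg_degree = 2 * num_edges / num_nodes

# Density: fraction of all possible undirected pairs that are connected
possible_pairs = num_nodes * (num_nodes - 1) / 2
density = num_edges / possible_pairs

print(f"average degree ~ {avg_degree:.0f}, density ~ {density:.3f}")
```

So there are 22 edges per node even though only about 1% of possible pairs are connected.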

### Top 3 big graph tricks

* GPU frontend + backend
* Design: ForceAtlas2 (FA2) layout, curved edges, ...
* Cross-linking with filtering!

# 4. Dashboarding as data scientists

* Goal: Empower users with an easy button 
* Goal: UI -> DB -> data science -> visual insights
* Goal: Not specialize in frontend engineering (click handlers, div tags, css ...)

### Streamlit (graph-app-kit)

1. Simple UI input pipeline
2. Turn into a DB query
3. Run your normal pydata ML
4. Drop in no-code plots
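The four steps above can be sketched as plain functions, with the UI step stubbed out so the sketch runs without Streamlit. Everything here is hypothetical: the function names, the `min_amount` parameter, and the `transfers` table are illustrative and are not graph-app-kit's actual code. An in-memory SQLite database stands in for the real DB.

```python
import sqlite3

import pandas as pd


def sidebar_inputs() -> dict:
    # Step 1: stand-in for UI widgets (in Streamlit, e.g. a sidebar slider);
    # hypothetical parameter name and default
    return {"min_amount": 50}


def run_query(conn: sqlite3.Connection, params: dict) -> pd.DataFrame:
    # Step 2: turn UI input into a parameterized DB query (no string formatting of user input)
    return pd.read_sql_query(
        "SELECT src, dst, amount FROM transfers WHERE amount >= ?",
        conn,
        params=(params["min_amount"],),
    )


def enrich(edges: pd.DataFrame) -> pd.DataFrame:
    # Step 3: ordinary pydata work; here, just each source's out-degree
    out_degree = edges.groupby("src")["dst"].transform("count")
    return edges.assign(src_out_degree=out_degree)


def pipeline(conn: sqlite3.Connection) -> pd.DataFrame:
    edges = enrich(run_query(conn, sidebar_inputs()))
    # Step 4: hand the frame to a no-code plot, e.g. graphistry.edges(edges, "src", "dst").plot()
    return edges


if __name__ == "__main__":
    # Hypothetical toy table so the sketch runs end to end
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfers (src TEXT, dst TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO transfers VALUES (?, ?, ?)",
        [("a", "b", 100.0), ("a", "c", 60.0), ("b", "c", 10.0)],
    )
    print(pipeline(conn))
```

Keeping each stage a separate function means the UI layer only has to pass a small dict of parameters down the chain.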

### Sample task

#### Warmup
1. Run graph-app-kit
2. Open `views/demo_04_simple/__init__.py` in an editor
  - in a graph-app-kit AWS quick launch, reach it via http://instance_ip/notebook/edit/notebooks/graph-app-kit/private/views/demo_04_simple/__init__.py
  - you may need to log in as `admin` / `i-instanceID`
3. Open the live version in a browser
  - in a graph-app-kit AWS quick launch, reach it via http://instance_ip/private/dash
  - pick `INTRO: SIMPLE PIPELINE` from the dropdown
4. Edit the dashboard title
5. Back in the dashboard, refresh the view (see the header button); the title should update

#### Task

Let's add a form control to manipulate the graph: multiply the number of edges by a scaling factor, producing multiedges

1. Modify `sidebar_area()` to add a new widget, `copies`, and check that it shows up in the UI
2. Thread the new parameter through the filter pipeline, and check that the modified UI still works:
  - Return of `sidebar_area()`
  - Inputs of `run_filters()`
  - Inputs of `main_area()`
3. Modify `run_filters()` to generate a graph with multiedges:
  - Modify the data generators for columns `s` and `d` to create `num_edges * copies` edges
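One way to get multiedges is to generate `num_edges` random pairs and then repeat that whole list `copies` times, so every `(s, d)` pair appears at least `copies` times. This is a hypothetical sketch: the demo's real generators, node ID scheme, and the `num_nodes` and `seed` parameters are assumptions, not taken from graph-app-kit.

```python
import numpy as np
import pandas as pd


def make_multiedges(num_nodes: int, num_edges: int, copies: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical generator: num_edges random s->d pairs, each repeated `copies` times."""
    rng = np.random.default_rng(seed)
    s = rng.integers(0, num_nodes, size=num_edges)
    d = rng.integers(0, num_nodes, size=num_edges)
    # np.tile repeats the whole base edge list, giving num_edges * copies rows
    return pd.DataFrame({"s": np.tile(s, copies), "d": np.tile(d, copies)})


edges_df = make_multiedges(num_nodes=10, num_edges=5, copies=3)
print(edges_df.groupby(["s", "d"]).size())
```

With `copies=1` this reduces to the plain random graph, so the new widget can default to 1 without changing the existing dashboard.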
  

  




## Next steps

* Data science dashboarding
  * https://streamlit.io/  (with graph: https://github.com/graphistry/graph-app-kit)
  * https://plotly.com/dash/
* Graph widget: https://github.com/graphistry/pygraphistry