PyGraphistry is a visual graph analytics library to extract, transform, and load big graphs into Graphistry's GPU-cloud-accelerated explorer.
Friendship Communities on Facebook. Click to open interactive version! Source data: SNAP
- Fast & Gorgeous: Cluster, filter, and inspect large amounts of data at interactive speed. We lay out graphs with a descendant of the gorgeous ForceAtlas2 layout algorithm introduced in Gephi. Our data explorer connects to Graphistry's GPU cluster to lay out and render hundreds of thousands of nodes and edges in your browser at unparalleled speeds.
- Notebook Friendly: PyGraphistry plays well with interactive notebooks like IPython/Jupyter, Zeppelin, and Databricks: process, visualize, and drill into graphs directly within your notebooks.
- Batteries Included: PyGraphistry works out of the box with popular data science and graph analytics libraries. It is also very easy to use. To create the visualization shown above, download this dataset of Facebook communities from SNAP and load it with your favorite library:
- Pandas:
edges = pandas.read_csv('facebook_combined.txt', sep=' ', names=['src', 'dst'])
graphistry.bind(source='src', destination='dst').plot(edges)
- IGraph:
graph = igraph.read('facebook_combined.txt', format='edgelist', directed=False)
graphistry.bind(source='src', destination='dst').plot(graph)
- NetworkX:
graph = networkx.read_edgelist('facebook_combined.txt')
graphistry.bind(source='src', destination='dst', node='nodeid').plot(graph)
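As the `sep=' '` and `names=['src', 'dst']` arguments above suggest, the edge list is plain text with one space-separated pair of node IDs per line. The stdlib-only sketch below shows how such lines map to source/destination pairs, which can be handy for sanity-checking node and edge counts before uploading. The three sample lines are made up for illustration, and `parse_edge_list` is a hypothetical helper, not part of PyGraphistry.

```python
import io


def parse_edge_list(lines):
    """Turn 'src dst' lines into (src, dst) tuples, skipping blanks and '#' comments."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()
        edges.append((src, dst))
    return edges


# Made-up stand-in for the first lines of an edge-list file.
sample = io.StringIO("0 1\n0 2\n1 2\n")
edges = parse_edge_list(sample)

# Every ID that appears as a source or destination is a node.
nodes = {node for edge in edges for node in edge}
print(len(nodes), "nodes,", len(edges), "edges")  # 3 nodes, 3 edges
```

To check a real file, pass an open file handle instead of the `StringIO` object.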
-
Example visualizations:
- Twitter Botnet
- Edit Wars on Wikipedia (Source: SNAP)
- Uber Trips in SF
- Port Scan Attack
- Protein Interactions (Source: BioGRID)
- Programming Languages (Source: Socio-PLT project)
Dependencies: Python 2.7 or 3.4.
The simplest way to install PyGraphistry is with Python's pip package manager:
- Pandas only:
pip install graphistry
- Pandas, IGraph, and NetworkX:
pip install "graphistry[all]"
An API key gives each visualization access to our GPU cluster. We currently ask for API keys to make sure our servers are not melting :) To get your own, email pygraphistry@graphistry.com. Set your key after the import graphistry statement and you are good to go:
import graphistry
graphistry.register(key='Your key')
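If you would rather not hard-code the key in a notebook you plan to share, one option is to read it from an environment variable. This is only a sketch: the variable name `GRAPHISTRY_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something PyGraphistry defines.

```python
import os


def load_api_key(env=os.environ, var_name="GRAPHISTRY_API_KEY"):
    """Return the API key stored in `var_name` (a hypothetical variable name)."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set %s before registering with Graphistry" % var_name)
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(load_api_key({"GRAPHISTRY_API_KEY": "demo-key"}))  # demo-key

# In a notebook, the key would then be passed to register():
# import graphistry
# graphistry.register(key=load_api_key())
```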
We recommend IPython notebooks to interleave code and visualizations.
- Install IPython:
pip install "ipython[notebook]"
- Launch notebook server:
ipython notebook
Let's visualize relationships between the characters in Les Misérables. For this example, we'll choose Pandas to wrangle data and IGraph to run a community detection algorithm. You can download the IPython notebook containing this example.
Our dataset is a CSV file that looks like this:
source | target | value |
---|---|---|
Cravatte | Myriel | 1 |
Valjean | Mme.Magloire | 3 |
Valjean | Mlle.Baptistine | 3 |
Source and target are character names, and the value column counts the number of times they meet. Parsing is a one-liner with Pandas:
import pandas
links = pandas.read_csv('./lesmiserables.csv')
PyGraphistry can plot graphs directly from Pandas dataframes, IGraph graphs, or NetworkX graphs. Calling plot uploads the data to our visualization servers and returns a URL to an embeddable webpage containing the visualization.
To define the graph, we bind source and destination to the columns indicating the start and end nodes of each edge:
import graphistry
graphistry.register(key='YOUR_API_KEY_HERE')
g = graphistry.bind(source="source", destination="target")
g.plot(links)
You should see a beautiful graph like this one:
Let's add labels to edges to show how many times each pair of characters met. We create a new column called label in the edge table links that contains the text of the label, and we bind edge_label to it.
links["label"] = links.value.map(lambda v: "#Meetings: %d" % v)
g = g.bind(edge_label="label")
g.plot(links)
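To see what the label column ends up containing, the sketch below applies the same format string to the three sample rows shown earlier, using only the standard library. The inline CSV text is a stand-in for lesmiserables.csv and assumes comma-separated values.

```python
import csv
import io

# Stand-in for the first rows of the dataset (assumed comma-separated).
sample_csv = """source,target,value
Cravatte,Myriel,1
Valjean,Mme.Magloire,3
Valjean,Mlle.Baptistine,3
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Same format string as in the Pandas version; csv yields strings, so cast first.
for row in rows:
    row["label"] = "#Meetings: %d" % int(row["value"])

labels = [row["label"] for row in rows]
print(labels)  # ['#Meetings: 1', '#Meetings: 3', '#Meetings: 3']
```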
Let's size nodes based on their PageRank score and color them using their community. IGraph already has these algorithms implemented for us. If IGraph is not already installed, fetch it with pip install python-igraph. (Warning: pip install igraph will install the wrong package!)
We start by converting our edge dataframe into an IGraph graph. The plotter can do the conversion for us using the source and destination bindings. Then we create two new node attributes (pagerank and community).
ig = g.pandas2igraph(links)
ig.vs['pagerank'] = ig.pagerank()
ig.vs['community'] = ig.community_infomap().membership
g.bind(point_color='community', point_size='pagerank').plot(ig)
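For intuition about the scores bound to point_size, here is a textbook power-iteration PageRank in plain Python. It is a sketch only and not how IGraph computes its result: it assumes a damping factor of 0.85, a fixed number of iterations, and treats every edge as undirected. The `pagerank` function and the three sample edges (taken from the table above) are for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over an undirected edge list (illustrative sketch)."""
    # Build an undirected adjacency map: node -> set of neighbors.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}

    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor shares its current rank equally among its own neighbors.
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


sample_edges = [
    ("Cravatte", "Myriel"),
    ("Valjean", "Mme.Magloire"),
    ("Valjean", "Mlle.Baptistine"),
]
scores = pagerank(sample_edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print("%-16s %.3f" % (name, score))
```

In this tiny sample, Valjean scores highest because two other characters link to him, which is the kind of node that would be drawn largest in the plot.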
This is a more complex example: we link together Marvel characters who co-star in the same comic. The dataset is split into three files.
Find out who is the most popular Marvel hero! Run the code in the Marvel Demo notebook to browse the entire Marvel universe.
Full Python (including IPython/Jupyter) API documentation.
See also: REST API and deprecated Python docs.
Email pygraphistry@graphistry.com for an API key!