In [None]:
#! pip install graphistry[igraph]

# Botnet on Twitter?

In [None]:
import igraph, graphistry, pandas as pd

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...')
# For more options, see https://github.com/graphistry/pygraphistry#configure


## Step 1: Loading The Data

This dataset was created by a Twitter user who was surprised that one of his very innocuous tweets ("Hey let's grab a coffee") got retweeted several times. Intrigued, he took a closer look at the accounts that retweeted his message and found that they all had unpronounceable names that looked like gibberish. Suspecting that those accounts might be fake, he crawled the Twitter social network around the suspicious accounts to produce this dataset.

The dataset is in a CSV file named `twitterDemo.csv`, which looks like this:
```
#dstAccount,srcAccount
arley_leon16,wxite_pymp
michaelinhooo2,wxite_pymp
steeeva,wxite_pymp
...
```
Each row in `twitterDemo.csv` denotes a "following" relationship (Twitter's equivalent of friending) between two Twitter accounts.
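
To make the edge-list format concrete, here is a minimal sketch that parses the sample rows shown above with the standard library only. The `read_edges` helper and the inlined `SAMPLE` string are illustrative assumptions, not part of the notebook, which loads the full file with pandas.

```python
import csv
import io
from collections import Counter

# The sample rows shown above, inlined so the sketch runs without the file.
SAMPLE = """#dstAccount,srcAccount
arley_leon16,wxite_pymp
michaelinhooo2,wxite_pymp
steeeva,wxite_pymp
"""

def read_edges(text):
    """Return a list of (src, dst) pairs from the two-column edge list."""
    reader = csv.reader(io.StringIO(text))
    # Drop the leading '#' that the sample header carries.
    header = [name.lstrip('#') for name in next(reader)]
    dst_col = header.index('dstAccount')
    src_col = header.index('srcAccount')
    return [(row[src_col], row[dst_col]) for row in reader if row]

edges = read_edges(SAMPLE)

# Count how many rows each source account appears in.
rows_per_source = Counter(src for src, _ in edges)

print(edges)
print(rows_per_source)
```

In the sample, a single source account (`wxite_pymp`) appears in every row. A few accounts attached to many others is the kind of pattern to look for in the visualization.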

In [None]:
follows_df = pd.read_csv('../../data/twitterDemo.csv')
#follows_df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/twitterDemo.csv')
follows_df.sample(3)

Unnamed: 0,dstAccount,srcAccount
9879,awesehufoyeq,aqugocaqiyuq
4824,owozopikirif30,awadosiluq42
7409,ozuq_ayijukux77,agun_acovipuj


## Step 2: First Simple Visualization

We can visualize this subset of the Twitter network as a graph: Each node is a Twitter account and edges encode the "follows" relation.

In [None]:
g = graphistry.edges(follows_df, 'srcAccount', 'dstAccount')

g.plot()

Can you answer the following questions by exploring the visualization you have just created?
- Is the structure of the graph what you would expect from a social network?
- Can you tell which accounts might be fake and which ones are likely real users?

## Step 3: Computing Graph Metrics With igraph

Next, we are going to use [igraph](http://igraph.org/python/), a graph computation library, to compute metrics such as PageRank and betweenness that help us understand the dataset.
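
For intuition about what a PageRank score measures, the following is a minimal power-iteration sketch in plain Python. It is not igraph's implementation, and the function name, the damping value of 0.85, the iteration count, and the toy graph are all assumptions chosen for illustration. It also treats edges as directed, whereas the notebook calls igraph with `directed=False`.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        # Every node passes its rank, in equal shares, along its out-links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical toy graph: three accounts all point at "hub".
toy_edges = [("a", "hub"), ("b", "hub"), ("c", "hub")]
scores = pagerank(toy_edges)
print(max(scores, key=scores.get))
```

An account that many others point at accumulates the largest score, which is why PageRank is a useful proxy for "influence" in the steps below.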

In [None]:
ig = g.to_igraph()
igraph.summary(ig)

IGRAPH DN-- 7889 10063 -- 
+ attr: name (v)


In [None]:
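%%time

# Node-level metrics: PageRank ignores edge direction, betweenness follows it
ig.vs['pagerank'] = ig.pagerank(directed=False)
ig.vs['betweenness'] = ig.betweenness(directed=True)
# Edge-level metric, stored as an edge attribute
ig.es['ebetweenness'] = ig.edge_betweenness(directed=True)

# Spinglass community detection, limited to at most 12 communities (spins)
ig.vs['community_spinglass'] = ig.community_spinglass(spins=12, stop_temp=0.1, cool_fact=0.9).membership

# Infomap and multilevel (Louvain) are run here on an undirected copy of the graph
uig = ig.copy()
uig.to_undirected()
ig.vs['community_infomap'] = uig.community_infomap().membership
ig.vs['community_louvain'] = uig.community_multilevel().membership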

CPU times: user 7.24 s, sys: 29.2 ms, total: 7.26 s
Wall time: 8.18 s


In [None]:
g2 = g.from_igraph(ig)
print(g2._nodes.dtypes)
g2._nodes.sample(3)

_n_implicit              int64
name                    object
pagerank               float64
betweenness            float64
community_spinglass      int64
community_infomap        int64
community_louvain        int64
dtype: object


Unnamed: 0,_n_implicit,name,pagerank,betweenness,community_spinglass,community_infomap,community_louvain
4155,4155,etuz_eredimof90,6e-05,0.0,2,144,37
3313,3313,isudovodutit26,9.6e-05,0.0,2,184,16
5854,5854,omoposafadat,0.000134,0.0,11,660,10


In [None]:
g3 = (g2

      #Convert to int32 to use the built-in color palettes:
      #https://github.com/graphistry/pygraphistry/blob/master/demos/more_examples/graphistry_features/encodings-colors.ipynb
      .nodes(g2._nodes.assign(community_spinglass=g2._nodes.community_spinglass.astype('int32')))
      .encode_point_color('community_spinglass')

      .encode_point_size('pagerank')
)
print(g3._nodes.dtypes)
g3.plot()

_n_implicit              int64
name                    object
pagerank               float64
betweenness            float64
community_spinglass      int32
community_infomap        int64
community_louvain        int64
dtype: object


## Step 4: Visual Drill Downs

Within the visualization, you can filter and drill down into the graph. Try the following:

1. Open the histogram panel, and add histograms for `pagerank`, `betweenness`, `ebetweenness`, etc. By selecting a region of a histogram or clicking on a bar, you can filter the graph.

2. You can also manually create filters in the filter panel ("funnel" icon in the left menu bar). For instance, try filtering on `point:pagerank` such that `point:pagerank >= 0.01`. This selects the most "influential" accounts, which are the likely botnet owners/customers.

3. Still in the histogram panel, you can display attributes visually through the node and edge colors of the graph. Try clicking on each of the three square icons on top of each histogram. Notice that when point color is bound to `community_spinglass`, the "tail" of the network forms a distinct community. What makes those accounts different from the rest?

4. With the histogram panel open, click on the data brush and then lasso a selection on the graph. The histograms highlight the subset of nodes under the selection. You can drag the data brush selection to compare different subgraphs.
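
The threshold filter from item 2 can also be sketched outside the UI. The example below uses plain Python on the three node records from the sample output shown earlier. The `filter_by_pagerank` helper and the 0.0001 threshold are hypothetical, since the sampled scores are far below 0.01. On the real node table, the same idea would be a pandas boolean mask over the `pagerank` column.

```python
# Three node records copied from the sample output earlier in this notebook.
nodes = [
    {"name": "etuz_eredimof90", "pagerank": 6e-05},
    {"name": "isudovodutit26", "pagerank": 9.6e-05},
    {"name": "omoposafadat", "pagerank": 0.000134},
]

def filter_by_pagerank(records, threshold):
    """Keep only the records whose PageRank is at least `threshold`."""
    return [r for r in records if r["pagerank"] >= threshold]

THRESHOLD = 0.0001  # hypothetical value, chosen to fit the tiny sample
selected = filter_by_pagerank(nodes, THRESHOLD)
print([r["name"] for r in selected])

# None of the three sampled accounts passes the 0.01 filter used in the UI.
print(filter_by_pagerank(nodes, 0.01))
```

That none of the sampled accounts passes the 0.01 cut is consistent with the filter isolating only a small set of high-PageRank accounts.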
