<h1>Ditchley S2DS project August 2020 - Code Pipeline</h1>
    <h2>Team: Adam Hawken, Luca Lamoni, Elizabeth Nicholson, Robert Webster</h2>

In [1]:
#![]() #graphical representation of the pipeline here

<h3>Section 0: Working directory and graph DB setup</h3>
    <h4>0.1: Modules and working directory setup</h4>

In [2]:
# Import modules and set up working directory
import sys
import os
import time
import logging
import json
import csv
import threading
import queue
import asyncio 
import nest_asyncio
nest_asyncio.apply()
import twint
import pandas as pd


# Set up working directory
# The working directory should reflect the structure of the Github repository https://github.com/S2DSLondon/Aug20_Ditchley
sys.path.insert(1, '/Users/adam/S2DS/GitHub/Aug20_Ditchley')
from src.data import pipeline_setup
pipeline_setup.build_data_dir('/Users/adam/S2DS/GitHub/Aug20_Ditchley')

Data directory & sub-directories already exist, skipping.
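The directory setup step can be pictured with a minimal sketch. This is not the project's `pipeline_setup.build_data_dir`; the function name `build_data_dir_sketch` and the assumption that only `data/raw` and `data/processed` are needed (the two folders used later in this notebook) are illustrative.

```python
from pathlib import Path


def build_data_dir_sketch(root, subdirs=("raw", "processed")):
    """Create <root>/data/<subdir> for each subdir; return the newly created paths."""
    created = []
    for sub in subdirs:
        path = Path(root) / "data" / sub
        if not path.exists():
            # parents=True also creates <root>/data when it is missing
            path.mkdir(parents=True)
            created.append(path)
    if created:
        print("Created:", ", ".join(str(p) for p in created))
    else:
        print("Data directory & sub-directories already exist, skipping.")
    return created
```

Calling it twice on the same root creates the folders the first time and only prints the "skipping" message the second time.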


<h4>0.2: Initialize graph database</h4>

The database must be active; it can be started from Neo4j Desktop.

In [3]:
# import standard libraries
import numpy as np
import pandas as pd
from py2neo import Graph
from py2neo.data import Node, Relationship
from src.graph_database import graphdb as gdb

# load / declare the database
graph = gdb.get_graph(new_graph = True)
graph
# start with an empty graph
graph.delete_all()

<h3>Section 1: Getting journalist Twitter handles according to a keyword</h3>
    <h4>Journalists are scraped from https://www.journalism.co.uk/prof/?chunk=0&cmd=default</h4>

In [4]:
# Choose keyword and run the scraping function
from src.data import journalists as journos
keyword = 'cybersecurity'
# Input: string / Output: list
journo_handles = journos.get_handles_by_keyword(keyword)
print(len(journo_handles))
type(journo_handles)

3


list

<h3>Section 2. Scrape user information and friend lists for each journalist in the list</h3>
    <h4>2.1: Scrape user information using the Twitter API</h4>

In [5]:
#Load twitter API credentials and return a tweepy API instance
import json
import tweepy
from src.data import api_tweepy as api

# Input: path of json file with credentials / Output: tweepy.api.API
tw_api = api.connect_API('../src/data/twitter_credentials.json')

In [6]:
# Scrape user information using the API
from src.data import api_user_tools as api_tools
from src.data import data_cleanup as dc

# Input: tweepy.api.API,list / Output: list
api_users = api_tools.batch_request_user_info(tw_api,journo_handles)
# Input: list / Output: DataFrame
df_api = dc.populate_user_df(api_users)
# Check
df_api.head()

Unnamed: 0,user_id,screen_name,name,location,user_description,user_friends_n,user_followers_n,prof_created_at,favourites_count,verified,statuses_count
0,335773502,_lucyingham,Lucy Ingham,London,editor of and digital magazines verdict magazi...,516,647,2011-07-15 06:29:08,2210,False,456
1,964233746865119233,jesscahaworth,Jessica Haworth,,cybersecurity journalist at music buff and ski...,970,668,2018-02-15 20:23:34,459,False,583
2,1186245031507693574,ad_nauseum74,Adam Bannister,,journalist the daily swig cybersecurity,368,135,2019-10-21 11:38:12,114,False,277
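The Twitter v1.1 user lookup endpoint accepts at most 100 users per request, so a long list of handles has to be split into batches. The sketch below shows one way to do that; it is not the code of `api_tools.batch_request_user_info`, and `lookup` is a hypothetical callable standing in for the actual API call.

```python
def chunked(handles, size=100):
    """Yield consecutive slices of `handles` holding at most `size` items each."""
    for start in range(0, len(handles), size):
        yield handles[start:start + size]


def batch_lookup(lookup, handles, size=100):
    """Call `lookup` once per chunk and flatten the results into one list.

    `lookup` is a hypothetical stand-in for the API request: it takes a list
    of handles and returns a list of user records.
    """
    results = []
    for chunk in chunked(handles, size):
        results.extend(lookup(chunk))
    return results
```

With 250 handles this issues three requests of 100, 100 and 50 handles.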


In [7]:
# Save the dataframe as csv
df_api.to_csv('../data/processed/'+keyword+'_user_profiles.csv', index = False)

<h4>2.2: Load user info into graph DB</h4>

In [8]:
# Neo4j can only import files from its own import folder, but the CSV files saved above live in a different folder.
# On Windows machines, one workaround is to create a shortcut between the two folders.

# load in user information
print('Loading in user information and drawing (Person) nodes')
fn_users = 'processed/'+keyword+'_user_profiles.csv'
gdb.load_users(fn_users ,graph)

Loading in user information and drawing (Person) nodes


<h4>2.3: Scrape user friend list using Twint</h4>

In [9]:
# Scrape each journalist's friend list with Twint
from src.data import twint_tools as tt

# Define keyword arguments: 'n_retries' = max number of scrape attempts, 'suppress' = hide critical Twint warnings
kwargs = {'n_retries':3,
         'suppress':False}
# Multithreading function. Input: the _get_friends function, the number of threads that share the queue, the handles, args and kwargs
tt.twint_in_queue(tt._get_friends, 3, journo_handles, args=('../data/raw/'+keyword+'_',), kwargs=kwargs)

Attempt #1 to get friends of @_lucyingham
Attempt #1 to get friends of @JesscaHaworth
Attempt #1 to get friends of @Ad_Nauseum74


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @Ad_Nauseum74 saved to: ../data/raw/cybersecurity_friends_Ad_Nauseum74.csv


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @_lucyingham saved to: ../data/raw/cybersecurity_friends__lucyingham.csv


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @JesscaHaworth saved to: ../data/raw/cybersecurity_friends_JesscaHaworth.csv
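The call above distributes the handles over three worker threads, which is also why the log lines of different users arrive interleaved. A minimal sketch of such a queue-based dispatcher follows. It is an illustration only, not the implementation of `tt.twint_in_queue`: the name `run_in_queue` is hypothetical, and it assumes the worker function takes the handle as its first argument.

```python
import queue
import threading


def run_in_queue(func, n_threads, items, args=(), kwargs=None):
    """Run func(item, *args, **kwargs) for every item using n_threads workers."""
    kwargs = kwargs or {}
    work = queue.Queue()
    for item in items:
        work.put(item)

    def worker():
        # Each worker keeps pulling items until the queue is empty.
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return
            try:
                func(item, *args, **kwargs)
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The order in which items finish is not deterministic, so any output written by `func` should identify the item it belongs to, as the "Results for @..." lines do.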


In [10]:
from src.data import twint_tools as tt
# Concatenate the individual friend lists into one dataframe of journalists and their friends
friends_csv = tt.join_friends_csv(journo_handles,keyword) # known bug: the first friend name is the literal string 'username'

@_lucyingham follows 1548 users.
@JesscaHaworth follows 3880 users.
@Ad_Nauseum74 follows 1839 users.

Total number of handles pulled: 7267
Number of unique twitter handles: 1716

Zero following in list for users: []


In [11]:
# Save the dataframe as csv
friends_csv.to_csv('../data/processed/'+keyword+'_journalist_friends.csv', index=False)

## Remove outliers

Assume that friend and follower counts are lognormally distributed, calculate the chi-squared of each user under that model, and remove the outliers.
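One way to read this step: if the two counts are lognormal, their logarithms follow a two-dimensional Gaussian, and the squared Mahalanobis distance of each user from the mean behaves like a chi-squared variable with two degrees of freedom. The sketch below computes that quantity. It is an assumption about what `gdb.get_chi2` does, not a copy of it; the function name `lognormal_chi2` is hypothetical, and dropping users with a zero count (whose logarithm is undefined) is a guess suggested by the variable name `no_loners`.

```python
import numpy as np
import pandas as pd


def lognormal_chi2(df, cols=("user_friends_n", "user_followers_n")):
    """Return the rows with positive counts plus a 'chi2' column.

    chi2 is the squared Mahalanobis distance of (log friends, log followers)
    from the sample mean, using the sample covariance.
    """
    counts = df[list(cols)]
    # log(0) is undefined, so keep only users with positive counts
    positive = df[(counts > 0).all(axis=1)]
    logs = np.log(positive[list(cols)].to_numpy(dtype=float))
    delta = logs - logs.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(logs, rowvar=False))
    # delta_i^T * inv_cov * delta_i for every row i
    chi2 = np.einsum("ij,jk,ik->i", delta, inv_cov, delta)
    return positive.assign(chi2=chi2)
```

For two degrees of freedom, a threshold of 6.18 encloses about 95.4% of the probability (the 2-sigma contour), so users above it are the ones treated as outliers.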

In [12]:
# get user profiles of friends
api_users = api_tools.batch_request_user_info(tw_api,list(friends_csv['friend']))                                                         
df_api = dc.populate_user_df(api_users)

# save user profiles to file
df_api.to_csv('../data/processed/'+keyword+'_all_profiles.csv', index = False) 

In [14]:
# calculate chi2s
%pylab
import matplotlib.pyplot as plt

no_loners = gdb.get_chi2(df_api)

# 6.18 is the 2-sigma (95.4%) threshold of a chi-squared distribution with two degrees of freedom
inliers = no_loners[no_loners['chi2']<6.18]
outliers = no_loners[no_loners['chi2']>=6.18]

# plot friends against followers on log axes, colouring inliers and outliers separately
plt.scatter(inliers['user_friends_n'],inliers['user_followers_n'],label='inliers')
plt.scatter(outliers['user_friends_n'],outliers['user_followers_n'],label='outliers')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('user_friends_n')
plt.ylabel('user_followers_n')
plt.legend()

Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy
  "\n`%matplotlib` prevents importing * from pylab and numpy"


<matplotlib.legend.Legend at 0x1230e8080>

<h4>2.4: Load friend information into DB</h4>

In [15]:
# load in friend information
print('Loading in friends info and drawing [FOLLOWS] edges')
fn_friends = 'processed/'+keyword+'_journalist_friends.csv'
gdb.load_friends(fn_friends,graph,new=True)

Loading in friends info and drawing [FOLLOWS] edges


In [16]:
# upload profile information of friends
gdb.load_existing_users('processed/'+keyword+'_all_profiles.csv',graph) 

In [17]:
#excise outliers from database
gdb.excise_outliers(outliers['screen_name'],graph)

## Filter graph by keywords

Look for keywords in the bio and screen name of each friend, and remove the users that match none of them.
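The same idea can be sketched on a profile dataframe instead of the graph. This is not the implementation of `gdb.filter_users_by_keywords`, which queries the database; the function name `split_by_keywords` is hypothetical, and the column names are taken from the profile table shown earlier in this notebook.

```python
import re

import pandas as pd


def split_by_keywords(df, keywords, cols=("screen_name", "user_description")):
    """Split df into (rows matching any keyword, rows matching none).

    Matching is a case-insensitive substring search over the given columns,
    so 'tech' also matches 'technology'.
    """
    text = pd.Series("", index=df.index)
    for col in cols:
        text = text + " " + df[col].fillna("").astype(str)
    pattern = "|".join(re.escape(k.lower()) for k in keywords)
    has_keyword = text.str.lower().str.contains(pattern, regex=True)
    return df[has_keyword], df[~has_keyword]
```

The second dataframe plays the role of `not_techies` in the next cell: the profiles to be excised from the graph.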

In [18]:
keywords = ['tech','security','artificial','machine', 'cyber', 'computer','code','hack']
not_techies = gdb.filter_users_by_keywords(keywords,graph,without=True)
print(len(not_techies))

778


In [19]:
# excise uninteresting profiles
gdb.excise_outliers(not_techies['screen name'],graph)

<h3>Section 3. Loop over selected journalists' handles and scrape their tweets (3.1) and mentions (3.2) using Twint</h3>
    <h4>Section 3.1: Scrape tweets using Twint</h4>

In [20]:
from src.data import twint_tools as tt
# define keyword arguments
kwargs = {'date_range':('2020-08-01 00:00:00', None),
         'n_retries':3,
         'suppress':False}
# multi threading
tt.twint_in_queue(tt._search_tweets_by_user, 3, journo_handles, args=('../data/raw/'+keyword+'_',), kwargs=kwargs)

Attempt #1 to get tweets of @_lucyingham
Attempt #1 to get tweets of @JesscaHaworth
Attempt #1 to get tweets of @Ad_Nauseum74




CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable


Results for @Ad_Nauseum74 saved to: ../data/raw/cybersecurity_tweets_Ad_Nauseum74.csv
Results for @JesscaHaworth saved to: ../data/raw/cybersecurity_tweets_JesscaHaworth.csv
Results for @_lucyingham saved to: ../data/raw/cybersecurity_tweets__lucyingham.csv


In [21]:
# Join all the individual CSV files into one dataframe
cyber_test = tt.join_tweet_csv(journo_handles, keyword)
# Check
cyber_test.head()

Unnamed: 0,id,conversation_id,created_at,date,time,timezone,user_id,username,name,place,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1299059177474732034,1299059177474732034,1598554817000,2020-08-27,21:00:17,CEST,335773502,_lucyingham,Lucy Ingham,,...,,,,,,"[{'user_id': '335773502', 'username': '_lucyin...",,,,
1,1299057580774432770,1299057580774432770,1598554437000,2020-08-27,20:53:57,CEST,335773502,_lucyingham,Lucy Ingham,,...,,,,,,"[{'user_id': '335773502', 'username': '_lucyin...",,,,
2,1298287471004983296,1298282082746216449,1598370828000,2020-08-25,17:53:48,CEST,335773502,_lucyingham,Lucy Ingham,,...,,,,,,"[{'user_id': '335773502', 'username': '_lucyin...",,,,
3,1298141120128524288,1298131556251262976,1598335935000,2020-08-25,08:12:15,CEST,335773502,_lucyingham,Lucy Ingham,,...,,,,,,"[{'user_id': '335773502', 'username': '_lucyin...",,,,
4,1298136447258697728,1298131556251262976,1598334821000,2020-08-25,07:53:41,CEST,335773502,_lucyingham,Lucy Ingham,,...,,,,,,"[{'user_id': '335773502', 'username': '_lucyin...",,,,


In [22]:
# Save dataframe as csv
cyber_test.to_csv('../data/processed/'+keyword+'_journalist_tweets_twint.csv', index=False)

<h4>Section 3.2: Extract mentions from Twint dataset</h4>

In [23]:
from src.data import data_cleanup as dc
# from the twint dataset, extract mentions based on tweet id and save in a separate csv
mentions_twint  = dc.mentions_to_df(cyber_test)
# Check
mentions_twint.head()

Unnamed: 0,tweet_id,mentions
0,1298287471004983296,trypewriter01
1,1298287471004983296,berenicejbaker
2,1298141120128524288,delafina777
3,1298136447258697728,delafina777
4,1298015000519487490,berenicejbaker
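The output above has one row per (tweet, mentioned user) pair. A minimal sketch of how such a table can be derived is given below. It is not the code of `dc.mentions_to_df`; the function name `mentions_to_rows` is hypothetical, and it assumes a `mentions` column that holds either a list of handles or the string form of such a list, as happens after a round trip through CSV.

```python
import ast

import pandas as pd


def mentions_to_rows(tweets, id_col="id", mentions_col="mentions"):
    """Return one row per (tweet_id, mentioned handle); tweets without mentions are dropped."""

    def parse(value):
        # After reading from CSV, a list arrives as its string representation
        if isinstance(value, str):
            return ast.literal_eval(value)
        return value

    out = tweets[[id_col, mentions_col]].copy()
    out[mentions_col] = out[mentions_col].apply(parse)
    # explode turns each list element into its own row; empty lists become NaN
    out = out.explode(mentions_col).dropna(subset=[mentions_col])
    return out.rename(columns={id_col: "tweet_id"}).reset_index(drop=True)
```

A tweet that mentions two users therefore contributes two rows with the same `tweet_id`, as with tweet 1298287471004983296 above.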


In [24]:
# Save the dataframe
mentions_twint.to_csv('../data/processed/' + keyword + '_mentions_twint.csv',index=False)

<h3>Section 4. Loop over selected journalists' handles and scrape their tweets (4.1) and mentions (4.2) using Twitter API</h3>
    <h4>Section 4.1: Scrape tweets using Twitter API</h4>

In [None]:
import json
import tweepy
from src.data import api_tweepy as api
#Load twitter API credentials and return a tweepy API instance
tw_api = api.connect_API('../src/data/twitter_credentials.json')

In [None]:
from src.data.api_tweet_tools import request_user_timeline, batch_request_user_timeline
cyber_test_api = batch_request_user_timeline(tw_api, journo_handles, '../data/processed/')
# Check
cyber_test_api.head()

<h4>Section 4.2: Extract mentions from API tweets</h4>

In [None]:
from src.data import data_cleanup as dc
# from the API dataset, extract mentions based on tweet id and save in a separate csv
mentions_api  = dc.mentions_to_df(cyber_test_api)
# Check
mentions_api.head()

In [None]:
mentions_api.to_csv('../data/processed/' + keyword + '_mentions_api.csv',index=False)

<h3>Section 5. Data cleaning and standardization/LDA</h3>
     <h4>Section 5.1: Clean and standardize Twint dataset</h4>

In [25]:
# Standardise the twint output 
from src.data import data_cleanup as dc

# Standardize Twint dataset for graph DB loading
standard_tweet_twint = dc.clean_twint_dataframe(cyber_test)
# Check
standard_tweet_twint.head()

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  standard_df = pd.concat([standard_df, twint_df[twint_df.columns.intersection(standard_df.columns)]], axis=0)


Unnamed: 0,conversation_id,hashtags,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_user_id,like_count,name,quoted_status,quoted_status_id,replies_count,retweet_count,rt_id,rt_screen_name,rt_text,rt_user_id,screen_name,text,tweet_created_at,tweet_id,user_id
0,1299059177474732034,"['starwars', 'openingnightlive']",,,[335773502],0,Lucy Ingham,,,0,0,,,,,_lucyingham,"Oh EA, what have you done to #StarWars? And Th...",2020-08-27 21:00:17,1299059177474732034,335773502
1,1299057580774432770,['openingnightlive'],,,[335773502],1,Lucy Ingham,,,0,0,,,,,_lucyingham,We need more diversity in video game show pres...,2020-08-27 20:53:57,1299057580774432770,335773502
2,1298282082746216449,[],,,"[335773502, 1244308848, 26069433]",2,Lucy Ingham,,,1,0,,,,,_lucyingham,You're going to have to give us a training ses...,2020-08-25 17:53:48,1298287471004983296,335773502
3,1298131556251262976,[],,,"[335773502, 4385491]",1,Lucy Ingham,,,1,0,,,,,_lucyingham,Ah sorry looks like they've changed the option...,2020-08-25 08:12:15,1298141120128524288,335773502
4,1298131556251262976,[],,,"[335773502, 4385491]",2,Lucy Ingham,,,1,0,,,,,_lucyingham,"On Google Drive, have you set link sharing to ...",2020-08-25 07:53:41,1298136447258697728,335773502


In [26]:
# Save the dataframe
standard_tweet_twint.to_csv('../data/processed/' + keyword + '_standard_tweets_twint.csv',index=False)

<h4>Section 5.2: Clean and standardize API dataset</h4>

In [None]:
# Standardise the API output
from src.data import data_cleanup as dc

# Standardize API dataset for graph DB loading
standard_tweet_api = dc.clean_API_dataframe(cyber_test_api)
# Check
standard_tweet_api.head()

In [None]:
# Save the dataframe
standard_tweet_api.to_csv('../data/processed/' + keyword + '_standard_tweets_api.csv',index=False)

<h3>Section 6. Create graph database and import Twitter data into it</h3>
    <h4>Section 6.1: Import modules and load graph database</h4>

In [14]:
# import standard libraries
import numpy as np
import pandas as pd
from py2neo import Graph
from py2neo.data import Node, Relationship
from src.graph_database import graphdb as gdb

# load / declare the database
graph = gdb.get_graph(new_graph = True)
graph

<Graph database=<Database uri='bolt://localhost:7687' secure=False user_agent='py2neo/4.3.0 neobolt/1.7.17 Python/3.6.4-final-0 (darwin)'> name='data'>

<h4>Section 6.2: Load user info into graph DB</h4>

In [15]:
# Neo4j can only import files from its own import folder, but the CSV files saved above live in a different folder.
# On Windows machines, one workaround is to create a shortcut between the two folders.

# load in user information
print('Loading in user information and drawing (Person) nodes')
fn_users = 'processed/'+keyword+'_user_profiles.csv'
gdb.load_users(fn_users ,graph)

Loading in user information and drawing (Person) nodes


<h4>Section 6.2: Load friend information into DB</h4>

In [16]:
# load in friend information
print('Loading in friends info and drawing [FOLLOWS] edges')
fn_friends = 'processed/'+keyword+'_journalist_friends.csv'
gdb.load_friends(fn_friends,graph)

Loading in friends info and drawing [FOLLOWS] edges


<h4>Section 6.3: Load tweet data into DB</h4>

In [27]:
# load in tweet information from twint
print('Loading in tweets and drawing (Tweet) nodes')
fn_tweets = 'processed/'+keyword+'_standard_tweets_twint.csv'
gdb.load_tweets(fn_tweets ,graph) 

Loading in tweets and drawing (Tweet) nodes


In [None]:
# load in tweet information from API
print('Loading in tweets and drawing (Tweet) nodes')
fn_tweets = 'processed/'+keyword+'_standard_tweets_api.csv'
gdb.load_tweets(fn_tweets ,graph) 

<h4>Section 6.4: Draw edges between users and their tweets</h4>

In [28]:
# draw edges between users and their tweets
print('Drawing [POSTS] edges')
gdb.get_posts(graph)


Drawing [POSTS] edges


<h4>Section 6.5: Load tweets' mentions</h4>

In [29]:
# load in mentions information
print('Loading in mentions and drawing [MENTIONS] edges')
fn_mentions = 'processed/'+keyword+'_mentions_twint.csv'
gdb.load_mentions(fn_mentions,graph)

Loading in mentions and drawing [MENTIONS] edges


### Draw TALKS_ABOUT edges between users

In [30]:
gdb.get_talk_about_edges(graph)

<h4>Section 6.6: Run page rank algorithm using [FOLLOWS] [MENTIONS] edges</h4>

In [31]:
# run PageRank over (Person) nodes using only the [FOLLOWS] edges listed in edgelist
print('running page rank')
nodelist = ['Person']
edgelist = ['FOLLOWS']
page_rank = gdb.run_pagerank(nodelist,edgelist,graph,new_native_graph=True)

#df = pd.DataFrame.from_records(page_rank)#, columns=['screen name', 'rank', 'n_followers'])

running page rank


In [32]:
unboosted_top_10_follows = page_rank[:10]
print(unboosted_top_10_follows)

       screen name      rank n_followers
0  securitycharlie  0.151544       10825
1        fisher85m  0.151544       87951
2          gcluley  0.151544       97889
3     bsideslondon  0.151302        8071
4     ronaldvdmeer  0.151302        8298
5     ellieturnell  0.150939         110
6   realsexycyborg  0.150939      137934
7      kim_crawley  0.150939       15553
8        antgrasso  0.150939      157070
9  drjessicabarker  0.150939       16017


In [36]:
print('running page rank')
nodelist = ['Person']
edgelist = ['TALKS_ABOUT']
page_rank = gdb.run_pagerank(nodelist,edgelist,graph,new_native_graph=True)
unboosted_top_10_talks_about = page_rank[:10]
print(unboosted_top_10_talks_about)

running page rank
      screen name      rank n_followers
0       intel_owl  0.213750        None
1      matte_lodi  0.213750        None
2  blackhatevents  0.164167      279667
3       albinowax  0.164167       30127
4       safetydet  0.164167         370
5       dailyswig  0.164167        4994
6       joelgmsec  0.164167         904
7     consequence  0.164167        None
8    mcrmetrolink  0.164167        None
9   nailheadparty  0.164167        None
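For readers unfamiliar with PageRank, the sketch below implements it by plain iteration over an edge list. It is an illustration, not the code behind `gdb.run_pagerank`. The scores close to 0.15 in the tables above are consistent with the unnormalised form with damping 0.85, in which every node starts from a base score of 1 - d, but whether the pipeline uses exactly this form is an assumption. The function name and defaults are hypothetical.

```python
def pagerank(edges, damping=0.85, n_iter=20):
    """Unnormalised PageRank: rank(v) = (1 - d) + d * sum(rank(u) / out_degree(u)).

    `edges` is a list of (source, target) pairs, e.g. (follower, followed).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    rank = {node: 1.0 - damping for node in nodes}
    for _ in range(n_iter):
        new_rank = {node: 1.0 - damping for node in nodes}
        for source, target in edges:
            # each node shares its current rank equally among its outgoing edges
            new_rank[target] += damping * rank[source] / out_degree[source]
        rank = new_rank
    return rank
```

In this form a node that nobody follows keeps the base score of 0.15, and the score rises with the number and rank of the accounts pointing at it.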


<h4>Section 6.7: Get a weighted random sample from the journalists' friends</h4>

In [33]:
# get a weighted random sample of users
n_sample = 20
fields = ['rank']
exponents = [2]
sample = gdb.get_multiple_weighted_sample(page_rank,n_sample,fields,exponents)

In [34]:
sample[:10]

Unnamed: 0,screen name,rank,n_followers
512,micleadership,0.150242,179.0
568,kevinmitnick,0.150242,251710.0
355,windows,0.150605,6287423.0
827,blissfoster,0.15,
671,jessrobin96,0.150242,4442.0
26,mcafee,0.150847,116282.0
234,zahrasalmanasif,0.150697,734.0
384,mmurray,0.150605,9571.0
139,edwardsclm,0.150697,215.0
559,patrickwardle,0.150242,23739.0
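The `fields` and `exponents` arguments suggest that each user is drawn with a probability proportional to a product of powers of the chosen columns, here `rank` squared. The sketch below shows that reading. It is an assumption about `gdb.get_multiple_weighted_sample`, not its implementation; the function name `weighted_sample` and the `seed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def weighted_sample(df, n_sample, fields, exponents, seed=None):
    """Draw n_sample distinct rows with probability proportional to
    the product of df[field] ** exponent over all (field, exponent) pairs."""
    weights = np.ones(len(df))
    for field, exponent in zip(fields, exponents):
        weights = weights * df[field].to_numpy(dtype=float) ** exponent
    weights = weights / weights.sum()
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no user is sampled twice
    idx = rng.choice(len(df), size=n_sample, replace=False, p=weights)
    return df.iloc[idx]
```

Raising the exponent concentrates the sample on high-rank users, while an exponent of 0 reduces it to a uniform random sample.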


## Boost graph to flesh out connections

In [35]:
niter = 3
nsample = 3
fields = ['rank']
exponents = [2]
kwargs = {'n_retries':2,
         'suppress':False}

pagerank_params = nodelist, edgelist, graph
#from src.graph_database import graphdb_dev as gdb
gdb.boost_graph(niter,nsample,fields,exponents,pagerank_params,keyword,kwargs)

boost iteration  1
Attempt #1 to get friends of @hart_jason
Attempt #1 to get friends of @cedyuen
Attempt #1 to get friends of @airosecurity


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable


Attempt #2 to get friends of @cedyuen


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable


Attempt #3 to get friends of @cedyuen


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @airosecurity saved to: ../data/raw/cybersecurity_friends_airosecurity.csv


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @hart_jason saved to: ../data/raw/cybersecurity_friends_hart_jason.csv
@hart_jason follows 1100 users.
@airosecurity follows 598 users.

Total number of handles pulled: 1698
Number of unique twitter handles: 1629

Zero following in list for users: ['cedyuen']
boost iteration  2


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable


Attempt #1 to get friends of @dmbisson
Attempt #1 to get friends of @denisemberard
Attempt #1 to get friends of @sambowne


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.feed:Follow:IndexError


Attempt #2 to get friends of @sambowne


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @sambowne saved to: ../data/raw/cybersecurity_friends_sambowne.csv


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @denisemberard saved to: ../data/raw/cybersecurity_friends_denisemberard.csv


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @dmbisson saved to: ../data/raw/cybersecurity_friends_dmbisson.csv
@dmbisson follows 8854 users.
@denisemberard follows 3051 users.
@sambowne follows 1916 users.

Total number of handles pulled: 13821
Number of unique twitter handles: 12927

Zero following in list for users: []
boost iteration  3
Attempt #1 to get friends of @securitybrew
Attempt #1 to get friends of @monkeybanking
Attempt #1 to get friends of @cbrreynolds


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.feed:Follow:IndexError


Attempt #2 to get friends of @monkeybanking


CRITICAL:root:twint.get:User:'NoneType' object is not subscriptable
CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @cbrreynolds saved to: ../data/raw/cybersecurity_friends_cbrreynolds.csv


CRITICAL:root:twint.feed:Follow:IndexError


Results for @securitybrew saved to: ../data/raw/cybersecurity_friends_securitybrew.csv


CRITICAL:root:twint.feed:Follow:IndexError
CRITICAL:root:twint.feed:Follow:IndexError


Results for @monkeybanking saved to: ../data/raw/cybersecurity_friends_monkeybanking.csv
@securitybrew follows 900 users.
@monkeybanking follows 1574 users.
@cbrreynolds follows 430 users.

Total number of handles pulled: 2904
Number of unique twitter handles: 2843

Zero following in list for users: []


In [36]:
page_rank = gdb.run_pagerank(nodelist,edgelist,graph,new_native_graph=True)
boosted_top_10_follows = page_rank[:10]
print(boosted_top_10_follows)

       screen name      rank n_followers
0    jesscahaworth  0.187568         668
1       joe_carson  0.185095        1847
2     dannyjpalmer  0.184818        7436
3     sinon_reborn  0.183668       14663
4   blackhatevents  0.160634      279675
5       briankrebs  0.160634      289034
6            k8em0  0.160025       98301
7          evacide  0.159076      132924
8  swiftonsecurity  0.159076      309475
9         symantec  0.158828      206759


Unnamed: 0,screen name,rank,n_followers
0,securitycharlie,0.151697,10820
1,fisher85m,0.151697,87927
2,gcluley,0.151697,97866
3,bsideslondon,0.15142,8066
4,ronaldvdmeer,0.15142,8297
5,ellieturnell,0.151011,111
6,realsexycyborg,0.151011,137800
7,kim_crawley,0.151011,15485
8,antgrasso,0.151011,156648
9,drjessicabarker,0.151011,15957
