## Goal of this exercise

We want to search Twitter for a list of search terms such as hashtags or keywords. We then download the matching tweets and generate a graph from them that we can analyse with Gephi (or in R).

### Twitter Search API

We need to set up the Twitter API first. For this exercise we will use the Twitter app credentials of DIKW Academy, so please use them with care!

In [None]:
# if not installed please do run the line below once
#!pip install TwitterSearch

In [None]:
# test if we have this library installed
from TwitterSearch import *
import json
import pprint

# we need access to the developer API of Twitter
# see https://developer.twitter.com/en/docs/twitter-api
# put your credentials in a separate file to load (and do not commit this file :-)

#        consumer_key 
#        consumer_secret
#        access_token
#        access_token_secret

import twittercredentials as tc
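
A minimal sketch of what such a `twittercredentials.py` file could look like, assuming the four values are supplied through environment variables. The environment variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_credentials` helper are hypothetical; only the four attribute names come from the list above.

```python
# twittercredentials.py (hypothetical layout, keep it out of version control)
import os

# the four attribute names the notebook expects on the `tc` module
FIELDS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def load_credentials(env=os.environ):
    """Read each credential from an environment variable such as TWITTER_CONSUMER_KEY."""
    creds = {}
    for field in FIELDS:
        name = "TWITTER_" + field.upper()  # assumed naming scheme
        if name not in env:
            raise KeyError("missing environment variable: " + name)
        creds[field] = env[name]
    return creds
```

The module could then expose the values as attributes, for example by ending with `globals().update(load_credentials())`, so that `tc.consumer_key` keeps working as used in this notebook.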

## Search twitter

We search twitter by setting the keywords we are interested in.

For example, we could search for '#oversterfte', 'wappies' or 'mRNA'. The cell below searches for '#oversterfte' only.

In [None]:
# we now are going to search the twitter API
try:
    tso = TwitterSearchOrder() # create a TwitterSearchOrder object
    tso.set_keywords(['#oversterfte']) # define all the words we would like to search for
    tso.set_language('nl') # we want to see Dutch tweets only
    tso.set_include_entities(True) # include the entity information (hashtags, urls, mentions, media)

    # it's about time to create a TwitterSearch object with our secret tokens
    ts = TwitterSearch(
        tc.consumer_key,
        tc.consumer_secret,
        tc.access_token,
        tc.access_token_secret
     )

    
except TwitterSearchException as e: # take care of all those ugly errors if there are some
    print(e)

## Error 429

If we use the API too much we get the following message:

Error 429: ('Too Many Requests: Request cannot be served ', "due to the application's rate limit having ", 'been exhausted for the resource')
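
One common way to deal with a rate limit is to wait and retry with an increasing delay. The sketch below shows that pattern in isolation. `RateLimitError` and `call_with_backoff` are hypothetical names made up for this example, and how a 429 is detected on a `TwitterSearchException` is left open here.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the library's rate-limit error (hypothetical, for this sketch only)."""


def call_with_backoff(fetch, max_retries=3, base_delay=60.0, sleep=time.sleep):
    """Call fetch(); on a rate-limit error wait and retry with a doubling delay."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(delay)
            delay *= 2
```

In this notebook, `fetch` could be a small function that runs the search and collects the tweets, raising `RateLimitError` when the API reports error 429.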

In [None]:
# Get all tweet data and store it in a list (one JSON-like dict per tweet)
# this can take some time depending on the amount of tweets your search has hit
tweets = []
for tweet in ts.search_tweets_iterable(tso):
    #print( tweet)
    tweets.append(tweet)

# so how many tweets did we get?
len(tweets)

In [None]:
# Well, this is what a tweet looks like when all items are loaded
pprint.pprint(tweets[0])

In [None]:
# if we want to store our results in a flat list
tweets_list = []

# this is where the fun actually starts :)
for tweet in tweets:
    #print( '@%s | %s | %s | %s | %s | %s | %s' % ( tweet['user']['screen_name'], tweet['user']['created_at'], tweet['user']['followers_count'], tweet['user']['location'], tweet['id'], tweet['text'], tweet['created_at'] ) )
    # create a list of wanted items from the captured tweet
    l = ['@'+str(tweet['user']['screen_name']), tweet['user']['created_at'], tweet['user']['followers_count'], tweet['user']['location'], tweet['id'], tweet['text'], tweet['created_at'], tweet['user']['entities']]
    # append the row to the flat list
    tweets_list.append(l)
    
# so how many tweets did we get?
len(tweets_list)
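
The loop above indexes each tweet directly, so a tweet that lacks one of the keys would raise a `KeyError`. As a sketch of a more forgiving variant, the hypothetical helper `tweet_to_row` below extracts the same fields with `dict.get` and fills in `None` where a key is missing. The sample tweet is made up for illustration.

```python
def tweet_to_row(tweet):
    """Flatten one tweet dict into the fields used above, with None for missing keys."""
    user = tweet.get("user", {})
    return {
        "screen_name": "@" + str(user.get("screen_name", "")),
        "user_created_date": user.get("created_at"),
        "followers": user.get("followers_count"),
        "location": user.get("location"),
        "tweet_id": tweet.get("id"),
        "tweet_text": tweet.get("text"),
        "tweet_created_at": tweet.get("created_at"),
        "entities": user.get("entities"),
    }


# a made-up tweet with only a few fields filled in
sample = {"id": 1, "text": "hello", "user": {"screen_name": "someone", "followers_count": 10}}
row = tweet_to_row(sample)
```

A list of such dictionaries can be passed to `pd.DataFrame` as is, with the dictionary keys becoming the column names.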

In [None]:
# we can transform the list into a pandas dataframe
import pandas as pd

# column names in the dataframe
col = ('screen_name','user_created_date','followers','location','tweet_id','tweet_text','tweet_created_at','entities')

# create dataframe
df = pd.DataFrame(tweets_list, columns=col)

df.head()

In [None]:
# save dataframe as a csv file with pipe separator for later analysis
fname = 'tweets-#oversterfte-20221021.txt'
sep = '|'
df.to_csv(fname,sep=sep)
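
Tweet text can itself contain a `|` or a line break. The sketch below uses the standard-library `csv` module, whose minimal quoting is also the default for `to_csv`, to show that such a field survives a round trip as long as the same separator is used for reading. The example row is made up.

```python
import csv
import io

rows = [
    ["screen_name", "tweet_text"],
    ["@someone", "text with a | pipe\nand a line break"],  # made-up example row
]

# write with '|' as delimiter; fields containing '|' or a newline get quoted
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter="|").writerows(rows)

# read it back with the same delimiter
buffer.seek(0)
restored = list(csv.reader(buffer, delimiter="|"))
```

Opening the file in a tool that ignores the quotes would split such a tweet over several columns or lines, so it is worth reading it back with a proper CSV parser.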

In [None]:
import pandas as pd

df = pd.read_csv('tweets-#oversterfte-20221021.txt', sep='|', index_col=False)
len(df)

In [None]:
# drop the first column, which holds the index written by to_csv
df = df.drop(df.columns[0], axis=1)
df.head()


In [None]:
df.columns

Great, we have now collected some tweets with some metadata, saved them to a file, and loaded that file back in to have a look at what we collected.

### NetworkX
Now we are ready to load this data into a graph. You can be creative about what to map to an edge or a node in your graph.

In [None]:
# scrap board



In [None]:
# load networkX
import networkx as nx

# create an empty (undirected) graph
Gnew = nx.Graph()

# loop over all tweets
for tweet in tweets:
    user = str(tweet['user']['screen_name'])
    tweet_id = tweet['id_str']
    followers = str(tweet['user']['followers_count'])
    entities = tweet.get('entities', {})

    # add the user as a node and connect it to the tweet (user -> tweet)
    Gnew.add_node(user, type="user", followers=followers)
    Gnew.add_edge(user, tweet_id, label="tweet")

    if 'retweeted_status' in tweet:
        # the tweet is a retweet: mark the node as such and link it to the original tweet
        Gnew.add_node(tweet_id, label=tweet['text'], type="retweet", followers=followers)
        Gnew.add_edge(tweet_id, tweet['retweeted_status']['id_str'], label="retweet")
    else:
        # the tweet is not a retweet
        Gnew.add_node(tweet_id, label=tweet['text'], type="tweet", followers=followers)

    # loop over all hashtags
    for hashtag in entities.get('hashtags', []):
        Gnew.add_node(hashtag['text'], label=hashtag['text'], type="hashtag")
        Gnew.add_edge(tweet_id, hashtag['text'], label="hashtag")

    # loop over all media
    for medium in entities.get('media', []):
        Gnew.add_node(medium['display_url'], label=medium['display_url'], type="media")
        Gnew.add_edge(tweet_id, medium['display_url'], label="media")

    # loop over all links
    for link in entities.get('urls', []):
        Gnew.add_node(link['display_url'], label=link['display_url'], type="link")
        Gnew.add_edge(tweet_id, link['display_url'], label="link")

    # loop over all mentions
    for mention in entities.get('user_mentions', []):
        Gnew.add_node(str(mention['screen_name']), type="user")
        Gnew.add_edge(user, str(mention['screen_name']), label="mentions")
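
Before drawing the graph it can help to check what ended up in it. The sketch below counts nodes per `type` attribute. The helper name `count_node_types` and the sample data are made up for illustration; on the real graph you would pass `Gnew.nodes(data=True)`, which yields the same kind of (node, attributes) pairs.

```python
from collections import Counter


def count_node_types(nodes_with_data):
    """Count nodes per 'type' attribute; nodes without one are counted as 'untyped'."""
    return Counter(attrs.get("type", "untyped") for _, attrs in nodes_with_data)


# made-up stand-in for Gnew.nodes(data=True): (node, attribute dict) pairs
sample_nodes = [
    ("someone", {"type": "user", "followers": "10"}),
    ("1", {"type": "retweet", "label": "RT hello"}),
    ("2", {}),  # original tweet, created only as an edge endpoint
    ("oversterfte", {"type": "hashtag", "label": "oversterfte"}),
]
counts = count_node_types(sample_nodes)
```

Nodes without a type are to be expected here: the original of a retweet is only added implicitly as the endpoint of the "retweet" edge, so it carries no attributes unless that tweet is also part of the search results.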
        

## Inline magic

Jupyter notebooks can do magic!

Inline magic commands are used for all kinds of nifty tricks, such as talking to the OS, loading extensions and a lot more.

See [official magic docs](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

Or [this top 5 magic tricks](https://towardsdatascience.com/the-top-5-magic-commands-for-jupyter-notebooks-2bf0c5ae4bb8)

In [None]:
# we need the following inline magic to get matplotlib to plot inside jupyter notebook
%matplotlib inline

# draw graph (can take some time)
pos=nx.spring_layout(Gnew) # positions for all nodes
nx.draw_networkx_nodes(Gnew,pos,node_size=20,node_color='0.5',alpha=0.3) # draws nodes only, so no labels appear
nx.draw_networkx_edges(Gnew,pos,edge_color='0.5', alpha=0.7)

In [None]:
# hmmm, not very informative yet, maybe we should switch to something else
# let's export this graph so Gephi can read it.

## Gephi

We can export the graph in GEXF format and use Gephi to make a beautiful visualization.

In [None]:
# save graph
nx.write_gexf(Gnew,'tweets-#oversterfte-20221021.gexf')
print('done')

Now we can fine-tune and visualize the graph with [Gephi](https://gephi.org/).