NetStream

Given a hypothetical network with a specific structure (defined below), netstream listens to the Twitter public stream for actions that are judged to constitute interactions between members of that model network.

The application I have in mind for this tool is to aggregate the retrieved data to gain a view on the ways in which individuals can, through their varied interests, connect up otherwise disparate sets of ideas. However, by gathering data at the disaggregated tweet level, I hope to make it useful to other projects as well.

Network definition

A network is defined by:

A set of people (nodes) who can each be identified by a (possibly unit) set of strings
Interactions between people (edges), for a given definition of 'interaction'

Note that the network defined above is distinct from the social network constituted by Twitter itself. Within this package, Twitter data is a source of evidence used to relate individuals in the model network to one another, but in principle additional sources of evidence (e.g. publication histories) could also be incorporated as an extension to this work.

Details of what constitutes an interaction are given below.
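As a rough illustration of the definition above, the model network could be held in a structure like the following. This is a minimal sketch, not the package's actual data model: the class names, the `kind` label, and the example people and handles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    """A node: one individual and the strings that identify them."""
    name: str
    identifiers: frozenset  # handles or name strings; may hold a single entry


@dataclass(frozen=True)
class Interaction:
    """An edge from source to target, labelled with the kind of interaction."""
    source: str
    target: str
    kind: str


@dataclass
class Network:
    people: dict = field(default_factory=dict)        # name -> Person
    interactions: list = field(default_factory=list)  # observed edges

    def add_person(self, name, *identifiers):
        self.people[name] = Person(name, frozenset(identifiers))

    def add_interaction(self, source, target, kind):
        # Only record edges between people already in the model network.
        if source not in self.people or target not in self.people:
            raise KeyError("both ends of an interaction must be network members")
        self.interactions.append(Interaction(source, target, kind))


# Hypothetical members and handles, for illustration only.
network = Network()
network.add_person("Smarty Pants", "@smartypants", "Prof. Smarty Pants")
network.add_person("Ada Example", "@ada_example")
network.add_interaction("Smarty Pants", "Ada Example", "mention")
```

Keeping people and interactions separate reflects the point made above: the nodes are fixed by the model, while the edges are whatever the evidence supports.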

Project structure

netstream uses the tweepy library for accessing the Twitter API, and a PostgreSQL database to store relevant detected tweets. It is broken down into two modules: trawl.py and netprocess.py.

`trawl.py`

trawl.py pulls potentially interesting data out of the public stream into the local database. It errs on the side of caution when selecting which tweets to keep, insofar as it will store all tweets, favourites and likes coming from network members. While many of these activities won't correspond to interactions between network members as initially defined, they may, for example, help indicate new individuals who may be interesting to add to the network for subsequent experiments. This behaviour is predicated on the assumption that the number of individuals in the model network is small enough that they don't, between them, generate much more than about 1,000 actions per day. If this proves to be unrealistic, the 'generous' data storage behaviour may need to be reviewed.
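The 'store everything' behaviour might look roughly like the sketch below. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs without a database server; the project itself uses PostgreSQL. The table name, column names and the `store_action` helper are assumptions made for illustration.

```python
import json
import sqlite3

# sqlite3 is a stand-in so the sketch is runnable; the project uses PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raw_actions (
        tweet_id TEXT NOT NULL,
        actor    TEXT NOT NULL,   -- network member who performed the action
        action   TEXT NOT NULL,   -- e.g. 'tweet', 'retweet', 'like'
        payload  TEXT NOT NULL,   -- full JSON, kept for later analysis
        PRIMARY KEY (tweet_id, actor, action)
    )
    """
)


def store_action(conn, tweet_id, actor, action, payload):
    """Store one captured action; silently skip exact repeats."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO raw_actions VALUES (?, ?, ?, ?)",
            (tweet_id, actor, action, json.dumps(payload)),
        )


# Hypothetical data: one tweet delivered twice, then a like of that tweet.
store_action(conn, "1", "smartypants", "tweet", {"text": "Hello from Hull"})
store_action(conn, "1", "smartypants", "tweet", {"text": "Hello from Hull"})
store_action(conn, "1", "ada_example", "like", {})
count = conn.execute("SELECT COUNT(*) FROM raw_actions").fetchone()[0]
```

Storing the whole payload as JSON keeps the capture step simple and leaves interpretation to later processing, which suits the low expected volume.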

`netprocess.py`

netprocess.py contains the logic for sifting through tweets captured by trawl.py to identify network interactions. Based on its configuration, it will identify and categorise interactions between nodes, posting its results to derived database tables. The idea is that trawl.py and netprocess.py run asynchronously in separate processes so that, at any time, another user (such as a web server) could log into the database and retrieve up-to-date information about network interactions.

Evidence of network interactions

Identifying an individual

A tweet is deemed evidence relating to a specific individual (and will be captured in the database by trawl.py) if one of the following conditions holds:

It was sent from a handle known to belong to an individual in the model network
A known handle liked the tweet
A known handle retweeted the tweet
A known handle is mentioned in the tweet
The tweet is in reply to a known handle
The tweet contains one of a set of regular expressions identifying the individual (such as their surname). In some cases, such regular expressions will only match if an additional string defining some sort of context is also found in the tweet. For example, we might always match "Prof. Smarty Pants" but only match "Prof. Pants" if we also match "Hull" (where Prof. Smarty Pants is known to be a lecturer).
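The last condition, a regular expression that may depend on a context string, could be expressed along these lines. This is a sketch under assumptions: the `NameRule` class, the rule list, and the choice of case-insensitive matching are not taken from the package.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRule:
    """A name pattern, optionally valid only alongside a context pattern."""
    pattern: str
    context: Optional[str] = None

    def matches(self, text):
        if not re.search(self.pattern, text, flags=re.IGNORECASE):
            return False
        if self.context is None:
            return True  # unconditional rule
        # Conditional rule: the context must appear somewhere in the same tweet.
        return re.search(self.context, text, flags=re.IGNORECASE) is not None


# Rules for the example individual used above.
RULES = [
    NameRule(r"Prof\.\s+Smarty\s+Pants"),
    NameRule(r"Prof\.\s+Pants", context=r"\bHull\b"),
]


def mentions_individual(text, rules=RULES):
    """True if any rule for the individual matches the tweet text."""
    return any(rule.matches(text) for rule in rules)
```

With these rules, "Prof. Pants spoke in Hull" is treated as evidence, while "Prof. Pants spoke in Leeds" is not.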

All such tweets will be recorded in the database. It will be assumed that any pruning of 'noisy' irrelevant tweets will be performed on the database as part of a manual process including exploratory data analysis.

Defining an interaction

The following scenarios are considered evidence of an interaction between individual $i$ and individual $j$:

A single tweet contains evidence relating to both $i$ and $j$ (as defined above).
$i$ mentions $j$ in a tweet
$i$ replies to a tweet by $j$
$i$ likes a tweet by $j$
$i$ retweets a tweet by $j$
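The five scenarios above can be sketched as a single function over one captured tweet. The dict keys (`author`, `evidence_for`, `mentions`, `in_reply_to`, `liked_by`, `retweeted_by`) and the interaction labels are hypothetical, and the sketch assumes that handles have already been resolved to network members.

```python
from itertools import combinations


def find_interactions(tweet):
    """Return a set of (i, j, kind) tuples for one captured tweet."""
    found = set()
    author = tweet.get("author")  # network member who wrote the tweet, or None

    # Evidence for both i and j in one tweet: undirected, so pairs are sorted.
    for i, j in combinations(sorted(tweet.get("evidence_for", ())), 2):
        found.add((i, j, "co-evidence"))

    if author is None:
        return found  # the remaining rules need a known author

    # Mentions and replies run from the author towards the other member.
    for j in tweet.get("mentions", ()):
        if j != author:
            found.add((author, j, "mention"))
    target = tweet.get("in_reply_to")
    if target is not None and target != author:
        found.add((author, target, "reply"))

    # Likes and retweets run from the acting member towards the author.
    for i in tweet.get("liked_by", ()):
        if i != author:
            found.add((i, author, "like"))
    for i in tweet.get("retweeted_by", ()):
        if i != author:
            found.add((i, author, "retweet"))
    return found


# Hypothetical tweet: B mentions A, and A then likes it.
example = {
    "author": "B",
    "evidence_for": {"A", "B"},
    "mentions": {"A"},
    "liked_by": {"A"},
}
edges = find_interactions(example)
```

Returning a set means that a single tweet contributes each kind of interaction between a given pair at most once.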

These rules will be implemented in netprocess.py to update the database with information describing network interactions.

Implementation details

Tweet object

The Tweet object should include methods for deciding whether a tweet should be retained and for reporting afterwards which rule(s) led to its retention. It should also provide access to underlying data such as the status.
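One possible shape for such an object is sketched below. It covers only the handle-based rules, and the payload keys (`author`, `mentions`, `in_reply_to`) and rule names are assumptions for illustration rather than Twitter API fields or the package's actual design.

```python
class Tweet:
    """Wraps one status payload and reports why it would be retained."""

    def __init__(self, status, known_handles):
        self.status = status  # raw payload as a dict, exposed for later use
        self.known_handles = {h.lower() for h in known_handles}

    def _known(self, handle):
        return handle is not None and handle.lower() in self.known_handles

    def retention_rules(self):
        """List the names of every rule this tweet satisfies."""
        rules = []
        if self._known(self.status.get("author")):
            rules.append("sent_by_known_handle")
        if any(self._known(h) for h in self.status.get("mentions", [])):
            rules.append("mentions_known_handle")
        if self._known(self.status.get("in_reply_to")):
            rules.append("reply_to_known_handle")
        return rules

    def should_retain(self):
        """A tweet is kept if at least one rule applies."""
        return bool(self.retention_rules())


# Hypothetical payload and handle list.
tweet = Tweet(
    {"author": "someone_else", "mentions": ["SmartyPants"], "text": "Nice talk"},
    known_handles=["smartypants"],
)
```

Deriving `should_retain` from `retention_rules` keeps the two answers consistent: a tweet is retained exactly when at least one rule can be reported for it.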
