Twitter Clustering Project

This is a collection of scripts that apply the DBSCAN clustering algorithm to the list of accounts a Twitter user follows (their "Followed" list).

get_followers.py and get_complete_follow_graph.py are for collecting the Twitter data, and process_data.jl is for processing it into clusters.
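To make the clustering idea concrete, here is a minimal pure-Python sketch of DBSCAN over follow lists. It is not a port of process_data.jl: the Jaccard distance between follow sets, the function names, and the `eps` / `min_pts` values are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set

NOISE = -1  # label for accounts that end up in no cluster


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(follows: Dict[str, Set[str]], eps: float, min_pts: int) -> Dict[str, int]:
    """Return a cluster id (0, 1, ...) or NOISE for every account.

    `follows` maps an account name to the set of accounts it follows.
    The distance metric is an assumption, not taken from the project.
    """
    names = list(follows)

    def neighbors(name: str) -> List[str]:
        # Includes `name` itself, as in the standard DBSCAN definition.
        return [o for o in names
                if jaccard_distance(follows[name], follows[o]) <= eps]

    labels: Dict[str, int] = {}
    cluster = -1
    for name in names:
        if name in labels:
            continue
        nbrs = neighbors(name)
        if len(nbrs) < min_pts:
            labels[name] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[name] = cluster
        queue = [n for n in nbrs if n != name]
        while queue:
            other = queue.pop()
            if labels.get(other) == NOISE:
                labels[other] = cluster  # border point: join, do not expand
                continue
            if other in labels:
                continue  # already assigned to a cluster
            labels[other] = cluster
            other_nbrs = neighbors(other)
            if len(other_nbrs) >= min_pts:
                queue.extend(other_nbrs)  # core point: keep expanding
    return labels
```

Accounts whose follow sets overlap heavily land in the same cluster, while an account with no close neighbours is labelled `NOISE`. The neighbour search here is quadratic, which is fine for a sketch but slow for large follow graphs.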

Usage

This project depends on the Python packages listed in requirements.txt. Install them system-wide or in a virtualenv.

The workflow takes three steps:

1. Run get_followers.py with three CLI arguments: the username of the account you wish to examine, your own Twitter username, and your Twitter password. Redirect its output into a file with a name of your choosing.
2. Run get_complete_follow_graph.py with three CLI arguments: the file holding the output of step 1, your Twitter username, and your Twitter password. This collects the follow list of each account listed in that file and writes them to the data/ folder.
3. Run process_data.jl with one CLI argument: the name of a data folder (data/ itself, or another folder if you have moved the output of step 2 somewhere else). This outputs the clusters.
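The three steps can be chained from a small driver script. The sketch below is hypothetical and not part of the repository: it assumes the Python scripts are run with the current Python interpreter, that a `julia` executable is on your PATH, and that the arguments are passed in the order described in the steps. The function names and the default `data/` argument are illustrative.

```python
import subprocess
import sys
from typing import List


def build_commands(target: str, user: str, password: str,
                   followers_file: str, data_dir: str = "data/") -> List[List[str]]:
    """Return the argv lists for the three steps, in order.

    Argument order follows the README; the interpreters used to launch
    the scripts (this Python, `julia` on PATH) are assumptions.
    """
    return [
        [sys.executable, "get_followers.py", target, user, password],
        [sys.executable, "get_complete_follow_graph.py",
         followers_file, user, password],
        ["julia", "process_data.jl", data_dir],
    ]


def run_pipeline(target: str, user: str, password: str,
                 followers_file: str, data_dir: str = "data/") -> None:
    """Run all three steps, stopping at the first one that fails."""
    collect, crawl, cluster = build_commands(
        target, user, password, followers_file, data_dir)
    # Step 1 prints to stdout, so capture that output in the chosen file.
    with open(followers_file, "w") as out:
        subprocess.run(collect, stdout=out, check=True)
    # Step 2 reads that file and writes follow lists to the data folder.
    subprocess.run(crawl, check=True)
    # Step 3 clusters whatever is in the data folder.
    subprocess.run(cluster, check=True)
```

A call such as `run_pipeline("some_account", "my_username", "my_password", "followed.txt")` would then run the whole workflow. Note that a password passed as a command-line argument is visible to other users in the process list on a shared machine.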

Current Issues

Workflow needs improvement
The Selenium script cannot run under PhantomJS, so it annoyingly opens Firefox browser windows instead.
