Clustering of tweets to build user profiles using Spark MLlib.

geektoni/twitter_user_profiling

Tweets Clustering for User Profiling

Main Authors:

This repository contains a Spark pipeline that parses and analyzes a Twitter dataset and clusters the tweets in order to extract user profiles and insights about the communities present on the social network.

Installation

This project requires Python 3 and pip3. Install the required packages by running the following command in your terminal:

pip3 install -r requirements.txt

Usage

The repository is organized into the following directories:

  • data: contains a Twitter miner that collects tweets directly from the Twitter API, plus some mock Twitter data;
  • model: contains the models used for this work (scikit-learn and Spark);
  • preprocessing: contains the utilities used to clean the dataset and perform feature extraction;
  • script: contains utilities used to test the whole pipeline.
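
To give a feel for how the preprocessing and model steps fit together, here is a minimal sketch that cleans a few tweets, extracts TF-IDF features, and clusters them with k-means. It uses scikit-learn rather than Spark so that it runs without a cluster. The names `clean_tweet` and `cluster_tweets`, the sample tweets, and all parameter values are assumptions made for illustration; they are not taken from the repository, whose actual code may differ.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions and punctuation (illustrative rules)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop '#' and other punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def cluster_tweets(texts, n_clusters=2, seed=0):
    """Return one cluster label per tweet using TF-IDF features and k-means."""
    cleaned = [clean_tweet(t) for t in texts]
    features = TfidfVectorizer(stop_words="english").fit_transform(cleaned)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)


# Hypothetical sample tweets, not the repository's mock data.
tweets = [
    "Spark makes distributed machine learning easier https://t.co/abc",
    "Training a clustering model with #Spark MLlib",
    "Great football match tonight, what a goal! @fan",
    "The football team scored a late goal",
]

labels = cluster_tweets(tweets)
for label, tweet in zip(labels, tweets):
    print(label, clean_tweet(tweet))
```

A Spark version would follow the same shape, with the cleaning applied to a DataFrame column and the feature extraction and clustering expressed as MLlib pipeline stages.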

Contributing

This project is no longer maintained. It serves only as a showcase of several features of Apache Spark and MLlib.

License

MIT