Tweets Clustering for User Profiling

Main Authors:

This repository contains a Spark pipeline that parses and analyzes a Twitter dataset and clusters the tweets. The goal is to extract user profiles and insights about the communities present on the social network.
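The general idea of clustering tweets and then profiling users from the clusters can be illustrated with the pure-Python sketch below. It is not the repository's Spark implementation. It runs plain k-means (Lloyd's algorithm) on L2-normalized bag-of-words vectors and describes each user by the fraction of their tweets that falls into each cluster. The function names, the whitespace tokenization and the "first k tweets as initial centroids" choice are assumptions made to keep the example small. In Spark, the `KMeans` estimator from `pyspark.ml.clustering` would take the place of the hand-written `kmeans` function.

```python
import math
from collections import Counter, defaultdict


def to_vector(text):
    """Bag-of-words vector (term -> weight), L2-normalized so tweet length does not dominate."""
    counts = Counter(text.lower().split())  # assumption: naive whitespace tokenization
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def squared_distance(u, v):
    """Squared Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys)


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; the first k vectors are used as initial centroids (assumption)."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no tweet changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its assigned vectors.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if not members:
                continue  # keep the previous centroid if a cluster becomes empty
            total = defaultdict(float)
            for v in members:
                for term, w in v.items():
                    total[term] += w
            centroids[j] = {term: w / len(members) for term, w in total.items()}
    return assignment, centroids


def user_profiles(tweets, k):
    """tweets: list of (user, text) pairs. Returns user -> share of their tweets per cluster."""
    vectors = [to_vector(text) for _, text in tweets]
    assignment, _ = kmeans(vectors, k)
    counts = defaultdict(lambda: [0] * k)
    for (user, _), cluster in zip(tweets, assignment):
        counts[user][cluster] += 1
    return {user: [c / sum(row) for c in row] for user, row in counts.items()}
```

For example, with two users who tweet about clearly different topics, each user ends up concentrated in a single cluster:

```python
tweets = [
    ("alice", "spark cluster spark"),
    ("bob", "football match tonight"),
    ("alice", "spark mllib cluster"),
    ("bob", "football goal tonight"),
]
profiles = user_profiles(tweets, k=2)  # {'alice': [1.0, 0.0], 'bob': [0.0, 1.0]}
```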

Installation

This project requires Python 3 and pip3. You can install the required packages by running the following command in your terminal:

pip3 install -r requirements.txt

Usage

The repository is organized into the following directories:

data: contains a Twitter miner, which can be used to mine tweets directly from the Twitter API, and some mock Twitter data;
model: contains the model used for this work (scikit-learn and Spark);
preprocessing: contains the utilities used to clean the dataset and perform feature extraction;
scripts: contains some utility scripts used for testing.
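The kind of cleaning and feature extraction the `preprocessing` directory is described as providing can be sketched in plain Python. This is a hypothetical illustration, not the repository's code. `clean_tweet`, `tf_idf`, the regular expressions and the tiny stop-word list are all assumptions. The IDF factor follows the formula documented for Spark MLlib's `IDF`, log((m + 1) / (df + 1)), where m is the number of documents and df is the number of documents containing the term. As with Spark's `HashingTF`, raw term counts are used as the TF factor.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (includes the retweet marker "rt").
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of", "in", "on", "for", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")


def clean_tweet(text):
    """Lowercase a tweet, drop URLs, @mentions and punctuation, and return its tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Keep hashtags as ordinary words by stripping the leading '#'.
    tokens = [tok.lstrip("#") for tok in text.split()]
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per token list, using raw term counts as TF."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term at most once per document
    return [
        {term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
         for term, count in Counter(tokens).items()}
        for tokens in documents
    ]
```

For instance, `clean_tweet("RT @alice: Loving #ApacheSpark! https://t.co/xyz")` returns `["loving", "apachespark"]`. In `tf_idf`, a term that appears in every document gets a weight of 0, because log((m + 1) / (m + 1)) = 0.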

Contributing

This project is no longer maintained. It serves only as a showcase of several features of Apache Spark and MLlib.

License

MIT

geektoni/twitter_user_profiling

Folders and files

Latest commit

History

Repository files navigation

Tweets Clustering for User Profiling

Installation

Usage

Contributing

License

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages