Microblogging-Clustering-ML

We present a text representation framework that harnesses semantic knowledge bases, namely Wikipedia and WordNet. Texts that are originally uncorrelated become connected through a shared semantic representation, which improves the performance of short-text clustering and labelling. Experimental results on a Twitter dataset demonstrate the framework's superior performance in handling noisy, short microblogging messages.

The feature space is processed with unsupervised machine learning techniques, which look for hidden structure in unlabelled data. We use K-Means clustering, a popular method for cluster analysis in data mining. Each resulting cluster is labelled with the word that has the highest informative score among the tweets in that cluster, as computed by the RAKE (Rapid Automatic Keyword Extraction) algorithm.
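To make the clustering step concrete, here is a minimal, dependency-free sketch of K-Means over term-count vectors. It is not the code from this repository: the function names (`vectorize`, `initial_centroids`, `kmeans`), the whitespace tokenizer, and the deterministic farthest-first initialization are all assumptions chosen to keep the example small and reproducible. The project itself clusters semantically enriched features, not raw word counts.

```python
from collections import Counter


def vectorize(texts):
    """Turn each text into a term-count vector over a shared, sorted vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(vectors, k):
    """Farthest-first initialization (an assumption made here for determinism)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; returns one cluster index per input vector."""
    centroids = initial_centroids(vectors, k)
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


tweets = [
    "football match goal team",
    "football team win goal",
    "rain storm weather cloud",
    "weather rain cloud cold",
]
_, vectors = vectorize(tweets)
print(kmeans(vectors, k=2))  # → [0, 0, 1, 1]
```

The two football tweets land in one cluster and the two weather tweets in the other because tweets within a topic share most of their words. Short tweets on the same topic often share no words at all, which is the gap the semantic mapping step is meant to close.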

Dataset

The dataset is extracted with Twitter’s Search API, which is part of Twitter’s REST API. It works like Twitter’s own search feature and searches against tweets published in the past 7 days. The Search API is optimised for relevance, not completeness, so some users and tweets may be missing from the search results. When completeness and real-time retrieval matter, the Streaming API is preferred.
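As a rough illustration of what a Search API request looks like, the sketch below only builds the request URL and sends nothing. It assumes the v1.1 standard search endpoint (`search/tweets.json`) and its `q`, `count`, `lang` and `result_type` parameters. The helper name `build_search_url` is hypothetical, and a real request would also need OAuth credentials, which are omitted here.

```python
from urllib.parse import urlencode

# Assumed endpoint: the v1.1 standard search API, which covers roughly the last 7 days.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=100, lang="en", result_type="recent"):
    """Return the GET URL for one page of search results (hypothetical helper)."""
    params = {
        "q": query,                  # search terms, hashtags, or operators
        "count": count,              # the standard endpoint caps this at 100 per request
        "lang": lang,                # restrict results to one language
        "result_type": result_type,  # "recent", "popular" or "mixed"
    }
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("#machinelearning"))
```

`urlencode` percent-encodes the query, so the `#` in a hashtag is sent as `%23` and is not mistaken for a URL fragment.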

Project Flow

Syntactic Decomposition
Semantic Mapping using knowledge bases like WordNet and Wikipedia
Clustering using K-Means
Labelling using RAKE
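The labelling step can be sketched as follows. This is a simplified, self-contained take on RAKE, not this repository's implementation: the stopword list is a tiny illustrative one, and the names `candidate_phrases`, `rake_scores` and `label_cluster` are assumptions. Candidate phrases are split at stopwords and punctuation, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list; real RAKE implementations use a much larger one.
STOPWORDS = {
    "a", "an", "and", "the", "is", "are", "in", "of", "on",
    "to", "for", "with", "at", "this", "that", "it",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s#@']+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_scores(texts):
    """Score every candidate phrase found in a collection of texts."""
    phrases = [p for t in texts for p in candidate_phrases(t)]
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, itself included
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in phrases}


def label_cluster(tweets):
    """Pick the highest-scoring phrase as the cluster label."""
    scores = rake_scores(tweets)
    return max(scores, key=scores.get)


cluster = [
    "Severe thunderstorm warning for the coast",
    "Thunderstorm warning in the city",
]
print(label_cluster(cluster))  # → severe thunderstorm warning
```

In the example, "severe" appears only inside a three-word phrase and so scores 3.0, while "thunderstorm" and "warning" each score 2.5, giving the full phrase a score of 8.0. Longer phrases of co-occurring content words tend to win, which is why RAKE labels are usually multi-word.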

Contribution

Feel free to contribute and suggest new techniques to make this project better.

Project Contributors

Rishabh Gupta

Sachin Agarwal

Shreynik Kumar

Vanshaj Behl

We hope this project helps you get started in the world of machine learning.
