This Exploratory Project involves building a Hashtag Recommendation System for Twitter.
This project was carried out as part of the Exploratory Project in semester IV of the 2015-16 academic session by Deepak Yadav and Faizan Khan, Department of Computer Science and Engineering, IIT (BHU), under the guidance of Associate Professor Dr. S. K. Singh. Its primary aim was to help students apply their skills to a real-world project and gain practical experience.
The main details of the project are as follows:
Hashtags give users a tagging mechanism that helps them organize and group their posts and make them more visible. The idea is simple, but choosing good hashtags can be challenging in practice, which leads to infrequent usage. Hashtag recommendation itself faces several challenges, including processing huge volumes of streaming data and handling content that is short and noisy. We use preprocessing methods to reduce the noise in the data and then look for an effective hashtag recommendation method that produces better recommendations.
The objective is to build a Twitter hashtag recommendation system that recommends related hashtags to the user based on his/her input.
- Domain: Tweets retrieved with the Twitter API
- Purpose: Recommend similar hashtags to users
- Personalization level: Generic
- Interfaces: Explicit input, recommended output
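The recommendation method itself is not described here. As one possible illustration, the sketch below assumes a simple co-occurrence approach: hashtags that often appear together in the same tweet are treated as related, and the user's input hashtags are scored against everything they co-occur with. The function names `build_cooccurrence` and `recommend` are hypothetical, not taken from the project.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(tweets_hashtags):
    """Count how often each pair of hashtags appears in the same tweet.

    tweets_hashtags: iterable of hashtag lists, one list per tweet.
    """
    cooc = defaultdict(Counter)
    for tags in tweets_hashtags:
        # Normalise case and drop duplicates within a single tweet
        unique = sorted(set(t.lower() for t in tags))
        for a, b in combinations(unique, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, query_tags, k=5):
    """Rank hashtags by total co-occurrence with the user's input tags."""
    query = set(t.lower() for t in query_tags)
    scores = Counter()
    for q in query:
        scores.update(cooc.get(q, {}))  # .get avoids creating empty entries
    for q in query:
        scores.pop(q, None)  # never recommend the input hashtags themselves
    # Highest score first; ties broken alphabetically for a stable order
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


tweets = [["python", "ML"], ["python", "datascience"],
          ["ml", "datascience", "python"], ["java"]]
cooc = build_cooccurrence(tweets)
print(recommend(cooc, ["python"]))  # ['datascience', 'ml']
```

Raw co-occurrence counts favour very popular hashtags; a real system might normalise the scores (for example by hashtag frequency), which this sketch leaves out for clarity.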
We would be more than happy to improve upon this basic project, and any advice is appreciated. Here is how you can contribute to the project:
- The easiest way to contribute is to point out issues.
- Send us PRs with what you think can be enhanced in the project.
- Contact us (see below) with the features you would like to see in the project and are willing to contribute to.
The basic requirements of the project are as follows (note: the project was developed with the versions below and may also work with earlier ones):
- Python v2.7
- HTMLParser
- Tkinter module
- R v3.2.2
The recommender uses the Twitter API to obtain the raw tweets for preprocessing. To obtain a large volume of data, the Streaming API is used. This tutorial gives a very basic introduction to using a Python library called Python Twitter Tools to connect to the Twitter API and download tweets.
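Connecting to the Streaming API needs credentials and is covered by the linked tutorial, so it is not repeated here. Once messages arrive, each one has to be reduced to its text and hashtags. The sketch below assumes every message is either a JSON string or an already-decoded dict in the Twitter API v1.1 tweet format, where hashtags are listed under `entities` → `hashtags`; the function name `extract_tweet` is hypothetical.

```python
import json


def extract_tweet(raw):
    """Return (text, hashtags) for a tweet, or None for non-tweet messages.

    raw: a JSON string or a decoded dict, assumed to follow the
    Twitter API v1.1 tweet format ("text" plus "entities" -> "hashtags").
    """
    tweet = raw if isinstance(raw, dict) else json.loads(raw)
    if "text" not in tweet:
        # The stream also carries delete/limit notices, which have no text
        return None
    hashtags = [h["text"].lower()
                for h in tweet.get("entities", {}).get("hashtags", [])]
    return tweet["text"], hashtags


sample = ('{"text": "Learning #Python and #ML", "entities": {"hashtags": '
          '[{"text": "Python", "indices": [9, 16]}, '
          '{"text": "ML", "indices": [21, 24]}]}}')
print(extract_tweet(sample))  # ('Learning #Python and #ML', ['python', 'ml'])
```

Tweets longer than 140 characters may carry their full text and hashtags in a separate `extended_tweet` object, which this sketch ignores.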
The tweets are cleaned in several steps before processing:
- This infographic provides some of the basic details about cleaning tweet text.
- Most of the cleaning is done using Regular Expressions in Python 2/3 and R. This tutorial from Tutorials Point may be helpful.
- The stop words to remove from the tweets may be found here.
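A minimal Python sketch of such a cleaning step is shown below (the R side is not covered). The regular expressions and the tiny stop-word list are illustrative assumptions, not the project's own; decoding HTML entities such as `&amp;` is also an assumption, suggested by the HTMLParser requirement. URLs are removed before hashtags are extracted so that a `#` fragment inside a link is not mistaken for a hashtag.

```python
import re

try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7
    unescape = HTMLParser().unescape

# Hypothetical, deliberately tiny stop-word list; a real list is much longer
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of",
              "in", "on", "for", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # anything but ASCII letters/digits/space


def clean_tweet(text):
    """Return (tokens, hashtags) after basic noise removal."""
    text = unescape(text).lower()        # "&amp;" -> "&", then lowercase
    text = URL_RE.sub(" ", text)         # drop links first
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    hashtags = HASHTAG_RE.findall(text)  # keep hashtags separately
    text = HASHTAG_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)   # strip punctuation and symbols
    tokens = [w for w in text.split() if w not in STOP_WORDS]
    return tokens, hashtags


print(clean_tweet("RT @nasa: Watch the #Launch &amp; #SpaceX live! https://t.co/abc123"))
# (['watch', 'live'], ['launch', 'spacex'])
```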
The app uses a very basic interface developed using the Python Tkinter module. This basic tutorial may be helpful.
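A minimal sketch of such an interface: an entry box for the user's hashtags, a button, and a label that shows the suggestions. `parse_input` and `run_app` are hypothetical names, and `recommend_fn` stands for any function that maps a list of hashtags to a list of suggested hashtags. Tkinter is imported inside `run_app` so that the input-parsing helper can be used without a display.

```python
def parse_input(raw):
    """Split user input such as '#Python, ml' into normalised hashtags."""
    tags = []
    for part in raw.replace(",", " ").split():
        tag = part.lstrip("#").lower()
        if tag and tag not in tags:  # skip empty parts and duplicates
            tags.append(tag)
    return tags


def run_app(recommend_fn):
    """Open a minimal window: an entry box, a button and a results label."""
    try:
        import tkinter as tk  # Python 3
    except ImportError:
        import Tkinter as tk  # Python 2.7

    root = tk.Tk()
    root.title("Hashtag Recommender")

    entry = tk.Entry(root, width=40)
    entry.pack(padx=10, pady=5)
    result = tk.StringVar()

    def on_click():
        suggestions = recommend_fn(parse_input(entry.get()))
        result.set(" ".join("#" + t for t in suggestions) or "No suggestions")

    tk.Button(root, text="Recommend", command=on_click).pack(pady=5)
    tk.Label(root, textvariable=result).pack(padx=10, pady=5)
    root.mainloop()  # blocks until the window is closed
```

Calling `run_app(lambda tags: ["example"] if tags else [])` opens the window with a placeholder recommender; a real one would be passed in its place.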
Please contact us regarding any issues and/or suggestions.
Department of Computer Science and Engineering IIT(BHU), Varanasi, India