
matsujju/Multi_Tag_Prediction


This repository contains the files for an end-to-end project (from scraping to web app). The app is built with Dash, the interactive Python framework developed by Plotly. Dash is a simple and effective way to wrap a user interface around Python code.

Overview -

  • The project is an end-to-end WebApp for interactively visualizing a machine learning model that predicts multiple tags for a given text input.
  • The project also includes the preprocessing and model-building steps as Jupyter Notebooks, covering everything from basic text cleaning to building an accurate model.
  • It also includes text data from the StackOverflow website, collected with Scrapy spiders that scraped 200,000+ questions and answers. The data can be found here.
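The notebooks' exact cleaning steps are not reproduced in this README. As a rough sketch of what basic cleaning of scraped programming questions can look like, the function below strips leftover HTML, lowercases, tokenizes while keeping tag-style characters (`#`, `+`, `.`), and drops stopwords. `clean_text` and the tiny `STOPWORDS` set are hypothetical; the set stands in for NLTK's English stopword list so the example runs without a corpus download.

```python
import re

# Hypothetical stopword set, standing in for NLTK's English stopwords corpus
STOPWORDS = {"a", "an", "the", "is", "in", "to", "how", "do", "i", "of", "and", "with", "my"}


def clean_text(text: str) -> list[str]:
    """Strip HTML tags, lowercase, tokenize, and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)  # remove HTML tags left over from scraping
    text = text.lower()
    # Keep letters, digits and characters common in tag names (c++, c#, .net)
    tokens = re.findall(r"[a-z0-9#+.]+", text)
    # Drop trailing dots (sentence ends) but keep leading/inner ones (.net, node.js)
    tokens = [t.rstrip(".") for t in tokens]
    return [t for t in tokens if t and t not in STOPWORDS]


print(clean_text("<p>How do I reverse a list in Python?</p>"))
print(clean_text("Using C++ and C# with .NET."))
```

Keeping `#`, `+` and `.` in the token pattern matters for this domain: a generic "letters only" tokenizer would turn `c++`, `c#` and `c` into the same token.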

Getting Started -

Running the app locally

First create a virtual environment (with venv as shown below, or with conda) inside a temp folder, then activate it:

python -m venv venv

# Windows
venv\Scripts\activate
# Or Linux / macOS
source venv/bin/activate

Clone the Git repo, then install the requirements with pip:

git clone https://github.com/matsujju/Multi_Tag_Prediction.git
cd Multi_Tag_Prediction/        # run from inside temp_folder (here Desktop/temp_folder); adjust the path if yours differs
pip install -r requirements.txt

Run the app (from your terminal)

python dash_app.py

Open a browser at http://127.0.0.1:8050

About the app -

This WebApp predicts tags for the user's input, provided the input is related to computer programming. It lets the user choose which preprocessing functions to apply, the number of tags, and the threshold value to experiment with.
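The app's own prediction code is not shown here. One plausible way the "number of tags" and "threshold" controls can interact is sketched below, assuming the model produces a probability per tag (for example through a scikit-learn classifier's `predict_proba`). `select_tags` and the sample probabilities are hypothetical.

```python
def select_tags(probs: dict[str, float], top_k: int = 3, threshold: float = 0.5) -> list[str]:
    """Return at most top_k tags whose probability is >= threshold, highest first.

    Hypothetical helper: probs maps each tag to the model's predicted probability.
    """
    # Rank tags by probability; sorted() is stable, so ties keep insertion order
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Cap the count first, then apply the threshold to what is left
    return [tag for tag, p in ranked[: max(top_k, 0)] if p >= threshold]


probs = {"python": 0.92, "pandas": 0.71, "numpy": 0.40, "list": 0.55}
print(select_tags(probs, top_k=3, threshold=0.5))  # ['python', 'pandas', 'list']
print(select_tags(probs, top_k=2, threshold=0.8))  # ['python']
```

With this design the two controls compose: the tag count is an upper bound, and a high threshold can return fewer tags than requested (or none) when the model is unsure.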

Built with -

  • Dash - Main server and interactive components
  • Plotly Python - Used to create the interactive plots
  • Pandas - Exploring and manipulating the data
  • scikit-learn - Simple library for predictive data analysis
  • NLTK - Library to work with human language data

Screenshots -

The following are screenshots of the app from this repo:

Credits -