Text classificator

This repo contains the final project for the course M9DM2 Data Mining 2.

Authors: Stanislav Zámečník, Vojtěch Šindlář.

Description of project: a web application that classifies articles into three categories: sport, travel, and science. Articles are expected to be in English.
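As a rough illustration of the three-way classification, the sketch below maps a vector of per-category scores to a category name. The function name `label_from_scores` and the order of `CATEGORIES` are assumptions made for this example; the actual order depends on how the model in this repo was trained.

```python
from typing import Sequence

# Assumed category order; check it against the label encoding used in training.
CATEGORIES = ("sport", "travel", "science")


def label_from_scores(scores: Sequence[float]) -> str:
    """Return the category whose score is highest."""
    if len(scores) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} scores, got {len(scores)}")
    # Index of the largest score (argmax) without needing NumPy.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CATEGORIES[best_index]
```

For example, `label_from_scores([0.1, 0.7, 0.2])` picks the second category in the assumed order.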

Features:

Categorize any article text pasted into the app.
Upload a .txt file for categorization.
Combine pasted text with text from an uploaded .txt file.
Use the web scraper to collect your own data from https://www.dailymail.co.uk/home/index.html
Train your own model.
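One way the pasted text and the uploaded .txt file could be merged before classification is sketched below. The helper `combine_inputs` is hypothetical and not taken from app.py; it assumes the upload arrives as raw bytes encoded in UTF-8.

```python
from typing import Optional


def combine_inputs(pasted_text: str, uploaded_bytes: Optional[bytes] = None) -> str:
    """Join pasted text and the contents of an uploaded .txt file into one string."""
    parts = []
    if pasted_text.strip():
        parts.append(pasted_text.strip())
    if uploaded_bytes:
        # Assumption: the file is UTF-8; undecodable bytes are replaced, not fatal.
        file_text = uploaded_bytes.decode("utf-8", errors="replace").strip()
        if file_text:
            parts.append(file_text)
    # A blank line keeps the two sources visibly separate.
    return "\n\n".join(parts)
```

If both inputs are empty, the function returns an empty string, which the caller can treat as "nothing to classify".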

Folders:

data - .npy files containing n articles from dailymail.co.uk, used for pretraining
data_preparation - scripts for web scraping, data cleaning, and modelling
model - saved weights of the pretrained model in .h5 format
files_for_deployment - files needed for Heroku deployment
test_files - sample files for demonstrating the web application

How to run the app: streamlit run app.py
