This repo is our attempt at the Telegram Data Clustering contest. More details here: https://contest.com/docs/data_clustering
- The submission folder contains the files that will be finally shared for submission.
- Install the dependencies in `deb_packages.txt`.
- `site_packages` has been prepared with the correct dependencies for the Debian system that will be used for the evaluation, as mentioned on the contest page.
- You can create a `virtualenv` and install the Python dependencies listed in `requirements.txt` by doing `pip install -r requirements.txt`.
- Additionally, some pre-trained vectors need to be downloaded and added to the assets folders.
- The `tgnews` executable has been set up specifically for the contest.
- For testing, run the modules from the `submission` folder with the virtualenv active, for example `python src/main.py languages 'test/'`, and so on.
- The Jupyter notebooks show the different EDA, trials and modelling sections.
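
The virtualenv step above can also be scripted. The following is a minimal sketch, not part of the repo: the helper names (`env_python`, `pip_install_command`, `setup_env`) are hypothetical, and it uses the standard-library `venv` module in place of the `virtualenv` tool, assuming `requirements.txt` sits in the directory the script is run from.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def env_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, requirements: Path) -> List[str]:
    """Build the `pip install -r <file>` command for the environment's interpreter."""
    return [str(env_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def setup_env(env_dir: Path, requirements: Path) -> None:
    """Create the environment (with pip) and install the listed dependencies."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(pip_install_command(env_dir, requirements), check=True)
```

Calling `setup_env(Path("venv"), Path("requirements.txt"))` would then roughly correspond to creating the environment by hand and running `pip install -r requirements.txt` inside it. The environment directory name `venv` is only an assumption taken from the `(venv)` prompt shown in the testing command.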