Data Challenge Covid-19

Participants

Kontogeorgos Ioannis (P3351807)
Panagiotis Spyrakis (p3351819)

Description

The goal of this challenge was to apply machine learning algorithms and techniques to a real-world problem: classifying messages posted on Twitter that relate to the recent SARS-CoV-2 outbreak. Each message is classified into one of fifteen classes based on its context. We were also given a graph that models the retweet relationships between users of the platform.

The project is split into 4 sections (notebooks).

Notebook communities_detection contains the implementation of the community detection algorithms. It also combines the features extracted by community detection with several supervised ML algorithms.
Notebook ml_classifiers contains the implementation of a set of supervised ML algorithms.
Notebook flair_text_classification_torch contains the RNN implementation using the Flair framework.
Notebook keras_rnn_stackoverflow_posts_with_lstm contains the unidirectional and bidirectional LSTM RNN implementation using Keras and Talos.
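
As an illustration of how community membership can be derived from the retweet graph and turned into a feature, here is a minimal sketch. The README does not name the algorithms used in communities_detection, so label propagation is an assumption chosen for brevity, and the function names and toy edge list are hypothetical. The result of label propagation depends on the visiting order and on tie-breaking, so this version fixes both to stay deterministic.

```python
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (user, user) retweet pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-retweets
            adj[u].add(v)
            adj[v].add(u)
    return adj


def label_propagation(adj, max_iter=20):
    """Asynchronous label propagation with a fixed (sorted) visiting order.

    Each node adopts the most frequent label among its neighbours. Ties keep
    the current label when possible, otherwise the smallest label wins.
    """
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[n] for n in adj[node])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels


def community_ids(labels):
    """Map each community label to a small integer usable as a feature."""
    index = {label: i for i, label in enumerate(sorted(set(labels.values())))}
    return {node: index[label] for node, label in labels.items()}


# Toy retweet graph (hypothetical users): two triangles joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "z")]
labels = label_propagation(build_adjacency(edges))
features = community_ids(labels)
```

The integer in `features` for a tweet's author could then be appended to the text features before training a supervised classifier.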

Project Structure

The project is organized as follows:

Folder /notebooks, which contains the Jupyter notebooks described above.
Folder /app, which contains the main functionality of the custom LSTM RNN implementation using Keras.
- preprocessing.py module, which contains all the functions the notebooks use to shape and process the data.
- models.py module, which contains the code that generates the RNN models. The implementation follows the Talos documentation so that parameter tuning can be run for each model.
- metrics.py module, which contains the metrics used when compiling the RNN models, such as F1 and accuracy.
- visualization.py module, which contains reusable code for visualizing model performance.
- layers.py module, which contains the custom LinearAttention and DeepAttention layers used by the RNN models.
Folder data, which contains all the reusable data sources and execution logs. The autogenerated subfolders, which make file handling easier, are:
- fasttext_dir: place all fastText embeddings related files here.
- flair_data_dir: place all the input data for the Corpus of the Flair models here.
- flair_emdg_dir: place all the downloaded embeddings and vocabularies for the Flair models here.
- flair_output_dir: the folder with the execution logs of the flair package.
definitions.py module, which contains global variables and definitions that are essential for the project organization.
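
The way the data subfolders get autogenerated can be sketched as follows. This is a minimal illustration and not the actual contents of definitions.py: the constant `DATA_SUBFOLDERS` and the helper `ensure_data_dirs` are hypothetical names, and only the four subfolder names come from the list above.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the project structure; the constant itself is
# a hypothetical name and may not exist in definitions.py.
DATA_SUBFOLDERS = (
    "fasttext_dir",
    "flair_data_dir",
    "flair_emdg_dir",
    "flair_output_dir",
)


def ensure_data_dirs(root):
    """Create data/<subfolder> under root if missing and return the paths."""
    data_dir = Path(root) / "data"
    paths = {}
    for name in DATA_SUBFOLDERS:
        path = data_dir / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        paths[name] = path
    return paths


# Run against a throwaway directory so the sketch does not touch a checkout.
with tempfile.TemporaryDirectory() as tmp:
    dirs = ensure_data_dirs(tmp)
    created = sorted(p.name for p in (Path(tmp) / "data").iterdir())
```

In a real checkout, `root` would be the repository root, so that notebooks and modules can share the same paths.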

Setup Instructions

Create or load an existing conda environment

 conda env create -f requirements.yml
 conda activate text_analytics

Useful reference link

Install the packages from the requirements.txt file.
Download the vocabulary that will be used in the project and place it under the /data folder. Download here

Flair Embeddings

Twitter Embeddings

  https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/twitter.gensim.vectors.npy
  https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/twitter.gensim

News Forward English

  https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/lm-news-english-forward-1024-v0.2rc.pt

News Backward English

  https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/lm-news-english-backward-1024-v0.2rc.pt

Glove

  https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim.vectors.npy
  https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim
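
The files above can be fetched with a small helper. This is a sketch under assumptions: the destination data/flair_emdg_dir follows the project structure description, the function names are hypothetical, and whether the URLs are still reachable has not been verified here. Nothing is downloaded until `download_missing` is called.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve

BASE = "https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/"

# File names copied from the URLs listed in this section.
EMBEDDING_FILES = [
    "twitter.gensim.vectors.npy",
    "twitter.gensim",
    "lm-news-english-forward-1024-v0.2rc.pt",
    "lm-news-english-backward-1024-v0.2rc.pt",
    "glove.gensim.vectors.npy",
    "glove.gensim",
]
URLS = [BASE + name for name in EMBEDDING_FILES]


def local_target(url, dest_dir):
    """Return the local path a URL would be saved to inside dest_dir."""
    filename = PurePosixPath(urlparse(url).path).name
    return Path(dest_dir) / filename


def download_missing(dest_dir, urls=URLS):
    """Download every file not already present in dest_dir.

    Returns the list of paths that were actually fetched.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        target = local_target(url, dest)
        if not target.exists():
            urlretrieve(url, str(target))
            fetched.append(target)
    return fetched
```

A typical call would be `download_missing("data/flair_emdg_dir")`, run once from the repository root.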

giannhskod/covid19_tweets_classifier

Folders and files

Latest commit

History

Repository files navigation

Data Challenge Covid-19

Participants

Description

Project Structure

Setup Instructions

About

Resources

Stars

Watchers

Forks

Languages