
giannhskod/covid19_tweets_classifier

Data Challenge Covid-19

Participants

  • Kontogeorgos Ioannis (P3351807)
  • Panagiotis Spyrakis (p3351819)

Description

The goal of this challenge was to apply machine learning algorithms and techniques to a real-world problem: classifying messages posted on Twitter that were related to the recent SARS-CoV-2 outbreak. Each message was classified into one of fifteen classes based on its context. We were also given a graph that models the retweet relationships between users of the platform.

The project is split into 4 sections (notebooks):

  1. Notebook communities_detection contains the implementation of the community detection algorithms. It also ensembles the features extracted by community detection with several supervised ML algorithms.

  2. Notebook ml_classifiers contains the implementation of a set of supervised ML algorithms.

  3. Notebook flair_text_classification_torch contains the RNN implementation using the Flair framework.

  4. Notebook keras_rnn_stackoverflow_posts_with_lstm contains the unidirectional and bidirectional LSTM RNN implementation using Keras and Talos.
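
As a rough illustration of how the retweet graph could yield a community feature per user, here is a minimal sketch. It uses a simple, deterministic label propagation written in plain Python; this algorithm is chosen only for illustration and is not necessarily one of those used in the communities_detection notebook. The function names and the toy edge list are hypothetical.

```python
from collections import Counter, defaultdict


def build_undirected_graph(retweet_edges):
    """Turn (retweeter, original_author) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for source, target in retweet_edges:
        if source != target:  # ignore self-retweets
            graph[source].add(target)
            graph[target].add(source)
    return graph


def label_propagation(graph, max_rounds=20):
    """Assign each node the most frequent label among its neighbours.

    Ties keep the current label when possible, otherwise the smallest
    label wins, so the result is deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            counts = Counter(labels[neighbour] for neighbour in graph[node])
            if not counts:
                continue
            best = max(counts.values())
            candidates = {label for label, count in counts.items() if count == best}
            if labels[node] in candidates:
                continue
            labels[node] = min(candidates)
            changed = True
        if not changed:
            break
    return labels


# Toy retweet graph (made-up users): two separate triangles.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
labels = label_propagation(build_undirected_graph(edges))

# The community label of each tweet's author can then be used as an extra
# categorical feature next to the text features.
tweet_authors = ["a", "z", "c"]
community_feature = [labels[author] for author in tweet_authors]
```

The resulting community label is a categorical value, so it would normally be one-hot encoded before being passed to a supervised classifier.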

Project Structure

The project is structured as follows:

  1. Folder /notebooks, which contains the Jupyter notebooks described above.

  2. Folder /app, which contains the main functionality of the custom LSTM RNN implementation using Keras:

    • preprocessing.py module, which contains all the functions used by the notebooks for the required data formatting and processing.

    • models.py module, which contains the code that generates the RNN models. The implementation follows the Talos documentation so that parameter tuning can be run for each model.

    • metrics.py module, which contains the metrics used when compiling the RNN models, such as F1 and accuracy.

    • visualization.py module, which contains reusable code for visualizing model performance.

    • layers.py module, which contains the code for the custom LinearAttention and DeepAttention layers used by the RNN models.

  3. Folder data, which contains all the reusable data sources and execution logs. The following subfolders are autogenerated for easier file handling:

    • fasttext_dir: place all fastText embedding files here.

    • flair_data_dir: place all the input data for the Corpus of the Flair models here.

    • flair_emdg_dir: place all the downloaded embeddings and vocabularies for the Flair models here.

    • flair_output_dir: holds the execution logs of the flair package.

  4. Module definitions.py, which contains global variables and definitions that are essential for the project organization.
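
To make the layout above concrete, the following is a minimal sketch of how the data subfolders could be generated from a project root. It is not the actual content of definitions.py: the function name `build_data_dirs` and the use of a temporary directory as the root are assumptions made for illustration. Only the four subfolder names come from the list above.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the project structure described above.
DATA_SUBFOLDERS = (
    "fasttext_dir",
    "flair_data_dir",
    "flair_emdg_dir",
    "flair_output_dir",
)


def build_data_dirs(project_root):
    """Create <project_root>/data/<subfolder> for every known subfolder.

    Returns a dict mapping each subfolder name to its Path. Existing
    folders are left untouched.
    """
    data_dir = Path(project_root) / "data"
    dirs = {name: data_dir / name for name in DATA_SUBFOLDERS}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Demo in a throwaway directory so nothing is written into a real project.
with tempfile.TemporaryDirectory() as tmp_root:
    dirs = build_data_dirs(tmp_root)
    created = sorted(p.name for p in (Path(tmp_root) / "data").iterdir())
```

In a real project the root would typically be derived from the location of the module itself, for example `Path(__file__).resolve().parent`.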

Setup Instructions

  1. Create a new conda environment or activate an existing one

     conda env create -f requirements.yml
     conda activate text_analytics
    

    Useful reference link

  2. Install the packages from the requirements.txt file

  3. Download the vocabulary that will be used in the project and place it under the /data folder. Download here

  4. Download the Flair embeddings listed below and place them in the flair_emdg_dir folder

    Twitter Embeddings

      https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/twitter.gensim.vectors.npy
      https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/twitter.gensim
    

    News Forward English

      https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/lm-news-english-forward-1024-v0.2rc.pt
    

    News Backward English

      https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/lm-news-english-backward-1024-v0.2rc.pt
    

    Glove

      https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim.vectors.npy
      https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim
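
The downloads above can also be scripted. The sketch below uses only the Python standard library and skips files that already exist. The helper names `target_path` and `download_missing` are hypothetical, the destination `data/flair_emdg_dir` relative to the project root is an assumption, and whether the URLs are still reachable has not been checked here.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

BASE_URL = "https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/"

# File names copied from the URLs listed above.
EMBEDDING_URLS = [
    BASE_URL + name
    for name in (
        "twitter.gensim.vectors.npy",
        "twitter.gensim",
        "lm-news-english-forward-1024-v0.2rc.pt",
        "lm-news-english-backward-1024-v0.2rc.pt",
        "glove.gensim.vectors.npy",
        "glove.gensim",
    )
]


def target_path(url, dest_dir):
    """Return the local path a URL would be saved to inside dest_dir."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download_missing(urls, dest_dir):
    """Download every URL whose target file does not exist yet."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = target_path(url, dest)
        if not target.exists():
            urlretrieve(url, target)  # blocking download; files can be large


# Assumed destination, relative to the project root:
# download_missing(EMBEDDING_URLS, "data/flair_emdg_dir")
```

The call is left commented out so that running the snippet does not start several large downloads by accident.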
    
