samuellau0802/CBSA-Data-Analytics-Challenge

CBSA-Data-Analytics-Challenge

Link: https://docs.google.com/document/d/1RiR0BsdOug2UZ822Up0-FkCEGcv8ItKsyChahozDrOo/edit


Workflow:

  1. Data Exploration

    • Word cloud
  2. Preprocess data

    • Tokenization
    • Remove irrelevant data (manually)
    • Remove stop words
    • Remove symbols
  3. Feature Extraction / Word Embeddings

    • TF-IDF
    • Sentence Transformer
  4. Classification

    • Naive Bayes
    • SVM
    • CNN (Optional)
    • LSTM (Optional)
  5. Regression

    • CNN
    • LSTM
  6. Unsupervised

    • LDA (topic modeling)
  7. Data Visualization

  8. Others

    • Imbalanced Data
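
As a rough illustration of steps 2 to 4, the sketch below chains preprocessing, TF-IDF features, and the two baseline classifiers using scikit-learn. It is only a minimal sketch under stated assumptions: the `preprocess` helper, the short `STOP_WORDS` set, and the toy `texts` and `labels` are all hypothetical and are not taken from the challenge data. Symbols are stripped before tokenizing here for simplicity, and `class_weight="balanced"` is shown as one possible way to handle imbalanced data.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical, deliberately short stop word list; swap in a fuller one as needed.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "in", "for", "it", "my"}


def preprocess(text):
    """Lowercase, replace symbols with spaces, tokenize on whitespace, drop stop words."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [token for token in text.split() if token not in STOP_WORDS]


# Made-up toy examples for illustration only; not the challenge data.
texts = [
    "The delivery was late, and the box is damaged!!!",
    "Great service & fast delivery :)",
    "Refund for my order never arrived...",
    "Love it, works perfectly!",
]
labels = ["negative", "positive", "negative", "positive"]

# Passing preprocess as the analyzer makes TF-IDF use our tokens directly.
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(analyzer=preprocess), MultinomialNB()),
    # class_weight="balanced" reweights classes inversely to their frequency.
    "svm": make_pipeline(
        TfidfVectorizer(analyzer=preprocess), LinearSVC(class_weight="balanced")
    ),
}

new_text = "Box arrived damaged and late"
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = str(model.predict([new_text])[0])
    print(name, "->", predictions[name])
```

The regular expression keeps only ASCII letters and digits, which assumes English text. A Sentence Transformer embedding could replace the TF-IDF step without changing the classifiers.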

Deck Preparation:

  1. Intro
  2. Pain points, context, and why this prediction task
  3. Importance
  4. Data Exploration
  5. Methodology:
    • Which models were tried
  6. Model results
    • accuracy
    • training time
    • space complexity (Optional)
  7. Limitations and Improvements
  8. Prototype (Streamlit) (Optional)
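
For the model results, accuracy and training time could be collected along the following lines. This is a hedged sketch: the `evaluate` helper and the toy data are hypothetical, and `DummyClassifier` only stands in for whichever model is being reported.

```python
import time

from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model and return (test accuracy, training time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, train_seconds


# Made-up toy data and a baseline classifier, for illustration only.
X_train, y_train = [[0], [1], [2], [3]], ["a", "a", "a", "b"]
X_test, y_test = [[4], [5]], ["a", "b"]
accuracy, train_seconds = evaluate(
    DummyClassifier(strategy="most_frequent"), X_train, y_train, X_test, y_test
)
print(f"accuracy={accuracy:.2f}, training time={train_seconds:.4f}s")
```

Running the same helper over every model gives comparable numbers for the deck. With imbalanced data, accuracy alone can be misleading, so a per-class metric may be worth reporting alongside it.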