Link: https://docs.google.com/document/d/1RiR0BsdOug2UZ822Up0-FkCEGcv8ItKsyChahozDrOo/edit
-
Data Exploration
- {wordcloud}
-
Preprocess data
- Tokenization
- Remove irrelevant data (manual)
- Remove stop words
- Remove symbols
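
A minimal sketch of the automated preprocessing steps above (tokenization, stop-word removal, symbol removal), using only the standard library. The stop-word list and the sample sentence are assumptions made for illustration. Symbols are stripped before tokenizing here for simplicity, and the manual cleanup step is not covered.

```python
import re

# Small illustrative stop-word list (an assumption; a real project would
# likely use a fuller list, e.g. from NLTK or spaCy).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "of", "to", "in"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip symbols, tokenize on whitespace, drop stop words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from the project data.
    print(preprocess("This is GREAT!!! The battery lasts 2 days :)"))
```

Whitespace tokenization is the simplest choice; a library tokenizer could be swapped in without changing the rest of the function.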
-
Feature Extraction / Word Embeddings
- TF-IDF
- Sentence Transformer
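
To make the TF-IDF item concrete, here is a small from-scratch sketch. It uses the plain textbook form (term frequency divided by document length, times log(N / df)), which is an assumption: libraries such as scikit-learn apply a smoothed variant, so the numbers will differ. The Sentence Transformer option is not shown because it needs a pretrained model download.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    weight = (count / doc_length) * log(n_docs / document_frequency)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized documents, invented for illustration.
docs = [["good", "phone"], ["bad", "phone"]]
weights = tf_idf(docs)
```

A term that appears in every document ("phone" in the toy data) gets weight zero, which is the point of the IDF factor: it discounts words that do not help tell documents apart.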
-
Classification
- Naive Bayes
- SVM
- CNN (Optional)
- LSTM (Optional)
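
One possible way to set up the two non-optional classifiers, sketched with scikit-learn (an assumption; the outline does not name a library). Each model is a pipeline of `TfidfVectorizer` plus the classifier. The texts and labels are toy data invented for illustration, and the CNN and LSTM options are left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data, invented for illustration only.
texts = [
    "great product works well",
    "excellent quality very happy",
    "love it great value",
    "terrible product broke quickly",
    "awful quality very disappointed",
    "hate it terrible value",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Each pipeline vectorizes raw text with TF-IDF, then fits the classifier.
nb_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
svm_model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))

nb_model.fit(texts, labels)
svm_model.fit(texts, labels)

new_texts = ["great quality", "terrible and awful"]
nb_pred = nb_model.predict(new_texts)
svm_pred = svm_model.predict(new_texts)
```

Keeping the vectorizer inside the pipeline means the vocabulary is learned only from the training split, which avoids leaking test data when cross-validating.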
-
Regression
- CNN
- LSTM
-
Unsupervised
- {lda}
-
Data Visualization
-
Others
- Imbalanced Data
- Intro
- Pain points, why this prediction task, context
- Importance
- Data Exploration
- Methodology:
- Which models were tried
- Model results
- Accuracy
- Training time
- Space complexity (Optional)
- Limitations and Improvements
- Prototype (Streamlit) (Optional)
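
For the "Model results" and "Imbalanced Data" items above, a small standard-library sketch of how accuracy and training time could be recorded. Balanced accuracy (the mean of per-class recall) is added as a suggestion, since plain accuracy can look good on skewed classes. The labels below and the helper name `timed` are hypothetical, chosen for illustration.

```python
import time
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Invented labels: 8 negatives, 2 positives, and a model that always says "neg".
y_true = ["neg"] * 8 + ["pos"] * 2
y_pred = ["neg"] * 10

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
```

Training time can be captured by wrapping the fit call, for example `model, seconds = timed(train_fn, X, y)`, where `train_fn` stands for whatever training function the project ends up using.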