Repository for the Final Project for Natural Language Processing with Representation Learning (DS-GA.1011), Fall 2022
This repository contains the final project for the course: a weakly supervised model that derives ESG sin/green scores from Reddit posts about a company and from SEC filings.
We built two models, one for classification and one for regression, in an attempt to reconcile classes of institutions and investors that have unequal access to information.
Next steps for the project include:
- Standardizing vocabulary
- Finding a suitable dimensionality reduction method for words and corresponding Shapley values
- Using a stronger, more state-of-the-art model, ideally one suited to finance
- Considering other sources of institutional trading data to scrape
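
As a rough illustration of the first next step (standardizing vocabulary), the sketch below lowercases text, tokenizes it, and maps variant spellings to a single canonical token. It is not part of the project code: the `CANONICAL` mapping, the `standardize` function, and the example terms are all hypothetical, and a real mapping would have to be built from the Reddit and SEC corpora.

```python
import re

# Hypothetical mapping from variant spellings to one canonical token.
# These entries are illustrative only and do not come from the project data.
CANONICAL = {
    "co2": "carbon_dioxide",
    "carbon-dioxide": "carbon_dioxide",
    "ghg": "greenhouse_gas",
}

# Match runs of lowercase letters/digits, allowing internal hyphens.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def standardize(text):
    """Lowercase, tokenize, and replace known variants with canonical tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    return [CANONICAL.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    print(standardize("CO2 and GHG emissions!"))
```

Applying the same function to both the Reddit posts and the SEC filings would give the two sources a shared vocabulary before any feature attribution is computed.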