
How to run the NLP code?

  1. First, we train the model using \train_models\ (or LDAMarius.slurm on a multi-core system).

    • The model files are saved under .\pretrained_models
    • Estimation logs are saved under .\train_models\Logs
    • Perplexity scores are saved under .\train_models\Output
  2. Second, we run the \train_models\ file to plot perplexity against the number of topics and select the optimal number of topics.

    • Output is a graph, perplexity_topics.pdf.
    • The file also outputs the top ten words for each topic, given a manually prespecified number of topics X, saved as topics_terms_n=X.pdf.
  3. The file generates a quarter-industry panel of topic loadings, saved as .\IndustryAnalysis\topic_loadings_by_industryquarter.csv.

  4. Code .\IndustryAnalysis\ generates the top 2 topics (with their lists of words) for each GIC code and saves them in TopTopics_Industries.csv.

  5. Code (together with ShapleyMarius.slurm) generates panels of Shapley values by analyst-ticker-quarter (including information diversity and contribution), saved in the OutputShapley folder.

  6. Use in the OutputShapley folder to generate a DataShapley.csv file.

  7. Run to get a file with analyst-level topic loadings on technical analysis topics (DataShapley_TechnicalTopicWeights.csv).

  8. The complete merged file (DataShapley.csv + DataShapley_TechnicalTopicWeights.csv) is saved as Data_InfoContributionAnalyst.csv.
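Step 1 saves perplexity scores, but the README does not say how they are computed. A common definition is corpus perplexity, exp(-(total log-likelihood) / (total token count)), where lower values indicate a better fit. The sketch below assumes that definition and natural-log likelihoods; the function and argument names are hypothetical and not taken from the repository.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Corpus perplexity: exp(-(sum of log-likelihoods) / (total number of tokens)).

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_lengths: number of tokens in each document, in the same order.
    """
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("perplexity needs at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Sanity check: if every token has probability 1/10, perplexity is 10
# (up to floating-point rounding).
print(perplexity([5 * math.log(0.1), 3 * math.log(0.1)], [5, 3]))
```

Topic-model libraries often report a per-word likelihood bound rather than perplexity itself, so scores from different tools are not directly comparable.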
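Step 2 picks the number of topics from the perplexity curve and lists the top ten words per topic. A minimal sketch, assuming "optimal" means the lowest perplexity (the authors may instead pick an elbow by eye from perplexity_topics.pdf) and assuming the fitted model exposes a topic-word weight matrix aligned with a vocabulary list. The function names, scores, and words are hypothetical.

```python
def select_num_topics(perplexity_by_k):
    """Return the number of topics K with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


def top_words(topic_word, vocab, n=10):
    """Return the n highest-weight words for each topic.

    topic_word: one list of word weights per topic, aligned with vocab.
    """
    top = []
    for weights in topic_word:
        # Rank word indices by weight, largest first.
        ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
        top.append([vocab[j] for j in ranked[:n]])
    return top


# Hypothetical scores: K=20 has the lowest perplexity.
scores = {10: 910.0, 20: 850.0, 30: 870.0}
print(select_num_topics(scores))  # 20
print(top_words([[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]], ["margin", "guidance", "dividend"], n=2))
# [['guidance', 'dividend'], ['margin', 'guidance']]
```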
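Step 3 builds a quarter-industry panel of topic loadings. Assuming each report carries an industry code, a quarter label, and a vector of topic loadings, and that each panel cell is a simple (unweighted) mean across reports, a sketch looks like this. The column names, the industry codes in the example, and the function names are illustrative, not taken from the repository.

```python
import csv
import sys


def industry_quarter_panel(reports):
    """Average topic loadings within each (industry, quarter) cell.

    reports: iterable of (industry, quarter, loadings), where loadings is a list
    of K topic weights for one report.
    """
    sums, counts = {}, {}
    for industry, quarter, loadings in reports:
        key = (industry, quarter)
        if key not in sums:
            sums[key] = [0.0] * len(loadings)
            counts[key] = 0
        if len(loadings) != len(sums[key]):
            raise ValueError(f"inconsistent number of topics in cell {key}")
        sums[key] = [s + x for s, x in zip(sums[key], loadings)]
        counts[key] += 1
    return {key: [s / counts[key] for s in sums[key]] for key in sums}


def write_panel(panel, handle):
    """Write the panel as CSV: industry, quarter, topic_0 ... topic_{K-1}."""
    writer = csv.writer(handle)
    k = len(next(iter(panel.values()), []))
    writer.writerow(["industry", "quarter"] + [f"topic_{i}" for i in range(k)])
    for (industry, quarter), means in sorted(panel.items()):
        writer.writerow([industry, quarter] + means)


# Illustrative reports for two industries in one quarter.
panel = industry_quarter_panel([
    ("45", "2019Q1", [0.2, 0.8]),
    ("45", "2019Q1", [0.4, 0.6]),
    ("35", "2019Q1", [1.0, 0.0]),
])
write_panel(panel, sys.stdout)
```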
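For step 4, one plausible reading is that an industry's top topics are those with the highest average loading across its quarters. The sketch below assumes that ranking rule and a hypothetical panel layout (a dict keyed by industry and quarter); the repository may rank topics differently, for example within a single quarter.

```python
def top_topics_by_industry(panel, n=2):
    """Return, for each industry, the indices of its n largest average topic loadings.

    panel: dict mapping (industry, quarter) -> list of K mean loadings.
    Loadings are averaged over an industry's quarters before ranking.
    """
    totals, counts = {}, {}
    for (industry, _quarter), loadings in panel.items():
        previous = totals.get(industry, [0.0] * len(loadings))
        totals[industry] = [p + x for p, x in zip(previous, loadings)]
        counts[industry] = counts.get(industry, 0) + 1
    top = {}
    for industry, total in totals.items():
        means = [t / counts[industry] for t in total]
        ranked = sorted(range(len(means)), key=lambda k: means[k], reverse=True)
        top[industry] = ranked[:n]
    return top


# Hypothetical panel with three topics.
panel = {
    ("45", "2019Q1"): [0.1, 0.6, 0.3],
    ("45", "2019Q2"): [0.3, 0.4, 0.3],
    ("35", "2019Q1"): [0.5, 0.1, 0.4],
}
print(top_topics_by_industry(panel))  # {'45': [1, 2], '35': [0, 2]}
```

The returned topic indices can then be paired with each topic's word list to build a table like TopTopics_Industries.csv.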
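Step 5 computes Shapley values by analyst-ticker-quarter. The README does not define the underlying value function, so the sketch below shows only the generic exact Shapley formula, phi_i = sum over S ⊆ N \ {i} of |S|! (n - |S| - 1)! / n! * [v(S ∪ {i}) - v(S)]. A hypothetical topic-coverage function (the number of distinct topics a group of analysts covers) stands in for the authors' measure; the analyst ids and topic sets are made up.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of distinct player ids (e.g. analysts covering one ticker-quarter).
    value: function mapping a frozenset of players to a number; must accept the empty set.
    Each player's value is the weighted average of its marginal contributions
    v(S | {i}) - v(S) over all coalitions S that exclude it.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical value function: number of distinct topics covered by a coalition's reports.
topics = {"analyst_A": {1, 2}, "analyst_B": {2, 3}, "analyst_C": {4}}


def coverage(coalition):
    return len(set().union(*(topics[a] for a in coalition)))


# Approximately {'analyst_A': 1.5, 'analyst_B': 1.5, 'analyst_C': 1.0}:
# the shared topic 2 is split between A and B.
print(shapley_values(list(topics), coverage))
```

Exact enumeration visits all 2^(n-1) coalitions per player, which stays cheap only when few analysts cover a ticker in a given quarter. The values always sum to v(N), the value of the full group.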
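Step 6 collects the per-job outputs in OutputShapley into a single DataShapley.csv. Assuming every output file is a CSV with the same header, a stdlib sketch that concatenates them and keeps one header could look like this (the function name is hypothetical):

```python
import csv
from pathlib import Path


def combine_csvs(folder, output):
    """Concatenate every *.csv file in folder into one CSV, keeping a single header.

    Files are read in sorted filename order and must share the same header.
    The output file itself is skipped if it lives in the same folder.
    Returns the number of data rows written.
    """
    out_path = Path(output).resolve()
    paths = [p for p in sorted(Path(folder).glob("*.csv")) if p.resolve() != out_path]
    header, rows = None, 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:  # skip empty files
                    continue
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path.name}")
                for row in reader:
                    writer.writerow(row)
                    rows += 1
    return rows


# Usage, with the folder and file names from the README:
# combine_csvs("OutputShapley", "DataShapley.csv")
```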
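Step 7 produces analyst-level loadings on technical analysis topics. The README does not list which topics count as technical analysis, so the topic indices below are placeholders. The sketch assumes an analyst's weight is the mean, over that analyst's reports, of each report's summed loading on those topics; the function name and data are hypothetical.

```python
def technical_topic_weights(reports, technical_topics):
    """Average, per analyst, each report's total loading on the technical-analysis topics.

    reports: iterable of (analyst_id, loadings), loadings being a list of K topic weights.
    technical_topics: set of topic indices labelled as technical analysis.
    """
    sums, counts = {}, {}
    for analyst, loadings in reports:
        weight = sum(loadings[k] for k in technical_topics)
        sums[analyst] = sums.get(analyst, 0.0) + weight
        counts[analyst] = counts.get(analyst, 0) + 1
    return {a: sums[a] / counts[a] for a in sums}


# Hypothetical reports; topics 0 and 1 are treated as technical analysis.
reports = [
    ("A1", [0.2, 0.3, 0.5]),
    ("A1", [0.4, 0.1, 0.5]),
    ("A2", [0.0, 0.0, 1.0]),
]
print(technical_topic_weights(reports, technical_topics={0, 1}))
```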
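Step 8 merges DataShapley.csv with DataShapley_TechnicalTopicWeights.csv into Data_InfoContributionAnalyst.csv. A stdlib left join is sketched below; the join key columns are an assumption, since the README does not name them, and pandas' DataFrame.merge with how="left" would do the same job. The function name is hypothetical.

```python
import csv


def left_join_csv(left_path, right_path, out_path, keys):
    """Left-join right_path onto left_path on the key columns and write out_path.

    Left rows without a match get empty strings in the right-hand columns.
    If a key appears several times in the right file, the last row wins.
    Columns present in both files keep the left file's value.
    """
    with open(right_path, newline="") as f:
        reader = csv.DictReader(f)
        extra = [c for c in reader.fieldnames if c not in keys]
        lookup = {tuple(row[k] for k in keys): row for row in reader}
    with open(left_path, newline="") as f_in, open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        new_cols = [c for c in extra if c not in reader.fieldnames]
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + new_cols)
        writer.writeheader()
        for row in reader:
            match = lookup.get(tuple(row[k] for k in keys), {})
            for c in new_cols:
                row[c] = match.get(c, "")
            writer.writerow(row)


# Usage with the README's file names; keys=["analyst"] is a hypothetical key column.
# left_join_csv("DataShapley.csv", "DataShapley_TechnicalTopicWeights.csv",
#               "Data_InfoContributionAnalyst.csv", keys=["analyst"])
```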


Data and code for Martineau and Zoican (2021): building an information contribution measure for sell-side analyst reports


