How to run the NLP code?

  1. First, train the model using \train_models\, or LDAMarius.slurm on a multi-core system.

    • The model files are saved under .\pretrained_models
    • Estimation logs are saved under .\train_models\Logs
    • Perplexity scores are saved under .\train_models\Output
  2. Second, run the \train_models\ file to plot perplexity against the number of topics and select the optimal number of topics.

    • Output is a graph, perplexity_topics.pdf.
    • The file also outputs the top ten words for each topic, given a (manually) prespecified number of topics X: topics_terms_n=X.pdf.
  3. The file generates a quarter-industry panel of topic loadings, saved in the file .\IndustryAnalysis\topic_loadings_by_industryquarter.csv

  4. Code .\IndustryAnalysis\ generates the top two topics (with their word lists) for each GIC code and saves them in TopTopics_Industries.csv.

  5. Code (together with ShapleyMarius.slurm) generates panels of Shapley values by analyst-ticker-quarter (including information diversity and contribution), saved in the OutputShapley folder.

  6. Use in the OutputShapley folder to generate a DataShapley.csv file.

  7. Run to get a file with analyst-level topic loadings on technical analysis topics (DataShapley_TechnicalTopicWeights.csv).

  8. The complete merged file (DataShapley.csv + DataShapley_TechnicalTopicWeights.csv) is saved as Data_InfoContributionAnalyst.csv.
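
The topic-count selection in step 2 can be sketched as follows. The steps above only say that the number of topics is read off the perplexity plot, so the "lowest perplexity wins" rule, the function name `pick_num_topics`, and the toy scores below are all assumptions made for illustration, not the repository's actual code or results.

```python
def pick_num_topics(perplexity_by_k):
    """Return the topic count with the lowest perplexity.

    perplexity_by_k maps a candidate number of topics to its perplexity
    score. Choosing the minimum is an assumed rule; the selection in the
    repository may instead be made by eye from perplexity_topics.pdf.
    """
    if not perplexity_by_k:
        raise ValueError("no perplexity scores supplied")
    # min() over the keys, ranked by their perplexity value
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Made-up scores for illustration only (not results from the paper).
toy_scores = {10: 1520.0, 20: 1410.5, 30: 1398.2, 40: 1425.7}
best_k = pick_num_topics(toy_scores)
print(best_k)
```

The chosen value would then play the role of X in topics_terms_n=X.pdf.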

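
The merge in step 8 might look like the sketch below, which left-joins two CSV files on a shared key column using only the standard library. The key column name (`"analyst"` in the usage note), the helper names `merge_on_key` and `merge_csv_files`, and the choice of a left join are assumptions; the actual column layout of DataShapley.csv and DataShapley_TechnicalTopicWeights.csv is not documented here.

```python
import csv


def merge_on_key(left_rows, right_rows, key):
    """Left-join two lists of dict rows on `key`; unmatched left rows are kept."""
    # Index the right-hand rows by key (a later duplicate key overwrites an earlier one).
    lookup = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        extra = lookup.get(row[key], {})
        # Add the right-hand columns, skipping the key itself.
        merged.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return merged


def merge_csv_files(left_path, right_path, out_path, key):
    """Read two CSV files, left-join them on `key`, and write the result."""
    with open(left_path, newline="") as f:
        left_rows = list(csv.DictReader(f))
    with open(right_path, newline="") as f:
        right_rows = list(csv.DictReader(f))
    merged = merge_on_key(left_rows, right_rows, key)
    # Collect output columns in first-seen order.
    fieldnames = []
    for row in merged:
        for col in row:
            if col not in fieldnames:
                fieldnames.append(col)
    with open(out_path, "w", newline="") as f:
        # restval fills columns missing from unmatched rows with an empty string.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(merged)
```

Under these assumptions, a call such as `merge_csv_files("DataShapley.csv", "DataShapley_TechnicalTopicWeights.csv", "Data_InfoContributionAnalyst.csv", key="analyst")` would produce the merged file, where `"analyst"` is a hypothetical column name.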

Data and code for Martineau and Zoican (2021): building an information contribution measure for sell-side analyst reports