A pipeline to analyze DEI initiatives in CSR reports with Python

partigabor/dei-index

dei-index

The following code (main.ipynb) is a pipeline of NLP steps for quickly analyzing and comparing company documents. Currently, the program can read company reports and output a set of observations about their contents, quantify relative mentions of certain key terms and phrases, and provide simple visualizations. It uses TF-IDF (term frequency–inverse document frequency) scores, a common technique in text mining and information retrieval. TF-IDF multiplies the frequency of a term in a document by the term's inverse document frequency, that is, the log of the total number of documents divided by the number of documents in which the term appears. A term scores higher when it is frequent in one document but rare across the corpus, and lower when it is common across the corpus.
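To make the metric concrete, here is a minimal sketch of the textbook, unsmoothed TF-IDF: term frequency times the log of the total number of documents over the number of documents containing the term. It is not taken from main.ipynb; the function name `tf_idf` and the toy documents are assumptions made for illustration. Libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ from this sketch.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute unsmoothed TF-IDF for a list of tokenized documents.

    `docs` is a list of token lists; the result is one {term: score}
    dict per document. Hypothetical helper, for illustration only.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of the document's tokens; idf = log(N / df)
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (made up): "water" occurs in both documents,
# "diversity" in only one.
docs = [
    ["diversity", "inclusion", "water"],
    ["water", "packaging", "water"],
]
scores = tf_idf(docs)
```

In this toy corpus, "water" appears in every document, so its inverse document frequency is log(2/2) = 0 and its score is zero in both documents, while "diversity" gets a positive score in the only document that mentions it.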

This brief example focuses on the terms 'diversity', 'equity', and 'inclusion' in the CSR reports of two big beverage companies over the past years. For a big-data approach, I recommend the Jena Organization Corpus (JOCo), a 280-million-word corpus of US, UK, and German company reports.

The ultimate goal is the creation of an index to capture and measure companies' DEI practices and initiatives.

At present, this program can:

  • read in and pre-process txt and pdf files of company documents and reports,
  • collate their contents in a dataframe,
  • tokenize, remove stopwords, and lemmatize text,
  • calculate TF-IDF scores for every document in the corpus,
  • compare a set of selected documents and visualize the comparison.
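The pre-processing and comparison steps above can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the notebook's implementation: the names `preprocess`, `load_corpus`, and `relative_mentions` are hypothetical, the stopword list is a tiny placeholder, and PDF extraction, the dataframe, lemmatization, and plotting are left out because they depend on third-party libraries.

```python
import re
from collections import Counter
from pathlib import Path

# Tiny placeholder list; a real pipeline would use a full stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "our", "we"}
KEY_TERMS = ("diversity", "equity", "inclusion")


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def load_corpus(folder):
    """Read every .txt file in `folder` into {filename: tokens}."""
    return {
        path.name: preprocess(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    }


def relative_mentions(tokens, terms=KEY_TERMS):
    """Share of a document's tokens taken up by each key term."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty documents
    return {term: counts[term] / total for term in terms}


# Made-up sample text standing in for a report excerpt.
sample = "We value diversity and inclusion in our teams. Diversity matters."
tokens = preprocess(sample)
shares = relative_mentions(tokens)
```

Calling `relative_mentions` on each entry returned by `load_corpus` gives per-document shares that can be compared directly across reports of different lengths.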

This code was tested on a local machine, on Windows, using VSCode and Python 3.9.13 via Anaconda.
