Digital Ethnic Futures Lab - SCOTUS College Statement Text Analysis

Description

This repository contains several programs that analyze the statements released by select colleges in response to the SCOTUS ruling on affirmative action.

'statement_to_csv.py' uses the Google Sheets API to read data from a column and write its contents to individual csv files, stored in the folder 'csv_files'.
'region_finder.py' tags the specified colleges with region and campus size information, using data from the folder 'data_directory', and writes the result to the csv file 'locations_results.csv'.
The 'tfidf' directory contains programs that perform term frequency-inverse document frequency (TF-IDF) analysis on our corpus, while the 'sentiment' directory contains programs that perform sentiment analysis.
The 'ngram' directory contains programs that perform n-gram analysis on the corpus of text files. They define functions to preprocess and tokenize the text, find the top n-grams, and compare n-grams.
The 'response_comparison' directory contains programs that compare the responses with one another, and with a GPT-generated response, using Jaccard similarity and cosine similarity.
'word_analysis.py' calculates the average word count, lexical diversity, and most frequent words for each response in the corpus and writes the results to 'word_analysis_results.csv'.
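
The n-gram, Jaccard similarity, and lexical diversity measures described above can be illustrated with a minimal sketch. This is not the code in this repository: the function names ('tokenize', 'top_ngrams', 'jaccard_similarity', 'lexical_diversity'), the regex tokenizer, and the sample sentences are assumptions made for illustration, and the sketch uses only the Python standard library rather than 'nltk' or 'sklearn'. Lexical diversity is taken here to mean the ratio of unique words to total words, which may differ from the definition used in 'word_analysis.py'.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(tokens, n=2, k=3):
    """Return the k most frequent n-grams as (ngram_tuple, count) pairs."""
    # Zipping n shifted copies of the token list yields consecutive n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def lexical_diversity(tokens):
    """Unique words divided by total words (assumed definition)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Made-up sample statements, not taken from the corpus.
    first = tokenize("We value diversity. We value inclusion.")
    second = tokenize("We value access and inclusion.")
    print(top_ngrams(first, n=2, k=1))
    print(jaccard_similarity(first, second))
    print(lexical_diversity(first))
```

Jaccard similarity compares only which words appear, while cosine similarity also accounts for how often they appear, which may be why the 'response_comparison' programs report both.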

'statement_to_csv.py' depends on a 'credentials.json' file, which is not included in this repository for security reasons. This program does not need to be run, because its results are already stored in 'csv_files'.
'region_finder.py' can be run from the repository's home directory.
'tfidf_analysis' needs to be run from the 'tfidf' directory, and 'vader_sentiment' needs to be run from the 'sentiment' directory.

This repository uses the 'pandas', 'os', 'vaderSentiment', 'sklearn', 'numpy', 'altair', 'csv', 'nltk', and 'googleapiclient' packages.
