Digital Ethnic Futures Lab - SCOTUS College Statement Text Analysis

Description

This repository contains multiple programs that analyze the statements released by select colleges in response to the SCOTUS ruling on affirmative action. Illustrative sketches of several of these techniques follow the list below.

  • 'statement_to_csv.py' uses the Google Sheets API to read data from a spreadsheet column and writes its contents out as individual CSV files, stored in the 'csv_files' folder
  • 'region_finder.py' tags region information and campus size for specified colleges using data from the 'data_directory' folder and writes the results to 'locations_results.csv'
  • The 'tfidf' directory contains programs that perform term frequency-inverse document frequency (TF-IDF) analysis on our corpus, while the 'sentiment' directory contains programs that perform sentiment analysis
  • The 'ngram' directory performs n-gram analysis on the corpus of text files: it defines functions to preprocess and tokenize each text, find the top n-grams, and compare n-grams across texts
  • The 'response_comparison' directory contains programs that compare the similarity between different responses, as well as between the responses and a GPT-generated response, using Jaccard similarity and cosine similarity
  • 'word_analysis.py' calculates the average word count, lexical diversity, and most frequent words for each response in the corpus and writes the results to 'word_analysis_results.csv'; 'word_analysis_plot.py' plots those results
  • 'word_phrase.py' finds the percentage of texts in the corpus that contain certain words or phrases
  • 'identify_category.py' categorizes college responses according to specific lexicons
  • 'jbdelta_average.py' tokenizes the responses, calculates word frequency statistics, computes each text's deviation from the corpus average using z-scores, and visualizes these deviations with a bar chart
  • 'jbdelta_reference.py' is similar to 'jbdelta_average.py', but instead calculates the deviation between a single test text and the rest of the corpus
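
As a rough illustration of what 'statement_to_csv.py' does, the sketch below reads one spreadsheet column with the Google Sheets API. The spreadsheet ID, the range, and the service-account authentication are placeholders and assumptions; the actual script may authenticate differently.

```python
# Hedged sketch: read one column via the Google Sheets API.
# SPREADSHEET_ID and the range are placeholders; the repository's
# actual authentication flow may differ from service-account auth.
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]
creds = Credentials.from_service_account_file("credentials.json", scopes=SCOPES)
service = build("sheets", "v4", credentials=creds)

# Fetch every cell in column A of the first sheet
result = (
    service.spreadsheets()
    .values()
    .get(spreadsheetId="SPREADSHEET_ID", range="Sheet1!A:A")
    .execute()
)
rows = result.get("values", [])  # list of single-cell lists, e.g. [["text"], ...]
```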
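The TF-IDF programs rely on scikit-learn; here is a minimal sketch of the core step, with a two-document toy corpus standing in for the real statements.

```python
# Minimal TF-IDF sketch with scikit-learn; the toy corpus is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "We remain committed to a diverse student body.",
    "The university will comply with the Court's decision.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)  # rows are documents, columns are terms

# Highest-weighted terms in the first statement
terms = vectorizer.get_feature_names_out()
weights = tfidf[0].toarray().ravel()
print(sorted(zip(terms, weights), key=lambda t: t[1], reverse=True)[:5])
```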
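The sentiment programs use VADER via the 'vaderSentiment' package; the core call is simple enough to show directly, with an invented example sentence.

```python
# VADER sentiment scoring; the sentence is an invented example.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("We are deeply disappointed by this ruling.")
print(scores)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
```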
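A sketch of the n-gram step, assuming NLTK tokenization; the repository's exact preprocessing may differ.

```python
# N-gram counting sketch; lowercasing and the alphabetic filter are assumptions.
from collections import Counter
from nltk import ngrams
from nltk.tokenize import word_tokenize  # requires nltk.download("punkt")

text = "Affirmative action has shaped college admissions for decades."
tokens = [w.lower() for w in word_tokenize(text) if w.isalpha()]

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
```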
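Both similarity measures used in 'response_comparison' can be sketched in a few lines; the two sentences here are invented stand-ins for actual responses.

```python
# Jaccard (set overlap) and cosine (count-vector angle) similarity sketch.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = "We value diversity in our community."
b = "Our community values diversity and inclusion."

# Jaccard: intersection over union of the word sets
set_a, set_b = set(a.lower().split()), set(b.lower().split())
jaccard = len(set_a & set_b) / len(set_a | set_b)

# Cosine: angle between the documents' word-count vectors
vectors = CountVectorizer().fit_transform([a, b])
cosine = cosine_similarity(vectors[0], vectors[1])[0, 0]

print(f"Jaccard: {jaccard:.2f}, cosine: {cosine:.2f}")
```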
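A toy version of the per-response statistics computed by 'word_analysis.py'; the tokenization choices and the helper name 'word_stats' are assumptions.

```python
# Per-text word statistics sketch; 'word_stats' is a hypothetical helper.
from collections import Counter

def word_stats(text):
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w.isalpha()]
    return {
        "word_count": len(words),
        "lexical_diversity": len(set(words)) / len(words) if words else 0.0,
        "most_frequent": Counter(words).most_common(3),
    }

print(word_stats("Diversity matters, and diversity endures at our university."))
```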
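The z-score deviation idea behind the 'jbdelta' scripts can be sketched as follows; the toy corpus and the mean-absolute-z summary are assumptions about the exact statistic used.

```python
# Z-score deviation of each text from the corpus average (Burrows-style delta).
# The toy corpus stands in for the college statements.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "We affirm our commitment to diversity.",
    "The ruling changes our admissions process.",
    "Diversity remains central to our mission.",
]

counts = CountVectorizer().fit_transform(corpus).toarray().astype(float)
rel_freq = counts / counts.sum(axis=1, keepdims=True)  # per-text word frequencies

mean, std = rel_freq.mean(axis=0), rel_freq.std(axis=0)
z = (rel_freq - mean) / np.where(std == 0, 1.0, std)  # z-score per word per text

deviation = np.abs(z).mean(axis=1)  # one deviation score per text
print(deviation)
```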

Getting Started

  • 'statement_to_csv.py' depends on a 'credentials.json' file, which is not included in this repository for security reasons. The script does not need to be run, as its results are already stored in 'csv_files'

  • 'region_finder.py' can be run from the home directory

  • 'tfidf_analysis' must be run from the 'tfidf' directory, and 'vader_sentiment' must be run from the 'sentiment' directory

Dependencies

  • This repository depends on the 'pandas', 'vaderSentiment', 'scikit-learn' ('sklearn'), 'numpy', 'altair', 'nltk', and 'googleapiclient' packages, along with the Python standard-library 'os' and 'csv' modules.