roopikarisam/digitalethnicfutureslab

Digital Ethnic Futures Lab - SCOTUS College Statement Text Analysis

Description

This repository contains several programs that analyze the statements released by select colleges in response to the SCOTUS ruling on affirmative action.

  • 'statement_to_csv.py' uses the Google Sheets API to read data from a column and write its contents to individual CSV files, stored in the 'csv_files' folder.
  • 'region_finder.py' tags the specified colleges with region and campus-size information, using data from the 'data_directory' folder, and writes the results to the CSV file 'locations_results.csv'.
  • The 'tfidf' directory contains programs that perform term frequency-inverse document frequency (TF-IDF) analysis on the corpus, while the 'sentiment' directory contains programs that perform sentiment analysis.
  • The 'ngram' directory contains programs that perform n-gram analysis on the corpus of text files. They define functions to preprocess and tokenize the text, find the top n-grams, and compare n-grams.
  • The 'response_comparison' directory contains programs that compare the similarity between different responses, as well as between the responses and a GPT-generated response, using Jaccard similarity and cosine similarity.
  • 'word_analysis.py' calculates average word count, lexical diversity, and most frequent words for each response in the corpus and writes the results to 'word_analysis_results.csv'.
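
The measures named in the list above can be illustrated with a minimal, standard-library-only sketch. This is not the code in this repository: the function names, the simple regex tokenizer, and the two sample sentences are all assumptions made for illustration, and the repository's own programs may preprocess text differently (for example with 'nltk' or 'sklearn').

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical, simplified tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_ngrams(tokens, n=2, k=3):
    """Return the k most frequent n-grams as (n-gram tuple, count) pairs."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the two raw word-count vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_diversity(tokens):
    """Number of unique words divided by the total number of words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Made-up sample sentences, not taken from the corpus.
response_a = tokenize("We remain committed to diversity and inclusion.")
response_b = tokenize("We remain committed to our mission.")

print(top_ngrams(response_a, n=2, k=2))
print(jaccard_similarity(response_a, response_b))
print(cosine_similarity(response_a, response_b))
print(lexical_diversity(response_a))
```

Jaccard similarity looks only at which words two responses share, while cosine similarity also accounts for how often each word appears, which is one reason to report both when comparing responses.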

Getting Started

  • 'statement_to_csv' depends on a 'credentials.json' file, which is not included in this repository for security reasons. This code does not need to be run, because its results are already stored in 'csv_files'.

  • 'region_finder' can be run from the home directory.

  • 'tfidf_analysis' must be run from the 'tfidf' directory, and 'vader_sentiment' must be run from the 'sentiment' directory.

Dependencies

  • This repository uses the standard-library modules 'os' and 'csv', along with the 'pandas', 'vaderSentiment', 'sklearn', 'numpy', 'altair', 'nltk', and 'googleapiclient' packages.
