
mipayne/story-corpus-project


Project:
- tokenize, pos_tag, and lemmatize words from children's books
- record the average difficulty of the words in each book

Run book_values.py

File_functions:

book_values:
- contains the calculate_book_values function
- uses inputs from modify_dictionary_from_excel, importing_txt_files, and the *_modify functions
- contains the code for importing the Excel files
- print statements at the end of the script print, for each book:
  - title
  - total points awarded
  - total words (not including stopwords)
  - total words (including stopwords)
  - difficulty (total points awarded / total words (not including stopwords))
  - difficulty (total points awarded / total words (including stopwords))
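Both difficulty figures are plain ratios of the printed totals. The sketch below assumes the per-book totals are already known; the helper `difficulty_scores` and its parameter names are hypothetical and are not the signature of calculate_book_values.

```python
from __future__ import annotations


def difficulty_scores(total_points: float,
                      words_without_stopwords: int,
                      words_with_stopwords: int) -> tuple[float, float]:
    """Return (difficulty excluding stopwords, difficulty including stopwords).

    Hypothetical helper mirroring the two ratios printed by book_values.py.
    """
    if words_without_stopwords == 0 or words_with_stopwords == 0:
        raise ValueError("word counts must be non-zero")
    return (total_points / words_without_stopwords,
            total_points / words_with_stopwords)


# Made-up totals for a single book: 120 points, 60 content words, 100 words.
print(difficulty_scores(120, 60, 100))  # (2.0, 1.2)
```

Because removing stopwords can only shrink the denominator, the first figure is always greater than or equal to the second for the same book.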

resources: contains the *.xlsx files and StoryCorpus

modify_dictionary_from_excel.py: converts Excel sheets to dictionaries; also contains functions to find non-alphanumeric words and to create the custom_stopwords list
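A minimal sketch of the two dictionary steps, assuming each sheet has already been read into rows of (word, value) pairs (for example with openpyxl or pandas). The two-column layout and the helper names `rows_to_dict` and `non_alphanumeric_words` are assumptions, not the project's actual code.

```python
def rows_to_dict(rows):
    """Map each word (lowercased, trimmed) to its value, skipping blank rows.

    Assumes a hypothetical two-column sheet layout: (word, value).
    """
    table = {}
    for word, value in rows:
        if word is None:  # empty spreadsheet row
            continue
        table[str(word).strip().lower()] = value
    return table


def non_alphanumeric_words(words):
    """Return the words containing at least one non-alphanumeric character."""
    return [w for w in words if not w.isalnum()]


rows = [("Cat", 1), ("won't", 3), (None, None), ("ice-cream", 2)]
table = rows_to_dict(rows)
print(non_alphanumeric_words(table))  # ["won't", 'ice-cream']
```

Words such as "won't" and "ice-cream" are exactly the entries that need special handling later, because a tokenizer may split them differently from how they appear in the spreadsheet.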

importing_txt_files.py: tokenizes, POS-tags (pos_tag), and lemmatizes words from the StoryCorpus files
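The name pos_tag suggests NLTK, though the README does not say so; treat that as an assumption. NLTK's `pos_tag` returns Penn Treebank tags, while `WordNetLemmatizer.lemmatize` expects a WordNet part of speech, so a small mapping is usually needed between the two steps. The helper `penn_to_wordnet` below is a hypothetical sketch of that mapping.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS letter.

    The letters match nltk.corpus.wordnet.ADJ ('a'), VERB ('v'), ADV ('r')
    and NOUN ('n'). Anything unrecognized falls back to noun, which is also
    WordNetLemmatizer.lemmatize's default.
    """
    if tag.startswith("J"):
        return "a"  # JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("R"):
        return "r"  # RB, RBR, RBS
    return "n"


for word, tag in [("ran", "VBD"), ("happier", "JJR"), ("dogs", "NNS")]:
    print(word, tag, penn_to_wordnet(tag))
```

With NLTK installed and its tagger and WordNet data downloaded, the result would be passed as `WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag))`.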

fixing_strings (final_words_modifier): rejoins contractions that the tokenizer split into two tokens
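Treebank-style tokenizers such as NLTK's `word_tokenize` split "don't" into "do" and "n't", and "she'll" into "she" and "'ll". A minimal sketch of gluing those pieces back together, assuming that tokenization convention; `rejoin_contractions` is a hypothetical name, not the project's fixing_strings.

```python
def rejoin_contractions(tokens):
    """Attach clitic tokens such as "n't", "'s", "'ll" to the previous token."""
    merged = []
    for tok in tokens:
        # A clitic is "n't" or an apostrophe followed only by letters;
        # a lone "'" (e.g. a quotation mark) is left alone.
        is_clitic = tok.lower() == "n't" or (
            len(tok) > 1 and tok[0] == "'" and tok[1:].isalpha()
        )
        if is_clitic and merged:
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged


print(rejoin_contractions(["I", "ca", "n't", "see", "she", "'ll", "come"]))
# ['I', "can't", 'see', "she'll", 'come']
```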

punctuation_modify (final_words_modifier): removes punctuation from long strings and removes short strings that contain punctuation
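The README does not say where a "short" string ends, so the sketch below makes that a hypothetical `min_length` parameter. It also keeps apostrophes by default so that contractions rejoined by fixing_strings survive; that is an assumption, not documented behavior.

```python
import string


def punctuation_filter(tokens, min_length=4, keep="'"):
    """Strip punctuation from long tokens and drop short tokens containing it.

    min_length and keep are hypothetical parameters.
    """
    punct = set(string.punctuation) - set(keep)
    result = []
    for tok in tokens:
        if not any(ch in punct for ch in tok):
            result.append(tok)  # nothing to clean
        elif len(tok) >= min_length:
            cleaned = "".join(ch for ch in tok if ch not in punct)
            if cleaned:  # e.g. "----" would clean to an empty string
                result.append(cleaned)
        # otherwise: a short token with punctuation is dropped
    return result


print(punctuation_filter(["hello!", "...", "don't", "--", "mom,"]))
# ['hello', "don't", 'mom']
```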

numeral_word_modify (final_words_modifier): changes Arabic numeral strings to their equivalent words
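A minimal sketch of the numeral conversion, limited to 0 through 999 for brevity. Emitting each spelled-out word as its own token ("42" becomes "forty", "two") is a design assumption so that every word can be scored individually; the helper names are hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n <= 999 as space-separated English words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def numerals_to_words(tokens):
    """Replace numeric tokens (0 to 999) with one token per spelled-out word."""
    out = []
    for tok in tokens:
        if tok.isdecimal() and int(tok) <= 999:
            out.extend(number_to_words(int(tok)).split())
        else:
            out.append(tok)  # words and out-of-range numbers pass through
    return out


print(numerals_to_words(["I", "have", "42", "cats"]))
# ['I', 'have', 'forty', 'two', 'cats']
```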

propnoun_modify (final_words_modifier): removes proper nouns from the final words list
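Assuming the words arrive as the (word, tag) pairs that NLTK's `pos_tag` returns, proper nouns can be dropped by tag. The Penn Treebank tags for proper nouns are NNP (singular) and NNPS (plural); `remove_proper_nouns` is a hypothetical name.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def remove_proper_nouns(tagged):
    """Drop (word, tag) pairs whose Penn Treebank tag marks a proper noun."""
    return [(word, tag) for word, tag in tagged if tag not in PROPER_NOUN_TAGS]


print(remove_proper_nouns([("Max", "NNP"), ("ate", "VBD"), ("soup", "NN")]))
# [('ate', 'VBD'), ('soup', 'NN')]
```

This relies entirely on the tagger, so a capitalized common word at the start of a sentence that gets tagged NNP would also be removed.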

stopwords_modify (final_words_modifier): removes stopwords from the final words list
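A minimal sketch of stopword removal with a case-insensitive set lookup. Which list feeds it (NLTK's English stopwords, the custom_stopwords list, or both) is not stated here, so the `stopwords` argument is left to the caller; `remove_stopwords` is a hypothetical name.

```python
def remove_stopwords(words, stopwords):
    """Keep words whose lowercase form is not in the stopword collection."""
    stop = {w.lower() for w in stopwords}  # set gives constant-time lookups
    return [w for w in words if w.lower() not in stop]


print(remove_stopwords(["The", "dog", "and", "the", "cat"], ["the", "and"]))
# ['dog', 'cat']
```

The word counts before and after this step correspond to the "including stopwords" and "not including stopwords" totals that book_values prints.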

find_propernouns: NOT USED IN BOOK_VALUES. Finds proper nouns.

word_search: NOT USED IN BOOK_VALUES. Contains a framework for searching for words from the books in the All and Easy dictionaries by mapping each pos_tag tag to a specific dictionary key.
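The sketch below shows one way such a search could work. It assumes, hypothetically, that the All and Easy dictionaries are keyed by (word, part-of-speech name) pairs and that a Penn Treebank tag's first letter picks the part of speech; the real dictionary layout and key names are not documented here.

```python
# Hypothetical mapping from the first letter of a Penn Treebank tag to a key.
TAG_TO_KEY = {"N": "noun", "V": "verb", "J": "adjective", "R": "adverb"}


def find_in_dictionaries(word, tag, dictionaries):
    """Return the names of the dictionaries containing (word, POS key)."""
    key = TAG_TO_KEY.get(tag[:1])
    if key is None:  # e.g. determiners or prepositions have no mapped key
        return []
    return [name for name, table in dictionaries.items()
            if (word.lower(), key) in table]


all_words = {("run", "verb"): 2, ("run", "noun"): 3, ("dog", "noun"): 1}
easy_words = {("dog", "noun"): 1}
books = {"All": all_words, "Easy": easy_words}
print(find_in_dictionaries("Dog", "NN", books))  # ['All', 'Easy']
print(find_in_dictionaries("run", "VBD", books))  # ['All']
```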
