
overhaul the synonym data structure to take up less space #16

Merged
merged 2 commits on Nov 27, 2018

Conversation

coolbutuseless
Collaborator

This is a monster overhaul of the main "words" data structure.

Instead of storing the raw words, the data is now split up and stored as:

  1. A sorted list of all unique words
  2. An integer vector for each character vector of synonyms (each integer indexes into the list of all words)
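
As a rough illustration of the two-part layout described above, here is a minimal sketch in Python. It is not the package's R code: the function name `build_index`, the toy thesaurus, and the choice to key the encoded table by the root word's index are all assumptions made for the example.

```python
def build_index(thesaurus):
    """Encode a thesaurus as (sorted vocabulary, integer synonym lists).

    `thesaurus` is a hypothetical dict mapping a root word to a list of
    synonym strings; the real package data may be shaped differently.
    """
    # 1. Sorted list of every unique word (roots and synonyms alike).
    vocab = sorted({w for root, syns in thesaurus.items() for w in (root, *syns)})

    # Helper mapping from word to its position in the sorted list.
    position = {word: i for i, word in enumerate(vocab)}

    # 2. Replace each list of synonym strings with integer positions.
    encoded = {
        position[root]: [position[s] for s in syns]
        for root, syns in thesaurus.items()
    }
    return vocab, encoded


# Toy data, invented for the example.
toy = {"happy": ["glad", "cheerful"], "glad": ["happy", "pleased"]}
vocab, encoded = build_index(toy)
```

Each word string is stored once in `vocab`; every repeated occurrence in a synonym list costs only one integer, which is where the space saving comes from.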

Storing integer vectors rather than character strings reduces memory usage by about 50%, and the compressed data is now <5MB.

The downside is that creating the integer vectors from the word lists isn't very fast, and you wouldn't want to do this dynamically.

The upsides:

  • Package is now <5MB
  • The package code consists of just 2 functions, i.e. syn() and syns()
  • Removed all the .onLoad() stuff to dynamically load data
  • No longer have to download data at runtime
  • Removed the code to "get" and "parse" the words package (this is now all done offline in data-raw/download-and-compress-moby.R)
  • As a side-effect of all this, issue #15 ("syn() and syns() are ill-behaved when a word doesn't exist") is now fixed.
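
To show how a lookup against this kind of structure can handle an unknown word cleanly, here is a small Python sketch. It is only an illustration: `VOCAB`, `ENCODED`, and `lookup` are hypothetical names, the data is invented, and returning an empty list for a missing word is an assumption, since this PR does not state what syn() and syns() actually return in that case.

```python
from bisect import bisect_left

# Hypothetical precomputed data: a sorted list of unique words, plus
# synonym lists stored as integer positions into that list.
VOCAB = ["cheerful", "glad", "happy", "pleased"]
ENCODED = {2: [1, 0], 1: [2, 3]}  # root word index -> synonym indices


def lookup(word):
    """Return the synonyms of `word`, or an empty list if there are none.

    The empty-list result for unknown words is an assumption made for
    this sketch, not the package's documented behaviour.
    """
    # Binary search works because VOCAB is sorted.
    i = bisect_left(VOCAB, word)
    if i == len(VOCAB) or VOCAB[i] != word:
        return []  # the word is not in the vocabulary at all
    # Decode the integer positions back into word strings.
    return [VOCAB[j] for j in ENCODED.get(i, [])]
```

Because the vocabulary is a single sorted list, the "does this word exist?" check is one binary search, and the missing-word case is handled in one place before any decoding happens.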
