file compressor #144

gvwilson · 2023-05-08T13:10:29Z

No description provided.

rkern · 2023-07-13T18:53:58Z

I wonder if byte-pair encoding would be an interesting algorithm to implement in this chapter. I suspect it's right-sized for a book chapter. While it's not a state-of-the-art compressor today, it is SOTA for the NLP tokenization used in LLMs like the GPTs. That offers an opportunity to talk about some relevant topics in software engineering ethics, using the implemented compressor as a demonstration.

For example, pretraining the compression dictionary on the English version of SDXJS probably handles the English SDXPY pretty reasonably. It probably does less well, but still okay, on Shakespeare, and probably terribly on Akutagawa Ryūnosuke. As we engineer our tools to be more data-driven, availability biases in how we obtain the data to build those tools have consequences that we need to think about.
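
To make the idea concrete, here is a minimal byte-level sketch of byte-pair encoding, assuming a dictionary of merges is learned from one corpus and then reused on other text. The names `train_bpe`, `merge_pair`, `encode`, and `decode` are hypothetical, chosen only for illustration; this is not the chapter's implementation and not the tokenizer any particular LLM uses.

```python
from collections import Counter


def merge_pair(tokens, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(data, num_merges):
    """Learn up to `num_merges` merges from `data` (a bytes object).

    Token ids 0-255 are raw bytes; each merge gets the next id from 256 up.
    """
    tokens = list(data)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges save nothing
        tokens = merge_pair(tokens, pair, 256 + len(merges))
        merges.append(pair)
    return merges


def encode(data, merges):
    """Apply a previously learned merge list, in training order."""
    tokens = list(data)
    for index, pair in enumerate(merges):
        tokens = merge_pair(tokens, pair, 256 + index)
    return tokens


def decode(tokens, merges):
    """Expand token ids back into the original bytes."""
    table = {i: bytes([i]) for i in range(256)}
    for index, (left, right) in enumerate(merges):
        table[256 + index] = table[left] + table[right]
    return b"".join(table[t] for t in tokens)


# Hypothetical training text standing in for an English corpus.
english = b"the cat sat on the mat and the rat sat on the hat " * 4
merges = train_bpe(english, 20)

# Text whose UTF-8 bytes never appear in the training data.
japanese = "芥川龍之介".encode("utf-8")

for label, text in [("english", english), ("japanese", japanese)]:
    encoded = encode(text, merges)
    print(label, len(text), "bytes ->", len(encoded), "tokens")
```

Because the merges here come only from ASCII text, none of them can match the multi-byte UTF-8 sequences in the Japanese sample, so that text gets no shorter at all. That is the availability bias in miniature: the dictionary only helps on data that looks like what it was trained on.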

gvwilson added this to the v1 milestone May 8, 2023

gvwilson self-assigned this May 8, 2023

gvwilson transferred this issue from another repository Jul 6, 2023

gvwilson added the to-add (Add something new) and in-content (Lesson content) labels Jul 6, 2023

gvwilson removed their assignment Jul 6, 2023

gvwilson modified the milestones: v1, v2 Jul 6, 2023

gvwilson added the new-topic (Ideas for new chapters) label and removed the to-add (Add something new) and in-content (Lesson content) labels Jul 6, 2023

gvwilson changed the title ~~add a chapter on compression algorithms~~ file compressor Jul 6, 2023

gvwilson self-assigned this May 26, 2024

gvwilson linked a pull request Jun 9, 2024 that will close this issue

feat: file compressor #280

Draft
