Update paper

jncraton · Feb 24, 2024
commit 9389450 · 1 parent 8cd62fc

Showing 3 changed files with 48 additions and 23 deletions.
diff --git a/makefile b/makefile
@@ -32,7 +32,7 @@ doc:
 	python3 -m pdoc -o doc languagemodels
 
 paper.pdf: paper.md paper.bib
-	pandoc $< --citeproc -o $@
+	pandoc $< --citeproc --pdf-engine=xelatex -o $@
 
 spellcheck:
 	aspell -c --dont-backup readme.md

diff --git a/paper.bib b/paper.bib
@@ -192,3 +192,34 @@ @article{zhao2023survey
   journal={arXiv preprint arXiv:2303.18223},
   year={2023}
 }
+
+@inproceedings{ctranslate2,
+  title={The OpenNMT neural machine translation toolkit: 2020 edition},
+  author={Klein, Guillaume and Hernandez, Fran{\c{c}}ois and Nguyen, Vincent and Senellart, Jean},
+  booktitle={Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)},
+  pages={102--109},
+  year={2020}
+}
+
+@article{lamini-lm,
+  author       = {Minghao Wu and
+                  Abdul Waheed and
+                  Chiyu Zhang and
+                  Muhammad Abdul-Mageed and
+                  Alham Fikri Aji
+                  },
+  title        = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
+  journal      = {CoRR},
+  volume       = {abs/2304.14402},
+  year         = {2023},
+  url          = {https://arxiv.org/abs/2304.14402},
+  eprinttype   = {arXiv},
+  eprint       = {2304.14402}
+}
+
+@article{openchat,
+  title={OpenChat: Advancing Open-source Language Models with Mixed-Quality Data},
+  author={Wang, Guan and Cheng, Sijie and Zhan, Xianyuan and Li, Xiangang and Song, Sen and Liu, Yang},
+  journal={arXiv preprint arXiv:2309.11235},
+  year={2023}
+}
diff --git a/paper.md b/paper.md
@@ -22,58 +22,52 @@ bibliography: paper.bib
 
 # Statement of Need
 
-Large language models are starting to change the way software is designed [@mialon2023augmented]. The development of the transformer [@vaswani2017attention] has led to rapid progress in many NLP and generative tasks [@zhao2023survey; @bert; @gpt2; @gpt3; @t5; @palm; @flan-t5; @bubeck2023sparks]. These models are becoming more powerful as they scale in both parameters [@kaplan2020scaling] and training data [@hoffmann2022training].
+Large language models are having an impact on the way software is designed [@mialon2023augmented]. The development of the transformer [@vaswani2017attention] has led to rapid progress in many NLP and generative tasks [@zhao2023survey; @bert; @gpt2; @gpt3; @t5; @palm; @flan-t5; @bubeck2023sparks]. These models are becoming more powerful as they scale in both parameters [@kaplan2020scaling] and training data [@hoffmann2022training].
 
 Early research suggests that there are many tasks performed by humans that can be transformed by LLMs [@eloundou2023gpts]. For example, large language models trained on code [@codex] are already being used as capable pair programmers via tools such as Microsoft's Copilot. To build with these technologies, students need to understand their capabilities and begin to learn new paradigms for programming.
 
-There are many software tools already available for working with large language models [@hftransformers; @pytorch; @tensorflow; @langchain; @llamacpp; @gpt4all]. While these options serve the needs of software engineers, researchers, and hobbyists, they may not be simple enough for new learners. This package aims to radically lower the barriers to entry for using these tools to solve problems.
+There are many software tools already available for working with large language models [@hftransformers; @pytorch; @tensorflow; @langchain; @llamacpp; @gpt4all]. While these options serve the needs of software engineers, researchers, and hobbyists, they may not be simple enough for new learners. This package aims to lower the barriers to entry for using these tools in an educational context.
 
 \newpage
 
 # Example Usage
 
-This package eliminates boilerplate and configuration options that are meaningless to new learners, and uses basic types and simple functions. Here's an example from a Python REPL session:
+This package eliminates boilerplate and configuration options that create noise for new learners while using only basic types and simple functions. Here's an example from a Python REPL session:
 
 ```python
 >>> import languagemodels as lm
 
->>> lm.complete("She hid in her room until")
-'she was sure she was safe'
+>>> lm.do("Answer the question: What is the capital of France?")
+'Paris.'
 
->>> lm.do("Translate to English: Hola, mundo!")
-'Hello, world!'
-
->>> lm.do("What is the capital of France?")
-'paris'
-
->>> lm.classify("Language models are useful", "positive", "negative")
+>>> lm.do("Classify as positive or negative: I like games",
+...       choices=["positive", "negative"])
 'positive'
 
->>> lm.extract_answer("What color is the ball?", "There is a green ball and a red box")
+>>> lm.extract_answer("What color is the ball?",
+...                   "There is a green ball and a red box")
 'green'
 
 >>> lm.get_wiki('Chemistry')
 'Chemistry is the scientific study...'
 
 >>> lm.store_doc(lm.get_wiki("Python"), "Python")
 >>> lm.store_doc(lm.get_wiki("Javascript"), "Javascript")
->>> lm.get_doc_context("What does it mean for batteries to be included in a language?")
-'Python: It is often described as a "batteries included" language due to its comprehensive standard library...
+>>> lm.get_doc_context("What language is used on the web?")
+'From Javascript document: Javascript engines were...'
 ```
 
 # Features
 
 Despite its simplicity, this package provides a number of building blocks that can be combined to build applications that mimic the architectures of modern software products. Some of the tools included are:
 
-- Text generation via the `complete` function
 - Instruction following with the `do` function
-- Chat-style inference using `chat` function
-- Zero-shot classification with the `classify` function
-- Semantic search via a document store using the `store_doc` and `get_doc_context` functions
+- Zero-shot classification with the `do` function and `choices` parameter
+- Semantic search using the `store_doc` and `get_doc_context` functions
 - Extractive question answering using the `extract_answer` function
 - Basic web retrieval using the `get_wiki` function
 
-The package includes the following features under the hood
+The package includes the following features under the hood:
 
 - Local LLM inference on CPU for broad device support
 - Transparent model caching to allow fast repeated inference without explicit model initialization
@@ -83,9 +77,9 @@ The package includes the following features under the hood
 
 # Implementation
 
-The design of this software package allows its internals to be loosely coupled to the models and inference engines it uses. At the time of writing, rapid progress is being made to speed up inference on consumer hardware, but much of this software is difficult to install and may not work well for all learners.
+The design of this software package allows its interface to be loosely coupled to the models and inference engines it uses. Progress is being made to speed up inference on consumer hardware, and this package seeks to find a balance between inference efficiency, software stability, and broad hardware support.
 
-This package currently uses the Hugging Face Transformers library [@hftransformers], which internally uses PyTorch [@pytorch] for inference. The main model used is a variant of the T5 base model [@t5] that has been fine-tuned to better follow instructions [@flan-t5]. Models that focus on inference efficiency are starting to become available [@llama]. It will be possible to replace the internals of this package with more powerful and efficient models in the future. In addition to simple local inference, it is also possible to provide API keys to the package to allow access to more powerful hosted inference services.
+This package currently uses CTranslate2 [@ctranslate2] for efficient inference on CPU and GPU. The main models used include Flan-T5 [@flan-t5], LaMini-LM [@lamini-lm], and OpenChat [@openchat]. The default models used by this package can be swapped out in future versions to provide improved generation quality.
 
 # Future work