
Persimmon-8B User Guide

This repo contains inference code for Persimmon-8B, the new LLM from Adept.

Downloading the Checkpoint

The model checkpoints are stored on our public OCI bucket and can be downloaded using wget. The base model is not fine-tuned and is released under an Apache 2.0 license. The chat model is fine-tuned and is released under a CC-BY-NC 4.0 license.

Base:
https://axtkn4xl5cip.objectstorage.us-phoenix-1.oci.customer-oci.com/n/axtkn4xl5cip/b/adept-public-data/o/8b_base_model_release.tar
md5sum: cd0320cba9efad9ccd18e9ec4d16ae1b

Chat:
https://axtkn4xl5cip.objectstorage.us-phoenix-1.oci.customer-oci.com/n/axtkn4xl5cip/b/adept-public-data/o/8b_chat_model_release.tar
md5sum: 663aeace07269c44e90f4e8bcd07f32a
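
To verify a download against these checksums before untarring, here is a minimal Python sketch (it assumes the tar files were saved under the names above, in the current directory):

import hashlib
import os

# Expected checksums, copied from the release notes above.
EXPECTED = {
    "8b_base_model_release.tar": "cd0320cba9efad9ccd18e9ec4d16ae1b",
    "8b_chat_model_release.tar": "663aeace07269c44e90f4e8bcd07f32a",
}

def md5_of(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks so large tars never need to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for name, expected in EXPECTED.items():
    if os.path.exists(name):
        status = "OK" if md5_of(name) == expected else "CHECKSUM MISMATCH"
        print(f"{name}: {status}")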

Untar the model into its own directory via tar -xvf 8b_base_model_release.tar (or tar -xvf 8b_chat_model_release.tar for the chat model).

The scripts are set up to expect the model folder to be placed within the code directory, but you can place it elsewhere and modify the scripts accordingly.

Building Docker

Build the Docker image, which includes all the necessary dependencies (and then some!), using the included Dockerfile:

docker build -f docker/Dockerfile -t 'adeptdocker' .

Running Docker

Ensure that the variable MODEL_DIR in run_text_generation_server.sh is set to the location of the model directory. By default it is set to MODEL_DIR=8b_chat_model_release, the name of the untarred chat model directory. (For the base model, change this line to MODEL_DIR=8b_base_model_release.)

Running sh docker_launch.sh will start a model server that you can query via:

curl '<address of server>/api' -X 'PUT' -H 'Content-Type: application/json; charset=UTF-8' -d '{"prompts": ["human: Hello, how are you?\n\nadept:"], "tokens_to_generate": 128, "top_p": 0.9, "random_seed": 1234, "logprobs": false}'
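
The same request can be made from Python; here is a minimal sketch (it assumes the requests library is installed, and the server address below is a placeholder to substitute with wherever your server is actually listening):

import requests

# Placeholder address: replace with the actual address of your server.
SERVER = "http://localhost:5000"

payload = {
    "prompts": ["human: Hello, how are you?\n\nadept:"],
    "tokens_to_generate": 128,
    "top_p": 0.9,
    "random_seed": 1234,
    "logprobs": False,
}

# The server expects a PUT to /api with a JSON body, as in the curl command above.
response = requests.put(f"{SERVER}/api", json=payload)
response.raise_for_status()
print(response.json())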

Notes

  • The chat model is fine-tuned to expect inputs of the form human: {prompt}\n\nadept: (see footnote 1). To get the best performance from this model, please use this format! You can see an example of it in the curl command above, and in the helper sketch after these notes. To automatically wrap single-turn input prompts in this structure, you can modify the definition of megatron/text_generation/api.py::generate_and_post_process so that the default value for the argument process_prompts_for_chat is set to True.
  • We are releasing the model with tensor parallelism of 1. In this configuration, the model requires an 80GB GPU to run naively. It should be possible to fit the model on a 40GB card by removing the unused embeddings and reducing the maximum sequence length (set at the top of run_text_generation_server.py). Quantization to 8-bit or lower would also make it fit, with plenty of room to spare.
  • We included the .vocab file so you can browse the vocabulary in plain text - this file is otherwise unused.
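
For reference, here is a sketch of a helper that wraps a conversation in the expected chat format (based only on the format described above and in footnote 1; the function name is our own):

def format_chat_prompt(turns):
    # `turns` is a list of (human_text, adept_text) pairs; pass None as
    # adept_text in the final pair to leave the prompt open for the model.
    parts = []
    for human_text, adept_text in turns:
        parts.append(f"human: {human_text}\n\nadept:")
        if adept_text is not None:
            parts.append(f" {adept_text}\n\n")
    return "".join(parts)

# Single turn, matching the curl example above:
format_chat_prompt([("Hello, how are you?", None)])
# -> 'human: Hello, how are you?\n\nadept:'

# Follow-up turn, matching the form given in footnote 1:
format_chat_prompt([("Hello, how are you?", "I am well!"), ("Great!", None)])
# -> 'human: Hello, how are you?\n\nadept: I am well!\n\nhuman: Great!\n\nadept:'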

Citation

If you use this model in your work, please use the following BibTeX citation:

@misc{persimmon-8b,
  author = {Elsen, Erich and Odena, Augustus and Nye, Maxwell and Ta\c{s}\i{}rlar, Sa\u{g}nak and Dao, Tri and Hawthorne, Curtis and Moparthi, Deepak and Somani, Arushi},
  title = {Releasing {Persimmon-8B}},
  url = {https://www.adept.ai/blog/persimmon-8b},
  year = {2023}
}

Footnotes

  1. Subsequent inputs should have the form human: {prompt}\n\nadept: {output}\n\nhuman: {follow_up}\n\nadept:
