SponsorShip | Report | Personality_Traits | Contributors | 142 Characters

Chat-Haruhi-Suzumiya

Reviving Anime Character in Reality via Large Language Model

Code License Data License Huggingface Gradio

We've just released the finetuned ChatHaruhi-Qwen-7B model and code; try it here: Open In Colab. A detailed test on Harry Potter: Open In Colab

English | Chinese简体中文 | Japanese日本語 | 🤗 Hugging Face | 📜 Paper | 🤗🗃️ 54k Dataset |

Chat-Haruhi-Suzumiya is a language model that imitates the tone, personality, and storylines of characters like Haruhi Suzumiya.

The project was developed by Cheng Li, Ziang Leng, Chenxi Yan, Xiaoyang Feng, HaoSheng Wang, Junyi Shen, Hao Wang, Weishi Mi, Aria Fei, Song Yan, Linkang Zhan, Yaokai Jia, Pingyu Wu, Haozhen Sun, and others.

This is an open source project and the members were recruited from open source communities like DataWhale.

Lulu Li (Cheng Li@SenseTime) initiated the whole project and designed and implemented most of the features.

Ziang Leng (Ziang Leng@SenseTime) designed and implemented the training, data generation, and backend architecture for ChatHaruhi 1.0.

Chenxi Yan (Chenxi Yan@Chengdu University of Information Technology) implemented and maintained the backend for ChatHaruhi 1.0.

Junyi Shen (Junyi Shen@Zhejiang University) implemented the training code and participated in generating the training dataset.

Hao Wang (Hao Wang) collected script data for a TV series and participated in data augmentation.

Weishi Mi (Weishi MI@Tsinghua University) participated in data augmentation.

Aria Fei (Aria Fei@BJUT) implemented the ASR feature for the script tool and participated in the Openness-Aware Personality paper project.

Xiaoyang Feng (Xiaoyang Feng@Nanjing Agricultural University) integrated the script recognition tool and participated in the Openness-Aware Personality paper project.

Yue Leng (Song Yan) collected data from The Big Bang Theory and implemented script format conversion.

scixing (HaoSheng Wang) implemented voiceprint recognition in the script tool and TTS-VITS speech synthesis.

Linkang Zhan (JunityZhan@Case Western Reserve University) collected Genshin Impact's system prompts and story data.

Yaokai Jia (Yaokai Jia) implemented the Vue frontend and practiced GPU extraction of BERT in a psychology project.

Pingyu Wu (Pingyu Wu@Juncai Shuyun) helped deploy the first version of the training code.

Haozhen Sun (Haozhen Sun@Tianjin University) plotted the character figures for ChatHaruhi.

Chat-Haruhi-Suzumiya is one of the subprojects of Luotuo, which was initiated by Cheng Li, Ziang Leng, and Qiyuan Chen.

This project is a work in progress. With the release of the arXiv version, we will publish a dataset covering 32 characters and 52K dialogues, along with the corresponding local model and the ChatHaruhi 1.0 inference code, within a week. We will then begin refactoring the project for ChatHaruhi 2.0.

This project is licensed under Apache 2.0, which permits commercial use. However, you still need to comply with other relevant agreements, including:

  • The copyright of the character roles themselves.

  • The terms of any APIs used in the project, such as OpenAI's agreement.

  • The licenses of any models used in the project (for example, if we later adopt models such as LLaMA or GLM).

Quick Start

For English users, we suggest trying the 95 Eng-Character demo first.

To get started with the ChatHaruhi project, you can directly run the following Colab notebooks:

| Name | Colab Link | Description |
|---|---|---|
| ChatHaruhi2.0 (code) | Open In Colab | The OpenAI version of ChatHaruhi2.0 is already running |
| Qwen-1.8B | Open In Colab | Role-playing with finetuned Qwen-1.8B |
| ChatHaruhi2.0 Demo | Huggingface Gradio | Hugging Face demo (OpenAI as LLM) |
| 95 Eng-Character | Huggingface Gradio | 95 English characters adapted from RoleLLM |
| ChatHaruhi2.0 Demo | Huggingface Gradio | Hugging Face demo (GLMPro as LLM) |
| ChatHaruhi2.0 Demo | Huggingface Gradio | Hugging Face demo (Xunfei Spark 讯飞星火 as LLM) |
| Prototype of StoryTeller | Huggingface Gradio | Prototype of StoryTeller |
| Fine-tuning (English) | Open In Colab | Tuning code for the English small model Phi-1.5 |
| Fine-tuning (Chinese) | Open In Colab | Tuning code for the Chinese small model Qwen-1.8B |
| ChatHaruhi1.0 | Open In Colab | Integrated client that supports character switching |

ChatHaruhi 2.0 code can already be installed via pip.

News

[2023-10-20] Added support for 95 English character roles adapted from the RoleLLM work; we plan to train a LLaMA2 version later. The ChatHaruhi 2.0 repository also supports Baidu's and BaiChuan's APIs; an HF demo will be launched for everyone to try later.

[2023-09-03] ChatHaruhi 2.0 supports role-playing specific characters downloaded from HuggingFace.

[2023-08-29] Inference code for ChatGLM2-LoRA released: Open In Colab

[2023-08-28] Support for ChatHaruhi2.0 with OpenAI, Xunfei, and GLMPro has been completed, and the corresponding Hugging Face demos have been launched.

[2023-06-07] Chat Haruhi Suzumiya won the second prize (top 3) in the Create@AI Hackathon hosted by the ModelScope Community, co-sponsored by Alibaba Cloud and NVIDIA, and co-organized by Tianchi. video

[2023-06-03] Honored with the second prize (top 3) and gave an oral presentation on July 17 at the CAAI 8th Big Data and Social Computing conference (BDSC 2023), Urumqi, China, July 15–17, 2023. For more details: link

Demo Video

The VITS model used in the video was generously provided by the Haruhi Suzumiya Support Group. We are still refining the performance. Please note that this video contains audio 📢.

My.Movie540.mp4

Content

ChatHaruhi2

For the convenience of future research, the refactored ChatHaruhi 2.0 can now be installed via pip. Version 2.0 currently drops the image and sound features, which will be reintroduced in our follow-up research. You can install it as follows:

pip -q install transformers openai tiktoken langchain chromadb zhipuai chatharuhi 

And call it like this:

from chatharuhi import ChatHaruhi

chatbot = ChatHaruhi(
    role_name = 'haruhi',
    llm = 'openai'
)

response = chatbot.chat(role='阿虚', text='I see the new baseball season is about to start! Should we participate?')

print(response)
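
If you want to keep chatting with the character, you can wrap the same call in a loop. The snippet below is only a minimal sketch, under the assumption that the ChatHaruhi object manages the conversation state between chat() calls; it uses no API beyond the role/text arguments shown above.

```python
from chatharuhi import ChatHaruhi

# Minimal interactive loop (sketch). Assumes the ChatHaruhi object keeps
# its own conversation state between chat() calls.
chatbot = ChatHaruhi(
    role_name='haruhi',
    llm='openai'
)

while True:
    user_text = input('阿虚: ')   # type an empty line to stop
    if not user_text.strip():
        break
    # role is the speaker addressing the character, text is the utterance
    response = chatbot.chat(role='阿虚', text=user_text)
    print(response)
```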


ChatHaruhi now supports directly dragging and dropping chatbot databases in our specified format from Hugging Face.

from chatharuhi import ChatHaruhi

chatbot = ChatHaruhi(
  role_from_hf = 'chengli-thu/linghuchong', 
  llm = 'openai')

response = chatbot.chat(role='小师妹', text = '冲哥。')
print(response)

For 95 English characters in RoleLLM, you can call them like this:

chatbot = ChatHaruhi(
  role_from_hf = 'silk-road/ChatHaruhi-from-RoleLLM/Jack-Sparrow',
  llm = 'openai', 
  embedding = 'bge_en')
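
As with the Haruhi example above, you can then query the character through the same chat() interface; the speaker name 'Will Turner' below is only a hypothetical example, not a value documented by the project.

```python
# Hypothetical follow-up query: 'Will Turner' is an example speaker name,
# not one defined by the project.
response = chatbot.chat(role='Will Turner', text='Captain, where are we heading next?')
print(response)
```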

More documentation and code can be found at https://github.com/LC1332/Haruhi-2-Dev

Quick Start of Each Demo

| Name | Colab Link | Description |
|---|---|---|
| ChatHaruhi 1.0 | Open In Colab | A functionally integrated client capable of supporting role switching |
| Genesis | Open In Colab | The first Gradio chat developed by Lulu Li |
| Baidu Studio Version | Baidu Studio Version | A simplified Baidu Studio version developed by DataWhale teaching assistant Qijun Ma |
| HuggingFace Version | Huggingface Gradio | HuggingFace version |
| Personality - College entrance exam essay | Open In Colab | College entrance exam essay generator tailored to high or low openness personalities, link |
| Personality - Chatbot | Open In Colab | Chatbot corresponding to high/low openness personality, link |
| Chat Megumi | Open In Colab | Chat Megumi was created using a corpus collected by community friends |

Previous News

[2023-08-22] Dataset Released on Hugging Face

[2023-08-21] ChatHaruhi tech report on arXiv.

Tutorial Video in Chinese

| Video | Description |
|---|---|
| Roadmap in 5 minutes | AI Hackathon of ModelScope on Bilibili |
| DataWhale Presentation | Instructional video created for a DataWhale assignment |
| Script Tool Tutorial | Step-by-step guide to using the yuki_builder scripting tool |
| Character Data Format Tutorial | Tutorial on the character data format and converting text files to configuration files |
| ModelScope Tutorial in 40 minutes | 40-minute entry-level tutorial, with an additional 40 minutes of discussion and Q&A |

TODO and Feature

TODO:

  • train the model on the original corpus of 22k stories
  • release the technical report on arXiv
  • release the local inference code
  • release the trained model with 52k data
  • support ChatHaruhi 2.0 with both local models and OpenAI, and update it on GitHub
  • quick install with pip

Honors

  • 🏆 Chat Haruhi Suzumiya won the second prize (top 3) in the Create@AI Hackathon hosted by the ModelScope Community, co-sponsored by Alibaba Cloud and NVIDIA, and co-organized by Tianchi. video

  • 🏆 Honored with the second prize (top 3) and gave an oral presentation on July 17 at the CAAI 8th Big Data and Social Computing conference (BDSC 2023), Urumqi, China, July 15–17, 2023. For more details

SponsorShip

Because Chat Haruhi Suzumiya adopts a strategy similar to CoT, it is 10-20 times more expensive to run than usual. Currently, API tokens are supported by community donations.

In addition, we are actively looking for GPUs (A100, A800). If you are willing to donate, please contact us. We greatly appreciate any support to help keep Chat Haruhi Suzumiya running.

If you are interested in sponsoring the Luotuo Project, please click on the major project or view the sponsorship form.


Contributors

Citation

Please cite this repo if you use its data or code.

@misc{li2023chatharuhi,
      title={ChatHaruhi: Reviving Anime Character in Reality via Large Language Model}, 
      author={Cheng Li and Ziang Leng and Chenxi Yan and Junyi Shen and Hao Wang and Weishi MI and Yaying Fei and Xiaoyang Feng and Song Yan and HaoSheng Wang and Linkang Zhan and Yaokai Jia and Pingyu Wu and Haozhen Sun},
      year={2023},
      eprint={2308.09597},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Star History Chart

If you have any suggestions for the project, such as the interface design of ChatHaruhi 2.0, or if you want to add references to a future version of this report, please submit an issue.