shizheng-rlfresh/slm-rag

A very simple SvelteKit template for chatting with your own fine-tuned Transformers models.

This repo demonstrates a simple template that uses transformers.js, LangChain.js, and Deep Chat to build an in-browser chat demo.

Live Demo

download project

# clone the git repository
git clone https://github.com/shizheng-rlfresh/slm-rag.git
# go to the project directory and install dependencies
cd slm-rag
npm install

development mode

# start a dev server on localhost
npm run dev -- --open

# build the app
npm run build

# preview the production build
npm run preview

model options

  • To import Transformers models through transformers.js, you will need a .onnx model, e.g., model.onnx (preferably a quantized model, e.g., model_quantized.onnx).

    # create a python virtual environment
    python -m venv .venv
    # activate .venv and install required packages
    source .venv/bin/activate
    pip install -r requirements.txt
    # run the conversion script, replacing <modelid> with your model ID
    python -m scripts.convert --quantize --model_id <modelid>
  • Push your custom model to the Hugging Face Hub and arrange your repo's files so that the converted models are enclosed in an onnx directory; a sketch of the expected layout is shown below. Example hub repo file structure
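    A typical layout looks like the tree below: transformers.js expects the config and tokenizer files at the repo root and the converted .onnx files under onnx/. The exact file names depend on your model and tokenizer; this tree is illustrative, not the demo repo's literal contents.

    your-model-repo/
    ├── config.json
    ├── generation_config.json
    ├── tokenizer.json
    ├── tokenizer_config.json
    └── onnx/
        ├── model.onnx
        └── model_quantized.onnx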

  • In this demo, we used a custom gpt2-small (124M params) fine-tuned on a conversational dataset, namely oasst2. The model was fine-tuned on an NVIDIA Tesla T4 GPU for 20 epochs.

    // import models from the Hugging Face Hub
    import { pipeline } from '@xenova/transformers';
    // for the CuteChat demo, we used our own model
    const pipe = await pipeline('text-generation', 'shi-zheng-qxhs/gpt2_oasst2_curated_onnx');
    // for the PDF RAG demo, we used 'Qwen1.5-0.5B-Chat'
    // note: this can be a bit slow running in the browser
    const ragPipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');

    You can use either pipeline or model.generate, much as you would with transformers in Python. In chat.js, we use custom functions to process the user input and the model's generations; these can be modified to fit your own needs.
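    For reference, a minimal generation call with the pipe from the snippet above might look like the sketch below; the prompt and generation parameters are illustrative, not the repo's actual settings in chat.js.

    // run generation on a prompt string; the text-generation pipeline
    // returns an array of { generated_text } results
    const output = await pipe('Hello, how are you?', {
        max_new_tokens: 128,
        temperature: 0.7,
        do_sample: true
    });
    console.log(output[0].generated_text);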

  • Deep Chat allows a handler in its request config, which lets you serve models imported directly from transformers.js. chat.svelte shows an example of how we handled the custom functions, as well as how we used requestInterceptor and responseInterceptor to process the (user) input and (model-generated) output.
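    Below is a minimal sketch of such a handler, assuming the body/signals shapes from the Deep Chat docs; the actual wiring in chat.svelte is more involved.

    // a sketch: pass this as request={{ handler }} on the <deep-chat> element,
    // reusing the pipe created from transformers.js above
    const handler = async (body, signals) => {
        // body.messages holds the conversation; take the latest user message
        const userText = body.messages[body.messages.length - 1].text;
        const output = await pipe(userText);
        // hand the generated text back to Deep Chat for rendering
        signals.onResponse({ text: output[0].generated_text });
    };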

    LangChain.js does not support LLMs through transformers.js as of now (there are open issues on custom LLMs; it is not hard to implement one). In this demo code, we chose to use a vector store from LangChain.js (see ragloader.js) and a pipeline from transformers.js (see ragchat.js); a sketch of that split follows below.
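    Below is a minimal sketch of that split, assuming LangChain.js's in-memory vector store and the community transformers.js embeddings (module paths follow the LangChain.js docs and may drift across versions; chunks and question are placeholders, and the actual logic lives in ragloader.js and ragchat.js).

    import { pipeline } from '@xenova/transformers';
    import { MemoryVectorStore } from 'langchain/vectorstores/memory';
    import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';

    // index the PDF chunks in memory, embedded with a transformers.js embedding model
    const embeddings = new HuggingFaceTransformersEmbeddings({ modelName: 'Xenova/all-MiniLM-L6-v2' });
    const store = await MemoryVectorStore.fromTexts(chunks, chunks.map((_, i) => ({ id: i })), embeddings);

    // retrieve the closest chunks and stuff them into a prompt for the transformers.js pipeline
    const ragPipe = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');
    const docs = await store.similaritySearch(question, 4);
    const context = docs.map((d) => d.pageContent).join('\n');
    const answer = await ragPipe(`Context:\n${context}\n\nQuestion: ${question}`, { max_new_tokens: 256 });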

  • The RAG component is implemented in rag.svelte with Deep Chat.
