This is a transformers-library application that lets you choose a local LLM and run streaming inference on the GPU.
It uses:
- Python: 3.8.10
- transformers library: 4.36.2
- transformers_stream_generator library
The models are assumed to be in the oobabooga text-generation-webui models folder.
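As an illustration, here is a minimal sketch of how a model chooser might enumerate that folder. The path below is a hypothetical default, not part of this project; adjust `MODELS_DIR` to match your own installation:

```python
import os

# Hypothetical default location of the text-generation-webui models folder;
# change MODELS_DIR to wherever your copy of the UI lives.
MODELS_DIR = os.path.expanduser("~/text-generation-webui/models")

# Each model lives in its own subfolder, so a plain directory listing
# is enough to offer a choice of local LLMs.
models = sorted(
    entry for entry in os.listdir(MODELS_DIR)
    if os.path.isdir(os.path.join(MODELS_DIR, entry))
)
for index, name in enumerate(models):
    print(f"{index}: {name}")
```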
The openchat model is available at:
- https://huggingface.co/TheBloke/openchat-3.5-0106-GPTQ
- https://huggingface.co/sujitvasanth/TheBloke-openchat-3.5-0106-GPTQ
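For reference, below is a minimal streaming-inference sketch, assuming the `init_stream_support()` / `do_stream=True` pattern described in the transformers_stream_generator documentation; the prompt template follows the openchat-3.5 model card, and loading GPTQ weights through transformers additionally requires the `optimum` and `auto-gptq` packages. This is an illustration under those assumptions, not the application's exact code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_stream_generator import init_stream_support

# Patch transformers' generate() so it accepts do_stream=True.
init_stream_support()

# Either a Hugging Face repo id or a local folder under models/.
model_path = "TheBloke/openchat-3.5-0106-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# openchat-3.5 prompt template, per the model card.
prompt = "GPT4 Correct User: Hello!<|end_of_turn|>GPT4 Correct Assistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")

# With do_stream=True, generate() yields tokens one at a time
# instead of returning the full sequence at the end.
generator = model.generate(
    input_ids, do_stream=True, do_sample=True, max_new_tokens=256
)
for token in generator:
    print(tokenizer.decode(token, skip_special_tokens=True), end="", flush=True)
```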