I'm not very familiar with Windows development, but here are some notes that I hope can help.
Please also refer to llama.cpp.
- Visual Studio Community installed with Desktop C++ Environment selected during installation
- Chocolatey (a package manager for Windows) installed
- CMake installed
- Python 3 installed
- LLaMA models downloaded (dalai can help)
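Before starting, it may help to confirm the tools are already on your PATH; these version checks are standard commands and safe to run in any PowerShell window:
cmake --version
git --version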
To install Make, open PowerShell as an administrator and run the following command:
choco install make
If Python is not installed, you can install it via Chocolatey as well:
choco install python
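After the installs finish, you may need to open a new PowerShell window so the PATH updates take effect; a quick check that both tools are picked up:
make --version
python --version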
Clone the repository using Git, or download it as a ZIP file and extract it to a directory on your machine.
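For example, to clone with Git (this assumes the upstream repository URL; use your fork's URL if you have one):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp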
Use Visual Studio to open the llama.cpp directory.
Select "View" and then "Terminal" to open a command prompt within Visual Studio. Type the following commands:
cmake .
make
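If you want to see what the configure step generated, listing the solution and project files should show quantize.vcxproj and ALL_BUILD.vcxproj among others, assuming CMake picked the Visual Studio generator:
dir *.sln, *.vcxproj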
In the Solution Explorer panel on the right-hand side:
- Right-click quantize.vcxproj and select Build. This outputs .\Debug\quantize.exe.
- Right-click ALL_BUILD.vcxproj and select Build. This outputs .\Debug\llama.exe.
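If you prefer to stay in the terminal, the same projects can usually be built with CMake directly; the Debug configuration here matches the output paths above:
cmake --build . --config Debug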
Back in the PowerShell terminal, cd to the llama.cpp directory. Assuming the LLaMA models have been downloaded to the models directory, run:
python -m venv venv
.\venv\Scripts\pip.exe install torch torchvision torchaudio sentencepiece numpy
.\venv\Scripts\python.exe convert-pth-to-ggml.py models/7B/ 1
.\Debug\quantize.exe ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
.\Debug\llama.exe -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
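As a quick usage example, you can also pass a prompt directly with the -p flag (the prompt text here is just an illustration):
.\Debug\llama.exe -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 -p "Building llama.cpp on Windows is"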