Unable to compile on windows using standard go installation #188

Closed
FairyTail2000 opened this issue Jul 24, 2023 · 21 comments

@FairyTail2000

FairyTail2000 commented Jul 24, 2023

Steps I followed:

  • I installed the newest GoLang using winget
  • I cloned the repo
  • I executed go build .
  • After initial library download build fails with the error message:

# github.com/jmorganca/ollama/server
server\routes.go:54:20: undefined: llama.New

  • I then checked out tags/v0.0.11
  • Same error
  • Downloaded release zip source
  • Same error
  • Opening in vscode
  • It also shows the error and the import "github.com/jmorganca/ollama/llama" resolves to llama/utils.go

Did I go in the wrong direction at any point?

@Gregory-Ledray

I am not a maintainer but may be able to help.

Other Issues reference getting ollama to run on WSL. Have you tried that?

For compiling, look at: https://github.com/jmorganca/ollama/blob/main/Dockerfile

Notice the line:

RUN CGO_ENABLED=1 go build -ldflags '-linkmode external -extldflags "-static"' .

Try running the Windows equivalent of that command.
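A rough PowerShell equivalent might look like this (an untested sketch; it assumes a C compiler such as MinGW-w64 gcc is already on your PATH):

$env:CGO_ENABLED = "1"
go build -ldflags '-linkmode external -extldflags "-static"' .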

@mxyng
Contributor

mxyng commented Jul 24, 2023

Hi @FairyTail2000 do you have a C/C++ compiler installed and CGO_ENABLED=1 set? Both are required to compile from source.
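For example, in PowerShell you can check both with something like this (just a quick sketch):

gcc --version          # should print a version if a C compiler is on PATH
go env CGO_ENABLED     # should print 1
$env:CGO_ENABLED = "1" # set it for the current session if it printed 0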

@BSChuang

BSChuang commented Aug 3, 2023

I ran into the same issue. Installing a C compiler fixed the problem.

@tomzorz

tomzorz commented Aug 5, 2023

@BSChuang gcc, or is there a Go-specific one? I've been trying the commands above but haven't managed to get it running. If anyone could share a batch script that successfully builds/runs, that'd be appreciated :)

@FairyTail2000
Author

Hi @FairyTail2000 do you have a C/C++ compiler installed and CGO_ENABLED=1 set? Both are required to compile from source.

Hi, sorry for not replying; I hadn't seen your answer. I have the Windows C compiler installed, if I remember correctly, but I will check again on Monday.

@FairyTail2000
Author

Okay @Gregory-Ledray, @mxyng and @tomzorz, here is the solution that worked for me (maybe you can put it into the readme):

  • Install gcc from winlibs: scroll down to Downloads and use the topmost download (for me, GCC 13.2.0 with POSIX threads). Extract it and add the bin folder to $PATH
  • Then, in the cloned ollama repo:
$env:CGO_ENABLED = 1
go build -ldflags '-linkmode external -extldflags "-static"' .

That did the trick for me
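@tomzorz since you asked for a script, here is a rough end-to-end PowerShell sketch of the same steps (the MinGW path is only an example; point it at wherever you extracted the winlibs archive):

# example path to the extracted winlibs toolchain - adjust as needed
$env:PATH = "C:\mingw64\bin;$env:PATH"
$env:CGO_ENABLED = "1"
git clone https://github.com/jmorganca/ollama.git
cd ollama
go build -ldflags '-linkmode external -extldflags "-static"' .
.\ollama.exe --help   # verify the binary was produced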

@dcasota
Contributor

dcasota commented Aug 16, 2023

Hi,

I had the same 'undefined: llama.New' issue and found this thread. The recipe - installing gcc 13.2.0 and specifying ldflags - did help me get one step further. However, the build still stops with errors.

PS C:\Users\dcaso\ollama> $env:CGO_ENABLED = 1
PS C:\Users\dcaso\ollama> go build -ldflags '-linkmode external -extldflags "-static"' .
# github.com/jmorganca/ollama/llm
ggml-alloc.c: In function 'ggml_allocr_alloc':
ggml-alloc.c:155:70: warning: unknown conversion type character 'z' in format [-Wformat=]
  155 |         fprintf(stderr, "%s: not enough space in the buffer (needed %zu, largest block available %zu)\n",
      |                                                                      ^
ggml-alloc.c:155:99: warning: unknown conversion type character 'z' in format [-Wformat=]
  155 |         fprintf(stderr, "%s: not enough space in the buffer (needed %zu, largest block available %zu)\n",
      |                                                                                                   ^
ggml-alloc.c:155:25: warning: too many arguments for format [-Wformat-extra-args]
  155 |         fprintf(stderr, "%s: not enough space in the buffer (needed %zu, largest block available %zu)\n",
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# github.com/jmorganca/ollama/llm
In file included from llama.cpp:35:
llama-util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool, bool)':
llama-util.h:303:71: error: 'PWIN32_MEMORY_RANGE_ENTRY' has not been declared
  303 |             BOOL (WINAPI *pPrefetchVirtualMemory) (HANDLE, ULONG_PTR, PWIN32_MEMORY_RANGE_ENTRY, ULONG);
      |                                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~
llama-util.h:310:38: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'BOOL (*)(HANDLE, ULONG_PTR, int, ULONG)' {aka 'int (*)(void*, long long unsigned int, int, long unsigned int)'} [-Wcast-function-type]
  310 |             pPrefetchVirtualMemory = reinterpret_cast<decltype(pPrefetchVirtualMemory)> (GetProcAddress(hKernel32, "PrefetchVirtualMemory"));
      |                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama-util.h:314:17: error: 'WIN32_MEMORY_RANGE_ENTRY' was not declared in this scope
  314 |                 WIN32_MEMORY_RANGE_ENTRY range;
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
llama-util.h:315:17: error: 'range' was not declared in this scope
  315 |                 range.VirtualAddress = addr;
      |                 ^~~~~
PS C:\Users\dcaso\ollama> gcc --version
gcc.exe (MinGW-W64 x86_64-ucrt-posix-seh, built by Brecht Sanders) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

PS C:\Users\dcaso\ollama> go version
go version go1.21.0 windows/amd64

C:\Users\dcaso>ver
Microsoft Windows [Version 10.0.22621.2134]

@FairyTail2000
Author

This seems to be a problem with the v0.0.15 tag, because that's broken for me too.

Use the following command to check out v0.0.14:

git checkout tags/v0.0.14

@kbimplis

kbimplis commented Aug 18, 2023

I had to edit llm/llama-util.h and add

#ifdef _WIN32

#pragma comment(lib,"kernel32.lib")

typedef struct _WIN32_MEMORY_RANGE_ENTRY {
    void* VirtualAddress;
    size_t NumberOfBytes;
} WIN32_MEMORY_RANGE_ENTRY, *PWIN32_MEMORY_RANGE_ENTRY;

#endif

to make it work, along with @FairyTail2000's instructions:

$env:CGO_ENABLED = 1
go build -ldflags '-linkmode external -extldflags "-static"' .

(I got the idea from ggerganov/llama.cpp#890)

@valin4tor

valin4tor commented Aug 19, 2023

Here's the full updated llm/llama-util.h file, based on @kbimplis's comment and the v0.0.15 tag:

/**
 * llama.cpp - git 3ebb00935f3f0522b75df49c2769ab1774b91380
 *
 * MIT License
 *
 * Copyright (c) 2023 Georgi Gerganov
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

// Internal header to be included only by llama.cpp.
// Contains wrappers around OS interfaces.

#ifndef LLAMA_UTIL_H
#define LLAMA_UTIL_H

#include <cstdio>
#include <cstdint>
#include <cerrno>
#include <cstring>
#include <cstdarg>
#include <cstdlib>
#include <climits>

#include <string>
#include <vector>
#include <stdexcept>

#ifdef __has_include
    #if __has_include(<unistd.h>)
        #include <unistd.h>
        #if defined(_POSIX_MAPPED_FILES)
            #include <sys/mman.h>
        #endif
        #if defined(_POSIX_MEMLOCK_RANGE)
            #include <sys/resource.h>
        #endif
    #endif
#endif

#if defined(_WIN32)
    #define WIN32_LEAN_AND_MEAN
    #ifndef NOMINMAX
        #define NOMINMAX
    #endif
    #include <windows.h>
    #include <io.h>
    #include <stdio.h> // for _fseeki64

    #pragma comment(lib,"kernel32.lib")
    typedef struct _WIN32_MEMORY_RANGE_ENTRY {
        void* VirtualAddress;
        size_t NumberOfBytes;
    } WIN32_MEMORY_RANGE_ENTRY, *PWIN32_MEMORY_RANGE_ENTRY;
#endif

#define LLAMA_ASSERT(x) \
    do { \
        if (!(x)) { \
            fprintf(stderr, "LLAMA_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
            abort(); \
        } \
    } while (0)

#ifdef __GNUC__
#ifdef __MINGW32__
__attribute__((format(gnu_printf, 1, 2)))
#else
__attribute__((format(printf, 1, 2)))
#endif
#endif
static std::string format(const char * fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int size = vsnprintf(NULL, 0, fmt, ap);
    LLAMA_ASSERT(size >= 0 && size < INT_MAX);
    std::vector<char> buf(size + 1);
    int size2 = vsnprintf(buf.data(), size + 1, fmt, ap2);
    LLAMA_ASSERT(size2 == size);
    va_end(ap2);
    va_end(ap);
    return std::string(buf.data(), size);
}

struct llama_file {
    // use FILE * so we don't have to re-open the file to mmap
    FILE * fp;
    size_t size;

    llama_file(const char * fname, const char * mode) {
        fp = std::fopen(fname, mode);
        if (fp == NULL) {
            throw std::runtime_error(format("failed to open %s: %s", fname, strerror(errno)));
        }
        seek(0, SEEK_END);
        size = tell();
        seek(0, SEEK_SET);
    }

    size_t tell() const {
#ifdef _WIN32
        __int64 ret = _ftelli64(fp);
#else
        long ret = std::ftell(fp);
#endif
        LLAMA_ASSERT(ret != -1); // this really shouldn't fail
        return (size_t) ret;
    }

    void seek(size_t offset, int whence) {
#ifdef _WIN32
        int ret = _fseeki64(fp, (__int64) offset, whence);
#else
        int ret = std::fseek(fp, (long) offset, whence);
#endif
        LLAMA_ASSERT(ret == 0); // same
    }

    void read_raw(void * ptr, size_t len) const {
        if (len == 0) {
            return;
        }
        errno = 0;
        std::size_t ret = std::fread(ptr, len, 1, fp);
        if (ferror(fp)) {
            throw std::runtime_error(format("read error: %s", strerror(errno)));
        }
        if (ret != 1) {
            throw std::runtime_error(std::string("unexpectedly reached end of file"));
        }
    }

    std::uint32_t read_u32() {
        std::uint32_t ret;
        read_raw(&ret, sizeof(ret));
        return ret;
    }

    std::string read_string(std::uint32_t len) {
        std::vector<char> chars(len);
        read_raw(chars.data(), len);
        return std::string(chars.data(), len);
    }

    void write_raw(const void * ptr, size_t len) const {
        if (len == 0) {
            return;
        }
        errno = 0;
        size_t ret = std::fwrite(ptr, len, 1, fp);
        if (ret != 1) {
            throw std::runtime_error(format("write error: %s", strerror(errno)));
        }
    }

    void write_u32(std::uint32_t val) {
        write_raw(&val, sizeof(val));
    }

    ~llama_file() {
        if (fp) {
            std::fclose(fp);
        }
    }
};

// llama_context_data
struct llama_data_context {
    virtual void write(const void * src, size_t size) = 0;
    virtual size_t get_size_written() = 0;
    virtual ~llama_data_context() = default;
};

struct llama_data_buffer_context : llama_data_context {
    uint8_t* ptr;
    size_t size_written = 0;

    llama_data_buffer_context(uint8_t * p) : ptr(p) {}

    void write(const void * src, size_t size) override {
        memcpy(ptr, src, size);
        ptr += size;
        size_written += size;
    }

    size_t get_size_written() override {
        return size_written;
    }
};

struct llama_data_file_context : llama_data_context {
    llama_file* file;
    size_t size_written = 0;

    llama_data_file_context(llama_file * f) : file(f) {}

    void write(const void * src, size_t size) override {
        file->write_raw(src, size);
        size_written += size;
    }

    size_t get_size_written() override {
        return size_written;
    }
};

#if defined(_WIN32)
static std::string llama_format_win_err(DWORD err) {
    LPSTR buf;
    size_t size = FormatMessageA(FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                                 NULL, err, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPSTR)&buf, 0, NULL);
    if (!size) {
        return "FormatMessageA failed";
    }
    std::string ret(buf, size);
    LocalFree(buf);
    return ret;
}
#endif

struct llama_mmap {
    void * addr;
    size_t size;

    llama_mmap(const llama_mmap &) = delete;

#ifdef _POSIX_MAPPED_FILES
    static constexpr bool SUPPORTED = true;

    llama_mmap(struct llama_file * file, size_t prefetch = (size_t) -1 /* -1 = max value */, bool numa = false) {
        size = file->size;
        int fd = fileno(file->fp);
        int flags = MAP_SHARED;
        // prefetch/readahead impairs performance on NUMA systems
        if (numa) { prefetch = 0; }
#ifdef __linux__
        if (prefetch >= file->size) { flags |= MAP_POPULATE; }
#endif
        addr = mmap(NULL, file->size, PROT_READ, flags, fd, 0);
        if (addr == MAP_FAILED) {
            throw std::runtime_error(format("mmap failed: %s", strerror(errno)));
        }

        if (prefetch > 0) {
            // Advise the kernel to preload the mapped memory
            if (madvise(addr, std::min(file->size, prefetch), MADV_WILLNEED)) {
                fprintf(stderr, "warning: madvise(.., MADV_WILLNEED) failed: %s\n",
                        strerror(errno));
            }
        }
        if (numa) {
            // advise the kernel not to use readahead
            // (because the next page might not belong on the same node)
            if (madvise(addr, file->size, MADV_RANDOM)) {
                fprintf(stderr, "warning: madvise(.., MADV_RANDOM) failed: %s\n",
                        strerror(errno));
            }
        }
    }

    ~llama_mmap() {
        munmap(addr, size);
    }
#elif defined(_WIN32)
    static constexpr bool SUPPORTED = true;

    llama_mmap(struct llama_file * file, bool prefetch = true, bool numa = false) {
        (void) numa;

        size = file->size;

        HANDLE hFile = (HANDLE) _get_osfhandle(_fileno(file->fp));

        HANDLE hMapping = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        DWORD error = GetLastError();

        if (hMapping == NULL) {
            throw std::runtime_error(format("CreateFileMappingA failed: %s", llama_format_win_err(error).c_str()));
        }

        addr = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
        error = GetLastError();
        CloseHandle(hMapping);

        if (addr == NULL) {
            throw std::runtime_error(format("MapViewOfFile failed: %s", llama_format_win_err(error).c_str()));
        }

        if (prefetch) {
            // The PrefetchVirtualMemory API is only present on Windows 8 and above, so we
            // will dynamically load it using GetProcAddress.
            BOOL (WINAPI *pPrefetchVirtualMemory) (HANDLE, ULONG_PTR, PWIN32_MEMORY_RANGE_ENTRY, ULONG);
            HMODULE hKernel32;

            // This call is guaranteed to succeed.
            hKernel32 = GetModuleHandleW(L"kernel32.dll");

            // This call may fail if on a pre-Win8 system.
            pPrefetchVirtualMemory = reinterpret_cast<decltype(pPrefetchVirtualMemory)> (GetProcAddress(hKernel32, "PrefetchVirtualMemory"));

            if (pPrefetchVirtualMemory) {
                // Advise the kernel to preload the mapped memory.
                WIN32_MEMORY_RANGE_ENTRY range;
                range.VirtualAddress = addr;
                range.NumberOfBytes = (SIZE_T)size;
                if (!pPrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0)) {
                    fprintf(stderr, "warning: PrefetchVirtualMemory failed: %s\n",
                            llama_format_win_err(GetLastError()).c_str());
                }
            }
        }
    }

    ~llama_mmap() {
        if (!UnmapViewOfFile(addr)) {
            fprintf(stderr, "warning: UnmapViewOfFile failed: %s\n",
                    llama_format_win_err(GetLastError()).c_str());
        }
    }
#else
    static constexpr bool SUPPORTED = false;

    llama_mmap(struct llama_file *, bool prefetch = true, bool numa = false) {
        (void) prefetch;
        (void) numa;

        throw std::runtime_error(std::string("mmap not supported"));
    }
#endif
};

// Represents some region of memory being locked using mlock or VirtualLock;
// will automatically unlock on destruction.
struct llama_mlock {
    void * addr = NULL;
    size_t size = 0;
    bool failed_already = false;

    llama_mlock() {}
    llama_mlock(const llama_mlock &) = delete;

    ~llama_mlock() {
        if (size) {
            raw_unlock(addr, size);
        }
    }

    void init(void * ptr) {
        LLAMA_ASSERT(addr == NULL && size == 0);
        addr = ptr;
    }

    void grow_to(size_t target_size) {
        LLAMA_ASSERT(addr);
        if (failed_already) {
            return;
        }
        size_t granularity = lock_granularity();
        target_size = (target_size + granularity - 1) & ~(granularity - 1);
        if (target_size > size) {
            if (raw_lock((uint8_t *) addr + size, target_size - size)) {
                size = target_size;
            } else {
                failed_already = true;
            }
        }
    }

#ifdef _POSIX_MEMLOCK_RANGE
    static constexpr bool SUPPORTED = true;

    size_t lock_granularity() {
        return (size_t) sysconf(_SC_PAGESIZE);
    }

    #ifdef __APPLE__
        #define MLOCK_SUGGESTION \
            "Try increasing the sysctl values 'vm.user_wire_limit' and 'vm.global_user_wire_limit' and/or " \
            "decreasing 'vm.global_no_user_wire_amount'.  Also try increasing RLIMIT_MLOCK (ulimit -l).\n"
    #else
        #define MLOCK_SUGGESTION \
            "Try increasing RLIMIT_MLOCK ('ulimit -l' as root).\n"
    #endif

    bool raw_lock(const void * addr, size_t size) {
        if (!mlock(addr, size)) {
            return true;
        } else {
            char* errmsg = std::strerror(errno);
            bool suggest = (errno == ENOMEM);

            // Check if the resource limit is fine after all
            struct rlimit lock_limit;
            if (suggest && getrlimit(RLIMIT_MEMLOCK, &lock_limit))
                suggest = false;
            if (suggest && (lock_limit.rlim_max > lock_limit.rlim_cur + size))
                suggest = false;

            fprintf(stderr, "warning: failed to mlock %zu-byte buffer (after previously locking %zu bytes): %s\n%s",
                    size, this->size, errmsg, suggest ? MLOCK_SUGGESTION : "");
            return false;
        }
    }

    #undef MLOCK_SUGGESTION

    void raw_unlock(void * addr, size_t size) {
        if (munlock(addr, size)) {
            fprintf(stderr, "warning: failed to munlock buffer: %s\n", std::strerror(errno));
        }
    }
#elif defined(_WIN32)
    static constexpr bool SUPPORTED = true;

    size_t lock_granularity() {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        return (size_t) si.dwPageSize;
    }

    bool raw_lock(void * ptr, size_t len) {
        for (int tries = 1; ; tries++) {
            if (VirtualLock(ptr, len)) {
                return true;
            }
            if (tries == 2) {
                fprintf(stderr, "warning: failed to VirtualLock %zu-byte buffer (after previously locking %zu bytes): %s\n",
                    len, size, llama_format_win_err(GetLastError()).c_str());
                return false;
            }

            // It failed but this was only the first try; increase the working
            // set size and try again.
            SIZE_T min_ws_size, max_ws_size;
            if (!GetProcessWorkingSetSize(GetCurrentProcess(), &min_ws_size, &max_ws_size)) {
                fprintf(stderr, "warning: GetProcessWorkingSetSize failed: %s\n",
                        llama_format_win_err(GetLastError()).c_str());
                return false;
            }
            // Per MSDN: "The maximum number of pages that a process can lock
            // is equal to the number of pages in its minimum working set minus
            // a small overhead."
            // Hopefully a megabyte is enough overhead:
            size_t increment = len + 1048576;
            // The minimum must be <= the maximum, so we need to increase both:
            min_ws_size += increment;
            max_ws_size += increment;
            if (!SetProcessWorkingSetSize(GetCurrentProcess(), min_ws_size, max_ws_size)) {
                fprintf(stderr, "warning: SetProcessWorkingSetSize failed: %s\n",
                        llama_format_win_err(GetLastError()).c_str());
                return false;
            }
        }
    }

    void raw_unlock(void * ptr, size_t len) {
        if (!VirtualUnlock(ptr, len)) {
            fprintf(stderr, "warning: failed to VirtualUnlock buffer: %s\n",
                    llama_format_win_err(GetLastError()).c_str());
        }
    }
#else
    static constexpr bool SUPPORTED = false;

    size_t lock_granularity() {
        return (size_t) 65536;
    }

    bool raw_lock(const void * addr, size_t len) {
        fprintf(stderr, "warning: mlock not supported on this system\n");
        return false;
    }

    void raw_unlock(const void * addr, size_t len) {}
#endif
};

// Replacement for std::vector<uint8_t> that doesn't require zero-initialization.
struct llama_buffer {
    uint8_t * addr = NULL;
    size_t size = 0;

    llama_buffer() = default;

    void resize(size_t len) {
#ifdef GGML_USE_METAL
        free(addr);
        int result = posix_memalign((void **) &addr, getpagesize(), len);
        if (result == 0) {
            memset(addr, 0, len);
        }
        else {
            addr = NULL;
        }
#else
        delete[] addr;
        addr = new uint8_t[len];
#endif
        size = len;
    }

    ~llama_buffer() {
#ifdef GGML_USE_METAL
        free(addr);
#else
        delete[] addr;
#endif
        addr = NULL;
    }

    // disable copy and move
    llama_buffer(const llama_buffer&) = delete;
    llama_buffer(llama_buffer&&) = delete;
    llama_buffer& operator=(const llama_buffer&) = delete;
    llama_buffer& operator=(llama_buffer&&) = delete;
};

#ifdef GGML_USE_CUBLAS
#include "ggml-cuda.h"
struct llama_ctx_buffer {
    uint8_t * addr = NULL;
    bool is_cuda;
    size_t size = 0;

    llama_ctx_buffer() = default;

    void resize(size_t size) {
        free();

        addr = (uint8_t *) ggml_cuda_host_malloc(size);
        if (addr) {
            is_cuda = true;
        }
        else {
            // fall back to pageable memory
            addr = new uint8_t[size];
            is_cuda = false;
        }
        this->size = size;
    }

    void free() {
        if (addr) {
            if (is_cuda) {
                ggml_cuda_host_free(addr);
            }
            else {
                delete[] addr;
            }
        }
        addr = NULL;
    }

    ~llama_ctx_buffer() {
        free();
    }

    // disable copy and move
    llama_ctx_buffer(const llama_ctx_buffer&) = delete;
    llama_ctx_buffer(llama_ctx_buffer&&) = delete;
    llama_ctx_buffer& operator=(const llama_ctx_buffer&) = delete;
    llama_ctx_buffer& operator=(llama_ctx_buffer&&) = delete;
};
#else
typedef llama_buffer llama_ctx_buffer;
#endif

#endif

I also needed to install GCC as per @FairyTail2000's comment, but I didn't need to build in any special way. I just ran:

go build .

as described in the docs, without needing to specify any additional environment variables.

@FairyTail2000
Author

@valerie-makes yes, that's correct; however, the binary will be bigger and/or slower since fewer optimizations are applied. You can view your build as a "development" build, while the "special configuration" is the "production" build.

@dcasota
Contributor

dcasota commented Aug 23, 2023

FYI, on W11 neither v0.0.14 nor v0.0.15 works. The updated version of llama-util.h results in a slightly different issue.

In file included from llama.cpp:35:
llama-util.h:66: warning: ignoring '#pragma comment ' [-Wunknown-pragmas]
   66 | #pragma comment(lib,"kernel32.lib")
      |
llama-util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool, bool)':
llama-util.h:316:38: warning: cast between incompatible function types from 'FARPROC' {aka 'long long int (*)()'} to 'BOOL (*)(HANDLE, ULONG_PTR, PWIN32_MEMORY_RANGE_ENTRY, ULONG)' {aka 'int (*)(void*, long long unsigned int, _WIN32_MEMORY_RANGE_ENTRY*, long unsigned int)'} [-Wcast-function-type]
  316 |             pPrefetchVirtualMemory = reinterpret_cast<decltype(pPrefetchVirtualMemory)> (GetProcAddress(hKernel32, "PrefetchVirtualMemory"));

[...]

@valin4tor

@dcasota it does work for me on Windows 11 after the code changes. Also, the output you've included doesn't show an error, only a warning, which doesn't prevent compilation. Maybe check whether an error is actually occurring and edit your comment?

@dcasota
Contributor

dcasota commented Aug 25, 2023

Hi @valerie-makes, good to know that the build works on W11 as well.
Not sure why, but in my environment (W11 + Python 3.9 or 3.11 + go1.21.0), the go build does not work yet.
Output.txt

@FairyTail2000
Author

@dcasota the second output looks fine. If you now run ls or Get-ChildItem you will see that an ollama.exe has been created.
This does not prevent compilation:
warning

This does:
error

@dcasota
Contributor

dcasota commented Aug 28, 2023

@FairyTail2000 yes, true - I apologize - "the go build does not work yet" was unclear.

The executable has indeed been created, but the setup needed to use ollama successfully afterwards does not work yet.

Here is the install script used, with a few prerequisites:
script.txt

Edited, August 29th 2023, remarks (the attached script above has been modified, too):

pip install ollama behaves differently during setup: the current site package, ollama-0.0.9, creates the .ollama directory as needed.
This is not the case with a git clone setup.
Suggested workaround from @FairyTail2000: create the .ollama directory manually.

There also seems to be an issue after cleaning up the ollama and .ollama directories. I get:
"Couldn't find 'C:\Users\dcaso.ollama\id_ed25519'. Generating new private key.
Error: open C:\Users\dcaso.ollama\id_ed25519: The system cannot find the path specified."

I haven't found out yet how to fix this.

In addition, the Windows setup does not use an existing GPU. Of course, that is not the focus of this issue.
With the recipe from @valerie-makes (thanks!), the setup works, with the known warnings, and without any cleanup.

According to #259, modifying ollama/api/types.go with MainGPU: 0 and NumGPU: 8,
and ollama/llm/llama.go with

#cgo opencl CFLAGS: -DGGML_USE_CLBLAST 
#cgo opencl CPPFLAGS: -DGGML_USE_CLBLAST 
#cgo opencl LDFLAGS: -lOpenCL -lclblast

should be enough; afterwards, rerun go build with the opencl tag to make use of an existing GPU:

go build --tags opencl -ldflags '-linkmode external -extldflags \"-static\"' .

These are the current findings.

@FairyTail2000
Author

@dcasota you need to manually create a .ollama folder in your user directory. The easiest way is to open a terminal in your home directory and type "mkdir .ollama", or use Explorer to create the directory.
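For example, from any PowerShell prompt (this simply creates the folder in your user profile):

mkdir "$env:USERPROFILE\.ollama"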

@Sanyin18

Sanyin18 commented Sep 1, 2023

Hi,

I had the same 'undefined: llama.New' issue and found this thread. The recipe - installing gcc 13.2.0 and specifying ldflags - did help to get one step further. However, the build still stops with issues.

[quoted build output omitted; it is identical to @dcasota's comment above]

Have you solved this problem now?

@dcasota
Contributor

dcasota commented Sep 1, 2023

Hi @Sanyin18,

ollama.exe has been buildable in my lab if:

  • using python 3.9
  • using ollama version v0.0.14 (git clone -b v0.0.14)
  • patching llama-util.h with the version from @valerie-makes

Buildable means:

  • still warnings
  • no gpu support
  • not all examples are runnable (Python version dependencies of packages, package availability on Windows)

v0.0.17, for instance, wasn't buildable.

We have to be patient. The authors do a great job, and they communicate quickly and accurately. That's a big plus for community beginners like me.

Hope this helps.
Daniel

@jmorganca
Member

Hi folks! We've recently updated how Ollama is built and it seems to build okay on Windows in our "lab" :). Note: GPU support is still a work in progress, but we're on it. We've recently fixed quite a few build and other minor issues on Windows, so it's worth another try if you're looking to hack on Ollama.

The easiest way to get started right now would be:

Then:

go generate ./...
go build .
./ollama.exe
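(For a MinGW-based setup like the one discussed earlier in this thread, the full PowerShell session would look roughly like the sketch below; whether CGO_ENABLED still needs to be set explicitly with the new build flow is an assumption here.)

$env:CGO_ENABLED = "1"
go generate ./...   # runs the code-generation / native build step
go build .
.\ollama.exe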

Will close this for now but do please re-open (and @me!) if you're still having issues.

@Shiftrdw

[quotes @jmorganca's comment above]

@jmorganca.

This worked well for me on W10. You just have to add a .ollama folder manually in your home dir after running go build .:

mkdir .ollama
