
Rewrite existing file issue #388

Closed
natural-chop opened this issue May 17, 2019 · 21 comments

Comments

@natural-chop

Hello! I'm using annoy 1.15.2 on Python 3.7.3. Here is the issue:

    >>> from annoy import AnnoyIndex
    >>> a = AnnoyIndex(1)
    >>> a.add_item(1, [1])
    >>> a.add_item(2, [2])
    >>> a.build(1)
    True
    >>> a.save("temp.ann")
    True
    >>> b = AnnoyIndex(1)
    >>> b.add_item(3, [3])
    >>> b.add_item(4, [4])
    >>> b.build(1)
    True
    >>> b.save("temp.ann")
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

Could you tell me why I get this error?

@erikbern
Collaborator

which operating system are you using?

@natural-chop
Author

> which operating system are you using?

Windows 10

@erikbern
Collaborator

ok, i think the problem is that windows doesn't let you write to files that are still open. temp.ann is still open by index a: after save(), the index loads (memory-maps) the file it just wrote.
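A stdlib-only sketch of that constraint (an illustration, not annoy's code): after `save()`, index `a` keeps `temp.ann` mapped, so the mapping has to be released before another index writes to the same path. With annoy itself, that would mean calling `a.unload()` before `b.save("temp.ann")`. Below, Python's `mmap` stands in for index `a`, and the file contents are placeholders.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "temp.ann")

    # Stand-in for a.save("temp.ann"): write the file, then keep it mapped,
    # the way index `a` keeps its saved file mapped afterwards.
    with open(path, "wb") as f:
        f.write(b"index a")
    f_a = open(path, "rb")
    mapped_a = mmap.mmap(f_a.fileno(), 0, access=mmap.ACCESS_READ)
    first = mapped_a[:]

    # Stand-in for a.unload(): release the mapping and the file handle.
    # On Windows, overwriting a file that is still mapped typically fails.
    mapped_a.close()
    f_a.close()

    # Stand-in for b.save("temp.ann"): the path can now be overwritten.
    with open(path, "wb") as f:
        f.write(b"index b")
    with open(path, "rb") as f:
        second = f.read()

print(first, second)  # b'index a' b'index b'
```

On POSIX systems the overwrite usually succeeds even without the release step; per the diagnosis above, skipping it on Windows is what triggers the error.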

@sskorol

sskorol commented Dec 12, 2019

@erikbern hi, can you clarify your point? I have the same issue when loading and building a model: I always get this OSError. At first I thought it was related to Windows path specifics, but I tried every option I could think of (absolute/relative paths, forward slashes/backslashes, etc.) with no luck.

Env: Win10, Python 3.7, miniconda

Also note that the file's size is 7 GB, and the same code works perfectly on macOS.

@erikbern
Collaborator

interesting. i've heard people report various issues with files larger than a few gigabytes. let me see if i can add a test case this weekend and fix it

@sskorol

sskorol commented Dec 12, 2019

@erikbern just noticed the following line in the logs when this error occurs:

    lseek returned -1

@erikbern
Collaborator

interesting, will take a look

@erikbern
Collaborator

@sskorol do you see any error message? there should have been an exception raised back to Python if lseek fails. See https://github.com/spotify/annoy/blob/master/src/annoylib.h#L1049

@sskorol

sskorol commented Dec 12, 2019

@erikbern just the same stack as above:

    lseek returned -1
    Traceback (most recent call last):
      File "D:/ML/model.py", line 200, in <module>
        associations = Associations()
      File "D:/ML/model.py", line 187, in __init__
        self.ranker = AnnoyRanker()
      File "D:/ML/model.py", line 129, in __init__
        super().__init__()
      File "D:/ML/model.py", line 77, in __init__
        self.load_annoy_vectors()
      File "D:/ML/model.py", line 98, in load_annoy_vectors
        self.annoy.load(self.annoy_vectors_path)
    OSError: Invalid argument

And the path is definitely valid, since execution reaches the annoy loader:

    def load_annoy_vectors(self):
        if os.path.exists(self.annoy_vectors_path):
            self.annoy.load(self.annoy_vectors_path)
        else:
            raise FileNotFoundError

@erikbern
Collaborator

i think the "invalid argument" refers to the offset argument to lseek

just double-checking, but you said this is on windows, right? there's probably some 32/64-bit issue with large files
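A small sketch of the suspected 32/64-bit failure mode, assuming a file offset ends up in a 32-bit signed integer somewhere along the way (on 64-bit Windows, C `long` is still 32 bits, and MSVC's `_lseek` takes and returns a `long`, whereas `_lseeki64` uses a 64-bit offset). It wraps a hypothetical 3 GiB offset to 32 bits and passes the result to `os.lseek`, which rejects the negative offset with `EINVAL` ("Invalid argument"). This only illustrates the mechanism; it is not annoy's actual code path.

```python
import errno
import os
import tempfile

INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold


def as_int32(value: int) -> int:
    """Wrap `value` the way a two's-complement 32-bit signed integer would."""
    return (value + 2**31) % 2**32 - 2**31


file_size = 3 * 2**30  # hypothetical 3 GiB index file
truncated = as_int32(file_size)
print(file_size, "->", truncated)  # 3221225472 -> -1073741824

# Seeking to a negative absolute offset is rejected with EINVAL,
# i.e. "Invalid argument", the same message as in the traceback.
with tempfile.TemporaryFile() as f:
    try:
        os.lseek(f.fileno(), truncated, os.SEEK_SET)
        seek_errno = None
    except OSError as exc:
        seek_errno = exc.errno

print(errno.errorcode.get(seek_errno))  # EINVAL
```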

@sskorol

sskorol commented Dec 12, 2019

@erikbern yes, Win 10 x64.

@sskorol

sskorol commented Dec 17, 2019

@erikbern hi, did you have a chance to take a look at it?

@erikbern
Collaborator

haven't had time to look at it. feel free to look at it yourself; it should be easy to write a failing unit test for files > 2 GB

erikbern reopened this Dec 17, 2019
@sskorol

sskorol commented Dec 17, 2019

@erikbern a simple test that fails:

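    import random
    import unittest

    from annoy import AnnoyIndex


    class BigFileTest(unittest.TestCase):
        def test_load_big_file_after_save(self):
            f = 250
            t = 10

            m = AnnoyIndex(f, 'angular')
            m.verbose(True)

            # 2M items x 250 dims produces an index file larger than 2 GB
            for i in range(2000000):
                v = [random.gauss(0, 1) for _ in range(f)]
                m.add_item(i, v)

            m.build(t)
            # save() writes the file and then loads it again; the load is where lseek fails
            m.save('test_big.annoy')

            self.assertEqual(m.get_n_trees(), t)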

Just a note: the file itself is saved successfully (~2 GB+).
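A rough lower bound on the size of the file produced by the test above, assuming each vector is stored as 32-bit floats and ignoring per-node headers and the split nodes of the 10 trees:

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647: largest signed 32-bit offset

n_items = 2_000_000  # items added in the test
f = 250              # vector dimension in the test
bytes_per_float = 4  # assumption: vectors stored as 32-bit floats

# Lower bound only: raw vector data, ignoring node headers and tree (split) nodes.
vector_bytes = n_items * f * bytes_per_float
print(vector_bytes)                        # 2000000000
print(round(vector_bytes / INT32_MAX, 3))  # 0.931
```

The raw vectors alone take about 93% of what a signed 32-bit offset can address, so the rest of the index plausibly pushes the file past that limit, consistent with the ~2 GB+ size observed.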

The problem appears when the file is automatically loaded after saving: lseek fails with the same error.
I found this thread on SO, but I'm not enough of a C++ guru to confirm or reject it. However, I can try to take a look.

Are there any instructions on how to build the C++ code in /src?

@erikbern
Collaborator

Thanks, that confirms what I suspected. I'll make it into a failing unit test for now. Will be interesting to see on what platforms it fails.

@erikbern
Collaborator

Should be fixed now!

@erikbern
Collaborator

erikbern commented Dec 18, 2019 via email

@sskorol

sskorol commented Dec 18, 2019

@erikbern thanks, will check it tomorrow

@sskorol

sskorol commented Dec 19, 2019

@erikbern checked it...
a new test has passed... then I built annoy locally and tried it on a pet project... now it works as expected and I don't see those lseek issues anymore... thanks for the quick fix! :)

@erikbern
Collaborator

np, it was a fairly easy fix. i'll push a new version to PyPI before the end of the year

@erikbern
Copy link
Collaborator

fyi this is on PyPI now

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants