annoy compilation fail #3

Closed
piskvorky opened this Issue Jan 3, 2014 · 11 comments

Comments

Projects
None yet
2 participants
@piskvorky
Contributor

piskvorky commented Jan 3, 2014

Installing annoy from repo on EC2 m3.2xlarge instance (Ubuntu 13.10):

$ uname -a
Linux m3-2xlarge 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1

fails with

$ python setup.py build
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/annoy
copying annoy/__init__.py -> build/lib.linux-x86_64-2.7/annoy
running build_ext
building 'annoy.annoylib' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/src
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/annoylib.cc -o build/temp.linux-x86_64-2.7/src/annoylib.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
src/annoylib.cc: In instantiation of ‘void AnnoyIndexPython<T, Distance>::add_item_py(int, const boost::python::list&) [with T = float; Distance = Angular<float>]’:
src/annoylib.cc:542:30:   required from ‘void expose_methods(boost::python::class_<C>) [with C = AnnoyIndexPython<float, Angular<float> >]’
src/annoylib.cc:555:117:   required from here
src/annoylib.cc:508:25: error: ‘add_item’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
     add_item(item, &w[0]);
                         ^
src/annoylib.cc:508:25: note: declarations in dependent base ‘AnnoyIndex<float, Angular<float> >’ are not found by unqualified lookup
src/annoylib.cc:508:25: note: use ‘this->add_item’ instead
src/annoylib.cc: In instantiation of ‘boost::python::list AnnoyIndexPython<T, Distance>::get_nns_by_vector_py(boost::python::list, size_t) [with T = float; Distance = Angular<float>; size_t = long unsigned int]’:
src/annoylib.cc:548:31:   required from ‘void expose_methods(boost::python::class_<C>) [with C = AnnoyIndexPython<float, Angular<float> >]’
src/annoylib.cc:555:117:   required from here
src/annoylib.cc:523:40: error: ‘get_nns_by_vector’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
     get_nns_by_vector(&w[0], n, &result);
                                        ^
src/annoylib.cc:523:40: note: declarations in dependent base ‘AnnoyIndex<float, Angular<float> >’ are not found by unqualified lookup
src/annoylib.cc:523:40: note: use ‘this->get_nns_by_vector’ instead
src/annoylib.cc: In instantiation of ‘void AnnoyIndexPython<T, Distance>::add_item_py(int, const boost::python::list&) [with T = float; Distance = Euclidean<float>]’:
src/annoylib.cc:542:30:   required from ‘void expose_methods(boost::python::class_<C>) [with C = AnnoyIndexPython<float, Euclidean<float> >]’
src/annoylib.cc:556:121:   required from here
src/annoylib.cc:508:25: error: ‘add_item’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
     add_item(item, &w[0]);
                         ^
src/annoylib.cc:508:25: note: declarations in dependent base ‘AnnoyIndex<float, Euclidean<float> >’ are not found by unqualified lookup
src/annoylib.cc:508:25: note: use ‘this->add_item’ instead
src/annoylib.cc: In instantiation of ‘boost::python::list AnnoyIndexPython<T, Distance>::get_nns_by_vector_py(boost::python::list, size_t) [with T = float; Distance = Euclidean<float>; size_t = long unsigned int]’:
src/annoylib.cc:548:31:   required from ‘void expose_methods(boost::python::class_<C>) [with C = AnnoyIndexPython<float, Euclidean<float> >]’
src/annoylib.cc:556:121:   required from here
src/annoylib.cc:523:40: error: ‘get_nns_by_vector’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
     get_nns_by_vector(&w[0], n, &result);
                                        ^
src/annoylib.cc:523:40: note: declarations in dependent base ‘AnnoyIndex<float, Euclidean<float> >’ are not found by unqualified lookup
src/annoylib.cc:523:40: note: use ‘this->get_nns_by_vector’ instead
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

I tried adding -fpermissive to CFLAGS, which changes the errors into warnings. Then annoy builds and installs, but doesn't work correctly anymore. It returns fewer than requested/nonsense NNs:

======================================================================
FAIL: test_get_nns_by_item (__main__.AngularIndexTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/_annoy_test.py", line 41, in test_get_nns_by_item
    self.assertEquals(i.get_nns_by_item(0, 3), [0,1,2])
AssertionError: Lists differ: [0, 1] != [0, 1, 2]

======================================================================
FAIL: test_large_index (__main__.EuclideanIndexTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/_annoy_test.py", line 129, in test_large_index
    self.assertEquals(i.get_nns_by_item(j, 2), [j, j+1])
AssertionError: Lists differ: [104, 105] != [356, 357]

etc.

Any idea what's going on?

@erikbern

This comment has been minimized.

Show comment
Hide comment
@erikbern

erikbern Jan 3, 2014

Collaborator

Hm, no idea what's going on. Seems like the compilation issue could be fixed by adding "this->" in a couple of places. I will fix that and push a new version.

But it's still weird that the unit tests fail. Maybe it's an alignment problem on a specific processor? I think @a1k0n pointed out at some point there could be issues if the offset of a node isn't on a multiple of 4 or 8 on some architectures. I wonder if removing the "packed" attribute would help.

Collaborator

erikbern commented Jan 3, 2014

Hm, no idea what's going on. Seems like the compilation issue could be fixed by adding "this->" in a couple of places. I will fix that and push a new version.

But it's still weird that the unit tests fail. Maybe it's an alignment problem on a specific processor? I think @a1k0n pointed out at some point there could be issues if the offset of a node isn't on a multiple of 4 or 8 on some architectures. I wonder if removing the "packed" attribute would help.

@erikbern

This comment has been minimized.

Show comment
Hide comment
@erikbern

erikbern Jan 3, 2014

Collaborator

Pushed a fix for the compilation issues, and put the packed structs behind a macro. Would be super helpful if you can try to compile using -DNO_PACKED_STRUCTS and see if the unit tests are still failing.

Collaborator

erikbern commented Jan 3, 2014

Pushed a fix for the compilation issues, and put the packed structs behind a macro. Would be super helpful if you can try to compile using -DNO_PACKED_STRUCTS and see if the unit tests are still failing.

@piskvorky

This comment has been minimized.

Show comment
Hide comment
@piskvorky

piskvorky Jan 3, 2014

Contributor

Compilation OK now, thanks!

But tests still fail, in the same way. I'm thinking these two were unrelated issues.

Not sure where to look for problems. Seems like a big in-your-face failure => easier to debug, but I don't have time for that and Amazon is billing me per hour :(

I can still try some quick debugging if you have any suggestions where to look.

Contributor

piskvorky commented Jan 3, 2014

Compilation OK now, thanks!

But tests still fail, in the same way. I'm thinking these two were unrelated issues.

Not sure where to look for problems. Seems like a big in-your-face failure => easier to debug, but I don't have time for that and Amazon is billing me per hour :(

I can still try some quick debugging if you have any suggestions where to look.

@piskvorky

This comment has been minimized.

Show comment
Hide comment
@piskvorky

piskvorky Jan 3, 2014

Contributor

Background: I decided to run all the shootout tests from scratch, on a "reproducible" machine = EC2.

Annoy was returning rubbish accuracies so I dug down, found out it's nothing to do with the corpus, as the same thing happens with arbitrary data and even with the annoy's unit tests.

Contributor

piskvorky commented Jan 3, 2014

Background: I decided to run all the shootout tests from scratch, on a "reproducible" machine = EC2.

Annoy was returning rubbish accuracies so I dug down, found out it's nothing to do with the corpus, as the same thing happens with arbitrary data and even with the annoy's unit tests.

@erikbern

This comment has been minimized.

Show comment
Hide comment
@erikbern

erikbern Jan 3, 2014

Collaborator

Yeah I'm pretty sure those two issues were unrelated.

I ran it on a micro instance on EC2 and the unit tests succeed so I'm not sure what's going wrong. I might try to set up an beefy EC2 instance and try later. If it's broken I'll try with -DNO_PACKED_STRUCTS to see if that resolves the issue

Collaborator

erikbern commented Jan 3, 2014

Yeah I'm pretty sure those two issues were unrelated.

I ran it on a micro instance on EC2 and the unit tests succeed so I'm not sure what's going wrong. I might try to set up an beefy EC2 instance and try later. If it's broken I'll try with -DNO_PACKED_STRUCTS to see if that resolves the issue

@piskvorky

This comment has been minimized.

Show comment
Hide comment
@piskvorky

piskvorky Jan 3, 2014

Contributor

I forgot to say I tried -DNO_PACKED_STRUCTS as well :)

Contributor

piskvorky commented Jan 3, 2014

I forgot to say I tried -DNO_PACKED_STRUCTS as well :)

@piskvorky

This comment has been minimized.

Show comment
Hide comment
@piskvorky

piskvorky Jan 7, 2014

Contributor

Downgraded gcc and g++ to version 4.6 => all good, tests pass now. No special compile defines needed.

I still don't know what exactly the problem was, but feel free to close this issue if you wish. Annoy works for me again.

Contributor

piskvorky commented Jan 7, 2014

Downgraded gcc and g++ to version 4.6 => all good, tests pass now. No special compile defines needed.

I still don't know what exactly the problem was, but feel free to close this issue if you wish. Annoy works for me again.

@erikbern

This comment has been minimized.

Show comment
Hide comment
@erikbern

erikbern Jan 7, 2014

Collaborator

I assume you didn't use windows right? Looks like something changed in 4.7: http://www.bttr-software.de/forum/mix_entry.php?id=11767

Collaborator

erikbern commented Jan 7, 2014

I assume you didn't use windows right? Looks like something changed in 4.7: http://www.bttr-software.de/forum/mix_entry.php?id=11767

@piskvorky

This comment has been minimized.

Show comment
Hide comment
@piskvorky

piskvorky Jan 7, 2014

Contributor

Yeah, I assumed it must be something compiler-specific, since Annoy has always worked for me with "older" machines (=older package versions). That's why I tried downgrading!

(OS info is in the first post; haven't tried on Windows)

Contributor

piskvorky commented Jan 7, 2014

Yeah, I assumed it must be something compiler-specific, since Annoy has always worked for me with "older" machines (=older package versions). That's why I tried downgrading!

(OS info is in the first post; haven't tried on Windows)

erikbern pushed a commit that referenced this issue Nov 9, 2014

Ubuntu
Replacing an explicit loop with std::copy to solve #3 and #13
This seems to do the trick when I'm running it on an EC2 instance. Seems like GCC 4.8 goes too far optimizing redundant loops and removes the loop (that's the only explanation I can think of).

erikbern pushed a commit that referenced this issue Nov 9, 2014

Ubuntu
Replacing an explicit loop with std::copy to solve #3 and #13
This seems to do the trick when I'm running it on an EC2 instance. Seems like GCC 4.8 goes too far optimizing redundant loops and removes the loop (that's the only explanation I can think of).

erikbern added a commit that referenced this issue Nov 9, 2014

Merge pull request #24 from spotify/fix-3-13
Replacing an explicit loop with std::copy to solve #3 and #13
@erikbern

This comment has been minimized.

Show comment
Hide comment
@erikbern

erikbern Nov 9, 2014

Collaborator

See #24

Collaborator

erikbern commented Nov 9, 2014

See #24

@erikbern erikbern closed this Nov 9, 2014

@erikbern

This comment has been minimized.

Show comment
Hide comment
@erikbern

erikbern Nov 9, 2014

Collaborator

Compilation issues were fixed a long time ago – the unit test issues were fixed in #24

Collaborator

erikbern commented Nov 9, 2014

Compilation issues were fixed a long time ago – the unit test issues were fixed in #24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment