Parallelize build? #70

ravimody · 2015-05-29T18:34:44Z

Conceptually, if the trees are independent, shouldn't we be able to build the trees in parallel across multiple cores? Is this something that could be implemented easily?

I haven't had a chance to dive deeply into the code/algorithm yet so I may be misunderstanding something.

erikbern · 2015-05-29T18:54:27Z

It wouldn't be too hard – you would need a mutex in a couple of places and also watch out for reallocations, but other than that it should be easy

ravimody · 2015-06-01T21:01:08Z

Cool thanks, I'll look into it (although my C++ is a little rusty :))

thomas4g · 2017-11-17T20:02:30Z

Has there been any further work on this?

erikbern · 2017-11-17T21:51:02Z

not afaik :(

ravimody · 2017-11-17T21:54:44Z

No work on it from my end.

thomas4g · 2017-11-17T23:32:35Z

Bummer. I'd love to help out, but I don't think I'm familiar enough with annoy yet (or frankly competent enough with C++) to spearhead anything. If anyone starts working on this and wants a hand, I'm happy to help out.

thomas4g · 2017-11-17T23:42:49Z

Actually, I take it back. It looks like a pretty simple change. If I understand it properly, it's line 496 of annoylib.h:

```cpp
_roots.push_back(_make_tree(indices));
```

That can get parallelized, but with a mutex around _roots to avoid simultaneous push_back calls, right?
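Here is a minimal sketch of that idea using `std::thread`. It assumes a hypothetical `make_tree` that only touches thread-local data. Annoy's real `_make_tree` is not like that, as the next reply points out. Only the shared `roots` vector is guarded, and `build_roots` is an illustrative name, not annoy API:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for _make_tree: builds a "tree" from private data
// and returns its root id. It touches no shared state.
int make_tree(const std::vector<int>& indices, int seed) {
    int root = seed;
    for (int i : indices) root += i;  // placeholder work
    return root;
}

// Build n_trees trees on n_threads threads; only the push_back on roots is locked.
std::vector<int> build_roots(const std::vector<int>& indices, int n_trees, int n_threads) {
    std::vector<int> roots;
    std::mutex roots_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t builds trees t, t + n_threads, t + 2 * n_threads, ...
            for (int k = t; k < n_trees; k += n_threads) {
                int root = make_tree(indices, k);  // runs without holding the lock
                std::lock_guard<std::mutex> guard(roots_mutex);
                roots.push_back(root);             // serialized append
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for every tree before returning
    return roots;
}
```

The order of `roots` depends on thread scheduling. That is harmless here because each root is independent.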

erikbern · 2017-11-18T03:03:43Z

roughly, but it's a bit more complicated. _make_tree also modifies the data structure. You also need a mutex around lines 643-644 and 702-703. Maybe that mutex could just be folded into _allocate_size and we could rename the function to something like _add_item ... probably makes more sense

then you obviously need to create pthreads and collect them afterwards... honestly I haven't done that in like 10 years, but iirc it's not too hard
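Here is a sketch of folding the lock into the allocation step, using a toy node store and `std::thread` rather than raw pthreads. `NodeStore`, `add_item` and the `float` payload are hypothetical stand-ins for annoy's `_nodes` / `_allocate_size`. The sketch is only safe because every write to the storage happens while the lock is held. Any pointer into the storage kept outside the lock could still be invalidated by a later reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for annoy's node storage.
class NodeStore {
public:
    // Allocate a new node and initialize it while holding the lock, so a
    // concurrent reallocation can never move memory out from under the write.
    std::size_t add_item(float value) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (n_nodes_ == nodes_.size())
            nodes_.resize(nodes_.empty() ? 16 : nodes_.size() * 2);  // may reallocate
        nodes_[n_nodes_] = value;
        return n_nodes_++;
    }
    // Only call after all writer threads have been joined.
    float get(std::size_t i) const { return nodes_[i]; }
    std::size_t size() const { return n_nodes_; }

private:
    std::mutex mutex_;
    std::vector<float> nodes_;
    std::size_t n_nodes_ = 0;
};

// Spawn the workers, then join ("collect") them before anyone reads the store.
void fill_in_parallel(NodeStore& store, int n_threads, int items_per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&store, items_per_thread] {
            for (int i = 0; i < items_per_thread; ++i) store.add_item(1.0f);
        });
    for (auto& w : workers) w.join();
}
```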

tjrileywisc · 2017-11-20T11:49:47Z

@erikbern Actually I was working on a branch to add this just this weekend. It doesn't quite work yet though (I can only get it to run the precision_test.cpp example, only a debug binary works and it crashes sometimes even then); should I submit a PR?

I didn't touch anything towards the bottom of annoylib.h (near those lines that you mentioned), that might be the problem...

erikbern · 2017-11-20T13:51:57Z

sure, feel free to submit a PR, just make sure to put "WIP" in the subject or something

you definitely need a mutex around those lines, that's probably the issue :)

thomas4g · 2017-11-20T16:21:36Z

@tjrileywisc what a coincidence! I also started working on this over the weekend. Mine also doesn't quite work... but I've pushed a copy to my fork: https://github.com/thomas4g/annoy/tree/parallelize_build

I hope to keep working on it tonight, but if you're further along let me know if you'd like any help! Feel free to ping me here or shoot me an email at me@thomasshields.net

tjrileywisc · 2017-11-20T16:26:36Z

@thomas4g
Just took a peek at your fork - I'm actually using std::thread instead of pthread. I think Windows doesn't have support for pthread built in. std::thread should be available for gcc and MSVC.
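`std::thread` and `std::mutex` have been standard since C++11. Older gcc versions need `-std=c++11` to enable them, and on Linux the link step may also need `-pthread`. The helper below shows one portable way a build could pick a thread count and split `n_trees` among workers. `pick_thread_count` and `split_trees` are hypothetical names, not annoy API:

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Pick a worker count; hardware_concurrency() may return 0 when it is unknown.
unsigned pick_thread_count(unsigned requested) {
    if (requested > 0) return requested;
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1;
}

// Split n_trees as evenly as possible across n_threads workers.
std::vector<int> split_trees(int n_trees, unsigned n_threads) {
    const int n = static_cast<int>(n_threads);
    std::vector<int> counts(n_threads, n_trees / n);
    for (int i = 0; i < n_trees % n; ++i) ++counts[i];  // spread the remainder
    return counts;
}
```

For example, 10 trees on 4 threads gives per-thread counts of 3, 3, 2 and 2.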

thomas4g · 2017-11-20T16:28:20Z

@tjrileywisc ahh, I wanted to use that but got thrown off by the build not supporting my #include <mutex> by default. I didn't want to fiddle with build settings.

tjrileywisc · 2017-11-21T01:26:15Z

Just submitted a PR for my build_trees_threaded branch.

#246

denkuzin · 2019-07-01T21:09:58Z

guys, is there any news on the multicore index building?

os-gabe · 2019-11-25T17:42:09Z

I thought I might have a go at this. The actual threading machinery is all straightforward, but I'm having some trouble seeing exactly which bits of _make_tree() modify common data and need to be guarded with a mutex. As per previous comments I wrapped some functionality into a thread-safe _add_item() function, but I'm still getting segfaults indicating data is getting modified out from under my threads somewhere else.

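```cpp
// Thread-safe node allocation: grows the storage if needed and returns a new index
S _add_item() {
    std::lock_guard<std::mutex> guard(_mutex);  // only one thread allocates at a time
    _allocate_size(_n_nodes + 1);               // may reallocate _nodes
    S item = _n_nodes++;
    return item;
}
```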

Any help with this is appreciated.

erikbern · 2019-11-25T18:22:42Z

interesting – you're probably right that it should be fairly straightforward, but i can't think of other critical sections off the top of my head

os-gabe · 2019-11-25T19:55:15Z

Unless I'm misunderstanding the code, this may be more complicated than I first thought. It appears that there are many places within _make_tree that touch the underlying data. For example, _get would need a mutex since it could be accessing _nodes while another thread is calling _add_item. Worse, since it returns a pointer, anywhere that pointer is used would also need a mutex. It looks like you would wind up needing mutexes around quite a large portion of _make_tree, which would mostly undo the benefits of parallelizing it. It's possible I'm still not understanding the code fully, though.

erikbern · 2019-11-25T23:28:25Z

From what you say I think the main issue is that the underlying memory can be reallocated at any point in time, and that invalidates any pointers held by any other thread.

But those reallocations are actually pretty rare, so there should be some way to fix this.

I haven't dealt with concurrency code in C++ since maybe 2007 so my knowledge is a bit rusty but couldn't you use a shared lock for this? Almost all access will be nonexclusive (so near-zero overhead), but the few times when you need to reallocate the underlying storage, you would have to acquire an exclusive lock. Does that make sense?

erikbern · 2019-11-26T05:08:49Z

I meant a shared mutex. This looks like the right concurrency primitive: https://en.cppreference.com/w/cpp/thread/shared_mutex

So basically acquire a shared lock when writing individual vectors, acquire an exclusive lock when you have to resize the underlying data storage.

But I'm mostly speculating, could be wrong :)
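Here is a minimal sketch of the reader/writer scheme described above. It rests on a few assumptions: `SharedNodeStore`, `add_item`, `set` and `get` are hypothetical, and a `std::vector<float>` stands in for annoy's `_nodes`. Writes to individual nodes take a shared lock, so many threads can write at once as long as each touches a different index. Only growing the storage takes the exclusive lock.

Two caveats follow from os-gabe's comment:
- A thread must not call `add_item` while it still holds a shared lock. Locking a `std::shared_mutex` exclusively while already owning it in shared mode is undefined behavior.
- No raw pointer into the storage may be kept across `add_item`, so code like `_make_tree` would have to hold indices and re-fetch.

If C++17 is a problem, `std::shared_timed_mutex` from C++14 offers the same `lock` / `lock_shared` interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical node storage guarded by a reader/writer lock (C++17).
class SharedNodeStore {
public:
    // Reserve a fresh index; grows storage under the exclusive lock if needed.
    // Must NOT be called while the caller holds a shared lock on this store.
    std::size_t add_item() {
        std::size_t idx = n_nodes_.fetch_add(1);  // unique index, no lock needed
        {
            std::shared_lock<std::shared_mutex> read(mutex_);
            if (idx < nodes_.size()) return idx;  // common case: no reallocation
        }
        std::unique_lock<std::shared_mutex> write(mutex_);
        if (idx >= nodes_.size())                  // another thread may have grown it
            nodes_.resize((idx + 1) * 2);          // rare: may move the storage
        return idx;
    }

    // Many threads may write distinct nodes at once under the shared lock,
    // but none can run while a resize holds the exclusive lock.
    void set(std::size_t idx, float value) {
        std::shared_lock<std::shared_mutex> read(mutex_);
        nodes_[idx] = value;
    }

    float get(std::size_t idx) const {
        std::shared_lock<std::shared_mutex> read(mutex_);
        return nodes_[idx];
    }

    std::size_t size() const { return n_nodes_.load(); }

private:
    mutable std::shared_mutex mutex_;
    std::vector<float> nodes_;
    std::atomic<std::size_t> n_nodes_{0};
};

void fill_shared(SharedNodeStore& store, int n_threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&store, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::size_t idx = store.add_item();  // no lock held by the caller here
                store.set(idx, 1.0f);                // re-acquire a shared lock to write
            }
        });
    for (auto& w : workers) w.join();
}
```

Because the storage doubles when it grows, exclusive locks are rare. Most writes only contend on the shared side of the lock.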

os-gabe · 2019-11-26T16:22:02Z

I think what you are saying makes sense. Thanks for the tip on shared mutex; I hadn't seen that before. It was introduced in C++17, though, which may be its own problem.

I'll take another look and see what I can figure out.

chikubee · 2020-01-27T12:10:37Z

@os-gabe Hey, is it solved? Any insights on how to parallelize build?

erikbern · 2020-01-27T13:39:18Z

no, this would have to be implemented by someone

chikubee · 2020-01-27T14:27:27Z

@erikbern That's sad. I am trying to build an index for over 1M vectors and it is crashing even with on_disk_build. The process took up more than 30 GB of memory and crashed.

erikbern · 2020-01-27T15:37:14Z

i'm not sure if parallelization would have helped, though. do you know what's causing it to crash?

ravimody · 2020-01-27T15:45:54Z

Yeah agreed that parallelization probably would not help.

1M vectors isn't that much unless they are extremely high dimensional vectors. Did you try with a smaller number of vectors to see where it starts failing?
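Here is a rough back-of-the-envelope check, assuming 32-bit floats. The raw input vectors alone take n × f × 4 bytes. This ignores the trees entirely, and the index stores split nodes on top, so real usage is higher. `raw_vector_bytes` is just an illustrative helper:

```cpp
#include <cassert>
#include <cstdint>

// Lower bound on memory for the input vectors only: n vectors of f floats.
// Ignores split nodes and per-node headers, so actual usage is higher.
std::uint64_t raw_vector_bytes(std::uint64_t n, std::uint64_t f) {
    return n * f * sizeof(float);
}
```

For 1M vectors this gives 400,000,000 bytes (about 0.4 GB) at f = 100, but 8,192,000,000 bytes (about 8.2 GB) at f = 2048. Dimensionality dominates quickly.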

erikbern · 2020-01-27T15:46:43Z

good point – annoy isn't meant for super high dimensionality, so if that's what you're facing then you should probably run dimensionality reduction outside of annoy first!
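One simple way to do that reduction outside annoy is a Gaussian random projection. This is a swapped-in technique, not something annoy provides; PCA is another common choice. Each f-dimensional vector is multiplied by a fixed k × f matrix of N(0, 1/k) entries. With high probability this approximately preserves pairwise Euclidean distances when k is large enough (Johnson-Lindenstrauss). `make_projection` and `project` are hypothetical helpers:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Build a k x f projection matrix with i.i.d. N(0, 1/k) entries. The fixed seed
// makes every call produce the same matrix, so all vectors share one projection.
Matrix make_projection(std::size_t f, std::size_t k, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> dist(0.0f, 1.0f / std::sqrt(static_cast<float>(k)));
    Matrix p(k, std::vector<float>(f));
    for (auto& row : p)
        for (auto& x : row) x = dist(rng);
    return p;
}

// Project one f-dimensional vector down to k dimensions: y = P x.
std::vector<float> project(const Matrix& p, const std::vector<float>& x) {
    std::vector<float> y(p.size(), 0.0f);
    for (std::size_t i = 0; i < p.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += p[i][j] * x[j];
    return y;
}
```

The projected vectors, rather than the originals, would then be added to the index.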

os-gabe · 2020-01-27T18:13:57Z

> @os-gabe Hey, is it solved? Any insights on how to parallelize build?

Unfortunately I had to move on to other things and did not get parallel build working

erikbern closed this as completed Feb 24, 2019

johnsonjsyuen mentioned this issue Jun 4, 2019

Can we use mutlicore cpus for building index? #386 (Open)

erikbern reopened this Nov 25, 2019

novoselrok mentioned this issue Jul 28, 2020

Multithreaded build #495 (Merged)
