Make it possible to write the index from independent thread easily. #550
Comments
A no brainer for next version of tantivy would be to remove the make Toshi could then exchange its |
That would wrap up a lot of contention especially if you defer the commit for too long. Another very small change I think makes sense is to expose I also have some thoughts on the |
That would be a super cool feature. My end users admire tantivy's read speed, and if writing from independent threads were possible it would be fantastic. You do great work, thanks! By the way, if I use |
@kkonevets no I do it that way, it works fine, you just do |
@hntd187 Actix-web does not allow explicitly cloning the object with Its documentation says: "The server creates a separate application instance for each created worker. Application state is not shared between threads. To share state, Arc could be used." So I guess we do the right thing |
I guess it would be nice to implement the feature in tantivy without the need to depend on the architecture of a specific web framework. |
@kkonevets Can you share the code you are talking about? The error you pointed to is not caused by a call to |
@fulmicoton I am sorry, I have just committed the changes; pull if you already cloned. Just run |
Your problem is that multiple State objects are created, not that the Arc gets cloned. |
Can you try the following...
let modify_state = ModifyState::new().unwrap();
HttpServer::new(move || {
vec![
App::with_state(SearchState::new().unwrap())
.prefix("/search")
.resource("", |r| r.method(http::Method::POST).with(search_index))
.boxed(),
App::with_state(modify_state.clone())
.prefix("/modify")
.resource("", |r| r.method(http::Method::POST).with(modify_index))
.boxed(),
]
})
.bind(host)
.unwrap()
.start(); |
error[E0277]: |
@fulmicoton thank you for your comment, I did create multiple State objects, my fault
|
This ticket looks interesting and I wanted to add my 2c.
I think blocking is fine. I estimate that most multi-threaded uses of tantivy are far more read-heavy than write-heavy. Blocking every once in a while should be OK, unless I am missing some obvious write-heavy use case where reading needs to be non-blocking.
AFAIK, Futures imply an event-loop, which increases binary size and adds another thread model. What killer functionality do Futures add for our typical current and future use cases? |
@kkonevets Yeah. I need to mark the |
@petr-tik Your 2c are always welcome :).
That's more or less true. There are some analytics use cases where writes are more frequent than reads... I am suggesting futures here not for performance reasons at all, but for ergonomics. |
I am going to make it 4c total, by adding another 2c.
Can we hide it behind a feature flag, or do you want to redesign the current commit pipeline to be async/futures-based? I would use Rust's handy support for conditional compilation to make it a feature flag. Thinking about it: if we ever commit to implementing a wasm backend (#541), we can re-use the Future feature on the webserver and on a wasm front-end that will convert function returns to JS Promises (I think; my JS understanding is very hand-wavy). |
If you're gonna add that future functionality it might be better to build it on where nightly is going or wait until the async/await lands in stable. Don't implement a contract that's going away soon. |
Internalizing the As a general rule, I try as much as possible to rule out the chances to poison locks. The straightforward implementation of Internalizing the |
@fulmicoton I am sorry for a dilettante question. What do you mean by user code? How is it possible for the user to call something from |
@kkonevets Sorry for the sparse information. If a thread panics while holding an RwLock write guard, the lock is "poisoned". That means that any later attempt to acquire the lock will return an error. I want to avoid the situation where a client of tantivy is unaware that a
let prepared_commit = index_writer.prepare_commit()?;
some_operation_that_panics();
He will get an error next time he tries to add docs to the index. If we don't internalize the |
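The poisoning behaviour described above can be reproduced with the standard library alone. This is a minimal sketch, not tantivy code: the two function names are made up for illustration, and the `u32` inside the lock merely stands in for whatever the lock protects. It also shows that, in `std`, a panic under a read guard does not poison the lock; only a panic under a write guard does.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical helper: a thread panics while holding the WRITE guard.
/// Returns whether the lock is poisoned afterwards.
fn poisoned_after_writer_panic() -> bool {
    let lock = Arc::new(RwLock::new(0u32));
    let lock_in_thread = Arc::clone(&lock);
    let result = thread::spawn(move || {
        let mut guard = lock_in_thread.write().unwrap();
        *guard += 1;
        // The guard is still alive here, so unwinding poisons the lock.
        panic!("simulated panic in user code");
    })
    .join();
    assert!(result.is_err()); // the spawned thread did panic
    // From now on read() and write() both return Err(PoisonError).
    lock.is_poisoned() && lock.read().is_err()
}

/// Hypothetical helper: a thread panics while holding only a READ guard.
/// Returns whether the lock is poisoned afterwards.
fn poisoned_after_reader_panic() -> bool {
    let lock = Arc::new(RwLock::new(0u32));
    let lock_in_thread = Arc::clone(&lock);
    let _ = thread::spawn(move || {
        let _guard = lock_in_thread.read().unwrap();
        panic!("simulated panic in user code");
    })
    .join();
    lock.is_poisoned()
}
```

Calling `poisoned_after_writer_panic()` yields `true` and `poisoned_after_reader_panic()` yields `false`, which is why the risk sits with operations that need exclusive access to the writer.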
closing the issue. |
@fulmicoton I am new to both Rust and Tantivy. To get a shared index writer in a multi-threaded environment, I have been fighting with Rust's mut refs for a whole day. I wonder what the right approach to
I see |
Yeah you are still stuck with Arc<RwLock>. Just take the read lock for adding documents, and the write lock for committing. |
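A rough sketch of that advice follows. `StandInWriter` is a hypothetical stand-in, not tantivy's `IndexWriter`; the sketch assumes a writer whose `add_document` only needs `&self` and whose `commit` needs `&mut self`, which is what makes the read-lock/write-lock split possible. `index_from_threads` is likewise an illustrative name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for an index writer (assumed signatures:
/// `add_document(&self)` and `commit(&mut self)`).
struct StandInWriter {
    pending: AtomicU64, // documents added since the last commit
    committed: u64,     // documents covered by a commit
}

impl StandInWriter {
    fn new() -> Self {
        StandInWriter { pending: AtomicU64::new(0), committed: 0 }
    }

    /// Shared access is enough to queue a document.
    fn add_document(&self, _doc: &str) {
        self.pending.fetch_add(1, Ordering::SeqCst);
    }

    /// Exclusive access: no document can be added while this runs.
    fn commit(&mut self) -> u64 {
        let newly_committed = std::mem::replace(self.pending.get_mut(), 0);
        self.committed += newly_committed;
        self.committed
    }
}

/// Adds documents from several threads under the read lock, then commits
/// under the write lock. Returns the number of committed documents.
fn index_from_threads(num_threads: usize, docs_per_thread: usize) -> u64 {
    let writer = Arc::new(RwLock::new(StandInWriter::new()));
    let handles: Vec<_> = (0..num_threads)
        .map(|t| {
            let writer = Arc::clone(&writer);
            thread::spawn(move || {
                for i in 0..docs_per_thread {
                    // Read lock: many threads may hold it at the same time.
                    let guard = writer.read().unwrap();
                    guard.add_document(&format!("doc {}-{}", t, i));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Write lock: waits until every read guard has been released.
    let committed = writer.write().unwrap().commit();
    committed
}
```

With four threads adding 100 documents each, `index_from_threads(4, 100)` returns 400. The cost of this design is that a commit blocks every thread that wants to add a document until it finishes.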
(This is a follow up from #549)
This problem is very common and hurts some Very Important Projects relying on tantivy (toshi, plume). (Invoking the names @fdb-hiroshima and @hntd187 for the discussion.)
Web servers are typically multithreaded, and requests may spawn the need to add or delete documents. Dealing with an Arc<RwLock<IndexWriter>> might feel dirty, and Rust beginners may not really understand the logic behind that.
On the other hand, the IndexWriter already relies on a channel to dispatch indexing operations to its own small thread pool. Stamping is also done using atomics. There is actually no real reason to prevent .add_document() and .delete_term() from happening from different threads.
The problem is
Especially would this ability to index from several threads confuse people on
.commit() ensure that all operations that happened but the .commit() are processed)
Also,
.commit() and .rollback() block other operations? (It is technically possible to have .commit() only block other .commit() operations.)
.commit() and .rollback() return futures?
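To illustrate why a channel plus atomic stamping is enough for adding and deleting from several threads, here is a small sketch using only the standard library. It is not tantivy's implementation: `Operation`, `WriterHandle`, `spawn_indexer` and `run_example` are hypothetical names, a single worker thread stands in for the thread pool, and commit/rollback are deliberately left out because their semantics are the open question.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical operations sent to the indexing thread.
enum Operation {
    Add { opstamp: u64, doc: String },
    Delete { opstamp: u64, term: String },
}

/// Hypothetical cloneable handle: each thread owns a clone, and no method
/// needs `&mut self`, so no outer lock is required.
#[derive(Clone)]
struct WriterHandle {
    sender: Sender<Operation>,
    stamper: Arc<AtomicU64>,
}

impl WriterHandle {
    fn add_document(&self, doc: &str) -> u64 {
        // The atomic counter hands out a unique stamp to every operation.
        let opstamp = self.stamper.fetch_add(1, Ordering::SeqCst);
        let op = Operation::Add { opstamp, doc: doc.to_string() };
        self.sender.send(op).expect("indexing thread is gone");
        opstamp
    }

    fn delete_term(&self, term: &str) -> u64 {
        let opstamp = self.stamper.fetch_add(1, Ordering::SeqCst);
        let op = Operation::Delete { opstamp, term: term.to_string() };
        self.sender.send(op).expect("indexing thread is gone");
        opstamp
    }
}

/// Spawns the worker. It stops once every handle has been dropped and
/// returns the sorted stamps it saw plus the number of live documents.
fn spawn_indexer() -> (WriterHandle, JoinHandle<(Vec<u64>, usize)>) {
    let (sender, receiver) = mpsc::channel::<Operation>();
    let worker = thread::spawn(move || {
        let mut stamps = Vec::new();
        let mut live_docs: Vec<String> = Vec::new();
        for op in receiver {
            match op {
                Operation::Add { opstamp, doc } => {
                    stamps.push(opstamp);
                    live_docs.push(doc);
                }
                Operation::Delete { opstamp, term } => {
                    stamps.push(opstamp);
                    live_docs.retain(|d| *d != term);
                }
            }
        }
        stamps.sort_unstable();
        (stamps, live_docs.len())
    });
    let handle = WriterHandle { sender, stamper: Arc::new(AtomicU64::new(0)) };
    (handle, worker)
}

fn run_example(num_threads: u64, docs_per_thread: u64) -> (Vec<u64>, usize) {
    let (handle, worker) = spawn_indexer();
    handle.add_document("obsolete");
    let threads: Vec<_> = (0..num_threads)
        .map(|t| {
            let handle = handle.clone();
            thread::spawn(move || {
                for i in 0..docs_per_thread {
                    handle.add_document(&format!("doc {}-{}", t, i));
                }
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    handle.delete_term("obsolete");
    drop(handle); // last sender gone, so the worker loop ends
    worker.join().unwrap()
}
```

One thing the sketch makes visible: operations from different threads may reach the worker in a different order than their stamps, so any commit semantics would have to say which operations a given commit is guaranteed to cover.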