Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Storing parse result to avoid reparsing? (async related) #472

Closed
Geobert opened this issue Aug 19, 2020 · 11 comments
Closed

[Question] Storing parse result to avoid reparsing? (async related) #472

Geobert opened this issue Aug 19, 2020 · 11 comments
Labels

Comments

@Geobert
Copy link
Contributor

Geobert commented Aug 19, 2020

I'd like to store the Pairs<> in order to avoid reparsing of the same expression and cloning it before reevaluate it using PrecClimber.

I have two concerns with that:

  • is that really efficient?
  • Pairs being !Send and !Sync, how can I use it in an async env?

Thanks!

@CAD97
Copy link
Contributor

CAD97 commented Aug 19, 2020

Pairs is an internally reference-counted resource, so cloning it is cheap (on the order of an Rc clone or two). The same work is done to get each child Pair, actually, so it's really nothing to clone and hang on to Pairs for as long as they're useful.

Use in async is difficult but possible. They use non-threadsafe reference counting internally, so you need to keep them on one thread. (It is theoretically possible to perform a deep clone that would then be able to be transferred between threads, but we don't do that and I think the benefit would be small, as it'd have to be moved back into the not send/sync form to be used again.) Basically, you need to use whatever mechanism your runtime provides to pin a given task to a single thread, allowing it to use nonthreadsafe resources. Or, push work with the Pairs into its own dedicated, blocking OS thread, which communicates with the async world through channels.

@CAD97 CAD97 added the question label Aug 19, 2020
@Geobert
Copy link
Contributor Author

Geobert commented Aug 20, 2020

Thank you very much! A pity that async use is so difficult :-/ But I'll see what I can do :)

@Geobert Geobert closed this as completed Aug 20, 2020
@Darksonn
Copy link

Darksonn commented Aug 20, 2020

Generally pinning the task to a single thread is very cumbersome or not possible (LocalSet requires the use of block_on, and spawn_local requires the use of LocalSet). Instead, I recommend handling the pest types in the same manner as a MutexGuard, that is, do not keep the type across an .await. Check out this section for some info on that, but basically you need to copy the data you need after an .await into a Send type before issuing that .await.

Pinning a normally spawned task to a single thread is not possible because this feature is fundamentally incompatible with the block_in_place feature. You can have block_in_place, or you can have pinning to a thread, but not both, and Tokio currently provides block_in_place. Because of this, only tasks executed with block_on, or inside a LocalSet that is in a task executed with block_on, can use non-Send data.

@Geobert
Copy link
Contributor Author

Geobert commented Aug 20, 2020

I recommend handling the pest types in the same manner as a MutexGuard

For completeness, follow up on this !Send !Sync aspect: https://users.rust-lang.org/t/how-to-deal-with-external-type-which-is-send-and-sync/47530

Is there any plan to have Pairs at least Send (so we can wrap it in a Mutex if Sync needed)?

@Geobert Geobert reopened this Aug 20, 2020
@SkiFire13
Copy link
Contributor

How about adding an optional feature to turn all the Rcs into Arcs?

@Geobert Geobert changed the title [Question] Storing parse result to avoid reparsing? [Question] Storing parse result to avoid reparsing? (async related) Aug 20, 2020
@CAD97
Copy link
Contributor

CAD97 commented Aug 20, 2020

The main issue with such is that it's not really a "strictly additive" feature. Additionally, it's not really possible to be generic between Arc and Rc in current Rust, as that requires some kind of HKT functionality.

That said, I've primarily been working with Rowan recently, which uses Arc trees rather than Rc, so I do agree that being able to share the parse between threads is useful.

(Actually, Rowan uses an Arc for the tree structure itself, but Rc for traversal of the tree, so it's sort of a hybrid approach.)

@SkiFire13
Copy link
Contributor

That's why I proposed a feature, we could do something like:

#[cfg(not(feature = "sync-send"))]
type RefCounted<T> = std::rc::Rc<T>;
#[cfg(feature = "sync-send")]
type RefCounted<T> = std::sync::Arc<T>;

and replace every occurence of Rc with RefCounted.

I wrote a proof of concept here and it seems to be a drop-in replacement. The only issue I found was that in the docs Pair and Pairs are still flagged as !Sync and !Send even though they are actually Sync and Send when the feature flag is enabled.

@CAD97
Copy link
Contributor

CAD97 commented Aug 20, 2020

The problem is the "purely additive" requirement for features. Sure, a feature flag could add the "feature" of send/sync for the Pairs API. But now everyone else in the dependency tree using Pest is paying for atomic reference counting even though they wrote their code to stay on a single thread.

@Geobert
Copy link
Contributor Author

Geobert commented Aug 20, 2020

I don't follow: if the feature flag is not in default flags, we stay like today, don't we?

with @SkiFire13 prototype, we pay for Arc only if we opt-in send-sync feature in our Cargo.toml, or am I missing something?

@CAD97
Copy link
Contributor

CAD97 commented Aug 21, 2020

Features are unified accross the entire dependency tree.

Let's say you use both tera and nalgebra, both of which use pest. Now you use pest directly, and enable the pairs-are-sync feature.

Tera and nalgebra also use pest, and there's a single version of pest compiled into your binary. Both tera and nalgebra will be slower to manipulate pairs, as they are using atomic operations, even though both of those libraries didn't enable the feature and are developed to not need it. If anyone in the dependency tree activates a feature, it's enabled for everyone.

This is even more insidious if, say, a test dependency is the only one that enables the feature, as now you're paying for it without even benefiting from it anywhere.

It's the same reason you shouldn't use mutually exclusive cargo features; features are intended to be strictly additive, because once they're on they're on for everyone.

@Geobert
Copy link
Contributor Author

Geobert commented Aug 21, 2020

Oh I understand better now!

What about another module pest-async in pest then?

ldap3 has sync and async API for instance. And oxide-auth have async landing soon and it had a sub crate for async

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Projects
None yet
Development

No branches or pull requests

4 participants