[Question] Storing `parse` result to avoid reparsing? (async related) #472

Geobert · 2020-08-19T21:13:46Z

I'd like to store the Pairs<> in order to avoid reparsing of the same expression and cloning it before reevaluate it using PrecClimber.

I have two concerns with that:

is that really efficient?
Pairs being !Send and !Sync, how can I use it in an async env?

Thanks!

The text was updated successfully, but these errors were encountered:

CAD97 · 2020-08-19T23:58:26Z

Pairs is an internally reference-counted resource, so cloning it is cheap (on the order of an Rc clone or two). The same work is done to get each child Pair, actually, so it's really nothing to clone and hang on to Pairs for as long as they're useful.

Use in async is difficult but possible. They use non-threadsafe reference counting internally, so you need to keep them on one thread. (It is theoretically possible to perform a deep clone that would then be able to be transferred between threads, but we don't do that and I think the benefit would be small, as it'd have to be moved back into the not send/sync form to be used again.) Basically, you need to use whatever mechanism your runtime provides to pin a given task to a single thread, allowing it to use nonthreadsafe resources. Or, push work with the Pairs into its own dedicated, blocking OS thread, which communicates with the async world through channels.

Geobert · 2020-08-20T07:39:20Z

Thank you very much! A pity that async use is so difficult :-/ But I'll see what I can do :)

Darksonn · 2020-08-20T09:25:49Z

Generally pinning the task to a single thread is very cumbersome or not possible (LocalSet requires the use of block_on, and spawn_local requires the use of LocalSet). Instead, I recommend handling the pest types in the same manner as a MutexGuard, that is, do not keep the type across an .await. Check out this section for some info on that, but basically you need to copy the data you need after an .await into a Send type before issuing that .await.

Pinning a normally spawned task to a single thread is not possible because this feature is fundamentally incompatible with the block_in_place feature. You can have block_in_place, or you can have pinning to a thread, but not both, and Tokio currently provides block_in_place. Because of this, only tasks executed with block_on, or inside a LocalSet that is in a task executed with block_on, can use non-Send data.

Geobert · 2020-08-20T12:22:59Z

I recommend handling the pest types in the same manner as a MutexGuard

For completeness, follow up on this !Send !Sync aspect: https://users.rust-lang.org/t/how-to-deal-with-external-type-which-is-send-and-sync/47530

Is there any plan to have Pairs at least Send (so we can wrap it in a Mutex if Sync needed)?

SkiFire13 · 2020-08-20T18:32:43Z

How about adding an optional feature to turn all the Rcs into Arcs?

CAD97 · 2020-08-20T19:23:47Z

The main issue with such is that it's not really a "strictly additive" feature. Additionally, it's not really possible to be generic between Arc and Rc in current Rust, as that requires some kind of HKT functionality.

That said, I've primarily been working with Rowan recently, which uses Arc trees rather than Rc, so I do agree that being able to share the parse between threads is useful.

(Actually, Rowan uses an Arc for the tree structure itself, but Rc for traversal of the tree, so it's sort of a hybrid approach.)

SkiFire13 · 2020-08-20T19:41:11Z

That's why I proposed a feature, we could do something like:

#[cfg(not(feature = "sync-send"))]
type RefCounted<T> = std::rc::Rc<T>;
#[cfg(feature = "sync-send")]
type RefCounted<T> = std::sync::Arc<T>;

and replace every occurence of Rc with RefCounted.

I wrote a proof of concept here and it seems to be a drop-in replacement. The only issue I found was that in the docs Pair and Pairs are still flagged as !Sync and !Send even though they are actually Sync and Send when the feature flag is enabled.

CAD97 · 2020-08-20T22:06:08Z

The problem is the "purely additive" requirement for features. Sure, a feature flag could add the "feature" of send/sync for the Pairs API. But now everyone else in the dependency tree using Pest is paying for atomic reference counting even though they wrote their code to stay on a single thread.

Geobert · 2020-08-20T22:21:57Z

I don't follow: if the feature flag is not in default flags, we stay like today, don't we?

with @SkiFire13 prototype, we pay for Arc only if we opt-in send-sync feature in our Cargo.toml, or am I missing something?

CAD97 · 2020-08-21T01:02:01Z

Features are unified accross the entire dependency tree.

Let's say you use both tera and nalgebra, both of which use pest. Now you use pest directly, and enable the pairs-are-sync feature.

Tera and nalgebra also use pest, and there's a single version of pest compiled into your binary. Both tera and nalgebra will be slower to manipulate pairs, as they are using atomic operations, even though both of those libraries didn't enable the feature and are developed to not need it. If anyone in the dependency tree activates a feature, it's enabled for everyone.

This is even more insidious if, say, a test dependency is the only one that enables the feature, as now you're paying for it without even benefiting from it anywhere.

It's the same reason you shouldn't use mutually exclusive cargo features; features are intended to be strictly additive, because once they're on they're on for everyone.

Geobert · 2020-08-21T07:36:26Z

Oh I understand better now!

What about another module pest-async in pest then?

ldap3 has sync and async API for instance. And oxide-auth have async landing soon and it had a sub crate for async

CAD97 added the question label Aug 19, 2020

Geobert closed this as completed Aug 20, 2020

Geobert reopened this Aug 20, 2020

Geobert changed the title ~~[Question] Storing parse result to avoid reparsing?~~ [Question] Storing parse result to avoid reparsing? (async related) Aug 20, 2020

tomtau converted this issue into discussion #631 Jul 10, 2022

ekechinwokah mentioned this issue Jan 25, 2024

Merge Pest changes to make it thread-safe infinohq/pest#1

Merged

ekechinwokah mentioned this issue Feb 8, 2024

Async Pest #974

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

[Question] Storing `parse` result to avoid reparsing? (async related) #472

[Question] Storing `parse` result to avoid reparsing? (async related) #472

Geobert commented Aug 19, 2020 •

edited

Loading

CAD97 commented Aug 19, 2020

Geobert commented Aug 20, 2020

Darksonn commented Aug 20, 2020 •

edited

Loading

Geobert commented Aug 20, 2020

SkiFire13 commented Aug 20, 2020

CAD97 commented Aug 20, 2020

SkiFire13 commented Aug 20, 2020

CAD97 commented Aug 20, 2020

Geobert commented Aug 20, 2020 •

edited

Loading

CAD97 commented Aug 21, 2020

Geobert commented Aug 21, 2020

This issue was moved to a discussion.

This issue was moved to a discussion.

[Question] Storing parse result to avoid reparsing? (async related) #472

[Question] Storing parse result to avoid reparsing? (async related) #472

Comments

Geobert commented Aug 19, 2020 • edited Loading

CAD97 commented Aug 19, 2020

Geobert commented Aug 20, 2020

Darksonn commented Aug 20, 2020 • edited Loading

Geobert commented Aug 20, 2020

SkiFire13 commented Aug 20, 2020

CAD97 commented Aug 20, 2020

SkiFire13 commented Aug 20, 2020

CAD97 commented Aug 20, 2020

Geobert commented Aug 20, 2020 • edited Loading

CAD97 commented Aug 21, 2020

Geobert commented Aug 21, 2020

This issue was moved to a discussion.

[Question] Storing `parse` result to avoid reparsing? (async related) #472

[Question] Storing `parse` result to avoid reparsing? (async related) #472

Geobert commented Aug 19, 2020 •

edited

Loading

Darksonn commented Aug 20, 2020 •

edited

Loading

Geobert commented Aug 20, 2020 •

edited

Loading