Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalability: is_interesting_payload #238

Merged
merged 4 commits into from Jan 29, 2019

Conversation

Projects
None yet
4 participants
@jeanphilippeD
Copy link
Contributor

jeanphilippeD commented Jan 28, 2019

When we have duplicate payloads in 'unconsensused_events' voted by different peers, only call is_interesting_payload on unique playloads.

This situation occurs with ConsensusMode::Supermajority, or for adding/removing peers where we can have 2/3*peer_count duplicate events.

Add tests to support changes:
-A bug was introduced in initial implementation, not caught by cargo test and was difficult to debug.
-Split testable free function in initial commit to ensure clear diff on this update without unit tests.
-This free function has its dependecy on Parsec injected as closure and use the new AbstractEvent trait so a light weight and easily configurable Event stub/mock can be used in tests.

Note on tests:

-Use AAA(Arrange/Act/Assert) test pattern for test function as it ensure test remain readable over time.
-Apply a data driven approach to the test: A single function is executed with different data providing the setup and expected outcome. (Generally, code would specify a collection of test cases and the testing framework would run all of them. Here, I adapted the pattern by having one test function per data calling a common sub-function as a simpler approach to avoid bringing dependency to this test for now.)
-I use a nested module 'find_interesting_content_for_event' to define types and function only relevant for testing our new function. The types and functions more generally useful for parsec testing are defined in tests.

Performance results:
Over 50% for large test cases.
Small regression on very small test cases with very few peers (First 3 benchmarks).

minimal time: [851.75 us 866.10 us 880.02 us]
change: [+2.7617% +5.1026% +7.2277%] (p = 0.00 < 0.05)
Performance has regressed.

static time: [4.1517 ms 4.2511 ms 4.4013 ms]
change: [+1.8617% +5.0668% +8.7181%] (p = 0.01 < 0.05)
Performance has regressed.

dynamic time: [2.8286 ms 2.9064 ms 2.9841 ms]
change: [+2.2913% +5.0127% +7.7377%] (p = 0.00 < 0.05)
Performance has regressed.

a_node4_opaque_evt16 time: [3.0903 ms 3.1319 ms 3.2025 ms]
change: [-2.1895% -0.1052% +2.1168%] (p = 0.93 > 0.05)
No change in performance detected.

a_node8_opaque_evt16 time: [12.695 ms 12.849 ms 13.192 ms]
change: [-37.835% -36.282% -34.501%] (p = 0.00 < 0.05)
Performance has improved.

a_node12_opaque_evt16 time: [95.981 ms 97.514 ms 98.197 ms]
change: [-36.983% -34.989% -32.622%] (p = 0.00 < 0.05)
Performance has improved.

a_node16_opaque_evt16 time: [229.51 ms 231.06 ms 232.88 ms]
change: [-37.909% -35.791% -34.382%] (p = 0.00 < 0.05)
Performance has improved.

a_node24_opaque_evt16 time: [840.21 ms 846.48 ms 853.49 ms]
change: [-50.129% -49.764% -49.390%] (p = 0.00 < 0.05)
Performance has improved.

a_node32_opaque_evt16 time: [2.1901 s 2.2141 s 2.2400 s]
change: [-54.058% -53.511% -52.926%] (p = 0.00 < 0.05)
Performance has improved.

@jeanphilippeD jeanphilippeD requested review from Fraser999 and maqi Jan 28, 2019

@@ -33,6 +33,37 @@ use std::fmt::{self, Debug, Display, Formatter};
#[cfg(feature = "dump-graphs")]
use std::io::{self, Write};

/// Provide a small interface to Event not dependent on PublicId.
/// Serves as a test seam.

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

Better use the same line width as others? so this line break is not necessary?

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

I find 2 sentences that elaborate on 2 different points to be easier to parse/read and see that there are 3 points when on separate lines.

Is that problematic?

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

Not a big issue, just a different preference. :)

This comment has been minimized.

@Fraser999

Fraser999 Jan 28, 2019

Member

It's a style we generally have throughout the codebase though (minimising vertical space), so in the interests of consistency as well as personal preference, I'd agree with Qi.

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 29, 2019

Author Contributor

Fixed

/// Returns whether this event can see `other`, i.e. whether there's a directed path from `other`
/// to `self` in the graph, and no two events created by `other`'s creator are ancestors to
/// `self` (fork).
fn sees<E: AsRef<Self>>(&self, other: E) -> bool

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

shall the name then be sees_without_fork to be explicit, according to the comment?

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

Could well be, but I do not think this should be part of this change as this is a long standing name for the function.

where
Self: Sized;

/// For Observation event the vote payload_key

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

For Observation event, returns the vote's payload_key ?

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

Fixed with:

    /// The vote payload_key for an Observation event
    fn payload_key(&self) -> Option<&ObservationKey>
@@ -360,6 +391,28 @@ impl<P: PublicId> Event<P> {
}
}

impl<P: PublicId> AbstractEvent for Event<P> {
fn sees<E: AsRef<Self>>(&self, other: E) -> bool {
self.is_descendant_of(other).unwrap_or(false)

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

does is_descendant_of check fork as well?

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

There is a problem here, I did not mean to re-implement sees, but forward it like the other functions.

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 29, 2019

Author Contributor

Fixed

Show resolved Hide resolved src/parsec.rs Outdated
.meta_election
.is_already_interesting_content(builder.event().creator(), payload_key)
})
.filter(|(_, payload_key)| !is_already_interesting_content(payload_key))

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

wondering can these two filter be merged into one?
and the same to the above filter and filter_map

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

I would not do unnecessary change in this commit.
I probably would not want to update other commit either.

But that is a great idea and I'll add another commit for this clean up since the code is tested reasonably.

@@ -0,0 +1,370 @@
// Copyright 2018 MaidSafe.net limited.

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

2019 now?

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

Not sure how this work considering this is code copied from parsec.rs?
Do we have a policy somewhere?

This comment has been minimized.

@Fraser999

Fraser999 Jan 28, 2019

Member

It's a bit ambiguous in this case. If code is purely being moved around, there's justification for keeping the original copyright date. But if the changes become significant we should apply the current year to new files. At least that's my inexpert understanding.

/// Find interesting playloads for the builder_event.
/// For payload observed from builder_event, order them by creation index.
#[allow(unused_qualifications)]
// E: std::convert::AsRef<E> both requiered and considered unused by 1.32

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

mis-spelling of required

.filter(|(_, payload_key, _)| !is_already_interesting_content(payload_key))
.collect();

// Transform to Vec< (PayloadKey, Vec<Event>) > so we can process identical payload once.

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

better use a quote mark for the vec type?

})
.collect();

let mut payloads: Vec<_> = payloads

This comment has been minimized.

@maqi

maqi Jan 28, 2019

Member

I am not sure, normally I try to avoid use the same name variables.

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

That is one of the power of rust. Here we are transforming the collection again and again, but it still represent the same thing. Is that something that is frowned upon?

This comment has been minimized.

@Fraser999

Fraser999 Jan 28, 2019

Member

I too prefer using shadowing for cases like this, but I think we're in the minority.

This comment has been minimized.

@jeanphilippeD

jeanphilippeD Jan 28, 2019

Author Contributor

actually, will change this for clarity,.

@Fraser999
Copy link
Member

Fraser999 left a comment

This is a nice optimisation. My comments are all fairly trivial suggestions.

Show resolved Hide resolved Cargo.toml Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
Show resolved Hide resolved src/parsec_helpers.rs Outdated
@Fraser999
Copy link
Member

Fraser999 left a comment

Nice work. Particularly happy to see the unit tests going in.

jeanphilippeD added some commits Jan 27, 2019

refactor/parsec: extract testable find_interesting_content_for_event
We need to add test to this functionality to support performance
improvment changes.

This refactor only apply simple transformations on this code with
limited tests.
We will be able to unit test this free function with no dependency on
complicated objects.

AbstractEvent trait:
-Needed so Event can be mocked/stubbed in tests.
-Statically dispatched functions should not incur performance penalty.
-Can be extended with other needed functions in the future, but should
 not depend on generic parameter.

Test:
Cargo test
test/parsec: move code and add tests to new parsec_helpers.rs
Add tests to support follow up changes improving performance of
find_interesting_content_for_event.

Move code in new file to cut down the long parsec.rs and run 'cargo fmt'
only chaging spacing.

More free functions used by parsec.rs and their tests can be added here,
ideally also depending on the mockable trait rather than Event, and with
no dependencies on Parsec object.

Note on tests:

-Implement unit tests with only very light weight dependencies, stubs
and depedency injection.

-Use AAA(Arrange/Act/Assert) test pattern for test function as it ensure
test remain readable over time.

-Apply a data driven approach to the test: A single function is executed
with different data providing the setup and expected outcome.
Generally, code would specify a collection of test cases and the testing
framework would run all of them. Here, I adapted the pattern by having
one test function per data calling a common sub-function as a simpler
approach to avoid bringing dependency to this test for now.

-I use a nested  module 'find_interesting_content_for_event' to define
types and function only relevant for testing our new function. The types
and functions more generally useful for parsec testing are defined in
tests.

Test:
cargo test
perf/parsec: de-duplicate calls to is_interesting_payload
When we have duplicate payloads in 'unconsensused_events' voted by
different peers, only call is_interesting_payload on unique playloads.

This situation occurs with ConsensusMode::Supermajority, or for
adding/removing peers where we can have 2/3*peer_count duplicate events.

Note on identifying the problem:
- Add counter to see the problem and improvment for
bench_section_size_evt16/a_node32:

- Based on PR235 is_interesting_payload was called many times (46784
times for 2631 events and 2599 calls to set_interesting_content).
Perf flame graph: 55% of run time for a_node32_opaque_evt16:

- Calling is_interesting_payload only once for all events with the same
payload we get called fewer times(8338 times for 1902 events and 1870
calls to set_interesting_content). Perf flame graph: 17% of run time for
a_node32_opaque_evt16:

Test:
cargo test
cargo bench against master results.

@jeanphilippeD jeanphilippeD dismissed stale reviews from maqi and Fraser999 via bc3996e Jan 29, 2019

@jeanphilippeD jeanphilippeD force-pushed the jeanphilippeD:scalability_is_interesting_payload branch from fb66cd5 to bc3996e Jan 29, 2019

@pierrechevalier83 pierrechevalier83 merged commit 073f57d into maidsafe:master Jan 29, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jeanphilippeD jeanphilippeD deleted the jeanphilippeD:scalability_is_interesting_payload branch Jan 29, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.