
Monte carlo block added #113

Open · wants to merge 26 commits into main

Conversation

ggaspersic (Collaborator) commented Aug 23, 2023

Created a separate MonteCarlo block which executes a specific continuation of blocks n times (where n is the number of iterations). A configurable number of inputs is masked with a random function whose seed is based on the sum of the inputs. For each run we pick which indices should be masked, instead of iterating over the whole array and checking whether each element should be masked or not.

In the case of FW this block can be configured to run as part of the FFM after the triangle block, where it re-uses the input from the FFM/triangle on each run, reducing CPU usage and execution time.

The results of the sub-runs are stored in a separate PredictionStats struct, which contains the mean, standard deviation, variance, and number of iterations.

Additionally, the PredictionStats information can be exposed via FFI as an array of outputs.
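A minimal sketch of the scheme under stated assumptions: only the PredictionStats name and the 1000.0 seed scaling (visible in the diff below) come from this PR; the function names, the inline xorshift PRNG, and masking-by-zeroing are illustrative stand-ins for whatever the block actually does.

// Illustrative sketch, not the PR's actual code.
pub struct PredictionStats {
    pub mean: f32,
    pub standard_deviation: f32,
    pub variance: f32,
    pub num_iterations: u32,
}

// Seed derived from the sum of the inputs (the 1000.0 factor mirrors the diff below).
fn seed_from_input(input: &[f32]) -> u64 {
    (input.iter().sum::<f32>() * 1000.0) as u64
}

// One sub-run: instead of walking the whole array and testing each element,
// pick `num_to_mask` indices directly and zero only those.
fn mask_inputs(input: &[f32], num_to_mask: usize, rng_state: &mut u64) -> Vec<f32> {
    let mut masked = input.to_vec();
    for _ in 0..num_to_mask {
        // xorshift64 step, standing in for the block's real RNG.
        *rng_state ^= *rng_state << 13;
        *rng_state ^= *rng_state >> 7;
        *rng_state ^= *rng_state << 17;
        let idx = (*rng_state as usize) % masked.len();
        masked[idx] = 0.0;
    }
    masked
}

// Run the continuation `predict` n times on masked copies of the input and
// collect one observation per run; mean/variance/std dev are summarized afterwards.
fn run_monte_carlo(
    input: &[f32],
    num_iterations: u32,
    dropout_rate: f32,
    predict: impl Fn(&[f32]) -> f32,
) -> Vec<f32> {
    let num_to_mask = (input.len() as f32 * dropout_rate) as usize;
    let mut rng_state = seed_from_input(input) | 1; // avoid an all-zero xorshift state
    (0..num_iterations)
        .map(|_| predict(&mask_inputs(input, num_to_mask, &mut rng_state)))
        .collect()
}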

@@ -166,7 +168,19 @@ impl Regressor {
if mi.ffm_k > 0 {
let block_ffm = block_ffm::new_ffm_block(&mut bg, mi).unwrap();
let triangle_ffm = block_misc::new_triangle_block(&mut bg, block_ffm).unwrap();
output = block_misc::new_join_block(&mut bg, vec![output, triangle_ffm]).unwrap();
if mi.ffm_mc_iteration_count == 0 || mi.ffm_mc_dropout_rate <= 0.0 {
Collaborator:
Negative dropout rate makes little sense imo?

Collaborator Author:
It does indeed not make sense, but people make mistakes ;) That's why I am adding the check there

dropout_rate: f32,
) -> Result<graph::BlockPtrOutput, Box<dyn Error>> {
let num_inputs = bg.get_num_output_values(vec![&input]);
assert_ne!(num_inputs, 0);
Collaborator:
Assert debug?

Collaborator Author:
Why debug? This is at creation time - it should also break when running.

Collaborator:
Assertions during runtime cost something - I guess this is fine as it's just during init?

num_inputs,
});
let mut block_outputs = bg.add_node(block, vec![input])?;
assert_eq!(block_outputs.len(), 1);
Collaborator:
Assertions - perhaps explicit to dbg mode

Collaborator Author:
Why debug? This is at creation time - it should also break when running.

Collaborator:
Same answer as above -- as this is just init it's fine; for hotter parts of the source, assertions would imply some kind of perf penalty
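For context on the distinction being discussed, a minimal illustrative snippet (not taken from the PR): assert! is checked in every build, while debug_assert! compiles away when debug assertions are disabled, so it costs nothing in release-mode hot paths.

// Illustrative only; the bound below is hypothetical.
fn init_block(num_inputs: usize) {
    assert_ne!(num_inputs, 0); // always checked; acceptable for one-time init
    debug_assert!(num_inputs < 1_000_000); // checked only in debug builds
}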

}

fn create_seed_from_input_tape(input_tape: &[f32]) -> u64 {
(input_tape.iter().sum::<f32>() * 1000.0) as u64
Collaborator:
Please add a constant at the top of the file + a mini explanation of what it is exactly

Collaborator Author:
Extracted to a constant & added an explanation. I will keep it close to the actual usage, otherwise you constantly need to jump up & down.

Collaborator:
Interesting view on that -- I guess the initial repo's convention is constants on top (if you check older files), even though both approaches are fine
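A minimal sketch of what the extraction might look like (the constant name and comment wording are my assumptions, only the 1000.0 value comes from the diff):

// Assumed name. Scales the input sum before truncating to u64, so the first few
// decimal places of the inputs still influence the seed.
const SEED_SCALING_FACTOR: f32 = 1000.0;

fn create_seed_from_input_tape(input_tape: &[f32]) -> u64 {
    (input_tape.iter().sum::<f32>() * SEED_SCALING_FACTOR) as u64
}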

) {
unsafe {
if self.number_of_inputs_to_skip == 0 {
self.copy_input_tape_to_output_tape(pb);
Collaborator:
AFAIR a "direct" implementation of this idea does not require this copy - is there an overhead associated?

Collaborator Author:
We added an intermediate step, so we still need to copy the input tape to the output tape so it is used in the next step. It's copying values, so the overhead is not that large.
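For illustration, a sketch of the kind of copy being discussed; the PortBuffer fields here are hypothetical placeholders, not the real layout:

// Hypothetical buffer layout, for illustration only.
struct PortBuffer {
    input_tape: Vec<f32>,
    output_tape: Vec<f32>,
}

// Clear-and-extend copy: cost is linear in the tape length.
fn copy_input_tape_to_output_tape(pb: &mut PortBuffer) {
    pb.output_tape.clear();
    pb.output_tape.extend_from_slice(&pb.input_tape);
}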

}

fn fill_stats(&self, pb: &mut PortBuffer) {
let mean: f32 = pb.observations.iter().sum::<f32>() / self.num_iterations as f32;
Collaborator:
What's the overhead of doing this for each prediction? Seems like two-pass variance would avoid these extra mults, just a hunch though:

[image: two-pass variance formula, s^2 = (1/n) * sum_i (x_i - mean)^2]

or even the Bessel one:

[image: Bessel-corrected variance formula, s^2 = (1/(n-1)) * sum_i (x_i - mean)^2]

Collaborator Author:
With 5 iterations we get to 0.5 overhead per prediction. So not that much :)

Collaborator:
Right, assuming we stay in that range it's fine then

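For reference, a minimal sketch of the two variants being compared; the excerpt does not show which one fill_stats actually uses, so both are shown side by side:

// One-pass variance via E[x^2] - mean^2: one extra multiply per element to
// accumulate the squares, and it can lose precision when mean^2 dominates.
fn variance_one_pass(xs: &[f32]) -> f32 {
    let n = xs.len() as f32;
    let (sum, sum_sq) = xs
        .iter()
        .fold((0.0f32, 0.0f32), |(s, sq), &x| (s + x, sq + x * x));
    sum_sq / n - (sum / n).powi(2)
}

// Two-pass variance, as suggested in the review: compute the mean first, then
// average the squared deviations (divide by n - 1 instead for Bessel's correction).
fn variance_two_pass(xs: &[f32]) -> f32 {
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / n
}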
}
let stats_slice = std::slice::from_raw_parts_mut(stats, stats_size);
*stats_slice.get_unchecked_mut(0) = prediction_stats.mean;
*stats_slice.get_unchecked_mut(1) = prediction_stats.variance;
Collaborator:
No need to report both std and var, var is enough.

Collaborator Author:
We actually need std dev. But I am reporting both just in case we mess up the calculation.

Collaborator:
Merely saying that stddev is just sqrt(var), computable more or less everywhere (e.g., during analysis)
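A trivial illustration of the point (hypothetical consumer-side helper, not part of the PR):

// std dev is recoverable from variance wherever the stats are consumed,
// so the FFI array only strictly needs to carry the variance.
fn std_dev_from_variance(variance: f32) -> f32 {
    variance.sqrt()
}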

if cmd_arguments.is_present("ffm_mc_iteration_count") {
if let Some(val) = cmd_arguments.value_of("ffm_mc_iteration_count") {
let hvalue = val.parse()?;
mi.ffm_mc_iteration_count = hvalue;
Collaborator:
What exactly is the meaning of the two new hyperparameters?

Collaborator Author:
ffm_mc_iteration_count - the number of iterations the Monte Carlo block will run
ffm_mc_dropout_rate - the dropout rate for it
Or did I misunderstand?

Collaborator:
Ok cool, just wanted to make sure I understood :) Intuitively mc_iteration_count=1 + mc_dropout_rate=0.0 would imply the current mode of operation then -- this is not entirely true anymore though, right?

Collaborator:
current = current main
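If the guard in the Regressor diff above is read literally, the fallback condition could be captured as below (a sketch with assumed types, not the actual wiring):

// Mirrors the condition from the diff: the Monte Carlo block is only added when
// both hyperparameters are meaningfully set; iteration_count == 0 or a
// non-positive dropout_rate keeps the graph as it is on main.
fn monte_carlo_enabled(ffm_mc_iteration_count: u32, ffm_mc_dropout_rate: f32) -> bool {
    ffm_mc_iteration_count != 0 && ffm_mc_dropout_rate > 0.0
}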
