Piped RDDs and Bayesian AB Testing
==================================

Continued with Application to Old Bailey Online Data
----------------------------------------------------

This is a recall/repeat of `006a_PipedRDD`. After the recall, we
continue with applying Bayesian A/B Testing to the data extracted from
Old Bailey Online counts of crimes and punishments.

Here we will first take excerpts with minor modifications from the end
of **Chapter 12. Resilient Distributed Datasets (RDDs)** of *Spark: The
Definitive Guide*:

-   https://learning.oreilly.com/library/view/spark-the-definitive/9781491912201/ch12.html

Next, we will do Bayesian AB Testing using PipedRDDs.

First, we create the toy RDDs as in *The Definitive Guide*:

> From a Local Collection
> =======================

To create an RDD from a collection, you will need to use the parallelize
method on a SparkContext (within a SparkSession). This turns a single
node collection into a parallel collection. When creating this parallel
collection, you can also explicitly state the number of partitions into
which you would like to distribute this array. In this case, we are
creating two partitions:

In [None]:
// in Scala
val myCollection = "Spark The Definitive Guide : Big Data Processing Made Simple".split(" ")
val words = spark.sparkContext.parallelize(myCollection, 2)

  

>     myCollection: Array[String] = Array(Spark, The, Definitive, Guide, :, Big, Data, Processing, Made, Simple)
>     words: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[114] at parallelize at command-165306931359313:3

In [None]:
# in Python
myCollection = "Spark The Definitive Guide : Big Data Processing Made Simple"\
  .split(" ")
words = spark.sparkContext.parallelize(myCollection, 2)
words

  

> glom
> ====

> `glom` is an interesting function that takes every partition in your
> dataset and converts them to arrays. This can be useful if you’re
> going to collect the data to the driver and want to have an array for
> each partition. However, this can cause serious stability issues
> because if you have large partitions or a large number of partitions,
> it’s simple to crash the driver.

Let's use `glom` to see how our `words` are distributed among the two
partitions we used explicitly.

In [None]:
words.glom.collect 

  

>     res18: Array[Array[String]] = Array(Array(Spark, The, Definitive, Guide, :), Array(Big, Data, Processing, Made, Simple))

In [None]:
words.glom().collect()

  

> Checkpointing
> =============
>
> One feature not available in the DataFrame API is the concept of
> checkpointing. Checkpointing is the act of saving an RDD to disk so
> that future references to this RDD point to those intermediate
> partitions on disk rather than recomputing the RDD from its original
> source. This is similar to caching except that it’s not stored in
> memory, only disk. This can be helpful when performing iterative
> computation, similar to the use cases for caching:

Let's create a directory in `dbfs:///` for checkpointing the RDDs that
follow. The following `%fs mkdirs /path_to_dir` is a shortcut to create
a directory in `dbfs:///`.

In [None]:
%fs mkdirs /datasets/ScaDaMaLe/checkpointing/

  

>     res29: Boolean = true

In [None]:
spark.sparkContext.setCheckpointDir("dbfs:///datasets/ScaDaMaLe/checkpointing")
words.checkpoint()

  

  

> Now, when we reference this RDD, it will derive from the checkpoint
> instead of the source data. This can be a helpful optimization.

YouTry
------

Just some more words in `haha_words` with `\n`, the End-Of-Line (EOL)
characters, in-place.

In [None]:
val haha_words = sc.parallelize(Seq("ha\nha", "he\nhe\nhe", "ho\nho\nho\nho"),3)

  

>     haha_words: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[134] at parallelize at command-165306931359296:1

  

Let's use `glom` to see how our `haha_words` are distributed among the
partitions.

In [None]:
haha_words.glom.collect

  

>     res31: Array[Array[String]] =
>     Array(Array(ha
>     ha), Array(he
>     he
>     he), Array(ho
>     ho
>     ho
>     ho))

  

> Pipe RDDs to System Commands
> ============================

> The pipe method is probably one of Spark’s more interesting methods.
> With pipe, you can return an RDD created by piping elements to a
> forked external process. The resulting RDD is computed by executing
> the given process once per partition. All elements of each input
> partition are written to a process’s stdin as lines of input separated
> by a newline. The resulting partition consists of the process’s stdout
> output, with each line of stdout resulting in one element of the
> output partition. A process is invoked even for empty partitions.

> The print behavior can be customized by providing two functions.

We can use a simple example and pipe each partition to the command wc.
Each row will be passed in as a new line, so if we perform a line count,
we will get the number of lines, one per partition:

The following produces a `PipedRDD`:

In [None]:
val wc_l_PipedRDD = words.pipe("wc -l")

  

>     wc_l_PipedRDD: org.apache.spark.rdd.RDD[String] = PipedRDD[143] at pipe at command-165306931359299:1

In [None]:
wc_l_PipedRDD = words.pipe("wc -l")
wc_l_PipedRDD

  

Now, we take an action via `collect` to bring the results to the Driver.

NOTE: Be careful what you collect! You can always write the output to
Parquet or binary files in `dbfs:///` if the returned output is large.

In [None]:
wc_l_PipedRDD.collect

  

>     res32: Array[String] = Array(5, 5)

In [None]:
wc_l_PipedRDD.collect()

  

In this case, we got the number of lines returned by `wc -l` per
partition.
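
The per-partition behaviour of `pipe` can be mimicked outside Spark. The
following is a minimal sketch in plain Python, not Spark's
implementation: the helper name `pipe_partitions` is hypothetical, the
two partitions are written out by hand to match the `glom` output above,
and the sketch assumes a Unix-like system where `wc` is on the `PATH`.

```python
import subprocess


def pipe_partitions(partitions, command):
    """Run `command` once per partition (hypothetical helper, not a Spark API).

    Every element is written to the process's stdin followed by a newline,
    and every line of stdout becomes one element of the output partition.
    """
    results = []
    for part in partitions:
        stdin_text = "".join(element + "\n" for element in part)
        completed = subprocess.run(
            command, input=stdin_text, capture_output=True, text=True, check=True
        )
        # strip() only removes the padding that some `wc` variants add.
        results.append([line.strip() for line in completed.stdout.splitlines()])
    return results


# The two partitions of `words`, copied from the glom output above.
words_partitions = [
    ["Spark", "The", "Definitive", "Guide", ":"],
    ["Big", "Data", "Processing", "Made", "Simple"],
]

line_counts = pipe_partitions(words_partitions, ["wc", "-l"])
print(line_counts)
```

One external process is started per partition, so the number of results
equals the number of partitions and not the number of elements.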

YouTry
------

Try to make sense of the next two cells. In the first we specify three
partitions explicitly. In the second we do NOT specify the number of
partitions and let Spark decide on it automatically.

In [None]:
val haha_words = sc.parallelize(Seq("ha\nha", "he\nhe\nhe", "ho\nho\nho\nho"),3)
haha_words.glom.collect
val wc_l_PipedRDD_haha_words = haha_words.pipe("wc -l")
wc_l_PipedRDD_haha_words.collect()

  

>     haha_words: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[149] at parallelize at command-165306931359334:1
>     wc_l_PipedRDD_haha_words: org.apache.spark.rdd.RDD[String] = PipedRDD[151] at pipe at command-165306931359334:3
>     res33: Array[String] = Array(2, 3, 4)

  

Do you understand why the above `collect` statement returns what it
does?

In [None]:
val haha_words_again = sc.parallelize(Seq("ha\nha", "he\nhe\nhe", "ho\nho\nho\nho"))
haha_words_again.glom.collect
val wc_l_PipedRDD_haha_words_again = haha_words_again.pipe("wc -l")
wc_l_PipedRDD_haha_words_again.collect()

  

>     haha_words_again: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[152] at parallelize at command-165306931359308:1
>     wc_l_PipedRDD_haha_words_again: org.apache.spark.rdd.RDD[String] = PipedRDD[154] at pipe at command-165306931359308:3
>     res34: Array[String] = Array(0, 0, 2, 0, 0, 3, 0, 4)

  

Did you understand why some of the results are `0` in the last `collect`
statement?
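
One way to reason about both results is to split the collection by hand.
The sketch below is plain Python under two stated assumptions: that a
local collection is cut into contiguous slices with integer arithmetic
(slice `i` covers positions `i*n/k` up to `(i+1)*n/k`), and that the
default number of partitions on this cluster was 8, which is inferred
from the eight results above. The helper names `slice_collection` and
`count_lines` are hypothetical.

```python
def slice_collection(items, num_slices):
    """Split `items` into `num_slices` contiguous slices using integer arithmetic."""
    n = len(items)
    return [
        items[(i * n) // num_slices : ((i + 1) * n) // num_slices]
        for i in range(num_slices)
    ]


def count_lines(partition):
    """Mimic `wc -l`: write each element followed by a newline, then count newlines."""
    text = "".join(element + "\n" for element in partition)
    return text.count("\n")


haha = ["ha\nha", "he\nhe\nhe", "ho\nho\nho\nho"]

# Three explicit partitions: one element each.
three_way = [count_lines(p) for p in slice_collection(haha, 3)]
# Eight assumed default partitions: five of them receive no element.
eight_way = [count_lines(p) for p in slice_collection(haha, 8)]

print(three_way)
print(eight_way)
```

Under these assumptions an empty partition still starts a process, which
reads nothing and reports `0`, while an element with embedded `\n`
characters counts as several lines.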

> mapPartitions
> =============

> The previous command revealed that Spark operates on a per-partition
> basis when it comes to actually executing code. You also might have
> noticed earlier that the return signature of a map function on an RDD
> is actually `MapPartitionsRDD`.

Or `ParallelCollectionRDD` in our case.

> This is because map is just a row-wise alias for `mapPartitions`,
> which makes it possible for you to map an individual partition
> (represented as an iterator). That’s because physically on the cluster
> we operate on each partition individually (and not a specific row). A
> simple example creates the value “1” for every partition in our data,
> and the sum of the following expression will count the number of
> partitions we have:

In [None]:
// in Scala
words.mapPartitions(part => Iterator[Int](1)).sum() // 2.0

  

>     res35: Double = 2.0

In [None]:
# in Python
words.mapPartitions(lambda part: [1]).sum() # 2

  

> Naturally, this means that we operate on a per-partition basis and
> therefore it allows us to perform an operation on that *entire*
> partition. This is valuable for performing something on an entire
> subdataset of your RDD. You can gather all values of a partition class
> or group into one partition and then operate on that entire group
> using arbitrary functions and controls. An example use case of this
> would be that you could pipe this through some custom machine learning
> algorithm and train an individual model for that company’s portion of
> the dataset. A Facebook engineer has an interesting demonstration of
> their particular implementation of the pipe operator with a similar
> use case demonstrated at [Spark Summit East
> 2017](https://spark-summit.org/east-2017/events/experiences-with-sparks-rdd-apis-for-complex-custom-applications/).

> Other functions similar to `mapPartitions` include
> `mapPartitionsWithIndex`. With this you specify a function that
> accepts an index (within the partition) and an iterator that goes
> through all items within the partition. The partition index is the
> partition number in your RDD, which identifies where each record in
> our dataset sits (and potentially allows you to debug). You might use
> this to test whether your map functions are behaving correctly:

In [None]:
// in Scala
def indexedFunc(partitionIndex: Int, withinPartIterator: Iterator[String]) = {
  withinPartIterator.toList.map(
    value => s"Partition: $partitionIndex => $value").iterator
}
words.mapPartitionsWithIndex(indexedFunc).collect()

  

>     indexedFunc: (partitionIndex: Int, withinPartIterator: Iterator[String])Iterator[String]
>     res36: Array[String] = Array(Partition: 0 => Spark, Partition: 0 => The, Partition: 0 => Definitive, Partition: 0 => Guide, Partition: 0 => :, Partition: 1 => Big, Partition: 1 => Data, Partition: 1 => Processing, Partition: 1 => Made, Partition: 1 => Simple)

In [None]:
# in Python
def indexedFunc(partitionIndex, withinPartIterator):
  return ["partition: {} => {}".format(partitionIndex, x) for x in withinPartIterator]
words.mapPartitionsWithIndex(indexedFunc).collect()
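
To see what these two calls do without a cluster, here is a small
stand-in written in plain Python. It is only a sketch of the idea: the
helpers `map_partitions` and `map_partitions_with_index` are
hypothetical names that treat an RDD as a list of lists, with the two
partitions copied from the `glom` output earlier.

```python
def map_partitions(partitions, func):
    """Apply `func` to an iterator over each partition (stand-in for mapPartitions)."""
    return [list(func(iter(part))) for part in partitions]


def map_partitions_with_index(partitions, func):
    """Same as above, but `func` also receives the partition number."""
    return [list(func(index, iter(part))) for index, part in enumerate(partitions)]


words_partitions = [
    ["Spark", "The", "Definitive", "Guide", ":"],
    ["Big", "Data", "Processing", "Made", "Simple"],
]

# Emit a single 1 per partition; the sum is then the number of partitions.
ones = map_partitions(words_partitions, lambda part: [1])
num_partitions = sum(value for part in ones for value in part)


def indexed_func(partition_index, within_part_iterator):
    # Label every element with the number of the partition it sits in.
    return (
        "Partition: {} => {}".format(partition_index, x)
        for x in within_part_iterator
    )


labelled = map_partitions_with_index(words_partitions, indexed_func)
flattened = [x for part in labelled for x in part]

print(num_partitions)
print(flattened)
```

The function passed in sees a whole partition at once, which is what
distinguishes it from a row-wise `map`.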

  

> foreachPartition
> ================

> Although `mapPartitions` needs a return value to work properly, this
> next function does not. `foreachPartition` simply iterates over all
> the partitions of the data. The difference is that the function has no
> return value. This makes it great for doing something with each
> partition like writing it out to a database. In fact, this is how many
> data source connectors are written. You can create your own text file
> source if you want by specifying outputs to the temp directory with a
> random ID:

In [None]:
words.foreachPartition { iter =>  
  import java.io._  
  import scala.util.Random  
  val randomFileName = new Random().nextInt()  
  val pw = new PrintWriter(new File(s"/tmp/random-file-${randomFileName}.txt"))  
  while (iter.hasNext) {
    pw.write(iter.next())  
  }  
  pw.close()
}

  

  

> You’ll find these two files if you scan your /tmp directory.

You need to scan for the files across all the nodes, as a file may not
be in the Driver node's `/tmp/` directory but in that of the executor
that hosted the partition.

In [None]:
pwd

In [None]:
ls /tmp/random-file-*.txt

  

Numerically Rigorous Bayesian AB Testing
========================================

This is an example of Bayesian AB Testing with computer-aided proofs for
the posterior samples.

The main learning goal for you is to use pipedRDDs to distribute an
executable, `IsIt1or2Coins`, across all the worker nodes in the Spark
cluster in an embarrassingly parallel way.

### What does `IsIt1or2Coins` do?

At a very high-level, to understand what `IsIt1or2Coins` does, imagine
the following simple experiment.

We are given

-   the number of heads that result from a first sequence of independent
    and identical tosses of a coin and then
-   we are given the number of heads that result from a second sequence
    of independent and identical tosses of a coin

Our decision problem is to help shed light on whether both sequences of
tosses came from the same coin or not (whatever the bias may be).

`IsIt1or2Coins` tries to help us decide if the two sequences of coin
tosses are based on one coin with an unknown bias or two coins with
different biases.
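
For intuition only, the same decision problem has a textbook closed-form
answer when every unknown bias gets a uniform prior and both models are
equally likely a priori. These priors are assumptions made here for
illustration; the source does not state which priors `IsIt1or2Coins`
uses, and the sketch below does not use rejection sampling or interval
arithmetic. The function name `posterior_prob_one_coin` is hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_prob_one_coin(numtosses1, heads1, numtosses2, heads2,
                            prior_one_coin=0.5):
    """Posterior probability of the one-coin model under uniform priors.

    The binomial coefficients are identical in both models, so they cancel
    in the ratio of marginal likelihoods and are left out.
    """
    tails1 = numtosses1 - heads1
    tails2 = numtosses2 - heads2
    # One coin: a single bias has to explain both sequences.
    log_m1 = log_beta(heads1 + heads2 + 1, tails1 + tails2 + 1)
    # Two coins: each sequence has its own bias.
    log_m2 = log_beta(heads1 + 1, tails1 + 1) + log_beta(heads2 + 1, tails2 + 1)
    log_odds = log_m1 - log_m2 + log(prior_one_coin / (1.0 - prior_one_coin))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1.0 + z)


# 500 heads in 1000 tosses and 600 heads in 1200 tosses.
p_same = posterior_prob_one_coin(1000, 500, 1200, 600)
# 500 heads in 1000 tosses and 720 heads in 1200 tosses.
p_diff = posterior_prob_one_coin(1000, 500, 1200, 720)
print(p_same, p_diff)
```

When both sequences show the same proportion of heads, the one-coin
model is favoured. When the proportions differ clearly, the two-coin
model takes over.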

If you are curious about details feel free to see:

-   Exact Bayesian A/B testing using distributed fault-tolerant Moore
    rejection sampler, Benny Avelin and Raazesh Sainudiin, Extended
    Abstract, 2 pages, 2018 [(PDF
    104KB)](http://lamastex.org/preprints/20180507_ABTestingViaDistributedMRS.pdf).
-   which builds on: An auto-validating, trans-dimensional, universal
    rejection sampler for locally Lipschitz arithmetical expressions,
    Raazesh Sainudiin and Thomas York, [Reliable Computing, vol.18,
    pp.15-54,
    2013](http://interval.louisiana.edu/reliable-computing-journal/volume-18/reliable-computing-18-pp-015-054.pdf)
    ([preprint: PDF
    2612KB](http://lamastex.org/preprints/avs_rc_2013.pdf))

**First see the excerpt on `PipedRDDs` from *Spark: The Definitive
Guide* earlier.**

### Getting the executable `IsIt1or2Coins` into our Spark Cluster

**This has already been done in the project-shard. You need not do it
again for this executable!**

You need to upload the C++ executable `IsIt1or2Coins` from:

-   https://github.com/lamastex/mrs2

Here, suppose you have an executable for a Linux x86 64-bit processor
with all dependencies pre-compiled into one executable.

Say this executable is `IsIt1or2Coins`.

This executable comes from the following dockerised build:

-   https://github.com/lamastex/mrs2/tree/master/docker
-   by statically compiling inside the dockerised environment for mrs2:
    -   https://github.com/lamastex/mrs2/tree/master/mrs-2.0/examples/MooreRejSam/IsIt1or2Coins

You can replace the executable with any other executable that has
appropriate I/O.

Then you upload the executable to Databricks' `FileStore`.

Just note the path to the file and DO NOT click `Create Table` or other
buttons!

![screenShotOfUploadingStaticExecutibleIsIt1or2CoinsViaFileStore](https://raw.githubusercontent.com/lamastex/scalable-data-science/master/images/2020/ScaDaMaLe/screenShotOfUploadingStaticExecutibleIsIt1or2CoinsViaFileStore.png)

In [None]:
ls "/FileStore/tables/IsIt1or2Coins"

  

[TABLE]

  

Now copy the file that you just uploaded from `dbfs:/FileStore/tables`
into the local file system of the Driver.

In [None]:
dbutils.fs.cp("dbfs:/FileStore/tables/IsIt1or2Coins", "file:/tmp/IsIt1or2Coins")

  

>     res44: Boolean = true

In [None]:
ls -al /tmp/IsIt1or2Coins

  

Note that it is a big static executable with all dependencies built in
(it uses the GNU Scientific Library and a specialized C++ library called
C-XSC, or C Extended for Scientific Computing, to do hardware-optimized
rigorous numerical proofs using Interval-Extended Hessian
Differentiation Arithmetics over Rounding-Controlled Hardware-Specified
Machine Intervals).

Just note it is over 6.5MB. We also need to change the permissions so
that it is indeed executable.

In [None]:
chmod +x /tmp/IsIt1or2Coins

  

Usage instructions for IsIt1or2Coins
====================================

`./IsIt1or2Coins numboxes numiter seed numtosses1 heads1 numtosses2 heads2 logScale`

-   numboxes = Number of boxes for Moore Rejection Sampling (Rigorous
    von Neumann Rejection Sampler)
-   numiter = Number of samples drawn from posterior distribution to
    estimate the model probabilities
-   seed = a random number seed
-   numtosses1 = number of tosses for the first coin
-   heads1 = number of heads shown up on the first coin
-   numtosses2 = number of tosses for the second coin
-   heads2 = number of heads shown up on the second coin
-   logScale = True/False as Int
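
Since the arguments are purely positional, it is easy to swap two of
them by accident. The following is a small sketch of a helper that
assembles the command line in the order given by the usage line above.
The function name `build_command` and its sanity checks are hypothetical
conveniences and are not part of `IsIt1or2Coins` itself.

```python
def build_command(executable, numboxes, numiter, seed,
                  numtosses1, heads1, numtosses2, heads2, log_scale=True):
    """Assemble the command line in the positional order of the usage line."""
    if not 0 <= heads1 <= numtosses1:
        raise ValueError("heads1 must lie between 0 and numtosses1")
    if not 0 <= heads2 <= numtosses2:
        raise ValueError("heads2 must lie between 0 and numtosses2")
    args = [numboxes, numiter, seed, numtosses1, heads1,
            numtosses2, heads2, int(log_scale)]  # True/False is passed as 1/0
    return " ".join([executable] + [str(a) for a in args])


command = build_command("/tmp/IsIt1or2Coins", 1000, 100, 234565432,
                        1000, 500, 1200, 600, log_scale=True)
print(command)
```

A list of such strings, for example one per seed, is the kind of input
that can later be handed to a piped RDD.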

Don't worry about the details of what the executable `IsIt1or2Coins` is
doing for now. Just realise that this executable takes some input on the
command line and gives some output.

Let's make sure the executable takes input and returns an output string
on the Driver node.

In [None]:
/tmp/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1

In [None]:
# You can also do it like this

/dbfs/FileStore/tables/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1

  

>     theSeed: 234565432
>     N1: 1000
>     n1: 500
>     N2: 1200
>     n2: 600
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 100  rng_seed = 234565432
>       N1=number of first coin tosses          : 1000
>       n1=number of heads in first coin tosses : 500
>       N2=number of second coin tosses         : 1200
>       n2=number of heads in second coin tosses: 600
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor. 
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo 
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo 
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax 
>     in FirstBox, after updateUmax 
>     bottom of FirstBox. 
>     after FirstBox, before Refine 
>     Umax: -1.512624733218932E+003
>     UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003
>     f_scale: -1.973141751817741E+003  -1.973141751817741E+003
>     bottom of updateUmax 
>     in AdaptPartition after updateUmax2 
>     in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 48
>     #No. of Boxes with proposal mass function <= 1e-10 112
>     #No. of Boxes with proposal mass function >= 1e-6 776
>     #No. of Boxes with proposal mass function >= 1e-3 230
>     after Refine 
>     before Rej..SampleMany 
>     n_samples: 100
>     after Rej..SampleMany 
>     rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100
>     RSSampleMany, integral est: 1.2113e+193
>     RSSampleMany mean: 
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.970000000000000
>     Labelled Mean:
>        0.500284830473953
>
>     label: 1  proportion:    0.030000000000000
>     Labelled Mean:
>        0.510581317236150
>        0.489823533395186
>
>     n interval function calls: 1998
>     n real function calls: 2396703
>     # CPU Time (seconds). Partitioning: 0.010151  Sampling: 1.40342  Total: 1.41357
>     # CPU time (secods) per estimate: 0.0141357

  

Moving the executables to the worker nodes
------------------------------------------

To copy the executable from `dbfs` to the local drive of each executor
you can use the following helper function.

In [None]:
import scala.sys.process._
import scala.concurrent.duration._
// from Ivan Sadikov

def copyFile(): Unit = {
  "mkdir -p /tmp/executor/bin".!!
  "cp /dbfs/FileStore/tables/IsIt1or2Coins /tmp/executor/bin/".!!
}

sc.runOnEachExecutor(copyFile, new FiniteDuration(1, HOURS))

  

>     import scala.sys.process._
>     import scala.concurrent.duration._
>     copyFile: ()Unit
>     res0: scala.collection.Map[String,scala.util.Try[Unit]] = Map(80 -> Success(()), 77 -> Success(()))

  

Now, let us use piped RDDs via `bash` to execute the given command in
each partition as follows:

In [None]:
val input = Seq("/tmp/executor/bin/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1", "/tmp/executor/bin/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1")

val output = sc
  .parallelize(input)
  .repartition(2)
  .pipe("bash")
  .collect()

  

>     input: Seq[String] = List(/tmp/executor/bin/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1, /tmp/executor/bin/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1)
>     output: Array[String] = Array(theSeed: 234565432, N1: 1000, n1: 500, N2: 1200, n2: 600, UseLogPi: 1, "  n_boxes: 1000  n_samples: 100  rng_seed = 234565432", "  N1=number of first coin tosses          : 1000", "  n1=number of heads in first coin tosses : 500", "  N2=number of second coin tosses         : 1200", "  n2=number of heads in second coin tosses: 600", Ldomain.L: 0, Ldomain.L: 1, "end of FIsIt1or2Coins constructor. ", in FirstBox, before getBoxREInfo. k: 0, 0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 0 [1.000000000000000E-300,   1.000000000000000], in FirstBox, before getBoxREInfo. k: 1, 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000], Umax:    0.000000000000000, UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159, f_scale: -460.517018598809159  -460.517018598809159, "bottom of updateUmax ", "in FirstBox, after updateUmax ", "bottom of FirstBox. ", "after FirstBox, before Refine ", Umax: -1.512624733218932E+003, UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003, f_scale: -1.973141751817741E+003  -1.973141751817741E+003, "bottom of updateUmax ", "in AdaptPartition after updateUmax2 ", in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200, # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197, #Using log(pi)? 1, #No. of Boxes with proposal mass function <= 1e-16 48, #No. of Boxes with proposal mass function <= 1e-10 112, #No. 
of Boxes with proposal mass function >= 1e-6 776, #No. of Boxes with proposal mass function >= 1e-3 230, "after Refine ", "before Rej..SampleMany ", n_samples: 100, "after Rej..SampleMany ", rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100, RSSampleMany, integral est: 1.2113e+193, "RSSampleMany mean: ", "   Number of labels or topologies = 2", label: 0  proportion:    0.970000000000000, Labelled Mean:, "   0.500284830473953", "", label: 1  proportion:    0.030000000000000, Labelled Mean:, "   0.510581317236150", "   0.489823533395186", "", n interval function calls: 1998, n real function calls: 2396703, # CPU Time (seconds). Partitioning: 0.006855  Sampling: 0.714379  Total: 0.721234, # CPU time (secods) per estimate: 0.00721234, theSeed: 234565432, N1: 1000, n1: 500, N2: 1200, n2: 600, UseLogPi: 1, "  n_boxes: 1000  n_samples: 100  rng_seed = 234565432", "  N1=number of first coin tosses          : 1000", "  n1=number of heads in first coin tosses : 500", "  N2=number of second coin tosses         : 1200", "  n2=number of heads in second coin tosses: 600", Ldomain.L: 0, Ldomain.L: 1, "end of FIsIt1or2Coins constructor. ", in FirstBox, before getBoxREInfo. k: 0, 0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 0 [1.000000000000000E-300,   1.000000000000000], in FirstBox, before getBoxREInfo. 
k: 1, 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000], Umax:    0.000000000000000, UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159, f_scale: -460.517018598809159  -460.517018598809159, "bottom of updateUmax ", "in FirstBox, after updateUmax ", "bottom of FirstBox. ", "after FirstBox, before Refine ", Umax: -1.512624733218932E+003, UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003, f_scale: -1.973141751817741E+003  -1.973141751817741E+003, "bottom of updateUmax ", "in AdaptPartition after updateUmax2 ", in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200, # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197, #Using log(pi)? 1, #No. of Boxes with proposal mass function <= 1e-16 48, #No. of Boxes with proposal mass function <= 1e-10 112, #No. of Boxes with proposal mass function >= 1e-6 776, #No. of Boxes with proposal mass function >= 1e-3 230, "after Refine ", "before Rej..SampleMany ", n_samples: 100, "after Rej..SampleMany ", rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100, RSSampleMany, integral est: 1.2113e+193, "RSSampleMany mean: ", "   Number of labels or topologies = 2", label: 0  proportion:    0.970000000000000, Labelled Mean:, "   0.500284830473953", "", label: 1  proportion:    0.030000000000000, Labelled Mean:, "   0.510581317236150", "   0.489823533395186", "", n interval function calls: 1998, n real function calls: 2396703, # CPU Time (seconds). Partitioning: 0.005417  Sampling: 0.701866  Total: 0.707283, # CPU time (secods) per estimate: 0.00707283)

  

In fact, on the Databricks-provisioned Spark clusters we are using here,
you can just use `DBFS FUSE` to run the commands without any file copy:

In [None]:
val isIt1or2StaticExecutible = "/dbfs/FileStore/tables/IsIt1or2Coins"
val same_input = Seq(s"$isIt1or2StaticExecutible 1000 100 234565432 1000 500 1200 600 1", 
                     s"$isIt1or2StaticExecutible 1000 100 234565432 1000 500 1200 600 1")

val same_output = sc
  .parallelize(same_input)
  .repartition(2)
  .pipe("bash")
  .collect()

  

>     isIt1or2StaticExecutible: String = /dbfs/FileStore/tables/IsIt1or2Coins
>     same_input: Seq[String] = List(/dbfs/FileStore/tables/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1, /dbfs/FileStore/tables/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1)
>     same_output: Array[String] = Array(theSeed: 234565432, N1: 1000, n1: 500, N2: 1200, n2: 600, UseLogPi: 1, "  n_boxes: 1000  n_samples: 100  rng_seed = 234565432", "  N1=number of first coin tosses          : 1000", "  n1=number of heads in first coin tosses : 500", "  N2=number of second coin tosses         : 1200", "  n2=number of heads in second coin tosses: 600", Ldomain.L: 0, Ldomain.L: 1, "end of FIsIt1or2Coins constructor. ", in FirstBox, before getBoxREInfo. k: 0, 0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 0 [1.000000000000000E-300,   1.000000000000000], in FirstBox, before getBoxREInfo. k: 1, 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000], Umax:    0.000000000000000, UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159, f_scale: -460.517018598809159  -460.517018598809159, "bottom of updateUmax ", "in FirstBox, after updateUmax ", "bottom of FirstBox. ", "after FirstBox, before Refine ", Umax: -1.512624733218932E+003, UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003, f_scale: -1.973141751817741E+003  -1.973141751817741E+003, "bottom of updateUmax ", "in AdaptPartition after updateUmax2 ", in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200, # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197, #Using log(pi)? 1, #No. of Boxes with proposal mass function <= 1e-16 48, #No. of Boxes with proposal mass function <= 1e-10 112, #No. 
of Boxes with proposal mass function >= 1e-6 776, #No. of Boxes with proposal mass function >= 1e-3 230, "after Refine ", "before Rej..SampleMany ", n_samples: 100, "after Rej..SampleMany ", rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100, RSSampleMany, integral est: 1.2113e+193, "RSSampleMany mean: ", "   Number of labels or topologies = 2", label: 0  proportion:    0.970000000000000, Labelled Mean:, "   0.500284830473953", "", label: 1  proportion:    0.030000000000000, Labelled Mean:, "   0.510581317236150", "   0.489823533395186", "", n interval function calls: 1998, n real function calls: 2396703, # CPU Time (seconds). Partitioning: 0.005987  Sampling: 0.768286  Total: 0.774273, # CPU time (secods) per estimate: 0.00774273, theSeed: 234565432, N1: 1000, n1: 500, N2: 1200, n2: 600, UseLogPi: 1, "  n_boxes: 1000  n_samples: 100  rng_seed = 234565432", "  N1=number of first coin tosses          : 1000", "  n1=number of heads in first coin tosses : 500", "  N2=number of second coin tosses         : 1200", "  n2=number of heads in second coin tosses: 600", Ldomain.L: 0, Ldomain.L: 1, "end of FIsIt1or2Coins constructor. ", in FirstBox, before getBoxREInfo. k: 0, 0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 0 [1.000000000000000E-300,   1.000000000000000], in FirstBox, before getBoxREInfo. 
k: 1, 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000], "in FirstBox, after getBoxREInfo ", 1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000], Umax:    0.000000000000000, UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159, f_scale: -460.517018598809159  -460.517018598809159, "bottom of updateUmax ", "in FirstBox, after updateUmax ", "bottom of FirstBox. ", "after FirstBox, before Refine ", Umax: -1.512624733218932E+003, UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003, f_scale: -1.973141751817741E+003  -1.973141751817741E+003, "bottom of updateUmax ", "in AdaptPartition after updateUmax2 ", in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200, # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197, #Using log(pi)? 1, #No. of Boxes with proposal mass function <= 1e-16 48, #No. of Boxes with proposal mass function <= 1e-10 112, #No. of Boxes with proposal mass function >= 1e-6 776, #No. of Boxes with proposal mass function >= 1e-3 230, "after Refine ", "before Rej..SampleMany ", n_samples: 100, "after Rej..SampleMany ", rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100, RSSampleMany, integral est: 1.2113e+193, "RSSampleMany mean: ", "   Number of labels or topologies = 2", label: 0  proportion:    0.970000000000000, Labelled Mean:, "   0.500284830473953", "", label: 1  proportion:    0.030000000000000, Labelled Mean:, "   0.510581317236150", "   0.489823533395186", "", n interval function calls: 1998, n real function calls: 2396703, # CPU Time (seconds). Partitioning: 0.006144  Sampling: 0.710526  Total: 0.71667, # CPU time (secods) per estimate: 0.0071667)

  

Thus, with several different executables that are statically compiled
for a 64-bit Linux machine, we can mix and match multiple executables
with appropriate inputs.

The resulting outputs can themselves be re-processed in Spark to feed
into other pipedRDDs, normal RDDs, DataFrames or Datasets.

Finally, we can have more than one command per partition: `pipe` sends
all the executable commands within an input partition to be run by the
executor on which that partition resides, and `mapPartitions` then joins
the output of each partition, as follows:

In [None]:
val isIt1or2StaticExecutible = "/dbfs/FileStore/tables/IsIt1or2Coins"

// let us make 2 commands in each of the 2 input partitions
val same_input_mp = Seq(s"$isIt1or2StaticExecutible 1000 100 234565432 1000 500 1200 600 1", 
                        s"$isIt1or2StaticExecutible 1000 100 123456789 1000 500 1200 600 1",
                        s"$isIt1or2StaticExecutible 1000 100 123456789 1000 500 1200 600 1",
                        s"$isIt1or2StaticExecutible 1000 100 234565432 1000 500 1200 600 1")

val same_output_mp = sc
  .parallelize(same_input_mp)
  .repartition(2)
  .pipe("bash")
  .mapPartitions(x => Seq(x.mkString("\n")).iterator)
  .collect()

  

>     isIt1or2StaticExecutible: String = /dbfs/FileStore/tables/IsIt1or2Coins
>     same_input_mp: Seq[String] = List(/dbfs/FileStore/tables/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1, /dbfs/FileStore/tables/IsIt1or2Coins 1000 100 123456789 1000 500 1200 600 1, /dbfs/FileStore/tables/IsIt1or2Coins 1000 100 123456789 1000 500 1200 600 1, /dbfs/FileStore/tables/IsIt1or2Coins 1000 100 234565432 1000 500 1200 600 1)
>     same_output_mp: Array[String] =
>     Array(theSeed: 234565432
>     N1: 1000
>     n1: 500
>     N2: 1200
>     n2: 600
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 100  rng_seed = 234565432
>       N1=number of first coin tosses          : 1000
>       n1=number of heads in first coin tosses : 500
>       N2=number of second coin tosses         : 1200
>       n2=number of heads in second coin tosses: 600
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -1.512624733218932E+003
>     UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003
>     f_scale: -1.973141751817741E+003  -1.973141751817741E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 48
>     #No. of Boxes with proposal mass function <= 1e-10 112
>     #No. of Boxes with proposal mass function >= 1e-6 776
>     #No. of Boxes with proposal mass function >= 1e-3 230
>     after Refine
>     before Rej..SampleMany
>     n_samples: 100
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100
>     RSSampleMany, integral est: 1.2113e+193
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.970000000000000
>     Labelled Mean:
>        0.500284830473953
>
>     label: 1  proportion:    0.030000000000000
>     Labelled Mean:
>        0.510581317236150
>        0.489823533395186
>
>     n interval function calls: 1998
>     n real function calls: 2396703
>     # CPU Time (seconds). Partitioning: 0.005937  Sampling: 0.713929  Total: 0.719866
>     # CPU time (secods) per estimate: 0.00719866, theSeed: 234565432
>     N1: 1000
>     n1: 500
>     N2: 1200
>     n2: 600
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 100  rng_seed = 234565432
>       N1=number of first coin tosses          : 1000
>       n1=number of heads in first coin tosses : 500
>       N2=number of second coin tosses         : 1200
>       n2=number of heads in second coin tosses: 600
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.519706161376072E+006,   0.000000000000000] BoxIntegral: [-1.519706161376072E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -1.512624733218932E+003
>     UmaxMAX, Umax, f_scale_local: 1e+200 -1.512624733218932E+003 -1.973141751817741E+003
>     f_scale: -1.973141751817741E+003  -1.973141751817741E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 4.88326e-07 IL, IU: 1.41768e+191   2.90314e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 48
>     #No. of Boxes with proposal mass function <= 1e-10 112
>     #No. of Boxes with proposal mass function >= 1e-6 776
>     #No. of Boxes with proposal mass function >= 1e-3 230
>     after Refine
>     before Rej..SampleMany
>     n_samples: 100
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 2.903136091244425E+197 100 100
>     RSSampleMany, integral est: 1.2113e+193
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.970000000000000
>     Labelled Mean:
>        0.500284830473953
>
>     label: 1  proportion:    0.030000000000000
>     Labelled Mean:
>        0.510581317236150
>        0.489823533395186
>
>     n interval function calls: 1998
>     n real function calls: 2396703
>     # CPU Time (seconds). Partitioning: 0.006418  Sampling: 0.707381  Total: 0.713799
>     # CPU time (secods) per estimate: 0.00713799)
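
To see what the function passed to `mapPartitions` does, here is a minimal plain-Scala sketch that needs no Spark cluster. The two iterators stand in for the output lines of two partitions; their contents are shortened, made-up lines in the style of the real output, and the names `partitions`, `joinLines` and `perPartition` are introduced only for this illustration.

```scala
// Stand-in for the piped output lines of two partitions (illustrative lines only).
val partitions: Seq[Iterator[String]] = Seq(
  Iterator("theSeed: 1", "N1: 1000", "theSeed: 2", "N1: 1000"),
  Iterator("theSeed: 3", "N1: 1000")
)

// The same function that is passed to mapPartitions above:
// it joins all lines of one partition into a single string.
val joinLines: Iterator[String] => Iterator[String] =
  lines => Seq(lines.mkString("\n")).iterator

// One string per partition, however many commands ran in that partition.
val perPartition: List[String] = partitions.toList.flatMap(p => joinLines(p).toList)
```

Each partition therefore contributes exactly one element to the collected array, and the outputs of several commands in the same partition end up concatenated in that element.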

  

`allCatch` is a useful tool to use as a filtering function when testing
whether an expression, such as a string-to-number conversion, evaluates
without error.

In [None]:
import scala.util.control.Exception.allCatch
(allCatch opt " 12 ".trim.toLong).isDefined

  

>     import scala.util.control.Exception.allCatch
>     res1: Boolean = true
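
As a small sketch of the filtering use, the same test can be applied to a whole collection of candidate strings. The values in `candidates` are made up for illustration.

```scala
import scala.util.control.Exception.allCatch

// Made-up candidate strings; only some of them can be converted to Long.
val candidates: Seq[String] = Seq(" 12 ", "1e3", "abc", "", "42")

// `allCatch opt expr` gives Some(value) if expr succeeds and None if it throws,
// so `.isDefined` works as a predicate for `filter`.
val convertible: Seq[String] =
  candidates.filter(s => (allCatch opt s.trim.toLong).isDefined)

val asLongs: Seq[Long] = convertible.map(_.trim.toLong)
```

The parsing functions further down use the same pattern with `toDouble` to pick the numeric tokens out of the program output.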

  

The following should only be done after you have been introduced to
Notebook: `033_OBO_PipedRDD_RigorousBayesianABTesting` and the Old
Bailey Online data.

**TODO**: The code below needs to be redone with DBFS FUSE.

Parsing the output from `IsIt1or2Coins`
=======================================

In [None]:
/**
 * Returns the label proportions from the output of IsIt1or2Coins
 *
 * This function takes an array of Strings, where each element
 * contains the whole output of one execution of IsIt1or2Coins.
 * It returns an Array[Array[Double]], where the first index denotes which
 * execution it belonged to, the second index is whether it is label0 or label1
 * i.e. it is a numExec x 2 array.
 */

def getLabelProps(input : Array[String]):Array[Array[Double]] = {
  input
  .map(out => out
       .split("\n")
       .filter(line => line.contains("label:"))
       .map(filtLine => filtLine
            .split(" ")
            .filter(line => (allCatch opt line.trim.toDouble).isDefined)
            .map(filtFiltLine => filtFiltLine.toDouble)))
  .map(trial => trial.map(labels => labels(1)))
}

/**
 * Returns the label means from the output of IsIt1or2Coins
 *
 * This function takes an array of Strings, where each element
 * contains the whole output of one execution of IsIt1or2Coins.
 * It returns an Array[Array[Double]], where the first index denotes which
 * execution it belonged to, the second index is whether it is label0 Mean 
 * or label1 Mean1 or label1 Mean2, i.e. it is a numExec x 3 array.
 */

def getLabelMeans(input : Array[String]):Array[Array[Double]] = {
  val output_pre = input
  .map(out => out.split("\n")
       .filter(line => (allCatch opt line.trim.toDouble).isDefined))
  .map(arr => arr.map(num => num.toDouble))
  // Some runs report only two means instead of three because one label has a very low probability.
  // Pad those runs with a leading 0 so that every row has three entries.
  output_pre.map(trial => if (trial.length == 2) Array(0.0,trial(0),trial(1)) else trial)
}

  

>     getLabelProps: (input: Array[String])Array[Array[Double]]
>     getLabelMeans: (input: Array[String])Array[Array[Double]]
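
The two parsers can be illustrated on a single run without Spark. The following is a compact sketch of the same filtering idea, not a replacement for the functions above; `sampleRun` is an excerpt copied from the output shown earlier, and `isDouble`, `props` and `means` are names introduced here.

```scala
import scala.util.control.Exception.allCatch

// Excerpt of one run's output, copied from the output shown earlier.
val sampleRun: String =
  """label: 0  proportion:    0.970000000000000
    |Labelled Mean:
    |   0.500284830473953
    |
    |label: 1  proportion:    0.030000000000000
    |Labelled Mean:
    |   0.510581317236150
    |   0.489823533395186
    |""".stripMargin

// True if the whole string (after trimming) parses as a Double.
def isDouble(s: String): Boolean = (allCatch opt s.trim.toDouble).isDefined

// Proportions: on each "label:" line the numeric tokens are (label, proportion),
// so the proportion is the token at index 1.
val props: Array[Double] = sampleRun.split("\n")
  .filter(line => line.contains("label:"))
  .map(line => line.split(" ").filter(t => isDouble(t)).map(t => t.trim.toDouble))
  .map(nums => nums(1))

// Means: the lines that consist of nothing but one number.
val means: Array[Double] = sampleRun.split("\n")
  .filter(line => isDouble(line))
  .map(line => line.trim.toDouble)
```

For this excerpt `props` holds the two label proportions and `means` holds the single mean of label 0 followed by the two means of label 1.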

  

Providing case classes for input and output for easy Spark communication
========================================================================

In [None]:
import org.apache.spark.sql.DataFrame

case class OutputRow(ID:Long, 
                     NumTosses1:Int, NumHeads1:Int,
                     NumTosses2:Int, NumHeads2:Int,
                     Label0Prob:Double,
                     Label1Prob:Double,
                     Label0Mean0:Double, Label0Mean1:Double, 
                     Label1Mean0:Double, Label1Mean1:Double)

case class InputOpts(ID:Long,NumBoxes:Int, NumIter:Int, Seed:Long, 
                     NumTosses1:Int, NumHeads1:Int,
                     NumTosses2:Int, NumHeads2:Int, 
                     LogScaling:Int) {
  def toExecutableString:String = {
    "/tmp/IsIt1or2Coins "+Array(NumBoxes, NumIter, Seed, NumTosses1, NumHeads1, NumTosses2, NumHeads2, LogScaling).mkString(" ")
  }
}

/**
 * Runs all trials in the array of InputOpts and returns their raw output
 *
 * This function takes an Array[InputOpts], creates one executable string
 * per trial and pipes them all through /tmp/suckAs.sh in a pipedRDD.
 * It returns an Array[String] with one element per execution; use
 * parseOutput to assemble these into the case class OutputRow.
 */

def execute(trials : Array[InputOpts]) = {
  sc.parallelize(trials.map(trial => trial.toExecutableString))
    .repartition(trials.length)
    .pipe("/tmp/suckAs.sh")
    .mapPartitions(x => Seq(x.mkString("\n").split("theSeed")).iterator) // We know that theSeed is included once in every output
    .collect
    .flatMap(x => x.filter(y => y.length > 0)) // Each partition's output is split at theSeed, so remove the empty strings and flatten
}

def parseOutput(trials: Array[InputOpts], res:Array[String]) = {
  val labProp = getLabelProps(res)
  val labMean = getLabelMeans(res)
  // zip pairs by position, so res must be in the same order as trials
  (trials zip labProp zip labMean)
  .map(trial => (trial._1._1,trial._1._2,trial._2))
  .map(trial => OutputRow(trial._1.ID,
                          trial._1.NumTosses1,trial._1.NumHeads1,
                          trial._1.NumTosses2,trial._1.NumHeads2,
                          trial._2(0),trial._2(1),
                          trial._3(0),trial._3(0), // label 0 has a single mean, used for both columns
                          trial._3(1),trial._3(2)
                         ))
}

/* Returns a DataFrame of the Array[OutputRow] for ease of displaying in Databricks */

def resultAsDF(result:Array[OutputRow]):DataFrame = {
  sc.parallelize(result).toDF
}

  

>     import org.apache.spark.sql.DataFrame
>     defined class OutputRow
>     defined class InputOpts
>     execute: (trials: Array[InputOpts])Array[String]
>     parseOutput: (trials: Array[InputOpts], res: Array[String])Array[OutputRow]
>     resultAsDF: (result: Array[OutputRow])org.apache.spark.sql.DataFrame
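
The splitting step inside `execute` can be tried on its own. The sketch below uses a shortened, illustrative stand-in for the text one partition would receive after two commands ran in it; `partitionText` and `runs` are names introduced here.

```scala
// Stand-in for the joined output of one partition that ran two commands
// (shortened, illustrative lines in the style of the real output).
val partitionText: String = Seq(
  "theSeed: 100", "N1: 792", "n1: 245",
  "theSeed: 100", "N1: 1060", "n1: 242"
).mkString("\n")

// Every run starts with "theSeed", so splitting at that marker separates the runs.
// The text before the first marker is an empty string, which the filter removes.
val runs: Array[String] = partitionText.split("theSeed").filter(_.length > 0)
```

Because the marker itself is consumed by `split`, each recovered run starts with `: 100` rather than `theSeed: 100`.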

In [None]:
val inputOpts = Array(
  InputOpts(1680,1000,1000,100,792,245,151,63,1),
  InputOpts(1690,1000,1000,100,1805,526,215,68,1),
  InputOpts(1700,1000,1000,100,1060,242,57,26,1),
  InputOpts(1710,1000,1000,100,1060,243,57,26,1),
  InputOpts(1720,1000,1000,100,1060,245,57,26,1),
  InputOpts(1730,1000,1000,100,1060,245,57,26,1))
val res = execute(inputOpts)

  

>     inputOpts: Array[InputOpts] = Array(InputOpts(1680,1000,1000,100,792,245,151,63,1), InputOpts(1690,1000,1000,100,1805,526,215,68,1), InputOpts(1700,1000,1000,100,1060,242,57,26,1), InputOpts(1710,1000,1000,100,1060,243,57,26,1), InputOpts(1720,1000,1000,100,1060,245,57,26,1), InputOpts(1730,1000,1000,100,1060,245,57,26,1))
>     res: Array[String] =
>     Array(: 100
>     N1: 1060
>     n1: 242
>     N2: 57
>     n2: 26
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 1000  rng_seed = 100
>       N1=number of first coin tosses          : 1060
>       n1=number of heads in first coin tosses : 242
>       N2=number of second coin tosses         : 57
>       n2=number of heads in second coin tosses: 26
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623054E+005,   0.000000000000000] BoxIntegral: [-7.715962646623054E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623055E+005,   0.000000000000000] BoxIntegral: [-7.715962646623055E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -601.931931469341521
>     Umax: -601.245170692618672
>     UmaxMAX, Umax, f_scale_local: 1e+200 -601.245170692618672 -1.061762189291428E+003
>     f_scale: -1.061762189291428E+003  -1.061762189291428E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU: 1.593739709448194E-069 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 2.24697e-06 IL, IU: 2.03174e+192   9.04217e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 39
>     #No. of Boxes with proposal mass function <= 1e-10 81
>     #No. of Boxes with proposal mass function >= 1e-6 821
>     #No. of Boxes with proposal mass function >= 1e-3 259
>     after Refine
>     before Rej..SampleMany
>     n_samples: 1000
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 9.042166864781814E+197 1000 1000
>     RSSampleMany, integral est: 2.92167e+194
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.007000000000000
>     Labelled Mean:
>        0.243096412677395
>
>     label: 1  proportion:    0.993000000000000
>     Labelled Mean:
>        0.228657602881767
>        0.459139673624810
>
>     n interval function calls: 1998
>     n real function calls: 3094861
>     # CPU Time (seconds). Partitioning: 0.009163  Sampling: 1.83352  Total: 1.84268
>     # CPU time (secods) per estimate: 0.00184268, : 100
>     N1: 792
>     n1: 245
>     N2: 151
>     n2: 63
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 1000  rng_seed = 100
>       N1=number of first coin tosses          : 792
>       n1=number of heads in first coin tosses : 245
>       N2=number of second coin tosses         : 151
>       n2=number of heads in second coin tosses: 63
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-6.514013228080160E+005,   0.000000000000000] BoxIntegral: [-6.514013228080160E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-6.514013228080161E+005,   0.000000000000000] BoxIntegral: [-6.514013228080161E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -588.153488415822835
>     Umax: -587.466809015800436
>     UmaxMAX, Umax, f_scale_local: 1e+200 -587.466809015800436 -1.047983827614610E+003
>     f_scale: -1.047983827614610E+003  -1.047983827614610E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU: 4.406167488191346E-062 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 6.09526e-05 IL, IU: 5.73838e+193   9.41449e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 23
>     #No. of Boxes with proposal mass function <= 1e-10 49
>     #No. of Boxes with proposal mass function >= 1e-6 882
>     #No. of Boxes with proposal mass function >= 1e-3 311
>     after Refine
>     before Rej..SampleMany
>     n_samples: 1000
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 9.414489592082716E+197 1000 1000
>     RSSampleMany, integral est: 3.52469e+195
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.256000000000000
>     Labelled Mean:
>        0.325896565213146
>
>     label: 1  proportion:    0.744000000000000
>     Labelled Mean:
>        0.309291795826172
>        0.418045580655123
>
>     n interval function calls: 1998
>     n real function calls: 267091
>     # CPU Time (seconds). Partitioning: 0.008357  Sampling: 0.157338  Total: 0.165695
>     # CPU time (secods) per estimate: 0.000165695, : 100
>     N1: 1060
>     n1: 245
>     N2: 57
>     n2: 26
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 1000  rng_seed = 100
>       N1=number of first coin tosses          : 1060
>       n1=number of heads in first coin tosses : 245
>       N2=number of second coin tosses         : 57
>       n2=number of heads in second coin tosses: 26
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623054E+005,   0.000000000000000] BoxIntegral: [-7.715962646623054E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623055E+005,   0.000000000000000] BoxIntegral: [-7.715962646623055E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -604.763616530585409
>     UmaxMAX, Umax, f_scale_local: 1e+200 -604.763616530585409 -1.065280635129395E+003
>     f_scale: -1.065280635129395E+003  -1.065280635129395E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU: 4.160085178892135E-071 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 2.22862e-06 IL, IU: 1.78864e+192   8.02577e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 33
>     #No. of Boxes with proposal mass function <= 1e-10 73
>     #No. of Boxes with proposal mass function >= 1e-6 812
>     #No. of Boxes with proposal mass function >= 1e-3 246
>     after Refine
>     before Rej..SampleMany
>     n_samples: 1000
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 8.025765897720203E+197 1000 1000
>     RSSampleMany, integral est: 2.5871e+194
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.006000000000000
>     Labelled Mean:
>        0.241308465707637
>
>     label: 1  proportion:    0.994000000000000
>     Labelled Mean:
>        0.231566745943815
>        0.460418597148691
>
>     n interval function calls: 1998
>     n real function calls: 3102223
>     # CPU Time (seconds). Partitioning: 0.009115  Sampling: 1.82417  Total: 1.83329
>     # CPU time (secods) per estimate: 0.00183329, : 100
>     N1: 1060
>     n1: 245
>     N2: 57
>     n2: 26
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 1000  rng_seed = 100
>       N1=number of first coin tosses          : 1060
>       n1=number of heads in first coin tosses : 245
>       N2=number of second coin tosses         : 57
>       n2=number of heads in second coin tosses: 26
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623054E+005,   0.000000000000000] BoxIntegral: [-7.715962646623054E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623055E+005,   0.000000000000000] BoxIntegral: [-7.715962646623055E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -604.763616530585409
>     UmaxMAX, Umax, f_scale_local: 1e+200 -604.763616530585409 -1.065280635129395E+003
>     f_scale: -1.065280635129395E+003  -1.065280635129395E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU: 4.160085178892135E-071 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 2.22862e-06 IL, IU: 1.78864e+192   8.02577e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 33
>     #No. of Boxes with proposal mass function <= 1e-10 73
>     #No. of Boxes with proposal mass function >= 1e-6 812
>     #No. of Boxes with proposal mass function >= 1e-3 246
>     after Refine
>     before Rej..SampleMany
>     n_samples: 1000
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 8.025765897720203E+197 1000 1000
>     RSSampleMany, integral est: 2.5871e+194
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.006000000000000
>     Labelled Mean:
>        0.241308465707637
>
>     label: 1  proportion:    0.994000000000000
>     Labelled Mean:
>        0.231566745943815
>        0.460418597148691
>
>     n interval function calls: 1998
>     n real function calls: 3102223
>     # CPU Time (seconds). Partitioning: 0.009092  Sampling: 1.83923  Total: 1.84832
>     # CPU time (secods) per estimate: 0.00184832, : 100
>     N1: 1060
>     n1: 243
>     N2: 57
>     n2: 26
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 1000  rng_seed = 100
>       N1=number of first coin tosses          : 1060
>       n1=number of heads in first coin tosses : 243
>       N2=number of second coin tosses         : 57
>       n2=number of heads in second coin tosses: 26
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623054E+005,   0.000000000000000] BoxIntegral: [-7.715962646623054E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-7.715962646623055E+005,   0.000000000000000] BoxIntegral: [-7.715962646623055E+005,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -602.404311091615568
>     UmaxMAX, Umax, f_scale_local: 1e+200 -602.404311091615568 -1.062921329690425E+003
>     f_scale: -1.062921329690425E+003  -1.062921329690425E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU: 4.633751408216663E-070 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 2.20682e-06 IL, IU: 1.88224e+192   8.52922e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 38
>     #No. of Boxes with proposal mass function <= 1e-10 81
>     #No. of Boxes with proposal mass function >= 1e-6 810
>     #No. of Boxes with proposal mass function >= 1e-3 257
>     after Refine
>     before Rej..SampleMany
>     n_samples: 1000
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 8.529216274485751E+197 1000 1000
>     RSSampleMany, integral est: 2.7859e+194
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.008000000000000
>     Labelled Mean:
>        0.237163228187683
>
>     label: 1  proportion:    0.992000000000000
>     Labelled Mean:
>        0.229340517844150
>        0.459908808479261
>
>     n interval function calls: 1998
>     n real function calls: 3061553
>     # CPU Time (seconds). Partitioning: 0.009239  Sampling: 1.80917  Total: 1.81841
>     # CPU time (secods) per estimate: 0.00181841, : 100
>     N1: 1805
>     n1: 526
>     N2: 215
>     n2: 68
>     UseLogPi: 1
>       n_boxes: 1000  n_samples: 1000  rng_seed = 100
>       N1=number of first coin tosses          : 1805
>       n1=number of heads in first coin tosses : 526
>       N2=number of second coin tosses         : 215
>       n2=number of heads in second coin tosses: 68
>     Ldomain.L: 0
>     Ldomain.L: 1
>     end of FIsIt1or2Coins constructor.
>     in FirstBox, before getBoxREInfo. k: 0
>     0 [1.000000000000000E-300,   1.000000000000000] RE: [-1.395366566354393E+006,   0.000000000000000] BoxIntegral: [-1.395366566354393E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     0 [1.000000000000000E-300,   1.000000000000000]
>     in FirstBox, before getBoxREInfo. k: 1
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000] RE: [-1.395366566354393E+006,   0.000000000000000] BoxIntegral: [-1.395366566354393E+006,   0.000000000000000]
>     in FirstBox, after getBoxREInfo
>     1 [1.000000000000000E-300,   1.000000000000000] [1.000000000000000E-300,   1.000000000000000]
>     Umax:    0.000000000000000
>     UmaxMAX, Umax, f_scale_local: 1e+200    0.000000000000000 -460.517018598809159
>     f_scale: -460.517018598809159  -460.517018598809159
>     bottom of updateUmax
>     in FirstBox, after updateUmax
>     bottom of FirstBox.
>     after FirstBox, before Refine
>     Umax: -1.213363139695285E+003
>     Umax: -1.212738223821990E+003
>     UmaxMAX, Umax, f_scale_local: 1e+200 -1.212738223821990E+003 -1.673255242420799E+003
>     f_scale: -1.673255242420799E+003  -1.673255242420799E+003
>     bottom of updateUmax
>     in AdaptPartition after updateUmax2
>     in updateIntegral. IL, IU:    0.000000000000000 1.000000000000362E+200
>     # Adaptive partitioning complete. Boxes: 1000  Lower bound on Acceptance Prob.: 1.75678e-06 IL, IU: 8.14842e+191   4.63827e+197
>     #Using log(pi)? 1
>     #No. of Boxes with proposal mass function <= 1e-16 53
>     #No. of Boxes with proposal mass function <= 1e-10 109
>     #No. of Boxes with proposal mass function >= 1e-6 743
>     #No. of Boxes with proposal mass function >= 1e-3 170
>     after Refine
>     before Rej..SampleMany
>     n_samples: 1000
>     after Rej..SampleMany
>     rs_sample IU, N, Nrs: 4.638268528941289E+197 1000 1000
>     RSSampleMany, integral est: 5.14926e+193
>     RSSampleMany mean:
>        Number of labels or topologies = 2
>     label: 0  proportion:    0.905000000000000
>     Labelled Mean:
>        0.294005930848202
>
>     label: 1  proportion:    0.095000000000000
>     Labelled Mean:
>        0.292069655006918
>        0.316300264841744
>
>     n interval function calls: 1998
>     n real function calls: 9007616
>     # CPU Time (seconds). Partitioning: 0.009024  Sampling: 2.80121  Total: 2.81024
>     # CPU time (secods) per estimate: 0.00281024)
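
Note that the runs in `res` above do not come back in the order of `inputOpts`: the first run has `N1: 1060` and `n1: 242`, while the first input has 792 tosses and 245 heads. Since `repartition` shuffles the commands, pairing outputs with inputs by position is not reliable. One possible remedy, sketched here under the assumption that every run prints its `N1`, `n1`, `N2` and `n2` lines as above, is to read the counts back from each run and match on them. The helper `countsOf` and the values `firstRun`, `inputs` and `matchingIds` are hypothetical and not part of the notebook.

```scala
// Hypothetical helper (not part of the notebook): recover (N1, n1, N2, n2)
// from the text of one run, using the "N1: ..." style lines it prints.
def countsOf(run: String): Option[(Int, Int, Int, Int)] = {
  val lines = run.split("\n").map(_.trim)
  // First line of the form "<name>: <integer>", if any.
  def field(name: String): Option[Int] =
    lines.collectFirst {
      case line if line.startsWith(name + ": ") => line.drop(name.length + 2).trim.toInt
    }
  for {
    tosses1 <- field("N1")
    heads1  <- field("n1")
    tosses2 <- field("N2")
    heads2  <- field("n2")
  } yield (tosses1, heads1, tosses2, heads2)
}

// Beginning of the first run in res above, and two of the inputs as (ID, counts).
val firstRun = ": 100\nN1: 1060\nn1: 242\nN2: 57\nn2: 26\nUseLogPi: 1"
val inputs = Seq((1680, (792, 245, 151, 63)), (1700, (1060, 242, 57, 26)))

// IDs whose counts agree with the counts printed by the run.
val matchingIds: Seq[Int] =
  inputs.collect { case (id, counts) if countsOf(firstRun).contains(counts) => id }
```

Trials with identical counts, such as IDs 1720 and 1730 above, cannot be told apart this way, so a real implementation might instead pass the ID through to the output.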

In [None]:
display(resultAsDF(parseOutput(inputOpts,res)))

  

[TABLE]