# Recursively defined probability distributions

We consider probability distributions generated from some set using functions and operations with weights, with the identity being one of the operations. This is used in dynamics (including learning) and also for …

A simple example is the geometric distribution, where a random variable X is given by:

* X = 0 with probability p;
* X = X' + 1 with probability 1 - p, where X' is a new sample of X.
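Unrolling the recursion gives the standard closed form of the geometric weights, stated here for reference:

$$
P(X = 0) = p, \qquad P(X = k + 1) = (1 - p)\,P(X = k), \qquad \text{hence} \qquad P(X = k) = p\,(1 - p)^k .
$$

With p = 0.3 (so that the recursive case has weight 0.7, as in this notebook) the first weights are 0.3, 0.21 and 0.147.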

Such distributions have infinite support. Two traits are used to handle them:

* Probability distributions (`ProbabilityDistribution`): allow one to pick a random value.
* Truncated distributions (`TruncatedDistribution`): roughly, give finite distributions of the values whose probability is above a cutoff. However, for sums each term is truncated at the cutoff separately, so a value whose total probability crosses the cutoff only through small components is dropped.
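The caveat about sums can be seen in a toy model. The sketch below is not ProvingGround's implementation: it represents a finite weighted distribution as a plain `Map[Int, Double]`, and the names `FinDist`, `truncate` and `add` are hypothetical. It compares truncating each summand before adding with truncating the sum.

```scala
object TruncationDemo {
  // Toy model (hypothetical, not ProvingGround): a finite weighted
  // distribution as a Map from values to weights.
  type FinDist = Map[Int, Double]

  // Keep only the values whose weight is at least the cutoff.
  def truncate(d: FinDist, cutoff: Double): FinDist =
    d.filter { case (_, w) => w >= cutoff }

  // Pointwise sum of two weighted distributions.
  def add(a: FinDist, b: FinDist): FinDist =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0.0) + b.getOrElse(k, 0.0))).toMap
}

import TruncationDemo._

val left: FinDist = Map(1 -> 0.5, 7 -> 0.06)
val right: FinDist = Map(2 -> 0.3, 7 -> 0.06)
val cutoff = 0.1

// Truncating each summand first loses the value 7 (0.06 < 0.1 in both terms) ...
val termwise = add(truncate(left, cutoff), truncate(right, cutoff))
// ... although its total weight 0.12 is above the cutoff.
val exact = truncate(add(left, right), cutoff)
```

Here `termwise` contains only the values 1 and 2, while `exact` also keeps 7 with weight 0.12.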


```scala
// Load the ProvingGround core assembly into the notebook session
load.jar("/home/gadgil/code/ProvingGround/core/.jvm/target/scala-2.11/ProvingGround-Core-assembly-0.8.jar")
```



```scala
// Short aliases for the three distribution types, plus the rest of the package
import provingground.{ProbabilityDistribution => PD, FiniteDistribution => FD, TruncatedDistribution => TD, _}
```

## Probability distributions: Geometric

We define the distribution recursively by mixing in cases, defaulting to the first one: with weight 0.7 we take a sample of `geom` shifted by 1, and otherwise the point mass `FD.unif(0)` at 0.
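To make the semantics of the recursive mixture concrete, here is a minimal sketch in plain Scala. The trait `Dist`, its `mixin` method (standing in for `<+>`) and `atom` are hypothetical names for this illustration, not the ProvingGround API; the assumption is that `base <+> (other, q)` draws from `other` with probability q and from `base` otherwise. The by-name parameter is what lets `geom` refer to itself without looping while it is being constructed.

```scala
import scala.util.Random

// Hypothetical minimal sampler; not the ProvingGround ProbabilityDistribution API.
trait Dist[A] { self =>
  def next(rng: Random): A

  def map[B](f: A => B): Dist[B] = new Dist[B] {
    def next(rng: Random): B = f(self.next(rng))
  }

  // Assumed analogue of `base <+> (other, q)`: with probability q sample
  // `other`, otherwise fall back to this (base) distribution.
  // `other` is by-name and cached lazily, so recursive definitions work.
  def mixin(other: => Dist[A], q: Double): Dist[A] = {
    lazy val o = other
    new Dist[A] {
      def next(rng: Random): A =
        if (rng.nextDouble() < q) o.next(rng) else self.next(rng)
    }
  }
}

object GeomDemo {
  // Point mass at a single value.
  def atom[A](a: A): Dist[A] = new Dist[A] { def next(rng: Random): A = a }

  // Same shape as `geom`: 0 by default, a shifted sample with weight 0.7.
  lazy val geom: Dist[Int] = atom(0).mixin(geom.map(_ + 1), 0.7)
}
```

Drawing `GeomDemo.geom.next(rng)` many times should give 0 about 30% of the time, matching P(X = 0) = p = 0.3.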

```scala
// 0 by default; with weight 0.7, a fresh sample of geom shifted by 1
val geom: PD[Int] = FD.unif(0) <+> ((geom map ((x: Int) => x + 1)), 0.7)
```

```
geom: ProbabilityDistribution[Int] = provingground.ProbabilityDistribution$Mixin@5cbd7f5c
```

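We can sample from probability distributions recursively. The call `geom.sample(n)` returns the empirical distribution of n draws as a `FiniteDistribution`, so with 100 draws every weight is a multiple of 0.01.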
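A sketch of that reading of `sample(n)`, using two hypothetical helpers: `empirical`, which counts n independent draws and normalizes, and `geometric`, a direct (non-recursive) geometric sampler that stops at each step with probability p.

```scala
import scala.util.Random

object EmpiricalSketch {
  // Hypothetical helper: the empirical distribution of n independent draws.
  def empirical[A](n: Int)(draw: () => A): Map[A, Double] =
    Seq.fill(n)(draw()).groupBy(identity).map { case (a, xs) => a -> xs.size.toDouble / n }

  // Geometric sampler: at each step stop with probability p, else add 1.
  def geometric(p: Double, rng: Random): Int = {
    var k = 0
    while (rng.nextDouble() >= p) k += 1
    k
  }
}

val rng = new Random(7)
val samp = EmpiricalSketch.empirical(100)(() => EmpiricalSketch.geometric(0.3, rng))
```

As with `geom.sample(100)`, the weights of `samp` sum to 1 and are multiples of 0.01.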

```scala
val samp = geom.sample(100)
```

```
samp: FiniteDistribution[Int] = FiniteDistribution(
  Vector(
    Weighted(0, 0.23000000000000007),
    Weighted(5, 0.03),
    Weighted(1, 0.2800000000000001),
    Weighted(6, 0.03),
    Weighted(9, 0.02),
    Weighted(2, 0.16),
    Weighted(7, 0.04),
    Weighted(3, 0.07),
    Weighted(11, 0.01),
    Weighted(8, 0.04),
    Weighted(4, 0.09)
  )
)
```

```scala
samp.entropyView
```

```
res5: List[Weighted[String]] = List(
  Weighted("1", 1.8365012677171202),
  Weighted("0", 2.1202942337177118),
  Weighted("2", 2.643856189774725),
  Weighted("4", 3.4739311883324127),
  Weighted("3", 3.8365012677171206),
  Weighted("7", 4.643856189774724),
  Weighted("8", 4.643856189774724),
  Weighted("5", 5.058893689053569),
  Weighted("6", 5.058893689053569),
  Weighted("9", 5.643856189774724),
  Weighted("11", 6.643856189774724)
)
```
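The numbers in `entropyView` match -log2 of the sampled weights, sorted in increasing order: for instance -log2 0.28 ≈ 1.8365 for the value 1, and -log2 0.01 ≈ 6.6439 for the value 11. The sketch below reproduces this reading on a plain `Map`; `EntropyViewSketch` is a hypothetical name, and this is our interpretation of the output rather than the library's code.

```scala
object EntropyViewSketch {
  // Our reading of `entropyView`: each value (as a string) paired with
  // -log2 of its weight, sorted in increasing order (most likely first).
  def entropyView[A](d: Map[A, Double]): List[(String, Double)] =
    d.toList
      .map { case (a, w) => (a.toString, -math.log(w) / math.log(2)) }
      .sortBy(_._2)
}

val view = EntropyViewSketch.entropyView(Map(0 -> 0.23, 1 -> 0.28, 11 -> 0.01))
// view.map(_._1) == List("1", "0", "11")
```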

```scala
val s = geom.sample(1000)
```

```
s: FiniteDistribution[Int] = FiniteDistribution(
  Vector(
    Weighted(0, 0.2770000000000002),
    Weighted(5, 0.059000000000000045),
    Weighted(10, 0.005),
    Weighted(14, 0.002),
    Weighted(1, 0.23100000000000018),
    Weighted(6, 0.03200000000000002),
    Weighted(9, 0.007),
    Weighted(13, 0.005),
    Weighted(2, 0.16900000000000012),
    Weighted(17, 0.001),
    Weighted(12, 0.002),
    Weighted(7, 0.03200000000000002),
    Weighted(3, 0.09000000000000007),
...
```

```scala
s.entropyView
```

```
res7: List[Weighted[String]] = List(
  Weighted("0", 1.8520421186128977),
  Weighted("1", 2.1140352432460285),
  Weighted("2", 2.5649048483799017),
  Weighted("3", 3.4739311883324113),
  Weighted("4", 3.9213901653036327),
  Weighted("5", 4.0831412353002445),
  Weighted("6", 4.965784284662086),
  Weighted("7", 4.965784284662086),
  Weighted("8", 5.878321443411747),
  Weighted("9", 7.158429362604482),
  Weighted("10", 7.643856189774724),
  Weighted("13", 7.643856189774724),
  Weighted("11", 8.380821783940931),
  Weighted("14", 8.965784284662087),
...
```

## Optional recursion

We can also mix in an optional probability distribution, i.e., one that takes optional values. Whenever it generates `None`, that value is discarded and we fall back to the previous (base) distribution. We do this for the geometric distribution, taking the step to `x + 1` with probability 7/8.
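A plain-Scala sketch of the optional mixture, assuming that falling back means a fresh draw from the base distribution. `OptMixinSketch` and `mixinOpt` are hypothetical names standing in for `<+?>`. Under this assumption the parameters used here reproduce the geometric law: P(X = 0) = 0.2 + 0.8 · (1/8) = 0.3, and each step up has probability 0.8 · (7/8) = 0.7. The sampled weight of 0 for `geomOpt` (about 0.31 from 1000 samples) is consistent with this.

```scala
import scala.util.Random

object OptMixinSketch {
  // Assumed analogue of `base <+?> (other, q)`: with probability q draw from
  // `other`; if that yields None, fall back to a fresh draw from `base`.
  def mixinOpt[A](base: Random => A, other: Random => Option[A], q: Double)(rng: Random): A =
    if (rng.nextDouble() < q) other(rng).getOrElse(base(rng)) else base(rng)

  // Recursive geometric via an optional step, mirroring `geomOpt`:
  // base is the point mass at 0; the step gives Some(x + 1) with probability 7/8.
  def geomOpt(rng: Random): Int =
    mixinOpt[Int](
      _ => 0,
      r => { val x = geomOpt(r); if (r.nextDouble() < 7.0 / 8) Some(x + 1) else None },
      0.8
    )(rng)
}
```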

```scala
val rand = scala.util.Random
// Step to x + 1 with probability 7/8; otherwise produce None
def nextOpt(x: Int): Option[Int] = if (rand.nextDouble < 7.0/8) Some(x + 1) else None
```

```
rand: util.Random.type = scala.util.Random$@3573063d
defined function nextOpt
```

```scala
// Mix in the optional step with weight 0.8; a None falls back to FD.unif(0)
val geomOpt: PD[Int] = FD.unif(0) <+?> (geomOpt map (nextOpt), 0.8)
```

```
geomOpt: ProbabilityDistribution[Int] = provingground.ProbabilityDistribution$MixinOpt@7b506629
```

```scala
geomOpt.sample(10)
```

```
res10: FiniteDistribution[Int] = FiniteDistribution(
  Vector(
    Weighted(7, 0.1),
    Weighted(1, 0.2),
    Weighted(3, 0.1),
    Weighted(0, 0.6)
  )
)
```

```scala
val samp = geomOpt.sample(1000)
```

```
samp: FiniteDistribution[Int] = FiniteDistribution(
  Vector(
    Weighted(0, 0.3080000000000002),
    Weighted(5, 0.05300000000000004),
    Weighted(10, 0.008),
    Weighted(14, 0.002),
    Weighted(1, 0.20600000000000016),
    Weighted(6, 0.03900000000000003),
    Weighted(9, 0.015000000000000006),
    Weighted(13, 0.006),
    Weighted(2, 0.1360000000000001),
    Weighted(17, 0.001),
    Weighted(12, 0.005),
    Weighted(7, 0.022000000000000013),
    Weighted(3, 0.09200000000000007),
...
```

```scala
samp.entropyView
```

```
res12: List[Weighted[String]] = List(
  Weighted("0", 1.6989977439671848),
  Weighted("1", 2.2792837574788676),
  Weighted("2", 2.878321443411747),
  Weighted("3", 3.442222328605073),
  Weighted("4", 3.7563309190331364),
  Weighted("5", 4.2378638300988865),
  Weighted("6", 4.680382065799838),
  Weighted("7", 5.50635266602479),
  Weighted("8", 5.878321443411747),
  Weighted("9", 6.058893689053567),
  Weighted("11", 6.058893689053567),
  Weighted("10", 6.965784284662088),
  Weighted("13", 7.380821783940932),
  Weighted("12", 7.643856189774724),
...
```

## Truncated distributions: Geometric

These are cut off at some point, but combinations are effectively cut off at lower values. We define the geometric distribution this way, scaling the atom at 0 by 0.3 and the shifted copy of `gt` by 0.7.
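For the geometric case each value k receives a single term 0.3 · 0.7^k, so truncation reduces to keeping the weights at or above the cutoff. The sketch below computes those weights directly. `TruncatedGeomSketch` and `truncatedGeom` are hypothetical names, and whether the library's cutoff is inclusive or strict is an assumption here (`>=`); the cutoffs used in this notebook never hit a weight exactly.

```scala
object TruncatedGeomSketch {
  // Hypothetical analogue of `gt.getFD(cutoff)` for the geometric case:
  // the weights p * (1 - p)^k for k = 0, 1, 2, ..., kept while at least the cutoff.
  // The weights decrease in k, so takeWhile stops at the first one below the cutoff.
  def truncatedGeom(p: Double, cutoff: Double): Vector[(Int, Double)] =
    Iterator.iterate(p)(_ * (1 - p))
      .zipWithIndex
      .takeWhile { case (w, _) => w >= cutoff }
      .map { case (w, k) => (k, w) }
      .toVector
}

val coarse = TruncatedGeomSketch.truncatedGeom(0.3, 1.0 / 10)  // values 0, 1, 2, 3
val fine = TruncatedGeomSketch.truncatedGeom(0.3, 1.0 / 1000)  // 16 values, 0 to 15
```

These agree with the supports returned by `gt.getFD(1.0/10)` and `gt.getFD(1.0/1000)`.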

```scala
// Atom at 0 with weight 0.3, plus the shifted copy of gt with weight 0.7
val gt: TD[Int] = (TD.OptAtom(Some(0)) <*> 0.3) <+> ((gt map ((x: Int) => x + 1)) <*> 0.7)
```

```
gt: TruncatedDistribution[Int] = provingground.TruncatedDistribution$Sum@525dbe59
```

```scala
gt.getFD(1.0/10)
```

```
res14: Option[FiniteDistribution[Int]] = Some([0 : 0.3, 1 : 0.21, 2 : 0.147, 3 : 0.10289999999999999])
```

```scala
val fd = gt.getFD(1.0/1000).get
```

```
fd: FiniteDistribution[Int] = FiniteDistribution(
  Vector(
    Weighted(0, 0.3),
    Weighted(5, 0.05042099999999999),
    Weighted(10, 0.008474257469999996),
    Weighted(14, 0.0020346692185469984),
    Weighted(1, 0.21),
    Weighted(6, 0.03529469999999999),
    Weighted(9, 0.012106082099999995),
    Weighted(13, 0.002906670312209998),
    Weighted(2, 0.147),
    Weighted(12, 0.004152386160299997),
    Weighted(7, 0.024706289999999992),
    Weighted(3, 0.10289999999999999),
    Weighted(11, 0.005931980228999997),
...
```

```scala
fd.entropyView
```

```
res16: List[Weighted[String]] = List(
  Weighted("0", 1.7369655941662063),
  Weighted("1", 2.2515387669959646),
  Weighted("2", 2.766111939825723),
  Weighted("3", 3.280685112655481),
  Weighted("4", 3.7952582854852395),
  Weighted("5", 4.3098314583149975),
  Weighted("6", 4.824404631144756),
  Weighted("7", 5.338977803974514),
  Weighted("8", 5.853550976804273),
  Weighted("9", 6.368124149634031),
  Weighted("10", 6.882697322463789),
  Weighted("11", 7.397270495293548),
  Weighted("12", 7.911843668123306),
  Weighted("13", 8.426416840953065),
...
```

```scala
fd.supp.size
```

```
res17: Int = 16
```
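The support size agrees with the closed form p(1 - p)^k of the geometric weights. With p = 0.3 and cutoff 1/1000, the weight of k stays above the cutoff exactly for k ≤ 15, which gives the 16 values 0 to 15:

$$
0.3 \cdot 0.7^{15} \approx 0.00142 \;\ge\; 0.001 \;>\; 0.3 \cdot 0.7^{16} \approx 0.000997 .
$$

Equivalently, k ≤ log(0.3 / 0.001) / log(1 / 0.7) ≈ 15.99.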