# Bayes, Bootstrap, MLE & All That
-----

In this post I want to go back to the basics of statistics, but from an advanced point of view, both theoretically and technically. The point is to go back to the basics of estimating a single parameter value and then quantifying the uncertainty in the estimate using various methods. In general I will take three approaches to this, two of which are [frequentist](https://en.wikipedia.org/wiki/Frequentist_inference) and one of which is [Bayesian](https://en.wikipedia.org/wiki/Bayesian_statistics). I admit I'm not as familiar with Bayesian methods and am therefore sticking to a simple example of estimating a single parameter. I don't believe one approach to statistics is inherently better than the other, but I have found so-called frequentist methods easier to understand and easier to implement, and I have additionally found Fisherian approaches satisfying from a theoretical point of view.

The title for this post is inspired by [Div, Grad, Curl & All That](https://www.google.com/books/edition/_/sembQgAACAAJ?hl=en), which I used as an undergraduate to help learn vector calculus.


## 1. Using A Poisson Distribution Written In Scala With Py4J
---------------

The first thing we need is data, and for that purpose I wrote a [Poisson distribution in Scala](https://github.com/mdh266/PoissonDistributionInScala). The Poisson distribution is a probability distribution for a random variable $y$ that represents some count phenomenon, i.e. a non-negative integer number of occurrences in some fixed time frame. For example, the number of trains passing through a station per day or the number of customers that visit a website per hour can be modeled with a Poisson distribution. The distribution is,

$$ p(y \, = \, k)  \; = \; \frac{\lambda^{k} e^{-\lambda} }{k!} $$

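The parameter $\lambda$ is the rate parameter, i.e. the expected number of occurrences in the time frame, such as the true average number of customers that visit the website per hour.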
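As a quick sanity check on the formula, here is a minimal pure-Python sketch of the probability mass function. The function name `poisson_pmf` is hypothetical and is not part of the Scala project; it only evaluates the expression above directly.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson(lam) random variable equals k."""
    if k < 0:
        # Counts cannot be negative, so the probability is zero
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Probabilities of seeing 0..4 events when the rate is 3 per time frame
for k in range(5):
    print(k, poisson_pmf(k, 3.0))
```

Evaluating the formula directly like this is fine for small counts; for very large $k$ one would usually work with logarithms instead to avoid overflow.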

But why did I write my distribution in Scala? Well, I like Scala and enjoyed the challenge of writing a Poisson distribution using a functional approach. I also wanted to learn more about how to use [Py4J](https://www.py4j.org/), which can be used to work with functions and objects in the JVM from Python. [Apache Spark](https://spark.apache.org/) actually uses Py4J in PySpark to write Python wrappers for Scala code. I've used both PySpark and Spark in Scala, and this gave me an opportunity to better understand how PySpark works.

The first step was to create the Poisson class as I did in this [project](https://github.com/mdh266/PoissonDistributionInScala). However, one key difference is that the return value of any public method needs to be a Java object. Specifically, the [sample](https://github.com/mdh266/BayesBootstrapMLE/blob/main/src/main/scala/Poisson.scala) method needs to return a Java list of integers ([java.util.List[Int]](https://www.javatpoint.com/java-list)). I originally tried returning a [Scala List](https://www.scala-lang.org/api/current/scala/collection/immutable/List.html), which worked fine in pure Scala, but Py4J was not able to convert this object, and the Python type of the returned list was "Java Object".

In order to use this class from Python we need to do three things from the Scala point of view:

1. Create a [Gateway Server](https://www.py4j.org/_static/javadoc/index.html?py4j/GatewayServer.html)
2. Create a class entrypoint to allow for setting object attributes outside of the constructor
3. Package the code as a jar using a build tool such as [Maven](https://maven.apache.org/) or [SBT](https://www.scala-sbt.org/).


The first step is pretty straightforward to follow from the [Py4J Documentation](https://www.py4j.org/getting_started.html) and is in the [Main.scala](https://github.com/mdh266/BayesBootstrapMLE/blob/main/src/main/scala/Main.scala) object:

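    import py4j.GatewayServer

    object Main {
        def main(args: Array[String]): Unit = {
            // Expose a PoissonEntryPoint instance to Python through Py4J
            val server = new GatewayServer(new PoissonEntryPoint())
            // Start listening for connections from the Python side
            server.start()
            System.out.println("Gateway Server Started")
        }
    }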
    
The GatewayServer doesn't really offer a way for us to pass the $\lambda$ value from [Python](https://www.py4j.org/getting_started.html#writing-the-python-program) to the Poisson constructor. So we create a [PoissonEntryPoint](https://github.com/mdh266/BayesBootstrapMLE/blob/main/src/main/scala/PoissonEntryPoint.scala) case class:

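    case class PoissonEntryPoint() {

        // A single distribution instance that is handed back to Python
        var p = new PoissonDistribution()

        // Set the rate on that instance and return it
        def Poisson(lambda : Double) : PoissonDistribution = {
            p.setLambda(lambda)
            p
        }

    }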

The point of this class is simply to be able to create a Poisson class instance from Python after the gateway server has started. Once the server is running, we can connect to it from Python:

    >>> from py4j.java_gateway import JavaGateway
    >>> gateway = JavaGateway()
    >>> app = gateway.entry_point
    >>> type(app)
    py4j.java_gateway.JavaObject
    >>> dir(app)
    ['Poisson',
     'apply',
     'canEqual',
     'copy',
     'equals',
     'getClass',
     'hashCode',
     'notify',
     'notifyAll',
     'p',
     'p_$eq',
     'productArity',
     'productElement',
     'productIterator',
     'productPrefix',
     'toString',
     'unapply',
     'wait']

We can then create a Poisson class instance:

    >>> p1 = app.Poisson(3.0)
    >>> type(p1)
    py4j.java_gateway.JavaObject
    >>> dir(p1)
    ['$anonfun$cdf$1',
     '$anonfun$getSum$1',
     '$anonfun$invCDF$1',
     '$anonfun$invCDF$2',
     '$anonfun$invCDF$3',
     '$anonfun$invCDF$4',
     '$anonfun$sample$1',
     '$anonfun$uniform$1',
     '$lessinit$greater$default$1',
     'andThen',
     'apply',
     'apply$default$1',
     'canEqual',
     'cdf',
     'com$github$mdh266$PoissonDistribution$$lambda',
     'compose',
     'copy',
     'copy$default$1',
     'equals',
     'getClass',
     'getLambda',
     'getSum',
     'hashCode',
     'invCDF',
     'lambda$access$0',
     'notify',
     'notifyAll',
     'prob',
     'productArity',
     'productElement',
     'productIterator',
     'productPrefix',
     'sample',
     'setLambda',
     'toString',
     'unapply',
     'uniform',
     'wait']

Sampling from the Poisson object returns an object that behaves like a Python list. Py4J can only convert specific Java objects back to Python, which is why I needed to return a java.util.List[Int] object:

    >>> x = p1.sample(10000)
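For readers who want to follow along without a running JVM, the sketch below is a pure-Python stand-in for `sample`. It assumes inverse-transform sampling, which the `invCDF` and `uniform` method names in the listing above suggest but which is not confirmed here. The function names `poisson_inv_cdf` and `poisson_sample` are hypothetical.

```python
import math
import random


def poisson_inv_cdf(u: float, lam: float) -> int:
    """Smallest count k whose cumulative probability is at least u."""
    k = 0
    p = math.exp(-lam)  # P(y = 0)
    cdf = p
    # Walk up the counts, using P(y = k) = P(y = k - 1) * lam / k.
    # The p > 0 guard stops the loop if rounding keeps cdf just below u.
    while cdf < u and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k


def poisson_sample(n: int, lam: float, seed=None) -> list:
    """Draw n Poisson(lam) counts by pushing uniform draws through the inverse CDF."""
    rng = random.Random(seed)
    return [poisson_inv_cdf(rng.random(), lam) for _ in range(n)]


x = poisson_sample(10000, 3.0, seed=0)
print(sum(x) / len(x))
```

The recurrence avoids computing factorials, but the exact output will differ from the Scala class since the two use different random number generators.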

    >>> lam = sum(x) / len(x)
    >>> print(f"lambda = {lam}")
    lambda = 2.9927

## 2. The Maximum Likelihood Estimator
----------

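The sample mean computed above is the maximum likelihood estimate of $\lambda$. Here is a short sketch of why, assuming the observations $y_1, \ldots, y_n$ are independent draws from the same Poisson distribution. The log-likelihood is

$$ \ell(\lambda) \; = \; \sum_{i=1}^{n} \log p(y \, = \, y_i) \; = \; \log(\lambda) \sum_{i=1}^{n} y_i \; - \; n \lambda \; - \; \sum_{i=1}^{n} \log(y_i !) $$

Setting its derivative with respect to $\lambda$ to zero gives

$$ \frac{d \ell}{d \lambda} \; = \; \frac{1}{\lambda} \sum_{i=1}^{n} y_i \; - \; n \; = \; 0 \quad \Longrightarrow \quad \hat{\lambda} \; = \; \frac{1}{n} \sum_{i=1}^{n} y_i $$

The second derivative, $- \frac{1}{\lambda^{2}} \sum_{i=1}^{n} y_i$, is negative whenever at least one count is non-zero, so $\hat{\lambda}$ is a maximum. This is the quantity that `sum(x) / len(x)` evaluates.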


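Note that with $\lambda = 3$, `prob(2)` and `prob(3)` return the same value. This is expected: the ratio $p(y = k) \, / \, p(y = k-1) = \lambda / k$ equals one when $k = \lambda$, so for an integer rate the distribution has two equally likely modes, at $\lambda - 1$ and $\lambda$.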

    >>> p1.prob(2)
    0.22404180765538775
    >>> p1.prob(1)
    0.14936120510359183
    >>> p1.prob(3)
    0.22404180765538775
    >>> p1.setLambda(5.0)
    >>> p1.getLambda()
    5.0

## 3. Confidence Intervals From The Fisher Information
-------------------

## 4. The Bootstrap
----------------