Multivariate Distributions #26

Open
nrlucaroni opened this issue Jun 19, 2013 · 9 comments

@nrlucaroni

Is there a way to abstract the 'float' type out of the distribution modules so they can also work with 'float array' (or other data types)? And would that be sufficient to extend the distributions to multivariate ones? Something like...

module type Mean = sig
  type t
  type elt
  val mean : t -> elt option
end
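A minimal sketch of how an abstract elt would let one Mean signature cover both cases. The exponential and diag_normal records and the Exponential and DiagNormal modules are hypothetical; they only show elt instantiated to float and to float array.

```ocaml
(* The signature from the comment above: elt is left abstract. *)
module type Mean = sig
  type t
  type elt
  val mean : t -> elt option
end

(* Hypothetical parameter records, defined outside so callers can build values. *)
type exponential = { rate : float }
type diag_normal = { mu : float array; sigma : float array }

(* Univariate case: elt is instantiated to float. *)
module Exponential : Mean with type t = exponential and type elt = float =
struct
  type t = exponential
  type elt = float
  (* The mean of Exp(rate) is 1 / rate; no mean for a non-positive rate. *)
  let mean d = if d.rate > 0. then Some (1. /. d.rate) else None
end

(* Multivariate case: elt is instantiated to float array. *)
module DiagNormal : Mean with type t = diag_normal and type elt = float array =
struct
  type t = diag_normal
  type elt = float array
  (* The mean vector is the location parameter, copied so callers can't mutate it. *)
  let mean d = Some (Array.copy d.mu)
end
```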
@nrlucaroni
Author

I don't understand why adding the following doesn't work...

module type MultivariateDistribution = sig
  include BaseDistribution with type elt := float array
  val dimension : t -> int
end

It fails with the error message:

Error: Only type constructors with identical parameters can be substituted.

But the following works:

module type MultivariateDistribution = sig
  type vector = float array
  include BaseDistribution with type elt := vector
  val dimension : t -> int
end
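A self-contained sketch of both forms. BaseDistribution is a hypothetical stand-in (just t, elt and sample), and PointMass is an illustrative degenerate distribution. As far as I know, OCaml 4.06 and later accept arbitrary type expressions on the right of :=, so on a modern compiler the direct form compiles as well; the named vector alias is what older compilers need.

```ocaml
(* Hypothetical stand-in for the library's BaseDistribution signature. *)
module type BaseDistribution = sig
  type t
  type elt
  val sample : t -> elt
end

(* Works on old compilers too: substitute with a named, parameterless type. *)
module type MultivariateDistribution = sig
  type vector = float array
  include BaseDistribution with type elt := vector
  val dimension : t -> int
end

(* Direct form; assumed to be accepted only by OCaml 4.06 and later. *)
module type MultivariateDistributionDirect = sig
  include BaseDistribution with type elt := float array
  val dimension : t -> int
end

(* Hypothetical point mass: every sample is the same vector. *)
module PointMass : MultivariateDistribution with type t = float array = struct
  type vector = float array
  type t = float array
  let sample v = Array.copy v
  let dimension v = Array.length v
end

(* The same module also satisfies the direct signature. *)
module PointMassDirect =
  (PointMass : MultivariateDistributionDirect with type t = float array)
```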

@superbobry
Owner

Abstracting the elt type in Mean and similar signatures sounds good. However, that alone won't be enough to support multivariate distributions.

I'm unsure what the best approach is, but the first thing that comes to mind isn't very elegant:

module type UnivariateDistribution = sig
    type t
    type elt = float

    include BaseDistribution with type t := t and type elt := elt
end

module type MultivariateDistribution = sig
    type t
    type elt

    include BaseDistribution with type t := t and type elt := elt 
end

(* And, the boilerplate for discrete-continuous cases. *)
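A self-contained sketch that checks those two signatures compile. BaseDistribution here is a hypothetical stand-in (just t, elt and sample); Constant, ConstantVector and the Summary functor are illustrations only. Summary shows the practical benefit of pinning elt = float: code written against UnivariateDistribution can do float arithmetic on samples.

```ocaml
(* Hypothetical stand-in for the library's BaseDistribution. *)
module type BaseDistribution = sig
  type t
  type elt
  val sample : t -> elt
end

(* The two signatures from the comment above. *)
module type UnivariateDistribution = sig
  type t
  type elt = float
  include BaseDistribution with type t := t and type elt := elt
end

module type MultivariateDistribution = sig
  type t
  type elt
  include BaseDistribution with type t := t and type elt := elt
end

(* Hypothetical degenerate distributions, used only to exercise the signatures. *)
module Constant : UnivariateDistribution with type t = float = struct
  type t = float
  type elt = float
  let sample x = x
end

module ConstantVector :
  MultivariateDistribution with type t = float array and type elt = float array =
struct
  type t = float array
  type elt = float array
  let sample v = Array.copy v
end

(* Because elt = float is manifest, the functor body can use float operators. *)
module Summary (D : UnivariateDistribution) = struct
  let sample_doubled d = 2. *. D.sample d
end

module S = Summary (Constant)
```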

The reasons we currently keep the discrete and continuous cases separate are:

  1. It's handy to indicate which type of distribution your function operates on;

  2. GSL doesn't provide quantile functions for discrete distributions;

  3. We use labels to indicate the type of the argument for probability and cumulative_probability, so simply abstracting the type of the random variable won't work. Example:

    Normal.(cumulative_probability ~x:0.42 standard)
    Poisson.(cumulative_probability ~n:10 (create ~rate:0.42))
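A sketch of why the labels get in the way. To stay self-contained it uses hypothetical Uniform (continuous, labelled ~x) and Bernoulli (discrete, labelled ~n) modules instead of Normal and Poisson, plus a hypothetical generic Cdf signature. Neither labelled module matches Cdf directly, so each needs a small label-dropping adapter.

```ocaml
(* Hypothetical continuous distribution: argument labelled ~x, typed float. *)
module Uniform = struct
  type t = { lo : float; hi : float }
  let cumulative_probability ~x { lo; hi } =
    if x <= lo then 0. else if x >= hi then 1. else (x -. lo) /. (hi -. lo)
end

(* Hypothetical discrete distribution: argument labelled ~n, typed int. *)
module Bernoulli = struct
  type t = { p : float }
  let cumulative_probability ~n { p } =
    if n < 0 then 0. else if n = 0 then 1. -. p else 1.
end

(* A generic signature has to pick one label (here: none) for everyone. *)
module type Cdf = sig
  type t
  type elt
  val cumulative_probability : elt -> t -> float
end

(* (Uniform : Cdf) would be rejected: x:float -> t -> float is not
   float -> t -> float, so adapters have to drop the labels. *)
module UniformCdf : Cdf with type t = Uniform.t and type elt = float = struct
  type t = Uniform.t
  type elt = float
  let cumulative_probability x d = Uniform.cumulative_probability ~x d
end

module BernoulliCdf : Cdf with type t = Bernoulli.t and type elt = int = struct
  type t = Bernoulli.t
  type elt = int
  let cumulative_probability n d = Bernoulli.cumulative_probability ~n d
end
```

The adapters work, but they throw away exactly the labels that make calls like the ones above self-documenting.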

As for the compiler error, I've never seen this one before; I think we should ask for clarification on the mailing list.

@superbobry
Owner

Update: the compiler error is documented here:

There are a number of restrictions: [...] the definition must be either another type constructor (with identical type parameters).
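To make the restriction concrete: elt takes no parameters, while float array is the constructor array applied to float, so the parameters don't match; type vector = float array introduces a parameterless constructor that does. A case the rule accepts is substituting a one-parameter constructor by another constructor applied to the same parameter, as in this sketch (Container, ListContainer and L are hypothetical names):

```ocaml
(* A signature with a one-parameter type constructor. *)
module type Container = sig
  type 'a c
  val empty : 'a c
  val add : 'a -> 'a c -> 'a c
end

(* Accepted even under the old rule: the right-hand side is a type
   constructor (list) applied to exactly the same parameter ('a). *)
module type ListContainer = Container with type 'a c := 'a list

(* After substitution the signature mentions 'a list directly. *)
module L : ListContainer = struct
  let empty = []
  let add x xs = x :: xs
end
```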

superbobry added a commit that referenced this issue Jun 20, 2013
@superbobry
Owner

I've tried to generalize the distribution signatures, so each distribution now also has an elt type. However, I'm unsure what to do with the remaining signatures. For instance, Mean:

module type Mean = sig
  type elt
  type t

  val mean : t -> elt
end

Most discrete distributions have real-valued means, so we can't just include Mean with type elt := elt, and including Mean with a different elt type seems hackish to me. What do you think?
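One way to see the clash: a discrete distribution's outcomes are int but its mean is float, so Mean ends up included with elt := float rather than elt := elt. The sketch below is that "hackish" variant; BaseDistribution (with a sample function), DiscreteWithMean and the Bernoulli module are hypothetical.

```ocaml
(* Hypothetical stand-in for the library's BaseDistribution. *)
module type BaseDistribution = sig
  type t
  type elt
  val sample : t -> elt
end

(* The Mean signature from the comment above. *)
module type Mean = sig
  type elt
  type t
  val mean : t -> elt
end

(* Outcomes are int, but the mean is float, so the two includes
   disagree on what elt means. *)
module type DiscreteWithMean = sig
  type t
  type elt = int
  include BaseDistribution with type t := t and type elt := elt
  include Mean with type t := t and type elt := float
end

(* Hypothetical Bernoulli distribution parameterised by p. *)
module Bernoulli : DiscreteWithMean with type t = float = struct
  type t = float
  type elt = int
  (* Random.float 1.0 is uniform on [0, 1), so this yields 1 with probability p. *)
  let sample p = if Random.float 1.0 < p then 1 else 0
  let mean p = p
end
```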

@nrlucaroni
Copy link
Author

Yeah that's a tough one.

@superbobry
Owner

Actually, what do you think about switching to objects for distributions? That way we can get rid of all the micro-signatures, like Mean, Variance, etc., because objects support row polymorphism:

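(* An open object type needs its row variable bound, hence the 'self parameter. *)
type ('a, 'self) mean = < mean : 'a; .. > as 'self
type ('a, 'self) mean_opt = < mean_opt : 'a option; .. > as 'self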
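A minimal sketch of the object-based alternative. An open object type (one with ..) can't be abbreviated on its own because its row variable would be unbound, so the abbreviation here binds it to an extra 'self parameter. The exponential and point constructors and the mean_of and summary functions are hypothetical.

```ocaml
(* Any object with a mean method, whatever else it provides. *)
type ('a, 'self) mean = < mean : 'a; .. > as 'self

let mean_of (d : ('a, _) mean) : 'a = d#mean

(* Requires both methods; inferred as < mean : 'a; variance : 'b; .. > -> 'a * 'b. *)
let summary d = (d#mean, d#variance)

(* Hypothetical univariate object: exponential distribution with a rate. *)
let exponential rate = object
  method mean = 1. /. rate
  method variance = 1. /. (rate *. rate)
end

(* Hypothetical multivariate object: a point mass at vector v. *)
let point v = object
  method mean = Array.copy v
  method dimension = Array.length v
end
```

With this approach, summary (point [|1.|]) is rejected at compile time, since point has no variance method.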

superbobry added a commit that referenced this issue Jun 26, 2013
@superbobry
Owner

Okay, I've chosen to stick with modules for now. A multivariate normal distribution can be expressed as:

module MultiNormal : sig
  type elt = float array
  include BaseDistribution with type elt := elt
  include Features with type t := t and type elt := elt
  include MLE with type t := t and type elt := elt
end
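A sketch of an implementation matching that shape. BaseDistribution, Features and MLE are hypothetical minimal stand-ins (Features has only mean, MLE only mle), and the covariance accessor is an extra assumption. The estimator is the standard maximum-likelihood one for a multivariate normal: the sample mean and the 1/n sample covariance.

```ocaml
(* Hypothetical minimal stand-ins for the library's signatures. *)
module type BaseDistribution = sig
  type t
  type elt
end

module type Features = sig
  type t
  type elt
  val mean : t -> elt
end

module type MLE = sig
  type t
  type elt
  val mle : elt array -> t
end

module MultiNormal : sig
  type elt = float array
  include BaseDistribution with type elt := elt
  include Features with type t := t and type elt := elt
  include MLE with type t := t and type elt := elt
  (* Assumed extra accessor, not part of the original signature. *)
  val covariance : t -> float array array
end = struct
  type elt = float array
  type t = { mu : float array; sigma : float array array }

  let mean d = Array.copy d.mu
  let covariance d = Array.map Array.copy d.sigma

  (* MLE: sample mean and biased (divide by n) sample covariance. *)
  let mle xs =
    let n = Array.length xs in
    if n = 0 then invalid_arg "MultiNormal.mle: no observations";
    let k = Array.length xs.(0) in
    let nf = float_of_int n in
    let mu =
      Array.init k (fun j ->
        Array.fold_left (fun acc x -> acc +. x.(j)) 0. xs /. nf)
    in
    let sigma =
      Array.init k (fun i ->
        Array.init k (fun j ->
          Array.fold_left
            (fun acc x -> acc +. (x.(i) -. mu.(i)) *. (x.(j) -. mu.(j)))
            0. xs
          /. nf))
    in
    { mu; sigma }
end
```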

However, I'm unsure whether we should focus on this now: neither SciPy nor R provides multivariate distributions out of the box. So maybe we should postpone this?

@nrlucaroni
Author

I prefer modules too. I thought R and SciPy provided a fairly complete suite of distributions, but looking at [1] and [2], I see they have only a few basic ones, as you pointed out. I think it's important to at least allow enough generality to implement them, and to provide a few basic ones.

[1] - http://docs.scipy.org/doc/numpy/reference/routines.random.html
[2] - http://cran.r-project.org/web/views/Distributions.html

@superbobry
Owner

Well, SciPy's list of supported distributions is a little longer, but all of them are still univariate.
