This is a whirlwind tour of Sequence Modeling Language (SMoL), a domain-specific probabilistic programming language for sequences over symbols. SMoL is currently embedded in Haskell, and this document was generated from a Haskell Jupyter Notebook, but Haskell knowledge shouldn't be necessary.

First, I'll show how to build up distributions of sequences with SMoL by composing smaller pieces.
Then I'll demonstrate two types of inference from sequence emission data: decoding and parameter inference. Decoding lets the user query the generative process behind the data in terms of the model's branching variables. Parameter inference estimates the posterior parameters of a sequence model, although support for this is currently limited.

# Hello World

In [27]:
-- Some imports we need (this is a comment)
import SMoL
import SMoL.Inference.SHMM
import SMoL.Tags.Utils
import Control.Monad

Let's start with the simplest distribution over non-empty sequences: the distribution that always returns a singleton sequence.

In [3]:
-- The simplest model, besides the empty model
simplest = symbol 'H'

`simplest` is a SMoL expression for this distribution. We can compile it and sample from it:

In [4]:
printSamples 10 (compileSMoL simplest)

"H"
"H"
"H"
"H"
"H"
"H"
"H"
"H"
"H"
"H"

Since `simplest` is deterministic, we'll always get "H".

Almost as simple:

In [5]:
-- Just multiple symbols in a row
elloWorld = symbols "ello world!"

In [6]:
printSamples 10 (compileSMoL elloWorld)

"ello world!"
"ello world!"
"ello world!"
"ello world!"
"ello world!"
"ello world!"
"ello world!"
"ello world!"
"ello world!"
"ello world!"

`elloWorld` is still deterministic like `simplest`, but now the sequence has multiple symbols.

We build up more complex distributions over sequences by composing simpler ones. For example, `andThen` is a function that composes two sequence distributions by concatenating their constituents:

In [7]:
-- We can use other models as parts
helloWorld = andThen simplest elloWorld

In [8]:
printSamples 10 (compileSMoL helloWorld)

"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Hello world!"

Since `simplest`, `elloWorld` and `andThen` were deterministic, so is `helloWorld`.

`eitherOr` is another way of composing distributions, and it is not deterministic. The first argument to `eitherOr` (0.6 in this example) is the probability of sampling from the first distribution rather than the second.

In [0]:
-- Models can be probabilistic!
helloGoodbye =
    andThen 
        (eitherOr 0.6
            (symbols "Hello")
            (symbols "Goodbye, cruel"))
        (symbols " world!")

In [10]:
printSamples 10 (compileSMoL helloGoodbye)

"Hello world!"
"Goodbye, cruel world!"
"Goodbye, cruel world!"
"Goodbye, cruel world!"
"Hello world!"
"Hello world!"
"Hello world!"
"Goodbye, cruel world!"
"Hello world!"
"Hello world!"

`helloGoodbye` now represents the distribution that returns "Hello world!" with 60% probability and "Goodbye, cruel world!" with 40% probability.
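
To make the 60/40 split concrete, here is a minimal sketch in plain Haskell that enumerates the same distribution by hand. It does not use SMoL at all: `Outcomes`, `symbolsD`, `andThenD` and `eitherOrD` are hypothetical stand-ins I made up for illustration, and they say nothing about how SMoL represents distributions internally.

```haskell
-- A toy finite distribution over strings: each outcome paired with its probability.
-- (Hypothetical type for illustration; not part of SMoL.)
type Outcomes = [(String, Double)]

-- Stand-in for `symbols`: a single outcome with probability 1.
symbolsD :: String -> Outcomes
symbolsD s = [(s, 1)]

-- Stand-in for `andThen`: concatenate independent draws from both distributions.
andThenD :: Outcomes -> Outcomes -> Outcomes
andThenD xs ys = [(x ++ y, p * q) | (x, p) <- xs, (y, q) <- ys]

-- Stand-in for `eitherOr`: use the first distribution with probability p,
-- and the second with probability 1 - p.
eitherOrD :: Double -> Outcomes -> Outcomes -> Outcomes
eitherOrD p xs ys =
  [(x, p * q) | (x, q) <- xs] ++ [(y, (1 - p) * q) | (y, q) <- ys]

-- The same shape as `helloGoodbye`, written with the toy combinators.
helloGoodbyeD :: Outcomes
helloGoodbyeD =
  andThenD
    (eitherOrD 0.6 (symbolsD "Hello") (symbolsD "Goodbye, cruel"))
    (symbolsD " world!")
```

Evaluating `helloGoodbyeD` lists "Hello world!" with weight 0.6 and "Goodbye, cruel world!" with weight 0.4 (up to floating-point rounding).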

# Brief introspection

This section is safe to skip if you're not interested in the Haskell types of the SMoL expressions we're working with.

An uncompiled model is of type `ProbSeq a`, where `a` is the type of the symbol in the sequences.

In [12]:
simplest :: ProbSeq Char

`compileSMoL` is a function from an uncompiled `ProbSeq` to the matrix representation of the distribution, `MatSeq`.

In [13]:
compileSMoL :: forall s. Eq s => ProbSeq s -> MatSeq s

A compiled model is of type `MatSeq a`, where `a` is the type of the symbol in the sequences.

In [14]:
simplestC = compileSMoL simplest

In [15]:
simplestC :: MatSeq Char

If we print an uncompiled value like `simplest` from earlier, we get a SMoL AST expression.

In [17]:
simplest

symbol ('H')

If we print a compiled value, we get the actual matrix form of the distribution as well as some bookkeeping.

In [18]:
simplestC

MatSeq {trans = SM SparseMatrix 2x2
1.0	0.0
0.0	1.0
, stateLabels = [StateLabel {stateLabel = 'H', stateTag = StateTag 0 [], tagSet = fromList []}]}

# More tools

This section is a non-exhaustive list of functions that I've built into SMoL for manipulating and composing sequence distributions.

## `finiteDistRepeat`

Repeat a sequence distribution a random number of times according to a given distribution.

In [19]:
repeatModel = finiteDistRepeat [0, 0.1, 0.4, 0, 0.3, 0.2] (andThen (symbols "la") (eitherOr 0.5 (symbols ", ") (symbols "! ")))

printSamples 10 (compileSMoL repeatModel)

"la, la, "
"la, la! "
"la, la! "
"la! la! la, la, "
"la, "
"la, la! "
"la, la! la, la! la! "
"la, la, "
"la, la, "
"la, la, "

## `finiteDistOver`

Choose a sequence distribution at random according to a given distribution. Generalizes `eitherOr`.

In [20]:
branchModel = andThen (symbols "Today, I like ") $
    finiteDistOver [
        (symbols "bananas.", 0.4)
      , (symbols "apples.", 0.4)
      , (symbols "grapes.", 0.2)
    ]
printSamples 10 (compileSMoL branchModel)

"Today, I like apples."
"Today, I like bananas."
"Today, I like apples."
"Today, I like apples."
"Today, I like bananas."
"Today, I like apples."
"Today, I like bananas."
"Today, I like apples."
"Today, I like apples."
"Today, I like apples."

## `skip`

`skip n` is a special symbol that, when emitted, skips the next `n` symbols. `skip 0` doesn't do anything.

`possibly`, also shown below, emits nothing with the given probability.

In [21]:
skipModel = andThen (possibly 0.5 (skip 3)) (symbols "do i eat fruit")
printSamples 10 (compileSMoL skipModel)

"do i eat fruit"
"i eat fruit"
"do i eat fruit"
"do i eat fruit"
"i eat fruit"
"i eat fruit"
"do i eat fruit"
"i eat fruit"
"do i eat fruit"
"do i eat fruit"
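
The effect of `skip` can be mimicked on a plain list. This is a minimal sketch that assumes nothing about SMoL's internals: `Step` and `runSteps` are hypothetical names invented here, and the interpreter simply drops the next `n` elements whenever it meets a skip.

```haskell
-- A toy stream element: either emit a symbol or skip the next n elements.
-- (Hypothetical type for illustration; not part of SMoL.)
data Step a = Emit a | Skip Int

-- Interpret a stream of steps left to right.
-- `Skip n` drops the following n elements before continuing, so `Skip 0` is a no-op.
runSteps :: [Step a] -> [a]
runSteps [] = []
runSteps (Emit x : rest) = x : runSteps rest
runSteps (Skip n : rest) = runSteps (drop n rest)
```

With this sketch, `runSteps (Skip 3 : map Emit "do i eat fruit")` drops the first three symbols and yields "i eat fruit", which is the shorter of the two outcomes sampled above.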

## `skipDist`

Insert a `skip n`, where `n` is drawn from a given distribution, before each symbol of the given sequence distribution.

In [22]:
skipDistModel = skipDist [0.2, 0.5, 0.2, 0.1] (symbols "do i eat fruit")
printSamples 10 (compileSMoL skipDistModel)

"di eatfui"
"doieatt fruit"
"doo  t ruutt"
"dd ii e uitt"
"oo i aatfrri"
"doi  t uiit"
"o eaa ffrrrit"
"o   ea fruitt"
"d ieeaat  fritt"
"o eat ffrruii"

## `collapse`

Transform each constituent sequence in a given distribution to the sequence of sliding windows (or De Bruijn graph nodes) of a given length.

In [23]:
-- Read: collapse 3 (symbols "do i eat fruit")
collapseModel = collapse undefined (foldl1 (++)) 3 (symbols (map (:[]) "do i eat fruit"))
printSamples 10 (compileSMoL collapseModel)

["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
["do ","o i"," i ","i e"," ea","eat","at ","t f"," fr","fru","rui","uit"]
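
The sliding-window part of `collapse` is easy to reproduce on an ordinary list. The helper below is a sketch in plain Haskell; `windows` is a hypothetical name of mine, not a SMoL function, and it only covers the deterministic case shown above.

```haskell
import Data.List (tails)

-- All contiguous windows of length k, in order of their starting position.
-- Suffixes shorter than k are discarded.
windows :: Int -> [a] -> [[a]]
windows k xs = takeWhile ((== k) . length) (map (take k) (tails xs))
```

`windows 3 "do i eat fruit"` gives the same twelve 3-wide windows that every sample above contains, starting with "do " and ending with "uit".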

# Example application: MinION Sequencer

SMoL was originally motivated by the problem of genotyping DNA given the signal from the Oxford Nanopore Technologies MinION Sequencer. Using SMoL, I can write a very concise and expressive solution to this decoding problem.

DNA is fed into the MinION machine, and a signal is read out. The goal is to take prior information about the input DNA, represented by a SMoL expression, and transform it into a signal-level model, which is another SMoL expression. To accomplish this, we encode domain-specific knowledge about how the MinION works. That knowledge is encapsulated once here, so future users can reuse it.

The MinION signal transformation takes a sliding window over the DNA sequence, with random skips. In SMoL, this is equivalent to a `skipDist` and a `collapse`.

In [24]:
type NT = Char

k = 4

ntSymbols :: [NT] -> ProbSeq [NT]
ntSymbols = symbols . map (:[])

-- Here is the function we want: map from DNA level models to signal level models.
minion :: ProbSeq [NT] -> ProbSeq [NT]
minion = skipDist [0.4, 0.3, 0.2, 0.1]
       . collapse undefined (foldl1 (++)) k


Below is an example of how the MinION transforms a DNA sequence. In this case, we know the DNA sequence for sure, and we take 10 samples.

In [25]:
printSamples 10 . compileSMoL $ minion (ntSymbols "ACGTACACGTATGAC")

["CGTA","ACAC","CACG","CACG","ACGT","CGTA","ATGA","ATGA","ATGA","ATGA","ATGA","TGAC"]
["ACGT","ACGT","ACGT","GTAC","GTAC","GTAC","GTAC","TACA","ACAC","ACAC","ACAC","ACGT","ACGT","ACGT","GTAT","TATG","TGAC","TGAC","TGAC","TGAC"]
["CGTA","CGTA","ACAC","ACAC","ACAC","ACAC","ACGT","CGTA","CGTA","CGTA","GTAT","TATG","TGAC","TGAC","TGAC","TGAC"]
["ACGT","ACGT","CGTA","CGTA","GTAC","GTAC","TACA","ACAC","ACGT","ACGT","GTAT","TATG","ATGA","TGAC"]
["GTAC","TACA","CACG","CGTA","CGTA","CGTA","ATGA","ATGA","ATGA"]
["ACGT","ACGT","GTAC","GTAC","ACAC","ACGT","CGTA","CGTA","ATGA"]
["GTAC","ACAC","CACG","ACGT","ACGT","CGTA","GTAT","GTAT","GTAT","TATG","ATGA"]
["CGTA","TACA","ACAC","ACAC","ACAC","ACGT","ACGT","ACGT","CGTA","GTAT","GTAT","GTAT","ATGA","ATGA","TGAC","TGAC"]
["ACGT","GTAC","ACAC","ACAC","ACAC","CACG","CACG","ACGT","GTAT","TATG","TATG","ATGA","TGAC"]
["GTAC","GTAC","TACA","TACA","TACA","TACA","TACA","TACA","ACAC","CACG","CACG","ACGT","ACGT","GTAT","TATG","ATGA","ATGA","TGAC"]

The samples give the idea: MinION events are 4-wide sliding windows over the input sequence, with random skips and random stalls. The MinION machine's signal is translated into a distribution over these 4-tuples at each time point, and that matrix is what gets passed to SMoL.

# Inference: Decoding

So far, we've defined sequence distributions and sampled from them. We can also do two kinds of inference on sequence data. Let's start with decoding.

Given a SMoL expression defining our model and some data describing a sequence of distributions over symbols, we can answer high-level queries about the process that generated the data.

Below, I define a SMoL model and keep references to 'decision' points in the generative process.

In [28]:
model :: ProbSeq Char
vars :: (Tag Int, Tag Bool)
(model, vars) = runTagGen $ do
  -- The random variable "a" represents the number of times the symbol is repeated
  (ps1, a) <- finiteDistRepeatM [0.8,0.1,0.1] (symbol 'a')
  
  -- The random variable "b" represents the branch taken
  (ps2, b) <- eitherOrM 0.5 ps1 (symbol 'c')
  
  let ps = andThen ps2 (symbol 'd')
  return (ps, (a, b))

-- Sample from the model, just as before
printSamples 10 (compileSMoL model)

"ad"
"ad"
"cd"
"ad"
"ad"
"ad"
"ad"
"cd"
"aaad"
"ad"

As an example, let's generate some sample data from the model.

The matrix that's printed has a row for each point in the sequence, and a column for each symbol.

In [31]:
s <- sample (compileSMoL model)
observations = simulateEmissionsUniform "abcd" 0.5 s

print s
mapM_ print . emissions $ observations

"cd"

[0.125,0.125,0.5,0.125]
[0.125,0.125,0.125,0.5]

We can directly take the posterior over the symbols given our model and the data. There are more columns now, because instead of distributions over symbols, we now have distributions over generating states from our model.

In [32]:
posterior = fullPosteriorSHMM observations (compileSMoL model)

mapM_ print posterior

[0.16666666666666666,0.0,0.0,0.0,0.0,0.8333333333333333,0.0]
[0.0,0.0,0.0,0.0,0.0,0.0,1.0]
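
The two non-zero entries in the first posterior row can be checked by hand. The sketch below is plain Haskell, not SMoL. It assumes that `[0.8,0.1,0.1]` gives the probabilities of 1, 2 and 3 repeats of 'a' (which is what the samples suggest), so the only length-2 sequences the model can produce are "ad" and "cd". All names here are hypothetical.

```haskell
-- Emission rows copied from the printed observations; columns are 'a', 'b', 'c', 'd'.
rows :: [[Double]]
rows = [[0.125, 0.125, 0.5, 0.125], [0.125, 0.125, 0.125, 0.5]]

alphabet :: String
alphabet = "abcd"

-- Weight that one emission row gives to one symbol.
weight :: [Double] -> Char -> Double
weight row c = sum [w | (s, w) <- zip alphabet row, s == c]

-- The two length-2 sequences the model can produce, with their prior probabilities.
-- Assumption: "ad" needs the first branch (0.5) and exactly one repeat (0.8).
candidates :: [(String, Double)]
candidates = [("ad", 0.5 * 0.8), ("cd", 0.5)]

-- Posterior over the candidates: prior times per-position weights, normalized.
candidatePosterior :: [(String, Double)]
candidatePosterior = [(s, j / total) | (s, j) <- joint]
  where
    joint = [(s, p * product (zipWith weight rows s)) | (s, p) <- candidates]
    total = sum (map snd joint)
```

Under these assumptions "ad" gets 0.025 / 0.15 = 1/6 and "cd" gets 0.125 / 0.15 = 5/6, which agrees with the 0.1667 and 0.8333 in the first row.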

The posterior matrix above isn't very user-friendly; we would rather ask queries in the language of our generating model.

To this end, there is a second DSL for defining queries. Below, I define a SMoL query that uses the random variables `a` and `b`. I can query the probability of events over one or more variables, as well as conditional probabilities.

In [33]:
query (a, b) = do
  aDist <- tagDist a
  
  -- p(a = 1 | b is first branch)
  anb <- condition1 b id $ event1 a (== 1)
  
  -- p(b is first branch)
  yb <- event1 b (== True)
  -- p(a > 1)
  ya <- event1 a (> 1)
  
  -- Return p(b is first branch)
  return yb

In [34]:
-- The SMoL function runQuery takes a model with variables, a query, some data, and returns the results of the query.
runQuery :: (ProbSeq d, a) -> (a -> Query b) -> Emissions d -> b

In [0]:
runQuery (model, vars) query observations

0.01

# Inference: Branch index posteriors

SMoL can also do posterior inference over the branch indices in a sequence distribution. In this section, I use a MinION inference problem as an example. I will define a true model and a prior model, and show that the prior model learns the true model from data. This example performs Short Tandem Repeat (STR) allele inference.

In [38]:
strSegment = ntSymbols "ACT"

-- This function constructs a SMoL MinION model given a sub-model to define the center region.
strProblem strModel = minion $ series [
      ntSymbols "ACGTACACGTATGAC"
    , strModel
    , ntSymbols "TACCAGTTGACAGAT"
    ]

In [39]:
-- The true model repeats the "ACT" sequence exactly 3 times.
strTruth = strProblem (repeatSequence 3 strSegment)
strTruthC = compileSMoL strTruth

In [40]:
-- This function constructs a model given a prior distribution over the number of "ACT" repeats.
-- The parameter is what we're going to infer.
strModel :: [Prob] -> (ProbSeq [NT], Tag Int)
strModel prior = runTagGen $ do
    (ps, repeatVar) <- finiteDistRepeatM prior strSegment
    return (strProblem ps, repeatVar)

-- Define a query that just returns the inferred parameter to the model.
strQuery :: Tag Int -> Query [Prob]
strQuery repeatVar = do
    posteriorDist <- tagDist repeatVar
    let (Just distList) = sequence (map (flip Map.lookup posteriorDist) [1..4])
    return distList

In [41]:
-- This expression randomly generates a sample from the true distribution.
simulateSTR :: IO (Emissions [NT])
simulateSTR = do
    s <- sample strTruthC
    return (simulateEmissionsUniform (minionSymbols k) 0.5 s)

In [42]:
sim1 <- simulateSTR

In [44]:
-- This SMoL function takes as input:
--    1. A function from the parameter to a model with random variables
--    2. A function from random variables to a SMoL query
--    3. A first, prior setting for the parameter
--    4. A set of sequence data matrices to learn from
-- And returns as output the final parameter.

updates :: (a -> (ProbSeq b, c)) -> (c -> Query a) -> a -> [Emissions b] -> a

In [45]:
-- Get 5 simulated data matrices
simulations <- replicateM 5 simulateSTR

-- Run inference
updates strModel strQuery [0.25, 0.25, 0.25, 0.25] simulations

[1.800368398168107e-11,2.2899140151477957e-2,0.8986217921194574,7.847906771106108e-2]

Our model, given five noisy examples, assigned 90% posterior probability to the true model, 3 repeats of "ACT".
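
One plausible reading of the `updates` signature is a fold: the posterior after one data matrix becomes the prior for the next. I haven't shown SMoL's implementation here, so treat the following as a sketch of that idea only. The names are hypothetical, and the likelihood numbers are made up for illustration rather than taken from SMoL output.

```haskell
-- Unnormalized or normalized weights over the repeat counts 1..4.
type Weights = [Double]

-- Rescale weights so they sum to 1.
normalize :: Weights -> Weights
normalize xs = map (/ total) xs
  where total = sum xs

-- One Bayesian update: the posterior is proportional to prior times likelihood.
bayesStep :: Weights -> Weights -> Weights
bayesStep prior likelihood = normalize (zipWith (*) prior likelihood)

-- Fold the update over a list of observations, starting from the prior.
bayesUpdates :: Weights -> [Weights] -> Weights
bayesUpdates = foldl bayesStep

-- Hypothetical likelihood of one noisy read under each repeat count (made up).
toyLikelihood :: Weights
toyLikelihood = [0.1, 0.2, 0.6, 0.1]

-- Five reads with the same toy likelihood, starting from a uniform prior.
toyPosterior :: Weights
toyPosterior = bayesUpdates [0.25, 0.25, 0.25, 0.25] (replicate 5 toyLikelihood)
```

Even with modest per-read evidence, repeating the update concentrates `toyPosterior` on the third entry, which is the qualitative behavior seen in the result above.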

# Inference: MAP Estimates of Dirichlet-Categorical Branch Distributions

SMoL can also do posterior inference over the parameters of a sequence distribution. Here, the branch probabilities get a Dirichlet prior, and SMoL computes a maximum a posteriori (MAP) estimate of them from data. Example 1 uses a small three-symbol model, and Example 2 returns to the MinION Short Tandem Repeat (STR) problem: I define a true model and a prior model, and show that the prior model learns the true model from data.

In [46]:
-- Additional imports for this section
import SMoL.DirichletInference
import SMoL.EmissionsInference hiding (simulateEmissionsUniform)
import qualified Data.Map as Map
import Data.Map (Map)
import Data.List (intercalate)

In [47]:
-- Boring helper functions to help with gradient descent convergence and displaying
epsilon = 0.000001
listConverged xs1 xs2 = (< epsilon) . maximum $ zipWith (\x1 x2 -> abs (x1 - x2)) xs1 xs2
mapConverged xs1 xs2 = (< epsilon) . maximum $ Map.intersectionWith (\x1 x2 -> abs (x1 - x2)) xs1 xs2

showMap :: (Show a, Show b) => Map a b -> String
showMap = intercalate "\t" . map (\(k, v) -> show k ++ ":" ++ show v) . Map.toList

showIters :: (a -> String) -> (a -> a -> Bool) -> IO [a] -> IO ()
showIters showF converged = showIters' . (map showF <$>) . (takeUntilPair converged <$>)
  where showIters' :: IO [String] -> IO ()
        showIters' its = its >>= \iters -> forM_ (zip iters [0..]) $ \(iter, n) ->
          putStrLn $ "Iteration " ++ show n ++ ":\t" ++ iter
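
`takeUntilPair` is used above but not defined in this notebook, so it presumably comes from one of the imported modules. A minimal sketch of what such a helper could look like is below. The name `takeUntilPairSketch` and the choice to keep the element that triggers convergence are my assumptions, not SMoL's definition.

```haskell
-- Keep elements until two consecutive ones satisfy the predicate,
-- including both members of that final pair. Lists that never satisfy
-- the predicate are returned unchanged.
takeUntilPairSketch :: (a -> a -> Bool) -> [a] -> [a]
takeUntilPairSketch converged (x : y : rest)
  | converged x y = [x, y]
  | otherwise = x : takeUntilPairSketch converged (y : rest)
takeUntilPairSketch _ xs = xs
```

Because the function is lazy, it can also cut off an infinite list of iterates as soon as two neighbours are close enough.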

## Example 1

In [48]:
-- Sequence model to simulate symbol sequences: p(a)=0.3, p(b)=0.2, p(c)=0.5
generatingSeq = finiteDistOver $ zip [symbol 'a', symbol 'b', symbol 'c'] [0.3, 0.2, 0.5]

-- Prior sequence model, where [2, 5, 3] are Dirichlet parameters for the prior of the branch categorical
modelSeq = runTagGen $ finiteDistOverDirichletM [symbol 'a', symbol 'b', symbol 'c'] [2, 5, 3]

-- Sample 1000 example symbol sequences, with noisy emissions distributions
observations = replicateM 1000 (simulateEmissionsUniform "abc" 0.95 <$> sample (compileSMoL generatingSeq))

-- Run the "branchMAPs" query, which runs an MAP optimization over the categorical branch distribution given data
dirichletExample1Iters = runQueries modelSeq branchMAPs <$> observations

-- Print the iterations until convergence
dirichletExample1 = showIters show listConverged dirichletExample1Iters

Iteration 0:	[0.2,0.5,0.3]
Iteration 1:	[0.3636569679663687,4.42067450604088e-2,0.5921362869732224]
Iteration 2:	[0.236458779788833,0.35395430096854225,0.4095869192426248]
Iteration 3:	[0.272784624628425,0.27114252385632637,0.4560728515152486]
Iteration 4:	[0.2869594766319614,0.22329185194457735,0.48974867142346123]
Iteration 5:	[0.2910090871463246,0.20124235018865855,0.5077485626650169]
Iteration 6:	[0.2905460806949576,0.19412540054251026,0.5153285187625322]
Iteration 7:	[0.28933004872775225,0.19234654439356433,0.5183234068786834]
Iteration 8:	[0.28848863167352495,0.19190263194084606,0.519608736385629]
Iteration 9:	[0.28802665173032577,0.19176931007046263,0.5202040381992116]
Iteration 10:	[0.28779077048512597,0.19171929380363098,0.5204899357112431]
Iteration 11:	[0.28767334859065663,0.19169746981165733,0.520629181597686]
Iteration 12:	[0.2876154287184997,0.19168722971924587,0.5206973415622543]
Iteration 13:	[0.2875584784018449,0.19167733758443964,0.5207641840137154]
Iteration 14:	[0.2
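
Iteration 0 above is `[0.2,0.5,0.3]`, which is what you get by normalizing the prior parameters `[2, 5, 3]`. For comparison, the sketch below also gives the textbook closed-form MAP estimate for the simpler case where the category of every sample is observed exactly. The observations in this example are noisy, so this is not the computation `branchMAPs` performs. Both function names are hypothetical.

```haskell
-- Normalize Dirichlet parameters into a categorical distribution (the Dirichlet mean).
dirichletMean :: [Double] -> [Double]
dirichletMean alphas = map (/ sum alphas) alphas

-- Closed-form MAP estimate for fully observed counts n_i under a Dirichlet(alpha) prior:
--   (n_i + alpha_i - 1) / (N + sum alpha - K)
-- This is the mode of Dirichlet(alpha + n) and requires n_i + alpha_i > 1 for every i.
dirichletMAP :: [Double] -> [Double] -> [Double]
dirichletMAP alphas counts = map (/ sum numerators) numerators
  where numerators = zipWith (\a n -> n + a - 1) alphas counts
```

For instance, `dirichletMAP [2, 5, 3] [300, 200, 500]` uses the expected counts for 1000 draws from the generating distribution and lands close to `[0.3, 0.2, 0.5]`, because 1000 samples outweigh a prior with total weight 10.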

## Example 2

In [49]:
-- Using functions from earlier, but with new truth and prior models

-- Repeat the STR segment with the given probabilities (starting with p(0)=0)
strTruth = strProblem (finiteDistRepeat [0.0, 0.15, 0.3, 0.3, 0.25] strSegment)
strTruthC = compileSMoL strTruth

-- Generate an STR model with a given Dirichlet prior, and return the random variable
strModel :: [Double] -> (ProbSeq [NT], DirichletTag)
strModel prior = runTagGen $ do
    (ps, repeatVar) <- finiteDistRepeatDirichletM prior strSegment
    return (strProblem ps, repeatVar)

-- Randomly sample from the true model, with noisy emissions
simulateSTR :: IO (Emissions Double [NT])
simulateSTR = do
    s <- sample strTruthC
    return (simulateEmissionsUniform (minionSymbols k) 0.9 s)

-- Learn the distribution of repeats with a weak prior
dirichletExample2a = showIters show  listConverged $
  runQueries (strModel [2, 2, 2, 2]) branchMAPs <$> replicateM 100 simulateSTR

dirichletExample2a

Iteration 1:	[0.13837596670598415,0.2847074607600655,0.28438337223704707,0.29253320029690333]
Iteration 2:	[0.16563291897743118,0.2760272359235872,0.2744737253320037,0.28386611976697795]
Iteration 3:	[0.16400156101681632,0.2768096214710397,0.27435454450462166,0.28483427300752223]
Iteration 4:	[0.16385787331028565,0.2770147426724063,0.27390950520741253,0.28521787880989563]
Iteration 5:	[0.16384594063460103,0.27712502966218655,0.2735442836219793,0.2854847460812331]
Iteration 6:	[0.16384449588359212,0.27719918931818693,0.27326729778847253,0.2856890170097485]
Iteration 7:	[0.16384395573601382,0.2772515195682989,0.27305848029170465,0.2858460444039826]
Iteration 8:	[0.16384356716429024,0.2772891935805701,0.27290093584962377,0.28596630340551593]
Iteration 9:	[0.1638432614281866,0.27731668443579355,0.27278194967723834,0.2860581044587815]
Iteration 10:	[0.16384302149424995,0.2773369463629695,0.2726920127559536,0.28612801938682686]
Iteration 11:	[0.16384283482391282,0.2773519916971164,0.27262399

In [50]:
-- Learn the distribution of repeats with a stronger prior
dirichletExample2b = showIters show listConverged $
  runQueries (strModel [16, 31, 31, 26]) branchMAPs <$> replicateM 100 simulateSTR
  
dirichletExample2b

Iteration 0:	[0.15384615384615385,0.2980769230769231,0.2980769230769231,0.25]
Iteration 1:	[0.18421313508124104,0.28704945979622376,0.2871678044899782,0.2415696006325571]
Iteration 2:	[0.16172198391306225,0.2943017517575359,0.2943897114096955,0.2495865529197065]
Iteration 3:	[0.17689561328879155,0.28957440861676453,0.28936120615776817,0.24416877193667583]
Iteration 4:	[0.1657957245333616,0.2930883734110637,0.29303940117896304,0.24807650087661162]
Iteration 5:	[0.1735317742305609,0.29065758704435257,0.29048800938102,0.24532262934406662]
Iteration 6:	[0.1679312780500983,0.2924263867006909,0.29234433504886953,0.24729800020034126]
Iteration 7:	[0.17188557986163738,0.29118143866152196,0.29103873599495866,0.2458942454818821]
Iteration 8:	[0.16904063255984972,0.29207921786341484,0.29198079151870354,0.24689935805803193]
Iteration 9:	[0.1710612181348811,0.29144258770530956,0.29131310410379296,0.24618309005601635]
Iteration 10:	[0.16961248533893972,0.2918995723236985,0.29179254767152324,0.246695

# Inference: MAP Estimates of Emission Distributions

In [51]:
-- Sample a symbol sequence from the model, then sample normally distributed emissions for it
sampleEmissions generatingNorms generatingModel = do
  seq <- fst <$> sampleSeq vecDist (compileSMoL generatingModel)
  putStrLn $ "Generating sequence: " ++ show seq

  ems <- evalState (sampleNormalSequence generatingNorms seq) <$> getStdGen
  putStrLn $ "Generated emissions: " ++ show ems

## Example 1

In [52]:
sampleEmissions (Map.fromList [('a',(-1,1)),('b',(1,1))]) (symbols "abab")

Generating sequence: "abab"
Generated emissions: [-0.1984506676639053,1.7812933394857646,-1.53908923018402,1.8405630904808585]


In [53]:
emissionsExample1 = showIters showMap mapConverged $
  meanDescent 100 10 -- sample 100 symbolic sequences, and 10 emissions from each
  (symbols "abab") -- the generating and prior sequence model
  (Map.fromList [('a',0), ('b',0)]) -- the initial distribution means
  (Map.fromList [('a', (0,4)), ('b', (0,4))]) -- the prior distributions over the means (mean, stddev)
  (Map.fromList [('a',-1),('b',1)]) -- the generating emission distribution means

emissionsExample1

Iteration 0:	'a':-1.5632205493494604	'b':1.6197864502629864
Iteration 1:	'a':-0.6075798619541546	'b':0.6295654367232947
Iteration 2:	'a':-1.1917898915532659	'b':1.234915392266104
Iteration 3:	'a':-0.8346458695522601	'b':0.864847938975313
Iteration 4:	'a':-1.0529780548770777	'b':1.0910805813190978
Iteration 5:	'a':-0.9195054493953099	'b':0.9527782042612687
Iteration 6:	'a':-1.001101007043343	'b':1.037326337111074
Iteration 7:	'a':-0.9512193477780387	'b':0.9856396855837521
Iteration 8:	'a':-0.9817134090085861	'b':1.0172371893494805
Iteration 9:	'a':-0.9630715317328833	'b':0.9979207466176986
Iteration 10:	'a':-0.9744678356143173	'b':1.009729431334587
Iteration 11:	'a':-0.9675009545305503	'b':1.0025104502478968
Iteration 12:	'a':-0.9717600048805861	'b':1.0069236164200333
Iteration 13:	'a':-0.9691563276158195	'b':1.0042257238187071
Iteration 14:	'a':-0.9707480287561949	'b':1.0058750214441268
Iteration 15:	'a':-0.9697749770824875	'b':1.0048667594192127
Iteration 16:	'a':-0.9703698309377035	'
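
For intuition about where the estimates end up, here is the textbook conjugate update for a normal mean with a normal prior. It assumes the noise standard deviation is known and that we know which symbol produced each emission. `meanDescent` evidently iterates instead, so this sketch is not its algorithm, and `normalMeanMAP` is a hypothetical name.

```haskell
-- MAP estimate (equal to the posterior mean) of a normal mean, given
--   a normal prior (mu0, sigma0) on the mean,
--   a known noise standard deviation sigma,
--   and the observed emissions xs.
normalMeanMAP :: (Double, Double) -> Double -> [Double] -> Double
normalMeanMAP (mu0, sigma0) sigma xs =
  (mu0 / (sigma0 * sigma0) + sum xs / (sigma * sigma))
    / (1 / (sigma0 * sigma0) + n / (sigma * sigma))
  where
    n = fromIntegral (length xs)
```

With a wide prior such as `(0, 4)` and many emissions, the estimate is pulled only slightly toward the prior mean, which fits estimates settling near the generating means of -1 and 1.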

## Example 2

In [54]:
sampleEmissions (Map.fromList [('a',(-1,1)),('b',(1,1))]) (eitherOr 0.5 (symbols "ab") (symbols "ba"))

Generating sequence: "ba"
Generated emissions: [0.3651256936676718,-0.9261299350874203]

The three runs below fit the emission means for this two-branch model under different settings: (a) identical priors and a 50/50 branch, (b) priors whose means differ, and (c) identical priors with a 75/25 branch. In (a) the printed estimates for 'a' and 'b' stay equal to each other, while in (b) and (c) they separate toward -1 and 1. A plausible reading is that setting (a) is symmetric under swapping the two symbols, so nothing in the model or the prior tells them apart.


In [55]:
emissionsExample2a = showIters showMap mapConverged $
  meanDescent 100 10 -- sample 100 symbolic sequences, and 10 emissions from each
  (eitherOr 0.5 (symbols "ab") (symbols "ba")) -- the generating and prior sequence model
  (Map.fromList [('a',0), ('b',0)]) -- the initial distribution means
  (Map.fromList [('a', (0.0,4)), ('b', (0.0,4))]) -- the prior distributions over the means (mean, stddev)
  (Map.fromList [('a',-1),('b',1)]) -- the generating emission distribution means

emissionsExample2a

Iteration 0:	'a':5.297834456115842e-2	'b':5.297834456115842e-2
Iteration 1:	'a':1.8004359284456553e-2	'b':1.8004359284456553e-2
Iteration 2:	'a':4.1092654252281306e-2	'b':4.1092654252281306e-2
Iteration 3:	'a':2.5850772027426694e-2	'b':2.5850772027426694e-2
Iteration 4:	'a':3.5912795839929436e-2	'b':3.5912795839929436e-2
Iteration 5:	'a':2.9270287932454397e-2	'b':2.9270287932454397e-2
Iteration 6:	'a':3.3655381043247896e-2	'b':3.3655381043247896e-2
Iteration 7:	'a':3.0760534419327343e-2	'b':3.0760534419327343e-2
Iteration 8:	'a':3.2671585510902046e-2	'b':3.2671585510902046e-2
Iteration 9:	'a':3.1409993188726414e-2	'b':3.1409993188726414e-2
Iteration 10:	'a':3.2242841245163134e-2	'b':3.2242841245163134e-2
Iteration 11:	'a':3.169303139540642e-2	'b':3.169303139540642e-2
Iteration 12:	'a':3.205599180403266e-2	'b':3.205599180403266e-2
Iteration 13:	'a':3.1816381221779294e-2	'b':3.1816381221779294e-2
Iteration 14:	'a':3.19745616452167e-2	'b':3.19745616452167e-2
Iteration 15:	'a':3.1870137850

In [56]:
emissionsExample2b = showIters showMap mapConverged $
  meanDescent 100 10 -- sample 100 symbolic sequences, and 10 emissions from each
  (eitherOr 0.5 (symbols "ab") (symbols "ba")) -- the generating and prior sequence model
  (Map.fromList [('a',0), ('b',0)]) -- the initial distribution means
  (Map.fromList [('a', (-0.5,4)), ('b', (0.5,4))]) -- the prior distributions over the means (mean, stddev)
  (Map.fromList [('a',-1),('b',1)]) -- the generating emission distribution means
  
emissionsExample2b

Iteration 0:	'a':6.854932378683323e-2	'b':0.4591743237868322
Iteration 1:	'a':-1.7980537409380013	'b':0.8374317887140226
Iteration 2:	'a':-0.4354854058077964	'b':1.2015769033491164
Iteration 3:	'a':-1.2627828491673299	'b':0.888973670886968
Iteration 4:	'a':-0.7659513335549879	'b':1.144654710797567
Iteration 5:	'a':-1.0755356996521863	'b':0.9574632101627943
Iteration 6:	'a':-0.881299972499825	'b':1.091177176282741
Iteration 7:	'a':-1.0047411851225896	'b':0.9981203492061896
Iteration 8:	'a':-0.9257056128908115	'b':1.062007310994669
Iteration 9:	'a':-0.9766711774556309	'b':1.0186216714601741
Iteration 10:	'a':-0.9436348736766563	'b':1.0478719047623846
Iteration 11:	'a':-0.9651406877001455	'b':1.0282588720408747
Iteration 12:	'a':-0.9510953341138412	'b':1.0413583821260737
Iteration 13:	'a':-0.9602916365861104	'b':1.0326348331902007
Iteration 14:	'a':-0.9542585524113164	'b':1.0384316509215674
Iteration 15:	'a':-0.9582223865740058	'b':1.034585901404014
Iteration 16:	'a':-0.9556151056327333	'

In [57]:
emissionsExample2c = showIters showMap mapConverged $
  meanDescent 100 10 -- sample 100 symbolic sequences, and 10 emissions from each
  (eitherOr 0.75 (symbols "ab") (symbols "ba")) -- the generating and prior sequence model
  (Map.fromList [('a',0), ('b',0)]) -- the initial distribution means
  (Map.fromList [('a', (0.0,4)), ('b', (0.0,4))]) -- the prior distributions over the means (mean, stddev)
  (Map.fromList [('a',-1),('b',1)]) -- the generating emission distribution means

emissionsExample2c

Iteration 0:	'a':-1.0382290118510533	'b':1.7532791648077055
Iteration 1:	'a':-0.9001168359318837	'b':0.6068345466332241
Iteration 2:	'a':-0.8919454952350013	'b':1.2643201697689705
Iteration 3:	'a':-0.967862831396625	'b':0.9007999009004626
Iteration 4:	'a':-0.8946070676147433	'b':1.1176416185642601
Iteration 5:	'a':-0.9556771872352574	'b':0.9872020726991502
Iteration 6:	'a':-0.9095463235251442	'b':1.0674975115947847
Iteration 7:	'a':-0.9429350295115143	'b':1.0174251037515782
Iteration 8:	'a':-0.9194777437354328	'b':1.0490651939020514
Iteration 9:	'a':-0.935661665688284	'b':1.0288762387783925
Iteration 10:	'a':-0.9246369095824277	'b':1.0418632648050747
Iteration 11:	'a':-0.9320822074017978	'b':1.0334570345760101
Iteration 12:	'a':-0.9270853093608795	'b':1.0389246218483155
Iteration 13:	'a':-0.9304241431837297	'b':1.0353552602886107
Iteration 14:	'a':-0.9282003547109309	'b':1.037691960173826
Iteration 15:	'a':-0.9296780212172162	'b':1.0361589917875158
Iteration 16:	'a':-0.9286978195981046