Describe the bug
I tried using `FSharp.Stats.Distributions.Empirical.create 1. dataPoints` to bin my data points with a bandwidth of 1. Afterwards I wanted to sample from the distribution via `Distributions.Empirical.random`, but roughly every other call returned the following error:
> System.Collections.Generic.KeyNotFoundException: An index satisfying the predicate was not found in the collection.
>    at Microsoft.FSharp.Collections.SeqModule.Find[T](FSharpFunc`2 predicate, IEnumerable`1 source)
>    at <StartupCode$FSI_0249>.$FSI_0249.main@() in D:\Freym\Source\Repos\Freymaurer\Jupyter_PraktikumBiotech\Scripts\JP06_Retention_time_and_scan_time.fsx:line 86
> Stopped due to error
To Reproduce
Steps to reproduce the behavior:
open BioFSharp
open BioFSharp.IO

let file = "Chlamy_JGI5_5(Cp_Mp).fasta" // can be found in BioFSharp.Mz

let sequences =
    file
    |> FastA.fromFile BioArray.ofAminoAcidString
    |> Seq.toArray

// Tryptic digest of every protein, allowing no missed cleavages
let digestedProteins =
    sequences
    |> Array.mapi (fun i fastAItem ->
        Digestion.BioArray.digest Digestion.Table.Trypsin i fastAItem.Sequence
        |> Digestion.BioArray.concernMissCleavages 0 0
    )
    |> Array.concat

// Monoisotopic peptide masses (including water), keeping only masses below 3000
let digestedPeptideMasses =
    digestedProteins
    |> Array.choose (fun peptide ->
        let mass = BioSeq.toMonoisotopicMassWith (BioItem.monoisoMass ModificationInfo.Table.H2O) peptide.PepSequence
        if mass < 3000. then Some mass else None
    )

let empHis = FSharp.Stats.Distributions.Empirical.create 1. digestedPeptideMasses
Expected behavior
Should produce a distribution from which one can randomly sample according to the probabilities.
Additional context
I already looked into the `FSharp.Stats.Distributions.Empirical` module. I think `Distributions.Empirical.random` is fine, but `FSharp.Stats.Distributions.Empirical.create` calculates the `area` subfunction incorrectly; the calculation seems unintuitive.
For my example above, the sum of all probabilities was `float = 0.6921621451`, so every time the randomly generated `target` in `Distributions.Empirical.random` was above ~0.69, it returned the error.
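To illustrate why a PMF that sums to ~0.69 fails, here is a minimal sketch of inverse-transform sampling over a `Map<float, float>` PMF. It assumes `Empirical.random` works roughly along these lines (we have not checked its source); `pickByCumulative` is a hypothetical helper, not the FSharp.Stats API. The target is passed in explicitly so the behaviour is deterministic.

```fsharp
open System
open System.Collections.Generic

/// Hypothetical helper (not the FSharp.Stats API): returns the first bin whose
/// cumulative probability reaches `target`. A sampler would pass a uniform draw in [0, 1).
let pickByCumulative (target: float) (pmf: Map<float, float>) : float =
    pmf
    |> Seq.scan (fun (_, acc) (KeyValue (bin, p)) -> (bin, acc + p)) (nan, 0.0)
    |> Seq.skip 1                                             // drop the scan seed
    |> Seq.find (fun (_, cumulative) -> cumulative >= target) // throws KeyNotFoundException if none match
    |> fst

// A PMF that sums to one: every target in [0, 1) finds a bin
let okPmf = Map.ofList [ 1.0, 0.4; 2.0, 0.6 ]
let rnd = Random(42)
printfn "sampled %f" (pickByCumulative (rnd.NextDouble()) okPmf)

// A PMF that only sums to 0.69 (as in the report) fails for any target above 0.69
let brokenPmf = Map.ofList [ 1.0, 0.30; 2.0, 0.39 ]
match (try Some (pickByCumulative 0.8 brokenPmf) with :? KeyNotFoundException -> None) with
| Some bin -> printfn "sampled %f" bin
| None -> printfn "KeyNotFoundException: cumulative probability never reached 0.8"
```

With a correctly normalized PMF the last cumulative value is 1.0, so `Seq.find` always succeeds for a target drawn from [0, 1).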
You are right, the creation of the probability mass function (PMF) is incorrect.
Since the PMF must sum to one, all bin counts should be divided by the total count of observations. Instead, they are divided by an area, and this area calculation seems to be faulty.
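The area is computed with the trapezoidal rule over consecutive bins, `acc + (abs (x1 - x0)) * ((y0 + y1) / 2.)`. If empty bins lie between two occupied bins, the trapezoid spans the whole gap and the area is overestimated.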
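A small sketch makes the effect concrete. `trapezoidArea` is a hypothetical re-implementation of the quoted expression over a `Map` of bin keys to counts, not the library's actual code; only occupied bins appear as keys, as is typical for a count map.

```fsharp
/// Hypothetical re-implementation of the quoted trapezoid expression over (bin, count) pairs.
let trapezoidArea (bins: Map<float, float>) : float =
    bins
    |> Map.toList                 // sorted by bin key
    |> List.pairwise
    |> List.fold (fun acc ((x0, y0), (x1, y1)) -> acc + (abs (x1 - x0)) * ((y0 + y1) / 2.)) 0.

// Two occupied bins separated by nine empty bins (bandwidth 1)
let gapped = Map.ofList [ 0., 1.; 10., 1. ]
let area = trapezoidArea gapped                  // 10.0, although only 2 observations exist
let total = gapped |> Map.fold (fun acc _ c -> acc + c) 0.
printfn "area = %.1f, sum of count/area = %.2f" area (total / area)
```

Dividing the counts by this area yields probabilities that sum to 0.2 instead of 1. Note that the area is not a valid normalizer even without gaps: for two adjacent bins with one count each it is 1.0, so count/area would sum to 2.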
It is sufficient to divide the bin counts by the total number of items. A fix is prepared and will be pushed soon.
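Dividing by the number of observations could look like the following sketch. `createPmf` is hypothetical, and the binning rule (keying each bin by its lower edge, `floor (x / bandwidth) * bandwidth`) is an assumption; FSharp.Stats may bin differently.

```fsharp
/// Hypothetical sketch of the proposed fix: bin by bandwidth, then divide each count
/// by the total number of observations so the PMF sums to one.
/// ASSUMPTION: bins are keyed by their lower edge; the library's rule may differ.
let createPmf (bandwidth: float) (data: float[]) : Map<float, float> =
    let n = float data.Length
    data
    |> Array.countBy (fun x -> floor (x / bandwidth) * bandwidth)
    |> Array.map (fun (bin, count) -> bin, float count / n)
    |> Map.ofArray

// Bins 0.0 -> 0.5, 1.0 -> 0.25, 9.0 -> 0.25; the empty bins 2..8 are simply absent
let examplePmf = createPmf 1. [| 0.2; 0.7; 1.5; 9.9 |]
printfn "sum = %f" (examplePmf |> Map.fold (fun acc _ p -> acc + p) 0.)
```

Because the denominator is the observation count rather than a geometric area, gaps between occupied bins no longer affect the total.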
There are functions to normalize the Map generated by `Empirical.create`. I think the normalization should be incorporated into the `create` function to avoid confusion about probabilities that do not sum to 100%.
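For reference, normalizing an existing map amounts to dividing every value by the sum of all values. The sketch below uses a hypothetical `normalizePmf`; it is not necessarily how the FSharp.Stats normalization functions are named or implemented.

```fsharp
/// Hypothetical helper: rescales a PMF so its values sum to one, preserving their ratios.
let normalizePmf (pmf: Map<float, float>) : Map<float, float> =
    let total = pmf |> Map.fold (fun acc _ p -> acc + p) 0.
    if total <= 0. then invalidArg (nameof pmf) "PMF values must have a positive sum"
    pmf |> Map.map (fun _ p -> p / total)

// A map that sums to 0.69, as in the report, rescaled to sum to one
let rescaled = normalizePmf (Map.ofList [ 1.0, 0.30; 2.0, 0.39 ])
printfn "%A" rescaled
```

Calling such a step inside `create` would guarantee that the cumulative probability used during sampling ends at 1.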