Should Empirical PMF functions be usable with generic keys? #245
I will have a look at this. There may be a performance advantage in restricting the functions explicitly to float keys. If so, there should be additional "generic" functions. I'll test it and make the functions usable for non-float lists as well.
That you cannot access letters that do not occur in your text is hard to work around in the module. There are many possible alphabets that could be considered (upper case, lower case, äüö, special characters, numbers). I assume you have to define your desired set of characters separately:
let myAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()
With this at hand you can use the following as a template and just replace the counts of the characters that occur in your text.
#r "nuget: FSharp.Stats"
#r "nuget: Plotly.NET"
open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET
let myAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()
let myTextMap =
    "mississippi".ToCharArray()
    |> List.ofArray
    |> Frequency.createGeneric
let myFinalMap =
    // use your own defined alphabet to include the desired set of characters
    myAlphabet
    |> Array.map (fun key ->
        // if the text contains the current character, its value is used
        if myTextMap.ContainsKey key then
            key,myTextMap.[key]
        // if the text does NOT contain the current character, set its count to 0
        else
            key,0
        )
    |> Map.ofArray
// access the character frequencies
myFinalMap.['z'] // 0
myFinalMap.['s'] // 4
// visualization
myFinalMap
|> Map.toArray
|> Chart.Column
|> Chart.withSize (1000.,500.) // quick way to depict all characters
|> Chart.show
I'll comment if I have any news.
I fixed the issue, tested the
still missing
Usage
You can build the binaries yourself or wait for the next FSharp.Stats release. Define the set of characters to search for:
#r @"<PathToFSharp.Stats>\FSharp.Stats\src\FSharp.Stats\bin\Release\netstandard2.0\FSharp.Stats.dll"
#r "nuget: Plotly.NET"
open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET
let letters = "Mississippi"
// Define your set of characters that should be checked for
// Any character that is not present in these sets is ignored
let myAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" |> Set.ofSeq
let mySmallAlphabet = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq
These alphabets can be used to create the probability maps.
// takes the characters and determines their probabilities without considering non-existing characters
let myFrequencies0 = EmpiricalDistribution.createNominal() letters
//takes upper and lower case characters and determines their probability
let myFrequencies1 = EmpiricalDistribution.createNominal(Template=myAlphabet) letters
//takes only lower case characters and determines their probability
let myFrequencies2 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet) letters
An additional field for transforming the input sequence may be beneficial if it does not matter whether a character is lower case or upper case:
// converts all characters to lower case and determines their probability
let myFrequencies3 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet,Transform=System.Char.ToLower) letters
// check probability of non existing characters, that are within the search scope (Template alphabet)
myFrequencies3.['z'] // returns 0.0
Visualization
[
    Chart.Column(myFrequencies0 |> Map.toArray,"noTemplate") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies1 |> Map.toArray,"bigAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies2 |> Map.toArray,"smallAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies3 |> Map.toArray,"toLower + smallAlphabet") |> Chart.withYAxisStyle "probability"
]
|> Chart.Grid(4,1)
|> Chart.withTemplate ChartTemplates.lightMirrored
|> Chart.withTitle letters
|> Chart.withSize(1000.,900.)
|> Chart.show
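To make the Template and Transform behaviour described above concrete, here is a minimal sketch in plain F# without any library dependency. It is not the FSharp.Stats implementation: `nominalPmf` is a hypothetical helper, and it assumes that characters outside the template are dropped before the probabilities are normalised.

```fsharp
// Hypothetical helper, NOT the FSharp.Stats implementation.
// Builds a probability map over arbitrary comparable keys.
let nominalPmf (template: Set<'k> option) (transform: 'k -> 'k) (data: seq<'k>) : Map<'k, float> =
    // Apply the transform first, then drop keys outside the template (if one is given).
    let kept =
        data
        |> Seq.map transform
        |> Seq.filter (fun k ->
            match template with
            | Some t -> Set.contains k t
            | None -> true)
        |> Seq.toArray
    let total = float kept.Length
    // Relative frequency of every key that actually occurs.
    let observed =
        kept
        |> Array.countBy id
        |> Array.map (fun (k, c) -> k, float c / total)
        |> Map.ofArray
    // Template keys that never occur are added with probability 0.0.
    match template with
    | Some t ->
        t |> Set.fold (fun acc k -> if Map.containsKey k acc then acc else Map.add k 0.0 acc) observed
    | None -> observed

let smallAlphabet = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq
let pmf = nominalPmf (Some smallAlphabet) System.Char.ToLower "Mississippi"
let pOfS = pmf.['s'] // 4 of the 11 retained characters
let pOfZ = pmf.['z'] // in the template but not in the text
```

Because every template key ends up in the map, a direct lookup such as `pmf.['z']` no longer throws for characters that are missing from the text.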
A prerelease is published and can be used:
The documentation that contains the same information as this thread can be found here.
Thanks Benedikt, nice solution!
I can create this
But then I can't get the probability for a specific value, as all functions except ofHistogram take a float as the map key.
I can work around this by querying the map directly with letters["i"]. But then letters["z"] returns an error instead of zero.
I would prefer to use probabilityAt, but it expects Map<float,float>. Should this function be generic, or have I missed something?
thanks
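For illustration, the behaviour asked for here can be sketched with a small key-generic lookup. `probabilityAtGeneric` is a hypothetical name and not part of FSharp.Stats, and the `letters` map is built by hand with string keys to mirror the `letters["i"]` lookup in the question.

```fsharp
// Hypothetical helper: a key-generic lookup that returns 0.0 for unseen keys
// instead of raising a KeyNotFoundException.
let probabilityAtGeneric (key: 'k) (pmf: Map<'k, float>) : float =
    pmf |> Map.tryFind key |> Option.defaultValue 0.0

let text = "mississippi"

// Probability map with string keys, built without any library.
let letters =
    text
    |> Seq.countBy string
    |> Seq.map (fun (k, c) -> k, float c / float text.Length)
    |> Map.ofSeq

let pI = probabilityAtGeneric "i" letters // 4 of 11 characters
let pZ = probabilityAtGeneric "z" letters // key is absent, so 0.0
```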