Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats.cov throws if Stats.stdDev contains missing values #551

Open
Choc13 opened this issue Nov 3, 2022 · 2 comments
Open

Stats.cov throws if Stats.stdDev contains missing values #551

Choc13 opened this issue Nov 3, 2022 · 2 comments

Comments

@Choc13
Copy link
Contributor

Choc13 commented Nov 3, 2022

Calling Stats.cov on a frame for which Stats.stdDev returns <missing> values throws an exception due to mismatched matrix dimensions.

Here's an example that shows this:

#r "nuget: Deedle.Math"

open System
open Deedle
open Deedle.Math
open MathNet.Numerics.LinearAlgebra

let googReturns =
    series [ DateTime(2022, 10, 28) => 0.1
             DateTime(2022, 10, 29) => 0.11
             DateTime(2022, 10, 30) => 0.09 ]

let msftReturns =
    series [ DateTime(2022, 10, 29) => -0.01
             DateTime(2022, 10, 30) => 0.2 ]

let returns =
    Frame.ofColumns [ "GOOG" => googReturns
                      "MSFT" => msftReturns ]

// This one works fine because the stdDev is defined for all columns as they have 2+ values. It returns,
// DenseMatrix 2x2-Double
//      0.0001        0
//      0        0.02205
returns |> Stats.cov

let fbReturns =
    series [ DateTime(2022, 10, 30) => 0.01 ]

let returnsWithFb = returns |> Frame.addCol "FB" fbReturns

// This returns `series [ GOOG => 0.009999999999999796; MSFT => 0.148492424049175; FB => <missing>]`
returnsWithFb |> Stats.stdDev

// This subsequently throws
returnsWithFb |> Stats.cov

Which results in the following error:

System.ArgumentException: Matrix dimensions must agree: op1 is 2x2, op2 is 3x3.

I was expecting it to instead return something like

series [ GOOG => series [ GOOG => 9.999999999999591E-05; MSFT => 0; FB => <missing>]; MSFT => series [ GOOG => 0; MSFT => 0.022050000000000007; FB => <missing>]; FB => series [ GOOG => <missing>; MSFT => <missing>; FB => <missing>]]

I've only just started using this library, so perhaps this is the intended behaviour, but the docs lead me to believe that the philosophy of this library was to propagate missing values where possible.

I did inline and modify the Stats.cov function locally and got it to output the frame I was expecting with the <missing> values by changing it to:

let covMatrix (df: Frame<'R, 'C>) =
    // Treat nan as zero, the same as MATLAB
    let corr =
        df
        |> Stats.corrMatrix
        |> Matrix.map (fun x -> if Double.IsNaN x then 0. else x)

    let stdev =
        df
        |> Stats.stdDev
        |> Series.valuesAll // <-- This and the following line were the changes I made, I feel like there might be a neater / more efficient way to do this
        |> Seq.map (Option.defaultValue nan)
        |> Array.ofSeq

    let stdevDiag = DenseMatrix.ofDiagArray stdev
    stdevDiag * corr * stdevDiag

I'd be happy to open a PR for this change or similar if this <missing> value propagation behaviour is correct.

@zyzhu
Copy link
Contributor

zyzhu commented Nov 4, 2022

This is indeed a bug. Your fix is perfectly fine. Would you mind opening a PR? Thanks!

@Choc13
Copy link
Contributor Author

Choc13 commented Nov 4, 2022

Not a problem. Will submit a PR this afternoon. Thanks for confirming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants