# Missing Values

Missing values are indicated by ```NaN``` (floats only) or ```missing``` (for most types).

# Load Packages

In [1]:
using Compat, Missings        #in Julia 0.6 
#using Dates                  #in Julia 0.7

include("printmat.jl")   #just a function for prettier matrix printing

printlnPs (generic function with 2 methods)

# The Effect of NaNs

Most computations involving NaNs give NaN as the result.

NaNs are often used to represent missing data.

In [2]:
a = 2.0
b = NaN
println(a+b)

NaN
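As a small additional sketch (using only Base Julia, with the same `a` and `b` as above), the lines below illustrate two things worth knowing: NaN propagates through arithmetic, and a NaN is not equal to anything, including itself. That is why you test for it with `isnan` rather than with `==`.

```julia
a = 2.0
b = NaN

println(a*b)              #NaN, arithmetic with a NaN gives NaN
println(b == b)           #false, a NaN is not equal to itself
println(isnan(b))         #true, the right way to test for a NaN
println(isequal(b,b))     #true, isequal treats NaNs as equal
```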


## NaNs in a Matrix

The code below shows that if a matrix contains NaNs, then many calculations (e.g. summing all elements) give NaN as the result.

The NaN itself is a Float64 "number". For other types of data, you may want to use ```missing``` (see below).

In [3]:
z = [1.0 NaN;
     2.0 12.0;
     3.0 13.0]                 #a matrix with NaNs
println("z: ")
printmat(z)

if any(isnan.(z))                      #check if any NaNs
  println("z has some NaNs")
end

println("\nThe average of each column: ")
printmat(Compat.mean(z,dims=1))                #0.7 syntax

z: 
     1.000       NaN
     2.000    12.000
     3.000    13.000

z has some NaNs

The average of each column: 
     2.000       NaN
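If you instead want the column averages based on the non-NaN elements only, one possible approach is to filter each column before averaging. The sketch below assumes Julia 1.x, where `mean` lives in the `Statistics` standard library (not the `Compat` setup used in the cell above), and the name `colmeans` is just chosen for this illustration.

```julia
using Statistics          #assumption: Julia 1.x, where mean is in Statistics

z = [1.0 NaN;
     2.0 12.0;
     3.0 13.0]

#average of each column, using only the elements that are not NaN
colmeans = [mean(filter(!isnan, z[:,j])) for j in 1:size(z,2)]
println(colmeans)         #[2.0, 12.5]
```

Notice that the two averages are now based on different numbers of observations (3 and 2), which may or may not be what you want.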



# Getting Rid of NaNs

It is a common procedure in statistics to throw out all cases with NaNs. For instance, if `z[t,:]` is the data for period $t$ and it contains one or more `NaN` values, then it is common to throw out that entire row.

(This is a reasonable approach if it can be argued that the data is missing at random - that is, for reasons unrelated to the subject of the investigation. It is much less reasonable if, for instance, the returns of all poorly performing mutual funds are listed as "missing" - and you want to study which fund characteristics drive performance.)

The code below shows a simple way of doing that.

In [4]:
va = isnan.(z)                   #va[i,j]=true if z[i,j] is NaN
vb = vec(Compat.any(va,dims=2))  #0.7 syntax, indicates rows with NaNs, vec to make it a vector
vc = .!vb                        #indicates rows without NaN

z2 = z[vc,:]                #keep only rows without NaNs
println("z2: a new matrix where all rows with any NaNs have been pruned:")
printmat(z2)

z2: a new matrix where all rows with any NaNs have been pruned:
     2.000    12.000
     3.000    13.000
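The same steps can be wrapped in a small function. This is only a sketch: `droprows_nan` is a hypothetical helper name (not part of Julia or of `printmat.jl`). It also returns the vector of kept rows, which is handy if the same rows should be dropped from another matrix (for instance, the regressors that go with `z`).

```julia
#hypothetical helper: keeps the rows of x that contain no NaN
function droprows_nan(x)
    keep = [!any(isnan, x[i,:]) for i in 1:size(x,1)]   #true for rows without NaNs
    return x[keep,:], keep
end

z = [1.0 NaN;
     2.0 12.0;
     3.0 13.0]

z2, keep = droprows_nan(z)
println(keep)             #Bool[0, 1, 1], the first row is dropped
println(z2)               #[2.0 12.0; 3.0 13.0]
```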



# Missings 

Missings can be used to indicate missing values for most types (not just floats). They are built into Julia 0.7 and available in the Missings package for earlier versions.

You typically have to remove them before making calculations on the matrix.

In [6]:
z = [1 missing;
     2 12;
     3 13]                 #a matrix of integers with missing values
println("z: ")
printmat(z)

z: 
         1   missing
         2        12
         3        13
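A short sketch of how a single `missing` behaves (assuming Julia 0.7 or later, where `missing` is built in). Like NaN, it propagates through arithmetic. Unlike NaN, a comparison with `missing` gives `missing` rather than `false`, so using such a comparison directly in an `if` statement gives an error. Use `ismissing` for testing and `coalesce` for replacing a missing with a default value.

```julia
x = missing

println(1 + x)            #missing, arithmetic with a missing gives missing
println(x == 1)           #missing, not false
println(ismissing(x))     #true, the right way to test for a missing
println(coalesce(x, 0))   #0, replaces a missing with a default value
```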



In [7]:
if any(ismissing.(z))                      #check if any missings
  println("z has some missings")
end

z has some missings


In [8]:
vc = .!vec(Compat.any(ismissing.(z),dims=2))  #0.7 syntax

z2 = z[vc,:]                #keep only rows without missings
println("z2: a new matrix where all rows with any missings have been pruned:")
printmat(z2)

z2: a new matrix where all rows with any missings have been pruned:
         2        12
         3        13
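As an alternative to pruning entire rows, `skipmissing` creates an iterator that just skips the missing elements. The sketch below assumes Julia 1.x (with `mean` from the `Statistics` standard library), and `colmeans` is a name chosen only for this illustration.

```julia
using Statistics          #assumption: Julia 1.x, where mean is in Statistics

z = [1 missing;
     2 12;
     3 13]

println(sum(skipmissing(z)))      #31, sum of all non-missing elements

#average of each column, skipping the missing elements
colmeans = [mean(skipmissing(z[:,j])) for j in 1:size(z,2)]
println(colmeans)                 #[2.0, 12.5]
```

In contrast to the row pruning above, this keeps the value in `z[1,1]`, so the columns are summarised over different numbers of observations.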

