# Missing types
Perhaps the biggest way Julia differs from other programming languages in its handling of missing values is the number of types used to represent them. In Julia, an empty value can be stored in three different ways.

In [1]:
using DataFrames

The first of these is missing. What makes it unusual is that it has its own type, Missing, built into the Julia language. This means we can inspect it with typeof(), and we can also write methods that dispatch on it.

In [3]:
z = missing
println(typeof(z))
println(z)

Missing
missing


In [4]:
example(z::Int64) = println("It's an integer!")
example(z::Missing) = println("It's missing!")

example (generic function with 2 methods)

In [5]:
example(missing)
example(5)

It's missing!
It's an integer!
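Dispatch is only part of the picture. As a small sketch using only functions from Base, here is how missing tends to behave inside a collection; the variable names are just for illustration.

```julia
# A vector that mixes integers with `missing` gets a Union element type.
scores = [3, missing, 7]
println(eltype(scores))            # Union{Missing, Int64} on a 64-bit machine

# Most operations propagate `missing` rather than throwing an error.
println(sum(scores))               # missing

# `skipmissing` gives a lazy view that leaves the missing entries out.
println(sum(skipmissing(scores)))  # 10

# `coalesce` substitutes a fallback value for each `missing`.
filled = coalesce.(scores, 0)
println(filled)                    # [3, 0, 7]
```

Whether to skip or fill depends on the data; neither is the right default for every column.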


The next kind of missing value is one you might be used to, NaN. While in Python this might be provided by the NumPy library, in Julia it is built in. NaN is not a type of its own; it is a value of type Float64 (NaN32 and NaN16 are the Float32 and Float16 equivalents). It is therefore handled by every method defined along Number > Real > AbstractFloat > Float64. Julia does not convert between NaN and missing for you, so a continuous feature may well hold NaN where you would expect missing. It is also worth noting that NaN propagates: any arithmetic that touches a NaN returns NaN.

In [6]:
nan = NaN

NaN

In [7]:
println(typeof(nan))

Float64


In [9]:
h = 5
h += NaN

NaN

In [10]:
h

NaN
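To make the type hierarchy concrete, here is a minimal sketch using only Base. The function label_value is a hypothetical name made up for this example.

```julia
# Walk up the hierarchy one step at a time.
println(supertype(Float64))        # AbstractFloat
println(supertype(AbstractFloat))  # Real
println(supertype(Real))           # Number

# Each float width has its own NaN constant.
println(typeof(NaN32))             # Float32

# A method written for any Real accepts NaN as well, so it has to
# check for it explicitly with `isnan`.
label_value(x::Real) = isnan(x) ? "not a number" : "a number"
println(label_value(NaN))          # not a number
println(label_value(5))            # a number
```

Because NaN is just another float, dispatch alone cannot separate it from valid numbers the way it can with missing.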

The final kind of missing value is nothing, the only instance of the Nothing type. A null-like value of this sort is pretty conventional in programming, but it is not usually associated with missing values in data. It is certainly less common in that role than NaN or missing, but I have seen datasets that use it, so it is worth watching out for.

In [11]:
z = nothing

In [12]:
typeof(z)

Nothing

In [16]:
returnnothing() = nothing

returnnothing (generic function with 1 method)

In [19]:
@code_llvm returnnothing()

;  @ In[16]:1 within `returnnothing'
define void @julia_returnnothing_1691() {
top:
  ret void
}
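Outside of data, nothing usually shows up as the return value of a function that found no result. A short sketch with Base functions only (the letters vector is made up for the example):

```julia
letters = ["a", "b", "c"]

# `findfirst` returns `nothing` when no element matches.
idx = findfirst(==("z"), letters)
println(isnothing(idx))      # true

# `===` is the other common way to test for the `nothing` singleton.
println(idx === nothing)     # true

# `something` returns its first argument that is not `nothing`,
# which makes it handy for supplying a default.
println(something(idx, 0))   # 0
```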


# Real Processing
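Now that we have an understanding of the different kinds of missing values, let me show some ways to actually work with them. The first thing you should know about missing, NaN, and nothing is that only nothing behaves the way you might expect in an equality test. Comparison operators such as ==, <, and ≥ will not give you a usable true for the other two: every comparison involving NaN except != returns false, and every comparison involving missing returns missing, which cannot be used as the condition of an if statement. To test for these values, use isnan(), ismissing(), and isnothing() instead. Let's take a look at that in action: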

In [26]:
z = nothing

In [27]:
z == nothing

true

In [28]:
z = NaN

NaN

In [29]:
z == NaN

false

In [35]:
z = missing

missing

In [37]:
z == missing

missing

In [38]:
z = NaN
b = missing
if isnan(z) && ismissing(b)
    println("This is true")
end

This is true
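The cells above can be summarized in one sketch. It relies only on Base functions and shows the alternatives to == that always return a Bool.

```julia
# `==` follows three-valued logic for `missing` and IEEE 754 rules for NaN.
println(missing == missing)         # missing
println(NaN == NaN)                 # false

# `isequal` treats both as equal to themselves and always returns a Bool.
println(isequal(missing, missing))  # true
println(isequal(NaN, NaN))          # true

# `===` checks identity, the usual test for the `nothing` singleton.
println(nothing === nothing)        # true

# A `missing` condition cannot drive an `if` (it raises a TypeError),
# so `coalesce` is used here to turn it into a definite Bool first.
value = missing
if coalesce(value == 5, false)
    println("value is 5")
else
    println("value is not known to be 5")
end
```

Treating an unknown comparison as false is a choice made for this example; depending on the analysis, treating it as true or handling it separately may be more appropriate.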


In [41]:
df = DataFrame(:A => [5, 10, NaN, NaN, 25], :B => ["A", "B", "A", missing, missing])

Row,A,B
,Float64,String?
1,5.0,A
2,10.0,B
3,NaN,A
4,NaN,missing
5,25.0,missing


In [44]:
dropmissing!(df)

Row,A,B
,Float64,String
1,5.0,A
2,10.0,B
3,NaN,A


In [45]:
df

Row,A,B
,Float64,String
1,5.0,A
2,10.0,B
3,NaN,A


In [46]:
sum(df[!, :A])

NaN
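The sum comes back as NaN because dropmissing! only removes missing, so the NaN in row 3 survives and propagates through the reduction. As a rough sketch on a plain vector holding the same values as column A (no DataFrame needed), the NaN can be counted and filtered out first:

```julia
# Same values as column A after dropmissing!.
col = [5.0, 10.0, NaN]

# Count how many entries are NaN.
println(count(isnan, col))         # 1

# Drop the NaN entries before reducing.
println(sum(filter(!isnan, col)))  # 15.0
```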

In [60]:
dropbad!(df::DataFrame, col::Symbol) = filter!(col => x -> !(ismissing(x) || isnothing(x) || (x isa AbstractFloat && isnan(x))), df)

dropbad! (generic function with 2 methods)

In [61]:
dropbad!(df, :A)

Row,A,B
,Float64,String
1,5.0,A
2,10.0,B
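Another option is to normalize every kind of empty value to missing first, so that a single tool can deal with all of them. Below is a minimal sketch of that idea on a plain vector. The tomissing helper is a hypothetical name introduced here, not part of Base or DataFrames.

```julia
# Hypothetical helper: map every "empty" marker onto `missing`.
tomissing(x) = x                                      # ordinary values (and missing) pass through
tomissing(::Nothing) = missing                        # nothing becomes missing
tomissing(x::AbstractFloat) = isnan(x) ? missing : x  # NaN becomes missing

raw = Any[5.0, NaN, nothing, missing, 25.0]

# Broadcast the helper over the data.
cleaned = tomissing.(raw)

# From here on, only one kind of empty value needs handling.
println(collect(skipmissing(cleaned)))  # [5.0, 25.0]
println(sum(skipmissing(cleaned)))      # 30.0
```

With a DataFrame, the same helper could in principle be broadcast over a column (for example df.A = tomissing.(df.A)) before calling dropmissing!, though I have not shown that here.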
