# Statistical Learning Theory 

Exercises taken from the book *Learning from Data* by Yaser Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin.

In [1]:
using Random
using DataFrames

### Exercise 1.2


Suppose that we use a perceptron to detect spam messages. Let's say that each email message is represented by the frequency of occurrence of keywords, and the output is $+1$ if the message is considered spam.

(a) Can you think of some keywords that will end up with a large positive weight in the perceptron?<br>**Solution.** Words that are likely to end up with a large positive weight include: discount, price, save money, buy, best price, gift, free, click here.

(b) How about keywords that will get a negative weight?
<br>**Solution.** Words with negative weight might be: important, job, academic, meeting, opportunity, document.

(c) What parameter in the perceptron directly affects how many borderline messages end up being classified as spam?
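<br>**Solution.** The threshold, or equivalently the bias term $b$, is the parameter that most directly affects how many borderline messages are classified as spam, because it shifts the decision boundary: lowering the threshold (raising $b$) moves more borderline messages to the spam side.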
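The effect of the bias can be illustrated with a small sketch. Everything in it is made up for illustration: the three keywords, the frequency matrix `X`, the weights `w`, and the helper `classify` are assumptions, not values taken from the book.

```julia
# Hypothetical keyword frequencies for three emails; the columns stand for
# the made-up keywords "free", "discount" and "meeting".
X = [0.30 0.20 0.00;   # clearly spam-like
     0.10 0.05 0.05;   # borderline
     0.00 0.00 0.40]   # clearly legitimate

# Hypothetical weights: positive for spam-like words, negative for "meeting".
w = [2.0, 1.5, -2.0]

# Perceptron output per email: +1 (spam) if w ⋅ x + b > 0, otherwise -1.
classify(X, w, b) = [sum(w .* X[i, :]) + b > 0 ? 1 : -1 for i in 1:size(X, 1)]

strict  = classify(X, w, -0.5)   # high threshold: only the first email is spam
lenient = classify(X, w, -0.1)   # low threshold: the borderline email flips to spam

println("strict  = ", strict)
println("lenient = ", lenient)
```

Only the borderline email changes label between the two settings; the clearly spam-like and clearly legitimate emails are unaffected by this change in $b$.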

### Exercise 1.3

### Exercise 1.10
Here is an experiment that illustrates the difference between a single bin and multiple bins. Run a computer simulation for flipping 1,000 fair coins. Flip each coin independently 10 times. Let's focus on 3 coins as follows: $c_{1}$ is the first coin flipped; $c_{\text {rand }}$ is a coin you choose at random; $c_{\min }$ is the coin that had the minimum frequency of heads (pick the earlier one in case of a tie). Let $\nu_{1}, \nu_{\text {rand }}$ and $\nu_{\min }$ be the fraction of heads you obtain for the respective three coins.

In [13]:
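function flip_coins(total_coins)
    # Start with every coin showing tails (0)
    states = zeros(total_coins)
    # One uniform draw per coin; a draw above 0.5 counts as heads
    probs = rand(total_coins)
    head_indices = findall(x -> x > 0.5, probs)
    states[head_indices] .= 1
    return states
end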

function experiment(total_coins,total_flips)
    # Build a matrix to store every result 
    result_matrix = zeros(total_coins,total_flips)
    
    # Flip all coins repeatedly
    for i=1:total_flips
        # Flip all coins
        states = flip_coins(total_coins)
        # Find indices of heads
        heads_idxs = findall(==(1),states)
        # Update the current column in result matrix
        result_matrix[heads_idxs,i].=1
    end
    
    # Count number of heads in each row (coin)
    num_heads = [ count(==(1),result_matrix[i,:]) for i=1:total_coins ]
    # Find the coin with minimum rate of heads
    _ , c_min = findmin(num_heads)
    # Set a random coin
    c_rand = rand(1:total_coins)
    
    # Calculate the required rates
    v₁ = num_heads[1]/total_flips
    vᵣ = num_heads[c_rand]/total_flips
    vₘ = num_heads[c_min]/total_flips 
    
    return [v₁; vᵣ; vₘ; c_rand; c_min]
end  

experiment (generic function with 1 method)

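(a) What is $\mu$ for the three coins selected?
<br>**Solution**. All 1,000 coins are fair, so $\mu = 0.5$ for each of the three coins, regardless of how they were selected. The cell below runs the experiment once and averages the three observed fractions $\nu_{1}$, $\nu_{\text{rand}}$ and $\nu_{\min}$; the printed value is that sample average, not the probability $\mu$ itself.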

In [14]:
Random.seed!(1);

point_a = experiment(1000,10)
μ = sum(point_a[1:3])/3
print("μ = $μ")

μ = 0.3333333333333333

(b) Repeat this entire experiment a large number of times (e.g., 100,000 runs of the entire experiment) to get several instances of $\nu_{1}$, $\nu_{\text {rand }}$ and $\nu_{\min }$ and plot the histograms of the distributions of $\nu_{1}, \nu_{\text {rand }}$ and $\nu_{\min }$. Notice that which coins end up being $c_{\text {rand }}$ and $c_{\min }$ may differ from one run to another.
<br>**Solution**. The cell below repeats the experiment 100 times rather than 100,000, and stores each run as a vector $[\nu_{1}, \nu_{\text{rand}}, \nu_{\min}, c_{\text{rand}}, c_{\min}]$.

In [15]:
big_experiment = []
for i=1:100
    push!(big_experiment,experiment(1000,10))
end
Array(big_experiment)

100-element Vector{Any}:
 [0.6, 0.4, 0.1, 538.0, 240.0]
 [0.6, 0.6, 0.1, 191.0, 31.0]
 [0.8, 0.8, 0.0, 849.0, 610.0]
 [0.4, 0.4, 0.1, 428.0, 54.0]
 [0.6, 0.4, 0.0, 707.0, 142.0]
 [0.6, 0.5, 0.0, 829.0, 403.0]
 [0.4, 0.6, 0.1, 764.0, 12.0]
 [0.6, 0.5, 0.1, 48.0, 9.0]
 [0.7, 0.7, 0.1, 85.0, 130.0]
 [0.6, 0.6, 0.0, 299.0, 90.0]
 [0.2, 0.5, 0.0, 711.0, 347.0]
 [0.6, 0.7, 0.1, 534.0, 64.0]
 [0.7, 0.5, 0.1, 355.0, 74.0]
 ⋮
 [0.4, 0.6, 0.0, 265.0, 318.0]
 [0.5, 0.6, 0.0, 732.0, 88.0]
 [0.4, 0.6, 0.0, 459.0, 525.0]
 [0.3, 0.4, 0.1, 485.0, 38.0]
 [0.4, 0.3, 0.0, 918.0, 734.0]
 [0.4, 0.4, 0.0, 770.0, 560.0]
 [0.7, 0.5, 0.0, 571.0, 323.0]
 [0.6, 0.8, 0.1, 398.0, 147.0]
 [0.6, 0.5, 0.0, 761.0, 120.0]
 [0.5, 0.5, 0.1, 267.0, 155.0]
 [0.2, 0.6, 0.1, 666.0, 30.0]
 [0.4, 0.3, 0.0, 482.0, 13.0]
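
The exercise asks for histograms, so a possible next step is to tally how often each fraction occurs. The following is a minimal sketch using only the standard library; `run_once` and `tally` are hypothetical helpers written for this illustration (they are not the `experiment` function above), and 1,000 runs are used instead of 100,000 to keep it quick.

```julia
using Random

# Hypothetical helper: one run of the experiment, returning the number of
# heads for the first coin, a random coin, and the coin with fewest heads.
function run_once(rng, total_coins, total_flips)
    flips = rand(rng, Bool, total_coins, total_flips)   # true = heads
    heads = vec(sum(flips, dims=2))                      # heads per coin
    c_rand = rand(rng, 1:total_coins)
    return heads[1], heads[c_rand], minimum(heads)
end

# Hypothetical helper: counts[k + 1] = number of runs with exactly k heads.
function tally(head_counts, total_flips)
    counts = zeros(Int, total_flips + 1)
    for k in head_counts
        counts[k + 1] += 1
    end
    return counts
end

rng = MersenneTwister(1)
runs = [run_once(rng, 1000, 10) for _ in 1:1000]

hist_first = tally(first.(runs), 10)
hist_rand  = tally(getindex.(runs, 2), 10)
hist_min   = tally(last.(runs), 10)

# One row per possible fraction ν = k / 10, with the three counts side by side.
for k in 0:10
    println(k / 10, "  ", hist_first[k + 1], "  ", hist_rand[k + 1], "  ", hist_min[k + 1])
end
```

Each count vector can be passed to a bar plot in any plotting package. The tallies for $\nu_{1}$ and $\nu_{\text{rand}}$ should spread around 0.5, while those for $\nu_{\min}$ should pile up at 0 and 0.1, in line with the third column of the output above.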

In [5]:
data = DataFrame(big_experiment, [:v₁,:vᵣ,:vₘ,:c_rand,:c_min])

LoadError: ArgumentError: columns argument must be a vector of AbstractVector objects

In [6]:
?DataFrame()

```
DataFrame <: AbstractDataFrame
```

An AbstractDataFrame that stores a set of named columns

The columns are normally AbstractVectors stored in memory, particularly a Vector or CategoricalVector.

# Constructors

```julia
DataFrame(pairs::Pair...; makeunique::Bool=false, copycols::Bool=true)
DataFrame(pairs::AbstractVector{<:Pair}; makeunique::Bool=false, copycols::Bool=true)
DataFrame(ds::AbstractDict; copycols::Bool=true)
DataFrame(kwargs..., copycols::Bool=true)

DataFrame(columns::AbstractVecOrMat,
          names::AbstractVector;
          makeunique::Bool=false, copycols::Bool=true)

DataFrame(table; copycols::Union{Bool, Nothing}=nothing)
DataFrame(::DataFrameRow)
DataFrame(::GroupedDataFrame; keepkeys::Bool=true)
```

# Keyword arguments

  * `copycols` : whether vectors passed as columns should be copied; by default set to `true` and the vectors are copied; if set to `false` then the constructor will still copy the passed columns if it is not possible to construct a `DataFrame` without materializing new columns. Note the `copycols=nothing` default in the Tables.jl compatible constructor; it is provided as certain input table types may have already made a copy of columns or the columns may otherwise be immutable, in which case columns are not copied by default. To force a copy in such cases, or to get mutable columns from an immutable input table (like `Arrow.Table`), pass `copycols=true` explicitly.
  * `makeunique` : if `false` (the default), an error will be raised

(note that not all constructors support these keyword arguments)

# Details on behavior of different constructors

It is allowed to pass a vector of `Pair`s, a list of `Pair`s as positional arguments, or a list of keyword arguments. In this case each pair is considered to represent a column name to column value mapping and column name must be a `Symbol` or string. Alternatively a dictionary can be passed to the constructor in which case its entries are considered to define the column name and column value pairs. If the dictionary is a `Dict` then column names will be sorted in the returned `DataFrame`.

In all the constructors described above column value can be a vector which is consumed as is or an object of any other type (except `AbstractArray`). In the latter case the passed value is automatically repeated to fill a new vector of the appropriate length. As a particular rule values stored in a `Ref` or a `0`-dimensional `AbstractArray` are unwrapped and treated in the same way.

It is also allowed to pass a vector of vectors or a matrix as as the first argument. In this case the second argument must be a vector of `Symbol`s or strings specifying column names, or the symbol `:auto` to generate column names `x1`, `x2`, ... automatically. Note that in this case if the first argument is a matrix and `copycols=false` the columns of the created `DataFrame` will be views of columns the source matrix.

If a single positional argument is passed to a `DataFrame` constructor then it is assumed to be of type that implements the [Tables.jl](https://github.com/JuliaData/Tables.jl) interface using which the returned `DataFrame` is materialized.

Finally it is allowed to construct a `DataFrame` from a `DataFrameRow` or a `GroupedDataFrame`. In the latter case the `keepkeys` keyword argument specifies whether the resulting `DataFrame` should contain the grouping columns of the passed `GroupedDataFrame` and the order of rows in the result follows the order of groups in the `GroupedDataFrame` passed.

# Notes

The `DataFrame` constructor by default copies all columns vectors passed to it. Pass the `copycols=false` keyword argument (where supported) to reuse vectors without copying them.

By default an error will be raised if duplicates in column names are found. Pass `makeunique=true` keyword argument (where supported) to accept duplicate names, in which case they will be suffixed with `_i` (`i` starting at 1 for the first duplicate).

If an `AbstractRange` is passed to a `DataFrame` constructor as a column it is always collected to a `Vector` (even if `copycols=false`). As a general rule `AbstractRange` values are always materialized to a `Vector` by all functions in DataFrames.jl before being stored in a `DataFrame`.

`DataFrame` can store only columns that use 1-based indexing. Attempting to store a vector using non-standard indexing raises an error.

The `DataFrame` type is designed to allow column types to vary and to be dynamically changed also after it is constructed. Therefore `DataFrame`s are not type stable. For performance-critical code that requires type-stability either use the functionality provided by `select`/`transform`/`combine` functions, use `Tables.columntable` and `Tables.namedtupleiterator` functions, use barrier functions, or provide type assertions to the variables that hold columns extracted from a `DataFrame`.

# Examples

```jldoctest
julia> DataFrame((a=[1, 2], b=[3, 4])) # Tables.jl table constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> DataFrame([(a=1, b=0), (a=2, b=0)]) # Tables.jl table constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> DataFrame("a" => 1:2, "b" => 0) # Pair constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> DataFrame([:a => 1:2, :b => 0]) # vector of Pairs constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> DataFrame(Dict(:a => 1:2, :b => 0)) # dictionary constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> DataFrame(a=1:2, b=0) # keyword argument constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> DataFrame([[1, 2], [0, 0]], [:a, :b]) # vector of vectors constructor
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0

julia> DataFrame([1 0; 2 0], :auto) # matrix constructor
2×2 DataFrame
 Row │ x1     x2
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      0
```
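
According to the help text, the vector-of-vectors constructor treats each inner vector as one column, whereas each entry of `big_experiment` is one run, that is, one row. One way around this, sketched here under that reading, is to regroup the runs into named columns first. The `runs` vector is a three-row stand-in copied from the output above, and `colnames` and `cols` are names introduced for this illustration.

```julia
# Stand-in for `big_experiment`: each entry is one run, laid out as
# [v₁, vᵣ, vₘ, c_rand, c_min] (first three rows of the output above).
runs = [
    [0.6, 0.4, 0.1, 538.0, 240.0],
    [0.6, 0.6, 0.1, 191.0, 31.0],
    [0.8, 0.8, 0.0, 849.0, 610.0],
]

# Column names, in the same order as the entries of each run.
colnames = (:v₁, :vᵣ, :vₘ, :c_rand, :c_min)

# Regroup by column: column j collects entry j of every run.
cols = NamedTuple{colnames}(Tuple([r[j] for r in runs] for j in 1:length(colnames)))

println(cols.v₁)
println(cols.c_min)

# With DataFrames loaded, a NamedTuple of equal-length vectors has the same
# shape as the first example in the help text, so it could be passed on as:
# data = DataFrame(cols)
```

The final call is left as a comment so the sketch runs without any package; it relies on the Tables.jl table constructor shown in the first example of the help output.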
