# Introduction to Machine Learning through Julia

In this subject, the practice is focused on the study of different Machine Learning models. The aim is to understand how they work and how to use them to solve real-world problems. The main points to cover will be: 

* Artificial Neuron Networks
* Support Vector Machines 
* Decision Trees 
* k-Nearest Neighbours (KNN)
* Ensemble Approaches

To cover all these concepts, the programming language Julia will be used. The latter, being a language mainly for development and research, will allow us to indeep the whole process of using the aforementioned systems.  In a first approach, in order to gain familiarity with the language, an ANN training will be implemented. This will also serve as an introduction to ANNs, their correct definition and training.

From a general view, the practicum will cover two tasks in parallel: 

* In the first weeks, through tutorials like this one, coding tasks of different parts of a Machine Learning system, such as a cross validation or the metrics to be used, will be performed. Given that during these first weeks the degree of development of the practical application will be very restricted, the practices will take a well-known problem, the iris flower classification problem, as a benchmark.  However, as the code is developed in these tutorials, it is expected to be integrated into the final system. In other words, in each practice, a part of the final system will be developed.


* In parallel, each working team will have to carry out the tasks inherent to the problem they have set out to solve.  These tasks include data acquisition, data analysis, loading the database in Julia, etc.  As the tutorials progress, these advances must be integrated and applied to solve the problem, which will be reflected in the different memory deliverables. 

Therefore, during the first weeks there will be a double objective: on the one hand to learn how to implement the different parts of a machine learning system, and on the other hand to start working on the proposed practical application. For this purpose, the code that is being developed will be integrated incrementally. Once the tutorials have been completed, the work will focus solely on solving the proposed problem, using different approaches.

As previously mentioned, the iris flower problem will be used as an illustrative problem for the development of the code in the tutorials.  This is possibly the best known database in the field of pattern recognition.  This database was published by Fisher in 1936 as an example of linear discriminant analysis, and since then it has been used on many occasions as a benchmark for new systems or simply for learning. This database contains 150 instances belonging to 3 classes: Iris Setosa, Iris Versicolor and Iris Virginica. Each of these classes has 50 instances belonging only to that class giving a simple classification problem.  It should be noted that one of these classes is linearly separable from the other two, while they are not linearly separable from each other.  Each instance consists of 4 attributes, which are lengths and widths of sepals and petals, measured in centimetres.

The database to be used in these practices can be downloaded from the UCI website, at the following address: [http://archive.ics.uci.edu/dataset/53/iris](http://archive.ics.uci.edu/dataset/53/iris) ,specifically, the file to be downloaded is called "iris.data". 

This first tutorial aims to install and become familiar with Julia. To do so, a code will be developed to load this database in Julia and perform a basic preprocessing. This basic preprocessing has to do with the use of numerical inputs and/or outputs instead of categorical ones, and the normalisation of the data. 

With respect to the treatment of categorical values, this is a very common step, since many models such as NRs do not accept categorical inputs and outputs, but only work with numerical values. In contrast, many of the databases have categorical inputs and/or outputs rather than numerical ones. Therefore, in order for models that only accept numerical inputs and outputs to process them, it is necessary to convert these categorical values into numerical values: 


* If there are only two categories, e.g. true/false, green/blue, wood/metal or expensive/cheap, that attribute is transformed into a single attribute, which takes the value false or 0 for one category and true or 1 for the other. 

* If there are more than two categories, for example red/green/blue, wood/metal/plastic or car/boat/plane/train, it is transformed into as many attributes as there are possibilities, one for each category, with value 1 for those instances that belong to it and 0 for those that do not.  For example, in the red/green/blue case, the patterns with value "red" will become (1, 0, 0), the "green" ones (0, 1, 0) and the "blue" ones (0, 0, 1). 

* A third possibility, when there are more than two categories, is to convert them into a single real number.  For example, A/B/C/D could become 0/0.33/0.66/1. However, this case is only interesting when in the real world there is an order A < B < C < D, and is therefore not applicable in the case of iris flowers.

In the case of the iris flower database, this situation occurs only in the desired output, which needs to be encoded. 

With respect to data normalisation, training a model will be much faster if the inputs provided are on the same scale, i.e. if the model is spared from having to learn the relationship between the scales on which each attribute moves.  This process of converting the inputs so that they are all in the same range is called **normalisation** or **standarization**, and is one of the most common and important types of pre-processing.  This type of preprocessing allows a simpler model to solve more complex problems than without it, since it does not need to use part of it to learn the relationship between the scales of the input attributes. 

### Question 1.1
> ❓ Would this pre-processing be necessary when the inputs are the intensity values of each pixel in a black and white image? Why?

`Answer here`

There are many other types of pre-processing, such as noise clean-up, PCA analysis, etc.  Some of them will be covered in depth in other subjects.  Regarding normalisation, more is explained in theory classes, but in practice one of the following two types should be enough.  of these two types is used in practice:

1. Normalisation between maximum and minimum. For each attribute, it takes the lowest ($min$) and highest ($max$) values, and changes all $v$ values to pass them to the new interval $[newmin, newmax]$ as follows:<br/><br/> $$v' = \frac{v-min}{max-min}\times(newmax-newmin) + newmin$$ <br/>Generally, one usually moves to an interval between $[0, 1]$, which simplifies the equation to:<br/><br/> $$v' = \frac{v-min}{max-min}$$ <br/> This type of normalisation is appropriate when you are certain that the data is bounded (both top and bottom), i.e. it is within an interval.  What you are doing is changing the interval of the data to the interval $[0, 1]$ and matching each data to its new value within the new interval.  However, if you suspect that one of these data might fall outside the interval and take an excessively high or low value, this transformation can be very harmful.  In this case, this outlier would be assigned a new value of 1 if it is excessively high, and the rest of the values would oscillate close to 0 with little difference between them. On the other hand,  the value would be 0 if it is excessively low, while the rest of the values would be close to 1 with little difference between them. If it is suspected that there may be cases like this, another kind of normalization should be chosen.


### Question 1.2
> ❓ In the particular case that $min=max$, a different preprocessing can be performed on this attribute, what would it consist of?

`Answer here`

2. Normalisation to the mean 0. This kind of normalisation is more robust to outliers.  For each attribute, the mean and standard deviation of all the values it takes are taken, and a simple transformation is made:<br/><br/>$$v'=\frac{x-\mu}{\sigma}$$<br/> Thus, each attribute will have a mean ($\mu$) of 0 and a standard deviation ($\sigma$) of 1. Some values will fall outside this range, but this is not a problem for ANNs, which simply accept real values as inputs.

It is important to bear in mind that, no matter which normalisation method is applied, it must be carried out independently for each attribute. That is to say, if you have a database with $N$ patterns and $L$ attributes, you would have to perform $L$ different normalisations.

### Question 1.3
> ❓ Taking into account that Machine Learning models in general assume that patterns are distributed in rows, but that for ANNs they are arranged in columns, in which cases should each row be normalised separately and in which cases should each column be normalised separately?

`Answer here`

Generally speaking, knowledge about the nature of each attribute is often helpful in achieving models that deliver better results.  The more information about the data that is "fed" into the model, the better the model will perform. A good example of introducing information about the data is data normalisation.  This could be taken to the extreme and a different form of normalisation could be chosen, which is thought to be most appropriate, for each of the attributes.  This may lead to the decision, for example, to normalise an attribute "temperature" between maximum and minimum, while the attribute "distance" is normalised to mean 0. In most cases, however, one of these two normalisations is usually chosen and applied to all input attributes of the ANN. 

It is also important to note that this process also occurs at the outputs of the ANN. That is, if the outputs are in different intervals, the ANN has to learn this as well, so the ANN can be "helped" by normalising the output data.  This is a process that is done in regression problems, but not in classification problems.

### Question 1.4
> ❓ Why is it not performed in classification problems?

`Answer here`

### Important:  
In this way, the inputs (and desired outputs) that are applied to a model are no longer the original data, but the transformed ones.  Thus, once trained, a model is not ready to be passed the original data, but if data is to be applied, it will have to be transformed in the same way. For this very reason, no matter which way of normalisation is applied, it is necessary to save the  parameters used in the normalisation for each attribute (maximum and minimum or mean and standard deviation).  Similarly, if the desired output has been normalised (regression problems), the model will have learned to produce a normalised output, so it will have to be denormalised, which again means that the normalisation parameters of the desired outputs will have to be saved.  In summary, to apply new data to a model, the process will follow the next steps: 

1. Normalise the data according to the parameters that were used in the training set. 
2. Apply the normalized data to the model
3. (Only for regression problems) Denormalise the outputs of the model

In this first assignment, the resolution of the chosen classification problem will be studied by means of different Machine Learning techniques.  To do so, during several weeks, exercises will be presented, which will allow the development of the code that will later have to be integrated in a script that will incrementally grow during the following weeks.

In this the first week, you are asked to: 

1. Have a first contact with Julia, install the necessary packages and learn the most basic concepts of Julia.

2. Download the iris flower database from the indicated address and load the database in Julia. 
    * Create a matrix with the inputs and another one with the desired outputs, each one with the most appropriate type.

In [None]:
# Type the code to load the data 
# and to convert to the appropiate type in each case
#=
using Pkg
# Pkg.add("RDatasets")
using XLSX:readdata
using RDatasets
# Load the Iris dataset
iris = dataset(raw"E:\usc\master\courses\ML1\MIA_ML1-main\data", "iris.data")
=#
#TODO
# Type the code to load the data 
# and to convert to the appropiate type in each case


# Imports first
using Pkg

# Only run once to install packages
#=
Pkg.add("CSV") 
Pkg.add("DataFrames") 
Pkg.add("CategoricalArrays")
# Import the packages
using CSV
using DataFrames
using CategoricalArrays
=#

# ==================================================
#1. Load dataset into a DataFrame
# ==================================================
iris = CSV.read("E:/usc/master/courses/ML1/MIA_ML1-main/data/iris.data",  DataFrame, header=true)


# Show first 5 rows
first(iris, 5)

println("Dataset Columns: ", names(iris))   # prints (rows, cols)

# print dataframe shape
println("Dataset Shape: ", size(iris))   # prints (rows, cols)

# rename Columns
rename!(iris, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth, :Species])

# print renamed columns
println("Dataset Columns: ", names(iris))   # prints (rows, cols)

# ==================================================
#for later usage
# ==================================================
train = select(iris, Not(:"Species"))   # features
target = select(iris,"Species")

# print Species unique values
println("Dataset Columns: ", unique(iris.Species) )   # prints (rows, cols)
unique(iris.Species)          # shows distinct labels

# ==================================================
# Encode Species
# ==================================================
using CategoricalArrays

# Convert to categorical
iris.Species = categorical(iris.Species)

# Encode as numeric (1, 2, 3)
target = levelcode.(iris.Species)


# some print
println("target firsts: ", first(target, 5) )   # prints (rows, cols)
println("train firsts: ", first(train, 5) )   # prints (rows, cols)

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\gianp\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\gianp\.julia\environments\v1.11\Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\gianp\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\gianp\.julia\environments\v1.11\Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\gianp\.julia\environments\v1.11\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\gianp\.julia\environments\v1.11\Manifest.toml`


Dataset Columns: ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"]
Dataset Shape: (149, 5)
Dataset Columns: ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]
Dataset Columns: String15["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
target firsts: [1, 1, 1, 1, 1]
train firsts: [1m5×4 DataFrame[0m
[1m Row [0m│[1m SepalLength [0m[1m SepalWidth [0m[1m PetalLength [0m[1m PetalWidth [0m
[1m     [0m│[90m Float64     [0m[90m Float64    [0m[90m Float64     [0m[90m Float64    [0m
─────┼──────────────────────────────────────────────────
   1 │         4.9         3.0          1.4         0.2
   2 │         4.7         3.2          1.3         0.2
   3 │         4.6         3.1          1.5         0.2
   4 │         5.0         3.6          1.4         0.2
   5 │         5.4         3.9          1.7         0.4


> **Question**: What are the dimensions of the arrays that we expect? 

In [8]:
# Print both inputs and targets
# Tip: you can do this with a defensive programming

#TODO
println("Train Shape: ", size(train))   # prints (rows, cols)
println("target Shape: ", size(target))   # prints (rows, cols)

Train Shape: (149, 4)
target Shape: (149,)


3. Develop a program that allows to encode the categorical values it has in Boolean values, distinguishing the two most common cases (having only two categories and having more than two categories). It should considered the following points: 
    * The case to be considered between the two possibilities must be automatically detected in the code.  
    * The program should start from a vector (one-dimensional array) with the values of a desired attribute or output, and return a vector (one-dimensional array) or matrix (two-dimensional array), depending on the encoding of the desired attribute or output. 
    * The code must be vectorised, i.e. you cannot use loops to go through the patterns. The only loop that is allowed is for going through classes or attributes, but only this one is allowed. 
    * This program will be applied to each of the categorical inputs/outputs of the chosen problem.  To do so, keep in mind that in the next tutorial this one will be turn into a function
    * _Hint_: it may be interesting to use the function `unique`.

In [9]:
# Code to encode the categorical outputs of the problem
n_unique_values =  unique(iris.Species)
println("Unique Columns: ", n_unique_values )   # prints (rows, cols)

# Get unique species labels
labels = unique(iris.Species)
println("Unique labels: ", labels )   # prints (rows, cols)
# Map each species to an integer
label_to_int = Dict(label => i for (i, label) in enumerate(labels))
# Encode into integers
encoded = [label_to_int[x] for x in iris.Species]
#TODO

# General One-hot encoding
function onehot_encode(y, n_classes)
    m = zeros(Int, length(y), n_classes)
    # vectorized assignment using CartesianIndex
    m[CartesianIndex.(1:length(y), y)] .= 1
    return m
end
onehot = onehot_encode(encoded, length(labels))


Unique Columns: String15["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
Unique labels: String15["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


149×3 Matrix{Int64}:
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 1  0  0
 ⋮     
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1
 0  0  1

In [12]:
# Step 1: Convert string labels to integers
labels = unique(iris.Species)  # ['setosa', 'versicolor', 'virginica']
label_to_int = Dict(label => i for (i, label) in enumerate(labels))
y_int = [label_to_int[x] for x in iris.Species]  # <-- use iris.Species

# Step 2: Get number of labels and samples
n_labels = maximum(y_int)
n_samples = length(y_int)

# Step 3: One-hot encoding
if n_labels == 2
    # Binary: single column (0/1)
    onehot = reshape(y_int .- 1, n_samples, 1)
else
    # Multi-class: one-hot matrix
    onehot = zeros(Int, n_samples, n_labels)
    onehot[CartesianIndex.(1:n_samples, y_int)] .= 1
end

# Step 4: Check result
println("Labels mapping: ", label_to_int)
println("One-hot matrix (first 5 rows):")
println(onehot[1:5, :])

# Step 5: Set as new target
target = onehot

# Step 6: Check
println("Target shape: ", size(target))
println("Target First 5 rows:")
println(target[1:5, :])


Labels mapping: Dict{String15, Int64}("Iris-virginica" => 3, "Iris-setosa" => 1, "Iris-versicolor" => 2)
One-hot matrix (first 5 rows):
[1 0 0; 1 0 0; 1 0 0; 1 0 0; 1 0 0]
Target shape: (149, 3)
Target First 5 rows:
[1 0 0; 1 0 0; 1 0 0; 1 0 0; 1 0 0]


4. Develop the code that, from the input data set, extracts the maximum, minimum, mean and standard deviation values for each column.  To do this, consult the functions `minimum`, `maximum`, `mean` and `std`. The latter two require the Statistics package to be loaded (`using Statistics`). In addition, these functions accept the additional keyword `dims`.  When it is used properly, `dims` returns a one-row matrix (not a vector) with as many columns as attributes, containing these values of minimum, maximum, mean and standard deviation. Once these one-row matrices have been extracted, use one of the two ways explained here to normalise the database entries.  This task can be done very easily by doing simple broadcast subtraction and division operations, wit not need of loops, as shown in Julia's tutorial. In addition, it will be necessary to consider the following cases: 

   * If it is normalised between maximum and minimum and in some attribute the minimum is equal to the maximum.
   * If it is normalised by mean and standard deviation and in some attribute the standard deviation is 0. <br/>
   
   Either case, results in the same situation: all patterns take the same value for an attribute. In this case, a common solution is to eliminate the attribute, since it does not provide any information. Another, simpler, possibility is to assign it a constant value, e.g. a value of 0. 
   One function that can be useful for converting a two-dimensional matrix with a single row or column into a vector is the function vec. Which allows the elements of the matrix to be easily referenced.




In [None]:
using Statistics
using DataFrames

#TODO
# 1. Select numeric columns only
numeric_cols = names(iris, Number)
train_numeric = select(iris, numeric_cols)

# 2. Convert to matrix
X = Matrix(train_numeric)

# 3. Column-wise statistics
col_means = mean(X; dims=1)
col_stds  = std(X; dims=1)
col_mins  = minimum(X; dims=1)
col_maxs  = maximum(X; dims=1)

println("Column-wise means: ", col_means)
println("Column-wise stds: ", col_stds)
println("Column-wise col_mins: ", col_mins)
println("Column-wise col_maxs: ", col_maxs)

# 4. Normalization


Column-wise means: [5.8483221476510066 3.051006711409396 3.7744966442953025 1.2053691275167786]
Column-wise stds: [0.8285940572656173 0.4334988777167476 1.7596511617753423 0.7612920413899604]
Column-wise col_mins: [4.3 2.0 1.0 0.1]
Column-wise col_maxs: [7.9 4.4 6.9 2.5]


In [20]:
println("Numeric columns only: ", numeric_cols)

Numeric columns only: ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]


As a result of this practice, after applying this function to categorical inputs or outputs, two matrices (desired inputs and outputs) should be available for use in the following practices.

## Going through this exercises with Julia

The first thing to be performed is to load the data, in this case is a excel file and Julian has a package which helps us with this point. Remember that the first time thay are used, they need to be pre-compiled. For example, image a situation where the data is in Excel sheet format, the `readdata` function of the XLSX package can be used to load it. For this, this package must be previously installed, which can be done by typing in the command line the following:
```julia
    # Only required the first time and when Julia is already setup
    import Pkg; Pkg.add("XLSX");
```

And to use the function in a script, you can write `using XLSX: readdata` at the beginning of the script. 
To do this exercise, you need to make two calls to load the two arrays. If this data is in an Excel sheet, these calls will look something like this:

```julia
    using XLSX:readdata

    inputs = readdata("iris.xlsx","inputs","A1:D150"); 
    targets = readdata("iris.xlsx","targets","A1:C150");
```

In this way, two variables will be loaded into memory, one with the input matrix and the other with the desired output matrix.

Alternatively, the data could be in a text file with some kind of delimiter. This is the case, for example, with the `iris.data` file which can be downloaded from [UCI repository - Iris](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/). The DelimitedFiles package is useful for this. This package can be installed in the usual way with:
 

In [None]:
import Pkg; Pkg.add("DelimitedFiles");

To use it, it can be loaded with keyword `using` at the beginning of the script. One the library is loaded, the dataset can be read into memory like:

In [None]:
using DelimitedFiles 

dataset = readdlm("iris.data",',');

As you can see, the first parameter is the name of the file, while the second is the delimiter or delimiters to be used. In this case, the entire database is loaded into a single variable called a dataset, which will be a two-dimensional array.  In most cases it will be necessary to separate the inputs from the desired outputs. A similar code to the following one would do the job: 

In [None]:
inputs = dataset[:,1:4];
targets = dataset[:,5];

The bracket operator allows you to reference a particular piece of data or part of an array, the first element referring to rows and the second to columns. When a colon (:) is used, all rows or all columns are referenced, thus returning an array instead of an element.  This will be further explained later in this tutorial. In the particular case of the iris flowers database, the first four columns are the inputs and the fifth one corresponds to the desired output, that is why in these two lines you can see the values 1:4 and 5. In another database, the columns are going to be different.

It is important to bear in mind that every variable has a type.  Julia provides a hierarchy of types where the root type is called `Any`. That is, any variable will be of type `Any`.  In this case, reading from a spreadsheet a large amount of data that could be of a different nature (numbers, dates, strings, etc.), it returns a two-dimensional array where each element is of type `Any`. In Julia, this is represented as `Array{Any,2}`, where `Any` indicates the type of each element of the array, and the 2 indicates the number of dimensions.  In the following image the hierarchy of the most common types can be seen (Source: Learning  Julia:  Abstract,  Concrete,  and  Parametric  Types  by  Spencer Russell, Leah Hanson),

![Chart with the common types of julia arranged in a tree shape](./img/JuliaTypes.png "Julia Types")

For example, as you can see, `Int64` is a subtype of `Signed`, `Integer`, `Real`, `Number` and `Any`. To see what type a variable is, you can use the function `typeof`. Alternatively, when a particular type is requiredm this can be check with the function `isa`. The types, as well, also have a certain type, which is `DataType`.  The type of `DataType` is `DataType`, which is a subtype of `Any`. Therefore, any element, including types, are of type `Any`, since all types are subtypes of `Any`. For example, try to guest the following types before executing the lines to check it.

In [None]:
typeof(Array{Float64,2}) 
miguel.leal@rai.usc.es

In [None]:
typeof(DataType)

In [None]:
typeof(Any)

In [None]:
isa(DataType, Any)

In [None]:
isa(Any, Any)

In [None]:
isa(Array{Float32,2}, Any)

In [None]:
isa(typeof(Array{Float64,2}), Any)

The most common types used in Machine Learning to store numerical data will generally be `Float32` or `Float64`. Although, it is undeniable that `Float32` is the most used in the world of Machine Learning because it is the data type used by most Graphic Processing Units (GPUs), and it is the type we will use in this subject, as it provides sufficient precision for the work to be done. 

Therefore, it is necessary to convert the data we will use, from `Array{Any,2}` to `Array{Float32,2}`. One possibility is to use the function `convert`, which attempts to do a conversion to a specified type, which could look something like:

In [None]:
inputs = convert(Array{Float32,2},inputs); 
targets = convert(Array{Float32,1},targets); 

In this last example, the desired outputs have been converted to a two-dimensional array of real values, which is useful in regression problems with several outputs.  However, in a classification problem such as iris, this line would give an error. The reason is that targets is of type `Array{Any,1}`, where each element is a `String` which cannot be directly converted to `Float32`.  This preprocess has to be always adapted for each particular problem.  For example, if it is a two-class classification problem and the file has numeric values of 0 and 1 for each class, this could be converted to boolean values with something like: 

```julia
        targets = convert(Array{Bool,1}, targets); 
```

If the problem is a classification problem and there are more than two classes, a slightly more complex conversion would have to be performed. This is a matrix, where the number of columns is equal to the number of classes and for each pattern we have a value of true in the column corresponding to the class to which it belongs and false in the rest, as in the exercises of this tutorial. 

Another option is to force a typecast by using the type itself as a function.  For example, you can specify a number to be of a particular type using something like `Float64(8)`, `Float32(8)`, `Int64(8)`, `UInt32(8)`, etc.  However, you cannot do ~~`Float64(inputs)`~~, because inputs is not of type `Number`, and therefore the conversion is not allowed. Instead of forcing the type of the array, you want a new array to be created where each element is the result of typecasting the corresponding element of the initial array, i.e. a broadcast of the typecasting. 

One of the key features of Julia is in the handling of multi-dimensional matrices. Julia allows, among other things, to apply functions to all the elements of a matrix and to construct the resulting matrix in an automated way.  In this case, the function is said to be broadcasted over the entire matrix. Let see an example step by step of this procedure. First define the function to be use to broadcast: 

In [None]:
# Definition of a function to calculate the square of an element
squared(x::Real) = x*x

In this case, the type of the argument has been specified as `Real`, which forces the argument to be `Int32`, `Int64`, `Float32`, `Float64`, etc. (if the type of the argument is not specified, by default Julia understands that it is of type Any).  You are therefore defining a function between numbers, not between arrays. For example, the following call would give an error: 

```julia
    squared([1 2 3]) # => Raises MethodError: no method matching squared(::Matrix{Int64})
```

since this function is defined between numbers and it is passing as argument a matrix, specifically of one row and 3 columns. However, if you wish to construct a new matrix of the same size as the original matrix where each element is the result of applying this function to the corresponding element of the original matrix, this can be done by writing a dot `.` after the name of the function, as follows:

In [None]:
squared.([1 2 3]) 

Julia is thus indicated to apply this function on an element-by-element level.  These broadcast operations allow you to develop a cleaner code, since you can avoid writing loops, and also more efficient, since Julia can parallel these operations in different cores.  An alternative way to perform this process would be as follows:

```julia
    [squared(x) for x in [1 2 3]]
```

Another example but this time with two arguments would be the following code:

In [None]:
# Define an function to add two numbers
add(a::Real, b::Real) = a+b

# add([1 2 3],[2 3 4]) => This would give an error because it is not defined for matrices
add.([1 2 3],[2 3 4])

In general, the broacasting of operations can be done in this matrices becasuse they have the same dimension, but could it be use if any of the matrix is smaller or even a single element?

### Questión 1.5
> ❓ What would be the result of  add.(1,[2 3 4]) and add.([1 2 3],3)? 

In [None]:
# TODO

In general, the most common mathematical operations have the broadcasting implemented.  For example, lets be `A` and `B` two matrices of the same size, to operate element by element, you can do `A.+B`, `A.-B`, `A.*B`or `A./B`. Another example is to do `A.^2`, where a new matrix is constructed where each element is the corresponding element, squared.

Returning to the type specification problem with these concepts clear, the following lines would therefore be equivalent:

```julia
    inputs = Float32.(inputs); 
    inputs = [Float32(x) for x in inputs]; 
    inputs = convert(Array{Float32,2},inputs);
```

In [None]:
#Choose the more efficient one and executed here


One key issue that will be important in the definition of functions is a good understanding of Julia's type system. As it was already mentioned, any element, including types, are of type Any, since all types are subtypes of `Any`.  However, one must be careful with types based on others, e.g. arrays, where the elements are of a particular type. In this case, for example, a variable that is of type `Array{Float32,2}` will also be of type `Any`, but not of type `Array{AbstractFloat,2}`, Array{Real,2} nor Array{Number,2} nor Array{Any,2},  because, although Float32 is a subtype of `AbstractFloat`, `Real`, `Number` and `Any`, the type `Array{Float32,2}` is not a subtype of `Array{AbstractFloat,2}`, `Array{Real,2}`, `Array{Number,2}` or `Array{Any,2}`. To indicate whether in a type of this style its elements are subtypes of others, the `<:` operator shall be used as can be seen in the following example. This is quite useful in function definitions.

### Question 1.6
> ❓ Which would be the answer of the following lines? Think about before executing them.

In [None]:
typeof(inputs) 

In [None]:
isa(inputs, Any) 

In [None]:
isa(inputs, Array) 

In [None]:
isa(inputs, Array{Float32,2}) 

In [None]:
isa(inputs, Array{Real,2}) 

In [None]:
isa(inputs, Array{Number,2}) 

In [None]:
isa(inputs, Array{Any,2}) 

In [None]:
isa(inputs, Array{<:Real,2}) 

In [None]:
isa(inputs, Array{<:Number,2})

In [None]:
isa(inputs, Array{<:Any,2}) 

Additionaly, the `<:` operator can also be used to check whether one type is a subtype of another, as can be seen in the following example:

In [None]:
Array{Float32,2} <: Any

In [None]:
Array{Float32,2} <: Array 

In [None]:
Array{Float32,2} <: Array{Any,2}

In [None]:
Array{Float32,2} <: Array{<:Real,2}

In [None]:
Array{Float32,2} <: Array{<:Any,2}

As noted above, this conversion of the targets is useful for regression problems. However, if the problem is a binary classification problem, and we are given these targets  as numeric values, 0 or 1, the following lines would be equivalent: 
```julia
    targets = Bool.(targets); 
    targets = [Bool(x) for x in targets]; 
    targets = convert(Array{Bool,1},targets);
```

In [None]:
#Chose one


With respect to vectors and matrices of Boolean values, there are two types that in most cases can be used interchangeably, which are `Array{Bool,N}` and `BitArray{N}`, where N indicates the dimensionality of the array.  The `Array{Bool,N}` type stores each Boolean value as a value of type `Bool`, which is represented internally as a value of type `UInt8`. Therefore, if the array has n elements, it will need n bytes to store it. On the other hand, the `BitArray{N}` type stores each Boolean value as a bit, so n elements need n/8 bytes to be stored, a much smaller amount than the `Array{Bool,N}` type. Depending on the situation, it may be more efficient to store in one way or the other. In any case, the most common operations are defined for both types, so in the vast majority of cases they are interchangeable and therefore one or the other can be used. When defining functions, it is important to bear in mind that both are subtypes of `AbstractArray{Bool,N}`, as can be seen in the following examples:

In [None]:
BitArray{2} <: AbstractArray{Bool,2} 

In [None]:
Array{Bool,2} <: AbstractArray{Bool,2}

In this way, we will have a matrix with the inputs of type `Array{Float32,2}`.  The matrix with the  targets will need to be constructed depending on the nature of the problem to be solved.   Although `Array{Float32,2}` is the type that will be used the most in this course, bear in mind that data types in Julia are very flexible. For example, you could have a variable containing a three-dimensional array where each element is a vector, the type would be  
`Array{Array{Float32,1},3}`.

### Question 1.7
> ❓ What type will the objects [[[]], [[8]] and [[8.]] have?

In [None]:
# Answer here

Generally speaking, in Machine Learning, in the matrices each instance is in a row. Whereas, in the columns of the input matrix the attributes are represented, in the target matrix each of the outputs is represented.  Therefore, both matrices must have the same number of rows.

In order to calculate the number of rows and/or columns of a matrix, the function `size` can be used. This function returns a tuple, with the number of elements equal to the number of dimensions, where each element indicates the size of that dimension.  For example, the call

In [None]:
size(inputs)

For the Iris problem it should return (150, 4), i.e. 150 rows and 4 columns. It is also possible to call this function indicating from which dimension you want to read the size. For example:

In [None]:
size(inputs,1)

This instruction should return a 150 value. In this case, if the loading of the database has been done correctly, both matrices should have the same number of rows. However, such issues should often be checked in order to find possible errors.  For that, in many parts of the code it is often interesting to introduce checks to verify that everything is correct.  When this check is executed, if it is not true, the system should give an error. This is called defensive programming. In the case of Julia, this can be done with the macro `@assert`, to which the check to be performed is indicated, and, optionally, the error message that should appear, for example:

In [None]:
@assert (size(inputs,1)==size(targets,1)) "The number of rows in inputs and targets do not match"

As the variable targets is a vector, i.e. a one-dimensional array, the previous call to `size(targets,1)` could be replaced by `length(targets)`.

At this point, and once the exercises of this practice have been carried out, two matrices should be loaded in memory, both with the same number of rows.  **Important:** Unlike in the rest of the models, in the world of Artificial Neural Networks, it is usually understood that each instance is represented in a column of the input matrix, with the rows being the attributes, and the rows of the target matrix being the outputs of the ANN.

Therefore, the first tutorial together with the introduction to Julia would conclude at this point. You should remembre t complete the exercises at the begining of this tutorial. Additionaly, it would be recommended to copy that same piece of code to a `.jl` file.

In the remaining of this tutorial, the spotlight is going to be on the use of matrices together with the broadcast of functions which is one of Julia's strong points, as it was aforementioned. It is particularly efficient and it is done in a very similar way to Matlab. So, we are going along with this objetive, we are going to show some common operations.

To create a vector, its elements can be enclosed in square brackets, separated by commas, for example: 

In [None]:
M = [1, 2, 3]

To create a matrix, simply enclose its elements in square brackets, separating the rows by semicolons `;`, for example:

In [None]:
M = [1 2 3; 4 5 6]

To access an element of the matrix, enter the name of the matrix followed by the row and column to be accessed in square brackets: 

In [None]:
M[2,3]

In this particular case, `M` is a bidimensional matrix, so two values have been indicated in square brackets.  If it has a different dimensionality, it would be necessary to indicate a value for each dimension.  For example, if it was 3-dimensional, it would be necessary to write `M[2,3,1]`. 

### Question 1.8
> ❓ What will be the result of the following calls? When representing a vector as a matrix, will it be a row matrix or a column matrix? Does the third call return a vector or a two-dimensional matrix? 

In [None]:
typeof([1,2,3])

In [None]:
typeof([1;2;3])

In [None]:
typeof([1 2 3]) 

In [None]:
typeof([1 2 3; 4 5 6])

The sequence operator `:` is used in many languages as a built-in range operator, which is used to create vectors. For example, in Matlab, `J:K` is equivalent to creating the vector `[J, J+1, ..., K]` whenever `J < K`.  Also, `J:D:K` is equivalent to [J,J+D, J+2* D,..., K]. For example, the following operations are equivalent in Matlab: 

```matlab
    1:3
    [1 2 3]
```
However, in Julia there is a slight difference, although the operability remains the same. The difference is that J:D:K does not create a vector but an element of type `UnitRange` or `StepRange`, which stores the initial and final indices and the increment, and which can be used in the same way as a vector.  Generally speaking, this eliminates the need to create vectors when they are absolutely necessary, for example in loops.  Moreover, this is done transparently to the developer, since, as mentioned above, the operability is the same as in the case of creating explicit vectors. 

The operator : can also be used to select rows, columns or parts of an array. These operators can be used in conjunction with the word end, which indicates that the range will end at the last value of the row or column. For example: 

In [None]:
M[:, 1]     # retrieve the first column of the matrix

In [None]:
M[1, 2:end] # retrive the first row only columns from secong to the end

In [None]:
M[2, :]     # retrive the second row

**Important:** When referencing elements of an array for each dimension where no range is given but a single value, e.g. a row or a column, that dimension will be lost. Beware of the result in the previous examples. 

As previously mentioned, the operability with this object is the same as with vectors.  For example, these two lines give the same result:

In [None]:
M[1, end:-1:1] 

In [None]:
M[1, [3, 2, 1]]

In these two examples, as you can see, one dimension is "lost" since only one row, the first one, is being referenced. Therefore, the result will be a vector, that is, an array of dimension 1. Something similar happens when a vector (array of dimension 1) is referenced by making an element.  For example, M[3] in a vector "loses" the dimension it had and returns a 0-dimensional object (a scalar value).  If instead of referencing by a scalar, it is referenced by a range or a vector, then it will not "lose" that dimension, but possibly decrease it.  For example:

In [None]:
M = [1, 2, 3];
M[:] 

In [None]:
M[1:end] 

In [None]:
M[end:-1:1] 

In [None]:
M[1:3] 

In [None]:
M[[1,2,3]] 

In [None]:
M[[1, 2, 3, 1, 2, 3]] 

In [None]:
M[[2]] 

In all these cases a vector is returned. However, the last example is interesting to finish understanding how the operator works to reference parts of arrays.  In this example, a single element vector is returned. A vector is returned because a vector has been used for referencing, and it has only one element because this vector used for referencing has only one element.  Therefore, what is returned is a different object than if it had been referenced by M[2], the result of which is a scalar value, namely the second element of the M vector. 

### Question 1.9
> ❓ If M were a 4-dimensional matrix, how many dimensions would the results of the following operations have?

Q = #TODO Define a 4 dimension matrix
length(size(Q[:, :, :, :])) == #COMPLETE

In [None]:
length(size(Q[1, 2, 3, 4])) == 1 #COMPLETE

In [None]:
length(size(length(size(Q[1, 2, 3, :] )) == #COMPLETE

In [None]:
length(size(Q[:, [1, 2, 3], 4, :] )) == #COMPLETE

In [None]:
length(size(Q[:, [1, 2, 3], :, :] )) == #COMPLETE

In [None]:
length(size(Q[[3, 2], 2, 4, :] )) == #COMPLETE

In [None]:
length(size(Q[[3, 2], [2], [4], [1]]  )) == #COMPLETE

In [None]:
length(size(Q[1, [1, 2, 3], 4, :])) == #COMPLETE

To transpose matrices, you can use the function `transpose`, or use the `'` operator:

In [None]:
M'

As a result, the object it returns is not of type `Array`, but a more complicated type that encapsulates an `Array`.  Julia does this so as not to have to reserve memory for the new transposed array, but simply reference its elements in another order, taking advantage of the memory that is already allocated.  This object, despite not being of type `Array`, is a subtype of `AbstractArray`, so it can be used indistinctly as if it were an Array.

As indicated above, Julia allows you to braadcasting a function to all the elements of a matrix and also to perform operations between the elements of matrices located in the same position. For the latter, operators such as multiplication, division or power are prefixed with a dot `.`:

In [None]:
N = [10 20 30; 40 50 60]; 

N*M 

In [None]:
N.*M 

In [None]:
N./M 

In [None]:
N/10 

In [None]:
N>30 

In [None]:
N.>30

In this last example, a matrix of Boolean values is created in which each element is the result of comparing each element of the matrix with the value 30.

### Question 1.10
> ❓ Why does it give an error when trying to execute N*M but not N.*M? Why does it give an error when trying to execute N>30 but not N.>30? Correct the code

Unlike Matlab or Python, Julia does not allow matrices to grow dynamically.  For example, in Matlab in the matrix above, it is possible to give a value to an element located in a position that does not exist.  In this case, it will not give an error, opositely,  Matlab would have increased the size of the matrix up to that position, filling the new values with zeros. In Julia, however, this will cause an error, since the matrices have defined sizes similar to R.

An additional functionality of broadcasting operations between arrays is that they need not be only between arrays of the same size or arrays and scalar values, but can be done between arrays of the same dimensionality but with different sizes.  This is widely used in the operation with two-dimensional arrays, and in this subject it will be used to normalise the data. If an operation is broadcasted between a (two-dimensional) matrix and another two-dimensional matrix with a single column, both with the same number of rows, this operation will be performed as if the second matrix had the same size as the first one, with the column repeated.  The same is true if the second matrix is a one-row matrix with the same number of columns.  Of course, it makes no difference which one comes first in the operation.  Several simple examples are shown below: 

In [None]:
M = [1 2; 3 4; 5 6; 7 8];

M.+[1 3]

In [None]:
M.+[4; 3; 2; 1]

In [None]:
[4; 3; 2; 1].+M

In [None]:
M.+[4, 3, 2, 1]

Notice the last example, if the row matrix or column matrix is a vector, it will be treated as a column matrix.

To declare an array and thus reserve the relevant memory, you can use the type of that array as if it were a function. To do this, the first argument is the way to initialise the data, and then a number for each dimension.  The most common way to initialise the data is to use `undef`, which indicates that you do not want to initialise the values of the array. The array will, therefore, have whatever values are in memory at the time.  For example, the following call creates a two-dimensional array of 10 rows and 6 columns:

In [None]:
M = Array{Float32,2}(undef, 10, 6)

### Question 1.11
> ❓ Why will the following expression gives an error? Correct it

In [None]:
M = Array{Float32,2}(undef, 10)

### Question 1.12
> ❓ What will be the result of the following calls?

In [None]:
typeof(Array{Float32,2}(undef, 4, 15)) == #COMPLETE (name the type)

In [None]:
typeof(Array{Float32,2}) == #COMPLETE (name the type) 

In [None]:
isa(Array{Float32,2}, Array{Float32,2}) == # COMPLETE (true or false)

In [None]:
isa(Array{Float32,2}(undef, 4, 10), Array{Float32,2}) == # COMPLETE (true or false)

Another possibility is to use the functions `zeros` or `ones`, which create arrays of the given size where all elements are 0 or 1 respectively, for example:

In [None]:
zeros(10, 6) 

In [None]:
ones(13)

To concatenate two vectors, you can put them in square brackets, separating the elements by `;`.  For example, the following line allows you to create a vector resulting from concatenating the vectors indicated: 

In [None]:
[[1, 2, 3]; [4, 5, 6]] 

To merge matrices, this is done in a similar way, using square brackets. To put them "next to" each other, they are separated by a space. This can be done as long as the number of rows of the matrices match.

In [None]:
[[1 2 3; 4 5 6] [7 8 9; 10 11 12]]

In case you want to put one "under" the other, the separation must be done with the operator `;` .  As opposed to the previous case, in this one the number of columns must match. 

In [None]:
[[1 2 3; 4 5 6] ; [7 8 9; 10 11 12]]

If `[[1, 2, 3]; [4, 5, 6]]`
allows concatenating vectors, what will be the result of the following operations? What will be the type of the result?

In [None]:
[[1, 2, 3]  [4, 5, 6]] 

In [None]:
[[1, 2, 3], [4, 5, 6]] 

In [None]:
[[1  2  3]  [4  5  6]] 

In [None]:
[[1  2  3]; [4  5  6]] 

In [None]:
[[1  2  3], [4  5  6]]

Julia has a number of functions for working with matrices such as ones, zeros, size, length, max, min, minmax, rand, inv, det, sum, etc. Some of the most commonly used are: 

- `zeros`: takes as parameters the size of each dimension, and creates an array of that dimensionality and size, where all elements are equal to 0. 

- `ones`: takes as parameters the size of each dimension, and creates an array of that dimensionality and size, where all elements are equal to 1. 

- `rand`: takes as parameters the size of each dimension, and creates an array of that dimensionality and size, where all elements are random values. 

- `size`: receives as parameter an array and returns a tuple with as many elements as the dimensionality of the array, where each element is the size of that array.  It can be called with the dimension as a second argument, and returns only the size of that dimension. 

- `maximum`: receives an array as a parameter and returns the maximum value of the array. Optionally, It also accepts a dim keyword that specifies the dimension to apply the function to. 

- `minimum`: performs the operation similar to maximum, but returns the minimum value instead of the maximum. It also accepts the dims keyword.

- `findall`: takes an array of boolean values as a parameter and returns the indices of the positive values. 

- `sum`: receives as parameter an array and the sum of the values of the array. It also accepts the dims keyword. 

- `mean`: takes an array as parameter and returns the average value of the array.  It also accepts the keyword dims.  Requieres a preload of  `Statistics` package. 

- `std`: takes an array as a parameter and returns the standard deviation of the values in the array.  It accepts also the keyword dims. Requieres a preload of  `Statistics` package. 

As it was pointed out, many of these functions accept the `dims` keyword to indicate along which dimension the specified operation will be performed.  If not used, this operation is performed on all elements of the array.  However, if `dims=1` is specified, this function is performed in parallel on each column, and as a result a matrix of one row and as many columns as the original matrix had will be returned. The returned value of each column will be the result of applying the function on the elements of vector formed by the column of the original matrix. 
Opositely, if `dims=2` is specified, this operation is performed in parallel on each row, and the sresults is a matrix of one column and as many rows as the original matrix. The value of each row will be the result of applying the function on the vector of elements of that row of the original matrix.For example:

In [None]:
sum([1 2; 3 4])

In [None]:
sum([1 2; 3 4], dims=1)

In [None]:
sum([1 2; 3 4], dims=2)

In general, to see the documentation of a function, simply need to type `?`. If do so, the prompt would change from the normal `julia>` to `help?>`. Now, by simply typing the name of the function the system will retrive the documentation. 