<h1 align ="center"> An Introduction to Machine Learning with R</h1>

------------------
The training offers a hands-on overview of typical machine learning applications in R, including unsupervised methods (clustering, such as hierarchical and k-means clustering, and dimensionality reduction, such as principal component analysis) and supervised methods (classification and regression, such as k-nearest neighbour and linear regression). We will also address questions such as model selection using cross-validation.

## Objectives and pre-requisites

The course aims to provide an accessible introduction to various machine learning methods and applications in R. The core of the course focuses on unsupervised and supervised methods.

The course contains numerous exercises that give participants the opportunity to apply the newly acquired material.

Participants are expected to be familiar with the R syntax and basic plotting functionality.

At the end of the course, participants are expected to be able to apply what they have learnt and to feel confident enough to explore and apply new methods.

### Why R?
R is one of the major languages for data science. It provides excellent visualisation features, which are essential for exploring the data before submitting it to any automated learning and for assessing the results of the learning algorithm. Many R packages for machine learning are available off the shelf, and many modern methods in statistical learning are implemented in R as part of their development.

There are, however, other viable alternatives with similar advantages. In Python, for example, the scikit-learn library provides all the tools that we will discuss in this course.

## Overview of machine learning (ML)

In **supervised learning (SML)**, the learning algorithm is presented with labelled example inputs, where the labels indicate the desired output. SML itself comprises classification, where the output is categorical, and regression, where the output is numerical.

![Image](https://lh3.googleusercontent.com/-51tcbYLGpug/XPDCyX6x1gI/AAAAAAAAc08/Kzhla0rWUeoCuyTkwgi58pv90EIg1kDAgCK8BGAs/s0/2019-05-30.png)

In the context of SML, the dataset contains an additional column of labels documenting the outcome or class of each example. For the iris data, the label is the species:

| Species | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---------|--------------|-------------|--------------|-------------|
| setosa  | 5.1          | 3.5         | 1.4          | 0.2         |
| setosa  | 4.9          | 3.0         | 1.4          | 0.2         |
| setosa  | 4.7          | 3.2         | 1.3          | 0.2         |
| setosa  | 4.6          | 3.1         | 1.5          | 0.2         |
| setosa  | 5.0          | 3.6         | 1.4          | 0.2         |
| setosa  | 5.4          | 3.9         | 1.7          | 0.4         |
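
A minimal sketch of the two kinds of supervised learning on R's built-in `iris` data (in R, the `Species` label is the last column rather than the first, as in the table above). Only base R is used: the held-out split, the simple nearest-centroid classifier (a stand-in for methods such as k-nearest neighbour) and the choice of `Petal.Width` as the regression target are assumptions made for illustration, not the workflows developed later in the course.

```r
# Built-in iris data: 150 labelled examples with 4 numeric features
data(iris)
X <- iris[, 1:4]    # features (inputs)
y <- iris$Species   # labels (desired output, categorical)

# Hold out every 5th example as a small test set (illustrative split)
test_idx <- seq(5, nrow(iris), by = 5)

## Classification: the output is categorical (Species)
# Nearest-centroid classifier: mean feature vector of each species ...
centroids <- aggregate(X[-test_idx, ], by = list(Species = y[-test_idx]), FUN = mean)
cmat <- as.matrix(centroids[, -1])
# ... then assign each test example to the species with the closest centroid
pred_class <- apply(as.matrix(X[test_idx, ]), 1, function(x) {
  d <- colSums((t(cmat) - x)^2)   # squared Euclidean distance to each centroid
  as.character(centroids$Species[which.min(d)])
})
class_acc <- mean(pred_class == as.character(y[test_idx]))

## Regression: the output is numerical (Petal.Width)
fit <- lm(Petal.Width ~ Petal.Length, data = iris[-test_idx, ])
pred_num <- predict(fit, newdata = iris[test_idx, ])
rmse <- sqrt(mean((pred_num - iris$Petal.Width[test_idx])^2))

cat("classification accuracy:", class_acc, "\nregression RMSE:", rmse, "\n")
```

The same labelled data thus supports both tasks; what changes is whether the predicted output is a category or a number.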


In **unsupervised learning (UML)**, no labels are provided, and the learning algorithm focuses solely on detecting structure in unlabelled input data.

![Image](https://lh3.googleusercontent.com/-7o6_kF7DNYg/XNKgyK0diQI/AAAAAAAAbkY/qZAfJvAXMtcCKzUfcFuleSXXjRBKZUKaACK8BGAs/s0/2019-05-08.png)


**Example:**

Using the iris data as an example, for UML, we would have 4 features for each unlabelled example.

* Observations, examples or simply data points along the rows
* Features or variables along the columns

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|--------------|-------------|--------------|-------------|
| 5.1          | 3.5         | 1.4          | 0.2         |
| 4.9          | 3.0         | 1.4          | 0.2         |
| 4.7          | 3.2         | 1.3          | 0.2         |
| 4.6          | 3.1         | 1.5          | 0.2         |
| 5.0          | 3.6         | 1.4          | 0.2         |
| 5.4          | 3.9         | 1.7          | 0.4         |
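
A minimal sketch of unsupervised learning on the unlabelled features above, using `kmeans()` from base R's `stats` package (k-means is one of the clustering methods covered in the course). Asking for 3 clusters, `nstart = 20` and the seed value are assumptions made for illustration; the species labels play no part in the fitting and are only used afterwards to compare the clusters with the known classes.

```r
# Unlabelled input: keep only the 4 features and drop the Species column
data(iris)
X <- iris[, 1:4]
dim(X)   # 150 observations (rows) x 4 features (columns)

# k-means starts from random centres, so fix the seed for reproducibility
set.seed(42)
km <- kmeans(X, centers = 3, nstart = 20)

# Compare the discovered clusters with the species (not used for fitting)
tab <- table(cluster = km$cluster, species = iris$Species)
print(tab)
```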


Note that there are also semi-supervised learning approaches, which use labelled data to guide unsupervised learning on the unlabelled data, for example to identify and annotate new classes in the dataset (also called novelty detection).

In **reinforcement learning**, the learning algorithm performs a task using feedback from operating in a real or synthetic environment.

![Image](https://lh3.googleusercontent.com/-RmJ8tEtvK2s/XNKjnGRf8uI/AAAAAAAAbkk/qWu7CxHhzkQ-JjuKgCZOwAXTHwYwvlqxgCK8BGAs/s512/2019-05-08.png)

![Image](https://lh3.googleusercontent.com/-1Ix3QrRWYDg/XPCuy2LXF5I/AAAAAAAAc0c/Gc6aTKnCRf8BltV3dbvk7derYd5gXgv6wCK8BGAs/s0/2019-05-30.png)




**Note:**

We will be using, directly or indirectly, the following packages throughout the chapters:

1. caret
2. ggplot2
3. mlbench
4. class
5. caTools
6. randomForest
7. impute
8. ranger
9. kernlab
10. glmnet
11. naivebayes
12. rpart
13. rpart.plot
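
A small helper, assuming a standard R installation, to check which of these packages are still missing. Note that `impute` is distributed through Bioconductor rather than CRAN, so it is installed with `BiocManager::install()` instead of `install.packages()`; the installation lines are left commented out because they need internet access.

```r
# Packages used in the course, split by where they are distributed
cran_pkgs <- c("caret", "ggplot2", "mlbench", "class", "caTools",
               "randomForest", "ranger", "kernlab", "glmnet",
               "naivebayes", "rpart", "rpart.plot")
bioc_pkgs <- "impute"   # available from Bioconductor, not CRAN

# TRUE/FALSE per package: can its namespace be loaded?
is_installed <- vapply(c(cran_pkgs, bioc_pkgs),
                       function(p) requireNamespace(p, quietly = TRUE),
                       logical(1))
missing_pkgs <- names(is_installed)[!is_installed]
print(missing_pkgs)

# To install whatever is missing (requires internet access):
# install.packages(setdiff(missing_pkgs, bioc_pkgs))
# if ("impute" %in% missing_pkgs) {
#   install.packages("BiocManager")
#   BiocManager::install("impute")
# }
```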

A more comprehensive list of machine learning libraries in R can be found in the [CRAN Task View for Machine Learning and Statistical Learning](https://cran.r-project.org/web/views/MachineLearning.html).
                                                                           
<h1 align='center'>  Example datasets</h1>

**Iris data in R:**

This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are *Iris setosa*, *versicolor*, and *virginica*.
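
Because the iris data ship with base R (in the `datasets` package), the description above can be checked directly. A minimal sketch:

```r
data(iris)
dim(iris)              # 150 rows (flowers) and 5 columns
str(iris)              # 4 numeric measurements (cm) and the Species factor
table(iris$Species)    # 50 flowers for each of the 3 species
summary(iris[, 1:4])   # range of each measurement
```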
                                                                       
   
   <h1>Iris setosa</h1>
   
![Iris setosa](https://lh3.googleusercontent.com/-SXJYVbzzXHk/XPC1IKeuodI/AAAAAAAAc0o/2JhjxeJP1rw79KmvvItlSkOmcY_gzdzngCK8BGAs/s0/2019-05-30.png)


<h1>Iris versicolor</h1>

![versicolor](https://lh3.googleusercontent.com/-bQIUbZjQQrM/XPC1KxZL8YI/AAAAAAAAc0s/Z-LnKtiDpC8x7Aq1Sbu4VijHRRrWMDL9ACK8BGAs/s0/2019-05-30.png)


<h1>Iris virginica</h1>

![virginica](https://lh3.googleusercontent.com/-1KmAgmZ-3fY/XPC1NRXWUAI/AAAAAAAAc0w/mnhHv-Ydn_w5XKwJ6qiwvxOYUDyzgN0ewCK8BGAs/s0/2019-05-30.png)