<a href="https://www.bigdatauniversity.com"><img src = "https://ibm.box.com/shared/static/wbqvbi6o6ip0vz55ua5gp17g4f1k7ve9.png" width = 300, align = "center"></a>

# <center>Machine Learning Basics</center>


<p><b>Machine Learning is a subset of artificial intelligence (AI) in which a system can "learn" from data without being explicitly programmed</b></p>

In this lab exercise, you will learn some basic ways to view and explore a dataset, such as inspecting its targets and class names. You will also get a basic understanding of how to use data to fit (train) a model and then use that model to make a prediction. This will serve as a building block for future labs!


### Some Notebook Commands
<p>In case you haven't used a Jupyter Notebook before, here are some quick, useful commands to get you started. The single-letter shortcuts work in command mode, so press ESC first to leave the cell editor.</p>
<ul>
    <li>Run a cell: CTRL + ENTER</li>
    <li>Insert a cell above the current cell: a</li>
    <li>Insert a cell below the current cell: b</li>
    <li>Change a cell to Markdown: m</li>
    <li>Change a cell to code: y</li>
</ul>

If you are interested in more keyboard shortcuts, go to <b> Help -> Keyboard Shortcuts </b>.


### Hello! We will start by introducing you to the digits dataset.

The digits dataset is made up of 1797 8x8 images, such as the one below.
<img src = "https://ibm.box.com/shared/static/psb68kpyyt0o6kbhcq88cwj7fuv7nlhq.png">
Each image is a handwritten digit from 0 to 9, stored as an 8x8 grid of pixel intensities. <br>
We can use this data to train a model that recognizes which digit a new 8x8 image shows! <br>
Sounds like we are <i>Classifying</i> data!

---
First we will need to <b>import</b> the load_digits function from **sklearn** and use it to load the dataset into a variable.

In [None]:
from sklearn.datasets import load_digits
digits = load_digits()
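As a quick check on the claim that the dataset holds 1797 8x8 images, here is a minimal sketch that compares the images field of the returned object (an assumption worth verifying on your scikit-learn version) with the flattened rows in data:

```python
from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()

# images keeps each sample as an 8x8 grid of pixel intensities
print(digits.images.shape)   # (1797, 8, 8)
# data holds the same pixels flattened into 64 columns per sample
print(digits.data.shape)     # (1797, 64)

# Flattening the first image row by row reproduces the first row of data
first_flat = digits.images[0].reshape(-1)
print(np.array_equal(first_flat, digits.data[0]))  # True
```

The model in this lab works on the flattened data rows; images is mainly convenient for plotting.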

Now let's check out the <b>type</b> and <b>data</b> of digits. The type should be <i>Bunch</i>, a dictionary-like object that scikit-learn uses to return its built-in sample datasets.

In [None]:
# Python 3 syntax: print is a function
print(type(digits))
print(digits.data)

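In practice, you will rarely create Bunch objects yourself. However, the ones scikit-learn returns bundle the data together with useful metadata (targets, class names, a description), which makes them handy for beginners.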
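To make "dictionary-like" concrete, a minimal sketch: it assumes a scikit-learn version that exposes the class as sklearn.utils.Bunch (a dict subclass), and the toy Bunch below is purely hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.utils import Bunch

digits = load_digits()

# A Bunch is a dict subclass whose keys are also exposed as attributes
print(isinstance(digits, dict))            # True
print(sorted(digits.keys()))               # exact keys vary by version

# Key access and attribute access return the very same object
print(digits['target'] is digits.target)   # True

# Hypothetical toy Bunch: you can build one yourself, though you rarely need to
toy = Bunch(data=[[0, 1], [1, 0]], target=[1, 0])
print(toy.target)                          # [1, 0]
```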

---
Let's check out the <b>description</b> of this dataset for more information!

In [None]:
print(digits.DESCR)

We can see the categories that <i>classify</i> each of the images by accessing the <b>target</b> attribute. It holds one integer label per image, and each label corresponds to an entry in target_names.

In [None]:
print(digits.target)

Now if we print out the <b>target_names</b>, we can see the classes that the images are labeled with.

In [None]:
print(digits.target_names)
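The mapping between target and target_names described above can be made explicit by using each label as an index into target_names. A minimal sketch (the np.bincount class count is an extra illustration, not part of the original lab):

```python
from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()

# Each entry of target is an index into target_names
first_labels = digits.target_names[digits.target[:10]]
print(first_labels)

# For digits the class names are just 0-9, so the lookup is an identity map
print(np.array_equal(digits.target_names[digits.target], digits.target))  # True

# How many images belong to each class
print(np.bincount(digits.target))
```

For other datasets, such as iris, target_names holds strings, and the same indexing turns integer labels into readable names.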

An important piece of information to note is that the data is stored in NumPy arrays (numpy.ndarray), which are homogeneous multidimensional arrays: every element in an array shares the same data type.

In [None]:
print(type(digits.data))
print(type(digits.target))
print(type(digits.target_names))
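"Homogeneous" means each array has a single element type, exposed as its dtype. A minimal sketch inspecting the dtype and number of dimensions of the digits arrays (the exact integer dtype of target can differ between platforms):

```python
from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()

# Every ndarray has one dtype shared by all of its elements
print(digits.data.dtype)     # float64
print(digits.target.dtype)   # an integer dtype (platform dependent)

# ndim gives the number of dimensions: data is 2D, target is 1D
print(digits.data.ndim, digits.target.ndim)   # 2 1

# Pixel intensities are stored as floats but stay within 0-16
print(digits.data.min(), digits.data.max())
```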

Now let's confirm that the <b>shape</b> of the data and target match, that is, the first element of each shape is the same. <br>
<b>Note</b>: The shape of the data is a tuple, where the first field is the number of observations (images) and the second field is the number of attributes (the 64 pixel values of each 8x8 image).

In [None]:
print(digits.data.shape)
print(digits.target.shape)
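Instead of comparing the printed shapes by eye, the check can be written as an assertion that fails loudly if data and target ever disagree. A minimal sketch:

```python
from sklearn.datasets import load_digits

digits = load_digits()
n_samples, n_features = digits.data.shape

# One label per row: the first elements of the two shapes must agree
assert n_samples == digits.target.shape[0]

# Each row is an 8x8 image flattened into 64 features
print(n_samples, n_features)   # 1797 64
print(n_features == 8 * 8)     # True
```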

---
Next, we assign the data and target to the variables X and y, which will be used to fit (train) the model!

In [None]:
X = digits.data
y = digits.target

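First we need to <b>import svm</b> (scikit-learn's support vector machine module) and create a support vector classifier called clf, setting its gamma and C hyperparameters.
Now we can <b>fit</b> our model and <b>predict</b> the last digit, which should be 8. <br>
<b>Note</b>: predict expects a 2D array of shape (n_samples, n_features), so we pass the last row as the one-row slice X[-1:] rather than the 1D vector X[-1], which recent versions of scikit-learn reject with an error.

In [None]:
from sklearn import svm
# Support vector classifier (RBF kernel by default) with hand-picked gamma and C
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(X, y)
# X[-1:] keeps the 2D shape (1, 64) that predict expects
print('Prediction:', clf.predict(X[-1:]))
print('Actual:', y[-1])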
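The cell above trains on all 1797 samples, including the one it then predicts, so a correct answer says little about how the model handles unseen images. A minimal sketch of a fairer check, assuming scikit-learn's train_test_split and the classifier's score method; the 25% test size and random_state=0 are arbitrary choices, not taken from the lab:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn import svm

digits = load_digits()
X, y = digits.data, digits.target

# Hold the last image out of training entirely, then predict it
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(X[:-1], y[:-1])
print('Prediction:', clf.predict(X[-1:]), 'Actual:', y[-1])

# Estimate accuracy on a random, class-balanced quarter of the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf_eval = svm.SVC(gamma=0.001, C=100).fit(X_train, y_train)
accuracy = clf_eval.score(X_test, y_test)   # mean accuracy on the test set
print('Test accuracy:', accuracy)
```

Keeping test data out of fit is what makes the accuracy figure an estimate of performance on new digits rather than a measure of memorization.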

## Want to learn more?

IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems, and by your enterprise as a whole. A free trial is available through this course: [SPSS Modeler for Mac users](https://cocl.us/ML0101EN_SPSSMod_mac) and [SPSS Modeler for Windows users](https://cocl.us/ML0101EN_SPSSMod_win)

Also, you can use Data Science Experience to run these notebooks faster with bigger datasets. Data Science Experience is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, DSX enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of DSX users today with a free account at [Data Science Experience](https://cocl.us/ML0101EN_DSX)

---
# Additional Resources
<br>
Loading the digits dataset and plotting one of its images: http://scikit-learn.org/stable/auto_examples/datasets/plot_digits_last_image.html
<br><br>
Introduction to sklearn: http://scikit-learn.org/stable/tutorial/basic/tutorial.html
<br><br>
Difference between Machine Learning and Statistical Modelling: <br>
http://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/

<hr>
Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).