## Getting Started with Python Machine Learning
Machine learning teaches machines how to carry out tasks by themselves.
#### Machine learning and Python, the dream team
The process of coming up with a decent ML approach is never a waterfall-like process. Instead, you will see yourself going back and forth in your analysis, trying out different versions of your input data on diverse sets of ML algorithms. **It is this explorative nature that lends itself perfectly to Python.**
#### What the book will cover
This book will give you a broad overview of the types of learning algorithms that
are currently used in the diverse fields of machine learning and what to watch out for when applying them.
In reality, most of your time will be spent on rather mundane tasks:

1. Reading the data and cleaning it.
2. Exploring and understanding the input data.
3. Analyzing how best to present the data to the learning algorithm.
4. Choosing the right model and learning algorithm.
5. Measuring the performance correctly.

Often you will not feed your data directly into your ML algorithms. Instead, you will find that you can refine parts of the data before training. Many times, the machine learning algorithm will reward you with increased performance. You will even find that a simple algorithm with refined data generally outperforms a very sophisticated algorithm with raw data. This part of the machine learning workflow is called **feature engineering**, and it is generally a very exciting and rewarding challenge. Creative and intelligent as you are, you will immediately see the results.
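
As a minimal sketch of what refining data before training can look like, the snippet below cleans and rescales a tiny made-up feature matrix with NumPy. The data, the variable names, and the choice of standardization are assumptions for illustration only; the book does not prescribe this particular step.

```python
import numpy as np

# Hypothetical raw data: rows are samples, columns are two features.
# One value is missing (NaN).
raw = np.array([[50.0, 2.0],
                [80.0, 3.0],
                [np.nan, 4.0],
                [120.0, 5.0]])

# Cleaning: keep only the rows without missing values.
complete_rows = ~np.isnan(raw).any(axis=1)
clean = raw[complete_rows]

# Refining: standardize each column to zero mean and unit variance so that
# no feature dominates merely because of its scale.
refined = (clean - clean.mean(axis=0)) / clean.std(axis=0)

print(clean.shape)  # (3, 2)
print(refined)
```

Dropping incomplete rows is only one possible choice; depending on the data, filling in missing values may be more appropriate.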
#### What to do when stuck
Google is your best friend.

### Introduction to NumPy, SciPy, and Matplotlib

#### Installing Python
Recommended: [Anaconda]() or [Miniconda](); in general, anything based on [conda]().

#### Chewing data efficiently with NumPy and intelligently with SciPy
#### Learning NumPy

In [47]:
import numpy as np
np.version.full_version

'1.11.3'

In [48]:
a = np.array([0,1,2,3,4,5])
print(a)
print(a.ndim)
print(a.shape)

[0 1 2 3 4 5]
1
(6,)


Attention:
```python
# a = np.array(0, 1, 2, 3)   ## wrong: raises an error, the values must be passed as one sequence
a = np.array([0, 1, 2, 3])   ## correct: pass the values as a single list
```

*a* is one-dimensional.
Try the *reshape* function to turn it into a two-dimensional array:

In [49]:
b = a.reshape(3, 2) ## 3 rows and 2 cols
print(b)
print(b.ndim)
print(b.shape)
print(b[1,1]) ## shall print 3

[[0 1]
 [2 3]
 [4 5]]
2
(3, 2)
3
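
A small sketch of two further `reshape` behaviors that are not shown in the cell above: passing `-1` lets NumPy infer one dimension, and a target shape with the wrong number of elements is rejected. The variable name `two_cols` is made up for this example.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])

# -1 asks NumPy to infer that dimension from the array's size.
two_cols = a.reshape(-1, 2)
print(two_cols.shape)  # (3, 2)

# A shape whose element count differs from a.size is rejected.
try:
    a.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```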


Watch out: *b* is a view of *a*, not a copy. Both share the same underlying data, which means that if you modify *a*, you are modifying *b* simultaneously.

In [50]:
print(b)
a[3] = 30
print(b)

[[0 1]
 [2 3]
 [4 5]]
[[ 0  1]
 [ 2 30]
 [ 4  5]]


To get a **true** copy:

In [51]:
c = a.reshape(3,2).copy()
print(c)
c[1,1] = 1000
print(c)
print(a)

[[ 0  1]
 [ 2 30]
 [ 4  5]]
[[   0    1]
 [   2 1000]
 [   4    5]]
[ 0  1  2 30  4  5]
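
If it is unclear whether two arrays share data, NumPy can tell you. The following sketch rebuilds the arrays from the cells above and checks them with `np.shares_memory` and the `base` attribute; it is an illustration added to these notes, not part of the book's code.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
b = a.reshape(3, 2)          # a view on a's data
c = a.reshape(3, 2).copy()   # an independent copy

print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # False
print(b.base is a)             # True: b was derived from a
print(c.base is None)          # True: c owns its own data
```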


In [52]:
print(a * 2)
## with an ordinary Python list, * repeats the list instead
print([1,2,3] * 2)

[ 0  2  4 60  8 10]
[1, 2, 3, 1, 2, 3]


In [53]:
print(a ** 2)

[  0   1   4 900  16  25]


##### Indexing

In [54]:
print(a)
print(a[np.array([2,3,4])])
## print(a[2,3,4])  ## will not work
## print(a[(2,3,4)])  ## will not work
print(a[[2,3,4]])

[ 0  1  2 30  4  5]
[ 2 30  4]
[ 2 30  4]


##### Handling non-existing values

In [55]:
c = np.array([1, 2, np.nan, 3, 4])
print(c)
print(np.isnan(c))

[  1.   2.  nan   3.   4.]
[False False  True False False]


In [56]:
print(c[~np.isnan(c)])
print(np.mean(c[~np.isnan(c)]))

[ 1.  2.  3.  4.]
2.5
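
As an alternative to filtering by hand, NumPy also ships NaN-aware aggregation functions. The sketch below uses `np.nanmean` and `np.nansum` on the same data; it was added to these notes and is not taken from the book.

```python
import numpy as np

c = np.array([1, 2, np.nan, 3, 4])

print(np.mean(c))     # nan: a single NaN poisons the ordinary mean
print(np.nanmean(c))  # 2.5
print(np.nansum(c))   # 10.0
```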


##### Comparing runtime behavior

In [57]:
import timeit
normal_py_sec = timeit.timeit('sum(x*x for x in range(1000))', number=10000)
naive_np_sec = timeit.timeit('sum(na*na)',
                             setup="import numpy as np; na=np.arange(1000)",
                             number=10000)
good_np_sec = timeit.timeit('na.dot(na)',
                            setup="import numpy as np; na=np.arange(1000)",
                            number=10000)
print("Normal Python: %f sec"%normal_py_sec)
print("Naive NumPy: %f sec"%naive_np_sec)
print("Good NumPy: %f sec"%good_np_sec)


Normal Python: 1.328249 sec
Naive NumPy: 0.910891 sec
Good NumPy: 0.014344 sec
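
The naive version gains little because Python's built-in `sum` still walks over the array one element at a time. The sketch below first checks that all variants compute the same value and then times them, adding `np.sum` as a further variant that is not in the original cell. The timings depend on the machine and NumPy version, so no expected numbers are given.

```python
import timeit
import numpy as np

na = np.arange(1000)

# All three expressions compute the same sum of squares.
assert sum(na * na) == np.sum(na * na) == na.dot(na)

# timeit also accepts a callable instead of a code string.
builtin_sum_sec = timeit.timeit(lambda: sum(na * na), number=1000)
np_sum_sec = timeit.timeit(lambda: np.sum(na * na), number=1000)
dot_sec = timeit.timeit(lambda: na.dot(na), number=1000)

print("Built-in sum on array: %f sec" % builtin_sum_sec)
print("np.sum:                %f sec" % np_sum_sec)
print("dot:                   %f sec" % dot_sec)
```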
