# Important python libraries

 - `numpy` (numerics) + `scipy` (scientific functions) 
 - `matplotlib` - plotting
 - `astropy` - convenient operations on data for data science (`pandas` is an alternative)
 - `scikit-learn` - machine learning
 
We'll meet them very soon during the ML session.

## Hello numpy!

`numpy` is the core of scientific Python: the other libraries listed above build on its array type. It is the most convenient way to organize number-crunching in Python.

In [3]:
import numpy

In [4]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [6]:
x.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [4]:
x.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [5]:
# slicing has the same logic for lists / strings / tuples / numpy, etc
x[:4]

array([0, 1, 2, 3])

In [6]:
print(x[:3])
print(x[3:7])
print(x[7:])

[0 1 2]
[3 4 5 6]
[7 8 9]


### Vector operations

In [7]:
x = numpy.arange(10 ** 6)
# vector operations apply the same operation to every element: here each element is multiplied by 3, then 12 is added
3 * x + 12.

array([  1.20000000e+01,   1.50000000e+01,   1.80000000e+01, ...,
         3.00000300e+06,   3.00000600e+06,   3.00000900e+06])

In [8]:
# use the %timeit magic to see that this is quite fast
%timeit 3 * x + 12.

100 loops, best of 3: 6.26 ms per loop
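
To put that timing in context, the vectorized expression can be compared with a plain Python loop over the same values. The sketch below uses the standard-library `timeit` module instead of the `%timeit` magic so that it also runs outside a notebook. The helper names `with_loop` and `vectorized` and the smaller array size are illustrative choices, and the measured times depend on the machine.

```python
import timeit

import numpy

x = numpy.arange(10 ** 5)


def with_loop(values):
    # one Python-level multiplication and addition per element
    return [3 * v + 12. for v in values]


def vectorized(values):
    # a single expression; the loop over elements runs inside numpy
    return 3 * values + 12.


# both approaches compute the same numbers
same_result = numpy.allclose(with_loop(x), vectorized(x))
print(same_result)

loop_time = timeit.timeit(lambda: with_loop(x), number=5)
vector_time = timeit.timeit(lambda: vectorized(x), number=5)
print("loop: %.4f s, vectorized: %.4f s" % (loop_time, vector_time))
```

The vectorized version is typically much faster, because the per-element work happens in compiled code rather than in the Python interpreter.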


In [8]:
Z = numpy.arange(15).reshape(5, 3)
Z

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [9]:
numpy.log(numpy.exp(Z)) # the result is a float array: exp converted the integers to floats

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.],
       [ 12.,  13.,  14.]])
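
The type conversion can be checked directly through the `dtype` attribute. This is a small sketch; the variable names `roundtrip` and `back` are illustrative, and the exact integer type (`int32` or `int64`) depends on the platform, so only its kind is printed.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)
roundtrip = numpy.log(numpy.exp(Z))

print(Z.dtype.kind)      # 'i' means a signed integer type
print(roundtrip.dtype)   # float64

# the values agree with Z up to floating-point rounding
print(numpy.allclose(roundtrip, Z))

# converting back explicitly: round first, because astype truncates
back = numpy.round(roundtrip).astype(Z.dtype)
print(numpy.array_equal(back, Z))
```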

In [10]:
numpy.exp(Z)

array([[  1.00000000e+00,   2.71828183e+00,   7.38905610e+00],
       [  2.00855369e+01,   5.45981500e+01,   1.48413159e+02],
       [  4.03428793e+02,   1.09663316e+03,   2.98095799e+03],
       [  8.10308393e+03,   2.20264658e+04,   5.98741417e+04],
       [  1.62754791e+05,   4.42413392e+05,   1.20260428e+06]])

In [12]:
Z += 4

In [13]:
Z

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [15]:
Z[::2, ::2]

array([[ 4,  6],
       [10, 12],
       [16, 18]])

In [14]:
Z[[0, 2, 4], :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [16]:
Z.sum(axis=1)

array([15, 24, 33, 42, 51])

In [16]:
# axes are also zero-indexed
Z.sum(axis=0)

array([50, 55, 60])

In [17]:
Z.max(axis=1)

array([ 6,  9, 12, 15, 18])
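
One way to remember what `axis` means is that the given axis is the one that disappears from the shape. The sketch below uses a separate array `M` (an illustrative name) so that it does not modify `Z`.

```python
import numpy

M = numpy.arange(15).reshape(5, 3)

# axis=0 collapses the rows: one result per column
print(M.sum(axis=0).shape)                  # (3,)
# axis=1 collapses the columns: one result per row
print(M.sum(axis=1).shape)                  # (5,)
# without axis, all elements are reduced to a single number
print(M.sum())                              # 105
# keepdims=True keeps the reduced axis with length 1
print(M.sum(axis=1, keepdims=True).shape)   # (5, 1)
```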

In [19]:
Z2 = - Z
print(Z2)
Z2 = numpy.sort(Z2, axis=0)
Z2

[[ -4  -5  -6]
 [ -7  -8  -9]
 [-10 -11 -12]
 [-13 -14 -15]
 [-16 -17 -18]]


array([[-16, -17, -18],
       [-13, -14, -15],
       [-10, -11, -12],
       [ -7,  -8,  -9],
       [ -4,  -5,  -6]])

## Indexing with boolean array

In [19]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
x > 3

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [21]:
x[x < 7.4]

array([0, 1, 2, 3, 4, 5, 6, 7])
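
Boolean masks can also be combined and used for assignment. A short sketch; the name `y` is illustrative.

```python
import numpy

x = numpy.arange(10)

# element-wise AND, OR, NOT; the parentheses are required because
# & and | bind tighter than the comparison operators
print(x[(x > 2) & (x < 7)])   # [3 4 5 6]
print(x[(x < 2) | (x > 7)])   # [0 1 8 9]
print(x[~(x % 2 == 0)])       # [1 3 5 7 9]

# a mask can select the elements to overwrite
y = x.copy()
y[y > 5] = 0
print(y)                      # [0 1 2 3 4 5 0 0 0 0]

# True counts as 1, so sum() counts the matching elements
print((x > 3).sum())          # 6
```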

## Copies

Many numpy operations, such as slicing, do not create a copy: the result shares memory with the original array.

In [21]:
x = numpy.arange(10)
y = x[:5]

print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[10  1  2  3  4  5  6  7  8  9] [10  1  2  3  4]


This happened because `x` and `y` point __to the same place in memory__.

In [23]:
x = numpy.arange(10)
y = x[:5].copy()
print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9] [10  1  2  3  4]
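
Whether two arrays share memory can be checked with `numpy.shares_memory`. The sketch below also shows that, unlike basic slicing, indexing with a list or with a boolean mask returns a copy. The names `view`, `fancy` and `masked` are illustrative.

```python
import numpy

x = numpy.arange(10)

view = x[:5]            # basic slicing: a view on the same memory
fancy = x[[0, 1, 2]]    # indexing with a list: a copy
masked = x[x < 3]       # indexing with a boolean mask: a copy

print(numpy.shares_memory(x, view))     # True
print(numpy.shares_memory(x, fancy))    # False
print(numpy.shares_memory(x, masked))   # False

# a view remembers the array that owns its data
print(view.base is x)                   # True
```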


## Random numbers

The `numpy.random` module helps with generating random numbers.

In [33]:
# generating 10000 random numbers at once
a = numpy.random.normal(loc=2, scale=12, size=10000)
print(a)
print(numpy.mean(a))
print(numpy.std(a))


[ 16.07740998   0.44105277  -4.60855636 ...,  -6.62319333  19.7287692
  -0.87636797]
1.93988891859
11.9512347334
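
The numbers above change on every run. When reproducible results are needed, a seeded generator can be used. This sketch assumes numpy 1.17 or newer, where `numpy.random.default_rng` is available; the seed value 42 and the variable names are arbitrary.

```python
import numpy

# two generators created with the same seed produce the same sequence
rng = numpy.random.default_rng(42)
a = rng.normal(loc=2, scale=12, size=10000)

rng_again = numpy.random.default_rng(42)
b = rng_again.normal(loc=2, scale=12, size=10000)

print(numpy.array_equal(a, b))   # True

# the sample mean and standard deviation are close to loc and scale
print(a.mean(), a.std())
```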


## Sorting

In [45]:
x = numpy.random.random(size=1000)
x = numpy.sort(x)

In [46]:
print (x[:10])
print (x[-10:])

[ 0.00247927  0.00453332  0.00479916  0.00501733  0.00565004  0.00774708
  0.0089939   0.01019499  0.01316996  0.01385974]
[ 0.98777467  0.98990703  0.99070106  0.99177262  0.99183615  0.991899
  0.99629208  0.99700996  0.99745227  0.99890682]
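
Note that `numpy.sort` returns a sorted copy, while the `sort` method of an array sorts it in place. A small sketch with an illustrative array `values`:

```python
import numpy

values = numpy.array([3, 1, 2])

# numpy.sort returns a new array and leaves the input unchanged
sorted_copy = numpy.sort(values)
print(values, sorted_copy)       # [3 1 2] [1 2 3]

# reversing a sorted array gives descending order
descending = numpy.sort(values)[::-1]
print(descending)                # [3 2 1]

# the .sort() method works in place and returns None
values.sort()
print(values)                    # [1 2 3]
```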


## Arg...

The arg-functions (`argsort`, `argmin`, `argmax`) return indices instead of values, which allows writing non-trivial operations in a couple of lines.

In [60]:
# random.random generates uniform samples in [0, 1)
random_numbers = numpy.random.random(size=1000)
indices = numpy.argsort(random_numbers)

In [61]:
numpy.all(random_numbers[indices] == numpy.sort(random_numbers))

True

In [62]:
indices[:10]

array([213, 742, 339, 716, 391, 764, 489, 255, 946, 859], dtype=int64)

In [63]:
random_numbers.min(), random_numbers.max()

(0.0010924025809679883, 0.99621907786213859)

In [65]:
random_numbers.argmax(), random_numbers[random_numbers.argmax()]

(908, 0.99621907786213859)

In [66]:
random_numbers.argmin(), random_numbers[random_numbers.argmin()]

(213, 0.0010924025809679883)
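
A typical use of these indices is reordering one array by the values of another. In the sketch below, `names` and `scores` are made-up data for illustration.

```python
import numpy

names = numpy.array(["a", "b", "c", "d"])
scores = numpy.array([0.7, 0.1, 0.9, 0.4])

# indices that would sort scores in ascending order
order = numpy.argsort(scores)
print(names[order])              # ['b' 'd' 'a' 'c']

# names of the two highest scores, best first
top_two = names[order[::-1][:2]]
print(top_two)                   # ['c' 'a']

# name belonging to the single highest score
print(names[scores.argmax()])    # c
```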

## Exercise

In [67]:
import numpy as np

In [71]:
# 1. sample 1000 elements from a normal distribution
random_numbers = np.random.normal(size=1000)


In [76]:
# 2. keep only the positive numbers (from the previous step)
random_numbers = random_numbers[random_numbers > 0]


In [80]:
# 3. count the remaining numbers and compute their minimum, maximum, mean and variance
count_of_left_numbers = len(random_numbers)
minimum = random_numbers.min()
maximum = random_numbers.max()
mean = np.mean(random_numbers)
variance = np.var(random_numbers)
print(count_of_left_numbers, minimum, maximum, mean, variance)


495 0.00284172971055 2.9069574496 0.786903851071 0.32391998091


## References:
* `numpy` documentation: https://docs.scipy.org/doc/numpy/reference/
    * almost any question about `numpy` has already been answered on Stack Overflow
* [From python to numpy: a beautiful book about numpy](https://github.com/rougier/from-python-to-numpy)
* Data manipulation with `numpy`: tips and tricks [part1](http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html), [part2](http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html)
