# Programming for Data Science Series (Lab Session 13)
## Working with NumPy Arrays
### Objectives
* Understand what NumPy is
* Become familiar with Python's scientific library for numerical computation
* Create NumPy arrays
* Access array elements
* Transform NumPy arrays
* Save memory by choosing a suitable dtype

In [2]:
!python -m pip install --upgrade pip

Collecting pip
  Using cached https://files.pythonhosted.org/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-20.0.2


<b>The following code installs the NumPy library for Python (if it is not already installed).</b>

In [3]:
!pip install numpy



### Creating Numpy Array

In [2]:
import numpy as np
a = np.array([1,2,3,4,5,6])
print(a)

[1 2 3 4 5 6]


In [5]:
b = np.array([[ 1, 3, 5 ] , [ 2, 4, 6 ]])
print(b)

[[1 3 5]
 [2 4 6]]
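
Besides `np.array`, NumPy provides helper functions that build arrays without typing out every element. The sketch below shows three commonly used ones; the variable names are illustrative and are not part of the original lab.

```python
import numpy as np

# Array of zeros with a given shape (2 rows, 3 columns); the default dtype is float64
zeros = np.zeros((2, 3))

# Evenly spaced integers from 0 up to (but not including) 10, in steps of 2
evens = np.arange(0, 10, 2)

# Five evenly spaced values from 0.0 to 1.0, both endpoints included
grid = np.linspace(0.0, 1.0, 5)

print(zeros.shape)
print(evens)
print(grid)
```

`np.arange` excludes the stop value, while `np.linspace` includes it by default, which is a common source of off-by-one surprises.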


### Numpy Attributes
* Check the element type
 * e.g. a.dtype
* Check the number of dimensions
 * e.g. a.ndim
* Check the shape of the array, i.e. its length along each dimension
 * e.g. a.shape
* Get the number of bytes per element
 * e.g. a.itemsize
* Get the number of bytes used by the data portion of the array
 * e.g. a.nbytes

In [6]:
a.dtype

dtype('int32')

In [7]:
a.ndim

1

In [8]:
a.shape

(6,)

In [9]:
b.shape

(2, 3)

In [10]:
a.itemsize

4

In [15]:
a.nbytes

24
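
These attributes are related: `nbytes` equals the number of elements (`size`) multiplied by `itemsize`. The default integer dtype depends on the platform and NumPy version (int32 in the outputs shown here, int64 on many other systems), so the sketch below sets `dtype` explicitly to get the same numbers everywhere. It is a minimal illustration, not part of the original lab.

```python
import numpy as np

# Explicit dtype so the results do not depend on the platform default
a = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
b = np.array([[1, 3, 5], [2, 4, 6]], dtype=np.int32)

for name, arr in (("a", a), ("b", b)):
    # size is the total number of elements (the product of the shape)
    print(name, arr.dtype, arr.ndim, arr.shape, arr.size, arr.itemsize, arr.nbytes)
    # The data portion occupies size * itemsize bytes
    assert arr.nbytes == arr.size * arr.itemsize
```

Both arrays hold six 4-byte integers, so they use the same amount of memory even though their shapes differ.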

### Indexing, Slicing
* Use square brackets to get an item by its index
* Chain square brackets, or use a comma-separated index such as b[0, 1], for multi-dimensional arrays
* Select a range with the colon syntax start:stop:step (the stop index is excluded)

In [16]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[1:7:2]

array([1, 3, 5])

In [17]:
b[0][1]

3

In [18]:
b[1][2]

6

In [19]:
b[1][3]

IndexError: index 3 is out of bounds for axis 0 with size 3
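
The same selections can be written with a single comma-separated index, which also allows slicing along several axes at once. The following sketch reuses the array `b` from this lab and shows how an out-of-range index can be caught; the variable names are illustrative.

```python
import numpy as np

b = np.array([[1, 3, 5], [2, 4, 6]])

# b[0, 1] selects the same element as b[0][1] in a single indexing step
item = b[0, 1]

# A colon selects everything along an axis: here, column 2 of every row
last_column = b[:, 2]

# Row 1, columns 0 and 1 (the stop index 2 is excluded)
row_part = b[1, 0:2]

# Indexing past the end raises IndexError, which can be caught
try:
    b[1, 3]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(item, last_column, row_part)
print(error_message)
```

Unlike an out-of-range index, an out-of-range slice such as `b[1, 0:10]` does not raise an error; it is simply truncated to the available elements.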

### Shape Transformation
* Use the reshape method to change the shape of a NumPy array (the total number of elements must stay the same)
* Use the swapaxes method to interchange two axes of an array
* Use the flatten method to get a flattened, one-dimensional copy of the array

In [20]:
b.reshape(3,2)

array([[1, 3],
       [5, 2],
       [4, 6]])

In [25]:
b

array([[1, 3, 5],
       [2, 4, 6]])

In [26]:
b.swapaxes(0,1)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [27]:
b

array([[1, 3, 5],
       [2, 4, 6]])

In [29]:
b.swapaxes(1,0)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [34]:
c = b.flatten()
c

array([1, 3, 5, 2, 4, 6])

In [38]:
np.sort(c)

array([1, 2, 3, 4, 5, 6])
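
A few details of these methods are easy to miss: `reshape` can infer one dimension, `swapaxes(0, 1)` on a 2-D array matches the transpose, and `flatten` returns a copy rather than a view. The sketch below checks each point on the array `b` used above; it is an illustration under those assumptions, not an exhaustive reference.

```python
import numpy as np

b = np.array([[1, 3, 5], [2, 4, 6]])

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = b.reshape(3, -1)

# For a 2-D array, swapping axes 0 and 1 gives the same result as the transpose b.T
same_as_transpose = np.array_equal(b.swapaxes(0, 1), b.T)

# flatten returns a copy, so changing it does not affect b
flat = b.flatten()
flat[0] = 99

print(inferred.shape)
print(same_as_transpose)
print(b[0, 0], flat[0])
```

None of these calls modifies `b` in place, which is why `b` prints unchanged after `reshape` and `swapaxes` in the cells above.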

### Space saving using dtype
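* int32 and float64 use more memory per element than int8 (4 and 8 bytes versus 1 byte)
* If the values fit in a smaller type (for example, int8 holds -128 to 127) and the extra range or precision is not needed, set dtype to reduce memory use
* Smaller element sizes can also speed up operations, because less memory has to be moved
* The savings can be significant when working with large data sets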

In [3]:
data = np.arange(0,1000)
print(data.dtype)
print(data.nbytes)

int32
4000


In [7]:
sdata = np.arange(0, 1000, dtype='int16')  # int8 holds only -128..127, so 0..999 needs at least int16
print(sdata.dtype, type(sdata[1]))
print(sdata.nbytes)

int16 <class 'numpy.int16'>
2000
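
Shrinking a dtype is only safe when every value fits in the smaller type. `np.iinfo` reports the limits of an integer dtype, so the range can be checked before converting. In the sketch below, `fits_in` is a hypothetical helper written for this illustration; it is not a NumPy function.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every value in `values` lies within the range of integer `dtype`."""
    info = np.iinfo(dtype)
    return bool(values.min() >= info.min and values.max() <= info.max)

data = np.arange(0, 1000)

# Limits of int8
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)

# 0..999 does not fit in int8 but does fit in int16
print(fits_in(data, np.int8))
print(fits_in(data, np.int16))

# Convert only after confirming the values fit
small = data.astype(np.int16)
print(small.nbytes)
```

Checking the range first avoids silently wrapped values, which would save memory at the cost of corrupting the data.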
