**Scientific Computation (MKP3303)**


> R.U.Gobithaasan (2021). Scientific Computing, Lectures for Undergraduate Degree Program B.Sc (Applied Mathematics), Faculty of Ocean Engineering Technology & Informatics, University Malaysia Terengganu.
https://sites.google.com/site/gobithaasan/LearnTeach

<p align="center">
     © 2021 R.U. Gobithaasan All Rights Reserved.

</p>



**Chapter 3: Lists, Arrays, Vectors and Matrix Operations**   

**PART 1**: 

1. Types of Sequences in Python: Built-in containers

**PART 2: Previous Notebook**

2. Arrays                   
3. Column and row vector
4. Matrix representation

**PART 3**

5. Introduction to array operations 
6. Vector and Matrix Operations
7. Towards Higher dimensions
8. Reading and writing files
9. Bonus: Python for Data Analysis (Pandas)

**References:** 

- [NumPy](https://numpy.org/)
- Robert Johansson, Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib (2019, Apress).
>Source code listings for [Numerical Python - A Practical Techniques Approach for Industry](http://www.apress.com/9781484205549) (ISBN 978-1-484205-54-9). The source code listings can be downloaded from http://www.apress.com/9781484205549

- VanderPlas, Jacob T,  Python data science handbook: essential tools for working with data, O'Reilly Media, 2017. This book is made available [online](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) 
>The source code listings can be downloaded from [Jake's GitHub](https://github.com/jakevdp/PythonDataScienceHandbook)

- Travis E. Oliphant (creator of NumPy), [Guide to NumPy](https://web.mit.edu/dvp/Public/numpybook.pdf)

---
**PART 3**

# Elementwise operation


In [1]:
import numpy as np
np.__version__

'1.20.2'

## Introduction to Array Operations

- Operations are applied element by element (elementwise).

In [2]:
b1 = np.array([[ 0,  1,  2,  3],
                [ 4,  5,  6,  7],
                [ 0,  0,  0,  0]])
print(b1)
print(b1.shape)

b2 = np.ones((3,4))
print(b2)
print(b2.shape)

[[0 1 2 3]
 [4 5 6 7]
 [0 0 0 0]]
(3, 4)
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
(3, 4)


- Thus, **matrix addition**, which is carried out elementwise, is possible when both arrays have the same shape.

In [3]:
b3 = b1 + b2
print(b3.shape)
b3

(3, 4)


array([[1., 2., 3., 4.],
       [5., 6., 7., 8.],
       [1., 1., 1., 1.]])

- Elementwise operations can still be carried out by **broadcasting** when the two shapes are compatible, that is, when the smaller array can be stretched to match the larger one.

- The one-dimensional **row array** `b4` is matched against each row of the two-dimensional `b1`:

In [4]:
b4 = np.empty(4) #creating a row vector
b4.fill(2)
b4
b4.shape

(4,)

In [5]:
print(b1.shape)
print(b4.shape)

print(b1)
print(b4)

b5 = b1 + b4
b5

(3, 4)
(4,)
[[0 1 2 3]
 [4 5 6 7]
 [0 0 0 0]]
[2. 2. 2. 2.]


array([[2., 3., 4., 5.],
       [6., 7., 8., 9.],
       [2., 2., 2., 2.]])

- Broadcasting fails when the shapes are incompatible, for example a length-3 array against the 4 columns of `b1`:

In [6]:
print(np.ones(3))
# b1 + np.ones(3) # this will not run because the dimensions do not match

[1. 1. 1.]
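
The broadcasting rule behind the last few cells can be checked directly. NumPy compares the shapes starting from the last dimension; two dimensions are compatible when they are equal or when one of them is 1. A minimal sketch, where the helper name `can_broadcast` is made up for illustration:

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Return True if two shapes are compatible under NumPy's broadcasting rule."""
    # Compare dimensions from the last one backwards; missing leading
    # dimensions are treated as 1 and never cause a conflict.
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

b1 = np.arange(12).reshape(3, 4)

print(can_broadcast(b1.shape, (4,)))    # row array: compatible
print(can_broadcast(b1.shape, (3, 1)))  # column array: compatible
print(can_broadcast(b1.shape, (3,)))    # trailing 4 vs 3: not compatible

# NumPy raises ValueError when the shapes cannot be broadcast.
try:
    b1 + np.ones(3)
except ValueError as err:
    print("broadcast failed:", err)
```
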


- A **column array** `b6` of shape `(3, 1)` is matched against each column of `b1`:

In [7]:
b6 = np.array([[3],[3],[3]])
print(b6)
print(b6.shape)

print(b1)
print(b6)

b7 = b1 + b6
b7

[[3]
 [3]
 [3]]
(3, 1)
[[0 1 2 3]
 [4 5 6 7]
 [0 0 0 0]]
[[3]
 [3]
 [3]]


array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10],
       [ 3,  3,  3,  3]])

- Other elementwise arithmetic operations include scalar multiplication, subtraction, division and multiplication.

In [8]:
print(5* b1)

[[ 0  5 10 15]
 [20 25 30 35]
 [ 0  0  0  0]]


In [9]:
print(b1)

[[0 1 2 3]
 [4 5 6 7]
 [0 0 0 0]]


In [10]:
print(b7)

[[ 3  4  5  6]
 [ 7  8  9 10]
 [ 3  3  3  3]]


In [11]:
print(b1/b7)

[[0.         0.25       0.4        0.5       ]
 [0.57142857 0.625      0.66666667 0.7       ]
 [0.         0.         0.         0.        ]]


In [12]:
print(b1*b7)

[[ 0  4 10 18]
 [28 40 54 70]
 [ 0  0  0  0]]


In [13]:
a1 = np.random.normal(0, 3, size = 6) # a 1D array of 6 normal samples (mean 0, standard deviation 3)
print(a1.shape)
print(a1)

(6,)
[ 1.84169434  0.7394138   1.65964174  3.52766895 -1.01128927 -5.2779993 ]


In [14]:
b7 = np.ceil(a1) 
print(b7)
np.sign(b7)

[ 2.  1.  2.  4. -1. -5.]


array([ 1.,  1.,  1.,  1., -1., -1.])

In [15]:
np.power(b7,3)

array([   8.,    1.,    8.,   64.,   -1., -125.])

### User Defined Function for array processing

In [16]:
np.cos(np.pi)

-1.0

- We can also process a whole array with a user-defined function.
- $f_1(x) = \cos(x)$

In [17]:
def f1(x):
    return np.cos(x)

In [18]:
Xvalues = np.linspace(- np.pi, np.pi, 10)
print(Xvalues.shape)
print(Xvalues)

b8 = f1(Xvalues)
print(b8)
print(b8.shape)

(10,)
[-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585  0.34906585
  1.04719755  1.74532925  2.44346095  3.14159265]
[-1.         -0.76604444 -0.17364818  0.5         0.93969262  0.93969262
  0.5        -0.17364818 -0.76604444 -1.        ]
(10,)


- NumPy provides many built-in functions for elementwise operations.

- A user-defined function written for scalars can be applied elementwise by wrapping it with `np.vectorize`, as shown for the function below.

$f(x) = \begin{cases} 
          \frac{x}{2} & x\leq 0 \\
          0 & x> 0
       \end{cases}
$


In [19]:
def f3(x):
    if x <= 0:
        return (x/2)
    else:
        return 0

In [20]:
f3(2)

0

In [21]:
print(Xvalues)
# b9 = f3(Xvalues) # this will not run because the function does not accept an array as input

[-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585  0.34906585
  1.04719755  1.74532925  2.44346095  3.14159265]


In [22]:
print(b8)
f3 = np.vectorize(f3)
b9 = f3(Xvalues)

[-1.         -0.76604444 -0.17364818  0.5         0.93969262  0.93969262
  0.5        -0.17364818 -0.76604444 -1.        ]
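
`np.vectorize` is a convenience wrapper that still calls the Python function once per element. For a simple piecewise rule like the one above, the same result can be obtained with `np.where`. A minimal sketch; the names `f3_scalar` and `f3_where` are chosen here for illustration:

```python
import numpy as np

def f3_scalar(x):
    # Scalar version of the piecewise function: x/2 for x <= 0, else 0.
    if x <= 0:
        return x / 2
    return 0.0

def f3_where(x):
    # Array version: np.where picks x/2 where the condition holds, 0 elsewhere.
    return np.where(x <= 0, x / 2, 0.0)

Xvalues = np.linspace(-np.pi, np.pi, 10)

# otypes=[float] fixes the output type instead of inferring it from the first result.
b9_vectorized = np.vectorize(f3_scalar, otypes=[float])(Xvalues)
b9_where = f3_where(Xvalues)

print(np.allclose(b9_vectorized, b9_where))
```
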


In [26]:
print(b1new.sort(1)) # sort(1) sorts each row in place and returns None
b1new

None


array([[0, 0, 4],
       [0, 1, 5],
       [0, 2, 6],
       [0, 3, 7]])

### Conditional Expression on arrays

In [27]:
b10 =  np.random.normal(size = (5))
b11 =  np.random.normal(size = (5))
print(b10)
print(b11)

[ 0.09354843  0.77783429  0.12278899  0.02940752 -0.55495501]
[-0.46913311  0.25676652  0.01134636 -1.13412403  0.55634109]


In [28]:
b10 > b11

array([ True,  True,  True,  True, False])

In [29]:
np.all( b10 > -2) #all elements must satisfy condition to be True

True

In [30]:
np.any( b10 > 1)  #any element may satisfy condition to be True

False

In [31]:
np.all(b10 > b11)

False

In [32]:
# where an element is negative, replace it with its absolute value; otherwise replace it with 0
print(b11)
np.where( b11 < 0, abs(b11), 0 )

[-0.46913311  0.25676652  0.01134636 -1.13412403  0.55634109]


array([0.46913311, 0.        , 0.        , 1.13412403, 0.        ])
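
If the intent is instead to flip the negative entries and leave the others unchanged, the third argument of `np.where` can be the array itself. A small sketch with made-up sample values:

```python
import numpy as np

sample = np.array([-0.5, 0.25, 0.0, -1.1, 0.6])  # illustrative values

# Negative entries are replaced by their absolute value; the others are kept.
flipped = np.where(sample < 0, np.abs(sample), sample)
print(flipped)

# For this particular rule the result equals np.abs(sample).
print(np.array_equal(flipped, np.abs(sample)))
```
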

### Functions that take an array and return a scalar

In [23]:
print(b9)
print(b9.sum())
print(b9.prod())

[-1.57079633 -1.22173048 -0.87266463 -0.52359878 -0.17453293  0.
  0.          0.          0.          0.        ]
-4.363323129985824
-0.0


-  descriptive statistics

In [24]:
print(np.mean(b9))
print(b9.min())
print(b9.max())
print(b9.mean())
print(b9.std())
print(b9.var())

-0.43633231299858244
-1.5707963267948966
0.0
-0.43633231299858244
0.5587780017872718
0.31223285528137634


- operation on arrays

In [25]:
print(b1)
b1new=b1.transpose()
print(b1new)

[[0 1 2 3]
 [4 5 6 7]
 [0 0 0 0]]
[[0 4 0]
 [1 5 0]
 [2 6 0]
 [3 7 0]]


### Set Like operations

In [33]:
b11 = np.ceil(np.random.normal(10, 3, size = 6)) # 6 random numbers centered at 10 with a standard deviation of 3, rounded up
b12 = np.ceil(np.random.normal(10, 3, size = 6)) 
print(b11)
print(b12)

[ 6. 15.  6.  7. 15. 10.]
[ 9. 11. 12.  9.  7. 11.]


In [34]:
print(np.unique(b11))
print(np.unique(b12))
print(np.in1d(b11,b12)) # for each element of b11, in order, check whether it also appears in b12
print(np.intersect1d(b11,b12))
print(np.union1d(b11,b12))

[ 6.  7. 10. 15.]
[ 7.  9. 11. 12.]
[False False False  True False False]
[7.]
[ 6.  7.  9. 10. 11. 12. 15.]


In [35]:
5 in b11

False

In [36]:
12 in b11

False
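
Two related helpers may be useful here. `np.isin` performs the same membership test as `np.in1d` and is the form recommended for new code in the NumPy documentation, and `np.setdiff1d` returns the values found in the first array but not in the second. A sketch using the values printed above:

```python
import numpy as np

b11 = np.array([6., 15., 6., 7., 15., 10.])
b12 = np.array([9., 11., 12., 9., 7., 11.])

# Elementwise membership test: is each entry of b11 present in b12?
print(np.isin(b11, b12))

# Sorted unique values that are in b11 but not in b12.
print(np.setdiff1d(b11, b12))
```
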

## Linear Algebra: selected functions
read the documentation [online](https://numpy.org/doc/stable/reference/routines.linalg.html)

In [37]:
np.linalg?

Type:        module
String form: <module 'numpy.linalg' from '/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/linalg/__init__.py'>
File:        /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/linalg/__init__.py
Docstring:  
``numpy.linalg``

The NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient
low level implementations of standard linear algebra algorithms. Those
libraries may be provided by NumPy itself using C versions of a subset of their
reference implementations but, when possible, highly optimized libraries that
take advantage of specialized processor functionality are preferred. Examples
of such libraries are OpenBLAS, MKL (TM), and ATLAS. Because those libraries
are multithreaded and processor dependent, environmental variables and external
packages such as threadpoolctl may be needed to control the number of threads
or specify the proce

## Vector operations

- examples of two vectors in ${\mathbb{R}}^3$

In [38]:
u = np.array([[5, 4, 3]])
v = np.array([[1, 1, 1]])
print(u.shape)
print(v.shape)

(1, 3)
(1, 3)


- addition 
- scalar multiplication
- dot product

- We can transpose a vector using `.T`  or `transpose()` function.

In [39]:
u2 = np.transpose(u)
#u2 =  u.T
print(u2.shape)
print(u2)

(3, 1)
[[5]
 [4]
 [3]]


In [40]:
print('addition u + v = ', u+v)
print('scalar multiplication 2*v = ', 2*v)

addition u + v =  [[6 5 4]]
scalar multiplication 2*v =  [[2 2 2]]


- We can use the built-in function `vdot` to compute the dot product of two vectors; the output is a scalar. `vdot` flattens its arguments first, so the vectors can be given as rows or columns and `transpose` is unnecessary.

In [41]:
print(u)
print(v)
print('u.v = ', np.vdot(u,v))

print('v . u.T = ', np.vdot(u.T,v))# can be either in a row or column vector 

[[5 4 3]]
[[1 1 1]]
u.v =  12
v . u.T =  12


- We can also use the built-in function `dot`. For two-dimensional arrays it performs matrix multiplication, so the inner dimensions must match and the result is a $1 \times 1$ array rather than a scalar.

In [42]:
#print('u.v = ', np.dot(u,v)) # cannot as the dimension does not match. You need to transpose one vector before multiplying

In [43]:
print('u.v = ', np.dot(u,v.T)) # inner dimensions must match: (1,3) times (3,1)

u.v =  [[12]]
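
The shape requirement comes from `u` and `v` being two-dimensional arrays of shape `(1, 3)`. With one-dimensional arrays, `np.dot` and the `@` operator return the scalar inner product without any transpose. A short sketch; `u1d` and `v1d` are names introduced here:

```python
import numpy as np

u = np.array([[5, 4, 3]])   # shape (1, 3)
v = np.array([[1, 1, 1]])   # shape (1, 3)

# ravel() flattens to one dimension, shape (3,)
u1d = u.ravel()
v1d = v.ravel()

print(np.dot(u1d, v1d))   # scalar inner product
print(u1d @ v1d)          # same result with the @ operator
print((u @ v.T).shape)    # 2D inputs give a (1, 1) matrix instead
```
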


- Let $v_1 = [1, 2, 3]$ and $v_2 = [4, 5, 6]$. Vector operation of `outer` maps two vectors $v_1$ and $v_2$ into a matrix:

> first row: 1*[4 5 6]

> second row: 2*[4 5 6]

> third row: 3*[4 5 6]

In [44]:
v1 = np.arange(1,4)
print(v1)
print(v1.shape)
v2 = np.arange(4,7)
print(v2)
print(v2.shape)
v3 = np.outer(v1,v2)
print('out product of v1 and v2', v3) # can be either in a row or column vector 
print(v3.shape)

[1 2 3]
(3,)
[4 5 6]
(3,)
out product of v1 and v2 [[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
(3, 3)


- The **$L_2$ norm** of a vector is a measure of its length and is denoted $\Vert v \Vert_{2} = \sqrt{\sum_i v_i^2}$. Below we find the norm of $u$.

In [45]:
l2= np.sqrt(np.dot(u,u.T))
print(l2)

[[7.07106781]]


- We use the `linalg.norm()` function to find the **$L_2$ norm**:

In [46]:
print("L2 norm of u: ", np.linalg.norm(u))

L2 norm of u:  7.0710678118654755
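
`np.linalg.norm` also accepts an `ord` argument for other vector norms. The sketch below uses a one-dimensional copy of `u` (named `u1d` here), since for 2D input `ord` selects matrix norms instead:

```python
import numpy as np

u1d = np.array([5, 4, 3])

l1 = np.linalg.norm(u1d, ord=1)         # sum of absolute values
l2 = np.linalg.norm(u1d)                # Euclidean length (default)
linf = np.linalg.norm(u1d, ord=np.inf)  # largest absolute value

print(l1, l2, linf)
```
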


The **cross product** between two vectors, $v$ and $w$, is written $v \times w$. The geometric interpretation of the cross product is a vector perpendicular to both $v$ and $w$ with length equal to the area enclosed by the parallelogram created by the two vectors.

In [47]:
print('cross product u x v = ',np.cross(u, v))
print('cross product u x 2u = ',np.cross(u, 2*u)) #parallel vectors

cross product u x v =  [[ 1 -2  1]]
cross product u x 2u =  [[0 0 0]]
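
The geometric description can be checked numerically: the cross product should have zero dot product with both inputs, and its length should equal $\Vert u \Vert \Vert v \Vert \sin\theta$. A minimal sketch using one-dimensional versions of `u` and `v`:

```python
import numpy as np

u = np.array([5, 4, 3])
v = np.array([1, 1, 1])

w = np.cross(u, v)

# Perpendicular to both inputs: both dot products are zero.
print(np.dot(w, u), np.dot(w, v))

# Length of w equals |u||v|sin(theta), the parallelogram area.
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
area = np.linalg.norm(u) * np.linalg.norm(v) * np.sqrt(1 - cos_theta**2)
print(np.isclose(np.linalg.norm(w), area))
```
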


## Matrix operation in NumPy

In [48]:
m1 = np.arange(1,7).reshape(2,3)
print(m1)
print(m1.shape)
m2 = np.arange(7,13).reshape(3,2)
print(m2)
print(m2.shape)

[[1 2 3]
 [4 5 6]]
(2, 3)
[[ 7  8]
 [ 9 10]
 [11 12]]
(3, 2)


- the `multiply` function operates elementwise in NumPy arrays.

In [49]:
np.multiply(3,m1)

array([[ 3,  6,  9],
       [12, 15, 18]])

- elementwise product operator is  `*`

In [50]:
3*m1

array([[ 3,  6,  9],
       [12, 15, 18]])

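- The matrix product of two matrices is computed with the `dot` function; the number of columns of the first matrix must equal the number of rows of the second.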

In [51]:
print(m1.shape)
print(m2.shape)
print('Using dot: m1.m2 = ', np.dot(m1,m2))

(2, 3)
(3, 2)
Using dot: m1.m2 =  [[ 58  64]
 [139 154]]


- The matrix product can be performed using the `@` operator

In [52]:
m1 @ m2

array([[ 58,  64],
       [139, 154]])

- `cross` applied to two 2D arrays computes the cross product row by row:

In [53]:
m3 = np.arange(14,20).reshape(2,3)
print(m3)
print('m1 x m3 = ',np.cross(m1,m3))

[[14 15 16]
 [17 18 19]]
m1 x m3 =  [[-13  26 -13]
 [-13  26 -13]]


### Linear Systems

$$
x_1 + 3 x_2 = 1
$$

$$
5 x_1 - x_2 = 3
$$

- We can represent the system of linear equations above in the matrix form $A x = b$:

In [54]:
A = np.array([[1, 3], [5, -1]])
print('Matrix A = ', A)
print(A.shape)

b = np.array([1, 3])
print('Matrix b = ', b)
print(b.shape)

Matrix A =  [[ 1  3]
 [ 5 -1]]
(2, 2)
Matrix b =  [1 3]
(2,)


We can solve this system by multiplying both sides by the inverse of $A$: 
$$ 
A^{-1}A x = A^{-1}b \\
I x = A^{-1}b\\
 x = A^{-1}b
$$
where $I$ is the identity matrix, $A$ is a square matrix and $A^{-1}$ denotes its inverse.

---
A **square matrix** has the same number of rows as columns. Its determinant is denoted $\det(M)$ or $|M|$. For example, for a $2 \times 2$ matrix the determinant is:
$$
|M| = \begin{vmatrix}
a & b \\
c & d\\
\end{vmatrix} = ad - bc$$

- If the determinant is $|M|=0$, then the matrix is **singular**, thus there is no inverse for the matrix. 
- If $|M|\neq 0$, the matrix is **nonsingular**, hence we can compute the inverse.

- we can compute $|A|$ manually:

In [55]:
A_Determinent = A[1,1]*A[0,0]-A[0,1]*A[1,0]
print(A_Determinent)

-16


We can also use the `linalg.det(M)` function to compute the determinant:

In [56]:
print(np.linalg.det(A))

-15.999999999999998
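
The tiny difference between `-16` and `-15.999999999999998` is floating-point rounding: `linalg.det` works through a numerical factorisation rather than the closed-form formula. Comparing with `np.isclose` instead of `==` avoids being misled by it. A minimal sketch; `det_manual` and `det_numpy` are names introduced here:

```python
import numpy as np

A = np.array([[1, 3], [5, -1]])

det_manual = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]  # exact integer arithmetic
det_numpy = np.linalg.det(A)                        # floating-point result

print(np.isclose(det_numpy, det_manual))  # comparison with a tolerance
```
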


---
The **inverse** of a square matrix $M$ is a matrix of the same size, $N$, such that $M \cdot N = I$.  A matrix is said to be **invertible** if it has an inverse. The inverse of a matrix is unique; that is, for an invertible matrix, there is only one inverse for that matrix. For a $2 \times 2$ matrix, the analytic solution of the matrix inverse is:

$$
M^{-1} = \begin{bmatrix}
a & b \\
c & d\\
\end{bmatrix}^{-1} = \frac{1}{|M|}\begin{bmatrix}
d & -b \\
-c & a\\
\end{bmatrix}$$


- we can do this manually as well:

In [57]:
A_Inverse =(1/A_Determinent) * np.array([[A[1,1], -1*A[0,1]],[-1*A[1,0],A[0,0]]])
print(A_Inverse)

[[ 0.0625  0.1875]
 [ 0.3125 -0.0625]]


The inverse can be computed in Python using the function `inv` from NumPy's `linalg` package.

In [58]:
print(np.linalg.inv(A))

[[ 0.0625  0.1875]
 [ 0.3125 -0.0625]]


---
We can now solve the system above manually:

In [59]:
x1, x2 = np.round(np.dot(A_Inverse,b),5)
print('The solutions: (x1,x2) = ', (x1,x2))

The solutions: (x1,x2) =  (0.625, 0.125)


We can also use the function `linalg.solve`

In [60]:
np.linalg.solve(A,b)

array([0.625, 0.125])
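
Whichever route is taken, the solution can be checked by substituting it back into $Ax = b$. The sketch below also compares the two approaches; `linalg.solve` is generally preferred over forming the inverse explicitly because it does less work and tends to be more accurate:

```python
import numpy as np

A = np.array([[1, 3], [5, -1]])
b = np.array([1, 3])

x_solve = np.linalg.solve(A, b)   # solves A x = b directly
x_inv = np.linalg.inv(A) @ b      # same system via the explicit inverse

print(np.allclose(A @ x_solve, b))   # residual check
print(np.allclose(x_solve, x_inv))   # both routes agree
```
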

 ---
Eigenvalues are a **special set of scalars** associated with a linear system of equations (i.e., a matrix equation) that are sometimes also known as **characteristic values**. Let $M$ be a square matrix. A non-zero vector $v$ is an **eigenvector** for M with **eigenvalue** $\lambda$ if

$$
M v = \lambda v \\
M v - \lambda v = 0 \\
(M -\lambda I) v = 0
$$
Non-trivial solutions exist only if the matrix $(M -\lambda I)$ is singular, which means $|(M -\lambda I)| = 0$.
In order to compute the eigenvalues, we construct the characteristic polynomial 

$$
p(\lambda) = |(M -\lambda I)|
$$



- we can find the eigenvalues  directly using `linalg.eig`

In [61]:
L, v = np.linalg.eig(A)
print(L)
print(v)

[ 4. -4.]
[[ 0.70710678 -0.51449576]
 [ 0.70710678  0.85749293]]


In [62]:
answerRHS = np.round(np.dot(A,v) - L*v,16)
print(answerRHS)

[[ 0.e+00 -4.e-16]
 [ 0.e+00 -4.e-16]]
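
For a $2 \times 2$ matrix the characteristic polynomial can be written out by hand as $p(\lambda) = \lambda^2 - \mathrm{tr}(M)\,\lambda + |M|$, so its roots can be compared with the output of `linalg.eig`. A sketch for the matrix `A` used above; `coeffs` and `roots` are names introduced here:

```python
import numpy as np

A = np.array([[1, 3], [5, -1]])

# Coefficients of lambda^2 - trace(A)*lambda + det(A), highest power first.
coeffs = [1, -np.trace(A), np.linalg.det(A)]
roots = np.roots(coeffs)

eigenvalues = np.linalg.eig(A)[0]

# Sort both before comparing, since the ordering is not guaranteed.
print(np.sort(roots), np.sort(eigenvalues))
```
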


# Towards higher dimensions:

A **three dimensional array** is a collection of 2D arrays. It is specified by a **three-tuple (block, row, column)**. Imagine a layered cake!
<img src="figures/layeredCake.jpg" alt="Drawing" style="width: 200px;"/>

Below is an example of a **three dimensional array** of shape `(2, 3, 4)`: 
- three nested brackets
- 2 tables stacked on top of each other, each with 3 rows and 4 columns.

In [63]:
h1=np.arange(1,25).reshape(2,3,4) 
print(h1.ndim)
print(h1.shape)
h1

3
(2, 3, 4)


array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]],

       [[13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24]]])

- layer 1: first table

In [64]:
h1[0]

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

- layer 2: second table

In [65]:
h1[1]

array([[13, 14, 15, 16],
       [17, 18, 19, 20],
       [21, 22, 23, 24]])
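
Individual entries and slices follow the same (block, row, column) order. A brief sketch of indexing and of reducing along the block axis:

```python
import numpy as np

h1 = np.arange(1, 25).reshape(2, 3, 4)

print(h1[1, 2, 3])     # block 1, row 2, column 3: the last element
print(h1[:, 0, :])     # first row of every block, shape (2, 4)
print(h1.sum(axis=0))  # add the two blocks together, shape (3, 4)
```
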

# Reading & Writing files
- **A comma-separated values (CSV) file** is a delimited text file that uses a comma to separate values.
- Each line of the file is a data record. 
- Each record consists of one or more fields, separated by commas. 
- The use of the comma as a field separator is the source of the name for this file format. 
- A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields.


### Writing an array into csv file

Let's see a simple example of generating 20 rows and 5 columns of data. 

In [66]:
dataset = np.arange(1,101).reshape(20,5)
dataset

array([[  1,   2,   3,   4,   5],
       [  6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15],
       [ 16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25],
       [ 26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35],
       [ 36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45],
       [ 46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55],
       [ 56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65],
       [ 66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75],
       [ 76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85],
       [ 86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95],
       [ 96,  97,  98,  99, 100]])

- We use the `np.savetxt` function. 
- The array above is saved in a folder called `data`, in a file named `dataset.csv`.
- The format `fmt = '%.2f'` writes each value with two decimal places.
- Each entry is separated by a comma.
- The `header` is a string naming the five columns.

You may open the folder and click on the file to view in MS Excel.

In [67]:
np.savetxt('data/dataset.csv', dataset, fmt = '%.2f', delimiter=',', header = 'X1, X2, X3, X4, X5')
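
A quick way to confirm what was written is to read the file back. By default `np.savetxt` prefixes the header with `# `, and `np.loadtxt` skips lines that start with `#`, so the header does not interfere. The sketch below writes to a temporary directory (an assumption made here so that it runs without an existing `data` folder):

```python
import os
import tempfile

import numpy as np

dataset = np.arange(1, 101).reshape(20, 5)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dataset.csv")
    np.savetxt(path, dataset, fmt="%.2f", delimiter=",",
               header="X1, X2, X3, X4, X5")

    # Read the first line to see how the header was stored.
    with open(path) as f:
        first_line = f.readline().strip()

    # The header line is treated as a comment and skipped.
    loaded = np.loadtxt(path, delimiter=",")

print(first_line)
print(loaded.shape)
print(np.array_equal(loaded, dataset))
```
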

### Reading a csv file
- using the `np.loadtxt` function. 
- The folder `data` contains a file named `populations.csv`. It describes the populations of hares and lynxes (and carrots) in northern Canada over the 21 years from 1900 to 1920:

In [68]:
samples = np.loadtxt('data/populations.csv', delimiter=',')
! head -n 6 'data/populations.txt'

# year	hare	lynx	carrot
1900	30e3	4e3	48300
1901	47.2e3	6.1e3	48200
1902	70.2e3	9.8e3	41500
1903	77.4e3	35.2e3	38200
1904	36.3e3	59.4e3	40600


In [69]:
print(samples)

[[ 1900. 30000.  4000. 48300.]
 [ 1901. 47200.  6100. 48200.]
 [ 1902. 70200.  9800. 41500.]
 [ 1903. 77400. 35200. 38200.]
 [ 1904. 36300. 59400. 40600.]
 [ 1905. 20600. 41700. 39800.]
 [ 1906. 18100. 19000. 38600.]
 [ 1907. 21400. 13000. 42300.]
 [ 1908. 22000.  8300. 44500.]
 [ 1909. 25400.  9100. 42100.]
 [ 1910. 27100.  7400. 46000.]
 [ 1911. 40300.  8000. 46800.]
 [ 1912. 57000. 12300. 43800.]
 [ 1913. 76600. 19500. 40900.]
 [ 1914. 52300. 45700. 39400.]
 [ 1915. 19500. 51100. 39000.]
 [ 1916. 11200. 29700. 36700.]
 [ 1917.  7600. 15800. 41800.]
 [ 1918. 14600.  9700. 43300.]
 [ 1919. 16200. 10100. 41300.]
 [ 1920. 24700.  8600. 47300.]]


In [70]:
year, hare, lynxe, carrot = samples.T  # trick: columns to variables

In [71]:
hare

array([30000., 47200., 70200., 77400., 36300., 20600., 18100., 21400.,
       22000., 25400., 27100., 40300., 57000., 76600., 52300., 19500.,
       11200.,  7600., 14600., 16200., 24700.])
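
`np.loadtxt` can also return the columns directly through its `unpack` argument. The sketch below reads the first five records shown above from an in-memory string rather than from the file, so the tab-separated layout is an assumption based on the `head` output:

```python
import io

import numpy as np

# First five records, copied from the head output above (tab-separated).
text = (
    "# year\thare\tlynx\tcarrot\n"
    "1900\t30e3\t4e3\t48300\n"
    "1901\t47.2e3\t6.1e3\t48200\n"
    "1902\t70.2e3\t9.8e3\t41500\n"
    "1903\t77.4e3\t35.2e3\t38200\n"
    "1904\t36.3e3\t59.4e3\t40600\n"
)

# unpack=True transposes the result, giving one array per column.
year, hare, lynx, carrot = np.loadtxt(io.StringIO(text), unpack=True)

# Year with the largest hare population in this small sample.
print(int(year[np.argmax(hare)]))
```
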

# Bonus: Python for Data Analysis (Pandas)
Feeling adventurous? Try using [Pandas](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html#min-tut-01-tableoriented):
> pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

In [72]:
#!pip install pandas

In [1]:
import pandas as pd
pd.__version__

'1.1.2'

A [DataFrame](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html#) is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.

In [2]:
populations = pd.read_csv("data/populations.csv")
type(populations)

pandas.core.frame.DataFrame

- The DataFrame below has 4 columns, each with a column label. The labels are `# year` (the `#` comes from the file header), `hare`, `lynx` and `carrot`.

- The columns `# year` and `carrot` hold `int64` data, and the columns `hare` and `lynx` hold `float64` data.

- we use `info` to inspect the DataFrame representation:

In [3]:
populations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   # year  21 non-null     int64  
 1   hare    21 non-null     float64
 2   lynx    21 non-null     float64
 3   carrot  21 non-null     int64  
dtypes: float64(2), int64(2)
memory usage: 800.0 bytes


- check out the first 5 rows:

In [4]:
populations.head(5)

| | # year | hare | lynx | carrot |
|---|---|---|---|---|
| 0 | 1900 | 30000.0 | 4000.0 | 48300 |
| 1 | 1901 | 47200.0 | 6100.0 | 48200 |
| 2 | 1902 | 70200.0 | 9800.0 | 41500 |
| 3 | 1903 | 77400.0 | 35200.0 | 38200 |
| 4 | 1904 | 36300.0 | 59400.0 | 40600 |


- Each column in a DataFrame is a **Series**. 
- If you are familiar with Python dictionaries, selecting a single column is very similar to selecting a dictionary value by its key.
- Let's extract the column `hare`

In [10]:
hare = populations["hare"]
type(hare)

pandas.core.series.Series

- we can compute the `mean` for hare alone.

In [11]:
populations["hare"].mean()

34080.95238095238

- compute basic statistics of the numerical data of populations

In [78]:
populations.describe()

| | # year | hare | lynx | carrot |
|---|---|---|---|---|
| count | 21.0 | 21.0 | 21.0 | 21.0 |
| mean | 1910.0 | 34080.952381 | 20166.666667 | 42400.0 |
| std | 6.204837 | 21413.981859 | 16655.99992 | 3404.555771 |
| min | 1900.0 | 7600.0 | 4000.0 | 36700.0 |
| 25% | 1905.0 | 19500.0 | 8600.0 | 39800.0 |
| 50% | 1910.0 | 25400.0 | 12300.0 | 41800.0 |
| 75% | 1915.0 | 47200.0 | 29700.0 | 44500.0 |
| max | 1920.0 | 77400.0 | 59400.0 | 48300.0 |


In [21]:
type(populations.describe())

pandas.core.frame.DataFrame
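
The column label `# year` carries over the comment marker from the file header. One way to tidy it is `DataFrame.rename`. The sketch below builds a small DataFrame from the first five rows shown earlier instead of reading the file; the name `sample` is introduced here for illustration:

```python
import pandas as pd

sample = pd.DataFrame({
    "# year": [1900, 1901, 1902, 1903, 1904],
    "hare": [30000.0, 47200.0, 70200.0, 77400.0, 36300.0],
    "lynx": [4000.0, 6100.0, 9800.0, 35200.0, 59400.0],
    "carrot": [48300, 48200, 41500, 38200, 40600],
})

# rename returns a new DataFrame; the original is left unchanged.
sample = sample.rename(columns={"# year": "year"})

# Select two columns with a list of labels, then take the column means.
print(sample[["hare", "lynx"]].mean())

# Year of the row with the largest hare population.
print(sample.loc[sample["hare"].idxmax(), "year"])
```
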