# pandas

Like NumPy, pandas is one of the most widely used Python libraries in data science. It is a high-performance, easy-to-use library for data analysis and manipulation.

https://pandas.pydata.org/

## pandas vs. NumPy

NumPy and pandas are often used together. pandas relies heavily on the NumPy ndarray to implement its data objects and shares many of the ndarray's features.
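A minimal sketch of that relationship, assuming a small hypothetical Series `s` that is not part of the original notebook: the values of a pandas object can be retrieved as an ndarray, and NumPy functions can be applied to pandas objects directly.

```python
import numpy as np
import pandas as pd

# hypothetical example data, used only for illustration
s = pd.Series([1.0, 4.0, 9.0])

values = s.to_numpy()      # the Series values as a NumPy array
print(type(values))        # <class 'numpy.ndarray'>

roots = np.sqrt(s)         # a NumPy ufunc applied to a pandas object
print(type(roots))         # the result is still a pandas Series
```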


## pandas DataFrame vs. NumPy ndarray

### ndarray

    A = np.array([[1,2,3],
                  [4,5,6]])

### DataFrame

![alt text](dataframe.jpg "data frame")
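To make the contrast concrete, here is a small sketch that wraps the array `A` from above in a DataFrame. The column names `a`, `b`, and `c` are hypothetical and chosen only for illustration. An ndarray is indexed by position, while a DataFrame column can be selected by its label.

```python
import numpy as np
import pandas as pd

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# hypothetical column labels, not taken from the original notebook
df_A = pd.DataFrame(A, columns=["a", "b", "c"])

second_col_array = A[:, 1]     # ndarray: select the second column by position
second_col_frame = df_A["b"]   # DataFrame: select the same column by label

print(second_col_array)
print(second_col_frame)
```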


In [2]:
# import numpy and pandas
import numpy as np
import pandas as pd

Create a matrix with columns x1, x2, and x3, where x1 holds the integers from 0 to 19, x2 = x1^2, and x3 = sqrt(x1).

In [3]:
x1 = np.arange(20)  # integers 0, ..., 19
x2 = x1 ** 2
x3 = np.sqrt(x1)

# stack the three arrays as rows, then transpose so each one becomes a column
X = np.array([x1, x2, x3]).T
print(X)
print(X.shape)

[[  0.           0.           0.        ]
 [  1.           1.           1.        ]
 [  2.           4.           1.41421356]
 [  3.           9.           1.73205081]
 [  4.          16.           2.        ]
 [  5.          25.           2.23606798]
 [  6.          36.           2.44948974]
 [  7.          49.           2.64575131]
 [  8.          64.           2.82842712]
 [  9.          81.           3.        ]
 [ 10.         100.           3.16227766]
 [ 11.         121.           3.31662479]
 [ 12.         144.           3.46410162]
 [ 13.         169.           3.60555128]
 [ 14.         196.           3.74165739]
 [ 15.         225.           3.87298335]
 [ 16.         256.           4.        ]
 [ 17.         289.           4.12310563]
 [ 18.         324.           4.24264069]
 [ 19.         361.           4.35889894]]
(20, 3)
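Every value in the output above is printed as a float, even though x1 and x2 hold integers. An ndarray has a single dtype, so combining integer and float arrays upcasts everything to float. The sketch below checks this and shows `np.column_stack` as one possible alternative to building the rows and transposing; the name `X_alt` is an assumption introduced here.

```python
import numpy as np

x1 = np.arange(20)
x2 = x1 ** 2
x3 = np.sqrt(x1)

# alternative construction: place the 1-D arrays side by side as columns
X_alt = np.column_stack([x1, x2, x3])

print(X_alt.shape)   # (20, 3)
print(X_alt.dtype)   # float64, because x3 is a float array

# same result as stacking rows and transposing
same = np.array_equal(X_alt, np.array([x1, x2, x3]).T)
print(same)          # True
```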


Create a DataFrame from the existing arrays

In [4]:
df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3}) # the dict keys become the column names
df

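|   | x1 | x2 | x3 |
|---|----|----|----------|
| 0 | 0 | 0 | 0.0 |
| 1 | 1 | 1 | 1.0 |
| 2 | 2 | 4 | 1.414214 |
| 3 | 3 | 9 | 1.732051 |
| 4 | 4 | 16 | 2.0 |
| 5 | 5 | 25 | 2.236068 |
| 6 | 6 | 36 | 2.44949 |
| 7 | 7 | 49 | 2.645751 |
| 8 | 8 | 64 | 2.828427 |
| 9 | 9 | 81 | 3.0 |

(Only the first 10 of the 20 rows are shown.)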
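Unlike the matrix `X`, the DataFrame keeps a separate dtype for each column, so x1 and x2 remain integers here. A short sketch for inspecting this, using the standard `shape`, `dtypes`, and `head` accessors (the arrays are rebuilt so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

x1 = np.arange(20)
x2 = x1 ** 2
x3 = np.sqrt(x1)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})

print(df.shape)     # (20, 3)
print(df.dtypes)    # one dtype per column: integer for x1 and x2, float64 for x3
print(df.head(3))   # first three rows only
```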


Convert a DataFrame to a NumPy ndarray with the `to_numpy` method

In [5]:
X2 = df.to_numpy()
print(X2)

[[  0.           0.           0.        ]
 [  1.           1.           1.        ]
 [  2.           4.           1.41421356]
 [  3.           9.           1.73205081]
 [  4.          16.           2.        ]
 [  5.          25.           2.23606798]
 [  6.          36.           2.44948974]
 [  7.          49.           2.64575131]
 [  8.          64.           2.82842712]
 [  9.          81.           3.        ]
 [ 10.         100.           3.16227766]
 [ 11.         121.           3.31662479]
 [ 12.         144.           3.46410162]
 [ 13.         169.           3.60555128]
 [ 14.         196.           3.74165739]
 [ 15.         225.           3.87298335]
 [ 16.         256.           4.        ]
 [ 17.         289.           4.12310563]
 [ 18.         324.           4.24264069]
 [ 19.         361.           4.35889894]]
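The integer columns come back as floats because `to_numpy` has to pick one common dtype for the whole array. As a hedged sketch, selecting only the integer columns before converting keeps an integer dtype; the name `X_int` is an assumption introduced for this example.

```python
import numpy as np
import pandas as pd

x1 = np.arange(20)
df = pd.DataFrame({'x1': x1, 'x2': x1 ** 2, 'x3': np.sqrt(x1)})

X2 = df.to_numpy()
print(X2.dtype)     # float64: common dtype of the int and float columns

# convert only the integer columns; no upcast to float is needed
X_int = df[['x1', 'x2']].to_numpy()
print(X_int.shape)  # (20, 2)
print(X_int[:3])
```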
