## [Widely used Python libraries: NumPy and Pandas](https://www.codecademy.com/articles/introduction-to-numpy-and-pandas)

### NumPy

- Open source library for efficient numerical operations
- The core structure: ndarray (arrays of any dimensions)
- The library provides many features for performing operations on these special arrays.

### Pandas

- Library for data manipulation
- The core structure: Series and DataFrame objects (based on NumPy array structure)



Once these libraries are installed, their features can be used. First, the libraries must be imported as follows:

In [2]:
import numpy as np
import pandas as pd

The **as** keyword creates an alias. From now on, we can refer to numpy as np and pandas as pd.

### NumPy arrays

* NumPy arrays are more efficient than Python lists for numerical work (they use less memory and operate faster)
* NumPy arrays support element-wise operations
* All elements must be of the same data type. (Python lists might have elements of mixed types)
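
As a small sketch of the single-data-type rule: when a list mixes integers and floats, NumPy converts every element to one common type. The names `mixed_list` and `mixed_array` and their values are made up for illustration.

```python
import numpy as np

# Hypothetical example values, not taken from the notebook.
mixed_list = [1, 2.5, 3]             # a Python list may mix int and float
mixed_array = np.array(mixed_list)   # NumPy picks one common dtype

print(mixed_array.dtype)   # float64: the integers were upcast to float
print(mixed_array * 2)     # element-wise: every entry is doubled
```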

The code below initializes a Python list named list1:

In [4]:
list1 = [1,2,3,4]

A list can be converted into a NumPy array using the np.array() function:

In [6]:
array1 = np.array(list1)
array1

array([1, 2, 3, 4])

The above array is one-dimensional and holds four elements (its shape is (4,)). Below, we create a two-dimensional array from a list of lists:

In [7]:
list2 = [[1,2,3],[4,5,6]]
array2 = np.array(list2)
array2

array([[1, 2, 3],
       [4, 5, 6]])

NumPy supports several operations that are useful for data manipulation:

* Selecting array elements
* Slicing arrays
* Reshaping arrays
* Splitting arrays
* Combining arrays
* Numerical operations (min, max, mean, etc.)
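
A minimal sketch of reshaping, splitting, combining and the numerical operations from the list above. The array `values` and the other variable names are illustrative and do not come from the notebook.

```python
import numpy as np

values = np.arange(1, 7)             # array([1, 2, 3, 4, 5, 6])

reshaped = values.reshape(2, 3)      # reshaping: 2 rows, 3 columns
first_half, second_half = np.split(values, 2)          # splitting into two equal parts
combined = np.concatenate([first_half, second_half])   # combining them again

print(reshaped)
print(first_half, second_half)
print(combined)
print(values.min(), values.max(), values.mean())       # numerical operations
```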

Operations are applied to every element of the array at once, without writing an explicit loop:

In [21]:
toyPrices = np.array([5,8,3,6])
toyPrices - 2

array([3, 6, 1, 4])

In [22]:
toyPrices ** 2

array([25, 64,  9, 36], dtype=int32)

In [23]:
toyPrices = [5,8,3,6]
# print(toyPrices - 2) would raise a TypeError: lists do not support element-wise arithmetic
for i in range(len(toyPrices)):
    toyPrices[i] -= 2
toyPrices

[3, 6, 1, 4]

In [14]:
array3 = np.array([[1,2,3], [4,5,6]])
array3.shape

(2, 3)

#### Indexing

In [15]:
array3[1, 2]

6

#### Slicing

In [16]:
array3[1:]

array([[4, 5, 6]])

In [20]:
array3 % 2 == 0

array([[False,  True, False],
       [ True, False,  True]])

In [19]:
array3[array3 % 2 == 0]

array([2, 4, 6])

## Pandas

* Series and DataFrame are the core objects of this library. 
* A Series is a one-dimensional array, built on a NumPy array, that can be indexed using labels as well as integer positions.
* A Series stores items of one data type and can be created from scalars, lists, dictionaries, etc.
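
As a rough illustration of these points, the sketch below builds one Series from a list and one from a scalar, then looks up an element by label and by position. The labels `'a'`, `'b'`, `'c'` and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
from_list = pd.Series([13, 25, 19], index=['a', 'b', 'c'])
from_scalar = pd.Series(0, index=['a', 'b', 'c'])   # the scalar is repeated for each label

print(from_list['b'])      # label-based lookup
print(from_list.iloc[1])   # position-based lookup of the same element
print(from_scalar)
```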

In [25]:
ages = np.array([13,25,19])
series1 = pd.Series(ages)
series1

0    13
1    25
2    19
dtype: int32

By default, the indices are integers; however, they can be customized:

In [26]:
ages = np.array([13,25,19])
series1 = pd.Series(ages, index=['Emma', 'Swetha', 'Serajh'])
series1

Emma      13
Swetha    25
Serajh    19
dtype: int32

In [27]:
series1 = pd.Series({'Emma': 13, 'Swetha': 25, 'Serajh': 19})
series1

Emma      13
Swetha    25
Serajh    19
dtype: int64

* DataFrame is the other core object in the library. It contains rows and columns, and both can be indexed with labels.
* A DataFrame is a collection of Series (which are its columns). All columns must have the same length.
* A DataFrame can be composed of columns of different data types; however, the values within a single column must be of the same type.
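
A short sketch of the points above, using a made-up two-column frame called `people`: each column is a Series, the columns may have different data types, and all columns share one length.

```python
import pandas as pd

# Hypothetical two-column frame for illustration.
people = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [30, 41]})

print(people.dtypes)          # one data type per column
print(type(people['age']))    # each column is a pandas Series
print(len(people['name']) == len(people['age']))   # columns share one length
```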

DataFrames can be created from lists or dictionaries, or read from CSV and Excel files.

In [30]:
data = pd.DataFrame([
    ['John Smith','123 Main St',34],
    ['Jane Doe', '456 Maple Ave',28],
    ['Joe Schmo', '789 Broadway',51]
    ],
    columns=['name','address','age'])
data

| | name | address | age |
|---|---|---|---|
| 0 | John Smith | 123 Main St | 34 |
| 1 | Jane Doe | 456 Maple Ave | 28 |
| 2 | Joe Schmo | 789 Broadway | 51 |
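
The same rows could also be supplied as a dictionary of columns or read from CSV text. The sketch below assumes an in-memory string in place of a real file, so `csv_text`, `from_dict` and `from_csv` are illustrative names rather than part of the notebook.

```python
import io
import pandas as pd

# Same rows as above, supplied column by column.
from_dict = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St', '456 Maple Ave', '789 Broadway'],
    'age': [34, 28, 51],
})

# In-memory CSV text stands in for a file on disk (an assumption for
# this sketch); pd.read_csv also accepts a file path.
csv_text = (
    "name,address,age\n"
    "John Smith,123 Main St,34\n"
    "Jane Doe,456 Maple Ave,28\n"
    "Joe Schmo,789 Broadway,51\n"
)
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_dict)
print(from_csv)
```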


In [31]:
data.shape

(3, 3)

In [32]:
data.describe()

| | age |
|---|---|
| count | 3.0 |
| mean | 37.666667 |
| std | 11.930353 |
| min | 28.0 |
| 25% | 31.0 |
| 50% | 34.0 |
| 75% | 42.5 |
| max | 51.0 |


By default, the row indices are integers. However, we can set a column to be the index of the frame.

In [34]:
data.set_index('name')

| name | address | age |
|---|---|---|
| John Smith | 123 Main St | 34 |
| Jane Doe | 456 Maple Ave | 28 |
| Joe Schmo | 789 Broadway | 51 |
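
Note that set_index returns a new DataFrame and leaves the original unchanged unless the result is assigned. A minimal sketch, where the variable name `indexed` is an assumption made for this example:

```python
import pandas as pd

data = pd.DataFrame(
    [['John Smith', '123 Main St', 34],
     ['Jane Doe', '456 Maple Ave', 28],
     ['Joe Schmo', '789 Broadway', 51]],
    columns=['name', 'address', 'age'])

indexed = data.set_index('name')        # returns a new DataFrame
print(data.index.tolist())              # the original still uses 0, 1, 2
print(indexed.loc['Jane Doe', 'age'])   # label-based lookup by name
```

Once the name is the index, rows can be selected by label with .loc instead of by integer position.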


In summary, NumPy and Pandas:
* facilitate mathematical and logical operations 
* make data manipulation and exploration easier

# References
* https://www.codecademy.com/articles/introduction-to-numpy-and-pandas