# The pandas

|  |  |  |  |
| --- | --- | --- | --- | 
|<img src="./assets/jnb021/Panda_Cub_from_Wolong,_Sichuan,_China.jpeg" alt="Drawing" style="width: 300px;"/> | <img src="./assets/jnb021/Panda_Cub_from_Wolong,_Sichuan,_China.jpeg" alt="Drawing" style="width: 300px;"/> | <img src="./assets/jnb021/Panda_Cub_from_Wolong,_Sichuan,_China.jpeg" alt="Drawing" style="width: 300px;"/> | <img src="./assets/jnb021/Panda_Cub_from_Wolong,_Sichuan,_China.jpeg" alt="Drawing" style="width: 300px;"/> |


`pandas` is a Python library that structures and simplifies data manipulation and analysis. The name pandas is derived from **pan**el **da**ta. The library focuses on representing data as tables (`DataFrame`) and time series (`Series`).

pandas uses NumPy arrays to represent data and provides dedicated functions for manipulating the data held in its objects. You will see that pandas is usually imported together with NumPy, because pandas requires and builds on top of NumPy.

There are three main data object types in pandas:

  - Series - A one-dimensional array of indexed data.
  - DataFrame - A two-dimensional, size-mutable, potentially heterogeneous tabular data structure.
  - Index - An immutable array, or an ordered set (technically a multi-set, as Index objects may contain repeated values).
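
As a quick orientation, the three object types listed above can be sketched as follows. The variable names and values here are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
# (Illustrative values only.)
readings = pd.Series([21.5, 19.0, 23.1], index=["mon", "tue", "wed"])

# A DataFrame: a table whose columns may hold different data types.
table = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, 2.5, 3.0]})

# An Index: the immutable labels attached to a Series or a DataFrame.
print(readings.index)   # row labels of the Series
print(table.columns)    # column labels of the DataFrame are also an Index

# The values of a Series are backed by a NumPy array.
print(isinstance(readings.to_numpy(), np.ndarray))
```

Both the row labels of a `Series` and the column labels of a `DataFrame` are `Index` objects, which is why the `Index` shows up throughout pandas even when it is not created explicitly.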

Below we will work a little bit with these three data objects.

###### A short history of Pandas.
Wes McKinney started developing what later became pandas while working at the capital management firm Applied Quantitative Research (AQR). pandas was developed initially as a closed-source project and was made open source in 2009. pandas is sponsored by [NumFOCUS, Inc.](https://numfocus.org/), which promotes support and sponsorship of Python-based open-source code.

In [1]:
import numpy as np
import pandas as pd

#### Pandas series 

`Series` are dictionary-like objects. 
Let's evaluate the code below.

In [None]:
data = pd.Series(['a', 'b', 'c', 'd'])
print(data)

A pandas Series can be created by collecting values in a list (using `[]`) and passing that list to `pd.Series()`. Yet the final product of that definition is something different from an array: it is a set of pairs in which a label is associated with each value. In the example above, no labels were given, so pandas assigns the default integer labels `0` through `3` to the values `'a'` through `'d'`.
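
The dictionary-like behavior of a `Series` can be made visible with a short sketch. The `labeled` and `from_dict` series and their values are assumptions introduced here for illustration only.

```python
import pandas as pd

# Same Series as in the cell above: no labels given, so pandas
# falls back to the default integer labels 0, 1, 2, 3.
data = pd.Series(['a', 'b', 'c', 'd'])
print(list(data.index))   # [0, 1, 2, 3]
print(data[1])            # b

# A Series with explicit labels (hypothetical values).
labeled = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(labeled['b'])       # 0.5, looked up by label like a dictionary key
print('c' in labeled)     # True, membership tests check the labels

# A Series can also be built directly from a dictionary:
# the keys become the labels.
from_dict = pd.Series({'a': 0.25, 'b': 0.5})
print(list(from_dict.index))  # ['a', 'b']
```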

#### Pandas `index` object

An `Index` is the pandas object that holds information regarding the ordering of the entries inside other objects such as a `DataFrame` or a `Series`. An index is similar to a NumPy array, but it is immutable. This means that once an `Index` is defined, the values inside it cannot be changed.


Pandas `Index` objects are designed to facilitate operations on the data and to keep track of the positions of data entries in the objects. They support and facilitate operations such as joins of datasets.
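
To make the join-like role of the `Index` concrete, here is a minimal sketch using the set-style methods `intersection`, `union`, and `difference`, followed by label alignment when two `Series` are added. The index values and the series `s1` and `s2` are invented for this example.

```python
import pandas as pd

ind_a = pd.Index([1, 3, 5, 7, 9])
ind_b = pd.Index([2, 3, 5, 7, 11])

print(ind_a.intersection(ind_b))  # labels present in both indexes
print(ind_a.union(ind_b))         # labels present in either index
print(ind_a.difference(ind_b))    # labels in ind_a that are not in ind_b

# Arithmetic between two Series aligns the entries by label,
# not by position. Labels found in only one Series yield NaN.
s1 = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
s2 = pd.Series([1, 2, 3], index=['y', 'z', 'w'])
total = s1 + s2
print(total)
```

Only the labels `'y'` and `'z'` appear in both series, so those are the only entries of `total` with a numeric result.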

Index objects also have many of the attributes familiar from NumPy arrays:

In [6]:
ind = pd.Index([1, 2, 3, 4, 5])

print(ind.size, ind.shape, ind.ndim, ind.dtype)

print(ind)

5 (5,) 1 int64
Int64Index([1, 2, 3, 4, 5], dtype='int64')


#### Pandas `Index` objects are immutable.

Where arrays can be modified after definition, a pandas `Index` cannot. Let's try this. Above we defined `ind` as a pandas `Index` with the value `3` in the third position.

Let's try to change that value. Evaluate the following operation, in which we attempt to set the value `10` in the third position of `ind`:

In [7]:
ind[2] = 10

TypeError: Index does not support mutable operations

The error message should end with the following line:

`TypeError: Index does not support mutable operations`

A pandas `Index` does not allow changes. This is helpful. One way to think about the index is as a specialized NumPy array. Specialized means that it has a narrower scope than the general-purpose NumPy array: its scope is to define the (ahem) index of a data frame. Because of this scope, changes to the values of the array are not allowed. In other words, the `Index` is immutable: it cannot be changed after definition by assigning a different value to any of its elements.

If we think about it, this makes sense. Changing a value inside the index of a data frame would change the definition of the data frame, invalidating the purpose of both the index and the data frame.
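
When different labels really are needed, one option is to build a new `Index` and assign it as a whole, rather than editing the existing one in place. The sketch below assumes the same `ind` as above; `new_ind` and the series `s` are hypothetical names introduced for illustration.

```python
import pandas as pd

ind = pd.Index([1, 2, 3, 4, 5])

# In-place assignment is rejected, as shown above.
try:
    ind[2] = 10
except TypeError as err:
    print("TypeError:", err)

# Build a new Index in which the value 3 is replaced by 10.
new_ind = pd.Index([10 if value == 3 else value for value in ind])
print(list(new_ind))  # [1, 2, 10, 4, 5]
print(list(ind))      # [1, 2, 3, 4, 5], the original is untouched

# The index of a Series can be replaced as a whole,
# even though its individual elements cannot be edited.
s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=ind)
s.index = new_ind
print(s[10])          # c
```

Replacing the whole index is an explicit step, so the labels of a data object never change as a side effect of editing a shared `Index`.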