# Pandas

* Jake VanderPlas. 2016. *Python Data Science Handbook: Essential Tools for Working with Data*. O'Reilly Media, Inc.
* Chapter 3 - Data Manipulation with Pandas
* https://github.com/jakevdp/PythonDataScienceHandbook

Pandas provides:

* Rich I/O capabilities (read/write data from/to CSV, Excel, SQL, JSON, etc.)
* 1-dimensional (**Series**) and 2-dimensional tabular (**DataFrame**) data structures.
* Data flexibility (handles missing data, time series, and heterogeneous data types).
* Labeled rows and columns for data alignment.
* Flexible indexing, slicing, fancy indexing, and subsetting of large datasets.
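
As a quick illustration of the I/O and missing-data points above, here is a minimal sketch that reads a small CSV from an in-memory string and writes it back. The column names and values are made up for this example, and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content: the second row has no score.
csv_text = "name,score\nann,1.5\nbob,\n"

# read_csv accepts any file-like object, not only a path.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # one text column and one float column in the same table

# The empty field is loaded as a missing value (NaN).
n_missing = df["score"].isna().sum()
print(f'{n_missing = }')

# to_csv without a path returns the CSV text as a string.
round_trip = df.to_csv(index=False)
df_again = pd.read_csv(io.StringIO(round_trip))
```

Reading the written text back gives an equal DataFrame, so the missing value survives the round trip.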

In [22]:
import numpy as np
import pandas as pd
pd.__version__


'2.2.3'

In [4]:
# type TAB after the dot to browse the pandas namespace
#pd.

## Pandas Series

* One-dimensional array of indexed data
* Two attributes:
   * `values` : NumPy array
   * `index` : an array-like object of type `pd.Index`

In [13]:
d = pd.Series([0.25, 0.5, 0.75, 1.0])
print(f'd =\n{d}')
print(f'{d.values = }')
print(f'{ d.index = }')

d =
0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64
d.values = array([0.25, 0.5 , 0.75, 1.  ])
 d.index = RangeIndex(start=0, stop=4, step=1)


Series can be created from NumPy arrays:

In [15]:
d = pd.Series(np.linspace(0,4,6))
d

0    0.0
1    0.8
2    1.6
3    2.4
4    3.2
5    4.0
dtype: float64

A Series can be indexed just like a NumPy array:

In [27]:
d = pd.Series(np.arange(10,20))
print(f'{      d[3] = }')       # simple index --> scalar
print(f'{     d[3:] = }')      # slice --> series
print(f'{    d[3:4] = }')     # slice --> series
print(f'{d[[1,3,0]] = }') # fancy index --> series
print(f'{    d[[1]] = }')     # fancy index --> series

      d[3] = 13
     d[3:] = 3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64
    d[3:4] = 3    13
dtype: int64
d[[1,3,0]] = 1    11
3    13
0    10
dtype: int64
    d[[1]] = 1    11
dtype: int64


Unlike a NumPy array, whose integer index is implicit, a Pandas Series has an explicit **index**:
   * It defaults to an integer `RangeIndex`, but other labels can be supplied.

In [38]:
d = pd.Series(np.arange(5), index=["a","b","c","d","e"])
print(f'd =\n{d}')
# WARNING: d[1] triggers "treating keys as positions is deprecated" --> use d.iloc[pos]
#print(f'{d[1] = }')
print(f'{d["b"] = }')

d =
a    0
b    1
c    2
d    3
e    4
dtype: int64
d["b"] = 1


In [40]:
d = pd.Series(np.arange(5), index=[3,23,1,2,456])
print(f'd =\n{d}')
print(f'{d[456] = }')

d =
3      0
23     1
1      2
2      3
456    4
dtype: int64
d[456] = 4
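
With integer labels, label-based access is easy to confuse with positional access. The following is a minimal sketch, reusing the Series from the cell above, that separates the two with the `.loc` (label-based) and `.iloc` (position-based) accessors.

```python
import numpy as np
import pandas as pd

d = pd.Series(np.arange(5), index=[3, 23, 1, 2, 456])

by_label = d.loc[1]       # label 1 sits at position 2 -> value 2
by_position = d.iloc[1]   # position 1 carries label 23 -> value 1
print(f'{by_label = }')
print(f'{by_position = }')

# Slices follow the same split:
# .loc includes the end label, .iloc excludes the end position.
label_slice = d.loc[23:2]     # labels 23, 1, 2
position_slice = d.iloc[1:3]  # positions 1 and 2
print(label_slice)
print(position_slice)
```

Spelling out `.loc` or `.iloc` avoids relying on how plain `d[...]` interprets an integer key.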



* A Series behaves like a specialized Python dictionary that maps typed, ordered index labels to typed values: $\{index_{typed \,\&\, ordered} \to value_{typed}\}$
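
The dictionary analogy can be sketched as follows. The keys and values are hypothetical, chosen only for illustration: a Series is built directly from a `dict`, supports dictionary-style lookups, and adds array-style operations such as label slicing.

```python
import pandas as pd

# Dictionary keys become the index, dictionary values become the data.
s = pd.Series({"a": 0, "b": 1, "c": 2})
print(s)

# Dictionary-style operations work on the index labels.
has_b = "b" in s        # membership test checks the index
keys = list(s.keys())   # keys() returns the index
value_b = s["b"]        # lookup by label

# Unlike a dict, the index is ordered, so label slices are possible
# (the end label is included).
head = s.loc["a":"b"]
print(head)
```

The values share a single dtype (here `int64`), which is what distinguishes a Series from a plain dictionary of arbitrary Python objects.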

## Pandas DataFrame

## Pandas Index

## Indexing and Selection

## Operating on Data

## Handling Missing Data

## Hierarchical Indexing

## Combining Datasets: Concat and Append