### 2024-03-04
#### Pandas

Open-source data analysis library for Python
- performance-critical low-level operations are implemented in Cython (Python-like code that is compiled to C)

```python
# install
# !pip install pandas
```

using `conda`:  
`conda install pandas`  
using `pip`:  
`pip install pandas`  
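A quick way to check that the install worked is to import pandas and print its version. The version string you see will depend on your environment:

```python
import pandas as pd

# prints the installed version string; the exact value depends on your setup
print(pd.__version__)
```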

### Basic data structures  
- `Series`: a one-dimensional labeled array  
- `DataFrame`: a two-dimensional table of rows with labeled columns (each column is a `Series`)  
    - similar to a spreadsheet or an R `data.frame`
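A small sketch of both structures side by side. The labels and column names (`x`, `animal`, `legs`, ...) are made up for illustration:

```python
import pandas as pd

# Series: a 1-D labeled array (here with a custom string index)
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# DataFrame: a 2-D table with labeled columns; all columns share one row index
df = pd.DataFrame({'animal': ['dog', 'cat', 'bird'], 'legs': [4, 4, 2]})
print(df)
print(df.shape)               # (3, 2): 3 rows, 2 columns

# selecting a single column gives back a Series
legs = df['legs']
print(type(legs).__name__)    # Series
```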

### Pandas `Series`  
- by default, the index is the integers 0, 1, 2, ... (a `RangeIndex`)  
- pandas infers the dtype automatically from the data  
- a `Series` can be created from any array-like structure (list, tuple, NumPy array, ...) or from a `dict`
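A minimal sketch of dtype inference and the default index, using a few different array-like inputs (the values are arbitrary):

```python
import pandas as pd

ints = pd.Series([1, 2, 3])        # all ints          -> int64
mixed = pd.Series([1, 2.5, 3])     # ints and a float  -> float64
flags = pd.Series([True, False])   # all bools         -> bool
from_tuple = pd.Series((4, 5, 6))  # tuples are accepted
from_range = pd.Series(range(3))   # ... and so are ranges

print(ints.dtype, mixed.dtype, flags.dtype)  # int64 float64 bool
print(list(ints.index))                      # [0, 1, 2]
```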

```python
import pandas as pd
import numpy as np

# 5 samples from the standard normal distribution (values differ on each run)
numbers = np.random.randn(5)
numbers
```

```
array([ 0.42971771, -1.68127207, -0.70791126,  0.30192299, -1.31893095])
```

```python
s = pd.Series(numbers)
s
```

```
0    0.429718
1   -1.681272
2   -0.707911
3    0.301923
4   -1.318931
dtype: float64
```

```python
# create a series using a custom index
idx = ['a','b','c','d','e']
s = pd.Series(numbers, index=idx)
s
```

```
a    0.429718
b   -1.681272
c   -0.707911
d    0.301923
e   -1.318931
dtype: float64
```
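The custom index must supply one label per value. A sketch (with fresh random data) showing that a length mismatch is rejected with a `ValueError`, and that an element can then be reached either by its label or by its position via `.iloc`:

```python
import numpy as np
import pandas as pd

numbers = np.random.randn(5)

raised = False
try:
    pd.Series(numbers, index=['a', 'b'])   # 5 values but only 2 labels
except ValueError as err:
    raised = True
    print('ValueError:', err)

s = pd.Series(numbers, index=['a', 'b', 'c', 'd', 'e'])
print(s['b'])      # access by label
print(s.iloc[1])   # same element, accessed by integer position
```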

A `Series` behaves much like a dictionary, and you can create a labeled series directly from a `dict`: the keys become the index labels (in insertion order) and the values become the data.

```python
d = {'dog':2, 'cat':1, 'bird':0, 'goat':9}
s = pd.Series(d)
s
```

```
dog     2
cat     1
bird    0
goat    9
dtype: int64
```
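When a `dict` is combined with an explicit `index=`, only the listed labels are kept, in the listed order. A sketch reusing the dictionary above; the extra label `'cow'` is made up to show what happens to a key the dict does not contain:

```python
import pandas as pd

d = {'dog': 2, 'cat': 1, 'bird': 0, 'goat': 9}

# keep only 'cat' and 'dog' (in that order); 'cow' is not in d, so it becomes NaN,
# which also forces the dtype to float64
s2 = pd.Series(d, index=['cat', 'dog', 'cow'])
print(s2)
```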

Note: index labels do not need to be unique in a pandas `Series`.

```python
s = pd.Series([1,2,3], index = ['a','a','b'])
s
```

```
a    1
a    2
b    3
dtype: int64
```

```python
# 'a' appears twice, so selecting it returns every matching entry
s['a']
```

```
a    1
a    2
dtype: int64
```
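One consequence of duplicate labels is that the return type of `s[label]` depends on the data: a duplicated label gives back a `Series`, while a unique label gives back a single scalar. A short sketch using the same series:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

dup = s['a']      # label occurs twice -> a Series of both entries
single = s['b']   # label occurs once  -> a scalar
print(type(dup).__name__)   # Series
print(single)               # 3
print(s.index.is_unique)    # False
```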

`Series` objects are `dict`-like: we can access and update entries by key (index label).
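A sketch of the dictionary-style operations, reusing the animal counts from earlier. The new label `'fish'` and the missing label `'cow'` are made up for illustration:

```python
import pandas as pd

s = pd.Series({'dog': 2, 'cat': 1, 'bird': 0, 'goat': 9})

print(s['cat'])          # 1: read by key
s['cat'] = 5             # update an existing entry
s['fish'] = 3            # assigning to a new label appends it
print('dog' in s)        # True: `in` checks index labels, like dict keys
print(2 in s)            # False: values are not searched
print(s.get('cow', -1))  # -1: default returned for a missing label
print(list(s.keys()))    # ['dog', 'cat', 'bird', 'goat', 'fish']
```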

more information: https://pandas.pydata.org/docs/getting_started/index.html#getting-started