# Pandas Series
Justin Post

---

- [Pandas](https://pandas.pydata.org/) library supports a data frame object similar to R's data.frame

![](https://drive.google.com/uc?export=view&id=1MFemy7erGLCxMRKejDgesFFTzTfQynOm)

- Convention is to import the module as `pd`

- First we'll learn about **series** objects. These are the building blocks of a data frame, which we'll use to handle many rectangular datasets

- pandas *series*
    + 1D labeled array that can hold any data type
    + Contains values and indices that are used to extract those values

## Creating a pandas Series

- Create a series using the `pd.Series()` function

In [3]:
import numpy as np
import pandas as pd
rng = np.random.default_rng(2) #set a seed
s = pd.Series(rng.normal(size = 10, loc = 2, scale = 4)) #mean of 2 and std of 4
s

0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    9.198830
5    6.576663
6    0.698309
7    5.095226
8    3.124843
9   -0.215291
dtype: float64
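As a small supplement to the example above, `pd.Series()` also accepts a plain Python list or a single scalar. The sketch below uses made-up values and hypothetical variable names (`from_list`, `from_scalar`) purely for illustration.

```python
import pandas as pd

# From a list: a default 0, 1, 2, ... index is created; name= is optional
from_list = pd.Series([3, 1, 4], name="digits")

# From a scalar: the value is repeated once for each supplied index label
from_scalar = pd.Series(7.0, index=["a", "b", "c"])

print(from_list)
print(from_scalar)
```

The optional `name` is what later becomes the column name if the series is placed in a data frame.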

---

## Indexing a Series

- The ordering starts at 0
- Like `numpy` arrays, a series stores all of its values under a single `dtype`; mixed values fall back to the general `object` type
- Unlike `numpy` arrays, series can also be indexed by the labels stored in the `index` attribute (not just by numeric position)
- `.index` attribute returns just the indices

In [4]:
s.index

RangeIndex(start=0, stop=10, step=1)

In [5]:
s[0] #0 is both the numeric position and the index label here

2.756213527174132
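To illustrate the note above about element types, here is a minimal sketch (with made-up values and hypothetical names `ints` and `mixed`) showing that a series carries one `dtype`, and that mixing types produces the catch-all `object` dtype.

```python
import pandas as pd

# All integers: pandas picks an integer dtype
ints = pd.Series([1, 2, 3])

# Mixed Python types: stored under the generic object dtype
mixed = pd.Series([1, "two", 3.0])

print(ints.dtype)
print(mixed.dtype)
# Each element keeps its original Python type inside the object series
print([type(v).__name__ for v in mixed])
```

Object series are flexible, but numeric operations on them are typically slower than on a numeric `dtype`.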

In [7]:
s2 = pd.Series(rng.normal(size = 10, loc = 2, scale = 4),
               index = list("abcdefghij")) #one label per letter
s2

a    5.910270
b    0.757774
c    0.684704
d   -1.168587
e    3.819832
f    1.603208
g    4.181155
h   -0.428743
i    2.507311
j   -1.569096
dtype: float64

In [8]:
s2.index

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')

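We can access elements with either the numeric position or the index label. Note that recent versions of pandas deprecate positional access through plain `[]` on a label-indexed series; `s2.iloc[2]` is the explicit positional form.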

In [10]:
s2[2]

0.6847043837681492

In [11]:
s2["c"]

0.6847043837681492
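The two kinds of lookup can be made explicit with the `.loc` (label) and `.iloc` (position) accessors. The sketch below uses small made-up series (`letters` and `shifted` are hypothetical names) to show where the two differ.

```python
import pandas as pd

letters = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

by_label = letters.loc["c"]     # look up by index label
by_position = letters.iloc[2]   # look up by 0-based position
print(by_label, by_position)

# With integer labels that are not 0, 1, 2, ... the two give different answers
shifted = pd.Series([10.0, 20.0, 30.0], index=[2, 1, 0])
print(shifted.loc[0])    # the value labeled 0
print(shifted.iloc[0])   # the value in the first position
```

Using `.loc` and `.iloc` avoids any ambiguity about whether an integer means a label or a position.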

- We can obtain just the values of a series using the `.values` attribute

In [12]:
s.values

array([ 2.75621353, -0.09099377,  0.34774583, -7.76586953,  9.19882953,
        6.57666349,  0.69830865,  5.09522635,  3.12484268, -0.21529135])

In [13]:
s2.values

array([ 5.9102698 ,  0.75777381,  0.68470438, -1.16858702,  3.81983228,
        1.60320779,  4.18115486, -0.4287428 ,  2.50731139, -1.56909617])

- Note that when you return the values you get back just a numpy array!

In [14]:
type(s2.values)

numpy.ndarray
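pandas also provides a `.to_numpy()` method that serves the same purpose as `.values`. A minimal sketch, using a made-up series named `prices` (hypothetical):

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.5, 2.5, 4.0], index=["x", "y", "z"])

arr = prices.to_numpy()   # the index labels are dropped, only values remain
print(type(arr))
print(arr)
```

Either way, the result is a plain array, so label-based lookups such as `arr["x"]` no longer work.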

---

## Relation of Series to Other Common Objects


### Relation to Dictionaries

- Recall a dictionary consists of key/value pairs
- When creating a series from a dictionary, the keys of the dictionary become the indices


In [15]:
d = {'b': 1,
     'a': 0,
     'c': 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

In [18]:
AFCDivisions = {
  "AFCNorth": ["Steelers", "Browns", "Ravens", "Bengals"],
  "AFCEast" : ["Patriots", "Jets", "Dolphins", "Bills"],
  "AFCWest" : ["Raiders", "Chiefs", "Chargers", "Broncos"],
  "AFCSouth": ["Texans", "Colts", "Jaguars", "Titans"]
  }
div_series = pd.Series(AFCDivisions)
div_series

AFCNorth     [Steelers, Browns, Ravens, Bengals]
AFCEast        [Patriots, Jets, Dolphins, Bills]
AFCWest     [Raiders, Chiefs, Chargers, Broncos]
AFCSouth        [Texans, Colts, Jaguars, Titans]
dtype: object
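When building a series from a dictionary, an optional `index=` argument can select and order the keys. The sketch below reuses the small dictionary `d` from above; `picked` and `round_trip` are hypothetical names, and the label `"z"` is deliberately absent from the dictionary.

```python
import numpy as np
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# index= chooses which keys to keep and in what order;
# a label with no matching key gets a missing value (NaN)
picked = pd.Series(d, index=["a", "b", "z"])
print(picked)

# A series can be converted back into a dictionary of label -> value
round_trip = pd.Series(d).to_dict()
print(round_trip)
```

Because `NaN` is a floating point value, the presence of a missing entry changes the series to a float `dtype`.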

- Series are like a fixed-size `dict` object

    + Can get and set values within a series using the index label
    + But series also support access by position so, unlike a dictionary, we can use a numeric index as well


In [23]:
div_series["AFCNorth"]

['Steelers', 'Browns', 'Ravens', 'Bengals']

In [22]:
div_series[0]

['Steelers', 'Browns', 'Ravens', 'Bengals']

- We can check whether a label occurs in the index with `in`, just as we would check whether a key occurs in a dictionary

In [24]:
print("AFCNorth" in AFCDivisions)
print("AFCNorth" in div_series)

True
True
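One point worth making explicit: as with a dictionary, `in` looks at the labels (keys), not the values. A short sketch, reusing the small dictionary `d` from earlier with a hypothetical series name `ser`:

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}
ser = pd.Series(d)

print("a" in ser)            # checks the index labels
print(1 in ser)              # 1 is a value, not a label
print(1 in ser.values)       # search the underlying array instead
print(ser.isin([1]).any())   # or use the isin() method
```

To test for a value, search `.values` or use `.isin()` as in the last two lines.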


---

### Relation to NumPy Arrays

- Series behave very similarly to `NumPy`'s 1D `ndarray`
- In fact, `NumPy` functions can typically take series as input

In [25]:
s #was created from a numpy array!

0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    9.198830
5    6.576663
6    0.698309
7    5.095226
8    3.124843
9   -0.215291
dtype: float64

In [26]:
np.exp(s)

0      15.740130
1       0.913023
2       1.415872
3       0.000424
4    9885.551552
5     718.139247
6       2.010350
7     163.240789
8      22.756315
9       0.806306
dtype: float64

- Numerical operations are done element-wise

In [30]:
s * 3

0     8.268641
1    -0.272981
2     1.043237
3   -23.297609
4    27.596489
5    19.729990
6     2.094926
7    15.285679
8     9.374528
9    -0.645874
dtype: float64
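One difference from NumPy arrays is worth a sketch: when two series are combined, elements are matched by index label rather than by position. The series below (`left`, `right`) are made-up examples.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Values are paired by label; labels found in only one series give NaN
total = left + right
print(total)

# The add() method can treat a missing partner as 0 instead
filled = left.add(right, fill_value=0)
print(filled)
```

With identical indices, as in `s * 3` or `s + s`, this reduces to the familiar element-wise behavior.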

---

### Relation to lists

- Series are like a `list` object in that you can

    + get and set values by integer index location
    + use slicing with `:`


In [31]:
s[4] = 0
s

0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    0.000000
5    6.576663
6    0.698309
7    5.095226
8    3.124843
9   -0.215291
dtype: float64

In [32]:
s[:5]

0    2.756214
1   -0.090994
2    0.347746
3   -7.765870
4    0.000000
dtype: float64

In [33]:
s[3:5]

3   -7.76587
4    0.00000
dtype: float64
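Slicing also works with index labels, with one difference from lists. The sketch below uses a made-up series named `lettered` (hypothetical) and the explicit `.iloc` and `.loc` accessors.

```python
import pandas as pd

lettered = pd.Series([5.0, 0.5, 0.25, -1.0], index=list("abcd"))

# Positional slice: the stop position is excluded, as with a list
by_position = lettered.iloc[1:3]
print(by_position)

# Label slice: the stop label is included
by_label = lettered.loc["b":"d"]
print(by_label)
```

The inclusive endpoint for label slices is easy to overlook when moving between the two styles.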

---

## Recap

- Series will make up Data Frames
    + Each column of a data frame is made up of a series
- Series are:
    + a 1D data structure with indices and values    
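As a quick check of the first point, the sketch below builds a tiny data frame from made-up values (the names `df` and `col` are hypothetical) and pulls out one column.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

col = df["y"]                        # selecting one column returns a series
print(type(col))
print(col.name)                      # the column name becomes the series name
print(col.index.equals(df.index))    # the column shares the data frame's index
```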
