<a href="https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-with-Python/blob/main/01_Programming_in_python/14-Pandas_Series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# `Pandas` Series

> Justin Post

- The [`pandas`](https://pandas.pydata.org/) library provides a `DataFrame` object similar to `R`'s data frame. Each column of a `DataFrame` is a `pandas` `Series` object.

![](https://www4.stat.ncsu.edu/online/datasets/pandas.png)
> Image from https://www.altexsoft.com/blog/pandas-library/

- Convention is to import the module as `pd`

- First we'll learn about the `Series` objects. These make up a `DataFrame` object, which we'll use to handle many rectangular datasets

- A `pandas` `Series` is
    + a 1D labeled array that can hold any data type
    + made up of values and an index whose labels are used to extract those values

Note: These types of webpages are built from Jupyter notebooks (`.ipynb` files). You can access your own versions of them by [clicking here](https://colab.research.google.com/github/jbpost2/ST-554-Big-Data-with-Python/blob/main/01_Programming_in_python/14-Pandas_Series.ipynb). **It is highly recommended that you go through and run the notebooks yourself, modifying and rerunning things where you'd like!**

## Creating a `pandas` `Series`

- Create a series using the `pd.Series()` function

In [1]:
import numpy as np
import pandas as pd
rng = np.random.default_rng(2) #set a seed
s = pd.Series(rng.normal(size = 10, loc = 2, scale = 4)) #mean of 2 and std of 4
s

0,2.756214
1,-0.090994
2,0.347746
3,-7.76587
4,9.19883
5,6.576663
6,0.698309
7,5.095226
8,3.124843
9,-0.215291
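
A `Series` does not have to come from a `numpy` array. As a minimal sketch, the cell below builds one from a plain Python list; the variable name `temps`, its values, and the `name` label are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: a plain Python list of floats
temps = pd.Series([21.5, 19.0, 23.25], name="temp_c")

print(temps.index)  # default labels are the positions 0, 1, 2
print(temps.dtype)  # float64
print(temps.name)   # temp_c
```

When no `index` is supplied, `pandas` labels the elements with their positions, which is why `s` above is labeled 0 through 9.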


---

## Indexing a `Series`

- Like lists, positions start at 0
- Like `numpy` arrays, a `Series` has a single data type (`dtype`); values of mixed types are stored under the generic `object` type
- Unlike `numpy` arrays, a `Series` can be indexed by the labels stored in its `index` attribute (not just by numeric position)
- The `.index` attribute returns just these labels

In [2]:
s.index

RangeIndex(start=0, stop=10, step=1)

In [3]:
s[0] #here 0 is both the position and the index label

2.756213527174132
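
To illustrate the point about a `Series` carrying one data type, here is a small sketch. The three `Series` are hypothetical; the exact integer type can depend on the platform, so only its kind is described in the comment.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed_num = pd.Series([1, 2.5])       # the integer is upcast to a float
mixed_obj = pd.Series([1, "a", 2.5])  # no common numeric type exists

print(ints.dtype)       # an integer dtype
print(mixed_num.dtype)  # float64
print(mixed_obj.dtype)  # object
```

An `object` `Series` can hold anything, but numeric operations on it tend to be slower, so it is usually worth keeping columns to one numeric type where possible.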

In [4]:
s2 = pd.Series(rng.normal(size = 10, loc = 2, scale = 4),
               index = [x for x in "abcdefghij"])
s2

a,5.91027
b,0.757774
c,0.684704
d,-1.168587
e,3.819832
f,1.603208
g,4.181155
h,-0.428743
i,2.507311
j,-1.569096


In [5]:
s2.index

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')

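We can access elements by position or by index label. However, using plain `[]` with an integer position on a `Series` whose labels are not integers is deprecated in recent versions of `pandas` and raises a warning. Use the `.iloc[]` indexer for positional access instead (we discuss the similar `DataFrame` `.iloc[]` indexer shortly).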

In [8]:
s2[2]

0.6847043837681492

In [9]:
s2["c"]

0.6847043837681492
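
The distinction between position and label can be made explicit with the `.iloc[]` and `.loc[]` indexers. The sketch below uses two small hypothetical `Series` (`letters` and `shifted`) rather than `s2`, so the values are easy to follow.

```python
import pandas as pd

# Hypothetical Series with string labels, mirroring the structure of s2
letters = pd.Series([10.0, 20.0, 30.0, 40.0], index=list("abcd"))

by_position = letters.iloc[2]  # third element, regardless of labels
by_label = letters.loc["c"]    # element whose label is "c"
print(by_position, by_label)   # 30.0 30.0

# With integer labels that are not 0, 1, 2, ... the two can disagree
shifted = pd.Series([10.0, 20.0, 30.0], index=[2, 0, 1])
print(shifted.iloc[0])  # 10.0, the first element by position
print(shifted.loc[0])   # 20.0, the element labeled 0
```

Writing `.iloc[]` or `.loc[]` states the intent directly, which avoids surprises when the labels happen to be integers.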

- We can obtain just the values of a `Series` using the `.values` attribute

In [10]:
s.values

array([ 2.75621353, -0.09099377,  0.34774583, -7.76586953,  9.19882953,
        6.57666349,  0.69830865,  5.09522635,  3.12484268, -0.21529135])

In [11]:
s2.values

array([ 5.9102698 ,  0.75777381,  0.68470438, -1.16858702,  3.81983228,
        1.60320779,  4.18115486, -0.4287428 ,  2.50731139, -1.56909617])

- Note that when you return the values you get back just a `numpy` array!

In [12]:
type(s2.values)

numpy.ndarray
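
The `pandas` documentation also offers the `.to_numpy()` method for the same purpose. A minimal sketch, using a hypothetical `prices` `Series`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with string labels
prices = pd.Series([1.5, 2.0, 3.25], index=["x", "y", "z"])

arr = prices.to_numpy()  # the index labels are dropped
print(type(arr))
print(np.array_equal(arr, prices.values))  # True
```

Either way, the result no longer carries the index, so label-based access is only available on the `Series` itself.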

---

# Series Relation to Other Common Objects


### Relation to Dictionaries

- Recall a dictionary consists of key/value pairs
- When creating a `Series` from a dictionary, the keys of the dictionary become the indices


In [13]:
d = {'b': 1,
     'a': 0,
     'c': 2}
pd.Series(d)

b,1
a,0
c,2
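
When a dictionary is passed together with an `index` argument, `pandas` looks each requested label up among the keys. The sketch below reuses the dictionary `d` from above; the label `"z"` is a made-up one that is deliberately not a key.

```python
import numpy as np
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# Keys are matched to the requested labels; "z" is not a key in d
picked = pd.Series(d, index=["a", "b", "z"])
print(picked)
```

The key `'c'` is left out because it was not requested, and `"z"` gets a missing value (`NaN`). Since `NaN` is a float, the whole `Series` is stored as floats.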


Here's an example with more complex values, which shows that the values of a `Series` can be lists!

In [15]:
AFCDivisions = {
  "AFCNorth": ["Steelers", "Browns", "Ravens", "Bengals"],
  "AFCEast" : ["Patriots", "Jets", "Dolphins", "Bills"],
  "AFCWest" : ["Raiders", "Chiefs", "Chargers", "Broncos"],
  "AFCSouth": ["Texans", "Colts", "Jaguars", "Titans"]
  }
div_series = pd.Series(AFCDivisions)
div_series

AFCNorth,"[Steelers, Browns, Ravens, Bengals]"
AFCEast,"[Patriots, Jets, Dolphins, Bills]"
AFCWest,"[Raiders, Chiefs, Chargers, Broncos]"
AFCSouth,"[Texans, Colts, Jaguars, Titans]"


- `Series` are like a fixed-size `dict` object

    + Can get and set values within a `Series` using the index label
    + But `Series` have an ordering to them so, unlike a dictionary, we can use a numeric index (although again, `.iloc[]` is now the preferred way to do numeric index subsetting)


In [18]:
div_series["AFCNorth"]

['Steelers', 'Browns', 'Ravens', 'Bengals']

In [19]:
div_series[0]

['Steelers', 'Browns', 'Ravens', 'Bengals']

In [22]:
div_series.iloc[0]

['Steelers', 'Browns', 'Ravens', 'Bengals']
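
The cells above only get values. As a sketch of setting values as well, the example below uses a hypothetical `scores` `Series` and assigns once by label and once by position.

```python
import pandas as pd

# Hypothetical scores keyed by name
scores = pd.Series({"ann": 90, "bob": 72, "cy": 85})

scores.loc["bob"] = 75  # set by label, much like d["bob"] = 75
scores.iloc[0] = 95     # set by position (the first element)

print(scores["bob"], scores.iloc[0])  # 75 95
```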

- We can check whether a label is in the index with `in`, just as we can check whether a key is in a dictionary

In [23]:
print("AFCNorth" in AFCDivisions)
print("AFCNorth" in div_series)

True
True
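
One thing to keep in mind is that `in` looks at the index labels, not at the values. A short sketch with a hypothetical `wins` `Series` (the numbers are made up):

```python
import pandas as pd

wins = pd.Series({"Steelers": 10, "Browns": 3})  # hypothetical values

print("Steelers" in wins)     # True: "Steelers" is a label
print(10 in wins)             # False: 10 is a value, not a label
print(10 in wins.values)      # True: search the values explicitly
print(wins.isin([10]).any())  # True: same check with a pandas method
```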


---

### Relation to Numpy Arrays

- Series behave very similarly to `NumPy`'s 1D `ndarray`
- In fact, `NumPy` functions can typically take series as input!

In [24]:
s #was created from a numpy array!

0,2.756214
1,-0.090994
2,0.347746
3,-7.76587
4,9.19883
5,6.576663
6,0.698309
7,5.095226
8,3.124843
9,-0.215291


In [25]:
np.exp(s)

0,15.74013
1,0.913023
2,1.415872
3,0.000424
4,9885.551552
5,718.139247
6,2.01035
7,163.240789
8,22.756315
9,0.806306


- Numerical operations are done element-wise

In [26]:
s * 3

0,8.268641
1,-0.272981
2,1.043237
3,-23.297609
4,27.596489
5,19.72999
6,2.094926
7,15.285679
8,9.374528
9,-0.645874
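
One difference from `numpy` arrays is worth a quick sketch: when two `Series` are combined, values are paired by index label rather than by position. The `left` and `right` `Series` below are hypothetical.

```python
import numpy as np
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["c", "a", "d"])

# NumPy functions return a Series that keeps the index
print(np.sqrt(left))

# Arithmetic pairs values by label, not by position
total = left + right
print(total)
```

Here `"a"` gives 1 + 20 = 21 and `"c"` gives 3 + 10 = 13, while `"b"` and `"d"` appear in only one of the two `Series` and so come out as `NaN`.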


---

### Relation to lists

- Series are like a `list` object in that you can

    + get and set values by integer position
    + use slicing with `:`


In [29]:
s[4] = 0
s

0,2.756214
1,-0.090994
2,0.347746
3,-7.76587
4,0.0
5,6.576663
6,0.698309
7,5.095226
8,3.124843
9,-0.215291


In [30]:
s[:5]

0,2.756214
1,-0.090994
2,0.347746
3,-7.76587
4,0.0


In [31]:
s[3:5]

3,-7.76587
4,0.0
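
Slicing also works through the explicit indexers, with one difference that is easy to miss. In the sketch below (using a hypothetical `letters` `Series`), a positional slice leaves out the stop position just as a list does, while a label slice includes the stop label.

```python
import pandas as pd

letters = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

by_pos = letters.iloc[1:3]       # positions 1 and 2; the stop is excluded
by_label = letters.loc["b":"d"]  # labels b, c, d; the stop is included

print(list(by_pos.index))    # ['b', 'c']
print(list(by_label.index))  # ['b', 'c', 'd']
```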


---

# Recap

- `Pandas` `Series` will make up `Pandas` `DataFrames`
    + Each column of a `DataFrame` is made up of a `Series`
- `Series` are:
    + a 1D data structure with indices and values    

If you are on the course website, use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!

If you are on Google Colab, head back to our course website for [our next lesson](https://jbpost2.github.io/ST-554-Big-Data-with-Python/01_Programming_in_python/15-Pandas_Data_Frames.html)!
