# Data Science and Visualization (RUC F2023)

## Lecture 2: Exploratory Data Analysis (EDA)

# Series in pandas

A *Series* is a one-dimensional array-like object containing a sequence of **values** (of types similar to NumPy's dtypes) and an associated array of data labels, called its **index**.
The simplest Series is formed from an array of data alone:

## Construction and content

In [None]:
import pandas as pd

series_1 = pd.Series([4, 7, -5, 3])
series_1

In the output above, the index is shown on the left and the values on the right. Since we did not specify an index, a default one is created, consisting of the integers 0 through N-1, where N is the length of the Series.
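A quick check of the default index described above. This is a minimal sketch that rebuilds the same list under the illustrative name `s`; the `RangeIndex` type is what current pandas versions create when no index is given.

```python
import pandas as pd

# Build a Series from a plain list; pandas assigns a default integer index.
s = pd.Series([4, 7, -5, 3])

# The default index is a RangeIndex running from 0 to N-1.
print(type(s.index).__name__)   # RangeIndex
print(list(s.index))            # [0, 1, 2, 3]
print(len(s))                   # 4
```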

We can get the array representation and the index object of the Series via its `values` and `index` attributes, respectively:

In [None]:
series_1.values

In [None]:
series_1.index

We can access a single element in two ways: by its index label, or by its position in the underlying NumPy array.

In [None]:
series_1[0]

In [None]:
series_1.values[0]
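Because `series_1` uses the default integer index, `series_1[0]` looks up the *label* 0, which here happens to coincide with position 0. The sketch below uses the explicit accessors `.loc` (by label) and `.iloc` (by position) to show that the two can differ; the shifted index `[10, 11, 12, 13]` is an illustrative assumption.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])

# With the default index, label 0 and position 0 refer to the same element.
print(s.loc[0], s.iloc[0], s.values[0])   # 4 4 4

# Shift the labels so that they no longer match the positions.
shifted = pd.Series([4, 7, -5, 3], index=[10, 11, 12, 13])
print(shifted.loc[10])   # 4  (label 10)
print(shifted.iloc[0])   # 4  (position 0)
# shifted[0] would raise KeyError: with an integer index, [] looks up labels.
```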

Pay attention to the data type (dtype) when one element has a different type from the others, e.g. the string `'7'` among integers:

In [None]:
series_2 = pd.Series([4, '7', -5, 3])

In [None]:
series_2

In [None]:
series_2.values

In [None]:
series_2.index
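The single string forces the whole Series to the generic `object` dtype. The following sketch (variable names are illustrative) confirms this and shows one possible way back to a numeric dtype, `pd.to_numeric`, assuming every element is actually parseable as a number.

```python
import pandas as pd

mixed = pd.Series([4, '7', -5, 3])

# One string forces pandas to fall back to the generic 'object' dtype.
print(mixed.dtype)                          # object
# Each element keeps its own Python type.
print([type(v).__name__ for v in mixed])    # ['int', 'str', 'int', 'int']

# pd.to_numeric parses the string '7' and restores a numeric dtype.
numeric = pd.to_numeric(mixed)
print(numeric.tolist())                     # [4, 7, -5, 3]
```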

## Comparing Series and list

* **Series** is defined in pandas
* **List** is defined in Python

In [None]:
list_1 = [4, 7, -5, 3]
list_1

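**NB**: A Python *list* has no `values` attribute, so the first cell below raises an `AttributeError`. A list does have `index`, but it is a method that returns the position of a given value, not an index object: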

In [None]:
list_1.values

In [None]:
list_1.index(-5)  # list.index is a method: position of the first -5, i.e. 2

If a list holds values of different types, each element simply keeps its own type; unlike a Series, a list has no common dtype:

In [None]:
list_2 = [4, '7', -5, 3]
list_2

In [None]:
# Iterate over the elements directly (len(list) would fail: `list` is the built-in type)
for item in list_2:
    print(type(item))
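Another practical difference between the two is how arithmetic operators behave. This sketch compares `* 2` on a list (sequence repetition) with `* 2` on a Series (element-wise multiplication); the variable name `values` is illustrative.

```python
import pandas as pd

values = [4, 7, -5, 3]

# For a list, * repeats the sequence.
print(values * 2)                          # [4, 7, -5, 3, 4, 7, -5, 3]

# For a Series, * is applied element-wise.
print((pd.Series(values) * 2).tolist())    # [8, 14, -10, 6]

# A list needs an explicit loop (here a comprehension) for the same result.
print([v * 2 for v in values])             # [8, 14, -10, 6]
```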

## More on Series' index

Now let's create a Series with a custom index.

In [None]:
series_3 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

In [None]:
series_3

In [None]:
series_3.index

We can use labels in the index when selecting single values or a set of values:

In [None]:
series_3['d']

In [None]:
series_3.values[0]

To select several values at once, pass the labels as a list, hence the double brackets:

In [None]:
series_3[['a', 'b', 'c']]

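We can also select by integer position. This is best done explicitly with `.iloc`; plain `series_3[[2, 1, 3]]` relies on a positional fallback that is deprecated in recent pandas versions:

In [None]:
series_3.iloc[[2, 1, 3]]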
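To contrast the two selection styles side by side, here is a sketch on the same data as `series_3` (rebuilt as `s`). It also shows one difference worth remembering: label slices with `.loc` include the end label, while position slices with `.iloc` exclude the end position, as in plain Python.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# .loc selects by label, in the order the labels are given.
print(s.loc[['a', 'b', 'c']].tolist())   # [-5, 7, 3]

# .iloc selects by integer position (0-based).
print(s.iloc[[2, 1, 3]].tolist())        # [-5, 7, 3]

# Label slices with .loc include the end label; position slices exclude it.
print(s.loc['b':'a'].tolist())           # [7, -5]
print(s.iloc[1:2].tolist())              # [7]
```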

A Series’s index can be altered *in-place* by assignment:

In [None]:
series_3.index = ['A', 'BB', 'CCC', 'DDDD']
series_3
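Index assignment only succeeds when the number of new labels matches the length of the Series. The sketch below (illustrative data) shows that a mismatched list raises a `ValueError` and leaves the existing index untouched; the exact error wording may vary between pandas versions, so it is only printed here.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])
s.index = ['A', 'BB', 'CCC', 'DDDD']   # OK: 4 labels for 4 values

try:
    s.index = ['x', 'y', 'z']          # only 3 labels for 4 values
except ValueError as err:
    print("ValueError:", err)

# The failed assignment did not modify the index.
print(list(s.index))                   # ['A', 'BB', 'CCC', 'DDDD']
```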

## Operations on Series

Operations such as filtering with a boolean array, scalar multiplication, or applying NumPy math functions preserve the link between index and values:

In [None]:
series_3[series_3 > 3]

In [None]:
series_3*2

In [None]:
series_3/2

In [None]:
import numpy as np

# np.exp computes e**x element-wise, where e ≈ 2.71828 is Euler's number
np.exp(series_3)
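The boolean filter `series_3[series_3 > 3]` works in two steps: the comparison builds a boolean Series with the same index, and indexing with it keeps the entries marked `True`. A sketch that makes the intermediate mask visible, using the relabeled data of `series_3` under the illustrative name `s`:

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['A', 'BB', 'CCC', 'DDDD'])

# Step 1: the comparison yields a boolean Series aligned on the same index.
mask = s > 3
print(mask.tolist())                            # [True, True, False, False]

# Step 2: indexing with the mask keeps only the True entries, labels included.
filtered = s[mask]
print(list(filtered.index), filtered.tolist())  # ['A', 'BB'] [4, 7]
```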

## Series as dict (for self-study)

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values. It can be used in many contexts where you might use a Python *dict*:

In [None]:
'BB' in series_3  # `in` checks the index labels (not the values); 'c' would be False after relabeling
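The dict analogy extends beyond `in`. This sketch (illustrative data) shows that membership tests look at labels rather than values, and that a Series also offers dict-style helpers such as `keys()`, `get()` with a default, and `items()`.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['A', 'BB', 'CCC', 'DDDD'])

# `in` tests the index labels, exactly like a dict tests its keys.
print('BB' in s)          # True
print(7 in s)             # False: 7 is a value, not a label

# Other dict-style helpers also exist on Series.
print(list(s.keys()))     # ['A', 'BB', 'CCC', 'DDDD']
print(s.get('ZZZ', 0))    # 0  (default returned for a missing label)
for label, value in s.items():
    print(label, value)
```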

We can create a Series from a Python dict by passing the dict to `pd.Series`:

In [None]:
# Named sdata rather than dict, to avoid shadowing the built-in dict type
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

In [None]:
sdata

In [None]:
series_4 = pd.Series(sdata)
series_4

When only a dict is passed, the index of the resulting Series consists of the dict's keys, in insertion order. You can override this by passing an `index` that lists the labels in the order you want them to appear. Labels missing from the dict (here `'California'`) get `NaN`, and dict keys that are not listed (here `'Utah'`) are dropped:

In [None]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
series_5 = pd.Series(sdata, index=states)
series_5
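Since `'California'` has no entry in the dict, its value is `NaN`, and because `NaN` is a float, the integer values are upcast to `float64`. This sketch uses `isna`/`notna` to detect the missing entries; the dict is called `sdata` here to avoid shadowing Python's built-in `dict`.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
s = pd.Series(sdata, index=states)

print(s.dtype)                      # float64 (NaN forces a float dtype)
print(s.isna().tolist())            # [True, False, False, False]
print(s[s.notna()].index.tolist())  # ['Ohio', 'Oregon', 'Texas']
print('Utah' in s)                  # False: not listed in `states`
```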

A useful Series feature for many applications is that it automatically aligns by index label in arithmetic operations:

In [None]:
series_6 = series_4 + series_5
series_6
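With `+`, a label that is missing on either side (here `'California'` and `'Utah'`) produces `NaN`. If a missing entry should instead count as 0, the method form `Series.add` accepts a `fill_value`. Treating missing as 0 is an assumption that may not suit every dataset, and a label missing on *both* sides still stays `NaN`, as the sketch below shows (names `s4`, `s5`, `total` are illustrative).

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
s4 = pd.Series(sdata)
s5 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# The + operator aligns on labels; labels missing on either side give NaN.
plain = s4 + s5
print(plain)

# fill_value=0 treats a value missing on ONE side as 0 before adding.
# 'California' is missing/NaN on both sides, so it remains NaN.
total = s4.add(s5, fill_value=0)
print(total)
```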

Both the Series object itself and its index have a name attribute, which integrates with other key areas of pandas functionality:

In [None]:
# Nothing is displayed by this cell; the name appears as "Name: population" when series_6 is shown below
series_6.name = 'population'

In [None]:
series_6.index.name = 'state'

In [None]:
series_6
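The `name` attributes pay off when a Series is turned into a DataFrame. In this sketch, built from two of the values of `series_6` under the illustrative name `s`, `to_frame()` uses the Series name as the column label, and `reset_index()` turns the named index into an ordinary column called `state`.

```python
import pandas as pd

s = pd.Series([70000.0, 32000.0], index=['Ohio', 'Oregon'])
s.name = 'population'
s.index.name = 'state'

# The Series name becomes the column label of the one-column DataFrame.
df = s.to_frame()
print(df.columns.tolist())    # ['population']

# reset_index() moves the named index into a regular column.
flat = s.reset_index()
print(flat.columns.tolist())  # ['state', 'population']
```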