![alt text](pandas.png "Title")

In [0]:
# Import pandas under its conventional alias
import pandas as pd

# pandas Series

A Series is a one-dimensional object containing a sequence of values (not necessarily homogeneous) and an associated index. Series are worth understanding well because they are the building blocks of DataFrames, an object we will use extensively. Their index and their many methods make them very useful.

## Basics

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html

In [0]:
# let's create a Series from scratch:
s = pd.Series((1, -3, 7, 2))
print(type(s))

In [0]:
# Let's display the Series. Evaluating a bare name on the last line of a cell works in notebooks (e.g. Jupyter or Databricks). Use print() elsewhere.
s

# This is an indexed series of values, an array if you like...

In [0]:
# Looking at the values:
s.values

In [0]:
# The default index is a range of integer positions, here 0 to 3:
s.index

In [0]:
help(pd.Series)

In [0]:
# What if we mix value types?
s = pd.Series((1, -3, 7, "Hello"))
s
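To make the effect of mixing types visible, the following minimal sketch compares the `dtype` of a homogeneous Series with that of a mixed one. The variable names `numbers` and `mixed` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Homogeneous integers are stored with an integer dtype
numbers = pd.Series((1, -3, 7, 2))
print(numbers.dtype)

# Mixing integers and a string falls back to the generic 'object' dtype
mixed = pd.Series((1, -3, 7, "Hello"))
print(mixed.dtype)

# Each element keeps its original Python type
print([type(v).__name__ for v in mixed])
```

With the `object` dtype, each element is handled as a generic Python object, so numeric operations on such a Series are typically slower.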

## Labeling index

In [0]:
# We can label the index:
s = pd.Series([1, -3, 7, 2], index=['a', 'b', 'c', 'd'])
print(s.index, '\n')
print(s)

In [0]:
# Series are array-like (backed by NumPy arrays in pandas 1.x), and we can use index labels to access values:
print('Value for index a: ', s['a'], '\n')

# Attribute access also works when the label is a valid Python identifier that does not clash with an existing Series attribute
print('Value for index b: ', s.b, '\n')

# Same syntax to modify values:
s['b'] = -6
print(s)
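As a complement to the bracket and attribute syntax above, here is a small sketch of explicit label-based (`.loc`) and position-based (`.iloc`) access, plus a case where attribute access does not return the value. The Series `t` and its label `'sum'` are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, -3, 7, 2], index=['a', 'b', 'c', 'd'])

# .loc selects by label
print(s.loc['c'])

# .iloc selects by integer position (0-based)
print(s.iloc[2])

# A list of labels returns a sub-Series
print(s.loc[['a', 'd']])

# Attribute access does not reach a label that clashes with a method name
t = pd.Series([10, 20], index=['sum', 'b'])
print(t['sum'])          # the value stored under the label 'sum'
print(callable(t.sum))   # t.sum is still the Series method, not the value
```

Bracket or `.loc` access is therefore the safer choice when labels are not known in advance.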

## Basic operations

In [0]:
s

In [0]:
# Series support vectorized computations. 
# This is an essential concept in Pandas, and quite different from the implicit row-by-row logic in SAS.

# Example with a scalar multiplication:
s*2

In [0]:
# Alignment in arithmetic operations: values are matched on their index labels
s1 = pd.Series([1, -3, 7, 2], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([2, -1, 2, 5], index=['a', 'b', 'c', 'd'])

s1 + s2
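Because both Series above share the same index in the same order, the alignment is easy to miss. The sketch below reorders the second Series (the name `s2_reversed` is illustrative) to show that values are paired by label rather than by position.

```python
import pandas as pd

s1 = pd.Series([1, -3, 7, 2], index=['a', 'b', 'c', 'd'])
# Same labels and values as s2 above, but stored in reverse order
s2_reversed = pd.Series([5, 2, -1, 2], index=['d', 'c', 'b', 'a'])

# Values are matched on labels: 'a' with 'a', 'b' with 'b', ...
total = s1 + s2_reversed
print(total)

# Check a couple of labels explicitly
print(total['a'], total['d'])
```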

In [0]:
# Creating a Series of booleans
test = s > 0
test

In [0]:
# Filtering a Series with that boolean array. Note that the index/value link is preserved.
# Again, a very important feature which we'll use later
print(s[s > 0])

In [0]:
# You can think of a Series as an ordered dict, mapping index labels to data values.
# The following therefore works, as it would with a dict.

# checking if 'b' is in index of Series s
'b' in s
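The dict analogy has one consequence worth checking: `in` tests the index labels, not the values. A minimal sketch, rebuilding `s` with the values it holds at this point in the notebook:

```python
import pandas as pd

s = pd.Series([1, -6, 7, 2], index=['a', 'b', 'c', 'd'])

# 'in' looks at the index labels, like keys in a dict
print('b' in s)

# 7 is a value, not a label, so this is False
print(7 in s)

# To test membership among the values, check s.values instead
print(7 in s.values)

# Like a dict, a Series can be converted back with to_dict()
print(s.to_dict())
```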

In [0]:
# Concatenate two Series. The resulting index has duplicates; we'll see how to change that later
pd.concat([s1, s2])
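To illustrate what duplicated labels mean in practice, this sketch looks up a repeated label and shows one possible way of renumbering the result, using the `ignore_index` parameter of `pd.concat`. The names `combined` and `renumbered` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, -3, 7, 2], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([2, -1, 2, 5], index=['a', 'b', 'c', 'd'])

combined = pd.concat([s1, s2])

# The index is no longer unique
print(combined.index.is_unique)

# Selecting a duplicated label returns every matching value as a Series
print(combined['a'])

# One option: discard the labels and renumber from 0
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index)
```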

## Create from Dict

In [0]:
# You can create a Series from many other objects, for example a dict (keys become the index):
genders = {10010: 'M', 10011: 'F', 10012: 'M'}
print(genders)
pd.Series(genders)

## Missing values

Missing values are displayed as 'NaN' (Not a Number).

In [0]:
patients = [10010, 10011, 10012, 10013]
genders = {10010: 'M', 10012: 'F', 10013: 'M'}
genders = pd.Series(genders, index=patients)
genders

In [0]:
# isnull() creates a Series of booleans indicating, for each element, whether the value is missing:
pd.isnull(genders) # the opposite being pd.notnull(genders)

In [0]:
pd.notnull(genders)

In [0]:
# Using a Series of booleans to filter out the missing values:
genders[pd.notnull(genders)]
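The same checks are also available as Series methods. The sketch below counts the missing entries with `isna()` and removes them with `dropna()`. The names `n_missing` and `known` are illustrative.

```python
import pandas as pd

patients = [10010, 10011, 10012, 10013]
genders = pd.Series({10010: 'M', 10012: 'F', 10013: 'M'}, index=patients)

# Method form of pd.isnull(genders); sum() counts the True values
n_missing = genders.isna().sum()
print(n_missing)

# dropna() returns a new Series without the missing entries
known = genders.dropna()
print(known)

# The original Series is left unchanged
print(len(genders), len(known))
```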

In [0]:
# Alignment and missing values: a label present in only one Series produces NaN
s1 = pd.Series ( [1, -3, 7, 2],    index = ['a',      'c', 'd', 'e'])
s2 = pd.Series ( [2, -1, 2, 5, 0], index = ['a', 'b', 'c', 'd', 'e'])

s1 + s2
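If a NaN for the unmatched label is not what you want, one possible alternative is the `add()` method with its `fill_value` parameter. This is a minimal sketch, and the names `plain` and `filled` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, -3, 7, 2], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([2, -1, 2, 5, 0], index=['a', 'b', 'c', 'd', 'e'])

# With the + operator, 'b' exists only in s2, so the result there is NaN
plain = s1 + s2
print(plain)

# add() with fill_value=0 treats the missing side as 0 instead
filled = s1.add(s2, fill_value=0)
print(filled)
```

Note that the result of `s1 + s2` is stored as floats, because NaN is a floating-point value.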

In [0]:
# We'll see more about missing values in a later chapter

## Misc

In [0]:
# Series have aggregation methods (sum() skips NaN by default)
(s1+s2).sum()

In [0]:
# By the way, this works too. sum() is a built-in Python function that works on iterables
print(s1)
print('Sum:', sum(s1))
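The built-in and the method agree on `s1`, which has no missing values, but they differ once NaN is involved. A small sketch using the sum of the two Series defined earlier (the name `total` is illustrative):

```python
import math
import pandas as pd

s1 = pd.Series([1, -3, 7, 2], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([2, -1, 2, 5, 0], index=['a', 'b', 'c', 'd', 'e'])
total = s1 + s2  # contains NaN for label 'b'

# The Series method skips NaN by default (skipna=True)
print(total.sum())

# The built-in adds every element, so NaN propagates to the result
print(sum(total))

# Passing skipna=False makes the method propagate NaN as well
print(total.sum(skipna=False))
```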

In [0]:
# Series also have many string methods, available through the .str accessor:
s = pd.Series(["Hello", "Good morning", "Hi", "Good day"])
s.str.upper()

In [0]:
# Find the number of items in the Series. We can use the classic len()
len(s)

In [0]:
# or the length of each item
s.str.len()
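String methods combine naturally with the boolean filtering seen earlier. The sketch below filters on a prefix and chains two string methods. The names `is_good` and `slugs` are illustrative.

```python
import pandas as pd

s = pd.Series(["Hello", "Good morning", "Hi", "Good day"])

# str.startswith() returns a boolean Series...
is_good = s.str.startswith("Good")
print(is_good)

# ...which can be used to filter, as with numeric comparisons
print(s[is_good])

# String methods can be chained, going through the accessor each time
slugs = s.str.lower().str.replace(" ", "_", regex=False)
print(slugs)
```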

__________________________________________________
Nicolas Dupuis, Methodology and Innovation (IDAR C&SP), 2020+