# Pandas Series

## What is a Pandas Series?

- A Pandas Series is a one-dimensional array-like object containing data of any type.
- Each element in a Pandas Series is called an "item".
- Each item in a Pandas Series is associated with an index label (integers by default).

## Axis Labels

- An axis label is the name of a row or column
- A Pandas Series has only one axis, which is the index axis (unlike a DataFrame, which has both a row axis and a column axis)
- A Series can also carry a name, which is useful for clarity when manipulating or describing the Series (it becomes the column name when the Series is placed in a DataFrame)
- The name is set using the `name` attribute
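
As a minimal sketch of how the `name` attribute behaves, the example below names both a Series and its index. The names `"scores"` and `"student"` and the sample data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# "scores" and "student" are hypothetical names chosen for illustration
s = pd.Series([90, 85, 70], index=["ann", "bob", "cy"], name="scores")
s.index.name = "student"  # the index can carry its own name as well

print(s.name)        # name of the Series
print(s.index.name)  # name of the index

# The Series name becomes the column label when converting to a DataFrame
df = s.to_frame()
print(list(df.columns))
```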

## Label Index (Index)

- A label index refers to the label of each item in the Series
- These labels can be used to access each item in the Series
- By default, the index is an integer sequence starting at 0
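
The points above can be sketched as follows: a Series starts with the default integer index, and the labels can be replaced afterwards. The sample values are assumptions made for illustration.

```python
import pandas as pd

s = pd.Series(["x", "y", "z"])
print(list(s.index))  # default integer labels: [0, 1, 2]

# Replace the labels; the new index must have the same length as the data
s.index = ["a", "b", "c"]
print(s["b"])  # access an item by its new label
```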

## Creating a Pandas Series

### Creating an Empty Pandas Series

In [1]:
import pandas as pd

# 1. Create an empty series
s_empty = pd.Series()
print(s_empty)

# 2. Create an empty series but specify the data type
s_empty_int = pd.Series([], dtype="int64")
print(s_empty_int)

s_empty_str = pd.Series([], dtype="str")
print(s_empty_str)

Series([], dtype: object)
Series([], dtype: int64)
Series([], dtype: object)


### Creating a Series from a Python List

In [15]:
import pandas as pd

# Create a series from a list, index is 0, 1, 2, 3, 4, ... by default
data = ["a", "b", "c", "d", "e"]
s = pd.Series(data)

print(s)

# In this example, specify the index to be ["a", "b", "c", "d", "e"]
# NOTE: The index length must be equal to the data length
data2 = [10, 20, 30, 40, 50]
idx = ["a", "b", "c", "d", "e"]
s2 = pd.Series(data2, index=idx)

print(s2)

0    a
1    b
2    c
3    d
4    e
dtype: object
a    10
b    20
c    30
d    40
e    50
dtype: int64


### Creating a Series from a Python Dictionary

When you pass a dictionary to the `Series` constructor, the keys become the index and the values become the items in the series.

In [17]:
import pandas as pd

# Python dictionary
data = {"a": 10, "b": 20, "c": 30, "d": 40}

# Creating a Series from the dictionary
series = pd.Series(data)

print(series)

a    10
b    20
c    30
d    40
dtype: int64


In [18]:
import pandas as pd

# Python dictionary
data = {"a": 10, "b": 20, "c": 30, "d": 40}

# Creating a Series with a name
# The name acts as an axis label
series = pd.Series(data, name="Example Series")

print(series)

a    10
b    20
c    30
d    40
Name: Example Series, dtype: int64
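
When a dictionary is combined with an explicit `index`, pandas selects and orders the values by those labels, and labels missing from the dictionary get `NaN`. The sketch below illustrates this; the label `"z"` is a made-up key that is deliberately absent from the data.

```python
import pandas as pd

data = {"a": 10, "b": 20, "c": 30}

# "z" is not a key in the dictionary, so its value becomes NaN
s = pd.Series(data, index=["c", "a", "z"])

print(s)
print(s.isnull().sum())  # number of missing values
```

Because `NaN` is a floating-point value, the resulting Series is stored as `float64` rather than `int64`.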


## Creating Other Data Structures from Pandas Series

### Convert a Pandas Series to a Python List

In [44]:
import pandas as pd

# Creating a pandas Series
data = [10, 20, 30, 40]
series = pd.Series(data)

# Converting the Series back to a Python list
data_list = series.tolist()

print(data_list)

[10, 20, 30, 40]


### Convert a Pandas Series to a Python Dictionary

In [46]:
import pandas as pd

# Creating a pandas Series
data = [10, 20, 30, 40]
index = ["a", "b", "c", "d"]
series = pd.Series(data, index=index)

# Converting the Series to a Python dictionary
#  The index becomes the dictionary keys
#  The values become the dictionary values
data_dict = series.to_dict()

print(data_dict)

{'a': 10, 'b': 20, 'c': 30, 'd': 40}


### Convert a Pandas Series to a Pandas DataFrame

In [52]:
import pandas as pd

# Creating a pandas Series
data = [10, 20, 30, 40]
index = ["a", "b", "c", "d"]
series = pd.Series(data, index=index)

# Converting the Series to a DataFrame
axis_label = "Values"  # The name of the column (this is optional)
df = series.to_frame(name=axis_label)

print(df)

   Values
a      10
b      20
c      30
d      40


### Convert Multiple Pandas Series to a Pandas DataFrame

In [56]:
import pandas as pd

# Creating a list of pandas Series
idx = ["a", "b", "c"]
series1 = pd.Series([10, 20, 30], index=idx)
series2 = pd.Series([40, 50, 60], index=idx)
series3 = pd.Series([70, 80, 90], index=idx)

# Creating a DataFrame from the list of Series (as rows)
df_rows = pd.DataFrame([series1, series2, series3])
print(f"Series as Rows:\n{df_rows}\n")

# Creating a DataFrame from a dictionary of Series (as columns)
df_columns = pd.DataFrame({"Column1": series1, "Column2": series2, "Column3": series3})
print(f"Series as Columns:\n{df_columns}")

Series as Rows:
    a   b   c
0  10  20  30
1  40  50  60
2  70  80  90

Series as Columns:
   Column1  Column2  Column3
a       10       40       70
b       20       50       80
c       30       60       90
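
Another way to combine Series as columns is `pd.concat` with `axis=1`. This is a sketch of an alternative, not the approach used in the cell above; here the column labels come from each Series' `name`.

```python
import pandas as pd

idx = ["a", "b", "c"]
series1 = pd.Series([10, 20, 30], index=idx, name="Column1")
series2 = pd.Series([40, 50, 60], index=idx, name="Column2")

# axis=1 places each Series side by side as a column, aligned on the index
df = pd.concat([series1, series2], axis=1)
print(df)
```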


## Information about the Pandas Series

### Basic Information

In [40]:
import pandas as pd

# Create a series from a list
data = [10, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
s = pd.Series(data)

# Return data type of the Series
dtype = s.dtype
print(f"The Data Type of the Series is {dtype}")

# Return the size of the Series (number of elements)
size = s.size
print(f"The Size of the Series is {size}")

# Return the shape of the Series (number of elements in each dimension)
shape = s.shape
print(f"The Shape of the Series is {shape}")

# Return the number of dimensions of the Series
ndim = s.ndim
print(f"The Number of Dimensions of the Series is {ndim}")

# Return first 5 elements of the Series
print(f"First 5 Elements: {s.head()}")

# Return first n elements of the Series
n = 3
print(f"First {n} Elements: {s.head(n)}")

# Return last 5 elements of the Series
print(f"Last 5 Elements: {s.tail()}")

# Return last n elements of the Series
n = 3
print(f"Last {n} Elements: {s.tail(n)}")

# Return the unique values of the Series
print(f"Unique Values: {s.unique()}")

# Return the number of unique values of the Series
print(f"Number of Unique Values: {s.nunique()}")

# Return boolean series indicating whether each element is null
print(f"Has Null Values: {s.isnull()}")

# Return a single boolean indicating whether the series has any null values
print(f"Has Null Values: {s.isnull().any()}")

The Data Type of the Series is int64
The Size of the Series is 11
The Shape of the Series is (11,)
The Number of Dimensions of the Series is 1
First 5 Elements: 0    10
1    10
2    20
3    30
4    40
dtype: int64
First 3 Elements: 0    10
1    10
2    20
dtype: int64
Last 5 Elements: 6      60
7      70
8      80
9      90
10    100
dtype: int64
Last 3 Elements: 8      80
9      90
10    100
dtype: int64
Unique Values: [ 10  20  30  40  50  60  70  80  90 100]
Number of Unique Values: 10
Has Null Values: 0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool
Has Null Values: False
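
Alongside `unique()` and `nunique()`, the `value_counts()` method reports how often each distinct value occurs. The short sketch below uses made-up data to show it.

```python
import pandas as pd

s = pd.Series([10, 10, 20, 30])

# value_counts() returns a Series: the index holds the distinct values,
# the values hold how many times each one appears
counts = s.value_counts()
print(counts)
```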


### Descriptive Statistics

In [32]:
import pandas as pd

# Create a series from a list
data = [10, 20, 30, 40, 50, 50]
s = pd.Series(data)

# Generates description of the Series
print(f"Series Description:\n{s.describe()}")

# Return the number of non-null values in the Series
print(f"Count: {s.count()}")

# Return the number of null values in the Series
print(f"Null Count: {s.isnull().sum()}")

# Return the maximum value of the Series
print(f"Max: {s.max()}")

# Return the minimum value of the Series
print(f"Min: {s.min()}")

# Return the mean of the Series
print(f"Mean: {s.mean()}")

# Return the median of the Series
print(f"Median: {s.median()}")

# Return the mode of the Series (most common value)
print(f"Mode: {s.mode()}")

# Return the variance of the Series (measure of the spread of the distribution)
print(f"Variance: {s.var()}")

# Return the standard deviation of the Series (measure of the spread of the distribution)
print(f"Standard Deviation: {s.std()}")

# Return the skewness of the Series (measure of the asymmetry of the distribution)
print(f"Skewness: {s.skew()}")

# Return the sum of the Series
print(f"Sum: {s.sum()}")

# Return the product of the Series
print(f"Product: {s.prod()}")

# Return the cumulative sum of the Series
print(f"Cumulative Sum: {s.cumsum()}")

# Return the cumulative product of the Series
print(f"Cumulative Product: {s.cumprod()}")

Series Description:
count     6.000000
mean     33.333333
std      16.329932
min      10.000000
25%      22.500000
50%      35.000000
75%      47.500000
max      50.000000
dtype: float64
Count: 6
Null Count: 0
Max: 50
Min: 10
Mean: 33.333333333333336
Median: 35.0
Mode: 0    50
dtype: int64
Variance: 266.6666666666667
Standard Deviation: 16.32993161855452
Skewness: -0.38273277230987224
Sum: 200
Product: 600000000
Cumulative Sum: 0     10
1     30
2     60
3    100
4    150
5    200
dtype: int64
Cumulative Product: 0           10
1          200
2         6000
3       240000
4     12000000
5    600000000
dtype: int64


## Accessing Items in a Pandas Series

The items in a Pandas Series can be accessed using their index label.

In [64]:
import pandas as pd

# Create a series from a list
data = ["a", "b", "c"]
s1 = pd.Series(data)
v_0 = s1[0]
v_1 = s1[1]
v_2 = s1[2]

# Accessing by index
print(f"Value at index 0: {v_0}")
print(f"Value at index 1: {v_1}")
print(f"Value at index 2: {v_2}")


# Creating a pandas Series
data = [10, 20, 30, 40]
index = ["a", "b", "c", "d"]
s2 = pd.Series(data, index=index)
v_a = s2["a"]
v_b = s2["b"]
v_c = s2["c"]
v_d = s2["d"]
# v_e = s2["e"] # KeyError

# Accessing by index
print(f"Value at index 'a': {v_a}")
print(f"Value at index 'b': {v_b}")
print(f"Value at index 'c': {v_c}")
print(f"Value at index 'd': {v_d}")
# print(f"Value at index 'e': {v_e}") # KeyError

Value at index 0: a
Value at index 1: b
Value at index 2: c
Value at index 'a': 10
Value at index 'b': 20
Value at index 'c': 30
Value at index 'd': 40


You can also access the items in a Pandas Series by their integer position using `.iloc`.

In [66]:
import pandas as pd

# Creating a pandas Series
data = [10, 20, 30, 40]
s = pd.Series(data)
v_0 = s.iloc[0]
v_1 = s.iloc[1]
v_2 = s.iloc[2]
v_3 = s.iloc[3]

# Accessing by position
print(f"Value at position 0: {v_0}")
print(f"Value at position 1: {v_1}")
print(f"Value at position 2: {v_2}")
print(f"Value at position 3: {v_3}")

Value at position 0: 10
Value at position 1: 20
Value at position 2: 30
Value at position 3: 40
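
The difference between label-based and position-based access is easiest to see with slices. In this sketch (sample data assumed), `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position, like a Python list.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s.loc["a":"c"]  # label slice: the end label "c" is included
by_position = s.iloc[0:2]  # position slice: position 2 is excluded

print(by_label.tolist())
print(by_position.tolist())
```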


## Adding (Concatenating) Items to a Pandas Series

In [69]:
import pandas as pd

# Existing Series
data1 = [10, 20, 30]
data2 = [40, 50, 60]

series1 = pd.Series(data1)
series2 = pd.Series(data2)

# Concatenating two Series
concatenated_series = pd.concat([series1, series2])

print(concatenated_series)

0    10
1    20
2    30
0    40
1    50
2    60
dtype: int64


In [68]:
import pandas as pd

# Existing Series
data1 = [10, 20, 30]
data2 = [40, 50, 60]

# Also specify the index
index = ["a", "b", "c"]

series1 = pd.Series(data1, index=index)
series2 = pd.Series(data2, index=index)

# Concatenating two Series
concatenated_series = pd.concat([series1, series2])

print(concatenated_series)

a    10
b    20
c    30
a    40
b    50
c    60
dtype: int64
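
Both results above keep the original labels, so the concatenated index contains duplicates. If the original labels are not needed, passing `ignore_index=True` to `pd.concat` produces a fresh 0-based index, as this sketch shows.

```python
import pandas as pd

series1 = pd.Series([10, 20, 30])
series2 = pd.Series([40, 50, 60])

# ignore_index=True discards the original labels and renumbers from 0
combined = pd.concat([series1, series2], ignore_index=True)
print(combined)
```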


## Updating Items in a Pandas Series

### Update Items Using Index Labels

Use bracket notation to access and update items in a Pandas Series:
- If the index label does not exist, a new item will be added
- If the index label exists, the value will be updated
- Bracket notation is context-dependent
    - Returns a single element when used with a label against a Pandas Series
    - Returns a Pandas Series (a column) when used with a column name against a Pandas DataFrame

**NOTE:** You can also use `.iloc` to access and update items in a Pandas Series using their integer position
- `.iloc` is always position-based, regardless of the type of the index labels

In [2]:
import pandas as pd

# Example 1: Using Default Index
data1 = [10, 20, 30]
s1 = pd.Series(data1)
print(f"Before:\n{s1}\n")

# Update the values at index label 1 and at position 2
s1[1] = 25  # Using bracket notation
s1.iloc[2] = 35  # Using iloc
print(f"After:\n{s1}\n")


# Example 2: Using Custom Index
data2 = [40, 50, 60]
idx = ["a", "b", "c"]
s2 = pd.Series(data2, index=idx)
print(f"Before:\n{s2}\n")

# Update value at index 'b'
s2["b"] = 55
print(f"After:\n{s2}")

Before:
0    10
1    20
2    30
dtype: int64

After:
0    10
1    25
2    35
dtype: int64

Before:
a    40
b    50
c    60
dtype: int64

After:
a    40
b    55
c    60
dtype: int64
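
The examples above only update existing labels. The sketch below, using assumed labels `"d"` and `"e"`, shows the other case: assigning to a label that does not exist adds a new item, whereas `.iloc` cannot add items beyond the current length.

```python
import pandas as pd

s = pd.Series([40, 50, 60], index=["a", "b", "c"])

s["d"] = 70      # "d" does not exist yet, so a new item is appended
s.loc["e"] = 80  # .loc behaves the same way for a missing label

# .iloc is position-based and cannot enlarge the Series
try:
    s.iloc[10] = 90
except IndexError as exc:
    print(f"IndexError: {exc}")

print(s)
```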


### Update Items Using `.loc`

You can also use `.loc` to access and update items in a Pandas Series using their index label

In [76]:
import pandas as pd

# Existing Series
data = [10, 20, 30]
index = ["a", "b", "c"]
series = pd.Series(data, index=index)

# Update value at index 'c'
series.loc["c"] = 35

print(series)

a    10
b    20
c    35
dtype: int64


## Deleting Items from a Pandas Series

### Using `.drop()`

You can use the `.drop()` method to remove items from a Pandas Series based on their index label
- By default, the `.drop()` method returns a new Series
- To modify the original Series instead of returning a new one, set the `inplace` parameter to `True`

In [9]:
import pandas as pd

# Existing Series
data = [10, 20, 30]
series = pd.Series(data)

print("Original Series:")
print(series)

# Drop value at index 1
series = series.drop(1)

print("\nSeries after dropping value at index 1:")
print(series)

Original Series:
0    10
1    20
2    30
dtype: int64

Series after dropping value at index 1:
0    10
2    30
dtype: int64
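
`.drop()` also accepts a list of labels, and its `errors="ignore"` option skips labels that are not present instead of raising a `KeyError`. A minimal sketch, with the missing label `"z"` chosen for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Drop several labels at once; a new Series is returned
trimmed = s.drop(["a", "c"])

# "z" is not in the index; errors="ignore" avoids a KeyError
unchanged = s.drop("z", errors="ignore")

print(trimmed)
print(len(s))  # the original Series still has all of its items
```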



### Using `.pop()`

You can use the `.pop()` method to remove an item from a Pandas Series based on its index label
- The `.pop()` method returns the removed item
- The `.pop()` method modifies the original Series in place (it has no `inplace` parameter)

In [8]:
import pandas as pd

# Existing Series
data = [10, 20, 30]
series = pd.Series(data)

print("Original Series:")
print(series)

# Pop off value at index 1
popped = series.pop(1)

print("\nPopped value:")
print(popped)

Original Series:
0    10
1    20
2    30
dtype: int64

Popped value:
20
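
To make the effect of `.pop()` on the original Series visible, this sketch pops an item by label and then inspects what remains. The string labels are assumed for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# pop() takes an index label, returns the value, and removes it from s
popped = s.pop("b")

print(popped)
print(s)  # "b" is no longer in the Series
```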


### Using `.dropna()`

You can use the `.dropna()` method to remove items from a Pandas Series that contain `NaN` values
- By default, the `.dropna()` method returns a new Series
- To modify the original Series instead of returning a new one, set the `inplace` parameter to `True`

In [7]:
import numpy as np
import pandas as pd

# Existing Series
data = [10, np.nan, 20, np.nan, 30]
series = pd.Series(data)

print("Original Series:")
print(series)

# Using dropna() to remove NaN values
clean_series = series.dropna()

print("\nSeries after using dropna():")
print(clean_series)

Original Series:
0    10.0
1     NaN
2    20.0
3     NaN
4    30.0
dtype: float64

Series after using dropna():
0    10.0
2    20.0
4    30.0
dtype: float64
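
For comparison, here is a sketch of `.dropna()` with `inplace=True`. In that mode the method changes the Series it is called on and returns `None`, so the result should not be assigned back.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 20])

# With inplace=True the Series itself is modified and None is returned
result = s.dropna(inplace=True)

print(result)
print(s)
```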


### Using Boolean Indexing

You can use boolean indexing to remove items from a Pandas Series by filtering based on a condition

In [13]:
import pandas as pd

# Creating a pandas Series
data = [0, 10, 20, 30, 40, 50]
series = pd.Series(data)

print("Original Series:")
print(series)

# Keep only the elements less than or equal to 20 (drops elements greater than 20)
series = series[series <= 20]

print("\nUpdated Series after deleting elements greater than 20:")
print(series)

Original Series:
0     0
1    10
2    20
3    30
4    40
5    50
dtype: int64

Updated Series after deleting elements greater than 20:
0     0
1    10
2    20
dtype: int64


## Sorting Items in a Pandas Series

### Sort by Index

Use `.sort_index()` to sort items in a Pandas Series by their index labels, in ascending or descending order
- By default, the `.sort_index()` method sorts items in ascending order
- You can set the `ascending` parameter to `False` to sort in descending order

In [17]:
import pandas as pd

# Creating a pandas Series
data = [10, 30, 20, 40]
index = ["d", "b", "c", "a"]
series = pd.Series(data, index=index)

print("Original Series:")
print(series)

# Sorting by index
sorted_series_index_asc = series.sort_index()
sorted_series_index_desc = series.sort_index(ascending=False)

print("\nSeries sorted by index (ascending):")
print(sorted_series_index_asc)

print("\nSeries sorted by index (descending):")
print(sorted_series_index_desc)

Original Series:
d    10
b    30
c    20
a    40
dtype: int64

Series sorted by index (ascending):
a    40
b    30
c    20
d    10
dtype: int64

Series sorted by index (descending):
d    10
c    20
b    30
a    40
dtype: int64


### Sort by Values

Use `.sort_values()` to sort items in a Pandas Series by their values, in ascending or descending order
- By default, the `.sort_values()` method sorts items in ascending order
- You can set the `ascending` parameter to `False` to sort in descending order

In [16]:
import pandas as pd

# Creating a pandas Series
data = [30, 10, 40, 20]
series = pd.Series(data)

print("Original Series:")
print(series)

# Sorting by values
sorted_series_asc = series.sort_values()
sorted_series_desc = series.sort_values(ascending=False)

print("\nSeries sorted by values (ascending):")
print(sorted_series_asc)

print("\nSeries sorted by values (descending):")
print(sorted_series_desc)

Original Series:
0    30
1    10
2    40
3    20
dtype: int64

Series sorted by values (ascending):
1    10
3    20
0    30
2    40
dtype: int64

Series sorted by values (descending):
2    40
0    30
3    20
1    10
dtype: int64


## Filtering Items in a Pandas Series

### Using Boolean Indexing

Use boolean indexing to filter items in a Pandas Series
- The boolean indexing operator `[]` is used to filter items in a Pandas Series
- Apply a condition that returns a boolean series to filter items in a Pandas Series

In [18]:
import pandas as pd

# Creating a pandas Series
data = [10, 30, 20, 40]
index = ["a", "b", "c", "d"]
series = pd.Series(data, index=index)

print("Original Series:")
print(series)

# Keep only values greater than 20
filtered_series = series[series > 20]

print("\nFiltered Series (values greater than 20):")
print(filtered_series)

Original Series:
a    10
b    30
c    20
d    40
dtype: int64

Filtered Series (values greater than 20):
b    30
d    40
dtype: int64


### Filtering Series with Multiple Conditions

Use multiple conditions to filter items in a Pandas Series
- The `&` and `|` operators are used to combine multiple conditions
- The `~` operator is used to negate a condition

In [20]:
import pandas as pd

# Creating a pandas Series
data = [0, 10, 20, 30, 40, 50]
series = pd.Series(data)

print("Original Series:")
print(series)

# Keep only values greater than 20 and less than 50
filtered_series = series[(series > 20) & (series < 50)]

print("\nFiltered Series (values greater than 20 and less than 50):")
print(filtered_series)

Original Series:
0     0
1    10
2    20
3    30
4    40
5    50
dtype: int64

Filtered Series (values greater than 20 and less than 50):
3    30
4    40
dtype: int64
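
The cell above only demonstrates `&`. As a sketch with the same sample data, the `|` and `~` operators can be used like this. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

series = pd.Series([0, 10, 20, 30, 40, 50])

# OR: keep values less than 10 or greater than 40
either = series[(series < 10) | (series > 40)]

# NOT: keep values that are not greater than 20
negated = series[~(series > 20)]

print(either.tolist())
print(negated.tolist())
```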


### Using `dropna()` to Filter Out NaN Values

Use the `dropna()` method to filter out NaN values from a Pandas Series
- The `dropna()` method removes all NaN values from a Pandas Series
- Set the `inplace` parameter to `True` to remove the NaN values from the original Series instead of returning a new one

In [21]:
import numpy as np
import pandas as pd

# Creating a pandas Series with NaN values
data = [10, np.nan, 20, np.nan, 30, 40]
series = pd.Series(data)

print("Original Series with NaN values:")
print(series)

# Filter out NaN values
filtered_series = series.dropna()

print("\nFiltered Series (without NaN values):")
print(filtered_series)

Original Series with NaN values:
0    10.0
1     NaN
2    20.0
3     NaN
4    30.0
5    40.0
dtype: float64

Filtered Series (without NaN values):
0    10.0
2    20.0
4    30.0
5    40.0
dtype: float64


### Using the `query()` Method

Although `query()` is more commonly used with DataFrames, it can be adapted for Pandas Series by converting the Series to a DataFrame first

In [22]:
import pandas as pd

# Creating a pandas Series
data = [10, 20, 30, 40]
series = pd.Series(data)

print("Original Series:")
print(series)

# Convert the series to a DataFrame to use the query method
df = series.to_frame(name="values")

# Keep only rows where the value is greater than 20
filtered_df = df.query("values > 20")

print("\nFiltered DataFrame (values greater than 20):")
print(filtered_df)

Original Series:
0    10
1    20
2    30
3    40
dtype: int64

Filtered DataFrame (values greater than 20):
   values
2      30
3      40


## Iterating Through Items in a Pandas Series

### Using a `for` Loop

Use a `for` loop to iterate through items in a Pandas Series

In [24]:
import pandas as pd

# Creating a pandas Series
data = [10, 30, 20, 40]
series = pd.Series(data)

print("Iterating through values in the series:")
for value in series:
    print(value)

Iterating through values in the series:
10
30
20
40


### Using the `items()` Method

Use the `items()` method to iterate through items in a Pandas Series
- The `items()` method returns an iterable that yields an (index, value) tuple for each item in the Series

In [26]:
import pandas as pd

# Creating a pandas Series
data = [10, 30, 20, 40]
series = pd.Series(data)

print("Iterating through index and values in the series:")
for idx, value in series.items():
    print(f"Index: {idx}, Value: {value}")

Iterating through index and values in the series:
Index: 0, Value: 10
Index: 1, Value: 30
Index: 2, Value: 20
Index: 3, Value: 40


## Apply Functions to Items in a Pandas Series

In [43]:
import pandas as pd

# Create a series from a list
data = [10, 20, 30, 40, 50, 50]
s = pd.Series(data)
print(f"Original Series:\n{s}")


# Create a function that multiplies each value in the series by 2
def multiply_by_2(x):
    return x * 2


# Apply the function to each value in the series
s = s.apply(multiply_by_2)

print(f"Series after applying function:\n{s}")

Original Series:
0    10
1    20
2    30
3    40
4    50
5    50
dtype: int64
Series after applying function:
0     20
1     40
2     60
3     80
4    100
5    100
dtype: int64
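
For simple arithmetic like this, the same result can usually be obtained without `apply()` by operating on the whole Series at once. The sketch below compares the two approaches on assumed sample data.

```python
import pandas as pd

s = pd.Series([10, 20, 30])

# Element-by-element, calling a Python function for each value
applied = s.apply(lambda x: x * 2)

# Vectorized: the multiplication is applied to the whole Series at once
vectorized = s * 2

print(applied.equals(vectorized))
```

The vectorized form is generally shorter and faster on large Series, while `apply()` remains useful for logic that cannot be expressed with built-in operations.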


## Resources / References
- [pandas.Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html)