<img src="./images/banner.png" width="800">


# Creating and Manipulating Pandas Series

In this lecture, we will dive into the world of Pandas Series, one of the fundamental data structures in the Pandas library. Series are one-dimensional labeled arrays that can hold any data type, providing a convenient way to store and manipulate data in Python.


We will start by exploring different methods for creating Series from various data types, such as lists, dictionaries, and NumPy arrays. You'll learn how to use the `pd.Series()` function to create Series objects efficiently.


Next, we will focus on accessing elements and slicing Series using both index labels and integer locations. You'll discover how to retrieve specific elements, select subsets of data, and extract portions of a Series based on their positions or labels.


We will then move on to performing basic operations on Series, including arithmetic operations like addition, subtraction, multiplication, and division. You'll learn how to apply these operations element-wise between Series and scalars or between two Series objects. Additionally, we will explore statistical operations, such as calculating descriptive statistics and summarizing data using built-in methods.


Modifying Series is another essential skill you'll acquire in this lecture. We will cover techniques for updating existing elements, adding new elements, and removing elements from a Series. You'll also learn how to sort Series based on their index or values, providing a convenient way to organize your data.


Throughout the lecture, we will introduce you to important Series attributes and methods that will enhance your data manipulation capabilities. You'll learn how to access attributes like data type (`dtype`), shape, size, and index, as well as utilize methods such as `head()`, `tail()`, `unique()`, `value_counts()`, `isnull()`, and `notnull()` to gain insights into your data.


By the end of this lecture, you'll have a solid foundation in creating and manipulating Pandas Series, enabling you to efficiently work with one-dimensional data in your data analysis and manipulation tasks.


Let's dive in and start exploring the power of Pandas Series!


**Table of contents**<a id='toc0_'></a>    
- [Creating Series](#toc1_)    
  - [From lists](#toc1_1_)    
  - [From dictionaries](#toc1_2_)    
  - [From NumPy arrays](#toc1_3_)    
  - [Using the `pd.Series()` function](#toc1_4_)    
- [Series Attributes and Methods](#toc2_)    
  - [Common attributes](#toc2_1_)    
  - [Useful methods](#toc2_2_)    
- [Accessing Elements and Slicing Series](#toc3_)    
  - [Accessing elements by index label](#toc3_1_)    
  - [Accessing elements by integer location](#toc3_2_)    
  - [Slicing Series using index labels](#toc3_3_)    
  - [Slicing Series using integer locations](#toc3_4_)    
- [Modifying Series](#toc4_)    
  - [Updating elements](#toc4_1_)    
  - [Adding elements](#toc4_2_)    
  - [Removing elements](#toc4_3_)    
- [Sorting Series](#toc5_)    
  - [Sorting by index](#toc5_1_)    
  - [Sorting by values](#toc5_2_)    
- [Basic Operations on Series](#toc6_)    
  - [Arithmetic operations](#toc6_1_)    
    - [Addition, subtraction, multiplication, division](#toc6_1_1_)    
    - [Scalar operations](#toc6_1_2_)    
    - [Series operations](#toc6_1_3_)    
  - [Statistical operations](#toc6_2_)    
    - [Descriptive statistics](#toc6_2_1_)    
    - [Summarizing data](#toc6_2_2_)    
  - [Comparing Series](#toc6_3_)    
    - [Element-wise comparison](#toc6_3_1_)    
    - [Filtering based on conditions](#toc6_3_2_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_'></a>[Creating Series](#toc0_)

Pandas Series can be created from various data types, including lists, dictionaries, and NumPy arrays. The `pd.Series()` function is the primary way to create Series objects in Pandas. Let's explore each method of creating Series in detail.


### <a id='toc1_1_'></a>[From lists](#toc0_)


One of the simplest ways to create a Series is from a Python list. You can pass a list of values to the `pd.Series()` function, and it will create a Series object with the values from the list.


In [1]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

By default, Pandas assigns an integer index starting from 0 to each element in the Series.


### <a id='toc1_2_'></a>[From dictionaries](#toc0_)


You can also create a Series from a Python dictionary. In this case, the keys of the dictionary become the index labels, and the corresponding values become the Series values.


In [3]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)
series

a    1
b    2
c    3
d    4
e    5
dtype: int64

The resulting Series has the dictionary keys as the index labels and the corresponding values as the Series values.
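As a small sketch of how a dictionary interacts with the `index` parameter (the names `prices` and `subset` are illustrative and not part of the lecture): when both are supplied, Pandas looks up each requested label in the dictionary, keeps the requested order, and fills labels that are absent from the dictionary with NaN.

```python
import pandas as pd

prices = {'a': 1, 'b': 2, 'c': 3}

# Request labels in a custom order; 'z' is not a key of the dictionary
subset = pd.Series(prices, index=['c', 'a', 'z'])
print(subset)
```

Because NaN is a floating-point value, the integers are converted to floats in this case and the resulting dtype is `float64`.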


### <a id='toc1_3_'></a>[From NumPy arrays](#toc0_)


Series can be created from NumPy arrays as well. When you pass a NumPy array to the `pd.Series()` function, it creates a Series object with the values from the array.


In [4]:
import numpy as np

data = np.array([1, 2, 3, 4, 5])
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

Similar to creating Series from lists, Pandas assigns an integer index starting from 0 to each element in the Series.


### <a id='toc1_4_'></a>[Using the `pd.Series()` function](#toc0_)


The `pd.Series()` function provides additional options for creating Series objects. You can specify the index labels explicitly using the `index` parameter.


In [5]:
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
series

a    1
b    2
c    3
d    4
e    5
dtype: int64

In this case, the Series is created with the specified index labels instead of the default integer index.


You can also specify the data type of the Series using the `dtype` parameter.


In [6]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data, dtype=float)
series

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

The resulting Series has a float data type instead of the default integer data type.


Creating Series in Pandas is straightforward and flexible, allowing you to work with different data types and customize the index labels according to your needs. Whether you have data in lists, dictionaries, or NumPy arrays, you can easily create Series objects using the `pd.Series()` function.
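A Series can also carry a name, which is stored in its `name` attribute. The sketch below uses made-up variable names (`scores`, `points`) and shows three ways to work with it: setting the name at creation time, creating a renamed copy with `rename()`, and assigning to `name` directly.

```python
import pandas as pd

# Set the name when the Series is created
scores = pd.Series([90, 85, 77], name='scores')
print(scores.name)

# rename() with a scalar returns a new Series that has a different name
points = scores.rename('points')

# The name of an existing Series can also be assigned directly
scores.name = 'exam_scores'
print(scores.name, points.name)
```

The name appears at the bottom of the printed Series and becomes the column name if the Series is later placed in a DataFrame.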


## <a id='toc2_'></a>[Series Attributes and Methods](#toc0_)

Pandas Series objects come with a variety of attributes and methods that provide useful information about the data and facilitate common operations. Let's explore some of the common attributes and useful methods available for Series.


### <a id='toc2_1_'></a>[Common attributes](#toc0_)


- `dtype`: Returns the data type of the elements in the Series.
  ```python
  series.dtype
  ```

- `shape`: Returns a tuple representing the dimensions of the Series. For a Series, the shape is a single-element tuple containing the length of the Series.
  ```python
  series.shape
  ```

- `size`: Returns the number of elements in the Series.
  ```python
  series.size
  ```

- `index`: Returns the index labels of the Series.
  ```python
  series.index
  ```


These attributes provide quick access to important information about the Series, such as the data type, dimensions, number of elements, and index labels.


### <a id='toc2_2_'></a>[Useful methods](#toc0_)


- `head(n=5)`: Returns the first `n` elements of the Series. By default, it returns the first 5 elements.
  ```python
  series.head()
  series.head(3)
  ```

- `tail(n=5)`: Returns the last `n` elements of the Series. By default, it returns the last 5 elements.
  ```python
  series.tail()
  series.tail(3)
  ```

- `unique()`: Returns an array of unique values in the Series.
  ```python
  series.unique()
  ```

- `value_counts()`: Returns a Series containing the count of each unique value, sorted by count in descending order. Missing values are excluded by default.
  ```python
  series.value_counts()
  ```

- `isnull()`: Returns a boolean Series indicating missing (NaN) values.
  ```python
  series.isnull()
  ```

- `notnull()`: Returns a boolean Series indicating non-missing values.
  ```python
  series.notnull()
  ```


These methods provide convenient ways to inspect and summarize the data in a Series. Let's see some examples:


In [7]:
data = [1, 2, 3, 4, 5, 1, 2, 3, np.nan]
series = pd.Series(data)

In [8]:
# Get the data type of the Series
series.dtype

dtype('float64')

In [9]:
# Get the shape of the Series
series.shape

(9,)

In [10]:
# Get the number of elements in the Series
series.size

9

In [11]:
# Get the index labels of the Series
series.index

RangeIndex(start=0, stop=9, step=1)

In [12]:
# Get the first 3 elements of the Series
series.head(3)

0    1.0
1    2.0
2    3.0
dtype: float64

In [13]:
# Get the last 3 elements of the Series
series.tail(3)


6    2.0
7    3.0
8    NaN
dtype: float64

In [14]:
# Get the unique values in the Series
series.unique()

array([ 1.,  2.,  3.,  4.,  5., nan])

In [15]:
# Get the counts of unique values in the Series
series.value_counts()

1.0    2
2.0    2
3.0    2
4.0    1
5.0    1
Name: count, dtype: int64

In [16]:
# Get a boolean Series indicating missing values
series.isnull()

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8     True
dtype: bool

In [17]:
# Get a boolean Series indicating non-missing values
series.notnull()

0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8    False
dtype: bool

These attributes and methods provide a quick and convenient way to explore and understand the data in a Pandas Series. They allow you to access important information about the Series, inspect the first and last elements, find unique values, count occurrences, and handle missing data.


By leveraging these attributes and methods, you can efficiently analyze and manipulate Series objects in your data analysis and preprocessing tasks.
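The missing-value methods combine well with other Series methods. The following is a minimal sketch, assuming the same data as the examples above; the names `n_missing`, `counts`, and `clean` are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 1, 2, 3, np.nan])

# True counts as 1 when summed, so this is the number of missing values
n_missing = series.isnull().sum()

# dropna=False keeps NaN as its own category in the counts
counts = series.value_counts(dropna=False)

# dropna() returns a new Series without the missing entries
clean = series.dropna()

print(n_missing, len(counts), len(clean))
```

Passing `dropna=False` is useful when the share of missing data is itself something you want to report.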

## <a id='toc3_'></a>[Accessing Elements and Slicing Series](#toc0_)

Pandas Series provide various ways to access individual elements and slice subsets of data based on index labels or integer locations. Let's explore how to access elements and slice Series using different methods.


### <a id='toc3_1_'></a>[Accessing elements by index label](#toc0_)


You can access individual elements of a Series using their index labels. This is useful when you have a Series with meaningful index labels and want to retrieve specific values based on those labels.


In [18]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)

In [19]:
# Accessing elements by index label
series['a']

1

In [20]:
series['c']

3

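In the example above, we access the elements with index labels 'a' and 'c' using square bracket notation.

Index labels do not have to be unique. When a label is repeated, accessing it returns a Series with every matching element rather than a single value.

In [None]:
# With duplicate index labels, label-based access returns all matching elements
series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'a', 'a'], dtype=float)
series['a']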


### <a id='toc3_2_'></a>[Accessing elements by integer location](#toc0_)


You can also access elements of a Series using their integer locations. This is similar to accessing elements in a Python list or NumPy array.


In [21]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)

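In [22]:
# Accessing elements by integer location with .iloc
series.iloc[0]

1

In [23]:
series.iloc[3]

4

In this case, we access the elements at integer locations 0 and 3 using the `.iloc` indexer. Plain square brackets (`series[0]`) look up index labels instead; they would return the same values here only because the default index labels are the integers 0 to 4. In recent Pandas versions, relying on square brackets for positional access on a non-integer index is deprecated.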
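The difference between labels and positions becomes visible as soon as the integer labels are not in the default order. Here is a small sketch with a deliberately reversed index (the Series `s` is made up for illustration):

```python
import pandas as pd

# Integer labels deliberately stored in reverse order
s = pd.Series([10, 20, 30], index=[2, 1, 0])

by_label = s.loc[0]      # element whose label is 0
by_position = s.iloc[0]  # first element of the Series

print(by_label, by_position)
```

Here `s.loc[0]` gives 30 and `s.iloc[0]` gives 10. Using `.loc` for labels and `.iloc` for positions keeps the intent explicit.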


### <a id='toc3_3_'></a>[Slicing Series using index labels](#toc0_)


Slicing a Series using index labels allows you to extract a subset of the Series based on a range of index labels.


In [24]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)

In [25]:
# Slicing Series using index labels
series['b':'d']

b    2
c    3
d    4
dtype: int64

In the example above, we slice the Series from index label 'b' to 'd' (inclusive) using the colon notation. The resulting slice includes the elements with index labels 'b', 'c', and 'd'.


### <a id='toc3_4_'></a>[Slicing Series using integer locations](#toc0_)


Slicing a Series using integer locations allows you to extract a subset of the Series based on a range of integer positions.


In [26]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)

# Slicing Series using integer locations with .iloc
series.iloc[1:4]

1    2
2    3
3    4
dtype: int64

In this case, we slice the Series from integer location 1 to 4 (exclusive) using the colon notation. The resulting slice includes the elements at integer locations 1, 2, and 3.


It's important to note that when slicing using integer locations, the end index is exclusive, meaning the element at the end index is not included in the slice.
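The two slicing conventions can be compared side by side. This sketch assumes a Series with string labels and uses the explicit `.loc` and `.iloc` indexers; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label-based slice: the end label 'd' is included
label_slice = s.loc['b':'d']

# Position-based slice: the end position 3 is excluded
position_slice = s.iloc[1:3]

print(list(label_slice.index))     # ['b', 'c', 'd']
print(list(position_slice.index))  # ['b', 'c']
```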


You can also use negative integers to slice from the end of the Series.


In [27]:
series.iloc[-3:]

2    3
3    4
4    5
dtype: int64

Here, we slice the last three elements of the Series using negative integer indices.


Accessing elements and slicing Series in Pandas provides flexibility in retrieving specific values or subsets of data based on index labels or integer locations. These techniques allow you to efficiently extract and work with the desired portions of your data.


By mastering element access and slicing techniques, you can effectively manipulate and analyze Series objects in your data processing tasks.

## <a id='toc4_'></a>[Modifying Series](#toc0_)

Pandas Series are mutable, which means you can modify their elements, add new elements, or remove existing elements. Let's explore different ways to modify Series.


### <a id='toc4_1_'></a>[Updating elements](#toc0_)


You can update the values of existing elements in a Series using index labels or integer locations.


In [28]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)

In [29]:
# Updating elements using index labels
series['b'] = 20
series['d'] = 40

In [30]:
# Updating elements using integer locations
series.iloc[0] = 10
series.iloc[4] = 50


In [31]:
series

a    10
b    20
c     3
d    40
e    50
dtype: int64

In the example above, we update the elements with index labels 'b' and 'd' using square bracket notation and the assignment operator. Similarly, we update the elements at integer locations 0 and 4 using the `.iloc` indexer.
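Several elements can also be updated at once by selecting them with a condition. The sketch below is one possible approach, using `.loc` with a boolean mask on a Series named `s` (an illustrative name):

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})

# Set every value greater than 3 to 0 in a single assignment
s.loc[s > 3] = 0

print(list(s))  # [1, 2, 3, 0, 0]
```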


### <a id='toc4_2_'></a>[Adding elements](#toc0_)


You can add new elements to a Series by assigning values to new index labels.


In [32]:
data = {'a': 1, 'b': 2, 'c': 3}
series = pd.Series(data)

In [33]:
# Adding elements
series['d'] = 4
series['e'] = 5

In [34]:
series

a    1
b    2
c    3
d    4
e    5
dtype: int64

In this case, we add new elements with index labels 'd' and 'e' to the Series by assigning values to those labels.
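When many elements need to be added, assigning them one by one becomes tedious. One alternative, sketched here with illustrative names, is to build a second Series and join the two with `pd.concat()`, which returns a new Series and leaves the inputs unchanged.

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})
extra = pd.Series({'d': 4, 'e': 5})

# Concatenate the two Series into a new one
combined = pd.concat([s, extra])

print(list(combined.index))  # ['a', 'b', 'c', 'd', 'e']
```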


### <a id='toc4_3_'></a>[Removing elements](#toc0_)


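To remove elements from a Series, you can use the `drop()` method. It returns a new Series with the specified index labels removed. `drop()` works with labels only, so to remove elements by integer location you first look up the corresponding labels through `series.index`.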


In [35]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)

In [36]:
# Removing elements using index labels
new_series = series.drop(['b', 'd'])
new_series

a    1
c    3
e    5
dtype: int64

In [37]:
# Removing elements using integer locations
new_series = series.drop(series.index[[1, 3]])
new_series

a    1
c    3
e    5
dtype: int64

In the first example, we remove the elements with index labels 'b' and 'd' using the `drop()` method and passing the labels as a list. The resulting `new_series` contains the remaining elements.


In the second example, we remove the elements at integer locations 1 and 3. Because `drop()` accepts labels rather than positions, `series.index[[1, 3]]` first converts the positions into the labels 'b' and 'd', which are then passed to `drop()`. Again, the resulting `new_series` contains the remaining elements.


It's important to note that the `drop()` method does not modify the original Series in place. Instead, it returns a new Series with the specified elements removed. If you want to modify the original Series, you can assign the result back to the original variable.


In [38]:
series = series.drop(['b', 'd'])
series

a    1
c    3
e    5
dtype: int64
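By default, `drop()` raises a `KeyError` when a label does not exist. The following sketch shows the `errors='ignore'` option, which skips missing labels instead; the Series `s` and the label 'z' are made up for illustration.

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})

# 'z' does not exist; errors='ignore' skips it instead of raising KeyError
trimmed = s.drop(['b', 'z'], errors='ignore')

# Without errors='ignore', a missing label raises KeyError
try:
    s.drop('z')
    raised = False
except KeyError:
    raised = True

print(list(trimmed.index), raised)
```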

Modifying Series in Pandas is straightforward and flexible. You can easily update existing elements, add new elements, or remove elements based on your requirements.


By leveraging these modification techniques, you can dynamically manipulate Series objects to suit your data processing and analysis needs.

## <a id='toc5_'></a>[Sorting Series](#toc0_)

Sorting is a fundamental operation in data analysis, and Pandas provides convenient methods to sort Series based on either the index labels or the values. Let's explore how to sort Series in both ways.


### <a id='toc5_1_'></a>[Sorting by index](#toc0_)


You can sort a Series based on its index labels using the `sort_index()` method. By default, it sorts the index labels in ascending order.


In [39]:
data = {'c': 3, 'a': 1, 'e': 5, 'b': 2, 'd': 4}
series = pd.Series(data)

In [40]:
# Sorting by index in ascending order
sorted_series = series.sort_index()

In [41]:
sorted_series

a    1
b    2
c    3
d    4
e    5
dtype: int64

In the example above, we create a Series with unsorted index labels. By calling the `sort_index()` method, we obtain a new Series `sorted_series` with the index labels sorted in ascending order.


You can also sort the index labels in descending order by passing the parameter `ascending=False` to the `sort_index()` method.


In [42]:
# Sorting by index in descending order
sorted_series = series.sort_index(ascending=False)
sorted_series

e    5
d    4
c    3
b    2
a    1
dtype: int64

In this case, the resulting `sorted_series` has the index labels sorted in descending order.


### <a id='toc5_2_'></a>[Sorting by values](#toc0_)


To sort a Series based on its values, you can use the `sort_values()` method. By default, it sorts the values in ascending order.


In [43]:
data = {'a': 3, 'b': 1, 'c': 5, 'd': 2, 'e': 4}
series = pd.Series(data)

In [44]:
# Sorting by values in ascending order
series.sort_values()

b    1
d    2
a    3
e    4
c    5
dtype: int64

In this example, we create a Series with unsorted values. Calling the `sort_values()` method returns a new Series with the values sorted in ascending order.


Similarly, you can sort the values in descending order by passing the parameter `ascending=False` to the `sort_values()` method.


In [45]:
# Sorting by values in descending order
series.sort_values(ascending=False)

c    5
e    4
a    3
d    2
b    1
dtype: int64

In this case, the returned Series has the values sorted in descending order.


It's important to note that sorting a Series by values does not modify the original Series in place. Instead, it returns a new Series with the values sorted in the specified order. If you want to modify the original Series, you can assign the result back to the original variable.


In [46]:
# Assign the sorted result back to keep it
series = series.sort_values()
series

b    1
d    2
a    3
e    4
c    5
dtype: int64
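Missing values need some care when sorting. As a sketch (the Series `s` below is illustrative), `sort_values()` places NaN at the end by default, in both ascending and descending order, and the `na_position` parameter moves it to the front.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, np.nan, 1], index=['x', 'y', 'z'])

default_order = s.sort_values()                 # NaN placed last
nan_first = s.sort_values(na_position='first')  # NaN placed first
descending = s.sort_values(ascending=False)     # NaN still placed last

print(list(default_order.index))  # ['z', 'x', 'y']
print(list(nan_first.index))      # ['y', 'z', 'x']
print(list(descending.index))     # ['x', 'z', 'y']
```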

Sorting Series in Pandas is a straightforward process using the `sort_index()` and `sort_values()` methods. These methods allow you to sort Series based on either the index labels or the values, providing flexibility in organizing your data.


By utilizing these sorting techniques, you can easily arrange your Series in a desired order, facilitating data analysis and visualization tasks.

## <a id='toc6_'></a>[Basic Operations on Series](#toc0_)

Pandas Series support various basic operations, including arithmetic operations, statistical operations, and comparisons. These operations allow you to perform calculations, derive insights, and filter data based on specific conditions. Let's explore each category of operations in detail.


### <a id='toc6_1_'></a>[Arithmetic operations](#toc0_)


Pandas Series support arithmetic operations such as addition, subtraction, multiplication, and division. These operations can be performed between two Series or between a Series and a scalar value.


#### <a id='toc6_1_1_'></a>[Addition, subtraction, multiplication, division](#toc0_)


In [47]:
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([10, 20, 30, 40, 50])

In [48]:
# Addition
series1 + series2

0    11
1    22
2    33
3    44
4    55
dtype: int64

In [49]:
# Subtraction
series1 - series2

0    -9
1   -18
2   -27
3   -36
4   -45
dtype: int64

In [50]:
# Multiplication
series1 * series2

0     10
1     40
2     90
3    160
4    250
dtype: int64

In [51]:
# Division
series1 / series2

0    0.1
1    0.1
2    0.1
3    0.1
4    0.1
dtype: float64

In the above examples, the arithmetic operations are performed element-wise between `series1` and `series2`. Each expression returns a new Series containing the element-wise results of the corresponding operation. Note that division always produces a float Series, even when both operands hold integers.


#### <a id='toc6_1_2_'></a>[Scalar operations](#toc0_)


You can also perform arithmetic operations between a Series and a scalar value.


In [52]:
series = pd.Series([1, 2, 3, 4, 5])

In [53]:
# Addition with a scalar
series + 10

0    11
1    12
2    13
3    14
4    15
dtype: int64

In [54]:
# Multiplication with a scalar
series * 2

0     2
1     4
2     6
3     8
4    10
dtype: int64

In these examples, the scalar value is applied to each element of the Series, resulting in a new Series with the element-wise results.


#### <a id='toc6_1_3_'></a>[Series operations](#toc0_)


When performing arithmetic operations between two Series, Pandas aligns the data based on the index labels. If the indexes don't match, Pandas uses the union of the indexes and fills missing values with NaN (Not a Number).



In [55]:
series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

series1 + series2

a     NaN
b    12.0
c    23.0
d     NaN
dtype: float64

In this example, the addition is performed on the aligned indexes. The resulting Series contains the sum where a label is present in both Series ('b' and 'c'), and NaN where a label appears in only one of them ('a' and 'd'). Because NaN is a floating-point value, the result has dtype `float64`.
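If NaN is not the desired result for unmatched labels, the method form of the operator accepts a `fill_value`. The sketch below reuses the two Series from the example and treats a missing entry as 0 before adding; the name `total` is illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels present in only one Series are treated as 0 instead of producing NaN
total = series1.add(series2, fill_value=0)

print(total)
```

With this option, 'a' keeps its value of 1 and 'd' keeps its value of 30, while 'b' and 'c' are summed as before. The methods `sub()`, `mul()`, and `div()` accept the same parameter.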


### <a id='toc6_2_'></a>[Statistical operations](#toc0_)


Pandas Series provide various statistical operations that allow you to calculate descriptive statistics and summarize data.


#### <a id='toc6_2_1_'></a>[Descriptive statistics](#toc0_)


In [56]:
series = pd.Series([1, 2, 3, 4, 5])

In [57]:
# Count
series.count()

5

In [58]:
# Mean
series.mean()

3.0

In [59]:
# Standard deviation
series.std()

1.5811388300841898

In [60]:
# Minimum value
series.min()

1

In [61]:
# Maximum value
series.max()

5

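These operations calculate the count, mean, standard deviation, minimum value, and maximum value of the Series, respectively. Note that `count()` counts only non-missing values and that `std()` computes the sample standard deviation (`ddof=1`) by default.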
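The standard deviation above is the sample version, which divides by n - 1. As a brief sketch, passing `ddof=0` gives the population version, and `describe()` collects the most common statistics in a single call (the variable names are illustrative):

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

sample_std = series.std()            # divides by n - 1 (ddof=1, the default)
population_std = series.std(ddof=0)  # divides by n

# describe() returns count, mean, std, min, quartiles, and max as a Series
summary = series.describe()

print(sample_std, population_std)
print(summary)
```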


#### <a id='toc6_2_2_'></a>[Summarizing data](#toc0_)


In [62]:
series = pd.Series([1, 2, 3, 4, 5])

In [63]:
# Sum
series.sum()

15

In [64]:
# Product
series.prod()

120

In [65]:
# Cumulative sum
series.cumsum()

0     1
1     3
2     6
3    10
4    15
dtype: int64

In [66]:
# Cumulative product
series.cumprod()

0      1
1      2
2      6
3     24
4    120
dtype: int64

These operations calculate the sum, product, cumulative sum, and cumulative product of the Series, respectively. The cumulative operations return a new Series with the cumulative results at each index.


### <a id='toc6_3_'></a>[Comparing Series](#toc0_)


Pandas Series support element-wise comparisons and filtering based on conditions.


#### <a id='toc6_3_1_'></a>[Element-wise comparison](#toc0_)


In [67]:
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([1, 2, 0, 4, 0])

In [68]:
# Element-wise equality comparison
series1 == series2

0     True
1     True
2    False
3     True
4    False
dtype: bool

In [69]:
# Element-wise greater than comparison
series1 > series2

0    False
1    False
2     True
3    False
4     True
dtype: bool

These operations perform element-wise comparisons between `series1` and `series2` and return a boolean Series indicating the comparison result for each element.
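Unlike arithmetic operators, comparison operators do not align two Series automatically. The sketch below, using two made-up Series with the same labels in a different order, shows that `==` raises a `ValueError` in that situation, while the `eq()` method aligns on labels before comparing.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['c', 'b', 'a'])  # same labels, different order

# The == operator refuses to compare Series whose indexes differ
try:
    s1 == s2
    operator_raised = False
except ValueError:
    operator_raised = True

# eq() aligns on labels first: only label 'b' holds the same value in both
aligned_equal = s1.eq(s2)

print(operator_raised)
print(aligned_equal)
```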


#### <a id='toc6_3_2_'></a>[Filtering based on conditions](#toc0_)


In [70]:
series = pd.Series([1, 2, 3, 4, 5])

# Filtering values greater than 3
series[series > 3]

3    4
4    5
dtype: int64

In this example, we filter the Series based on the condition `series > 3`. The expression `series > 3` produces a boolean Series, and indexing with it returns only the elements for which the condition is `True`.
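Conditions can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch with illustrative variable names:

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

between = series[(series > 1) & (series < 5)]     # values strictly between 1 and 5
extremes = series[(series == 1) | (series == 5)]  # values equal to 1 or 5
not_three = series[~(series == 3)]                # every value except 3

print(list(between))    # [2, 3, 4]
print(list(extremes))   # [1, 5]
print(list(not_three))  # [1, 2, 4, 5]
```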


These are just a few examples of the basic operations you can perform on Pandas Series. Pandas provides a wide range of functions and methods for data manipulation, analysis, and computation.


By leveraging these basic operations, you can efficiently perform calculations, derive statistical insights, and filter data based on specific criteria, empowering you to analyze and manipulate Series data effectively.