<a href="https://colab.research.google.com/github/krauseannelize/nb-py-ms-exercises/blob/sprint03/notebooks/s03_pandas_foundation/34_exercises_pandas_series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 34 | Exercises - Introduction to `Pandas` Series

## What is `Pandas`

`Pandas` is an open-source **Python library** that provides high-performance, user-friendly data structures designed to make working with structured data intuitive and efficient.

It is built on top of another fundamental Python library called `NumPy`, which offers powerful array objects and mathematical functions. `Pandas` takes `NumPy` arrays and adds the concept of **labeled data**, making it easier to work with datasets that have meaningful row and column names.

## Why `Pandas`?

`Pandas` helps you spend less time on:

- Data cleaning and preparation
- Data exploration and analysis
- Data manipulation
- Working with different data formats

## Installing `Pandas`

`Pandas` is preinstalled in Google Colab and many Python distributions. However, you can use Python's package installer `pip` to install it if needed:

```python
!pip install pandas
```

## Importing the `Pandas` & `NumPy` libraries

In [1]:
import pandas as pd
import numpy as np

To verify that `Pandas` is installed, or to check that you have the appropriate version, print its version as follows:

In [3]:
print(pd.__version__)

2.2.2


## `Pandas` Series

A `Pandas` **Series** is like a 1D array with labels. As shown below, a **Series** consists of:

- an **index** (left column)
- **values** (right column)

In [5]:
example_series = pd.Series(data=[100, 200, 300, 400, 500], index=['A', 'B', 'C', 'D', 'E'])
print(example_series)

A    100
B    200
C    300
D    400
E    500
dtype: int64


**Series** offer some _key advantages_ over lists:

- **label-based access** is more readable and less error-prone
- **data alignment** based on index labels when performing operations on multiple **Series**
- **integration** with DataFrames
- **optimized** for Data Analysis
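
The data alignment point can be illustrated with a minimal sketch. The two Series and their values below are made up for illustration; the point is only that `Pandas` matches values by index label when combining Series, whereas `+` on plain lists concatenates them.

```python
import pandas as pd

# Two Series with the same labels in a different order (illustrative values)
stock_monday = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
stock_tuesday = pd.Series([3, 2, 1], index=['C', 'B', 'A'])

# Values are matched by label, not by position
total_series = stock_monday + stock_tuesday
print(total_series)

# The same operator on plain lists concatenates instead of adding
total_list = [10, 20, 30] + [3, 2, 1]
print(total_list)
```

Here label `A` ends up as 10 + 1 and label `C` as 30 + 3, even though the values sit at opposite positions in the two Series.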

## Creating `Pandas` Series

There are several ways to create a **Series**:

- from Python Lists
- from `NumPy` Arrays
- from Python Dictionaries

In [6]:
# convert a List to a Series - default integer index assigned
name_list = ['Adam', 'Bob', 'Charlie', 'David', 'Eve']
name_series = pd.Series(name_list)
print(name_series)

0       Adam
1        Bob
2    Charlie
3      David
4        Eve
dtype: object


In [7]:
# convert a NumPy Array to a Series
age_array = np.array([25, 30, 35, 40, 45])
age_series = pd.Series(age_array)
print(age_series)

0    25
1    30
2    35
3    40
4    45
dtype: int64


In [8]:
# convert a Dictionary to a Series
gender_dict = {'Adam': 'Male', 'Bob': 'Male', 'Charlie': 'Female', 'David': 'Male', 'Eve': 'Female'}
gender_series = pd.Series(gender_dict)
print(gender_series)

Adam         Male
Bob          Male
Charlie    Female
David        Male
Eve        Female
dtype: object
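
When a dictionary is combined with an explicit `index`, the Series is built by looking up each index label among the dictionary keys. The sketch below assumes a shortened version of the dictionary above, and `'Zoe'` is a hypothetical label added only to show what happens when a key is missing.

```python
import pandas as pd

gender_dict = {'Adam': 'Male', 'Bob': 'Male', 'Charlie': 'Female'}

# Only the requested labels are kept, in the requested order;
# 'Zoe' (hypothetical) is not a key in the dictionary, so its value is NaN
selected_series = pd.Series(gender_dict, index=['Charlie', 'Adam', 'Zoe'])
print(selected_series)
```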


## Working with the Series Index

When you don't explicitly provide an index when creating a **Series**, `Pandas` automatically assigns a default integer index starting from zero. However, **custom indices** can be very powerful, as they add meaningful labels to your data, improving both readability and ease of use. Different data types can be used for indices:

- **Strings**: most common for descriptive labels
- **Dates**: essential for time series data
- **Numbers**: can be useful in certain situations

In [15]:
# creating a Series using dates as index
date_list = ['2025-01-31', '2025-02-28', '2025-03-31']
q1_sales = [10700, 20500, 33600]
q1_series = pd.Series(data=q1_sales, index=date_list)
print(q1_series)

2025-01-31    10700
2025-02-28    20500
2025-03-31    33600
dtype: int64
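
Note that the index above holds date *strings*. As a hedged sketch, the strings can be parsed with `pd.to_datetime` so that the index becomes a `DatetimeIndex`, which enables date-aware lookups. The variable name `q1_dated` is introduced here for illustration only.

```python
import pandas as pd

date_list = ['2025-01-31', '2025-02-28', '2025-03-31']
q1_sales = [10700, 20500, 33600]

# pd.to_datetime parses the strings into a DatetimeIndex
q1_dated = pd.Series(data=q1_sales, index=pd.to_datetime(date_list))
print(q1_dated.index)

# With real dates, a partial date string selects every entry in that month
print(q1_dated.loc['2025-02'])
```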


In [16]:
# creating a Series with custom string labels as index
sku_list = ['NIK-RN4-10W', 'ADI-PQ7-12W', 'SKE-BB3-14W']
product_inventory = [100, 200, 300]
inventory_series = pd.Series(data=product_inventory, index=sku_list)
print(inventory_series)

NIK-RN4-10W    100
ADI-PQ7-12W    200
SKE-BB3-14W    300
dtype: int64


## Accessing and Manipulating Series Data

`Pandas` Series offers flexible ways to work with data by customizing and interacting with its index. You can **rename** index labels to improve clarity, set meaningful indices for easier lookup, and reset them when needed. Series also supports both label-based and integer-based access, making data manipulation intuitive.

In [18]:
# creating a Series with custom string labels as index
fruit_series = pd.Series(data=[10, 20, 30], index=['Apple', 'Banana', 'Cherry'])
print(f"The original Series:\n{fruit_series}")
# rename index labels using a mapping of old label -> new label
new_index = {'Apple': 'APP-RED-SM', 'Banana': 'BAN-YEL-MD', 'Cherry': 'CHR-RED-LG'}
renamed_series = fruit_series.rename(index=new_index)
print(f"The renamed Series:\n{renamed_series}")

The original Series:
Apple     10
Banana    20
Cherry    30
dtype: int64
The renamed Series:
APP-RED-SM    10
BAN-YEL-MD    20
CHR-RED-LG    30
dtype: int64
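
Besides a dictionary, `rename` also accepts a function that is applied to every index label. The following is a small sketch using Python's built-in `str.upper`; the name `upper_series` is chosen for illustration.

```python
import pandas as pd

fruit_series = pd.Series(data=[10, 20, 30], index=['Apple', 'Banana', 'Cherry'])

# The function is called once per index label
upper_series = fruit_series.rename(index=str.upper)
print(upper_series)

# rename returns a new Series; the original labels are unchanged
print(fruit_series.index.tolist())
```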


In [23]:
# Creating a dictionary containing lists of fruit names and their sales
fruit1 = {'fruit': ['Apple', 'Banana', 'Orange'], 'sales': [150, 120, 100]}

# When converted directly to a Series, the dictionary keys ('fruit' and 'sales')
# become index labels, and values (lists) become the data
series_direct = pd.Series(fruit1)
print(f"Direct conversion without index assignment:\n{series_direct}")

# Creating a Series using 'sales' list as data and 'fruit' list as custom index labels
fruit2 = {'fruit': ['Apple', 'Banana', 'Orange'], 'sales': [150, 120, 100]}
series_indexed = pd.Series(data=fruit2['sales'], index=fruit2['fruit'])
print(f"Conversion with index assignment:\n{series_indexed}")


Direct conversion without index assignment:
fruit    [Apple, Banana, Orange]
sales            [150, 120, 100]
dtype: object
Conversion with index assignment:
Apple     150
Banana    120
Orange    100
dtype: int64


In [25]:
# Creating a dictionary containing lists of fruit names and their sales
fruit3 = {'fruit': ['Apple', 'Banana', 'Orange'], 'sales':[150,120,100]}

# Creating a Series using 'sales' list as data and 'fruit' list as custom index labels
fruit_series = pd.Series(data=fruit3['sales'], index=fruit3['fruit'])
print(f"Original Series:\n{fruit_series}")

# Reset index to default integer index; reset_index() returns a DataFrame
# in which the old index labels become a regular column
reset_series = fruit_series.reset_index()
print(f"Series with reset index:\n{reset_series}")

Original Series:
Apple     150
Banana    120
Orange    100
dtype: int64
Series with reset index:
    index    0
0   Apple  150
1  Banana  120
2  Orange  100
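
The output above is a DataFrame with the generic column names `index` and `0`. As a sketch of two common variations, `drop=True` discards the old labels and keeps the result a Series, while `name=` gives the value column a readable header. The column name `'sales'` is an illustrative choice.

```python
import pandas as pd

fruit_series = pd.Series(data=[150, 120, 100], index=['Apple', 'Banana', 'Orange'])

# drop=True discards the old labels and returns a Series
dropped = fruit_series.reset_index(drop=True)
print(type(dropped).__name__)

# name= labels the value column of the resulting DataFrame
named = fruit_series.reset_index(name='sales')
print(named.columns.tolist())
```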


⚠️ Accessing **Series** elements using integer keys as shown below is being deprecated. In future versions of `Pandas`, integer keys will be treated as labels, not positions. To avoid confusion, use `.iloc` for position-based access and `.loc` for label-based access.

In [28]:
# retrieve values using index number
print(series_indexed[1])

# retrieve values using index label
print(series_indexed["Banana"])


120
120
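
The recommended alternative from the warning can be sketched as follows, rebuilding the same Series so the snippet runs on its own. `.iloc` always interprets the key as a position and `.loc` always interprets it as a label, so neither triggers the deprecation warning.

```python
import pandas as pd

series_indexed = pd.Series(data=[150, 120, 100], index=['Apple', 'Banana', 'Orange'])

# position-based access: the element at position 1 (the second element)
print(series_indexed.iloc[1])

# label-based access: the element labeled "Banana"
print(series_indexed.loc['Banana'])
```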



In [29]:
# accessing an array of the Series values using values
print(series_indexed.values)

# accessing an array of the Series index labels using index
print(series_indexed.index)

[150 120 100]
Index(['Apple', 'Banana', 'Orange'], dtype='object')


## What is `NumPy`?

`NumPy` is a fundamental Python library, which offers powerful array objects and mathematical functions.

- **Arrays**: Similar to Python lists, but optimized for fast numerical operations and memory efficiency.
- **Math Made Easy**: Includes built-in functions for arithmetic, statistics, trigonometry, and linear algebra.
- **Multidimensional**: Supports arrays of any dimension—1D, 2D (grids), 3D (cubes), and beyond.
- **Foundation**: Serves as the backbone for many other Python libraries like `Pandas`, `SciPy`, and `Scikit-learn`.
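
To make the difference between arrays and lists concrete, here is a minimal sketch with made-up values: the `*` operator repeats a Python list, whereas on a `NumPy` array it multiplies every element.

```python
import numpy as np

prices_list = [1, 2, 3]
prices_array = np.array(prices_list)

doubled_list = prices_list * 2    # repeats the list: [1, 2, 3, 1, 2, 3]
doubled_array = prices_array * 2  # multiplies each element: [2 4 6]

print(doubled_list)
print(doubled_array)
```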

## Why `NumPy`?

`NumPy` is a fast and flexible Python library for working with arrays, matrices, and numerical functions. Its key strengths include:

- **Efficiency**: Optimized for high-speed numerical operations, outperforming Python lists.
- **Ease of Use**: Simplifies tasks like sorting, reshaping, and math computations.
- **Flexibility**: Easily handles multidimensional data structures.
- **Integration**: Works smoothly with libraries like `Pandas`, `Matplotlib`, and `Scikit-learn`.

## Installing `NumPy` library

`NumPy` is preinstalled in Google Colab and many Python distributions. However, you can use Python's package installer `pip` to install it if needed:

```python
!pip install numpy
```

## Importing the `NumPy` and `Pandas` libraries

In [30]:
import numpy as np
import pandas as pd

## Creating a `NumPy` Array

In [31]:
name_array = np.array(['Adam', 'Bob', 'Charlie', 'David', 'Eve'])
print(f"Our name array:\n{name_array}")

Our name array:
['Adam' 'Bob' 'Charlie' 'David' 'Eve']


## `NumPy` Arithmetic Operations

Arithmetic operations are **element-wise** by default.

In [None]:
matrix1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix2 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(f"The first array:\n{matrix1}")
print(f"The second array:\n{matrix2}")
print(f"Addition was performed element-wise:\n{matrix1 + matrix2}")

The first array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
The second array:
[[10 20 30]
 [40 50 60]
 [70 80 90]]
Addition was performed element-wise:
[[11 22 33]
 [44 55 66]
 [77 88 99]]
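
Because element-wise is the default, `*` between two 2D arrays is *not* matrix multiplication. The sketch below, using small illustrative arrays, contrasts `*` with the `@` operator, which performs the matrix product.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

elementwise = a * b  # multiplies matching positions
matrix_product = a @ b  # rows of a times columns of b

print(f"Element-wise product:\n{elementwise}")
print(f"Matrix product:\n{matrix_product}")
```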


`NumPy` provides **multiple aggregation** functions like:

- `sum`
- `mean`
- `min`
- `max`
- `std`
- `var`

They will be applied to the entire array by default, but using `axis` can change the direction of the aggregation:

- **axis=0**: columns (vertical/downward)
- **axis=1**: rows (horizontal/across)


In [None]:
matrix = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(f"The array is:\n{matrix}")
print(f"The sum of each column is:\n{matrix.sum(axis=0)}")
print(f"The mean of each row is:\n{matrix.mean(axis=1)}")
print(f"The smallest number in the array is:{matrix.min()}")
print(f"The largest number in the array is:{matrix.max()}")

The array is:
[[10 20 30]
 [40 50 60]
 [70 80 90]]
The sum of each column is:
[120 150 180]
The mean of each row is:
[20. 50. 80.]
The smallest number in the array is:10
The largest number in the array is:90
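
The example above does not use `std` and `var`; they accept `axis` in the same way. A short sketch on the same array follows. By default `NumPy` computes the population variance (dividing by N); passing `ddof=1` gives the sample variance (dividing by N - 1).

```python
import numpy as np

matrix = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

col_var = matrix.var(axis=0)                 # variance of each column
col_std = matrix.std(axis=0)                 # standard deviation of each column
row_std = matrix.std(axis=1)                 # standard deviation of each row
col_var_sample = matrix.var(axis=0, ddof=1)  # sample variance of each column

print(f"Variance of each column:\n{col_var}")
print(f"Standard deviation of each column:\n{col_std}")
print(f"Standard deviation of each row:\n{row_std}")
print(f"Sample variance of each column:\n{col_var_sample}")
```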


## Access and Modify `NumPy` Array Elements

### 1D Array Slicing

You can access subarrays using the syntax: `array[start:stop:step]`

- `start`: index to begin slicing (inclusive)
- `stop`: index to end slicing (exclusive)
- `step`: interval between elements (optional)
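
The three parts of the slice syntax can be sketched on a small array as follows:

```python
import numpy as np

array_1d = np.array([10, 20, 30, 40, 50])

middle = array_1d[1:4]       # start=1 (inclusive), stop=4 (exclusive)
every_second = array_1d[::2] # step=2 over the whole array
reversed_arr = array_1d[::-1]  # a negative step walks backwards

print(middle)        # [20 30 40]
print(every_second)  # [10 30 50]
print(reversed_arr)  # [50 40 30 20 10]
```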

### 2D Array Slicing

Slicing is done by specifying slices for rows and columns: `array[row_start:row_stop, col_start:col_stop]`

In [32]:
array_1d = np.array([10, 20, 30, 40, 50])

# accessing data in a 1D array
print(f"The array is:\n{array_1d}")
print(f"The third element is:\n{array_1d[2]}")
print(f"The first 3 elements are:\n{array_1d[:3]}")

# modifying data in a 1D array
array_1d[2] = 35
print(f"The modified array is:\n{array_1d}")

The array is:
[10 20 30 40 50]
The third element is:
30
The first 3 elements are:
[10 20 30]
The modified array is:
[10 20 35 40 50]


In [33]:
array_2d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# accessing data in a 2D array
print(f"The array is:\n{array_2d}")
print(f"The first row is:\n{array_2d[0]}")
print(f"The second element in the second row is:\n{array_2d[1:2, 1:2]}")

# modifying data in a 2D array
array_2d[1, 1] = 55
print(f"The modified array is:\n{array_2d}")

The array is:
[[10 20 30]
 [40 50 60]
 [70 80 90]]
The first row is:
[10 20 30]
The second element in the second row is:
[[50]]
The modified array is:
[[10 20 30]
 [40 55 60]
 [70 80 90]]
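
Note that the slice `array_2d[1:2, 1:2]` above returns a 1×1 array (`[[50]]`), while integer indexing with `array_2d[1, 1]` returns the single value. The sketch below shows the difference, and also that a slice is a view onto the original data, so modifying it changes the original array. The names `element`, `sub_array`, and `top_left` are chosen for illustration.

```python
import numpy as np

array_2d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

element = array_2d[1, 1]        # a single value
sub_array = array_2d[1:2, 1:2]  # a 2D array with one row and one column
print(element, sub_array.shape)

# A slice is a view: writing to it also changes array_2d
top_left = array_2d[:2, :2]
top_left[0, 0] = 99
print(array_2d[0, 0])
```

If an independent copy is needed, calling `.copy()` on the slice avoids changing the original.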


## Exercise 1

Create a `Pandas` **Series** using the `parks` list for the values and the `countries` list for the index. Print the Series.

```python
parks = [
    'Yellowstone', 'Banff', 'Kruger',
    'Great Barrier Reef', 'Serengeti',
    'Lake Baikal',
    ]

countries = [
    'USA', 'Canada', 'South Africa',
    'Australia', 'Tanzania','Russia',
    ]
```

In [9]:
parks = [
    'Yellowstone', 'Banff', 'Kruger',
    'Great Barrier Reef', 'Serengeti',
    'Lake Baikal',
    ]

countries = [
    'USA', 'Canada', 'South Africa',
    'Australia', 'Tanzania','Russia',
    ]

parks_series = pd.Series(data=parks, index=countries)
print(parks_series)

USA                    Yellowstone
Canada                       Banff
South Africa                Kruger
Australia       Great Barrier Reef
Tanzania                 Serengeti
Russia                 Lake Baikal
dtype: object


## Exercise 2

Create a **Series** from a dictionary.

```python
data_parks = {
    "USA": "Yellowstone",
    "Canada": "Banff",
    "South Africa": "Kruger",
    "Australia": "Great Barrier Reef",
    "Tanzania": "Serengeti",
    "Russia": "Lake Baikal"
}
```

In [10]:
data_parks = {
    "USA": "Yellowstone",
    "Canada": "Banff",
    "South Africa": "Kruger",
    "Australia": "Great Barrier Reef",
    "Tanzania": "Serengeti",
    "Russia": "Lake Baikal"
}

parks_series = pd.Series(data_parks)
print(parks_series)

USA                    Yellowstone
Canada                       Banff
South Africa                Kruger
Australia       Great Barrier Reef
Tanzania                 Serengeti
Russia                 Lake Baikal
dtype: object


## Exercise 3

We will now work with the **five highest mountains in the Americas**. We provide you with a Python dictionary containing the mountain names and their heights to create a `Pandas` Series.

- The **keys** in the dictionary represent the mountain names.
- The **values** represent the heights in meters.
- Name the Series `Highest Mountains in Americas`.
- Print the following:
  - All the **values** of the Series.
  - All the **indices** of the Series.

```python
data = {
    "Aconcagua": 6960,
    "Ojos del Salado": 6893,
    "Monte Pissis": 6793,
    "Huascarán": 6768,
    "Cerro Bonete": 6759
}
```

In [13]:
data = {
    "Aconcagua": 6960,
    "Ojos del Salado": 6893,
    "Monte Pissis": 6793,
    "Huascarán": 6768,
    "Cerro Bonete": 6759
}

highest_mountains_series = pd.Series(data, name='Highest Mountains in Americas')
print(highest_mountains_series.values)
print(highest_mountains_series.index)

[6960 6893 6793 6768 6759]
Index(['Aconcagua', 'Ojos del Salado', 'Monte Pissis', 'Huascarán',
       'Cerro Bonete'],
      dtype='object')
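
The printed values and index do not show whether the name was set. As a small sketch, the `name` attribute can be checked directly; it also appears in the footer when the whole Series is printed.

```python
import pandas as pd

data = {
    "Aconcagua": 6960,
    "Ojos del Salado": 6893,
    "Monte Pissis": 6793,
    "Huascarán": 6768,
    "Cerro Bonete": 6759
}

highest_mountains_series = pd.Series(data, name='Highest Mountains in Americas')

# The name is stored on the Series itself
print(highest_mountains_series.name)

# Printing the whole Series shows the name in the footer line
print(highest_mountains_series)
```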


## Exercise 4

The visitor counts in the `park_data` array need to be updated. Increase the visitor count for each park by 10%.

- Access the visitor count column from the `park_data` array (remember, this is the second column).
- Multiply the visitor count column by 1.10 to increase the values by 10%.
- Print the updated `park_data` array.

```python
park_data = np.array([
    [315, 1000],  # Elevation and visitors for park 1
    [446, 1200],  # Park 2
    [333, 900],   # Park 3
    [392, 800],   # Park 4
    [289, 1100]   # Park 5
], dtype='float64')
```

In [34]:
park_data = np.array([
    [315, 1000],  # Elevation and visitors for park 1
    [446, 1200],  # Park 2
    [333, 900],   # Park 3
    [392, 800],   # Park 4
    [289, 1100]   # Park 5
], dtype='float64')

print(f"Park visitor count before update:\n{park_data}")
park_data[:, 1] *= 1.10
print(f"Park visitor count after update:\n{park_data}")

Park visitor count before update:
[[ 315. 1000.]
 [ 446. 1200.]
 [ 333.  900.]
 [ 392.  800.]
 [ 289. 1100.]]
Park visitor count after update:
[[ 315. 1100.]
 [ 446. 1320.]
 [ 333.  990.]
 [ 392.  880.]
 [ 289. 1210.]]
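
The `dtype='float64'` in the exercise data matters for the in-place update. As a hedged sketch with a shortened, illustrative array: if the array holds integers, `*= 1.10` fails because `NumPy` refuses to write the float result back into an integer array. Converting with `astype` first avoids this.

```python
import numpy as np

visitors_int = np.array([1000, 1200, 900])  # integer dtype is inferred

failed = False
try:
    visitors_int *= 1.10  # float result cannot be stored in an integer array
except TypeError as err:
    failed = True
    print(f"In-place update failed: {type(err).__name__}")

# Convert to float first, then update in place
visitors_float = visitors_int.astype('float64')
visitors_float *= 1.10
print(visitors_float)
```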


## Exercise 5

Due to new surveying data, the elevations of all parks need to be adjusted. Increase the elevation of each park in the `park_elevations` array by 5 meters.

- Add 5 to each element in the `park_elevations` array.
- Print the updated `park_elevations` array.

```python
park_elevations = np.array([315, 446, 333, 392, 289])  # Elevations in meters
```

In [36]:
park_elevations = np.array([315, 446, 333, 392, 289])  # Elevations in meters
print(f"Park elevations before update:\n{park_elevations}")
park_elevations += 5
print(f"Park elevations after update:\n{park_elevations}")

Park elevations before update:
[315 446 333 392 289]
Park elevations after update:
[320 451 338 397 294]
