In the first seminar, we will review some of the most important concepts of the Python programming language, along with an introduction to the NumPy and Pandas packages, which are widely used for mathematical operations and data analysis.

## Python Concepts

#### 1. Variables and datatypes

Variables store values. Python is dynamically typed: a variable's type is determined at runtime by the value assigned to it, so no type declaration is needed. Python's data types include integers, floats, strings, booleans, lists, tuples, sets, and dictionaries. Python also allows explicit type conversion, where a value of one data type is converted to another, such as converting a string to an integer with int().

Python supports several built-in data types:

- Integer (int): whole numbers, such as 10, -5, and 1000;
- Float (float): decimal numbers, such as 3.14, -0.01, and 2.5;
- String (str): text values enclosed in quotes, such as "Hello" and 'Python';
- Boolean (bool): represents True or False;
- Complex Numbers (complex): numbers with a real and imaginary part, e.g., 2 + 3j;
- NoneType (None): represents an absence of value.

In [3]:
# Example 1: assigning and printing different data types
x = 42             # integer
y = 3.14           # float
name = "Alice"     # string
is_student = True  # boolean
comp = 1 + 2j # complex

print(type(x), type(y), type(name), type(is_student), type(comp))

<class 'int'> <class 'float'> <class 'str'> <class 'bool'> <class 'complex'>


In [2]:
# Example 2: type conversion
a = "100"
b = int(a)  # convert string to integer
print(b * 2)  # 200

200


In [3]:
# Example 3: checking data type dynamically
var = 5.5
if isinstance(var, float):
    print("This is a float")

This is a float


#### 2. Conditional statements (if-else)

Conditional statements in Python allow decision-making by executing different blocks of code based on whether a condition evaluates to True or False. Python also supports ternary operators, which allow concise conditional expressions in a single line.



In [4]:
# Example 1: basic if-else
age = 20
if age >= 18:
    print("You are an adult")
else:
    print("You are a minor")

You are an adult


In [5]:
# Example 2: multiple conditions using elif
score = 85
if score >= 90:
    print("Grade: A")
elif score >= 75:
    print("Grade: B")
else:
    print("Grade: C")

Grade: B


In [6]:
# Example 3: ternary conditional expression (equivalent of the ternary operator '?' from C/C++/JS languages)
x = 5
result = "Positive" if x > 0 else "Negative"
print(result)

Positive


#### 3. Lists and list comprehensions

Lists are ordered, mutable collections that store multiple values. They can hold different data types in a single list. Lists are one of Python’s most versatile data structures and allow:

- indexing and slicing to access elements;
- appending and removing elements dynamically;
- sorting, reversing, and counting occurrences

Python also supports list comprehensions, which provide a concise way to generate lists using a single line of code. They are widely used for filtering and transforming data.

In [7]:
# Example 1: creating and modifying lists
fruits = ["apple", "banana", "cherry"]
fruits.append("orange")
print(fruits)

['apple', 'banana', 'cherry', 'orange']


In [8]:
# Example 2: list slicing
numbers = [0, 1, 2, 3, 4, 5]
print(numbers[1:4])  

[1, 2, 3]


In [9]:
# Example 3: list comprehension
squares = [x**2 for x in range(5)]
print(squares)

[0, 1, 4, 9, 16]
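
The comprehension above only transforms values. As a small sketch of the filtering use mentioned earlier (the `numbers` list here is illustrative data, not part of the seminar material), an `if` clause can be added at the end of the comprehension:

```python
numbers = list(range(10))

# Filter: keep only the even numbers.
evens = [n for n in numbers if n % 2 == 0]

# Filter and transform in one pass: squares of the odd numbers.
odd_squares = [n ** 2 for n in numbers if n % 2 == 1]

print(evens)        # [0, 2, 4, 6, 8]
print(odd_squares)  # [1, 9, 25, 49, 81]
```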


#### 4. Loops (for & while)

Loops are used to execute a block of code repeatedly until a certain condition is met.
- for loops: commonly used when iterating over a sequence, such as a list, tuple, string, or range of numbers;
- while loops: execute as long as the given condition remains True

Loop control statements include:

- break: terminates the loop early;
- continue: skips the current iteration and moves to the next one;
- pass: acts as a placeholder without executing any action.
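
A minimal sketch of the three control statements listed above. The `values` list and the function name `not_implemented_yet` are made up for illustration:

```python
# Hypothetical data: stop at the first negative value, skip zeros.
values = [4, 0, 7, -1, 9]
processed = []

for value in values:
    if value < 0:
        break          # leave the loop entirely; 9 is never visited
    if value == 0:
        continue       # skip the rest of this iteration
    processed.append(value)

print(processed)  # [4, 7]


def not_implemented_yet():
    pass  # placeholder: a body is syntactically required, but nothing runs
```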

In [10]:
# Example 1: for loop iterating over a list
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num ** 2)  # Square each number

1
4
9
16
25


In [11]:
# Example 2: while loop with a counter
i = 1
while i <= 3:
    print(f"Iteration {i}")
    i += 1

Iteration 1
Iteration 2
Iteration 3


In [12]:
# Example 3: using range() in a for loop
for i in range(5, 16, 5):
    print(i)  # Outputs: 5, 10, 15

5
10
15


#### 5. Tuples

Tuples are immutable sequences, meaning their elements cannot be reassigned once the tuple is created. Like lists, tuples store multiple elements, but their fixed size typically makes them slightly lighter in memory and quicker to create than lists.
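
To make immutability concrete, the following sketch (with an illustrative `point` tuple) shows that item assignment raises a `TypeError`, and that the usual workaround is to build a new tuple:

```python
point = (10, 20)

try:
    point[0] = 99  # tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")

# To "change" a tuple, build a new one instead.
moved = (99,) + point[1:]
print(moved)  # (99, 20)
```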


In [13]:
# Example 1: creating a tuple
coordinates = (10, 20)
print(coordinates[0])

10


In [14]:
# Example 2: tuple unpacking
x, y = (5, 15)
print(x, y)

5 15


In [15]:
# Example 3: converting a list to a tuple
numbers = [1, 2, 3]
tuple_numbers = tuple(numbers)
print(tuple_numbers)

(1, 2, 3)


#### 6. Dictionaries

Dictionaries (dict) are collections of key-value pairs. Since Python 3.7 they preserve insertion order, but elements are accessed by key rather than by position. They offer fast lookups and map unique keys to specific values.

- defined using curly braces {};
- keys must be unique and hashable (typically immutable values such as strings, numbers, or tuples of immutable items);
- values can be of any data type;
- dictionaries are widely used for structured data storage and retrieval, such as JSON-like data, API responses, and configuration settings.
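
A short sketch of safe lookups, using an illustrative `person` dictionary: indexing with a missing key raises `KeyError`, while `get()` and `pop()` accept a fallback value.

```python
person = {"name": "Alice", "age": 30}

# .get() returns a default instead of raising KeyError for a missing key.
city = person.get("city", "unknown")
print(city)  # unknown

# Membership tests check keys, not values.
print("age" in person)  # True
print(30 in person)     # False

# .pop() removes a key and returns its value.
age = person.pop("age")
print(age, person)  # 30 {'name': 'Alice'}
```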

In [16]:
# Example 1: creating a dictionary
person = {"name": "Alice", "age": 30}
print(person["name"])

Alice


In [17]:
# Example 2: adding new key-value pairs
person["city"] = "New York"
print(person)

{'name': 'Alice', 'age': 30, 'city': 'New York'}


In [18]:
# Example 3: iterating over a dictionary
for key, value in person.items():
    print(f"{key}: {value}")

name: Alice
age: 30
city: New York


#### 7. Sets

Sets are unordered collections of unique elements. They eliminate duplicates and support set operations like union, intersection, and difference.

Properties:

- defined using curly braces {} with at least one element (empty braces create a dictionary, so an empty set is written set());
- do not allow duplicate values;
- unordered, meaning elements have no fixed position;
- used for tasks like removing duplicates from lists or mathematical set operations.
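
As a sketch of the de-duplication use case from the list above (the `tags` list is invented for the example):

```python
tags = ["python", "numpy", "python", "pandas", "numpy"]

unique_tags = set(tags)
print(len(unique_tags))  # 3

# Sets are unordered, so sort when a stable order is needed.
print(sorted(unique_tags))  # ['numpy', 'pandas', 'python']

# {} creates an empty dict; use set() for an empty set.
print(type({}).__name__, type(set()).__name__)  # dict set
```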

In [19]:
# Example 1: Creating a set
numbers = {1, 2, 3, 3, 2, 1}
print(numbers)  

{1, 2, 3}


In [20]:
# Example 2: Adding elements
numbers.add(4)
print(numbers)

{1, 2, 3, 4}


In [21]:
# Example 3: set operations
set1 = {1, 2, 3}
set2 = {3, 4, 5}
print(set1 | set2)  # Union
print(set1 & set2)  # Intersection
print(set1 - set2) # Difference

{1, 2, 3, 4, 5}
{3}
{1, 2}


#### 8. Functions

Functions are reusable blocks of code that perform a specific task. Instead of rewriting code, you can define a function and call it whenever needed.

The function components you should have in mind:
- Function definition: created using the def keyword;
- Parameters: optional inputs passed to a function;
- Return value: the function may return a result using the return statement.

Functions can have the following types of parameters:

- default parameters (providing default values if none are specified);
- arbitrary positional arguments (*args) to accept any number of values;
- arbitrary keyword arguments (**kwargs) to accept any number of named arguments.
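
The last two parameter kinds can be sketched as follows. The function names `describe` and `total` are hypothetical and exist only for this illustration:

```python
def describe(*args, **kwargs):
    """Return the positional and keyword arguments the function received."""
    return args, kwargs


def total(*numbers, start=0):
    """Sum any number of positional values, beginning from `start`."""
    return start + sum(numbers)


print(describe(1, 2, unit="cm"))  # ((1, 2), {'unit': 'cm'})
print(total(1, 2, 3))             # 6
print(total(1, 2, 3, start=10))   # 16
```

Inside the function, `args` is a tuple and `kwargs` is a dictionary, so both can be iterated like any other collection.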

In [22]:
# Example 1: function with parameters
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))

Hello, Alice!


In [23]:
# Example 2: default parameters
def power(base, exponent=2):
    return base ** exponent

print(power(3))     # defaults to square
print(power(3, 3))  # cube

9
27


In [24]:
# Example 3: function returning multiple values
def min_max_sum(numbers):
    return min(numbers), max(numbers), sum(numbers)

print(min_max_sum([10, 20, 30])) 

(10, 30, 60)


#### 9. Exception handling

In [25]:
# Example 1: try-except block
try:
    x = 5 / 0
except ZeroDivisionError:
    print("Cannot divide by zero!")

Cannot divide by zero!


In [26]:
# Example 2: Handling multiple exceptions
try:
    y = int("hello")
except (ValueError, TypeError):
    print("Invalid conversion!")

Invalid conversion!


In [27]:
# Example 3: using finally
try:
    # 'with' closes the file automatically once the block ends
    with open("file.txt", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("File not found!")
finally:
    print("Execution completed.")

File not found!
Execution completed.


#### 10. List sorting

Python provides built-in methods for sorting and filtering lists dynamically.

- sort() modifies lists in place;
- sorted() returns a new sorted list;

In [28]:
numbers = [5, 3, 8, 1, 9, 2]

# Sorting in ascending order
numbers.sort()
print(numbers) 

# Sorting in descending order
sorted_numbers = sorted(numbers, reverse=True)
print(sorted_numbers)

[1, 2, 3, 5, 8, 9]
[9, 8, 5, 3, 2, 1]


In [29]:
students = [("Alice", 90), ("Bob", 85), ("Charlie", 92)]

# Sorting by scores (index 1 in tuple)
students_sorted = sorted(students, key=lambda student: student[1])
print(students_sorted)

[('Bob', 85), ('Alice', 90), ('Charlie', 92)]


#### 11. Lambda functions

Lambda functions are anonymous functions defined using the lambda keyword. They are commonly used for short, single-expression functions.

Benefits of using lambda functions:

- concise syntax;
- useful with functions like map(), filter(), and sorted();
- avoids defining a separate named function for one-off use;
- keeps short, single-expression logic next to the call that uses it; for anything longer, a regular def is usually more readable.

In [30]:
# Lambda function for squaring a number
square = lambda x: x ** 2
print(square(5))  

# Lambda function for adding two numbers
add = lambda a, b: a + b
print(add(3, 7))  

# Lambda function for finding the maximum of two numbers
maximum = lambda x, y: x if x > y else y
print(maximum(10, 20))  

25
10
20


In [31]:
numbers = [1, 2, 3, 4, 5]

# Using lambda with map() to double each number
doubled = list(map(lambda x: x * 2, numbers))
print(doubled) 

# Using lambda with map() to convert temperatures from Celsius to Fahrenheit
celsius = [0, 10, 20, 30, 40]
fahrenheit = list(map(lambda c: (c * 9/5) + 32, celsius))
print(fahrenheit) 

# Using lambda with map() to extract the length of each word
words = ["apple", "banana", "cherry"]
lengths = list(map(lambda word: len(word), words))
print(lengths) 

[2, 4, 6, 8, 10]
[32.0, 50.0, 68.0, 86.0, 104.0]
[5, 6, 6]


In [32]:

numbers = [10, 25, 30, 45, 50]

# Using lambda with filter() to keep only even numbers
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens) 

# Using lambda with filter() to extract words with more than 5 letters
words = ["apple", "banana", "cherry", "kiwi", "mango"]
long_words = list(filter(lambda word: len(word) > 5, words))
print(long_words)

# Using lambda with filter() to remove negative numbers from a list
nums = [-3, -2, -1, 0, 1, 2, 3]
positive_nums = list(filter(lambda n: n >= 0, nums))
print(positive_nums) 

[10, 30, 50]
['banana', 'cherry']
[0, 1, 2, 3]


## NumPy concepts

NumPy (Numerical Python) is a powerful library for numerical computing. It provides multi-dimensional arrays (ndarray) and fast mathematical operations optimized for performance.

#### 1. Creating NumPy arrays

NumPy arrays (ndarray) are the core structure of NumPy. For homogeneous numeric data, they are faster and more memory-efficient than Python lists.

Why is this important?

- arrays allow efficient storage and manipulation of large datasets;
- NumPy arrays support vectorized operations, which run in compiled code and are much faster than looping over regular Python lists.
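
To illustrate what "vectorized" means, here is a small sketch comparing the two styles. The `prices` values and the factor 1.2 are arbitrary example numbers:

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# Plain Python: an explicit loop (here, a list comprehension).
with_tax_list = [p * 1.2 for p in prices]

# NumPy: one vectorized expression over the whole array.
with_tax_array = np.array(prices) * 1.2

print(with_tax_list)
print(with_tax_array)
print(np.allclose(with_tax_list, with_tax_array))  # True
```

Both versions compute the same values; the speed difference only becomes noticeable on large arrays.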

In [33]:
import numpy as np

# Example 1: creating a 1D array
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1)


[1 2 3 4 5]


In [34]:
# Example 2: creating a 2D array (Matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)

[[1 2 3]
 [4 5 6]]


In [35]:
# Example 3: creating an array of zeros
zeros_arr = np.zeros((3, 3))  # 3x3 matrix of zeros
print(zeros_arr)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


#### 2. NumPy Array Attributes

NumPy provides attributes to check an array's shape, dimensions, data type, and size.

In [36]:
# Example 1: checking array properties
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # (2, 3) → 2 rows, 3 columns
print(arr.ndim)   # 2 → 2D array

(2, 3)
2


In [37]:
# Example 2: checking data type and size
print(arr.dtype)  # data type of elements (the default integer size depends on platform and NumPy version)
print(arr.size)   # total number of elements

int32
6


In [38]:
# Example 3: reshaping an array
reshaped = arr.reshape(3, 2)  # change shape to 3 rows, 2 columns
print(reshaped)

[[1 2]
 [3 4]
 [5 6]]


#### 3. Indexing and slicing in NumPy

NumPy allows powerful indexing and slicing to access specific elements.

In [39]:
# Example 1: accessing elements
arr = np.array([10, 20, 30, 40, 50])
print(arr[2])

30


In [40]:
# Example 2: slicing a subarray
print(arr[1:4])  

[20 30 40]


In [41]:
# Example 3: selecting a column in a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d[:, 1])

[2 5]
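
Beyond integer positions and slices, NumPy also supports boolean masks, negative indices, and slice steps. A brief sketch using the same kind of array as in the examples above:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

mask = arr > 25          # element-wise comparison gives a boolean array
print(mask)              # [False False  True  True  True]
print(arr[mask])         # [30 40 50]

# Negative indices count from the end; a step can be added to a slice.
print(arr[-1])           # 50
print(arr[::2])          # [10 30 50]
```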


#### 4. Arithmetic operations with NumPy

In [42]:
# Example 1: basic element-wise operations
arr = np.array([1, 2, 3, 4])
print(arr * 2) 

[2 4 6 8]


In [43]:
# Example 2: adding two arrays
arr2 = np.array([10, 20, 30, 40])
print(arr + arr2)

[11 22 33 44]


In [44]:
# Example 3: applying mathematical functions
print(np.sqrt(arr))

[1.         1.41421356 1.73205081 2.        ]


#### 5. Broadcasting in NumPy

Broadcasting allows operations between arrays of different shapes without explicit looping.

Benefits of broadcasting:

- allows operations between arrays of different sizes;
- makes vectorized computations possible.
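
A common practical use is subtracting per-column means from a matrix. The sketch below uses made-up `data` and also shows what happens when shapes are not compatible:

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [3.0, 30.0],
                 [5.0, 50.0]])       # shape (3, 2)

column_means = data.mean(axis=0)     # shape (2,)
centered = data - column_means       # (3, 2) - (2,) broadcasts row by row
print(centered)

# Shapes are compared from the right; each pair must match or contain a 1.
try:
    data + np.array([1.0, 2.0, 3.0])  # (3, 2) + (3,) does not line up
except ValueError as error:
    print("Broadcast failed:", error)
```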

In [45]:
# Example 1: adding a scalar to an array
arr = np.array([1, 2, 3])
print(arr + 5) 

[6 7 8]


In [46]:
# Example 2: broadcasting across different shapes
arr2 = np.array([[1], [2], [3]])  # 3x1 matrix
print(arr2 + arr) 

[[2 3 4]
 [3 4 5]
 [4 5 6]]


In [47]:
# Example 3: broadcasting in multiplication
matrix = np.array([[1, 2], [3, 4]])
print(matrix * np.array([2, 3]))

[[ 2  6]
 [ 6 12]]


#### 6. NumPy aggregations (sum, mean, max, min)

In [48]:
# Example 1: sum and mean
arr = np.array([10, 20, 30])
print(np.sum(arr))  
print(np.mean(arr))

60
20.0


In [49]:
# Example 2: finding min and max
print(np.min(arr))  
print(np.max(arr))  

10
30


In [50]:
# Example 3: aggregation along axes
matrix = np.array([[1, 2], [3, 4]])
print(np.sum(matrix, axis=0)) 
print(np.sum(matrix, axis=1))

[4 6]
[3 7]


#### 7. Generating random numbers

NumPy provides a random module for generating numbers for simulations, testing, and analysis.

Importance of generating random values:

- used for machine learning, statistics, and simulations;
- helps in data augmentation and random sampling
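
Random results differ on every run unless a seed is fixed. As a sketch, NumPy's newer `Generator` interface (`np.random.default_rng`) can be seeded for reproducibility; the seed value 42 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value 42 is arbitrary

dice = rng.integers(1, 7, size=5)      # integers in [1, 7), i.e. 1..6
floats = rng.random((2, 2))            # uniform floats in [0.0, 1.0)
sample = rng.choice([10, 20, 30, 40, 50], size=3, replace=False)

# Re-creating the generator with the same seed repeats the sequence.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(dice, rng_again.integers(1, 7, size=5)))  # True
```

With `replace=False`, `choice` never picks the same element twice.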

In [51]:
# Example 1: generate a random integer
print(np.random.randint(1, 100))

# Example 2: generate an array of random floats
print(np.random.rand(3, 3)) 

# Example 3: choosing a random sample
arr = np.array([10, 20, 30, 40, 50])
print(np.random.choice(arr, 3)) 

20
[[0.84781378 0.68439774 0.06761963]
 [0.0353445  0.55054567 0.07474346]
 [0.97183728 0.48582856 0.11764496]]
[50 30 30]


#### 8. Reshaping and transposing arrays

Reshaping changes an array’s shape, while transposing swaps its axes. Both are essential for matrix operations and machine learning models, and are also used in image processing and scientific computing.
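
One detail worth knowing is that a single dimension can be given as -1 and NumPy will infer it from the element count. A minimal sketch with an illustrative 12-element array:

```python
import numpy as np

arr = np.arange(12)              # [0, 1, ..., 11]

grid = arr.reshape(3, -1)        # -1 lets NumPy infer the missing size: (3, 4)
print(grid.shape)                # (3, 4)
print(grid.T.shape)              # (4, 3)

# The element count must match, otherwise reshape raises ValueError.
try:
    arr.reshape(5, -1)
except ValueError as error:
    print("Cannot reshape:", error)
```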

In [52]:
# Example 1: reshape a 1D array into 2D
arr = np.array([1, 2, 3, 4, 5, 6])
print(arr.reshape(2, 3))

# Example 2: transpose a matrix
matrix = np.array([[1, 2], [3, 4]])
print(matrix.T)

# Example 3: flatten an array
print(matrix.flatten())  # Convert to 1D array

[[1 2 3]
 [4 5 6]]
[[1 3]
 [2 4]]
[1 2 3 4]


#### 9. Stacking and concatenating arrays

NumPy provides functions to combine arrays vertically and horizontally. They are mostly used for merging data in scientific computing, for example when concatenating different datasets.

In [53]:
# Example 1: vertical stacking
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(np.vstack([arr1, arr2]))

# Example 2: horizontal stacking
print(np.hstack([arr1, arr2]))

# Example 3: concatenating along a specific axis
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
print(np.concatenate([matrix1, matrix2], axis=1))

[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]
[[1 2 5 6]
 [3 4 7 8]]


## Pandas concepts

Pandas is a powerful data manipulation library that provides DataFrame and Series structures to handle structured data efficiently. It is widely used in data analysis, machine learning, and finance.

#### 1. Creating a Pandas DataFrame

A DataFrame is the core data structure in Pandas. It is a table-like structure with rows and columns (similar to an Excel sheet or SQL table). You can create a DataFrame from different sources, such as:

- lists and dictionaries;
- CSV and Excel files;
- databases like SQL

In [54]:
import pandas as pd

# Example 1: creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


In [55]:
# Example 2: creating a DataFrame from a list of lists
data = [['Alice', 25], ['Bob', 30], ['Charlie', 35]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


In [56]:
# Example 3: creating a DataFrame with random values
df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])
print(df)

          A         B         C
0  1.274733  0.129708  0.988325
1 -0.297006 -1.009994  1.139460
2 -0.826497 -1.674115 -0.221835
3  0.272198  0.507338  0.368615
4 -0.833765 -2.388524 -1.605892


#### 2. Reading and writing data

Pandas provides functions to import and export data in formats such as CSV, Excel, and JSON, as well as from SQL databases.
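
The examples in this section depend on files in a local `data/` folder. As a self-contained sketch that runs without any file, the same read and write calls can be tried on an in-memory buffer (`csv_text` is made-up sample data):

```python
import io

import pandas as pd

# In-memory CSV text stands in for a file on disk.
csv_text = "Duration,Pulse\n60,110\n45,109\n"
df_demo = pd.read_csv(io.StringIO(csv_text))

# Write the frame back out as CSV, without the index column.
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)

# Reading the written text again gives an identical DataFrame.
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.equals(df_demo))  # True
```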

In [57]:
# Example 1: reading a CSV file
df = pd.read_csv('data/data.csv')
print(df.head())

   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0


In [58]:
# Example 2: writing a DataFrame to an excel file
df.to_excel('data/output.xlsx', index=False)

#### 3. Exploring data (head, tail, info, describe)

Before working with a dataset, it’s important to understand its structure, types, and summary statistics.

In [59]:
# Example 1: view the first 5 rows
print(df.head())

   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0


In [60]:
# Example 2: get summary statistics
print(df.describe())

         Duration       Pulse    Maxpulse    Calories
count   32.000000   32.000000   32.000000   30.000000
mean    68.437500  103.500000  128.500000  304.680000
std     70.039591    7.832933   12.998759   66.003779
min     30.000000   90.000000  101.000000  195.100000
25%     60.000000  100.000000  120.000000  250.700000
50%     60.000000  102.500000  127.500000  291.200000
75%     60.000000  106.500000  132.250000  343.975000
max    450.000000  130.000000  175.000000  479.000000


In [61]:
# Example 3: check data types and missing values
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  32 non-null     int64  
 1   Date      31 non-null     object 
 2   Pulse     32 non-null     int64  
 3   Maxpulse  32 non-null     int64  
 4   Calories  30 non-null     float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
None


#### 4. Selecting and filtering data

Pandas allows selecting specific rows and columns using:

- direct column access (df['column'])
- boolean conditions (df[df['column'] > value])
- index-based selection (df.iloc[])
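
In addition to the position-based `iloc`, Pandas offers the label-based `loc`. The sketch below contrasts the two on a small hypothetical frame (`df_demo` is not the CSV data used in the examples) and shows how to combine boolean conditions:

```python
import pandas as pd

# Hypothetical data, unrelated to the CSV file used in this section.
df_demo = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

by_label = df_demo.loc["b", "Age"]       # label-based: row 'b', column 'Age'
by_position = df_demo.iloc[1, 1]         # position-based: second row, second column
print(by_label, by_position)             # 30 30

# Combine conditions with & and |, wrapping each one in parentheses.
over_26_not_bob = df_demo[(df_demo["Age"] > 26) & (df_demo["Name"] != "Bob")]
print(over_26_not_bob["Name"].tolist())  # ['Charlie']
```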

In [62]:
# Example 1: Selecting a single column
print(df['Pulse'])

0     110
1     117
2     103
3     109
4     117
5     102
6     110
7     104
8     109
9      98
10    103
11    100
12    100
13    106
14    104
15     98
16     98
17    100
18     90
19    103
20     97
21    108
22    100
23    130
24    105
25    102
26    100
27     92
28    103
29    100
30    102
31     92
Name: Pulse, dtype: int64


In [63]:
# Example 2: filtering rows based on a condition
calories = df[df['Calories'] > 300]
print(calories)

    Duration          Date  Pulse  Maxpulse  Calories
0         60  '2020/12/01'    110       130     409.1
1         60  '2020/12/02'    117       145     479.0
2         60  '2020/12/03'    103       135     340.0
4         45  '2020/12/05'    117       148     406.0
6         60  '2020/12/07'    110       136     374.0
10        60  '2020/12/11'    103       147     329.3
13        60  '2020/12/13'    106       128     345.3
14        60  '2020/12/14'    104       132     379.3
19        60  '2020/12/19'    103       123     323.0
21        60  '2020/12/21'    108       131     364.2
25        60  '2020/12/25'    102       126     334.5
30        60  '2020/12/30'    102       129     380.3


In [64]:
# Example 3: selecting specific rows and columns using iloc
print(df.iloc[0:2, [3, 4]])  # selects the first 2 rows and the columns at positions 3 and 4

   Maxpulse  Calories
0       130     409.1
1       145     479.0


#### 5. Handling missing values

Real-world datasets often contain missing data. Pandas provides methods to:

- identify missing values (df.isnull())
- fill missing values (df.fillna())
- remove missing values (df.dropna())
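
The three methods can be tried on a tiny frame before applying them to the real dataset. This is only a sketch; `df_demo` and its values are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df_demo = pd.DataFrame({"Pulse": [100.0, np.nan, 110.0],
                        "Calories": [300.0, 250.0, np.nan]})

print(df_demo.isnull().sum())             # missing values per column

filled = df_demo.fillna(df_demo.mean())   # fill each column with its own mean
dropped = df_demo.dropna()                # keep only complete rows

print(filled["Pulse"].tolist())           # [100.0, 105.0, 110.0]
print(len(dropped))                       # 1
```

Both `fillna()` and `dropna()` return new objects here, so the original `df_demo` is left unchanged.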

In [65]:
# Example 1: check for missing values
print(df.isnull().sum())

Duration    0
Date        1
Pulse       0
Maxpulse    0
Calories    2
dtype: int64


In [66]:
# Example 2: fill missing values with the column mean
df['Calories'] = df['Calories'].fillna(df['Calories'].mean())  # assign the result back instead of using a chained inplace=True

In [67]:
# Example 3: drop rows with missing values
df_cleaned = df.dropna()
print(df_cleaned)

    Duration          Date  Pulse  Maxpulse  Calories
0         60  '2020/12/01'    110       130    409.10
1         60  '2020/12/02'    117       145    479.00
2         60  '2020/12/03'    103       135    340.00
3         45  '2020/12/04'    109       175    282.40
4         45  '2020/12/05'    117       148    406.00
5         60  '2020/12/06'    102       127    300.00
6         60  '2020/12/07'    110       136    374.00
7        450  '2020/12/08'    104       134    253.30
8         30  '2020/12/09'    109       133    195.10
9         60  '2020/12/10'     98       124    269.00
10        60  '2020/12/11'    103       147    329.30
11        60  '2020/12/12'    100       120    250.70
12        60  '2020/12/12'    100       120    250.70
13        60  '2020/12/13'    106       128    345.30
14        60  '2020/12/14'    104       132    379.30
15        60  '2020/12/15'     98       123    275.00
16        60  '2020/12/16'     98       120    215.20
17        60  '2020/12/17'  

#### 6. Sorting and ranking data

Sorting allows arranging data in ascending or descending order, and ranking assigns numerical ranks.
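
Ranks can be fractional when values are tied, because `rank()` averages the positions of tied values by default. A small sketch with an illustrative Series shows the default and two alternative `method` settings:

```python
import pandas as pd

pulse = pd.Series([110, 117, 103, 117])

# Default method='average': tied values share the mean of their positions.
print(pulse.rank(ascending=False).tolist())                  # [3.0, 1.5, 4.0, 1.5]

# method='min' gives ties the lowest rank; method='dense' leaves no gaps.
print(pulse.rank(ascending=False, method="min").tolist())    # [3.0, 1.0, 4.0, 1.0]
print(pulse.rank(ascending=False, method="dense").tolist())  # [2.0, 1.0, 3.0, 1.0]
```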

In [68]:
# Example 1: sort by Maxpulse (ascending)
df_sorted = df.sort_values('Maxpulse')
print(df_sorted)

    Duration          Date  Pulse  Maxpulse  Calories
23        60  '2020/12/23'    130       101    300.00
18        45  '2020/12/18'     90       112    304.68
31        60  '2020/12/31'     92       115    243.00
27        60  '2020/12/27'     92       118    241.00
22        45           NaN    100       119    282.00
26        60      20201226    100       120    250.00
17        60  '2020/12/17'    100       120    300.00
16        60  '2020/12/16'     98       120    215.20
12        60  '2020/12/12'    100       120    250.70
11        60  '2020/12/12'    100       120    250.70
19        60  '2020/12/19'    103       123    323.00
15        60  '2020/12/15'     98       123    275.00
9         60  '2020/12/10'     98       124    269.00
20        45  '2020/12/20'     97       125    243.00
25        60  '2020/12/25'    102       126    334.50
5         60  '2020/12/06'    102       127    300.00
13        60  '2020/12/13'    106       128    345.30
30        60  '2020/12/30'  

In [69]:
# Example 2: sort by multiple columns
df_sorted = df.sort_values(['Pulse', 'Duration'], ascending=[True, False])
print(df_sorted)

    Duration          Date  Pulse  Maxpulse  Calories
18        45  '2020/12/18'     90       112    304.68
27        60  '2020/12/27'     92       118    241.00
31        60  '2020/12/31'     92       115    243.00
20        45  '2020/12/20'     97       125    243.00
9         60  '2020/12/10'     98       124    269.00
15        60  '2020/12/15'     98       123    275.00
16        60  '2020/12/16'     98       120    215.20
11        60  '2020/12/12'    100       120    250.70
12        60  '2020/12/12'    100       120    250.70
17        60  '2020/12/17'    100       120    300.00
26        60      20201226    100       120    250.00
29        60  '2020/12/29'    100       132    280.00
22        45           NaN    100       119    282.00
5         60  '2020/12/06'    102       127    300.00
25        60  '2020/12/25'    102       126    334.50
30        60  '2020/12/30'    102       129    380.30
2         60  '2020/12/03'    103       135    340.00
10        60  '2020/12/11'  

In [70]:
# Example 3: ranking values
df['Rank'] = df['Pulse'].rank(ascending=False)
print(df)

    Duration          Date  Pulse  Maxpulse  Calories  Rank
0         60  '2020/12/01'    110       130    409.10   4.5
1         60  '2020/12/02'    117       145    479.00   2.5
2         60  '2020/12/03'    103       135    340.00  14.5
3         45  '2020/12/04'    109       175    282.40   6.5
4         45  '2020/12/05'    117       148    406.00   2.5
5         60  '2020/12/06'    102       127    300.00  18.0
6         60  '2020/12/07'    110       136    374.00   4.5
7        450  '2020/12/08'    104       134    253.30  11.5
8         30  '2020/12/09'    109       133    195.10   6.5
9         60  '2020/12/10'     98       124    269.00  27.0
10        60  '2020/12/11'    103       147    329.30  14.5
11        60  '2020/12/12'    100       120    250.70  22.5
12        60  '2020/12/12'    100       120    250.70  22.5
13        60  '2020/12/13'    106       128    345.30   9.0
14        60  '2020/12/14'    104       132    379.30  11.5
15        60  '2020/12/15'     98       

#### 7. Grouping and aggregation

Pandas allows grouping data by categories and applying aggregate functions (sum(), mean(), etc.).

In [71]:
# Example 1: group by column and calculate mean
print(df.groupby('Duration')['Calories'].mean())

Duration
30     195.100000
45     294.013333
60     314.053333
450    253.300000
Name: Calories, dtype: float64


In [72]:
# Example 2: count occurrences of each category
print(df.groupby('Duration').size())

Duration
30      1
45      6
60     24
450     1
dtype: int64


In [73]:
# Example 3: aggregate multiple statistics
print(df.groupby('Duration').agg({'Calories': ['sum', 'mean']}))

         Calories            
              sum        mean
Duration                     
30         195.10  195.100000
45        1764.08  294.013333
60        7537.28  314.053333
450        253.30  253.300000


#### 8. Merging and joining DataFrames

In [74]:
df1 = pd.read_excel('data/data1.xlsx')
df2 = pd.read_excel('data/data2.xlsx')
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])


In [75]:
# Example 1: inner join (the default) on the shared Date column
merged_df = pd.merge(df1, df2, on='Date')
merged_df

   Duration_x        Date  Pulse_x  Maxpulse_x  Calories_x  Duration_y  Pulse_y  Maxpulse_y  Calories_y
0          60  2020-12-31      104         132       379.3          60       92         115       243.0
1          60  2020-12-28       98         123       275.0          60      103         132         NaN
2          60  2020-12-27       98         120       215.2          60       92         118       241.0
3          60  2020-12-19      100         120       300.0          60      103         123       323.0


In [76]:
# Example 2: concatenate two DataFrames vertically
df_combined = pd.concat([df1, df2])
df_combined.head(10)


   Duration        Date  Pulse  Maxpulse  Calories
0        60  2020-12-01    110       130     409.1
1        60  2020-12-02    117       145     479.0
2        60  2020-12-03    103       135     340.0
3        45  2020-12-04    109       175     282.4
4        45  2020-12-05    117       148     406.0
5        60  2020-12-06    102       127     300.0
6        60  2020-12-07    110       136     374.0
7       450  2020-12-08    104       134     253.3
8        30  2020-12-09    109       133     195.1
9        60  2020-12-10     98       124     269.0


In [77]:
# Example 3: perform a left join
df_merged = df1.merge(df2, on='Date', how='left')
df_merged

   Duration_x        Date  Pulse_x  Maxpulse_x  Calories_x  Duration_y  Pulse_y  Maxpulse_y  Calories_y
0          60  2020-12-01      110         130       409.1         NaN      NaN         NaN         NaN
1          60  2020-12-02      117         145       479.0         NaN      NaN         NaN         NaN
2          60  2020-12-03      103         135       340.0         NaN      NaN         NaN         NaN
3          45  2020-12-04      109         175       282.4         NaN      NaN         NaN         NaN
4          45  2020-12-05      117         148       406.0         NaN      NaN         NaN         NaN
5          60  2020-12-06      102         127       300.0         NaN      NaN         NaN         NaN
6          60  2020-12-07      110         136       374.0         NaN      NaN         NaN         NaN
7         450  2020-12-08      104         134       253.3         NaN      NaN         NaN         NaN
8          30  2020-12-09      109         133       195.1         NaN      NaN         NaN         NaN
9          60  2020-12-10       98         124       269.0         NaN      NaN         NaN         NaN
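
The join types are easier to compare on two tiny frames. In this sketch, `left`, `right`, and the placeholder keys `d1` to `d4` are hypothetical; `indicator=True` adds a `_merge` column showing where each row came from:

```python
import pandas as pd

# Hypothetical frames sharing a 'Date' key.
left = pd.DataFrame({"Date": ["d1", "d2", "d3"], "Pulse": [110, 117, 103]})
right = pd.DataFrame({"Date": ["d2", "d3", "d4"], "Calories": [479.0, 340.0, 282.4]})

inner = pd.merge(left, right, on="Date")                   # keys present in both
left_join = pd.merge(left, right, on="Date", how="left")   # all keys from left
outer = pd.merge(left, right, on="Date", how="outer", indicator=True)

print(len(inner), len(left_join), len(outer))   # 2 3 4
print(outer[["Date", "_merge"]])
```

Rows without a match on the other side get NaN in the columns that come from that side.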


#### 9. Applying functions to data (apply() & map())

Pandas allows applying functions to DataFrame columns using apply() for columns/rows and map() for Series.
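
Two variations that the examples in this section do not cover are row-wise `apply()` with `axis=1` and `map()` with a dictionary. A sketch on hypothetical data (`df_demo` and the `Level` codes are made up):

```python
import pandas as pd

# Hypothetical data for illustration.
df_demo = pd.DataFrame({"Pulse": [110, 117], "Maxpulse": [130, 145],
                        "Level": ["lo", "hi"]})

# apply() with axis=1 passes each row to the function as a Series.
df_demo["Reserve"] = df_demo.apply(lambda row: row["Maxpulse"] - row["Pulse"], axis=1)

# map() on a Series also accepts a dictionary as a lookup table.
df_demo["Level_name"] = df_demo["Level"].map({"lo": "Low", "hi": "High"})

print(df_demo["Reserve"].tolist())     # [20, 28]
print(df_demo["Level_name"].tolist())  # ['Low', 'High']
```

For simple arithmetic like this, the vectorized form `df_demo["Maxpulse"] - df_demo["Pulse"]` gives the same result and is usually faster than a row-wise `apply()`.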

In [78]:
# Example 1: apply a custom function to a column
df['Maxpulse_transformed'] = df['Maxpulse'].apply(lambda x: x * 0.9)
df.head()

   Duration          Date  Pulse  Maxpulse  Calories  Rank  Maxpulse_transformed
0        60  '2020/12/01'    110       130     409.1   4.5                 117.0
1        60  '2020/12/02'    117       145     479.0   2.5                 130.5
2        60  '2020/12/03'    103       135     340.0  14.5                 121.5
3        45  '2020/12/04'    109       175     282.4   6.5                 157.5
4        45  '2020/12/05'    117       148     406.0   2.5                 133.2


In [79]:
# Example 2: map values to new categories
df['Category'] = df['Calories'].map(lambda x: 'High' if x > 400 else 'Low')
print(df.head())

# Example 3: drop columns that are no longer needed
df.drop(['Rank', 'Maxpulse_transformed'], axis=1, inplace=True)

   Duration          Date  Pulse  Maxpulse  Calories  Rank  \
0        60  '2020/12/01'    110       130     409.1   4.5   
1        60  '2020/12/02'    117       145     479.0   2.5   
2        60  '2020/12/03'    103       135     340.0  14.5   
3        45  '2020/12/04'    109       175     282.4   6.5   
4        45  '2020/12/05'    117       148     406.0   2.5   

   Maxpulse_transformed Category  
0                 117.0     High  
1                 130.5     High  
2                 121.5      Low  
3                 157.5      Low  
4                 133.2     High  


## Resources for learning. Insights

The best way to master Python, NumPy, and Pandas is through consistent daily practice. Spending at least one to two hours a day solving coding problems, exploring datasets, and working on projects will help solidify your understanding. Rather than passively reading tutorials, you should actively write code and experiment with different techniques.

Working with real-world datasets is one of the most effective ways to improve your data analysis skills. Platforms like Kaggle, Google Datasets, and StrataScratch provide excellent datasets that allow you to apply your knowledge to real scenarios. Instead of only solving theoretical exercises, try to tackle a project you are interested in and follow best practices :)

Building a strong portfolio of projects is also important for landing a job in data science or data engineering. Employers look for candidates who can demonstrate practical skills. A well-documented GitHub repository showcasing data analysis projects or machine learning models will make you stand out.

Staying curious and adaptable is key to success in the data field. Technology evolves rapidly, and new libraries, tools, and techniques emerge frequently. Keep up with industry trends by reading blogs, attending webinars, and enrolling in advanced courses. Never stop learning, and don’t hesitate to ask questions!

- https://www.kaggle.com/learn - Kaggle offers interactive courses for learning Python, Pandas, and NumPy using real-world datasets. The lessons are structured into bite-sized modules with coding exercises that allow you to apply concepts immediately
- https://www.datacamp.com/ - DataCamp offers comprehensive, interactive courses for learning Python, NumPy, and Pandas. It includes video tutorials, exercises, and projects, making it one of the most hands-on learning platforms for data science. One of its best features is the career tracks, which guide learners through a structured path from beginner to expert in data science, data engineering, or machine learning
- https://www.coursera.org/specializations/machine-learning-introduction - Coursera offers professional certification programs from top universities and tech companies. These courses include real-world projects, industry use cases, and certificates that can enhance your resume.
- https://cs50.harvard.edu/python/2022/ - Harvard’s CS50 Python course is a well-structured introduction to Python, covering functions, loops, data structures, and object-oriented programming. The course also includes topics like data handling with Pandas and NumPy and is taught by Harvard professors.