<a href="https://colab.research.google.com/github/mirzanaeembeg/Python-Cheat-Sheet-ML-DL-AI/blob/main/1_Python_Fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **1. Python Fundamentals**

This section provides a comprehensive overview of core [Python](https://docs.python.org/3/) concepts essential for beginners and as a refresher for those working in Machine Learning, Deep Learning, and AI.

Here's what is covered:

* **Print Statements and String Formatting:** Demonstrates various ways to display output and format strings using f-strings, `.format()`, and the older `%` formatting. It also includes examples of escape characters.
* **Lists - Creation and Basic Operations:** Introduces lists as mutable sequences, showing how to create them, check their properties (`type`, `len`, `sum`, `max`, `min`), and use boolean functions like `any()` and `all()`.
* **Range Function:** Explains the `range()` function for generating sequences of numbers, including examples with start, stop, and step arguments. It also shows common patterns like using `range()` for indexing.
* **List Iteration and Modification:** Covers different methods for iterating over lists (direct, index-based, and `enumerate`). It also shows how to modify list elements during iteration and introduces list slicing for accessing subsets of a list.
* **List Methods:** Details various built-in list methods for adding elements (`append`, `extend`, `insert`), searching and counting (`count`, `index`), removing elements (`remove`, `pop`), and modifying the list in-place (`reverse`, `sort`).
* **Enumerate Function:** Explains how `enumerate()` provides both the index and value during iteration, with examples of basic usage, custom start values, and converting the output to a list.
* **Zip Function:** Shows how `zip()` pairs elements from multiple iterables, demonstrating its use with two or more lists and explaining its behavior with lists of different lengths. It also includes an example of combining `enumerate` and `zip`.
* **Iterators:** Introduces the concept of iterators and how to create and use them with `iter()` and `next()`. It also shows how to use an iterator with a loop.
* **List Comprehensions:** Provides a concise way to create lists based on existing iterables. It covers basic syntax, conditions, complex expressions, temperature conversion examples, and nested list comprehensions.
* **Set Comprehensions:** Explains how to create sets using a similar concise syntax to list comprehensions, highlighting their use for generating unique elements.
* **Dictionary Comprehensions:** Shows how to create dictionaries efficiently using comprehensions, including examples with basic syntax, conditions, and working with dictionary methods like `.items()`, `.keys()`, and `.values()`.
* **Lambda Functions:** Introduces small, anonymous functions created with `lambda`. It covers basic syntax, use with built-in functions like `filter()` and `map()`, using lambda for sorting with `key`, and examples of higher-order functions.
* **Map and Filter Functions:** Details the `map()` function for applying a function to all items in an input list and the `filter()` function for constructing an iterator from elements of an iterable for which a function returns true.
* **Collections Module:** Introduces useful data structures from the `collections` module, specifically `Counter` for counting hashable objects and `defaultdict` for handling missing keys gracefully.
* **String Templates:** Shows how to use `string.Template` for simple string substitutions, including basic usage and safe substitution to avoid errors with missing keys.
* **Tuples:** Explains tuples as immutable sequences, covering creation, unpacking, the importance of the comma for single-element tuples, and introducing `namedtuple` for creating easily readable data structures.
* **Advanced List Operations:** Provides examples of more complex list operations, including finding elements using comprehensions, and nested list comprehensions with `zip` and multiple conditions.
* **Practical Examples and Patterns:** Illustrates common data processing patterns using the concepts covered, such as processing paired data with `zip` and grouping/aggregating with `defaultdict`. It also shows string and list combinations.
* **Common Patterns and Best Practices:** Offers guidance on writing efficient and readable Python code, including tips for efficient iteration, choosing between lists, sets, and dictionaries, and using generators for memory efficiency.
* **Common Mistakes to Avoid:** Highlights potential pitfalls, such as mutable default arguments, modifying a list while iterating over it, and understanding variable scope in comprehensions.
* **Advanced:** Covers advanced Python topics for Machine Learning, Deep Learning, and AI.


This section provides a solid foundation in Python's core features & [**`Built-in Functions`**](https://docs.python.org/3/library/functions.html), which are essential building blocks for more advanced topics in machine learning and data science.


## 1.1 Print Statements and String Formatting
---
- **Real Python**: [Python String Formatting: Available Tools and Their Features](https://realpython.com/python-string-formatting/)
- **Python.org Official Documentation**: [Input and Output](https://docs.python.org/3/tutorial/inputoutput.html)
- **GeeksforGeeks**: [Python String Formatting](https://www.geeksforgeeks.org/python-string-formatting/)
- **Programiz**: [Python String Format()](https://www.programiz.com/python-programming/methods/string/format)
- **DataCamp**: [Python String Formatting Tutorial](https://www.datacamp.com/tutorial/python-string-formatting)

### Basic Print Statements
This cell demonstrates different ways to print a single variable. Using f-strings is generally preferred for readability and efficiency, while comma separation is concise for simple cases, and string concatenation is less common for this purpose due to potential type conversion issues.

In [None]:
x = 10
print(f'The value of x is {x}')        # f-string formatting
print('The value of x is', x)          # comma-separated
print('x = ' + str(x))                 # string concatenation

The value of x is 10
The value of x is 10
x = 10


### Multiple Variables
This cell shows various methods for printing multiple variables. Comma separation is simple, f-strings offer a clean and readable way to embed variables directly into strings, and string concatenation requires explicit type conversion and can be less efficient for many variables.

In [None]:
x, y = 10, 20
print('x =', x, 'y =', y)                  # comma-separated
print(f'x = {x} y = {y}')                  # f-string
print('x = ' + str(x) + ' y = ' + str(y))  # concatenation

x = 10 y = 20
x = 10 y = 20
x = 10 y = 20


### f-string Formatting in Python

f-strings, introduced in Python 3.6, can embed any valid Python expression inside the braces. The examples below evaluate an arithmetic expression, call a function and a string method, and format an object through its `__str__()` method.

In [None]:
## arithmetic expression
print(f"{5 * 5}")

def greet(name):
    return "Hello, " + name
## calling the function using f-string
name = "Datacamp"
print(f"{greet(name)}")

## calling the 'title' method, which capitalizes the first letter of every word
string = "datacamp is an educational company."
print(f"{string.title()}")

class Sample:

    def __init__(self, name, age):
        self.name = name
        self.age = age

    ## this method will be called when we print the object
    def __str__(self):
        return f'{self.name} is {self.age} years old.'
john = Sample("John", 19)
## formatting the object in an f-string calls its __str__() method
print(f"{john}")

25
Hello, Datacamp
Datacamp Is An Educational Company.
John is 19 years old.


### f-string in Dictionaries

Be careful with dictionary keys inside an **f-string**. Before Python 3.12, the quotes around the dictionary key must differ from the quotes that delimit the **f-string**; reusing the same quote type ends the string early and raises a `SyntaxError`. Python 3.12 and later allow the same quotes to be reused, but using different quotes keeps the code portable across versions.

In [None]:
person = {"name": "John", "age": 19}
print(f"{person['name']} is {person['age']} years old.")
# print(f'{person['name']} is {person['age']} years old.') # SyntaxError before Python 3.12

John is 19 years old.


### String Formatting Methods
This cell illustrates different string formatting methods. f-strings are the modern and recommended way due to their conciseness and performance. The `.format()` method is a good alternative for more complex formatting needs or compatibility with older Python versions. The `%` formatting is an older style and generally discouraged in new code.

In [None]:
# f-strings (Python 3.6+) - Recommended
name, age = "Alice", 25
print(f"Hello {name}, you are {age} years old")

# .format() method
print("Hello {}, you are {} years old".format(name, age))
print("Hello {0}, you are {1} years old".format(name, age))

# % formatting (older style)
print("Hello %s, you are %d years old" % (name, age))

Hello Alice, you are 25 years old
Hello Alice, you are 25 years old
Hello Alice, you are 25 years old
Hello Alice, you are 25 years old
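
f-strings and `.format()` also accept a format specifier after a colon, which is handy when printing metrics. The sketch below is illustrative only; the variable names and values (`accuracy`, `epoch`, `n_samples`) are made up for the example.

```python
# Hypothetical values, chosen only to illustrate format specifiers
accuracy = 0.912345
epoch = 7
n_samples = 1234567

line_fixed = f"accuracy = {accuracy:.2f}"   # fixed-point, two decimal places
line_pct = f"accuracy = {accuracy:.1%}"     # percentage, one decimal place
line_pad = f"epoch {epoch:03d}"             # integer zero-padded to width 3
line_sep = f"samples = {n_samples:,}"       # thousands separator

print(line_fixed)   # accuracy = 0.91
print(line_pct)     # accuracy = 91.2%
print(line_pad)     # epoch 007
print(line_sep)     # samples = 1,234,567

# The same specifiers work with .format()
print("accuracy = {:.2f}".format(accuracy))  # accuracy = 0.91
```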


### Escape Characters
This cell demonstrates the use of escape characters to include special characters or formatting within a string, such as newlines (`\n`), tabs (`\t`), or quotation marks (`\"`).

In [None]:
print("Hello World\n")                  # newline
print("Hello\tWorld")                   # tab
print("He said \"Hello\"")              # escaped quotes

Hello World

Hello	World
He said "Hello"


## 1.2 Lists - Creation and Basic Operations
---
- **Real Python**: [Python's list Data Type: A Deep Dive With Examples](https://realpython.com/python-list/)
- **GeeksforGeeks**: [Python Lists](https://www.geeksforgeeks.org/python-lists/)
- **Python Geeks**: [Python Lists with Examples](https://pythongeeks.org/python-lists/)
- **Programiz**: [Python List](https://www.programiz.com/python-programming/list)
- **W3Schools**: [Python Lists](https://www.w3schools.com/python/python_lists.asp)
---
<div align="center">
<img src="https://drive.google.com/uc?id=1QV25sSV1TEkmGGJH4whpSz6kbHGNqXkD" width="550">
<div align="center">
<img src="https://drive.google.com/uc?id=19AT0Wu90U3lFTuGOIdq-SJUjerxM_LnL" width="550">


In Python, a list is a built-in [data structure](https://docs.python.org/3/tutorial/datastructures.html) that stores an ordered, mutable sequence of items. It behaves like a dynamically sized array that grows and shrinks automatically. A list can hold items of any type, including other lists, and can mix types freely. This is possible because a list stores references to its items in contiguous memory, while the items themselves may be stored elsewhere. Key characteristics of Python lists include:
- `Ordered:` Items in a list maintain a defined order based on their insertion sequence.
- `Changeable (Mutable):` Lists can be modified after creation, allowing for the addition, removal, or modification of elements.
- `Allow Duplicates:` Lists can contain multiple occurrences of the same value.
- `Indexed:` List items are accessed using zero-based index numbers within square brackets `[]`.
- `Heterogeneous:` Lists can store items of different data types (e.g., integers, strings, booleans, or even other lists) within the same list.

### List Creation
This cell shows how to create lists, which are mutable ordered sequences. Lists are versatile and can hold elements of different data types, making them suitable for storing collections of items where order and duplicates matter.
<div align="center">
<img src="https://drive.google.com/uc?id=1ohr-pwPv0ntDwx8v7iIsoCwCLPlHhW0M" width="650">


In [None]:
# Different data types in lists
mixed_list = [1, 2, 3, 4, 'string', 2.222]
numbers = [1, 5, 9, 55, 4545]
empty_list = []

# Create a list [2, 2, 2, 2, 2]
a = [2] * 5
duplicate_list = [1, 1, 1, 1, 1]

# Assignment on a list means it points to the same list in the memory
b = mixed_list
print(b)

print(mixed_list)
print(numbers)
print(empty_list)
print(duplicate_list)
print(a)

[1, 2, 3, 4, 'string', 2.222]
[1, 2, 3, 4, 'string', 2.222]
[1, 5, 9, 55, 4545]
[]
[1, 1, 1, 1, 1]
[2, 2, 2, 2, 2]
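
Because plain assignment only adds a second name for the same list, a change made through one name is visible through the other. The following minimal sketch contrasts an alias with a shallow copy; the names `original`, `alias`, `shallow`, and `grid` are illustrative and not part of the cell above.

```python
original = [1, 2, 3]
alias = original             # second name for the same list object
shallow = original.copy()    # new list; list(original) or original[:] also work

alias.append(4)              # mutates the one shared list
print(original)              # [1, 2, 3, 4]
print(shallow)               # [1, 2, 3]
print(alias is original)     # True
print(shallow is original)   # False

# The * operator repeats references, so nested lists end up shared
grid = [[0] * 2] * 2
grid[0][0] = 1
print(grid)                  # [[1, 0], [1, 0]]
```

For nested data, `copy.deepcopy()` from the standard library copies the inner lists as well.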


### Using list() Constructor

In [None]:
# From a tuple
a = list((1, 2, 3, 'apple', 4.5))
print(a)

[1, 2, 3, 'apple', 4.5]


### List Properties
This cell demonstrates how to use built-in functions to quickly check basic properties of a list, such as its type, length, sum of elements (for numeric lists), and the maximum and minimum values.

In [None]:
my_list = [1, 2, 3, 4]
print(type(my_list))                    # <class 'list'>
print(len(my_list))                     # 4
print(sum(my_list))                     # 10
print(max(my_list))                     # 4
print(min(my_list))                     # 1

<class 'list'>
4
10
4
1


### Boolean Functions on Lists
This cell shows how to use the boolean functions `any()` and `all()`. `any()` is useful for checking if at least one element in an iterable is true (or non-zero/non-empty), while `all()` checks if all elements are true.

In [None]:
list1 = [1, 0, 3, 1, 0, 2]
print(any(list1))                       # True (any non-zero)
print(all(list1))                       # False (contains zeros)

list2 = [1, 2, 3, 4, 5]
print(all(list2))                       # True (all non-zero)

True
False
True
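
`any()` and `all()` are often combined with a generator expression so that a condition is tested per element. A small sketch, using a made-up `scores` list and an arbitrary threshold of 65:

```python
scores = [72, 88, 95, 61]    # hypothetical data

has_failing = any(s < 65 for s in scores)    # at least one score below 65
all_positive = all(s > 0 for s in scores)    # every score above 0

print(has_failing)    # True
print(all_positive)   # True

# Edge cases for empty iterables
print(any([]))        # False
print(all([]))        # True
```

Both functions short-circuit: `any()` stops at the first truthy element and `all()` stops at the first falsy one.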


### Accessing List Elements
Elements in a list are accessed by index. Python indexes start at `0`, so `a[0]` is the first element. Negative indexes count from the end of the list, so `a[-1]` is the last element.
<div align="center">
<img src="https://drive.google.com/uc?id=18bmIigXRW33zPTPxKiHRqGZSpYcmKEi4" width="650">

In [None]:
a = [10, 20, 30, 40, 50]

# Access first element
print(a[0])

# Access last element
print(a[-1])

10
50


## 1.3 Range Function
---
- **Real Python**: [Python range() Function](https://realpython.com/python-range/)
- **GeeksforGeeks**: [Python range() function](https://www.geeksforgeeks.org/python-range-function/)
- **Python-Course.eu**: [The range() Function](https://python-course.eu/python-tutorial/range.php)
- **Programiz**: [Python range()](https://www.programiz.com/python-programming/methods/built-in/range)

### Basic Range Usage
This cell demonstrates the basic usage of the `range()` function, which is primarily used to generate sequences of numbers for looping. It's efficient as it generates numbers on the fly rather than creating a full list in memory.

In [None]:
print(list(range(5)))                          # [0, 1, 2, 3, 4]
print(list(range(1, 6)))                       # [1, 2, 3, 4, 5]
print(list(range(1, 11, 2)))                   # [1, 3, 5, 7, 9] (step=2)
print(list(range(10, 0, -1)))                  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(list(range(-10, 0, 2)))                  # [-10, -8, -6, -4, -2]

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[1, 3, 5, 7, 9]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[-10, -8, -6, -4, -2]
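
A `range` object computes its values on demand, yet it still supports `len()`, indexing, and membership tests without building a list. A brief sketch (the name `evens` is illustrative):

```python
evens = range(0, 1_000_000, 2)   # no list of 500,000 numbers is created

print(len(evens))            # 500000
print(evens[10])             # 20
print(999_998 in evens)      # True
print(7 in evens)            # False
print(list(evens[:3]))       # [0, 2, 4]
```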


### Common Range Patterns
This cell shows a common and readable pattern of using `range(len(my_list))` to iterate over a list when you need both the index and the value, allowing you to access elements by their position.

In [None]:
# For indexing
my_list = ['a', 'b', 'c', 'd']
for i in range(len(my_list)):
    print(f"Index {i}: {my_list[i]}")

Index 0: a
Index 1: b
Index 2: c
Index 3: d


## 1.4 List Iteration and Modification
---
- **Real Python**: [Python for Loops (Definite Iteration)](https://realpython.com/python-for-loop/)
- **GeeksforGeeks**: [Python List Slicing](https://www.geeksforgeeks.org/python-list-slicing/)
- **Python-Course.eu**: [Sequential Data Types](https://python-course.eu/python-tutorial/sequential-data-types.php)
- **Programiz**: [Python List Slicing](https://www.programiz.com/python-programming/list-slicing)


### Iterating Over Lists
This cell illustrates different methods for iterating over lists. Direct iteration is the most Pythonic when you only need the elements. Index-based iteration is useful when you need to access elements by their position. `enumerate()` is preferred when you need both the index and the value during iteration.

In [None]:
# Method 1: Direct iteration
fruits = ['apple', 'banana', 'orange']
for fruit in fruits:
    print(fruit)

# Method 2: Index-based iteration
for i in range(len(fruits)):
    print(f"{i}: {fruits[i]}")

# Method 3: Using enumerate (preferred for index + value)
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")

apple
banana
orange
0: apple
1: banana
2: orange
0: apple
1: banana
2: orange


### Modifying Lists During Iteration
This cell demonstrates how to modify elements of a list in-place during iteration using index-based access within a loop. This is necessary when you need to change the existing elements of the list.

In [None]:
# Add 2 to each element (numbers is redefined here so the cell runs on its own)
numbers = [1, 5, 9, 55, 4545]
for i in range(len(numbers)):
    numbers[i] += 2
print(numbers)  # [3, 7, 11, 57, 4547]

[3, 7, 11, 57, 4547]


### List Slicing
This cell showcases list slicing, a powerful way to access or create sub-lists (or copies) of a list. It's useful for extracting specific portions of a list, reversing a list, or creating stepped sequences without modifying the original list.
<div align="center">
<img src="https://drive.google.com/uc?id=1HHA_wcYDrMhe1Dzt5fqO4NliyerhhK-q" width="650">

In [None]:
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(my_list[2:5])                     # [2, 3, 4]
print(my_list[:3])                      # [0, 1, 2]
print(my_list[3:])                      # [3, 4, 5, 6, 7, 8, 9]
print(my_list[::2])                     # [0, 2, 4, 6, 8] (every 2nd)
print(my_list[::-1])                    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] (reverse)
print(my_list[::-2])                    # [9, 7, 5, 3, 1] (reverse every 2nd)

[2, 3, 4]
[0, 1, 2]
[3, 4, 5, 6, 7, 8, 9]
[0, 2, 4, 6, 8]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[9, 7, 5, 3, 1]
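
Reading a slice produces a new list, so changing the slice leaves the original untouched, whereas assigning to a slice modifies the original in place. A short sketch with illustrative names `data` and `head`:

```python
data = [0, 1, 2, 3, 4, 5]

head = data[:3]       # a new list holding the first three elements
head[0] = 99
print(head)           # [99, 1, 2]
print(data)           # [0, 1, 2, 3, 4, 5]

# Slice assignment replaces part of the original list in place
data[1:3] = ['a', 'b', 'c']
print(data)           # [0, 'a', 'b', 'c', 3, 4, 5]

# Out-of-range slice bounds are clipped instead of raising IndexError
print(data[4:100])    # [3, 4, 5]
```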


## 1.5 List Methods
---
- **GeeksforGeeks**: [Python List methods](https://www.geeksforgeeks.org/list-methods-python/)
- **Real Python**: [Working With Lists in Python](https://realpython.com/python-lists-tuples/)
- **Python.org**: [More on Lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
- **W3Schools**: [Python List Methods](https://www.w3schools.com/python/python_lists_methods.asp)
---
Here are some other common list methods.

- `list.append(elem)` -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.
- `list.insert(index, elem)` -- inserts the element at the given index, shifting elements to the right.
- `list.extend(list2)` -- adds the elements in list2 to the end of the list. Using `+=` on a list is similar to using `extend()`, while `+` builds a new list instead of modifying the original.
- `list.index(elem)` -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear.
- `list.remove(elem)` -- searches for the first instance of the given element and removes it (throws ValueError if not present)
- `list.sort()` -- sorts the list in place (does not return it). (The `sorted()` function shown later is preferred.)
- `list.reverse()` -- reverses the list in place (does not return it)
- `list.pop(index)` -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of `append()`).

Notice that these are **methods** on a list object, while len() is a function that takes the list (or string or whatever) as an argument.
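
The "does not return the new list" behavior noted above is easy to trip over. The sketch below, with illustrative variable names, shows the `None` return value of an in-place method, the difference between `sorted()` and `sort()`, and how `+=` differs from `+`.

```python
nums = [3, 1, 2]

result = nums.append(4)   # append() modifies nums and returns None
print(result)             # None
print(nums)               # [3, 1, 2, 4]

ordered = sorted(nums)    # sorted() returns a new list
print(ordered)            # [1, 2, 3, 4]
print(nums)               # [3, 1, 2, 4]

a = [1, 2]
b = a                     # b is another name for the same list
a += [3]                  # in place, like extend(): b sees the change
print(b)                  # [1, 2, 3]
a = a + [4]               # builds a new list and rebinds a; b is unchanged
print(a)                  # [1, 2, 3, 4]
print(b)                  # [1, 2, 3]
```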

### Adding Elements
This cell demonstrates different methods for adding elements to a list. `append()` adds a single element to the end. `extend()` adds multiple elements from another iterable. `insert()` adds an element at a specific position, shifting existing elements.

In [None]:
x = [1, 2, 3]

# append() - adds single element
x.append(4)                             # [1, 2, 3, 4]
x.append([5, 6])                        # [1, 2, 3, 4, [5, 6]]
print(x)

# extend() - adds multiple elements
y = [1, 2, 3]
y.extend([4, 5])                        # [1, 2, 3, 4, 5]
y.extend((6, 7))                        # [1, 2, 3, 4, 5, 6, 7] (works with tuples)
print(y)

# insert() - adds element at specific position
z = [1, 2, 3]
z.insert(1, 'inserted')                 # [1, 'inserted', 2, 3]
print(z)


[1, 2, 3, 4, [5, 6]]
[1, 2, 3, 4, 5, 6, 7]
[1, 'inserted', 2, 3]


### Other Useful List Methods
This cell covers other useful list methods. `count()` is used to find how many times an element appears. `index()` finds the position of the first occurrence. `remove()` removes the first matching element. `pop()` removes and returns an element by its index (or the last element if no index is given). `reverse()` and `sort()` modify the list in-place.

In [None]:
my_list = [1, 2, 3, 2, 4, 2]

# Count occurrences
print(my_list.count(2))                 # 3

# Find index of first occurrence
print(my_list.index(3))                 # 2

# Remove elements
my_list.remove(2)                       # Removes first occurrence of 2
print(my_list)                          # [1, 3, 2, 4, 2]
popped = my_list.pop()                  # Removes and returns last element
print(popped)                           # 2
popped_at_index = my_list.pop(1)        # Removes and returns element at index 1
print(popped_at_index)                  # 3

# reverse() and sort() modify the list in-place and return None
my_list.reverse()                       # Reverses in-place
print(my_list)                          # [4, 2, 1]
my_list.sort()                          # Sorts in-place
print(my_list)                          # [1, 2, 4]
my_list.sort(reverse=True)              # Sorts in descending order
print(my_list)                          # [4, 2, 1]

3
2
[1, 3, 2, 4, 2]
2
3
[4, 2, 1]
[1, 2, 4]
[4, 2, 1]


## 1.6 Enumerate Function
---
- **Real Python**: [Python enumerate(): Simplify Looping With Counters](https://realpython.com/python-enumerate/)
- **GeeksforGeeks**: [Enumerate() in Python](https://www.geeksforgeeks.org/enumerate-in-python/)
- **Python-Course.eu**: [Enumerate](https://python-course.eu/python-tutorial/enumerate.php)
- **Programiz**: [Python enumerate()](https://www.programiz.com/python-programming/methods/built-in/enumerate)

The `enumerate()` function in Python is a built-in function that adds a counter to an iterable and returns it as an enumerate object. This function is particularly useful when you need both the index (or count) and the value of items while iterating through a sequence like a list, tuple, or string. This is often more convenient and Pythonic than manually managing a counter.

How it works:
- **Takes an iterable:** You provide an iterable object (e.g., a list, tuple, string) as an argument to `enumerate()`.
- **Assigns a counter:** `enumerate()` internally iterates through the items of the iterable and assigns a numerical counter to each item, starting from 0 by default.
- **Returns an enumerate object:** It returns an enumerate object, which can then be directly used in a for loop or converted into other data structures like a list of tuples.

**Syntax:** `enumerate(iterable, start=0)`
- *iterable:* The sequence or object you want to iterate over (e.g., a list, tuple, string).
- *start:* (Optional) The starting value for the counter. If not specified, it defaults to 0.

***Benefits:***
- *Readability:* Makes code cleaner and more concise by directly providing both index and value.
- *Efficiency:* Eliminates the need for manual index tracking using a separate counter variable or `range(len(iterable))`.
- *Reduced errors:* Decreases the likelihood of off-by-one errors common with manual indexing.
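
To make the mechanics concrete, here is a rough pure-Python sketch of what `enumerate()` does. The name `my_enumerate` is hypothetical; the real built-in is implemented in C, so this is only an approximation of its behavior.

```python
def my_enumerate(iterable, start=0):
    """Yield (count, item) pairs, roughly like the built-in enumerate()."""
    count = start
    for item in iterable:
        yield count, item
        count += 1


letters = ['a', 'b', 'c']
pairs = list(my_enumerate(letters, start=1))
print(pairs)                                        # [(1, 'a'), (2, 'b'), (3, 'c')]
print(pairs == list(enumerate(letters, start=1)))   # True
```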



### Basic Enumerate
<div align="center">
<img src="https://drive.google.com/uc?id=1gCn-SSXuAkWP7cZi1PuhP8pxIdjr408E" width="750">

In [None]:
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Default start at 0
for index, day in enumerate(days):
    print(f"{index}: {day}")

# Custom start value
for index, day in enumerate(days, start=1):
    print(f"Day {index}: {day}")

# Convert to list
indexed_days = list(enumerate(days, start=10))
print(indexed_days)  # [(10, 'Mon'), (11, 'Tue'), ...]

0: Mon
1: Tue
2: Wed
3: Thu
4: Fri
Day 1: Mon
Day 2: Tue
Day 3: Wed
Day 4: Thu
Day 5: Fri
[(10, 'Mon'), (11, 'Tue'), (12, 'Wed'), (13, 'Thu'), (14, 'Fri')]


### Practical Enumerate Usage
This cell provides a practical example of using `enumerate()` to access both the index and value of numbers, allowing you to perform operations or checks that depend on both the position and the value of an element.

In [None]:
numbers = [10, 11, 20, 23, 30, 40]

for i, num in enumerate(numbers):
    if num % 2 == 0:
        print(f"Index {i}: {num} is even")
    else:
        print(f"Index {i}: {num} is odd")

Index 0: 10 is even
Index 1: 11 is odd
Index 2: 20 is even
Index 3: 23 is odd
Index 4: 30 is even
Index 5: 40 is even


## 1.7 Zip Function
---
- **Real Python**: [Python zip() Function](https://realpython.com/python-zip-function/)
- **GeeksforGeeks**: [Python zip() method](https://www.geeksforgeeks.org/python-zip-method/)
- **Python-Course.eu**: [Zip in Python](https://python-course.eu/python-tutorial/zip.php)
- **Programiz**: [Python zip()](https://www.programiz.com/python-programming/methods/built-in/zip)

The `zip()` function in Python is a built-in utility that aggregates elements from multiple iterables (like lists, tuples, or strings) into a single iterator of tuples. It effectively **"zips"** corresponding elements together based on their index.

**Syntax:** `zip(iterable1, iterable2, ...)`

**Functionality:**

- *Pairs Elements:* `zip()` takes one or more iterables as arguments and creates an iterator that yields tuples. Each tuple contains the element from the first iterable at a given index, followed by the element from the second iterable at the same index, and so on, for all provided iterables.
- *Stops at Shortest Iterable:* The resulting iterator's length is determined by the shortest input iterable. Once the shortest iterable is exhausted, `zip()` stops producing tuples.
- *Returns an Iterator:* `zip()` returns an iterator, not a list or tuple directly. This means it generates elements on demand, which can be memory-efficient for large datasets. To get a list or tuple, the result can be converted using `list()` or `tuple()`.

**Common Use Cases:**

- *Parallel Iteration:* Looping over multiple lists simultaneously, accessing corresponding elements in each iteration.
- *Creating Dictionaries:* Combining two lists (e.g., one for keys and one for values) to create a dictionary.
- *Data Transformation:* Grouping related data points from different sources.
- *Unzipping:* Using `zip()` with the `*` (unpacking) operator can effectively **"unzip"** a list of tuples back into separate iterables.
- *Transposing Matrices:* Converting rows to columns or vice versa in a matrix representation.
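
Three of the use cases above can be sketched in a few lines. The data and names here (`keys`, `values`, `pairs`, `matrix`) are invented for illustration.

```python
# Creating a dictionary from two parallel lists
keys = ['lr', 'batch_size', 'epochs']
values = [0.001, 32, 10]
config = dict(zip(keys, values))
print(config)            # {'lr': 0.001, 'batch_size': 32, 'epochs': 10}

# Unzipping a list of tuples with the * operator (results are tuples)
pairs = [('Alice', 25), ('Bob', 30), ('Charlie', 35)]
unzipped_names, unzipped_ages = zip(*pairs)
print(unzipped_names)    # ('Alice', 'Bob', 'Charlie')
print(unzipped_ages)     # (25, 30, 35)

# Transposing a matrix stored as a list of rows
matrix = [[1, 2, 3], [4, 5, 6]]
transposed = [list(column) for column in zip(*matrix)]
print(transposed)        # [[1, 4], [2, 5], [3, 6]]
```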



### Basic Zip Usage
This cell shows the basic usage of the `zip()` function, which is used to pair elements from multiple iterables based on their position. This is useful for iterating over corresponding elements from different lists or tuples simultaneously.

In [None]:
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
scores = [85, 92, 78]

# Zip two lists
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")

# Zip multiple lists
for name, age, score in zip(names, ages, scores):
    print(f"{name} ({age}) scored {score}")

# Convert to list
paired = list(zip(names, ages))
print(paired)  # [('Alice', 25), ('Bob', 30), ('Charlie', 35)]

Alice is 25 years old
Bob is 30 years old
Charlie is 35 years old
Alice (25) scored 85
Bob (30) scored 92
Charlie (35) scored 78
[('Alice', 25), ('Bob', 30), ('Charlie', 35)]


### Zip with Different Length Lists
This cell demonstrates how `zip()` handles iterables of different lengths. `zip()` stops creating pairs once the shortest iterable is exhausted, so any extra elements in the longer iterables are silently dropped.

In [None]:
list1 = [1, 2, 3, 4, 5]
list2 = ['a', 'b', 'c']

# Zip stops at shortest list
result = list(zip(list1, list2))
print(result)  # [(1, 'a'), (2, 'b'), (3, 'c')]

[(1, 'a'), (2, 'b'), (3, 'c')]


### Combining Enumerate and Zip
This cell shows how to combine `enumerate()` and `zip()`. This pattern is useful when you need to iterate over multiple iterables in parallel and also need to know the index of the current elements across all iterables.

In [None]:
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']

for i, (num, letter) in enumerate(zip(list1, list2)):
    print(f"Index {i}: {num} -> {letter}")

Index 0: 1 -> a
Index 1: 2 -> b
Index 2: 3 -> c


## 1.8 Iterators
---
- **Real Python**: [Python Iterators: A Step-by-Step Introduction](https://realpython.com/python-iterators-iterables/)
- **GeeksforGeeks**: [Iterators in Python](https://www.geeksforgeeks.org/iterators-in-python/)
- **Python-Course.eu**: [Iterators and Iterables](https://python-course.eu/python-tutorial/iterators-and-iterables.php)
- **Python.org**: [Iterator Types](https://docs.python.org/3/library/stdtypes.html#iterator-types)

In Python, an iterator is an object that implements the iterator protocol, allowing sequential traversal through elements in a collection. This protocol consists of two methods:

- `__iter__()`: This method is called to initialize and return the iterator object itself. If an object is an iterable (like a list or tuple), calling `iter()` on it will return an iterator.
- `__next__()`: This method retrieves the next item from the sequence. When there are no more items to return, it raises a `StopIteration` exception, signaling the end of the iteration.

**Key characteristics and uses of iterators:**

- **Sequential Traversal:** Iterators enable processing elements one by one, which is particularly useful for large datasets as it avoids loading the entire collection into memory at once.
- **Lazy Evaluation:** They support lazy evaluation, meaning values are generated only when requested, which is beneficial for potentially infinite sequences or computationally expensive operations.
- **Memory Efficiency:** By yielding elements one at a time, iterators reduce memory consumption compared to creating a complete list or other data structure in memory.
- **Integration with Loops:** Python's for loops implicitly use iterators. When you iterate over an iterable, a hidden iterator is created, and its `__next__()` method is repeatedly called until `StopIteration` is raised.
- **Custom Iterators:** You can create your own custom iterators by defining a class that implements the `__iter__()` and `__next__()` methods, allowing you to define custom iteration logic for your data structures or algorithms.
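
As a minimal sketch of a custom iterator, the hypothetical `Countdown` class below implements both protocol methods and counts down from a starting number to 1.

```python
class Countdown:
    """Iterator that yields start, start - 1, ..., 1 (illustrative example)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # signals that the sequence is exhausted
        value = self.current
        self.current -= 1
        return value


countdown_values = list(Countdown(3))
print(countdown_values)   # [3, 2, 1]

for n in Countdown(2):    # works directly in a for loop
    print(n)              # prints 2, then 1
```

Like any iterator, a `Countdown` instance is single-use: once exhausted, it yields nothing more.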

<div align="center">
<img src="https://drive.google.com/uc?id=1ynrzdj6cd1NWa4Y5AmyryWyo4UVAS3zw" width="550">

### Creating and Using Iterators
This cell introduces iterators, which are objects that represent a stream of data. Using `iter()` and `next()` allows you to process elements one by one, which can be memory efficient for large datasets.

In [None]:
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Create iterator
day_iter = iter(days)

# Get next elements
print(next(day_iter))                   # Sun
print(next(day_iter))                   # Mon
print(next(day_iter))                   # Tue
print(next(day_iter))                   # Wed
print(next(day_iter))                   # Thu
print(next(day_iter))                   # Fri
print(next(day_iter))                   # Sat

# Safe next with default value
print(next(day_iter, "No more days"))

Sun
Mon
Tue
Wed
Thu
Fri
Sat
No more days


### Iterator with Loop
This cell demonstrates how to use an iterator with a `while` loop. This pattern is useful when you need fine-grained control over the iteration process or when working with iterators that don't have a predefined length.

In [None]:
days = ["Sun", "Mon", "Tue", "Wed"]
day_iter = iter(days)

while True:
    day = next(day_iter, None)
    if day is None:
        break
    print(day)

Sun
Mon
Tue
Wed
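
One caveat with the `None` default used above: if the data itself could contain `None`, the loop would stop early. The sketch below shows two common alternatives, a unique sentinel object and catching `StopIteration` directly. The names `readings` and `_MISSING` are illustrative only.

```python
readings = [3, None, 7]  # data that legitimately contains None

# Option 1: a unique sentinel that cannot collide with real data
_MISSING = object()
seen_with_sentinel = []
it = iter(readings)
while True:
    item = next(it, _MISSING)
    if item is _MISSING:
        break
    seen_with_sentinel.append(item)

# Option 2: catch StopIteration explicitly
seen_with_except = []
it = iter(readings)
while True:
    try:
        item = next(it)
    except StopIteration:
        break
    seen_with_except.append(item)

print(seen_with_sentinel)  # [3, None, 7]
print(seen_with_except)    # [3, None, 7]
```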


## 1.9 List Comprehensions
---
- **Real Python**: [When to Use a List Comprehension in Python](https://realpython.com/list-comprehension-python/)
- **GeeksforGeeks**: [List Comprehension in Python](https://www.geeksforgeeks.org/python-list-comprehensions/)
- **Python-Course.eu**: [List Comprehension](https://python-course.eu/python-tutorial/list-comprehension.php)
- **Programiz**: [Python List Comprehension](https://www.programiz.com/python-programming/list-comprehension)

List comprehension in Python provides a concise and readable way to create lists based on existing iterables (like lists, tuples, strings, or ranges). It offers a more compact syntax compared to traditional for loops for generating new lists, often resulting in more efficient and elegant code.

**Basic Syntax:**
```Python
new_list = [expression for item in iterable if condition]
```
**Components:**

- **expression:** The operation or transformation to be applied to each `item`. This determines what value will be added to the `new_list`.
- **item:** A variable that represents each element from the `iterable` during the iteration.
- **iterable:** The source sequence (e.g., a list, tuple, string, or `range()`) from which elements are processed.
- **if condition (optional):** A filtering condition that, if present, determines whether an `item` should be included in the `new_list`. Only items for which the condition evaluates to `True` are processed by the `expression`.
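
To see how these components fit together, here is a small sketch that writes the same logic twice, once as a loop and once as a comprehension. The `words` list is made up for illustration.

```python
words = ["apple", "kiwi", "banana", "fig"]

# Loop version: filter first, then apply the expression
long_upper_loop = []
for word in words:                            # for item in iterable
    if len(word) > 4:                         # if condition
        long_upper_loop.append(word.upper())  # expression

# Comprehension version with the same three parts
long_upper = [word.upper() for word in words if len(word) > 4]

print(long_upper)                     # ['APPLE', 'BANANA']
print(long_upper == long_upper_loop)  # True
```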

<div align="center">
<img src="https://drive.google.com/uc?id=1kOVsTNwEjkVZsOyK4cn5U5m1q8z6CGYT" width="550">
<div align="center">
<img src="https://drive.google.com/uc?id=1OBPMJXhkuRx1mtgVOKIE0Z-JA80bT_M9" width="550">
<div align="center">
<img src="https://drive.google.com/uc?id=1p7FdqZZilOrh2tVlZn-jUwnCi6eB2uV0" width="350">

### Basic List Comprehensions
This cell introduces basic list comprehensions, a concise and Pythonic way to create new lists. They offer a more readable and often more efficient alternative to traditional `for` loops for simple list creation and transformation tasks.

In [None]:
# Basic syntax: [expression for item in iterable]
squares = [x**2 for x in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]

# With condition: [expression for item in iterable if condition]
evens = [x for x in range(1, 11) if x % 2 == 0]
print(evens)  # [2, 4, 6, 8, 10]

# Complex expressions
numbers = [1, 2, 3, 4, 5]
result = [x**2 if x % 2 == 0 else x**3 for x in numbers]
print(result)  # [1, 4, 27, 16, 125]

# Multiple conditions: chained `if` clauses behave like `and`
divided = [x for x in range(50) if x % 2 == 0 if x % 6 == 0]
print(divided)
divided = [x for x in range(50) if x % 2 == 0 and x % 6 == 0]
print(divided)
# Either condition (every multiple of 6 is already even)
divided = [x for x in range(50) if x % 2 == 0 or x % 6 == 0]
print(divided)
# Neither condition
divided = [x for x in range(50) if not (x % 2 == 0 or x % 6 == 0)]
print(divided)
# Even, but not a multiple of 6
divided = [x for x in range(50) if x % 2 == 0 and not x % 6 == 0]
print(divided)

[1, 4, 9, 16, 25]
[2, 4, 6, 8, 10]
[1, 4, 27, 16, 125]
[0, 6, 12, 18, 24, 30, 36, 42, 48]
[0, 6, 12, 18, 24, 30, 36, 42, 48]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49]
[2, 4, 8, 10, 14, 16, 20, 22, 26, 28, 32, 34, 38, 40, 44, 46]


### Temperature Conversion Example
This cell provides a practical example of using list comprehension for a common data transformation task: converting temperatures from Celsius to Fahrenheit. The concise syntax makes the conversion logic clear.

In [None]:
celsius = [0, 10, 20, 30, 40]
fahrenheit = [(temp * 9/5) + 32 for temp in celsius]
print(fahrenheit)  # [32.0, 50.0, 68.0, 86.0, 104.0]

# With condition
fahrenheit_filtered = [(temp * 9/5) + 32 for temp in celsius if temp > 10]
print(fahrenheit_filtered)  # [68.0, 86.0, 104.0]

[32.0, 50.0, 68.0, 86.0, 104.0]
[68.0, 86.0, 104.0]


### Nested List Comprehensions
This cell demonstrates nested list comprehensions, which are useful for working with nested data structures like matrices. They provide a compact way to create or manipulate multi-dimensional lists.

In [None]:
# Create a 3x3 matrix
matrix = [[i*j for j in range(1, 4)] for i in range(1, 4)]
print(matrix)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# Flatten a matrix
flattened = [num for row in matrix for num in row]
print(flattened)  # [1, 2, 3, 2, 4, 6, 3, 6, 9]

[[1, 2, 3], [2, 4, 6], [3, 6, 9]]
[1, 2, 3, 2, 4, 6, 3, 6, 9]
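
The order of the `for` clauses in a flattening comprehension can be confusing at first. As a rough sketch, the clauses read left to right in the same order as the equivalent nested loops. The 2x3 `grid` below is an illustrative example, and the transpose shows the other nesting style, where an inner comprehension builds each new row.

```python
grid = [[1, 2, 3], [4, 5, 6]]

# Nested loops: outer loop first, inner loop second
flattened_loop = []
for row in grid:
    for num in row:
        flattened_loop.append(num)

# Same order of for clauses in the comprehension
flattened = [num for row in grid for num in row]
print(flattened)  # [1, 2, 3, 4, 5, 6]

# Transpose: the inner comprehension collects one column per new row
transposed = [[row[col] for row in grid] for col in range(len(grid[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```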


## 1.10 Set Comprehensions
---
- **Real Python**: [Sets in Python](https://realpython.com/python-sets/)
- **GeeksforGeeks**: [Set Comprehensions in Python](https://www.geeksforgeeks.org/set-comprehensions-python/)
- **Python-Course.eu**: [Sets and Set Comprehensions](https://python-course.eu/python-tutorial/sets-and-set-comprehensions.php)
- **Programiz**: [Python Set Comprehension](https://www.programiz.com/python-programming/set-comprehension)

Set comprehension in Python offers a concise and efficient method for creating sets. It allows for the generation of a new set by iterating over an existing iterable (such as a list, tuple, or another set) and applying an expression and/or a conditional filter to its elements.

**Syntax:**
```Python
new_set = {expression for item in iterable if condition}
```

**Components:**

- `expression:` The operation or value to be included in the new set. This can be a transformation of the `item` or the `item` itself.
- `item:` A variable representing each element from the `iterable` during the iteration.
- `iterable:` The existing sequence or collection being iterated over.
- `if condition` (optional): A conditional statement that filters elements from the `iterable`. Only elements for which the condition evaluates to `True` are included in the new set.

**Key Characteristics:**

- *Conciseness:* Set comprehensions provide a more compact and readable way to create sets compared to traditional `for` loops with `add()` operations.
- *Uniqueness:* As sets inherently store only unique elements, any duplicate values generated by the comprehension will automatically be removed.
- *Order:* Sets are unordered collections; therefore, the order of elements in the resulting set is not guaranteed.
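
The characteristics above can be checked with a short sketch that compares a loop using `add()` with the equivalent set comprehension. The `words` list is illustrative, and `sorted()` is used only to get a predictable display order.

```python
words = ["Apple", "apple", "Banana", "BANANA", "cherry"]

# Loop with add(): duplicates are silently ignored by the set
lowered_loop = set()
for w in words:
    lowered_loop.add(w.lower())

# Equivalent set comprehension
lowered = {w.lower() for w in words}

print(lowered == lowered_loop)  # True
print(len(lowered))             # 3
print(sorted(lowered))          # ['apple', 'banana', 'cherry']
```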


### Basic Set Comprehensions
This cell introduces set comprehensions, which are similar to list comprehensions but create sets. They are useful for efficiently creating sets from other iterables, automatically handling the uniqueness of elements.

In [None]:
# Basic set comprehension
numbers = [1, 2, 2, 3, 3, 4, 5]
unique_squares = {x**2 for x in numbers}
print(unique_squares)  # {1, 4, 9, 16, 25}

# Set from string (unique characters)
text = "hello world"
unique_chars = {c.upper() for c in text if not c.isspace()}
print(unique_chars)  # e.g. {'H', 'E', 'L', 'O', 'W', 'R', 'D'} (set order may vary)

{1, 4, 9, 16, 25}
{'L', 'H', 'O', 'W', 'D', 'R', 'E'}


### Temperature Set Example
This cell provides an example of using set comprehension to quickly find the number of unique values resulting from an operation (temperature conversion), leveraging the set's property of storing only unique elements.

In [None]:
celsius_temps = [5, 10, 12, 14, 10, 23, 41, 30, 12, 24, 12, 18, 29]
fahrenheit_set = {(temp * 9/5) + 32 for temp in celsius_temps}
print(len(fahrenheit_set))  # Number of unique Fahrenheit temperatures

10


## 1.11 Dictionary Comprehensions
---
- **Real Python**: [Dictionaries in Python](https://realpython.com/python-dicts/)
- **GeeksforGeeks**: [Dictionary Comprehensions in Python](https://www.geeksforgeeks.org/python-dictionary-comprehension/)
- **Python-Course.eu**: [Dictionary Comprehension](https://python-course.eu/python-tutorial/dictionary-comprehensions.php)
- **Programiz**: [Python Dictionary Comprehension](https://www.programiz.com/python-programming/dictionary-comprehension)


Dictionary comprehension in Python provides a concise and elegant way to create dictionaries. It allows for the construction of a new dictionary by iterating over an existing iterable and applying an expression to generate key-value pairs.

**Syntax:**
```Python
new_dict = {key_expression: value_expression for item in iterable if condition}
```
**Components:**
- `{} (Curly braces):` Enclose the dictionary comprehension, indicating that a dictionary is being created.
- `key_expression:value_expression:` Defines how the key and value for each item in the new dictionary are generated. Both can be expressions that operate on `item`.
- `for item in iterable:` Specifies the iterable (e.g., list, tuple, set, another dictionary) from which items are processed.
- `if condition` (Optional): An optional conditional statement that filters items from the iterable. Only items for which the condition evaluates to `True` are included in the new dictionary.

Dictionary comprehensions offer a more compact and readable alternative to traditional for loops for creating and manipulating dictionaries, especially when dealing with transformations or filtering operations.
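
As a brief sketch of those transformation and filtering cases, the example below builds a dictionary from two lists, filters it, and swaps keys with values. The `names` and `scores` data are invented for illustration.

```python
names = ["alice", "bob", "carol"]
scores = [82, 47, 91]

# Pair two lists into a dictionary
score_by_name = {name: score for name, score in zip(names, scores)}
print(score_by_name)  # {'alice': 82, 'bob': 47, 'carol': 91}

# Transform keys and filter on values in one pass
passed = {name.title(): score for name, score in score_by_name.items() if score >= 50}
print(passed)  # {'Alice': 82, 'Carol': 91}

# Swap keys and values (assumes the values are unique and hashable)
name_by_score = {score: name for name, score in score_by_name.items()}
print(name_by_score)  # {82: 'alice', 47: 'bob', 91: 'carol'}
```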

### Basic Dictionary Comprehensions
This cell introduces dictionary comprehensions, a concise way to create dictionaries. They are useful for efficiently building dictionaries where the key-value pairs can be generated from an iterable based on expressions.

In [None]:
# Basic syntax: {key_expr: value_expr for item in iterable}
squares_dict = {x: x**2 for x in range(1, 6)}
print(squares_dict)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Temperature conversion dictionary
celsius = [0, 10, 20, 30, 40]
temp_dict = {c: (c * 9/5) + 32 for c in celsius}
print(temp_dict)  # {0: 32.0, 10: 50.0, 20: 68.0, 30: 86.0, 40: 104.0}

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
{0: 32.0, 10: 50.0, 20: 68.0, 30: 86.0, 40: 104.0}


### Dictionary Comprehension with Conditions
This cell demonstrates how to use dictionary comprehension with a conditional statement to filter key-value pairs from an existing dictionary based on their values. This is a concise way to create a subset of a dictionary.

In [None]:
# Filter dictionary
original_dict = {"NFLX": 4950, "TREX": 2400, "FIZZ": 1800, "XPO": 1700}
filtered_dict = {k: v for k, v in original_dict.items() if v > 2000}
print(filtered_dict)  # {'NFLX': 4950, 'TREX': 2400}

{'NFLX': 4950, 'TREX': 2400}


### Working with Dictionary Methods
This cell shows how to iterate over a dictionary using `.items()` to access both keys and values simultaneously, and how to get only the keys or only the values using `.keys()` and `.values()`. These methods return view objects rather than lists, so the example wraps them in `list()` to display them.

In [None]:
sample_dict = {'a': 1, 'b': 2, 'c': 3}

# Iterate over dictionary
for key, value in sample_dict.items():
    print(f"{key}: {value}")

# Get keys and values
print(list(sample_dict.keys()))     # ['a', 'b', 'c']
print(list(sample_dict.values()))   # [1, 2, 3]

a: 1
b: 2
c: 3
['a', 'b', 'c']
[1, 2, 3]
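
One detail worth knowing is that `.keys()`, `.values()`, and `.items()` return live views of the dictionary, while `list()` takes a snapshot. A small sketch, using an illustrative `inventory` dictionary:

```python
inventory = {"a": 1, "b": 2}

keys_view = inventory.keys()     # a view, not a list
keys_snapshot = list(inventory)  # an independent copy of the keys

inventory["c"] = 3               # modify the dictionary afterwards

print(list(keys_view))   # ['a', 'b', 'c']  (the view sees the new key)
print(keys_snapshot)     # ['a', 'b']       (the snapshot does not)
print("c" in keys_view)  # True
```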


## 1.12 Lambda Functions
---
- **Real Python**: [Python lambda Functions](https://realpython.com/python-lambda/)
- **GeeksforGeeks**: [Python Lambda, filter, map, reduce](https://www.geeksforgeeks.org/python-lambda-anonymous-functions-filter-map-reduce/)
- **Python-Course.eu**: [Lambda, filter, map and reduce](https://python-course.eu/python-tutorial/lambda-filter-map-reduce.php)
- **Programiz**: [Python Lambda/Anonymous Function](https://www.programiz.com/python-programming/anonymous-function)

[Lambda](https://www.datacamp.com/tutorial/python-lambda-functions) functions in Python are small, anonymous functions defined using the `lambda` keyword. They are called anonymous because, unlike regular functions, they do not need a `def` statement or a name.

**Key characteristics of Lambda Functions:**

- `Anonymous:` They are not bound to a name unless explicitly assigned to a variable.
- `Single Expression:` A lambda function can take any number of arguments but can only have one expression. The result of this expression is automatically returned.
- `Concise:` They are typically used for short, simple operations that can be expressed in a single line of code.
- `Inline Usage:` They are frequently used as arguments to higher-order functions like `map()`, `filter()`, and `sorted()`, where a small, temporary function is needed without the overhead of defining a full function.

**Syntax:**
```Python
lambda arguments: expression
```
**When to use Lambda Functions:**

Lambda functions are convenient when:

- A small, simple function is needed for a one-time operation.
- A function is required as an argument to a higher-order function, such as `map()`, `filter()`, or `sorted()`.
- Defining a full `def` function would be overly verbose for the task at hand.

**Limitations:**

- `Single Expression:` They cannot contain multiple statements or complex logic.
- `Readability:` Overuse of complex lambda functions can sometimes decrease code readability, especially for those unfamiliar with functional programming concepts.
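
To make the comparison with `def` concrete, here is a short sketch with illustrative function names. It shows that a lambda and a named function behave the same when called, that a lambda's `__name__` is the generic `<lambda>`, and that a conditional expression still counts as a single expression.

```python
# A named function and an equivalent lambda
def area_def(width, height):
    return width * height

area_lambda = lambda width, height: width * height

print(area_def(3, 4), area_lambda(3, 4))  # 12 12
print(area_def.__name__)                  # area_def
print(area_lambda.__name__)               # <lambda>

# A conditional expression is a single expression, so it is allowed
sign = lambda n: "negative" if n < 0 else "non-negative"
print(sign(-2))  # negative
print(sign(0))   # non-negative
```

When a lambda ends up being assigned to a name like this, a regular `def` is usually the clearer choice; lambdas are most useful when passed inline as an argument.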

<div align="center">
<img src="https://drive.google.com/uc?id=1CyhDPSwoXQwsC19Nt-o3LBMgsdNkex9z" height="350" width="550">

### Basic Lambda Syntax
This cell introduces lambda functions, which are small, anonymous functions defined with the `lambda` keyword. They are typically used for short, simple operations where a full `def` function definition is unnecessary.

In [None]:
# Basic lambda: lambda arguments: expression
add = lambda x, y: x + y
print(add(5, 3))  # 8

# Single argument
square = lambda x: x**2
print(square(4))  # 16

# Multiple arguments
multiply = lambda x, y, z: x * y * z
print(multiply(2, 3, 4))  # 24

8
16
24


### Lambda with Built-in Functions
This cell demonstrates the common use of lambda functions with `filter()` and `map()`. Lambda functions are convenient for providing simple, on-the-fly functions for these built-in functions, making the code more concise.
- `map():` Applies a function to each item in an iterable and returns a map object (which can be converted to a list, tuple, etc.).
- `filter():` Constructs an iterator from elements of an iterable for which a function returns true.
- `sorted():` Returns a new sorted list from the items in an iterable. The key argument can be a function (often a lambda) to customize the sorting criteria.

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Filter with lambda
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4, 6, 8, 10]

# Map with lambda
squares = list(map(lambda x: x**2, numbers))
print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# Multiple outputs with map: each result is a (square, cube) tuple
squares_and_cubes = list(map(lambda x: (x**2, x**3), [1, 2, 3, 4]))
print(squares_and_cubes)  # [(1, 1), (4, 8), (9, 27), (16, 64)]

[2, 4, 6, 8, 10]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
[(1, 1), (4, 8), (9, 27), (16, 64)]


### Lambda for Sorting
This cell shows how to use a lambda function as the `key` argument for the `list.sort()` method, which sorts the list in place. This allows you to specify a custom sorting criterion based on a particular element or calculation for each item in the list.

In [None]:
# Sort by second element
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda x: x[1])  # Sort by string value
print(pairs)

# Sort by first element in reverse
pairs.sort(key=lambda x: x[0], reverse=True)
print(pairs)

[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
[(4, 'four'), (3, 'three'), (2, 'two'), (1, 'one')]
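
The built-in `sorted()` accepts the same `key` and `reverse` arguments but returns a new list instead of modifying the original. A minimal sketch, using a made-up `employees` list, that also sorts on two fields by returning a tuple from the lambda:

```python
employees = [("Dana", 31), ("Ali", 25), ("Chen", 31), ("Bea", 25)]

# sorted() returns a new list; sort by age, then by name
by_age_then_name = sorted(employees, key=lambda e: (e[1], e[0]))
print(by_age_then_name)
# [('Ali', 25), ('Bea', 25), ('Chen', 31), ('Dana', 31)]

# Oldest first; ties keep their original relative order (the sort is stable)
oldest_first = sorted(employees, key=lambda e: e[1], reverse=True)
print(oldest_first)
# [('Dana', 31), ('Chen', 31), ('Ali', 25), ('Bea', 25)]

# The original list is unchanged
print(employees[0])  # ('Dana', 31)
```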


### Higher-Order Functions with Lambda
This cell demonstrates a higher-order function that returns a lambda function. This pattern allows you to create factory functions that generate other functions with specific configurations, showcasing the flexibility of lambda functions.

In [None]:
def make_incrementor(n):
    return lambda x: x + n

increment_by_5 = make_incrementor(5)
print(increment_by_5(10))  # 15

15
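
A factory function like this also sidesteps a common pitfall when lambdas are created in a loop or comprehension: each lambda looks up the loop variable when it is called, not when it is defined. The sketch below illustrates the difference; `make_adder` is a hypothetical helper in the same style as `make_incrementor`.

```python
# Pitfall: every lambda reads n at call time, after the loop has finished
adders_late = [lambda x: x + n for n in range(3)]
late_results = [f(10) for f in adders_late]
print(late_results)  # [12, 12, 12]

# A factory function gives each lambda its own n
def make_adder(n):
    return lambda x: x + n

adders = [make_adder(n) for n in range(3)]
results = [f(10) for f in adders]
print(results)  # [10, 11, 12]
```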


## 1.13 Map and Filter Functions
---
- **Real Python**: [Functional Programming in Python: When and How to Use It](https://realpython.com/python-functional-programming/)
- **GeeksforGeeks**: [Python map() function](https://www.geeksforgeeks.org/python-map-function/) & [Python filter() function](https://www.geeksforgeeks.org/filter-in-python/)
- **Python-Course.eu**: [Map, Filter and Reduce](https://python-course.eu/python-tutorial/lambda-filter-map-reduce.php)
- **Programiz**: [Python map()](https://www.programiz.com/python-programming/methods/built-in/map) & [Python filter()](https://www.programiz.com/python-programming/methods/built-in/filter)



### Map Function
The `map()` function in Python is a built-in higher-order function that applies a given function to each item of an iterable (or multiple iterables) and returns a map object, which is an iterator.

**Syntax:**
```Python
map(function, iterable, ...)
```
**Parameters:**
- `function:` The function to be applied to each item of the iterable(s). This function can be a named function or an anonymous lambda function.
- `iterable:` One or more iterables (like lists, tuples, strings, etc.) whose elements will be passed as arguments to the function. If multiple iterables are provided, the `function` must accept a corresponding number of arguments, and `map()` will iterate through them in parallel until one of the iterables is exhausted.

**Return Value:**

The `map()` function returns a map object. This object is an iterator that yields the results of applying the `function` to each item of the iterable(s). To view the results as a list, tuple, or other sequence type, you need to explicitly convert the map object using constructors like `list()`, `tuple()`, etc.

**How it Works:**

`map()` iterates through the provided iterable(s) and, for each element (or corresponding elements from multiple iterables), it calls the specified function with those elements as arguments. The return value of each function call is then yielded by the `map` object.

**Benefits:**

- `Conciseness:` It allows for applying transformations to data in a more compact and readable way compared to traditional for loops.
- `Efficiency:` For large datasets, `map()` can be more memory-efficient as it returns an iterator, processing elements on demand rather than creating an entirely new list in memory at once.
- `Functional Programming:` It aligns with functional programming paradigms by emphasizing the application of functions to data without modifying the original data structures.
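
The on-demand behavior described above can be observed directly. In this sketch, the hypothetical `double` function records each call in a `calls` list so that it is visible when the work actually happens. The last lines show that `map()` stops at the shortest iterable.

```python
calls = []

def double(x):
    calls.append(x)  # record each call to show when work happens
    return x * 2

lazy = map(double, [1, 2, 3])
print(calls)  # []  (nothing has been computed yet)

first = next(lazy)
print(first, calls)  # 2 [1]

rest = list(lazy)
print(rest)        # [4, 6]
print(list(lazy))  # []  (a map object can be consumed only once)

# With several iterables, map() stops at the shortest one
pair_sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20]))
print(pair_sums)  # [11, 22]
```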

<div align="center">
<img src="https://drive.google.com/uc?id=1SLuW2p0RpvZNn8R5hP3d2wC5OE0JS3TL" height="350" width="550">
<div align="center">
<img src="https://drive.google.com/uc?id=1LCuIShrOGSUsTAwL-3QIBcJE40u2Wt2B" height="350" width="550">


In [None]:
# Apply function to each element
numbers = [1, 2, 3, 4, 5]
doubled = list(map(lambda x: x * 2, numbers))
print(doubled)  # [2, 4, 6, 8, 10]

# Map with multiple iterables
list1 = [1, 2, 3]
list2 = [4, 5, 6]
sums = list(map(lambda x, y: x + y, list1, list2))
print(sums)  # [5, 7, 9]

[2, 4, 6, 8, 10]
[5, 7, 9]


### Filter Function
The `filter()` function in Python is a built-in function used to construct an iterator from elements of an iterable for which a function returns true. It offers a concise and functional way to select a subset of data based on a specific condition.

**Syntax:**
```Python
filter(function, iterable)
```

**Arguments:**
- `function:` A function that takes a single argument and returns a boolean value (`True` or `False`). This function determines whether an element should be included in the filtered output. If `None` is passed as the function, the identity function is assumed, and all "falsy" elements (e.g., `0`, `False`, `None`, empty strings, empty lists) in the iterable are skipped.
- `iterable:` An iterable object (e.g., list, tuple, set, string) whose elements are to be filtered.

**Return Value:**

The `filter()` function returns a `filter` object, which is an iterator. This iterator yields only the elements from the input iterable for which the `function` returned a "truthy" value. To view or use the filtered results, the filter object typically needs to be converted into another data structure, such as a list or tuple, using `list()` or `tuple()`.

**Example:**
```Python
def is_even(number):
    return number % 2 == 0

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(is_even, numbers))
print(even_numbers) # [2, 4, 6, 8, 10]
```
In this example, `filter()` applies the `is_even` function to each number in the `numbers` list. Only the numbers for which `is_even` returns `True` (i.e., the even numbers) are included in the `even_numbers` list. `filter()` is particularly useful for data manipulation and cleaning tasks, allowing for efficient and readable code when selecting elements based on a condition.

In [None]:
# Filter elements based on condition
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4, 6, 8, 10]

# Filter strings by length
words = ['python', 'java', 'c', 'javascript', 'go']
long_words = list(filter(lambda word: len(word) > 4, words))
print(long_words)  # ['python', 'javascript']

[2, 4, 6, 8, 10]
['python', 'javascript']
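
The `None` case mentioned under **Arguments** is handy for cleaning data. A short sketch with an illustrative `mixed` list, alongside the equivalent list comprehension:

```python
mixed = [0, 1, "", "text", None, [], [0], False, True]

# filter(None, ...) keeps only truthy elements
truthy = list(filter(None, mixed))
print(truthy)  # [1, 'text', [0], True]

# The equivalent list comprehension
truthy_comp = [item for item in mixed if item]
print(truthy == truthy_comp)  # True
```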


## 1.14 Collections Module
---
- **Real Python**: [Python's collections: A Buffet of Specialized Data Types](https://realpython.com/python-collections-module/)
- **GeeksforGeeks**: [Python Collections Module](https://www.geeksforgeeks.org/python-collections-module/)
- **Python.org**: [collections — Container datatypes](https://docs.python.org/3/library/collections.html)
- **PyMOTW**: [collections – Container Data Types](https://pymotw.com/3/collections/)

The `collections` module in Python is a part of the standard library that provides specialized container datatypes. These datatypes extend the functionality of Python's built-in containers like `list`, `dict`, `set`, and `tuple`, offering more efficient and convenient ways to handle specific data organization and manipulation tasks.

**Key classes provided by the `collections` module include:**

- `Counter:` Used for counting hashable objects. It creates a dictionary-like object where keys are elements and values are their counts.
- `defaultdict:` A subclass of dict that provides a default value for a key if it's not present in the dictionary, preventing `KeyError`.
- `deque:` A double-ended queue that supports fast appends and pops from both ends of the sequence, making it efficient for implementing queues and stacks.
- `namedtuple:` A factory function for creating tuple subclasses with named fields. This allows accessing elements by name instead of just by index, improving code readability.
- `OrderedDict:` A dictionary subclass that remembers the order in which its key-value pairs were inserted. (Note: In Python 3.7+, standard `dict` also preserves insertion order, but `OrderedDict` provides additional methods for reordering.)
- `ChainMap:` A class for combining multiple dictionaries into a single view. It allows searching across multiple mappings and updates only the first mapping in the chain.

These specialized containers in the collections module offer optimized solutions for common programming problems, leading to more concise, readable, and efficient code.
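
The subsections below cover `Counter` and `defaultdict` in detail. For the other classes listed above, here is a brief sketch of typical usage; the variable names and data are illustrative only.

```python
from collections import ChainMap, OrderedDict, deque, namedtuple

# deque: fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.append(4)          # add on the right
queue.appendleft(0)      # add on the left
last = queue.pop()       # removes 4
first = queue.popleft()  # removes 0
print(queue)             # deque([1, 2, 3])

# namedtuple: tuple fields accessible by name or by index
Point = namedtuple("Point", ["x", "y"])
p = Point(x=10, y=20)
print(p.x, p[1])         # 10 20

# OrderedDict: explicit reordering
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")
print(list(od))          # ['b', 'c', 'a']

# ChainMap: lookups search each mapping in order; writes go to the first
defaults = {"theme": "light", "lang": "en"}
overrides = {"theme": "dark"}
settings = ChainMap(overrides, defaults)
print(settings["theme"], settings["lang"])  # dark en
settings["lang"] = "fr"
print(overrides)         # {'theme': 'dark', 'lang': 'fr'}
print(defaults)          # {'theme': 'light', 'lang': 'en'}
```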

### Counter
The `collections.Counter` class in Python is a specialized container data type provided by the `collections` module. It is a subclass of the built-in dict class, designed specifically for counting hashable objects.

**Key Features and Functionality:**

- **Counting Occurrences:** `Counter` is primarily used to count the occurrences of elements within an iterable (like a list, string, or tuple) or to store counts from a mapping (like a dictionary).
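- **Dictionary-like Behavior:** As a `dict` subclass, it supports the usual dictionary operations, allowing you to access counts using keys, iterate through items, and perform updates. Two methods differ from a plain `dict`: `fromkeys()` is not implemented, and `update()` adds to existing counts instead of replacing them.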
- **Default Values:** When accessing a key that is not present in a `Counter` object, it returns a default value of 0 instead of raising a `KeyError`, which is convenient for handling missing data.
- **`most_common()` Method:** It provides a `most_common(n)` method that returns a list of the `n` most common elements and their counts, ordered from most to least common.
- **Arithmetic Operations:** `Counter` objects support arithmetic operations like addition, subtraction, intersection, and union, allowing you to combine or compare counts from different `Counter` instances.
- **Negative Counts:** `Counter` objects can store negative counts, which can be useful in certain scenarios.

In [None]:
from collections import Counter

# Count occurrences
text = "hello world"
char_count = Counter(text)
print(char_count)  # Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

# Most common elements
print(char_count.most_common(3))  # [('l', 3), ('o', 2), ('h', 1)]

# Counter operations
list1 = ['a', 'b', 'c', 'a', 'b', 'a']
list2 = ['a', 'b', 'b', 'd']
c1 = Counter(list1)
c2 = Counter(list2)

print(c1 + c2)  # Add counters
print(c1 - c2)  # Subtract counters
print(c1 & c2)  # Intersection (minimum counts)
print(c1 | c2)  # Union (maximum counts)

Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})
[('l', 3), ('o', 2), ('h', 1)]
Counter({'a': 4, 'b': 4, 'c': 1, 'd': 1})
Counter({'a': 2, 'c': 1})
Counter({'b': 2, 'a': 1})
Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
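
The default-value and negative-count features listed earlier are not shown in the cell above. A small sketch of both, using an illustrative `stock` counter:

```python
from collections import Counter

stock = Counter(apples=3, pears=1)

# Missing keys report 0 and are not added to the counter
print(stock["kiwis"])    # 0
print("kiwis" in stock)  # False

# subtract() keeps zero and negative counts
stock.subtract({"apples": 1, "pears": 2})
print(stock["apples"], stock["pears"])  # 2 -1

# The - operator drops any count that is not positive
positive_only = stock - Counter()
print(positive_only)     # Counter({'apples': 2})

# update() adds counts instead of replacing them
stock.update(["pears", "pears"])
print(stock["pears"])    # 1
```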


### DefaultDict
The `defaultdict` is a specialized dictionary-like object available in Python's `collections` module. It is a subclass of the built-in `dict` class and offers a convenient way to handle missing keys in a dictionary.

**Key Features of `defaultdict`:**

**Default Value for Missing Keys:** Unlike a standard `dict` which raises a `KeyError` when attempting to access a non-existent key, `defaultdict` automatically creates a new entry for that key and assigns a default value to it.

**`default_factory` Argument:** When initializing a `defaultdict`, you provide a `default_factory` argument. This argument is a callable (like a function, class, or `lambda`) that will be invoked without arguments to produce the default value for any key that is accessed but not yet present in the dictionary.
- **Common `default_factory` examples:**
  - `list`: Creates an empty list as the default value, useful for grouping items.
  - `int`: Creates an integer with a value of `0` as the default, useful for counting occurrences.
  - `set`: Creates an empty set as the default, useful for storing unique items.
  - Custom functions: Can be used to create more complex default values based on specific logic.

**Benefits of using `defaultdict`:**
- **Simplified Code:** Eliminates the need for explicit `if key in dictionary:` checks before adding or modifying values for new keys.
- **Improved Readability:** Makes the code cleaner and more concise, especially when dealing with data aggregation or grouping tasks.
- **Reduced Error Handling:** Prevents `KeyError` exceptions, leading to more robust code.

In [None]:
from collections import defaultdict

# Default dictionary with default values
dd = defaultdict(int)  # Default value is 0
dd['existing'] = 5
print(dd['existing'])     # 5
print(dd['non_existing']) # 0 (default)

# Default dictionary with lambda
dd_lambda = defaultdict(lambda: "Not found")
dd_lambda['key1'] = "Found"
print(dd_lambda['key1'])        # Found
print(dd_lambda['unknown'])     # Not found

# Grouping with defaultdict
students = [('Alice', 'Math'), ('Bob', 'Physics'), ('Alice', 'Chemistry'), ('Bob', 'Math')]
subjects_by_student = defaultdict(list)

for student, subject in students:
    subjects_by_student[student].append(subject)

print(dict(subjects_by_student))  # {'Alice': ['Math', 'Chemistry'], 'Bob': ['Physics', 'Math']}

5
0
Found
Not found
{'Alice': ['Math', 'Chemistry'], 'Bob': ['Physics', 'Math']}
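
Because a missing key is created on access, simply reading `dd[key]` changes the dictionary. The sketch below, with an illustrative `visits` counter, shows that `.get()` and `in` do not trigger the default factory while indexing does.

```python
from collections import defaultdict

visits = defaultdict(int)
visits["home"] += 1

# .get() and `in` do not call the default factory
print(visits.get("about"))  # None
print("about" in visits)    # False
size_before = len(visits)
print(size_before)          # 1

# Indexing a missing key calls the factory and stores the result
print(visits["about"])      # 0
print("about" in visits)    # True
size_after = len(visits)
print(size_after)           # 2
```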


## 1.15 String Templates
---
- **Real Python**: [Python String Formatting: Available Tools and Their Features](https://realpython.com/python-string-formatting/)
- **GeeksforGeeks**: [Template Class in Python](https://www.geeksforgeeks.org/template-class-in-python/)
- **Python.org**: [string — Common string operations](https://docs.python.org/3/library/string.html#template-strings)
- **Python-Course.eu**: [String Templates](https://python-course.eu/python-tutorial/string-class.php)

Python's string templates, specifically the `Template` class from the `string` module, offer a way to perform string substitutions using a simple, `$`-based syntax. This method is particularly useful for scenarios requiring user-provided input or for creating templates where safety and ease of use are paramount.

**Key Features and Usage:**

**Importing the `Template` class:**
```Python
from string import Template
```
**Creating a Template:** Define a string with placeholders marked by a dollar sign (`$`) followed by a variable name, or `${variable_name}` for clarity or when the variable name is followed by more characters.
```Python
s = Template('Hello, $name! Your age is $age.')
```
**Substituting Values:** Use the `substitute()` method to replace placeholders with actual values, provided as a dictionary or as keyword arguments.
```Python
data = {'name': 'Alice', 'age': 30}
result = s.substitute(data)
print(result)
# Output: Hello, Alice! Your age is 30.
```

**Safe Substitution with `safe_substitute()`:** The `safe_substitute()` method handles cases where a placeholder's value is missing in the provided dictionary without raising a `KeyError`. Instead, the placeholder remains in the string.
```Python
s_safe = Template('Welcome, $user. Your ID is $id.')
data_safe = {'user': 'Bob'}
result_safe = s_safe.safe_substitute(data_safe)
print(result_safe)
# Output: Welcome, Bob. Your ID is $id.
```

**Advantages of Template Strings:**
- **Simplicity:** The `$`-based syntax is straightforward and easy to understand.
- **Security:** `Template` strings offer a safer way to handle user-provided input compared to f-strings or `.format()` when dealing with untrusted data, as they do not evaluate arbitrary expressions.
- **Internationalization (i18n):** They are well-suited for internationalization purposes where text needs to be easily translated and adapted.
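
Two details of the syntax are easy to miss: the `${name}` form is needed when a placeholder is immediately followed by more text, and `$$` produces a literal dollar sign. The sketch below uses illustrative template strings and also shows the `KeyError` that `substitute()` raises for a missing value.

```python
from string import Template

# ${...} separates the placeholder from the text that follows; $$ is a literal $
price_line = Template("${item}s cost $$${price} each")
formatted = price_line.substitute(item="apple", price="0.50")
print(formatted)  # apples cost $0.50 each

# substitute() raises KeyError when a placeholder has no value
greeting = Template("Hello, $name!")
try:
    greeting.substitute()
except KeyError as exc:
    missing = exc.args[0]
    print(f"Missing placeholder: {missing}")  # Missing placeholder: name
```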

Note: Python 3.14 introduced a new concept also referred to as "Template Strings" or "T-Strings," which are a generalization of f-strings with a `t` prefix and different evaluation semantics, primarily aimed at addressing security concerns in specific contexts. However, the `Template` class from the `string` module discussed above remains a distinct and widely used feature for string templating in Python.

<div align="center">
<img src="https://drive.google.com/uc?id=18WIsvhOXo7vhq4oXnUly-HhRJJUtlF1T" height="350" width="550">


### Basic Templates
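This cell demonstrates using `string.Template` for simple string substitution. It is a safer choice than `str.format()` or f-strings when the template text itself comes from users, because placeholders are only replaced by name and never evaluated as expressions, and it is suitable for basic templating needs.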

In [None]:
from string import Template

# Basic template
template = Template("Hello $name, welcome to $place!")
result = template.substitute(name="Alice", place="Python")
print(result)  # Hello Alice, welcome to Python!

# Template with dictionary
data = {"name": "Bob", "age": 30, "city": "New York"}
template = Template("$name is $age years old and lives in $city")
result = template.substitute(data)
print(result)  # Bob is 30 years old and lives in New York

Hello Alice, welcome to Python!
Bob is 30 years old and lives in New York


### Safe Substitution
This cell shows how to use the `safe_substitute()` method of `string.Template`. This method is useful when you are not certain that all placeholders will be provided, as it avoids raising a `KeyError` and leaves missing placeholders in the output string.

In [None]:
template = Template("Hello $name, you have $count messages")
# safe_substitute won't raise error for missing keys
result = template.safe_substitute(name="Alice")
print(result)  # Hello Alice, you have $count messages

Hello Alice, you have $count messages


## 1.16 Tuples
---
- **Real Python**: [Python's tuple Data Type: A Deep Dive With Examples](https://realpython.com/python-tuples/)
- **GeeksforGeeks**: [Python Tuples](https://www.geeksforgeeks.org/python-tuples/)
- **Programiz**: [Python Tuple](https://www.programiz.com/python-programming/tuple)
- **Python-Course.eu**: [Tuples](https://python-course.eu/python-tutorial/tuples.php)

A Python tuple is an ordered, immutable collection used to store multiple items in a single variable.

**Key characteristics of Python tuples:**

- **Ordered:** The items within a tuple maintain a defined order, which does not change after creation.
- **Immutable:** Once a tuple is created, its elements cannot be modified, added, or removed. This distinguishes tuples from lists, which are mutable.
- **Indexed:** Individual elements can be accessed by their position (index), starting from 0.
- **Allows duplicates:** Tuples can contain duplicate values.
- **Heterogeneous:** Tuples can store items of different data types.

**Creating a Python tuple:**
Tuples are typically created by placing comma-separated items inside parentheses `()`.
```Python
# Creating a tuple with multiple items
my_tuple = (1, "hello", 3.14, True)
print(my_tuple)

# Creating a tuple with a single item (note the comma)
single_item_tuple = (5,)
print(single_item_tuple)
```
**Common uses of tuples:**
- Returning multiple values from a function.
- Representing fixed collections of data that should not be modified, such as coordinates or days of the week.
- As dictionary keys, provided all elements within the tuple are themselves hashable (for example numbers, strings, or other such tuples).
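
Two of the uses listed above can be sketched in a few lines. The function `min_max` and the `grid` dictionary are hypothetical names chosen for illustration.

```python
def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)

# The returned tuple is unpacked into two variables
lowest, highest = min_max([4, 9, 1, 7])
print(lowest, highest)

# Tuples of hashable items can serve as dictionary keys
grid = {(0, 0): "origin", (1, 2): "treasure"}
print(grid[(1, 2)])
```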



### Tuple Basics
This cell covers the basics of creating and using tuples. Tuples are immutable ordered sequences, useful for grouping related data that should not be changed. Tuple unpacking is a convenient way to assign elements to individual variables.

In [None]:
# Tuple creation
coordinates = (10, 20)
colors = ('red', 'green', 'blue')
mixed = (1, 'hello', 3.14, True, 33)
my_tuple = (1, 2, 3, 4, 5)

# Tuple unpacking
x, y = coordinates
print(f"x={x}, y={y}")
a, *b, c = mixed
print(a)
print(b)
print(c)

# Tuple with one element (note the comma)
single = (42,)
print(type(single))

# Accessing elements
print(my_tuple[0])  # Access first element
print(my_tuple[-1]) # Access last element

# Tuple slicing
print(my_tuple[1:3]) # Slice from index 1 to 3 (exclusive)
print(my_tuple[:2])  # Slice from beginning to index 2 (exclusive)
print(my_tuple[3:])  # Slice from index 3 to the end
print(my_tuple[::-1]) # Reverse the tuple

# Tuple concatenation
tuple1 = (1, 2)
tuple2 = (3, 4)
combined_tuple = tuple1 + tuple2
print(combined_tuple)

# Attempting to modify tuple (will raise an error)
# my_tuple[0] = 10 # Uncommenting this line will cause a TypeError

# Attempting to delete an element (will raise an error)
# del my_tuple[0] # Uncommenting this line will cause a TypeError

# Deleting the entire tuple (is possible)
# del my_tuple # Uncommenting this line will delete the variable

x=10, y=20
1
['hello', 3.14, True]
33
<class 'tuple'>
1
5
(2, 3)
(1, 2)
(4, 5)
(5, 4, 3, 2, 1)
(1, 2, 3, 4)


### Tuple Immutability
This cell demonstrates the immutability of tuples. Unlike lists, once a tuple is created, its elements cannot be changed. This property makes tuples suitable for data that needs to remain constant.

In [None]:
point = (3, 4)
# point[0] = 5  # This would raise an error - tuples are immutable

# But you can reassign the variable
point = (5, 6)  # This is allowed
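
One subtlety worth a small sketch: immutability applies to the tuple's own slots, not to the objects stored in them. The example below, with an illustrative `record` tuple, shows that a list inside a tuple can still be changed and that such a tuple cannot be hashed.

```python
record = ("Alice", [85, 92])

# The tuple's slots cannot be reassigned...
try:
    record[1] = [100]
except TypeError as exc:
    print(f"TypeError: {exc}")

# ...but a mutable object stored in a slot can still change
record[1].append(78)
print(record)

# A tuple that contains a list is not hashable, so it cannot be a dict key
try:
    hash(record)
    hashable = True
except TypeError:
    hashable = False
print(hashable)
```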

### Named Tuples
This cell introduces `collections.namedtuple`, which provides a way to create tuple subclasses with named fields. This improves code readability by allowing access to elements using descriptive names instead of just integer indices.

In [None]:
from collections import namedtuple

# Create a named tuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p.y)  # 3 4
print(p[0], p[1])  # 3 4 (still accessible by index)

# Named tuple with more fields
Student = namedtuple('Student', ['name', 'age', 'grade'])
student = Student('Alice', 20, 'A')
print(student.name)  # Alice

3 4
3 4
Alice
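
Named tuples also come with a few helper attributes and methods. The sketch below reuses the `Student` definition from the cell above and shows `_fields`, `_asdict()`, and `_replace()`; the variables `as_dict` and `promoted` are illustrative names.

```python
from collections import namedtuple

Student = namedtuple("Student", ["name", "age", "grade"])
student = Student("Alice", 20, "A")

# Field names, in order
print(Student._fields)

# Convert to a dictionary keyed by field name
as_dict = student._asdict()
print(as_dict)

# _replace returns a new tuple; the original is unchanged
promoted = student._replace(grade="A+")
print(promoted)
print(student.grade)
```

The leading underscore here avoids clashes with user-defined field names; these are part of the public `namedtuple` interface.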


## 1.17 Advanced List Operations
---
- **GeeksforGeeks**: [Advanced Python List Methods and Techniques](https://www.geeksforgeeks.org/advanced-python-list-methods-and-techniques/)
- **Real Python**: [Working With Lists in Python](https://realpython.com/python-lists-tuples/)
- **Python Tricks**: [Advanced List Operations](https://pythontricks.com/python-list-operations/)

### Finding Elements
This cell shows advanced examples of using list comprehensions to find elements based on specific conditions or patterns. List comprehensions provide a concise and often efficient way to filter or transform lists.

In [None]:
numbers = [1, 3, 5, 7, 9, 11, 13, 15]

# Find numbers containing digit '3'
contains_three = [i for i in range(1, 1000) if '3' in str(i)]
print(contains_three[:10])  # First 10 numbers containing '3'

# Find multiples of 7
multiples_of_seven = [i for i in range(1, 100) if i % 7 == 0]
print(multiples_of_seven)

# Count spaces in string
text = "Hello world this is Python"
space_count = len([char for char in text if char == ' '])
print(space_count)  # 4

[3, 13, 23, 30, 31, 32, 33, 34, 35, 36]
[7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98]
4


### Complex List Comprehensions
This cell demonstrates more complex list comprehensions, including combining them with `zip()` to process elements from multiple lists together and using multiple conditional statements for more specific filtering.

In [None]:
# Comprehension over paired values with zip
base = [5, 10, 15, 20, 25]
height = [5, 4, 5, 5, 4]
areas = [0.5 * b * h for b, h in zip(base, height)]
print(areas)  # [12.5, 20.0, 37.5, 50.0, 50.0]

# Multiple conditions
numbers = range(1, 101)
filtered = [x for x in numbers if x % 3 == 0 and x % 5 == 0]
print(filtered)  # Multiples of both 3 and 5

[12.5, 20.0, 37.5, 50.0, 50.0]
[15, 30, 45, 60, 75, 90]
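
For comparison, here is a short sketch of comprehensions that are actually nested, using a hypothetical 3x3 `matrix`. The first form chains two `for` clauses; the second places one comprehension inside another.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Two for-clauses: read left to right, like nested loops
flattened = [value for row in matrix for value in row]
print(flattened)

# A comprehension inside a comprehension: swap rows and columns
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)
```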


## 1.18 Practical Examples and Patterns
---
- **Real Python**: [Python Practice Problems: Get Ready for Your Next Interview](https://realpython.com/python-practice-problems/)
- **GeeksforGeeks**: [Python Programming Examples](https://www.geeksforgeeks.org/python-programming-examples/)
- **Python-Course.eu**: [Python Examples](https://python-course.eu/python-examples.php)



### Data Processing Patterns
This cell illustrates common data processing patterns using fundamental Python concepts like `zip()` and `defaultdict`. These patterns are useful for tasks such as combining related data from different lists or aggregating data based on categories.

In [None]:
# Process paired data
names = ['Alice', 'Bob', 'Charlie']
scores = [85, 92, 78]

# Create grade report
grade_report = [(name, score, 'Pass' if score >= 80 else 'Fail')
                for name, score in zip(names, scores)]
print(grade_report)

# Group and aggregate
from collections import defaultdict
sales_data = [('Jan', 100), ('Feb', 150), ('Jan', 200), ('Mar', 175), ('Feb', 125)]
monthly_sales = defaultdict(int)

for month, amount in sales_data:
    monthly_sales[month] += amount

print(dict(monthly_sales))  # {'Jan': 300, 'Feb': 275, 'Mar': 175}

[('Alice', 85, 'Pass'), ('Bob', 92, 'Pass'), ('Charlie', 78, 'Fail')]
{'Jan': 300, 'Feb': 275, 'Mar': 175}
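
The same grouping idea extends beyond sums. As a sketch under the same sample data, `defaultdict(list)` can collect every value per key, and `collections.Counter` can count occurrences; `amounts_by_month`, `averages`, and `sales_count` are illustrative names.

```python
from collections import Counter, defaultdict

sales_data = [('Jan', 100), ('Feb', 150), ('Jan', 200), ('Mar', 175), ('Feb', 125)]

# Collect every amount per month instead of only the running total
amounts_by_month = defaultdict(list)
for month, amount in sales_data:
    amounts_by_month[month].append(amount)

# Average sale per month
averages = {month: sum(values) / len(values)
            for month, values in amounts_by_month.items()}
print(averages)

# Number of sales recorded per month
sales_count = Counter(month for month, _ in sales_data)
print(dict(sales_count))
```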


### String and List Combinations
This cell shows how to use list comprehension, f-strings, and `zip()` to combine elements from multiple lists and format them into strings. This is a concise way to generate formatted output from multiple data sources.

In [None]:
# Combine letters and numbers
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]
decimals = [0.1, 0.2, 0.3]

combined = [f"{letter}{num}{dec}" for letter, num, dec in zip(letters, numbers, decimals)]
print(combined)  # ['a10.1', 'b20.2', 'c30.3']

['a10.1', 'b20.2', 'c30.3']


## 1.19 Common Patterns and Best Practices
---
- **Real Python**: [Python Code Quality: Tools & Best Practices](https://realpython.com/python-code-quality/)
- **Effective Python**: [Brett Slatkin's Effective Python Blog](https://effectivepython.com/)
- **pep8.org**: [PEP 8 -- Style Guide for Python Code](https://pep8.org/)
- **Clean Code Python**: [GitHub - Clean Code concepts adapted for Python](https://github.com/zedr/clean-code-python)



### Efficient Iteration
This cell provides best practices for efficient iteration. Direct iteration, `enumerate()`, and `zip()` are generally more Pythonic and often more efficient than index-based loops over `range(len(list))`, especially for large iterables.

In [None]:
# Instead of range(len(list))
items = ['apple', 'banana', 'cherry']

# Good: Direct iteration when you don't need index
for item in items:
    print(item)

# Good: Use enumerate when you need both index and value
for i, item in enumerate(items):
    print(f"{i}: {item}")

# Good: Use zip for multiple lists
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']
for num, letter in zip(list1, list2):
    print(f"{num}: {letter}")

apple
banana
cherry
0: apple
1: banana
2: cherry
1: a
2: b
3: c


### List vs Set vs Dictionary Choice
This cell offers guidance on choosing the appropriate data structure (list, set, or dictionary) based on the requirements of your data and the operations you need to perform. Each structure has different characteristics regarding order, uniqueness, and access methods.

In [None]:
# Use list for ordered data that allows duplicates
scores = [85, 92, 78, 85, 90]
print(scores)

# Use set for unique items
unique_scores = {85, 92, 78, 90}
print(unique_scores)

# Use dictionary for key-value mapping
student_scores = {'Alice': 85, 'Bob': 92, 'Charlie': 78}
print(student_scores)

[85, 92, 78, 85, 90]
{90, 92, 85, 78}
{'Alice': 85, 'Bob': 92, 'Charlie': 78}
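
A few operations illustrate why the choice matters. This sketch reuses the sample data from the cell above; `seen_scores` and `unique_in_order` are illustrative names.

```python
scores = [85, 92, 78, 85, 90]

# Sets answer "is this value present?" without scanning every element
seen_scores = set(scores)
print(92 in seen_scores)

# dict.fromkeys drops duplicates while keeping first-seen order
unique_in_order = list(dict.fromkeys(scores))
print(unique_in_order)

# dict.get returns a fallback instead of raising KeyError for a missing key
student_scores = {'Alice': 85, 'Bob': 92, 'Charlie': 78}
print(student_scores.get('Dana', 'no score'))
```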


### Memory Efficient Operations
This cell explains how to use generators and generator expressions for memory efficiency. They are particularly useful when working with very large datasets, as they generate values on demand rather than storing the entire sequence in memory at once.

In [None]:
# Use generators for large datasets
def squares_generator(n):
    for i in range(n):
        yield i**2

# Generator expression (like a list comprehension, but lazy)
squares_gen = (x**2 for x in range(1000000))  # Values are produced on demand

# Take only the first ten values; list(squares_gen)[:10] would build all 1,000,000 first
from itertools import islice
first_ten_squares = list(islice(squares_gen, 10))
print(first_ten_squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
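
To make the memory difference concrete, the sketch below compares `sys.getsizeof()` for a list and for an equivalent generator. Note that `getsizeof` reports only the container itself (for the list, its array of references), so treat the numbers as a rough indication. It also shows that a generator is single-pass: values already consumed are gone.

```python
import sys
from itertools import islice

n = 100_000
squares_list = [x**2 for x in range(n)]
squares_gen = (x**2 for x in range(n))

# The list holds n references; the generator holds only its current state
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)

# Pull five values without materializing the rest
first_five = list(islice(squares_gen, 5))
print(first_five)

# The generator resumes where it stopped; consumed values are not replayed
next_value = next(squares_gen)
print(next_value)
```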


## 1.20 Common Mistakes to Avoid
---
- **Real Python**: [Python Tricks: The Book](https://realpython.com/python-tricks/)
- **GeeksforGeeks**: [Python Programming Mistakes](https://www.geeksforgeeks.org/python-programming-mistakes/)
- **Medium**: [10 Python Mistakes You Need to Avoid](https://towardsdatascience.com/10-python-mistakes-you-need-to-avoid-2bf8a4b5e5c9)



### Mutable Default Arguments
This cell highlights a common pitfall: using mutable default arguments in function definitions. This can lead to unexpected behavior as the default list is shared across all calls. The recommended approach avoids this issue by initializing the list inside the function when no list is provided.

In [None]:
# Bad
def add_item(item, target_list=[]):
    target_list.append(item)
    return target_list

# Good
def add_item(item, target_list=None):
    if target_list is None:
        target_list = []
    target_list.append(item)
    return target_list
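
The effect is easiest to see by calling each version twice. In this sketch the two functions are given distinct, hypothetical names (`add_item_shared`, `add_item_fresh`) so they can be compared side by side.

```python
def add_item_shared(item, target_list=[]):  # The default list is created once, at definition time
    target_list.append(item)
    return target_list

def add_item_fresh(item, target_list=None):
    if target_list is None:
        target_list = []  # A new list for every call that omits the argument
    target_list.append(item)
    return target_list

first_shared = add_item_shared("a")
second_shared = add_item_shared("b")
print(second_shared)                  # Contains items from both calls
print(first_shared is second_shared)  # Same list object

first_fresh = add_item_fresh("a")
second_fresh = add_item_fresh("b")
print(first_fresh, second_fresh)
```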

### Modifying List While Iterating
This cell explains a common mistake of modifying a list while iterating over it, which can lead to unexpected results or errors. It provides safer alternatives, such as iterating over a copy of the list or using list comprehensions to create a new list with the desired elements.

In [None]:
numbers = [1, 2, 3, 4, 5]

# Bad - removing items shifts the rest left, so the loop skips elements
# for num in numbers:
#     if num % 2 == 0:
#         numbers.remove(num)

# Good - iterate over a copy
for num in numbers[:]:  # Slice creates a copy
    if num % 2 == 0:
        numbers.remove(num)

# Better - use list comprehension
numbers = [num for num in numbers if num % 2 != 0]
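
To see what goes wrong with the commented-out pattern, here is a small sketch on a list of only even numbers. One might expect an empty list at the end, but every second element is skipped.

```python
numbers = [2, 4, 6, 8]

# Each removal shifts later items left, so the iterator steps over one of them
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)

print(numbers)
```

The loop visits 2 and 6 only; 4 and 8 slide into positions the iterator has already passed.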

### Understanding Variable Scope
This cell demonstrates variable scope in comprehensions. In Python 3, the loop variable of a comprehension is local to that comprehension and does not affect a variable with the same name outside it.

In [None]:
# Be careful with variable scope in comprehensions
x = 10
squares = [x**2 for x in range(5)]  # x in comprehension is local
print(x)  # Still 10, not affected by comprehension

10
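
A regular `for` loop behaves differently, which is where the confusion usually comes from. The sketch below contrasts the two; `after_comprehension` and `after_loop` are illustrative names.

```python
x = 10
squares = [x**2 for x in range(5)]
after_comprehension = x  # The comprehension's x did not leak out
print(after_comprehension)

for x in range(5):
    pass
after_loop = x  # A for loop rebinds x in the enclosing scope
print(after_loop)
```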


---

## Quick Reference Summary

### Essential Functions
- `len()`, `sum()`, `min()`, `max()`, `any()`, `all()`
- `range()`, `enumerate()`, `zip()`
- `map()`, `filter()`, `sorted()`

### String Methods
- `.format()`, f-strings, `.join()`, `.split()`
- `.strip()`, `.replace()`, `.upper()`, `.lower()`

### List Methods
- `.append()`, `.extend()`, `.insert()`, `.remove()`, `.pop()`
- `.sort()`, `.reverse()`, `.count()`, `.index()`

### Dictionary Methods
- `.keys()`, `.values()`, `.items()`, `.get()`, `.update()`

### Comprehension Syntax
- List: `[expr for item in iterable if condition]`
- Set: `{expr for item in iterable if condition}`
- Dict: `{key_expr: val_expr for item in iterable if condition}`




## **Advanced Topics Start Here**

## 1.21 Object-Oriented Programming (OOP)
---
- **Real Python**: [Object-Oriented Programming (OOP) in Python 3](https://realpython.com/python3-object-oriented-programming/)
- **GeeksforGeeks**: [Python OOPs Concepts](https://www.geeksforgeeks.org/python-oops-concepts/)
- **Programiz**: [Python Object Oriented Programming](https://www.programiz.com/python-programming/object-oriented-programming)
- **Python-Course.eu**: [Object Oriented Programming](https://python-course.eu/oop/)
- **Automate the Boring Stuff**: [Chapter 15 - Object-Oriented Programming](https://automatetheboringstuff.com/2e/chapter15/)

Object-Oriented Programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data and code that operates on the data. In the context of data science and machine learning, OOP can be used to structure code, create reusable components, and manage complexity, especially when building larger projects or custom models.

**Key OOP concepts relevant to data science include:**

- **Classes:** Blueprints for creating objects. They define the data (attributes) and the functions (methods) that objects of that class will have.
- **Objects:** Instances of a class. Defining a class only creates the blueprint; per-instance data (attributes) exists only once an object is created from it.
- **Attributes:** Variables that belong to an object. They represent the state or characteristics of an object.
- **Methods:** Functions that belong to an object. They define the behavior or actions that an object can perform.
- **Inheritance:** A mechanism that allows a new class (subclass or derived class) to inherit attributes and methods from an existing class (superclass or base class). This promotes code reusability.
- **Encapsulation:** The bundling of data (attributes) and the methods that operate on that data into a single unit (a class). This helps hide an object's internal state from outside code; in Python the hiding relies on naming conventions (`_name`, `__name`) rather than enforced access control.

In [11]:
# Define a base class for a generic model
class BaseModel:
    """
    A base class for machine learning models.
    Demonstrates attributes and methods.
    """
    def __init__(self, model_name, version=1.0):
        """
        Constructor to initialize the model with a name and version.
        Demonstrates attributes.
        """
        self.model_name = model_name  # Public attribute
        self._version = version       # Protected attribute (convention)
        self.__status = "Initialized" # Private attribute (name mangling)
        print(f"[{self.model_name}] Model initialized (Version: {self._version}).")

    def train(self, data):
        """
        A generic training method.
        Demonstrates a method.
        """
        print(f"[{self.model_name}] Training started with data...")
        # Simulate training process
        import time
        time.sleep(0.5)
        self.__status = "Trained"
        print(f"[{self.model_name}] Training finished. Status: {self.__status}")

    def predict(self, input_data):
        """
        A generic prediction method.
        Demonstrates a method and returning a value.
        """
        import time # Import time module here
        if self.__status != "Trained":
            print(f"[{self.model_name}] Warning: Model not trained. Returning dummy prediction.")
            return None
        print(f"[{self.model_name}] Making prediction for input data...")
        # Simulate prediction process
        time.sleep(0.2)
        prediction = f"Prediction from {self.model_name} for {input_data}"
        return prediction

    def get_status(self):
        """
        Demonstrates accessing a private attribute via a public method (encapsulation).
        """
        return self.__status

    # Example of encapsulation: a method to update a protected attribute safely
    def update_version(self, new_version):
        """
        Updates the model version if the new version is greater than the current one.
        """
        if new_version > self._version:
            print(f"[{self.model_name}] Updating version from {self._version} to {new_version}")
            self._version = new_version
        else:
            print(f"[{self.model_name}] New version {new_version} is not greater than current version {self._version}. Version not updated.")

# Define a subclass inheriting from BaseModel
class RegressionModel(BaseModel):
    """
    A subclass for regression models, inheriting from BaseModel.
    Demonstrates inheritance and adding specialized attributes/methods.
    """
    def __init__(self, model_name, version=1.0, learning_rate=0.01):
        """
        Constructor for RegressionModel, calling the parent constructor.
        """
        # Call the constructor of the parent class
        super().__init__(model_name, version)
        self.learning_rate = learning_rate # Specialized attribute
        print(f"[{self.model_name}] Regression model specific attribute: learning_rate={self.learning_rate}")

    def evaluate(self, test_data, actual_values):
        """
        A specialized evaluation method for regression models.
        Demonstrates overriding or adding methods.
        """
        print(f"[{self.model_name}] Evaluating regression model...")
        # Simulate evaluation process (e.g., calculate RMSE)
        import random
        rmse = random.uniform(0.5, 5.0)
        print(f"[{self.model_name}] Evaluation complete. Simulated RMSE: {rmse:.2f}")
        return rmse

    # Regression models might have a different training process.
    # Overriding needs no special syntax: redefining the method is enough
    # (Python 3.12+ also offers an optional typing.override decorator for type checkers).
    def train(self, data, epochs=100):
        """
        Overridden training method for regression models with epochs.
        """
        print(f"[{self.model_name}] Regression model training started for {epochs} epochs...")
        # Simulate regression specific training
        import time
        time.sleep(epochs * 0.01)
        # This override never updates the parent's private status, so get_status()
        # keeps returning "Initialized" after training.
        # Accessing parent's private attribute using name mangling (less common, shows it's possible)
        # self._BaseModel__status = "Trained (Regression Specific)"
        # More standard practice is to use a public method to change status if available
        # Or manage status within the subclass if it's significantly different
        print(f"[{self.model_name}] Regression training finished after {epochs} epochs.")


# Define another subclass for a different type of model
class ClassificationModel(BaseModel):
    """
    A subclass for classification models.
    """
    def __init__(self, model_name, version=1.0, activation_function="sigmoid"):
        super().__init__(model_name, version)
        self.activation_function = activation_function
        print(f"[{self.model_name}] Classification model specific attribute: activation_function={self.activation_function}")

    def evaluate(self, test_data, actual_labels):
        """
        A specialized evaluation method for classification models.
        """
        print(f"[{self.model_name}] Evaluating classification model...")
        # Simulate evaluation process (e.g., calculate accuracy)
        import random
        accuracy = random.uniform(0.7, 0.95)
        print(f"[{self.model_name}] Evaluation complete. Simulated Accuracy: {accuracy:.2f}")
        return accuracy

    # Classification models might have a different training process
    def train(self, data, epochs=200):
        """
        Overridden training method for classification models with epochs.
        """
        print(f"[{self.model_name}] Classification model training started for {epochs} epochs...")
        # Simulate classification specific training
        import time
        time.sleep(epochs * 0.005) # Shorter sleep for more epochs
        # Update status via parent's method if needed, or manage here.
        # Since BaseModel has a train method that sets status, we can call that
        # or set it directly if we understand the parent's implementation.
        # For simplicity here, we'll just print completion.
        # If we wanted to use the parent's status, we might do:
        # super().train(data) # This would call the BaseModel train method first
        print(f"[{self.model_name}] Classification training finished after {epochs} epochs.")


# --- Demonstrating OOP Concepts ---

print("--- Creating Objects ---")
# Create objects (instances of the classes)
generic_model = BaseModel("GenericModel")
linear_reg = RegressionModel("LinearRegression", learning_rate=0.005)
logistic_reg = ClassificationModel("LogisticRegression", activation_function="relu")

print("\n--- Accessing Attributes and Calling Methods ---")
# Accessing attributes
print(f"Generic model name: {generic_model.model_name}")
# print(f"Generic model version: {generic_model._version}") # Accessing protected (by convention)
# print(f"Generic model status: {generic_model.__status}") # Direct access to private will fail

# Accessing private attribute via a public method (encapsulation)
print(f"Generic model status (via method): {generic_model.get_status()}")

# Calling methods
dummy_data = [1, 2, 3, 4, 5]
generic_model.train(dummy_data)
prediction = generic_model.predict([10])
print(f"Generic model prediction: {prediction}")

print("-" * 20)

# Accessing subclass specific attributes
print(f"Linear Regression learning rate: {linear_reg.learning_rate}")
print(f"Logistic Regression activation function: {logistic_reg.activation_function}")

# Calling subclass specific methods
linear_reg.evaluate(dummy_data, [1, 2, 3, 4, 5]) # Dummy evaluation data
logistic_reg.evaluate(dummy_data, [0, 1, 0, 1, 1]) # Dummy evaluation data

print("-" * 20)
# Demonstrating overridden method
linear_reg.train(dummy_data, epochs=50) # Calls the overridden train method in RegressionModel
logistic_reg.train(dummy_data, epochs=200) # Calls the overridden train method in ClassificationModel

print("-" * 20)
# Accessing status again after training (using the public method).
# Only generic_model reports "Trained": the subclass overrides never call
# BaseModel.train(), so the private status of the other two stays "Initialized".
print(f"Generic model status after training: {generic_model.get_status()}")
print(f"Linear Regression status after training: {linear_reg.get_status()}") # Calls the inherited get_status
print(f"Logistic Regression status after training: {logistic_reg.get_status()}") # Calls the inherited get_status


print("\n--- Encapsulation Example (Updating Version) ---")
generic_model.update_version(1.1)
generic_model.update_version(1.0) # Won't update

print("-" * 20)
print("\n--- Demonstrating Name Mangling (Accessing 'private' attribute) ---")
# While you shouldn't normally do this, Python "mangles" private attribute names
# This makes direct access harder but not impossible.
# The name is changed to _ClassName__attribute_name
try:
    print(f"Accessing mangled private status of generic_model: {generic_model._BaseModel__status}")
    print(f"Accessing mangled private status of linear_reg: {linear_reg._BaseModel__status}")
    print(f"Accessing mangled private status of logistic_reg: {logistic_reg._BaseModel__status}")
except AttributeError as e:
    print(f"Could not access private attribute directly: {e}")

--- Creating Objects ---
[GenericModel] Model initialized (Version: 1.0).
[LinearRegression] Model initialized (Version: 1.0).
[LinearRegression] Regression model specific attribute: learning_rate=0.005
[LogisticRegression] Model initialized (Version: 1.0).
[LogisticRegression] Classification model specific attribute: activation_function=relu

--- Accessing Attributes and Calling Methods ---
Generic model name: GenericModel
Generic model status (via method): Initialized
[GenericModel] Training started with data...
[GenericModel] Training finished. Status: Trained
[GenericModel] Making prediction for input data...
Generic model prediction: Prediction from GenericModel for [10]
--------------------
Linear Regression learning rate: 0.005
Logistic Regression activation function: relu
[LinearRegression] Evaluating regression model...
[LinearRegression] Evaluation complete. Simulated RMSE: 2.37
[LogisticRegression] Evaluating classification model...
[LogisticRegression] Evaluation complete. 
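
One consequence of overriding deserves a separate, smaller sketch: a subclass method that replaces `train()` without calling the parent version leaves the parent's private status untouched. The classes below (`Base`, `SilentOverride`, `CooperativeOverride`) are hypothetical, stripped-down stand-ins for the models above, not a replacement for them.

```python
class Base:
    def __init__(self, name):
        self.name = name
        self.__status = "Initialized"  # Stored as _Base__status via name mangling

    def train(self, data):
        self.__status = "Trained"

    def get_status(self):
        return self.__status


class SilentOverride(Base):
    def train(self, data, epochs=10):
        # Replaces the parent method entirely; the status is never updated
        pass


class CooperativeOverride(Base):
    def train(self, data, epochs=10):
        # Subclass-specific work would go here, then the parent updates the status
        super().train(data)


silent = SilentOverride("silent")
silent.train([1, 2, 3])
cooperative = CooperativeOverride("cooperative")
cooperative.train([1, 2, 3])

print(silent.get_status())
print(cooperative.get_status())
```

Calling `super().train(data)` keeps the bookkeeping in one place, so subclasses do not need to know how the parent stores its state.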

## 1.22 File Handling
---
- **Real Python**: [Working With Files in Python](https://realpython.com/working-with-files-in-python/)
- **GeeksforGeeks**: [File Handling in Python](https://www.geeksforgeeks.org/file-handling-python/)
- **Programiz**: [File I/O in Python](https://www.programiz.com/python-programming/file-io)

File handling is a fundamental aspect of data science workflows, as it involves reading data from various sources and saving processed data or model outputs to files. Python provides built-in functions and modules to interact with files, allowing you to work with different file formats commonly used in data science.

**Why File Handling is Important in Data Science:**

- **Data Loading:** Data for analysis and modeling is often stored in files (e.g., datasets in CSV format).
- **Data Saving:** Results of analysis, cleaned datasets, or trained model parameters need to be saved for future use or sharing.
- **Configuration:** Reading configuration settings from files.
- **Logging:** Writing logs during program execution.

**Common File Formats in Data Science:**

- **CSV (Comma Separated Values):** A simple and widely used format for tabular data. Each line in the file represents a row, and values within a row are separated by commas (or other delimiters).
- **JSON (JavaScript Object Notation):** A lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is commonly used for storing nested or semi-structured data.
- **Pickle:** A Python-specific format for serializing and deserializing Python object structures. It allows you to save complex Python objects (like lists, dictionaries, or even trained machine learning models) to a file and load them back later.

In [None]:
# Writing to a text file
file_content = "This is the first line.\nThis is the second line."
with open("example.txt", "w") as f:
    f.write(file_content)

print("Successfully wrote to example.txt")

Successfully wrote to example.txt


In [None]:
# Reading from a text file
with open("example.txt", "r") as f:
    read_content = f.read()

print("Content of example.txt:")
print(read_content)

Content of example.txt:
This is the first line.
This is the second line.
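
Two other common patterns are appending to an existing file and reading it line by line rather than all at once. The sketch below assumes a throwaway file inside a temporary directory (so it leaves nothing behind) and passes `encoding="utf-8"` explicitly; the file name and contents are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.txt"
    path.write_text("first line\nsecond line\n", encoding="utf-8")

    # Mode "a" appends instead of overwriting
    with open(path, "a", encoding="utf-8") as f:
        f.write("third line\n")

    # Iterating over the file object yields one line at a time
    with open(path, "r", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)
```

Line-by-line iteration keeps memory use flat for large files, whereas `f.read()` loads the whole file into a single string.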


### Handling CSV Files
CSV (Comma Separated Values) is a simple and common file format for storing tabular data. It's widely used because it can be easily read and written by various software, including spreadsheet programs and data analysis tools.

Python's built-in `csv` module provides convenient functions for working with CSV files, handling issues like quoted fields and different delimiters.

- **Real Python**: [Working with CSV Files in Python](https://realpython.com/python-csv/)
- **GeeksforGeeks**: [Working with csv files in Python](https://www.geeksforgeeks.org/working-csv-files-python/)
- **Python.org**: [csv — CSV File Reading and Writing](https://docs.python.org/3/library/csv.html)

In [None]:
import csv

# Data to write to CSV
csv_data = [
    ['Name', 'Age', 'City'],
    ['Alice', 30, 'New York'],
    ['Bob', 25, 'Los Angeles'],
    ['Charlie', 35, 'Chicago']
]

# Writing to a CSV file
with open("example.csv", "w", newline="") as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerows(csv_data)

print("Successfully wrote to example.csv")

Successfully wrote to example.csv


In [None]:
import csv

# Reading from a CSV file
with open("example.csv", "r", newline="") as csvfile:
    csv_reader = csv.reader(csvfile)
    read_csv_data = list(csv_reader)

print("Content of example.csv:")
for row in read_csv_data:
    print(row)

Content of example.csv:
['Name', 'Age', 'City']
['Alice', '30', 'New York']
['Bob', '25', 'Los Angeles']
['Charlie', '35', 'Chicago']
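
Notice in the output above that `csv.reader` returns every value as a string, including the ages. A minimal sketch of one way to handle this uses `csv.DictReader`, which maps each row to the header names, with an explicit conversion step. The data is read from an in-memory string here (via `io.StringIO`) purely to keep the example self-contained.

```python
import csv
import io

csv_text = "Name,Age,City\nAlice,30,New York\nBob,25,Los Angeles\n"

# DictReader uses the first row as keys for every following row
reader = csv.DictReader(io.StringIO(csv_text))

# Values arrive as strings, so convert numeric columns explicitly
people = [{**row, "Age": int(row["Age"])} for row in reader]
print(people)

average_age = sum(person["Age"] for person in people) / len(people)
print(average_age)
```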


### Handling JSON Files
JSON (JavaScript Object Notation) is a popular lightweight data interchange format. It's human-readable and easy for machines to parse, making it ideal for transmitting data between a server and web application, or for storing configuration files and semi-structured data.

Python's built-in `json` module converts between Python objects and JSON: `json.dump()` writes to a file and `json.dumps()` returns a string, while `json.load()` and `json.loads()` parse JSON from a file or a string back into Python objects.

- **Real Python**: [Working With JSON Data in Python](https://realpython.com/python-json/)
- **GeeksforGeeks**: [Working With JSON Data in Python](https://www.geeksforgeeks.org/working-with-json-data-in-python/)
- **Python.org**: [json — JSON encoder and decoder](https://docs.python.org/3/library/json.html)

In [None]:
import json

# Data to write to JSON
json_data = {
    "name": "Data Science Project",
    "version": 1.0,
    "settings": {
        "model": "linear_regression",
        "parameters": {"alpha": 0.1, "iterations": 1000}
    },
    "data_files": ["train.csv", "test.csv"]
}

# Writing to a JSON file
with open("example.json", "w") as jsonfile:
    json.dump(json_data, jsonfile, indent=4) # Use indent for pretty printing

print("Successfully wrote to example.json")

Successfully wrote to example.json
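
Reading works the same way in reverse. The sketch below writes a small, hypothetical `config` dictionary to a temporary file, loads it back with `json.load()`, and then uses `dumps`/`loads` to show one thing to watch for: JSON has no tuple type, so tuples come back as lists.

```python
import json
import tempfile
from pathlib import Path

config = {"model": "linear_regression", "parameters": {"alpha": 0.1, "iterations": 1000}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "config.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    # json.load parses the file back into Python dicts, lists, and scalars
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == config)
print(loaded["parameters"]["alpha"])

# String-based round trip: the tuple is serialized as a JSON array
as_text = json.dumps({"point": (1, 2)})
round_tripped = json.loads(as_text)
print(as_text)
print(round_tripped)
```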


### Handling Pickle Files
Pickle is a Python-specific module used for serializing and deserializing Python object structures. Serialization (pickling) converts a Python object into a byte stream, and deserialization (unpickling) reconstructs the original object from the byte stream.

Pickle is particularly useful in data science for:

- **Saving and loading trained machine learning models:** Allows you to save a model after training and load it later for making predictions without retraining.
- **Storing large data structures:** Efficiently saving and loading complex Python objects like lists of lists, dictionaries, or custom class instances.
- **Persisting session state:** Saving the state of a Python program or analysis session.

**Caution:** The pickle module is **not secure** against maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

- **Real Python**: [Object Serialization in Python](https://realpython.com/python-pickle-module/)
- **GeeksforGeeks**: [Python Pickle Module](https://www.geeksforgeeks.org/python-pickle-module/)
- **Python.org**: [pickle — Python object serialization](https://docs.python.org/3/library/pickle.html)

In [None]:
import pickle

# Create a Python object to pickle
data_to_pickle = {
    "model_name": "linear_regression",
    "parameters": {"alpha": 0.01, "iterations": 500},
    "metrics": {"rmse": 1.5, "r_squared": 0.9},
    "data_summary": [1, 2, 3, 4, 5]
}

# Define the filename
pickle_filename = "example_data.pkl"

# Write the object to a pickle file in binary write mode
with open(pickle_filename, 'wb') as f:
    pickle.dump(data_to_pickle, f)

print(f"Successfully pickled data to {pickle_filename}")

Successfully pickled data to example_data.pkl


In [None]:
# Read the pickled object back from the file in binary read mode
with open(pickle_filename, 'rb') as f:
    loaded_data = pickle.load(f)

# Print the loaded object
print("Successfully unpickled data:")
print(loaded_data)

Successfully unpickled data:
{'model_name': 'linear_regression', 'parameters': {'alpha': 0.01, 'iterations': 500}, 'metrics': {'rmse': 1.5, 'r_squared': 0.9}, 'data_summary': [1, 2, 3, 4, 5]}
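
Pickle can also serialize to an in-memory `bytes` object with `pickle.dumps()` and restore it with `pickle.loads()`, which is convenient for quick round-trip checks. The sketch below is illustrative only; the `record` dictionary and its keys are made up for the example.

```python
import pickle

# Hypothetical record mixing several built-in container types
record = {
    "features": ("age", "income"),
    "labels": {0, 1},
    "weights": [0.25, 0.75],
}

# Serialize to bytes using the newest protocol this interpreter supports
payload = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize; only do this with bytes you created yourself or fully trust
restored = pickle.loads(payload)

print(isinstance(payload, bytes))  # True
print(restored == record)          # True: same contents
print(restored is record)          # False: a new, independent object
```

Unlike JSON, the round trip keeps the tuple and the set as their original types.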


## 1.23 Error Handling and Debugging
---
- **Real Python**: [Python Exceptions Handling](https://realpython.com/python-exceptions/)
- **GeeksforGeeks**: [Python Exception Handling](https://www.geeksforgeeks.org/python-exception-handling/)
- **Programiz**: [Python Exception Handling](https://www.programiz.com/python-programming/exception-handling)
- **Python.org**: [Errors and Exceptions](https://docs.python.org/3/tutorial/errors.html)
- **Real Python**: [Python Debugging With Pdb](https://realpython.com/python-debugging-pdb/)

In data science, where code often interacts with real-world data that can be messy or unpredictable, anticipating and handling errors gracefully is crucial for writing robust and reliable applications. Error handling prevents your programs from crashing unexpectedly and allows you to provide informative feedback or take alternative actions when issues arise.

Debugging is the process of identifying and resolving errors (bugs) in your code. It's an essential skill for any developer to ensure their programs function correctly.

**Exception Handling (try-except)**
Python uses exceptions to signal errors. When an error occurs, Python raises an exception. You can catch and handle these exceptions using try, except, else, and finally blocks.

- **try:** The block of code that might raise an exception.
- **except:** The block of code that is executed if a specific type of exception occurs in the try block.
- **else:** (Optional) The block of code that is executed if the try block finishes without raising any exceptions.
- **finally:** (Optional) The block of code that is always executed, regardless of whether an exception occurred or not. This is often used for cleanup operations (like closing files).

In [None]:
# Example: Handling ZeroDivisionError
def divide_numbers(a, b):
    try:
        result = a / b
        print(f"The result of division is: {result}")
    except ZeroDivisionError:
        print("Error: Cannot divide by zero!")
    except TypeError:
        print("Error: Invalid input types for division.")
    except Exception as e: # Catching other potential errors
        print(f"An unexpected error occurred: {e}")

divide_numbers(10, 2)
divide_numbers(10, 0)
divide_numbers(10, 'a')

The result of division is: 5.0
Error: Cannot divide by zero!
Error: Invalid input types for division.


In [None]:
# Example: Using try, except, else, and finally
def process_file(filename):
    try:
        f = open(filename, 'r')
        content = f.read()
        print("File read successfully.")
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
    except Exception as e:
        print(f"An error occurred while reading the file: {e}")
    else:
        print("No exceptions occurred.")
        # Process the content if file reading was successful
        print("File content:")
        print(content)
    finally:
        if 'f' in locals() and not f.closed:
            f.close()
            print("File closed in finally block.")
        elif 'f' not in locals():
            print("File variable was not created due to error.")
        else:
            print("File was already closed.")


# Test cases
process_file("non_existent_file.txt") # This will raise FileNotFoundError
print("-" * 20)
# Create a dummy file for the successful case
with open("existing_file.txt", "w") as f:
    f.write("Hello, world!")
process_file("existing_file.txt") # This will succeed

Error: File 'non_existent_file.txt' not found.
File variable was not created due to error.
--------------------
File read successfully.
No exceptions occurred.
File content:
Hello, world!
File closed in finally block.
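
For file cleanup specifically, a `with` statement is a common alternative to a manual `finally` block, because the file is closed automatically when the block exits, even if an exception is raised. A minimal sketch, assuming a hypothetical helper named `read_text_or_none` and a throwaway temporary directory:

```python
import os
import tempfile

def read_text_or_none(filename):
    """Return the file's text, or None if the file does not exist."""
    try:
        # The with statement closes the file on exit, with or without an error
        with open(filename, "r") as f:
            return f.read()
    except FileNotFoundError:
        return None

# Work inside a temporary directory so no files are left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    existing_path = os.path.join(tmp_dir, "existing_file.txt")
    with open(existing_path, "w") as f:
        f.write("Hello, world!")

    found = read_text_or_none(existing_path)
    missing = read_text_or_none(os.path.join(tmp_dir, "non_existent_file.txt"))

print(found)    # Hello, world!
print(missing)  # None
```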


In [None]:
# Example: Raising an exception
def process_positive_number(number):
    if number <= 0:
        raise ValueError("Input must be a positive number")
    print(f"Processing positive number: {number}")

try:
    process_positive_number(10)
    process_positive_number(-5) # This will raise a ValueError
except ValueError as ve:
    print(f"Caught exception: {ve}")

Processing positive number: 10
Caught exception: Input must be a positive number


### Debugging Basics
Debugging is the process of finding and fixing errors in your code. Basic techniques include:

- Using `print()` statements to inspect variable values and execution flow.
- Using a debugger (like the one integrated in IDEs or Python's built-in `pdb`) to step through code execution line by line, inspect variables, and set breakpoints.

Example using `pdb` (uncomment to run):

In [None]:
# import pdb
# def buggy_function(x, y):
#     result = x + y * 2
#     pdb.set_trace() # Set a breakpoint here
#     final_result = result / 0 # This will cause an error
#     return final_result
# buggy_function(5, 3)
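
Alongside `print()` and `pdb`, the standard-library `traceback` module can capture the full stack trace as text while the program keeps running, which helps when recording failures in long jobs. A minimal sketch; `risky_step` and `run_with_trace` are hypothetical names that exist only for this illustration:

```python
import traceback

def risky_step(values):
    # Hypothetical step that fails on an empty list (division by zero)
    return sum(values) / len(values)

def run_with_trace(values):
    """Return (result, trace_text); trace_text is None on success."""
    try:
        return risky_step(values), None
    except ZeroDivisionError:
        # format_exc() returns the traceback of the exception being handled
        return None, traceback.format_exc()

result, trace = run_with_trace([])
print(result)  # None
print(trace)   # Full traceback text, ending with the ZeroDivisionError line
```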


## 1.24 Virtual Environments
---
- **Real Python**: [Python Virtual Environments: A Primer](https://realpython.com/python-virtual-environments-a-primer/)
- **GeeksforGeeks**: [Python Virtual Environment](https://www.geeksforgeeks.org/python-virtual-environment/)
- **Python.org**: [venv — Creation of virtual environments](https://docs.python.org/3/library/venv.html)
- **Python Guide**: [Virtual Environments](https://docs.python-guide.org/dev/virtualenvs/)

Virtual environments are isolated Python environments that allow you to manage dependencies for your projects separately. This is crucial in data science, machine learning, and AI development for several reasons:

- **Dependency Management:** Different projects may require different versions of the same library. Virtual environments prevent conflicts by keeping project dependencies isolated.
- **Reproducibility:** By specifying the exact versions of libraries used in a virtual environment, you can ensure that your project can be reproduced by others (or yourself in the future) with the same dependencies.
- **Cleanliness:** Keeps your global Python installation clean and avoids cluttering it with project-specific packages.
- **Avoiding Conflicts:** Prevents conflicts between system-wide packages and project-specific packages.

Using virtual environments is a best practice for any Python development, especially when working on multiple projects or collaborating with others.

**1. Creating a Virtual Environment**

The recommended way to create virtual environments in Python 3.3+ is using the built-in `venv` module. Open your terminal or command prompt and navigate to your project directory. Then run the following command:

`python -m venv myenv`
- `python`: Invokes the Python interpreter.
- `-m venv`: Tells Python to run the venv module.
- `myenv`: The name you want to give to your virtual environment. You can choose any name, but `venv` and `.venv` are common conventions.

This command creates a new directory (e.g., `myenv`) in your project folder containing a minimal Python environment, including a copy of (or a symlink to) the Python interpreter and the `pip` package installer.

**2. Activating a Virtual Environment**

Before you can install packages into a virtual environment or run scripts within it, you need to activate it. The activation command is different depending on your operating system and shell.

- **On macOS and Linux:**
`source myenv/bin/activate`
- **On Windows (Command Prompt):**
`myenv\Scripts\activate.bat`
- **On Windows (PowerShell):**
`myenv\Scripts\Activate.ps1`

Once activated, your terminal prompt will usually change to indicate the name of the active virtual environment (e.g., `(myenv) your_username@your_computer:~/your_project$`).
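
To confirm from inside Python that a virtual environment is active, one option is to compare `sys.prefix` with `sys.base_prefix`; the two differ inside an environment created by `venv`. A small sketch (the helper name `in_virtual_environment` is made up for this example):

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```

Other tools, such as conda, manage environments differently, so this check may report `False` there.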

**3. Installing Packages in a Virtual Environment**

With the virtual environment activated, you can use pip to install packages. These packages will be installed only within the active virtual environment, not in your global Python installation.

`pip install pandas scikit-learn numpy`

You can install multiple packages at once by listing them after pip install. To save the list of installed packages and their versions for reproducibility, you can use:

`pip freeze > requirements.txt`

And to install packages from a `requirements.txt` file:

`pip install -r requirements.txt`
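
A script can also check which version of a package is installed in the active environment. The sketch below uses `importlib.metadata` from the standard library (available in Python 3.8+); the helper name `installed_version` is hypothetical, and the package names are just the ones from the `pip install` example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Prints a version string for each installed package, or None if it is missing
for name in ["pandas", "scikit-learn", "numpy"]:
    print(f"{name}: {installed_version(name)}")
```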

**4. Deactivating a Virtual Environment**

When you are finished working in a virtual environment, you can deactivate it. This returns your terminal to the global Python environment.

`deactivate`

The prompt will return to its normal appearance.

## 1.25 More Advanced Data Structures
---
- **Real Python**: [Python Data Structures](https://realpython.com/python-data-structures/)
- **GeeksforGeeks**: [Python Data Structures](https://www.geeksforgeeks.org/python-data-structures/)
- **Python-Course.eu**: [Data Structures](https://python-course.eu/python-tutorial/data-structures.php)
- **Python.org**: [Data Structures](https://docs.python.org/3/tutorial/datastructures.html)

The `collections` module in Python offers several specialized container datatypes that go beyond the basic list, dict, and set. These can be particularly useful in data science for specific tasks requiring efficient data handling or preservation of order.

**collections.deque**

`collections.deque` (pronounced "deck") is a double-ended queue designed for fast appends and pops at both ends, which run in roughly O(1) time. Lists support the same operations, but inserting or removing at the beginning of a list is O(n) because every remaining element has to be shifted. Deques are particularly useful for implementing queues and stacks, and for maintaining a fixed-size history of recent items via the `maxlen` argument.

**Why use deque in Data Science?**

- **Efficient History Tracking:** Maintaining a limited history of recent data points or model outputs.
- **Implementing Queues/Stacks:** For algorithms or data processing pipelines that require queue or stack-like behavior.
- **Logging or Buffering:** Efficiently adding and removing items from a buffer.
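
As one illustration of history tracking, a bounded deque can hold a sliding window for a moving average. This is a minimal sketch; the function name `moving_average` and the sample readings are assumptions made for the example, and the first few averages are computed over a partially filled window.

```python
from collections import deque

def moving_average(values, window_size):
    """Yield the mean of the most recent `window_size` values seen so far."""
    window = deque(maxlen=window_size)  # Old values drop off automatically
    for value in values:
        window.append(value)
        yield sum(window) / len(window)

readings = [10, 20, 30, 40, 50]
print(list(moving_average(readings, window_size=3)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```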

**collections.OrderedDict**

`collections.OrderedDict` is a dictionary subclass that remembers the order in which its key-value pairs were inserted. While standard dictionaries in Python 3.7+ also preserve insertion order, OrderedDict provides additional methods for reordering and equality checks that consider order. It's useful when the sequence of items is semantically important.

**Why use OrderedDict in Data Science?**

- **Configuration Loading:** Loading configuration settings where the order of parameters matters.
- **Processing Data in Sequence:** When data needs to be processed or presented in the exact order it was received or defined.
- **Maintaining Feature Order:** In some machine learning contexts, the order of features might be relevant.

In [None]:
from collections import deque

# Example: Maintaining a limited history of recent data points
# Create a deque with a maximum size
recent_data_points = deque(maxlen=5)

print("Adding data points...")
for i in range(10):
    recent_data_points.append(f"data_point_{i+1}")
    print(f"Deque after adding data_point_{i+1}: {list(recent_data_points)}")

print("\nAppending to the left...")
recent_data_points.appendleft("urgent_data")
print(f"Deque after appendleft: {list(recent_data_points)}") # With maxlen reached, appendleft discards the item at the right end (the newest one here)

print("\nPopping from the right...")
popped_right = recent_data_points.pop()
print(f"Popped from right: {popped_right}")
print(f"Deque after pop: {list(recent_data_points)}")

print("\nPopping from the left...")
popped_left = recent_data_points.popleft()
print(f"Popped from left: {popped_left}")
print(f"Deque after popleft: {list(recent_data_points)}")

# Example: Using deque as a queue (FIFO)
data_queue = deque()
data_queue.append("task1")
data_queue.append("task2")
data_queue.append("task3")
print(f"\nQueue: {list(data_queue)}")
print(f"Processing: {data_queue.popleft()}")
print(f"Processing: {data_queue.popleft()}")
print(f"Queue after processing: {list(data_queue)}")

# Example: Using deque as a stack (LIFO)
data_stack = deque()
data_stack.append("step1")
data_stack.append("step2")
data_stack.append("step3")
print(f"\nStack: {list(data_stack)}")
print(f"Undoing: {data_stack.pop()}")
print(f"Undoing: {data_stack.pop()}")
print(f"Stack after undoing: {list(data_stack)}")

Adding data points...
Deque after adding data_point_1: ['data_point_1']
Deque after adding data_point_2: ['data_point_1', 'data_point_2']
Deque after adding data_point_3: ['data_point_1', 'data_point_2', 'data_point_3']
Deque after adding data_point_4: ['data_point_1', 'data_point_2', 'data_point_3', 'data_point_4']
Deque after adding data_point_5: ['data_point_1', 'data_point_2', 'data_point_3', 'data_point_4', 'data_point_5']
Deque after adding data_point_6: ['data_point_2', 'data_point_3', 'data_point_4', 'data_point_5', 'data_point_6']
Deque after adding data_point_7: ['data_point_3', 'data_point_4', 'data_point_5', 'data_point_6', 'data_point_7']
Deque after adding data_point_8: ['data_point_4', 'data_point_5', 'data_point_6', 'data_point_7', 'data_point_8']
Deque after adding data_point_9: ['data_point_5', 'data_point_6', 'data_point_7', 'data_point_8', 'data_point_9']
Deque after adding data_point_10: ['data_point_6', 'data_point_7', 'data_point_8', 'data_point_9', 'data_point_1

In [None]:
from collections import OrderedDict

# Example: Processing data in a specific sequence
# Using OrderedDict to maintain insertion order
ordered_settings = OrderedDict()
ordered_settings['model'] = 'random_forest'
ordered_settings['n_estimators'] = 100
ordered_settings['random_state'] = 42
ordered_settings['max_depth'] = 10

print("OrderedDict (Insertion order preserved):")
for key, value in ordered_settings.items():
    print(f"{key}: {value}")

# Comparing with a standard dictionary (order is also preserved in Python 3.7+,
# but OrderedDict has specific methods and guarantees for older versions or
# when order-based equality/reordering is needed)
print("\nStandard Dictionary (Insertion order preserved in Python 3.7+):")
standard_settings = {}
standard_settings['model'] = 'random_forest'
standard_settings['n_estimators'] = 100
standard_settings['random_state'] = 42
standard_settings['max_depth'] = 10

for key, value in standard_settings.items():
    print(f"{key}: {value}")

# Example of OrderedDict specific method (move_to_end)
print("\nMoving 'model' to the end in OrderedDict:")
ordered_settings.move_to_end('model')
for key, value in ordered_settings.items():
    print(f"{key}: {value}")

# Note: Standard dictionaries do not have a move_to_end method.
# Attempting to move an element in a standard dict would typically require
# rebuilding the dictionary or using a different approach.

OrderedDict (Insertion order preserved):
model: random_forest
n_estimators: 100
random_state: 42
max_depth: 10

Standard Dictionary (Insertion order preserved in Python 3.7+):
model: random_forest
n_estimators: 100
random_state: 42
max_depth: 10

Moving 'model' to the end in OrderedDict:
n_estimators: 100
random_state: 42
max_depth: 10
model: random_forest
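
Two other behaviors set `OrderedDict` apart from a plain `dict`: equality between two `OrderedDict` objects is order-sensitive, and `popitem(last=False)` removes the oldest entry. A short sketch with made-up pipeline step names:

```python
from collections import OrderedDict

# Same keys and values, inserted in a different order
steps_a = OrderedDict([("scale", True), ("pca", False)])
steps_b = OrderedDict([("pca", False), ("scale", True)])

print(steps_a == steps_b)              # False: OrderedDict equality checks order
print(dict(steps_a) == dict(steps_b))  # True: plain dicts ignore order

# popitem(last=False) removes and returns the first (oldest) entry
oldest = steps_a.popitem(last=False)
print(oldest)                          # ('scale', True)
print(list(steps_a.keys()))            # ['pca']
```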


## 1.26 Working with Dates and Times
---
- **Real Python**: [Working with Dates and Times in Python](https://realpython.com/python-datetime/)
- **GeeksforGeeks**: [Python DateTime Module](https://www.geeksforgeeks.org/python-datetime-module/)
- **Programiz**: [Python datetime](https://www.programiz.com/python-programming/datetime)
- **Python.org**: [datetime — Basic date and time types](https://docs.python.org/3/library/datetime.html)
- **PyMOTW**: [datetime – Date and Time Value Manipulation](https://pymotw.com/3/datetime/)

Handling dates and times is a fundamental aspect of working with time-series data in data science. Python's built-in `datetime` module provides classes for manipulating dates, times, and time intervals.

**Importance for Time-Series Data:**

- **Indexing and Filtering:** Time-series data is often indexed by time. You need to filter or select data based on date and time ranges.
- **Resampling and Aggregation:** Aggregating data over different time frequencies (e.g., daily to monthly).
- **Feature Engineering:** Extracting features from date/time information (e.g., day of the week, month, year).
- **Analyzing Trends and Seasonality:** Identifying patterns and cyclic variations over time.

**Key classes in the datetime module:**

- **`datetime`:** Represents a combination of a date and a time.
- **`date`:** Represents a date (year, month, day).
- **`time`:** Represents a time (hour, minute, second, microsecond).
- **`timedelta`:** Represents a duration or difference between two datetime or date objects.

In [None]:
from datetime import datetime

# Get the current date and time
now = datetime.now()
print(f"Current date and time: {now}")

# Create a specific datetime object
specific_dt = datetime(2023, 10, 26, 14, 30, 0)
print(f"Specific datetime: {specific_dt}")

# Access components
print(f"Year: {specific_dt.year}")
print(f"Month: {specific_dt.month}")
print(f"Day: {specific_dt.day}")
print(f"Hour: {specific_dt.hour}")
print(f"Minute: {specific_dt.minute}")
print(f"Second: {specific_dt.second}")

# Format datetime objects as strings (strftime)
formatted_date = now.strftime("%Y-%m-%d %H:%M:%S")
print(f"Formatted date (YYYY-MM-DD HH:MM:SS): {formatted_date}")

# Parse strings into datetime objects (strptime)
date_string = "2024-01-15 09:00:00"
parsed_dt = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
print(f"Parsed datetime from string: {parsed_dt}")

Current date and time: 2025-07-24 19:45:28.730708
Specific datetime: 2023-10-26 14:30:00
Year: 2023
Month: 10
Day: 26
Hour: 14
Minute: 30
Second: 0
Formatted date (YYYY-MM-DD HH:MM:SS): 2025-07-24 19:45:28
Parsed datetime from string: 2024-01-15 09:00:00
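
The objects above are "naive": they carry no time zone information. When timestamps come from different regions, it can help to attach a time zone explicitly. The sketch below uses `datetime.timezone` from the standard library; the +05:30 offset is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta, timezone

# An aware datetime pinned to UTC
utc_dt = datetime(2024, 1, 15, 9, 0, 0, tzinfo=timezone.utc)

# A fixed-offset time zone (UTC+05:30), chosen only for this example
offset_zone = timezone(timedelta(hours=5, minutes=30))
local_dt = utc_dt.astimezone(offset_zone)

print(utc_dt.isoformat())     # 2024-01-15T09:00:00+00:00
print(local_dt.isoformat())   # 2024-01-15T14:30:00+05:30
print(utc_dt == local_dt)     # True: both describe the same instant
print(datetime.now().tzinfo)  # None: datetime.now() is naive by default
```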


In [None]:
from datetime import timedelta, date

# Create timedelta objects
delta_hours = timedelta(hours=3)
delta_days = timedelta(days=7)
delta_weeks = timedelta(weeks=2)

# Perform arithmetic with datetime objects
dt1 = datetime(2023, 10, 26, 14, 30, 0)
dt2 = dt1 + delta_hours
print(f"Adding 3 hours to {dt1}: {dt2}")

dt3 = dt1 - delta_weeks
print(f"Subtracting 2 weeks from {dt1}: {dt3}")

# Calculate difference between two datetime objects (results in timedelta)
time_difference = now - specific_dt
print(f"Time difference between now and specific_dt: {time_difference}")
print(f"Difference in days: {time_difference.days}")
print(f"Difference in seconds: {time_difference.total_seconds()}")

# Perform arithmetic with date objects
d1 = date(2023, 10, 26)
d2 = d1 + delta_days
print(f"Adding 7 days to {d1}: {d2}")

# Calculate difference between two date objects (results in timedelta)
date_difference = date.today() - d1
print(f"Date difference between today and {d1}: {date_difference}")

Adding 3 hours to 2023-10-26 14:30:00: 2023-10-26 17:30:00
Subtracting 2 weeks from 2023-10-26 14:30:00: 2023-10-12 14:30:00
Time difference between now and specific_dt: 637 days, 5:15:28.730708
Difference in days: 637
Difference in seconds: 55055728.730708
Adding 7 days to 2023-10-26: 2023-11-02
Date difference between today and 2023-10-26: 637 days, 0:00:00
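
The feature-engineering use case mentioned earlier can be sketched with plain `datetime` attributes. The helper `date_features` and the sample timestamps are hypothetical; in practice a library such as pandas would often do this on a whole column.

```python
from datetime import datetime

def date_features(dt):
    """Extract simple calendar features from a datetime object."""
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),      # Monday is 0, Sunday is 6
        "is_weekend": dt.weekday() >= 5,  # Saturday or Sunday
        "hour": dt.hour,
    }

# Made-up sample timestamps
timestamps = [
    datetime(2023, 10, 26, 14, 30),
    datetime(2024, 1, 15, 9, 0),
    datetime(2024, 3, 2, 22, 15),
]

features = [date_features(ts) for ts in timestamps]
for row in features:
    print(row)
```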


## 1.27 Regular Expressions
---
- **Real Python**: [Regular Expressions: Regexes in Python](https://realpython.com/regex-python/)
- **GeeksforGeeks**: [Regular Expression in Python](https://www.geeksforgeeks.org/regular-expression-python/)
- **Programiz**: [Python RegEx](https://www.programiz.com/python-programming/regex)
- **Python.org**: [re — Regular expression operations](https://docs.python.org/3/library/re.html)
- **RegexOne**: [Interactive Regular Expression Tutorial](https://regexone.com/)
- **Python-Course.eu**: [Regular Expressions](https://python-course.eu/python-tutorial/re.php)
- **PyMOTW**: [re – Regular Expressions](https://pymotw.com/3/re/)

Regular expressions, often shortened to "regex" or "regexp", are powerful patterns used for matching character combinations in strings. They are a fundamental tool for text processing and pattern matching, and are widely used in data science for tasks such as:

- **Data Cleaning:** Finding and replacing specific patterns (e.g., removing special characters, standardizing formats).
- **Text Extraction:** Pulling out specific information from unstructured text (e.g., email addresses, phone numbers, dates).
- **Text Validation:** Checking if a string conforms to a required format.
- **Feature Engineering in NLP:** Creating features based on patterns in text data.

Python's built-in `re` module provides comprehensive support for regular expressions. Key functions in this module include:

- **`re.search()`:** Scans through a string looking for the first location where the regex pattern produces a match.
- **`re.match()`:** Checks for a match only at the beginning of the string.
- **`re.findall()`:** Finds all non-overlapping matches of the pattern in the string and returns them as a list.
- **`re.sub()`:** Substitutes all occurrences of the pattern in the string with a replacement.

### `re.search()`

The `re.search()` function scans through a string, looking for the **first** location where the regular expression pattern produces a match. If a match is found, it returns a match object; otherwise, it returns None. The match object contains information about the match, such as the matched string and its position.


In [None]:
import re

text = "The quick brown fox jumps over the lazy dog."

# Search for the first occurrence of "fox"
match = re.search(r"fox", text)

if match:
    print(f"Found match: {match.group()}")
    print(f"Start index: {match.start()}")
    print(f"End index: {match.end()}")
else:
    print("Pattern not found.")

# Search for a pattern that doesn't exist
match_fail = re.search(r"cat", text)

if match_fail:
    print(f"Found match: {match_fail.group()}")
else:
    print("Pattern 'cat' not found.")

# Search for a pattern using a character class
match_char_class = re.search(r"[qQ]uick", text) # Matches 'quick' or 'Quick'

if match_char_class:
    print(f"Found match using character class: {match_char_class.group()}")
else:
    print("Pattern 'quick' or 'Quick' not found.")

Found match: fox
Start index: 16
End index: 19
Pattern 'cat' not found.
Found match using character class: quick
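
Match objects become more useful with capturing groups, which pull out parts of the matched text. The following sketch uses named groups (`(?P<name>...)`); the log line and its format are invented for the example.

```python
import re

# Hypothetical log line: date, level, then a free-form message
log_line = "2024-01-15 ERROR Disk usage at 91%"
log_pattern = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.+)"

log_match = re.search(log_pattern, log_line)

if log_match:
    print(log_match.group("date"))     # 2024-01-15
    print(log_match.group("level"))    # ERROR
    print(log_match.groupdict())       # All named groups as a dictionary
```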


### `re.match()`
The `re.match()` function checks for a match **only at the beginning** of the string. If the pattern matches the start of the string, it returns a match object; otherwise, it returns `None`. This is the key difference from `re.search()`, which scans the entire string.

In [None]:
import re

text1 = "Data science is fascinating."
text2 = "Is data science fascinating?"

# Use re.match() - checks only the beginning
match1 = re.match(r"Data", text1)
match2 = re.match(r"Data", text2)

if match1:
    print(f"re.match() found match in text1: {match1.group()}")
else:
    print("re.match() did not find match in text1.") # This won't be printed

if match2:
    print(f"re.match() found match in text2: {match2.group()}")
else:
    print("re.match() did not find match in text2.") # This will be printed

print("-" * 20)

# Use re.search() - scans the entire string
search1 = re.search(r"science", text1)
search2 = re.search(r"science", text2)

if search1:
    print(f"re.search() found match in text1: {search1.group()}")
else:
    print("re.search() did not find match in text1.")

if search2:
    print(f"re.search() found match in text2: {search2.group()}")
else:
    print("re.search() did not find match in text2.")

re.match() found match in text1: Data
re.match() did not find match in text2.
--------------------
re.search() found match in text1: science
re.search() found match in text2: science
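
For validation, where the whole string must fit the pattern, `re.fullmatch()` is often a better fit than `re.match()`, which only anchors the start. A minimal sketch; the helper `looks_like_iso_date` is a made-up name, and it checks the shape of the text only, not whether the date exists on the calendar.

```python
import re

# Compile once so the pattern can be reused
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def looks_like_iso_date(text):
    """Return True if the entire string has the form YYYY-MM-DD."""
    return DATE_PATTERN.fullmatch(text) is not None

print(looks_like_iso_date("2024-01-15"))             # True
print(looks_like_iso_date("2024-01-15 extra"))       # False
print(bool(DATE_PATTERN.match("2024-01-15 extra")))  # True: match() ignores the tail
```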


### `re.findall()`
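The `re.findall()` function finds **all non-overlapping occurrences** of the pattern in the string and returns them as a list. When the pattern has no capturing groups, the list contains the matched strings; with one group it contains that group's text, and with several groups it contains tuples. This is useful when you need to extract multiple pieces of information from a text.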

In [None]:
import re

text = "Emails: alice@example.com, bob.smith@mail.org, invalid-email"

# Find all email addresses (simple pattern)
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
emails_found = re.findall(email_pattern, text)

print(f"Found emails: {emails_found}")

text_numbers = "The numbers are 123, 456, and 789."

# Find all numbers
number_pattern = r'\d+' # Matches one or more digits
numbers_found = re.findall(number_pattern, text_numbers)

print(f"Found numbers: {numbers_found}")

Found emails: ['alice@example.com', 'bob.smith@mail.org']
Found numbers: ['123', '456', '789']
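
When a pattern contains several capturing groups, `re.findall()` returns tuples, and `re.finditer()` yields match objects so that positions are available too. A short sketch on an invented sentence:

```python
import re

order_text = "Order A12 shipped on 2024-01-15; order B7 shipped on 2024-02-03."

# Two capturing groups: findall returns a list of (code, date) tuples
pairs = re.findall(r"([A-Z]\d+) shipped on (\d{4}-\d{2}-\d{2})", order_text)
print(pairs)  # [('A12', '2024-01-15'), ('B7', '2024-02-03')]

# finditer yields match objects, which also expose the position of each match
date_positions = [
    (m.group(), m.start())
    for m in re.finditer(r"\d{4}-\d{2}-\d{2}", order_text)
]
for date_text, start in date_positions:
    print(f"{date_text} starts at index {start}")
```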


### `re.sub()`
The `re.sub()` function substitutes all occurrences of the pattern in the string with a replacement string. This is commonly used for data cleaning and anonymization.

**Syntax:** `re.sub(pattern, repl, string, count=0, flags=0)`

- `pattern:` The regular expression pattern to search for.
- `repl:` The replacement string or a function to generate the replacement string.
- `string:` The input string to perform substitution on.
- `count:` (Optional) The maximum number of pattern occurrences to be replaced. Defaults to 0, meaning replace all occurrences.
- `flags:` (Optional) Flags that modify the behavior of the pattern matching (e.g., `re.IGNORECASE`).

In [None]:
import re

# Original string with simple patterns to replace
text_simple = "Replace all spaces with underscores."
print(f"Original string (simple): {text_simple}")

# Example 1: Replacing a simple pattern (spaces with underscores)
replaced_simple = re.sub(r"\s", "_", text_simple)
print(f"After replacement (simple): {replaced_simple}")

print("-" * 20)

# Original string with more complex patterns and mixed case
text_complex = "Emails: Test@Example.com, Another.Email-123@mail.org, Invalid@.com"
print(f"Original string (complex): {text_complex}")

# Example 2: Replacing a more complex pattern (email addresses with a placeholder)
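# The re.IGNORECASE flag makes matching case-insensitive (redundant here because the
# pattern already lists both cases, but shown to illustrate the flags argument)
email_pattern_complex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'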
replaced_complex = re.sub(email_pattern_complex, "[EMAIL_REMOVED]", text_complex, flags=re.IGNORECASE)
print(f"After replacement (complex): {replaced_complex}")

Original string (simple): Replace all spaces with underscores.
After replacement (simple): Replace_all_spaces_with_underscores.
--------------------
Original string (complex): Emails: Test@Example.com, Another.Email-123@mail.org, Invalid@.com
After replacement (complex): Emails: [EMAIL_REMOVED], [EMAIL_REMOVED], Invalid@.com
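
The `repl` argument can also refer back to captured groups (`\1`, `\2`, ...) or be a function that receives each match object. The sketch below shows both; the sample strings and the helper `mask_digits` are made up for illustration.

```python
import re

# Backreferences: reorder DD/MM/YYYY into YYYY-MM-DD
dates = "Start: 26/10/2023, End: 15/01/2024"
iso_dates = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\2-\1", dates)
print(iso_dates)  # Start: 2023-10-26, End: 2024-01-15

# Function as replacement: keep only the last two digits of long numbers
def mask_digits(match):
    digits = match.group()
    return "*" * (len(digits) - 2) + digits[-2:]

masked = re.sub(r"\d{6,}", mask_digits, "Account 12345678 and 987654")
print(masked)  # Account ******78 and ****54
```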


## 1.28 Generators and Iterators (In-depth)
---
- **Real Python**: [Introduction to Python Generators](https://realpython.com/introduction-to-python-generators/)
- **GeeksforGeeks**: [Generators in Python](https://www.geeksforgeeks.org/generators-in-python/)
- **Python-Course.eu**: [Generators and Iterators](https://python-course.eu/python-tutorial/generators-and-iterators.php)
- **Python.org**: [Yield expressions](https://docs.python.org/3/reference/expressions.html#yield-expressions)

Generators and iterators are powerful concepts in Python that are particularly important in data science for working with large datasets efficiently. They allow you to process data one item at a time, rather than loading the entire dataset into memory, which can be crucial for handling big data or infinite sequences.

- **Iterators:** Objects that represent a stream of data. They implement the iterator protocol, which means they have a `__iter__()` method (returning the iterator object itself) and a `__next__()` method (returning the next item from the stream or raising `StopIteration` when done).
- **Generators:** A simple and concise way to create iterators. They are defined like normal functions but use the `yield` keyword instead of `return`. When a generator function is called, it returns a generator object (which is a type of iterator). The `yield` keyword pauses the function's execution and saves its state, resuming from where it left off when `__next__()` is called again.

**Importance for Memory Efficiency:** In data science, you often deal with datasets that can exceed the available RAM. Generators and iterators enable lazy evaluation, meaning values are generated on demand. This significantly reduces memory usage compared to creating and storing entire lists or other sequences in memory.
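
To make the iterator protocol concrete, here is a minimal hand-written iterator. The class name `Countdown` is hypothetical; it only illustrates how `__iter__()` and `__next__()` cooperate with `for` loops and `next()`.

```python
class Countdown:
    """Iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            # Signals to for loops and list() that the stream has ended
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]

counter = Countdown(2)
print(next(counter))  # 2
print(next(counter))  # 1
```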

### The `yield` Keyword
- **Real Python**: [Python "yield" Keyword Explained](https://realpython.com/python-yield/)

The `yield` keyword is the key difference between a regular function and a generator function. Instead of returning a value and terminating, `yield` pauses the function's execution, sends a value back to the caller, and saves its internal state (including the values of local variables and the position of execution).

When the generator's `__next__()` method is called again, the function resumes from where it last yielded, continuing execution until the next `yield` statement or the function ends (which raises `StopIteration`).

**Simple Generator Example:**

This example demonstrates a generator function that yields a sequence of numbers.

In [None]:
def simple_generator():
    print("Generator started")
    yield 1
    print("Resumed and yielded 2")
    yield 2
    print("Resumed and yielded 3")
    yield 3
    print("Generator finished")

# Create a generator object
gen = simple_generator()

# Iterate through the generator using next()
print(next(gen))
print(next(gen))
print(next(gen))

# Trying to get the next value after the generator is exhausted will raise StopIteration
try:
    next(gen)
except StopIteration:
    print("Caught StopIteration: Generator is exhausted.")

print("\nIterating with a for loop (most common way):")
# Generators are iterators, so they can be used directly in for loops
for value in simple_generator():
    print(value)

Generator started
1
Resumed and yielded 2
2
Resumed and yielded 3
3
Generator finished
Caught StopIteration: Generator is exhausted.

Iterating with a for loop (most common way):
Generator started
1
Resumed and yielded 2
2
Resumed and yielded 3
3
Generator finished
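
Because a generator computes values only on demand, it can describe a sequence with no end. The sketch below pairs a hypothetical `fibonacci` generator with `itertools.islice` to take just the first few values.

```python
from itertools import islice

def fibonacci():
    """Yield Fibonacci numbers forever."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice stops after 10 items, so the infinite loop never runs away
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```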


### Generator Expressions
- **GeeksforGeeks**: [Generator Expressions in Python](https://www.geeksforgeeks.org/generator-expressions/)

Similar to list comprehensions, Python provides a concise syntax for creating simple generators on the fly, known as generator expressions. They use parentheses `()` instead of square brackets `[]` (for list comprehensions) or curly braces `{}` (for set/dictionary comprehensions).

Generator expressions are more memory-efficient than list comprehensions when dealing with large sequences because they don't construct the entire list in memory. They yield values one by one as requested.

**Generator Expression Example:**

This example shows how to create and use a generator expression.

In [None]:
# Generator expression
gen_exp = (x**2 for x in range(10))
print(f"Generator expression object: {gen_exp}")

# Iterate through the generator expression
print("Iterating through generator expression:")
for value in gen_exp:
    print(value)

# Memory efficiency comparison for large ranges
import sys

# Generator expression (low memory)
gen_large = (x**2 for x in range(1000000))
print(f"\nSize of generator expression for 1M numbers: {sys.getsizeof(gen_large)} bytes") # Size of the generator object itself

# List comprehension (high memory)
# Uncommenting the next line might cause memory issues for very large ranges
# list_large = [x**2 for x in range(1000000)]
# print(f"Size of list comprehension for 1M numbers: {sys.getsizeof(list_large)} bytes") # Size of the entire list

Generator expression object: <generator object <genexpr> at 0x7c04658ecba0>
Iterating through generator expression:
0
1
4
9
16
25
36
49
64
81

Size of generator expression for 1M numbers: 208 bytes
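
The list comparison that is commented out in the cell above can be run safely at a smaller scale. The sketch below uses 100,000 elements (an arbitrary size chosen for illustration) and also shows that a generator expression can be consumed only once. Note that `sys.getsizeof` reports the size of the container object only, not of the integers it refers to, so the real difference is larger than the numbers printed.

```python
import sys

n = 100_000  # illustrative size, small enough for any machine

squares_gen = (x**2 for x in range(n))
squares_list = [x**2 for x in range(n)]

gen_size = sys.getsizeof(squares_gen)
list_size = sys.getsizeof(squares_list)
print(f"Generator object: {gen_size} bytes")
print(f"List object:      {list_size} bytes")

# A generator expression can be handed straight to a consuming function
first_total = sum(squares_gen)

# Once exhausted, the same generator yields nothing more
second_total = sum(squares_gen)
print(f"First pass: {first_total}, second pass: {second_total}")
```

If the values are needed more than once, either rebuild the generator expression or store the results in a list.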


### Benefits of Generators and Use Cases in Data Science
- **DataCamp**: [Python Generators Tutorial](https://www.datacamp.com/tutorial/python-generators)
- **Towards Data Science**: [Python Generators for Data Scientists](https://towardsdatascience.com/python-generators-for-data-scientists-bdee9a0374a5)

The primary benefits of using generators are:

- **Memory Efficiency:** As demonstrated, they consume significantly less memory for large or infinite sequences compared to creating full lists or tuples.
- **Lazy Evaluation:** Values are computed and yielded only when requested. This is useful for processing large files line by line, working with potentially infinite data streams, or when the cost of computing each item is high.
- **Improved Performance (sometimes):** In scenarios where you don't need all elements at once, lazy evaluation can save computation time and resources.

**Common Data Science Use Cases:**

- **Reading Large Files:** Processing large CSV, text, or log files line by line without loading the entire file into memory.
- **Processing Data Streams:** Handling continuous streams of data where the total size is unknown or very large.
- **Generating Data Batches:** Creating batches of data for training machine learning models (e.g., image data, text data) on the fly, especially with limited memory.
- **Implementing Infinite Sequences:** Creating generators for sequences that conceptually have no end (e.g., a sequence of prime numbers).
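
As a minimal sketch of the infinite-sequence use case, the generator below yields prime numbers without end, and `itertools.islice` takes only as many as are needed. The function name `primes` and the trial-division approach are illustrative choices, not an efficient prime sieve.

```python
from itertools import islice

def primes():
    """Yield prime numbers indefinitely using simple trial division."""
    found = []      # primes discovered so far
    candidate = 2
    while True:
        # candidate is prime if no earlier prime divides it
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

# islice stops the otherwise endless generator after 10 values
first_ten = list(islice(primes(), 10))
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```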

**The Iterator Protocol Revisited**

As mentioned earlier, generators are a simple way to create iterators. When you define a generator function using `yield`, Python automatically handles the implementation of the iterator protocol (`__iter__()` and `__next__()`) for the generator object it returns. The `__iter__()` method of a generator object simply returns the generator object itself, and the `__next__()` method is implicitly called each time you use `next()` or iterate with a for loop, executing the generator function until the next `yield` or the function ends.
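
To make the protocol concrete, the sketch below writes the same countdown twice: once as a class that implements `__iter__()` and `__next__()` by hand, and once as a generator function. The names `CountDown` and `count_down` are made up for this illustration.

```python
class CountDown:
    """Hand-written iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def count_down(start):
    """Generator version: Python supplies __iter__ and __next__."""
    while start > 0:
        yield start
        start -= 1


class_result = list(CountDown(3))

gen = count_down(3)
gen_is_own_iterator = iter(gen) is gen  # __iter__ returns the generator itself
gen_result = list(gen)

print(class_result, gen_result, gen_is_own_iterator)
```

Both versions produce the same values; the generator simply needs far less boilerplate.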

In [None]:
# More practical example: Reading a large file line by line using a generator

def read_large_file(filepath):
    """
    A generator function to read a large file line by line.
    Yields each line of the file.
    """
    print(f"Opening file: {filepath}")
    try:
        with open(filepath, 'r') as f:
            for line in f:
                # You could perform some processing on the line here before yielding
                yield line.strip() # Yield the stripped line
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
    finally:
        print(f"Finished reading or file not found: {filepath}")

# Create a dummy large file for demonstration
dummy_file_content = "\n".join([f"This is line {i+1}" for i in range(1000)])
with open("large_dummy_file.txt", "w") as f:
    f.write(dummy_file_content)

# Use the generator to process the file line by line
print("\nProcessing large dummy file with generator:")
line_count = 0
for line in read_large_file("large_dummy_file.txt"):
    # Process the line (e.g., print, count, analyze)
    # print(line) # Uncomment to see each line being processed
    line_count += 1
    if line_count < 5 or line_count > 995: # Print first few and last few lines processed
        print(f"Processed line {line_count}: {line}")
    elif line_count == 5:
        print("...")


print(f"\nTotal lines processed: {line_count}")

# Example with a non-existent file to show error handling within generator
print("\nTesting generator with non-existent file:")
for line in read_large_file("non_existent_file.txt"):
    # This loop body won't execute if the file is not found
    pass

# Clean up the dummy file
import os
os.remove("large_dummy_file.txt")
print("\nDummy file removed.")


Processing large dummy file with generator:
Opening file: large_dummy_file.txt
Processed line 1: This is line 1
Processed line 2: This is line 2
Processed line 3: This is line 3
Processed line 4: This is line 4
...
Processed line 996: This is line 996
Processed line 997: This is line 997
Processed line 998: This is line 998
Processed line 999: This is line 999
Processed line 1000: This is line 1000
Finished reading or file not found: large_dummy_file.txt

Total lines processed: 1000

Testing generator with non-existent file:
Opening file: non_existent_file.txt
Error: File not found at non_existent_file.txt
Finished reading or file not found: non_existent_file.txt

Dummy file removed.


## 1.29 Decorators and Generators
---
- **Real Python**: [Primer on Python Decorators](https://realpython.com/primer-on-python-decorators/)
- **GeeksforGeeks**: [Decorators in Python](https://www.geeksforgeeks.org/decorators-in-python/)
- **Python-Course.eu**: [Decorators](https://python-course.eu/python-tutorial/decorators.php)
- **Programiz**: [Python Decorators](https://www.programiz.com/python-programming/decorator)

Decorators and generators are advanced Python concepts that can significantly improve code organization, reusability, and efficiency, particularly in data science and machine learning workflows.

**Decorators**

Decorators are a powerful and flexible feature in Python that allow you to modify or enhance the behavior of a function or method. They are essentially functions that take another function as an argument, add some kind of functionality, and then return another function (or the original function modified).

To apply a decorator, write `@decorator_name` on the line directly above the function definition.

**How Decorators Work:**

When you use the `@decorator_name` syntax, it's equivalent to writing:
```Python
def my_function():
    pass

my_function = decorator_name(my_function)
```
The `decorator_name` function is called with `my_function` as its argument, and the returned function is then assigned back to the name `my_function`. This returned function is often a "wrapper" function that executes the additional logic before or after calling the original function.

**Common Use Cases in Data Science:**

- **Logging:** Recording function calls, arguments, and return values.
- **Access Control/Permissions:** Restricting access to certain functions based on user roles.
- **Memoization/Caching:** Storing the results of expensive function calls and returning the cached result when the same inputs occur again.
- **Timing:** Measuring the execution time of a function.
- **Input Validation:** Checking if function arguments meet certain criteria.
- **Retries:** Automatically retrying a function call if it fails.
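
For the memoization use case, the standard library already ships a ready-made decorator, `functools.lru_cache`. The sketch below applies it to a deliberately naive recursive function; `fibonacci` is just an illustrative stand-in for an expensive computation.

```python
import functools

@functools.lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def fibonacci(n):
    """Naive recursive Fibonacci; fast only because results are cached."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

value = fibonacci(30)
info = fibonacci.cache_info()  # hit/miss statistics provided by lru_cache

print(value)  # 832040
print(info)
```

Without the decorator the same call would recompute the same sub-results many times over; with it, each distinct argument is computed once.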

In [None]:
def simple_decorator(func):
    """A simple decorator that prints messages before and after function execution."""
    def wrapper():
        print("Decorator: Before calling the function.")
        func()
        print("Decorator: After calling the function.")
    return wrapper

@simple_decorator
def say_hello():
    """A simple function to be decorated."""
    print("Hello from the decorated function!")

# Call the decorated function
print("Calling the decorated function:")
say_hello()

print("\nLet's look at the function name after decoration:")
print(f"Function name: {say_hello.__name__}") # Note: Use functools.wraps for proper function metadata
print(f"Function docstring: {say_hello.__doc__}") # Note: Docstring is also lost without functools.wraps

Calling the decorated function:
Decorator: Before calling the function.
Hello from the decorated function!
Decorator: After calling the function.

Let's look at the function name after decoration:
Function name: wrapper
Function docstring: None


### Generators
As briefly introduced earlier (and covered more in-depth in section 1.28), generators are a memory-efficient way to create iterators. They are defined using functions and the `yield` keyword.

**How Generators Differ from Regular Functions:**

- **`yield` vs `return`**: Regular functions use `return` to send back a value and terminate execution. Generator functions use `yield` to send back a value and pause execution, saving their internal state.
- **State Preservation**: When a generator yields, it remembers where it left off. The next time `__next__()` is called (either explicitly or implicitly by a loop), it resumes execution from that point.
- **Lazy Evaluation**: Generators produce values one at a time as they are requested. They do not compute and store all values in memory upfront.

**Memory Efficiency:**

This lazy evaluation is the key to their memory efficiency. For very large datasets or potentially infinite sequences, generators allow you to process the data in chunks or item by item, consuming significantly less memory than loading everything into a list or other collection.

**Use Cases with Large Datasets:**

- **Processing large files:** Reading and processing data from files line by line or in chunks.
- **Creating data pipelines:** Building a sequence of operations where data is passed from one step to the next without intermediate storage of the entire dataset.
- **Generating data batches:** Providing data in small batches for training machine learning models, especially when the full dataset doesn't fit in memory.
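
The pipeline idea can be sketched with three small generators chained together. Each stage pulls one item at a time from the stage before it, so no intermediate list is built. The stage names and the sample input are hypothetical.

```python
def clean_lines(lines):
    """Stage 1: strip whitespace and drop empty lines."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped

def to_numbers(records):
    """Stage 2: convert each text record to a float."""
    for record in records:
        yield float(record)

def squared(numbers):
    """Stage 3: square each number."""
    for number in numbers:
        yield number ** 2

raw_lines = ["1\n", " 2 \n", "\n", "3\n"]  # stand-in for lines read from a file

# Nothing runs until sum() starts requesting values from the last stage
pipeline = squared(to_numbers(clean_lines(raw_lines)))
total = sum(pipeline)
print(total)  # 14.0
```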

In [None]:
def count_up_to(n):
    """A simple generator that yields numbers from 1 to n."""
    i = 1
    while i <= n:
        yield i
        i += 1

# Create a generator object
counter_gen = count_up_to(5)

# Iterate through the generator using a for loop
print("Counting up to 5 using a generator:")
for number in counter_gen:
    print(number)

# Generators can also be iterated using next() (though less common for simple cases)
# Resetting the generator requires calling the function again
counter_gen_manual = count_up_to(3)
print("\nCounting up to 3 using next():")
print(next(counter_gen_manual))
print(next(counter_gen_manual))
print(next(counter_gen_manual))

# The next call would raise StopIteration
try:
    next(counter_gen_manual)
except StopIteration:
    print("Generator manually exhausted.")

Counting up to 5 using a generator:
1
2
3
4
5

Counting up to 3 using next():
1
2
3
Generator manually exhausted.


### Applications in Data Science
- **Towards Data Science**: [Python Decorators for Data Science](https://towardsdatascience.com/python-decorators-for-data-science-6913f717669a)


Decorators and generators are not just theoretical concepts; they have practical and valuable applications in real-world data science projects.

**Decorators in Data Science:**

- **Timing Function Execution:** Use a decorator to wrap functions that perform computationally expensive tasks (like model training or complex data transformations) to measure exactly how long they take to run. This is vital for performance optimization.
- **Caching Results (Memoization):** Apply a decorator to functions that have the same output for the same input and are called multiple times. This can save significant computation time by returning cached results instead of re-calculating them.
- **Input Validation:** Create decorators to automatically check if the arguments passed to a data processing or model function are of the correct type, format, or within a valid range, preventing errors early on.
- **Resource Management:** Decorators can be used to ensure resources (like database connections or file handles) are properly opened before a function runs and closed afterward.
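
As a minimal sketch of the input-validation idea, the decorator below rejects empty input before the wrapped function runs. The names `require_non_empty` and `mean_of` are invented for this example, and a real project would likely check more than emptiness.

```python
import functools

def require_non_empty(func):
    """Raise ValueError if the first argument is an empty sequence."""
    @functools.wraps(func)
    def wrapper(values, *args, **kwargs):
        if len(values) == 0:
            raise ValueError(f"{func.__name__} requires at least one value")
        return func(values, *args, **kwargs)
    return wrapper

@require_non_empty
def mean_of(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

average = mean_of([1, 2, 3])
print(average)  # 2.0

try:
    mean_of([])
except ValueError as error:
    message = str(error)
    print(f"Rejected: {message}")
```

Keeping the check in a decorator means the same rule can be reused across several functions without repeating the `if` statement in each one.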

**Generators in Data Science:**

- **Processing Large Files:** As demonstrated, generators are ideal for reading and processing large datasets from files line by line or in chunks, preventing memory overload.
- **Creating Data Pipelines:** Link multiple data processing steps together using generators. The output of one generator becomes the input for the next, creating a memory-efficient pipeline where data flows through the steps without being fully materialized at each stage.
- **Generating Data Batches:** When training machine learning models, especially deep learning models, it's common to train on small batches of data. Generators can be used to yield these batches on demand, which is essential for large datasets that don't fit into RAM.
- **Simulating Data Streams:** Generators can simulate real-time data streams for testing or online learning scenarios.
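
Batch generation can be sketched in a few lines with `itertools.islice`, which works on any iterable, including another generator. The helper name `iter_batches` is an assumption made for this example; it is not a library function.

```python
from itertools import islice

def iter_batches(iterable, batch_size):
    """Yield lists of up to batch_size items until the input is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted, end the generator
        yield batch

samples = (i * i for i in range(10))  # stand-in for a stream of training samples
batches = list(iter_batches(samples, batch_size=4))
print(batches)  # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

The final batch may be shorter than `batch_size`, which training loops usually need to handle explicitly.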

In [None]:
import time
import functools # Import functools to preserve original function metadata

def timing_decorator(func):
    """A decorator that measures the execution time of a function."""
    @functools.wraps(func) # Use functools.wraps to preserve original function name, docstring, etc.
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # perf_counter is a monotonic clock intended for timing intervals
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        duration = end_time - start_time
        print(f"Function '{func.__name__}' took {duration:.4f} seconds to execute.")
        return result
    return wrapper

# Example data science function to time
@timing_decorator
def perform_complex_calculation(n):
    """Performs a simple but time-consuming calculation."""
    total = 0
    for i in range(n):
        total += i**2
    return total

@timing_decorator
def simulate_data_loading(file_size_mb):
    """Simulates loading a file by pausing execution."""
    print(f"Simulating loading a {file_size_mb}MB file...")
    time.sleep(file_size_mb * 0.01) # Simulate delay based on file size
    print("Data loading simulated.")
    return f"Loaded {file_size_mb}MB"

# Use the decorated functions
print("Running complex calculation:")
result_calc = perform_complex_calculation(1000000)
print(f"Calculation result: {result_calc}")

print("\nRunning data loading simulation:")
result_load = simulate_data_loading(50)
print(f"Loading result: {result_load}")

print("\nLet's look at the function name and docstring after decoration with functools.wraps:")
print(f"Function name: {perform_complex_calculation.__name__}")
print(f"Function docstring: {perform_complex_calculation.__doc__}")

Running complex calculation:
Function 'perform_complex_calculation' took 0.2108 seconds to execute.
Calculation result: 333332833333500000

Running data loading simulation:
Simulating loading a 50MB file...
Data loading simulated.
Function 'simulate_data_loading' took 0.5005 seconds to execute.
Loading result: Loaded 50MB

Let's look at the function name and docstring after decoration with functools.wraps:
Function name: perform_complex_calculation
Function docstring: Performs a simple but time-consuming calculation.


In [None]:
import os # Import os for file cleanup

# The earlier cleanup step removed 'large_dummy_file.txt',
# so create a larger dummy file here if it does not already exist
dummy_file_path = "large_dummy_file.txt"
if not os.path.exists(dummy_file_path):
    dummy_file_content = "\n".join([f"This is line {i+1}, some data point {i*10}" for i in range(50000)])
    with open(dummy_file_path, "w") as f:
        f.write(dummy_file_content)
    print(f"Created dummy file: {dummy_file_path}")


def data_chunk_generator(filepath, chunk_size=1000):
    """
    A generator function to read and yield data from a file in chunks.
    Assumes each line is a data record.
    """
    print(f"\nStarting to read file in chunks: {filepath}")
    chunk = []
    try:
        with open(filepath, 'r') as f:
            for line in f:
                # Basic processing: split line into parts (e.g., simulating CSV)
                data_point = line.strip().split(', ')
                chunk.append(data_point)
                if len(chunk) == chunk_size:
                    print(f"Yielding a chunk of {chunk_size} data points.")
                    yield chunk
                    chunk = [] # Reset chunk after yielding
            # Yield any remaining data in the last chunk
            if chunk:
                print(f"Yielding the final chunk of {len(chunk)} data points.")
                yield chunk
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
    except Exception as e:
        print(f"An error occurred while reading the file: {e}")
    finally:
        print(f"Finished reading file: {filepath}")


# Use the generator to process the large file in chunks
print("Processing large file in chunks:")
total_processed_lines = 0
for data_chunk in data_chunk_generator(dummy_file_path, chunk_size=5000):
    # Process each chunk (e.g., perform analysis, train on batch)
    print(f"Received a chunk with {len(data_chunk)} lines.")
    total_processed_lines += len(data_chunk)
    # Example processing: print the first data point in the chunk
    if data_chunk:
        print(f"  First data point in chunk: {data_chunk[0]}")

print(f"\nTotal lines processed across all chunks: {total_processed_lines}")

# Clean up the dummy file
if os.path.exists(dummy_file_path):
    os.remove(dummy_file_path)
    print("Dummy file removed.")

Created dummy file: large_dummy_file.txt
Processing large file in chunks:

Starting to read file in chunks: large_dummy_file.txt
Yielding a chunk of 5000 data points.
Received a chunk with 5000 lines.
  First data point in chunk: ['This is line 1', 'some data point 0']
Yielding a chunk of 5000 data points.
Received a chunk with 5000 lines.
  First data point in chunk: ['This is line 5001', 'some data point 50000']
Yielding a chunk of 5000 data points.
Received a chunk with 5000 lines.
  First data point in chunk: ['This is line 10001', 'some data point 100000']
Yielding a chunk of 5000 data points.
Received a chunk with 5000 lines.
  First data point in chunk: ['This is line 15001', 'some data point 150000']
Yielding a chunk of 5000 data points.
Received a chunk with 5000 lines.
  First data point in chunk: ['This is line 20001', 'some data point 200000']
Yielding a chunk of 5000 data points.
Received a chunk with 5000 lines.
  First data point in chunk: ['This is line 25001', 'some da

## 1.30 Parallelism and Concurrency
---
- **Real Python**: [Speed Up Your Python Program With Concurrency](https://realpython.com/python-concurrency/)
- **GeeksforGeeks**: [Multithreading in Python](https://www.geeksforgeeks.org/multithreading-python-set-1/)
- **Python.org**: [concurrent.futures — Launching parallel tasks](https://docs.python.org/3/library/concurrent.futures.html)
- **Real Python**: [Python Parallel Processing](https://realpython.com/python-parallel-processing/)
- **Python-Course.eu**: [Parallel Processing](https://python-course.eu/numerical-programming/parallel-processing.php)
- **DataCamp**: [Parallel Programming with Python](https://www.datacamp.com/tutorial/parallel-programming-with-python)

In data science and computationally intensive tasks, leveraging multiple CPU cores or efficiently managing I/O operations can significantly speed up computations. Parallelism and concurrency are two related but distinct concepts for achieving this.

- **Concurrency:** Deals with managing multiple tasks whose execution overlaps in time, even though they may not be running at the exact same instant (e.g., switching between tasks during I/O waits). It's about structuring code to handle multiple operations that are in progress at the same time.
- **Parallelism:** Deals with running multiple tasks truly simultaneously on multiple processors or cores. It's about achieving speedup by performing computations in parallel.

**Benefits for Speeding Up Computations:**

- **Reduced Execution Time:** For tasks that can be broken down into independent sub-tasks, running them in parallel can drastically reduce the total time required.
- **Improved Responsiveness:** For applications involving I/O (like reading files or making network requests), concurrency allows the program to continue doing other work while waiting for I/O operations to complete, preventing the program from freezing.

**CPU-bound vs. I/O-bound Tasks:**

- **CPU-bound tasks:** Spend most of their time performing computations (e.g., complex mathematical calculations, matrix operations, model training). These benefit most from **parallelism** using multiple CPU cores.
- **I/O-bound tasks:** Spend most of their time waiting for input/output operations (e.g., reading/writing files, network requests, database queries). These benefit most from **concurrency** using threading or asynchronous programming.

**Multiprocessing**

Python's `multiprocessing` module is used for creating and managing processes. Each process runs its own Python interpreter with its own independent memory space, so processes are not held back by a shared Global Interpreter Lock (GIL). This makes multiprocessing well-suited for **CPU-bound** tasks where you want to utilize multiple CPU cores to perform computations in parallel.

**Threading**

Python's `threading` module is used for creating and managing threads within a single process. Threads share the same memory space. While threads can provide concurrency for **I/O-bound tasks** (as the thread can switch to another task while waiting for I/O), they are less effective for speeding up **CPU-bound tasks** in CPython due to the **Global Interpreter Lock (GIL)**.

**Global Interpreter Lock (GIL):** The GIL is a mutex (a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time in a single process. As a result, even on a multi-core processor, only one thread in a standard CPython process executes Python bytecode at any given moment, so CPU-bound code gains little from threads. The GIL is released during blocking I/O operations, which is why threading can still be beneficial for I/O-bound tasks.


### **Conceptual Example 1: Parallel Data Processing**

Imagine you have a large dataset split into several smaller files, and you need to perform the same independent analysis on each file. This is a scenario where parallelism could be beneficial.

In [4]:
# Conceptual code - not actual multiprocessing implementation here

def analyze_data_file(filepath):
    """
    Conceptual function to simulate data analysis on a single file.
    This would likely involve reading, processing, and analyzing data,
    which could be CPU-bound.
    """
    print(f"Analyzing file: {filepath}")
    # Simulate some work
    import time
    time.sleep(1) # Simulate computation time
    print(f"Finished analyzing: {filepath}")
    return f"Results from {filepath}"

# List of independent data files to process
data_files = [
    "data_part_1.csv",
    "data_part_2.csv",
    "data_part_3.csv",
    "data_part_4.csv",
]

print("Starting sequential processing...")
# Sequential processing (without parallelism)
results_sequential = []
for file in data_files:
    result = analyze_data_file(file)
    results_sequential.append(result)
print("Sequential processing finished.")

# --- How Multiprocessing COULD be applied here (conceptual) ---
# import multiprocessing
#
# print("Starting parallel processing with multiprocessing...")
# # Using a Pool of worker processes
# with multiprocessing.Pool(processes=4) as pool: # Use up to 4 cores
#     # Map the analyze_data_file function to the list of files
#     results_parallel = pool.map(analyze_data_file, data_files)
# print("Parallel processing finished.")
# --------------------------------------------------------------

# In the conceptual multiprocessing example, each analyze_data_file call
# would run in a separate process on a different CPU core, potentially
# reducing the total execution time compared to the sequential approach,
# assuming the task is CPU-bound.

Starting sequential processing...
Analyzing file: data_part_1.csv
Finished analyzing: data_part_1.csv
Analyzing file: data_part_2.csv
Finished analyzing: data_part_2.csv
Analyzing file: data_part_3.csv
Finished analyzing: data_part_3.csv
Analyzing file: data_part_4.csv
Finished analyzing: data_part_4.csv
Sequential processing finished.


### **Conceptual Example 2: Concurrent I/O Operations**

Imagine you need to download data from several different URLs. While one download is waiting for data from the network, another download can start or continue. This is an I/O-bound scenario where concurrency could be beneficial.

In [7]:
# Conceptual code - not actual threading implementation here

def download_data_from_url(url):
    """
    Conceptual function to simulate downloading data from a URL.
    This would likely involve waiting for network responses, which is I/O-bound.
    """
    print(f"Starting download from: {url}")
    # Simulate waiting for network data
    import time
    import random
    wait_time = random.uniform(0.5, 2.0) # Simulate variable network latency
    time.sleep(wait_time)
    print(f"Finished download from: {url}")
    return f"Data from {url}"

# List of URLs to download data from
data_urls = [
    "http://example.com/data1",
    "http://example.com/data2",
    "http://example.com/data3",
]

print("Starting sequential download...")
# Sequential download (without concurrency)
results_sequential_io = []
for url in data_urls:
    result = download_data_from_url(url)
    results_sequential_io.append(result)
print("Sequential download finished.")

# --- How Threading COULD be applied here (conceptual) ---
# import threading
#
# print("Starting concurrent download with threading...")
# # List to store results (need to be careful with shared data in threading)
# results_concurrent_io = []
# # Lock for safely updating the shared results list
# results_lock = threading.Lock()
#
# def download_and_store(url, results_list, lock):
#     result = download_data_from_url(url)
#     with lock:
#         results_list.append(result)
#
# threads = []
# for url in data_urls:
#     thread = threading.Thread(target=download_and_store, args=(url, results_concurrent_io, results_lock))
#     threads.append(thread)
#     thread.start()
#
# # Wait for all threads to complete
# for thread in threads:
#     thread.join()
#
# print("Concurrent download finished.")
# --------------------------------------------------------------

# In the conceptual threading example, while one thread is waiting for a download
# to complete (I/O), the GIL is released, allowing other threads to run and start
# their downloads. This can lead to a shorter total execution time compared to
# sequential processing for I/O-bound tasks.

Starting sequential download...
Starting download from: http://example.com/data1
Finished download from: http://example.com/data1
Starting download from: http://example.com/data2
Finished download from: http://example.com/data2
Starting download from: http://example.com/data3
Finished download from: http://example.com/data3
Sequential download finished.
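
A runnable counterpart to the commented-out threading code can be sketched with `concurrent.futures.ThreadPoolExecutor`, which manages the threads and collects the results without an explicit lock. The function `fake_download` and the fixed 0.2 second delay are assumptions made for this illustration; no real network request is made.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    """Stand-in for a network call: sleeps briefly instead of doing real I/O."""
    time.sleep(0.2)
    return f"Data from {url}"

urls = [
    "http://example.com/data1",
    "http://example.com/data2",
    "http://example.com/data3",
]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map returns results in the same order as the input URLs
    results = list(executor.map(fake_download, urls))
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed: {elapsed:.2f} s")
```

Because the three simulated waits overlap, the elapsed time should be close to one delay rather than the sum of all three, although the exact figure depends on the machine.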


## 1.31 Practical Examples
---
- **Real Python**: [Python Practice Problems](https://realpython.com/python-practice-problems/)
- **GeeksforGeeks**: [Python Programming Examples](https://www.geeksforgeeks.org/python-programming-examples/)
- **HackerRank**: [Python Domain](https://www.hackerrank.com/domains/python)
- **LeetCode**: [Python Problems](https://leetcode.com/problemset/all/)

This section provides practical examples that combine multiple Python concepts covered in the previous sections. These examples demonstrate how these fundamental building blocks can be integrated to solve common data science and data processing tasks.

### Example 1: Processing Student Scores
- **GeeksforGeeks**: [Python Programs for Student Data](https://www.geeksforgeeks.org/python-program-to-find-student-with-highest-marks/)


In [None]:
import csv
import os

# Example 1: Processing student data from a CSV file

# Create a dummy CSV file for demonstration
dummy_csv_content = """name,subject,score
Alice,Math,85
Bob,Physics,92
Alice,Chemistry,78
Bob,Math,90
Charlie,History,65
Alice,Physics,88
"""
dummy_csv_path = "student_scores.csv"
with open(dummy_csv_path, "w", newline="") as f:
    f.write(dummy_csv_content)

print("--- Example 1: Processing Student Scores ---")

student_data = []
try:
    # Read data from the dummy CSV file
    with open(dummy_csv_path, 'r', newline='') as csvfile:  # newline='' is recommended by the csv module docs
        csv_reader = csv.reader(csvfile)
        header = next(csv_reader) # Skip header row
        for row in csv_reader:
            # Basic validation and type conversion
            if len(row) == 3:
                try:
                    name, subject, score_str = row
                    score = int(score_str)
                    student_data.append({'name': name, 'subject': subject, 'score': score})
                except ValueError:
                    print(f"Skipping row due to invalid score: {row}")
            else:
                print(f"Skipping invalid row format: {row}")

except FileNotFoundError:
    print(f"Error: The file '{dummy_csv_path}' was not found.")
except Exception as e:
    print(f"An unexpected error occurred while reading the file: {e}")


if student_data:
    print("\nRaw student data loaded:")
    print(student_data)

    # Use list comprehension to filter data (e.g., scores of 80 or above)
    high_scores = [s for s in student_data if s['score'] >= 80]
    print("\nStudents with scores >= 80:")
    print(high_scores)

    # Use dictionary comprehension to create a dictionary of average scores per student
    # Group scores by student first using defaultdict (or manual grouping)
    from collections import defaultdict
    scores_by_student = defaultdict(list)
    for record in student_data:
        scores_by_student[record['name']].append(record['score'])

    # Calculate each student's average with sum() and len() inside a dictionary comprehension
    average_scores = {
        name: sum(scores) / len(scores)
        for name, scores in scores_by_student.items()
    }
    print("\nAverage scores per student:")
    print(average_scores)

    # Example using map and lambda to apply a function to filtered data
    # Get just the names of students with high scores
    high_achievers_names = list(map(lambda s: s['name'], high_scores))
    print("\nNames of high achievers:")
    print(high_achievers_names)

else:
    print("\nNo student data was loaded to process.")

# Clean up the dummy CSV file
if os.path.exists(dummy_csv_path):
    os.remove(dummy_csv_path)
    print(f"\nCleaned up dummy file: {dummy_csv_path}")

--- Example 1: Processing Student Scores ---

Raw student data loaded:
[{'name': 'Alice', 'subject': 'Math', 'score': 85}, {'name': 'Bob', 'subject': 'Physics', 'score': 92}, {'name': 'Alice', 'subject': 'Chemistry', 'score': 78}, {'name': 'Bob', 'subject': 'Math', 'score': 90}, {'name': 'Charlie', 'subject': 'History', 'score': 65}, {'name': 'Alice', 'subject': 'Physics', 'score': 88}]

Students with scores >= 80:
[{'name': 'Alice', 'subject': 'Math', 'score': 85}, {'name': 'Bob', 'subject': 'Physics', 'score': 92}, {'name': 'Bob', 'subject': 'Math', 'score': 90}, {'name': 'Alice', 'subject': 'Physics', 'score': 88}]

Average scores per student:
{'Alice': 83.66666666666667, 'Bob': 91.0, 'Charlie': 65.0}

Names of high achievers:
['Alice', 'Bob', 'Bob', 'Alice']

Cleaned up dummy file: student_scores.csv
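
As an alternative to unpacking each row by position, the reading step above could be sketched with `csv.DictReader`, which uses the header row as dictionary keys. The in-memory `io.StringIO` input below is a shortened stand-in for the file and is used only to keep the example self-contained.

```python
import csv
import io

csv_text = "name,subject,score\nAlice,Math,85\nBob,Physics,92\n"

# DictReader consumes the header row and maps each later row to a dict
reader = csv.DictReader(io.StringIO(csv_text))

# Copy each row and convert the score from text to an integer
records = [{**row, "score": int(row["score"])} for row in reader]
print(records)
```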


### Example 2: Analyzing Log Data using Generators and Collections
- **Real Python**: [Working with Large Datasets in Python](https://realpython.com/working-with-large-excel-files-in-pandas/)
- **Towards Data Science**: [Log Analysis with Python](https://towardsdatascience.com/log-analysis-with-python-b74a86c0a97)

In [None]:
# Example 2: Analyzing Log Data using Generators and Collections

import os
import random
from collections import Counter

# Create a dummy large log file for demonstration (simulating different event types)
dummy_log_path = "dummy_log.txt"
event_types = ["INFO", "WARNING", "ERROR", "DEBUG"]
log_content = "\n".join([
    f"[{i+1}] {random.choice(event_types)}: This is a log message."
    for i in range(50000) # Simulate a large file with 50,000 lines
])

with open(dummy_log_path, "w") as f:
    f.write(log_content)

print("--- Example 2: Analyzing Log Data ---")

def log_line_generator(filepath):
    """
    A generator function to read log file line by line.
    Includes basic error handling for file not found.
    """
    print(f"Processing log file: {filepath}...")
    try:
        with open(filepath, 'r') as f:
            for line in f:
                yield line.strip() # Yield each line, stripped of whitespace
    except FileNotFoundError:
        print(f"Error: Log file not found at {filepath}")
    except Exception as e:
        print(f"An error occurred while reading the log file: {e}")
    print(f"Finished processing log file: {filepath}.")


# Use the generator to read lines and Counter to count event types
event_counts = Counter()
total_processed_lines = 0

# Iterate over the log lines using the generator
for line in log_line_generator(dummy_log_path):
    # Simple parsing to extract event type
    if ":" in line:
        try:
            # Assuming format like "[ID] EVENT_TYPE: message"
            parts = line.split(":", 1) # Split only on the first colon
            event_part = parts[0].split("] ")[1] # Extract the part after "] "
            event_type = event_part.strip()
            if event_type in event_types: # Only count predefined event types
                event_counts[event_type] += 1
            else:
                # Handle lines that don't match expected event types format
                # print(f"Skipping line with unexpected event type format: {line}")
                pass # Or count as 'UNKNOWN' event_counts['UNKNOWN'] += 1

            total_processed_lines += 1
        except IndexError:
            # Handle lines that don't have the expected "[ID] EVENT_TYPE: " format
            # print(f"Skipping line with unexpected format: {line}")
            pass # Or count as 'MALFORMED' event_counts['MALFORMED'] += 1
        except Exception as e:
            # Report the problem and continue processing other lines
            print(f"An error occurred while processing line: {line} - {e}")
    else:
        # Handle lines without a colon (e.g., empty lines)
        # print(f"Skipping line without colon: {line}")
        pass # Or count as 'OTHER' event_counts['OTHER'] += 1


if total_processed_lines > 0:
    print(f"\nTotal lines processed: {total_processed_lines}")
    print("\nEvent type counts:")
    print(event_counts)

    print("\nMost common event types:")
    print(event_counts.most_common(2)) # Get the 2 most common types

else:
    print("\nNo log data was processed.")


# Test generator with a non-existent file
print("\nTesting log generator with non-existent file:")
for line in log_line_generator("non_existent_log.txt"):
    # This loop body won't execute if the file is not found
    pass


# Clean up the dummy log file
if os.path.exists(dummy_log_path):
    os.remove(dummy_log_path)
    print(f"\nCleaned up dummy log file: {dummy_log_path}")

--- Example 2: Analyzing Log Data ---
Processing log file: dummy_log.txt...
Finished processing log file: dummy_log.txt.

Total lines processed: 50000

Event type counts:

Most common event types:

Testing log generator with non-existent file:
Processing log file: non_existent_log.txt...
Error: Log file not found at non_existent_log.txt
Finished processing log file: non_existent_log.txt.

Cleaned up dummy log file: dummy_log.txt
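
The parsing loop above splits each line by hand. As a rough alternative sketch, the same counting can be expressed with a regular expression and a generator expression fed straight into `Counter`. The sample lines, the `LOG_PATTERN` name, and the `classify` helper are assumptions made for this illustration; they are not part of the original example.

```python
import re
from collections import Counter

# Hypothetical sample lines in the "[ID] EVENT_TYPE: message" format used above.
sample_lines = [
    "[1] INFO: This is a log message.",
    "[2] ERROR: This is a log message.",
    "[3] INFO: This is a log message.",
    "not a log line",
    "[4] TRACE: This is a log message.",  # event type outside the known set
]
known_event_types = {"INFO", "WARNING", "ERROR", "DEBUG"}

# Capture the word between "] " and the first colon.
LOG_PATTERN = re.compile(r"^\[\d+\] (\w+):")

def classify(line):
    """Return the event type, or 'UNKNOWN' / 'MALFORMED' for other lines."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return "MALFORMED"
    event_type = match.group(1)
    return event_type if event_type in known_event_types else "UNKNOWN"

# Counter consumes the generator expression one item at a time,
# so no intermediate list of event types is built.
event_counts = Counter(classify(line) for line in sample_lines)
print(event_counts.most_common())
```

Counting `UNKNOWN` and `MALFORMED` lines explicitly, instead of silently skipping them, makes it easier to notice when the log format changes.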


## 1.32 Code Optimization and Profiling
---
- **Real Python**: [Python Timer Functions: Three Ways to Monitor Your Code](https://realpython.com/python-timer/)
- **GeeksforGeeks**: [Python Code Optimization](https://www.geeksforgeeks.org/python-code-optimizations/)
- **Python.org**: [The Python Profilers](https://docs.python.org/3/library/profile.html)

In data science, especially when dealing with large datasets or complex models, the performance of your code can become a critical factor. **Code optimization** is the process of modifying your code to make it run more efficiently, typically by reducing execution time or memory usage. **Profiling** is the process of analyzing your code to identify performance bottlenecks – the parts of your code that consume the most resources (time, memory, CPU).

**Importance in Data Science:**

- **Handling Large Data:** Processing massive datasets efficiently requires optimized code to avoid excessive memory consumption or prohibitively long execution times.
- **Faster Model Training:** Training complex machine learning models can be computationally expensive. Identifying and optimizing bottlenecks in training loops or data preprocessing can significantly reduce training time.
- **Real-time Applications:** For applications requiring low latency (e.g., real-time predictions), optimized code is essential.
- **Resource Management:** Efficient code uses fewer computational resources, which can be important in cloud environments or on resource-constrained systems.

**Common Python Profiling Tools:**

Python's standard library provides several tools for profiling:

- **`cProfile`:** A deterministic profiler that provides detailed reports on function calls, execution times, and call counts. It's useful for identifying which functions are taking the most time.
- **`timeit`:** A module for microbenchmarking small code snippets. It runs the code many times to reduce measurement noise and reports how long the runs took (the `%timeit` magic summarizes this as a mean and standard deviation per loop), which is useful for comparing the performance of different approaches to the same task.

### Using cProfile
- **Real Python**: [Python's cProfile: Profiling Your Code](https://realpython.com/python-profiling/)
- **PyMOTW**: [profile and cProfile – Performance Analysis](https://pymotw.com/3/profile/)

`cProfile` is a built-in profiler that gives you a detailed breakdown of where your program spends its time. It is deterministic, meaning it tracks every function call. This makes it suitable for identifying the most time-consuming functions in your code.

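In a Jupyter Notebook or IPython session, you can run `cProfile` through the `%prun` magic command. Magic commands are not available in a plain Python script; there, use `cProfile.run()` or run the script with `python -m cProfile script.py` instead.

`%prun [options] statement`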

Common options include:

- `-s sort_order`: Sort the output by a specific column (e.g., `cumulative`, `tottime`, `ncalls`). `cumulative` is often the most useful, showing the total time spent in a function including calls to other functions.
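
Outside a notebook, roughly the same report can be produced with the `cProfile` and `pstats` modules directly. The sketch below is one way to do that; `build_squares` and `run_workload` are hypothetical workload functions invented for the illustration.

```python
import cProfile
import io
import pstats

def build_squares(n):
    """Hypothetical workload: build a list of squares."""
    return [i * i for i in range(n)]

def run_workload():
    """Call the workload a few times so it shows up in the report."""
    return [sum(build_squares(50_000)) for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()      # start collecting call statistics
run_workload()
profiler.disable()     # stop collecting

# Write the report into a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # sort like "-s cumulative", show 5 entries
report = buffer.getvalue()
print(report)
```

Capturing the report in a buffer is optional; omitting the `stream` argument prints it to standard output.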

**Interpreting `cProfile` Output:**

The output of `cProfile` typically includes columns such as:

- `ncalls`: The number of times the function was called.
- `tottime`: The total time spent in the function itself, excluding time spent in functions called by it.
- `percall` (for tottime): Average time per call (tottime / ncalls).
- `cumtime`: The cumulative time spent in the function including time spent in functions called by it.
- `percall` (for cumtime): Average cumulative time per call (cumtime / ncalls).
- `filename:lineno(function)`: The location and name of the function.

Look for functions with high `cumtime` or `tottime` to identify performance bottlenecks.
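
To make the difference between `tottime` and `cumtime` concrete, here is a small sketch with two hypothetical functions: `inner` does the work itself, while `outer` only delegates to it. It assumes Python 3.9 or newer, where `pstats.Stats.get_stats_profile()` is available.

```python
import cProfile
import pstats

def inner():
    """Does the actual work in pure Python, so its own tottime is high."""
    total = 0
    for i in range(500_000):
        total += i
    return total

def outer():
    """Only delegates to inner(), so its own tottime stays close to zero."""
    return inner() + inner() + inner()

profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# get_stats_profile() (Python 3.9+) exposes the report columns as attributes.
profile = pstats.Stats(profiler).get_stats_profile()
for name in ("outer", "inner"):
    entry = profile.func_profiles[name]
    print(f"{name}: ncalls={entry.ncalls} "
          f"tottime={entry.tottime:.3f} cumtime={entry.cumtime:.3f}")
```

In a run like this, `outer` should show a large `cumtime` but a near-zero `tottime`, which suggests that the real cost sits in the function it calls rather than in `outer` itself.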

**Example:**

Let's profile a function that simulates some computational work by calling other functions that introduce delays.

In [None]:
import time

def slow_function_part1(n):
    """Simulates a slow computation."""
    time.sleep(0.1)
    return sum(i for i in range(n))

def slow_function_part2(data):
    """Simulates another slow computation."""
    time.sleep(0.2)
    return sorted(data)

def main_slow_process(size):
    """Main function that calls other potentially slow functions."""
    print(f"Starting slow process with size {size}")
    result1 = slow_function_part1(size // 2)
    data_list = list(range(size))
    result2 = slow_function_part2(data_list)
    print("Slow process finished.")
    return result1, result2

# Profile the main_slow_process function
print("Profiling main_slow_process...")
get_ipython().run_cell_magic('prun', '-s cumulative', 'main_slow_process(10000)')

Profiling main_slow_process...
Starting slow process with size 10000
Slow process finished.
 

### Using timeit
- **Real Python**: [Python Timer Functions](https://realpython.com/python-timer/)
- **GeeksforGeeks**: [Timeit in Python](https://www.geeksforgeeks.org/timeit-python-examples/)

`timeit` is a module designed for microbenchmarking small bits of Python code. It runs the code snippet multiple times (controlled by the `number` parameter) and repeats the entire test multiple times (controlled by the `repeat` parameter), which yields a more reliable timing estimate by reducing the impact of variations in system load.

It's ideal for comparing the performance of two or more different ways to achieve the same result, helping you choose the most efficient one for small operations.

To use timeit from within a Jupyter Notebook cell, you can use the `%timeit` or `%%timeit` magic commands.

- `%timeit`: Times a single line of Python code.
- `%%timeit`: Times an entire cell of Python code.

`%timeit [-n number] [-r repeat] [-p prec] [statement]`

- `-n number`: The number of times to execute the statement in each run (loop). If not specified, `%timeit` picks a value automatically so that each run is long enough to measure accurately.
- `-r repeat`: The number of runs, each consisting of `number` loops. The default is 7.
- `-p prec`: The number of digits of precision used to display the timing result. The default is 3.
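
The magic commands only work inside IPython or Jupyter. In a plain script, a comparable measurement can be sketched with the `timeit` module itself. The function names `concat_with_plus` and `concat_with_join` and the chosen `number` and `repeat` values are assumptions for this illustration.

```python
import timeit

words = ["a"] * 10_000

def concat_with_plus():
    """Build the string by repeated += concatenation."""
    result = ""
    for s in words:
        result += s
    return result

def concat_with_join():
    """Build the string with a single str.join call."""
    return "".join(words)

# number = executions per run (like -n); repeat = number of runs (like -r).
plus_runs = timeit.repeat(concat_with_plus, number=100, repeat=5)
join_runs = timeit.repeat(concat_with_join, number=100, repeat=5)

# Each entry is the total time for `number` executions, so divide by
# `number` to get a per-call time. The minimum is commonly used as the
# least noisy estimate.
print(f"+=   : {min(plus_runs) / 100 * 1e6:.1f} µs per call")
print(f"join : {min(join_runs) / 100 * 1e6:.1f} µs per call")
```

Unlike `%timeit`, `timeit.repeat` returns raw totals per run, so any averaging or per-call conversion is left to the caller.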

**Example:**

Let's use `%%timeit` to compare two ways of concatenating strings and three ways of building a list.

In [None]:
# Example: Comparing string concatenation methods

print("Comparing string concatenation using '+' vs '.join()':")
long_string_list = ['a'] * 10000

# Method 1: Using '+' (generally slower for many concatenations)
get_ipython().run_cell_magic('timeit', '', "result_plus = ''\nfor s in long_string_list:\n    result_plus += s")

# Method 2: Using '.join()' (generally faster for many concatenations)
get_ipython().run_cell_magic('timeit', '', "result_join = ''.join(long_string_list)")

print("\nComparing list creation/concatenation methods:")

# Method 1: Using '+' for list concatenation (creates new lists)
get_ipython().run_cell_magic('timeit', '', "my_list_plus = []\nfor i in range(1000):\n    my_list_plus = my_list_plus + [i]")

# Method 2: Using list.append() (modifies list in-place, generally faster)
get_ipython().run_cell_magic('timeit', '', "my_list_append = []\nfor i in range(1000):\n    my_list_append.append(i)")

# Method 3: Using list comprehension (often the most Pythonic and efficient)
get_ipython().run_cell_magic('timeit', '', "my_list_comp = [i for i in range(1000)]")

Comparing string concatenation using '+' vs '.join()':
1.42 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
113 µs ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Comparing list creation/concatenation methods:
1.38 ms ± 227 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.8 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
24.4 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


### Common Code Optimization Techniques
- **Towards Data Science**: [5 Python Performance Tips](https://towardsdatascience.com/5-python-performance-tips-b5e3d23d2c00)
- **Real Python**: [Python Code Quality](https://realpython.com/python-code-quality/)

Once profiling has helped you identify performance bottlenecks, you can apply various optimization techniques:

- **Choose Appropriate Data Structures:** Selecting the right data structure (lists, sets, dictionaries, tuples, or specialized `collections` types like `deque` or `Counter`) based on the operations you need to perform can have a significant impact on performance. For example, checking for membership (`in`) is much faster in sets than in lists.
- **Use Built-in Functions and Libraries:** Python's built-in functions (like `sum()`, `len()`, `max()`, `min()`) and standard library modules are often implemented in C and are highly optimized. Prefer these over writing your own equivalent logic in pure Python when possible.
- **List/Set/Dictionary Comprehensions:** As demonstrated in earlier sections, comprehensions are often more concise and faster than traditional for loops for creating and transforming lists, sets, and dictionaries.
- **Generators:** For processing large datasets or potentially infinite sequences, use generators and generator expressions (as discussed in section 1.28) to avoid loading everything into memory, saving significant memory and potentially time.
- **Vectorization with NumPy:** For numerical operations, especially on arrays and matrices, using NumPy's vectorized operations is dramatically faster than using Python loops. NumPy operations are implemented in C and can leverage optimized linear algebra libraries.
- **Avoid Unnecessary Object Creation:** Creating and destroying objects in loops can add overhead. Try to reuse objects or use techniques that minimize temporary object creation.
- **Optimize Loops:** If you must use loops, try to move computations outside the loop if they don't depend on the loop variable. Avoid repeatedly accessing attributes or performing expensive operations inside tight loops.
- **Consider Algorithms:** Sometimes, the biggest performance gains come from choosing a more efficient algorithm for your task, rather than micro-optimizing the code implementation.
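- **Just-In-Time (JIT) Compilers:** For highly performance-critical sections of numerical code, a JIT compiler can sometimes provide significant speedups by compiling Python code to machine code at run time. Numba is a library that compiles individual decorated functions, while PyPy is an alternative Python interpreter with a built-in JIT.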
- **Parallelism and Concurrency:** For CPU-bound tasks, use `multiprocessing` to leverage multiple cores. For I/O-bound tasks, use `threading` or asynchronous programming (as briefly introduced in section 1.30) to overlap waiting time with other work.
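
Two of the techniques above, choosing the right data structure and using generators, can be checked with a few lines of standard-library code. The sketch below is illustrative only; the sizes and variable names are arbitrary choices, and absolute timings will vary by machine.

```python
import sys
import timeit

# 1. Data structures: a membership test on a list scans elements one by one,
#    while a set uses hashing.
items_list = list(range(100_000))
items_set = set(items_list)
target = 99_999  # worst case for the list: the last element

list_time = timeit.timeit(lambda: target in items_list, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)
print(f"list membership: {list_time:.4f} s, set membership: {set_time:.6f} s")

# 2. Generators: a generator expression stores only its iteration state,
#    whereas a list stores a reference to every element.
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))
print(f"list: {sys.getsizeof(squares_list)} bytes, "
      f"generator: {sys.getsizeof(squares_gen)} bytes")
```

Note that `sys.getsizeof` reports only the size of the container object itself, not of the integers it refers to, so the real memory gap between the list and the generator is larger than the printed numbers suggest.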

## 1.33 Recommended Libraries for Data Science
---
- **Real Python**: [The Most Popular Python Libraries](https://realpython.com/python-libraries/)
- **GeeksforGeeks**: [Python Libraries for Data Science](https://www.geeksforgeeks.org/python-libraries-for-data-science/)
- **DataCamp**: [Python Data Science Libraries](https://www.datacamp.com/blog/top-python-libraries-for-data-science)
- **Towards Data Science**: [Essential Python Libraries for Data Science](https://towardsdatascience.com/best-python-libraries-for-machine-learning-and-deep-learning-b0bd40c7e8c)
- **KDnuggets**: [Top Python Libraries for Data Science](https://www.kdnuggets.com/2020/11/top-python-libraries-data-science.html)

While Python's standard library provides a solid foundation, the true power of Python for data science, machine learning, and AI comes from its extensive ecosystem of specialized third-party libraries. These libraries offer highly optimized tools and functionalities for a wide range of tasks, from numerical computation to model building and visualization.

**Numerical Computing**
- **NumPy (Numerical Python):** The fundamental package for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays. NumPy is the backbone of many other data science libraries.

**Data Manipulation and Analysis**
- **Pandas:** An essential library for data manipulation and analysis. It introduces two primary data structures: the `DataFrame` (a two-dimensional table) and the `Series` (a one-dimensional array). Pandas provides powerful tools for reading and writing data, cleaning, filtering, transforming, and aggregating data.

**Visualization**
- **Matplotlib:** A comprehensive library for creating static, animated, and interactive visualizations in Python. It offers a wide variety of plots and charts and provides a high degree of control over the appearance of your visualizations.
- **Seaborn:** Built on top of Matplotlib, Seaborn provides a high-level interface for drawing attractive and informative statistical graphics. It simplifies the creation of common plot types and offers aesthetically pleasing default styles.

**Machine Learning**
- **Scikit-learn:** A powerful and easy-to-use library for machine learning in Python. It provides a wide range of supervised and unsupervised learning algorithms, as well as tools for model selection, evaluation, and data preprocessing. Its consistent API makes it easy to experiment with different models.

**Deep Learning**
- **TensorFlow:** An open-source platform for machine learning and deep learning developed by Google. It provides a comprehensive ecosystem of tools, libraries, and resources for building and deploying machine learning models, particularly neural networks.
- **PyTorch:** An open-source machine learning library originally developed by Facebook's AI Research lab (now part of Meta AI). It is known for its flexibility, ease of use, and dynamic computation graph, making it popular for research and development in deep learning.

**Other Useful Libraries**
- **SciPy (Scientific Python):** Built on top of NumPy, SciPy provides a collection of algorithms and functions for scientific and technical computing, including modules for optimization, linear algebra, integration, interpolation, and more.
- **Statsmodels:** A library that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration.

## **Additional Comprehensive Learning Platforms**

### **Complete Course Platforms**
- **Real Python**: [Complete Python Learning Path](https://realpython.com/learning-paths/)
- **GeeksforGeeks**: [Python Tutorial](https://www.geeksforgeeks.org/python-programming-language/)
- **Python-Course.eu**: [Complete Python Course](https://python-course.eu/)
- **Programiz**: [Learn Python Programming](https://www.programiz.com/python-programming)
- **W3Schools**: [Python Tutorial](https://www.w3schools.com/python/)

### **Interactive Learning**
- **Codecademy**: [Learn Python 3](https://www.codecademy.com/learn/learn-python-3)
- **DataCamp**: [Introduction to Python](https://www.datacamp.com/courses/intro-to-python-for-data-science)
- **freeCodeCamp**: [Scientific Computing with Python](https://www.freecodecamp.org/learn/scientific-computing-with-python/)

### **Books and Documentation**
- **Python.org**: [Official Python Tutorial](https://docs.python.org/3/tutorial/)
- **Automate the Boring Stuff with Python**: [Online Book](https://automatetheboringstuff.com/)
- **Python Crash Course**: [Book Resources](https://ehmatthes.github.io/pcc/)
- **Think Python**: [Online Book](https://greenteapress.com/wp/think-python-2e/)

---

**Note**: This comprehensive list provides the most authoritative and well-maintained resources for each topic. Real Python, GeeksforGeeks, and official Python documentation are consistently the most reliable sources for learning Python concepts in depth. Each resource has been selected based on content quality, comprehensive coverage, practical examples, and community recognition.

# 🚀 Complete Python Cheat Sheet for Machine Learning, Deep Learning & AI
### 📋 Table of Contents

- [Introduction](https://colab.research.google.com/drive/1AmLQ-g3jVHH52TD0C3paQpnJu4Wl9e0D?usp=sharing)


1. [**Python Fundamentals**](https://colab.research.google.com/drive/1linKYA8PHgnMb4ugYkClIWu0_7SdfLtk?usp=sharing) - Your foundation for data science success
2. [**NumPy - Numerical Computing**](https://colab.research.google.com/drive/1qZFirXOdQtbtfCdJPtT9RU-FshLo9qLH?usp=sharing) - Power through numbers with blazing speed
3. [**Pandas - Data Manipulation**](https://colab.research.google.com/drive/18QZJEVNTCqfHAATjvYZZy4e-gcDmKpMk?usp=sharing) - Transform chaos into insights
4. **Matplotlib & Seaborn - Data Visualization** - Paint pictures with data
5. **Scikit-learn - Machine Learning** - Your first step into intelligent systems
6. **TensorFlow & Keras - Deep Learning** - Build brains that think
7. **PyTorch - Deep Learning** - Flexible neural network creation
8. **Data Preprocessing** - Clean data, clear results
9. **Model Evaluation & Metrics** - Measure what matters
10. **Advanced Topics** - Push the boundaries of possibility
11. **Best Practices** - Code like a professional
12. **Resources & Further Learning** - Never stop growing

* * *

### **Copyright © 2025, [`Mirza Naeem Beg`](https://mirzanaeembeg.github.io/). All rights reserved.**

### <mark>**Important Note:**</mark>
***This content is provided solely for educational purposes. I am the sole author of this entire cheat sheet. While some images from the internet and AI-generated code and descriptions were used where necessary, the overall compilation, structure, and majority of the content are my original work. I was inspired to create it by my Soft Computing Lab (CSE4130) and Pattern & Machine Learning Lab (CSE4114).***

* * *

**`Mirza Naeem Beg`**<br>
`Final Year UG Student,`<br>
`Dept. of CSE,` [**`AUST`**](https://aust.edu/)

[`Learn more about me;`](https://mirzanaeembeg.github.io/)

