# Section A: Importing and Pip

`pip`: A terminal command that downloads packages from PyPI and manages their versions

`import`: A Python statement that lets you access code stored locally on your computer

In [None]:
# Prints the working directory (scope)
! pwd

Run a terminal command by using `!`


In [None]:
# touch is a bash command which creates an empty file
! touch lib.py 

Write some lines of code to lib.py

In [None]:
# `echo` is a bash command that outputs text;
# `>` is a redirection operator that writes that output to a file
! echo "var = 10" > lib.py

Access the `var` variable locally in your python script using the following template:

Method 1:
```
from <file_path> import <object>
```
Method 2:
```
import <file_path>
```


**Important Note**: When importing from a package (a directory of modules), you will sometimes need an `__init__.py` file in that directory

In [None]:
from lib import var

In [None]:
var

In [None]:
import lib

In [None]:
lib.var

When using `pip`, we are downloading code to a specific directory on your local machine

In [None]:
# Terminal command to install numpy
! pip install numpy

On my laptop, `numpy` lives in `./venv/lib/python3.10/site-packages`

What about on your laptop?
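
One way to find out is to ask the imported module itself. The sketch below uses the standard-library `json` module as a stand-in so it runs without installing anything; the same `__file__` attribute should be available on `numpy` once it is installed.

```python
import json
import sys

# Every module loaded from a file records where it was loaded from
module_path = json.__file__
print(module_path)

# Directories Python searches, in order, when it resolves an `import`
for directory in sys.path:
    print(directory)
```

Swapping `json` for `numpy` should print a path inside a `site-packages` directory.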

In [None]:
import numpy 

In [None]:
numpy.__version__

# Section B: Data Types



## Numbers

Includes `float` and `int`

In [None]:
# Example of a float 
num = 3.13
print(type(num))

In [None]:
# Example of an int
num = 3
print(type(num))

## Strings

- Are defined using double quotes `"`, single quotes `'`, or triple quotes `'''`

- Strings are **not** a type of list in Python, but they share some similar properties with lists.
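
A small sketch of what strings and lists share and where they differ. The variable names `word` and `letters` are just for illustration.

```python
word = "hello"
letters = ["h", "e", "l", "l", "o"]

# Shared behaviour: indexing, slicing and len()
print(word[0], letters[0])
print(word[1:3], letters[1:3])
print(len(word), len(letters))

# Difference: lists are mutable, strings are not
letters[0] = "j"
try:
    word[0] = "j"
except TypeError as error:
    print("Error:", error)
```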

In [None]:
'\\t ello'

In [None]:
print('\\t ello')

In [None]:
# A number wrapped in quotes is still a string
var1 = '1'
print(type(var1))

In [None]:
# The following shows which methods of class `str` I have access to
print(dir(var1))

**IMPORTANT REMINDER**:

Round brackets `()` are used to "call/execute" a method

In [None]:
'2.2'.title

In [None]:
# isdigit() returns False here because '.' is not a digit
'2.2'.isdigit()

In [None]:
# Returns True because every character is a digit
'2'.isdigit()

In [None]:
'1.1'.isnumeric()

In [None]:
var1.isdigit()

In [None]:
var = 'lower-case-text'

In [None]:
var

In [None]:
l = [1,2,3,4]
help(l.append)

In [None]:
l.append?

In [None]:
l

In [None]:
# The .upper() method converts all text to uppercase
print(var.upper())

In [None]:
# split() converts the string into a list, splitting at each occurrence of the argument
var.split('e')

**Important**: in-place means the object itself was modified/overwritten.

In our case, the `upper()` method does not make modifications in place
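
For contrast, here is a sketch that puts an in-place method (`list.sort`) next to one that returns a new object (`str.upper`). The variable names are illustrative.

```python
numbers = [3, 1, 2]
result = numbers.sort()      # sorts the list in place and returns None
print(numbers, result)

text = "lower-case-text"
shouted = text.upper()       # returns a new string; `text` is unchanged
print(text, shouted)
```

A handy rule of thumb: if a method returns `None`, it most likely did its work in place.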

In [None]:
updatedvar = var.upper()

In [None]:
updatedvar

### Escape Characters
```
\n: Newline
\t: Tab
\\: Backslash
\': Single quote
\": Double quote
```

In [None]:
"Hello\nWorld"

In [None]:
print("Hello\nWorld")  # Output: Hello (newline) World

In [None]:
print("Hello\tWorld")  # Output: Hello (tab) World


In [None]:
# Sometimes we have problems dealing with escape characters 
path = "C:\new_folder\file.txt"
print(path)

### Raw Strings 

- One fix to the problem above is to use raw strings.
- `r'str'`

In [None]:
# Boom it works now!
path = r"C:\new_folder\file.txt"
print(path)

In [None]:
type(r"C:\new_folder\file.txt")

### Formatted String Literals (f-strings)

- A cool way of embedding another variable inside your string
- `f'string_{var_name}'`
- uses a combination of `f` before your string and `{}` wrapped around your variable

In [None]:
var = 10
print(f'example_{var}')

In [None]:
yolo = 'not_yolo'

new_string = f'yolo{yolo}'

print(new_string)

If you are interested, look up `.format()`, an older way of doing the same thing
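
A quick side-by-side of an f-string and `str.format()` producing the same text. The names `name` and `age` are example values.

```python
name = "Alice"
age = 30

with_fstring = f"{name} is {age} years old"
with_positional = "{} is {} years old".format(name, age)
with_keywords = "{n} is {a} years old".format(n=name, a=age)

print(with_fstring)
print(with_fstring == with_positional == with_keywords)
```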

## Boolean

- A binary value of True or False
- A return type when validating a condition

In [None]:
print(type(True))

In [None]:
print(type(False))

In [None]:
var = 10

In [None]:
var == 100

In [None]:
var != 100

In [None]:
mask = var == 100

In [None]:
var == 10

In [None]:
var != 9

In [None]:
var == 9

## `and` `or` `not` Operations

`and` checks whether **both** values are truthy (not `None`, `0`, or empty);
with booleans, both must be `True`

`or` checks whether **any** value is truthy;
with booleans, at least one must be `True`
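
Worth knowing before running the cells below: `and` and `or` return one of their operands, not necessarily `True` or `False`. A minimal sketch (variable names are illustrative):

```python
# `and` returns the first falsy operand, or the last operand if all are truthy
both_truthy = 10 and 100
first_falsy = 0 and 100

# `or` returns the first truthy operand, or the last operand if all are falsy
first_truthy = 0 or "fallback"
all_falsy = None or []

print(both_truthy, first_falsy, first_truthy, all_falsy)

# `not` always returns a bool
print(not [])
```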


In [None]:
1 or 10

In [None]:
if 1 or 10:
    print('10')

In [None]:
if False:
    print('10')

In [None]:
if None:
    print('10')

In [None]:
if True:
    print('10')

In [None]:
if 100:
    print('10')

In [None]:
False or not True

In [None]:
True and True

In [None]:
None

In [None]:
None or None

In [None]:
True or True

In [None]:
False or True

In [None]:
False or False

In [None]:
if 10 and [1,1,1]:
    print(True)

In [None]:
if None:
    print(True)

In [None]:
if None and [1,1,1]:
    print(True)

In [None]:
False and True 

In [None]:
10 and 100

In [None]:
10 or True 

In [None]:
True or 10

In [None]:
is_red = True
length = 3
width = 10

In [None]:
print(3 and 9 or 7 and [] or () and '')

### Membership Operators

In [None]:
'hello' in 'hello world'

In [None]:
'i' in 'hello world'

In [None]:
print(3 and 9)

In [None]:
print(3 and 9 or 7)

In [None]:
True and False

In [None]:
True and None

In [None]:
print(3 and 9 or 7 and [])

In [None]:
hash(10)

In [None]:
hash?

In [None]:
hash(10)

In [None]:
hash(7578623418076940886)

In [None]:
hash('\n')

## Iterables

Iterables are data types which you can iterate over!


### Lists:

- Defined using `[]` notation
- Are mutable
- Ordered
- Indexable
- Allows for duplicates
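
Each property in the list above can be checked in a few lines. The list `colors` is a made-up example so it does not overwrite anything else in the notebook.

```python
colors = ["red", "green", "blue"]

colors.append("red")       # duplicates are allowed
colors[1] = "yellow"       # mutable: replace an element by index
print(colors[0])           # indexable
print(colors)              # order is preserved
```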



In [None]:
# Using `[` to open my list and `]` to close my list 
fruits = ['apple','banana','kiwi']
print(fruits)

In [None]:
# I can also add a new line after each `,`
fruits = [
    '''apple
    ''',
    'banana',
    
    'kiwi',
]
print(fruits)

In [None]:
# I can also add a new line after each `,`
fruits = [
    '\napple',
    'banana',
    
    'kiwi',
]
print(fruits)

In [None]:
print('\napple')

### How to iterate

A `for` loop can be written in 2 ways: 

Method 1:

```
for item in items:
```

Method 2:

```
for i in range(len(items)):
    item = items[i]
```
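
Both methods visit the same elements, as this sketch shows on a made-up list `items`. It also uses the built-in `enumerate()`, which is not one of the two methods above but is handy when you need the index and the value together.

```python
items = ["a", "b", "c"]

seen_by_value = []
for item in items:                 # Method 1: iterate over the values
    seen_by_value.append(item)

seen_by_index = []
for i in range(len(items)):        # Method 2: iterate over the indices
    seen_by_index.append(items[i])

print(seen_by_value == seen_by_index)

# enumerate() yields (index, value) pairs
for i, item in enumerate(items):
    print(i, item)
```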

In [None]:
fruits

In [None]:
# Using method 1:

for fruit in fruits:
    print(1)
    
print(fruit)

print(10)

print(1)


In [None]:
# Using method 1:

for num in [1,2,3,4]:
    pass

print(num)

In [None]:
matrix = [
    [1,2,3,4,5,6,7],
    [1,2,3,4,5,6,7],
    [1,2,3,4,5,6,7],
]  

# 3x7 (3 rows, 7 columns)

### Slicing a list

`Start:Stop:Step`
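
A few slices on a single example row, to show what each of the three parts does. `row` is an illustrative name.

```python
row = [1, 2, 3, 4, 5, 6, 7]

first_slice = row[2:5]    # start=2, stop=5 (the stop index is excluded)
head = row[:3]            # an omitted start defaults to the beginning
every_other = row[::2]    # step=2 takes every second element
reversed_row = row[::-1]  # a negative step walks backwards

print(first_slice, head, every_other, reversed_row)
```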

In [None]:
matrix[0][2::3]

In [None]:
matrix[0][:5]

In [None]:
# Using method 2:

for i in range(len(fruits)):
    fruit = fruits[i]


print(fruit)


**IMPORTANT NOTE**: 


Indentation tells python what should run inside your `for-loop` 

### Tuples:
- Defined using `(,)` or just `,` notation
- Are *not* mutable
- Ordered
- Indexable
- Allows for duplicates
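
A short sketch of these tuple properties, including the fact that the comma (not the parentheses) is what creates a tuple. Names are illustrative.

```python
point = (1, 2, 2)
print(point[0])          # indexable
print(point.count(2))    # duplicates are allowed

try:
    point[0] = 99        # tuples cannot be modified after creation
except TypeError as error:
    print("Error:", error)

single = (5,)            # the comma makes the tuple
not_a_tuple = (5)        # just the integer 5 in parentheses
print(type(single), type(not_a_tuple))
```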



In [None]:
'1'.isdigit()

In [None]:
y = (1,2,3,4)


In [None]:
y = 10

In [None]:
10 + y

In [None]:
# Method 1 without the `()`:

my_tuple = 1,2,3,
print(type(my_tuple))

In [None]:
# Method 2, with the `()`:

my_tuple = (1,2,3)
print(type(my_tuple))

### Sets: 

- Defined using `{...}` or `set()` notation (an empty `{}` creates a dict, not a set)
- Are **mutable** (but their elements must be immutable)
- **Unordered** (no guaranteed order of elements)
- **Not indexable** (cannot access elements by index)
- **Does not allow duplicates** (each element is unique)
- Must not contain mutable objects
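
The set properties above in a few lines. `numbers` and `empty_set` are example names.

```python
numbers = {3, 1, 2, 3, 1}
print(len(numbers))        # duplicates collapse, so only 3 elements remain

numbers.add(10)            # mutable: elements can be added...
numbers.discard(1)         # ...or removed
print(10 in numbers, 1 in numbers)

empty_set = set()          # `{}` on its own creates an empty dict
print(type(empty_set), type({}))
```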

### Frozen Sets
- Defined using `frozenset()` notation
- Are **not mutable**
- **Unordered** (no guaranteed order of elements)
- **Not indexable** (cannot access elements by index)
- **Does not allow duplicates** (each element is unique)

In [None]:
frozenset([1,2,3,4,4])

In [None]:
y = frozenset([1,2,3,4])

In [None]:
type(y)

In [None]:
y.update({10})

In [None]:
y

In [None]:
y[0]

In [None]:
(
    [1,2,3,4],
    [1,2,3,4],
    [1,2,3,4],
    [1,2,3,4],
)

In [None]:
# Example of how set must contain immutable objects
{
    [1,2,3,4],
    [1,2,3,4],
    [1,2,3,4],
    [1,2,3,4],
}

In [None]:
# Example of how set must contain immutable objects
{
    {1},
    {2,3,4},
    {1,2},
    {5},
}

In [None]:
frozenset?

In [None]:
# Example of how set must contain immutable objects
{
    frozenset({1}),
    frozenset({2,3,4}),
    frozenset({1,2}),
    frozenset({5}),
}

In [None]:
s = {1,2,3,4}
dir(s)

In [None]:
[
    [1,2,3,4],
    [1,2,3,4],
    [1,2,3,4],
    [1,2,3,4],
]

In [None]:
i = (1,2,3,4, [1,1,1,1])

In [None]:
i

### Hashing
Hashing converts data (like strings or numbers) into a fixed-size integer, called a **hash value**. 

This is done using Python’s `hash()` function.

Hashing is critical for fast lookups in structures like **sets** and **dictionaries**.


An object is **hashable** if it has a hash value that doesn't change during its lifetime. This is why **immutable** types like strings, numbers, and tuples (as long as every element is itself hashable) are hashable, while lists are not.


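Python's built-in `hash()` is different from a cryptographic hash (e.g. SHA), which is designed for security rather than fast lookups.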
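
For comparison, here is a sketch of a cryptographic hash using the standard-library `hashlib` module. Unlike the built-in `hash()`, a SHA-256 digest of the same bytes is identical in every Python session.

```python
import hashlib

message = "hello"

# hashlib works on bytes, so the string is encoded first
digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
print(digest)
print(len(digest))   # 64 hex characters = 256 bits

# Built-in hash(): an int used for dict/set lookups, not for security
print(hash(message))
```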

In [None]:
hash([1,2,3])

In [None]:
print(hash("""
test       
test       
test       
test       
test       
test       
test       
test       
"""))  # Hash of a string


In [None]:
print(hash("""
test       
test       
test       
test       
"""))  # Hash of a string


In [None]:
# Example: Hashing different objects
print(hash("hello"))  # Hash of a string
print(hash(42))       # Hash of an integer (for small integers, the number itself!)


In [None]:
print(hash("bye"))  # Hash of a string


In [None]:
# Immutable objects are hashable
print(hash((1, 2, 3)))  # Hash of a tuple

# Lists are not hashable (will raise a TypeError)
# print(hash([1, 2, 3]))  # Uncomment to see the error


In [None]:
1 in [1,2,3,4]

In [None]:
1 in set([1,2,3,4])

In [None]:
1 in {1,2,3,4}

In [None]:
1 in (1,2,3,4)

In [None]:
hash('13')

**How Hashing Relates to Sets**

Sets use **hash tables** to store elements. 

A **hash table** is a structure where elements are stored based on their **hash index**, allowing for fast lookup and ensuring uniqueness.

This index is not a memory address but an offset in the internal table that maps the hash to the memory where the actual object is stored.

When you check if an item is in the set, Python:

- Computes the hash value of the item.
- Looks up the corresponding index in the hash table.
- Fetches the item from memory using that index.


Hashing enables average-case constant-time access for operations like checking membership (`x in set`) because instead of scanning the entire set, Python uses the hash value to jump directly to the correct "bucket" (or index) in the hash table.

This method avoids the need for traversing memory addresses directly and speeds up data retrieval by indirectly accessing the memory location through the hash value.


When you check if an element exists in a set, Python uses the **hash value** of the element to quickly find it in the hash table.
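
The lookup steps above can be sketched with a toy hash table. This is a simplified illustration, not how CPython actually lays out a set; `NUM_BUCKETS`, `toy_add`, and `toy_contains` are hypothetical names.

```python
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def toy_add(item):
    """Store item in the bucket chosen by its hash, skipping duplicates."""
    index = hash(item) % NUM_BUCKETS
    if item not in buckets[index]:
        buckets[index].append(item)

def toy_contains(item):
    """Search only the one bucket selected by the hash."""
    index = hash(item) % NUM_BUCKETS
    return item in buckets[index]

for fruit in ["apple", "banana", "cherry", "banana"]:
    toy_add(fruit)

print(toy_contains("banana"))
print(toy_contains("mango"))
print(sum(len(bucket) for bucket in buckets))   # the duplicate was stored once
```

The point of the design is that a lookup only touches one bucket, no matter how many items are stored overall.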


In [None]:
# Example: Using a set

my_set = {"apple", "banana", "cherry"}

print("banana" in my_set)  # Hashing used for fast lookup

In [None]:
# Example: Using a set

my_set = {"apple", "banana", "cherry"}

print(("banana" in my_set) or ("not" in my_set))  # Hashing used for fast lookup

In [None]:
A = {1,2,3,4,5} 
B = {1,2,4,5} 

In [None]:
[1,2,3,4,5] + [1,2,3]

In [None]:
[1,2,3,4,5] - [1,2,3]

In [None]:
A - B

In [None]:
B - A

In [None]:
A.intersection(B)

In [None]:
my_set.issubset?

In [None]:
my_set.issuperset?

In [None]:
print("banana" in [1,2,3,4])  # No hashing here: a list is scanned element by element

**Same Object, Same Hash?**

Yes, for the same immutable object within the same Python session, the hash will always be the same. 

However, Python applies hash randomization for strings across different sessions (for security reasons), so the hash for a string may differ between sessions but remains consistent within the same session.



In [None]:
# Equal values must hash equally: 42 == 42.0, so their hashes match
n = 42
a = 42.0
print(hash(n))             # Example hash of an integer
print(hash(a))           # Example hash of a float

# Both produce the same hash value


In [None]:
a == n

The strings "Aa" and "BB" are a well-known hash collision in Java's `String.hashCode()`. Python hashes strings with a different algorithm, so the two are not expected to collide here; the cell below lets you check.


In [None]:
# Example strings that collide in Java but are not expected to in Python
s1 = "Aa"
s2 = "BB"

print("String 1:", s1)
print("String 2:", s2)

print("Hash of s1 (Aa):", hash(s1))
print("Hash of s2 (BB):", hash(s2))

print("Are the strings equal?", s1 == s2)  # False, because the strings are different


In [None]:
my_set

In [None]:
#Sets only allow unique elements. Hashing helps ensure this by preventing duplicates.

# Duplicates are ignored in sets
my_set.add("banana")
print(my_set)  # 'banana' is not added again


In [None]:
a = 1
a.__hash__()

In [None]:
'12344'.__hash__()

In [None]:
hash('12344')

In [None]:
hash(
    (1,2,3,4,4)
)

In [None]:
print(dir(1))

In [None]:
y = [1,2,3,4,]
y.__hash__()

In [None]:
print(dir([1,2,3,4,]))

In [None]:
%timeit

my_set = {1, 2, 3, 4, 5}
print(my_set)  # No change, as duplicates are ignored

# Check for membership
print(3 in my_set)  # Fast lookup using hashing


In [None]:
my_set.add(5)  # Adding a duplicate


In [None]:
{1,2,3,4, (1,2,3,4)}

**Why Can't Sets Contain Other Sets?**

Sets are mutable, so they don’t have a stable hash value. Since sets require all their elements to be hashable, you can't store a set inside another set.

In [None]:
# Example: Trying to add a set inside another set (which will raise an error):

# This will raise a TypeError because sets are unhashable
outer_set = {1, 2, 3}
inner_set = {4, 5}


outer_set.add(inner_set)


**FrozenSets: The Hashable Version of Sets**

A **frozenset** is an immutable version of a set, meaning it has a stable hash value. Since it's hashable, you can include a **frozenset** inside a regular set.


In [None]:
# Creating a frozenset
inner_frozenset = frozenset({4, 5})

# Adding a frozenset to a regular set
outer_set = {1, 2, 3}
outer_set.add(inner_frozenset)

print(outer_set)  # Output: {1, 2, 3, frozenset({4, 5})}


**Hashable Values Allowed in Sets**

Since sets can only store **hashable** (immutable) objects, you can include elements like:
- Numbers
- Strings
- Tuples (if all of their elements are hashable)
- FrozenSets

However, objects like lists and sets cannot be included.


In [None]:
my_set = {1, "apple", (2, 3), frozenset({4, 5})}
my_set

In [None]:
#### Example: Adding hashable and unhashable elements to a set


# Hashable elements
my_set = {1, "apple", (2, 3), frozenset({4, 5})}
print("Hashable values in set:", my_set)

# Trying to add an unhashable element (list)
try:
    my_set.add([6, 7])
except TypeError as e:
    print("Error:", e)  # Output: Error: unhashable type: 'list'


**Recap**:

- **Sets** can only store **hashable** (immutable) objects.
- **Sets are unhashable** because they are mutable.
- You can include immutable objects like **frozen sets** or **tuples** in a set but not mutable objects like sets or lists.

This concept helps ensure efficient membership testing and uniqueness in Python sets!

## Key Takeaways:

- **Hashability**: Only hashable objects (typically immutable ones) can be stored in sets.
- **Hash Table**: The underlying structure of sets, enabling fast lookup and unique element storage.
- **Hash Index**: Elements are stored based on their hash values, making set operations like checking membership very efficient.

Hashing is essential for how Python handles sets and dictionaries. It allows fast lookups, enforces uniqueness, and supports high performance in data-heavy operations.

In [None]:
dir([1,2,5,4,])

In [None]:
l = [1,2,23,5,2,1,5]

In [None]:
matrix = [
    [1,2,3,],
    [1,2,3,],
    [1,2,3,],
    [1,2,3,],
    [1,2,3,],
    [1,2,3,],
    [1,2,3,],
]

In [None]:
# Dimensionality = 2
matrix

In [None]:
matrix[::2]

`[start_index:stop_index:step]`


In [None]:
l[::2]

**Note:** An easy way to check odd vs. even is the modulus operator `%`

In [None]:
# Modulus 
10%2

In [None]:
# Modulus 
5%2

# Section C: Functions & Closures

## Introduction:

A function is a reusable block of code that performs a specific task. It can take inputs (parameters) and return outputs.


- **Parameters** are variables defined in the function signature.
- **Arguments** are the actual values you pass to the function when calling it.
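
To make the distinction concrete, a tiny sketch with a hypothetical `area` function:

```python
def area(length, width):          # `length` and `width` are parameters
    return length * width

positional = area(3, 10)          # 3 and 10 are arguments
by_keyword = area(width=10, length=3)

print(positional, by_keyword)
```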



In [None]:
name = "Tyson"
print(f"Hello, {name}")
name = "Jack"
print(f"Hello, {name}")
name = "Mike"
print(f"Hello, {name}")
name = "Rob"
print(f"Hello, {name}")

In [None]:
# Example:

def greet(name):
    return f"Hello, {name}!"


In [None]:
# Recap: Embed a variable inside of a string using `{}`
name = "Peter"
print(f"Hello {name}")

In [None]:
# TODO: Examples of .format()
"hello {}".format(name)

In [None]:
greet

Execute (call) the function with `()`

In [None]:
    
print(greet("Alice"))
print(greet("Rob"))
print(greet("Tyson"))


In [None]:
greet("Alice")

## Return Values

- Functions can return values using the `return` statement. The return value can be of any type (string, number, list, etc.) or even `None`.
- If no `return` is specified, Python will automatically return `None`.



In [None]:
def add(a, b):
    print('function is starting')
    total = a+b
    print(total)
    print('function is ending')

result = add(3, 4)

print(result) 


In [None]:
a,b,c = 1,2,3

In [None]:
a

In Python, you can return multiple items from a function in several ways. Python provides flexibility by allowing you to return multiple values in a single `return` statement. Here’s a breakdown of different methods to do so:



**Returning Multiple Values as a Tuple**

The most common way to return multiple values is to return them as a **tuple**. You don’t need to explicitly create a tuple using parentheses—Python does this automatically if you return several values separated by commas.



In [None]:
def get_coordinates():
    x = 10
    y = 20
    return x, y 



In [None]:
get_coordinates?

In [None]:
# Executing or "calling" my function
coords = get_coordinates()
print(coords) 
print(type(coords))
print(coords[0])     


In [None]:
coords[1]

In [None]:
# unpacking 
outx,outy = get_coordinates()


**Returning Multiple Values as a List**

You can also return a **list** if the number of items you want to return may vary or you need to return a mutable sequence.


In [None]:
def get_fruits():
    return ["apple", "banana", "cherry"]

fruits = get_fruits()
# Lists preserve the order in which elements were written
print(fruits) 
print(type(fruits))

In [None]:
example_set = {"test", "hello",'apple'}
example_set

In [None]:
example_set[0]

In [None]:
# Can I return a set?
def get_fruits():
    return {"banana", "apple", "cherry"}

fruits = get_fruits()
# Sets are unordered: the printed order may differ from the order written above
print(fruits) 

In [None]:
# Example of a logical error: sets are unordered, so which fruit lands in a, b, c is not guaranteed
a,b,c = get_fruits()

In [None]:
type(rest)

In [None]:
a

In [None]:
c

**Returning Multiple Values as a Dictionary**

If you want to return a set of **named values** (key-value pairs), you can return a **dictionary**. This is useful when returning items that are logically related but might benefit from being identified by name.


In [None]:
def get_person_info():
    return {"name": "Alice", "age": 30, "location": "NYC"}

In [None]:
person_info_database = []

for i in range(10):
    row = get_person_info()
    row.update({'id':i})
    person_info_database.append(
        row
    )

In [None]:
person_info_database

In [None]:
row['id'] = 1 

In [None]:
person_info_database.__len__()

In [None]:

person_info = get_person_info()  # call the function before using its result
print(person_info)     
print(type(person_info))
print(person_info["name"])



In [None]:
person_info['name']

In [None]:
person_info.get?

In [None]:
help(person_info.get)

In [None]:
person_info.get('country', 'Canada')

In [None]:
person_info['country']

**Returning Multiple Values Using `*args` (as a Tuple)**

You can also return multiple values by leveraging the **`*args`** syntax to collect arguments into a tuple.

**Important**:

`*` is used for unpacking or expanding iterable values

In [None]:
# An ordered, indexable iterable is called a sequence

l = (1,2,3)


Characteristics of Sequences:

**Order**: The elements are stored in a specific order, meaning the position of elements is preserved.

**Indexing**: You can access elements by their index (e.g., `my_list[0]`).

**Slicing**: You can retrieve a subset of elements (e.g., `my_list[1:3]`).

Examples of Sequences:
Lists, Tuples, Strings, Ranges
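
All four sequence types listed above answer to the same indexing, slicing, and `len()` operations, while a set does not. A small sketch with illustrative names:

```python
sequences = [[10, 20, 30], (10, 20, 30), "abc", range(10, 40, 10)]

for seq in sequences:
    # Every sequence supports indexing, slicing and len()
    print(type(seq).__name__, seq[0], seq[1:3], len(seq))

# A set is iterable but not a sequence: its elements have no positions
as_set = {10, 20, 30}
try:
    as_set[0]
except TypeError as error:
    print("Error:", error)
```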


In [None]:
[1,2,3, {1,2,3,}]

In [None]:
{{1,2,3}, {3,4,5}}

In [1]:
l = (1,2,3)


In [2]:
# Unpacking 
a, b, c = l

In [3]:
a

1

In [4]:
a, *args = l

In [5]:
args

[2, 3]

In [None]:
args

In [None]:
# Will not work: a starred assignment target must be inside a list or tuple
*rest = l


In [None]:
# Will work: the trailing comma makes the target a tuple
*rest, = l

In [None]:
rest

In [None]:
matrix

In [None]:
def get_first_item(m):

    def get_sum(*args):
        print(args[0])
        
        a,b,c = args[0]
        print(a,b,c)
        return sum([a,b])
        
    return get_sum(m[0])





In [None]:
get_first_item(matrix)

In [None]:
*args, = [1, 2, 3]


In [None]:
args

In [None]:
def get_numbers(*args,):
    print(args)
    print(type(args))
    return args  


In [None]:
numbers = get_numbers([1, 2, 3, 4])


**Unpacking Multiple Return Values**

If a function returns multiple values as a tuple, you can **unpack** the values directly into multiple variables when calling the function. This avoids handling the result as a tuple.

    

In [None]:
def get_dimensions():
    length = 5
    width = 10
    height = 15
    return length, width, height  

l, w, h = get_dimensions() 
print(f"Length: {l}, Width: {w}, Height: {h}")

In [None]:
def get_dimensions():
    length = 5
    width = 10
    height = 15
    return length, width, height  

*dims, = get_dimensions() 
print(f"Dimensions {dims}")

## Doc Strings and Type Hinting

A **docstring** is a special string at the start of a function that explains what the function does. It can be accessed using `help()` or the `__doc__` attribute. 

**Type hinting** allows you to specify the expected types of function arguments and return values. While Python does not enforce these types, they help improve code clarity and provide useful information to developers and IDEs.
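
Both pieces of metadata are stored on the function object itself. The sketch below uses a hypothetical `scale` function to show where they live, and that the hints are not checked at run time.

```python
def scale(value: float, factor: float = 2.0) -> float:
    """Multiply value by factor."""
    return value * factor

print(scale.__doc__)            # the docstring
print(scale.__annotations__)    # the type hints, stored as a dict

# Hints are not enforced: str * int still runs and repeats the string
unexpected = scale("ab", 2)
print(unexpected)
```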



In [None]:
def add(a, b):
    """
    Summary: Adds two numbers and returns the result.
    
    Args:
    a (int): The first number.
    b (int): The second number.
    
    Returns:
    int: The sum of a and b.
    """
    return a + b


In [None]:
add?

In [None]:
import pandas as pd

In [None]:
pd.merge?

In [None]:
help?

In [None]:
# Access the docstring
help(add)


**Type Hinting**


In [None]:
# You can try passing incorrect types and observe behavior

def multiply(x: int, y: int) -> int:
    """Multiplies two integers and returns the result."""
    return x * y

# Call the function
result = multiply(3.123, 4)
print(result) 

## Arguments and Parameters

**Default Arguments**
You can also define default values for function parameters. If the argument is not provided, the default value is used.

In [None]:
# Using Type Hinting
def greet(name: str = "Guest"):
    """Greets the user by name or as a guest."""
    return f"Hello, {name}!"

# Call the function with and without an argument
print(greet("Alice"))  

In [None]:
print(greet()) 

In [None]:
# Not using Type Hinting
def greet(name = "Guest"):
    """Greets the user by name or as a guest."""
    return f"Hello, {name}!"

# Call the function with and without an argument
print(greet("Alice"))  
print(greet())        

In [None]:
name

**(Arbitrary Positional Arguments) `*args`**

Using `*args` in a function allows you to pass a variable number of positional arguments to a function.

In [None]:
def sum_all(*args):
    """Returns the sum of all provided arguments."""
    print(args)
    return sum(args)


In [None]:
sum_all(1,2,32,3)

In [None]:
a,b,c , *rest, = 1,1,23,3,4,4,5,5,6

In [None]:
rest

In [None]:
b

In [None]:
rest

In [None]:
a

In [None]:
a,b,c , *rest = set([1,1,23,3,4,4,5,5,6])

In [None]:
set([1,1,23,3,4,4,5,5,6])

In [None]:
c

In [None]:
rest

In [None]:
# It doesn't have to be `args`; it can be any variable name following
# a single asterisk
def sum_all(num1, *cake):
    """
    Returns the sum of all arguments after the first.
    """
    print(num1)
    print(cake)
    return sum(cake)

# The first value is being assigned to num1
sum_all(1,2,32,3)

In [None]:
# Defining a function that takes multiple positional arguments
def add_numbers(a, test, b, c):
    return a + b + c

numbers = [1, 2, 3, 1]

result = add_numbers(*numbers)
print(result) 


In [None]:
# Defining a function that takes multiple positional arguments
def add_numbers(a, _, b, c):
    return a + b + c

numbers = ['1','1','2','3']

result = add_numbers(*numbers)
print(result) 


In [None]:
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# Unpacking and combining lists
combined_list = [*list1, *list2]
print(combined_list) 


In [None]:
*list1, *list2

**Arbitrary Keyword Arguments `**kwargs`**

**`**kwargs`** allows you to pass a variable number of keyword arguments (key-value pairs) to a function. Inside the function, they are accessible as a dictionary.


In [None]:
def display_info(**kwargs):
    """Displays information about a person using keyword arguments."""

    # return **kwargs # Error
    return kwargs # Correct
    


In [None]:
# Call the function with keyword arguments
returned_kwarg = display_info(
    name="Alice", 
    age=30, 
    profession="Engineer", 
    tshirt_color = 'red',
)


In [None]:
returned_kwarg

In [None]:
# Returns a view of (key, value) tuples
items = returned_kwarg.items()

In [None]:
items

In [None]:
# Unpacking each tuple into `k` and `v` as we iterate through the items
for k,v in returned_kwarg.items():
    print(k,v)

In [None]:
def test_func(a, b):
    print(b)
    print(a)




In [None]:
# ORDER MATTERS! (when not using keyword arguments)
# setting the params of the func: a = 1, b = 2
test_func(1,2)

In [None]:
# setting the params of the func: a = 2, b = 1
test_func(2,1)

In [None]:
# ORDER does not MATTER! (when using keyword arguments)

test_func(a=1, b=2)

In [None]:
test_func(b=2,a=1)

In [None]:
# Defining a function that accepts keyword arguments
def greet(age,name):
    print(f"Hello, my name is {name} and I am {age} years old.")

# Dictionary with keys that match the function's parameter names
person_info = {"name": "Alice", "age": 30}

# Unpacking the dictionary using ** to pass it to the function
greet(**person_info)


In [None]:
# Merging two dictionaries using **
dict1 = {"a": 1, "b": 2}
dict2 = {"c": 3, "d": 4}

# Unpacking both dictionaries into a new one
combined_dict = {**dict1, **dict2}

print(combined_dict)  # Output: {'a': 1, 'b': 2, 'c': 3, 'd': 4}


**Combining `*args` and `**kwargs`**

You can use both `*args` and `**kwargs` in the same function to handle any number of positional and keyword arguments.

In [None]:
#### Example:

def complete_info(*args, **kwargs):
    """Handles both positional and keyword arguments."""
    print("Positional arguments:", args)
    print("Keyword arguments:", kwargs)

complete_info(1, 2, 3, name="Alice", age=30)




## Local and Global Variables

**Local Variables**:
- Variables declared **inside** a function are **local** to that function. They only exist while the function is running and are not accessible outside the function.
  
**Global Variables**:
- **Global variables** are declared **outside** any function and can be accessed from anywhere in the script.
- To reassign a global variable inside a function, you must explicitly use the `global` keyword; otherwise, the assignment creates a new local variable.


In [None]:
# Example using an `immutable object`
x = 10  # Global variable

print(f'global id of x: {id(x)}')
y = 20 # Global variable


def my_function():
    x = 5  # Local variable (only within the function)
    print("Inside function:", x)
    print(f'local id of x: {id(x)}')
    print(y)

    u = 'not working' # `u` is declared locally within the scope of the function



In [None]:
my_function()


In [None]:
u

In [None]:
print("Outside function:", x)  # Global variable remains unchanged

In [None]:
# Modifying a global variable

x = 10  # Global variable
print(f'id globally: {id(x)}')

def modify_global():
    global x  # Referring to the global variable
    x = 5  # Modifies the global variable
    print("Inside function:", x)
    print(f'id locally: {id(x)}')

modify_global()
print("Outside function:", x)  # Global variable is now modified

# TODO: Why are the IDS different?


In Python, integers are immutable objects. 

When you assign a new value to x, Python creates a new object and binds x to this new object.

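Each distinct integer object has its own identity (in CPython, `id()` returns its memory address), which is why the ids of `10` and `5` differ.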
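
The difference between rebinding a name and mutating an object shows up directly in `id()`. A minimal sketch with illustrative names:

```python
x = 10
id_of_ten = id(x)
x = 5                                    # rebinding: `x` now refers to a different object
int_id_unchanged = id(x) == id_of_ten
print(int_id_unchanged)

items = [1, 2, 3]
id_of_list = id(items)
items.append(4)                          # mutation: same object, new contents
list_id_unchanged = id(items) == id_of_list
print(list_id_unchanged)
```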


## Passing by reference

In Python, the behavior of how objects are passed into functions is best described as "pass by object reference" (or "pass by assignment"). 

This means that the function receives a reference to the object, not a copy of the object. However, whether changes to the object affect the original depends on whether the object is mutable or immutable.




**Are All Mutable Objects Passed by Reference?**

Yes, and so are immutable ones: in Python the function always receives a reference to the same object in memory. What differs is that a mutable object can be changed in place through that reference, while rebinding the parameter name (for example with `=`) leaves the caller's object untouched.
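
A sketch of the two cases using the same list, with hypothetical helper functions `mutate` and `rebind`:

```python
def mutate(values):
    values.append(99)        # changes the caller's list in place

def rebind(values):
    values = values + [99]   # builds a new list; the caller's list is untouched

mutated = [1, 2, 3]
mutate(mutated)
print(mutated)

untouched = [1, 2, 3]
rebind(untouched)
print(untouched)
```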



**Side Effects**:
- A **side effect** occurs when a function modifies something outside its scope, such as changing a global variable, modifying a list passed by reference, or printing to the console.

Example of a side effect (modifying a list passed by reference)

In [None]:
# Method executes in place

my_list = [1,2,3]
my_list.append(10)
print(my_list)

In [None]:
# Example of method that doesn't execute in place

text='lower-case-text'
text.upper()
print(text)

In [None]:
id?

In [None]:
def append_item(local_my_list):
    print(f'id locally: {id(local_my_list)}')
    local_my_list.append(4)      #The variable name does not stop the list `numbers` from being updated

numbers = [1, 2, 3] # Global variable
print(f'id globally: {id(numbers)}')
append_item(numbers)  # This modifies the original list
print(numbers) 


In [None]:
# Greeting is an `immutable object`
def modify_string(my_string):
    my_string += " world"
    print(f'local id of greeting: {id(my_string)}')
    print(f'local value of greeting: {my_string}')


greeting = "hello"

print(f'global id of greeting: {id(greeting)}')

modify_string(greeting)


In [None]:
my_string

In [None]:
print(greeting)  


In [None]:
my_string = my_string + " world"

my_string += " world" 

In [None]:
y = 'test'

In [None]:
y = y + '_test'

In [None]:
y

In [None]:
# Names bound to immutable objects can still be reassigned
y = 5
y = 10

In [None]:
y

In [None]:
y

## Scope

Scope refers to the region in the code where a variable is accessible. Python has different types of scopes, and they determine where you can use variables.

Types of Scope:
- Local Scope: Variables defined within a function are local to that function and are not accessible outside of it.
- Enclosing (Nonlocal) Scope: Variables defined in outer functions are accessible to inner (nested) functions, but not globally.
- Global Scope: Variables defined at the top level of the module or script are accessible throughout the script.
- Built-in Scope: Python's built-in names like len, int, etc., are always available.
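
The four scopes can be seen together in one small sketch. The same name `label` is deliberately reused at each level; `outer` and `inner` are illustrative names.

```python
label = "global"

def outer():
    label = "enclosing"

    def inner():
        label = "local"
        return label

    return inner(), label

from_function = outer()
print(from_function)   # what inner() saw, then what outer() saw
print(label)           # the global name is unaffected
print(len("abc"))      # `len` is found in the built-in scope
```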


In [None]:
# Bad coding practice
global_var = 10

def checking_vars(y):
    print(global_var)
    print(y)
    print(id(global_var))
    print(id(y))
    return None

checking_vars(global_var)

In [None]:
from copy import copy

In [None]:
y = [1,2,3,4]

updated_y = copy(y)

In [None]:
id(y)

In [None]:
id(updated_y)

In [None]:
updated_y.append([1,2,3])

In [None]:
updated_y.extend([1,2,3])

In [None]:
updated_y

In [None]:
y

##  First-Class Functions:

Python treats functions as first-class citizens (or first-class objects), meaning that functions can be:

- Assigned to variables.
- Passed as arguments to other functions.
- Returned from other functions.
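
Each of the three bullets above in code. All function names here (`shout`, `apply`, `make_suffixer`) are made up for the example.

```python
def shout(text):
    return text.upper()

def apply(func, value):        # a function passed as an argument
    return func(value)

def make_suffixer(suffix):     # a function returned from a function
    def add_suffix(text):
        return text + suffix
    return add_suffix

alias = shout                  # a function assigned to a variable
print(alias("hi"))
print(apply(shout, "hello"))
print(make_suffixer("!")("wow"))
```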



In [None]:
sum([1,2,3])

In [None]:
sum

In [None]:
# Example of Scope
def outer_f():
    
    y = "value" # Free Variable

    def inner_f():
        print(y)

    
    return inner_f()


In [None]:
y 

In [None]:
func = outer_f()

In [None]:
y

In [None]:
inner_f_return = outer_f()

In [None]:
inner_f_return

## Closure 

A closure occurs when a nested function "remembers" the variables from its enclosing scope even after the outer function has finished executing. Closures depend heavily on how Python handles scope.

Closure is created when an inner function refers to variables in its enclosing (nonlocal) scope and that function is returned or used outside the outer function.
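One way to see that a closure really "remembers" its enclosing variables is a small counter. This is only a sketch: `make_counter` and `increment` are illustrative names, and `nonlocal` is used here because the inner function rebinds the enclosing variable rather than just reading it.

```python
def make_counter():
    count = 0  # Lives in make_counter's scope

    def increment():
        nonlocal count  # Rebind the enclosing variable instead of creating a local one
        count += 1
        return count

    return increment

counter_a = make_counter()
counter_b = make_counter()

print(counter_a())  # 1
print(counter_a())  # 2
print(counter_b())  # 1, because each closure keeps its own `count`

# The remembered variable is stored on the function object itself
print(counter_a.__closure__[0].cell_contents)  # 2
```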


In [None]:
# Closure
def outer_f():
    
    y = "value" # Free Variable

    def inner_f():
        print(y)
    
    return inner_f



In [None]:
y

In [None]:
# my_func is a variable which equates to a function that can be executed at a later point
my_func = outer_f()

In [None]:
my_func

In [None]:
my_func()

In [None]:
hex(id(my_func))

In [None]:
print(my_func)

In [None]:
print(my_func.__name__)

In [None]:
my_func()
my_func()
my_func()


In [None]:
u = 'hello'

In [None]:
def add_name():
    new_name = u + 'name' # u is a global variable
    return new_name # new_name is a local variable

In [None]:
add_name()

In [None]:
# Closure with arguments
def outer_f(msg):
    
    y = msg # Free Variable

    def inner_f():
        print(y) # y is a free variable: it comes from outer_f's (enclosing) scope, not the global scope
    
    return  inner_f



In [None]:
hi_func = outer_f('Hi')
hello_func = outer_f('Hello')


In [None]:
hi_func()

In [None]:
hello_func()

# Section D: Control Flow

## 1. Basic Control Flow

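`if` Statement: allows you to execute a block of code if a condition is truthy. `True`, non-zero numbers, and non-empty containers are truthy; `False`, `None`, `0`, and empty containers are falsy.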
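A quick sketch of how `if` treats different values. The list of sample values is arbitrary and chosen only for illustration.

```python
values = [True, 1, -100, "text", [0], None, False, 0, 0.0, "", [], {}]

for value in values:
    if value:
        print(f"{value!r} is truthy")
    else:
        print(f"{value!r} is falsy")

# To test specifically for None, compare identity instead of relying on truthiness
x = 0
if x is not None:
    print("x is not None (even though 0 is falsy)")
```

Writing `if x is not None:` avoids accidentally skipping valid values such as `0` or an empty string.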



In [None]:
x = 10
x > 5

In [None]:
# Example:
x = 10
if x > 5:
    print("x is greater than 5")



In [None]:
# Example:
x = None
if x:
    print("x is not None")


In [None]:
x = -100
if x:
    print("x is not None")


`else` Statement: used to execute a block of code when the condition in the `if` statement is not met.

In [None]:
# Example:
x = 3
if x > 5:
    print("x is greater than 5")
else:
    print("x is not greater than 5")



`elif` statement allows you to check multiple conditions after an `if`.



In [None]:
# Example:
x = 100
if x > 10:
    print("x is greater than 10")  # This branch runs; every branch below is skipped
elif x > 5:
    print("x is greater than 5 but less than or equal to 10")
elif x > 10:
    print("unreachable: x > 10 was already handled above")
elif x > 20:
    print("unreachable: x > 20 implies x > 10")
elif x > 30:
    print("unreachable: x > 30 implies x > 10")
else:
    print("Previous conditions are not met")

# Note: The order of the elif conditions is important! Only the first matching branch runs.

In [None]:
# Example:
x = -10
if x > 10:
    print("x is greater than 10")
elif x > 5:
    print("x is greater than 5 but less than or equal to 10")
else:
    print("x is 5 or less")


In [None]:
# Example:
x = 15
if x > 10:
    print("x is greater than 10")

# A separate `if` (not `elif`) is always checked, even when the first `if` matched
if x > 5:
    print("x is greater than 5")
else:
    print("x is 5 or less")


`break` statement is used to **exit** a loop


In [None]:
# Example:
for i in range(1, 6):
    if i == 3:
        break  # Exits the loop when i equals 3
    print(i)


`continue` statement is used to skip the rest of the current iteration and move to the next one.


In [None]:
# Example:
for i in range(1, 6):
    if i == 3:
        continue  # Skips the rest of this iteration and moves on to the next one
        print('skipping')  # Unreachable: nothing after `continue` in this block ever runs
    print(i)


In [None]:
list(range(6))

In [None]:
range?

`pass` statement does nothing. It's used as a placeholder where you want to leave code for later.


In [None]:
x = 0
if x>10:
    pass # `pass` is required here: a block cannot be empty.
else:
    print('hi')

In [None]:
# Example:
x = 10
for i in range(10):
    pass  # Placeholder to be implemented later
    print(i)


In [None]:
# Example:
x = 10
if x > 5:
    pass
else:
    print("x is less than or equal to 5")


## Nested `if` Statements


In [None]:
# You can nest `if` statements inside other `if` statements to check more complex conditions.

# Example:
x = 20
if x > 10:
    print("x is greater than 10")
    if x > 15:
        print("x is also greater than 15")
    else:
        print("x is 15 or less")
else:
    print("x is 10 or less")


```python
if condition1:
    if condition2:
        if condition3:
            # Do something
            pass
        else:
            # Handle case
            pass
    else:
        # Handle case
        pass
else:
    # Handle case
    pass
```

In [None]:
not False

In [None]:
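# Guard clauses: the same checks as the nested version, flattened with early returns.
# `return` is only valid inside a function; `handle` is a placeholder name.
def handle(condition1, condition2, condition3):
    if not condition1:
        # Handle case
        return
    if not condition2:
        # Handle case
        return
    if not condition3:
        # Handle case
        return

    # Do something if all conditions are met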


In [None]:
has_passport = False
has_driving_license = True

if has_passport:
    print("Can travel")
else:
    if has_driving_license:
        print("Can travel")


In [None]:
has_passport = False
has_driving_license = True

if has_passport or has_driving_license:
    print("Can travel")


In [None]:
age = 22
is_student = False
has_special_permission = True

In [None]:
if age > 18:
    if is_student or has_special_permission:
        print("Eligible for discount")


In [None]:
if age > 18 and (is_student or has_special_permission):
    print("Eligible for discount")


In [None]:
is_holiday = True
has_work_to_do = False

if not is_holiday:
    if has_work_to_do:
        print("Go to work")


In [None]:
is_holiday = True
has_work_to_do = False

if not is_holiday and has_work_to_do:
    print("Go to work")
else:
    print("stay home bake some cake")


## `while` Loop with Control Flow


In [None]:
# The `while` loop will continue to execute as long as the condition is True.

# Example:
i = 1
while i <= 5:
    print(i)
    i += 1  # Increment i to avoid infinite loop


In [None]:
# Using `break` and `continue` in a `while` Loop
i = 0
while i < 10:
    i += 1
    if i == 3:
        continue  # Skip the iteration when i equals 3
    if i == 5:
        break  # Exit the loop when i equals 5
    print(i)


In [None]:
# You can use an `else` clause with loops. The `else` block runs when the loop is not terminated by a `break`.

# Example:
for i in range(1, 5):
    print(i)
else:
    print("Loop completed without hitting a break")


In [None]:

# If a break occurs, the else block is skipped.
for i in range(1, 5):
    if i == 3:
        break
    print(i)
else:
    print("This will not be printed because the loop was broken")
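
A common use of the loop `else` clause is a search: the `else` block handles the "nothing was found" case. The sketch below uses a hypothetical helper, `find_first_negative`, that is not part of the original notebook.

```python
def find_first_negative(numbers):
    for number in numbers:
        if number < 0:
            result = number
            break  # Found one: the else block is skipped
    else:
        result = None  # The loop ran to completion without hitting a break
    return result

print(find_first_negative([3, -1, 4]))  # -1
print(find_first_negative([3, 1, 4]))   # None
```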



In [None]:

# `break` only exits the loop it is inside.
for i in range(2,4):
    print(f'i is:{i}')
    if i == 3:
        print('breaking')
        break
    
for n in range(10): # This loop is outside the first loop, so it still runs after the break
    print(n)


In [None]:

# Here the second loop is nested inside the first one.
for i in range(2,10):
    print(f'i is:{i}')
    if i == 3:
        print('breaking')
        break
    
    for n in range(10): # Runs when i == 2, but is never reached once the break fires at i == 3
        print(n)


## One-line `if` Statements (Ternary Operator)

Python allows you to write compact `if-else` statements using the ternary operator.


In [None]:

# Example:
x = 10

# <value if true> if <condition> else <value if false>
result = "x is even" if x % 2 == 0 else "x is odd"
print(result)


In [None]:
"x is 10" if x==10 else "x is not 10"

In [None]:
if x % 2 == 0:
    print("x is even")
else:
    print("x is odd")

In [None]:
# Combining multiple conditions with `and`/`or`

# Example:
x = 12
if x > 10 and x < 20:
    print("x is between 10 and 20")

if x == 10 or x == 12:
    print("x is either 10 or 12")

## Summary
- We covered basic control flow: `if`, `else`, `elif`
- Explored more advanced control flow concepts like `break`, `continue`, `pass`
- We also covered nested `if` statements, loops with an `else` clause, the ternary operator, and combining conditions with `and`/`or`


# Section E: Python Exception Handling: try, except, raise

- `try` block: Tests code for errors.
- `except` block: Catches and handles exceptions.
- `else` block: Runs if no exceptions occur.
- `finally` block: Always runs, useful for cleanup.
- `raise`: Used to raise custom or built-in exceptions.
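The five keywords above can be seen working together in one small function. This is a sketch with a made-up helper, `safe_divide`; the printed messages are only there to show which block runs.

```python
def safe_divide(a, b):
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("a and b must be numbers")  # raise: signal an error ourselves
    try:
        result = a / b
    except ZeroDivisionError as e:
        print(f"except: {e}")  # Runs only if the try block raised ZeroDivisionError
        result = None
    else:
        print("else: no exception occurred")  # Runs only if the try block succeeded
    finally:
        print("finally: always runs")  # Runs in both cases
    return result

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None

try:
    safe_divide("10", 2)
except TypeError as e:
    print(f"Caught: {e}")
```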



In [None]:
# Basic Try-Except Block

# The `try` block lets you test a block of code for potential errors. The `except` block allows you to handle the error gracefully without crashing the program.
n=0
try:
    int('hi')  # Raises ValueError first, so the next line never runs
    result = 10 / n  # Would raise ZeroDivisionError if it were reached
except Exception as e: # Generic Exception
    print(f"Error is: {e}") # Printing the exception
    

In [None]:
ZeroDivisionError?

In [None]:
result = 10 / n  # Division by zero causes an exception


In [None]:
SyntaxError?

In [None]:
print(type(ZeroDivisionError))

In [None]:
result = 10 / 0  

In [None]:
int("not_a_number")

In [None]:
# You can catch multiple types of exceptions by specifying multiple `except` blocks.

try:
    result = int("not_a_number")  # This will raise a ValueError
except ValueError:
    print("Caught a ValueError!")
except ZeroDivisionError:
    print("Caught a ZeroDivisionError!")  # This won't be executed
    


# Section F: Iterable Comprehension
List comprehensions allow you to create a list based on existing iterables like lists or ranges.


## List Comprehension 
```
[expression for item in iterable if condition]
```

In [None]:
l = []
for i in range(10):
    if i%2 == 0:
        l.append(i)

In [None]:
squares = [
    x**2 # The value you are appending to your list
    for x in range(10) # Your iterable
]
print(squares)


In [None]:
range?

In [None]:
evens = [
    x # This is what we append to the list
    for x in range(10)  # This is what we iterate over 
    if x % 2 == 0 # A condition for when we append
]
print(evens)


In [None]:
evens = [
    x # This is what we append to the list
    if x % 2 == 0 # A condition for when we append
    else 'hi' # Your Else value
    for x in range(10)  # This is what we iterate over 
]
print(evens)


### Nested List Comprehension

In [None]:
[
    x for x in range(5)
]

In [None]:
matrix = [
    1 # We are appending `1` to our list
    for _ in range(3) # Our first iterable
]
print(matrix)


In [None]:
matrix = [
    [x for x in range(5)] # We are trying to append this item to our list, however it's another list comp
    for _ in range(3) # Our first iterable
]
print(matrix)


In [None]:
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
matrix

In [None]:
l = []
for row in matrix:
    for num in row:
        l.append(num)
l

In [None]:
flattened = [
    num # What we are appending to our list
    for row in matrix  # Our first for loop
    for num in row    # This our second loop
]
print(flattened)


In [None]:
flattened = [num for row in matrix  for num in row ]
print(flattened)


In [None]:
def square(x):
    return x ** 2

squares = [square(x) for x in range(5)]
print(squares)


## Dict Comprehension
```
{key_expression: value_expression for item in iterable if condition}
```

In [None]:
s = {1,2,4,5}

In [None]:
k = {
    'h':1,
    'm':2,
}

In [None]:
type(s)

In [None]:
d = {
    'name':"Josh",
    'name':"Bipin",
    'age':10,
}

In [None]:
d

In [None]:
# In place operation
# Creating new key:value pairs in my dictionary
d['another_key'] = 'example'

In [None]:
d 

In [None]:
# Method 2: update() adds or overwrites key:value pairs in place
d.update({'example2': 'woah it works'})

In [None]:
d

In [None]:
s = {1:2}

In [None]:
s

In [None]:
type(s)

In [None]:
s = {1,2}

In [None]:
s

In [None]:
type(s)

In [None]:
d

In [None]:
d = {}


In [None]:
d

In [None]:
for i in range(10):
    # The use of `[]`
    d[i] = i**2 # Creating new key:value pairs in my dictionary
d

In [None]:
# Use of `{}` instead of `[]`
# Use of `:` 
even_squares_dict = {
    x: x**2 # This is what we are "adding" to the dictionary
    for x in range(10)  # Our iterable
    if x % 2 == 0 # Our condition
}
print(even_squares_dict)


##  Set Comprehension
Set comprehensions work similarly to list comprehensions but generate a set, which contains only unique elements.


In [None]:
# Not using a key:value pair 
squares_set = {x**2 for x in range(10)}
print(squares_set)


In [None]:
9 in squares_set

In [None]:
l = [i for i in range(10)]

In [None]:
l

In [None]:
9 in l

In [None]:
# Quick reminder, the tuple object itself is immutable 
# However it can contain mutable objects like a list
# There is no look up operation here
([1,2,3,],2,1,1,2,3)

In [None]:
9 in squares_set

In [None]:
9 in l

## Tuple Comprehension

Tuple comprehensions don't exist in Python: parentheses around a comprehension create a generator expression, not a tuple. Generator expressions look like list comprehensions but use parentheses, and you can pass one to `tuple()` to build a tuple.
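A minimal sketch of the closest equivalent to a "tuple comprehension": passing a generator expression to the built-in `tuple()`.

```python
# tuple() consumes the generator expression and builds a real tuple
squares = tuple(x**2 for x in range(5))
print(squares)        # (0, 1, 4, 9, 16)
print(type(squares))  # <class 'tuple'>

# Parentheses alone give a generator, not a tuple
gen = (x**2 for x in range(5))
print(type(gen))      # <class 'generator'>
```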



In [None]:
# Instead of .append() it is using yield
g = (x**2 for x in range(10))
g

In [None]:
next(g)

In [None]:
[1,2,3,4,5]

# Section G: Generators

Generators are a type of iterable in Python, but unlike lists or tuples, they do **not store their contents in memory** all at once.

Instead, they **generate items on the fly and return them one at a time**, making them very **memory efficient**.


**Lazy Evaluation**: Generators don't evaluate values until they are needed, which improves performance and responsiveness in cases where you don't need the entire result set at once.



A generator is created using a function with the `yield` statement, which pauses the function and remembers its state.

Calling the function returns a generator object; each call to `next()` on that object resumes the function where it left off.
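The pause-and-resume behavior is easier to see with `print` calls between the `yield` statements. `two_steps` is a made-up example function.

```python
def two_steps():
    print("started")
    yield 1
    print("resumed after first yield")
    yield 2
    print("finished")

gen = two_steps()  # Nothing is printed yet: the body has not started running
print(next(gen))   # prints "started", then 1
print(next(gen))   # prints "resumed after first yield", then 2

try:
    next(gen)      # prints "finished", then raises StopIteration
except StopIteration:
    print("generator is exhausted")
```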


In [None]:
# A function that generates numbers from 0 to n-1:
# Returning 1 item at a time
def count_up_to(n):
    count = 0
    while count < n:
        yield count
        count += 1

count_up_to(10)

In [None]:
y = count_up_to(10)

In [None]:
# A function that generates numbers from 0 to n-1:
# Returning 1 item at a time
def count_up_to_for(n):
    for i in range(n):
        yield i

In [None]:
i = ('hi' for i in range(10))

In [None]:
set(l)

In [None]:
generator(l)  # NameError: unlike set() or list(), there is no built-in named `generator`

In [None]:
next(i)

In [None]:
dir(i)

In [None]:
def create_gen(n):
    return (i for i in range(n))


In [None]:
t = create_gen(10)

In [None]:
t

In [None]:
next(t)

In [None]:
t = count_up_to_for(10)
hex(id(t))

In [None]:
hex(id(t))

In [None]:
t

In [None]:
# Note: in CPython, id() returns the object's memory address; hex(id(t)) matches the address shown in the repr of t

In [None]:
# A function that generates numbers from 0 to n-1:
def count_up_to(n):
    count = 0
    while count < n:
        yield count
        count += 1

In [None]:
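# Generators are hashable (by identity); that does not make them immutable, since their internal state changes as they are consumed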
t = count_up_to(10)
t.__hash__()

In [None]:
l = [1,2,3,4]
l.__hash__()  # TypeError: lists are unhashable (list.__hash__ is None)

In [None]:
type(t)

In [None]:
counter = count_up_to_for(6)
counter

In [None]:
counter_l = list(counter)

In [None]:
list(counter_l)

In [None]:
list(counter)  # Returns []: the generator was already exhausted by the first list() call

In [None]:
counter

In [None]:
counter = count_up_to_for(6)

In [None]:
print(next(counter)) 
print(next(counter)) 
print(next(counter)) 


In [None]:
# You cannot index a generator; this raises a TypeError
counter[0]

In [None]:
counter = count_up_to_for(6)
# You can iterate over a generator like any other iterable.
for num in counter:
    print(num) 


Just like you can use list comprehensions, you can create generator expressions using parentheses `()` instead of square brackets `[]`.


In [None]:
# Example:
gen_exp = (x**2 for x in range(5))
print(gen_exp)  


In [None]:
# Generators return values one by one when requested:
for value in gen_exp:
    print(value)


## Pros of Generators

**Memory Efficiency**

Generators produce items only when needed, so they are useful for handling large datasets that would not fit in memory.
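A rough way to see the difference is to compare object sizes with `sys.getsizeof`. This is a sketch: `getsizeof` reports only the size of the container object itself (not the integers it refers to), and the exact numbers vary by Python version and platform, so no specific sizes are shown.

```python
import sys

n = 100_000
as_list = [x for x in range(n)]  # All n items exist in memory at once
as_gen = (x for x in range(n))   # Items are produced one at a time, on demand

print(sys.getsizeof(as_list))  # Grows with n
print(sys.getsizeof(as_gen))   # Small, and does not depend on n

# The generator still yields every value when iterated
print(sum(as_gen) == sum(as_list))  # True
```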


In [None]:
# Example: Generating a large sequence without storing it in memory
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


In [None]:
# You can now generate as many numbers as you want without creating a huge list.
large_gen = infinite_sequence()


In [None]:
# Note: We don't store or load the whole sequence into memory!
print(next(large_gen))
print(next(large_gen))
print(next(large_gen))


## Cons of Generators


### One-time Use


In [None]:
# Generators can only be iterated once. After they are exhausted, you need to recreate them if you need the data again.

gen = (x**2 for x in range(3))


In [None]:
print(list(gen)) 


In [None]:
list(gen)

In [None]:
next(gen)

### Lack of Random Access / Debugging

Unlike lists, you can't randomly access elements in a generator by index. You must iterate through them sequentially.
For example, `gen[2]` would raise a TypeError because generators don't support indexing.


### No Length Information

You can't easily know the length of a generator without iterating through all of its items, unlike lists where `len(list)` is available.
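Both limitations have workarounds, sketched below with `itertools.islice` from the standard library. The helper `squares` is a made-up function that returns a fresh generator on every call, which matters because each workaround consumes items.

```python
from itertools import islice

def squares():
    return (x**2 for x in range(10))

# "Index 2": skip two items, then take the next one
third = next(islice(squares(), 2, 3))
print(third)  # 4

# The first three items only
first_three = list(islice(squares(), 3))
print(first_three)  # [0, 1, 4]

# Length: count by iterating (this exhausts the generator)
gen = squares()
length = sum(1 for _ in gen)
print(length)     # 10
print(list(gen))  # [] because counting used up every item
```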


## When to Use Generators

### For Large Datasets

If you need to process a **large dataset** or **infinite stream** of data without loading everything into memory, generators are the best choice.


In [None]:
# Example: Reading large files line by line using a generator
def read_large_file(file_path):
    with open(file_path) as file:
        for line in file:
            yield line


In [None]:
# When you don't need all data at once:
# If you only need to process part of a result set (e.g., first few items or stopping early), generators allow you to access data lazily, improving performance.

# Example: Generator for prime numbers
def prime_numbers(limit):
    for num in range(2, limit):
        for i in range(2, int(num ** 0.5) + 1):
            if num % i == 0:
                break
        else:
            yield num

# Using the prime number generator:



In [None]:
primes = prime_numbers(20)

In [None]:
print(list(primes)) 


In [None]:
next(primes)

## Summary

- **Generators** provide a memory-efficient way to handle large datasets or infinite sequences by yielding values on demand.
- **Pros** include memory efficiency, lazy evaluation, and simpler code for iterative algorithms.
- **Cons** include single-use, lack of random access, and more challenging debugging.
- **Use Cases**: Ideal for working with large datasets, lazy evaluation, or when processing pipelines where memory efficiency is crucial.


# Section H: Decorator Functions

Basic Structure of a Decorator Function

A decorator is just a function that takes another function as an argument, 
and returns a new function that adds additional behavior before or after the original function is called.



In [None]:
# Simple decorator without *args and **kwargs
def simple_decorator(func):
    def wrapper():
        print("Before function call")
        func() # Execute the function
        print("After function call")
    return wrapper


In [None]:
@simple_decorator
def say_hello():
    print("Hello!")


In [None]:
say_hello()

In [None]:
@simple_decorator
def adding_nums():
    print("adding nums")


In [None]:
adding_nums()

In [None]:
# Closure
def outer_f():
    
    y = "value" # Free Variable

    def inner_f():
        print(y)
    
    return inner_f # inner_f is not executed.



In [None]:
# my_func is a variable which equates to a function that can be executed at a later point
my_func = outer_f()

In [None]:
my_func()

In [None]:
my_func()
my_func()
my_func()


In [6]:
# Closure with arguments
def decorator_function(msg):
    def wrapper_function():
        print(msg)
    
    return wrapper_function



In [None]:
hi_func = decorator_function('Hi')

In [None]:
hi_func()

In [None]:
# Instead of printing a message, we can execute a function!

In [7]:
# Closure with arguments
def decorator_function(executable_func):
    def wrapper_function():
        print("Decorating something cool")
        return executable_func() # Return the called/executed function

    return wrapper_function # Return the wrapper function which returns an executed function



In [8]:
def adding_nums():
    return "returning sum"

In [9]:
adding_nums.__name__

'adding_nums'

In [12]:
# Decorator is accepting adding_nums as a parameter. 
# Decorator prints a statement before executing the function adding_nums.
adding_nums_decorated = decorator_function(adding_nums)

In [13]:
adding_nums_decorated.__name__

'wrapper_function'

In [None]:
# This method creates a new decorated object/function


In [None]:
adding_nums_decorated

In [None]:
adding_nums_decorated()

In [None]:
# This method decorates your function in place: the name `adding_nums` now refers to the wrapper
@decorator_function
def adding_nums():
    return "returning sum"

In [None]:
adding_nums()

In [14]:
@decorator_function
def subs_nums():
    return "returning diff"

In [15]:
subs_nums.__name__

'wrapper_function'

In [None]:
adding_nums()


In [None]:
def adding_nums():
    return "returning sum"

adding_nums()

In [None]:
adding_nums()

In [None]:
adding_nums_decorated()

In [None]:
hifunc = decorator_function('hi')
hifunc()  # TypeError: 'str' object is not callable, because this decorator now expects a function

In [None]:
adding_func = decorator_function(adding_nums)

In [None]:
adding_func()

In [None]:
# This is equivalent to `adding_nums = decorator_function(adding_nums)`
@decorator_function
def adding_nums():
    return "returning sum"


In [None]:

adding_nums()

In [16]:
@decorator_function
def adding_nums(a,b):
    return sum([a,b])


In [17]:
adding_nums(1,2)

TypeError: decorator_function.<locals>.wrapper_function() takes 0 positional arguments but 2 were given

In [None]:
# Closure with arguments
def decorator_function(executable_func):
    def wrapper_function():
        print("Decorating something cool")
        return executable_func() # Return the called/executed function

    return wrapper_function # Return the wrapper function which returns an executed function



@decorator_function
def adding_nums(a,b):
    return sum([a,b])



In [None]:
adding_nums(1,2) # OH NO: TypeError, because wrapper_function() accepts no arguments

In [None]:
adding_nums.__name__

- A function abstracts/isolates a piece of logic and returns values

- The wrapper function is the callable object that the decorator returns
- `*args` collects positional arguments and `**kwargs` collects keyword arguments, so the wrapper can accept whatever the decorated function needs
- The wrapper function performs the "decoration" before/after the executable function is called
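
Because the decorator returns the wrapper, the decorated name reports the wrapper's `__name__` (the `'wrapper_function'` outputs shown earlier). The standard library's `functools.wraps` is a common way to keep the original metadata. The sketch below uses a hypothetical decorator named `logged`.

```python
import functools

def logged(func):
    @functools.wraps(func)  # Copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def adding_nums(a, b):
    """Return the sum of a and b."""
    return a + b

print(adding_nums(1, 2))     # prints "Calling adding_nums", then 3
print(adding_nums.__name__)  # adding_nums
print(adding_nums.__doc__)   # Return the sum of a and b.
```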

In [18]:
# Closure with arguments
def better_decorator_function(executable_func):
    
    def wrapper_function(*args, **kwargs): # Allows us to unpack dynamically
        print("Decorating something cool")
        return executable_func(*args, **kwargs)

    return wrapper_function



In [19]:
@better_decorator_function
def adding_nums(a,b):
    return sum([a,b])


In [20]:
adding_nums(1,2)

Decorating something cool


3

In [None]:
@better_decorator_function
def adding_nums_new(*args):
    return sum(args)

adding_nums_new(1,2,2,3,4)

In [None]:
# Simple decorator without *args and **kwargs
def simple_decorator(func):
    def wrapper():
        print("Before function call")
        func()
        print("After function call")
    return wrapper


In [None]:
@simple_decorator
def say_hello():
    print("Hello!")

say_hello()


In [None]:
@simple_decorator
def greet_person(name):
    print(f"Hello, {name}!")



In [None]:
# Trying to call the function with an argument
greet_person("Alice")


In [None]:
# Decorator with *args and **kwargs
def flexible_decorator(func):
    def wrapper(*args, **kwargs):
        print("Before function call")
        result = func(*args, **kwargs)  # Pass along arguments to the original function
        print("After function call")
        return result
    return wrapper


In [None]:
@flexible_decorator
def say_hello():
    print("Hello!")

# Call the function
say_hello()

In [None]:
@flexible_decorator
def greet_person(name):
    print(f"Hello, {name}!")

# Call the function with an argument
greet_person("Alice")


In [None]:
import time  # We'll use the time module to measure execution time

# In this example we are executing the function in the wrapper.
def time_decorator(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()  
        print(f"Start time is '{start_time}'")

        result = func(*args, **kwargs)  # Execute the wrapped function
        end_time = time.time()  # End time after function execution
        print(f"Function '{func.__name__}' executed in {end_time - start_time:.6f} seconds")
        return result  # Return the result of the executed function
    return wrapper  # Return the wrapper function



In [None]:
@time_decorator
def example_function(x):
    time.sleep(x)  # Simulate a function taking 'x' seconds to execute
    return None


In [None]:
example_function(10)

In [None]:
@time_decorator
def adding(a,b):
    return a+b

In [None]:
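# `@time_decorator` already did the equivalent of `adding = time_decorator(adding)`.
# Decorators take the function itself, not the result of calling it.
adding(1, 100)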

In [None]:
# You can also stack decorators
@flexible_decorator
@time_decorator
def example_function(x):
    time.sleep(x)  # Simulate a function taking 'x' seconds to execute
    return None


example_function(2) 


### Complex use of `.format()`

**Reusing the same argument multiple times**
You can reference the same argument more than once by using its index.


In [None]:
print("The numbers are: {0}, {1}, and {0} again".format(1, 2))


**Mixing of positional and keyword arguments**, providing more flexibility in how you pass and reference variables.


In [None]:
print("The person is {0}, they are a {1}, and their hobby is {hobby}".format(
    "Alice",
    "Data Scientist",
    hobby="painting")
     )


**Aligning text (justification)**
You can align text using `.format()` with padding and justification options (`<`, `>`, `^` for left, right, and center alignment respectively).

In [None]:
print("{:<10} | {:^10} | {:>10}".format('Left', 'Center', 'Right'))

**Specifying width and precision for numbers**:
You can specify the width and precision for numbers easily in `.format()`. This is useful for aligning numbers or formatting them with decimal precision.

In [None]:
print("Pi to 3 decimal places: {0:.3f}".format(3.14159))

print("Pi padded: {0:10.5f}".format(3.14159))


In the formatting string `{x:y.zf}`, each part has a specific meaning:

- **`x`**: This is the **positional argument index**. `0` refers to the first argument passed to `.format()`, `1` to the second, and so on.

- **`y`**: This specifies the **minimum width** of the output: the total number of characters (including the decimal point and digits) the formatted number should take up. If the formatted number has fewer characters than the specified width, it is padded with spaces on the left.

- **`.zf`**: `z` specifies the **precision** for floating-point numbers, that is, the number of digits displayed after the decimal point. `f` is the **format type**, indicating that the number should be displayed as a **floating-point number**.

For example, in `{0:10.5f}`:

- `0`: Refers to the first argument, `3.14159`.
- `10`: The output will take up 10 characters, padded with spaces if needed.
- `.5`: The number will be displayed with 5 decimal places.
- `f`: The number is formatted as a floating-point number.
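
The same format specification also works in f-strings and with the built-in `format()` function. A short sketch, using `repr` so the padding spaces are visible:

```python
value = 3.14159

a = "{0:10.5f}".format(value)  # index 0, width 10, 5 decimals, fixed-point
b = f"{value:10.5f}"           # same spec inside an f-string
c = format(value, "10.5f")     # same spec with the built-in format()

print(repr(a))      # '   3.14159'
print(a == b == c)  # True

# Adding `<` left-aligns the number, so the padding moves to the right
print(repr("{0:<10.1f}".format(value)))  # '3.1       '
```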



In [None]:
print("Pi padded: {0:10.5f}".format(3.14159, 1))


In [None]:
print("Pi padded: {0:10.1f}".format(1.23456, 3.14159))


**Number formatting (comma separator, percentage, etc.)**


In [None]:
print("Success rate: {:.2%}".format(0.85432))


In [None]:
# Comma separator for large numbers
print("The population is {:,}".format(1234567890))


In [None]:
print("{:*<10}".format("Test"))   # Left-aligned, fill with *


In [None]:
name = "Alice"
age = 30
salary = 90000.5

print("Name: {0}, Age: {1}, Salary: ${2:.2f}".format(name, age, salary))


In [None]:
# Decimal
print("The number is {:d}".format(42)) 

# Binary
print("Binary: {:b}".format(42)) 

# Octal
print("Octal: {:o}".format(42)) 

# Hexadecimal (lowercase)
print("Hexadecimal: {:x}".format(42))

# Hexadecimal (uppercase)
print("Hexadecimal: {:X}".format(42)) 


`s`: String type, used for formatting strings.


In [None]:
name = "Alice"
print("Hello, {:1}".format(name))  
print("Hello, {:s}".format(name))  


In [None]:
print("Scientific notation: {:e}".format(12345.6789))  
print("Scientific notation: {:E}".format(12345.6789)) 


In [None]:
from datetime import datetime
today = datetime.now()
today

In [None]:
# Datetime Formatting

print("Today is: {:%Y-%m-%d %H:%M}".format(today))  


# Section I: Object Oriented Programming


OOP is a programming paradigm based on the concept of "objects", which can contain:
- data (attributes)
- behavior (methods, i.e. functions attached to the object)

It helps organize code by grouping related data and behaviors into objects.

#### Why is OOP Useful in Data Science?
- **Modularity**: Organizes code into reusable components (classes).
- **Reusability**: Classes can be reused for tasks like data preprocessing, model training, and evaluation.
- **Maintainability**: OOP encourages clean, structured code, making it easier to modify and scale data science projects.



This contrasts with procedural programming, which is centered on procedures (functions) that retrieve and operate on data kept separately from them.


**OOA (Object Oriented Analysis)**

The process of looking at a problem, system, or task and identifying the objects and interactions between those objects.
The analysis stage is all about what needs to be done.

**OOD (Object Oriented Design)**

The process of converting OOA requirements into an implementation specification


Objects can have properties and behaviours


**Person** can have properties such as *name*, *age*, *height*, *weight* ...

**Person** can have behaviours such as *walking*, *talking* ...



1. **Classes**: Blueprints for creating objects (e.g., data preprocessing pipelines).
2. **Objects**: Instances of classes (e.g., a specific model with specific parameters).
3. **Attributes**: Properties that hold data (e.g., features, target variables, model parameters).
4. **Methods**: Functions that operate on the data (e.g., training the model, evaluating it).
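
The four terms can be mapped onto one small class. `MeanCenterer` is a hypothetical preprocessing step invented for this sketch, not a class from any library.

```python
class MeanCenterer:              # Class: the blueprint
    """Hypothetical preprocessing step that subtracts the mean from the data."""

    def __init__(self):
        self.mean = None         # Attribute: data stored on the object

    def fit(self, values):       # Method: a function that operates on that data
        self.mean = sum(values) / len(values)
        return self

    def transform(self, values):
        return [v - self.mean for v in values]

centerer = MeanCenterer()        # Object: one instance of the class
centerer.fit([1, 2, 3])
print(centerer.mean)                  # 2.0
print(centerer.transform([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```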


### Class Variable

- Class variables are shared across all instances of the class and are accessible using the class name




In [None]:
#Example of a class
class Student:
    roomnum = 100

    def __init__(self, roomnum):
        self.roomnum = roomnum

In [None]:
# When you instantiate the class, the __init__ method is automatically called.
s1 = Student(200)

In [None]:
# Notice how this is not 100! 
s1.roomnum

In [None]:
# Notice how this is 100! 
Student.roomnum

In [None]:
#Example 2
class Circle:
    pi=3.141592653589793
    radius=2

    def getArea():
        area=(Circle.radius**2)*Circle.pi
        return area

# This is not related to an object. So the `self` is not required.
Circle.getArea()

In [None]:
# This is not related to an object. So the `self` is not required.
Circle.getArea()

<h3> Instance or Object Instance </h3>
<h5> An instance is a unique copy of a Class that represents an Object. When a new instance of a class is created </h5>

In [None]:
# instantiating (creating an instance of) my class, which creates an object
c1 = Circle()

In [None]:
c1.pi

In [None]:
c1.radius

In [None]:
Circle.getArea()

In [None]:
c1 = Circle()

In [None]:
c1.getArea()  # TypeError: getArea() takes 0 positional arguments but 1 was given (c1 is passed implicitly)

In Python, `self` is used as the first parameter in instance methods of a class to represent the **instance** of the class. It allows you to access the attributes and methods of the object itself.

### Why Do We Need `self`?

- When you create an object (an instance of a class), each object can have its own set of data (attributes). The `self` keyword allows methods in the class to access and modify these attributes.
- Without `self`, Python wouldn’t know which object’s data you're referring to.
- For example, if you have two objects, `p1` and `p2`, both will have their own `name` attribute. `self` ensures that when you call `p1.set_name("Alice")`, only `p1`'s `name` is changed, not `p2`'s.



### What Happens Without `self`?

If you try to omit `self` when defining instance methods, you'll encounter the following problems:

1. **Method Won’t Work Properly**:
   - Without `self`, the method doesn't have a reference to the object, so it can't access or modify the instance's attributes.
   - You’ll often get unexpected behavior, as the method won’t know which instance it should operate on.

2. **Python Raises an Error**:
   - If you forget `self` in the method definition, and still try to call the method on an instance, Python will throw an error because the instance is implicitly passed as the first argument.


- Without `self`, the method doesn't know which instance it is modifying. An assignment such as `name = name` inside the method only rebinds a local variable, not an instance attribute, because there is no link to the actual object `p`.


#### Correct Example with `self`:

- Here, `self.name` refers to the attribute `name` of the object `p`. Now, calling `p.set_name("Alice")` correctly sets `p.name` to `"Alice"`.
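The two cases described above can be sketched as follows. `PersonNoSelf` and `PersonWithSelf` are hypothetical names chosen for this illustration; they are not defined elsewhere in this notebook.

```python
class PersonNoSelf:
    # `self` was forgotten, so `name` is the only declared parameter
    def set_name(name):
        name = name  # only rebinds a local variable


class PersonWithSelf:
    def set_name(self, name):
        self.name = name  # stored on the instance that made the call


p = PersonNoSelf()
try:
    # Python passes `p` implicitly, so the method receives two arguments
    p.set_name("Alice")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

p1 = PersonWithSelf()
p2 = PersonWithSelf()
p1.set_name("Alice")
p2.set_name("Bob")
print(p1.name, p2.name)  # each object keeps its own name
```

The first call fails because the instance is always passed as the first argument; the second class works because `self` receives it.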


In [None]:
class Circle:
    pi=3.141592653589793
    radius=2

    def getArea(self):
        area=(Circle.radius**2)*Circle.pi
        return area

In [None]:
c = Circle()

In [None]:
c.getArea()

In [None]:
# Replacing Circle with `self`
class Circle:
    pi=3.141592653589793
    radius=2

    def getArea(self):
        area=(self.radius**2)*self.pi
        return area

In [None]:
c = Circle()

In [None]:
c.getArea()

In [None]:
class Person:
    population = 0  

    def __init__(self, name):
        self.name = name

        # This is NOT related directly with the object.
        Person.population += 1 

In [None]:
p1 = Person("Alice")

In [None]:
print(p1.population) 

In [None]:
p1.name

In [None]:
p2 = Person("Bob")

In [None]:
# Uses class variable population
print(p2.population) 

In [None]:
p2.name

In [None]:
p1.name

In [None]:
print(p1.population) 


In [None]:
Person.population

### Key Points:

- **`self` is a reference to the instance** of the class and is required to access the instance’s variables and methods.
- Without `self`, methods wouldn’t know which object they are supposed to operate on.
- When calling a method on an object (`p.set_name()`), Python automatically passes the instance (`p`) as the first argument, which is why the method must have `self` as the first parameter.

### Final Takeaway:
`self` is crucial because it links the method to the instance of the class. Without it, you wouldn’t be able to differentiate between different instances or properly manage instance variables in object-oriented programming.

## I.1 Constructors 
A constructor is the function used to create an object of a class.

This function gets called whenever a new object of that class is instantiated (created), using `()`.

Its purpose is to initialize the attributes of the new object.

Every class has an `__init__()` method (inherited from `object` if you do not define one), which is executed each time the class is instantiated with `()`.

`__init__()` must return `None`. Returning anything else, such as a float, raises `TypeError: __init__() should return None, not 'float'`.
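A minimal sketch of that error, using a hypothetical `BadCircle` class whose `__init__` returns a value by mistake:

```python
class BadCircle:
    def __init__(self, radius):
        self.radius = radius
        # Mistake: __init__ is only supposed to set up the object
        return 3.14 * radius ** 2


try:
    BadCircle(2)
    init_error = None
except TypeError as exc:
    init_error = str(exc)

print(init_error)
```

If a computed value is needed, store it on `self` or return it from a regular method such as `getArea()`.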


In [None]:
from math import pi

In [None]:
class Circle:
    def __init__(self, color, radius):
        self.color = color
        self.area = None
        self.radius = radius

    def __str__(self):
        return f'This circle is {self.color} and area of {self.area}'

    def getArea(self):
        self.area=(self.radius**2)*pi
        return self.area


In [None]:
# The constructor is getting called
my_red_circle = Circle(color='red', radius=2)

In [None]:
my_red_circle.area

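In [None]:
# Define the string first so the cells below run in order
s1 = "12334"
dir(s1)

In [None]:
# For strings, `+` is implemented by the __add__ dunder method
s1 + s1

In [None]:
s1.__add__(s1)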

In [None]:
my_red_circle.__str__()

In [None]:
print(my_red_circle)

In [None]:
print(my_red_circle)

In [None]:
my_red_circle.getArea()

In [None]:
print(my_red_circle)

In [None]:
my_blue_circle = Circle(color='blue', radius=2)

In [None]:
my_blue_circle.radius

In [None]:
my_blue_circle.area

In [None]:
print(my_blue_circle.__str__())

In [None]:
print(my_blue_circle)

In [None]:
my_blue_circle.getArea()

In [None]:
print(my_blue_circle)

In [None]:
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

In [None]:
p1 = Person("John", 36)

print(p1.name)
print(p1.age)

In [None]:
p2 = Person("bob", 10)

print(p2.name)
print(p2.age)

Not all functions in Python are dunder methods. Here's the difference:

1. **Normal Methods (like `.append()`):**
   - These are regular methods that you can call directly on objects, like `list.append()`.
   - These methods are specific to certain objects (e.g., lists, strings) and are used for specific actions, like adding an item to a list with `.append()`.

2. **Dunder Methods (like `__init__` or `__str__`):**
   - These are special methods that Python calls **automatically** in certain situations.
   - For example, `__init__()` is called when you create an object (`my_object = MyClass()`), and `__str__()` is called when you print an object.

### Key Difference:
- Normal methods, like `.append()`, are **explicitly called** by you.
- Dunder methods are **automatically triggered** by Python to control how objects behave with basic operations (like creating, printing, or comparing them).
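As a rough illustration of "automatically triggered", the sketch below defines three dunder methods on a hypothetical `Point` class and shows which built-in triggers each one.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form, aimed at developers
        return f"Point(x={self.x}, y={self.y})"

    def __str__(self):
        # Readable form, aimed at end users
        return f"({self.x}, {self.y})"

    def __len__(self):
        return 2


pt = Point(1, 2)
print(pt)        # print() uses __str__
print(repr(pt))  # repr() uses __repr__
print(len(pt))   # len() uses __len__
print([pt])      # containers display their items with __repr__
```

If a class defines only `__repr__`, `print()` falls back to it.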

In [None]:
! pip install scikit-learn

In [None]:
import sklearn.preprocessing as skp

In [None]:
skp.StandardScaler()

In [None]:
skp.MinMaxScaler()

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


In [None]:
from sklearn.preprocessing import StandardScaler


In [None]:
data = pd.DataFrame({
    'feature1': np.random.randn(100)*100 + 100,
    'feature2': np.random.randn(100),
    'target': np.random.choice([0, 1], size=100)
})

In [None]:
# Create a Preprocessing class
class PreprocessingPipeline:
    def __init__(self):
        self.scaler = StandardScaler()

    # Method to scale features
    def scale_features(self, X):
        return self.scaler.fit_transform(X)
    
    # Method to split the data into train and test
    def split_data(self, X, y, test_size=0.2):
        return train_test_split(X, y, test_size=test_size, random_state=42)


In [None]:
preprocessor = PreprocessingPipeline()

In [None]:
data_processed = data.copy()

In [None]:
another_version = pd.DataFrame(
    preprocessor.scale_features(data[['feature1','feature2']]),
    columns=['feature1','feature2']
)

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1,2)
data[['feature1']].hist(ax = axes[0])
another_version[['feature1']].hist(ax = axes[1])  # scaled version of feature1


Explain whether generators are iterable or not.

Can the membership operator (in) be performed on a generator? Show me an example of proving or disproving.


In [None]:
def gen(n):
    for i in range(n):
        yield i

In [None]:
gen(10)

In [None]:
# Iterable does not mean indexable

In [None]:
t = (i for i in range(10))

In [None]:
for value in t:
    print(value)

In [None]:
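# We can do a membership operation on a generator object, but `in` consumes
# items while it searches. `t` was already exhausted by the loop above,
# so this returns False.
7 in t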

In [None]:
t[0]
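The questions above can be checked with a small sketch. The generator `squares` is a made-up example; the point is that `in` works on a generator but consumes items as it searches, and that indexing is not supported.

```python
from collections.abc import Iterable

squares = (i * i for i in range(5))  # yields 0, 1, 4, 9, 16

is_iterable = isinstance(squares, Iterable)
found_four = 4 in squares    # consumes 0, 1 and 4 while searching
remaining = list(squares)    # only the items that are left
found_again = 4 in squares   # the generator is now exhausted

try:
    squares[0]
    index_error = None
except TypeError as exc:
    index_error = str(exc)

print(is_iterable, found_four, remaining, found_again)
print(index_error)
```

So a generator is iterable and supports `in`, but each value can be visited only once.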

In [None]:
X = data[['feature1','feature2']]

In [None]:
y=data['target']

In [None]:
X_train, X_test, y_train, y_test = preprocessor.split_data(X,y)

C1. Properties of Generators
Explain whether generators are iterable or not.

Can the membership operator (in) be performed on a generator? Show me an example of proving or disproving.

In [None]:
[1,2,3,{1,2,3}]

In [None]:
{(1,2,3), (1,3,5)}

In [None]:
{[1,2,3], [1,3,5]}

## Key Components of OOP in Data Science

- **1. Encapsulation**: Bundles data and methods into one unit (class), which helps hide complex logic while keeping the workflow simple.
- **2. Inheritance**: Allows you to build reusable classes and extend them to fit different tasks. Child classes inherit attributes and methods from parent classes (e.g., trying different models with the same preprocessing pipeline).
- **3. Polymorphism**: Enables methods to operate on objects of different types, such as applying different models in the same way.
- **4. Abstraction**: Hides the complexity of the internal implementation and shows only the essential features of an object. It focuses on what an object does, rather than how it does it. But every function does this... so I'm not covering it 😊

## I.1 Encapsulation in Model Training

- Bundling the variables (attributes) and the methods (functions) that operate on them into a single unit, or class.

- Hides internal representation of an object, exposing only what is necessary (via public methods) while keeping the rest private.


In [74]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


class ModelTraining:
    def __init__(self, model):
        self.__model = model  # Accept any model (e.g., RandomForest, LogisticRegression)
        self.accuracy = None

    def __getmodel__(self):
        return print(self.__model)
        
    # Method to train the model
    def train(self, X_train, y_train):
        self.__model.fit(X_train, y_train)
    
    # Method to evaluate the model
    def evaluate(self, X_test, y_test):
        y_pred = self.__model.predict(X_test)
        self.accuracy = accuracy_score(y_test, y_pred)
        print(f"Model Accuracy: {self.accuracy * 100:.2f}%")
        return self.accuracy



# TODO: How does it work under the hood

In [75]:
obj = ModelTraining(RandomForestClassifier)

In [76]:
obj.__getmodel__()

<class 'sklearn.ensemble._forest.RandomForestClassifier'>


In [None]:
rf_model = RandomForestClassifier()
trainer = ModelTraining(rf_model)
trainer.train(X_train, y_train)
trainer.evaluate(X_test, y_test)

# TODO: Discuss RandomForest in detail. XGBoost.

In [None]:
trainer.accuracy # Public attribute

In [None]:
trainer.__model # Private attribute
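Under the hood, the "private" behaviour comes from name mangling: inside a class body, an attribute written as `__name` is stored as `_ClassName__name`. The sketch below uses a hypothetical `Vault` class to show this; it does not depend on scikit-learn.

```python
class Vault:
    def __init__(self, secret):
        self.__secret = secret  # actually stored as _Vault__secret

    def reveal(self):
        return self.__secret    # inside the class, the short name works


v = Vault("abc")

try:
    v.__secret                  # no mangling happens outside the class
    outside_access = "worked"
except AttributeError:
    outside_access = "AttributeError"

stored_names = list(vars(v))
via_mangled_name = v._Vault__secret

print(outside_access, stored_names, via_mangled_name)
```

Mangling discourages accidental access rather than enforcing privacy. Names that also end in two underscores, such as `__getmodel__`, are not mangled.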

## I.2: Inheritance in Data Science

Inherit common functionality to create reusable pipelines. For example, create a pipeline that inherits from a **BaseModel** class and allows switching between different models without duplicating code.


Relies on the use of `super().__init__()`

In [77]:
class Parent:
    def __init__(self, name):
        self.name = name  

class ChildWithoutSuper(Parent):
    def __init__(self, name):
        pass  

class ChildWithSuper(Parent):
    def __init__(self, name):
        super().__init__(name) 



In [None]:
p1 = Parent('Dave')

In [None]:
p1.name

In [None]:
c1 = ChildWithoutSuper('Alice')

In [None]:
c1.name

In [None]:
c2 = ChildWithSuper("Bob")


In [None]:
c2.name  # Works fine because super().__init__ was called

In [None]:
class A:
    def __init__(self):
        print("A's init")

class B(A):
    def __init__(self):
        super().__init__()
        print("B's init")

class C(A):
    def __init__(self):
        super().__init__()
        print("C's init")

class D(B, C):
    def __init__(self):
        super().__init__()  # Follows D's MRO: D -> B -> C -> A
        print("D's init")

# Note how "A's init" is printed first, followed by C's, B's, and D's, even though D is the class being instantiated (this is due to the MRO)
d = D()  
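To see the order for yourself, inspect `D.__mro__`. The sketch below rebuilds the same diamond, but records the calls in a list instead of printing them; the `calls` attribute is an addition made for this illustration.

```python
class A:
    def __init__(self):
        self.calls = ["A"]


class B(A):
    def __init__(self):
        super().__init__()      # next class in D's MRO after B is C
        self.calls.append("B")


class C(A):
    def __init__(self):
        super().__init__()      # next class in D's MRO after C is A
        self.calls.append("C")


class D(B, C):
    def __init__(self):
        super().__init__()
        self.calls.append("D")


mro_names = [cls.__name__ for cls in D.__mro__]
d = D()

print(mro_names)
print(d.calls)  # order in which each __init__ body finished
```

Each `super().__init__()` runs before the line that follows it, so the class at the end of the chain finishes first.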

In [None]:
class Parent:
    def speak(self):
        print("Parent speaks")

class Child(Parent):
    def speak(self):
        super().speak()  # Calls Parent's speak method
        print("Child speaks")



In [None]:
child = Child()


In [None]:
child.speak()

In [None]:
import sklearn 
sklearn.__version__

In [83]:
# Parent class for model pipelines
class BaseModelPipeline:
    def __init__(self, model):
        self.model = model
    
    def train(self, X_train, y_train):
        self.model.fit(X_train, y_train)
    
    def evaluate(self, X_test, y_test):
        y_pred = self.model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        print(f"Accuracy: {accuracy * 100:.2f}%")

In [84]:
from sklearn.ensemble import RandomForestClassifier

class RandomForestPipeline(BaseModelPipeline):
    def __init__(self, n_estimators=100):
        model = RandomForestClassifier(n_estimators=n_estimators)
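        # super().__init__(model)
        # Without the call above, self.model is never set (see the AttributeError
        # below). Uncomment it before using train() or evaluate().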

In [85]:
r = RandomForestPipeline()

In [86]:
r.model

AttributeError: 'RandomForestPipeline' object has no attribute 'model'

In [None]:
from sklearn.linear_model import LogisticRegression
class LogisticRegressionPipeline(BaseModelPipeline):
    def __init__(self):
        model = LogisticRegression()
        super().__init__(model) 

In [None]:
l = LogisticRegressionPipeline()

In [None]:
l.train(X_train, y_train)

In [None]:
# Create instances of both pipelines
rf_pipeline = RandomForestPipeline(n_estimators=200)
logreg_pipeline = LogisticRegressionPipeline()


In [None]:
# Using the methods of BaseModelPipeline
rf_pipeline.train(X_train, y_train)
rf_pipeline.evaluate(X_test, y_test)



In [None]:
# Using the methods of BaseModelPipeline
logreg_pipeline.train(X_train, y_train)
logreg_pipeline.evaluate(X_test, y_test)

In [None]:
class Animal:
    def speak(self):
        print("Animal makes a sound")

class Dog(Animal):
    def speak(self):
        print("Dog barks") 

d = Dog()
d.speak()  # Output: Dog barks

## Inheritance from multiple classes

Behavior as a String: Since ModelVariant inherits from str, its members (e.g., ModelVariant.xgboost) are treated as strings and can be compared or manipulated like regular strings.

Behavior as an Enum: By inheriting from Enum, ModelVariant has the behavior of an enumeration, providing features such as unique members, immutability, and iterable properties.



In [None]:
import collections

In [None]:
collections?

In [None]:
Enum?

In [87]:
from enum import Enum

class ModelVariant(str, Enum):
    xgboost = "XGBoost"
    catboost = "CatBoost"


In [105]:
ModelVariant.xgboost

<ModelVariant.xgboost: 'XGBoost'>

In [106]:
ModelVariant.xgboost

<ModelVariant.xgboost: 'XGBoost'>

In [108]:
ModelVariant.xgboost == 'XGBoost'

True
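A few more checks on the same idea, as a sketch. The class is repeated so the block is self-contained, and `"LightGBM"` is just an example of a value that is not a member.

```python
from enum import Enum


class ModelVariant(str, Enum):
    xgboost = "XGBoost"
    catboost = "CatBoost"


# Enum behaviour: look up by value, iterate over members
member = ModelVariant("XGBoost")
names = [variant.name for variant in ModelVariant]

# String behaviour: str methods and isinstance checks work
upper = ModelVariant.catboost.upper()
is_string = isinstance(ModelVariant.xgboost, str)

try:
    ModelVariant("LightGBM")  # not a defined member
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

print(member, names, upper, is_string)
print(lookup_error)
```

Rejecting unknown values is the main practical benefit: a typo in a model name fails loudly instead of passing through as an ordinary string.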

## I.3: Polymorphism in Data Science:

Polymorphism allows different models to be used interchangeably with the same interface. For example, you can switch models in a pipeline and use them with the same `train()` and `evaluate()` methods.


In [1]:
class Cat():
    def speak(self): # This method is called on the OBJECT
        print("Cat meows")


In [2]:
# List contains class
animals = [Cat, Cat]
for animal in animals:
    animal().speak()


Cat meows
Cat meows


In [102]:
# List contains objects, because it's being instantiated
animals = [Cat(), Cat()]
for animal in animals:
    animal.speak()


Cat meows
Cat meows


In [None]:
from sklearn.metrics import accuracy_score

# Base class for all model pipelines
class BaseModelPipeline:
    def train(self, X_train, y_train):
        raise NotImplementedError("Subclass must implement abstract method")

    def evaluate(self, X_test, y_test):
        raise NotImplementedError("Subclass must implement abstract method")


In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

class RandomForestPipeline(BaseModelPipeline):
    def __init__(self):
        self.model = RandomForestClassifier()

    def train(self, X_train, y_train):
        self.model.fit(X_train, y_train)

    def evaluate(self, X_test, y_test):
        y_pred = self.model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        print(f"RandomForest Accuracy: {accuracy * 100:.2f}%")


class SVMPipeline(BaseModelPipeline):
    def __init__(self):
        self.model = SVC()

    def train(self, X_train, y_train):
        self.model.fit(X_train, y_train)

    def evaluate(self, X_test, y_test):
        y_pred = self.model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        print(f"SVM Accuracy: {accuracy * 100:.2f}%")


class KNNPipeline(BaseModelPipeline):
    def __init__(self):
        self.model = KNeighborsClassifier()

    def train(self, X_train, y_train):
        self.model.fit(X_train, y_train)

    def evaluate(self, X_test, y_test):
        y_pred = self.model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        print(f"KNN Accuracy: {accuracy * 100:.2f}%")


In [None]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Create a dataset
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

# List of different pipelines (polymorphism in action)
pipelines = [RandomForestPipeline(), SVMPipeline(), KNNPipeline()]

# Loop through each pipeline and train + evaluate the models (Here is Polymorphism!)
for pipeline in pipelines:
    pipeline.train(X_train, y_train)
    pipeline.evaluate(X_test, y_test)


## I.4: OOP in Practice — Data Science Model Pipeline

Let’s create a reusable model pipeline that handles the entire process: **preprocessing, training, and evaluation**. This can be particularly useful in a real-world scenario where you want to build different models using the same pipeline.



In [None]:
# Final pipeline class for building an ML model pipeline
class MLModelPipeline:
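    def __init__(self, model, scaler=None):
        self.model = model
        # Create a fresh scaler per pipeline; a StandardScaler() default in the
        # signature would be shared by every instance
        self.scaler = scaler if scaler is not None else StandardScaler()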
    
    # Method for preprocessing data
    def preprocess(self, X_train, X_test):
        self.scaler.fit(X_train)
        X_train_scaled = self.scaler.transform(X_train)
        X_test_scaled = self.scaler.transform(X_test)
        return X_train_scaled, X_test_scaled
    
    # Method to train the model
    def train(self, X_train, y_train):
        self.model.fit(X_train, y_train)
    
    # Method to evaluate the model
    def evaluate(self, X_test, y_test):
        y_pred = self.model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        print(f"Model Accuracy: {accuracy * 100:.2f}%")
        return accuracy



In [None]:

rf_pipeline = MLModelPipeline(RandomForestClassifier(n_estimators=100))

X_train_scaled, X_test_scaled = rf_pipeline.preprocess(X_train, X_test)

rf_pipeline.train(X_train_scaled, y_train)

rf_pipeline.evaluate(X_test_scaled, y_test)



By integrating OOP principles into data science tasks, you can write code that is not only more organized and maintainable but also more scalable for larger projects! 😎


## Advanced Class Concepts

**Class Methods (`@classmethod`)**

**Overview**:
- A `classmethod` is a method that takes the class itself (`cls`) as the first argument, instead of the instance (`self`).
- It's useful when you want to work with class-level data or methods that affect the class itself rather than individual instances.

In [7]:
class DataModel:
    model_count = 0

    def __init__(self, name, model_count):
        self.name = name
        DataModel.model_count = model_count

    @classmethod
    def get_model_count(cls):
        return cls.model_count


In [8]:

# Create two instances
model1 = DataModel("Model A", 40)
model2 = DataModel("Model B", 41)


In [9]:
DataModel.get_model_count()

41

The `from_diameter` method doesn’t require an instance to exist. It operates purely based on the input diameter and creates a new instance of Circle.


It provides an alternative way to create a Circle object, showing the flexibility of class methods in controlling the creation of instances.

In [10]:
class Circle:
    pi = 3.141592653589793

    def __init__(self, radius):
        self.radius = radius

    def getArea(self):
        return (self.radius**2) * Circle.pi

    @classmethod # gives you access to the class
    def from_diameter(cls, diameter):
        """Class method to create a Circle from the diameter."""
        radius = diameter / 2
        return cls(radius) # What's going on here? What tom foolery is occurring? 
        # cls = Circle
        # cls() this is the same as Circle()
        # A NEW OBJECT IS BEING INSTANTIATED


In [13]:
c = Circle(7)

In [14]:
R = Circle

In [17]:
c2 = R(7)

In [18]:
c2

<__main__.Circle at 0x107996fd0>

In [119]:
@classmethod?

Init signature: classmethod(self, /, *args, **kwargs)
Docstring:
classmethod(function) -> method

Convert a function to be a class method.

A class method receives the class as implicit first argument,
just like an instance method receives the instance.
To declare a class method, use this idiom:

  class C:
      @classmethod
      def f(cls, arg1, arg2, ...):
          ...

It can be called either on the class (e.g. C.f()) or on an instance
(e.g. C().f()).  The instance is ignored except for its class.
If a class method is called for a derived class, the derived class
object is passed as the implied first argument.

Class methods are different than C++ or Java static methods.
If you want those, see the staticmethod builtin.
Type:           type
Subclasses:     abstractclassmethod

In [117]:
Circle.pi

3.141592653589793

In [115]:
# Creating a Circle using the class method
circle_from_diameter = Circle.from_diameter(10)


In [None]:

# Checking the area of the new Circle
print(f"Radius: {circle_from_diameter.radius}")
print(f"Area: {circle_from_diameter.getArea()}")


**Static Methods (`@staticmethod`)**:

**Overview**:
- A `staticmethod` does not take `self` or `cls` as the first argument. 
- It's a regular function that happens to be in a class for organizational purposes.
- It's useful for utility methods that are related to the class but don’t need access to class or instance data.



In [23]:
class MathOperations:
    @staticmethod
    def add(a, b):
        return a + b

# Call the static method
print(MathOperations.add(3, 4))

7


In [26]:
MathOperations.add(7,2)

9

In [24]:
m = MathOperations()

In [25]:
m.add(1,3)

4

**Method Overloading**

**Overview**:
- Method overloading allows a class to have multiple methods with the same name but different arguments.
- Python does not support method overloading in the traditional sense, but you can achieve similar behavior by using default arguments or `*args` and `**kwargs`.

In [None]:
class Multiply:
    def product(self, a, b=1):
        return a * b

mul = Multiply()
print(mul.product(2, 3))  # Output: 6
print(mul.product(5))     # Output: 5 (uses default value of b)

Try it:
- Create a method that accepts a variable number of arguments using `*args`.
- Can you simulate method overloading?
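One possible answer, as a sketch: a single `product` method that accepts any number of arguments through `*args`, which stands in for several overloads.

```python
class Multiply:
    def product(self, *args):
        # *args collects every positional argument into a tuple
        if not args:
            raise TypeError("product() needs at least one number")
        result = 1
        for value in args:
            result *= value
        return result


mul = Multiply()
print(mul.product(5))
print(mul.product(2, 3))
print(mul.product(2, 3, 4))
```

Defining `product` twice in the same class would not create two overloads; the second definition simply replaces the first.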


**Operator Overloading**

**Overview**:
- Operator overloading allows you to define or customize how operators like `+`, `-`, `*`, etc., behave for your custom objects.
- You do this by defining special dunder methods like `__add__`, `__sub__`, and `__mul__`.

In [30]:
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    # What does the __repr__ dunder method do?
    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

v1 = Vector(1, 2)
v2 = Vector(3, 4)


In [32]:
v3 = v1 + v2  # Using overloaded + operator
print(v3) 

Vector(4, 6)


In [59]:
class Vector:
    # Gets called automatically when the class is instantiated
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Dunder method: Python calls it automatically when the matching operator is used.
    def __add__(self, other):  # __add__ gets called when we use `+`
        # Note: this body subtracts, so `v1 + v2` prints Vector(-2, -2) below
        return Vector(self.x - other.x, self.y - other.y)

    def __sub__(self, other):  # __sub__ gets called when we use `-`
        return Vector(self.x - other.x, self.y - other.y)

    # What does the __repr__ dunder method do?
    # TODO: How is __repr__ different from __str__
    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

    def __magnitude__(self):
        return (abs(self.x) + abs(self.y))**2

v1 = Vector(1, 2)
v2 = Vector(3, 4)
v3 = v1 + v2  # Using overloaded + operator
print(v3) 

Vector(-2, -2)


In [60]:
dir(v3)

['__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__magnitude__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__weakref__',
 'x',
 'y']

In [None]:
v3.__magnitude__()

In [55]:
magnitude(v3) # Will this work?
# TODO: Make it work! 

NameError: name 'magnitude' is not defined

In [52]:
len(v3) # This automatically calls __len__, which Vector does not define

TypeError: object of type 'Vector' has no len()
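One way to make `magnitude(v3)` work is to write the function yourself, in the same way the built-in `len()` delegates to `__len__`. This is a sketch under two assumptions: "magnitude" means Euclidean length, and the module-level `magnitude` helper is hypothetical, not something Python provides.

```python
import math


class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __magnitude__(self):
        # Euclidean length (an assumption about the intended meaning)
        return math.hypot(self.x, self.y)

    def __abs__(self):
        # The built-in abs() looks for __abs__, so no helper is needed
        return self.__magnitude__()


def magnitude(obj):
    # Hypothetical helper that mimics how len() delegates to __len__
    return obj.__magnitude__()


v = Vector(3, 4)
print(magnitude(v))
print(abs(v))
```

Python only reacts to dunder names it already knows about. A custom name like `__magnitude__` is an ordinary method, and hooking into an existing protocol such as `__abs__` is usually the cleaner option.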

In [37]:
v3 = v1 - v2  # What method gets called when I use `-`?
print(v3)  # What method is called when I do print(v3)?

Vector(-2, -2)


In [None]:
print

Try it:
- Overload an operator like `+` or `*` for a class.
- How would you overload `__eq__` to compare two custom objects?
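A possible solution sketch for both exercises, with `*` defined as scalar multiplication (one choice among several) and `__eq__` comparing coordinates:

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __mul__(self, scalar):
        # v * 3 scales both coordinates
        return Vector(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


print(Vector(1, 2) == Vector(1, 2))
print(Vector(1, 2) * 3)
print(Vector(1, 2) == "not a vector")
```

Without `__eq__`, two vectors compare equal only when they are the same object. Defining `__eq__` without `__hash__` also makes instances unhashable, so they can no longer be put in a set.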


**Abstract Base Classes (ABCs) and Interfaces**

**Overview**:
- `Abstract Base Classes` (ABCs) define a blueprint for other classes. You can't instantiate ABCs directly; they must be subclassed.
- ABCs help ensure that subclasses implement required methods.
- In Python, ABCs are defined using the `abc` module.

In [61]:
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def train(self, data):
        pass

class LinearModel(Model):
    def __init__():
        return 'nice'
    # def train(self, data):
        # print("Training linear model with data")



In [None]:
lm = LinearModel()
lm.train([1, 2, 3])  

Try it:
- Define an abstract base class with at least one abstract method.
- What happens if a subclass doesn’t implement the abstract method?
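The sketch below contrasts a subclass that skips the abstract method with one that implements it. `IncompleteModel` is a hypothetical name used only for this illustration.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    @abstractmethod
    def train(self, data):
        pass


class IncompleteModel(Model):
    pass  # train() is not implemented


class LinearModel(Model):
    def train(self, data):
        return f"Training linear model with {len(data)} rows"


try:
    IncompleteModel()
    abc_error = None
except TypeError as exc:
    abc_error = str(exc)

lm = LinearModel()
message = lm.train([1, 2, 3])

print(abc_error)
print(message)
```

The error is raised at instantiation time, not when `train()` is eventually called, so a missing method is caught early.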


### Wrap-Up:
1. **Class and Static Methods**: Why would you use a `staticmethod` instead of a `classmethod` or instance method?
2. **Properties**: Can you think of a case where you'd want to control how an attribute is set?
3. **Operator Overloading**: Can you overload the subtraction (`-`) operator for a custom class?
4. **ABCs**: What advantage do abstract base classes provide when designing large systems?
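For the question on properties, here is one case where controlling how an attribute is set is useful: rejecting invalid values. The `Learner` class and its `learning_rate` attribute are hypothetical examples.

```python
class Learner:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate  # goes through the setter below

    @property
    def learning_rate(self):
        return self._learning_rate

    @learning_rate.setter
    def learning_rate(self, value):
        # Validate before storing
        if value <= 0:
            raise ValueError("learning_rate must be positive")
        self._learning_rate = value


learner = Learner(0.1)
learner.learning_rate = 0.5

try:
    learner.learning_rate = -1
    rejected = False
except ValueError:
    rejected = True

print(learner.learning_rate, rejected)
```

Callers still write `learner.learning_rate = ...` as if it were a plain attribute, so validation can be added later without changing how the class is used.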

In [143]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from abc import ABC, abstractmethod

# Base Model Class
class BaseModel(ABC):
    @abstractmethod
    def train(self, X_train, y_train):
        pass

    @abstractmethod
    def predict(self, X):
        pass

    @abstractmethod
    def evaluate(self, X_test, y_test):
        pass

    @classmethod
    @abstractmethod
    def get_model_name(cls):
        pass

In [144]:
# RandomForestModel with Class Method
from sklearn.ensemble import RandomForestClassifier

class RandomForestModel(BaseModel):
    def __init__(self, n_estimators=100):
        self.model = RandomForestClassifier(n_estimators=n_estimators)
        self._accuracy = None

    def train(self, X_train, y_train):
        self.model.fit(X_train, y_train)
        print("RandomForestModel trained.")

    def predict(self, X):
        return self.model.predict(X)

    def evaluate(self, X_test, y_test):
        y_pred = self.predict(X_test)
        self._accuracy = accuracy_score(y_test, y_pred)
        print(f"Accuracy: {self._accuracy * 100:.2f}%")
        return self._accuracy

    @property
    def accuracy(self):
        return self._accuracy

    @classmethod
    def get_model_name(cls):
        return cls.__name__

In [145]:
# Dataset Class with Class Methods and Static Methods
class Dataset:
    def __init__(self, data):
        self.data = data

    @classmethod
    def from_csv(cls, filepath):
        data = pd.read_csv(filepath)
        return cls(data)

    @staticmethod
    def has_missing_values(data):
        return data.isnull().values.any()

    def preprocess(self):
        if self.has_missing_values(self.data):
            # numeric_only=True skips text columns, which have no mean
            self.data.fillna(self.data.mean(numeric_only=True), inplace=True)
            print("Missing values filled.")

    def get_features_targets(self, target_column):
        X = self.data.drop(target_column, axis=1)
        y = self.data[target_column]
        return X, y

In [146]:
# Main Pipeline Class
class MLPipeline:
    def __init__(self, data_path, target_column, model_class):
        # Uses the Dataset class as an OBJECT!
        self.dataset = Dataset.from_csv(data_path)
        self.target_column = target_column
        self.model = model_class()

    def run(self):
        self.dataset.preprocess()
        X, y = self.dataset.get_features_targets(self.target_column)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        self.model.train(X_train, y_train)
        self.model.evaluate(X_test, y_test)
        print(f"Completed pipeline for {self.model.get_model_name()}")

In [None]:
# pipeline = MLPipeline('data.csv', 'target', RandomForestModel)
# pipeline.run()

# Section J: File Manipulation in Python

This section will cover file manipulation, working with text files, using context managers (`with open`), JSON, the `json` package, pickle files, and an overview of serialization. We will also delve into text buffering and I/O buffering to understand how Python handles data transfers between files and memory.


### 1. **Working with Text Files**

#### Opening and Reading Files

The first step in working with files in Python is opening them. Python provides the `open()` function, which opens a file and returns a file object.


In [24]:
file = open('example.txt', 'r')  # 'r' means read mode
content = file.read()
print(content)
file.close()

This is an example of buffering in Python.


Add more lines.

Updating the file.


In [25]:
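# The file was closed above, so any further read raises ValueError.
# (On a file that is still open, a second read() returns '' because the
# file position is already at the end.)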
file.read()

ValueError: I/O operation on closed file.

In [17]:
# Running file.read() again.
content = file.read()
print(content)





Add more lines.


In [14]:
file.read?

Signature: file.read(size=-1, /)
Docstring:
Read at most n characters from stream.

Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
Type:      builtin_function_or_method

In [15]:
file = open('example.txt', 'r')  # 'r' means read mode
content = file.read()
print(content)


This is an example of buffering in Python.

Update this file again.


In [9]:
content = file.read()
print(content)
file.close()  # Always close the file after reading or writing to it


ValueError: I/O operation on closed file.

### I/O Types in Python

- **stdin**: Standard input from the terminal (e.g., user typing).
- **stdout**: Standard output to the terminal (e.g., printed text).
- **stderr**: Standard error output (e.g., error messages).
- **File I/O**: Reading from and writing to files using `open()`, `read()`, `write()`.

### Key Differences
- **stdin, stdout, stderr**:
  - Used for terminal interactions.
  - Handle data between the program and the terminal.
- **File I/O**:
  - Used for disk operations.
  - Involves data stored in files.

In short: stdin, stdout, and stderr are types of I/O, but they deal with terminal/console interactions, not files. File I/O deals specifically with reading from and writing to files.

---

### Buffering in Python I/O

Buffering is a technique used to optimize the performance of input/output (I/O) operations by temporarily storing data in memory before it is sent to or read from a resource like a file, network socket, or terminal. Instead of writing or reading data byte by byte, buffering allows data to be handled in chunks, which reduces the number of time-consuming I/O operations.


- **Buffering**: 
  - Temporarily stores data in memory (RAM) before writing to or reading from disk/network.
  - Optimizes I/O performance.

- **Why Buffering?**:
  - Accessing RAM is faster than accessing disk:  Disk (hard drives or SSDs) is much slower compared to RAM, so the buffer helps minimize the number of times the system needs to access the disk by reading or writing in larger chunks.
  - Reduces the number of I/O operations.
  - Handles data in larger chunks for efficiency.

- **Where is the Buffer?**:
  - Located in RAM, not on disk or cache.


When you write data to a file, for example, it is first stored in a buffer (in memory). Once the buffer is full or flushed, the data is written to the disk.
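The effect of the buffer can be observed by checking the file size on disk before and after a flush. This is a sketch that assumes CPython's default text-mode behaviour and uses a temporary directory, so no real files are touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    f = open(path, "w", buffering=8192)
    f.write("0123456789")  # 10 characters, far below the buffer size

    size_before_flush = os.path.getsize(path)  # data is still in memory
    f.flush()                                  # push the buffer to disk
    size_after_flush = os.path.getsize(path)

    f.close()

print(size_before_flush, size_after_flush)
```

Closing a file flushes it as well, which is one reason forgetting to close can leave a file looking empty or truncated.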

---

### Streams in I/O

- **Stream**: 
  - Represents a continuous flow of data.
  - Source → destination data transfer.

- **Sources**:
  - Files on disk.
  - Keyboard input (stdin).
  - Network sockets.

- **Destinations**:
  - File output on disk.
  - Terminal output (stdout).
  - Data over a network.
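To make the source → destination idea concrete, here is a small sketch that copies data between two streams a few characters at a time. `copy_stream` is a hypothetical helper written for this example, and `io.StringIO` stands in for a real file or socket because it exposes the same `read()`/`write()` interface.

```python
import io


def copy_stream(source, destination, chunk_size=4):
    """Copy text from source to destination in chunks; return the number of chunks."""
    chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty string means the stream is exhausted
            break
        destination.write(chunk)
        chunks += 1
    return chunks


source = io.StringIO("ABCDEFGHIJ")  # in-memory stand-in for a file on disk
destination = io.StringIO()

chunk_count = copy_stream(source, destination)
copied_text = destination.getvalue()

print(chunk_count, copied_text)
```

The same function would work unchanged with objects returned by `open()`, since they are streams too.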



---

#### `with` Statement and Context Managers

Python has a more elegant way to work with files: **context managers**. A context manager closes the file for you automatically, even if an exception is raised during the operation.

Source: https://book.pythontips.com/en/latest/context_managers.html#context-managers

Context managers allow you to allocate and release resources precisely when you want to. The most widely used example is the `with` statement. Suppose you have two related operations that you'd like to execute as a pair, with a block of code in between; context managers let you do exactly that.


It’s most commonly used when working with resources such as files, network connections, and database connections, which need to be opened, used, and then properly closed to prevent resource leaks.
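The "paired operations" idea can be sketched with a tiny hand-written context manager. `ManagedResource` is a made-up class for illustration, not part of any library; only the `__enter__` and `__exit__` methods are the real protocol that the `with` statement relies on.

```python
class ManagedResource:
    """Hypothetical resource that records when it is opened, used, and closed."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Runs at the start of the `with` block
        self.events.append("open")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, even if an exception was raised inside it
        self.events.append("close")
        return False  # False means: do not swallow the exception


resource = ManagedResource()
try:
    with resource:
        resource.events.append("use")
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

print(resource.events)
```

Even though the block raised an error, the "close" step still ran, which is the same guarantee `with open(...)` gives for files.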


In [26]:
# When opening a file, you can specify a `buffering` argument:
with open('example.txt', 'w', buffering=1024) as file:  # 1KB buffer
    file.write("This is an example of buffering in Python.")

In [30]:
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)  # No need to call file.close()

# The `with` statement ensures that the file is closed when the block of code is exited, either after reading or due to an error.


This is an example of buffering in Python.


In [34]:
with open('another.txt', 'r') as f:
    text = f.read()
    print(text)

ABC

DEF


In [35]:
f = open('another.txt', 'r')
text = f.read()
print(text)

ABC

DEF


In [36]:
# Exhausted file: the read position is already at the end, so read() returns ''.
f.read()

''
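A short sketch of why the second `read()` comes back empty and how to start over: the file object keeps track of a position, and `seek(0)` moves it back to the beginning. The file is created in a temporary directory here so the example runs on its own; its contents are an assumption for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "another.txt")
    with open(path, "w") as f:
        f.write("ABC\nDEF\n")

    with open(path, "r") as f:
        first_read = f.read()   # reads everything, position is now at the end
        second_read = f.read()  # nothing left to read
        f.seek(0)               # move the position back to the start
        third_read = f.read()   # the full content again

print(repr(first_read))
print(repr(second_read))
print(repr(third_read))
```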

**Reading Files Efficiently**

#### Reading Line by Line

If the file is large, loading it all at once with `read()` might not be ideal. Instead, you can read it line by line:


In [29]:
open?

Signature:
open(
    file,
    mode='r',
    buffering=-1,
    encoding=None,
    errors=None,
    newline=None,
    closefd=True,
    opener=None,
)
Docstring:
Open file and return a stream.  Raise OSError upon failure.

file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file

In [28]:
with open('example.txt', 'r') as file:
    for line in file:
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing newline


This is an example of buffering in Python.



Common file modes for `open()`:

- `'r'`: Read (default).
- `'w'`: Write (overwrites the file if it already exists).
- `'a'`: Append (adds to the end of the file).
- `'b'`: Binary mode (used for reading/writing non-text files such as images and pickles).

You can combine these modes, like `'rb'` for reading a binary file or `'wb'` for writing a binary file.
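The modes can be tried out in a quick sketch like the one below. The file name `modes_demo.txt` is arbitrary, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # 'w' creates the file or overwrites it
        f.write("first\n")

    with open(path, "a") as f:   # 'a' keeps existing content and adds to the end
        f.write("second\n")

    with open(path, "r") as f:   # 'r' returns text (str)
        text = f.read()

    with open(path, "rb") as f:  # 'rb' returns raw bytes
        raw = f.read()

print(repr(text))
print(type(raw))
```

Opening the same path with `'w'` again would discard both lines, so `'a'` is the safer choice when existing content must be kept.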

**JSON Files**

#### Introduction to JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. Python has a built-in `json` module for handling JSON data.

#### Writing JSON Data

Let’s create a Python dictionary and write it to a file in JSON format.

In [41]:
import json

data = {
    'name': 'Alice',
    'age': 30,
    'city': 'New York'
}

In [42]:
with open('data.json', 'w') as file:
    json.dump(data, file)

In [54]:
import json

data = ['abc','def',]

In [55]:
type(data)

list

In [56]:
with open('data.json', 'w') as file:
    json.dump(data, file)

In [57]:
with open('data.json', 'r') as file:
    data = json.load(file)
    print(data)




['abc', 'def']


In [51]:
data

['abc', 'def']

In [53]:
type(data)

list

In [50]:
# Convert a Python object (here, a list) to a JSON string
json_string = json.dumps(data)
print(json_string)

["abc", "def"]


In [52]:
type(json_string)

str

In [None]:
# Convert a Python object to a JSON string
json_string = json.dumps(data)
print(json_string)

# Convert the JSON string back to a Python object
# (a dict for a JSON object, a list for a JSON array)
python_obj = json.loads(json_string)
print(python_obj)
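One thing worth checking when converting back and forth is that the round trip does not always return exactly what went in. The sketch below uses a made-up dictionary to show two well-known cases: tuples come back as lists, and non-string keys come back as strings.

```python
import json

original = {"point": (1, 2), 1: "one"}  # a tuple value and an integer key

json_string = json.dumps(original)
restored = json.loads(json_string)

print(json_string)
print(restored)
print(restored == original)
```

If the exact Python types matter, convert them back by hand after loading.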


## Serialization

In Python, serialization refers to the process of converting a data structure or object into a format that can be easily stored or transmitted and later reconstructed (deserialized). This process is essential for saving objects to a file, sending data over a network, or passing objects between different systems or programs.

Two common serialization methods in Python are Pickle and JSON.



## What is Pickling?

Pickling is a way to serialize Python objects so they can be saved to a file and later restored. This is useful when you want to save Python data structures like lists, dictionaries, or even custom classes.

### Disadvantages
- Python-version dependent: a pickle written with a newer pickle protocol may not load in an older Python version.
- Package dependent: to unpickle an object, the module or class that defines it must be importable (and compatible) in the environment doing the loading.
- Python-specific: Pickle can only be used with Python, so it's not suitable for sharing data with other languages.
- Security risk: Unpickling data from untrusted sources can be dangerous because it can execute arbitrary code, leading to security vulnerabilities.

### Advantages
- Can handle almost all Python objects.

### Writing Pickled Data

Here’s an example of writing an object to a pickle file:

In [None]:
import pickle

data = {"name": "Bob", "age": 25, "city": "Los Angeles"}

with open('data.pkl', 'wb') as file:  # 'wb' for writing in binary mode
    pickle.dump(data, file)


In [None]:
with open('data.pkl', 'rb') as file:  # 'rb' for reading in binary mode
    data = pickle.load(file)
    print(data)


- **JSON** is a text format and is human-readable, but it only supports basic data types (strings, numbers, booleans, `None`/null, lists, and dictionaries with string keys).
- **Pickle** supports almost all Python data types but is not human-readable and is specific to Python.
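The difference can be seen in a small sketch with a made-up `record` that contains a set and a tuple, two types JSON has no equivalent for. Everything is done in memory with `dumps`/`loads`, so no files are needed.

```python
import json
import pickle

record = {"tags": {"a", "b"}, "point": (1, 2)}  # contains a set and a tuple

# JSON cannot represent a set, so json.dumps raises TypeError
try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False

# Pickle round-trips the same object, keeping the original types
restored = pickle.loads(pickle.dumps(record))

print("JSON could serialize it:", json_ok)
print("Pickle round trip equal:", restored == record)
print(type(restored["tags"]), type(restored["point"]))
```

Only unpickle data you created yourself or otherwise trust, as noted in the disadvantages above.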
