## Good Python Practices

This section covers some best practices for writing Python code.

### Write Meaningful Names

It is bad practice to use vague names such as `x`, `y`, and `z` in your Python code, since they tell the reader nothing about the role each variable plays.

In [1]:
x = 10 
y = 5 
z = x + y  

Write descriptive variable names instead. You can also add type hints to make the types of these variables explicit.

In [3]:
num_members: int = 10
num_guests: int = 5
num_people: int = num_members + num_guests

### Assign Names to Values

Bare numbers in an expression (often called magic numbers) make it hard for others to understand what role each value plays in your code.

In [4]:
circle_area = 3.14 * 5**2

It is therefore good practice to assign such values to well-named constants so that the code is readable to others.

In [5]:
PI = 3.14
RADIUS = 5

circle_area = PI * RADIUS**2
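Where the standard library already provides a named constant, it is usually better to reuse it than to hard-code an approximation such as `3.14`. A minimal sketch using `math.pi`; the `RADIUS` value is carried over from the example above.

```python
import math

RADIUS = 5  # radius of the circle, same value as above

# math.pi is the standard library's double-precision value of pi
circle_area = math.pi * RADIUS**2
print(round(circle_area, 2))  # 78.54
```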

### Name Complex Conditions to Make Your Code More Readable

Consider naming your conditions if your if-else statement is too complex. Doing so will improve the readability of your code.

In [21]:
# Confusing

x = -10
y = 5

if (x % 2 == 0 and x < 0) and (y % 2 != 0 and y > 0):
    print("Both conditions are true!")
else:
    print("At least one condition is false.")


Both conditions are true!


In [22]:
# Clearer

x = -10
y = 5

# Assign names to conditions
x_is_even_and_negative = x % 2 == 0 and x < 0
y_is_odd_and_positive = y % 2 != 0 and y > 0

if x_is_even_and_negative and y_is_odd_and_positive:
    print("Both conditions are true!")
else:
    print("At least one condition is false.")


Both conditions are true!
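If the same condition is needed in several places, you can go one step further and move it into a small function with a descriptive name. A sketch; `is_even_and_negative` and `is_odd_and_positive` are hypothetical helper names.

```python
def is_even_and_negative(n: int) -> bool:
    # True for numbers such as -2, -4, -10
    return n % 2 == 0 and n < 0


def is_odd_and_positive(n: int) -> bool:
    # True for numbers such as 1, 3, 5
    return n % 2 != 0 and n > 0


x = -10
y = 5

both_true = is_even_and_negative(x) and is_odd_and_positive(y)
print(both_true)  # True
```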


### Avoid Duplication in Your Code

While writing code, we should avoid duplication because:
- It is redundant.
- If we change one copy of the code, we need to remember to make the same change to every other copy. Otherwise, we introduce bugs into our code.

In the code below, the filter `X['date'] > date(2021, 2, 8)` is written twice. To avoid duplication, assign the boolean mask to a variable once, then use that variable to filter both `X` and `y`.

In [None]:
import pandas as pd 
from datetime import date

df = pd.DataFrame({'date': [date(2021, 2, 8), date(2021, 2, 9), date(2021, 2, 10)],
                   'val1': [1, 2, 3], 'val2': [0, 1, 0]})
# X: the 'date' column; y: the 'val2' column
X, y = df.iloc[:, :1], df.iloc[:, 2]

# Instead of this
subset_X = X[X['date'] > date(2021, 2, 8)]
subset_y = y[X['date'] > date(2021, 2, 8)]

# Do this: compute the boolean mask once and reuse it
filt = X['date'] > date(2021, 2, 8)
subset_X = X[filt]
subset_y = y[filt]

### Underscore (_): Ignore Values That Will Not Be Used

When unpacking the values returned from a function, you might want to ignore values that the rest of your code does not use. By convention, assign those values to an underscore `_`.

In [1]:
def return_two():
    return 1, 2

_, var = return_two()
var

2
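To ignore more than one value, `_` can be combined with extended unpacking (`*_`), which collects all the remaining values into a list that you then ignore. `return_three` below is a hypothetical helper used only for illustration.

```python
def return_three():
    # Hypothetical helper that returns a tuple of three values
    return 1, 2, 3


# Keep only the first value; *_ collects the rest
first, *_ = return_three()

# Keep only the last value
*_, last = return_three()
print(first, last)  # 1 3
```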

### Underscore (_): Ignore the Index in Python For Loops

If you want to run a loop a specific number of times but don't need the index, use `_` as the loop variable.

In [2]:
for _ in range(5):
    print('Hello')

Hello
Hello
Hello
Hello
Hello


### Python Pass Statement

If you know what a piece of code should do but don't know how to write it yet, put it in a function whose body is just `pass`.

Once you have written the code at a high level, go back to those functions and replace each `pass` with the actual implementation. This lets you lay out the overall structure without disrupting your train of thought.

In [None]:
def say_hello():
    pass 

def ask_to_sign_in():
    pass 

def main(is_user: bool):
    if is_user:
        say_hello()
    else:
        ask_to_sign_in()

main(is_user=True)
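One trade-off of `pass` is that a forgotten stub runs silently and returns `None`. A common alternative, sketched below, is to raise `NotImplementedError` in the stub so that calling it before it is written fails loudly.

```python
def say_hello():
    # Placeholder: fails loudly if called before it is implemented
    raise NotImplementedError("say_hello is not written yet")


try:
    say_hello()
except NotImplementedError as err:
    message = str(err)

print(message)  # say_hello is not written yet
```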

### Stop Using the = Operator to Copy a Python List. Use the copy Method Instead

The `=` operator does not create a copy of a list. It binds a second name to the same list object, so a change made through the new name is also visible through the old one.

In [7]:
l1 = [1, 2, 3]
l2 = l1 
l2.append(4)

In [8]:
l2 

[1, 2, 3, 4]

In [9]:
l1 

[1, 2, 3, 4]

Instead of the `=` operator, use the `copy()` method. Now the old list does not change when you change the new list.

In [10]:
l1 = [1, 2, 3]
l2 = l1.copy()
l2.append(4)

In [11]:
l2 

[1, 2, 3, 4]

In [12]:
l1

[1, 2, 3]
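`copy()` is not the only way to make a shallow copy of a list. The slice `l1[:]`, the `list()` constructor, and `copy.copy()` all produce a new list object with the same elements; the sketch below checks this with `is`.

```python
import copy

l1 = [1, 2, 3]

# Four equivalent ways to create a shallow copy of a list
copies = [l1.copy(), l1[:], list(l1), copy.copy(l1)]

for c in copies:
    # Same contents, but a different list object
    assert c == l1 and c is not l1

copies[0].append(4)
print(l1)  # [1, 2, 3]: the original is unaffected
```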

### deepcopy: Copy a Nested Object

If you want to copy a nested object, use `deepcopy`. `copy` creates a shallow copy: a new outer object whose nested children are still shared with the original. `deepcopy` recursively copies the nested children as well. As a result, if you modify a nested child of a shallow copy, the original object also changes, while modifying a nested child of a deep copy leaves the original object unchanged.

In [13]:
from copy import deepcopy

l1 = [1, 2, [3, 4]]
l2 = l1.copy() # Create a shallow copy

In [14]:
l2[0] = 6
l2[2].append(5)
l2 

[6, 2, [3, 4, 5]]

In [15]:
# [3, 4] becomes [3, 4, 5]
l1 

[1, 2, [3, 4, 5]]

In [4]:
l1 = [1, 2, [3, 4]]
l3 = deepcopy(l1) # Create a deep copy

In [5]:
l3[2].append(5)
l3  

[1, 2, [3, 4, 5]]

In [6]:
# l1 stays the same
l1 

[1, 2, [3, 4]]
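The difference between the two kinds of copy can be made explicit with the `is` operator: a shallow copy is a new outer list whose inner list is the very same object as the original's, while `deepcopy` also creates a new inner list.

```python
from copy import deepcopy

l1 = [1, 2, [3, 4]]
shallow = l1.copy()
deep = deepcopy(l1)

# Both copies are new outer lists
print(shallow is l1, deep is l1)  # False False

# Only the shallow copy shares the nested list with l1
print(shallow[2] is l1[2], deep[2] is l1[2])  # True False
```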

### Avoid Side Effects When Using List in a Function

When you pass a Python list to a function, the function might inadvertently change it.

For example, in the code below, calling `append` inside the function also changes the original list, because `nums` refers to the same list object as `a`.

In [1]:
def append_four(nums: list):
    nums.append(4)
    return nums 

In [2]:
a = [1, 2, 3]
b = append_four(a)

In [3]:
a 

[1, 2, 3, 4]

If you want to avoid this side effect, work on a copy inside the function: use `copy` for a flat list or `deepcopy` for a nested list.

In [12]:
def append_four(nums: list):
    new_nums = nums.copy()  # work on a copy so the caller's list is untouched
    new_nums.append(4)
    return new_nums

In [13]:
a = [1, 2, 3]
b = append_four(a)
a 

[1, 2, 3]
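For a nested list, `copy()` is not enough, because the inner lists are still shared with the caller. A minimal sketch with a hypothetical `append_four_to_each` function that uses `deepcopy` so the caller's nested list is left untouched:

```python
from copy import deepcopy


def append_four_to_each(nested: list) -> list:
    # Deep copy so that appending to the inner lists does not affect the caller
    result = deepcopy(nested)
    for inner in result:
        inner.append(4)
    return result


a = [[1, 2], [3]]
b = append_four_to_each(a)
print(a)  # [[1, 2], [3]]
print(b)  # [[1, 2, 4], [3, 4]]
```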

### Enumerate: Get Counter and Value While Looping


Are you using `for i in range(len(array))` to access both the index and the value of the array? If so, use `enumerate` instead. It produces the same result but it is much cleaner. 

In [13]:
arr = ['a', 'b', 'c', 'd', 'e']

# Instead of this
for i in range(len(arr)):
    print(i, arr[i])

0 a
1 b
2 c
3 d
4 e


In [14]:
# Use this
for i, val in enumerate(arr):
    print(i, val)

0 a
1 b
2 c
3 d
4 e
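`enumerate` also accepts a `start` argument, which is handy when you want the counter to begin at 1 (for example, for human-readable numbering) instead of 0.

```python
arr = ['a', 'b', 'c']

for i, val in enumerate(arr, start=1):
    print(i, val)
# 1 a
# 2 b
# 3 c

pairs = list(enumerate(arr, start=1))
```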


### Don't Use Multiple OR Operators. Use `in` Instead

Chaining multiple `or` operators is verbose. You can shorten your conditional statement by using `in` instead.

In [1]:
a = 1 

if a == 1 or a == 2 or a == 3:
    print("Found one!")

Found one!


In [2]:
if a in [1, 2, 3]:
    print("Found one!")

Found one!
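When the collection of allowed values is large or is checked many times, a `set` is a natural container for `in`: membership tests on a set take constant time on average, whereas a list is scanned element by element. A minimal sketch; the name `VALID_VALUES` is just illustrative.

```python
# A set literal: average O(1) membership checks
VALID_VALUES = {1, 2, 3}

a = 1
found = a in VALID_VALUES
if found:
    print("Found one!")
```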


### Stop Using `+` to Concatenate Strings. Use Join Instead 

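It is more efficient to concatenate many strings using the `join` method than the `+` operator. Because strings are immutable, repeated `+` can create a new intermediate string at every step, whereas `join` builds the result in a single pass.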

The code below shows the difference in performance between the two approaches.

In [7]:
from random import randint
chars = [str(randint(0, 1000)) for _ in range(10000)]

In [8]:
%%timeit

text = ""
for char in chars:
    text += char

411 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [9]:
%%timeit

text = "".join(chars)

60.5 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
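`%%timeit` is an IPython/Jupyter magic. Outside a notebook, a similar comparison can be sketched with the standard library's `timeit` module. The sketch also checks that both approaches build the same string; absolute timings vary by machine and Python version, so none are shown here.

```python
import timeit
from random import randint

chars = [str(randint(0, 1000)) for _ in range(10000)]


def concat_with_plus() -> str:
    text = ""
    for char in chars:
        text += char
    return text


def concat_with_join() -> str:
    return "".join(chars)


# Both approaches must produce the same result
assert concat_with_plus() == concat_with_join()

# Total seconds for 100 calls of each function
plus_time = timeit.timeit(concat_with_plus, number=100)
join_time = timeit.timeit(concat_with_join, number=100)
print(f"+: {plus_time:.4f}s, join: {join_time:.4f}s")
```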


### A Function Should Only Do One Task

A function should do only one task, not multiple tasks. The function `process_data` below tries to do several tasks at once: creating a copy, adding a new feature, adding one to a column, and summing all columns. Comments help explain each block of code, but keeping them up to date takes effort, and it is difficult to test each step on its own when they all live inside one function.

In [11]:
import pandas as pd

data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})


In [12]:
def process_data(df: pd.DataFrame):
    # Create a copy
    data = df.copy()

    # Add new features
    data["c"] = [1, 1, 1]

    # Add 1
    data["a"] = data["a"] + 1

    # Sum all columns
    data["sum"] = data.sum(axis=1)
    return data


In [13]:
process_data(data)

|   | a | b | c | sum |
|---|---|---|---|-----|
| 0 | 2 | 4 | 1 | 7 |
| 1 | 3 | 5 | 1 | 9 |
| 2 | 4 | 6 | 1 | 11 |


A better practice is to split `process_data` into smaller functions that each do only one thing. In the code below, I split it into four functions and apply them to the DataFrame in order using `DataFrame.pipe`.

In [17]:
def create_a_copy(df: pd.DataFrame):
    return df.copy()


def add_new_features(df: pd.DataFrame):
    df["c"] = [1, 1, 1]
    return df


def add_one(df: pd.DataFrame):
    df["a"] = df["a"] + 1
    return df


def sum_all_columns(df: pd.DataFrame):
    df["sum"] = df.sum(axis=1)
    return df


(data
    .pipe(create_a_copy)
    .pipe(add_new_features)
    .pipe(add_one)
    .pipe(sum_all_columns)
)


|   | a | b | c | sum |
|---|---|---|---|-----|
| 0 | 2 | 4 | 1 | 7 |
| 1 | 3 | 5 | 1 | 9 |
| 2 | 4 | 6 | 1 | 11 |
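The same idea does not depend on pandas. As a sketch, a small hypothetical `pipe` helper built on `functools.reduce` can apply single-purpose functions in order, and each function can then be tested on its own. The functions below work on a plain list of numbers and are illustrative only.

```python
from functools import reduce
from typing import Callable


def pipe(value, *funcs: Callable):
    # Apply each function to the result of the previous one, left to right
    return reduce(lambda acc, func: func(acc), funcs, value)


def add_one(nums: list[int]) -> list[int]:
    return [n + 1 for n in nums]


def double(nums: list[int]) -> list[int]:
    return [n * 2 for n in nums]


# Each small function is easy to test in isolation
assert add_one([1, 2]) == [2, 3]
assert double([1, 2]) == [2, 4]

result = pipe([1, 2, 3], add_one, double)
print(result)  # [4, 6, 8]
```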


### A Function Should Have Fewer Than Four Arguments

A function should do only one thing and take fewer than four arguments, which makes it easier to test.

If a function does only one task but has more than three arguments, consider grouping the arguments into a higher-level object such as a dataclass.

In [1]:
def process(
    drop_columns: list,
    target: str,
    test_size: float,
    random_state: int,
    shuffle: bool = True
):
    ...


In [3]:
from dataclasses import dataclass

@dataclass
class ProcessConfig:
    drop_columns: list 
    target: str 
    test_size: float 
    random_state: int
    shuffle: bool = True

In [None]:
def process(config: ProcessConfig):
    target = config.target 
    test_size = config.test_size
    ...
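Callers then build one configuration object and pass it around. A short usage sketch; the field values and the body of `process` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessConfig:
    drop_columns: list
    target: str
    test_size: float
    random_state: int
    shuffle: bool = True


def process(config: ProcessConfig) -> str:
    # Hypothetical body: just report a few settings
    return f"target={config.target}, test_size={config.test_size}"


config = ProcessConfig(
    drop_columns=["id"],
    target="price",
    test_size=0.2,
    random_state=42,
)
print(process(config))  # target=price, test_size=0.2
print(config.shuffle)   # True (default value)
```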


### Avoid Using Flags as a Function's Parameters

A function should only do one thing. If a flag parameter switches the function between different code paths, the function is doing more than one thing.

In [2]:
def get_data(is_csv: bool, name: str):
    if is_csv:
        df = pd.read_csv(name + '.csv')
    else:
        df = pd.read_pickle(name + '.pkl')
    return df  

When you find yourself using flags as a way to run different code, consider splitting your function into different functions.

In [None]:
def get_csv_data(name: str):
    return pd.read_csv(name + '.csv')

def get_pickle_data(name: str):
    return pd.read_pickle(name + '.pkl')

### Condense an If-Else Statement into One Line

If your if-else statement is short, you can condense it into a one-line conditional expression for readability.

In [2]:
purchase = 20

# if-else statement in several lines
if purchase > 100:
    shipping_fee = 0
else: 
    shipping_fee = 5
shipping_fee

5

In [3]:
# if-else statement in one line

shipping_fee = 0 if purchase > 100 else 5 
shipping_fee

5

### Exception Handling vs. If-Else Statements: Which is Better for Error Handling?

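For error handling, a `try`/`except` block is often more concise and more robust than an if-else check. In the example below, the `try` version opens the file directly instead of checking for it first, which saves a separate existence check and avoids a race condition in which the file is deleted between `os.path.exists` and `open`.

It also improves code readability by separating error-handling logic from the main program flow. Note that raising an exception is relatively expensive, so if the failure case is very frequent, an if-else check may be faster.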

In [2]:
# Okay 
import os


def read_file(filename: str):
    if os.path.exists(filename):
        with open(filename, "r") as f:
            contents = f.read()
    else:
        contents = ""
    return contents


In [1]:
# Better 
def read_file(filename: str):
    try:
        with open(filename, "r") as f:
            contents = f.read()
    except FileNotFoundError:
        contents = ""
    return contents
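To check that the `try`/`except` version behaves as intended, the sketch below writes a temporary file, reads it back, and then reads a path that does not exist. The file names `notes.txt` and `missing.txt` are arbitrary.

```python
import os
import tempfile


def read_file(filename: str) -> str:
    try:
        with open(filename, "r") as f:
            contents = f.read()
    except FileNotFoundError:
        contents = ""
    return contents


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "notes.txt")
    with open(existing, "w") as f:
        f.write("hello")

    found = read_file(existing)                                # "hello"
    missing = read_file(os.path.join(tmp_dir, "missing.txt"))  # ""

print(repr(found), repr(missing))  # 'hello' ''
```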


### Python Switch Statement

It is common to use a chain of if-elif-else statements to choose between several branches.

In [1]:
def get_price(food: str):
    if food == "apple":
        return 4
    elif food == "orange":
        return 3
    elif food == "grape":
        return 5
    else:
        return "Unknown"


get_price("apple")


4

In Python 3.10 and above, you can use the `match` statement as a switch statement to do the same thing.

With `match`, you don't need to repeat the same comparison (`food == x`) in every branch, which makes the code cleaner than a chain of if-elif-else statements.

In [2]:
def get_price(food: str):
    match food:
        case "apple": # if food == "apple"
            return 4
        case "orange":
            return 3
        case "grape":
            return 5
        case _:        # else
            return "Unknown"

get_price("apple")


4

### Structural Pattern Matching in Python 3.10

Have you ever wanted to match complex data types and extract their information? 

Python 3.10 lets you do exactly that with the `match` statement and its `case` clauses.

The code below uses structural pattern matching to extract ages from either a list of dictionaries or a nested dictionary.

In [32]:
def get_youngest_pet(pet_info):
    match pet_info:
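        case [{"age": age1}, {"age": age2}]:  # a sequence of exactly two dicts, each with an "age" key
            print("Age is extracted from a list")
            return min(age1, age2)

        case {'age': {}}:  # a dict whose "age" value is any mapping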
            print("Age is extracted from a dict")
            ages = pet_info['age'].values()
            return min(ages)


In [33]:
pet_info1 = [{"name": "bim", "age": 1}, {"name": "pepper", "age": 9}]
get_youngest_pet(pet_info1)

Age is extracted from a list


1

In [34]:
pet_info2 = {'age': {"bim": 1, "pepper": 9}}
get_youngest_pet(pet_info2)

Age is extracted from a dict


1

### Write Union Types as X|Y in Python 3.10

Before Python 3.10, you need to use `typing.Union` to declare that a variable can have one of several different types.

In [4]:
from typing import Union

num: Union[int, float] = 2.3

In Python 3.10 and above, you can replace `Union[X, Y]` with `X | Y` to simplify the expression. The `X | Y` form can also be passed to `isinstance`.

In [5]:
num: int | float = 2.3
isinstance(num, int | float)

True

### Walrus Operator: Assign a Variable in an Expression

The walrus operator (`:=`), available in Python 3.8 and above, lets you assign a value to a variable inside an expression. The walrus operator is useful when you want to:
- debug the components in an expression
- avoid repeated computations

In the code below, I use the walrus operator to assign `d / 2` to the radius `r` while computing the circumference of a circle, then reuse `r` to compute the area of the circle.

In [1]:
from math import pi

d = 4

In [2]:
# without Walrus operator
circumference = (d / 2) * 2 * pi
area = pi * (d / 2)**2  # d/2 is computed twice


In [3]:
# with Walrus operator
circumference = (r := d / 2) * 2 * pi
area = pi * r**2  # d/2 is computed only once
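The walrus operator also avoids repeated computations inside comprehensions. Below, each square is computed once, bound to `sq`, tested in the filter, and reused in the output; `data` is sample input chosen for illustration.

```python
data = [1, 2, 3, 4, 5]

# Without walrus, [x**2 for x in data if x**2 > 5] computes x**2 twice per element
big_squares = [sq for x in data if (sq := x**2) > 5]
print(big_squares)  # [9, 16, 25]
```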
