<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Examples.png"  style="display: block; margin-left: auto; margin-right: auto;";/>
</div>

# Clean coding with PEP 8
© ExploreAI Academy

In this train, we shall delve into the world of Python coding standards, focusing on the renowned PEP 8 style guide. This guide is more than merely a set of rules; it is a pathway to writing Python code that is clean, readable, and professional. Our journey will take us through the core principles of PEP 8, aiding our understanding of not just the 'how', but also the 'why' behind each guideline.

## Learning objectives
In this train, we will learn how to:

- Write clean code that follows PEP 8 guidelines.
- Design code to be readable.
- Correctly implement indentation and follow the recommended format used for comments.
- Write docstrings in an appropriate format and utilise PEP 8 naming conventions for variable names.

## Introduction

As we embark on the journey of writing code, it's essential to familiarise ourselves with guidelines and concepts regarding code styling. The art of coding transcends the mere act of typing words and expecting them to compile; the format and style of these words are of paramount importance. In the same manner that poorly punctuated text hinders comprehension, poorly formatted code challenges readability. To enhance the readability of our code, we shall turn to the Python Enhancement Proposal (PEP), specifically PEP 8.

Below, there is an image serving as a PEP 8 Cheat Sheet. Throughout this train, we will delve deeper into each of these elements.

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://github.com/Explore-AI/Pictures/blob/master/PEP_8_Guide.jpg?raw=true"  style="display: block; margin-left: auto; margin-right: auto;";/>
</div>

The code we've seen so far in this course already follows PEP 8, so the guidelines below simply formalise what we've been doing.

PEP 8 guidelines are just that: **guidelines**. They **can be broken** when doing so improves **readability** and doesn't break **compatibility**, but whichever style we choose, we have to **apply it consistently**.

## Why is it important?
Data scientists, like other software developers, benefit greatly from learning and implementing coding best practices. Here's why:

1. **Code reusability and reproducibility**: Data science involves a lot of exploratory work where a particular analysis might need to be run multiple times with slight variations. Well-structured and properly documented code is easier to understand, reuse, and reproduce, which is crucial in data science projects.

2. **Collaboration**: Data science is typically a team effort, and having a consistent coding style makes it easier for different team members to understand each other's code. This leads to more efficient collaboration and less time spent deciphering what a piece of code does.

3. **Maintenance and debugging**: Data science projects are not always 'one-off' analyses. Models might need to be updated, bugs might need to be fixed, or new data might need to be incorporated. Following best practices makes maintaining and debugging code significantly easier.

4. **Scaling and performance**: As data grows in volume and complexity, code that follows best practices tends to be easier to profile, optimise, and scale, because its structure is clear. This is especially important for data scientists working with big data.

5. **Professional development**: Understanding and implementing best coding practices helps data scientists write higher-quality code. This could increase their employability and offer more opportunities for growth within their current roles.

6. **Communication of results**: The ultimate goal of most data science projects is to inform decision-making. High-quality, well-structured code helps ensure that the results are reliable and accurate, which is critical for gaining trust and effectively communicating results to stakeholders.

In essence, for data scientists, following coding best practices is not just about writing good code. It's also about ensuring the reliability and credibility of their findings and making their work more efficient and impactful.

## Design principles

### Write self-explanatory code

Our code should behave in a way that's least surprising to others. This principle helps other programmers understand and predict the behaviour of our code.

Not simple or self-explanatory ❌:

In [1]:
def proc(x):
    return x ** 3 - 2 * x + 5

- The function name 'proc' is vague and doesn't convey the function's purpose.

- The formula inside the function lacks context, making it unclear what it's calculating or representing.

Simple and self-explanatory ✅:

In [2]:
def calculate_cubic_polynomial(x):
    coefficient_a = 1
    coefficient_b = -2
    constant = 5
    cubic_term = coefficient_a * x ** 3
    linear_term = coefficient_b * x
    polynomial_value = cubic_term + linear_term + constant
    return polynomial_value

- The function name 'calculate_cubic_polynomial' clearly describes its purpose.

- By breaking down the polynomial into clearly named variables, the code becomes more readable and understandable. It's evident that it's computing a cubic polynomial value.

### DRY (don't repeat yourself)

If we find ourselves writing the same code in multiple places, it's usually better to encapsulate it in a function or class and reuse it.

❌ Repeating code:

In [3]:
# Sample lists (illustrative values) so the example runs
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# Calculating square of numbers in two different lists
squares_list1 = [i ** 2 for i in list1]
squares_list2 = [i ** 2 for i in list2]

✅ DRY applied:

In [4]:
def calculate_squares(numbers_list):
    return [i ** 2 for i in numbers_list]


squares_list1 = calculate_squares(list1)
squares_list2 = calculate_squares(list2)

## Code layout

### Indentation

**Use four spaces per indentation level.** Python accepts any consistent indentation, even a single space, but PEP 8 recommends four spaces per level to keep code uniform, and most coding environments insert four by default.

❌ Incorrect:

In [5]:
def incorrect_function():
 return 'Incorrect indentation'

✅ Correct:

In [6]:
def correct_function():
    return 'Correct indentation'

### Blank lines

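Use blank lines sparingly. PEP 8 surrounds top-level function and class definitions with two blank lines, separates methods inside a class with a single blank line, and uses single blank lines inside functions only to mark off logical sections.

❌ Incorrect:

In [7]:
def a():
    return 'a'
def b():
    return 'b'

✅ Correct:

In [8]:
def a():
    return 'a'


def b():
    return 'b'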
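
The same spacing rules apply when functions and classes are mixed. The sketch below uses a hypothetical `format_name` function and `Greeter` class, neither of which is part of the original examples, to show two blank lines around top-level definitions and one blank line between methods:

```python
def format_name(name):
    return name.strip().title()


class Greeter:
    """Hypothetical class used only to illustrate blank-line spacing."""

    def __init__(self, greeting):
        self.greeting = greeting

    def greet(self, name):
        return f'{self.greeting}, {format_name(name)}!'


print(Greeter('Hello').greet('  ada lovelace '))
```

Running this prints `Hello, Ada Lovelace!`.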

### Maximum line length

Limit all lines to a maximum of 79 characters.

❌ Incorrect:

In [9]:
def long_function_name(var_one, var_two, var_three, var_four, var_five, var_six):
    pass

✅ Correct:

In [10]:
def long_function_name(var_one, var_two, var_three,
                       var_four, var_five, var_six):
    pass
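
To see which lines of a piece of source code break the limit, a small helper can measure them. This is only a sketch: `find_long_lines` and `sample_source` are hypothetical names introduced here, and in practice an editor ruler or a linter usually does this job.

```python
MAX_LINE_LENGTH = 79  # Limit stated in PEP 8 for lines of code


def find_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) pairs for lines longer than limit."""
    long_lines = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        if len(line) > limit:
            long_lines.append((line_number, len(line)))
    return long_lines


# Two lines of source: the first is short, the second is 85 characters.
sample_source = 'short = 1\n' + 'x = ' + '1 + ' * 20 + '1\n'
print(find_long_lines(sample_source))
```

The call prints `[(2, 85)]`, flagging the second line only.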

### Parameters on their own line

When a function has many parameters and its signature would exceed the 79-character limit, it's good practice to place each parameter on its own line, aligned with the opening bracket. This keeps the signature readable and makes later changes easier to review.

❌ Incorrect:

In [11]:
def process_data(input_data, data_format, error_handling, log_level, output_type, verbose_mode):
    # Function implementation
    pass

- This line exceeds the recommended maximum length, making it hard to read.
- It can also cause horizontal scrolling in editors, hindering readability.

✅ Correct:

In [12]:
def process_data(input_data,
                 data_format,
                 error_handling,
                 log_level,
                 output_type,
                 verbose_mode):
    # Function implementation
    pass

- Each parameter is on its own line, which makes the function signature easy to read.
- It adheres to the line length guidelines and is more maintainable.
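
PEP 8 also accepts a hanging indent, where nothing follows the opening bracket and the parameters are indented one extra level so they stand apart from the function body. The sketch below reuses some of the parameter names above; the default values and the function body are hypothetical and exist only to make the example runnable.

```python
def process_data(
        input_data,
        data_format='csv',
        verbose_mode=False):
    """Hypothetical body: describe the request instead of processing it."""
    summary = f'{len(input_data)} records as {data_format}'
    if verbose_mode:
        summary += ' (verbose)'
    return summary


# Long calls can be wrapped the same way, one keyword argument per line.
result = process_data(
    input_data=[1, 2, 3],
    data_format='json',
    verbose_mode=True,
)
print(result)
```

This prints `3 records as json (verbose)`.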

### Line breaks

It's recommended to break before binary operators to improve readability.

❌ Incorrect:

In [13]:
# Illustrative sample values so the example runs
gross_wages, taxable_interest = 50_000, 1_200
dividends, qualified_dividends = 800, 300
ira_deduction, student_loan_interest = 2_000, 500

income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)

✅ Correct:

In [14]:
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

### Whitespace in expressions and statements

Surround operators with a single space on either side of the operator.

❌ Incorrect:

In [15]:
x=10
x+=5
y=x<20
data = [x for x in range(10) if x>5]
flag = x==10 and y!=0

It is hard to identify the operators used in this code.

✅ Correct:

In [16]:
x = 10
x += 5
y = x < 20
data = [x for x in range(10) if x > 5]
flag = x == 10 and y != 0

It is simpler to identify which operators are used in this code.
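
PEP 8 makes a few exceptions to the "space on either side" rule: no spaces around `=` when it marks a keyword argument or a default value, and no space just inside brackets or before a comma. A short sketch, using a hypothetical `scale` function that is not part of the original examples:

```python
def scale(value, factor=2):
    return value * factor


numbers = [1, 2, 3]

# No spaces around '=' for keyword arguments or default values.
doubled = scale(numbers[0], factor=2)

# No space just inside brackets, or before a comma or an opening bracket.
first_two = numbers[0:2]
scaled = [scale(number, factor=10) for number in numbers]

print(doubled, first_two, scaled)
```

This prints `2 [1, 2] [10, 20, 30]`.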

## Commenting

Comments allow the reader of the code to follow along and see what the author intends to convey. This is especially helpful when team members move between projects and newcomers inherit code they have never seen.

Generally, there are two types of comments: **inline** and **block**. Python uses the hash symbol (`#`) to mark comments. Comments are intended to:

- Explain **assumptions made** in a line of code.
- Record **important details** about the problem we're trying to solve with our code.

A rule of thumb to keep in mind is: *Code tells you how, comments should tell you why.*
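
The difference between a "how" comment and a "why" comment can be sketched with a small example. The sensor readings and the `MISSING_MARKER` value below are hypothetical, chosen only for illustration:

```python
MISSING_MARKER = -999  # Assumed sentinel for a failed sensor reading

readings = [12, 15, -999, 14, -999, 13]

# A "how" comment would only restate the code:
# keep the values that are not equal to MISSING_MARKER.

# A "why" comment records the reasoning: failed readings are excluded
# because they would drag the average far below any real measurement.
valid_readings = [value for value in readings if value != MISSING_MARKER]
average_reading = sum(valid_readings) / len(valid_readings)

print(average_reading)
```

This prints `13.5`. The second comment tells a future reader something the code cannot say on its own.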

### Inline comments

Inline comments are comments that are on the same line as a statement. PEP 8 suggests **using inline comments sparingly**.
An inline comment should be **separated** by at least **two spaces** from the statement. It should *start* with a `#` followed by a **single space**.
Inline **comments are useful for clarifying complex pieces of code** or providing quick annotations where necessary. However, the **code should be as self-explanatory as possible**, reducing the need for extensive inline comments.

❌ Incorrect:

In [17]:
x = x + 1 #Increment x, it's important for UI alignment
y = y + 2 #Now increment y, for vertical spacing
calculate_position(x, y) #This calls the function to calculate position. It's very crucial for layout.
update_display() #This will update the display with new positions. It's part of the refresh cycle.

NameError: name 'calculate_position' is not defined

Each comment is crammed next to the code with insufficient spacing, and every line of code is followed by an unnecessary comment that often states the obvious. The comments are verbose and don't add much value.

✅ Correct:

In [18]:

x = x + 1  # Adjust position for UI alignment
y = y + 2

calculate_position(x, y)  # Calculate and update the position in the display
update_display()

NameError: name 'calculate_position' is not defined

Comments are used sparingly and effectively. They are separated from the code by two spaces, and each one concisely explains the purpose of its line rather than restating it. This style is cleaner and more in line with PEP 8 guidelines, aiding readability and maintainability.

### Block comments

Block comments are used to explain the code that follows them and are **written on their own line(s)**.
Each line of a block comment starts with a `#` and a **single space**, and should be at the **same indentation level** as the code it describes.
Block comments are useful for providing more detailed explanations or for **summarising a section of code**.

❌ Incorrect:

In [19]:
# This section of code is for loading the dataset
# We are using pandas to read a CSV file
# The file name is 'data.csv'
import pandas as pd
dataset = pd.read_csv('data.csv')

# Now we are going to clean the dataset
# We will remove null values
# Dropping rows with any missing value
dataset = dataset.dropna()

FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'

These comments are redundant and simply describe the actions line by line, which are already clear from the code itself. They don't provide additional insight or rationale behind the steps.

❌ Incorrect:

In [20]:
#Load the dataset
dataset = ['Apple', 'Pear', 'Orange', '', 'Strawberry   ', 'Banana', '  Grape  ']

#Data cleaning process
#Now we're going to clean the dataset by removing empty strings.
dataset = [fruit for fruit in dataset if fruit.strip()]

#Next, we'll remove all the spaces from the data.
dataset = [fruit.strip() for fruit in dataset]

#Then we make everything lowercase, so it looks uniform.
dataset = [fruit.lower() for fruit in dataset]

#Finally, we'll get rid of duplicates because we don't need them.
dataset = list(set(dataset))

#Here's what we have now
print(dataset)


['orange', 'grape', 'pear', 'strawberry', 'banana', 'apple']


The comments in this example are informal and don't add much value. They state the obvious, merely describing what the code does rather than explaining why these steps matter in the context of data cleaning. They also omit the space after the `#`. This lacks the depth and professionalism expected in a data science context.

✅ Correct:

In [21]:
dataset = [
    'Apple', 'Pear', 'Orange', '', 'Strawberry   ', 'Banana', '  Grape  ',
]

# Data cleaning process
# Step 1: Remove empty rows
# Empty rows can skew analysis, so removing them is crucial for data accuracy.
dataset = [fruit for fruit in dataset if fruit.strip()]

# Step 2: Trim whitespace from data
# Uniformity in data is key; removing extra spaces enhances data consistency.
dataset = [fruit.strip() for fruit in dataset]

# Step 3: Standardise text format
# A consistent text format, such as lowercase, makes values comparable.
dataset = [fruit.lower() for fruit in dataset]

# Step 4: Remove duplicate entries
# Duplicates can distort analysis, so they're removed for data reliability.
dataset = list(set(dataset))

print(dataset)


['orange', 'grape', 'pear', 'strawberry', 'banana', 'apple']


Each comment provides context and explains the purpose of the step in the data cleaning process, enhancing understanding without being overly verbose.

### Docstrings

Good code documents its modules, classes, functions, and methods with docstrings. A docstring, or documentation string, is a string literal enclosed in triple quotation marks. Either single or double quotes work as long as the opening and closing marks match, although triple double quotes (`"""`) are the convention. A docstring must be the first statement in the module, class, function, or method it documents, and it explains what that block of code does.


❌ Incorrect:

In [22]:
def square(num):
    # This function squares a number.
    return num ** 2

This is a comment, not a docstring. It fails to use the triple quotation marks and does not fully document the function's purpose, parameters, and return value.

✅ Correct:

In [23]:
def square(num):
    """
    Calculate the square of a number.

    Args:
        num (int or float): The number to be squared.

    Returns:
        int or float: The square of the input number.
    """
    return num ** 2

This docstring:

- Starts immediately below the function definition.

- Is enclosed within triple double quotes (""").

- Clearly explains the purpose of the function.

- Includes an 'Args' section detailing the argument, its expected type, and description.

- Includes a 'Returns' section explaining the type and description of the return value.
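
One practical difference between a docstring and a comment is that Python stores the docstring on the object, where `help()` and other tools can read it. The sketch below repeats the `square` function and adds a hypothetical `square_with_comment` variant to show the contrast, using the standard library's `inspect.getdoc`:

```python
import inspect


def square(num):
    """
    Calculate the square of a number.

    Args:
        num (int or float): The number to be squared.

    Returns:
        int or float: The square of the input number.
    """
    return num ** 2


def square_with_comment(num):
    # This function squares a number.
    return num ** 2


# inspect.getdoc() returns the docstring with its indentation cleaned up.
docstring = inspect.getdoc(square)
print(docstring.splitlines()[0])

# A comment in the same position is not stored anywhere.
print(square_with_comment.__doc__)
```

The first `print` shows `Calculate the square of a number.`, while the second shows `None`.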

## Naming conventions

In Python, adopting consistent naming conventions as outlined in PEP 8 enhances code readability and maintainability. Here's a concise overview:

**Variables and functions:** Use lowercase with underscores to separate words: `starting_amount`, `add_item`

**Constants:** Use all uppercase with underscores separating words: `MAX_ITEMS`

**Classes:** Use the CamelCase (CapWords) convention for class names: `RegressionModel`

❌ Incorrect:

In [24]:
class shoppingCart:
    maxItems = 10

    def __init__(self):
        self.ItemList = []

    def AddItem(self, Item):
        if len(self.ItemList) < shoppingCart.maxItems:
            self.ItemList.append(Item)

ShoppingCart = shoppingCart()
ShoppingCart.AddItem('Apple')

It is hard to tell what `shoppingCart` and `maxItems` are in the expression `shoppingCart.maxItems`: one is a class and the other is a constant, but nothing in the names says so. The instance is also named `ShoppingCart`, which looks like a class name.

✅ Correct:

In [None]:
class ShoppingCart:
    MAX_ITEMS = 10

    def __init__(self):
        self.item_list = []

    def add_item(self, item):
        if len(self.item_list) < ShoppingCart.MAX_ITEMS:
            self.item_list.append(item)

shopping_cart = ShoppingCart()
shopping_cart.add_item('Apple')

In [25]:
def bubble_sort(items):
    for i in range(len(items)):
        for j in range(len(items)-1-i):
            if items[j] > items[j+1]:
                # Swap adjacent elements that are out of order
                items[j], items[j+1] = items[j+1], items[j]
    return items

In [29]:
bubble_sort([64, 34, 25, 12, 22, 11, 90])

[11, 12, 22, 25, 34, 64, 90]

In [37]:
class BankAccount:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount



In [38]:
account_n = BankAccount(100)
account_n.deposit(50)


In [39]:
print(account_n.balance)

150


In [40]:
def r(s):
    if len(s) == 0:
        return s
    return r(s[1:]) + s[0]



In [46]:
r('programming') == 'gnimmargorp'

True

In [47]:
numbers = [-2, 5, -9, 8, -1, 10]
result = filter(lambda x: x < 0, numbers)
print(list(result))

[-2, -9, -1]


In [48]:
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def circumference(self):
        return 2 * 3.14159 * self.radius



In [50]:
Circle(3).circumference() == 18.849539999999998

True

In [51]:
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height
    def perimeter(self):
        return 2 * (self.width + self.height)
    def area(self):
        return self.width * self.height


rect = Rectangle(3, 4)
print("Area:", rect.area(), "Perimeter:", rect.perimeter())


Area: 12 Perimeter: 14


In [53]:
numbers = [1, 2, 3, 4, 5]
result = map(lambda x: x**2, numbers)
print(list(result))

[1, 4, 9, 16, 25]


In [61]:
class Shape:
    def __init__(self, name):
        self.name = name

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("Circle")
        self.radius = radius

    def area(self):
        return 3.14 * self.radius * self.radius
    
class Square(Shape):
    def __init__(self, side):
        super().__init__("Square")
        self.side = side

    def area(self):
        return self.side * self.side

circle = Circle(3)
square = Square(4)
total_area = circle.area() + square.area()
print("Total Area:", total_area)
 

Total Area: 44.26


Returning to the `ShoppingCart` example: with proper naming conventions, a colleague can quickly see that `ShoppingCart.MAX_ITEMS` refers to a constant defined on a class, and that `shopping_cart` is a variable holding an instance of `ShoppingCart`.
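
The three conventions can be expressed as simple patterns. The sketch below is a rough heuristic rather than a full PEP 8 checker: `classify_name` and the regular expressions are hypothetical helpers written for this illustration, and a single uppercase letter such as `X` would be reported as a constant.

```python
import re

# Simplified patterns for the three naming styles discussed above.
SNAKE_CASE = re.compile(r'[a-z_][a-z0-9_]*')
UPPER_SNAKE_CASE = re.compile(r'[A-Z][A-Z0-9_]*')
CAP_WORDS = re.compile(r'[A-Z][a-zA-Z0-9]*')


def classify_name(name):
    """Return the naming style a name matches, or 'unconventional'."""
    if UPPER_SNAKE_CASE.fullmatch(name):
        return 'constant'
    if SNAKE_CASE.fullmatch(name):
        return 'variable or function'
    if CAP_WORDS.fullmatch(name):
        return 'class'
    return 'unconventional'


for name in ['shopping_cart', 'MAX_ITEMS', 'ShoppingCart', 'maxItems']:
    print(name, '->', classify_name(name))
```

The names from the correct example are classified as a variable or function, a constant, and a class, while `maxItems` from the incorrect example is reported as unconventional.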

## Summary

As we conclude this journey through the essential principles of PEP 8 and good coding practices, it's important to reflect on the key insights gained. We delved into the world of Python coding standards, focusing on how to write code that is not only functional but also clean, readable, and professional. By embracing the guidelines of PEP 8, we learned the significance of well-structured code and the impact it has on maintainability and collaboration.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>