# Good coding practices

**Author:** Seda Radoykova, Science Executive (21/22), UCL DSS

**Date:** 11 Nov, 2021

***Proudly presented by the UCL Data Science Society***

<b>Acknowledgement:</b> The content of this workshop is inspired by [W3 schools](https://www.w3schools.com/) 

## What is good code?
The definition of good code depends on its purpose and intended audience. Code you have written for a one-off personal task might not need the same level of stylistic "hygiene" or detail as a project developed in collaboration with others. There is almost always a trade-off between competing goals, _e.g._ development speed and readability, since making code readable takes time. But writing neat code can also save you hours of work, and a fair amount of headache, later on. <br><br>
We will split this workshop into two sections: 
1. Writing readable code: tips on style and structure 
1. Writing effective and efficient code 

In [2]:
import this

## The importance of style
![code_quality_2x.png](attachment:code_quality_2x.png)
_https://xkcd.com/1513/_

## Tips around structuring your work space
- Use a <b>virtual environment</b>. This helps avoid library version clashes. 
- Use effective <b>version control</b>. Git, together with a hosting service such as GitHub, is an excellent way to back up your work, collaborate with others, and introduce new features without breaking anything that already works. Commit frequently, and you will not be let down. 
- Follow a clear <b>repo structure</b> by including certain files and respecting naming conventions. Here is an example:

### Repo structure
`README.md` = describe aim/purpose; could be in Markdown, reStructuredText, or plain text  
`LICENSE` = legally binding terms for the use and distribution of the software; more at https://choosealicense.com/  
`requirements.txt` = not compulsory; lists the software or library dependencies   
`setup.py` = build script used by setuptools (the successor to distutils, which is deprecated and was removed in Python 3.12) to build and distribute the modules of the project; more info: https://docs.python.org/3.6/distutils/setupscript.html  
<b> Documentation </b> = writing clean documentation encourages you to write clear code _e.g._  
\- `docs/conf.py`  
\- `docs/index.rst`  
<b> Module code </b> = contains the actual code of the project _e.g._  
\- `module/__init__.py`  
\- `module/core.py`  
<b> Tests </b> = most code will need diagnostic tests to ensure functionality; keep in a separate directory  
\- `tests/test_core.py`  
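As a minimal sketch of what a test file in `tests/` might contain, the example below uses the standard-library `unittest` module (its discovery looks for files matching `test*.py` by default). The function `mean()` is a hypothetical stand-in for something that would live in `module/core.py`; in a real project you would import it from there instead of defining it in the test file.

```python
import unittest


def mean(values):
    """Hypothetical stand-in for a function in module/core.py."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_simple_average(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        # Edge cases deserve their own test.
        with self.assertRaises(ValueError):
            mean([])
```

Keeping each test small and named after the behaviour it checks makes a failing test point straight at the problem.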

## Style: basics, tips, and further resources
PEP stands for Python Enhancement Proposal. PEPs are design documents for the Python community, and some of them, such as the style guide PEP 8, set out guidelines and conventions for good coding and development practice. PEP 8 can be found [here](https://www.python.org/dev/peps/pep-0008/). <br><br>
The key aspect of good style is that it is <b>consistent</b>. To get a sense of _some_ (not nearly all) of the conventions to consider:  
- Use 4 spaces per indentation level (PEP 8 lists many more layout conventions);
- Use proper <b>naming conventions</b> for variables, functions, methods, and more:
  - Variables, functions, methods, modules: `this_is_a_variable` (package names are also lowercase, preferably without underscores);
  - Classes and exceptions: `CapWords`;
  - Protected methods and internal functions: `_single_leading_underscore`;
  - Private (name-mangled) methods and attributes: `__double_leading_underscore`;
  - Constants: `CAPS_WITH_UNDERSCORES`.  
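The short sketch below puts these conventions side by side; every name in it is a hypothetical example. It also shows what the double leading underscore actually does: inside a class `DataLoader`, Python *name-mangles* `__token` to `_DataLoader__token`, which makes it harder (but not impossible) to reach from outside.

```python
# All names below are hypothetical examples of PEP 8 naming.
MAX_ROWS = 100  # constant: CAPS_WITH_UNDERSCORES


class DataLoader:  # class: CapWords
    def __init__(self, source_path):
        self.source_path = source_path  # attribute: lower_case_with_underscores
        self._cache = {}  # internal use: single leading underscore
        self.__token = "secret"  # name-mangled: double leading underscore

    def load_rows(self):  # method: lower_case_with_underscores
        return self._read_file()

    def _read_file(self):  # internal helper, not part of the public API
        return f"rows from {self.source_path}"


class DataLoaderError(Exception):  # exceptions are classes too: CapWords
    pass
```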

Have a look at the problems/challenges in the accompanying exercise file! 

There is a command-line program that checks the style of your code against PEP 8. Install it using: <br> 
`$ pip install pycodestyle` <br>
Run it on a file using: <br>
`$ pycodestyle my_script.py`<br>
An autoformatting tool, autopep8, helps fix PEP 8 violations: <br>
`$ pip install autopep8` <br>
Print the modified code to the console for review: <br>
`$ autopep8 my_script.py` <br>
Or rewrite the file in place: <br>
`$ autopep8 --in-place my_script.py` <br>
Add `--aggressive` for more substantial changes. <br>
To improve the formatting of your code beyond PEP 8 compliance, use yapf: <br>
`$ pip install yapf` <br>
`$ yapf --in-place my_script.py` works similarly to autopep8's in-place mode <br>

### Line continuation
Generally, 79 characters is the accepted norm for a line length. If your statement is longer than that, it is still possible to make code readable. <br> <br>
A trailing `\` tells the Python interpreter to join consecutive lines. However, when the backslash is followed by whitespace, this notation breaks down. <br> <br>
A better alternative is implicit line continuation inside parentheses `()`: 

In [None]:
# Bad:

my_very_big_string = """For a long time I used to go to bed early. Sometimes, when I had put out my \
    candle, my eyes would close so quickly that I had not even time to say “I’m going to sleep.”"""

from some.deep.module.inside.a.module import a_nice_function, another_nice_function, \
    yet_another_nice_function

# Good:

my_very_big_string = (
    "For a long time I used to go to bed early. Sometimes, "
    "when I had put out my candle, my eyes would close so quickly "
    "that I had not even time to say “I’m going to sleep.”"
)

from some.deep.module.inside.a.module import (
    a_nice_function, another_nice_function, yet_another_nice_function)

Bear in mind that if your lines are too long, you might be trying to cram too many things in one line to begin with. So you should first consider simplifying your code, then using line continuation. 
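To check the claim about trailing whitespace after a backslash, the sketch below compiles small made-up snippets with the built-in `compile()`: one where an invisible space follows the backslash, and one that uses parentheses instead.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Made-up snippets: a backslash followed by a space before the newline...
backslash_with_space = "total = 1 + \\ \n    2\n"
# ...and the same statement using implicit continuation in parentheses.
parenthesised = "total = (1 +\n    2)\n"

print(compiles(backslash_with_space))  # False
print(compiles(parenthesised))  # True
```

The space is impossible to spot by eye, which is exactly why parentheses are the safer habit.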

## To comment or not to comment? 
Commenting practice always stirs things up. Many experienced programmers argue that heavy commenting impedes readability: stylistically and logically sound code should be readable _per se_, so comments just get in the way. Less experienced programmers, on the other hand, often find well-commented code a great way to learn from and follow someone else's code. <br> <br>
So while it may never be clear _how much_ to comment, quality always matters more than quantity. Comments should make clear what a given piece of code is doing and/or why it is done in a particular way. There are generally two types of comments: block comments and inline comments. As a rule of thumb, do not state the obvious. For example, compare: <br> <br>
`x = x + 1    # Increment x` <br> 
`x = x + 1    # Compensate for border` <br> <br> 
Some tips: <br>
- Contradictory, vague, or obvious comments are worse than no comments;
- Keep the comments up to date when the code changes!
- Use succinct but complete sentences;
- Capitalise the first word (unless it is an identifier that starts with a lowercase letter);
- Block comments will generally resemble paragraphs:
  - Indent them to the same level as the code that follows them, _i.e._ the code they explain;
  - Start each line with a # and a single space;
  - Separate paragraphs inside a block comment with a line containing a single #;
  - Use two spaces after a sentence-ending period (except after the final sentence).
- Use inline comments sparingly!
  - Separate them from the statement by at least two spaces;
  - Start them with a # and a single space.
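Here is a small, hypothetical function written to follow these rules: a block comment indented to the level of the code it explains, two paragraphs separated by a lone `#`, two spaces after the inner sentence-ending period, and one inline comment separated from its statement by two spaces.

```python
def border_adjusted_width(width, border):  # hypothetical example function
    # The border is drawn on both sides of the element, so its width has to
    # be subtracted twice.  Negative borders are treated as zero.
    #
    # The result is never smaller than zero.
    border = max(border, 0)
    adjusted = width - 2 * border
    return max(adjusted, 0)  # Clamp: the border may be wider than the element
```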

## Modular code
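When working on a bigger project, splitting code up into modules is always a great idea. In Python, a module is simply a `.py` file: a logical collection of related functions, classes, and variables that can be imported elsewhere.  
Document your modules too, _e.g._ with docstrings, or with Markdown or reStructuredText files turned into documentation by a tool such as Sphinx.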
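As a sketch of the docstring option, the hypothetical module below (imagine it saved as `geometry.py`) documents itself with a module docstring and a function docstring. `help()` and Sphinx's autodoc extension read these strings, and they are also available at runtime through `__doc__`.

```python
"""Geometry helpers for the workshop examples.

This hypothetical module groups small, related functions
so they can be imported elsewhere with ``import geometry``.
"""

import math


def circle_area(radius):
    """Return the area of a circle with the given radius.

    Raises ValueError if radius is negative.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return math.pi * radius ** 2


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    print(circle_area(2.0))
```

The `if __name__ == "__main__":` guard lets the same file serve as both an importable module and a small script.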

## Writing object-oriented _Pythonic_ code 
Python supports object-oriented programming, and everything in Python (numbers, strings, functions, modules, classes) is an object. It is worth using the object-oriented paradigm when writing Python, especially for larger projects, although the language also supports procedural and functional styles.
The object-oriented approach supports data encapsulation (hiding implementation details), modularity, code reuse, inheritance, and polymorphism.
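A minimal sketch of these ideas with hypothetical classes: `Shape` defines a common interface, `Rectangle` and `Circle` inherit from it (inheritance), each overrides `area()` (polymorphism), and the dimensions sit behind a leading underscore (encapsulation by convention, since Python does not enforce privacy).

```python
import math


class Shape:  # hypothetical base class
    """Base class defining the interface every shape shares."""

    def area(self):
        raise NotImplementedError

    def describe(self):
        # Works for any subclass, because each one supplies its own area().
        return f"{type(self).__name__} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height


class Circle(Shape):
    def __init__(self, radius):
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2


for shape in [Rectangle(2, 3), Circle(1)]:
    print(shape.describe())
```

The loop never needs to know which kind of shape it holds; that is the practical payoff of polymorphism.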

### Unpacking
If you know the length of a list or tuple, you can assign names to its elements with unpacking. For example, `enumerate()` provides a tuple of two elements, (index, item), for each item in a list:

In [None]:
# unpacking 
some_list = ['a', 'b', 'c']
for index, item in enumerate(some_list):
    print(index, item)  # do something with index and item

# swap variables 
a, b = 1, 2
a, b = b, a  # now a = 2, b = 1

# nested unpacking 
a, (b, c) = 1, (2, 3)

# extended unpacking 
a, *rest = [1, 2, 3]
# a = 1, rest = [2, 3]
a, *middle, c = [1, 2, 3, 4]
# a = 1, middle = [2, 3], c = 4

### Ignored variables 
If you need to assign something (for instance, in Unpacking) but will not need that variable, use `__`: <br> 
`filename = 'foobar.txt'` <br>
`basename, __, ext = filename.rpartition('.')` <br><br>
_NOTE: Many Python style guides recommend the use of a single underscore “\_” for throwaway variables rather than the double underscore “\_\_” recommended here. The issue is that “\_” is commonly used as an alias for the `gettext()` function, and is also used at the interactive prompt to hold the value of the last operation. Using a double underscore instead is just as clear and almost as convenient, and eliminates the risk of accidentally interfering with either of these other use cases._

### Python Properties 
Avoid explicit getters and setters; use properties with `@property` instead.  
https://codesource.io/python-coding-practices/
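A minimal sketch with a hypothetical `Temperature` class: callers read and assign `t.celsius` like a plain attribute, while the setter still gets a chance to validate the value, and `fahrenheit` is a read-only derived property. Compare this with separate `get_celsius()`/`set_celsius()` methods, which add noise at every call site.

```python
class Temperature:  # hypothetical example class
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        """Temperature in degrees Celsius."""
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only derived value: there is no setter.
        return self._celsius * 9 / 5 + 32


t = Temperature(25)
t.celsius = 30  # plain attribute syntax, validated by the setter
print(t.fahrenheit)  # 86.0
```

Because the public interface is just an attribute, you can start with a plain attribute and switch to a property later without changing any calling code.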

# Memory and efficiency

### Creating lists
Create a length-N list of the same thing using the list `*` operator: <br>
`four_nones = [None] * 4`<br>
Create a length-N list of lists <br>
`four_lists = [[] for __ in range(4)]` <br>
Create a string from a list using `str.join()` on an empty string <br>
`letters = ['s', 'p', 'a', 'm']` <br>
`word = ''.join(letters)` <br>
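Why not `[[]] * 4` for the list of lists? The `*` operator repeats *references* to the same object, which is harmless for the immutable `None` but surprising for a mutable list, as this short check shows:

```python
shared = [[]] * 4  # four references to ONE list
independent = [[] for __ in range(4)]  # four distinct lists

shared[0].append('x')
independent[0].append('x')

print(shared)  # [['x'], ['x'], ['x'], ['x']]
print(independent)  # [['x'], [], [], []]
```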


# Data manipulation tips

### Searching for an item in a collection
Compare the following approaches to searching a collection: 

In [None]:
s = set(['s', 'p', 'a', 'm'])
l = ['s', 'p', 'a', 'm']

def lookup_set(s):
    return 's' in s

def lookup_list(l):
    return 's' in l

"Even though both functions look identical, because lookup_set is utilizing the fact that sets in Python are hashtables, the lookup performance between the two is very different. To determine whether an item is in a list, Python will have to go through each item until it finds a matching item. This is time consuming, especially for long lists. In a set, on the other hand, the hash of the item will tell Python where in the set to look for a matching item. As a result, the search can be done quickly, even if the set is large. Searching in dictionaries works the same way."
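One way to see the difference yourself is a rough sketch with the standard-library `timeit` module. The collection size and number of repetitions below are arbitrary choices, and absolute timings will vary by machine; the target is the last element, which is the worst case for the list.

```python
import timeit

n = 100_000  # arbitrary size, large enough to make the gap visible
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # last element: the list has to scan everything

list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```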

### Time complexity 
The Python wiki lists the time complexity of common operations on built-in types: https://wiki.python.org/moin/TimeComplexity


### Sets/dictionaries vs lists
Because of these differences in performance, it is often a good idea to use a set or dictionary instead of a list when:
- The collection is large;
- You will be repeatedly searching for items in the collection;
- There are no duplicate items.

However, "For small collections, or collections which you will not frequently be searching through, the additional time and memory required to set up the hashtable will often be greater than the time saved by the improved search speed."

### Check if a variable equals a constant
You don’t need to explicitly compare a value to True, None, or 0: you can just put the value itself in the if statement, and Python will test its truth value. See "Truth Value Testing" in the Python documentation for a list of what is considered false.

In [None]:
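attr = 0  # sample value; try True, None, '' or [1] as well

# Bad:
if attr == True:  # redundant comparison to True
    print('True!')

if attr == None:  # use `is` to compare with singletons such as None
    print('attr is None!')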

# Good:
# Just check the value
if attr:
    print('attr is truthy!')

# or check for the opposite
if not attr:
    print('attr is falsey!')

# or, since None is only one of several falsy values, check for it explicitly
if attr is None:
    print('attr is None!')
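The last two checks are not interchangeable: `not attr` is true for *every* falsy value (`0`, `''`, `[]`, `None`, ...), whereas `attr is None` is true only for `None`. This sketch, using an arbitrary list of sample values, shows where they disagree:

```python
samples = [None, 0, '', [], 'text', 42]  # arbitrary sample values

for value in samples:
    print(f"{value!r:>6}: not value -> {not value}, is None -> {value is None}")
```

So if `0` or an empty string is a legitimate value for `attr`, test with `is None` rather than relying on truthiness.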

### More general point about logic operands 
The logical operators `and` and `or` in Python do not just return True or False: they return one of their operands. `x or y` evaluates to `x` if `x` is truthy and to `y` otherwise, which makes these operators a handy tool for writing concise code. Compare the readability of the following: 

In [None]:
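# check_in_cache() returns an object or None;
# both helpers are hypothetical stubs for illustration
def check_in_cache():
    return None

def pull_from_db():
    return 'fresh object'

# more conventional 
def get_obj():
    result = check_in_cache()
    if result is None:
        result = pull_from_db()
    return result

# more elegant (but falls back on ANY falsy cached value, not just None)
def get_obj():
    return check_in_cache() or pull_from_db()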
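A few quick checks with arbitrary literals show exactly which operand `or` and `and` hand back. They also reveal the caveat for the caching example: a cached value that is falsy, such as `0` or an empty list, would trigger the fallback too.

```python
# `or` returns the first truthy operand, or the last operand if none is truthy.
print(0 or 'default')  # default
print('cached' or 'default')  # cached
print([] or ['fallback'])  # ['fallback']

# `and` returns the first falsy operand, or the last operand if all are truthy.
print('a' and 'b')  # b
print(None and 'b')  # None
```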



### Access a Dictionary Element
Don’t use the `dict.has_key()` method: it only existed in Python 2 and was removed in Python 3. Instead, use the `key in d` syntax, or pass a default argument to `dict.get()`.

In [None]:
# Bad (Python 2 only; raises AttributeError in Python 3):
d = {'hello': 'world'}
if d.has_key('hello'):
    print(d['hello'])    # prints 'world'
else:
    print('default_value')

In [None]:
# Good:
d = {'hello': 'world'}

print(d.get('hello', 'default_value'))  # prints 'world'
print(d.get('thingy', 'default_value'))  # prints 'default_value'

# Or:
if 'hello' in d:
    print(d['hello'])
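Two related behaviours are worth knowing: `d.get(key)` without a default returns `None` for a missing key, whereas the subscript form `d[key]` raises `KeyError`. A small sketch using the same dictionary:

```python
d = {'hello': 'world'}

print(d.get('thingy'))  # None: no default was given

try:
    d['thingy']
except KeyError as error:
    print('KeyError:', error)  # KeyError: 'thingy'
```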

### List manipulation 
Consider using [<b>generator expressions</b>](https://docs.python.org/3/tutorial/classes.html#generator-expressions) instead of <b>list comprehensions</b> to avoid creating needless copies of data in memory. Use list comprehensions when you _really_ need a second list, for example if you need to use the result multiple times. 

In [None]:
# Bad: needlessly allocates a list of all (gpa, name) entries in memory
valedictorian = max([(student.gpa, student.name) for student in graduates])
# Good: the generator expression yields one tuple at a time
valedictorian = max((student.gpa, student.name) for student in graduates)
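The trade-off is that a generator expression can be consumed only once. The sketch below uses a made-up `graduates` list of `namedtuple` records (not part of the original example) to show both the single-pass behaviour and when a list is the better choice:

```python
from collections import namedtuple

# Hypothetical data, just to make the example runnable.
Student = namedtuple('Student', ['name', 'gpa'])
graduates = [Student('Ada', 3.9), Student('Grace', 3.8), Student('Alan', 3.7)]

scores = ((student.gpa, student.name) for student in graduates)
print(max(scores))  # (3.9, 'Ada')
print(list(scores))  # []: the generator is already exhausted

# Need the results more than once? Build a list.
score_list = [(student.gpa, student.name) for student in graduates]
print(max(score_list), min(score_list))  # (3.9, 'Ada') (3.7, 'Alan')
```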

In [7]:
# another example 

# bad: builds a throwaway list of None values just for the side effect
[print(x) for x in sequence]

# good: a plain loop for side effects
for x in sequence:
    print(x)

### Filtering a list 
Never remove items from a list while you are iterating through it.
Don’t make multiple passes through the list.

In [None]:
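# Goal: remove the elements greater than 4

# Bad: modifying a list while iterating over it can skip elements
a = [3, 4, 5]
for i in a:
    if i > 4:
        a.remove(i)

# Bad: each `in` test and each remove() is another pass through the list
while i in a:
    a.remove(i)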
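Why is the first loop unsafe? Removing an element shifts the remaining items one place to the left while the loop's internal index keeps advancing, so the item right after a removed one is never checked. With a different, made-up input where two large values sit next to each other, the loop leaves one behind:

```python
a = [5, 6, 3]  # made-up input with two values > 4 side by side
for i in a:
    if i > 4:
        a.remove(i)

print(a)  # [6, 3]: 6 was skipped and never checked
```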

In [None]:
# Good
a = [3, 4, 5]

# a list comprehension creates a new list object
filtered_values = [value for value in a if value <= 4]

# a generator expression doesn't build another list
filtered_values = (value for value in a if value <= 4)

Modifying the original list can be risky if other variables reference it. But if you really want to do that, you can use slice assignment.

In [None]:
# replace the contents of the original list
a[:] = [value for value in a if value <= 4]
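The difference between rebinding a name and assigning to a slice shows up as soon as a second name refers to the same list. In this sketch, `alias` is a hypothetical stand-in for any other reference, such as one held by a caller:

```python
values = [3, 4, 5]
alias = values  # hypothetical second reference to the same list

values = [v for v in values if v <= 4]  # rebinding: alias still sees the old list
print(alias)  # [3, 4, 5]

values = alias  # point both names at the same list again
values[:] = [v for v in values if v <= 4]  # slice assignment: mutates in place
print(alias)  # [3, 4]
```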

### Modifying the values in a list 
Remember that assignment never creates a new object. If two or more variables refer to the same list, changing one of them changes them all.

In [None]:
# Add three to all list members.

# bad 
a = [3, 4, 5]
b = a                     # a and b refer to the same list object

for i in range(len(a)):
    a[i] += 3             # b[i] also changes
    
# good 
# safer to create a new list object and leave the original alone.
a = [3, 4, 5]
b = a

# assign the variable "a" to a new list without changing "b"
a = [i + 3 for i in a]

Use `enumerate()` to keep a count of your place in the list. It is more readable than handling a counter manually, and it works with any iterable, including iterators that support neither `len()` nor indexing.

In [None]:
# good 
a = [3, 4, 5]
for i, item in enumerate(a):
    print(i, item)
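A short sketch of `enumerate()` on a generator, using its optional `start` argument to count from 1 instead of 0:

```python
squares = (n * n for n in range(3))  # a generator: no len(), no indexing

for position, value in enumerate(squares, start=1):
    print(position, value)
# 1 0
# 2 1
# 3 4
```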

### Reading files
Use the `with open(...)` syntax to read from files. The `with` statement closes the file for you automatically, even if an exception is raised inside the `with` block.

In [None]:
# Bad:
f = open('file.txt')
a = f.read()
print(a)
f.close()

# Good:
with open('file.txt') as f:
    for line in f:
        print(line, end='')  # each line already ends with a newline
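Roughly speaking, a `with` block behaves like the `try`/`finally` pattern below: the file is closed on the way out whether or not an exception occurred. This sketch writes to a temporary file created with the standard-library `tempfile` module (so it does not depend on a `file.txt` existing) and raises an exception on purpose to show that the file still ends up closed.

```python
import os
import tempfile

# Create a throwaway file to read from.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('first line\nsecond line\n')

# Manual equivalent of `with open(path) as f: ...`
f = open(path)
try:
    first = f.readline()
finally:
    f.close()

# With `with`, the file is closed even though an exception escapes the block.
try:
    with open(path) as g:
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

print(repr(first), f.closed, g.closed)  # 'first line\n' True True
os.remove(path)
```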

# Debugging

## Using the built-in debugger `pdb`
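`import pdb` <br>
`pdb.set_trace()` <br>
Since Python 3.7, calling the built-in `breakpoint()` does the same thing without the import. <br>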
https://codeburst.io/how-i-use-python-debugger-to-fix-code-279f11f75866?gi=2f993eba8d3e#:~:text=Python%20has%20a%20built%2Din,debugger%20with%20features%20from%20IPython.

# Works consulted  
- https://data-flair.training/blogs/python-best-practices/
- https://docs.python-guide.org/writing/style/
- https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it
- https://codesource.io/python-coding-practices/

# Further resources
- https://realpython.com/tutorials/best-practices/