# Good Coding Practices

This is a small guide on *how-to-code*. Although it's quite succinct, there are a few things you should know:
 * Remember that the *you* from two months ago does not reply to emails about what this function does and what that variable is.
 * Don't try to be a perfectionist when first learning a new language (there is no need to spend weeks going through documentation).
 * These are *good enough* practices to ease your code maintenance and collaboration. You'll have your whole career to find out the *best* practices (if they even exist...).
 * Follow the KISS principle (Keep it simple, *s’il vous plaît*).
 * The goal of coding is not to code faster but to write maintainable code that can be understood and modified more easily in the future, potentially by developers other than you.
 * Consistency is king. Even if your approach is not the best, by keeping it consistent people will be able to understand and deal with your code much more easily.

## 1. Coding Style

Especially if you are a beginner, a good text editor (or IDE) can help you a lot. PyCharm, for example, gives you easy access to code completion and to the documentation of methods. Beyond that, the way you structure your code should follow certain guidelines. PEP 8 is the official style guide for Python code. In a nutshell, it suggests:

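 * Indent with spaces (PEP 8 recommends 4 per indentation level). Tabs are tolerated, but whichever you choose, be consistent:
 
 ```python
 def space_indented():
     pass
 ```
 **Note:** Although using tabs is not the end of the world, mixing tabs and spaces might very well be: Python 3 rejects indentation that mixes them inconsistently with a `TabError`.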
 
 * Surround top-level function and class definitions with two blank lines, and method definitions inside a class with one blank line:
 
 ```python
 # CODE ABOVE


 class MyClass:

     def __init__(self):
         pass

     def method(self):
         pass


 # CODE BELOW
 ```
 
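 * Imports should go at the top of the file, with one module per `import` line (importing several names with `from ... import` is fine). PEP 8 also suggests grouping them as standard library, third-party, then local imports; within a group, go in increasing order of specificity: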
 
 ```python
 import library
 import library as lb
 from library import module
 from library import module_one, module_two
 ```
 
 * The choice of string delimiter doesn't matter, except that switching delimiters lets you avoid escaping quotes with a backslash (`\`):
 
 ```python
 print('Single quotes look nice.')
 print("So do double quotes.")
 print('But backslashes in the middle of code don\'t.')
 print("Switching delimiters avoids them: don't.")
 ```
 
 * Whitespace is good when used properly (e.g. after a comma) but should be avoided where it is unnecessary, such as just inside brackets:
 
 ```python
 # Good practice
 spam(ham[1], {eggs: 2})
 
 # Bad practice
 spam( ham[ 1 ], { eggs: 2 } )
 ```
 
 * Do not inline *if*, *for* and *while*:
 
 ```python
 # Good practice
 if foo == 'blah':
     do_blah_thing()
 else:
     do_non_blah_thing()

 # Bad practice
 if foo == 'blah': do_blah_thing()
 else: do_non_blah_thing()
 ```
 
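 * Comment when the comment adds information (typically *why* the code does something) rather than restating *what* it does:
 
 ```python
 # Good practice
 x = x + 1

 # Bad practice
 x = x + 1  # Increment by one

 # Better: explain the reason
 x = x + 1  # Skip the header row
 ```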
 
 * Naming should obey these simple patterns:
 
 ```python
 # Variables should be named in lower case, with words separated by underscores
 variable_name = None

 # Functions should be named like variables
 def function_name():
     pass

 # Classes should be named in CamelCase
 class ClassName:
     pass

 # Constants should be named in all caps
 CONSTANT_NAME = None
 ```
 
 * Stay away from global variables; pass values as parameters instead (with default values where that helps):
 
 ```python
 # Good practice
 def inc(x, step=1):
     return x + step

 # Bad practice
 step = 1
 def inc(x):
     return x + step
 ```
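One pitfall when relying on default parameter values: a default is evaluated once, when the function is defined, so a mutable default such as a list is shared between all calls. A minimal sketch of the usual workaround, using a hypothetical `add_task` function:

```python
def add_task(task, todo=None):
    """Append `task` to `todo`, creating a new list when none is given."""
    # A default like todo=[] would be created once and shared by every call,
    # so use None as the default and build a fresh list inside the function.
    if todo is None:
        todo = []
    todo.append(task)
    return todo


print(add_task('Homework 1'))  # ['Homework 1']
print(add_task('Homework 2'))  # ['Homework 2'], not ['Homework 1', 'Homework 2']
```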
 
 * You should avoid using absolute file paths and use relative paths instead:
 
 ```python
 # Good practice
 with open('./files/info.txt', 'r') as file:
     print(file.read())

 # Bad practice
 with open('/home/user/Documents/files/info.txt', 'r') as file:
     print(file.read())
 ```
 **Note:** `./` refers to the current directory and `../` to its parent directory.
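Keep in mind that relative paths are resolved against the current working directory (the folder you launched Python from), which is not necessarily the folder your script lives in. A minimal sketch that anchors paths at the script's own folder instead, assuming the `files/info.txt` layout from the example above; `BASE_DIR` and `project_path` are hypothetical names:

```python
from pathlib import Path

# Folder of the running script. Jupyter notebooks define no __file__,
# so fall back to the current working directory there.
_script = globals().get('__file__')
BASE_DIR = Path(_script).resolve().parent if _script else Path.cwd()


def project_path(*parts):
    """Return an absolute path made of BASE_DIR followed by `parts`."""
    return BASE_DIR.joinpath(*parts)


info_file = project_path('files', 'info.txt')
print(info_file)  # an absolute path ending in files/info.txt
```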
 

## 2. Python Tips
 
 * The `collections` module is your friend when using dictionaries:
 
 ```python
 import collections

 # Without collections
 todo_list = {}
 if 'ADA' not in todo_list:
     todo_list['ADA'] = []
 todo_list['ADA'].append('Homework 2')

 # With collections
 todo_list = collections.defaultdict(list)
 todo_list['ADA'].append('Homework 2')
 ```
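`defaultdict` pays off most when grouping many items at once. A small sketch with made-up (course, task) pairs:

```python
import collections

# Made-up (course, task) pairs, for illustration only
tasks = [('ADA', 'Homework 1'), ('ML', 'Lab 3'), ('ADA', 'Homework 2')]

todo_list = collections.defaultdict(list)
for course, task in tasks:
    # A missing key is created on first access, holding an empty list
    todo_list[course].append(task)

print(dict(todo_list))
# {'ADA': ['Homework 1', 'Homework 2'], 'ML': ['Lab 3']}
```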
 
 * `collections.Counter` is also your friend for counting multiple objects:
 
 ```python 
 import collections

 # Without collections
 counter = {}
 counter['apples'] = 0
 counter['oranges'] = 0
 counter['apples'] += 1
 counter['oranges'] += 1

 # With collections: missing keys start at 0
 counter = collections.Counter()
 counter['apples'] += 1
 counter['oranges'] += 1
 ```
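A `Counter` can also be built straight from an iterable, and `most_common` returns the most frequent items. A short sketch with made-up data:

```python
import collections

# Made-up basket contents
basket = ['apples', 'oranges', 'apples', 'pears', 'apples', 'oranges']

# Count every element of the iterable, no explicit loop needed
counter = collections.Counter(basket)

print(counter['apples'])       # 3
print(counter['bananas'])      # 0: missing keys count as zero (and are not added)
print(counter.most_common(2))  # [('apples', 3), ('oranges', 2)]
```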
 
 * Open files with a `with` statement:
 
 ```python
 # Good practice
 with open('file', 'r') as file:
     print(file.read())

 # Bad practice: equivalent, but more verbose and easier to get wrong
 file = open('file', 'r')
 try:
     print(file.read())
 finally:
     file.close()
 ```
 
 **Note:** Using `with open(...) as ...` automatically closes the file when the block finishes, even if an exception is raised inside it.
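To see that the file is closed even when something goes wrong inside the block, here is a small check on a throwaway temporary file (the `ValueError` is raised on purpose):

```python
import os
import tempfile

# Create a throwaway file to open
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

try:
    with open(path, 'r') as file:
        raise ValueError('something went wrong while reading')
except ValueError:
    pass

print(file.closed)  # True: the with block closed the file despite the exception
os.remove(path)
```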
 
 * You can use `pickle` to save complex objects to disk and load them back later (saving time when you repeat the same expensive operations):
 
 ```python
 import pickle

 def load_pickle(file_path):
     with open(file_path, 'rb') as file:
         return pickle.load(file)

 def save_pickle(result, file_path='pickle'):
     with open(file_path, 'wb') as file:
         pickle.dump(result, file)

 def very_complex_operation():
     try:
         return load_pickle('pickle')
     except (FileNotFoundError, EOFError):
         # Stand-in for a 30-minute computation
         result = sum(i * i for i in range(10 ** 6))
         save_pickle(result)
         return result

 print(very_complex_operation())  # First call: computes and saves the result
 print(very_complex_operation())  # Later calls: just load the saved result
 ```
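The load-or-compute pattern above can be wrapped in a reusable helper. A minimal sketch; `cached` is a hypothetical name. As with any pickle, only load cache files you created yourself, since unpickling untrusted data can execute arbitrary code:

```python
import os
import pickle


def cached(file_path, compute):
    """Return the object pickled in `file_path`, computing and saving it first if needed.

    `compute` is any zero-argument function producing the (slow) result.
    """
    if os.path.exists(file_path):
        with open(file_path, 'rb') as file:
            return pickle.load(file)
    result = compute()
    with open(file_path, 'wb') as file:
        pickle.dump(result, file)
    return result


# The lambda only runs when squares.pickle does not exist yet
squares = cached('squares.pickle', lambda: [i * i for i in range(10)])
```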
 
 * String formatting is easy if you know how to do it:
 
 ```python
 num_of_apples = 3

 # This will convert the integer to a string automatically before printing.
 print(num_of_apples)

 # This, however, will not: concatenating str and int raises a TypeError.
 # print("There are " + num_of_apples + " apples on the table.")

 # You can convert the integers to strings manually, but it's bad practice.
 print("There are " + str(num_of_apples) + " apples on the table.")

 # Instead, we recommend formatting strings using %
 print("There are %d apples on the table." % num_of_apples)
 ```

 **Note:** Use %d for integers, %f for floats (or %.**x**f for a precision of **x** decimal places), and %s for strings (or any object with a string representation).
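Python 3.6 (the version used in this guide's examples) introduced f-strings, an alternative to `%` formatting that many find more readable: the expression goes straight inside the braces, and format specifiers such as `.2f` follow a colon. A short sketch with made-up values:

```python
num_of_apples = 3
price = 0.5

# The expression inside {} is evaluated and converted to a string
sentence = f"There are {num_of_apples} apples on the table."

# Format specifiers work after a colon, like %.2f
price_tag = f"Each apple costs {price:.2f}."

print(sentence)   # There are 3 apples on the table.
print(price_tag)  # Each apple costs 0.50.
```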

* Define configuration-related variables at the top of your modules or scripts. This is a data science class, so you are going to process lots of data by reusing the same code, and keeping the configuration in one place lets you re-configure it easily:

 ```python
 import pandas as pd
 
 DATA_PATH = './data/'
 DEFAULT_ENCODING = 'UTF8'
 DEFAULT_COMPRESSION = 'gzip'
 
 # CODE BELOW
 ```
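One way to make these constants pay off is to use them as default arguments: every function picks up a change to the configuration, while a single call can still override it. A sketch using only the standard library (not pandas); `read_rows` is a hypothetical helper for gzip-compressed CSV files:

```python
import csv
import gzip
import os

DATA_PATH = './data/'
DEFAULT_ENCODING = 'UTF8'


def read_rows(filename, path=DATA_PATH, encoding=DEFAULT_ENCODING):
    """Read a gzip-compressed CSV file from `path` into a list of rows."""
    full_path = os.path.join(path, filename)
    # 'rt' opens the gzip file in text mode so csv.reader receives strings
    with gzip.open(full_path, 'rt', encoding=encoding, newline='') as file:
        return list(csv.reader(file))
```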
 
 * Also, you should avoid manually loading every single file you need to process:
 
 ```python
 # Looping through every file in a directory
 from os import listdir
 from os.path import join

 DATA_PATH = './data/'

 def do_something(file_path):
     # Placeholder for your own processing
     print(file_path)

 def process_data(path=DATA_PATH):
     for file in listdir(path):
         do_something(join(path, file))

 # Providing the target files through the command line
 from argparse import ArgumentParser

 parser = ArgumentParser()
 parser.add_argument("--filename", help="Name of the file to process", type=str)
 args = parser.parse_args()

 print(args.filename)
 ```
 
 **Note:** An in-depth guide on command-line parsing can be found [here](https://docs.python.org/3/howto/argparse.html).
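Two variations you may find handy, sketched under the assumption that your data files are CSVs: `pathlib.Path.glob` loops over only the files matching a pattern, and `nargs='+'` lets a single option take several file names. The `--filenames` option and `list_data_files` are hypothetical names:

```python
from argparse import ArgumentParser
from pathlib import Path

DATA_PATH = './data/'


def list_data_files(path=DATA_PATH, pattern='*.csv'):
    """Return the files in `path` whose names match `pattern`, sorted."""
    return sorted(Path(path).glob(pattern))


def parse_args(argv=None):
    """Parse command-line options (argv=None means use sys.argv)."""
    parser = ArgumentParser(description='Process one or more data files.')
    # nargs='+' collects one or more values into a list
    parser.add_argument('--filenames', nargs='+', default=[],
                        help='Names of the files to process')
    return parser.parse_args(argv)
```

Called as `python process.py --filenames jan.csv feb.csv` (script name hypothetical), `parse_args().filenames` would be `['jan.csv', 'feb.csv']`.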
 

## 3. Code Organization

This part comes naturally with practice, so here we just give you a few pointers to get you started:
 
* Every *.py* file should consist only of imports, constants, functions and/or classes at its top level. Any executable code should go under the `if __name__ == '__main__':` guard, which runs when the file is executed as a script but not when it is imported:
 
 ```python
 import math

 def factorial(n):
     return math.factorial(n)

 if __name__ == '__main__':
     print(factorial(3))
 ```
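A common refinement of the example above, sketched below, is to put the script's work in a `main()` function and have the guard only call it; this keeps variables such as `result` out of the module's global scope:

```python
import math


def factorial(n):
    return math.factorial(n)


def main():
    # Everything the script does when run directly goes here
    result = factorial(3)
    print(result)


if __name__ == '__main__':
    main()
```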
 
* Use docstrings to document your code. Besides letting text editors and IDEs show you this information later, they make your code easier to understand and maintain:
 
 ```python
 '''
 File name: test.py
 Author: ADA
 Date created: 03/10/2018
 Date last modified: 01/11/2019
 Python Version: 3.6
 '''
 import math

 def factorial(n):
     '''
     Calculate the factorial of a number.
     :param n: non-negative number
     :type n: int
     :return: n!
     :rtype: int
     '''
     return math.factorial(n)

 if __name__ == '__main__':
     print(factorial(3))
 ```
 **Note:** Docstrings help your IDE recognize the type of an argument (in this example, PyCharm will recognize that `n` is an `int` from the `:type n: int` line).
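Another way to tell your IDE (and your readers) about types is a type annotation, available since Python 3.5. A sketch of the same function annotated; note that Python does not enforce annotations at runtime:

```python
import math


def factorial(n: int) -> int:
    """Calculate the factorial of a non-negative integer n."""
    return math.factorial(n)


print(factorial.__annotations__)  # {'n': <class 'int'>, 'return': <class 'int'>}
```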
 
* Keep code with different purposes in separate *.py* files (and import them as needed):
 
 ```python
 # simple_plot.py
 def plot(x):
     raise NotImplementedError

 # complex_math.py
 def factorial(n):
     raise NotImplementedError

 # main.py
 from simple_plot import plot
 from complex_math import factorial

 if __name__ == '__main__':
     results = [factorial(i) for i in range(3)]
     plot(results)
 ```
 
* Use top-down design to decide how to split your code into modules (or into parts of a single module):
 1. Define your problem.
 2. Define the necessary tasks (fetch data, preprocess, etc.) and enumerate them (1, 2, 3, ...).
 3. Decompose each task into smaller subtasks and enumerate them again (1.1, 1.2, ...).

 **Note:** You can also use this hierarchy in file comments or file names (e.g. *1-fetch_data.py* and *2-preprocess_data.py*) to keep them organized. Such names work for scripts you run directly, but a module name that starts with a digit or contains a hyphen cannot be used in an `import` statement.
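The numbered decomposition can map directly onto functions: each task becomes a function, and `main()` reads like the plan itself. A toy sketch; the task names and the data are hypothetical:

```python
def fetch_data():
    """1. Fetch data (here: a hard-coded stand-in for a download)."""
    return ['3', '1', None, '2']


def drop_missing(values):
    """2.1 Remove missing entries."""
    return [v for v in values if v is not None]


def to_numbers(values):
    """2.2 Convert strings to integers."""
    return [int(v) for v in values]


def preprocess(values):
    """2. Preprocess, composed from its subtasks."""
    return to_numbers(drop_missing(values))


def analyze(numbers):
    """3. Analyze: here, just the sorted values and their sum."""
    return sorted(numbers), sum(numbers)


def main():
    return analyze(preprocess(fetch_data()))


if __name__ == '__main__':
    print(main())  # ([1, 2, 3], 6)
```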

* Folder organization:

| **Directory** | **Purpose** |
| :-- | :-- |
| /doc/ | For text documents |
| /src/ | For source code |
| /data/ | For raw data |
| /generated/ | For manipulated data |
| /temp/ | For temporary files |
| /results/ | For results |
| requirements.txt | For the list of third-party libraries to install |
| readme.md | For *how to run* and examples |

 **Note:** Although you can experiment with different folders, subfolders or folder names, it's strongly advised to keep *Data Manipulation* (i.e. generating schemas from raw data) and *Data Analysis* (everything else, which does not work on raw data) in SEPARATE files.

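## 4. ADA-Specific Tips
 
 * When working with larger amounts of data (especially for the project), test your code on small chunks of it first:
 
 ```python
 import pandas as pd

 def dubious_analysis(df):
     raise NotImplementedError

 # 'large_file.csv' is a placeholder; nrows loads only the first rows
 df = pd.read_csv('large_file.csv', nrows=10000)

 # head() takes the first rows; df.sample(n=100, random_state=0) draws random ones
 dubious_analysis(df.head(100))
 ```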
 
 * You can easily pickle DataFrames, and load them back, with one-liners:
 
 ```python
 df.to_pickle('path')
 df = pd.read_pickle('path')
 ```
 
 * Use databases to keep persistent data (with a three-liner):
 
 ```python
 import pandas as pd
 from sqlalchemy import create_engine

 # Placeholder: replace with your database URL, e.g. 'sqlite:///my_data.db'
 engine = create_engine('connection string')
 df = pd.read_sql('events', con=engine)
 ```
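If you don't have a database server at hand, Python's built-in `sqlite3` module keeps a whole database in a single file, which is enough for many projects (pandas' `read_sql` also accepts a `sqlite3` connection for SQL queries). A minimal standard-library sketch; the `events` table, its columns, and both helper names are made up:

```python
import sqlite3


def save_events(db_path, events):
    """Append (name, day) rows to an `events` table, creating the table if needed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                'CREATE TABLE IF NOT EXISTS events (name TEXT, day INTEGER)')
            conn.executemany('INSERT INTO events VALUES (?, ?)', events)
    finally:
        conn.close()


def load_events(db_path):
    """Return all rows of the `events` table as (name, day) tuples, ordered by day."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            'SELECT name, day FROM events ORDER BY day').fetchall()
    finally:
        conn.close()
```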
 
 * When you start considering more scalable tools (which you will hear about in a couple of weeks), always ask whether the volume of your data justifies your choice. Everything comes at a cost.

## 5. Modularity
 
 Last but definitely not least: examples! Those of you who read between the lines may find this obvious, but we still want to make sure everyone remembers these practices. To simplify your code (and your teammates' lives when reading it), we will go over a simple example that you can draw from in the near future.
 
 Imagine you have flight departure data from several airports for the month of January. You want to create an application that shows a single airport's flights for a given day, sorted by departure time (much like the departure board at an actual airport). The data's columns are very simple: 
 
     airport_id, flight_id, dest_id, day, hour
 
 Your first approach might be something like this:
 
 ```python
import pandas as pd
 
# Read data from disk and clean it
df = pd.read_csv('flight_data.csv.zip', compression='zip').dropna()
df['day'] = df['day'].astype(int)
df['hour'] = df['hour'].astype(int)
df['dest_id'] = df['dest_id'].astype(str)
df['flight_id'] = df['flight_id'].astype(int)
df['airport_id'] = df['airport_id'].astype(int)

day = 12
airport_id = 13

# Getting sorted flights for that day
flights = df[(df['day'] == day) & (df['airport_id'] == airport_id)]
flights = flights.sort_values('hour')[['flight_id', 'dest_id']]
 ```
 
 But as we know you've been paying attention to this tutorial, you'll quickly come up with something better:
 
 ```python
 '''
 File name: test.py
 Author: ADA
 Date created: 03/10/2018
 Date last modified: 01/11/2019
 Python Version: 3.6
 '''
 import pandas as pd
 
 FLIGHT_DATA = 'flight_data.csv.zip'
 COMPRESSION = 'zip'
 
 def format_attr(dataframe):
     '''
     Create a new dataframe with all attributes
     formatted according to the flight datasets'
     documentation (at https://....).
     :param dataframe: pandas.DataFrame
     :return: new dataframe
     '''
     formatted_df = pd.DataFrame()
     formatted_df['day'] = dataframe['day'].astype(int)
     formatted_df['hour'] = dataframe['hour'].astype(int)
     formatted_df['dest_id'] = dataframe['dest_id'].astype(str)
     formatted_df['flight_id'] = dataframe['flight_id'].astype(int)
     formatted_df['airport_id'] = dataframe['airport_id'].astype(int)
     return formatted_df
 
 if __name__ == '__main__':
     day = 12
     airport_id = 13
     
     # Read data from disk and clean it
     df = pd.read_csv(FLIGHT_DATA, compression=COMPRESSION)
     df = df.dropna()
     df = format_attr(df)
     
     # Getting flights for that day
     flights = df[(df['day'] == day) & (df['airport_id'] == airport_id)]
     flights = flights.sort_values('hour')[['flight_id', 'dest_id']]
 ```

This may still not be enough. There are several things you can do at this point:
* Do a for loop over the columns for casting their types, avoiding multiple lines of essentially the same code.
* Put the helper functions in a different file. This will increase modularity and code reusability, and help you structure your project better.
* If you feel like diving into software engineering, create a class that contains both the data and the logic around the data. An object-oriented design might be overkill for this case, but for a large project, it could be very useful in logically organising your code and orienting your thinking (but make sure you are familiar with object-oriented development before putting everything into classes!).
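The first suggestion could look like this: a single dictionary maps each column to its type, and one loop does the casting. A sketch, assuming the column names from the flight example above; `COLUMN_TYPES` is a hypothetical name (pandas' `DataFrame.astype` also accepts such a dictionary directly):

```python
import pandas as pd

# Target type of each column, kept in one place (hypothetical constant name)
COLUMN_TYPES = {
    'day': int,
    'hour': int,
    'dest_id': str,
    'flight_id': int,
    'airport_id': int,
}


def format_attr(dataframe):
    """Return a new dataframe with each column cast to its type in COLUMN_TYPES."""
    formatted_df = pd.DataFrame(index=dataframe.index)
    for column, column_type in COLUMN_TYPES.items():
        formatted_df[column] = dataframe[column].astype(column_type)
    # Equivalent one-liner: dataframe[list(COLUMN_TYPES)].astype(COLUMN_TYPES)
    return formatted_df
```

Adding or retyping a column now means editing one line of `COLUMN_TYPES` instead of the function body.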

## 6. Summary
* Write modular, reusable code. Make use of classes, different files, and packages to organise your code.
* Be aware of libraries that could make your life easier, and also of Pythonic ways of doing things.
* Test your code (we didn't go into unit tests because this isn't a Python course) and use small data samples to verify its correctness.
* Save intermediate results to make new analyses easier and faster.