# Why do we care about programming style?
Always assume that your code will be reused, even if you think it's for a one-off task. 

You want to make your code easy for others to use and understand (including your future self, who won't remember what you did!).

# Comments & docstrings

You've seen that we can comment code with the hash symbol (#) to define variables, explain logic, etc.


In [30]:
# Count number of codons in DNA string
DNA = "ACTGTCACTCTGTCAAACTCT"
num = len(DNA) / 3
print(num)

if not num.is_integer():  # Check that the length of DNA is a multiple of 3
    print("Something went wrong")

2.6666666666666665
Something went wrong


Comments are really useful, except when they're not. Nobody likes an uninformative comment! They just clutter up your code.

In [None]:
x = 5
y = 7

# Add x and y
x + y
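By contrast, a useful comment explains why the code does something, or records an assumption the code alone can't show. A minimal sketch of the difference (the names `CODON_LENGTH` and `trailing_bases` are made up for illustration):

```python
DNA = "ACTGTCACTCTGTCAAACTCT"

# Uninformative: restates what the code already says
# Divide the length of DNA by 3
num = len(DNA) / 3

# Informative: explains why the number 3 matters
# A codon is three bases long, so any leftover bases
# mean the final codon is incomplete
CODON_LENGTH = 3
trailing_bases = len(DNA) % CODON_LENGTH
print(trailing_bases)
```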

Comments can also help you sketch out pseudocode/overall logic of your code.

In [None]:
if num == 0:
  # Print an error message because the string is empty
  pass
else:
  # Do the other analysis we want to do
  pass

We can also use docstrings (portmanteau of documentation strings) for functions.

Docstrings are (potentially) multi-line strings encased in triple quotes and located at the start of a function.

In [8]:
def count(my_DNA):
  """Counts number of codons in my_DNA."""
  n = len(my_DNA) / 3
  return n

count(DNA)

3.0

The nice thing about docstrings is that you can fetch the docstring of an unfamiliar function with the built-in *help* function.

In [10]:
help(count)

Help on function count in module __main__:

count(my_DNA)
    Counts number of codons in my_DNA.
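Docstrings can also span several lines to describe arguments and return values. The sketch below uses one common layout (an `Args:` and `Returns:` section); the layout and the function name `count_complete_codons` are assumptions for illustration, not something this notebook prescribes. It also shows that the docstring is stored on the function as `__doc__`, which is where *help* finds it.

```python
def count_complete_codons(my_DNA):
    """Count the number of complete codons in a DNA string.

    Args:
        my_DNA: A string of bases, e.g. "ACTGTC".

    Returns:
        The number of complete three-base codons, as an integer.
    """
    # Integer division drops any leftover bases
    return len(my_DNA) // 3


# The docstring is stored on the function itself
print(count_complete_codons.__doc__)
print(count_complete_codons("ACTGTC"))
```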



# Variable conventions
Variable and function names should be descriptive.

In [33]:
# A better-named function
def count_codons(my_DNA):
  """Counts number of codons in my_DNA."""
  num_codons = len(my_DNA) / 3
  return num_codons

This goes without saying, but names should *not* be confusing. A good way to avoid confusion is to use different variable names inside a function than in the global environment. The example below reuses n for both, which makes it hard to tell which value is meant.

In [12]:
n = 10

# Confusing: the argument n hides the global n inside the function
def add_ten(n):
  return n + 10

add_ten(5)

15
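In the cell above, the argument `n` shadows the global `n`, so it is easy to misread which value the function uses. A sketch of the same code with distinct names (`num_samples` and `value` are hypothetical names chosen for illustration):

```python
num_samples = 10  # global variable


def add_ten(value):
    """Return value plus 10."""
    return value + 10


print(add_ten(5))   # uses the argument 5, not num_samples
print(num_samples)  # the global is unchanged
```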

There are different styles of variable names, and preference differs by programming language.

In [25]:
# In Python, snake_case is common
first_name = "Roshni"
last_name = "Patel"

# In other languages (e.g. Java) and for certain use cases in Python, 
# people really like camelCase
firstName = "Ramya"
lastName = "Rangan"

# Pick one style and stick with it; mixing both gives you two different names
first_name != firstName

True
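For Python specifically, PEP 8 recommends snake_case for variables and functions and upper-case names with underscores for constants. A short sketch of those conventions; all names here are invented for illustration:

```python
CODON_LENGTH = 3  # constant: UPPER_CASE_WITH_UNDERSCORES


def codon_count(dna_sequence):  # function and argument: snake_case
    """Return the number of complete codons in dna_sequence."""
    return len(dna_sequence) // CODON_LENGTH


my_sequence = "ACTGTC"  # variable: snake_case
print(codon_count(my_sequence))
```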

If you use a quantity multiple times, save it as a variable. (Conversely, if you don't use something, don't create a variable for it!)

In [35]:
# Let's say I counted the number of codons present
count_codons(DNA)

12.666666666666666

In [None]:
# And now I want to add 10 to that number
add_ten(count_codons(DNA))

In [None]:
# It would be much better to save the initial value
num_codons = count_codons(DNA)
add_ten(num_codons)

# Writing modular code
* Any tasks within your code that are performed more than once should go inside a function
* That being said, each function should perform a specific task -- not 20 different tasks
* You can write helper functions (or functions within functions) to increase the modularity of your code

In [21]:
def GC_proportion(my_DNA):
  """Returns the proportion of bases in my_DNA that are G or C."""
  num_GC = 0
  for base in my_DNA:
    if base == 'C' or base == 'G':
      num_GC += 1
  prop_GC = num_GC / len(my_DNA)
  return prop_GC

In [22]:
GC_proportion("ACTGTGTCGCTAGC")

0.5714285714285714

In [23]:
GC_proportion("CGCCGGGCCCGCGC")

1.0

In [24]:
# Example of how to use helper functions
def GC_proportion(my_DNA):
  def is_GC(letter):
    return letter == 'C' or letter == 'G'

  num_GC = 0

  for base in my_DNA:
    if is_GC(base):
      num_GC += 1

  prop_GC = num_GC / len(my_DNA)
  return prop_GC
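Nesting `is_GC` inside `GC_proportion` keeps the helper private to that one function. If other functions need the same check, one option is to define the helper at the top level so it can be shared. A sketch under that assumption (`GC_count` is a hypothetical second function, not part of the original notebook):

```python
def is_GC(letter):
    """Return True if letter is a C or a G."""
    return letter in ("C", "G")


def GC_count(my_DNA):
    """Return the number of C and G bases in my_DNA."""
    return sum(1 for base in my_DNA if is_GC(base))


def GC_proportion(my_DNA):
    """Return the fraction of bases in my_DNA that are C or G."""
    return GC_count(my_DNA) / len(my_DNA)


print(GC_count("ACTGTGTCGCTAGC"))
print(GC_proportion("CGCCGGGCCCGCGC"))
```

Each function now does one job, and `GC_proportion` reads almost like its own description.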

# Where is this all coming from, anyway?
PEP 8 is the official style guide for Python code, and it has recommendations on many conventions. So far we covered:
* Comments and docstrings to document your code and thought process
* Descriptive variable and function names
* Modular code

Other important things:
* Lines that are not too long (PEP 8 sets a limit of 79 characters)
* Consistent indentation (PEP 8 recommends 4 spaces per level)
* Whitespace between functions, but not too much
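As an example of the line-length guideline, a long call can be wrapped inside its parentheses, where Python allows line breaks without any extra syntax. A sketch with made-up values:

```python
prop_A = 0.25
prop_T = 0.25

# Wrapped inside the parentheses, each line stays short
print(
    "Our string is", prop_A, "adenine and",
    prop_T, "thymine.",
)

# Adjacent string literals inside parentheses are joined,
# so a long message can be split the same way
message = (
    f"Our string is {prop_A} adenine "
    f"and {prop_T} thymine."
)
print(message)
```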

# Style exercise
What is the code below doing? What can you change to make it more readable and user-friendly?

In [None]:
new_s = ""
for x in s:
  a = x
  if x=="A": # First we are going to check if x is a part of s and if it is we will update new_s
    new_s+="T"
  elif x=="T": # Next we check if x is equal to T
    new_s+="A"
  elif x=="G": # Next we check if x is equal to G
    new_s+="C"
  else: # At the end we assume that if x is not equal to A, T, or G, then it is equal to C
    new_s+="G"
  x = "T"
print(new_s)

In [36]:
num_A = 0

num_T = 0

num_C = 0

num_G = 0

for base in DNA:
  if base == "A":
    num_A += 1


  if base == "C":
    num_C += 1 


  if base == "T":
    num_T += 1

    
  if base == "G":
    num_G += 1

prop_A = num_A / len(DNA)

prop_C = num_C / len(DNA)

prop_T = num_T / len(DNA)

prop_G = num_G / len(DNA)

print("our string is ", prop_A, "adenine,", prop_T, "thymine,", prop_C, "cytosine, and", prop_G, "guanine.")

our string is  0.25 adenine, 0.25 thymine, 0.375 cytosine, and 0.125 guanine.
