# Notebook 2: Data types

This notebook will introduce basic Python data types and data structures:
-  Numbers (integers, floats, bools)
-  Strings
-  Lists and Tuples
-  Dictionaries

Six exercises are given (in two groups of three).

In [None]:
# Import the usual stuff first
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Numbers

In [None]:
# Integers
radius = 2
print('radius =', radius)
print('radius has type:', type(radius))

In [None]:
# Floating point numbers
print('pi =', np.pi)
print('pi has type:', type(np.pi))

In [None]:
# Powers are computed using **, not ^. Otherwise, math is as you would expect.
sphere_volume = (4/3)*np.pi*(radius**3)
print('A sphere of radius', radius, 'has volume', sphere_volume)
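
To make the comment above concrete, the short sketch below contrasts `**` with `^`, which Python treats as bitwise XOR on integers, and `/` with `//`. The variable names are chosen for illustration only.

```python
# ** raises to a power; ^ is bitwise XOR on integers, not exponentiation.
power = 2 ** 3        # 2 * 2 * 2
xor = 2 ^ 3           # binary 10 XOR binary 11 = binary 01

# / always returns a float; // performs floor division.
true_div = 7 / 2
floor_div = 7 // 2

print('2 ** 3 =', power)
print('2 ^ 3 =', xor)
print('7 / 2 =', true_div)
print('7 // 2 =', floor_div)
```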

In [None]:
# Boolean variables represent quantities that are True or False
x = (np.pi == 3)
print('pi == 3 is', x)
y = (np.pi > 3)
print('pi > 3 is', y)
z = (np.pi <= 3)
print('pi <= 3 is', z)
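
Booleans are listed under numbers because Python's `bool` is a subclass of `int`. A minimal sketch of what that means in practice (the list `flags` is made up for the example):

```python
# True behaves like 1 and False like 0 in arithmetic.
flags = [True, False, True, True]
n_true = sum(flags)                       # counts the True entries
is_int_subclass = issubclass(bool, int)

print('Number of True values:', n_true)
print('bool is a subclass of int:', is_int_subclass)
```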

## Strings

In [None]:
# Strings can be defined using either single or double quotes
first_name = 'Barbara'
last_name = "McClintock"
print(first_name, last_name)

In [None]:
# Multiline strings can be defined using triple quotes (single or double)
address = """
Cold Spring Harbor Laboratory
1 Bungtown Rd.
Cold Spring Harbor, NY 11724
"""
print(address)

In [None]:
# The '+' sign concatenates strings
full_name = last_name + ', ' + first_name
print(full_name)

In [None]:
# It is simple to test if one string is contained within another
'Barb' in full_name

In [None]:
# The len() function tells you the length of a string
len(full_name)

In [None]:
# The contents in a string can be indexed using brackets
print('First character:', full_name[0])         # strings are indexed starting at 0
print('Last character:', full_name[-1])         # str[-n] returns the n-th character from the end
print('Characters at indices 5 and 6:', full_name[5:7])  # str[start:stop] is a 'slice'; index stop is excluded
print('Every other character:', full_name[::2]) # str[start:stop:stride]
print('Reverse the string:', full_name[::-1])   # strings can be reversed using a stride of -1
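
Slice endpoints are a common source of off-by-one errors, so here is a small sketch of the rule. It assumes `name` holds the same value as `full_name` above; the other variable names are illustrative.

```python
name = 'McClintock, Barbara'   # same value as full_name above

# A slice s[start:stop] includes index start but excludes index stop,
# so its length is stop - start.
piece = name[5:7]
print(repr(piece), 'has length', len(piece))

# Omitting start or stop means "from the beginning" or "to the end".
head = name[:10]
tail = name[12:]
print(head, '|', tail)
```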

In [None]:
# You can convert from a string to an integer ...
my_num = int('5')
print('my_num =', my_num, '; type is', type(my_num))

# ... and from an integer to a string
my_str = str(5)
print('my_str =', my_str, '; type is', type(my_str))
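
Conversions can fail when the string is not a valid number. The sketch below shows `float()` alongside `int()` and one way to handle the resulting `ValueError`; the variable names are illustrative.

```python
# float() parses decimal strings; int() expects an integer literal.
as_float = float('3.5')
as_int = int('42')

# A string that is not a valid integer raises ValueError.
try:
    int('five')
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(as_float, as_int, conversion_failed)
```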

'String formatting' allows strings to be built up from numbers, other strings, etc.

In [None]:
print(f'A float in the default format: {np.pi}')
print(f'A float with 3 decimal places: {np.pi:.3f}')
print(f'A float in exponential notation: {np.pi:.3e}')

i = 3_628_800
print(f'An integer, default format: {i:d}')
print(f'An integer with commas and sign: {i:+,d}')

print(f'Two strings and a number: {first_name} {last_name} loves {np.pi}')

In [None]:
# The older %-style string formatting method (still common in existing code)
print('An int: %+d'%i)
print('A float: %.2f'%np.pi)
print('Two strings and a number: %s %s loves %f'%(first_name, last_name, np.pi))

In [None]:
# Make a string uppercase
print(full_name.upper())

## Exercises, part 1 of 2

Here is the DNA sequence of the multiple cloning site (MCS) on the plasmid [pcDNA5](https://www.addgene.org/vector-database/2132/), a popular vector for mammalian gene expression.

In [None]:
# Note how to define a long string over multiple lines
seq = 'GAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATCCACTA' \
      'GTCCAGTGTGGTGGAATTCTGCAGATATCCAGCACAGTGGCGGCCGCTCGAGTCTAG' \
      'AGGGCCCGTTTAAACCCGCTGATCAGCCT'
print(seq)

**E2.1**: Does this MCS contain a restriction site for NheI (GCTAGC)? How about for MscI (TGGCCA)? 

In [None]:
# Answer here

**E2.2**: Using the string method `.find()`, find the location(s) of the above restriction sites within the MCS. What does `.find()` return when a restriction site isn't in the sequence?

In [None]:
# Answer here

**E2.3**: Using the string method `.replace()`, compute the RNA sequence transcribed from the GFP gene sequence (given below). 

In [None]:
gfp_seq = 'ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATG' \
          'TTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTAC' \
          'CCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTC' \
          'GCGTATGGTCTTCAATGCTTTGCGAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAGA' \
          'GTGCCATGCCCGAAGGTTATGTACAGGAAAGAACTATATTTTTCAAAGATGACGGGAACTACAA' \
          'GACACGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATAGAATCGAGTTAAAAGGTATT' \
          'GATTTTAAAGAAGATGGAAACATTCTTGGACACAAATTGGAATACAACTATAACTCACACAATG' \
          'TATACATCATGGCAGACAAACAAAAGAATGGAATCAAAGTTAACTTCAAAATTAGACACAACAT' \
          'TGAAGATGGAAGCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCT' \
          'GTCCTTTTACCAGACAACCATTACCTGTCCACACAATCTGCCCTTTCGAAAGATCCCAACGAAA' \
          'AGAGAGACCACATGGTCCTTCTTGAGTTTGTAACAGCTGCTGGGATTACACATGGCATGGATGA' \
          'ACTATACAAATAA'

# Answer here

## Lists and tuples

In [None]:
# Define a list using brackets and commas.
v = [1, 'two', 3.0, 'four', 5]
v

In [None]:
# Lists can be indexed using brackets just like strings can.
print(f'First element: {v[0]}')
print(f'Last element: {v[-1]}')
print(f'The list reversed: {v[::-1]}')

In [None]:
# Use 'in' to test whether an element is contained in a list.
'four' in v

In [None]:
# Change an element in a list.
print(f'Before: {v}')
v[0] = 'one'
print(f'After: {v}')

In [None]:
# Append an element to the end of a list.
print(f'Before: {v}')
v.append('five')
print(f'After: {v}')

In [None]:
# You get an error if you try to access an index that doesn't exist.
v[10]

In [None]:
# You also get an error if you pass a non-integer as an index.
v[4.0]

In [None]:
# To create a list of the integers from 0 to n-1, use list(range(n))
v = list(range(10))
v
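
`range` also accepts an explicit start and step. A brief sketch, with illustrative variable names:

```python
# range(start, stop, step): start is included, stop is excluded.
evens = list(range(0, 10, 2))
countdown = list(range(5, 0, -1))   # a negative step counts down

print('evens:', evens)
print('countdown:', countdown)
```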

In [None]:
# Sort a list of numbers
v = [0,2,4,6,8,1,3,5,7,9]
print(f'Before sorting: {v}')
v.sort()
print(f'After sorting: {v}')
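
Note that `.sort()` changes the list in place and returns `None`, whereas the built-in `sorted()` returns a new list. A minimal sketch of the difference (the small list here is made up for the example):

```python
v = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched.
w = sorted(v)
w_desc = sorted(v, reverse=True)
print('original:', v, '| sorted copy:', w, '| descending:', w_desc)

# .sort() sorts in place and returns None.
result = v.sort()
print('after v.sort():', v, '| return value:', result)
```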

In [None]:
# Tuples are like lists, though they are defined using parentheses instead of brackets.
# Functions often pass tuples (not lists) back to the user.
t = (0, 1, 2, 3, 4)
print(t)

In [None]:
# The key difference is that, while lists are "mutable", tuples are "immutable"
# i.e., you cannot change an element in a tuple after it has been created.
t[3] = 5
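
The assignment above raises a `TypeError`. The sketch below catches that error and shows one possible way to get a modified tuple, namely building a new one from slices; `t_new` and `error_message` are names introduced for this example.

```python
t = (0, 1, 2, 3, 4)

# Assigning to an element of a tuple raises TypeError.
try:
    t[3] = 5
    error_message = None
except TypeError as err:
    error_message = str(err)
print('Error:', error_message)

# The original tuple is unchanged; a new tuple can be built from slices.
t_new = t[:3] + (5,) + t[4:]
print('t =', t)
print('t_new =', t_new)
```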

In [None]:
# You can join multiple strings together, separating them with a specified character,
# using the .join() string method
v = ["Dr.", "Barbara", "McClintock"]
s = ' '.join(v)
print(s)

In [None]:
# Use the split() string method to chop a string into a list at a specific character
s = 'github.com/jkinney/github/21_qbbootcamp/2_datatypes.ipynb'
v = s.split('/')
v

## Dictionaries

Dictionaries are one of Python's most useful data types. They can be thought of as a collection of key-value pairs in which values can be rapidly looked up via their keys. Keys can be any hashable object, which in practice usually means an immutable one such as a string, a number, or a tuple. Values can be anything.
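
As a small illustration of the restriction on keys, the sketch below uses tuples as keys and shows that a list is rejected. The dictionary `points` and its contents are made up for the example.

```python
# Tuples are immutable and hashable, so they can serve as keys.
points = {(0, 0): 'origin', (1, 2): 'a point'}
label = points[(0, 0)]
print('Label at (0, 0):', label)

# Lists are mutable and unhashable, so using one as a key raises TypeError.
try:
    bad = {[0, 0]: 'origin'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print('List allowed as key:', list_key_allowed)
```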

In [None]:
# Dictionaries are defined using braces, colons, and commas
d = {'first':'Eleanor', 'last':'McClintock'}
print(d)

In [None]:
# Access dictionary elements using a "key" enclosed in brackets
print(d['first'])

In [None]:
# You can replace and add elements to a dictionary after it is created.
d['first'] = 'Barbara'
d['title'] = 'Dr.'
print(d)

In [None]:
# From a dictionary, you can get a list of both the keys and the values.
keys = list(d.keys())
print(f'keys: {keys}')

values = list(d.values())
print(f'values: {values}')

In [None]:
# If you pass a key that doesn't exist, you get an error.
d['middle']

In [None]:
# It is sometimes useful to get a default value instead of an error when a key doesn't exist
d.get('middle', '')
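
Two related idioms are worth knowing: `in` tests whether a key exists, and `.get()` with no second argument returns `None` for a missing key. A short sketch, re-creating the dictionary so the cell runs on its own:

```python
d = {'title': 'Dr.', 'first': 'Barbara', 'last': 'McClintock'}

# 'in' checks the keys of a dictionary, not its values.
has_middle = 'middle' in d
has_first = 'first' in d

# Without a default, .get() returns None when the key is missing.
middle = d.get('middle')

print(has_middle, has_first, middle)
```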

In [None]:
# You can create a dictionary from a list of keys and values by using 'dict' and 'zip'
values = ['Dr.','Barbara','McClintock']
keys = ['title','first','last']
new_dict = dict(zip(keys, values))
print(new_dict)
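
Once a dictionary exists, its key-value pairs can be looped over with `.items()`. A minimal sketch that rebuilds `new_dict` as above and collects one formatted line per pair (`lines` is a name introduced here):

```python
keys = ['title', 'first', 'last']
values = ['Dr.', 'Barbara', 'McClintock']
new_dict = dict(zip(keys, values))

# .items() yields (key, value) pairs in insertion order.
lines = []
for key, value in new_dict.items():
    lines.append(f'{key}: {value}')

print('\n'.join(lines))
```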

## Exercises, part 2 of 2

**E2.4**: Create a dictionary called `rc_dict` that maps each DNA base to its complementary base (A -> T, C -> G, etc.). Then use `str.maketrans()` to convert this to a "translation table" named `rc_table`.

In [None]:
# Answer here

**E2.5**: By passing `rc_table` to the string method `.translate()`, then using indexing with a step of -1, compute the reverse complement of the MCS sequence given above.

In [None]:
# Answer here

**E2.6**: We have not yet discussed sets. Using Google, figure out what `set` objects are and explain what they represent. In particular, explain why Python evaluates `{2,3,3} < {1,2,3}` as `True`.

In [None]:
# Answer here