## Variable Types

Python doesn't generally require *explicit* variable type declarations (with some exceptions that will come later as we get into more advanced programming).  However, it is still useful to know what kinds of data there are, what can be done with them, and how they're stored.

First, let's explore data types like `int`, `float`, `list`, `tuple`, and `str` (string).


In [None]:
my_int    = 2
my_float  = 3.1415
my_list   = [1,3.1415,"Hello World!","pizza"]
my_tuple  = (5,6,7,8,9)
my_string = "Hello World!"

These examples are fairly simple.  

- `my_int` is an integer, and gets treated like one.  Integers are useful for things like indexes, counters, and so forth.
- `my_float` is a float (often called a "double" in other programming languages), which is a number with a decimal part.
- `my_list` is a list of values enclosed in square brackets.  Lists are indexed from zero, which means the first item in a list is "item 0".  Lists are a good way to keep collections of data organized and in order, and you can extract individual values simply by including the index with the variable name:  `my_list[3]` will return "pizza".  You can also get values from the end of a list with negative indices.  `my_list[-1]` will return "pizza" because it's the last value.
- `my_tuple` is a tuple.  It is indexed just like a list (`my_tuple[0]` returns 5), but it cannot be changed after it is created.  Tuples are useful when you need to keep a group of values together in a fixed relationship to each other, such as with (x,y,z) coordinates.
- `my_string` is a sequence of characters, which can include letters, numbers, punctuation, and whitespace (tabs, spaces, line breaks, etc.).  The contents of a string do not include the quotation marks on either side.  To put a quotation mark or a backslash inside a string, use an *escape*: `\"` for a quote, or `\\` for a backslash.
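A minimal sketch of how these types behave, reusing the variable names from the cell above.  The replacement value `"tacos"` and the string `quoted` are made up for illustration; everything else is standard Python.

```python
my_int    = 2
my_float  = 3.1415
my_list   = [1, 3.1415, "Hello World!", "pizza"]
my_tuple  = (5, 6, 7, 8, 9)
my_string = "Hello World!"

# type() reports the data type of a value
type_names = [type(v).__name__ for v in (my_int, my_float, my_list, my_tuple, my_string)]
print(type_names)

# Lists, tuples, and strings are all indexed the same way, starting from zero
print(my_list[3], my_tuple[0], my_string[0])
print(my_list[-1], my_tuple[-1], my_string[-1])

# Lists are mutable: an item can be replaced in place
my_list[3] = "tacos"

# Tuples are immutable: assigning to an item raises a TypeError
try:
    my_tuple[0] = 100
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)

# Escapes let a string contain quotes and backslashes
quoted = "She said \"hi\" and typed a backslash: \\"
print(quoted)
```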

Variable manipulation comes in many forms and depends on the type of data contained within.  Better understanding of how data types work can allow you to do some interesting things, like taking a "slice" of a string like you would from a list.  

Consider the examples below.

In [None]:
my_list = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
# my_list has a length of 26 individual values
len(my_list)

In [None]:
my_string = "Once more into the breach!"
# my_string has a length of 26 characters including whitespace and punctuation.
len(my_string)

### Slicing Lists

One common use for lists is "slicing", where you get a small subsection of the list.  Let's say you wanted just the first five elements in `my_list`.  You would use a slice.  A slice is written much like a single-element lookup, inside square brackets, but with a `:` between the starting and ending indices to get everything between them.  Leaving out the starting index means "from the beginning", and leaving out the ending index means "through the end".  Check out the examples below.

In [None]:
my_list[:5]

In [None]:
my_list[5:10]

Note how the two results are different.  The ending index in the first cell is the same as the starting index in the second cell, but we don't actually get "5" in the results of the first cell.  Slices go "up to" the ending index, but don't include it.  Keep this in mind when working with slices.  We can also combine other tricks from list indexing, like using negative indices to count backwards from the end.

In the next cell, we'll get the last seven elements from the list.

In [None]:
my_list[-7:]

What if we wanted every third element in the list?

In [None]:
my_list[::3]

The second `:` introduces a "stride" (or step).  This is useful when you have data that is strangely shaped, such as a long list of values that correspond to x,y,z coordinates but aren't in a (3,n) shaped list.
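As a rough illustration of that use case, suppose the coordinates arrive as one flat list in x, y, z, x, y, z order.  The name `flat_coords` and its values are invented for this sketch.

```python
# Hypothetical flat list: x0, y0, z0, x1, y1, z1, ...
flat_coords = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Start at offset 0, 1, or 2 and take every third value
xs = flat_coords[0::3]
ys = flat_coords[1::3]
zs = flat_coords[2::3]

print(xs)  # [1.0, 4.0, 7.0]
print(ys)  # [2.0, 5.0, 8.0]
print(zs)  # [3.0, 6.0, 9.0]

# zip() regroups the values into (x, y, z) tuples
points = list(zip(xs, ys, zs))
print(points)
```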

Now let's combine these.  We'll get every other element, starting at index 10 and stopping before index 20.

In [None]:
my_list[10:20:2]
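One way to reason about a slice like this is that `my_list[start:stop:step]` visits the same indices that `range(start, stop, step)` would generate, at least for in-bounds, positive values.  The sketch below checks that, along with the "up to but not including" rule.  `my_list` is rebuilt here with `range` so the block runs on its own.

```python
my_list = list(range(26))  # same contents as the list defined earlier: 0 through 25

# A slice with a stride picks the same indices that range() would generate
by_slice = my_list[10:20:2]
by_range = [my_list[i] for i in range(10, 20, 2)]
print(by_slice)  # [10, 12, 14, 16, 18]

# The stop index is excluded, so adjacent slices never overlap
first_part = my_list[:5]
rest = my_list[5:]
print(first_part + rest == my_list)  # True

# The length of a simple in-bounds slice is stop - start
print(len(my_list[5:10]))  # 5
```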

Now let's look at strings.  Strings behave much like lists of letters, numbers, and any other characters you can think of.  With this in mind, we can slice strings the same way we sliced lists.

In [None]:
my_string

In [None]:
my_string[:5]

In [None]:
my_string[5:10]

In [None]:
my_string[-7:]

In [None]:
my_string[::3]

In [None]:
my_string[10:20:2]

... some slices are more useful than others, but you get the idea!
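A brief sketch of how strings behave under slicing: the result is another string, a negative stride walks backwards, and a string cannot be modified in place.  The variable names below are only for illustration.

```python
my_string = "Once more into the breach!"

# Slicing a string gives back a string, not a list
first_word = my_string[:4]
print(first_word, type(first_word).__name__)  # Once str

# A negative stride walks backwards, which reverses the string
reversed_string = my_string[::-1]
print(reversed_string)  # !hcaerb eht otni erom ecnO

# Strings are immutable, so item assignment raises a TypeError
try:
    my_string[0] = "o"
    string_changed = True
except TypeError:
    string_changed = False

# To "edit" a string, build a new one from slices instead
lowercase_start = "o" + my_string[1:]
print(lowercase_start)  # once more into the breach!
```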

Now let's look at integers and floats.  In some programming languages, the difference between these two can be pretty severe.  For example, in C++, dividing one integer by another will give you a truncated integer, which means you can lose some of the information in your data if you're not careful.  Thankfully, Python is a little more forgiving.

Normal division with `/` works like we might intuitively expect: the result is always a float, even when both values are integers.

In [None]:
my_float/my_int

We can also force the division to return a whole-number value, which is useful in some situations.

In the example above, we got a value of 1.57075.  If we were to round this using conventional methods, we'd get 2.
However, floor division with the `//` below gives us a value of 1.0, which has been rounded down rather than rounded to the nearest integer.  Note the `.0`: the result is a whole number, but it is still stored as a float because one of the inputs was a float.  This is important to be aware of when doing mathematical work in Python.  For positive numbers, `//` behaves like truncation, which just removes everything after the decimal point, while rounding considers the fractional part.  For negative numbers, `//` rounds toward negative infinity instead.


In [None]:
my_float//my_int

In [None]:
round(my_float/my_int)
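Floor division and truncation agree for positive numbers but not for negative ones.  The sketch below compares `//`, `int()`, `math.floor`, `math.trunc`, and `round()` on the same quotient and on its negative.

```python
import math

my_int = 2
my_float = 3.1415

quotient = my_float / my_int          # true division, roughly 1.57075
print(my_float // my_int)             # 1.0  (floor division; still a float)
print(int(quotient))                  # 1    (truncates toward zero, returns an int)
print(round(quotient))                # 2    (rounds to the nearest integer)

# With a negative value, flooring and truncating give different answers
print(-my_float // my_int)            # -2.0 (floor: rounds toward negative infinity)
print(math.floor(-quotient))          # -2
print(math.trunc(-quotient))          # -1   (truncation: rounds toward zero)
print(int(-quotient))                 # -1

# Integer operands give an integer result
print(7 // 2, type(7 // 2).__name__)  # 3 int
```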

### Math with Variables

Math can get incredibly complex, so it's important to remember your Order of Operations (PEMDAS): Parentheses, Exponents, Multiplication, Division, Addition, and Subtraction.

Python follows the same rules.  Parentheses are evaluated first, then exponents (`**`), then `*` and `/`, and finally `+` and `-`.  Operators at the same level, such as `*` and `/`, are processed left-to-right.

In [None]:
1 + 2 - 3 * 4 / 5

In [None]:
(1 + 2) - 3 * 4 / 5

In [None]:
(1 + 2 - 3 * 4) / 5

In [None]:
(1 + 2 - 3) * 4 / 5
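To make the grouping explicit, the sketch below adds the parentheses that Python's precedence rules imply for the first expression above.  It also shows the one common exception to left-to-right evaluation: `**` groups from right to left.

```python
# Multiplication and division bind more tightly than addition and subtraction
implicit = 1 + 2 - 3 * 4 / 5
explicit = (1 + 2) - ((3 * 4) / 5)
print(implicit == explicit)  # True

# A strict left-to-right reading would give a different number
left_to_right = (((1 + 2) - 3) * 4) / 5
print(left_to_right)  # 0.0

# Operators of equal precedence are evaluated left to right...
print(8 / 4 / 2)      # 1.0, i.e. (8 / 4) / 2

# ...except exponentiation, which groups right to left
print(2 ** 3 ** 2)    # 512, i.e. 2 ** (3 ** 2)
print((2 ** 3) ** 2)  # 64
```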

These are just a few examples of how order of operations affects the results.  With this in mind, you can see why it's very important to keep track of what you're doing in a complex mathematical function.  The next cell has a complex equation in a single line, then the same equation separated into more easily-managed terms.

In [None]:
x=3
y=5
z=7
answer =  (x**(y/z)-x/((y+2)*z)-x)/(y*z)*x

print(answer)

Not only is that difficult to read, but it's also harder to see where errors might be arising.  So we can rewrite it and create additional variables to hold small chunks of the calculation.

In [None]:
x=3
y=5
z=7

# (x**(y/z)-x/((y+2)*z)-x)/(y*z)*x
p = y/z
# (x**p-x/((y+2)*z)-x)/(y*z)*x
q = x**p
# (q-x/((y+2)*z)-x)/(y*z)*x
r = y+2
# (q-x/(r*z)-x)/(y*z)*x
s = r*z
# (q-x/s-x)/(y*z)*x
t = y*z
# (q-x/s-x)/t*x
u = x/s
# (q-u-x)/t*x
v = q-u-x
# v/t*x
w = v/t
# w*x
answer = w*x

print(answer)

This may seem overengineered, but breaking down the individual terms is helpful in both programming and math, especially when it reveals certain trends, or even ways to rearrange an equation to reduce the overall number of calculations being performed.  This kind of breakdown can also be useful when you begin building larger, more complicated functions, even up to the point of creating entire programs or modules.
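When an expression is split up like this, it is worth checking that the pieces still give the same answer as the original.  A minimal sketch, assuming the two versions are wrapped in functions (the names `one_line` and `step_by_step` are invented here):

```python
import math

def one_line(x, y, z):
    # The original single-line expression
    return (x**(y/z) - x/((y+2)*z) - x) / (y*z) * x

def step_by_step(x, y, z):
    # The same expression, built from named intermediate terms
    q = x ** (y / z)
    u = x / ((y + 2) * z)
    v = q - u - x
    w = v / (y * z)
    return w * x

# Floating-point results are best compared with a tolerance rather than ==
same = math.isclose(one_line(3, 5, 7), step_by_step(3, 5, 7))
print(same)  # True
```

Wrapping each version in a function also makes it easy to try other inputs, which can catch a mistake that happens to cancel out for one particular set of values.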

### Booleans

Booleans are simply variables that are either `True` or `False`.  They can also be interpreted as `1` and `0`.  Booleans get used all the time in programming, though we may not be constantly aware of them.  

For example, whenever we compare two numbers, the comparison creates a boolean:


In [1]:
3<5

True

In [3]:
3>5

False

In [4]:
3 == 5

False

We can see that the responses for the different comparisons are correct.  $3 < 5$ is true, while $3 > 5$ and $3 == 5$ are both false.  Incidentally, the `==` is intentional.  In Python and C++, `=` *assigns* a value, while `==` *compares* two values.

Booleans get used constantly in things like "if-else statements" or "while loops".
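A small sketch of both ideas: a comparison result steering an `if`/`else` block and a `while` loop, and `True`/`False` behaving as `1`/`0` in arithmetic.  The variable names and the threshold of 5 are arbitrary.

```python
values = [3, 8, 1, 9, 4]

# A comparison produces a boolean that can be stored like any other value
is_small = values[0] < 5
print(is_small)  # True

# if/else chooses a branch based on a boolean
if is_small:
    label = "small"
else:
    label = "large"
print(label)  # small

# A while loop repeats for as long as its condition is True
count = 0
while count < 3:
    count += 1
print(count)  # 3

# True and False act as 1 and 0, so summing comparisons counts the matches
num_large = sum(v > 5 for v in values)
print(num_large)  # 2
```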

### Dictionaries

Another Python data type is the dictionary (written `dict` in Python).  A dictionary stores values under named keys, and each value can be of any type, which makes it a convenient way to keep many different pieces of information together.

In [2]:
# A dictionary is denoted by { } 
my_dictionary = {}

# At this point, "my_dictionary" is an empty dictionary with no keys or values assigned.
# We can assign a key/value pair like this

my_dictionary["Name"] = "Mark"
my_dictionary["Age"]  = 37
my_dictionary["Job"] = "Postdoc"

# Now we can recall any of the values held in the dictionary by using the [key].
# Keep in mind, if a key already exists, assigning to it overwrites the previous value.

# If you have a dictionary with keys that you don't know, you can get them like this:
key_list = list(my_dictionary.keys())
print(key_list)
# my_dictionary.keys() returns a "view" of the keys rather than a list, so we wrap it in list() to get an actual list.

# You can also iterate through all the keys and values together.
for key,value in my_dictionary.items():
    print(key,"=",value)



['Name', 'Age', 'Job']
Name = Mark
Age = 37
Job = Postdoc
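Looking up a key that does not exist raises a `KeyError`.  The sketch below shows a few standard ways to avoid that, along with overwriting a value.  The `"Office"` key and the updated age are made up for the example.

```python
my_dictionary = {"Name": "Mark", "Age": 37, "Job": "Postdoc"}

# Assigning to an existing key replaces the old value
my_dictionary["Age"] = 38
print(my_dictionary["Age"])  # 38

# "in" tests whether a key is present
print("Job" in my_dictionary)     # True
print("Office" in my_dictionary)  # False

# .get() returns a default instead of raising KeyError for a missing key
office = my_dictionary.get("Office", "unknown")
print(office)  # unknown

# list() on a dictionary gives its keys, in insertion order
print(list(my_dictionary))  # ['Name', 'Age', 'Job']
```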


In [12]:
# In a more relevant example to the lab (and demonstration of dictionary initialization with keys and values):

variant_prmtops = {"WT":"A3H_WT.prmtop",
                   "K121E":"A3H_K121E.prmtop",
                   "K117E":"A3H_K117E.prmtop",
                   "R124D":"A3H_R124D.prmtop"}

# Now I have a set of filenames stored by variant name, and I can recall any of them like this
print(variant_prmtops["K121E"])

# We can also have dictionaries inside dictionaries, which can be useful for bigger datasets.

full_systems = {
"WT" : {"prmtop":"WT.prmtop","trajectory":"WT_100ns.dcd","num_residues":180,"duration":100},
"K121E" : {"prmtop":"K121E.prmtop","trajectory":"K121E_150ns.dcd","num_residues":180,"duration":150},
"K117E" : {"prmtop":"K117E.prmtop","trajectory":"K117E_200ns.dcd","num_residues":180,"duration":200} 
}

print(full_systems["WT"]) ## This prints the entire inner dictionary for "WT"

print(full_systems["WT"]["prmtop"])

# You can also store larger datasets inside dictionaries this way.  For example, let's say you have a dataset for the RMSD of an MD trajectory called "rmsd", and one for correlated motion called "correl"
import numpy as np
rmsd = np.random.rand(100)
correl = np.random.rand(50,50)

WT_analyses = {"RMSD":rmsd,"correl":correl}

A3H_K121E.prmtop
{'prmtop': 'WT.prmtop', 'trajectory': 'WT_100ns.dcd', 'num_residues': 180, 'duration': 100}
WT.prmtop
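Nested dictionaries pair naturally with loops.  A minimal sketch, reusing two entries of `full_systems` from the cell above, that walks every system and pulls one field out of each.  The `"temperature"` field is hypothetical and only there to show that the inner dictionaries are independent.

```python
full_systems = {
    "WT":    {"prmtop": "WT.prmtop",    "trajectory": "WT_100ns.dcd",    "num_residues": 180, "duration": 100},
    "K121E": {"prmtop": "K121E.prmtop", "trajectory": "K121E_150ns.dcd", "num_residues": 180, "duration": 150},
}

# .items() yields each outer key together with its inner dictionary
for name, system in full_systems.items():
    print(name, system["trajectory"], system["duration"], "ns")

# A dictionary comprehension can collect one field from every system
durations = {name: system["duration"] for name, system in full_systems.items()}
print(durations)  # {'WT': 100, 'K121E': 150}

# Adding a new field to one system leaves the others untouched
full_systems["WT"]["temperature"] = 300  # hypothetical field, for illustration only
print("temperature" in full_systems["K121E"])  # False
```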


In [13]:
WT_analyses["correl"]

array([[0.73605305, 0.83598442, 0.71570452, ..., 0.3171875 , 0.23520608,
        0.81539024],
       [0.23137924, 0.34448314, 0.55976433, ..., 0.98656348, 0.50170287,
        0.81814234],
       [0.77974392, 0.99301726, 0.67001237, ..., 0.19997487, 0.95236781,
        0.19561078],
       ...,
       [0.077598  , 0.82010693, 0.11895961, ..., 0.37007584, 0.96924269,
        0.10288062],
       [0.37170359, 0.86672473, 0.92999848, ..., 0.23037564, 0.00270931,
        0.996168  ],
       [0.49717833, 0.58887648, 0.81491634, ..., 0.55857894, 0.83691413,
        0.27322632]])

In [14]:
WT_analyses["RMSD"]

array([0.22101948, 0.7121048 , 0.45931714, 0.64889073, 0.34687359,
       0.10350116, 0.02655353, 0.08336816, 0.87864535, 0.95204309,
       0.64662435, 0.13220801, 0.96315642, 0.01367289, 0.01204332,
       0.46710067, 0.41590448, 0.96186539, 0.96150274, 0.32284111,
       0.50774768, 0.7084362 , 0.59167722, 0.14122492, 0.25402775,
       0.78582176, 0.6247598 , 0.7950481 , 0.52878754, 0.18770375,
       0.60145874, 0.85452103, 0.01594719, 0.57356258, 0.16672294,
       0.50915781, 0.38028769, 0.3404673 , 0.33082846, 0.62941975,
       0.00544898, 0.86960049, 0.1727984 , 0.14633374, 0.34715725,
       0.25982982, 0.45809769, 0.56074929, 0.43494216, 0.27285824,
       0.6851663 , 0.32973507, 0.58938936, 0.97437296, 0.45746796,
       0.67225843, 0.71137066, 0.20209049, 0.64923743, 0.5421072 ,
       0.94287357, 0.03760845, 0.7620343 , 0.50766966, 0.72939888,
       0.0194925 , 0.16302266, 0.0315469 , 0.66533926, 0.49689533,
       0.29022452, 0.92761978, 0.47507251, 0.30826174, 0.04465