## Problem 0: Working with variables (3 points)

*This first problem is mostly a quick recap of what we covered in class (with a few small bonus topics). It may seem tedious to keep going over the same ideas, but the point here is to practice working with Python and Jupyter.*

**NOTE about homework files:** Feel free to add blank cells anywhere to use as scratch space; it won't affect the grading software. In the notebook menu above, choose "Insert", then insert a cell below or above. DO NOT change the filename; that will affect the grading software.

In class we discussed a few important features of variables in Python and Jupyter notebooks:

**Variables can be assigned different types of data**

The main simple data types are these:
```python
gene = 'CDC28'            # string (str) - any sequence of characters placed within quotes
n_mice = 13               # integer (int)
protein_level = 1.76      # floating-point numbers (float) - numbers with decimals
is_present = True         # boolean (bool) - True or False; 0 and 1 can also be interpreted as booleans
```
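As a quick illustration of the note about booleans, here is a minimal sketch of how Python treats `0` and `1` when a boolean is expected. The variable names are made up for this example, and `bool()` is a built-in function we did not cover in class.

```python
# bool() converts a value to True or False
zero_as_bool = bool(0)
one_as_bool = bool(1)
print(zero_as_bool)      # False
print(one_as_bool)       # True

# Booleans also behave like the integers 0 and 1 in arithmetic
n_true = True + True
print(n_true)            # 2
```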


**A variable assigned one type of data can be reassigned to another type of data**

As your code gets more complex, sometimes you'll find that a variable hasn't been assigned the data type you expect - such as the string `'1'` rather than the integer `1`.

Remember you can use `type()` to determine what type of value a variable holds.
```python
data = 123.4     # type is float
data = 'CGCG'    # type is now string
data = '123.4'   # type is still string
```
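Here is a small sketch of using `type()` to check what a variable currently holds. The conversion with the built-in `float()` is an extra step not shown above, and `value` is a name chosen just for this example.

```python
data = 123.4
print(type(data))        # <class 'float'>

data = '123.4'
print(type(data))        # <class 'str'>

# If you expected a number, convert the string explicitly
value = float(data)
print(type(value))       # <class 'float'>
```

If `type()` reports `str` where you expected a number, an explicit conversion like this is usually the fix.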

**Variables defined from other variables are NOT automatically updated when you change other variables**

Unlike variables in a math equation, a Python variable is not defined in terms of other variables. Instead, it is assigned whatever specific value the expression on the right side of the assignment operator (`=`) evaluates to at the time the assignment is made.


```python
test_value = 21
control_value = 3

fold_change = test_value / control_value # fold_change is assigned the value 7.0

control_value = 4 # value of fold_change is not affected by this subsequent line of code.
```
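You can confirm this behavior by printing `fold_change` before and after the change. The sketch below reuses the variable names from the example; the last step, repeating the assignment, is one way to pick up the new value.

```python
test_value = 21
control_value = 3

fold_change = test_value / control_value
print(fold_change)       # 7.0

control_value = 4
print(fold_change)       # still 7.0

# To use the new control_value, run the assignment again
fold_change = test_value / control_value
print(fold_change)       # 5.25
```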

**Variables will capture the result of whatever happens to the right of the assignment operator**

In the example below, the function (more properly, the *method*) `.count()` is used to count the occurrences of 0 in the list `some_data`. The *output* of `some_data.count(0)` is 2. In order to store this result for future use, the output of `some_data.count(0)` is assigned to the variable `zeros`.

```python
some_data = [2, 6, 0, 7, 3, 0]
zeros = some_data.count(0)
```

This is a simple example, but the principle is not trivial. As you write and debug more complex code, you'll likely run into cases in which your code performs some calculation whose output gets lost, because it's not being assigned to a variable.
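As a sketch of how an output can get lost, the example below uses the built-in `sorted()` function, which we have not covered; it returns a new sorted list and leaves the original unchanged. The name `sorted_data` is made up for illustration.

```python
some_data = [2, 6, 0, 7, 3, 0]

sorted(some_data)                # a sorted copy is calculated, then thrown away
print(some_data)                 # [2, 6, 0, 7, 3, 0]

sorted_data = sorted(some_data)  # assigning the output keeps it
print(sorted_data)               # [0, 0, 2, 3, 6, 7]
```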


**Variables don't care about the order of cells in your notebook**

The numbers to the side of your code cells (`In [1]:`) show the order in which cells were run, and that is what really matters.

If you change the value of a previously defined variable in a lower cell, re-running an upper cell will reflect that changed value. (The exception is when the upper cell has a line that sets the variable to its original value.) 

This can quickly lead to problems as you try to troubleshoot your code. The best way to avoid these problems is to try to make your code in each cell as self-contained as possible, defining all needed variables within a single cell. This isn't always possible or practical, so it's good to be aware of this issue.

If, as you edit cells out of order, the state of your variables has become hopelessly confusing, it's best to just restart the Python interpreter. Click on the **Kernel** menu above, and select **"Restart & Clear Output"**. This will reset your Python session, clearing all variable assignments and output. Don't worry, it will not erase your work - your code will still be there. In fact, it's a good idea to try this with your homework to make sure all code runs without errors. (Select **"Restart & Run All"** to clear the Python session and automatically run all cells in order).

Below is a simple example of what happens when you run cells out of order, when they have some variables in common. With more complex code, a cell near the top of your notebook that you rerun might be affected by a variable change that occurred a dozen cells lower down! 

In [1]:
# Step 1: Define a variable - run this cell
multiplier = 6

In [2]:
# Step 2: Use our variable - run this cell
data = 10
print(data * multiplier)

60


In [3]:
# Step 3: Change our variable - run this cell. Then go back and re-run Step 2 above. Does the result make sense?
multiplier = 10

Now for the actual problem 0 - a simple variable assignment exercise. Imagine that you have measured an expression value for the gene *Gli1*. In the cell below, do the following three things: 

1. Create a variable called `gene` and assign it the gene name Gli1. (What data type is the gene name? Don't answer this question in the code cell - just answer it for yourself.)
2. Create a variable called `expression` and assign it the value 2.5. (What data type is this?)
3. Print the values of `gene` and `expression` using `print()`. 

In [4]:
### BEGIN SOLUTION
gene = 'Gli1'
expression = 2.5
print(gene, expression)
### END SOLUTION

Gli1 2.5


In [5]:
# This cell is part of the auto-grading system. You can run it to check your answer.
# If you do not get an error, it means your answer is (probably) correct.

assert type(gene) == str
assert type(expression) == float

As we saw in class, whether a function or operator works on a particular variable depends on what type of value the variable has been assigned:

```python
x = 10  # integer data type
y = x + 1 # this works

x = '10' # string data type
y = x + 1 # invalid: raises a TypeError

z = x + '1' # valid, but this is not mathematical addition. What is it?
```
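If you want to see the error from the invalid line without stopping a cell, one option is a `try`/`except` block, which we have not covered yet. The sketch below also shows one possible fix, converting the string with the built-in `int()` before doing the math.

```python
x = '10'

try:
    y = x + 1                    # a string plus an integer is not allowed
except TypeError as err:
    print('TypeError:', err)

# Convert the string to an integer first, then add
y = int(x) + 1
print(y)                         # 11
```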

One function that does not work on integers or floats, but does work on strings (and lists, as well as some other data types) is the length function `len()`. It returns the number of characters in a string or items in a list.
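A minimal sketch of `len()` on a list and on a string, plus what happens with an integer. The list `gene_names` and its contents are made up for illustration.

```python
gene_names = ['CDC28', 'Gli1', 'ACT1']
n_genes = len(gene_names)
print(n_genes)                   # 3

n_chars = len('CDC28')
print(n_chars)                   # 5

try:
    len(13)                      # integers have no length
except TypeError as err:
    print('TypeError:', err)
```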

We saw how to use `len()` to determine the number of items in a list. You can use it to count the number of base pairs in a DNA sequence. In the cell below, do the following:

1. Assign the sequence `CTAAGCCC` to a variable named `dna`.
2. Use `len()` to determine the length of the sequence.
3. Capture the output of `len()` by assigning the output to a variable named `answer`.
    
You can also create a blank cell to test the code example above, but it's not required.

In [6]:
### BEGIN SOLUTION
dna = 'CTAAGCCC'
answer = len(dna)
### END SOLUTION

In [7]:
assert type(dna) == str
assert answer == len(dna)

## More about string variables

The next few cells demonstrate some features of string variables, which we didn't cover in class. Run the next few example cells to learn about strings. Then solve the problem below.

In [8]:
# Variables don't just hold numbers

n = 'ACGT' # Characters enclosed in quotes are called strings
print(n)

ACGT


In [9]:
# To define a string, you can use single or double quotes. Single quotes are more typical in Python
dna1 = 'ACGT'
dna2 = "ACGT" # same as dna1
print(dna1, dna2)

ACGT ACGT


Normally we just use single quotes, but sometimes double quotes are useful. The following won't work - Python ends the string with the second quote mark after the n, and then sees a loose quote mark after t:

```python
word = 'don't'
```
If you ran this in an actual cell, you'd get an error (which would mess up the autograder). Also, note that the syntax coloring shows you that your string consists of `'don'`. (You can create a scratch cell and try it - errors there won't mess up the grading software.)

In [10]:
# Here's how to do it properly:
word = "don't"
print(word)

# You can also use the escape character \ to achieve the same thing:
word = 'don\'t'
print(word)

don't
don't


In [11]:
# Another subtlety
print(word) # print the value of the variable word
print('word') # print the string word - note the change in syntax coloring

don't
word


In [12]:
# Triple quotes allow you to write strings that span multiple lines

sequence = '''ATCGAGCTAGCGATC
TGCCGAGCTACGATC
CTCCGTTGCGTTGGC'''

print(sequence)

ATCGAGCTAGCGATC
TGCCGAGCTACGATC
CTCCGTTGCGTTGGC


In [13]:
# Some math operators have different meanings when used with strings:

dna3, dna4 = 'AAAA', 'CCCC' # This is a trick to assign multiple variables in one line

dna3 + dna4 # What does + mean here? Try it out.

'AAAACCCC'

## Problem: Math operators and strings

As mentioned above, math operators sometimes have a different meaning when they are applied to strings. The plus (`+`) operator, when applied to strings, *concatenates* two strings. This is somewhat analogous to addition, so it makes sense. The technical term for an operator that behaves differently for different data types is *operator overloading*. It's common in programming languages.
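To illustrate, here is a short sketch of the same `+` operator acting on three different data types. The variable names are chosen just for this example.

```python
total = 2 + 3                    # numbers: mathematical addition
joined = 'AAAA' + 'CCCC'         # strings: concatenation
combined = [1, 2] + [3, 4]       # lists: concatenation into a new list

print(total)                     # 5
print(joined)                    # AAAACCCC
print(combined)                  # [1, 2, 3, 4]
```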

Let's look at a different operator. What if you want to repeat a string several times? Which math operator would you use?

In the cell below, use a math operator to repeat the string variable `dna4` five times, and assign the result to a new variable, `dna5`. To solve this problem, you'll have to explore a little - try a few math operators on strings to see what (if anything) they do.

In [14]:
# Repeat the string variable dna4 five times using a math operator, assign output to dna5.

### BEGIN SOLUTION
dna5 = dna4*5

### END SOLUTION

print(dna5) # see the result

CCCCCCCCCCCCCCCCCCCC


In [15]:
assert len(dna5) == 20