# Variables, Data Types, Indices, and Slices

- Start a Jupyter Notebook file to try out the following basic syntax code.

- Use markdown cells and comment lines to explain the purpose of your Jupyter Notebook and code. The following markdown cheat sheet can help:

```{image} ./Images-Introduction/MarkdownCheatSheet.png
:alt: A Markdown Cheat Sheet
:width: 600px
:align: center
```

- Execute code cells by placing your cursor in the code cell and pressing Shift+Enter. The output appears directly below the code cell.

## __<font color=blue>Assigning variables and using `print()` to check how the code is working</font>__
---

To save a value for later use, we assign it to a __variable__.

The syntax for assigning variables is:

```
variable_name = variable_value
```

_Tips:_
- Choose informative names for variables.
- Use comment lines to express the units of the variable or to describe the meaning of the variable.
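As a minimal sketch of the assignment syntax and the tips above, the following cell stores two values in informatively named variables, notes their units in comments, and reuses them in a calculation. The numbers (5.0 g of NaCl and a molar mass of 58.44 g/mol) are illustrative assumptions and are not part of the exercises.

```python
# Illustrative values only (assumed for this sketch)
mass_NaCl = 5.0    #g
MW_NaCl = 58.44    #g/mol

moles_NaCl = mass_NaCl / MW_NaCl   #g / (g/mol) = mol

print(moles_NaCl)   #print the value that we calculated
```

Because the values are stored in variables, changing `mass_NaCl` and re-running the cell updates the result without rewriting the calculation.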

Let's see this in action with _<font color=red>calculations using variables</font>_.

```{exercise}
:label: my-exercise

How many grams of solid NaOH (40.0 g/mol) are required to prepare 500 ml of a 0.04 M solution?
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
liters = 0.5   #l
M = 0.04   #mol/l
MW = 40.0   #g/mol

wt = (liters * M) * MW   #(l * mol/l) * g/mol = g

print(wt)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise

An enzyme has a $V_{max}$ of 1.2 $\mu M\ s^{-1}$ and a $K_m$ of 10 $\mu M$. What is the initial velocity (in $\mu M\ s^{-1}$) for an 8 $\mu M$ substrate concentration?
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
Km = 10   #microM
Vmax = 1.2   #microM/s
S = 8   #microM

V0 = (Vmax * S) / (Km + S)   #(microM/s * microM) / (microM + microM) = microM/s; Michaelis-Menten equation

print(V0)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise

Convert the initial velocity in $\mu M\ s^{-1}$ from the previous exercise to $\mu M\ min^{-1}$.
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
V0permin = V0 * 60   #microM/s * 60s/min = microM/min

print(V0permin)   #print the value that we calculated
```
````

## __<font color=blue>Data types, using `len()` to return the number of items in a sequence, and using `.count()` to count how many times an item appears in a sequence</font>__
---

In Python, the __data type__ is set when we assign a value to a variable. Different data types can do different things.

The most common data types are
- __strings__ (`str`) for text (surrounded by either single quotation marks or double quotation marks),
- __integers__ (`int`) for whole numbers, positive or negative, without decimals, of unlimited length,
- __floating point numbers__ (`float`) for numbers, positive or negative, containing one or more decimals,
- __lists__ (`list`) for multiple ordered and changeable items of different data types within one variable (created using square brackets (`[]`)),
- __tuples__ (`tuple`) for multiple ordered and unchangeable items of different data types within one variable (created using round brackets (`()`)).
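The difference between "changeable" lists and "unchangeable" tuples can be sketched as follows. The variable names and pH values are hypothetical and chosen only for illustration.

```python
# Hypothetical example values
buffer_pH_list = [6.5, 7.0, 7.5]    #list: square brackets, changeable
buffer_pH_tuple = (6.5, 7.0, 7.5)   #tuple: round brackets, unchangeable

buffer_pH_list[0] = 6.0   #replacing an item in a list works
print(buffer_pH_list)

try:
    buffer_pH_tuple[0] = 6.0   #replacing an item in a tuple raises a TypeError
except TypeError as error:
    print("Tuples cannot be changed:", error)
```

A tuple is a reasonable choice for values that should stay fixed, while a list suits data that you expect to edit.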

Use the `type()` function to identify the data type of any variable.
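As a short sketch, `type()` can be applied to one variable of each data type listed above. All names and values here are hypothetical.

```python
# Hypothetical example values, one per data type
sample_name = "sample A"    #str
n_replicates = 3            #int
temperature = 37.0          #float, degrees C
timepoints = [0, 5, 10]     #list, min
well = ("B", 4)             #tuple mixing a str and an int

print(type(sample_name))
print(type(n_replicates))
print(type(temperature))
print(type(timepoints))
print(type(well))
```

Note that `3` and `37.0` get different types: the decimal point alone is enough to make a value a `float`.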

Use the `len()` function to determine the __length__ of a sequence (_e.g._ a string, list, or tuple).

Use the `.count()` method to __count__ the number of items with a specified value within a sequence. Unlike `len()`, it is called on the sequence itself using a dot. The syntax is:

```
sequence_name.count(value)
```
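A minimal sketch of `len()` and `.count()` on a string and on a list is shown below. The short RNA sequence and the residue list are made up for illustration.

```python
# Hypothetical example data
RNAseq = "AUGGCUUAA"                      #a short RNA sequence as a string
residues = ["SER", "THR", "SER", "TYR"]   #a list of three-letter residue codes

len_RNAseq = len(RNAseq)          #number of characters in the string
Ucount_RNAseq = RNAseq.count("U") #number of times U appears in the string

len_residues = len(residues)              #number of items in the list
SERcount_residues = residues.count("SER") #number of times SER appears in the list

print(len_RNAseq, Ucount_RNAseq)
print(len_residues, SERcount_residues)
```

For a string, `len()` counts characters; for a list, it counts items, regardless of how long each item is.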

Let's see this in action with _<font color=red>DNA, RNA, and protein sequences as strings</font>_ and _<font color=red>lists with substrate concentrations and amino acids</font>_.

```{exercise}
:label: my-exercise

Determine the data type, length, and number of tryptophan residues for this LRRK2 protein sequence, written in one-letter amino acid code.
```

```{code-block} python
protseqLRRK2 = (
    "MASGSCQGCEEDEETLKKLIVRLNNVQEGKQIETLVQILEDLLVFTYSERASKLFQGKNIHVPLLIVLDSYMRVASVQQVGWSLLCKLIEVCPGTMQSLMGPQDVGNDWEVLGVHQLILKMLTVHNASVNLSVIGLKTLDLLLTSGKITLLILDEESDIFMLIFDAMHSFPANDEVQKLGCKALHVLFERVSEEQLTEFVENKDYMILLSALTNFKDEEEIVLHVLHCLHSLAIPCNNVEVLMSGNVRCYNIVVEAMKAFPMSERIQEVSCCLLHRLTLGNFFNILVLNEVHEFVVKAVQQYPENAALQISALSCLALLTETIFLNQDLEEKNENQENDDEGEEDKLFWLEACYKALTWHRKNKHVQEAACWALNNLLMYQNSLHEKIGDEDGHFPAHREVMLSMLMHSSSKEVFQASANALSTLLEQNVNFRKILLSKGIHLNVLELMQKHIHSPEVAESGCKMLNHLFEGSNTSLDIMAAVVPKILTVMKRHETSLPVQLEALRAILHFIVPGMPEESREDTEFHHKLNMVKKQCFKNDIHKLVLAALNRFIGNPGIQKCGLKVISSIVHFPDALEMLSLEGAMDSVLHTLQMYPDDQEIQCLGLSLIGYLITKKNVFIGTGHLLAKILVSSLYRFKDVAEIQTKGFQTILAILKLSASFSKLLVHHSFDLVIFHQMSSNIMEQKDQQFLNLCCKCFAKVAMDDYLKNVMLERACDQNNSIMVECLLLLGADANQAKEGSSLICQVCEKESSPKLVELLLNSGSREQDVRKALTISIGKGDSQIISLLLRRLALDVANNSICLGGFCIGKVEPSWLGPLFPDKTSNLRKQTNIASTLARMVIRYQMKSAVEEGTASGSDGNFSEDVLSKFDEWTFIPDSSMDSVFAQSDDLDSEGSEGSFLVKKKSNSISVGEFYRDAVLQRCSPNLQRHSNSLGPIFDHEDLLKRKRKILSSDDSLRSSKLQSHMRHSDSISSLASEREYITSLDLSANELRDIDALSQKCCISVHLEHLEKLELHQNALTSFPQQLCETLKSLTHLDLHSNKFTSFPSYLLKMSCIANLDVSRNDIGPSVVLDPTVKCPTLKQFNLSYNQLSFVPENLTDVVEKLEQLILEGNKISGICSPLRLKELKILNLSKNHISSLSENFLEACPKVESFSARMNFLAAMPFLPPSMTILKLSQNKFSCIPEAILNLPHLRSLDMSSNDIQYLPGPAHWKSLNLRELLFSHNQISILDLSEKAYLWSRVEKLHLSHNKLKEIPPEIGCLENLTSLDVSYNLELRSFPNEMGKLSKIWDLPLDELHLNFDFKHIGCKAKDIIRFLQQRLKKAVPYNRMKLMIVGNTGSGKTTLLQQLMKTKKSDLGMQSATVGIDVKDWPIQIRDKRKRDLVLNVWDFAGREEFYSTHPHFMTQRALYLAVYDLSKGQAEVDAMKPWLFNIKARASSSPVILVGTHLDVSDEKQRKACMSKITKELLNKRGFPAIRDYHFVNATEESDALAKLRKTIINESLNFKIRDQLVVGQLIPDCYVELEKIILSERKNVPIEFPVIDRKRLLQLVRENQLQLDENELPHAVHFLNESGVLLHFQDPALQLSDLYFVEPKWLCKIMAQILTVKVEGCPKHPKGIISRRDVEKFLSKKRKFPKNYMSQYFKLLEKFQIALPIGEEYLLVPSSLSDHRPVIELPHCENSEIIIRLYEMPYFPMGFWSRLINRLLEISPYMLSGRERALRPNRMYWRQGIYLNWSPEAYCLVGSEVLDNHPESFLKITVPSCRKGCILLGQVVDHIDSLMEEWFPGLLEIDICGEGETLLKKWALYSFNDGEEHQKILLDDLMKKAEEGDLLVNPDQPRLTIPISQIAPDLILADLPRNIMLNNDELEFEQAPEFLLGDGSFGSVYRAAYEGEEVAVKIFNKHTSLRLLRQELVVLCHLHHPSLISLLAAGIRPRMLVMELASKGSLDRLLQQDKASLTRTLQHRIALHVADGLRYL"
    "HSAMIIYRDLKPHNVLLFTLYPNAAIIAKIADYGIAQYCCRMGIKTSEGTPGFRAPEVARGNVIYNQQADVYSFGLLLYDILTTGGRIVEGLKFPNEFDELEIQGKLPDPVKEYGCAPWPMVEKLIKQCLKENPQERPTSAQVFDILNSAELVCLTRRILLPKNVIVECMVATHHNSRNASIWLGCGHTDRGQLSFLDLNTEGYTSEEVADSRILCLALVHLPVEKESWIVSGTQSGTLLVINTEDGKKRHTLEKMTDSVTCLYCNSFSKQSKQKNFLLVGTADGKLAIFEDKTVKLKGAAPLKILNIGNVSTPLMCLSESTNSTERNVMWGGCGTKIFSFSNDFTIQKLIETRTSQLFSYAAFSDSNIITVVVDTALYIAKQNSPVVEVWDKKTEKLCGLIDCVHFLREVMVKENKESKHKMSYSGRVKTLCLQKNTALWIGTGGGHILLLDLSTRRLIRVIYNFCNSVRVMMTAQLGSLKNVMLVLGYNRKNTEGTQKQKEIQSCLTVWDINLPHEVQNLEKHIEVRKELAEKMRRTSVE"
)   #the two adjacent string literals inside the parentheses are joined into one string
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
print(type(protseqLRRK2))   #determine and print the data type

len_protseqLRRK2 = len(protseqLRRK2)   #determine the length of the string
print(len_protseqLRRK2)   #print the value that we calculated

Wcount_protseqLRRK2 = protseqLRRK2.count("W")   #count the number of times W appears in the string
print(Wcount_protseqLRRK2)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise

Define the EcoRI DNA recognition sequence (GAATTC) as a string.
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
DNAseqEcoRI = "GAATTC"   #create a string using double quotation marks
```
````

```{exercise}
:label: my-exercise

Determine the data type and length for this list with substrate concentrations.
```

```{code-block} python
subconc = [0, 1, 2, 4, 8, 15, 30, 60, 125, 250, 500]
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
print(type(subconc))   #determine and print the data type

len_subconc = len(subconc)   #determine the length of the list
print(len_subconc)   #print the value that we calculated
```
````

```{exercise}
:label: my-exercise

Determine the data type, length, and number of times alanine (`"ALA"`) appears for this list of three-letter amino acid codes.
```

```{code-block} python
AA3Letter = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
print(type(AA3Letter))   #determine and print the data type

len_AA3Letter = len(AA3Letter)   #determine the length of the list
print(len_AA3Letter)   #print the value that we calculated

ALAcount_AA3Letter = AA3Letter.count("ALA")   #count the number of times ALA appears in the list
print(ALAcount_AA3Letter)   #print the value that we calculated
```
````

## __<font color=blue>Index and slice</font>__
---

Sequence-based data types (_e.g._ a string, list, or tuple) are __indexed__: the first item has index `[0]`, the second item has index `[1]`, and so on.

If we have a long sequence and want to select an item towards the end, we can count backwards, starting at the index number `[-1]`.
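A minimal sketch of forward and backward indexing, using the EcoRI recognition sequence (GAATTC) as the example string:

```python
DNAseq = "GAATTC"   #EcoRI recognition sequence

first_base = DNAseq[0]             #index 0 is the first item: "G"
second_base = DNAseq[1]            #index 1 is the second item: "A"
last_base = DNAseq[-1]             #index -1 is the last item: "C"
second_to_last_base = DNAseq[-2]   #index -2 is the second-to-last item: "T"

print(first_base, second_base, last_base, second_to_last_base)
```

Negative indices save you from computing `len(DNAseq) - 1` by hand when you need items near the end of a long sequence.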

The syntax for selecting a subset of an existing sequence, a __slice__, is:

```
sequence_name[start:end]
```

The end index is exclusive: the slice goes up to, but does not include, the item at index `end`.

If you have no start number (_i.e._ `[:end]`), the slice starts from the beginning of the sequence.

If you have no end number (_i.e._ `[start:]`), the slice goes to the end of the sequence.
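The slice rules above can be sketched with the EcoRI recognition sequence (GAATTC) as the example string:

```python
DNAseq = "GAATTC"   #EcoRI recognition sequence

first_three = DNAseq[0:3]   #indices 0, 1, 2 (index 3 is not included): "GAA"
first_two = DNAseq[:2]      #no start: begins at the beginning of the sequence: "GA"
last_three = DNAseq[3:]     #no end: runs to the end of the sequence: "TTC"
middle = DNAseq[1:5]        #indices 1 to 4: "AATT"

print(first_three, first_two, last_three, middle)
```

A convenient consequence of the exclusive end is that a slice `[start:end]` within the sequence contains exactly `end - start` items.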

Let's see this in action with _<font color=red>DNA, RNA, and protein sequences as strings</font>_.

```{exercise}
:label: my-exercise

Select the signal peptide (residues 1 to 22: MVSTMLSGLVLWLTFGWTPALA) and the two phosphorylated serine residues (S136 and S200) from this 7B2 protein sequence, written in one-letter amino acid code.
```

```{code-block} python
protseq7B2 = "MVSTMLSGLVLWLTFGWTPALAYSPRTPDRVSETDIQRLLHGVMEQLGIARPRVEYPAHQAMNLVGPQSIEGGAHEGLQHLGPFGNIPNIVAELTGDNTPKDFSEDQGYPDPPNPCPIGKTDDGCLENTPDTAEFSREFQLHQHLFDPEHDYPGLGKWNKKLLYEKMKGGQRRKRRSVNPYLQGQRLDNVVAKKSVPHFSDEDKDPE"
```

````{solution} my-exercise
:label: my-solution
:class: dropdown

Here's one possible solution.

```{code-block} python
SPseq7B2 = protseq7B2[0:22]   #Select residues 1 to 22: start at index 0 (residue 1) and stop before index 22, so the last item taken is index 21 (residue 22)
print(SPseq7B2)   #print the value that we calculated

PhosS136 = protseq7B2[135]   #Select residue 136 (135 as the first item has index 0)
print(PhosS136)   #print the value that we calculated

PhosS200 = protseq7B2[199]   #Select residue 200 (199 as the first item has index 0)
print(PhosS200)   #print the value that we calculated
```
````