# Programming

Programs are sets of instructions intended to achieve a particular task.

Programs generally try to "abstract" a problem so that the same program can be used to achieve many similar, but non-identical tasks.  They do this by using *variables*.  For example:

     Say Hello to Mark
     
This small "program" is a concrete solution to the specific problem of saying Hello to Mark.  It solves that problem perfectly!  But I have to write many many programs if I want to say Hello to each of you in this class...

     Say Hello to Pedro
     Say Hello to Julio
     Say Hello to Maria

This is not efficient.  So we make an "abstract" version of the same program:

     Say Hello to {X}
     
Now this single program can say Hello to whomever {X} is.  Or better:

     Say {S} to {X}

That is a program that can say anything to anybody!  

...and that's all you really need to understand about programming, at this level.  It will take some practice before you become good at abstraction - it's a different way of thinking!  You will use abstraction to automate repetitive tasks, to create data models for a wide range of data structures/content, and to create "generic" solutions to a variety of other problems.
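
In Python, the {S} and {X} placeholders above become variables. Here is a minimal sketch of the same idea; the variable names `greeting` and `name` are just illustrative choices:

```python
# {S} and {X} from the example above, written as Python variables
greeting = "Hello"   # plays the role of {S}
name = "Pedro"       # plays the role of {X}

message = greeting + " " + name
print(message)

# change the variables, and the very same line of code says something else
greeting = "Goodbye"
name = "Maria"
message = greeting + " " + name
print(message)
```

The line that builds `message` never changes; only the values stored in the variables do.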





# The Python Programming Language

**Python** is an *interpreted*, *general purpose*, *object-oriented* language that can be run on any operating system. 

It is pre-installed on most Linux distributions (including this VM).

In the next 60 seconds you will create your first two Python programs!

1. Open a terminal window.
2. Type "python"  
3. At the <code>>>></code> prompt type
<code>
           print("this is my first Python program")
</code>

Done!

Now for something a little bit more interesting:

1. At the <code>>>></code> prompt type
<code>
           i_am_smart=True
           if i_am_smart:
               print("I am going to do well in this course!")
</code>
    
(Note that the second line has a few spaces at the beginning.  This is because whitespace (emptiness) is meaningful in Python: the indentation tells Python which lines belong to the <code>if</code>!  You will hear me complain about this often during this course!  In my other course - Programming Challenges - you will use a much more aesthetically beautiful and elegant language... Ruby!)

(now press *ENTER* twice)
    
### Try changing the value of "i_am_smart" to 'False' - what happens?


Now that you have written two simple Python programs, we will talk more deeply about what programming is, and how to write more complex apps.


### TO EXIT FROM THE PYTHON TERMINAL

type <code>exit()</code>





# Data Types and Structures

Programming requires you to think very carefully about the KIND and STRUCTURE of data.  For example, the digit 9 and the character "9" are *not* the same thing (though some very flexible languages, like Perl, interpret them in whatever way is most appropriate).  For example, hypothetically:

     9 + 9 = 18
     "9" + "9" = "99"

Most languages (like SQL!) support a wide range of data types - integers, floating-point numbers, "strings", date/time, exponentials, 'booleans' - and you should select the correct one for the data you are trying to represent.
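
Python behaves just like the hypothetical 9 + 9 example above. A quick sketch you can try for yourself (the variable names are only for illustration):

```python
number_sum = 9 + 9        # two integers: arithmetic addition
text_sum = "9" + "9"      # two strings: joined end to end

print(number_sum, type(number_sum))
print(text_sum, type(text_sum))
```

The built-in `type()` function tells you which kind of data Python thinks it is holding.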

Data *structures* are slightly different from data *types*.  For example:

* There is data that is "scalar" - it has a single value (like {X} in the example above). It could be Mark, or John but not Mark and John.  
* There are also lists (1,3,5,7,9) 
* and 'Associative lists' (Chapter1 : Introduction, Chapter2 : Installing,  Chapter3 : Running) or **hash**

There are more complex data structures, and you will learn them later in this course.  For the moment, we will only look at simple examples that combine these simple data types with simple data structures.
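
As a rough preview, and assuming the usual Python names for these structures (a *list* for lists, and a *dict*, short for dictionary, for associative lists or hashes), the three examples above might look like this:

```python
teacher = "Mark"                      # scalar: a single value

odd_numbers = [1, 3, 5, 7, 9]         # list: ordered values, found by position

chapters = {                          # dict: values found by a key
    "Chapter1": "Introduction",
    "Chapter2": "Installing",
    "Chapter3": "Running",
}

print(teacher)
print(odd_numbers[0])        # positions start at 0, so this is the first item
print(chapters["Chapter2"])  # look up a value by its key
```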


# Code Documentation

_**I promise you this:  when you write a Python program today, you will not remember how it works next year!  And nobody who wants to use it after you will understand what it does, or how to use it, so they will have to write another program themselves.**_

This is a waste of everyone's time, and a waste of your supervisor's money!!  

For this reason, the first thing I am going to tell you about is **documentation**

Simple documentation in Python is achieved with <code>'# comments'</code>.  The <code>#</code> symbol tells Python that the rest of the line is NOT a part of the program, but is rather a "chat" with whoever is reading the code, to help them understand what the code does.  <code>#</code> can appear by itself, or after other Python commands.

For example

<code>
    
    # the next few lines set-up the initial connection to the database
    # starting from the first record of this year
    
    a=14435  # set a to the first database record of the current year
    if (db.retrieve(a)):   # check that the record exists
        print("record exists")  # inform the user that the record exists
</code>    


The comments allow us to "read" what the intent of each line is, so that we don't have to *interpret* the code.

I want you to use comments, even for simple programs, because it is a **very important good habit (buen hábito).**

<pre>

</pre>
# Reserved Words

There are words in Python that are "reserved" because they have a special meaning in the language.  This means that you cannot use them for variable names, or the names of functions inside of your app.   These are


| Reserved Words |  - | -  | -  | -  |  
| ---    | --- | --- | --- | --- |
| False |	class |	finally  |	is |	return |
| None  |	continue |	for |	lambda |	try |
| True |	def 	| from |	nonlocal |	while  |
| and |	del |	global |	not |	with  |
| as | 	elif |	if |	or |	yield  |
| assert |	else |	import | 	pass |   |
| break |	except |	in |	raise |    |
 
 
 
Note that the reserved words "True", "False" and "None" are capitalized.  All the rest are lower-case.  (Python 3.7 and later also reserve "async" and "await".)
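
You don't have to memorize the table. Python's standard `keyword` module can list the reserved words for whatever version you are running (newer versions may reserve a few more than are shown in the table). A short sketch:

```python
import keyword

print(keyword.kwlist)                      # every reserved word in this Python version

print(keyword.iskeyword("class"))          # True: reserved, cannot be a variable name
print(keyword.iskeyword("my_variable"))    # False: safe to use
```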
 
 

<pre>


</pre>
# Data Types

Data Types means the "nature" of the data that Python can represent.  Generally speaking, it can represent all of the "core" data types such as Numbers and words/letters/strings, and there are some other datatypes that are useful when coding.

## Numbers

Numbers are represented as follows:
<code>
123
1234
1_234
1_234.56
1.234e-56
0xffff  # hexadecimal
0b010101 # binary
0o377 # octal  (zero + 'oh')
</code>

In [1]:
print(123)
print(1234)
print(1_234)
print(1_234.56)
print(1.234e-56)
print(0xffff)     # hexadecimal
print(0b010101)   # binary
print(0o11)       # octal  (zero + 'oh')

123
1234
1234
1234.56
1.234e-56
65535
21
9


## Strings
In most cases, Strings are enclosed in single (') or double (") quotes.

For example:

print("hello Python programmer!")



In [3]:
print("hello Python programmer!")
print("hello #1 Python programmer!")   # digits can be part of a string
print("123")   # digits can be the entire string
print("hello 1")

hello Python programmer!
hello #1 Python programmer!
123
hello 1


<pre>

</pre>
There are some special characters in Strings.  A few common examples are:

    \n = 'newline'
    \r = 'return' (important for MS Windows!)
    \t = 'tab'

So for example:

    

In [11]:
print("this is my \n\t\t first \t Python \t app")
print("this is my \n\t\t first \t Python \t app")
print("this is my \r\t first")
print("this is my \nfirst iphone")

this is my 
		 first 	 Python 	 app
this is my 
		 first 	 Python 	 app
this is my 	 first
this is my 
first iphone


Or you can create multi-line strings using triple-quotes:


In [12]:
print("""This
is
a multi-
line 
string""")

This
is
a multi-
line 
string


In [13]:
print("this is "  +  " a way to add strings together")
print(" say this twice" * 2) # '*' is the multiplication operator
print("")
# you can also take substrings of strings
print("say what?"[2]) # the number in [] is the "index" position, starting from 0 (zero)
print("say what?"[2:6]) # the numbers in [] are the start index, ":" UP TO, NOT INCLUDING, the second index


this is  a way to add strings together
 say this twice say this twice

y
y wh


# use what you know

* create a Python script that starts with "say what?" and prints "say something!" using substring and "+" operators

<p style="visibility:hidden">
    print("say what?"[0:3] + " something!")
</p>

In [20]:
print("say what?"[0:4] + "something!")
print("say what?"[0:4] + "something!" * 2)
print( ("say what?"[0:4] + "something!") * 2)


say something!
say something!something!
say something!say something!


# Transforming strings to digits and digits to strings

NOTE: "100" and 100 are NOT the same in Python (unlike in Perl!).  "100" is a string, and 100 is an integer.

So this does not work because we are trying to add together an integer and a string:

In [21]:
print(1 + " more time!")

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Therefore, you might need to force them to represent themselves as numbers or as strings.

In Python, there are built-in functions you can call on the various kinds of data-type to achieve this.  

    int() = "to integer"
    float() = "to float"
    str() = "to string"

For example:

In [23]:
print(str(1) + " more time")  # from integer to string
print(float("1") + 3)  # from string to float
print(float(1) + 3)  # from integer to float
print(int("1") + 3)  # from string to int

#when you use float you get a decimal point


1 more time
4.0
4.0
4
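
Conversion only works when the text really looks like a number. As a sketch of what happens otherwise (the helper name `to_int_or_none` is hypothetical, and `try`/`except` is something we have not covered yet):

```python
def to_int_or_none(text):
    """Return text converted to an integer, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:          # raised when the string is not a valid integer
        return None

print(to_int_or_none("100"))    # a valid integer string
print(to_int_or_none("1.5"))    # int() does not accept a decimal point in a string
print(to_int_or_none("one"))    # not a number at all
```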


## Prove that you understand

Create some python that creates the following output:



    Dear Mark Wilkinson,
    
         You are the
              #1 teacher in the world!


<span hidden>
print("""
Dear Mark Wilkinson,

\tYou are the\n\t\t#""" + str(1) + " teacher in the world!")
</span>

In [29]:
print("Dear Mark Wilkinson, \n\n\t You are the \n\t #1 teacher in the world!")

Dear Mark Wilkinson, 

	 You are the 
	 #1 teacher in the world!


## Variables and Constants

Other than the reserved words, you can use any set of characters as your variables.  In Python, variables are usually written in lower\_case, using a \_ to separate words.  Other languages, like Java, use "CamelCase".  Please don't do this - it will annoy other Python programmers!  :-)

The rules are:

* Variables may start with a '_' or a letter character
* after the starting character, it may be letters or digits or underscores 
* variables are case-sensitive (A_B is not the same as a_b)
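
If you are unsure whether a name is legal, you can ask Python. This sketch combines the built-in string method `isidentifier()` with the standard `keyword` module; the helper name `is_valid_variable_name` is only for illustration:

```python
import keyword

def is_valid_variable_name(name):
    """True if name is a legal identifier and is not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["gene_name", "_gene", "2nd_gene", "gene-name", "class"]:
    print(candidate, is_valid_variable_name(candidate))

print("A_B" == "a_b")   # False: names are case-sensitive
```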


Constants (variables that SHOULD not change their value) are indicated in CAPITAL_LETTERS... however, unlike some other programming languages, Python does not enforce this rule!  (keep out of my yard :-) )

**SCOPE**

Scope is an important concept.  A variable always exists in a particular "scope" - meaning, "within a particular context within that program".  Some variables will be "global" in their scope, others will be "local" to a particular function.

We have not yet talked about 'functions', but just think of a 'function' as a sub-program inside of your program, that can be re-used many times by different parts of your software.

Here is an example of scope.  You don't need to understand this code right now!  Just listen to what I tell you about it:


In [30]:

a = 1  # assign the value of 1 to a
b = 2  # assign 2 to b
c = 9

def add_two_numbers(x, y):
    c = x + y  # we are defining 'c' here, so it has only a local scope
    print("c inside of the function is " + str(c))

print ("a is " + str(a))
print ("b is " + str(b))
print ("c is (before function) " + str(c))
print ("....call the function...")
add_two_numbers(a,b)

print ("c is (after function) " + str(c))



a is 1
b is 2
c is (before function) 9
....call the function...
c inside of the function is 3
c is (after function) 9
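
Because the `c` inside the function is local, the usual way to get a result out of a function is to `return` it. A minimal variation on the example above (again, you don't need to understand every detail yet):

```python
c = 9

def add_two_numbers(x, y):
    c = x + y      # local 'c': exists only while the function runs
    return c       # hand the value back to whoever called the function

result = add_two_numbers(1, 2)

print("result is " + str(result))
print("c is still " + str(c))   # the global 'c' was never touched
```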



## Pre-defined variables and CONSTANTS

We won't discuss all of these (there are many!), only a few that are particularly important.  Many of the others matter only in certain situations.  The critical ones for us right now are:   


| Function/CONST|  meaning  |
| -----  | ------  | 
|  sys.stderr  |   the filehandle for printing errors  |
|  sys.stdin   |   the filehandle for input  |
|  sys.stdout  |   the filehandle for standard output (usually the terminal window)  |
|  False    |    the value "false"  |
|  True   |    the value "true"  |
|  None   |    the value "does not exist"  |
|  sys.argv    |    the command-line arguments |
  
    

In [None]:
import sys
print (sys.argv)
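
The three filehandles can be used directly with `print()` through its `file` argument. A small sketch (the messages are made up):

```python
import sys

print("normal output goes here", file=sys.stdout)   # same as a plain print()
print("error messages go here", file=sys.stderr)    # kept separate from normal output

# sys.argv is a list: the script name first, then any command-line arguments
arguments = sys.argv[1:]
print("number of arguments:", len(arguments))
```

Keeping errors on `sys.stderr` means that someone who saves your program's output to a file does not get error messages mixed into their results.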


## Ranges - a Python data-type similar to a constant

A "range" is a sequence of numbers from "start" to "stop" with a particular "step" value between them.  Ranges in Python are slightly different from many other languages, because Python treats a range similar to a constant - it is "one thing", that has properties.

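Ranges are specified using the <code>range()</code> function.  Note that the "stop" value is NOT included in the range, and that (unlike Ruby's "a".."z") Python's <code>range()</code> works only with integers:

range(1, 11)    - from 1 to 10

range(0, 30, 3)    - from 0 to 27, in steps of 3

5 in range(1, 11)    - does the range of 1 to 10 include 5?  (True)

34 in range(1, 11)    - does the range of 1 to 10 include 34? (False)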



In [None]:
a = range(1,10)  # numbers from 1 up to (NOT including) 10, with INDEX POSITIONS starting from 0
print(a)  # see, it is a "thing"
print(list(a))  # to see its content, you pass it to functions or call methods
print(a.index(7))  # where is the number "7" in this range?
print(a[3])  # what is the element at index 3 (the fourth element) of the range?
# you can also create ranges from ranges
print(a[1:4])
print("")
print("range 0 to 30 step 3")
print(list(range(0,30,3)))

#why use a range?

a = range(1,2)
b = range(1,100000000000000000000000)

# that is much easier than a = [1,2,3,4,5,6,7,8,9,10,......1000000000000000000]
# also smaller memory footprint
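
A short sketch of the "smaller memory footprint" point. The exact byte counts depend on your Python build, so treat the printed sizes as indicative only:

```python
import sys

# a range stores only start, stop and step; a list stores every element
as_range = range(1, 1_000_000)
as_list = list(as_range)
print(sys.getsizeof(as_range), "bytes for the range")
print(sys.getsizeof(as_list), "bytes for the list")

# even an enormous range can answer "is this number inside?" by arithmetic,
# without ever generating the numbers
big = range(1, 100000000000000000000000)
print(10**20 in big)
```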



<pre>


</pre>

# Regular Expressions

Regular expressions are a tool that allows you to do **complex searches of text**.  A regular expression is a set of characters that encodes a "pattern", which can then be applied to a piece of text to look for things that match the pattern.

When I started my career in bioinformatics, I used regular expressions *every day*, so we will spend a lot of time on this section.  They are incredibly useful, especially in a field like biology where much of the information is in the form of large, complex "wordy" descriptive documents.

For example, this is a GenBank record:
     
<code>
LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
  JOURNAL   Yeast 10 (11), 1503-1509 (1994)
  PUBMED    7871890
REFERENCE   2  (bases 1 to 5028)
  AUTHORS   Roemer,T., Madden,K., Chang,J. and Snyder,M.
  TITLE     Selection of axial growth sites in yeast requires Axl2p, a novel
            plasma membrane glycoprotein
  JOURNAL   Genes Dev. 10 (7), 777-793 (1996)
  PUBMED    8846915
REFERENCE   3  (bases 1 to 5028)
  AUTHORS   Roemer,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University, New
            Haven, CT, USA
FEATURES             Location/Qualifiers
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
                     /chromosome="IX"
                     /map="9"
     CDS             <1..206
                     /codon_start=3
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
                     /db_xref="GI:1293614"
                     /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
                     AEVLLRVDNIIRARPRTANRQHM"
     gene            687..3158
                     /gene="AXL2"
     CDS             687..3158
                     /gene="AXL2"
                     /note="plasma membrane glycoprotein"
                     /codon_start=1
                     /function="required for axial budding pattern of S.
                     cerevisiae"
                     /product="Axl2p"
                     /protein_id="AAA98666.1"
                     /db_xref="GI:1293615"
                     /translation="MTQLQISLLLTATISLLHLVVATPYEAYPIGKQYPPVARVNESF
                     TFQISNDTYKSSVDKTAQITYNCFDLPSWLSFDSSSRTFSGEPSSDLLSDANTTLYFN
         </code>
         
                     
What is the gene name for this record?  

If I give you 10 of these, could you give me the gene names for those 10?  

If I gave you 35,000 (i.e. ~the number of genes in the human genome) could you give me the gene names for all of them?  

This is why we need regular expressions - to look for patterns in text in order to extract specific information from massive volumes of records.
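
As a taste of where we are going, here is a sketch that pulls the gene names out of a (shortened) copy of the record above. It uses `re.findall()`, a function from Python's `re` module that returns every match rather than only the first. The variable names and the shortened record are just for illustration, and the pieces of the pattern are explained step by step in the sections that follow.

```python
import re

# a few lines copied from the FEATURES block of the GenBank record
record = '''
     gene            687..3158
                     /gene="AXL2"
     CDS             687..3158
                     /gene="AXL2"
                     /product="Axl2p"
'''

# capture whatever sits between the quotes after /gene=
gene_names = re.findall(r'/gene="([^"]+)"', record)

unique_names = sorted(set(gene_names))   # the same gene is listed more than once
print(unique_names)
```

The same few lines would work unchanged on 10 records or on 35,000.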



## Regular expression structure in Python

We can use regular expressions to look for text, or for a specific pattern of text, in a string or a file.

There is [a good tutorial](https://www.tutorialspoint.com/python3/python_reg_expressions.htm) on regular expressions at TutorialsPoint.  It goes into more detail than I will here.

Regular expressions are applied using Python's regular expression module (called "re").   

A "search" function can be called, that takes a regular expression pattern, and a piece of text as arguments, plus some optional "flags" that modify how the match is executed.

<code> re.search(pattern, text, flags) </code>
     
FIRST, I will show you one somewhat complicated regular expression just to show you what they can do.  Then we will learn step-by-step how to build them.

In [31]:
#!/usr/bin/python3
import re  # this brings the python regular expression object into your program

text = "Cats are smarter than dogs"

matchObj = re.search( r'(ca\w+) are .*?(d.*)', text, re.I)  # this should match "Cats" and "dogs"; the re.I flag makes it case-insensitive

if matchObj:
   print ("matchObj.group(0) : ", matchObj.group(0))  
   print ("matchObj.group() : ", matchObj.group())  # group() can also be used instead of group(0)
   print ("matchObj.group(1) : ", matchObj.group(1))
   print ("matchObj.group(2) : ", matchObj.group(2))
else:
   print ("No match!!")


matchObj.group(0) :  Cats are smarter than dogs
matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  dogs



## Simple searches

The simplest use of regular expressions is for "search" - does the pattern exist?  Yes or no.  We will now search for simple characters:  the letter "A", the number "5", the character ">".  The regular expression pattern for these is straightforward:

<code>  r'A' </code>      <code>  r'5' </code>      <code>  r'>' </code>  

Search happens from "left" to "right".

For example:


In [32]:
#!/usr/bin/python3
import re  # this brings the python regular expression object into your program

text = "We will learn <A LOT> in the next 2 weeks!  I hope..."
print(text)


matchObj = re.search( r'A', text)  # this should match "A"

if matchObj:
   print ("match r'A' : ", matchObj.group())
else:
   print ("No match!!")

matchObj = re.search( r'2', text)  # this should match "2"

if matchObj:
   print ("match r'2' : ", matchObj.group())
else:
   print ("No match!!")

matchObj = re.search( r'>', text)  # this should match ">"

if matchObj:
   print ("match r'>' : ", matchObj.group())
else:
   print ("No match!!")

matchObj = re.search( r'}', text)  # this should fail to match "}" because } doesn't exist in the string

if matchObj:
   print ("match r'}' : ", matchObj.group())
else:
   print ("No match  r'}'")

# can also search for multiple characters
matchObj = re.search( r'A LOT', text)  # this should match "A LOT" 
if matchObj:
   print ("match r'A LOT' : ", matchObj.group())
else:
   print ("No match!!")


We will learn <A LOT> in the next 2 weeks!  I hope...
match r'A' :  A
match r'2' :  2
match r'>' :  >
No match  r'}'
match r'A LOT' :  A LOT


## Now you try

* Create the text "Selection of axial growth sites in yeast requires Axl2p"
* create a regular expression to match the protein name
* match "yeast requires"

In [41]:
import re
text2 = "Selection of axial growth sites in yeast requires Axl2p"
print (text2)

matchObj = re.search(r'Axl2p',text2)
if matchObj:
    print("match r'Axl2p' : ", matchObj.group())
else:
    print("no match r'Axl2p'")  # matchObj is None here, so there is no .group() to call
    
matchObj = re.search(r'yeast requires',text2)
if matchObj:
    print("match r'yeast requires' : ", matchObj.group())
else:
    print("no match r'yeast requires'")

Selection of axial growth sites in yeast requires Axl2p
match r'Axl2p' :  Axl2p
match r'yeast requires' :  yeast requires


## 'arbitrary' character - matching a little more complex

So far, regular expressions don't seem very useful.  We have to already know the name of the protein before we can match it!  How do we match things that we *don't* already know?

Regular expressions have special symbols that represent 'kinds' of characters.  For example:

<code>
        \w --> a 'word' character (a-z A-Z 0-9 _)
        \W --> a 'non-word' character (everything EXCEPT a word character)
        \d --> a 'digit' (0-9) = any digit
        \D --> a non-digit
        \s --> a whitespace character (newline, return, space, tab)
        \S --> non-whitespace (anything other than a whitespace character)
        .  --> ANY character EXCEPT the newline (\n) character.
        ^  --> anchor to the beginning (the regular expression MUST match starting at the first character of the text)
        $  --> anchor to the end (the regular expression MUST match up to the last character of the text)
        \b --> 'word boundary'  (the invisible, zero-width position between a word character and a non-word character, e.g. between the end of a word and the space after it)
        \t --> tab
        \n --> newline
        \r --> return
        \  --> make a special character non-special (e.g. \. means "period")
        [a-z] --> the range of lowercase letters from a to z
        [A-Z] --> the range of uppercase letters from A to Z
        [0-9] --> the range of digits from 0 to 9 = digits from one number to another
</code>

As before, matches are read from left to right, and the first match found is reported.

For example:
    

In [43]:
#!/usr/bin/python3
import re  # this brings the python regular expression object into your program

text = "We will learn <A LOT> in the next 2 weeks!  I hope... yes"
print(text)


matchObj = re.search( r'\w', text)  # this should match the first 'word character'
if matchObj:
   print ("match r'\w' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'\w\w', text)  # this should match the first two word characters
if matchObj:
   print ("match  r'\w\w' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'\d', text)  # this should match the first digit
if matchObj:
   print ("match r'\d' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'\d\d', text)  # this should match the first two digits (fail!)
if matchObj:
   print ("match  r'\d\d' :  ", matchObj.group())
else:
   print ("No match of \d\d!!")



matchObj = re.search( r'\d\s\w', text)  # this should match digit, space, word character
if matchObj:
   print ("match r'\d\s\w': ", matchObj.group())
else:
   print ("No match!!")



matchObj = re.search( r'\S\S\s\S\S\S\S', text)  # this should match the first instance of 2 nonspace, space, 4 nonspace
if matchObj:
   print ("match r'\S\S\s\S\S\S\S' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'>............', text)  # this should match the first '>' and the 12 characters (of any kind) that follow it
if matchObj:
   print ("match r'>............' : ", matchObj.group())
else:
   print ("No match!!")



matchObj = re.search( r'[A-Z]', text)  # this should match the first capital letter
if matchObj:
   print ("match r'[A-Z]': ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'[a-z]', text)  # this should match the first lowercase letter
if matchObj:
   print ("match r'[a-z]' : ", matchObj.group())
else:
   print ("No match!!")



matchObj = re.search( r'^[a-z]', text)  # this should match the first lowercase letter AT THE BEGINNING OF THE TEXT
if matchObj:
   print ("match ^[a-z] : ", matchObj.group())
else:
   print ("No match of lowercase at beginning of text!!")



matchObj = re.search( r'[a-z]$', text)  # this should match the lowercase letter AT THE END OF THE TEXT
if matchObj:
   print ("match r'[a-z]$' : ", matchObj.group())
else:
   print ("No match!!")



matchObj = re.search( r'\.\w', text)  # this should match a period immediately followed by a word character (fail!)
if matchObj:
   print ("match r'\.\w' : ", matchObj.group())
else:
   print ("No match of period followed by word!!")


matchObj = re.search( r'\..\w', text)  # this should match a period, then any one character, then a word character
if matchObj:
   print ("match r'\..\w' : ", matchObj.group())
else:
   print ("No match of period followed by word!!")

matchObj = re.search( r'\w{5}', text)  # match a word character 5 times
if matchObj:
   print ("match r'\w{5}' : ", matchObj.group())
else:
   print ("No match of five characters")



We will learn <A LOT> in the next 2 weeks!  I hope... yes
match r'\w' :  W
match  r'\w\w' :  We
match r'\d' :  2
No match of \d\d!!
match r'\d\s\w':  2 w
match r'\S\S\s\S\S\S\S' :  We will
match r'>............' :  > in the next
match r'[A-Z]':  W
match r'[a-z]' :  e
No match of lowercase at beginning of text!!
match r'[a-z]$' :  s
No match of period followed by word!!
match r'\..\w' :  . y
match r'\w{5}' :  learn
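
The `if`/`else` block is the same every time, so it can be wrapped in a small function. This is only a convenience sketch; `show_match` is a hypothetical helper, not part of the `re` module:

```python
import re

def show_match(pattern, text):
    """Print and return the first match of pattern in text, or None."""
    match_obj = re.search(pattern, text)
    if match_obj:
        print("match", repr(pattern), ":", match_obj.group())
        return match_obj.group()
    print("No match for", repr(pattern))
    return None

text = "We will learn <A LOT> in the next 2 weeks!  I hope... yes"
show_match(r'\d', text)       # first digit
show_match(r'\d\d', text)     # two digits in a row: no match
show_match(r'\w{5}', text)    # first run of five word characters
```

With a helper like this, trying a new pattern takes one line instead of five.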


## Try some more matches for yourself!


In [47]:
matchObj = re.search( r'[a-z]{5}', text)  # match a lowercase letter 5 times in a row
if matchObj:
   print ("match r'[a-z]{5}' : ", matchObj.group())
else:
   print ("No match of five characters")


match r'[a-z]{5}' :  learn


## test yourself

      selection of axial growth sites in yeast requires Axl2p

* find the gene name in this sentence
 * assume that gene names are always 5 characters long, 
 * gene names are at the end of a sentence
 * gene name patterns are three letters, a number, and a letter

In [60]:
text="selection of axial growth sites in yeast requires Axl2p"


#assuming gene names are 5 characters long
matchObj = re.search( r'\s\w{5}', text)  # match a space followed by 5 word characters (finds " axial", not the gene name!)
if matchObj:
   print ("match r'\s\w{5}' : ", matchObj.group())
else:
   print ("No match of five characters")

#assuming gene names are at the end of the sentence using $
matchObj = re.search( r'\w{5}$', text)  # match a word character 5 times
if matchObj:
   print ("match r'\w{5}$' : ", matchObj.group())
else:
   print ("No match of five characters")

#assuming gene names are 3 letters, a number, and a letter
matchObj = re.search( r'\w{3}\d\w{1}', text)  # match 3 word characters, a digit, then 1 word character
if matchObj:
   print ("match r'\w{3}\d\w{1}' : ", matchObj.group())
else:
   print ("No match of gene name pattern")




match r'\s\w{5}' :   axial
match r'\w{5}$' :  Axl2p
match r'\w{3}\d\w{1}' :  Axl2p


## Matching more than once

There are other symbols that have a special meaning.  
<code>         
`     + --> the preceding pattern should match "one or more times"`
`     * --> the preceding pattern should match "zero or more times"`
`     ? --> optional OR "limit to the smallest match possible"`
</code>

They are used in this way:

<code>
`    r'\w+'  --> one or more word characters`
`    r'\d*'  --> zero or more digits`
`    r'\s*\w'  --> zero or more spaces, followed by a word character`
</code>

These special symbols have surprising (and often bad!) behaviors!  They are probably the hardest part to understand...

See their behavior below:


In [62]:
text = 'selection of axial       growth sites in yeast requires Axl2p'

matchObj = re.search( r'\w+', text)  # this should match the first word of any length 
if matchObj:
   print ("match r'\w+' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'\w+$', text)  # this should match the last word of any length
if matchObj:
   print ("match r'\w+$' : ", matchObj.group())
else:
   print ("No match!!")



matchObj = re.search( r'axial\s*\w+', text)  # this should match 'axial', any number of spaces (including zero!), and the next word
if matchObj:
   print ("match r'axial\s*\w+' : ", matchObj.group())
else:
   print ("No match!!")


print()

print("TRYING r'.*' --> match everything")
matchObj = re.search( r'.*', text)  # this should match any character any number of times (i.e. everything!)
if matchObj:
   print ("match : ", matchObj.group())
else:
   print ("No match!!")


print("TRYING r'.*\s' --> maximal match")
matchObj = re.search( r'.*\s', text)  # this should match any character any number of times, followed by a space
if matchObj:
   print ("match : ", matchObj.group())
else:
   print ("No match!!")

#you take everything you possibly can and then backup from right to left looking for the space 

print("TRYING r'.*?\s'  --> minimal match")
matchObj = re.search( r'.*?\s', text)  # any character any number of times, followed by a space, smallest match possible
if matchObj:
   print ("match : ", matchObj.group())
else:
   print ("No match!!")

#the ? will make it take as little as it can so we changed the direction of the search



match r'\w+' :  selection
match r'\w+$' :  Axl2p
match r'axial\s*\w+' :  axial       growth

TRYING r'.*' --> match everything
match :  selection of axial       growth sites in yeast requires Axl2p
TRYING r'.*\s' --> maximal match
match :  selection of axial       growth sites in yeast requires 
TRYING r'.*?\s'  --> minimal match
match :  selection 
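
The difference between maximal and minimal matching matters most when the "stop" character appears more than once. A sketch with a made-up piece of tagged text:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r'<.*>', text)    # .* grabs as much as it can
lazy = re.search(r'<.*?>', text)     # .*? grabs as little as it can

print("greedy :", greedy.group())
print("lazy   :", lazy.group())
```

The greedy version runs from the first `<` all the way to the last `>` in the text; the lazy version stops at the first `>` it meets.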


## matching this or that, or anything other than this or that

It is possible to be more selective than "any character".  Maybe you only want words that start with a vowel!  (why?  I have no idea... ;-) )

To make these kinds of expressions, use square brackets.  For example:  <code> [aeiou] </code>

To do the OPPOSITE, use square brackets with a '^'.  For example:  <code> [^aeiou] </code> **(not a, e, i, o, or u)**

You can also say "or" for multiple characters using a '|' ("pipe").  For example <code> cat|dog </code>


In [69]:
text = 'selection of axial       growth sites in yeast requires Axl2p'

matchObj = re.search( r'[aeiou]\w+', text)  # aeiou followed by the rest of the word
if matchObj:
   print ("match r'[aeiou]\w+' : ", matchObj.group(), "(not what we expected?)")
else:
   print ("No match!!")


matchObj = re.search( r'\b[aeiou]\w+', text)  # word boundary, aeiou, followed by the rest of the word
if matchObj:
   print ("match r'\\b[aeiou]\w+': ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'\b[^aeiou]\w+', text)  # word boundary, not a vowel, followed by the rest of the word
if matchObj:
   print ("match r'\\b[^aeiou]\w+' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'\b[AEIOU]\w+', text)  # word boundary, capital vowel, followed by the rest of the word
if matchObj:
   print ("match r'\\b[AEIOU]\w+' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'axial|growth', text)  # axial or growth (matches the first success)
if matchObj:
   print ("match r'axial|growth' : ", matchObj.group())
else:
   print ("No match!!")


matchObj = re.search( r'nothing|growth', text)    # nothing or growth (matches the first success)
if matchObj:
   print ("match r'nothing|growth' : ", matchObj.group())
else:
   print ("No match!!")


match r'[aeiou]\w+' :  election (not what we expected?)
match r'\b[aeiou]\w+':  of
match r'\b[^aeiou]\w+' :  selection
match r'\b[AEIOU]\w+' :  Axl2p
match r'axial|growth' :  axial
match r'nothing|growth' :  growth
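
`re.search()` stops at the first success. If you want every word that starts with a vowel, the `re` module also offers `re.findall()`, which returns all non-overlapping matches as a list. A sketch on the same text:

```python
import re

text = 'selection of axial       growth sites in yeast requires Axl2p'

lowercase_vowel_words = re.findall(r'\b[aeiou]\w+', text)
any_vowel_words = re.findall(r'\b[aeiou]\w+', text, re.I)   # re.I ignores case

print(lowercase_vowel_words)
print(any_vowel_words)
```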


## Capturing sub-portions of what you match

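You can use round brackets () to capture parts of regular expressions.  These captured parts become available through the match object's .group() function: .group(1) is the first pair of brackets, .group(2) the second, and so on.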

For example:



In [68]:
text = "A: we will learn regular expressions. B: we will apply that knowledge."


matchObj = re.search( r'[A-Z]:\s(.*?\.)', text)  # match the index letter, then CAPTURE the rest of the sentence
if matchObj:
   print ("match : ", matchObj.group())
   print ("match group 1 : ", matchObj.group(1))
else:
   print ("No match!!")


matchObj = re.search( r'[A-Z]:\s(.*?\.).*?[A-Z]:\s(.*?\.)', text)  # notice the .*?  ....remember for later
if matchObj:
   print ("match : ", matchObj.group())
   print ("match group 1 : ", matchObj.group(1))
   print ("match group 2 : ", matchObj.group(2))
else:
   print ("No match!!")


match :  A: we will learn regular expressions.
match group 1 :  we will learn regular expressions.
match :  A: we will learn regular expressions. B: we will apply that knowledge.
match group 1 :  we will learn regular expressions.
match group 2 :  we will apply that knowledge.
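The cells above read the captures one at a time with <code>.group(1)</code> and <code>.group(2)</code>.  As a small sketch (the variable names `match_obj`, `whole_match` and `all_groups` are just illustrative choices), this is how the same captures can be collected in one step with <code>.groups()</code>:

```python
import re

text = "A: we will learn regular expressions. B: we will apply that knowledge."

# Two capture groups, one per sentence (same pattern as in the cell above)
match_obj = re.search(r'[A-Z]:\s(.*?\.).*?[A-Z]:\s(.*?\.)', text)

if match_obj:
    whole_match = match_obj.group(0)   # group(0) is the same as group(): the entire match
    all_groups = match_obj.groups()    # a tuple holding every captured group, in order
    print(whole_match)
    print(all_groups)
```

Because <code>.groups()</code> returns a tuple, you can index it like a list, but remember that the tuple starts at index 0 while <code>.group(1)</code> is the first capture.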


## Case-insensitive matches and multi-line matches

We haven't yet discussed the "flags" that can be passed as the third argument to a regular expression search.

There are a variety of flags (you can look them up yourself!), but we will only discuss two of them:  case-insensitive search, and multi-line search.  These are represented by the flags re.I (case-insensitive) and re.S (lets a match continue across line breaks).  Note that the long name of re.S is re.DOTALL; it is a different flag from re.M (re.MULTILINE).

Case insensitive example:

<code>
    matchObj = re.search( r'birne', text, re.I) # matches Birne or birne or bIRne or...
</code>


In [70]:

text = 'selection of axial       growth sites in yeast requires Axl2p'

matchObj = re.search( r'axl2p', text, re.I)  # axl2p regardless of capital letters
if matchObj:
   print ("match : ", matchObj.group())
else:
   print ("No match!!")

match :  Axl2p


<pre>

</pre>
Multi-line matches require a bit more explanation.  Remember the definition of the <code> . </code> symbol?  It was "matches anything except newline".  We should first see what that means:



In [71]:
# we used the enter key to get onto the next line, and .* won't match that newline

text = """A: we will learn regular expressions. 
B: we will apply that knowledge."""

# this is exactly the same regular expression as in the previous example, but now the text is split into two lines
matchObj = re.search( r'[A-Z]:\s(.*?\.).*?[A-Z]:\s(.*?\.)', text)  
if matchObj:
   print ("match group 1 : ", matchObj.group(1))
   print ("match group 2 : ", matchObj.group(2))
else:
   print ("No match!!")




No match!!


<pre>

</pre>
Why is there no match?   Because the <code>.</code> symbol does not match the newline character --> the regular expression cannot cross the newline --> the second part, "B: we will apply that knowledge.", is never reached.

The <code>re.S</code> flag changes the definition of <code>.</code> so that it now also matches newline (i.e. it matches EVERYTHING!)

Let's try again with the <code>re.S</code> flag:


In [72]:


text = """A: we will learn regular expressions. 
B: we will apply that knowledge."""

# this is exactly the same regular expression and the same two-line text as in the previous example, but now we pass the re.S flag
matchObj = re.search( r'[A-Z]:\s(.*?\.).*?[A-Z]:\s(.*?\.)', text, re.S)  
if matchObj:
   print ("match group 1 : ", matchObj.group(1))
   print ("match group 2 : ", matchObj.group(2))
else:
   print ("No match!!")




match group 1 :  we will learn regular expressions.
match group 2 :  we will apply that knowledge.
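Because the name "multi-line" is easy to confuse with Python's separate re.M flag, here is a minimal sketch contrasting the two.  The short two-line `text` and the variable names are made up for this illustration; only `re.search`, `re.findall`, `re.S` and `re.M` come from the standard `re` module.

```python
import re

text = "A: first sentence.\nB: second sentence."

# Without a flag, "." stops at the newline, so the pattern cannot reach line 2
no_flag = re.search(r'first.*second', text)

# re.S (re.DOTALL): "." also matches the newline
dotall = re.search(r'first.*second', text, re.S)

# re.M (re.MULTILINE): "^" matches at the start of EVERY line, not only the first
line_starts_default = re.findall(r'^[A-Z]:', text)
line_starts_multiline = re.findall(r'^[A-Z]:', text, re.M)

print(no_flag)
print(dotall.group())
print(line_starts_default, line_starts_multiline)
```

In short: re.S changes what <code>.</code> matches, while re.M changes where <code>^</code> and <code>$</code> match.  For the examples in this section we only need re.S.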


<pre>


</pre>
## Now you try

* use Regular Expressions to extract the Genbank Reference Sequence identifier (NM_115294.6) from the first FASTA record below
* use Regular Expressions to extract the gene name (AP3) from the title line of the first record
* use Regular Expressions to extract the nucleotide sequence from the first FASTA record
* **prove that your Regular Expressions are "good" by extracting the SAME INFORMATION from the second FASTA record**  (Do you remember the very first day we talked about "abstraction"?  This is another example of abstraction - a single regular expression that matches every possible case)

<pre>

>NM_115294.6 Arabidopsis thaliana K-box region and MADS-box transcription factor family protein (AP3), mRNA
AAAAAAATCAGTTTACATAAATGGAAAATTTATCACTTAGTTTTCATCAACTTCTGAACTTACCTTTCAT
GGATTAGGCAATACTTTCCATTTTTAGTAACTCAAGTGGACCCTTTACTTCTTCAACTCCATCTCTCTCT
TTCTATTTCACTTCTTTCTTCTCATTATATCTCTTGTCCTCTCCACCAAATCTCTTCAACAAAAAGATTA
AACAAAGAGAGAAGAATATGGCGAGAGGGAAGATCCAGATCAAGAGGATAGAGAACCAGACAAACAGACA
AGTGACGTATTCAAAGAGAAGAAATGGTTTATTCAAGAAAGCACATGAGCTCACGGTTTTGTGTGATGCT
AGGGTTTCGATTATCATGTTCTCTAGCTCCAACAAGCTTCATGAGTATATCAGCCCTAACACCACAACGA
AGGAGATCGTAGATCTGTACCAAACTATTTCTGATGTCGATGTTTGGGCCACTCAATATGAGCGAATGCA
AGAAACCAAGAGGAAACTGTTGGAGACAAATAGAAATCTCCGGACTCAGATCAAGCAGAGGCTAGGTGAG
TGTTTGGACGAGCTTGACATTCAGGAGCTGCGTCGTCTTGAGGATGAAATGGAAAACACTTTCAAACTCG
TTCGCGAGCGCAAGTTCAAATCTCTTGGGAATCAGATCGAGACCACCAAGAAAAAGAACAAAAGTCAACA
AGACATACAAAAGAATCTCATACATGAGCTGGAACTAAGAGCTGAAGATCCTCACTATGGACTAGTAGAC
AATGGAGGAGATTACGACTCAGTTCTTGGATACCAAATCGAAGGGTCACGTGCTTACGCTCTTCGTTTCC
ACCAGAACCATCACCACTATTACCCCAACCATGGCCTTCATGCACCCTCTGCCTCTGACATCATTACCTT
CCATCTTCTTGAATAATTAAAGGCTAAAAGGTTTGCTGGTGCCATCATTGTCTATCTAATTATTTAGTAA
CTACTTAAAACATAAGGCATGGTGTTGCTAAAACCTTAAACTGTCATGTTTCTTAGTTATGTATTTTAAA
GCCTAAAGAAATATGGATTGTGTGATCAGTAGTGCTTAGGCTTATTGTGTGTGGAATGTTTTCAAGACTT
TTATCATGTATCGTATTATTATATTGACCACTCTACTTAATTATGCTACAAATTTACTCGATTTGATTTT
CTACTTGAATGCATATATATTGTC


>NM_127349.4 Arabidopsis thaliana Homeodomain-like superfamily protein (WUS), mRNA
CTCTCACACAAAACCTAAAATCTCTTTACTACCAGCAAGTTGTTTTCTTGCTAACTTCAAACTTCTCTTT
CTCTTGTTCCTCTCTAAGTCTTGATCTTATTTACCGTTAACTTTGTGAACAAAAGTCGAATCAAACACAC
ATGGAGCCGCCACAGCATCAGCATCATCATCATCAAGCCGACCAAGAAAGCGGCAACAACAACAACAACA
AGTCCGGCTCTGGTGGTTACACGTGTCGCCAGACCAGCACGAGGTGGACACCGACGACGGAGCAAATCAA
AATCCTCAAAGAACTTTACTACAACAATGCAATCCGGTCACCAACAGCCGATCAGATCCAGAAGATCACT
GCAAGGCTGAGACAGTTCGGAAAGATTGAGGGCAAGAACGTCTTTTACTGGTTCCAGAACCATAAGGCTC
GTGAGCGTCAGAAGAAGAGATTCAACGGAACAAACATGACCACACCATCTTCATCACCCAACTCGGTTAT
GATGGCGGCTAACGATCATTATCATCCTCTACTTCACCATCATCACGGTGTTCCCATGCAGAGACCTGCT
AATTCCGTCAACGTTAAACTTAACCAAGACCATCATCTCTATCATCATAACAAGCCATATCCCAGCTTCA
ATAACGGGAATTTAAATCATGCAAGCTCAGGTACTGAATGTGGTGTTGTTAATGCTTCTAATGGCTACAT
GAGTAGCCATGTCTATGGATCTATGGAACAAGACTGTTCTATGAATTACAACAACGTAGGTGGAGGATGG
GCAAACATGGATCATCATTACTCATCTGCACCTTACAACTTCTTCGATAGAGCAAAGCCTCTGTTTGGTC
TAGAAGGTCATCAAGAAGAAGAAGAATGTGGTGGCGATGCTTATCTGGAACATCGACGTACGCTTCCTCT
CTTCCCTATGCACGGTGAAGATCACATCAACGGTGGTAGTGGTGCCATCTGGAAGTATGGCCAATCGGAA
GTTCGCCCTTGCGCTTCTCTTGAGCTACGTCTGAACTAGCTCTTACGCCGGTGTCGCTCGGGATTAAAGC
TCTTTCCTCTCTCTCTCTCTTTCGTACTCGTATGTTCACAACTATGCTTCGCTAGTGATTAATGATGCAG
TTGTTATATTAGTAGTTAACTAGTTATCTCTCGTTATGTGTAATTTGTAATTACTAGCTAAGTATCGTCT
AGGTTTTAATTGTAATTGACAACCGTTTTATCTCTATGATGAATAAGTTAAAATTTTA
</pre>

<span style="visibility:hidden;">
text1 = """"""
text2 = """"""
import re
#use Regular Expressions to extract the Genbank Reference Sequence identifier (NM_115294.6) from the first FASTA record below
#use Regular Expressions to extract the gene name (AP3) from the title line of the first record
#use Regular Expressions to extract the nucleotide sequence from the first FASTA record
#*prove that your Regular Expressions are "good" by extracting the SAME INFORMATION from the second FASTA record * 
mo = re.search(r'>([A-Z]{2}_\d+\.\d+)\b', text1)
print(mo.group(1))
mo = re.search(r'\(([^\)]+)\)', text1)
print(mo.group(1))
mo = re.search(r'>.*?\n(.*)', text1, re.S)
print(mo.group(1))
mo = re.search(r'>([A-Z]{2}_\d+\.\d+)\b', text2)
print(mo.group(1))
mo = re.search(r'\(([^\)]+)\)', text2)
print(mo.group(1))
mo = re.search(r'>.*?\n(.*)', text2, re.S)
print(mo.group(1))
</span>

In [128]:
text3 =">NM_115294.6 Arabidopsis thaliana K-box region and MADS-box transcription factor family protein (AP3), mRNA \
AAAAAAATCAGTTTACATAAATGGAAAATTTATCACTTAGTTTTCATCAACTTCTGAACTTACCTTTCAT \
GGATTAGGCAATACTTTCCATTTTTAGTAACTCAAGTGGACCCTTTACTTCTTCAACTCCATCTCTCTCT \
TTCTATTTCACTTCTTTCTTCTCATTATATCTCTTGTCCTCTCCACCAAATCTCTTCAACAAAAAGATTA \
AACAAAGAGAGAAGAATATGGCGAGAGGGAAGATCCAGATCAAGAGGATAGAGAACCAGACAAACAGACA \
AGTGACGTATTCAAAGAGAAGAAATGGTTTATTCAAGAAAGCACATGAGCTCACGGTTTTGTGTGATGCT \
AGGGTTTCGATTATCATGTTCTCTAGCTCCAACAAGCTTCATGAGTATATCAGCCCTAACACCACAACGA \
AGGAGATCGTAGATCTGTACCAAACTATTTCTGATGTCGATGTTTGGGCCACTCAATATGAGCGAATGCA \
AGAAACCAAGAGGAAACTGTTGGAGACAAATAGAAATCTCCGGACTCAGATCAAGCAGAGGCTAGGTGAG \
TGTTTGGACGAGCTTGACATTCAGGAGCTGCGTCGTCTTGAGGATGAAATGGAAAACACTTTCAAACTCG \
TTCGCGAGCGCAAGTTCAAATCTCTTGGGAATCAGATCGAGACCACCAAGAAAAAGAACAAAAGTCAACA \
AGACATACAAAAGAATCTCATACATGAGCTGGAACTAAGAGCTGAAGATCCTCACTATGGACTAGTAGAC \
AATGGAGGAGATTACGACTCAGTTCTTGGATACCAAATCGAAGGGTCACGTGCTTACGCTCTTCGTTTCC \
ACCAGAACCATCACCACTATTACCCCAACCATGGCCTTCATGCACCCTCTGCCTCTGACATCATTACCTT \
CCATCTTCTTGAATAATTAAAGGCTAAAAGGTTTGCTGGTGCCATCATTGTCTATCTAATTATTTAGTAA \
CTACTTAAAACATAAGGCATGGTGTTGCTAAAACCTTAAACTGTCATGTTTCTTAGTTATGTATTTTAAA \
GCCTAAAGAAATATGGATTGTGTGATCAGTAGTGCTTAGGCTTATTGTGTGTGGAATGTTTTCAAGACTT \
TTATCATGTATCGTATTATTATATTGACCACTCTACTTAATTATGCTACAAATTTACTCGATTTGATTTT \
CTACTTGAATGCATATATATTGTC"


#Sequence identifier
matchObj = re.search(r'[A-Z]{2}_\d+\.\d+', text3)  # \d+ after the dot, so version numbers with more than one digit also match
if matchObj:
   print ("match", matchObj.group())
else:
   print ("No match!!")

#Gene symbol
matchObj = re.search(r'\((\w+)\)', text3)  
if matchObj:
   print ("match", matchObj.group())
else:
   print ("No match!!")

matchObj = re.search(r'[(](\w+)[)]', text3)  
if matchObj:
   print ("match", matchObj.group())
else:
   print ("No match!!")

matchObj = re.search(r'\(([^\)]+)\)', text3)  # an opening bracket, then capture everything that is NOT a closing bracket
# this way we make no assumptions about which characters the gene name contains
if matchObj:
   print ("match", matchObj.group())
else:
   print ("No match!!")

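#Gene sequence (no match here: text3 was built with backslash line continuations, so it contains no newline characters)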
matchObj = re.search (r'\n[ATGC\n]*', text3,re.S)  
if matchObj:
   print ("match", matchObj.group())
else:
   print ("No match!!")



match NM_115294.6
match (AP3)
match (AP3)
match (AP3)
No match!!
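The last search above fails because `text3` contains no newline characters.  Here is one possible fix, as a sketch only: store the record in a triple-quoted string so the line breaks are real newlines.  To keep it short, `fasta` holds just the title line and the first two sequence lines of the first record; the variable names are illustrative.

```python
import re

# Truncated copy of the first FASTA record (title line + first two sequence lines).
# Triple quotes keep the line breaks as real "\n" characters.
fasta = """>NM_115294.6 Arabidopsis thaliana K-box region and MADS-box transcription factor family protein (AP3), mRNA
AAAAAAATCAGTTTACATAAATGGAAAATTTATCACTTAGTTTTCATCAACTTCTGAACTTACCTTTCAT
GGATTAGGCAATACTTTCCATTTTTAGTAACTCAAGTGGACCCTTTACTTCTTCAACTCCATCTCTCTCT
"""

# Capture only the identifier, without the leading ">"
identifier = re.search(r'>([A-Z]{2}_\d+\.\d+)', fasta).group(1)

# Capture only what is INSIDE the round brackets
gene_name = re.search(r'\(([^)]+)\)', fasta).group(1)

# Skip the title line, then capture nucleotide letters and newlines
sequence_match = re.search(r'>.*?\n([ACGT\n]+)', fasta)
sequence = sequence_match.group(1).replace("\n", "")   # join the lines into one string

print(identifier)
print(gene_name)
print(sequence)
```

Using <code>.group(1)</code> instead of <code>.group()</code> returns the capture without the surrounding brackets, so the gene name comes back as AP3 rather than (AP3).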


# Data Structures

So far we have talked about simple data types (strings, numbers, ranges, and regular expressions).  Now we will talk about higher level data structures - lists, "dictionaries", etc.


## Lists

A list in Python is a series of values separated by commas, and enclosed in square brackets.  For example

    a = [1,2,3,4,5]


In [152]:
a = [1,2,3,4,5]
print(a)

[1, 2, 3, 4, 5]



It is *usually* the case that you will only put the same KIND of value into a list.  e.g. a list of numbers or a list of strings.  This is not *always* true, but please believe me when I tell you - if you don't follow this rule, your code will not be as reliable as it should be!

But, just to prove that it is OK:

    a = [1, 2, "three", 4, 5]
    


In [130]:
a = [1,2,"three",4,5]
print(a)

[1, 2, 'three', 4, 5]



Every element of a list has an "address" or an "index position".  In Python, **index position numbers start with '0'** so the first element of the list is element index '0', the third element is element index '2', etc.  To see the individual elements, you put the index number after the variable:

    a = ["one", "two", "three", "four", "five"]
    print(a[2])
    

In [151]:
a = ["one", "two", "three", "four", "five"]
print(a)


['one', 'two', 'three', 'four', 'five']
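The cell above prints the whole list.  As a minimal sketch of the index rule itself (the names `first`, `third` and `last` are just for illustration), here is how individual elements are read:

```python
a = ["one", "two", "three", "four", "five"]

first = a[0]            # index 0 is the FIRST element
third = a[2]            # index 2 is the THIRD element
last = a[len(a) - 1]    # the last valid index is always len(a) - 1

print(first, third, last)
```

Since counting starts at 0, a list with five elements has the indexes 0 to 4, and <code>a[5]</code> does not exist.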



### List Slices

A "slice" is a sub-list inside of a list.  The notation for a slice is:  <code>index_start:index_end:step</code>

for example:

    a = ["one", "two", "three", "four", "five"]
    print(a[2:4])
    print(a[4:2:-1])

**The start index is INCLUDED in the output, the end index IS NOT INCLUDED in the output.**  Be careful!


In [138]:
a = ["one", "two", "three", "four", "five"]
print(a[2:4])  # this is index 2 and 3!!  NOT 2, 3 and 4!!!
print(a[4:2:-1])  # note how easy it is to make mistakes with list indexes!
print(a[3:1:-1])

['three', 'four']
['five', 'four']
['four', 'three']



It is possible to specify only the start, or only the end index, by omitting the other.  For example:

    a = ["one", "two", "three", "four", "five"]
    print(a[2:])  # print from index 2 until the end
    print(a[:4])  # print from index 0 until index 3 
    
**Careful:  the end index is never included!!!!!**

In [139]:
a = ["one", "two", "three", "four", "five"]
print(a[2:])  # print from index 2 until the end
print(a[:4])  # print from index 0 until index 3

['three', 'four', 'five']
['one', 'two', 'three', 'four']



### Modify list values

You can modify a list value, or a slice value, in the same way as assigning any other variable.  Simply assign a value to that index position.  For example:

    a = ["one", "two", "three", "four", "five"]
    print(a[2])
    a[2] = "THREE"  # switch it to capital letters
    print(a[2])
    
    a = ["one", "two", "three", "four", "five"]
    print(a[2:4])
    a[2:4] = ["THREE", "FOUR"]  # NOTE that you are assigning a list, because a slice is a list!
    print(a)

In [142]:
a = ["one", "two", "three", "four", "five"]
print(a[2])
a[2] = "THREE"  # switch it to capital letters
print(a[2])

print()

a = ["one", "two", "three", "four", "five"]
print(a[2])  # scalar
print(a[2:3])   # list
a[2:4] = "THREE", "FOUR"  # two values replace two elements, so the list keeps its length
print(a)


three
THREE

three
['three']
['one', 'two', 'THREE', 'FOUR', 'five']



<p style="color:red;">Again I will tell you:  BE VERY CAREFUL WITH SLICES!!  Especially if you are going to modify the values of a slice!  Watch how easily we can get confused and completely ruin our data!</p>


In [147]:
a = ["one", "two", "three", "four", "five"]
print(a[2:3])    # THIS IS NOT A SLICE WITH TWO ELEMENTS!!!
a[2:4] = "THREE", "FOUR"   # we printed a[2:3] but assign to a[2:4]...  it is very easy to mix the two up!
print(a)


['three']
['one', 'two', 'THREE', 'FOUR', 'five']
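To see what actually goes wrong when the slice and the new values have different lengths, here is a small sketch (the variable `grown` is only there to keep a copy of the intermediate result).  Python does not complain; it silently makes the list longer or shorter:

```python
a = ["one", "two", "three", "four", "five"]

# a[2:3] holds ONE element, but we assign TWO values to it
a[2:3] = ["THREE", "FOUR"]
grown = list(a)   # copy of the list after the first assignment

# a[1:4] holds THREE elements, but we assign only ONE value
a[1:4] = ["X"]

print(grown)   # the list now has six elements, and "four" appears twice
print(a)       # the list now has only four elements
```

No error is raised in either case, which is exactly why mistakes with slice assignment are so hard to spot.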


In [144]:
print(  a[9]  )# doesn't exist!

IndexError: list index out of range

In [145]:
a[9] = "nine"  # this also fails! ... you cannot update an index that doesn't exist


IndexError: list assignment index out of range

In [148]:
a = a + ["six", "seven", "eight", "nine"]
print(a)

['one', 'two', 'THREE', 'FOUR', 'five', 'six', 'seven', 'eight', 'nine']



## other list operations

You can add to a list using the <code>.append()</code> method.

You can remove a specific value from a list using the <code>.remove()</code> method (it removes the first element that has that value; it is not an index position).

You can reverse a list using the <code>.reverse()</code> method.


In [158]:
mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(mylist)
mylist.append(11)
print(mylist)

mylist.remove(2)
print(mylist)

mylist.reverse()
print(mylist)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[1, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[11, 10, 9, 8, 7, 6, 5, 4, 3, 1]
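Two details are easy to miss in the output above, so here is a short sketch with a made-up list that contains a repeated value:  <code>.remove()</code> only removes the first matching value, and all three methods change the list itself instead of returning a new one.

```python
mylist = [3, 1, 3, 2]

mylist.remove(3)             # removes only the FIRST 3, not both
after_remove = list(mylist)

mylist.append(4)             # adds 4 at the end
result = mylist.reverse()    # reverses in place; the method itself returns None

print(after_remove)
print(mylist)
print(result)
```

Because these methods return <code>None</code>, writing <code>mylist = mylist.reverse()</code> would throw your list away.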



## YOU TRY

* create the list <code>["one", "two", "three", "four", "five"]</code>
* change the value of "five" into the number "5"
* change it back again (to "five")
* swap the elements "one" and "two" so that the list is now <code>["two", "one", "three", "four", "five"]</code>
* swap the elements "three" and "four" **WITHOUT TYPING THE WORDS "three" or "four"** in your code!

<span style="visibility:hidden;">
    
mylist = ["one", "two", "three", "four", "five"]
mylist[4] = 5
print(mylist)
mylist[4] = "five"
print(mylist)
mylist[0:2] = "two", "one"
print(mylist)

mylist[2:4]= mylist[3:1:-1]
print(mylist)

</span>

In [174]:
b = ["one","two","three","four","five"]


b[4]="5"
print(b)

b[4]="five"
print(b)

b[0]="two"
b[1]="one"
print(b)

b[2:4]=b[3:1:-1]
print(b)

['one', 'two', 'three', 'four', '5']
['one', 'two', 'three', 'four', 'five']
['two', 'one', 'three', 'four', 'five']
['two', 'one', 'four', 'three', 'five']


## Multi-layer lists

In [175]:

listolist = []
listolist.append(["my", "dog", "has", "fleas"])
listolist.append(["my", "cat", "has", "piojos?"])
print(listolist)
print(listolist[0][1]) #list 0 index position 1
print(listolist[1][1])

[['my', 'dog', 'has', 'fleas'], ['my', 'cat', 'has', 'piojos?']]
dog
cat



# "Dictionary" data structures

(Note:  In many languages (not Python) this data structure is called a "hash", not a "dictionary".  They are almost the same thing...)

A Dictionary is a set of key/value pairs.  For example, imagine we needed to create a dictionary of the ages of people in the class:

     Mark =  50
     Jonas = 25
     Alberto = 24
     SmartGuy = 16
     
This is what a Dictionary does.  The syntax for the dictionary data structure is:

    a = {key1: value1 ,  key2: value2  ,  key3: value3 .......}
    
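Values can be any basic data type, data structure, or variable.  Keys are more restricted:  they must be "hashable", which in practice means strings, numbers, or tuples (a list or a dictionary cannot be a key).  A string key is written in quotes, e.g. "key1"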



In [176]:
dict = {'key1': 1 ,  'key2': 2  ,  'key3': 3}
print(dict)

x = "keyX"
y = "keyY"
z = "keyZ"

dict2 = {x: 1 ,  y: 2  ,  z: 3}  # each variable is replaced by its value, which becomes the key
print(dict2)

{'key1': 1, 'key2': 2, 'key3': 3}
{'keyX': 1, 'keyY': 2, 'keyZ': 3}
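One restriction worth seeing in action:  a key has to be "hashable".  The sketch below uses made-up data (`ages`, `list_key_allowed`) to show that strings, numbers and tuples work as keys, while a list does not:

```python
# Strings, numbers and tuples are hashable, so they can all be keys
ages = {"Mark": 50, 42: "a number as key", ("Jonas", "B"): 25}

# A list is NOT hashable, so using one as a key raises a TypeError
try:
    bad = {["Jonas", "B"]: 25}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(ages[("Jonas", "B")])
print(list_key_allowed)
```

Values have no such restriction:  a value can be a list, another dictionary, or anything else.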



# Indexes of a Dictionary

This works exactly like indexing a List, but you use the key instead of an index number.  For example:

    x = "keyX"
    y = "keyY"
    z = "keyZ"
    a = {x: 1 ,  y: 2  ,  z: 3}
    
    print(  a["keyX"]   )
    


In [177]:
x = "keyX"
y = "keyY"
z = "keyZ"
dict = {x: 1 ,  y: 2  ,  z: 3}

print( "keyX has the value: ", dict["keyX"]   )

keyX has the value:  1


In [178]:
#Patient name:  Mark
#Patient age:  27
#Patient weight: 190kg

thispatient = {'name': "Mark",  'age': 27, 'weight': '190kg'}

print("This patient's name is: ", thispatient['name'])
print("This patient's weight is: ", thispatient['weight'])
print("This patient's age is: ", thispatient['age'])


children = ['jay', 'silent bob']

patient1 = {'name': "Mark",  'age': 27, 'weight': '190kg'}
patient2 = {'name': "Silvia",  'age': 29, 'weight': '120kg'}
patient3 = {'name': "Julio",  'age': 35, 'weight': '290kg', 'mychildren': children}

patients = { 'ID244':patient2,  'ID123': patient1,  'ID345':patient3}
print(patients)
print(patients['ID244']['name'])
print(patients['ID345']['mychildren'][0])

This patient's name is:  Mark
This patient's weight is:  190kg
This patient's age is:  27
{'ID244': {'name': 'Silvia', 'age': 29, 'weight': '120kg'}, 'ID123': {'name': 'Mark', 'age': 27, 'weight': '190kg'}, 'ID345': {'name': 'Julio', 'age': 35, 'weight': '290kg', 'mychildren': ['jay', 'silent bob']}}
Silvia
jay



# assigning values

Just as with a List, you can assign a new value to a Dictionary:

    dict["keyX"] = "XXX"
    print(dict)
    

In [179]:

dict["keyX"] = "XXXXXXXXX"
print(dict)


{'keyX': 'XXXXXXXXX', 'keyY': 2, 'keyZ': 3}


### but note that we have more flexibility with a Dictionary!  Watch...
You can assign new keys whenever you want.  This cannot be done with a List, where assigning to an index that doesn't exist fails.
    

In [180]:
dict["Key Doesn't Exist"] = "ha!  This works!"
print(dict)

{'keyX': 'XXXXXXXXX', 'keyY': 2, 'keyZ': 3, "Key Doesn't Exist": 'ha!  This works!'}


### deleting values

If you want to delete a Dictionary entry, use the "del" statement:
    
    del dict["Key Doesn't Exist"]
    print(dict)
    
If you want to eliminate all key/values from the Dictionary (but keep the dictionary variable!) use the "clear" "method" (we will discuss "methods" when we talk about object oriented programming):

    dict.clear()
    print(dict)

In [181]:
del dict["Key Doesn't Exist"]
dict.pop("Key Doesn't Exist", None)  # alternative to del; the default (None) avoids a KeyError because the key is already gone
print(dict)

dict.clear()
print(dict)


{'keyX': 'XXXXXXXXX', 'keyY': 2, 'keyZ': 3}
{}
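Deleting a key that is not in the dictionary raises a <code>KeyError</code>.  Here is a minimal sketch of two ways to delete safely, using a made-up dictionary `ages`:  check with <code>in</code> first, or use <code>.pop()</code> with a default value.

```python
ages = {"Mark": 50, "Jonas": 25}

del ages["Jonas"]                  # works: the key exists

# A second "del" on the same key would raise a KeyError, so check first...
if "Jonas" in ages:
    del ages["Jonas"]

# ...or use pop() with a default, which returns the default instead of failing
removed = ages.pop("Jonas", None)

# Without either precaution, deleting a missing key fails:
try:
    del ages["Nobody"]
    raised = False
except KeyError:
    raised = True

print(ages, removed, raised)
```

<code>.pop()</code> also returns the value it removed, which is handy when you want to delete an entry and use its value at the same time.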



## more complex data structures

Imagine we have two classes of students, and we want to create a single dictionary that holds both classes...  how?

    class1 = {"Mark": 50,  "Jonas": 25,  "Alberto": 24, "SmartGuy": 16}
    class2 = {"Julio": 55,  "Laura": 23,  "Angela": 25, "SmartWoman": 14}

    classes = {"Morning": class1,  "Afternoon": class2}

    print(classes)

In [182]:
class1 = {"Mark": 50,  "Jonas": 25,  "Alberto": 24, "SmartGuy": 16}
class2 = {"Julio": 55,  "Laura": 23,  "Angela": 25, "SmartWoman": 14}

classes = {"Morning": class1,  "Afternoon": class2}

print(classes)
print()

print("Mark, from the morning class, has age:  ",  classes["Morning"]["Mark"])

{'Morning': {'Mark': 50, 'Jonas': 25, 'Alberto': 24, 'SmartGuy': 16}, 'Afternoon': {'Julio': 55, 'Laura': 23, 'Angela': 25, 'SmartWoman': 14}}

Mark, from the morning class, has age:   50
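The same double-key notation also works for changing the inner dictionaries.  The sketch below uses shortened versions of the two classes; the point to notice is that the outer dictionary holds the inner dictionaries themselves, not copies of them:

```python
class1 = {"Mark": 50, "Jonas": 25}
class2 = {"Julio": 55, "Laura": 23}
classes = {"Morning": class1, "Afternoon": class2}

classes["Afternoon"]["Angela"] = 25      # add a student to the inner dictionary
classes["Morning"]["Mark"] += 1          # update a value two levels down

morning_names = list(classes["Morning"].keys())   # just the keys of one class

print(morning_names)
print(class2)    # class2 changed too: it is the SAME dictionary as classes["Afternoon"]
```

If you need independent copies, you would have to copy the inner dictionaries yourself before putting them into the outer one.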



# NOW YOU

Imagine that we want to be able to say "hello" or "goodbye" in either English, Spanish, or German.  Create a SINGLE dictionary that ***abstracts this problem***

    English ==>  "hello", "bye!"
    Spanish ==>  "hola", "hasta luego!"
    German ==>   "hallo", "tschuess!"

Your code should include two variables:  
* language (the value of this variable is either "English", "Spanish", or "German")
* event (the value of this variable is either "greeting", or "departure")

I want to access the correct phrase for "English greeting", or "German departure"

**This is a challenge!!!  If you get it right, then you deeply understand both dictionaries, and abstraction!  :-)**

<span style="visibility:hidden;">

p1= {'greeting': "hello", 'departure': "bye!"}
p2= {'greeting': "hola", 'departure': "hasta luego!"}
p3= {'greeting': "hallo", 'departure':"tschuess!"}
phrases = {'English': p1, 'Spanish': p2, 'German': p3}
#print(phrases)

language = 'Spanish'
event = 'departure'

print(phrases[language][event])

</span>

In [191]:
greetings = {"english": "hello", "german": "hallo", "spanish": "hola"}
departures = {"english": "bye","german": "tschuess","spanish": "adios"}


hibye = {"Greeting": greetings, "Departure": departures}
print(hibye)

print("English greeting:", hibye["Greeting"]["english"])
print("German departure:", hibye["Departure"]["german"])

{'Greeting': {'english': 'hello', 'german': 'hallo', 'spanish': 'hola'}, 'Departure': {'english': 'bye', 'german': 'tschuess', 'spanish': 'adios'}}
English greeting: hello
German departure: tschuess


## Operators

| operator | definition | example |
| --- | --- | --- |
| ASSIGNMENT | | |
| = | assign value b to a | a = b |
| += | add and assign | a+=b --> a = a + b |
| -= | subtract and assign | a-=b --> a = a - b |
| \*= | multiply and assign | a\*=b --> a = a \* b |
| /= | divide and assign |  a/=b --> a = a / b |
| %= | modulus and assign | a%=b --> a = **remainder** of a/b |
| \*\*= | exponent and assign | a\*\*=b  --> a = a\*\*b |
| MATHEMATICS (returns output of operation) | | |
| + | plus | a + b |
| - | minus | a - b |
| \* | multiply | a \* b |
| / | divide | a/b |
| \*\* | exponent | a\*\*b |
| % | modulus | a % b --> the **remainder** after a/b |
| COMPARISON (returns True or False) | | |
| == | equals? | a == b |
| != | not equals? | a != b |
| > | greater than? | a > b |
| < | less than? | a < b |
| >= | greater than or equal? | a >= b |
| <= | less than or equal? | a <= b |
| LOGIC  (say a = 1, b = 2), **in order of operator precedence** | | |
| not | Logical NOT:  invert a true to false, false to true | not (a or b) --> False |
| and | Logical AND: true if both are true (non-zero) | a and b --> 2 (counts as True) |
| or | Logical OR: true if either is true (non-zero) | a or b --> 1 (counts as True) |
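A few of these operators in action, as a minimal sketch with made-up values.  Note that in Python the division operator is the forward slash <code>/</code>, and the logical operators are the words <code>and</code>, <code>or</code> and <code>not</code>.  The symbol <code>&</code> does exist, but it is a bitwise operator and can give a different answer:

```python
a = 7
b = 2

quotient = a / b       # division always gives a float in Python 3
remainder = a % b      # modulus: what is left over after dividing
power = a ** b         # exponent

a += b                 # same as a = a + b

x, y = 1, 2
logical_and = x and y      # both are non-zero, so the result counts as True
bitwise_and = x & y        # compares the BITS of 1 and 2, which share none
negated = not (x or y)     # (x or y) counts as True, so "not" gives False

print(quotient, remainder, power, a)
print(logical_and, bitwise_and, negated)
```

The difference between <code>and</code> and <code>&</code> is a common source of bugs, so for True/False logic always use the words.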