# Python for R users
# Part 2: Control structures

In this notebook we will explore how R and Python differ in the syntax of control structures like loops or if/then statements. 

First we need to tell Jupyter to let us use R within this Python notebook.

In [2]:
%load_ext rpy2.ipython

from pprint import pprint

## Loops

We generally want to avoid loops whenever possible (as we will see later when we are talking about numerical analysis), but sometimes we can't.  Loops in Python are structurally very similar to those in R, but the syntax differs quite a bit.  Let's say that we want to loop over integers from 1 to 3 and print them out.  In R we would do it as follows:

In [3]:
%%R

for (j in 1:3){
    print(j)
}

[1] 1
[1] 2
[1] 3


Notice that in R, the contents of the loop are demarcated by curly braces.

The equivalent loop in Python would look like this:

In [4]:
for j in range(1,4):
    print(j)

1
2
3


There is one thing here that is new for us, and is another fundamental difference between Python and R.  The contents of the loop in Python are denoted by their *indentation*!  The fact that white space makes a difference in Python syntax is probably one of the most controversial aspects of Python coding.  If the spacing doesn't match exactly, then the code will fail. Try running the next cell after removing the comment symbol (#) from the third line:

In [6]:
for j in range(1,4):
    print(j)
#     print(j+1)

IndentationError: unexpected indent (<ipython-input-6-74c8ddc91f8e>, line 3)

You should see an error message telling you that there is an unexpected indent.  Python expects all of the lines within the body of a loop to have exactly the same indentation. This can get a bit tricky if you mix code that uses tabs for indentation and code that uses spaces; Python 3 raises an error when the two are mixed inconsistently. In general, indentation by 4 spaces is preferred.
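As a minimal sketch of the same failure outside a notebook cell, the snippet below hands the mis-indented loop to Python's built-in `compile()` as a string, so that the error can be caught and inspected instead of stopping execution. The variable names (`bad_source`, `error_name`, `error_line`) are illustrative only.

```python
# Source code with one extra space of indentation on the third line
bad_source = (
    "for j in range(1, 4):\n"
    "    print(j)\n"
    "     print(j + 1)\n"
)

try:
    compile(bad_source, "<example>", "exec")
    error_name, error_line = None, None
except IndentationError as err:
    # Record what kind of error was raised and where
    error_name = type(err).__name__
    error_line = err.lineno

print(error_name, "on line", error_line)
```

Removing the extra space from the third line of `bad_source` lets the code compile without complaint.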



There is another new thing that we see here:  the ```range()``` function.  This function generates a series of integers within a particular range, similar to the ```seq()``` function in R, except that by default it starts at zero and steps by 1 at a time, stopping just *before* it reaches the specified number.  Here is a simple example:

In [7]:
for j in range(4):
    print(j)

0
1
2
3


Notice that the series is as long as the specified number (i.e. 4 values), but it stops before it gets to the limit. Just like ```seq()```, you can also specify a step size for the sequence:

In [9]:
for j in range(0, 8, 2):
    print(j)

0
2
4
6


One limitation is that ```range()``` only works with integer step sizes.  Later we will encounter a function within the numpy package that can give us more flexible step sizes.  But if we simply want to loop a specific number of times, we would generally use ```range()```.
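The sketch below illustrates the integer-only restriction: passing a fractional step to `range()` raises a `TypeError`. One simple workaround, shown here as an assumption rather than as the approach this notebook takes later, is to loop over integers and rescale inside the loop. The names `float_step_ok` and `halves` are hypothetical.

```python
# A fractional step is rejected by range()
try:
    range(0, 2, 0.5)
    float_step_ok = True
except TypeError:
    float_step_ok = False

print(float_step_ok)

# Workaround: loop over integers and divide inside the loop
halves = []
for j in range(0, 4):
    halves.append(j / 2)

print(halves)
```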

The ```range()``` function also exhibits a behavior that you will not have experienced in R.  Let's say you want to create a new variable that contains a sequence of integers from 1 to 5.  In R you could do this using the ```seq()``` command:

In [10]:
%%R

my_var <- seq(1,5)
print(my_var)

[1] 1 2 3 4 5


However, if we try to do this using the ```range()``` command, the result is not what we expect:

In [11]:
my_var = range(5)
print(my_var)

range(0, 5)


You probably expected this command to output a set of values, but instead it prints out what looks like a function call.  That's because ```range()``` does not build the whole sequence up front.  It returns a *range object*, which produces its values lazily, one at a time, as they are requested; in this respect it behaves much like a Python *generator*.  You don't need to know how generators work under the hood (though if you are curious, you can read more [here](https://wiki.python.org/moin/Generators)), but you should be aware that assigning the result of ```range()``` to a variable does not give you a list of numbers.  To get the actual values you need to either loop over it or convert it explicitly, for example with ```list(range(5))```.
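If what you want is the actual values, one straightforward option is to pass the result of `range()` to `list()`. The short sketch below also shows that the object returned by `range()` already knows its length and supports indexing and membership tests, even though it does not print its contents. The name `my_range` is introduced only for this illustration.

```python
my_range = range(1, 6)

print(len(my_range))   # number of values the range will produce
print(my_range[0])     # first value in the range
print(5 in my_range)   # membership test

my_var = list(my_range)  # convert to a concrete list of values
print(my_var)
```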

## List comprehensions

One convenient way to build a list from something like ```range()``` is to use a special Python construction called a *list comprehension*.  Going back to our previous problem of generating a list that ranges from 1 to 5, we could create a for loop to do this:

In [14]:
my_var = []  # create an empty list
for j in range(1, 6):
    my_var.append(j)  # append the value to the list

print(my_var)

[1, 2, 3, 4, 5]


However, this is a lot of code to generate such a simple variable.  A list comprehension allows us to embed this entire loop within a single command:

In [15]:
my_var = [j for j in range(1, 6)]

print(my_var)

[1, 2, 3, 4, 5]


One useful thing that this allows us to do is to transform the numbers as they are produced (in this case by ```range()```).

Let's say that we wanted to create a series of powers of 2, from 2^0 to 2^5. We could do this easily using a list comprehension, by simply raising 2 to the power of each value ```j``` produced by ```range()```:

In [16]:
power_series = [2**j for j in range(0, 6)]
print(power_series)

[1, 2, 4, 8, 16, 32]
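To make the correspondence with an ordinary loop explicit, here is a small sketch that builds the same power series both ways and checks that the results match. The name `power_series_loop` is introduced only for this illustration.

```python
# Explicit loop version
power_series_loop = []
for j in range(0, 6):
    power_series_loop.append(2**j)

# List comprehension version
power_series = [2**j for j in range(0, 6)]

# Both approaches give the same list
print(power_series_loop == power_series)
```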


## Nested Loops

We can also easily nest loops within one another, using additional indentation for each level of the loop.  For example, let's say we want to loop through the integers from 1 to 9 and create a dictionary that maps each integer to a list of its powers from zero to five.

In [17]:
power_dict = {}  # create empty dictionary to store our results

for j in range(1, 10):  # loop through integers 1-9
    power_dict[j] = []  # create an empty list to store the results for this integer
    for k in range(0, 6):  # loop through exponents 0-5
        power_dict[j].append(j**k)

pprint(power_dict)  # pretty print the dict

{1: [1, 1, 1, 1, 1, 1],
 2: [1, 2, 4, 8, 16, 32],
 3: [1, 3, 9, 27, 81, 243],
 4: [1, 4, 16, 64, 256, 1024],
 5: [1, 5, 25, 125, 625, 3125],
 6: [1, 6, 36, 216, 1296, 7776],
 7: [1, 7, 49, 343, 2401, 16807],
 8: [1, 8, 64, 512, 4096, 32768],
 9: [1, 9, 81, 729, 6561, 59049]}
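Comprehensions can be nested in the same way as loops. As a sketch, the dictionary above could also be built with a dictionary comprehension wrapped around a list comprehension; whether this is more readable than the explicit loops is a matter of taste. The name `power_dict_compact` is hypothetical.

```python
from pprint import pprint

# Outer comprehension builds the dict; inner one builds each list of powers
power_dict_compact = {j: [j**k for k in range(0, 6)] for j in range(1, 10)}

pprint(power_dict_compact[3])
```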


## While loops

While loops in Python are very similar to those in R, except for the surface syntax:

In [19]:
%%R

j <- 1
while (j < 6){
    print(j)
    j <- j + 1
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


In [20]:
j = 1
while j < 6:
    print(j)
    j += 1
    

1
2
3
4
5


Note that we used a special operator, ```+=```, which is shorthand for "add the value on the right side to the variable on the left side and store the result back in that variable".
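Python has analogous shorthand for the other arithmetic operators. A brief sketch, using a hypothetical variable `total`:

```python
total = 10
total += 5   # same as total = total + 5
total -= 3   # same as total = total - 3
total *= 2   # same as total = total * 2
print(total)
```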

## If/then statements

If/then statements are also fairly similar between R and Python.  Let's say we want to loop through all numbers from 1 to 10 and print whether they are odd or even.  Here is how we would do that in R:

In [21]:
%%R

for (j in 1:10){
    # use the modulus operator to check whether the remainder after dividing by 2 is zero
    if (!(j %% 2)) {
        print(sprintf('%d: even', j))
    } else {
        print(sprintf('%d: odd', j))
        
    }
}

[1] "1: odd"
[1] "2: even"
[1] "3: odd"
[1] "4: even"
[1] "5: odd"
[1] "6: even"
[1] "7: odd"
[1] "8: even"
[1] "9: odd"
[1] "10: even"


The analogous code in Python looks fairly similar:

In [22]:
for j in range(1, 11):
    if not j % 2:
        print(j, 'even')
    else:
        print(j, 'odd')

1 odd
2 even
3 odd
4 even
5 odd
6 even
7 odd
8 even
9 odd
10 even


Note a few small syntactic differences. Most importantly, again here the structure of the statement in Python relies upon indentation.  Second, instead of using the ```!``` operator to signify negation in R, in Python we can simply use ```not```.  In both languages the test works because a remainder of zero is treated as false, so negating it gives true for even numbers.  Also note that the modulus operator is ```%``` in Python versus ```%%``` in R.
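When there are more than two cases, Python chains conditions with `elif`, where R would use `else if`. The sketch below is an illustrative extension of the odd/even example, not part of the original notebook; the list `labels` is hypothetical.

```python
labels = []
for j in range(1, 7):
    if j % 6 == 0:
        labels.append('divisible by 6')  # checked first, since it is the most specific case
    elif j % 2 == 0:
        labels.append('even')
    else:
        labels.append('odd')

print(labels)
```

The conditions are tested from top to bottom and only the first matching branch runs, so the most specific test has to come first.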