# Python for R users
# Part 4: Numerical operations

In this notebook we will explore how R and Python differ in the use of numerical operations.  In particular we will dig into the Numpy package for numerical operations in Python.  

NOTE: A useful cheat sheet is available [here](http://mathesaurus.sourceforge.net/r-numpy.html).


First we need to tell Jupyter to let us use R within this Python notebook.  We also need to import the numpy package. Online you will often see people abbreviate numpy by using ```import numpy as np```.  I am not sure why people do this but I like to use the full name of the package unless it's really long.

In [1]:
import numpy

%load_ext rpy2.ipython


## Basic numerical operations

All of the standard arithmetic operators (+, -, /, \*) and relational operators (==, >, <, >=, <=) are the same between Python and R, with two exceptions: exponentiation and remainders.  Here they are in R:

In [10]:
%%R

print(3^2)
print(8 %% 3)

[1] 9
[1] 2


Here are the equivalents in Python, where you use \*\* instead of ^ and % instead of %%:

In [12]:
print(3**2)
print(8 % 3)

9
2
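
One pitfall for R users is worth a quick sketch here: `^` is still a valid operator in Python, but on integers it means bitwise exclusive OR, so `3^2` runs without an error and quietly gives a different answer. The variable names below are only illustrative, and the `//` line is an extra that was not covered above (it plays the role of `%/%` in R).

```python
# In Python, ^ is bitwise XOR, not exponentiation
power = 3 ** 2        # exponentiation, like 3^2 in R
xor_result = 3 ^ 2    # bitwise XOR of binary 11 and 10
int_div = 8 // 3      # integer (floor) division, like 8 %/% 3 in R

print(power)
print(xor_result)
print(int_div)
```

If a calculation ported from R gives strange integer results, a leftover `^` is a good first thing to check.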


The numpy package includes all of the standard mathematical functions that one might want to perform on a number.  First let's implement them in R:

In [7]:
%%R

x <- 1.3
print(sin(x))  # sine
print(log(x))  # natural logarithm
print(log10(x))  # base 10 logarithm
print(round(x))  # rounding

[1] 0.9635582
[1] 0.2623643
[1] 0.1139434
[1] 1


Here are the same functions in Python, using numpy:


In [8]:
x = 1.3
print(numpy.sin(x))  # sine
print(numpy.log(x))  # natural logarithm
print(numpy.log10(x))  # base 10 logarithm
print(numpy.round(x))  # rounding

0.963558185417193
0.26236426446749106
0.11394335230683679
1.0


## Logical operations

Logical operators are slightly different (and arguably much simpler) in Python compared to R. Here they are in R:

In [14]:
%%R

a = TRUE
b = FALSE

print(a && b) # logical AND
print(a || b) # logical OR
print(xor(a, b)) # EXCLUSIVE OR


[1] FALSE
[1] TRUE
[1] TRUE


Here they are in Python:

In [17]:
a = True
b = False

print(a and b)  # logical AND
print(a or b)  # logical OR
print(numpy.logical_xor(a,b))  # EXCLUSIVE OR


False
True
True
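
Python's `and` and `or` work on single truth values, much like `&&` and `||` in R. For element-wise logic on whole arrays (what `&` and `|` do in R), one option is the numpy `logical_*` functions. The following is a minimal sketch; the arrays `x` and `y` are made up for illustration.

```python
import numpy

x = numpy.array([True, True, False, False])
y = numpy.array([True, False, True, False])

# element-wise logical operations on boolean arrays
both = numpy.logical_and(x, y)      # same result as x & y
either = numpy.logical_or(x, y)     # same result as x | y
only_one = numpy.logical_xor(x, y)  # same result as x ^ y

print(both)
print(either)
print(only_one)
```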


## Arrays in Python and R

Let's start by discussing single-dimensional arrays (i.e. vectors).  First let's look at how one would do a number of vector operations in R.


In [45]:
%%R

# create a vector using seq()
a = seq(1,5, 0.5)
print(a)

# get the length of the vector
print(length(a))

# extract first 3 elements of vector
print(a[1:3])

# extract last two elements of vector
print(tail(a, 2))

# add 10 to all elements
b = a + 10
print(b)

# multiply all elements by 10
# we skip using the name "c" as a variable here - why?
d = b * 10
print(d)

# concatenate two vectors
e = c(b, d)
print(e)

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
[1] 9
[1] 1.0 1.5 2.0
[1] 4.5 5.0
[1] 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0
[1] 110 115 120 125 130 135 140 145 150
 [1]  11.0  11.5  12.0  12.5  13.0  13.5  14.0  14.5  15.0 110.0 115.0 120.0
[13] 125.0 130.0 135.0 140.0 145.0 150.0


Here we do the equivalent work in Python:

In [61]:
# create a vector using numpy.arange
a = numpy.arange(1, 5.5, 0.5)
print(a)

%R print(1)

# get the shape of the vector (a tuple); len(a) or a.size gives the length as a number
print(a.shape)

# extract first 3 elements of vector
print(a[:3])

# extract last two elements of vector
print(a[-2:])

# add 10 to all elements
b = a + 10
print(b)

# multiply all elements by 10
d = b * 10
print(d)

# concatenate two vectors
# NOTE: the two vectors being concatenated must be provided to the function as a tuple.
e = numpy.concatenate((b, d))
print(e)

[1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]


[1] 1


(9,)
[1.  1.5 2. ]
[4.5 5. ]
[11.  11.5 12.  12.5 13.  13.5 14.  14.5 15. ]
[110. 115. 120. 125. 130. 135. 140. 145. 150.]
[ 11.   11.5  12.   12.5  13.   13.5  14.   14.5  15.  110.  115.  120.
 125.  130.  135.  140.  145.  150. ]
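
Note that `a.shape` returned the tuple `(9,)` rather than a plain number like `length()` does in R. As a small sketch, here are three ways one might ask a 1-d array how big it is:

```python
import numpy

a = numpy.arange(1, 5.5, 0.5)

print(a.shape)  # tuple with one entry per dimension
print(len(a))   # length of the first dimension
print(a.size)   # total number of elements
```

For a vector, `len(a)` and `a.size` agree; for arrays with more dimensions they generally differ, because `len()` only looks at the first dimension.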


Just as in R, many numpy functions are *vectorized*, meaning that if one inputs a vector, they will output a vector with the operation applied independently to each element.

In [47]:
%%R

x = seq(0,pi, pi/10)
print(sin(x))

 [1] 0.000000e+00 3.090170e-01 5.877853e-01 8.090170e-01 9.510565e-01
 [6] 1.000000e+00 9.510565e-01 8.090170e-01 5.877853e-01 3.090170e-01
[11] 1.224647e-16


In [52]:
x = numpy.arange(0, numpy.pi*1.01, numpy.pi/10)
print(numpy.sin(x))

[0.00000000e+00 3.09016994e-01 5.87785252e-01 8.09016994e-01
 9.51056516e-01 1.00000000e+00 9.51056516e-01 8.09016994e-01
 5.87785252e-01 3.09016994e-01 1.22464680e-16]
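
The `numpy.pi*1.01` in the cell above is a workaround: `numpy.arange` excludes its stopping value, so the stop has to be nudged past pi to keep the last point. One alternative, sketched here, is `numpy.linspace`, which takes the number of points instead of a step size and includes the endpoint by default.

```python
import numpy

# 11 evenly spaced points from 0 to pi, endpoint included
x = numpy.linspace(0, numpy.pi, 11)
print(x.shape)
print(numpy.sin(x))
```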


## Generating random numbers

One common thing that we need to do is to generate random numbers.  This works similarly in R and Python.  For the purpose of reproducibility it's usually good practice to set the random seed to a specific value at the beginning of the code, so that the code will always provide the same results. 

In [58]:
%%R

set.seed(123456)  # fix the random seed

# generate 5 uniform random variates
print(runif(5))

# generate 4 standard normal variates
print(rnorm(4))

[1] 0.7977843 0.7535651 0.3912557 0.3415567 0.3612941
[1] -0.8475485 -1.3016020 -0.9638145  0.2373156


In [60]:
numpy.random.seed(123456)  # fix the random seed

# generate 5 uniform random variates
print(numpy.random.rand(5))

# generate 4 standard normal variates
print(numpy.random.randn(4))

[0.12696983 0.96671784 0.26047601 0.89723652 0.37674972]
[-0.58986305 -1.98683112 -2.17314697  0.73630915]
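
Notice that R and numpy print different numbers even though both were given the seed 123456: a seed makes results reproducible within one language, not across them. Also, the NumPy documentation recommends the newer `Generator` interface for new code. Here is a minimal sketch of the same draws using that interface (the names `rng` and `rng2` are just illustrative, and the values will not match the cell above):

```python
import numpy

rng = numpy.random.default_rng(123456)  # seeded Generator object

uniform_draws = rng.random(5)           # 5 uniform variates on [0, 1)
normal_draws = rng.standard_normal(4)   # 4 standard normal variates

print(uniform_draws)
print(normal_draws)

# a second generator with the same seed reproduces the same stream
rng2 = numpy.random.default_rng(123456)
print(numpy.array_equal(uniform_draws, rng2.random(5)))
```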


## Working with multidimensional arrays

We very often want to work with multidimensional data; the term *matrix* is often used to refer specifically to a 2-d array, though the array functions in both R and numpy can handle higher-dimension arrays as well. We will focus on 2-d arrays for now, though higher dimensional arrays become important when dealing with volumetric images.

In R, one can use the ```array()``` function to generate a 2-d array.  There is also a function called ```matrix()```, which is just another way to generate the same thing.

In [91]:
%%R

# create an array from three vectors
vector1 = c(1, 2, 3, 4)
vector2 = c(5, 6, 7, 8)
vector3 = c(9, 10, 11, 12)
a = array(c(vector1, vector2, vector3), dim = c(4, 3))
print(a)
dim(a)

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
[1] 4 3


The situation for Numpy is a bit more confusing.  Numpy has separate object types for *arrays* and *matrices*, and they are not equivalent.  Because one can usually do anything with the array class that is possible with the matrix class, we will focus on the array class in this document. However, occasionally you might encounter a library that requires Numpy matrices, and it's important to realize that they are not interchangeable as they are in R.

Let's generate a 2-d array in Python:

In [92]:
# create an array from three vectors
vector1 = [1, 2, 3, 4]
vector2 = [5, 6, 7, 8]
vector3 = [9, 10, 11, 12]
a = numpy.array([vector1, vector2, vector3])
print(a)

print(a.shape)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
(3, 4)


Numpy creates the array with the vectors in rows, whereas R created it with the vectors in columns.
In order to match what was done in R, we need to transpose the array, using the ```.T``` attribute.


In [93]:
a = a.T
print(a)
print(a.shape)

[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]
(4, 3)
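
Transposing is not the only way to get the R layout. As a sketch of two alternatives: `numpy.column_stack` places each vector in its own column, and reshaping with `order="F"` fills the array column by column, which is the order R's `array()` uses. The names `a_cols` and `a_fortran` are only illustrative.

```python
import numpy

vector1 = [1, 2, 3, 4]
vector2 = [5, 6, 7, 8]
vector3 = [9, 10, 11, 12]

# stack the vectors as columns, giving a 4 x 3 array directly
a_cols = numpy.column_stack((vector1, vector2, vector3))
print(a_cols)

# fill a 4 x 3 array column by column, as R's array() does
a_fortran = numpy.reshape(vector1 + vector2 + vector3, (4, 3), order="F")
print(numpy.array_equal(a_cols, a_fortran))
```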


### Indexing arrays

Indexing works slightly differently between R and Python.  We have already encountered one way in which they differ: R indexes starting at 1, whereas Python indexes starting at zero.  There are a few other differences, as we will see below.

In [99]:
%%R 

# extract first 3 rows
print(a[1:3 ,])

# extract first two columns
print(a[, 1:2])

# extract last two rows
print(a[3:4, ])

# transpose the array
print(t(a))

# flip the array up-down
print(a[4:1, ])

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
     [,1] [,2] [,3]
[1,]    4    8   12
[2,]    3    7   11
[3,]    2    6   10
[4,]    1    5    9


Now we will do the same in Python. There are several differences to notice.  The first is that we can't leave the index for an axis blank like we can in R --- instead, if we want to use all elements along a particular dimension, we need to use the wild-card symbol, ```:```. A second difference is that we can use negative indices to move backward from the end.  Doing this in R would require computing the length of the array along the dimension of interest, and then subtracting off the number of desired steps. 

Indexing also works a bit differently in Python versus R (as discussed in detail [here](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)).  If we put ```1:3``` as a sequence in R, the resulting values are 1, 2, and 3.  If we put the same in Python, we only get the numbers 1 and 2, because Python treats the second number as an exclusive limit rather than an inclusive limit.  This is useful in the context of Python's zero-based indexing, because if we want the first three elements (which have indices 0, 1, and 2), then we can use ```:3``` and we will get the correct answer.

You can think about the syntax for slicing an array as follows: we specify up to three numbers as ```i:j:k```, which give the starting index (i), the stopping index (j), and the step size (k).  The default value for i is 0, for j is the length of the dimension, and for k is 1.  When k is negative the slice runs backward, so the defaults for i and j become the end and the start of the dimension, respectively.
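
To make the ```i:j:k``` pattern concrete, here is a small sketch on a made-up 1-d array `v`, chosen so that each value equals its own index:

```python
import numpy

v = numpy.arange(10)   # the integers 0 through 9

print(v[2:7])     # i=2, j=7, default k=1
print(v[2:7:2])   # every second element from index 2 up to (not including) 7
print(v[:3])      # default i: the first three elements
print(v[7:])      # default j: index 7 through the end
print(v[::-1])    # negative k: the whole vector reversed
```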


In [103]:
# extract first 3 rows
print(a[:3, :])

# extract first two columns
print(a[:, :2])

# extract last two rows
print(a[-2:, :])

# transpose the array
print(a.T)

# flip the array up-down
# setting k to -1 steps backward, from the last row to the first
print(a[::-1, :])

# there is also a built-in numpy function to flip an array up-down:
print(numpy.flipud(a))

[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
[[1 5]
 [2 6]
 [3 7]
 [4 8]]
[[ 3  7 11]
 [ 4  8 12]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[[ 4  8 12]
 [ 3  7 11]
 [ 2  6 10]
 [ 1  5  9]]
[[ 4  8 12]
 [ 3  7 11]
 [ 2  6 10]
 [ 1  5  9]]


One handy feature of Numpy arrays is that they have many built-in methods that you can call on them.  We can list all of an array's attributes and methods using the ```dir()``` function:

In [104]:
dir(a)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__

Here are just a few examples:

In [108]:
# take the mean of the entire array
print(a.mean())

# take the mean along the first axis (axis 0), i.e. the mean of each column
print(a.mean(axis=0))

# take the sum of the entire array
print(a.sum())

6.5
[ 2.5  6.5 10.5]
78
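
The `axis` argument can be confusing at first: it names the dimension that gets collapsed. So `axis=0` collapses the rows and leaves one value per column (like `colMeans()` in R), while `axis=1` leaves one value per row (like `rowMeans()`). Here is a short sketch that rebuilds the same 4 x 3 array so it can run on its own:

```python
import numpy

a = numpy.array([[1, 5, 9],
                 [2, 6, 10],
                 [3, 7, 11],
                 [4, 8, 12]])

col_means = a.mean(axis=0)  # collapse rows: one mean per column
row_means = a.mean(axis=1)  # collapse columns: one mean per row

print(col_means)
print(row_means)
print(a.max(axis=0))        # largest value in each column
```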
