# Python for R Users
# Part 1: Variables

In this notebook we will explore how variables are treated in Python in comparison to R.

*NOTE*:  We are going to assume throughout this tutorial that you are using Python 3.  

Resources:
- http://www.data-analysis-in-python.org/python_for_r.html


## Using Jupyter

Jupyter is a system for interactive computing that is similar in spirit to the R Notebooks you may have used with RStudio.

In a Jupyter notebook, the code and text (written in Markdown, just like an RMarkdown file) are placed in separate *cells*.  
A handy feature of Jupyter is that we can use both R and Python within the same notebook. Since this is a native Python notebook, it assumes that any code is written in Python; if we want to use R, we need to tell it explicitly.

The first thing we need to do is to tell Jupyter to load the functions it needs to run R code alongside Python.  Jupyter has a number of special commands, called *magic commands*, which start with a percent sign.

In [44]:
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


First, let's print "Hello, world" using Python.  To execute the code in the following cell, press Shift-Enter.

In [6]:
print('Hello, world')

Hello, world


You should see the message printed to the screen.  Now let's do the same with R. To run R code in a cell, we need to start the cell with the magic command ```%%R```:

In [8]:
%%R

cat('Hello, world')

Hello, world

## Variables and variable assignment

R has two common ways to assign values to variables: ```<-``` and ```=```

In [11]:
%%R

a <- 1
print(a)

b = 2
print(b)

[1] 1
[1] 2


In Python, a plain assignment statement uses only one operator: **=**

In [12]:
a = 1
print(a)

1


There is also an important difference under the hood in how Python treats variables compared to R.  First let's see it in action, and then explain it.  

In [30]:
%%R

a <- c(1, 2, 3)  # create a variable

b <- a  # assign it to a new variable

b[4] = 4  # add an entry to the new variable

print(a)

[1] 1 2 3


In [31]:
a = [1, 2, 3]   # create a variable
b = a  # assign it to a new variable
b.append(4) # add an entry to the new variable

print(a)

[1, 2, 3, 4]


What is happening here is that when you use ```b <- a``` in R, ```b``` behaves as an independent copy of ```a``` (R makes the actual copy as soon as one of them is modified), so that any changes in the new variable don't affect the original one.  

R's behavior is different from that of many general-purpose programming languages, in which a variable is treated as a pointer to a place in memory.  When we say ```b = a``` in Python, the two variables become pointers to the same place in memory, so anything that we do to the object through one variable is visible through the other as well.  Some types of objects in Python cannot be modified after they are created; these are called *immutable* objects, and they include character strings, single numbers, and a type called a *tuple* that you will learn about later.  Because any "change" to an immutable object actually creates a new object, sharing one between two variables never causes this kind of surprise.
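
A minimal sketch of the behavior described above, using Python's built-in ```is``` operator to check whether two names refer to the same object. The variable names here are illustrative only.

```python
# Two names bound to the same list object
a = [1, 2, 3]
b = a
same_object = b is a      # True: both names refer to one list
b.append(4)               # modifies the shared list in place

# Strings are immutable: "changing" one creates a new object
s = 'abc'
t = s
t = t + 'd'               # rebinds t to a new string; s is untouched

print(same_object, a)     # True [1, 2, 3, 4]
print(s, t)               # abc abcd
```

The list changes under both names because ```append``` modifies the object in place, whereas ```t + 'd'``` builds a new string and leaves the original alone.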

If we want Python to create a copy of a variable (so that changes to the new variable will not affect the old one), we need to use the ```copy()``` method:

In [33]:
a = [1, 2, 3]
b = a.copy()
b.append(4)
print('a:', a)
print('b:', b)

a: [1, 2, 3]
b: [1, 2, 3, 4]
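
One caveat worth sketching: ```copy()``` on a list makes a *shallow* copy, so lists nested inside it are still shared. The example below assumes a small nested list (the names are made up for illustration) and contrasts ```copy()``` with ```copy.deepcopy()``` from the standard library.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()          # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

shallow[0].append(99)            # also visible through `nested`
deep[1].append(-1)               # visible only through `deep`

print('nested: ', nested)        # nested:  [[1, 2, 99], [3, 4]]
print('shallow:', shallow)       # shallow: [[1, 2, 99], [3, 4]]
print('deep:   ', deep)          # deep:    [[1, 2], [3, 4, -1]]
```

For a flat list of numbers or strings, as in the cell above, the shallow copy is all that is needed.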


The ```a.copy()``` notation may be a bit confusing: what's with the dot?  Every object in Python has various functions (called *methods*) and other variables associated with it, and those are accessible using the dot.  This means that, unlike in R, the dot is special, and you can't create variable names with a dot in them.  We can see all of the functions and variables associated with a particular object using the ```dir()``` function.

In [36]:
dir(a)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
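
Most of the names in this output begin with double underscores and are used internally by Python. As a rough sketch, we can filter them out to see only the names meant for everyday use (the variable ```public_names``` is just an illustrative name):

```python
a = [1, 2, 3]

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(a) if not name.startswith('_')]
print(public_names)
```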

You can see that this variable, which is of a type called a *list* in Python, has a number of different methods associated with it.  If you want to know what one of them does, you can place a question mark in front of it in Jupyter to get its help text:

In [38]:
?a.count

Docstring: L.count(value) -> integer -- return number of occurrences of value
Type:      builtin_function_or_method


This tells us that ```a.count()``` should output the number of occurrences of a particular value, which we have to enter as an argument to the function. Let's see if it works: how many times does the number 1 occur in the list?


In [43]:
print(a.count(1))

1
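
The ```?``` help syntax is specific to Jupyter (IPython). In a plain Python script, a rough equivalent is to read the method's ```__doc__``` attribute or call the built-in ```help()``` function. The sketch below also tries ```count()``` on a list with repeated values; the list contents are made up for illustration.

```python
a = [1, 2, 3]

# The help text is stored as a string on the method itself;
# its exact wording varies between Python versions
doc_text = a.count.__doc__
print(doc_text)

# count() on a list that contains repeated values
repeated = [1, 2, 1, 3, 1]
n_ones = repeated.count(1)
print(n_ones)   # 3
```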


## Types of variables

From R you will be familiar with a number of different types of variables, such as *integer*, *double*, and *character*, as well as more complex types of variables such as *vectors* or *lists*.  Python has its own set of variable types, some of which overlap with R's and others of which are very different.  We can check the type of any value using the ```type()``` function:

In [46]:
type(1.)

float
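
As a quick sketch, we can apply ```type()``` to a handful of values to see some of Python's basic types. Loosely speaking, ```int```, ```float```, ```str```, and ```bool``` play roles similar to R's *integer*, *double*, *character*, and *logical*, though the correspondence is not exact. The example values are arbitrary.

```python
values = [1, 1.0, 'a', True, [1, 2], None]

# Look up the name of each value's type
type_names = [type(v).__name__ for v in values]

for v, name in zip(values, type_names):
    print(repr(v), '->', name)
```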

### Lists in Python

The concept of a *list* in Python is different from the concept of a *list* in R.  In Python, a list is more like a vector that you would generate using the ```c()``` function in R.  Let's see this in action.

In [48]:
my_list = [1, 2, 3, 4]
print(my_list)

[1, 2, 3, 4]


In [51]:
%%R

my_vector <- c(1, 2, 3, 4)
print(my_vector)

[1] 1 2 3 4


Here we run into what is perhaps the most annoying difference between R and Python.  In R, we would access the vector members using their position numbers, starting with 1:

In [52]:
%%R

print(my_vector[1])

[1] 1


However, in Python we index the entries starting with zero:

In [53]:
print(my_list[0])

1


This might seem weird, but it is actually the way that most general-purpose programming languages perform indexing, and there are good theoretical reasons for it.  For example, see the discussion here: https://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html
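
A short sketch of a few more consequences of zero-based indexing, using the same ```my_list``` as above (the other variable names are illustrative):

```python
my_list = [1, 2, 3, 4]

first = my_list[0]        # first element
last = my_list[-1]        # negative indices count from the end
first_two = my_list[0:2]  # slice: start at 0, stop before index 2

print(first, last, first_two)   # 1 4 [1, 2]
print(len(my_list))             # 4, so valid indices run from 0 to 3
```

Note that a negative index means something different in R, where ```my_vector[-1]``` returns the vector with its first element dropped.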

Another difference between Python and R has to do with their treatment of vectors containing different types of objects. R vectors don't allow this, and if you try to create one, R will convert the elements to a single type:

In [57]:
%%R

foo <- c(1, 'b')
print(foo)


[1] "1" "b"


In this case, R converts the elements to a type that can accommodate all of them, which here is a character string.  Python, on the other hand, has no problem with lists containing different types of objects:

In [58]:
foo = [1, 'b']
print(foo)

[1, 'b']
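
To confirm that each element keeps its own type, here is a minimal sketch that inspects the elements of ```foo``` individually (```element_types``` and ```incremented``` are illustrative names):

```python
foo = [1, 'b']

# Each element retains its original type
element_types = [type(item).__name__ for item in foo]
print(element_types)      # ['int', 'str']

# The first element is still a number, so arithmetic works on it
incremented = foo[0] + 1
print(incremented)        # 2
```

In the R example above, by contrast, the first element has become the string ```"1"```, so adding 1 to it would raise an error.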


### 