# R & Python Comparison Cheat Sheet

Contents:
- [Intro](#Intro)
    + [Basics](#Basics)
       * Arithmetic Operators
       * Assignment & Variables
       * Basic Data Types
    + [Vectors/Arrays](#Vectors/Arrays)
       * Selecting Items/elements from a Vector
    + [Matrix](#Matrix)
       * Selecting Elements from a Matrix
       * Arithmetic
    + [Factors](#Factors)
    + [Data Frames](#Data-Frames)
    + [Lists](#Lists)
- [Essential](#Essential)
    + [Relational and Logical Operators](#Relational-and-Logical-Operators)
    + [Loops](#Loops)
    + [Functions](#Functions)

To learn how to use **R** and __Python__ in _one_ Jupyter Notebook, see [R & Python in one Notebook](https://github.com/kaymal/R-and-Python/blob/master/R%20%26%20Python%20in%20One%20Notebook.ipynb).

In [6]:
# R - Using R in Jupyter Notebook
# Load in the R Magic
%load_ext rpy2.ipython

In [7]:
# Import necessary modules to print multiple outputs from a single cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

## Intro

### Basics

#### Arithmetic Operators

In [8]:
# Python - Arithmetic operators
print(5 + 3)
print(5 % 3) # Modulo
print(5 ** 3) # Exponentiation

8
2
125


In [9]:
# Python - Arithmetic operators
5 % 3 # Jupyter displays the value of the last expression without an explicit print() call

2

In [10]:
%%R
# R - Arithmetic operators
print(5 + 3) # print() is optional here; R auto-prints top-level expressions
print(5 %% 3) # Modulo
print(5 ^ 3) # Exponentiation

[1] 8
[1] 2
[1] 125


In [11]:
%%R
# R - Modulo operator
5 %% 3

[1] 2


#### Assignment & Variables

In [12]:
# Python - Assignment and variables
x = 5 + 3
y = 'y'
z = "z"
print(x, y, z)

8 y z


In [13]:
%%R
# R - Assignment (with '<-') and variables
x <- 5 + 3

# R - Assignment (with '=') and variables
y = 5 + 3
c(x, y)

[1] 8 8


#### Basic Data Types

In [14]:
# Python - Basic data types
type(3), type(3.5), type('a'), type(True)

(int, float, str, bool)

In [15]:
%%R
# R - Basic data types
c(class(3), class(3.5), class('a'), class(TRUE))

[1] "numeric"   "numeric"   "character" "logical"  


In [16]:
# Python - list and dictionary
type([1, 'a']), type({'a': 1, 'b':2})

(list, dict)

In [17]:
%%R
# R - list
class(list(1, 'a'))

[1] "list"


In [18]:
%%R
# R - matrix
print(matrix(1:4, nrow=2))
print(class(matrix(1:4, nrow=1)))

     [,1] [,2]
[1,]    1    3
[2,]    2    4
[1] "matrix"


### Vectors/Arrays

Vectors are one-dimensional arrays. All elements of a vector/array have the same data type.

In [19]:
# import numpy for vector/matrix operations in Python
import numpy as np

Note: NumPy is a package in Python for scientific computing.

In [20]:
# Python - Numpy arrays
t_array = np.array([1, 9, 8, 4])
s_array = np.array([1, 9, 8, 4 + 3])
s_array + t_array

array([ 2, 18, 16, 11])

In [21]:
%%R
# R - Vectors

# Create a vector with 'combine (c)' function
t_vector = c(1, 9, 8, 4)
s_vector = c(1, 9, 8, 4 + 3)
s_vector + t_vector

[1]  2 18 16 11


In [22]:
# Python - sum and mean with array
sum(s_array), np.sum(s_array), np.mean(s_array)

(25, 25, 6.25)

In [23]:
%%R
# R - sum and mean with vectors
c(sum(s_vector), mean(s_vector))

[1] 25.00  6.25


In [24]:
%%R
# Assign names to items in a vector (only for R)
names <- c("a", "b", "c", "d")
names(s_vector) <- names
s_vector

a b c d 
1 9 8 7 
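
NumPy arrays do not carry element names, but a plain Python dict can approximate R's named vector for lookup by name. A minimal sketch; the variable `s_named` is introduced here only for illustration:

```python
# Pair each name with its value, mirroring names(s_vector) <- names in R
names = ["a", "b", "c", "d"]
values = [1, 9, 8, 4 + 3]
s_named = dict(zip(names, values))

print(s_named)        # {'a': 1, 'b': 9, 'c': 8, 'd': 7}
print(s_named["a"])   # 1
```

A pandas Series built with `index=names` would be an alternative when element-wise arithmetic on the named values is also needed.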


#### Selecting items/elements from a vector

**Important**: The index of the first item in Python arrays is '0', whereas the index of the first item in R is '1'.

In [25]:
# Python - Selection
t_array[0], t_array[[0,1]]

(1, array([1, 9]))

In [26]:
%%R
# R - Selection
t_vector[1]

[1] 1


In [27]:
%%R
# Subsetting by index
print(s_vector[1])
# Subsetting by name
print(s_vector['a'])

a 
1 
a 
1 


In [28]:
# Python - slicing
t_array[0:3] # returns the first three elements (the end index 3 is excluded)

array([1, 9, 8])

In [29]:
%%R
# R - slicing
t_vector[1:3]

[1] 1 9 8
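
The two slices above select the same elements even though the bounds differ: Python's `0:3` starts at index 0 and excludes the end, while R's `1:3` starts at 1 and includes it. The sketch below uses a hypothetical helper, `r_slice`, to translate R-style bounds into a Python slice; plain lists are used so it runs without NumPy.

```python
def r_slice(seq, start, stop):
    """Hypothetical helper: emulate R's 1-based, inclusive seq[start:stop]."""
    # Shift the start down by one; R's inclusive stop equals Python's exclusive stop
    return seq[start - 1:stop]

t_list = [1, 9, 8, 4]
first_three = r_slice(t_list, 1, 3)

print(first_three)                 # [1, 9, 8]
print(first_three == t_list[0:3])  # True
```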


In [30]:
# Python - selecting by comparison
t_array[t_array > 5]

array([9, 8])

In [31]:
%%R
# R - Selecting by comparison
t_vector[t_vector > 5]

[1] 9 8


### Matrix

Matrices are two-dimensional arrays. All elements of a matrix have the same data type, such as numeric, character, or logical/boolean.

_Note: Python doesn't have a built-in type for matrices. Multi-dimensional NumPy arrays can be used as matrices._

In [32]:
# Python - Creating a matrix using np.arange
py_matrix = np.arange(1, 10).reshape(3, 3)
py_matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [33]:
# Python - Creating a matrix using np.matrix (NumPy now recommends regular arrays over np.matrix)
np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [34]:
# Python - Calculating the row sums
np.sum(py_matrix, axis = 1) # axis -> axis along which we want to sum the values

array([ 6, 15, 24])

In [35]:
# Python - Adding a new column
np.insert(py_matrix, 3, np.sum(py_matrix, axis = 1), axis=1)

array([[ 1,  2,  3,  6],
       [ 4,  5,  6, 15],
       [ 7,  8,  9, 24]])

In [36]:
%%R
# Constructing a matrix with the matrix() function
r_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)
r_matrix

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
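
R's `matrix()` fills column by column unless `byrow = TRUE` is given, whereas NumPy's `reshape` fills row by row by default. As a sketch, passing `order="F"` to `reshape` reproduces R's default column-wise fill; the variable names are illustrative.

```python
import numpy as np

values = np.arange(1, 10)

# Row-wise fill (NumPy default), matches matrix(1:9, byrow = TRUE, nrow = 3)
by_row = values.reshape(3, 3)

# Column-wise fill, matches R's default matrix(1:9, nrow = 3)
by_col = values.reshape(3, 3, order="F")

print(by_row)
print(by_col)
```

With a square input like this one, the column-wise result is simply the transpose of the row-wise result.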


In [37]:
%%R
# R - Naming a matrix
rownames(r_matrix) <- c("r1", "r2", "r3")
colnames(r_matrix) <- c("c1", "c2", "c3")
r_matrix

   c1 c2 c3
r1  1  2  3
r2  4  5  6
r3  7  8  9


In [38]:
%%R
# R - Calculating the row sums
rowSums(r_matrix)

r1 r2 r3 
 6 15 24 


In [39]:
%%R
# R - Adding a new column
cbind(r_matrix, rowSums(r_matrix))

   c1 c2 c3   
r1  1  2  3  6
r2  4  5  6 15
r3  7  8  9 24


#### Selecting Elements from a Matrix

In [40]:
# Python - Selecting Elements from a Matrix
py_matrix[0,0]

1

In [41]:
# Python - Selecting Elements from a Matrix
py_matrix[:, 0:2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [42]:
%%R
# R - Selecting Elements from a Matrix
r_matrix[1, 1]

[1] 1


In [43]:
%%R
# R - Selecting Elements from a Matrix
r_matrix[, 1:2] # An empty row index selects all rows, so no ':' is needed as in Python

   c1 c2
r1  1  2
r2  4  5
r3  7  8


#### Arithmetic

In [44]:
# Python - Matrix arithmetic
py_matrix * 2

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])

In [45]:
# Python - Matrix arithmetic
py_matrix + py_matrix

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])

In [46]:
%%R
# R - Matrix arithmetic
r_matrix * 2

   c1 c2 c3
r1  2  4  6
r2  8 10 12
r3 14 16 18


In [47]:
%%R
# R - Matrix arithmetic
r_matrix + r_matrix

   c1 c2 c3
r1  2  4  6
r2  8 10 12
r3 14 16 18


### Factors

In R, categorical data is stored in factors. Python has no built-in factor type, but pandas provides a `category` dtype that serves a similar purpose.

In [48]:
import pandas as pd

In [49]:
# Python - Categories
size_cat_obj = pd.Series(['small', 'medium', 'large', 'large'])
size_cat_obj

# Category with `dtype`
size_cat_cat = pd.Series(['small', 'medium', 'large', 'large'], dtype='category')
size_cat_cat

0     small
1    medium
2     large
3     large
dtype: object

0     small
1    medium
2     large
3     large
dtype: category
Categories (3, object): [large, medium, small]

In [50]:
# Python - Category with `astype`
size_cat_obj.astype('category')

0     small
1    medium
2     large
3     large
dtype: category
Categories (3, object): [large, medium, small]

In [51]:
%%R
# R - Creating factors
size_vector <- c("small", "medium", "large", "large")
size_fac <- factor(size_vector)
size_fac

[1] small  medium large  large 
Levels: large medium small


In [52]:
%%R
# R - Creating factors (ordered)
size_vector <- c("medium", "small", "large", "large")
size_fac <- factor(size_vector, ordered = TRUE, levels = c("small", "medium", "large"))
size_fac

[1] medium small  large  large 
Levels: small < medium < large


In [53]:
%%R
# R - Change the level names
levels(size_fac) <- c("size1", "size2", "size3")
size_fac

[1] size2 size1 size3 size3
Levels: size1 < size2 < size3


In [54]:
%%R
# R - Summary
summary(size_fac)

size1 size2 size3 
    1     1     2 


In [55]:
%%R
# R - Comparing ordered factors
size_fac[2] > size_fac[1]

[1] FALSE
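
The ordered-factor cells above have a pandas counterpart. The following is a minimal sketch, assuming pandas is installed; it builds an ordered categorical with `pd.Categorical`, renames the categories, and compares against a category. The variable names are illustrative.

```python
import pandas as pd

# Ordered categorical, similar to factor(..., ordered = TRUE, levels = ...)
sizes = pd.Series(
    pd.Categorical(
        ["medium", "small", "large", "large"],
        categories=["small", "medium", "large"],
        ordered=True,
    )
)

# Rename the categories, similar to levels(size_fac) <- c(...)
renamed = sizes.cat.rename_categories(["size1", "size2", "size3"])

# Element-wise comparison follows the category order, not alphabetical order
larger_than_small = sizes > "small"

print(renamed.tolist())
print(larger_than_small.tolist())
```

Note that the comparison is done on the Series itself; extracting single elements first would yield plain strings, which compare alphabetically.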


### Data Frames

Data frames are two-dimensional **objects**. All elements of a column must have the same data type, but different columns may hold different data types.

In [56]:
# Python - Data Frames

df_P = pd.read_csv('datasets/iris.csv')

# Print the first 5 observations (rows)
df_P.head()

   sepal.length  sepal.width  petal.length  petal.width variety
0           5.1          3.5           1.4          0.2  Setosa
1           4.9          3.0           1.4          0.2  Setosa
2           4.7          3.2           1.3          0.2  Setosa
3           4.6          3.1           1.5          0.2  Setosa
4           5.0          3.6           1.4          0.2  Setosa


In [57]:
%%R
# R - Data Frames
df_R = read.csv('datasets/iris.csv')

# Print the first 6 observations (rows)
head(df_R)

  sepal.length sepal.width petal.length petal.width variety
1          5.1         3.5          1.4         0.2  Setosa
2          4.9         3.0          1.4         0.2  Setosa
3          4.7         3.2          1.3         0.2  Setosa
4          4.6         3.1          1.5         0.2  Setosa
5          5.0         3.6          1.4         0.2  Setosa
6          5.4         3.9          1.7         0.4  Setosa


In [58]:
# Python - Investigate DataFrames

# Get the basic info of the df
df_P.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal.length    150 non-null float64
sepal.width     150 non-null float64
petal.length    150 non-null float64
petal.width     150 non-null float64
variety         150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB


In [59]:
%%R
# R - Investigate Data Frames

# Investigate the structure of df
str(df_R)

'data.frame':	150 obs. of  5 variables:
 $ sepal.length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ sepal.width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ petal.length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ petal.width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ variety     : Factor w/ 3 levels "Setosa","Versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...


In [60]:
# Python - Create a DataFrame

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data=d)
df

   col1  col2
0     1     4
1     2     5
2     3     6


In [61]:
%%R
# R - Create Data Frames with `data.frame()` function

col1 <- c(1, 2, 3)
col2 <- c(4, 5, 6)
df = data.frame(col1, col2)
df

  col1 col2
1    1    4
2    2    5
3    3    6


In [62]:
# Python - Selection from a DataFrame

df.iloc[0, 1]

# Print the second row
df.iloc[1, :]

# Print the first column
df['col1']

4

col1    2
col2    5
Name: 1, dtype: int64

0    1
1    2
2    3
Name: col1, dtype: int64
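
`iloc` selects by integer position, which makes it the closest analogue to R's `df[row, col]` apart from the 0-based counting. As a hedged sketch, `loc` selects by label instead; the row labels `'a'`, `'b'`, `'c'` below are made up for illustration so that labels and positions differ.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6]},
    index=["a", "b", "c"],  # hypothetical row labels
)

by_position = df_labeled.iloc[1, 1]     # second row, second column
by_label = df_labeled.loc["b", "col2"]  # row labeled 'b', column 'col2'

print(by_position, by_label)
```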

In [63]:
%%R
# R - Selection from a Data Frame
print(df[1, 2])

# Print the second row
print(df[2, ])

# Print the first column
print(df["col1"])

[1] 4
  col1 col2
2    2    5
  col1
1    1
2    2
3    3


In [64]:
# Python - Selecting a column (shortcut)
df.col1
# df['col1']

0    1
1    2
2    3
Name: col1, dtype: int64

In [65]:
%%R
# R - Selecting a column (shortcut)
df$col1

[1] 1 2 3


In [66]:
# Python - Subsetting
df[df.col1 > 1]

   col1  col2
1     2     5
2     3     6


In [67]:
%%R
# R - Subsetting

subset(df, col1 > 1) # equivalent to df[df$col1 > 1, ]

  col1 col2
2    2    5
3    3    6


### Lists

Lists can hold elements of different data types.

In [68]:
# Python - Creating a list
my_Plist = [t_array, py_matrix, df_P.head()]
my_Plist

[array([1, 9, 8, 4]), array([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]),    sepal.length  sepal.width  petal.length  petal.width variety
 0           5.1          3.5           1.4          0.2  Setosa
 1           4.9          3.0           1.4          0.2  Setosa
 2           4.7          3.2           1.3          0.2  Setosa
 3           4.6          3.1           1.5          0.2  Setosa
 4           5.0          3.6           1.4          0.2  Setosa]

In [69]:
%%R
# R - Creating a list
my_Rlist <- list(t_vector, r_matrix, head(df_R))
my_Rlist

[[1]]
[1] 1 9 8 4

[[2]]
   c1 c2 c3
r1  1  2  3
r2  4  5  6
r3  7  8  9

[[3]]
  sepal.length sepal.width petal.length petal.width variety
1          5.1         3.5          1.4         0.2  Setosa
2          4.9         3.0          1.4         0.2  Setosa
3          4.7         3.2          1.3         0.2  Setosa
4          4.6         3.1          1.5         0.2  Setosa
5          5.0         3.6          1.4         0.2  Setosa
6          5.4         3.9          1.7         0.4  Setosa



In [70]:
%%R
# R - Creating a list with names (Similar to a Python Dictionary)
my_Rlist <- list(vec = t_vector, mat = r_matrix, df = df_R)
my_Rlist$vec # my_Rlist[["vec"]]

[1] 1 9 8 4
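
For comparison, a rough Python counterpart of the named list is a dictionary. This sketch uses plain nested lists in place of the NumPy objects so that it is self-contained; `my_Pdict` is a name introduced here for illustration.

```python
my_Pdict = {
    "vec": [1, 9, 8, 4],
    "mat": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}

vec = my_Pdict["vec"]            # like my_Rlist$vec or my_Rlist[["vec"]]
first_item = my_Pdict["vec"][0]  # like my_Rlist$vec[1]; Python counts from 0

print(vec)
print(first_item)
```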


In [71]:
# Python - Selecting from a list
my_Plist[0][0]

1

In [72]:
%%R
# R - Selecting from a list
my_Rlist$vec[1] # my_Rlist[["vec"]][1]

[1] 1


## Essential

### Relational and Logical Operators

#### Precedence Comparison between Python and R

Precedence (order of evaluation) table for Python: (highest to lowest)
---
![Python Precedence](img/precedence_Python.png)
[source](https://www.thomas-cokelaer.info/tutorials/python/boolean.html)

Precedence table for R: (highest to lowest)
---
![R Precedence](img/precedence_R.png)
[source](https://www.datamentor.io/r-programming/precedence-associativity/)

#### Relational Operators

In [73]:
# Python Relational Operators
1 < 2, 1 > 2, 1 == 2, "a" < "b"

(True, False, False, True)

In [74]:
%%R
# R Relational Operators
c(1 < 2, 1 > 2, 1 == 2, "a" < "b")

[1]  TRUE FALSE FALSE  TRUE


In Python, comparison operators can be chained (left-to-right chaining).

In [79]:
# Python - Chaining Comparison Operators
x = 2
1 < x < 3

True
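
A chained comparison behaves like the individual comparisons joined with `and`. The sketch below spells that out; the explicit form is also the one that carries over to R, where it would be written with `&`.

```python
x = 2

chained = 1 < x < 3
explicit = (1 < x) and (x < 3)

print(chained, explicit)   # True True
```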

In R, comparison operators CANNOT be chained.

In [80]:
%%R
# R Comparison Operators
x <- 2
1 < x < 3


Error in (function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"),  : 
  <text>:3:7: unexpected '<'
2: x <- 2
3: 1 < x <
         ^



We can apply _relational operators_ to vectors, matrices, and data frames as well.

In [94]:
# Python - Comparison Operators with Vectors
2 > np.array([1, 2, 3])

array([ True, False, False])

In [95]:
%%R
# R - Comparison Operators with Vectors
2 > c(1, 2, 3)

[1]  TRUE FALSE FALSE


In [97]:
# Python - Comparison Operators with DataFrames
df
2 > df

   col1  col2
0     1     4
1     2     5
2     3     6

    col1   col2
0   True  False
1  False  False
2  False  False


In [100]:
%%R
# R - Comparison Operators with Data Frames
print(df)
2 > df

  col1 col2
1    1    4
2    2    5
3    3    6
      col1  col2
[1,]  TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE


#### Logical Operators

##### OR `|`

In Python, `|`, `^`, and `&` are _bitwise operators (OR, XOR, AND)_, and they have higher precedence than _comparison operators_. Therefore, comparisons combined with them usually need parentheses.

In [88]:
# Python Logical Operators
1 > 2 | 1 < 2  # This is interpreted as "1 > (2 | 1) < 2"
(1 > 2) | (1 < 2)

# Check to see what '2 | 1' amounts to in Python
2 | 1

False

True

3
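
For scalar truth values, Python's keyword operators `or` and `and` are the logical operators, and they bind more loosely than comparisons, so no parentheses are needed. A short sketch contrasting them with bitwise `|`; the variable names are illustrative:

```python
with_keyword = 1 > 2 or 1 < 2      # parsed as (1 > 2) or (1 < 2)
with_bitwise = 1 > 2 | 1 < 2       # parsed as 1 > (2 | 1) < 2, i.e. 1 > 3 < 2
with_parens = (1 > 2) | (1 < 2)    # parentheses restore the intended meaning

print(with_keyword, with_bitwise, with_parens)   # True False True
```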

In [86]:
%%R
# R Logical Operators
c(
   (1 > 2) | (1 < 2)
  , 1 > 2 | 1 < 2
 )
# R interprets both the same, since comparison operators have higher precedence than |

[1] TRUE TRUE


##### AND `&`

In [90]:
# Python Logical Operators
(1 > 2) & (1 < 2) 
# False & True

False
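
With NumPy arrays the parentheses matter even more, because the keyword `and` does not work element-wise and `&` has to be used instead. A sketch, reusing the values of `t_array` from earlier; `mask` is an illustrative name:

```python
import numpy as np

t_array = np.array([1, 9, 8, 4])

# Each comparison is parenthesized so that & combines two boolean arrays
mask = (t_array > 2) & (t_array < 9)

print(mask)            # [False False  True  True]
print(t_array[mask])   # [8 4]
```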

In [91]:
%%R
# R Logical Operators
1 > 2 & 1 < 2
# FALSE & TRUE

[1] FALSE


#### Conditional Statements

### Loops

### Functions