# Conceptual differences between Python and R

A collection of the most important objects and methods in Python, with particular attention to differences from R.
It also serves as a personal cheat sheet for me, coming from R.

## Data types

### Strings

In [1]:
s = 'mystring'; s2 = 'anotherString'

In [2]:
s.capitalize()

'Mystring'

In [3]:
s.swapcase()

'MYSTRING'

In [4]:
s.upper()

'MYSTRING'

In [5]:
s.lower()

'mystring'

In [6]:
s.split('st')

['my', 'ring']

In [7]:
len(s)

8

In [8]:
# merge two strings
s + s2

'mystringanotherString'

In [9]:
# repeat the whole string and return a new string
s * 2

'mystringmystring'

In [10]:
# use of format with position placeholders
'string 1: {0} - string 2: {1}'.format(s, s2)

'string 1: mystring - string 2: anotherString'
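
Since Python 3.6, f-strings offer a more compact alternative to `format`. A minimal sketch, with the same two strings redefined so the snippet runs on its own:

```python
s = 'mystring'
s2 = 'anotherString'

# f-strings evaluate the expressions inside the braces in place
msg = f'string 1: {s} - string 2: {s2}'
print(msg)

# method calls and other expressions are allowed inside the braces too
print(f'{s.upper()} has {len(s)} characters')
```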

In [11]:
# triple quotes delimit multi-line strings
'''  line 1
  line 2
  line 3
'''

'  line 1\n  line 2\n  line 3\n'

In [12]:
# exercise: reverse a string using only base Python; first split it into a list of characters
l = [i for i in s]
print(l)

['m', 'y', 's', 't', 'r', 'i', 'n', 'g']


In [13]:
# typical Python: mutable objects can be altered in place, without new assignment,
# simply by calling a modifying method like reverse
l.reverse()
print(l)

['g', 'n', 'i', 'r', 't', 's', 'y', 'm']


In [14]:
# join a list of strings into one string (similar to paste0(l, collapse = '') in R)
''.join(l)

'gnirtsym'

In [15]:
# we can slice strings using the colon syntax: start index, stop index (exclusive), step
# this code returns every 2nd char from the first to the sixth
s[0:6:2]

'msr'
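
The same slice syntax gives a shorter solution to the reversal exercise above: a negative step walks the string backwards. A small sketch, with `s` redefined so the snippet runs on its own:

```python
s = 'mystring'

# omitting start and stop with a step of -1 reverses the string
reversed_s = s[::-1]
print(reversed_s)

# reversed() returns an iterator, so it has to be joined back into a string
print(''.join(reversed(s)))
```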

### Booleans

Unlike R, which uses `&`, `|` and `!`, Python combines boolean/logical values with the keywords `and`, `or` and `not`.

In [16]:
True and False

False

In [17]:
True or False

True

In [18]:
l1 = [True, True, False]
l2 = [False, True, True]

# unlike R vectors, two lists can NOT be compared pairwise with `and`
# use a list comprehension to compare the elements pairwise
[a and b for a, b in zip(l1, l2)]

[False, True, False]

In [19]:
# zip pairs the elements of two lists into tuples; list() turns the resulting iterator into a list
for i in list(zip(l1, l2)):
    print(sum(i))

1
2
1
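
To see why the pairwise comparison needs a comprehension: applied to two whole lists, `and` does not work elementwise but returns one of its operands, depending on whether the first one is empty. A short sketch, which also shows `all` and `any` as rough counterparts of the R functions of the same name:

```python
l1 = [True, True, False]
l2 = [False, True, True]

# a non-empty list is truthy, so `and` simply returns the second operand
whole = l1 and l2
print(whole)

# elementwise result via zip, then reduced with all() and any()
pairwise = [a and b for a, b in zip(l1, l2)]
print(pairwise)
print(all(pairwise), any(pairwise))
```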


### Lists and tuples

In R, a simple one-dimensional collection of a single type (`numeric`, `character`) is an atomic vector, while a list is a more complex, possibly nested object that can hold arbitrary other objects.
Both are very important object classes in R.
A list in Python can be nested in the same way and can also contain different types. However, a dedicated object for several iterable elements of the same type in a single variable, like R's vector, is not native to base Python.

- Lists can hold elements of different types
- Lists are ordered and order can be changed
- Lists are indexed and elements can be retrieved by index
- Tuples are like lists but immutable once created
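
A brief sketch of the last two points, with made-up example values: a list element can be reassigned by index, whereas the same operation on a tuple raises a `TypeError`.

```python
my_list = ['a', 'b', 'c']
my_tuple = ('a', 'b', 'c')

# lists are mutable: replace an element by index (indexing starts at 0, not 1 as in R)
my_list[0] = 'z'
print(my_list)

# tuples are immutable: item assignment fails
try:
    my_tuple[0] = 'z'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```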

In [20]:
l = list()

In [21]:
l.append('a')
l.append(123)
print(l)

['a', 123]


In [22]:
# A list that gets appended to itself contains a reference to itself, so it can be indexed infinitely deep
l.append(l)
print(l)
print(l[2][2][2])

['a', 123, [...]]
['a', 123, [...]]


In [23]:
# short hand vs explicit creation of a tuple
tuple([1,2]) == (1, 2)

True

In [24]:
# combining two tuples into a list, without and with unpacking
a = (1,2)
b = (3,4)
print([a, b])
print([*a, *b])

[(1, 2), (3, 4)]
[1, 2, 3, 4]


In [25]:
# coerce two same-length lists or tuples to a dict by zipping them into key-value pairs
dict(zip(a, b))

{1: 3, 2: 4}

In [44]:
# displaying a tuple shows its packed form, with parentheses
a

(1, 2)

In [38]:
# unpacking assignment works with all iterables
number1, number2 = a
print(number1)
print(number2)

1
2
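
Unpacking also accepts a starred name that collects all remaining elements into a list, which is handy when the length of the iterable is not fixed. A small sketch with arbitrary example values:

```python
values = (1, 2, 3, 4)

# the starred name receives everything that is left over, as a list
first, *rest = values
print(first, rest)

# unpacking works for strings as well, since they are iterable
x, y = 'ab'
print(x, y)
```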


### Functions

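Functions, the workhorses with defined input and output, work similarly to R.
Some differences:

- functions in Python can modify mutable objects (such as lists) that are passed to them, so the change is visible outside the function; in R, the modified object has to be returned and re-assigned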

In [26]:
l = ['a', 'b', 'd', 'c']

def sort_list(l):
    # list.sort() sorts in place and returns None
    l.sort()
    print('list sorted')

# function modifies the input object even without re-assignment
sort_list(l)
print(l)

list sorted
['a', 'b', 'c', 'd']
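
For comparison, a sketch of how to get R-like behaviour, where the input stays untouched: `sorted` returns a new list instead of modifying its argument. The function names `sorted_copy` and `rebind` are made up for this example.

```python
def sorted_copy(items):
    # sorted() builds a new list and leaves `items` as it was
    return sorted(items)

original = ['a', 'b', 'd', 'c']
result = sorted_copy(original)
print(original)
print(result)

# rebinding a parameter inside a function does not affect the caller either
def rebind(items):
    items = ['something', 'else']

rebind(original)
print(original)
```

Only methods that mutate the object itself, such as `sort`, `append` or `reverse`, are visible to the caller.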


In [27]:
# enumerate is a useful function to loop over a list while keeping track of the index
for i in enumerate(l):
    print(i)

(0, 'a')
(1, 'b')
(2, 'c')
(3, 'd')
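
Each item produced by `enumerate` is an `(index, value)` tuple, so it can be unpacked directly in the loop header. A small sketch with example values; the optional `start` argument shifts the counter, for example to get R-style counting from 1:

```python
letters = ['a', 'b', 'c', 'd']

# unpack index and value; start=1 makes the counter begin at 1
numbered = []
for idx, letter in enumerate(letters, start=1):
    numbered.append(f'{idx}: {letter}')
    print(idx, letter)
```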


### Classes

In Python, classes are defined and used more commonly than in R.
R has classes too, but they are used less often because many (existing) objects are simply modified using functions.

- classes are defined with the variables (attributes) that they hold and the corresponding methods
- classes should have an `__init__` function that defines how an instance is created
- classes can have more functions that serve as methods for the class
- methods can be generic ('magic') methods with double underscores, such as `__len__`,
- or methods specific to this class, such as `add`
- use the `dir` function to show which methods are available

In [28]:
class Sequence:
    def __init__(self, seq, name):
        self.seq = seq
        self.name = name
        self.length = len(seq)
    # method that extends seq by arbitrary string and updates length
    def add(self, seq_add = str()):
        self.seq = self.seq + seq_add
        self.length = len(self.seq)

# show the last three attribute names of the class
dir(Sequence)[-3:]

['__subclasshook__', '__weakref__', 'add']

In [29]:
s = Sequence(seq = 'ATCGCT', name = 'someseq')
print(s.name, s.seq, s.length)

someseq ATCGCT 6


In [30]:
s.add('GGCCC')
print(s.name, s.seq, s.length)

someseq ATCGCTGGCCC 11


In [31]:
s.add()
print(s.seq)

ATCGCTGGCCC
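
The `__len__` magic method mentioned above is not part of `Sequence` as written. A hedged sketch of how it could be added, using a hypothetical class `SizedSequence` that is not from the original notebook: defining `__len__` lets the built-in `len` work on instances, and `__repr__` controls how they are displayed.

```python
class SizedSequence:
    def __init__(self, seq, name):
        self.seq = seq
        self.name = name

    # called by the built-in len()
    def __len__(self):
        return len(self.seq)

    # called by repr() and when the object is displayed in the console
    def __repr__(self):
        return f'SizedSequence(seq={self.seq!r}, name={self.name!r})'

ss = SizedSequence(seq='ATCGCT', name='someseq')
print(len(ss))
print(ss)
```

Computing the length on demand also means there is no separate `length` attribute that has to be kept in sync when the sequence changes.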


In [32]:
# a new class can inherit from an existing class;
# this adds a new method that reverse-complements the sequence
class RevSeq(Sequence):
    def rev_com(self):
        new_seq = str()
        for i in self.seq:
            if i == 'A':
                new_seq = new_seq + 'T'
            elif i == 'T':
                new_seq = new_seq + 'A'
            elif i == 'G':
                new_seq = new_seq + 'C'
            elif i == 'C':
                new_seq = new_seq + 'G'
        new_seq = [i for i in new_seq]
        new_seq.reverse()
        self.seq = ''.join(new_seq)

In [33]:
rs = RevSeq(seq = 'AGCT', name = 'someseq')
rs.add('TT')
rs.rev_com()
print(rs.seq)

AAAGCT
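
The reverse complement can also be written more compactly with a translation table and a reversing slice. This is an alternative sketch, not the method used in the class above; the names `COMPLEMENT` and `reverse_complement` are made up for illustration. Note one behavioural difference: the `if`/`elif` chain silently drops characters other than A, T, G and C, while `translate` leaves unknown characters unchanged.

```python
# map each base to its complement
COMPLEMENT = str.maketrans('ATGC', 'TACG')

def reverse_complement(seq):
    # complement every base, then reverse with a negative-step slice
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement('AGCTTT'))
```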


## Flow control

As with R, Python has classic loops and `if` conditions to control the flow of program execution.

However, there are some differences, as explained in the examples below.

- `for` loops are much more common in Python than in R, where one typically uses `apply` functions to loop over elements
- `else` can be used with a `for` statement too; it is executed if no `break` statement stopped the loop
- `break` statement: leaves the current `for` or `while` loop
- `continue` statement: skips directly to the next iteration of the loop
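
Where R code would reach for `sapply` or `lapply`, Python offers list comprehensions as a compact form of the `for` loop. A minimal sketch with arbitrary example numbers, showing both forms next to each other:

```python
numbers = [1, 2, 3, 4]

# explicit for loop that collects results
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# the same as a list comprehension, roughly what sapply(numbers, function(n) n^2) does in R
squares_comp = [n ** 2 for n in numbers]

print(squares_loop)
print(squares_comp)
```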

In [34]:
# the % operator divides n by x and returns the remainder
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n//x)
            break
    else:
        # loop fell through without finding a factor
        print(n, 'is a prime number')

2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3


In [35]:
for num in range(2, 5):
    if num % 2 == 0:
        print("Found an even number", num)
        break
    print("Found an odd number", num)

Found an even number 2


In [36]:
for num in range(2, 5):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found an odd number", num)

Found an even number 2
Found an odd number 3
Found an even number 4


In [37]:
# home-made Fibonacci series: all Fibonacci numbers up to n
def fib(n):
    x = 0
    y = 1
    result = list()
    while x+y <= n:
        s = x+y
        result.append(s)
        x = y
        y = s
    return result

fib(100)

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
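
The temporary variable in the loop can be avoided with the unpacking assignment shown earlier, because the right-hand side is evaluated completely before anything is assigned. A sketch of the same function in that style (the name `fib_unpack` is made up for this example):

```python
def fib_unpack(n):
    x, y = 0, 1
    result = []
    while x + y <= n:
        # right-hand side is evaluated first, then both names are rebound
        x, y = y, x + y
        result.append(y)
    return result

print(fib_unpack(100))
```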