# Conceptual differences between Python and R

A collection of the most important objects and methods in python, with particular attention to differences from R.
It also serves as a personal cheat sheet for me, coming from R.

## Data types

### Strings

In [1]:
s = 'mystring'; s2 = 'anotherString'

In [2]:
s.capitalize()

'Mystring'

In [3]:
s.swapcase()

'MYSTRING'

In [4]:
s.upper()

'MYSTRING'

In [5]:
s.lower()

'mystring'

In [6]:
s.split('st')

['my', 'ring']

In [7]:
len(s)

8

In [8]:
# merge two strings
s + s2

'mystringanotherString'

In [9]:
# repeat string elements and return new string
s * 2

'mystringmystring'

In [10]:
# use of format with position placeholders
'string 1: {0} - string 2: {1}'.format(s, s2)

'string 1: mystring - string 2: anotherString'

In [11]:
# three quotes indicate multi line strings
'''  line 1
  line 2
  line 3
'''

'  line 1\n  line 2\n  line 3\n'

In [12]:
# exercise: reverse a string with base python
l = [i for i in s]
print(l)

['m', 'y', 's', 't', 'r', 'i', 'n', 'g']


In [13]:
# typical python: mutable objects can be altered without new assignment
# simply by calling a modifying method like reverse
l.reverse()
print(l)

['g', 'n', 'i', 'r', 't', 's', 'y', 'm']


In [14]:
# paste list of strings together using python
''.join(l)

'gnirtsym'

In [15]:
# we can slice strings using the colon syntax: start, stop (exclusive), step
# this code returns every 2nd char among the first six (indices 0 to 5)
s[0:6:2]

'msr'
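
The same slice syntax also accepts a negative step, which gives a shorter way to reverse a string than the list detour used in the exercise above. A minimal sketch; the variable names are only illustrative:

```python
s = 'mystring'

# a negative step walks the string backwards
reversed_s = s[::-1]
print(reversed_s)        # gnirtsym

# omitted start/stop default to the ends of the string
first_three = s[:3]      # 'mys'
last_three = s[-3:]      # 'ing'
print(first_three, last_three)
```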

### Booleans

Unlike in R, where `&`, `|` and `!` are used, boolean/logical values are combined with the keywords `and`, `or` and `not`.

In [16]:
True and False

False

In [17]:
True or False

True

In [18]:
l1 = [True, True, False]
l2 = [False, True, True]

# unlike R vectors, two lists can NOT be combined element-wise with `and`
# use list comprehension to compare objects pairwise
[a and b for a, b in zip(l1, l2)]

[False, True, False]

In [19]:
# zip pairs up the elements of two lists; list() turns the resulting iterator into a list of tuples
for i in list(zip(l1, l2)):
    print(sum(i))

1
2
1
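
The `not` keyword and the reducing functions `any()` and `all()` (which exist under the same names in R) are not shown in the cells above. The following is a small sketch of how they might be used with the same two lists; the names `either` and `negated` are made up for this example:

```python
l1 = [True, True, False]
l2 = [False, True, True]

# `not` negates a single boolean
print(not l1[0])                      # False

# element-wise OR and negation via list comprehensions
either = [a or b for a, b in zip(l1, l2)]
negated = [not a for a in l1]
print(either)                         # [True, True, True]
print(negated)                        # [False, False, True]

# any() / all() reduce a list of booleans to a single value
print(any(l1), all(l1))               # True False
```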


### Lists and tuples

In R, an atomic vector is a simple one-dimensional collection of one specific type (`numeric`, `character`).
Lists, in contrast, can be more complex nested objects that hold other arbitrary objects. Both are very important object classes in R.
A list in python can be similarly nested and can also contain different types. However, the concept of a dedicated vector holding several elements of the same type in one single variable is not native to base python.

- Lists can hold objects of different types
- Lists are ordered and order can be changed
- Lists are indexed and elements can be retrieved by index
- Tuples are like lists but immutable once created
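
The difference between mutable lists and immutable tuples listed above can be sketched as follows. This is only an illustration; the flag `tuple_changed` is a made-up helper variable:

```python
l = ['a', 'b', 'c']
t = ('a', 'b', 'c')

# lists can be changed in place
l[0] = 'z'
print(l)                 # ['z', 'b', 'c']

# tuples raise a TypeError on item assignment
try:
    t[0] = 'z'
    tuple_changed = True
except TypeError as e:
    tuple_changed = False
    print('TypeError:', e)

# both support the same zero-based indexing and slicing
print(l[1], t[1], t[0:2])
```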

In [20]:
l = list()

In [21]:
# add items to list
l.append('a')
l.append(123)
print(l)

['a', 123]


In [22]:
# A list that gets appended to itself contains itself, i.e. it is nested infinitely deep
l.append(l)
print(l)
print(l[2][2][2])

['a', 123, [...]]
['a', 123, [...]]


In [23]:
# remove items from list: pop removes by index and returns the item, remove deletes the first matching value
l = ['a', 'b', 'c', 'd']
l.pop(2)
l.remove('a')
print(l)

['b', 'd']


In [24]:
# other useful functions: find indices of matching elements
l.index('d')

1

In [25]:
# count elements in list matching term
l.count('c')

0

In [26]:
# short hand vs explicit creation of a tuple
tuple([1,2]) == (1, 2)

True

In [27]:
# coercing a tuple to a list without and with unpacking
a = (1,2)
b = (3,4)
print([a, b])
print([*a, *b])

[(1, 2), (3, 4)]
[1, 2, 3, 4]


In [28]:
# coerce two same-length lists or tuples to dict by zipping into name-value pairs
dict(zip(a, b))

{1: 3, 2: 4}

In [29]:
# coerce to string, note string of tuple vs list of strings
print(str(a))
[str(i) for i in a]

(1, 2)


['1', '2']

In [30]:
# unpacking assignment works with all iterables
number1, number2 = a
print(number1)
print(number2)

1
2


### Dicts

Dictionaries or `dict`s in python store data in key value pairs. Some characteristics of dicts are:

- are mutable and, since Python 3.7, keep insertion order; elements are accessed by key, not by position
- created using curly braces
- can hold different data types
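
A short sketch of a few more everyday dict operations that R users may look for, such as safe lookup and iteration over key-value pairs. The variable names `keys` and `missing` are only illustrative:

```python
d = {'a': 1, 'b': 2, 1: 'cde'}

# keys come back in insertion order (guaranteed since Python 3.7)
keys = list(d.keys())
print(keys)                          # ['a', 'b', 1]

# .get() returns a default instead of raising KeyError for missing keys
missing = d.get('zzz', 'not found')
print(missing)                       # not found

# membership tests check keys, not values
print('a' in d, 2 in d)              # True False

# iterate over key-value pairs
for key, value in d.items():
    print(key, value)
```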

In [31]:
d = {'a': 1, 'b': 2, 1: 'cde'}

In [32]:
d['a']

1

In [33]:
# add and remove values
d['new'] = 5
del d['b']
print(d)

{'a': 1, 1: 'cde', 'new': 5}


In [34]:
# is it possible to have different values with same key? No.
{'a': 1, 'a': 3}

{'a': 3}

### Functions

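Functions, the workhorses with defined input and output, work similarly to R.
Some differences:

- functions in python can modify mutable objects (such as lists) passed to them in place, so the change is visible outside the function, while R functions work on copies and need re-assignment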
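
Whether a change is visible outside the function depends on whether the object is mutated or the name is re-bound. The sketch below contrasts the two cases; `mutate` and `rebind` are hypothetical helper functions written only for this illustration:

```python
def mutate(lst):
    # in-place method: the caller's list changes
    lst.append('x')

def rebind(lst):
    # assignment creates a new local object: the caller's list is untouched
    lst = lst + ['y']
    return lst

original = ['a', 'b']
mutate(original)
print(original)          # ['a', 'b', 'x']

result = rebind(original)
print(original)          # ['a', 'b', 'x']
print(result)            # ['a', 'b', 'x', 'y']

# sorted() returns a new list, while list.sort() sorts in place
unsorted = ['d', 'a', 'c']
print(sorted(unsorted), unsorted)    # ['a', 'c', 'd'] ['d', 'a', 'c']
```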

In [35]:
l = ['a', 'b', 'd', 'c']

def sort_list(l):
    l.sort()
    print('list sorted')

# function modifies the input object even without re-assignment
sort_list(l)
print(l)

list sorted
['a', 'b', 'c', 'd']


In [36]:
# enumerate yields (index, element) pairs while iterating through a list
for i in enumerate(l):
    print(i)

(0, 'a')
(1, 'b')
(2, 'c')
(3, 'd')


In [37]:
# pass function arguments as dict instead of single values
def repeat_str(string, times):
    return(string*times)

args = {'string': 'Test', 'times': 5}
repeat_str(**args)

'TestTestTestTestTest'

In [38]:
# function with variable unnamed arguments
def var_input(*args):
    for i in args:
        print(i)

inp = ['a', 'b']
var_input(*inp)

a
b


In [39]:
# function with variable keyworded arguments
def var_input(**kwargs):
    for i, j in kwargs.items():
        print(i, j)

args = {'a': 1, 'b': 2}
var_input(**args)

a 1
b 2


### List comprehensions

Not exactly its own type of function but nevertheless a unique and extremely useful tool in python that has no direct counterpart in R. In R, most functions are vectorized, i.e. they can be applied to all elements of a vector directly without loops.
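
As a point of reference, a list comprehension is a compact form of a `for` loop that appends to a new list. The sketch below shows both forms side by side for squaring numbers, which in R would simply be `numbers^2`; the variable names are only illustrative:

```python
numbers = [1, 2, 3, 4]

# explicit loop
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# equivalent list comprehension
squares_comp = [n ** 2 for n in numbers]

print(squares_loop)      # [1, 4, 9, 16]
print(squares_comp)      # [1, 4, 9, 16]
```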

In [40]:
# list comprehensions
# -------------------
a = [1,3,5]
b = [2,4,5,7]
# to compare lists element-wise; also works for different lengths, where the shorter list defines the output length
[i == j for i, j in zip(a, b)]

[False, False, True]

In [41]:
# to match the elements of list a to all elements of list b
[i in b for i in a]

[False, False, True]

In [42]:
# and vice versa
[i in a for i in b]

[False, False, True, False]

In [43]:
# list comprehension with if condition
[i for i in b if i%2 != 0]

[5, 7]

In [44]:
# nested for loop list comprehension
[(i,j) for i in a for j in b]

[(1, 2),
 (1, 4),
 (1, 5),
 (1, 7),
 (3, 2),
 (3, 4),
 (3, 5),
 (3, 7),
 (5, 2),
 (5, 4),
 (5, 5),
 (5, 7)]

In [45]:
# flatten a nested list: in R, simply done using unlist()
l = [[1,2,3],[4,5,6],[7,8,9]]
[j for i in l for j in i]

[1, 2, 3, 4, 5, 6, 7, 8, 9]

### Lambda functions

- Lambda functions are anonymous functions in Python, that means they are constructed on the fly rather than defined with a name
- Lambdas are most useful in combination with the `map` function that applies an anonymous function to an iterable object
- the counterpart in R are anonymous functions used in apply constructs like `lapply(my_list, function(x){x^2})`
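
Besides `map`, lambdas are often passed as the `key` argument of `sorted`. The following sketch also shows that a `map` with a lambda can usually be written as a list comprehension; the list `words` is made-up example data:

```python
a = [1, 3, 5]

# map with a lambda and the equivalent list comprehension
squared_map = list(map(lambda x: x ** 2, a))
squared_comp = [x ** 2 for x in a]
print(squared_map == squared_comp)    # True

# lambdas are also handy as sort keys, here sorting by string length
words = ['ccc', 'a', 'bb']
by_length = sorted(words, key=lambda w: len(w))
print(by_length)                      # ['a', 'bb', 'ccc']
```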

In [46]:
# lambda with one input list
list(map(lambda x : x**2, a))

[1, 9, 25]

In [47]:
# lambda with multiple input lists (shorter list defines output)
list(map(lambda x, y : x*y, a, b))

[2, 12, 25]

### Classes

In python, classes are more commonly defined and used than in R.
R has classes too, but they are used less often because many (existing) objects are simply modified using functions.

- classes are defined with the variables that they hold and corresponding methods
- classes should have an `__init__` function that defines how an instance is created
- classes can have more functions that serve as methods for the class
- methods can either be generic ('magic') methods with double underscores such as `__len__`, or methods specific to the class such as `add`
- use the `dir` function to show what methods are available
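
To illustrate the difference between magic methods and class-specific methods, here is a minimal sketch. The class `Seq` and its method `gc_content` are hypothetical and separate from the `Sequence` class used in the cells of this section:

```python
class Seq:
    # hypothetical minimal class, only for illustration
    def __init__(self, seq):
        self.seq = seq

    # magic method: called by the built-in len()
    def __len__(self):
        return len(self.seq)

    # magic method: used by repr(), and by print() when no __str__ is defined
    def __repr__(self):
        return 'Seq({!r})'.format(self.seq)

    # ordinary class-specific method
    def gc_content(self):
        return (self.seq.count('G') + self.seq.count('C')) / len(self.seq)

x = Seq('ATGC')
print(len(x))            # 4
print(x)                 # Seq('ATGC')
print(x.gc_content())    # 0.5
```

Defining `__len__` lets the generic `len()` work on instances, which is roughly comparable to writing a method for a generic function in R.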

In [48]:
class Sequence:
    def __init__(self, seq, name):
        self.seq = seq
        self.name = name
        self.length = len(seq)
    # method that extends seq by arbitrary string and updates length
    def add(self, seq_add = str()):
        self.seq = self.seq + seq_add
        self.length = len(self.seq)

# show the last three attribute names of the class
dir(Sequence)[-3:]

['__subclasshook__', '__weakref__', 'add']

In [49]:
s = Sequence(seq = 'ATCGCT', name = 'someseq')
print(s.name, s.seq, s.length)

someseq ATCGCT 6


In [50]:
s.add('GGCCC')
print(s.name, s.seq, s.length)

someseq ATCGCTGGCCC 11


In [51]:
s.add()
print(s.seq)

ATCGCTGGCCC


In [52]:
# a new class can inherit from an existing class;
# this adds a new function that will rev-com the sequence
class RevSeq(Sequence):
    def rev_com(self):
        new_seq = str()
        for i in self.seq:
            if i == 'A':
                new_seq = new_seq + 'T'
            elif i == 'T':
                new_seq = new_seq + 'A'
            elif i == 'G':
                new_seq = new_seq + 'C'
            elif i == 'C':
                new_seq = new_seq + 'G'
        new_seq = [i for i in new_seq]
        new_seq.reverse()
        self.seq = ''.join(new_seq)

In [53]:
rs = RevSeq(seq = 'AGCT', name = 'someseq')
rs.add('TT')
rs.rev_com()
print(rs.seq)

AAAGCT


## Flow control

As with R, python has classic loops and if conditions to control the flow of program execution.

However there are some differences as explained in the examples below.

- `for` loops are much more common in python than R, where one uses `apply` functions to loop over instances
- `else` can be used with a `for` statement too; it is executed if no `break` statement stops the loop
- `break` statement: `break` leaves current for or while loop
- `continue` statement: `continue` directly enters next iteration of the loop

In [54]:
# the % operator divides n by x and returns the remainder
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n//x)
            break
    else:
        # loop fell through without finding a factor
        print(n, 'is a prime number')

2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3


In [55]:
for num in range(2, 5):
    if num % 2 == 0:
        print("Found an even number", num)
        break
    print("Found an odd number", num)

Found an even number 2


In [56]:
for num in range(2, 5):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found an odd number", num)

Found an even number 2
Found an odd number 3
Found an even number 4


In [57]:
# home made Fibonacci series
def fib(n):
    x = 0
    y = 1
    result = list()
    while x+y <= n:
        s = x+y
        result.append(s)
        x = y
        y = s
    return(result)

fib(100)

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

## Input / output

Standardized data formats can be imported using dedicated packages like `pandas`.
However for raw import of data python has a simple framework for accessing files from hard drive etc.

- open a file as a connection using `open`
- read or write to the file
- close it again
- use one of these `mode`s for working with a file
    - `r` - open for reading (default)
    - `w` - open for writing, truncating the file first
    - `x` - create a new file and open it for writing; fails if the file already exists
    - `a` - open for writing, appending to the end of the file if it exists
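
The modes listed above can be tried out safely in a temporary directory. This is a sketch under the assumption that a throwaway file is acceptable; the file name `modes.txt` and the flag `x_failed` are made up for the example:

```python
import os
import tempfile

# work in a temporary directory so no existing files are touched
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes.txt')   # hypothetical file name

    with open(path, 'w') as f:      # create or truncate
        f.write('first\n')
    with open(path, 'a') as f:      # append to the end
        f.write('second\n')
    with open(path, 'r') as f:      # read back
        content = f.read()

    # 'x' refuses to overwrite an existing file
    try:
        with open(path, 'x') as f:
            f.write('never written')
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(content))     # 'first\nsecond\n'
print(x_failed)          # True
```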

In [58]:
# create and write to a new file
with open('test.txt', 'x') as f:
    f.write('this is a test file\nsecond line\nthird line')

In [59]:
# read a file as one single string
with open('test.txt', 'r') as f:
    ftext = f.read()

print(ftext)

this is a test file
second line
third line


In [60]:
# read a file line by line
with open('test.txt', 'r') as f:
    ftext = f.readlines()

print(ftext)

['this is a test file\n', 'second line\n', 'third line']


In [61]:
# remove text file at the end
import os
os.remove('test.txt')

## Error handling

To be added.

In [63]:
try:
    # Open file in read-only mode; this already fails because the file does not exist
    with open("not_here.txt", 'r') as f:
        f.write("Hello World!")
except IOError as e:
    print("I/O Error:", e)

I/O Error: [Errno 2] No such file or directory: 'not_here.txt'
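
A `try` statement can also carry `else` and `finally` clauses, loosely comparable to `tryCatch(..., finally = ...)` in R. The sketch below shows the order in which the clauses run; `safe_divide` is a hypothetical helper written only for this illustration:

```python
def safe_divide(x, y):
    # hypothetical helper used only for illustration
    try:
        result = x / y
    except ZeroDivisionError as e:
        # runs only if the try block raised this exception
        print('Error:', e)
        return None
    else:
        # runs only if no exception was raised
        return result
    finally:
        # always runs, even when a return statement was hit above
        print('done')

print(safe_divide(1, 2))   # prints 'done', then 0.5
print(safe_divide(1, 0))   # prints the error message, 'done', then None
```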


## Useful shorthands

Python has a lot of really neat features that serve as shorthands for more complicated expressions.

In [64]:
# the % operator returns the remainder of a division
10 % 3

1

In [65]:
# the += assignment adds the right-hand value to the variable, shorthand for count = count + 1
count = 0
count += 1
print(count)

1
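
A few related shorthands in the same spirit, shown as a small sketch. The selection is my own and not exhaustive:

```python
# augmented assignment works with other operators as well
n = 10
n -= 3       # 7
n *= 2       # 14
n //= 4      # 3 (floor division)
print(n)

# divmod returns quotient and remainder in one call
q, r = divmod(10, 3)
print(q, r)              # 3 1

# conditional expression, comparable to ifelse() in R for a single value
label = 'even' if n % 2 == 0 else 'odd'
print(label)             # odd
```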


## Exercises

Some exercises for functions using only basic python.

- simple sort algorithm
- bubble sort algorithm
- DNA k-mer sequence combinations

In [66]:
l = [4,7,2,3,8,4,2,1]

def simple_sort(lst):
    new_lst = []
    while len(lst) > 0:
        pos = lst.index(min(lst))
        new_lst.append(lst.pop(pos))
    return(new_lst)

In [67]:
simple_sort(l)

[1, 2, 2, 3, 4, 4, 7, 8]

In [68]:
l = [4,7,2,3,8,4,2,1]

def bubble_sort(lst):
    list_sorted = False
    while not list_sorted:
        list_sorted = True
        for i,j in enumerate(lst):
            if i < len(lst)-1:
                if lst[i+1] < j:
                    lst[i] = lst[i+1]
                    lst[i+1] = j
                    list_sorted = False
    return(lst)

bubble_sort(l)

[1, 2, 2, 3, 4, 4, 7, 8]
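
Note that `simple_sort` empties its input list through `pop`, and `bubble_sort` reorders its input in place. If the original list should be kept, one option is to work on a copy. The function `simple_sort_copy` below is a hypothetical variant sketched for this purpose:

```python
def simple_sort_copy(lst):
    # hypothetical variant of simple_sort that leaves the input untouched
    remaining = list(lst)          # shallow copy
    new_lst = []
    while remaining:
        pos = remaining.index(min(remaining))
        new_lst.append(remaining.pop(pos))
    return new_lst

l = [4, 7, 2, 3, 8, 4, 2, 1]
result = simple_sort_copy(l)
print(result)                      # [1, 2, 2, 3, 4, 4, 7, 8]
print(l)                           # [4, 7, 2, 3, 8, 4, 2, 1]
print(result == sorted(l))         # True
```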

- Function to calculate all possible sequences of a DNA k-mer

In [69]:
def DNA_comb(seq_len):
    count = 1
    result = ['A', 'T', 'C', 'G']
    while count < seq_len:
        result = [i+j for i in result for j in ['A', 'T', 'C', 'G']]
        count += 1
    return(result)

print(len(DNA_comb(seq_len = 4)))
print(DNA_comb(seq_len = 4)[0:10])

256
['AAAA', 'AAAT', 'AAAC', 'AAAG', 'AATA', 'AATT', 'AATC', 'AATG', 'AACA', 'AACT']
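
Outside of a base python exercise, the same k-mers could probably be generated with `itertools.product` from the standard library. A minimal sketch; the function name `dna_kmers` is made up and the base order `'ATCG'` is chosen to match `DNA_comb`:

```python
from itertools import product

def dna_kmers(k):
    # hypothetical alternative to DNA_comb based on itertools.product
    return [''.join(p) for p in product('ATCG', repeat=k)]

kmers = dna_kmers(4)
print(len(kmers))        # 256
print(kmers[0:5])        # ['AAAA', 'AAAT', 'AAAC', 'AAAG', 'AATA']
```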
