# R is a strange place to start

R is an unusual language in many ways, so the notes below are a mixture of

> R is unusual in the following ways

and 

> Python specifically differs from R in the following ways

and also

> Python is unusual in the following ways

In [2]:
%load_ext rpy2.ipython



## R is vectorized by default

In R, the basic data types are vectors, lists, matrices and data frames.

In most cases you can apply a function to every element of a vector by calling the function once and passing the whole vector as input.

Consider the following examples:

In [11]:
%%R

some_numbers <- 1:10
print(some_numbers)
print(some_numbers > 5)

words <- c('hello', 'world', 'goodbye', 'pizza')
print(toupper(words))

 [1]  1  2  3  4  5  6  7  8  9 10
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
[1] "HELLO"   "WORLD"   "GOODBYE" "PIZZA"  


This is exceptionally useful for data munging and analysis.

In this regard R is an outlier: most programming languages do not work this way.

In almost all languages you would achieve the same result using iteration of some kind.

In Python we could handle these two examples like so:

In [16]:
some_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for number in some_numbers:
    print(number > 5)
    
print('\nUppercasing words:\n')

words = ['hello', 'world', 'goodbye', 'pizza']

for word in words:
    print(word.upper())

False
False
False
False
False
True
True
True
True
True

Uppercasing words:

HELLO
WORLD
GOODBYE
PIZZA


It is worth noting that Python has more compact ways of achieving the above, and that libraries (specifically NumPy and pandas) provide functionality close to R's matrices and data frames and their operations.
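As a sketch of one of those more compact forms, here are the same two examples written with list comprehensions, which are part of the core language and need no libraries. Each comprehension builds a whole new list in a single expression, which is about as close as built-in Python gets to R handing back a whole vector. The `map` line is an equivalent alternative for the string case:

```python
some_numbers = list(range(1, 11))  # 1..10, like R's 1:10
words = ['hello', 'world', 'goodbye', 'pizza']

# Apply the comparison to every element, collecting the results in a new list
greater_than_five = [number > 5 for number in some_numbers]

# Same idea for strings: call .upper() on each word
upper_words = [word.upper() for word in words]

# map() applies a function to each element; list() collects the results
upper_words_via_map = list(map(str.upper, words))

print(greater_than_five)    # [False, False, False, False, False, True, True, True, True, True]
print(upper_words)          # ['HELLO', 'WORLD', 'GOODBYE', 'PIZZA']
```

Note that this is still iteration under the hood; the comprehension just hides the loop.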

### Some consequences

In R you typically end up working with a small set of data types (vectors, lists, matrices, dataframes) in diverse ways.

In Python you typically end up working with a diverse set of data types over and above the built-in types.

## R is more functional

R takes more influence from Functional Programming than Imperative or Object-oriented programming.

Python, by contrast, takes a bit from everywhere.

In the previous examples you may have noticed that, to convert strings to upper case in R, you call a function:

```R
toupper(mystrings)
```

Whereas in Python, the equivalent is a method invoked on the string itself (the *receiver*):

```python
some_string = 'Hello'
some_string.upper()
```

Note that Python does not use methods for everything: some functionality comes from plain functions in modules, as the next example shows.

In [24]:
from os import getcwd
getcwd()

'/Users/patrickdinneen/Development/ext/python101/notebooks'

### Some consequences

In R you mostly use functions whose inputs and outputs are vectors, lists, matrices and data frames. Where a function is implemented only really matters when it comes to installing and loading the library that provides it.

In Python you'll use general-purpose functions from modules (like `getcwd()`) *and* methods that belong to specific types/classes (such as `str.upper()`).
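A small sketch of the two styles side by side. A method such as `upper` can also be called through its class, as `str.upper(s)`, which shows that `s.upper()` is really a function that receives the string as its first argument. `getcwd`, by contrast, is an ordinary module-level function with no receiver. The variable names here are purely illustrative:

```python
from os import getcwd

greeting = 'Hello'

# A method is looked up on the value and called with that value as its first argument...
print(greeting.upper())       # HELLO
# ...so calling it through the class is equivalent
print(str.upper(greeting))    # HELLO

# getcwd is a plain function from the os module: nothing is "called on" anything
current_directory = getcwd()
print(type(current_directory).__name__)  # str
```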

Let us take a look at the list of methods available on strings (ignore the double underscore names for now):

In [25]:
dir('some string')

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

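If the double underscore names get in the way, one way to hide them (a sketch, not the only option) is to filter the output of `dir` with a list comprehension, keeping only names that do not start with `__`:

```python
# Keep only the "public" string methods, skipping names like '__add__'
public_methods = [name for name in dir('some string') if not name.startswith('__')]

print(public_methods)
```

The exact list depends on the Python version (for example, `removeprefix` and `removesuffix` only appear from Python 3.9 onwards).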

You can read what each function (called a *method* in this context) does in the [fantastic help docs](https://docs.python.org/3.7/library/stdtypes.html#str).

Alternatively, you can read the help directly in an interactive session, like so:

In [26]:
help(str.startswith)

Help on method_descriptor:

startswith(...)
    S.startswith(prefix[, start[, end]]) -> bool
    
    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.



## Data Structures: R vs Python

Data Structures are ways of grouping values together.

In Python (and indeed most languages) you typically work with atomic values (such as numbers, strings and dates) and Data Structures that contain multiple atomic values (such as arrays and dictionaries).

In R you could say that the Data Structures are the atomic values of the language. R makes it hard to have a single value of anything: even a lone number or string is stored as a vector of length 1.
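Python keeps that distinction explicit, whereas in R `length(5)` simply returns 1. In Python a single value and a list containing that value are different types, and only the list has a length. A short sketch with illustrative names:

```python
count = 5          # a single integer, not a container
counts = [5]       # a list that happens to hold one integer

print(type(count).__name__)   # int
print(type(counts).__name__)  # list
print(count == counts)        # False: a value and a one-element list are different things
print(len(counts))            # 1

try:
    len(count)                # plain integers have no length
except TypeError as error:
    print('TypeError:', error)
```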

### Python Lists != R Lists

One of the most frequently used base Data Structures in Python is the humble List. In our number and word handling example we created two lists.

A List contains a sequence of items in a specific order, and each item can be read by its position in the list. In Python you can have a List with mixed types of values:

In [29]:
from datetime import datetime
my_crazy_list = ['Hello', 1984, datetime.utcnow()]
my_crazy_list

['Hello', 1984, datetime.datetime(2021, 1, 22, 11, 45, 44, 669303)]

However, this is seldom a good idea, and you won't see it often in practice.

The List in R is, very confusingly, much closer to a Python Dictionary, which we will see next. This naming is an idiosyncrasy of R: in computer science, a list usually means an ordered sequence of items.
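For a rough idea of the resemblance: an R list such as `list(name = 'Alice', year = 1984)` holds named values of any type, read back with `$name` or `[['name']]`. A Python dictionary covers similar ground. The keys and values below are illustrative only:

```python
# A dictionary maps keys to values; the values may have different types
person = {'name': 'Alice', 'year': 1984, 'languages': ['R', 'Python']}

print(person['name'])        # Alice  (roughly R's person$name)
person['year'] = 1985        # update an existing value in place
person['city'] = 'Dublin'    # add a new key
print(list(person.keys()))   # ['name', 'year', 'languages', 'city']
```

The analogy is loose: an R list can also be indexed by position, whereas a dictionary is only indexed by key (it does remember insertion order from Python 3.7 onwards).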

Python Lists are much closer in spirit to R's vectors, with a few notable differences:

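* R vectors cannot hold mixed types of data (mixing types coerces every element to a common type, so `c(1, 'a')` becomes a character vector)
* Python is 0-indexed and R is 1-indexed
* Python lists only support integer and slice-based indexing, whereas R also supports indexing with a vector of positions or of logical values, as the examples below show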
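Here is a minimal sketch of what Python list indexing does support, using the same words as the R cells. One trap for R users: in R a negative index *drops* that element (`words[-1]` is everything except the first word), while in Python it counts from the end:

```python
words = ['verify', 'letters', 'compiler', 'cow', 'cat', 'dog']

first = words[0]        # 'verify': the first item is at position 0 (R: words[1])
last = words[-1]        # 'dog': negative indexes count back from the end
middle = words[1:3]     # ['letters', 'compiler']: a slice, end position excluded

print(first, last, middle)

# A list of positions is not a valid index for a plain Python list
try:
    words[[0, 2, 5]]
except TypeError as error:
    print('TypeError:', error)
```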

In [30]:
%%R

words <- c('verify', 'letters', 'compiler', 'cow', 'cat', 'dog')
short_word_indexes <- nchar(words) < 4
print(words[short_word_indexes])

[1] "cow" "cat" "dog"


In [31]:
%%R

words <- c('verify', 'letters', 'compiler', 'cow', 'cat', 'dog')
print(words[c(1, 3, 6)])

[1] "verify"   "compiler" "dog"     


In [33]:
%%R

words <- c('verify', 'letters', 'compiler', 'cow', 'cat', 'dog')
print(words[1])

[1] "verify"


**In Python:**

In [35]:
words = ['verify', 'letters', 'compiler', 'cow', 'cat', 'dog']

short_words = []

for w in words:
    if len(w) < 4:
        short_words.append(w)
        
print(short_words)

['cow', 'cat', 'dog']
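The loop above can be condensed with list comprehensions, which also give a way to mimic R's position-vector indexing. A sketch, with the positions shifted to 0-based; `operator.itemgetter` is a standard-library alternative that returns a tuple rather than a list:

```python
from operator import itemgetter

words = ['verify', 'letters', 'compiler', 'cow', 'cat', 'dog']

# Filter with a condition, like R's words[nchar(words) < 4]
short_words = [w for w in words if len(w) < 4]

# Pick several positions, like R's words[c(1, 3, 6)] (0-based here)
picked = [words[i] for i in (0, 2, 5)]

# itemgetter builds a function that fetches those positions in one call
picked_tuple = itemgetter(0, 2, 5)(words)

print(short_words)   # ['cow', 'cat', 'dog']
print(picked)        # ['verify', 'compiler', 'dog']
print(picked_tuple)  # ('verify', 'compiler', 'dog')
```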
