# Introduction to some Python stuff

This file follows the ideas in "Introduction to R" by Marius Hofert and presents the corresponding concepts in Python. Not everything translates one-to-one because, unlike R, Python is not inherently built for mathematical work, so some packages need to be imported to make things work. The reader is assumed to be familiar with basic Python. If you are not familiar with Python at all, there are great free resources for beginners, such as [w3schools](https://www.w3schools.com/python/) or the [Open CS course](https://open.cs.uwaterloo.ca/python-from-scratch/) from the University of Waterloo.

## Important Imports

We start with the basic packages (numpy, pandas, and matplotlib) and import further packages only as they are needed.

In [1]:
import numpy as np # For dealing with vectors
import pandas as pd # For dealing with dataframes

# For dealing with plots
import matplotlib as mpl
import matplotlib.pyplot as plt

## Simple Manipulations

### Numbers

In [2]:
1/2

0.5

In [3]:
1/0

ZeroDivisionError: division by zero

It may be better to use np.divide, although doing so every time is cumbersome. Using np.divide is recommended whenever division by zero is suspected: it returns inf or nan with a warning instead of raising an error.

In [5]:
np.divide(1, 2)

0.5

In [6]:
np.divide(1, 0) # Notice the warning. Also, note the output



inf

In [7]:
np.divide(-1, 0) # Note np.divide(1, -0) actually gives inf, and not -inf. This is unlike 1/-0 in R



-inf

In [8]:
np.divide(0, 0)



nan
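
The RuntimeWarning emitted by np.divide can be controlled with the np.errstate context manager. The following is a minimal sketch; the variable names are illustrative only. It also shows that the float -0.0, unlike the integer literal -0, keeps its sign and therefore yields -inf.

```python
import numpy as np

# Temporarily silence the "divide by zero" and "invalid value" warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    pos = np.divide(1, 0)            # inf
    neg = np.divide(-1, 0)           # -inf
    undefined = np.divide(0, 0)      # nan
    # The integer literal -0 is just 0, but the float -0.0 keeps its sign.
    neg_zero = np.divide(1.0, -0.0)  # -inf

print(pos, neg, undefined, neg_zero)
```

Outside the `with` block the usual warnings are active again, so the setting does not leak into the rest of the session.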

In [9]:
x = np.divide(0, 0)
type(x) # Note the use of type(x) instead of class(x)

## Relevant: https://www.quora.com/What-is-the-Python-equivalent-of-R-programmings-class-function



numpy.float64

In [10]:
type(np.inf)

float

In [11]:
np.inf

inf

### Vectors (a.k.a numpy arrays)

There is no built-in vector type in Python. When the R package "reticulate" converts objects with "r_to_py", single-element R vectors are turned into floats and multi-element R vectors are turned into Python lists, as seen in the following [blog](https://www.r-bloggers.com/2020/01/what-r-you-in-python-r-vectors/). Here, numpy arrays are recommended because they behave much like R vectors.
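
To illustrate why numpy arrays are closer to R vectors than Python lists are, here is a small sketch comparing arithmetic on both. The variable names are made up for this example.

```python
import numpy as np

values = [1, 2, 3]           # plain Python list
vector = np.array(values)    # numpy array built from the list

doubled_list = values * 2    # repeats the list: [1, 2, 3, 1, 2, 3]
doubled_vector = vector * 2  # element-wise, as in R: array([2, 4, 6])
shifted = vector + 10        # the scalar is applied to every element

print(doubled_list)
print(doubled_vector)
print(shifted)
```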

In [12]:
x = np.array([1, 2, 3, 4, 5])
type(x) # Note that x is an ndarray

numpy.ndarray

In [13]:
x.size # Not length(x)

5

In [14]:
x

array([1, 2, 3, 4, 5])

In [15]:
# Other ways of creating similar arrays
y = np.array(range(1, 6)) # Note that the last number is one more than what is needed
y

array([1, 2, 3, 4, 5])

In [16]:
# Or better yet
z = np.arange(1, 6)
z

array([1, 2, 3, 4, 5])

In [17]:
z[0] # Indexing starts at 0

1

In [18]:
#This doesn't work
z[5] = 6

IndexError: index 5 is out of bounds for axis 0 with size 5

In [20]:
# Use append instead
z = np.append(z, 6) # Note that np.append is not in-place; it returns a new array
z

array([1, 2, 3, 4, 5, 6])

In [21]:
x == y # Element-wise comparison: True where x[i] equals y[i]; does not test whether x and y are equal as a whole

array([ True,  True,  True,  True,  True])

In [22]:
np.equal(x, y) # Does the same as above

array([ True,  True,  True,  True,  True])

In [23]:
# Check whether x and y have the same shape and the same elements
np.array_equal(x, y)

True

It is also possible to do the above using "(x == y).all()", but there are some issues with it as discussed [here](https://www.codingem.com/numpy-compare-arrays/).

In [24]:
(x == y).all()

True
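
One issue with "(x == y).all()" is broadcasting: arrays of different shapes can still compare as equal. The sketch below uses two made-up arrays, `a` and `b`, to show the difference from np.array_equal.

```python
import numpy as np

a = np.array([1, 1, 1])
b = np.array([1])  # different shape from a

# Broadcasting stretches b to the shape of a, so every comparison is True.
broadcast_result = bool((a == b).all())
# np.array_equal checks the shapes first, so it reports a mismatch.
strict_result = np.array_equal(a, b)

print(broadcast_result, strict_result)
```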

### Watch out

In [25]:
x = np.var(np.arange(1,5))
y = np.std(np.arange(1, 5))**2

In [26]:
x == y # Not the same

False

In [27]:
x - y # Numerically not zero

-2.220446049250313e-16

In [28]:
x

1.25

In [29]:
y

1.2500000000000002
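
Because of such rounding differences, floating-point results are usually compared within a tolerance rather than with ==. A minimal sketch using np.isclose with its default tolerances:

```python
import numpy as np

x = np.var(np.arange(1, 5))
y = np.std(np.arange(1, 5)) ** 2

# True when x and y agree up to a small relative and absolute tolerance.
close = np.isclose(x, y)
print(close)
```

For whole arrays, np.allclose applies the same test to every element and returns a single boolean.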

In [30]:
n = 0
np.arange(1, n) # empty array; unlike R, where 1:0 gives c(1, 0)

array([], dtype=int64)

In [31]:
np.arange(2, n) # also empty array

array([], dtype=int64)

In [32]:
n = -1
np.arange(1, n) # also empty array

array([], dtype=int64)

In [33]:
# Couldn't find anything directly equivalent to R's seq_along, but here is a possible alternative
np.arange(1, np.array([3, 4, 2]).size + 1)

array([1, 2, 3])
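
The alternative above can be wrapped in a small function. `seq_along` below is a hypothetical helper written for this example, not a numpy function.

```python
import numpy as np

def seq_along(values):
    """Hypothetical helper mimicking R's seq_along: 1-based positions."""
    return np.arange(1, len(values) + 1)

v = np.array([3, 4, 2])
positions = seq_along(v)         # 1-based positions: 1, 2, 3
empty = seq_along(np.array([]))  # empty input gives an empty result

# For indexing in Python, zero-based positions are usually what is needed.
zero_based = np.arange(v.size)   # 0, 1, 2
```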

### Some functions

In [34]:
x = np.array([3, 4, 2])

In [35]:
x.size

3

In [36]:
np.flip(x)

array([2, 4, 3])

In [37]:
np.sort(x)

array([2, 3, 4])

In [38]:
# Neither np.sort nor the ndarray.sort method has an argument for sorting in decreasing order.
# So, to sort a numpy array in descending order, sort it and then use [::-1] to reverse the sorted array.
np.sort(x)[::-1]

array([4, 3, 2])
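
For numeric arrays, negating the values is another common way to get a descending sort, and it also works with np.argsort. This is a sketch only; it assumes signed numeric data.

```python
import numpy as np

x = np.array([3, 4, 2])

# Negate, sort in ascending order, then negate back.
descending = -np.sort(-x)
# Indices that would sort x in descending order.
order = np.argsort(-x)

print(descending)
print(x[order])
```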

In [39]:
#Returns the indices that would sort an array.
np.argsort(x)

array([2, 0, 1])

In [40]:
x[np.argsort(x)] # Returns sort(x)

array([2, 3, 4])

In [41]:
np.log(x) # natural logarithms

array([1.09861229, 1.38629436, 0.69314718])

In [42]:
x**2 # component-wise squares

array([ 9, 16,  4])

In [43]:
np.sum(x) # sum all numbers

9

In [44]:
np.cumsum(x) # compute the cumulative sum

array([3, 7, 9])

In [45]:
np.prod(x) # multiply all numbers

24

In [46]:
np.arange(1, 8, step = 2)

array([1, 3, 5, 7])

In [47]:
np.repeat(np.arange(1,4), 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

In [48]:
# To repeat the whole array a number of times, use np.tile
np.tile(np.repeat(np.arange(1,4), 3), 2)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3])
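
The combination of np.repeat and np.tile covers the `each` and `times` arguments of R's rep. The function `rep` below is a hypothetical wrapper written for this example.

```python
import numpy as np

def rep(values, times=1, each=1):
    """Hypothetical helper: repeat every element `each` times,
    then repeat the whole result `times` times."""
    return np.tile(np.repeat(values, each), times)

base = np.arange(1, 4)
by_each = rep(base, each=2)        # 1, 1, 2, 2, 3, 3
by_times = rep(base, times=2)      # 1, 2, 3, 1, 2, 3
both = rep(base, times=2, each=3)  # same as the np.tile call above
```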

In [49]:
x[-1] # to get the last element of a vector

2

In [50]:
x[:-1] # to get everything but the last element

array([3, 4])

### Logical Vectors

In [54]:
np.full_like([], True, dtype= bool) # Empty logical vector, but don't read too much into this

array([], dtype=bool)

In [61]:
ii = x >= 3 # logical vector indicating whether each element of x is >= 3
ii

array([ True,  True, False])

In [62]:
x[ii] # use that vector to index x => pick out all values of x >= 3

array([3, 4])

In [68]:
~ii # negate the logical vector

array([False, False,  True])

In [70]:
all(ii)  # check whether all indices are TRUE (whether all x >= 3)

False

In [71]:
any(ii) # check whether any indices are TRUE (whether any x >= 3)

True

In [74]:
ii | ~ii # vectorized logical OR (is, componentwise, any entry TRUE?)

array([ True,  True,  True])

In [83]:
np.logical_or(ii,  ~ii) # also gives the same result as above

array([ True,  True,  True])

In [75]:
ii &  ~ii # vectorized logical AND (are, componentwise, both entries TRUE?)

array([False, False, False])

In [84]:
np.logical_and(ii, ~ii) # gives the same result as above

array([False, False, False])

In [85]:
any(ii | ~ii) # logical OR applied to all values (is any entry TRUE?)

True

In [87]:
all(ii & ~ii) # logical AND applied to all values (are all entries TRUE?)

False
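
The built-in any and all work for one-dimensional arrays, but for arrays with more dimensions the numpy methods are the safer choice. A sketch with a made-up 2-by-2 boolean array `m`:

```python
import numpy as np

m = np.array([[True, False],
              [True, True]])

all_entries = m.all()        # are all entries True?
per_column = m.all(axis=0)   # one answer per column
per_row_any = m.any(axis=1)  # is any entry True in each row?

# The built-in all() iterates over rows, and the truth value
# of a whole row is ambiguous, so it raises a ValueError.
try:
    all(m)
    builtin_failed = False
except ValueError:
    builtin_failed = True

print(all_entries, per_column, per_row_any, builtin_failed)
```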

In [88]:
3 * np.array([True, False])  # TRUE is coerced to 1, FALSE to 0

array([3, 0])

In [89]:
type(np.nan) # Note that NaN is of type float in python

float

Sometimes None is also used for missing values, but in pandas, this converts to NaN, see [here](https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html).

In [90]:
type(None) # This is its own type of object

NoneType

In [107]:
type(np.NA) # NA does not exist in numpy; use np.nan instead

AttributeError: module 'numpy' has no attribute 'NA'

In [95]:
z = np.arange(1, 4); z= np.append(z, 4) # two statements in the same line
z

array([1, 2, 3, 4])

In [103]:
z = np.array([1, 2, 3, None, 5]) # Use None or np.nan if an element is missing
z[3] # No output as the value is None

In [104]:
z = np.array([1, 2, 3, np.nan, 5])
z[3] # outputs NaN, so this may be better

nan

Some important distinctions between None and np.nan can be found [here](https://ealizadeh.com/blog/working-with-missing-values-in-pandas-and-numpy).
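
A short sketch of the practical difference: numpy keeps None as a generic Python object, while pandas converts it to NaN in numeric data. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

obj_array = np.array([1, None, 3])  # None forces dtype=object
series = pd.Series([1, None, 3])    # pandas converts None to NaN

# pd.isna recognises both None and NaN as missing.
missing_in_array = pd.isna(obj_array)
missing_in_series = series.isna().to_numpy()

print(obj_array.dtype, series.dtype)
print(missing_in_array, missing_in_series)
```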

In [109]:
z = np.append(z, [np.nan, np.inf]) # append NaN and Inf
z

array([ 1.,  2.,  3., nan,  5., nan, inf])

In [110]:
type(z)

numpy.ndarray

In [112]:
np.isnan(z) # Does not work if an element of z is None

array([False, False, False,  True, False,  True, False])

In [115]:
np.isinf(z) # check for +/-Inf

array([False, False, False, False, False, False,  True])

In [138]:
#pick out all finite numbers >= 2
z[np.logical_and(z >= 2, ~np.isinf(z))]

array([2., 3., 5.])
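
np.isfinite excludes NaN as well as +/-Inf in one step, which can make such filters shorter. A minimal sketch on the same values:

```python
import numpy as np

z = np.array([1, 2, 3, np.nan, 5, np.nan, np.inf])

finite_mask = np.isfinite(z)          # False for nan, inf and -inf
selected = z[finite_mask & (z >= 2)]  # finite values that are >= 2
finite_total = z[finite_mask].sum()   # sum over the finite values only

print(selected, finite_total)
```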

### Character Vectors

It is recommended to mostly use Python strings. There are a lot of built-in functions for strings, as described [here](https://realpython.com/python-strings/).

In [140]:
x = "apple"
y = "orange"
z = x + y # String concatenation
z

'appleorange'

In [142]:
z = " ".join([x, y]) # paste together with space
z

'apple orange'

More advanced features of "paste" from R can be coded as custom functions in Python, as shown [here](https://stackoverflow.com/questions/21292552/equivalent-of-paste-r-to-python).
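
As a rough sketch of such a custom function, the hypothetical `paste` below supports `sep` and `collapse`. It is a simplification: it assumes all inputs have the same length and does not recycle shorter vectors as R does.

```python
def paste(*vectors, sep=" ", collapse=None):
    """Hypothetical, simplified stand-in for R's paste.

    Assumes all inputs have the same length (no recycling).
    """
    # Join the i-th elements of all inputs with `sep`.
    pasted = [sep.join(str(item) for item in items) for items in zip(*vectors)]
    # Optionally join the resulting strings into a single string.
    return pasted if collapse is None else collapse.join(pasted)

fruits = ["apple", "orange"]
counts = [1, 2]

pairs = paste(fruits, counts)
single = paste(fruits, counts, sep="_", collapse=", ")
print(pairs)
print(single)
```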

### Named Arrays

A variation on named arrays in Python can be implemented using structured arrays in numpy, as described [here](https://numpy.org/doc/stable/user/basics.rec.html). However, this is not the same as named arrays in R. It doesn't seem that numpy has a good way to assign names to each index, but this can easily be done using a pandas dataframe, as shown below.

In [170]:
x = pd.DataFrame({"a": [3], "b": [2]})
x

   a  b
0  3  2


In [177]:
x["b"]

0    2
Name: b, dtype: int64

In [191]:
x.at[0, "b"] # scalar access by row label and column name

2
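
A pandas Series with a string index may be an even closer analogue of an R named vector than a one-row dataframe. The following is a sketch of that alternative, not part of the original approach; the variable names are illustrative.

```python
import pandas as pd

# The index plays the role of the names attribute in R.
named = pd.Series([3, 2], index=["a", "b"])

b_value = named["b"]        # access a single element by name
names = list(named.index)   # the names themselves
subset = named[["b", "a"]]  # select and reorder by name

print(b_value)
print(names)
print(subset)
```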