# Lab 11: Intro to Python and NumPy

## Announcements 

- HW3 is posted, due Nov 15. 

## Python

Today we are going to give a brief introduction to Python and `numpy`. Hopefully these topics aren't entirely new to you, but we are going to start from the beginning.

### Installing Python 

- Unlike `R`, there are actually many different Python distributions out there, so there are multiple ways to install Python.
- The first is going straight to the source: [https://www.python.org/downloads/](https://www.python.org/downloads/).
- Alternatively, we can use a distribution like Anaconda/Miniconda. 

### Package managers

- The default package manager is `pip`, which comes with most Python distributions. You can install packages by running a command like `pip install numpy` on the command line.
- There are more sophisticated package managers that give more flexibility. The most popular example is `conda`, which comes with Anaconda / Miniconda.
- To install a package using conda, it's just as simple! `conda install numpy`.

### What exactly is conda? 

- conda can be used to install packages, but in reality it is a way to manage (virtual) *environments*.
- A virtual environment is an isolated space that holds the dependencies of one project, so that the dependencies required by different projects stay separate.
- Idea: the version of numpy I want to use for `project1` might not be the same one that I use for `project2`; even more extreme, the version of Python I want to use for one project might not be the same as the version I use for another project.

The desire for a feature like this isn't unique to Python; you could want it for other programming languages (including R) as well. So why is it so much more central to Python than to R? My thoughts:

- Python is used much more frequently in production, so code that was written many years ago and still runs important processes might depend on older versions of Python. Rather than updating an entire system, it might be easier to just keep working with the older versions.
- Python version updates are much more "severe". For example, Python 3 came out in 2008 with changes that are not backward compatible with Python 2, so a lot of Python 2 code won't run under Python 3 without modification. Therefore it is necessary to enable people to use both Python 2 (for old existing systems) and Python 3 (for new systems). The only time R had an update this drastic, it actually changed from the S programming language (S for "S"tatistics) to R (a name that is partly a play on "S" and partly a nod to the first names of its creators, Ross Ihaka and Robert Gentleman).

### Anaconda vs Miniconda 

- conda is the environment manager that is used by both Miniconda and Anaconda.
- Miniconda is very lightweight, as it comes with just a version of Python and the conda package manager.
- Anaconda comes with the environment manager, but also comes with a few hundred of Python's most popular packages already installed (with many more available to install through conda).

You're free to use whatever you want (anaconda, miniconda, or neither of them). Personally I recommend learning a little bit about miniconda and using that. 

### Installing Miniconda 

Installing conda is incredibly simple. There are instructions on how to do this [here](https://docs.conda.io/projects/miniconda/en/latest/index.html). Basically, they suggest a few commands to type on the command line that will download, install, and set up Miniconda for you.

### Using environments 

conda makes it really easy to use different lightweight environments for various projects. For example, you could create a specific environment for this class: 

```
conda create --name stats604
```

This command will create a conda environment called `stats604`. Because I didn't list any packages, the new environment starts out empty: nothing is installed in it yet, and Python itself is pulled in when you install a package that depends on it (you can also request a version up front, e.g. `conda create --name stats604 python=3.11`). To use this environment, you need to *activate* it:

```
conda activate stats604
```

To check which environments exist, you can type: 

```
conda info --envs
```

There will be an asterisk by the current active environment. You can install packages into your current environment using: 

```
conda install numpy
```
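Once a package is installed, it can be useful to confirm from inside Python which interpreter and package versions the active environment actually provides. The snippet below is a minimal sketch that assumes `numpy` is already installed in the active environment; the paths and version numbers it prints will differ from machine to machine.

```python
import sys

import numpy as np

# Path of the interpreter running this code; inside an activated conda
# environment it should live under that environment's folder.
print("Interpreter:", sys.executable)

# Versions provided by the active environment.
python_version = ".".join(str(part) for part in sys.version_info[:3])
print("Python version:", python_version)
print("NumPy version:", np.__version__)
```

Running the same snippet in two different environments is a quick way to see that each one carries its own Python and its own packages.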

Finally, let's install jupyter on our new environment so that we can use jupyter-notebooks: 

```
conda install -c conda-forge jupyter
```

Here we see something extra added to the install command: 

- The `-c` option is shorthand for `--channel` and is used to specify where to look (online) for a certain package.
- `conda-forge` is an online package repository, similar to `CRAN` in `R`. `conda-forge` is a community-run repository. By default, conda looks in the `defaults` channel hosted by Anaconda (`https://repo.anaconda.com/pkgs/`). The only reason I specified `conda-forge` here is that it is what is [recommended by the creators of `jupyter`](https://jupyter.org/install).

Once jupyter is installed, we can start a jupyter-notebook by running the command:
```
jupyter-notebook
```

Finally, you can create as many environments as you want, and share them with others! This is a great way to make code reproducible: you can create an environment with specific versions of Python and packages, and then share the environment itself along with the code that you submit to a journal. This way, all someone has to do is use conda to install your environment, and then they should be able to run your code even if there have been updates to Python or to package versions.


While conda is primarily designed as a tool for Python, the Anaconda distribution also includes all major versions of R, and most of the very popular R packages. This means that you could also use conda to manage a reproducible R project. Unfortunately, many R packages are not available on Anaconda, which means that this isn't always a perfect solution; look at [renv](https://rstudio.github.io/renv/articles/renv.html) instead.

## NumPy

NumPy (which stands for Numerical Python) is an open source Python library used for scientific computing. It is perhaps the most widely used Python library, and it is the standard for working with numerical data in Python. Many other popular libraries are built on NumPy; for example, pandas, SciPy, Matplotlib, and scikit-learn all rely on it.

NumPy primarily gives users the ability to do numeric matrix / array calculations, which isn't a native feature in Python. Part of what makes it fast is the requirement that all elements of an array have the same data type. This allows many NumPy operations to be implemented in C, avoiding some of the overhead associated with an interpreted programming language such as Python.

To use NumPy, or any other python library, we use the import statement: 

```
import numpy
```

However, because NumPy is so frequently used, and we have to type its name every time we want to use it, we often abbreviate it as:

```
import numpy as np
```

In [1]:
import numpy as np
import time
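To make the speed claim above concrete, here is a rough timing sketch that computes a sum of squares twice: once with a plain Python loop over a list, and once with a single NumPy call. The array size `n` is an arbitrary choice for illustration, and the measured times depend heavily on the machine, so treat the numbers as indicative only.

```python
import time

import numpy as np

n = 200_000  # arbitrary size chosen for illustration
values = np.arange(n, dtype=np.float64)
values_list = values.tolist()  # the same numbers as a plain Python list

# Plain Python: the interpreter handles one element at a time.
start = time.perf_counter()
loop_total = 0.0
for v in values_list:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# NumPy: the same sum of squares, computed inside compiled code.
start = time.perf_counter()
vector_total = float(np.dot(values, values))
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
print("same answer:", np.isclose(loop_total, vector_total))
```

Both versions produce the same total (up to floating-point rounding); only the time taken differs.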

### Python List vs NumPy array

In Python, the native object for holding an ordered collection of items is the `list`. We can use a list of numbers to represent a vector or an array, but lists also allow us to put any type of object into them.

In [4]:
x = [1, 2, 'a', np]

An array (or a NumPy array), on the other hand, holds elements of a single data type, usually a numeric one.

In [33]:
x = np.array([1, 2, 6.2, -0.1])  # Vector in R^4
y = np.array([[1, 2, 3],[5, 4, 1], [0, 10, 0]])  # Matrix in R^3x3
z = np.array([[[1, 2, 3],[5, 4, 1], [0, 10, 0]],[[9, 8, 7],[6, 5, 4], [3, 2, 1]]])  # "tensor" in R^2x3x3

print("Shape: " + str(z.shape))
print()
print(z)

Shape: (2, 3, 3)

[[[ 1  2  3]
  [ 5  4  1]
  [ 0 10  0]]

 [[ 9  8  7]
  [ 6  5  4]
  [ 3  2  1]]]
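The single-type rule can be seen by inspecting an array's `dtype` attribute. The sketch below uses small made-up inputs; when the inputs mix types, NumPy picks one common type that can represent all of them.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2, 6.2])  # the integers are upcast to float
with_text = np.array([1, 2, "a"])      # everything is converted to a string

print(ints.dtype)           # an integer type (exact width depends on the platform)
print(mixed_numbers.dtype)  # float64
print(with_text.dtype)      # a fixed-width unicode string type
print(with_text)
```

Note that the numbers in `with_text` are stored as the strings `'1'` and `'2'`, so arithmetic on that array no longer behaves numerically.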


Like lists, we can access the elements of an array using square brackets. **Remember that Python is 0-indexed rather than 1-indexed.**

In [26]:
print(z[0])

[[ 1  2  3]
 [ 5  4  1]
 [ 0 10  0]]
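For arrays with more than one dimension, several indices can go inside a single pair of brackets, separated by commas. A short sketch, redefining the same `z` so that the block runs on its own:

```python
import numpy as np

z = np.array([[[1, 2, 3], [5, 4, 1], [0, 10, 0]],
              [[9, 8, 7], [6, 5, 4], [3, 2, 1]]])

second_row = z[0, 1]      # row 1 of the first 3x3 block
one_element = z[0, 1, 2]  # block 0, row 1, column 2
chained = z[1][0][0]      # chained brackets also work
last_row = z[-1, -1]      # negative indices count from the end

print(second_row)
print(one_element)
print(chained)
print(last_row)
```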


### Creating Arrays 

There are many different ways one can create an array. Here are some that might be useful: 

In [56]:
# Create "manually": 
print("np.array([1, 2, 3]): " + str(np.array([1, 2, 3])), end = '\n\n')

# Fill in with zeros, here 0 \in R^2x3 
print("np.zeros([2, 3]): \n" + str(np.zeros([2, 3])), end = '\n\n')

# Fill in with ones
print("np.ones(10): " + str(np.ones(10)))

# Create an empty (uninitialized) array; slightly faster than np.zeros, but its contents are arbitrary
print("np.empty(5): " + str(np.empty(5)))

# Create a range of values: 
print("np.arange(5): " + str(np.arange(5)))

# Custom evenly spaced values
print("np.arange(1, 25, 3): " + str(np.arange(1, 25, 3)))

# Get specified number of evenly spaced values within an interval:
print("np.linspace(0, 0.5, num=8): " + str(np.linspace(0, 0.5, num=8)))

# Specifying your datatype: (default is np.float64)
print("np.ones(10, dtype=np.int64): " + str(np.ones(10, dtype=np.int64)))

np.array([1, 2, 3]): [1 2 3]

np.zeros([2, 3]): 
[[0. 0. 0.]
 [0. 0. 0.]]

np.ones(10): [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
np.empty(5): [0.0e+000 4.9e-324 9.9e-324 1.5e-323 2.0e-323]
np.arange(5): [0 1 2 3 4]
np.arange(1, 25, 3): [ 1  4  7 10 13 16 19 22]
np.linspace(0, 0.5, num=8): [0.         0.07142857 0.14285714 0.21428571 0.28571429 0.35714286
 0.42857143 0.5       ]
np.ones(10, dtype=np.int64): [1 1 1 1 1 1 1 1 1 1]
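A few more constructors follow the same pattern and may be handy. This is a small sketch with made-up values rather than an exhaustive list:

```python
import numpy as np

filled = np.full((2, 3), 7.5)           # 2x3 array where every entry is 7.5
identity = np.eye(3)                    # 3x3 identity matrix
template = np.array([[1, 2], [3, 4]])
zeros_like = np.zeros_like(template)    # zeros with template's shape and dtype

print(filled)
print(identity)
print(zeros_like)
```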


#### Sorting elements

In [67]:
# To sort: 
x = np.array([2, 1, 5, 3, 7, 4, 6, 8])
print(x)

print(np.sort(x))

# Note that this didn't modify x:
print(x)

# However, this will modify x: 
x.sort()
print(x)

[2 1 5 3 7 4 6 8]
[1 2 3 4 5 6 7 8]
[2 1 5 3 7 4 6 8]
[1 2 3 4 5 6 7 8]


In [69]:
# Also of interest could be argsort, which returns the indices that would sort the array:
x = np.array([2, 1, 5, 3, 7, 4, 6, 8])
print(np.argsort(x))

[1 0 3 5 2 6 4 7]
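One common use of `argsort` is reordering a second array to match the sorted order of the first. The sketch below uses made-up scores and names purely for illustration:

```python
import numpy as np

scores = np.array([88, 95, 70, 91])
names = np.array(["Ana", "Ben", "Cy", "Dee"])  # hypothetical labels

order = np.argsort(scores)        # indices that sort scores in ascending order
sorted_scores = scores[order]     # same result as np.sort(scores)
sorted_names = names[order]       # names reordered to match
top_first = names[order[::-1]]    # reversed order: highest score first

print(order)
print(sorted_scores)
print(sorted_names)
print(top_first)
```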


#### Combining arrays

In [70]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.concatenate((a, b))

array([1, 2, 3, 4, 5, 6])
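For arrays with two or more dimensions, `np.concatenate` takes an `axis` argument that says which direction to join along. A minimal sketch with small made-up matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

rows = np.concatenate((a, b), axis=0)    # put b under a: shape (3, 2)
cols = np.concatenate((a, b.T), axis=1)  # b.T is 2x1; append it as a column: shape (2, 3)

print(rows)
print(cols)
```

The shapes must agree along every axis except the one being joined, which is why `b` is transposed before it is appended as a column.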

#### Investigating arrays

We might want to check whether our arrays have the correct dimensions.

In [74]:
x = np.array([[[0, 1, 2, 3],
               [4, 5, 6, 7]],

              [[0, 1, 2, 3],
               [4, 5, 6, 7]],

              [[0, 1, 2, 3],
               [4, 5, 6, 7]]])

print(x.ndim)  # number of dimensions 
print(x.shape)  # shape of the array
print(x.size)  # total number of elements

3
(3, 2, 4)
24
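These attributes are useful for catching shape mistakes before a computation runs. The helper below is a hypothetical sketch (the name `check_conformable` and the example arrays are made up for illustration) that checks whether a matrix and a vector can be multiplied:

```python
import numpy as np


def check_conformable(X, beta):
    """Raise a ValueError unless the 2-D array X can multiply the 1-D array beta."""
    if X.ndim != 2 or beta.ndim != 1:
        raise ValueError(f"expected 2-D X and 1-D beta, got ndim {X.ndim} and {beta.ndim}")
    if X.shape[1] != beta.shape[0]:
        raise ValueError(f"shape mismatch: X is {X.shape}, beta is {beta.shape}")


X = np.ones((5, 3))
beta = np.array([0.5, -1.0, 2.0])

check_conformable(X, beta)  # passes silently because 3 columns match 3 entries
fitted = X @ beta           # matrix-vector product
print(fitted.shape)
```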


#### Reshaping an array 

In [82]:
x = np.arange(6)
print(x, end = '\n\n')

b = x.reshape(3, 2)
print(b, end = '\n\n')

c = x.reshape(3, 2, order='F')  # order='C' for C-like is default, order='F' for Fortran-like. 
print(c)

[0 1 2 3 4 5]

[[0 1]
 [2 3]
 [4 5]]

[[0 3]
 [1 4]
 [2 5]]
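When reshaping, one dimension can be given as `-1`, in which case NumPy infers it from the total number of elements. A short sketch, which also shows what happens when the requested shape cannot hold the data:

```python
import numpy as np

x = np.arange(12)

two_cols = x.reshape(-1, 2)       # -1 is inferred as 6, giving shape (6, 2)
three_rows = x.reshape(3, -1)     # -1 is inferred as 4, giving shape (3, 4)
flat_again = three_rows.ravel()   # back to 1-D, in C (row-major) order

reshape_failed = False
try:
    x.reshape(5, 3)               # 15 slots cannot hold 12 elements
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

print(two_cols.shape, three_rows.shape, flat_again.shape)
```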


In [92]:
a = np.arange(6)
print(a.shape)

# Add a new axis 
a2 = a[:, np.newaxis]
print(a2.shape)

# Alternative method 
a3 = np.expand_dims(a, axis = 0)
print(a3.shape)

(6,)
(6, 1)
(1, 6)
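Adding an axis is often a first step toward broadcasting, where NumPy stretches dimensions of length 1 so that two arrays line up. As a minimal sketch, a column version and a row version of the same vector can be multiplied to build a multiplication table:

```python
import numpy as np

a = np.arange(1, 4)        # shape (3,): [1 2 3]
col = a[:, np.newaxis]     # shape (3, 1)
row = a[np.newaxis, :]     # shape (1, 3)

table = col * row          # broadcasts to shape (3, 3)
print(table)
```

Entry `[i, j]` of `table` is `a[i] * a[j]`, which is the same result `np.outer(a, a)` would give.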


#### Indexing and slicing 

In [94]:
data = np.array([1, 2, 3, 4, 5, 6])

print(data[1])
print(data[0:2])
print(data[1:])
print(data[-2:])

2
[1 2]
[2 3 4 5 6]
[5 6]
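One point worth knowing about slices like these: a basic slice of a NumPy array is a *view* that shares memory with the original, so writing to the slice also changes the original array. The sketch below shows this, along with `.copy()` as a way to get an independent array:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])

view = data[0:2]          # basic slice: shares memory with data
view[0] = 100             # this also changes data[0]
print(data)

safe = data[2:4].copy()   # explicit copy: independent of data
safe[0] = -1              # data is not affected
print(data)
```

This differs from Python lists, where slicing always produces a new list.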


#### Selecting by conditions

In [98]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(a, end = '\n\n')

print(a[a < 5], end = '\n\n')
five_up = (a >= 5)
print(five_up, end = '\n\n')
print(a[five_up], end = '\n\n')

c = a[(a > 2) & (a < 11)]
print(c)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[1 2 3 4]

[[False False False False]
 [ True  True  True  True]
 [ True  True  True  True]]

[ 5  6  7  8  9 10 11 12]

[ 3  4  5  6  7  8  9 10]
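Boolean masks can do more than select values: they can be counted, combined with `|` (element-wise "or"), passed to `np.where`, and used on the left-hand side of an assignment. A short sketch using the same matrix as above:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

n_big = int((a >= 5).sum())                 # True counts as 1, so this counts matches
evens_or_zero = np.where(a % 2 == 0, a, 0)  # keep even entries, replace odd ones with 0
either = a[(a < 3) | (a > 10)]              # | is the element-wise "or"

capped = a.copy()
capped[capped > 10] = 10                    # assign through a mask

print(n_big)
print(evens_or_zero)
print(either)
print(capped)
```

The parentheses around each condition are needed because `&` and `|` bind more tightly than the comparison operators.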


## Matrix Multiplication

## Vectorized code 

## Example relevant to HW