[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/joshmaglione/CS102-Jupyter/main?labpath=.%2FWeek01.ipynb)

<a href="https://colab.research.google.com/github/joshmaglione/CS102-Jupyter/blob/main/Week01.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 1: The basics

## What are we even doing here? 

In this part of CS102, we will study 4 ubiquitous [Python](https://www.python.org/) packages:
- [NumPy](https://numpy.org/) : scientific computing
- [pandas](https://pandas.pydata.org/) : data analysis
- [Matplotlib](https://matplotlib.org/) : data visualisation
- [scikit-learn](https://scikit-learn.org/stable/) : machine learning

These cover the 4 main topics of CS102-2. We may use other packages as well, but these are our focus.

## Jupyter

We will use [Jupyter notebooks](https://jupyter.org/). They enable us to display math (and text) and run code in real-time. 

There are a few ways to work with Jupyter notebooks (`ipynb` files). 

You can install Jupyter on your local machine by following the instructions at [jupyter.org/install](https://jupyter.org/install).

![](imgs/Jupyter.png)

You can use the [binder](https://mybinder.org/) button on the [GitHub page](https://github.com/joshmaglione/CS102-Jupyter?tab=readme-ov-file#binder).

![](imgs/Binder.png)

You can always just do something else... You can read about what I do [here](https://github.com/joshmaglione/CS102-Jupyter?tab=readme-ov-file#the-way-i-jupyter-in-class).

Either way, we need to interact with `Python`, which is the most popular language according to [TIOBE](https://www.tiobe.com/tiobe-index/).

## OK, so how do I get `Python`? 🐍

There are multiple ways to get `Python` on your own machine.

You can download and install from [python.org](https://www.python.org/downloads/).

![](imgs/Python.png)

You can use [Conda](https://docs.conda.io/en/latest/) to install and manage various packages.

![](imgs/Conda.png)

There are other ways as well. If you are having a hard time, send me an email (joshua.maglione@universityofgalway.ie), and we can figure it out.

## Show me what you've got

Let's demonstrate several of these packages (or modules) in concert: NumPy, Matplotlib, and scikit-learn. We will also use the `Pillow` package for image conversion.

We will 
1. convert a picture into a list of its pixels in $\mathbb{R}^3$ (for red, green, blue values), 
2. plot those points in a 3D scatter plot, 
3. then approximate our image using only $k$ distinct colours, for some input $k$.

Here is code that takes a string (pointing to an image file) and a sample size `N`, and returns a 3D scatter plot of `N` randomly sampled pixels.

In [None]:
def image_to_plot(file:str, N:int):
    from PIL import Image
    import numpy as np
    import matplotlib.pyplot as plt
    img = Image.open(file)
    aimg = np.asarray(img)/255
    acolor = aimg.reshape(aimg.shape[0]*aimg.shape[1], aimg.shape[2])
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    rng = np.random.default_rng()
    S = rng.choice(acolor.shape[0], size=N, replace=False)
    # S = range(acolor.shape[0])
    # Fancy indexing selects the sampled rows; transposing splits the channels.
    xs, ys, zs = acolor[S].T
    ax.scatter3D(xs, ys, zs, s=0.5)
    ax.set_xlabel("R")
    ax.set_ylabel("G")
    ax.set_zlabel("B")
    return ax
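
To make the reshaping and sampling steps in `image_to_plot` concrete, here is a minimal sketch on a made-up array instead of a real image. The array `tiny` and the seed `0` are assumptions chosen only for illustration.

```python
import numpy as np

# A hypothetical 2 x 3 "image" with 3 colour channels (values 0..255).
tiny = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Scale to [0, 1] and flatten the two spatial axes, as image_to_plot does.
scaled = tiny / 255
pixels = scaled.reshape(-1, scaled.shape[2])
print(pixels.shape)  # (6, 3): six pixels, each a point in R^3

# Sample 4 distinct pixel indices (no index is drawn twice).
rng = np.random.default_rng(0)
sample = rng.choice(pixels.shape[0], size=4, replace=False)

# One coordinate array per colour channel, ready for a scatter plot.
xs, ys, zs = pixels[sample].T
```

Because `replace=False` is used, the sample size cannot exceed the number of pixels; `rng.choice` raises a `ValueError` otherwise.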

Let's input our image which is `imgs/umbrellas.jpg`:

![](imgs/umbrellas.jpg)

This is about 400 x 600 pixels, so about $250,000$ total pixels.

In [None]:
%matplotlib ipympl
ax = image_to_plot("imgs/umbrellas.jpg", 50000)

In [None]:
%matplotlib inline

Next is the code that approximates the image using exactly $k$ colours.

In [None]:
def k_colours(file:str, k:int):
    from PIL import Image
    import numpy as np
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt
    plt.figure().clear()
    # Load the image and scale the channel values to [0, 1].
    img = Image.open(file)
    aimg = np.asarray(img)/255
    # One row per pixel, one column per colour channel.
    acolor = aimg.reshape(aimg.shape[0]*aimg.shape[1], aimg.shape[2])
    # Cluster the pixels into k groups.
    kmeans = KMeans(n_clusters=k, random_state=0, n_init="auto").fit(acolor)
    # Replace each pixel by the centre of its cluster.
    result = kmeans.cluster_centers_[kmeans.labels_]
    aimg_new = result.reshape(aimg.shape)
    return Image.fromarray((aimg_new * 255).astype(np.uint8))

In [None]:
k_colours("imgs/umbrellas.jpg", 3)
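
The final step of `k_colours`, replacing every pixel by the centre of its cluster, amounts to indexing the array of centres by the array of labels. Here is a minimal sketch of that step alone. The centres and labels below are invented by hand for illustration; in `k_colours` they come from `kmeans.cluster_centers_` and `kmeans.labels_`.

```python
import numpy as np

# Hypothetical result of a k-means fit with k = 2 on six pixels:
# two cluster centres (RGB in [0, 1]) and one label per pixel.
centres = np.array([[1.0, 0.0, 0.0],   # cluster 0: red
                    [0.0, 0.0, 1.0]])  # cluster 1: blue
labels = np.array([0, 1, 1, 0, 0, 1])

# Indexing the centres by the labels replaces each pixel by its centre.
quantised = centres[labels]            # shape (6, 3)

# Back to height x width x channels, then to 8-bit values.
image = quantised.reshape(2, 3, 3)
as_bytes = (image * 255).astype(np.uint8)
print(as_bytes[0, 1])  # the pixel in row 0, column 1 is now the blue centre
```

However many pixels the image has, the result contains at most $k$ distinct colours, one per cluster.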

## Jupyter and IPython

Jupyter notebooks run on `IPython`, an interactive Python interpreter.

This provides additional functionality on top of Python.

We will discuss:
1. `?` : documentation
2. `<tab>` : auto-completion
3. `%` : 'magic' commands
4. `!` : shell commands

### The `?` command

The first command, `?`, run on its own, displays an introduction to IPython and an overview of its features.

In [None]:
?

We can use `?` on functions like `len` in Python.

In [None]:
len?

We can use `?` on objects in Python.

In [None]:
L = [2, 3, 5, 7, 11]
L?

In [None]:
?L

It works on methods.

In [None]:
L.append?

It works on packages (i.e. modules).

In [None]:
import functools 
functools?

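We can use `??` for more verbose output, which includes the source code when it is available. For objects without Python source, such as built-ins implemented in C, the output is the same as `?`.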
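
The `?` syntax only works inside IPython. As a rough comparison, here is a sketch of how similar information can be obtained in plain Python with the standard `inspect` module. The choice of `functools.wraps` as the example is an assumption; any function written in Python would do.

```python
import functools
import inspect

# Roughly what `len?` shows: the docstring of the object.
print(inspect.getdoc(len))

# Roughly what `functools.wraps??` adds: the source code, when available.
# Objects implemented in C (such as `len`) have no Python source to show.
source = inspect.getsource(functools.wraps)
print(source.splitlines()[0])
```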

### The `<tab>` command

This is very simple. Pressing `<tab>` after an object's name and a dot lists the attributes and methods available on that object.

In [None]:
L = [2, 3, 5, 7, 11]
i = 42
# L.
# i.

It also works on packages.

In [None]:
# functools.
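
Tab completion needs an interactive session. As a non-interactive approximation, the built-in `dir` function returns the same kind of list of names. This sketch filters out names starting with an underscore, which is a choice made here for readability.

```python
L = [2, 3, 5, 7, 11]

# Public attribute names of the list, similar to the menu shown after `L.<tab>`.
public = [name for name in dir(L) if not name.startswith("_")]
print(public)
```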

### The 'magic' commands

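Magic commands are commands provided by IPython rather than by the Python language itself.

I used one above: `%matplotlib`. With the `ipympl` backend, it made the output from `matplotlib` *interactive*; `%matplotlib inline` then switched back to static images.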

You can read more about the possible magic commands in the [documentation](https://ipython.readthedocs.io/en/stable/interactive/magics.html). 

We can use `%timeit` to time the statement that follows it on the same line.

In [None]:
%timeit _ = [x*x for x in range(100000)]
# Computes x^2 for all integers x in [0, 100 000). 
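
For comparison, here is a sketch of a similar measurement in plain Python using the standard `timeit` module. The values of `number` and `repeat` are assumptions chosen for illustration; `%timeit` picks suitable values automatically.

```python
import timeit

# Run the statement 10 times per measurement, and take 3 measurements.
times = timeit.repeat("[x*x for x in range(100000)]", number=10, repeat=3)

# Each entry of `times` is the total for 10 runs, so divide to get one run.
best = min(times) / 10
print(f"best of 3: {best * 1e3:.2f} ms per loop")
```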

There are also magic commands that apply to an entire cell; these begin with `%%`. For example, we can write bash in the following cell.

In [None]:
%%bash
echo "Hello world"
echo "Goodbye cruel world"

### The `!` command

The `!` prefix runs a shell command from within IPython. We will run `ls` and use its output in our Python code.

In [None]:
files = !ls

In [None]:
print("The current working directory has the following files:")
for f in files: 
    print("\t" + f)
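
The `!` syntax is specific to IPython. In an ordinary Python script, a comparable result can be obtained with the standard `subprocess` module, as in this sketch. It assumes a Unix-like system where the `ls` command exists.

```python
import subprocess

# Run `ls` and capture its output as text, much like `files = !ls`.
# check=True raises an error if the command fails.
completed = subprocess.run(["ls"], capture_output=True, text=True, check=True)

# One list entry per line of output.
files = completed.stdout.splitlines()
for f in files:
    print("\t" + f)
```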

## Importing packages in Python

One of the main design philosophies of Python is *readability*. 

Python is not the fastest language, but it is widely used in part because it is easy to read and write.

The community has established conventions and 'unwritten rules' to help facilitate readability.

One of those unwritten rules is how one imports popular packages.

In [None]:
import numpy as np 

This enables us to interact with the `numpy` package by abbreviating `numpy` to `np`.

Here are more conventions:

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

With `scikit-learn` one often just imports what is needed. For example:

In [None]:
from sklearn.cluster import KMeans
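
As a small illustration of how an alias is used once it is defined, here is a sketch with `np`. The array is made up for the example.

```python
import numpy as np

# With the alias, every NumPy function is reached through the `np.` prefix.
squares = np.arange(5) ** 2
print(squares)         # [ 0  1  4  9 16]
print(squares.mean())  # 6.0
```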

## Exercises

Talk over the answers with a neighbour or two.

1. In `IPython` and in Jupyter notebooks, which operator provides access to the documentation?
2. Which command in Python gives one access to packages and libraries?
3. What is the difference between `?` and `??`?
4. What operator begins the magic commands? 
5. What is the difference between 'line magic' and 'cell magic'?
6. What is the magic command for displaying the input history?
7. Which key would one press for auto-completion?
8. What happens when you run `In[1]`?
9. What happens when you run `Out`?
10. What symbol provides shell commands in Jupyter?