<a href="https://colab.research.google.com/github/jazoza/mad/blob/main/01_MAD_intro_python_SOM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Making Arguments with Data

Introduction to Python and the self-organizing map (SOM)

## Introduction to Python

Python is an object-oriented, interpreted language. The interpreter can be used interactively in this notebook.

The boxes below contain Python code. You can click on the box, change the content and then simply start the programme with the play button, or with Shift-Enter (Apple-Enter) and let it execute. Try adding two numbers and see what happens. You can change the numbers directly in the input window.

#### Basic operations

In [None]:
2+4+5+2+0+2+3

In [None]:
"2" + "4" + " " + "5" + " " + "2" + "0" + "2" + "3"

In [None]:
# try some other operations

4/2 + 3*6 - 8

In [None]:
# notice the floating point (12.0); you can ask for a specific number type, floating point - "float" or integer - "int"

int(4/2 + 3*6 - 8)

#### Working with variables

A variable is a container for holding a value, e.g. a string or a number. In the course of the programme, you can read the value of a variable or assign a new value to it.

In [None]:
myVariable = "Yes"
anotherVariable = "we can!"
print(myVariable + " " + anotherVariable)

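The variable name is freely selectable, but a few reserved words (Python keywords such as *if*, *for* or *while*) cannot be used. Names of built-in functions such as print are technically allowed, but reusing them overwrites the function, so avoid them too. Try the arithmetic again and try to change the values of the variables: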

In [None]:
nummer1 = 5
nummer2 = 2
nummer1 * nummer2
nummer1 + nummer2
nummer1 - nummer2
nummer1 / nummer2

We might expect the notebook to show 4 values, right? But a notebook cell only displays the value of the last expression. Use the print function to output every value you want to see:
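As a minimal sketch, here is the arithmetic cell from above again, with each expression wrapped in print() so that all four results appear:

```python
nummer1 = 5
nummer2 = 2

# Each print() call writes its own line of output
print(nummer1 * nummer2)  # 10
print(nummer1 + nummer2)  # 7
print(nummer1 - nummer2)  # 3
print(nummer1 / nummer2)  # 2.5
```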

In [None]:
i = 5
print(i)
i = i + 142
print(i)
i = 43
print(i)

You will make a mistake once in a while. The syntax must always be exact, including upper and lower case characters: a typing error such as Print instead of print immediately raises an error:

In [None]:
Print i


You get a syntax error. The first line of the error message immediately tells you where the error is. Because our programme has only one line, it is "line 1". You can also see that the word print is no longer green, as it was before! Try correcting the cell.

#### Random numbers

Now, let's get to know our first function. Functions are something like little sub-programmes that can give us a certain value.

The function randint(m, n) returns a randomly selected integer from the range {m, m+1, ..., n}. A call randint(1, 6) thus returns a whole number between 1 and 6, both endpoints included.

In [None]:
randint(1,6)

But we get an error message that 'randint' is not defined. What does this mean? Python is very modular. The randint function lives in a separate module called random and must first be imported:

In [None]:
from random import randint
randint(1,6)

Now the **randint** function has been imported from the **random** module into our notebook, and it works. Try it many times, try changing the numbers. Bonus: all about the random module: https://docs.python.org/3/library/random.html

In [None]:
# come back to numerology....
luckyNumber = randint(1,1000000)
print('your lucky number today is ', luckyNumber)

A problem that occurs relatively often is when you want to link two different types of data (e.g. a number and a text). See for yourself:

In [None]:
print("Your lucky number is: " + luckyNumber + " soooo lucky")


We have to convert the number - the value of the variable - into a string, like this:

In [None]:
print("Your lucky number is: " + str(luckyNumber) + " soooo lucky")

Now try integrating the random function into this line instead of the variable value, to get a different number every time you run the code:
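One possible solution, as a sketch: randint is called directly inside the expression, so a new number is drawn on every run. The variable name message is only an illustration and does not appear elsewhere in this notebook.

```python
from random import randint

# Calling randint() inside the expression draws a fresh number on every run
message = "Your lucky number is: " + str(randint(1, 1000000)) + " soooo lucky"
print(message)
```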

### Conditional statements

At some points in your code, you want to detect or distinguish certain states or conditions. So far, we have written programmes in which one instruction follows another and all of them are executed in that order. In a programming language, a conditional statement is a part of the code that is executed only under a certain condition. If the condition is not met, this code is not executed. Normally it looks like this in Python:

```
if condition1:
    instructions1

```

You see the second line is indented. The tabulator key is used to make the indentation. How do these instructions work? With the condition, you check whether a state is "true" or "false".
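To see what "true" or "false" means in practice, a condition can be printed on its own. This small sketch reuses the variable names from earlier cells; the values are chosen only for illustration.

```python
nummer1 = 3
nummer2 = 5

# A comparison is an expression that evaluates to True or False
print(nummer1 == nummer2)  # False
print(nummer1 < nummer2)   # True
print(nummer1 != nummer2)  # True
```

An if statement simply runs its indented block when such an expression evaluates to True.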

In [None]:
nummer1 = 3
nummer2 = 5
if nummer1 == nummer2:
    print("they're the same!!!!")
else:
    print("not sure which number is bigger")
    print("but they're not the same")
    print("--------------")

Now try changing the numbers so that both are equal. The output changes because the condition now applies. Note the two == characters, which compare two values (a single = assigns a value). The counterpart to the *if* is the *else*, with which you can catch everything that does not apply.


You can also check if the value exceeds a certain limit with the less and greater sign < >: try to swap the > with the <. Does the output change?

In [None]:
nummer1 = 5
nummer2 = 3
if nummer1 > nummer2:
    print("number 1 is bigger than number 2")
else:
    print("## number 2 is bigger than number 1")
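Note that the else branch above also runs when both numbers are equal. A sketch of how the three cases could be separated with *elif* (short for "else if") follows; the variable verdict is a name chosen only for this example.

```python
nummer1 = 5
nummer2 = 5

if nummer1 > nummer2:
    verdict = "number 1 is bigger than number 2"
elif nummer1 < nummer2:
    verdict = "number 2 is bigger than number 1"
else:
    # reached only when neither number is bigger
    verdict = "both numbers are the same"

print(verdict)
```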

### Loops

You can make the computer do repetitive things quite easily with loops. A loop repeatedly executes a block of code, called the loop body. In Python, there are two types of loops: the while loop and the for loop.

In [None]:
a = 0
while a < 4:
    #a = a + 1
    a += 1
    print(a)
    print("hello world")


In [None]:
for i in range(35):
    print("hello")
    print(i)

How can you build a down-counting counter from this? Try to think it through on a piece of paper. Tip: you have to subtract something!
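Once you have tried it yourself, compare with this sketch of one possible countdown. The start value 4 is an arbitrary choice.

```python
a = 4
while a > 0:
    print(a)
    a -= 1  # subtract 1 on every pass, so the condition eventually becomes false

# The same countdown with a for loop: range(start, stop, step) with a negative step
for i in range(4, 0, -1):
    print(i)
```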

### Lists

A list can be thought of as an array of single elements. You can create lists of any length, either with numbers or with text, or with both. The values (elements) of a list are separated by commas and enclosed in square brackets. To define a list:

In [None]:
myList = ["batman","technoviking","nyancat"]
print(myList)

Lists are helpful because a lot of data is stored in lists. To read an individual element, write its position (index) in square brackets after the list name. Counting starts at 0, so myList[0] is the first element:

In [None]:
print(myList[0] + " is so boring")
print(myList[1] + " is so boring")

The practical thing is that you can use loops together with lists.

In [None]:
for element in myList:
    print(element + " is so boring")

If I change the length of the list, I can use the same for loop to output them all:

In [None]:
myList = ["batman","saxplayer","technoviking","nyancat","rainbow","double rainbow","waterfall","and so on..."]
for element in myList:
    print(element)

We can randomly mix up the order in the list with the shuffle function:

In [None]:
import random
random.shuffle(myList)

This seems to do nothing at first. But the order of the list has already been changed in place, so if we now output myList, we see the following:

In [None]:
myList
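A small sketch of why the shuffle cell showed no output: random.shuffle reorders the list in place and returns None. The list numbers and the variable result are hypothetical names used only here.

```python
import random

numbers = [1, 2, 3, 4, 5]
result = random.shuffle(numbers)  # the list itself is reordered

print(result)           # None: shuffle returns nothing
print(sorted(numbers))  # [1, 2, 3, 4, 5]: same elements, only the order changed
```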

Now try random.shuffle(myList) again in the cell below.

In [None]:
# import and use relevant libraries

In [None]:
import this # for more eastereggs look here https://xkcd.com/353/

## Self-organizing Map (SOM)

A self-organizing map is an artificial neural network (ANN) algorithm for unsupervised machine learning.
SOM was introduced by the Finnish computer scientist Teuvo Kohonen in 1982. It is used to order high-dimensional statistical data so that similar inputs are mapped close to each other. The resulting map illustrates the similarity relationships between data items, which can then be explored in an intuitive manner.

SOM presents relationships between data points in two dimensions by projecting the higher-dimensional data onto a low-dimensional space of predetermined size, a sort of map. Data points with similar features are mapped close together, resulting in clusters.

More information: https://en.wikipedia.org/wiki/Self-organizing_map
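The idea described above can be sketched in a few lines of NumPy. This is a simplified illustration and not the implementation used by the susi library: the grid size, the number of iterations, and the linearly shrinking learning rate and neighbourhood radius are all assumptions chosen for the example, and every name in it is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 200 RGB-like samples in [0, 1], mapped onto a 5x5 grid
data = rng.random((200, 3))
n_rows, n_cols = 5, 5
weights = rng.random((n_rows, n_cols, 3))  # one weight vector per grid node

# Grid coordinates of every node, shape (n_rows, n_cols, 2)
coords = np.stack(
    np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij"), axis=-1
)

def best_matching_unit(sample, weights):
    """Return the (row, col) of the node whose weight vector is closest to sample."""
    distances = np.linalg.norm(weights - sample, axis=-1)
    return np.unravel_index(np.argmin(distances), distances.shape)

n_iterations = 2000
for t in range(n_iterations):
    sample = data[rng.integers(len(data))]
    bmu = best_matching_unit(sample, weights)

    # Learning rate and neighbourhood radius shrink linearly (assumed schedule)
    progress = t / n_iterations
    learning_rate = 0.5 * (1 - progress)
    radius = max(n_rows, n_cols) / 2 * (1 - progress) + 0.5

    # Gaussian neighbourhood: nodes close to the BMU on the grid move the most
    grid_dist_sq = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    influence = np.exp(-grid_dist_sq / (2 * radius ** 2))
    weights += learning_rate * influence[..., None] * (sample - weights)

# After training, each sample is represented by the grid position of its BMU
positions = [best_matching_unit(s, weights) for s in data]
print(positions[:5])
```

Because neighbouring nodes are updated together, similar samples end up at nearby grid positions, which is what produces the clusters on the map.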


In [None]:
! pip install susi

In [None]:
import susi

# short intro to working with SOM

### Clustering and projecting colors

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import math
import susi
from susi.SOMPlots import plot_nbh_dist_weight_matrix, plot_umatrix, plot_estimation_map, plot_som_histogram

In [None]:
rand = np.random.RandomState(0)
train_data = rand.randint(0, 255, (3000, 3))
# print(train_data)

In [None]:
# @markdown Display the training data
fig, ax = plt.subplots(
    nrows=1, ncols=1, figsize=(12, 3.5),
    subplot_kw=dict(xticks=[], yticks=[])
    )
shape_x = 50 # @param{type:"integer"}
shape_y = 60 # @param{type:"integer"}
ax.imshow(train_data.reshape(shape_x, shape_y, 3))
ax.title.set_text('Training Data')

In [None]:
# rule of thumb: use about 5 * sqrt(number of training samples) nodes in total
n_nodes = math.ceil(5 * math.sqrt(3000))
# side length of a square grid with roughly that many nodes
grid = math.floor(math.sqrt(n_nodes))
print(grid)

# this is only a suggested grid size;
# try other values and see how the map changes

In [None]:
# @title Unsupervised Learning
som = susi.SOMClustering(
    n_rows=grid,
    n_columns=grid
)
som.fit(train_data)
print("SOM fitted!")

In [None]:
# @title Project Data
test_data = rand.randint(20, 200, (3000, 3))
result = som.transform(test_data)


In [None]:
result = som.transform(train_data)

In [None]:
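# result holds one (row, column) grid position per data point
X = [x[0] for x in result]
Y = [x[1] for x in result]
# result was last computed from train_data, so the colors must come from train_data too
C = train_data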
fig = plt.figure()
ax = fig.add_subplot(111)

ax.scatter(X, Y, c = C/255.0)
plt.show()