## Disclaimers

* Coding is a classic learning-by-doing skill. It is good to learn the basics in a structured manner, but most learning happens if you just try and code things that interest you. Just try it and google the errors/bugs that arise.
* Different people learn different coding concepts at different speeds and will need coding for entirely different reasons. There are multiple ways to achieve most coding outcomes, and as long as you find a way that works for you, that's fine.
* If you are having a problem - chances are you are not the first person to have said issue. Coding provides a great opportunity to hone the skill of finding the relevant piece of knowledge on the internet and applying it to your needs.
***

### Jupyter Notebooks
Jupyter Notebooks are organized in cells. Every cell can either be **code**, which contains executable commands in Python, or **markdown**, which allows you to write free-form text such as this cell.

Markdown is helpful for commenting on and structuring your code, e.g. by telling other users or reminding yourself what a code section is about.

***
## Python basics

### The print command

In order to print anything so you can see the output, use **print()** and write whatever content you want printed within the parentheses. If you want text printed, you need to enclose the text in quotation marks. The coding classic is to print *Hello World* to see that your programming language setup is working.

Run the following cell by clicking in it and pressing **Shift+Enter** or **Run** in the ribbon line at the top.

In [None]:
print("Hello World")

Note that if you do not enclose the text in quotation marks, it causes an error.

In [None]:
print(Hello World)

Printing numbers works similarly:

In [None]:
print(15)

### Variables

You can assign the value on the RHS (right-hand side) of an equal sign to a variable on the LHS (left-hand side). You can choose almost any name for your variable: a combination of letters, digits, and underscores that does not start with a digit:

In [None]:
x = 2 #how to assign the value 2 to a variable called x
y = 4
crazyVariablenme_32151_ = 6
print(x) #Printing the first variable
print(y)
print(crazyVariablenme_32151_)

In addition to markdown cells, it is often convenient to comment within a code cell. Everything on a line after a # sign is commented out and is understood to not be code.

Code is executed line by line from top to bottom within each cell, so if you assign the value 2 to variable x and then at some point further down assign x the value 3, it will overwrite the previous value.

In [None]:
x = 2
x = 3
print(x)   #Runs line by line

In assignments, the RHS is evaluated first and that value is stored in the variable on the LHS. `x = 10 * x` is therefore a perfectly valid statement. A single equal sign is not a mathematical equality.

In [None]:
x = 2
x = 10 * x #This also works as an assignment
print(x)   

### Math

You can do math with variables, as you would expect. Note that in order to raise something to a power, you use `**`.
Also note how you can **print multiple things in one print statement** by separating them with commas.

In [None]:
num1 = 2.5 #call your variables almost anything
num2 = 4
total = num1 + num2 #Do Math
minus = num1 - num2
multi = num1 * num2
division = num1 / num2
power = num1 ** num2
print(total, minus, multi, division, power) # Print out multiple things

Careful with **strings** (anything within " " is interpreted as text). Adding strings simply means appending them.

In [None]:
num3 = "3"
num4 = "5"
total2 = num3+num4
print(total2) #What do you expect as the outcome? Strings are not numbers

It can be useful to append strings:

In [None]:
first = "Christian"
last = "Kaps"
name = first + last #strings concatenate
print(name)
name = first + " " + last 
print(name) #overwrite name with proper spacing
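
If the text actually holds numbers, one option is to convert it before doing math. The following is a minimal sketch using Python's built-in `int()`, `float()` and `str()`; the variable names `as_text`, `as_number`, `as_decimal` and `sentence` are only illustrative.

```python
num3 = "3"
num4 = "5"

# "+" on two strings glues the text together
as_text = num3 + num4

# int() turns the text into whole numbers, so "+" now adds them
as_number = int(num3) + int(num4)

# float() handles text that contains a decimal point
as_decimal = float("2.5") + int(num4)

# str() goes the other way, e.g. to build a sentence
sentence = "The sum is " + str(as_number)

print(as_text, as_number, as_decimal)
print(sentence)
```

Note that `int("2.5")` would raise an error, because `"2.5"` is not a whole number; `float()` is the safer choice when decimals may appear.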

You can also evaluate equalities, i.e. check whether statements are true. `==` checks if the LHS and RHS are equal, `!=` checks if they are unequal, and `>=` and `<=` test for inequalities. The resulting **true/false** values are called **booleans**.

In [9]:
print(13 == 12) #booleans
print(24/2 == 12)
print(13 != 12)
print(13 >= 12)

False
True
True
True
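
One caveat with `==`: decimal numbers are stored with limited precision, so comparing the results of decimal arithmetic can be surprising. The sketch below shows the classic case and one possible workaround, `math.isclose` from Python's standard library. It also shows that booleans can be combined with `and`, `or` and `not`. The variable names are illustrative.

```python
import math

a = 0.1 + 0.2

# Exact comparison fails because of tiny rounding differences
exact_match = (a == 0.3)

# math.isclose checks whether two numbers are equal up to a small tolerance
close_match = math.isclose(a, 0.3)

# Booleans can be combined: both parts must be true for "and"
both_true = (13 > 12) and (13 != 12)
negated = not both_true

print(exact_match, close_match, both_true, negated)
```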

### Loops

The real magic of coding does not arise from using Python merely as a calculator, but from having code executed repeatedly. One way of achieving repeated execution is a **loop**.
The following is called a **for-loop** and three things are noteworthy: 
1. The code starts with one line that opens the loop, `for variablename in range(howManyIterationsToExecute):`,
    followed by all statements that you want to execute repeatedly. Note the colon at the end.
2. All code that is supposed to be repeatedly executed has to be indented. This indentation shows Python what code is within the for loop.
3. Python zero-indexes, so if you loop over the first 10 numbers with `range(10)`, it will print 0-9, not 1-10.

In [None]:
for x in range(10):
    print(x)
    print("hello")

In the **range** statement, you can optionally provide a **start value, end value, and step size** to change what the loop goes through. The end value itself is not included:

In [None]:
for x in range(1990,2000,3):
    print(x) #Loop
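
To check what a `range` will produce without writing a loop, one option is to wrap it in `list()`, which makes all values visible at once. This is a small sketch; the variable names and the countdown example are illustrative additions.

```python
# range(stop): starts at 0 and stops before 10
first_ten = list(range(10))

# range(start, stop, step): every third year, stopping before 2000
every_third_year = list(range(1990, 2000, 3))

# A negative step counts downwards, stopping before 0
countdown = list(range(5, 0, -1))

print(first_ten)
print(every_third_year)
print(countdown)
```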

### If-else statements

Another very useful control mechanism is the so-called **if-else** statement, where code is only executed if certain conditions are met. The syntax is

`if condition1:
    execute if condition1 is true
else:
    execute if condition1 is false`

In [None]:
testnum = 13 #If statement

if testnum> 10:
    print("LARGE")
else:
    print("small")
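
When there are more than two cases, `elif` (short for "else if") adds further conditions that are checked in order, and only the first matching branch runs. The thresholds and the variable `size` below are arbitrary choices for illustration.

```python
testnum = 13

if testnum > 100:
    size = "HUGE"    # skipped, because 13 is not above 100
elif testnum > 10:
    size = "LARGE"   # first condition that is true, so this branch runs
else:
    size = "small"   # only reached if no condition above was true

print(size)
```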

Combining for loops and if-else statements becomes powerful quickly:

In [None]:
for x in range(20): #combine the two things
    if x > 10:
        print("LARGE")
    else:
        print("small")

`%` is the **modulo** operator, which returns the remainder when you divide one integer by another.

In [None]:
for x in range(100, 200, 33):
    if x % 2 == 0:
        print(x, "Even")
    else:
        print(x, "Odd")
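
The modulo operator has a sibling, `//`, which returns the whole-number part of a division. A short sketch of how the two fit together, plus a loop that uses `%` to count multiples of 7 (the numbers are chosen only for illustration):

```python
remainder = 17 % 5     # 17 = 3 * 5 + 2, so the remainder is 2
whole_part = 17 // 5   # the whole-number part of 17 / 5 is 3

# The two pieces reassemble the original number
rebuilt = whole_part * 5 + remainder

# Count how many numbers from 1 to 49 are divisible by 7
count = 0
for x in range(1, 50):
    if x % 7 == 0:
        count = count + 1

print(remainder, whole_part, rebuilt, count)
```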

- - - 
With some of the core control statements out of the way, let's get a little more advanced. If this feels like drinking from the firehose, don't worry: coding is eternally open book, no memorization needed. Just try to wrap your head around as many concepts as possible and look up the exact syntax whenever you need to.
- - -
### User-defined functions

Using the syntax `def myFunctionName(inputVariables):` you can define your own functions. Below, we define a function that checks if you are at the voting age in 2021.

In [15]:
def votingRight(yearBorn):
    age = 2021 - yearBorn #Calculate the age in 2021
    voting = (age >= 18) #True if the person is at least 18
    if voting:
        print("You are " + str(age) + " years old and allowed to vote") #If 18 or older, print this out
    else:
        print("You are " + str(age) + " years young and cannot vote yet")

You then call the function by using the name you defined and entering a value for yearBorn into the parentheses:

In [16]:
votingRight(1993)
votingRight(1850)
votingRight(2005)

You are 28 years old and allowed to vote
You are 171 years old and allowed to vote
You are 16 years young and cannot vote yet
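
Functions can also hand a result back with `return` instead of printing it, which lets you reuse the result in later code. The sketch below is a variation on the voting example; the function names `age_in_year` and `can_vote` and the default arguments are hypothetical additions, not part of the original notebook.

```python
def age_in_year(year_born, current_year=2021):
    # Hand the computed age back to whoever called the function
    return current_year - year_born

def can_vote(year_born, current_year=2021, voting_age=18):
    # Reuse the first function and return a boolean
    return age_in_year(year_born, current_year) >= voting_age

age = age_in_year(1993)
too_young = can_vote(2005)                     # 16 years old in 2021
old_enough = can_vote(2005, current_year=2024) # 19 years old in 2024

print(age, too_young, old_enough)
```

Default arguments such as `current_year=2021` are used whenever the caller does not supply a value, so the year no longer has to be fixed inside the function body.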


### Importing packages

Python is great because many programmers have already written a lot of difficult code and bundled it in so-called packages (who would have thought). During the installation process, you already installed **pandas and jupyter**, which automatically installed numpy as well (because pandas requires numpy as a dependency). We will use **numpy** as an example to briefly show how packages let you accomplish a lot with very little code. Numpy allows you to do a lot of numerical manipulation in Python.

In [None]:
import numpy #this loads all the contents of the numpy package

Learning how to help yourself solve coding problems is a core skill to develop. Don't know how to get a random number in Python, but know that numpy does number stuff? Just google *numpy how to get random number* and you will find results that teach you what to do.

In [None]:
numpy.random.seed(123) #Fix the seed so the random numbers are the same on every run
print(numpy.random.binomial(1,0.5,20)) #Flip 1 fair (50:50) coin 20 times
#this line is a comment and is ignored
print(numpy.random.binomial(5,0.5,20)) #Flip 5 fair coins 20 times and sum how often you got heads in each of the 20 tries

Get 100 random integers between 0 and 35:

In [None]:
numpy.random.randint(0,36,100)
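
Fixing a seed is what makes random draws repeatable. Besides the `numpy.random.seed` style used in this notebook, numpy also offers a generator object created with `numpy.random.default_rng`. The sketch below shows that two generators started from the same seed produce identical draws. The variable names are illustrative, and the actual numbers drawn are not shown because they depend on the numpy version.

```python
import numpy

# A generator with a fixed seed: same seed, same sequence of draws
rng = numpy.random.default_rng(123)
flips = rng.binomial(1, 0.5, size=20)   # 20 flips of one fair coin
draws = rng.integers(0, 36, size=100)   # 100 integers from 0 to 35 (36 is excluded)

# A second generator with the same seed reproduces the first draws exactly
rng_again = numpy.random.default_rng(123)
flips_again = rng_again.binomial(1, 0.5, size=20)

print(flips)
print((flips == flips_again).all())
```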

Suppose you want to plot some of this, so google: *plot a histogram in python*:
https://stackoverflow.com/questions/33203645/how-to-plot-a-histogram-using-matplotlib-in-python-with-a-list-of-data

You see in the first answer on Stack Overflow that you can name your packages when importing them by using `import packageWithLongname as nickname`, so we will follow that convention for the numpy package we previously imported and the matplotlib library that will help us plot.

In [None]:
import numpy as np
import matplotlib.pyplot as plt #https://stackoverflow.com/questions/33203645/how-to-plot-a-histogram-using-matplotlib-in-python-with-a-list-of-data

If you just got the error **ModuleNotFoundError: No module named 'matplotlib'**, Python is telling you that it does not know what matplotlib is. To solve this, 1) go to Anaconda Navigator > Environments, 2) select the current environment and 3) install matplotlib. Then come back here and it should be able to import the package.


Now flip 10 coins 1000 times (here biased coins that each land heads with probability 0.8) and get the distribution of how many heads you have in each of the 1000 tries:

In [None]:
x = numpy.random.binomial(10,0.8,1000) #Flip 10 biased coins (80% chance of heads each) 1000 times
plt.hist(x)
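
If you want the numbers behind the histogram rather than the picture, one option is `numpy.bincount`, which counts how often each non-negative integer occurs. This is a sketch under the same setup (10 coins, heads probability 0.8, 1000 tries); the seed value 0 and the names `heads` and `counts` are arbitrary choices.

```python
import numpy

numpy.random.seed(0)                          # arbitrary seed for repeatability
heads = numpy.random.binomial(10, 0.8, 1000)  # number of heads in each of 1000 tries

# counts[k] is the number of tries that produced exactly k heads (k = 0..10)
counts = numpy.bincount(heads, minlength=11)

for k in range(11):
    print(k, counts[k])

# With 10 coins and an 80% chance each, the average should be close to 8
print(heads.mean())
```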

### Lists

Lists are a great way of storing data in a way that makes it easy to sort / filter items or to do math on multiple items:

In [None]:
y = []
print(y)

In [None]:
y = [] #Creates an empty list and stores it in variable y
print(y)
y.append(2) #Appends the number 2 to the list
print(y)
y.append(4) #Appends the number 4 to the list
print(y)

You can append different datatypes into lists as well, and retrieve them with `list[integer]`

In [None]:
list1 = [1945, "wharton", "oidd", "python", True] 
list1[3] #beware of zero indexing

There are many pre-defined functions to work on lists, such as `len(listname)` or `sorted(listname)` that, shockingly, return the length of the list or a sorted copy of its items. For more info: https://www.datacamp.com/community/tutorials/python-list-function

In [None]:
list2 = [16, 12, 1, 3, 5, 7, 8, 8, 10]
print(len(list2))
print(sorted(list2))
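
A few more list operations that tend to come up early, shown as a sketch on the same numbers: negative indices, slicing, built-in summaries, sorting in place, and a list comprehension. The variable names are illustrative.

```python
list2 = [16, 12, 1, 3, 5, 7, 8, 8, 10]

first_item = list2[0]     # zero indexing: position 0 is the first item
last_item = list2[-1]     # negative indices count from the end
first_three = list2[0:3]  # slicing: positions 0, 1 and 2 (the end is excluded)

largest = max(list2)
total = sum(list2)

# sorted(list2) returns a new list; list2.sort() changes list2 itself
list2.sort()

# A list comprehension builds a new list from an existing one
doubled = [item * 2 for item in list2]

print(first_item, last_item, first_three, largest, total)
print(list2)
print(doubled)
```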

### Dictionaries

Dictionaries are another way of storing data. Unlike a list, a dictionary does not look up its contents by position, but via keywords (keys). In the example below, we want to store restaurant ratings. The general structure is `dictionaryname = {key:value}`


In [22]:
dict1 = {"Zahav":4.5, "Butcher Bar":4, "El Vez":1, "City Tap House":4, "Distrito":3.5, "Han Dynasty":4, "Koreana":4} #Ask any PhD who was part of the OIDD department in 2018 about the infamous El Vez...
print(dict1)

{'Zahav': 4.5, 'Butcher Bar': 4, 'El Vez': 1, 'City Tap House': 4, 'Distrito': 3.5, 'Han Dynasty': 4, 'Koreana': 4}


You can return a value with `dictionaryname["ItemKey"]`:

In [None]:
dict1["El Vez"]

But because dictionaries look up their contents by key rather than by position, the following creates an error (a `KeyError`, since there is no key `2`):

In [None]:
dict1[2]

To overwrite a value or create a new entry, you can use the following approach

In [None]:
dict1["El Vez"] = 2 
dict1["New Place In Town"] = 5
print(dict1["El Vez"], dict1["New Place In Town"])
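
Some further dictionary operations, sketched on a shortened copy of the ratings: checking whether a key exists, looking up a key safely with `.get()`, and looping over all key/value pairs with `.items()`. The name `ratings` and the fallback text are illustrative choices.

```python
ratings = {"Zahav": 4.5, "El Vez": 1, "Distrito": 3.5}

# "in" checks whether a key exists, without risking an error
has_zahav = "Zahav" in ratings
has_unknown = "Unknown Place" in ratings

# .get() returns a fallback value instead of raising a KeyError
missing = ratings.get("Unknown Place", "no rating yet")

# .items() yields each key together with its value
for name, rating in ratings.items():
    print(name, rating)

# .values() gives just the values, e.g. to compute an average
average = sum(ratings.values()) / len(ratings)
print(has_zahav, has_unknown, missing, average)
```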

### Arrays

Oftentimes, you want to deal with tables or data that is structured in arrays (matrices with rows and columns). You access values in an array with `array[rownumber,columnnumber]`, remembering Python's zero indexing:

In [None]:
myArray = np.zeros(shape=(3,6)) #Creates an array with 3 rows and 6 columns of 0s
print(myArray.shape) #prints the dimensions of the Array
print(myArray)
myArray[0,0] = 1 #Write 1 into the top left corner
myArray[2,5] = 13 #Write 13 into the bottom right corner
print(myArray)

In the following, we assign two variables at once, because `myArray.shape` returns a tuple (here, two values) that can be directly assigned. We then use two nested for loops to change the values in the array. Just by recombining the basic building blocks, one can get useful pieces of code.

In [None]:
myArray.shape

In [None]:
numRows, numCols = myArray.shape
for row in range(numRows):
    for col in range(numCols):
        myArray[row, col] = row*10+col
print(myArray)
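
Nested loops are easy to read, but numpy can often do the same work on whole rows and columns at once. The sketch below builds the same table both ways and also shows how `:` selects an entire row or column. The names `looped`, `rows`, `cols` and `vectorized` are illustrative.

```python
import numpy as np

num_rows, num_cols = 3, 6

# Version 1: the nested loops from above
looped = np.zeros(shape=(num_rows, num_cols))
for row in range(num_rows):
    for col in range(num_cols):
        looped[row, col] = row * 10 + col

# Version 2: let numpy combine a column of row numbers with a row of column numbers
rows = np.arange(num_rows).reshape(num_rows, 1)  # [[0], [1], [2]]
cols = np.arange(num_cols)                       # [0, 1, 2, 3, 4, 5]
vectorized = rows * 10 + cols                    # numpy stretches both to a 3x6 table

print(vectorized)
print(vectorized[:, 0])  # ":" means "all rows": the first column
print(vectorized[1, :])  # the whole second row
```

For a 3x6 table the difference is cosmetic, but on large arrays the second version is usually much faster.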

***
Take a quick breather, so far you have learned about:

- Defining and using variables (integers, strings, booleans)
- Doing math in Python
- Python code structure and indentation
- For loops
- If, else statements
- Importing Packages
- Random Numbers
- Plotting
- Lists
- Dictionaries
- Arrays
***

Before we close, let's do a quick example of how quickly you can get things to work: a **neural network in 15 lines** from https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

For this to work, please install `keras` the way we installed previous packages. Note that Python also comes with its own package manager, called pip, which may be useful if you cannot find a package in Anaconda. For a quick intro on pip, see: https://pip.pypa.io/en/stable/quickstart/

Also make sure that you have the csv file (**pima-indians-diabetes.csv**) in the same folder as this notebook. Columns 1-9 are:
   1. Number of times pregnant
   2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
   3. Diastolic blood pressure (mm Hg)
   4. Triceps skin fold thickness (mm)
   5. 2-Hour serum insulin (mu U/ml)
   6. Body mass index (weight in kg/(height in m)^2)
   7. Diabetes pedigree function
   8. Age (years)
   9. Diabetes variable (0 or 1)

In [None]:
from keras.models import Sequential
from keras.layers import Dense
import numpy

# load the data
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",") #This loads the data into an array

# split into input (X) and output (Y) variables
X = dataset[:,0:8] #This stores the first 8 columns in the variable X
Y = dataset[:,8] #This stores the 9th column in the variable Y. We try to predict diabetes based on the X variables

# create model
model = Sequential() #Create a Sequential NN, where each layer takes in variables from the previous layer and outputs into the following
model.add(Dense(12, input_dim=8, activation="relu", kernel_initializer="uniform")) #Input data has 8 dimensions, this layer has 12 nodes, activation function is relu
model.add(Dense(8, activation="relu", kernel_initializer="uniform")) #This layer has 8 nodes
model.add(Dense(1, activation="sigmoid", kernel_initializer="uniform")) #Our final layer has 1 node, the activation function is sigmoid. Here, we predict the diabetes score.
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) #Specification of what to optimize for, which algorithm to use, etc.

The above code gets you to the following model (only top two nodes' connections displayed, but the model is fully connected):
![NN_Model.png](attachment:NN_Model.png)
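
To get a feel for what the three layers compute, here is a rough numpy-only sketch of a single pass through a network of the same shape (8 inputs, then 12, 8 and 1 nodes). It is not the Keras implementation: the weights are small random placeholders rather than trained values, the five input rows are made up, and the names `relu`, `sigmoid`, `W1`, `b1` and so on are assumptions chosen for illustration.

```python
import numpy as np

def relu(z):
    # relu keeps positive values and replaces negative ones with 0
    return np.maximum(z, 0)

def sigmoid(z):
    # sigmoid squeezes any number into the range between 0 and 1
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Placeholder weights and biases, NOT the weights Keras would learn
W1, b1 = rng.uniform(-0.05, 0.05, size=(8, 12)), np.zeros(12)
W2, b2 = rng.uniform(-0.05, 0.05, size=(12, 8)), np.zeros(8)
W3, b3 = rng.uniform(-0.05, 0.05, size=(8, 1)), np.zeros(1)

# Five made-up rows with 8 input values each
patients = rng.uniform(0, 1, size=(5, 8))

# Each layer: multiply by the weights (@ is matrix multiplication), add the bias, apply the activation
hidden1 = relu(patients @ W1 + b1)   # shape (5, 12)
hidden2 = relu(hidden1 @ W2 + b2)    # shape (5, 8)
output = sigmoid(hidden2 @ W3 + b3)  # shape (5, 1), one score per row

print(output.shape)
print(output.ravel())
```

Training, which is what `model.fit` does, amounts to adjusting the weights so that these output scores match the known labels as closely as possible.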

In [None]:
#Fit model - careful, may take a few minutes
model.fit(X, Y, epochs=150, batch_size=10,  verbose=2)
# calculate & round predictions
predictions = model.predict(X) #Predicting diabetes in dataset
rounded = [round(x[0]) for x in predictions]
print(rounded)
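
A natural follow-up is to check how often the rounded predictions agree with the true labels. The sketch below uses small made-up arrays as stand-ins so that it runs without Keras or the csv file. In the notebook, the same comparison could be made between `rounded` and `Y`. The names `true_labels`, `probabilities` and `predicted` are illustrative.

```python
import numpy as np

# Made-up stand-ins for Y and for the model's predicted scores
true_labels = np.array([1, 0, 0, 1, 1, 0])
probabilities = np.array([0.9, 0.2, 0.6, 0.4, 0.8, 0.1])

# Scores above 0.5 count as a predicted 1, everything else as 0
predicted = (probabilities > 0.5).astype(int)

# Comparing two arrays gives booleans; their mean is the share of matches
accuracy = (predicted == true_labels).mean()

print(predicted)
print(accuracy)
```

Keep in mind that accuracy measured on the same data the model was trained on tends to look better than accuracy on new, unseen data.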

Next time, we will look at how we can use Python to do some of the optimization we have seen in this class.