# Introduction to Programming

What a title, huh?  "Programming" is such a hugely expansive term, covering a staggering number of techniques, languages, processes, hardware configurations, purposes, applications, scales, and so forth.

*So what is programming, really?*

The short version is that programming is a way of making a computer (another broad term!) perform tasks.  These tasks can be as simple as the addition of two numbers, or as complex as calculating the excited state energy of a fluorescent nucleotide as it moves through a solution of water and sodium chloride ions.  As the tasks become more complex, the code becomes more complex as well.  Small tasks may be manageable with tiny scripts or single-file programs, while larger ones may require the inclusion of other tools, multiple files, and more complicated design principles.

For the purposes of this Summer School, we'll mostly be sticking to the introductory/entry-level stuff, but remember that the actual limits of your programs are only in the capacity of your hardware and the breadth of your imagination.

## Fundamental Principles

There are some rules to programming that should really be learned early on, if only because knowing them early will make the rest considerably easier.  For many experienced programmers, the actual writing of code is a smaller portion of the overall process than one might expect.

**Version Control** - This is a very important habit to get into.  In the simplest form, version control provides a means of keeping track of the changes you've made to your code as you go, as well as providing information about who made which changes in a collaborative project.  More complicated version control can lead to things like software written for different types of hardware, or for different scales of calculations, etc.

**Code Formatting** - Many different groups and companies have what are known as "style guides" for any code produced in, by, or for the organization.  These often include the simple concepts like "how many spaces constitute a `tab` in your code?" or "What information, if any, should be included in comments at the top of each file?".  However, these style guides can also include more complex information beyond text formatting and up into things like "Each function definition should include comments describing the arguments to the function and what the function returns on completion" or "Test suites must be included for all additions to the codebase before they may be considered for merging".

**Debugging** - This is a team effort, because invariably the person who wrote the code might wind up missing some of their own mistakes that others will readily find.  This is not at all an indicator of skill, intelligence, or character.  This is purely because we can get tunnel vision about our own code (it happens to the best of us), and because our brains are wired to recognize *and complete* patterns.  Where we might "see" a missing semicolon at the end of a line of code we wrote because we expect it to be there, someone else who didn't write our code may notice it immediately.  Interestingly, debugging is actually a *highly* valued skill in industry, because fixing problems is usually far more expensive than preventing them in the first place.

**Pseudocode** - Other names for pseudocode include "algorithm development", "project management", "outlining", "planning", "thinking", and "jotting that down so I don't forget it later".  Pseudocoding is effectively writing out the stages of your program in plain language (*not code*) to ensure a clear understanding of the problem you're trying to solve.  Often, programmers will begin pseudocode with a very simple set of steps describing how they think the problem can be solved.  Each step can be explained in more and more detail as a set of smaller, more manageable steps, until eventually you wind up with a complete list of steps that can be converted to computer commands.  Pseudocode can also provide some insight into ways the code might be optimized, such as by revealing opportunities to parallelise the execution, or by revealing regions where something being calculated can just be saved for reuse rather than recalculated later.

**Variable Types** - Different types of information can be stored for use by the computer.  Some programming languages are fairly lenient about variable types and are flexible with what types of data are being provided in a given variable.  Others are a bit more strict, and require explicit type declarations/definitions for each variable to ensure proper memory allocation.
*If only some of that made sense, you're in the right place.*


## Version Control

The most commonly used tool for version control is `git`, with related websites [GitHub](https://www.github.com), [GitLab](https://about.gitlab.com/), and [Bitbucket](https://bitbucket.org/).  For the purposes of this workshop, we'll focus on using GitHub because it's free and most (if not all) of us already have accounts there.

There are a few steps to do first if you've never used `git` on your current computer before.  These will configure your computer for use with your GitHub account.  If you use multiple computers (including working from an HPC/Supercomputer), these steps will need to be repeated on each computer you use.

---

Configure your local machine with an SSH key.  This will allow your computer to connect to other **trusted** computers that you've previously designated as such, and this includes your GitHub account.

```bash
    cd $HOME
    ssh-keygen
```
will give the following response/prompt:
```
    Generating public/private rsa key pair
    Enter file in which to save the key (/home/username/.ssh/id_rsa): 
```
If the file already exists, choose a new filename such as `git_rsa` or something you'll recognize.

```
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
```
You can set a passphrase if you want, but keep in mind that, unless an SSH agent is caching it for you, you'll be entering it every time you upload changes to your code on GitHub.  Some people choose not to have a passphrase for this particular aspect of their work, others do.

```
    Your identification has been saved in /home/username/.ssh/git_rsa
    Your public key has been saved in /home/username/.ssh/git_rsa.pub
    The key fingerprint is:
    SHA256:8U9t+r+SwCi8Xe8uu3HCjbHa7WU51A9pArzm9+F+esk username@Computer
    The key's randomart image is:
    +---[RSA 3072]----+
    |                 |
    |       .         |
    |        +  . +   |
    |         =. =. ..|
    |      . S =*o.=..|
    |       = .oBo+..o|
    |        ..o+*o.*.|
    |       . o.+o*D .|
    |           +@Bo*o|
    +----[SHA256]-----+
```
(As an aside, this is a modified variant of the one generated for this workshop, so don't bother trying to mess with my stuff.)

---

Add the SSH key to your GitHub Account to allow your computer to access it.

Go to your account settings page and click on `SSH and GPG keys`, then click on `New SSH key`.

![GitHub Step 1](Images/GH_01.png)

![GitHub Step 2](Images/GH_02.png)

![GitHub Step 3](Images/GH_03.png)


You'll need some text out of a file generated by the `ssh-keygen` step.  If you named the keyfile `git_rsa`, the process will have also produced a file called `git_rsa.pub`, which is the "public key" corresponding to your computer's private key.  In simpler terms, the public key is like a "secret question" that the other computer can ask, that only your computer with its private "secret answer" can properly respond to, so both computers know the other is trusted with this information transfer.  

Open the `git_rsa.pub` file and copy all the text into the field shown on the GitHub website here.

![GitHub Step 4](Images/GH_04.png)

---

You'll also need to configure `git` on your computer.  Assuming you have `git` already installed, you can begin with setting some of the initial variables.

You can configure individual repositories (projects) with these settings, or you can configure `git` globally to set your defaults.  For now, we'll assume that you only have one GitHub account to manage on your computer.

```bash
git config --global user.name "Firstname Lastname"
git config --global user.email "username@emailserver.com"
# Note: user.user is not a standard git setting; git stores it but does not act on it
git config --global user.user "github_username"
```

This next command may not mean too much right now, but it's useful to have right off the bat to keep things clean later on.

```bash
git config --global core.excludesFile '~/.gitignore'
touch ~/.gitignore
```

This tells `git` to ignore anything listed in the file `~/.gitignore` when tracking changes.  This is useful for things like cached files produced by various Python scripts, compiled programs/object files from C++, and so forth.  As we continue forward, we'll add some things to the global ignore, and others to repository-specific `.gitignore` files.

---

Okay, we've configured `git` on our computers, now how about actually *using* it?

Let's say you've made some headway on designing and maybe even coding up some of your project, and you remember how important it is to maintain version control.  You can initialize the project folder `MyCodingProject` with the following command:

```bash
git init MyCodingProject/
```
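This creates an empty repository (a hidden `.git` folder) inside `MyCodingProject`.  Your first commit then establishes the starting point that all future versions are compared against.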


For more useful commands, check out this cheat sheet!

![Git Cheat Sheet](Images/GitCheatSheet.png)

## Code Formatting

Most programming languages have a form of something called **scope**, which is a region in the code in which certain things are true.
For example, a function may have variables that only exist inside that function, and then disappear once the program exits the function's **scope**.
Some languages use specific characters to define a scope, such as `C++` with `{` and `}` defining the beginning and end of a scope.  

```C++
#include <iostream>

int main()
{
    std::cout << "Hello World!\n";
    // This condition is false, so the inner scope is never entered
    if (5 < 4)
    {
        std::cout << "Five is less than four.\n";
    }
    return 0;
}
```

A common convention is to indent with spaces or tabs when moving into different levels of scope.  In C++ this is purely for easier reading by humans and isn't necessary for the compiler itself.

Other languages, like `Python`, have no braces at all: there, indentation with spaces or tabs is *required*, because it is the only thing that marks where a block such as a function body begins and ends.

```python
def myfunction(x):
    x = x + 5
    print(x)
    return

x = 10
print(x)
myfunction(x)
print(x)
```

The code snippet above shows the variable x inside the scope of `myfunction` as well as the main program.  If we follow the value of `x` as the program runs, we can see that `x = 10`, which is printed out.  Then, the *value* of `x` is passed into the function, which adds five and prints it out (`x = 15`).  Once that is done, the function returns, and the main program's value of x is printed out again. (`x = 10`).

Let's see how that works in practice.

In [2]:
def myfunction(x):
    x = x + 5
    print(x)
    return

x = 10
print(x)
myfunction(x)
print(x)

10
15
10


It is very important to keep scope in mind when using variables as counters or other housekeepers.  

Many programmers use `i` as a counter variable in loops.  However, sometimes it is necessary to have loops inside other loops (nested).  In languages like C++, each loop body is its own scope, but in Python a loop does not create a new scope, so a variable assigned in the inner loop is the very same variable the outer loop sees.  If you use `i` in the outer loop, then change it in the inner loop, it remains changed in the outer loop and can have effects on the execution of your code.

Therefore, it is important to keep track of what variables are used during the execution of your code and how they are modified as you go.
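
As a minimal sketch of the nested-loop problem described above (the list names `clobbered` and `independent` are made up for illustration), compare reusing `i` in both loops with giving the inner loop its own counter:

```python
# Reusing i in both loops: the inner loop overwrites the outer counter
clobbered = []
for i in range(3):
    for i in range(10, 12):
        pass
    clobbered.append(i)   # i is whatever the inner loop left behind
print(clobbered)  # [11, 11, 11]

# Giving each loop its own counter keeps them independent
independent = []
for i in range(3):
    for j in range(10, 12):
        pass
    independent.append(i)
print(independent)  # [0, 1, 2]
```

The outer loop still runs three times in both cases, because Python hands `i` a fresh value at the top of each pass, but anything that reads `i` after the inner loop sees the wrong number.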

#### Back to Formatting

Formatting is not simply a matter of using indents or 80-characters-per-line requirements.  Formatting also includes things like expected code-comments or other internal documentation.  Some code development packages have the functionality built in to parse comments in the code and build human-readable documentation, but it requires the use of specific formats in the comments.  See the examples below.

In [3]:
import numpy as np
def myfunction(x,y,z):
    norm = np.linalg.norm([x,y,z])
    return norm
a = 1
b = 2
c = 3
result = myfunction(a,b,c)
print(result)

3.7416573867739413


The code block above has no comments in it, and so without already knowing what the individual parts are doing, it's not easy to know what is happening in the code or how to modify and manipulate it for your own purposes.  If we take the same code block and add some commentary, it can be made easier.

In [4]:
# library import
import numpy as np

# function definition
def myfunction(x,y,z):
    norm = np.linalg.norm([x,y,z])
    return norm

# Main program execution
a = 1
b = 2
c = 3
result = myfunction(a,b,c)
print(result)

3.7416573867739413


Now we have a little more clarity in what is happening, but it can still be made clearer.  As it stands, we simply know that we're importing libraries, defining a function, and running the main program.

We can improve the commentary further by describing what is happening in the function or the steps inside the main program.

In [5]:
# library import
import numpy as np

# function definition
def myfunction(x,y,z):
    # Arguments:
    # x - float representing the x component of a vector
    # y - float representing the y component of a vector
    # z - float representing the z component of a vector
    # Returns:
    # norm - float representing the magnitude of the vector given by the [x,y,z] values
    norm = np.linalg.norm([x,y,z])
    return norm

# Main program execution
# initialize variables 
a = 1
b = 2
c = 3
# obtain the magnitude of the vector defined by [a,b,c]
result = myfunction(a,b,c)
# print out the magnitude
print(result)

3.7416573867739413


With this level of commentary in the code, we can easily understand what is happening in the function and the main code block.  This is very useful when working on collaborative projects especially, as it can ensure that everyone is able to follow your thought process in the code and, if necessary, compare to the actual code for debugging purposes.
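
Python also has a built-in place for this kind of function description: a *docstring*, which is a string placed on the first line inside the function.  The sketch below is one possible layout rather than a required standard; the name `vector_magnitude` is invented for the example, and `math.sqrt` is used in place of `np.linalg.norm` so the snippet runs without NumPy.

```python
import math

def vector_magnitude(x, y, z):
    """Return the magnitude of the vector [x, y, z].

    Arguments:
        x: float, the x component of the vector
        y: float, the y component of the vector
        z: float, the z component of the vector

    Returns:
        float, the magnitude (Euclidean length) of the vector
    """
    return math.sqrt(x**2 + y**2 + z**2)

# The docstring travels with the function and can be read back at any time
print(vector_magnitude.__doc__)
print(vector_magnitude(1, 2, 3))  # about 3.74, matching the cells above
```

Because the description is stored on the function itself, documentation tools and the built-in `help()` can find it without anyone opening the source file.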

Another thing to consider is the larger format of a project.  That is, not simply how the text is arranged in a file, but how code blocks and functions are arranged in multiple files.  For smaller programs, it may not be necessary to divide the code up in this way, but for larger projects - or for standard functions you'll use in multiple separate projects - it may be easier and cleaner to keep some things separated and compartmentalized.  This also makes compiling easier later on down the line.

It can also lead to smaller individual files, making debugging easier as well - most error messages include the location where the error was encountered, and it's much easier to go to `line 46 in utility.cpp` than it is to go to `line 57684 in main.cpp`.  As a bonus, making changes in smaller files won't necessarily require the entire codebase to be recompiled, but rather just the small portion you modified.

## Debugging

It's often said that only a small portion of programming is actually writing the code - the rest is fixing the code.

Debugging is required at multiple stages of programming, and bugs can arise in many different forms.

#### Compilation Bugs

Compilation errors are usually the easiest to solve, as they're encountered by the compiler itself and are often syntax-related ("missing semicolon on line 85") or datatype-related ("Unable to cast string as int").  The error messages usually include the file name and line number, which can help you figure out where exactly the error is.

#### Runtime Bugs

There are a few kinds of runtime errors that can pop up.  They are usually more complicated to unravel, depending on what caused them.  The first kind is code that compiles fine but returns an error when it is actually executed.  For example, a function that takes `x` and `y` and returns the result of `x/y` may compile just fine.  But when you run the code, if somehow `y = 0`, the program will crash because of an attempt to divide by zero.

Another bug that can arise is when the results are not what was expected.  For instance, if you have a program that should return the product of two numbers, and instead returns the sum of the two numbers, this is still a bug at runtime (often called a *logic error*), even though no error is actually reported.  The code compiles fine and executes exactly as it is written, but it may not be what was intended.  These types of bugs are why validation and testing are required for programs big and small.
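
A minimal sketch of how a test can catch the product-versus-sum bug described above.  All of the names here (`product`, `product_fixed`, `passes_check`) are made up for illustration, and the test cases are simply values worked out by hand.

```python
def product(a, b):
    # Bug: this adds instead of multiplying, yet it runs without any error
    return a + b

def product_fixed(a, b):
    return a * b

def passes_check(func):
    """Return True if func gives the known answers for a few hand-checked cases."""
    cases = [((2, 3), 6), ((4, 5), 20), ((0, 7), 0)]
    return all(func(*args) == expected for args, expected in cases)

print(passes_check(product))        # False
print(passes_check(product_fixed))  # True
```

Note that a poorly chosen case such as `(2, 2)` would pass for both versions, since 2 + 2 and 2 * 2 are both 4, which is why more than one test case is worth the effort.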

Take a look at the code blocks below.  One contains an example of a compilation error, the other a runtime error.



In [7]:
int x = 45
print(x)

SyntaxError: invalid syntax (2716038442.py, line 1)

In [10]:
def divide(x,y):
    return x/y

x = 5
y = 1
result = divide(x,y)
print(x,y,result)

x = 5
y = 0
result = divide(x,y)
print(x,y,result)



5 1 5.0


ZeroDivisionError: division by zero
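
One way to keep the second call from crashing the program is to catch the error.  This is only a sketch: the name `safe_divide` and the choice to return `None` are illustrative decisions, not part of the workshop code.

```python
def safe_divide(x, y):
    """Return x / y, or None when y is zero."""
    try:
        return x / y
    except ZeroDivisionError:
        # Report the problem instead of crashing the whole program
        print(f"Cannot divide {x} by zero")
        return None

print(safe_divide(5, 1))  # 5.0
print(safe_divide(5, 0))  # prints the warning, then None
```

Whether returning `None`, raising a clearer error, or checking `y` before dividing is the right choice depends on what the rest of the program expects.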

It should be pointed out that *technically*, `Python` isn't compiled ahead of time the way C++ is; it runs like a scripting language, with an interpreter executing the commands as it goes.  The interpreter does still read through the whole cell first, though, which is why the `SyntaxError` above is reported before any line of the cell runs.  More advanced Python programming can also include packaging scripts into self-contained programs that can run without a separate Python installation.  This is not in the scope of this workshop, however, and is merely pointed out for information.

## Pseudocode

One of the most valuable skills a programmer can develop is the ability to think like a computer.  This means learning to break down larger problems and complex behaviors into smaller and smaller pieces until it becomes a collection of tiny calculations such as a sequence of additions and subtractions, or comparisons between two values.

A good habit to develop whenever beginning a new project is to first outline the expected flow of the code.  Some people use a whiteboard, or scratch paper, or just a blank text document on their computer - it doesn't matter how you do it, just that you plan it out somehow before jumping into the code.

#### Example Project - Brownian Motion

Write a program that places an arbitrary number of particles in a box of some other arbitrary size, then moves them around randomly by assigning a random x, y, and z component of their motion vector between 0 and 1.

Enter your pseudocode below.  You can make it as complex as you like, but it should just be plain text.  Try to think through the steps of the problem like a computer might.

---

<!--- Pseudocode goes below -->



---

Once you've completed your pseudocode, keep it handy - you can use it for tomorrow's project!

## Variable Types

Python doesn't generally require *explicit* variable type declarations (with some exceptions that will come later as we get into more advanced programming).  However, it is still useful to know what kinds of data there are, what can be done with them, and how they're stored.

First, let's explore data types like `int`, `float`, `list`, `tuple`, and `string`.


In [None]:
my_int    = 2
my_float  = 3.1415
my_list   = [1,3.1415,"Hello World!","pizza"]
my_tuple  = (5,6,7,8,9)
my_string = "Hello World!"

These examples are fairly simple.  

- `my_int` is an integer, and gets treated like one.  Integers are useful for things like indexes, counters, and so forth.
- `my_float` is a float (often called a "double" in other programming languages).  Floats are regular numbers that include a decimal part.
- `my_list` is a list of values enclosed in square brackets.  Lists are indexed from zero, which means the first item in a list is "item 0".  Lists are great ways to keep collections of data organized and in order, and you can extract individual values simply by including the index with the variable name:  `my_list[3]` will return "pizza".  You can also get values from the end of a list with negative indices.  `my_list[-1]` will return "pizza" because it's the last value.
- `my_tuple` is similar to a list and is indexed in exactly the same way, except that it cannot be changed once it has been created.  Tuples are useful when you need to maintain groups of values together in relation to each other, such as with (x,y,z) coordinates.
- `my_string` is a sequence of characters including letters, numbers, punctuation, and whitespace (tabs, spaces, line breaks, etc.).  The contents of a string do not include the quotation marks on either side.  Strings can include quotes using *escapes* like `\"`, and `\\` to include a backslash.
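
A short sketch of the indexing described in the list above, reusing the same variables.  The `try`/`except` at the end is only there to show what happens when you attempt to change a tuple; `tuple_error` is a name made up for the example.

```python
my_list = [1, 3.1415, "Hello World!", "pizza"]
my_tuple = (5, 6, 7, 8, 9)

print(my_list[3])    # pizza
print(my_list[-1])   # pizza
print(my_tuple[0])   # 5, tuples are indexed just like lists

# Lists can be changed in place...
my_list[0] = 100
print(my_list[0])    # 100

# ...but tuples cannot
try:
    my_tuple[0] = 100
except TypeError as err:
    tuple_error = str(err)
    print("Tuples are read-only:", tuple_error)
```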

Variable manipulation comes in many forms and depends on the type of data contained within.  Better understanding of how data types work can allow you to do some interesting things, like taking a "slice" of a string like you would from a list.  

Consider the examples below.

In [None]:
my_list = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
# my_list has a length of 26 individual values
len(my_list)

In [None]:
my_string = "Once more into the breach!"
# my_string has a length of 26 characters including whitespace and punctuation.
len(my_string)

### Slicing Lists

One common use for lists is "slicing", where you can get a small subsection of the list.  Let's say you wanted just the first five elements in `my_list`.  You would use a slice.  Slices are written similarly to how an individual element is called from a list, inside square brackets.  However, we can put a `:` between the starting and ending indices to get everything between.  We can also leave either index empty to mean "from the very beginning" or "to the very end".  Check out the examples below.

In [None]:
my_list[:5]

In [None]:
my_list[5:10]

Note how the two results are different.  The ending in the first cell is the same as the beginning of the second cell, but we don't actually get "5" in the results in the first cell.  Slices go "up to" the ending value, but don't include it.  Keep this in mind when working with slices.  We can also combine other tricks from list manipulations, like using negative indices to go backwards from the end.

In the next cell, we'll get the last seven elements from the list.

In [None]:
my_list[-7:]

What if we wanted every third element in the list?

In [None]:
my_list[::3]

The second `:` indicates a "stride".  This is useful when you have data that is strangely shaped, such as a long list of values that correspond to x,y,z coordinates but aren't in a (3,n) shaped list.
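
As a sketch of that use case, suppose coordinates arrive as one flat list in the order x, y, z, x, y, z, and so on.  The list `flat` and its values are invented for the example.

```python
# Three points stored back-to-back as x, y, z, x, y, z, ...
flat = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]

xs = flat[0::3]  # start at index 0, take every third value
ys = flat[1::3]  # start at index 1, take every third value
zs = flat[2::3]  # start at index 2, take every third value
print(xs)  # [0.0, 10.0, 20.0]
print(ys)  # [1.0, 11.0, 21.0]
print(zs)  # [2.0, 12.0, 22.0]

# zip pairs the components back up into one (x, y, z) tuple per point
points = list(zip(xs, ys, zs))
print(points)
```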

Now let's combine these.  We'll get every other element, starting at index 10 and stopping just before index 20.

In [None]:
my_list[10:20:2]
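
For reference, here is a sketch that gathers the list slices from this section in one place, with the result of each written as a comment.  `list(range(26))` builds the same 0 to 25 list as before.

```python
my_list = list(range(26))  # [0, 1, 2, ..., 25]

print(my_list[:5])       # [0, 1, 2, 3, 4]
print(my_list[5:10])     # [5, 6, 7, 8, 9]
print(my_list[-7:])      # [19, 20, 21, 22, 23, 24, 25]
print(my_list[::3])      # [0, 3, 6, 9, 12, 15, 18, 21, 24]
print(my_list[10:20:2])  # [10, 12, 14, 16, 18]
```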

Now let's look at strings.  Strings are sequences of letters, numbers, and any other characters you can think of, and they can be indexed just like lists.  With this in mind, we can slice strings the same way we have sliced lists.

In [None]:
my_string

In [None]:
my_string[:5]

In [None]:
my_string[5:10]

In [None]:
my_string[-7:]

In [None]:
my_string[::3]

In [None]:
my_string[10:20:2]

... some slices are more useful than others, but you get the idea!
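
The same slices applied to the string, gathered as a sketch with each result in a comment.  `repr` is used so that the spaces at the ends of the results stay visible when printed.

```python
my_string = "Once more into the breach!"

print(repr(my_string[:5]))       # 'Once '
print(repr(my_string[5:10]))     # 'more '
print(repr(my_string[-7:]))      # 'breach!'
print(repr(my_string[::3]))      # 'Oeo tt eh'
print(repr(my_string[10:20:2]))  # 'it h '
```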

Now let's look at integers and floats.  In some programming languages, the difference between these two can be pretty severe.  For example, in C++, dividing one integer by another will give you a truncated integer, which means you can lose some of the information in your data if you're not careful.  Thankfully, Python is a little more forgiving.

Normal division with `/` works like we might intuitively expect: in Python 3 the result is always a float, even when both values are integers.

In [None]:
my_float/my_int

We can also force the division to return a whole-number value (which is useful in some situations).

In the example above, we got a value of 1.57075.  If we were to round this using conventional methods, we'd get 2.
However, floor division with the `//` below gives us 1.0 instead.  This is also slightly deceptive, as the `.0` shows the value is still a float, even though the result is a whole number: `//` only returns an `int` when both operands are integers.  This is important to be aware of when doing mathematical work in Python.  Floor division always rounds down toward negative infinity, which for positive numbers is the same as truncation (removing everything after the decimal point), while rounding goes to the nearest whole number.


In [None]:
my_float//my_int

In [None]:
round(my_float/my_int)
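
A small sketch comparing the three behaviours side by side.  The negative case is included because that is where floor division and truncation stop agreeing; `math.trunc` is the standard-library function that drops the decimal part.

```python
import math

my_int = 2
my_float = 3.1415

print(my_float / my_int)          # 1.57075
print(my_float // my_int)         # 1.0  (floor division, still a float)
print(round(my_float / my_int))   # 2
print(7 // 2)                     # 3    (two ints give an int)

# With a negative value, flooring and truncating differ
print(-my_float // my_int)              # -2.0 (rounds down, toward -infinity)
print(math.trunc(-my_float / my_int))   # -1   (drops the decimal part)
print(round(-my_float / my_int))        # -2   (nearest whole number)
```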

### Math with Variables

Math can get incredibly complex, so it's important to remember your Order of Operations (PEMDAS) - Parentheses, Exponents, Multiplication, Division, Addition, and Subtraction.

Python follows the same rules.  Parentheses are evaluated first, then exponents (`**`), then multiplication and division (`*`, `/`), and finally addition and subtraction (`+`, `-`).  Operators at the same level, such as `*` and `/`, are processed left-to-right, with the exception of `**`, which is processed right-to-left.

In [None]:
1 + 2 - 3 * 4 / 5

In [None]:
(1 + 2) - 3 * 4 / 5

In [None]:
(1 + 2 - 3 * 4) / 5

In [None]:
(1 + 2 - 3) * 4 / 5
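
As a check on the four expressions above, this sketch stores each result and notes by hand how Python arrives at it.  Python applies `*` and `/` before `+` and `-`; the variable names `a` through `e` are only labels for the example.

```python
# Multiplication and division happen before addition and subtraction
a = 1 + 2 - 3 * 4 / 5      # 3 - 2.4, roughly 0.6
b = (1 + 2) - 3 * 4 / 5    # same as a: these parentheses change nothing
c = (1 + 2 - 3 * 4) / 5    # (3 - 12) / 5 = -1.8
d = (1 + 2 - 3) * 4 / 5    # 0 * 4 / 5 = 0.0

# Exponents are grouped right-to-left: 2 ** 3 ** 2 is 2 ** (3 ** 2)
e = 2 ** 3 ** 2            # 512

for name, value in [("a", a), ("b", b), ("c", c), ("d", d), ("e", e)]:
    print(name, "=", value)
```

The first result may print with a long tail of digits rather than exactly `0.6`; that is ordinary floating-point rounding, not an order-of-operations issue.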

These are just a few examples of how order of operations affects the results.  With this in mind, you can see why it's very important to keep track of what you're doing in a complex mathematical function.  The next cell has a complex equation in a single line, then the same equation separated into more easily-managed terms.

In [None]:
x=3
y=5
z=7
answer =  (x**(y/z)-x/((y+2)*z)-x)/(y*z)*x

print(answer)

Not only is that difficult to read, but it's also harder to see where errors might be arising.  So we can rewrite it and create additional variables to hold small chunks:

In [None]:
x=3
y=5
z=7

# (x**(y/z)-x/((y+2)*z)-x)/(y*z)*x
p = y/z
# (x**p-x/((y+2)*z)-x)/(y*z)*x
q = x**p
# (q-x/((y+2)*z)-x)/(y*z)*x
r = y+2
# (q-x/(r*z)-x)/(y*z)*x
s = r*z
# (q-x/s-x)/(y*z)*x
t = y*z
# (q-x/s-x)/t*x
u = x/s
# (q-u-x)/t*x
v = q-u-x
# v/t*x
w = v/t
# w*x
answer = w*x

print(answer)
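
When rewriting an equation like this, it is worth confirming that the rewrite did not change the answer.  The sketch below recomputes both forms and compares them with `math.isclose`, which allows for tiny floating-point differences; the names `one_line` and `stepwise` are made up for the example.

```python
import math

x, y, z = 3, 5, 7

# The original single-line form
one_line = (x**(y/z) - x/((y+2)*z) - x) / (y*z) * x

# The same calculation, built up from small named pieces
p = y / z
q = x**p
r = y + 2
s = r * z
t = y * z
u = x / s
v = q - u - x
w = v / t
stepwise = w * x

print(one_line, stepwise)
print(math.isclose(one_line, stepwise))  # True
```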

This may seem overengineered, but breaking down the individual terms is helpful in both programming and math, especially when it reveals certain trends, or even ways to rearrange an equation to reduce the overall number of calculations being performed.  This kind of breakdown can also be useful when you begin building larger, more complicated functions, even up to the point of creating entire programs or modules.

### Booleans

Booleans are simply variables that are either `True` or `False`.  They can also be interpreted as `1` and `0`.  Booleans get used all the time in programming, though we may not be constantly aware of them.  

For example, whenever we compare two numbers, the comparison creates a boolean:


In [1]:
3<5

True

In [3]:
3>5

False

In [4]:
3 == 5

False

We can see that the responses for the different comparisons are correct.  $3 < 5$ is true, while $3 > 5$ and $3 == 5$ are both false.  Incidentally, the `==` is intentional.  In Python and C++, `=` *assigns* a value, while `==` *compares* two values.

Booleans get used constantly in things like "if-else statements" or "while loops".
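
A brief sketch of both uses, with made-up variable names (`temperature`, `is_warm`, `countdown`, `steps`).  The last two lines show the "interpreted as `1` and `0`" behaviour mentioned earlier.

```python
temperature = 25

# The comparison produces a boolean, which the if-else then acts on
is_warm = temperature > 20
if is_warm:
    message = "warm"
else:
    message = "cold"
print(is_warm, message)  # True warm

# A while loop keeps running for as long as its condition is True
countdown = 3
steps = 0
while countdown > 0:
    countdown -= 1
    steps += 1
print(steps)  # 3

# Booleans behave like 1 and 0 in arithmetic and comparisons
print(True + True)  # 2
print(False == 0)   # True
```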