# Introduction to Data Science. Lecture 2: Notebooks and Python Basics
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*

Welcome to your first Jupyter notebook! This will be our main working environment for this class.

# Jupyter Notebook Basics

First, let's get familiar with Jupyter Notebooks. 

Notebooks are made up of "cells" that can contain text or code. Notebooks also show you the output of the code right below a code cell. These words are written in a text cell using a simple formatting dialect called [markdown](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html). 

Double click on this cell text or press enter while the cell is selected to see how it is formatted and change it. We can make words *italic* or **bold** or add [links](http://datasciencecourse.net) or include pictures:

![Sample picture](decline.png)

The content of the notebook, as you edit in your browser, is written to the `.ipynb` file we provided. 

If you want to read up on notebooks in detail, check out the [excellent documentation](http://jupyter-notebook.readthedocs.io/en/latest/notebook.html).

The most interesting aspect of notebooks, however, is that we can write code in the cells. You can use [many different programming languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) in Jupyter notebooks, but we'll stick to Python. So, let's try it out:

In [None]:
print("Hello World!")
a = 3
# the value of the last expression in a cell is shown as the cell's output
a

Again, we've greeted the world out there, this time with a call to the `print` function. 

We also assigned a variable and put it on the last line of the cell, which makes its value the output of this cell. Notice that the output here is directly written into the notebook. 

You can change something in a cell and re-run it using the "run cell" button in the toolbar. 

Another cool thing about cells is that they preserve the state of what happened before. Let's initialize a couple of variables in the next cell: 

In [None]:
age = 2
gender = "female"
name = "Datascience Cat"
smart = True

These variables are now available to all cells below or above **if you executed the cell**. In practice, a cell should never rely on a variable that is defined in a cell further down. 

If you make a change to a cell, you need to execute it again. You can also batch-execute multiple cells using the "Cell" menu in the toolbar. 

Let's do something with the variables we just defined:

In [None]:
print(name + ", age: " + str(age) + ", " + 
      gender + ", is smart: " + str(smart))

In the previous cell, we've [concatenated a couple of strings](https://docs.python.org/3.5/tutorial/introduction.html#strings) to produce one longer string using the `+` operator. Also, we had to call the `str()` function to get [string representations of these variables](https://docs.python.org/3.5/library/stdtypes.html#str).
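
To see why the `str()` calls are needed, here is a minimal sketch. The names `cat_name`, `cat_age`, and `label` are made up for this illustration so that they don't overwrite the variables above. Concatenating a string and an integer with `+` raises a `TypeError`; converting the number first works. The sketch uses `try`/`except`, which we haven't covered yet, only so that the cell keeps running after the error.

```python
# Hypothetical variables, for illustration only
cat_name = "Datascience Cat"
cat_age = 2

try:
    # str + int is not defined, so Python raises a TypeError here
    label = cat_name + ", age: " + cat_age
except TypeError as error:
    print("Concatenation failed:", error)

# Converting the integer to a string first makes the concatenation valid
label = cat_name + ", age: " + str(cat_age)
print(label)
```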

An alternative way to do this is not to concatenate the strings but to pass each variable in as a separate argument to the print function: 

In [None]:
# print converts each argument to a string itself, so str() and + are not needed here
print(name, ", age: ", age, ", ", gender, ", is smart: ", smart, sep="")
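
A third option, not used in the cells above, is a formatted string literal (an "f-string"). This is only a sketch and assumes Python 3.6 or newer; the variable names are hypothetical stand-ins for the ones defined earlier.

```python
# Hypothetical values mirroring the variables defined earlier
cat_name = "Datascience Cat"
cat_age = 2
cat_gender = "female"
cat_smart = True

# Expressions inside {} are converted to strings automatically
description = f"{cat_name}, age: {cat_age}, {cat_gender}, is smart: {cat_smart}"
print(description)
```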

## Modes

Notebooks have two modes, a **command mode** and **edit mode**. You can see which mode you're in by the color of the cell: 
 * **green** means edit mode, 
 * **blue** means command mode. 
 
Many operations depend on your mode. You can switch into edit mode with "Enter", and get out of it with "Escape".


## Shortcuts

While you can always use the toolbar above, you'll be much more efficient if you use a couple of shortcuts. The most important ones are:

**`Ctrl+Enter`** runs the current cell.  
**`Shift+Enter`** runs the current cell and jumps to the next cell.   
**`Alt+Enter`** runs the cell and adds a new one below it.

In command mode:  
**`h`** shows a help menu with all these commands.  
**`a`** adds a cell before the current cell.  
**`b`** adds a cell after the current cell.  
**`dd`** deletes a cell.  
**`m`** as in **m**arkdown, switches a cell to markdown mode.  
**`y`** as in p**y**thon switches a cell to code.  

## Kernels

When you [run code](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Running%20Code.html), the code is actually executed in a **kernel**. You can do bad things to a kernel: you can get it stuck in an endless loop, crash it, corrupt it, etc. And you probably will do all of these things :). So sometimes you might have to interrupt your kernel or restart it. Use the "Kernel" menu to restart the kernel, re-run your notebook, etc.

Also, before submitting a homework or a project, make sure to `Restart and Run All`. This will create a clean run of your project, without any side effects that you might encounter during development. We want you to submit the homeworks **with output**, and by doing that you will make sure that we actually can also execute your code properly.

## Storing Output

Notebooks contain both the input to a computation and its outputs. If you run a notebook, all the outputs generated by the code cells are also stored in the notebook. That way, you can also look at notebooks in non-interactive environments, like on [GitHub for this notebook](https://github.com/datascience-course/2018-datascience-lectures/blob/master/02-basic-python/lecture-02-notebook.ipynb). 

The Notebook itself is stored in a rather ugly format containing the text, code, and the output. This can sometimes be challenging when working with version control.
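
The format in question is JSON. As a rough sketch of what it looks like, the following cell builds a tiny notebook-like structure by hand and prints it. The field names are meant to follow version 4 of the notebook file format, but the content is made up for illustration, and real `.ipynb` files carry more metadata than shown here.

```python
import json

# A hand-written, simplified stand-in for the contents of an .ipynb file
tiny_notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A heading in a text cell"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('Hello World!')"],
            # The output is stored right next to the code that produced it
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["Hello World!\n"]}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

# Serialize to the kind of JSON text that would be written to disk
notebook_text = json.dumps(tiny_notebook, indent=1)
print(notebook_text)
```

Because text, code, and output all live in one JSON document, re-running a cell can change the file even when the code itself is unchanged, which is what makes diffs in version control noisy.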

### Exercise 2: Creating Cells, Executing Code

1. Create a new code cell below where you define variables containing your name, your age in years, and your major.
2. Create another cell that uses these variables and prints a concatenated string stating your name, major, and your age in years, months, and days (assuming today is your birthday ;)). The output should look like this:

```
Name: Science Cat, Major: Computer Science, Age: 94 years, or 1128 months, or 34310 days. 
```

# Python Basics

## Functions

In math, functions transform an input to an output according to the definition of the function. 

You probably remember functions defined like this:

$f(x) = x^2 + 3$

In programming, functions can do exactly this, but are also used to execute "subroutines", i.e., to execute pieces of code in various orders and under various conditions. Functions in programming are very important for structuring and modularizing code. 

In computer science, functions are also called "procedures" and "methods" (there are subtle distinctions, but nothing we need to worry about at this time). 

The following Python function, for example, provides the output of the above defined function for every valid input: 

In [None]:
def f(x):
    result = x ** 2 + 3 
    return result

We can now run this function with multiple input values: 

In [None]:
print(f(2))
print(f(3))
f(5)

Let's take a look at this function. The first line
```python
def f(x):
```
defines the function of name `f` using the `def` keyword. The name we use (`f` here) is largely arbitrary, but following good software engineering practices it should be something meaningful. So instead of `f`, **`square_plus_three` would be a better function name in this case**.  

After the function name follows a list of parameters, in parentheses. In this case we define that the function takes only one parameter, `x`, but we could also define multiple parameters like this:
```python 
def f(x, y, z):
```

The parameters are then available as local variables within the function.

The second line does the actual computation and assigns it to a **local variable** called `result`. 

The third line uses the `return` keyword to return the result variable. Functions can have a return value that we can assign to a variable. For example, here we could write: 

```python
my_result = f(10)
``` 

Which would assign the return value of the function to the variable `my_result`.
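
As a minimal sketch that combines the points above, the following function uses a descriptive name, takes several parameters, and returns a value that we store in a variable. The name `scaled_sum` and the variable `scaled_result` are made up for this illustration.

```python
def scaled_sum(x, y, z):
    # x, y, and z are available as local variables inside the function
    result = (x + y) * z
    return result

# The return value is assigned to a variable and can be used later
scaled_result = scaled_sum(2, 3, 4)
print(scaled_result)  # prints 20
```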

Note that the lines of code that belong to a function are **indented by four spaces** (you can hit tab to indent, but it will be converted to four spaces). Python defines the scope of a function using indentation. Many other programming languages use curly brackets {} to do this. A function ends at the first line that is no longer indented.

For example, the same function wouldn't work like this:

In [None]:
def f(x):
    result = x ** 2 + 3
# Raises a SyntaxError ('return' outside function), because the next line is no longer part of f
return result

Equally, we can't indent by too much:

In [None]:
def f(x):
    result = x ** 2 + 3
    # Throws an IndentationError
        return result

## Scope

Another critical concept when working with functions is to understand the scope of a variable. Scope defines under which circumstances a variable is accessible. For example, in the following code snippet we cannot access the variable defined inside a function:

In [None]:
def scope_test():
    function_scope = "only readable in here"
    # Within the function, we can use the variable we have defined
    print("Within function: " + function_scope)

# calling the function, which will print     
scope_test()

In [None]:
# If we try to use the function_scope variable outside of the function, we will find that it is not defined. 
# This will throw a NameError, because Python doesn't know about that variable here
print("Outside function: " + function_scope)

You might wonder "Why is that? Wouldn't it make sense to have access to variables wherever I need access?". The reason for scoping is that it's simply much easier to **build reliable software when we modularize code**. When we use a function, we shouldn't have to worry about its internals. 

Another practical reason is that this way we can **re-use variable names** that were used in other places. This is really important when we work with other people's code (e.g., libraries). If that weren't possible, we might get nasty side effects just because the library uses a variable with the same name somewhere. 
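
A small sketch of this point, with names invented for the illustration: the function below uses a local variable called `total`, and the variable of the same name outside the function is left untouched.

```python
total = 100  # variable in the outer (notebook) scope

def add_up(first, second):
    # This 'total' is a new local variable; it does not overwrite the outer one
    total = first + second
    return total

print(add_up(1, 2))  # prints 3
print(total)         # prints 100, the outer variable is unchanged
```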

You can, however, use variables defined in the larger scope in the sub-scope:

In [None]:
name = "Science Cat"

def print_name_with_dr():
    print("Dr.", name)
    
print_name_with_dr()

This is generally **not considered good practice** - functions should rely on their input parameters. Otherwise it can easily lead to side effects. This would be the better approach: 

In [None]:
# notice that we're re-using the parameter name
def print_name_with_dr(name):
    print("Dr.", name)
    
print_name_with_dr(name)

Finally, there is a way to define a variable within a function for use outside its scope by using the `global` keyword. There are occasionally reasons to do this, but it is generally discouraged.

In [None]:
def scope_test():
    # Think long and hard before you do this - generally you shouldn't. I have never.
    global global_scope
    global_scope = "defined in the function, global scope"
    # Within the function, we can use the variable we have defined
    print("Within function: " + global_scope)

scope_test()
# Since this is defined as global we can also print the variable here
print("Outside function: " + global_scope)
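
A sketch of the usual alternative: instead of declaring a global variable, the function returns the value and the caller decides where to store it. The names `make_message`, `local_message`, and `message` are hypothetical.

```python
def make_message():
    # Build the value locally and hand it back to the caller
    local_message = "defined in the function, returned to the caller"
    return local_message

# The caller chooses the variable name; no global declaration is needed
message = make_message()
print("Outside function: " + message)
```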

### Exercise 3: Functions
Write a function that 
 * takes two numerical variables
 * multiplies them with each other
 * divides the product by a numerical variable defined in the scope outside the function
 * and returns the result. 
 
Print the result of the function for three different sets of input variables. 

## Looking Ahead: Conditions, Loops, Advanced Data Types

We've learned how to execute operations and how to call and define functions. In the next lecture, we'll learn how we can control the flow of execution in a program using conditions (if statements) and loops. We'll also introduce more advanced data types such as lists and dictionaries. 