# Intro to Data Science Lecture 2 - Intro to Command Line, Git, Python, & Jupyter Notebooks
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*

Welcome to our first coding lecture! We will be using Python, a popular data science programming language, in the lectures, homeworks, and projects. As part of Homework 0, you should have already set up Python, IPython, and Jupyter notebooks.

We will first go over Jupyter Notebooks and Python. Then, we will talk about command line interfaces and git.

## How to get this Lecture!

Lectures for this class are available at [https://github.com/datascience-course/2026-datascience-lectures](https://github.com/datascience-course/2026-datascience-lectures).

Ideally, you would use this as a way to practice git. This means you should clone the repository and then pull before each lecture. From the command line, you clone once:
```bash
git clone https://github.com/datascience-course/2026-datascience-lectures
```

You can also use a graphical app like GitHub Desktop:

<img src="githubdesktopclone.png" width=600 />

and then every time you want to update, you pull:
```bash
cd 2026-datascience-lectures
git pull
```

Note that pulling can cause conflicts when you have made local changes, so when I intend to use the Jupyter notebook, I change to a different branch:
```bash
git checkout -b working
```

Then I make my changes and commit:
```bash
git commit -am "Used during lecture."
```

Then, the next time I want to pull, I switch back to my main branch, pull, and rebase my working branch onto it:
```bash
git checkout main
git pull
git checkout working
git rebase main
```

If we have time, we'll talk more about git this lecture. Additional notebooks about version control and command line interfaces are available with this lecture.

If you're struggling with this, there is also a direct download option but that will require downloading a lot of copies! 

<img src="howtogetfiles.png" width=800 />

You can also navigate directly to a file to download it:

<img src="directdownload.png" width=800 />

# Intro to Jupyter Notebooks

Jupyter notebooks will be our main working environment for this class.

You should have already installed it as part of HW0, but you'll also need to start your notebook server.

There are two ways to do this:

1. You can use the command line to navigate to the directory that contains the notebook and then run:  

```bash
$ jupyter notebook
```

2. Or you can use the Anaconda Navigator to launch a notebook server in your home directory and then navigate to this folder: 

![Anaconda Navigator Screenshot](anaconda_navigator.png)

## Jupyter Notebook Basics

First, let's get familiar with Jupyter Notebooks. 

Notebooks are made up of "cells" that can contain text or code. Notebooks also show you output of the code right below a code cell. These words are written in a text cell using a simple formatting dialect called [markdown](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html). 

Double click on this cell text or press enter while the cell is selected to see how it is formatted and change it. We can make words *italic* or **bold** or add [links](http://datasciencecourse.net) or include pictures:

![Data science cat](datasciencecat.jpg)

The content of the notebook, as you edit in your browser, is written to the `.ipynb` file we provided. 

For your homeworks, you will write in `homework1.ipynb` files for which we will give you a template. You then create a zip archive of this file (and all relevant additional files) and submit it to Canvas. 

If you want to read up on Notebooks in detail, check out the [excellent documentation](http://jupyter-notebook.readthedocs.io/en/latest/notebook.html).

## Google Colab
An alternative to native Jupyter Notebooks is a cloud-hosted Google Colab notebook. Google Colab is largely identical to Jupyter notebooks on your local computer, though there are some differences when it comes to loading data from files. We generally recommend that you work on your homeworks and review lectures in local Jupyter Notebooks. Google Colab could be a good option for your final project, though, because it works well for collaborative work, an area where Jupyter Notebooks themselves aren't so great because proper version control on them is difficult.


# Running Python

Now it's time to run Python!

## Executing your first program

Open a terminal and execute:

```bash
$ python
```

You'll see something like this:

```bash
$ python
Python 3.13.9 (v3.13.9:8183fa5e3f7, Oct 14 2025, 10:27:13) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
What does this tell us? It shows us the version number of Python (3.13.9). Some of you might see that yours is installed through Anaconda. At the end of this output, you see three `>` signs (`>>>`): these indicate a prompt, which looks different from your console prompt (`$` or `%`) to indicate that you're in an interactive Python environment.

There are two fundamental ways you can run Python: in interactive mode (what we're doing here) or in batch mode. 

In interactive mode you write your program interactively, i.e., each new statement is interpreted as you type it. 

If you just run ```python``` without any other parameter, you enter the **interactive** mode. Let's write our very first program:

```python
>>> print("Hello World!")
Hello World!
```

**Note:** If you copy this code, don't include the leading `>>>`. We only show these here because it allows us to distinguish input from output.

"Hello World!" is by tradition the very first program that you should write in a new programming language! And see, when we instructed python to print the text "Hello World!", it did just that.

So, let's briefly take that statement apart: it contains a call to the `print()` function and passes a parameter to that print function, the string `Hello World!`. 

The string is enclosed in quotation marks `"`; alternatively you can also use single quotes `'`. Given that information, python knows you want to print the string, and it does exactly that.
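As a small sketch of the two quoting styles (the example sentences are made up for illustration), both kinds of quotes create the same string, and using one kind lets you include the other inside the text:

```python
# Double and single quotes create the same kind of string
print("Hello World!")
print('Hello World!')

# Using one kind of quote lets you include the other kind in the text
print("It's easy to include an apostrophe inside double quotes")
print('Single quotes make it easy to include "double quotes"')
```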

`print()` is a built-in function of Python. There are many useful built-in functions, which you can check out [here](https://docs.python.org/3/library/functions.html).

If you're familiar with Python, you might have seen this syntax:

```python
>>> print "Hello World!"
```

This is Python 2 syntax and is no longer legal in Python 3: all arguments of a function now have to be passed in parentheses. Python 2 is now [officially retired](https://www.python.org/doc/sunset-python-2/), so you should not be using it anymore.  

Let's define our first variable. Type

```python
>>> my_string_var = "Are you still spinning?"
```

This statement is executed without any feedback. What you're doing here, intuitively, is that first, you create a new variable of type string with the name ```my_string_var```, and then you assign a value to it, "Are you still spinning?".


Note that the equals sign `=` is NOT a test for equality here, but an ASSIGNMENT. This can be confusing for beginning programmers. 

Equality is tested with a double equals sign `==` in many programming languages including python. Arguably, a different assignment operator such as `:=` would be a better idea and is implemented in other programming languages.
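To make the difference concrete, here is a minimal sketch. The variable names `same_text` and `different_text` are only chosen for this illustration:

```python
# A single = assigns a value to a variable
my_string_var = "Are you still spinning?"

# A double == compares two values and gives back True or False
same_text = my_string_var == "Are you still spinning?"
different_text = my_string_var == "Hello World!"

print(same_text)       # prints True
print(different_text)  # prints False
```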

We can now print `my_string_var`:

```python
>>> print(my_string_var)
Are you still spinning?
```

which produces the result we expected!
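Everything so far ran in interactive mode. For the batch mode mentioned earlier, you would instead save the statements in a file and hand that file to the interpreter, for example with `python spinning.py` in a terminal. The file name `spinning.py` is made up for this sketch:

```python
# Contents of a hypothetical file named spinning.py
my_string_var = "Are you still spinning?"
print(my_string_var)
```

In batch mode there is no `>>>` prompt: Python runs the statements from top to bottom and then exits.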

There are many different types of variables, not only strings. For example, Python has three different data types for numbers (integers, floats – that represent real numbers, and complex). Check out the details about the built-in data types [here](https://docs.python.org/3/library/stdtypes.html).

Let's start with a simple example:

```python
>>> a = 3
>>> b = 2.5
>>> c = a + b
>>> print(c)
5.5
```

Here we've created three variables (`a, b, c`) and executed an operation, the addition of `a` and `b` using the `+` operator, which we have then assigned to `c`. Finally, we've printed `c`.

The data types of `a` and `b`, however, are subtly different. `a` is an integer and `b` is a float. We can check the data type of any variable using the `type()` function:

```python
>>> a = 3
>>> type(a)
<class 'int'>
>>> b = 2.5
>>> type(b)
<class 'float'>
>>> c = "hello"
>>> type(c)
<class 'str'>
```

Python supports many operations, including mathematical operations (addition, subtraction, division, modulo), type conversions, etc. – we'll explore those soon.
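As a quick preview, here is a small sketch of a few of these operations. The values are arbitrary and only chosen for illustration:

```python
a = 7
b = 2

print(a + b)   # addition, prints 9
print(a - b)   # subtraction, prints 5
print(a / b)   # division, prints 3.5
print(a // b)  # integer (floor) division, prints 3
print(a % b)   # modulo (remainder), prints 1

# Type conversions
print(int("42") + 1)      # string to integer, prints 43
print(float(3))           # integer to float, prints 3.0
print(str(2.5) + " kg")   # float to string, prints 2.5 kg
```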

## Writing Code

The most interesting aspect of notebooks, however, is that we can write code in the cells. You can use [many different programming languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) in Jupyter notebooks, but we'll stick to Python. So, let's try it out:

In [2]:
print("Hello World! new")
a = 3
# This is a comment!
# The value of the last expression in a cell becomes the cell's output
a

Hello World! new


3

Again, we've greeted the world out there using the `print()` function. 

We also assigned a variable and ended the cell with it, which makes its value the output of this cell. Notice that the output here is directly written into the notebook. 

You can change something in a cell and re-run it using the "run cell" button in the toolbar, or use the `CTRL/CMD+ENTER` shortcut.

Another cool thing about cells is that they preserve the state of what happened before. Let's initialize a couple of variables in the next cell: 

In [4]:
age = 2
gender = "woman"
name = "Datascience Cat"
smart = True

These variables are now available to all cells below or above **if you executed the cell**. In practice, you should never rely on a variable from a lower cell in an earlier cell. **This behavior is different from if you were to execute the content of the cells in sequence in a python file.**

If you make a change to a cell, you need to execute it again. You can also batch-execute multiple cells using the "Cell" menu in the toolbar. 

Let's do something with the variables we just defined:

In [5]:
print(name + ", age: " + str(age) + ", " + 
       gender + ", is smart: " + str(smart))

Datascience Cat, age: 2, woman, is smart: True


In the previous cell, we've [concatenated a couple of strings](https://docs.python.org/3.5/tutorial/introduction.html#strings) to produce one longer string using the `+` operator. Also, we had to call the `str()` function to get [string representations of these variables](https://docs.python.org/3.5/library/stdtypes.html#str).

An alternative way to do this is not to concatenate the string but to pass each variable in as a separate argument to the print function: 

In [6]:
print(name, "\n",
       "age:", age, "\n",
       gender, "\n",
       "is smart:", smart)

Datascience Cat 
 age: 2 
 woman 
 is smart: True


Here, we're using a new-line character "\n" to break the lines. 
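The leading spaces in the output above appear because `print()` separates its arguments with a single space by default. As a sketch of an alternative, the `sep` parameter of `print()` lets us choose a different separator, so we don't have to pass `"\n"` as an argument ourselves:

```python
name = "Datascience Cat"
age = 2
gender = "woman"

# Default behavior: arguments are separated by a single space
print(name, "age:", age)
# prints: Datascience Cat age: 2

# With sep="\n", every argument starts on its own line, without leading spaces
print(name, "age: " + str(age), gender, sep="\n")
```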

### Try it!

1. Create a Python cell below.
2. Create two variables, one for your UID and one for your email. What are the types of these variables?
3. Modify the above print statement to add your UID and email to the print-out.

In [9]:
UID = 1594325
email = "jacobscottutah@gmail.com"
print(name, "\n",
       "age:", age, "\n",
       gender, "\n",
       "is smart:", smart, "\n", "UID:", UID, "\n", "email:", email)

Datascience Cat 
 age: 2 
 woman 
 is smart: True 
 UID: 1594325 
 email: jacobscottutah@gmail.com


## Modes

Notebooks have two modes, a **command mode** and **edit mode**. You can see which mode you're in by the color of the cell, the presence of borders, and the presence of a cursor.

Many operations depend on your mode. For code cells, you can switch into edit mode with "Enter", and get out of it with "Escape".


## Shortcuts

While you can always use the toolbar above, you'll be much more efficient if you use a couple of shortcuts. The most important ones are:

**`Ctrl/Cmd+Enter`** runs the current cell.  
**`Shift+Enter`** runs the current cell and jumps to the next cell.   
**`Alt/Option+Enter`** runs the cell and adds a new one below it.

In command mode:

**`h`** shows a help menu with all these commands.  
**`a`** adds a cell before the current cell.  
**`b`** adds a cell after the current cell.  
**`dd`** deletes a cell.  
**`m`** as in **m**arkdown, switches a cell to markdown mode.  
**`y`** as in p**y**thon switches a cell to code.  

## Kernels

When you [run code](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Running%20Code.html), the code is actually executed in a **kernel**. You can do bad things to a kernel: you can get it stuck in an endless loop, crash it, corrupt it, etc. And you probably will do all of these things :). 

So sometimes you might have to interrupt your kernel or restart it. Use the "Kernel" menu to restart the kernel, re-run your notebook, etc.

Also, before submitting a homework or a project, make sure to `Restart and Run All`. This will create a clean run of your project, without any side effects that you might encounter during development. We want you to submit the homeworks **with output**, and by doing that you will make sure that we actually can also execute your code properly.

## Storing Output

Notebooks contain both the input to a computation and its outputs. If you run a notebook, all the outputs generated by the code cells are also stored in the notebook. That way, you can look at notebooks also in non-interactive environments, like your first homework on [GitHub](https://github.com/datascience-course/2023-datascience-homework/blob/main/HW1/HW1.ipynb). 

The Notebook itself is stored in a rather ugly format containing the text, code, and the output. As discussed, this can sometimes be challenging when working with version control.
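To give a rough idea of what that format looks like: an `.ipynb` file is a JSON document. The sketch below builds a tiny, hand-written notebook structure and prints it as text. The cell contents are made up for illustration, and the fields shown are a simplified subset of what Jupyter actually writes:

```python
import json

# A hand-written, simplified notebook with one text cell and one code cell
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(\"Hello World!\")"],
            # The output is stored right next to the code that produced it
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["Hello World!\n"]}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# This text is roughly what ends up in the .ipynb file
as_text = json.dumps(notebook, indent=1)
print(as_text)

# Reading it back gives us the same structure again
loaded = json.loads(as_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)  # prints ['markdown', 'code']
```

Because code, text, and outputs all live in one JSON file, even re-running a cell changes the file, which is why diffs in version control can get noisy.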

# Python Basics

## Functions

In math, functions transform an input to an output as defined by the property of the function, like this: 

$f(x) = x^2 + 3$

In programming, functions can do exactly this, but are also used to execute “subroutines”, i.e., to execute pieces of code in various order and under various conditions. Functions in programming are very important for structuring and modularizing code. 

In computer science, functions are also called “procedures” and “methods” (there are subtle distinctions, but nothing we need to worry about at this time). 

The following Python function, for example, provides the output of the above defined function for every valid input: 

In [10]:
def f(x):
    result = x ** 2 + 3 
    return result

We can now run this function with multiple input values: 

In [11]:
print(f(2))
print(f(3))
f(5)

7
12


28

Let's take a look at this function. The first line
```python
def f(x):
```
defines the function of name `f` using the `def` keyword. The name we use (`f` here) is largely arbitrary, but following good software engineering practices it should be something meaningful. So instead of `f`, **`square_plus_three` would be a better function name in this case**.  

After the function name follows a list of parameters, in parentheses. In this case we define that the function takes only one parameter, `x`, but we could also define multiple parameters like this:
```python 
def f(x, y, z):
```

The parameters are then available as local variables within the function.

The second line does the actual computation and assigns it to a **local variable** called `result`. 

The third line uses the `return` keyword to return the result variable. Functions can have a return value that we can assign to a variable. For example, here we could write: 

```python
my_result = f(10)
``` 

Which would assign the return value of the function to the variable `my_result`.
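Putting these pieces together, here is a minimal sketch that uses the more descriptive name suggested above. The three-parameter function `weighted_sum` is hypothetical and exists only to illustrate multiple parameters:

```python
def square_plus_three(x):
    result = x ** 2 + 3
    return result

# Hypothetical function with three parameters, only for illustration
def weighted_sum(x, y, z):
    return x + 2 * y + 3 * z

# The return value can be stored in a variable and used later
my_result = square_plus_three(10)
print(my_result)               # prints 103

# Arguments are matched to parameters by position: x=1, y=2, z=3
print(weighted_sum(1, 2, 3))   # prints 14
```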

Note that the lines of code that belong to a function are **indented by four spaces** (you can hit tab to indent, but it will be converted to four spaces). Python defines the scope of a function using indentation. Many other programming languages use curly brackets `{}` to do this. 

A function's body ends at the first line that is no longer indented.

For example, the same function wouldn't work like this:

In [None]:
def f(x):
    result = x ** 2 + 3
# Throws a SyntaxError because return is used outside a function
return result

Equally, we can't indent by too much:

In [None]:
def f(x):
    result = x ** 2 + 3
    # Throws an IndentationError
        return result

### Try it!

1. Create a Python cell below.
2. Define a new function that takes two variables, `x` and `y` and prints the one divided by the other.
3. Test your function with multiple input values, printing the answer.
4. What happens when you try to divide by zero?

In [14]:
def newF(x, y):
    return x / y

print(newF(4,2))
print(newF(100,2))
print(newF(1,0))

2.0
50.0


ZeroDivisionError: division by zero

## Scope

Another critical concept when working with functions is to understand the scope of a variable. Scope defines under which circumstances a variable is accessible. For example, in the following code snippet we cannot access the variable defined inside a function:

In [15]:
def scope_test():
    function_scope = "only readable in here"
    # Within the function, we can use the variable we have defined
    print("Within function: " + function_scope)

# calling the function, which will print     
scope_test()

Within function: only readable in here


If we try to use the `function_scope` variable outside of the function, we will find that it is not defined. 

This will throw a `NameError`, because Python doesn't know about that variable here.

In [16]:
print("Outside function: " + function_scope)

NameError: name 'function_scope' is not defined

You might wonder “Why is that? Wouldn't it make sense to have access to variables wherever I need access?”. The reason for scoping is that it's simply much easier to **build reliable software when we modularize code**. When we use a function, we shouldn't have to worry about its internals. 

Another practical reason is that this way we can **re-use variable names** that were used in other places. This is really important when we work with other people's code (e.g., libraries). If that weren't possible, we might get nasty side-effects just because the library uses a variable with the same name somewhere. 
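A minimal sketch of this re-use follows. The names `total` and `add_up` are made up for illustration:

```python
total = 100  # defined in the outer (notebook-level) scope

def add_up(a, b):
    # This creates a new local variable called total;
    # the outer total is not touched
    total = a + b
    return total

print(add_up(1, 2))  # prints 3
print(total)         # prints 100, the outer variable is unchanged
```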

You can, however, use variables defined in the larger scope in the sub-scope:

In [17]:
name = "Science Cat"

def print_name_with_dr():
    print("Dr.", name)
    
print_name_with_dr()

Dr. Science Cat


This is generally **not considered good practice** – functions should rely only on their input parameters. Otherwise it can easily lead to side effects. This would be the better approach: 

In [18]:
# Note that the parameter is also called name, like the variable defined in the previous cell.
def print_name_with_dr(name):
    print("Dr.", name)
    
print_name_with_dr(name)

Dr. Science Cat


Finally, there is a way to define a variable within a function for use outside its scope by using the `global` keyword. There are reasons to do this, but it is generally discouraged.

In [19]:
def scope_test():
    # Think long and hard before you do this - generally you shouldn't. I have never.
    global global_scope
    global_scope = "defined in the function, global scope"
    # Within the function, we can use the variable we have defined
    print("Within function: " + global_scope)

scope_test()
# Since this is defined as global we can also print the variable here
print("Outside function: " + global_scope)

Within function: defined in the function, global scope
Outside function: defined in the function, global scope


### Try it!

1. Create a Python cell below.
2. In the cell, define a variable called `x` and set its value to `2`.
3. Create three functions, all of which calculate `x + 7`:
    * The first function should use `x` without defining it.
    * The second function should have a parameter named `x`. 
    * The third function should redefine `x` inside of it to be `3`.
4. When you try each function, what is the result? What is the value of the `x` outside the function?

## Looking Ahead: Conditions, Loops, Advanced Data Types

We've learned how to execute operations and call and define functions. In the next lecture, we'll learn how we can control the flow of execution in a program using conditions (if statements) and loops. We'll also introduce more advanced data types such as lists and dictionaries. 