<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/CC_BY.png"><br />

Adapted by [Jen Ferguson](https://library.northeastern.edu/about/library-staff-directory/jen-ferguson) from notebooks created by [Nathan Kelber](http://nkelber.com) and Ted Lawless for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/).<br /> See [here](https://docs.tdm-pilot.org/tag/beginner-lessons/) for the original versions. Some contents were adapted from a notebook created by [Sarah Connell](https://library.northeastern.edu/about/library-staff-directory/sarah-connell) for an earlier version of this workshop.<br />
___

# About Jupyter

What is this thing?

**[Jupyter](https://jupyter-notebook.readthedocs.io/en/latest/notebook.html) combines text, data, and code, in a format that runs inside a web browser.**
* 'Jupyter' = JUlia, PYthon, and R - but it's really language-agnostic
* Jupyter Notebooks let *us* create and share workshop content interwoven with interactive, executable code all in the same place.
* Jupyter Notebooks let *you* run code immediately. No need to download or install anything!
* Jupyter Notebooks can connect to a server that has the right environment/dependencies to execute code successfully.

If you're curious about how we have this set up, read on:

* **[Binder](https://binder.constellate.org/)** is the *environment* we are using to access ("run") our Jupyter Notebooks inside a browser. It's kind of like a separate computer in the cloud that makes all of the tools we need to use in Jupyter Notebooks work. We're using [Constellate's](https://constellate.org/) Binder for this session, but anyone can [create their own Binder](https://mybinder.org/).


* We use **[GitHub](https://github.com/jasf-/tdm-nbs)** to store the Jupyter Notebooks (.ipynb files) for this session. Like any other computer, Binder will power down after a period of inactivity. So we need a place to keep our Jupyter Notebooks available, and a way to tell Binder where to find them the next time we start up a Binder session to interact with our notebooks.

## Cells

Similar to the way an essay is composed of paragraphs, Jupyter notebooks are composed of [cells](https://docs.tdm-pilot.org/key-terms/#cell). A cell is like a container for a particular kind of content. There are essentially two kinds of content in Jupyter notebooks:

1. [Markdown Cells](https://docs.tdm-pilot.org/key-terms/#markdown-cell)—These can contain text, images, video, and the other kinds of explanatory content you might find on a regular website. The cell you're reading right now is a markdown cell.
2. [Code Cells](https://docs.tdm-pilot.org/key-terms/#code-cell)—These can contain code written in a variety of languages.

How does this magic happen? There's a kernel, or computational engine, that runs the code inside your notebook. In this case our kernel is Python 3, as you can see in the top right corner of this page under the 'logout' button.

Markdown allows you to have some basic formatting, like *italicizing*, **bolding**, and adding <font color=fuchsia>**colors**</font> to text. For more on markdown, see [this guide](https://www.markdownguide.org/cheat-sheet/).

A **code cell** can be distinguished from a **markdown cell** by the pair of brackets, followed by a colon, that appears to its left.

In [None]:
# This is a code cell 

A markdown cell provides information, but a code cell can be executed to perform an action. The code cell above does not contain any executable content, only a text comment. We can tell the text in the code cell is a comment because it is prefixed by a ``#``. In Python, if a line is prefaced by a ``#`` then that line is a comment and will not be executed if the code is run. In a code cell, comments are bluish-green in color.
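
As a minimal sketch of the difference between a comment and executable code, the cell below pairs comment text with one line that Python actually runs. The variable name `greeting` and its text are assumptions chosen only for illustration.

```python
# This whole line is a comment, so Python skips it
greeting = 'This line is executable code'  # text after a # on the same line is also a comment
print(greeting)
```

Only the assignment and the `print()` call are executed; everything following a `#` is ignored when the cell runs.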

## Modifying a Cell

The text in code cells can be quickly changed just by typing in the cell. In order to change the content of a markdown cell, you need to expose the markdown content by double-clicking the cell. This will reveal the plain text of the markdown that creates various elements like headings, links, images, etc. When you want the cell to render again, you can simply run it by pushing the play button or pressing Ctrl + Enter (Windows) or shift + return (OS X) on your keyboard.

## Hello World: Your First Code

It is traditional in programming education to begin with a program that prints ``Hello World``. In Python, this is a simple task using the ``print()`` [function](https://docs.tdm-pilot.org/key-terms/#function). A function is a block of code that performs some action—we will cover functions in more detail below. This function simply prints out whatever is inside the parentheses. We will **pass** the quotation 'Hello World' to the ``print()`` function like so:

```print('Hello World')```

The code cell below has the ``print()`` function set up to get you started, so all you need to do is write the text you want to print (in this case, 'Hello World'). Here's how:

1. First, select the cell to modify it. The cell will be highlighted in <font color='blue'>**blue**</font> when selected, then will change to <font color='green'>**green**</font> when you begin to edit it
2. Type `'Hello World'` inside the parentheses (including the ' ' marks)

In [None]:
#Fill in 'Hello World!' inside of the '' marks below
print(' ')

Now we're ready to run this code. To **execute** or **run** our code, we have a couple of options:

#### Option One

![Image of play button](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/play_button.png) Click the code cell you wish to run and then push the "Run" button above. Depending on how your notebook is set up, you might not see the word "Run" but you should still see the same triangle symbol.

#### Option Two

Click in the code cell you wish to run and press Ctrl + Enter (Windows) or shift + return (OS X) on your keyboard. Control + return also works on Macs.

#### Go ahead - scroll back up, run that 'Hello World' code cell and see what happens!

### What if I get stuck?
Don't worry, you can't break anything in this notebook! Any changes you make to a Jupyter notebook - any edits - are not saved by default. 

**<font color=red>If you get stuck at any point, you can always reset the notebook by restarting the kernel (the 'refresh' symbol in the toolbar).</font>**

### Outputs, brackets, numbers, and asterisks

As you saw with 'Hello World!', after you run a code cell the output typically appears below it in another cell. The output after you run a cell essentially forms part of the document as you work through it. Note that some code cells will not result in any visible output.

A number will appear inside the brackets to the left of the code cell to show the order the cell was run. If your code is complicated or takes some time to execute, an asterisk * will be displayed in the pair of brackets while the code executes. You'll also see this in action during the text analysis section.

We're going to prove the above principle to ourselves by executing the code cell below which will:

1. Print the phrase "Waiting 10 seconds..."
2. Wait 10 seconds, then
3. Print "Done"

As the program is running, watch the pair of brackets and you will see the code is running `[*]:`.


In [None]:
print('Waiting 10 seconds...')
import time
time.sleep(10)
print('Done')


If you missed the asterisk, you can run the code cell as many times as you like. Notice that each time you run a code cell, the number inside the brackets increases. This keeps track of the order in which cells were run. 

It is usually a good practice to run code cells sequentially from top to bottom to avoid potential errors. 

## Creating and Deleting Cells


![The + symbol to create a new cell](https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/new_cell.png) To create a new cell, click the + at the top of the menu. A new cell will be created immediately underneath the currently selected cell. 
By default, a [code cell](https://docs.tdm-pilot.org/key-terms/#code-cell) is created. To change the cell type, click on the dropdown menu at the top to switch from "Code" to "Markdown." 

Go ahead, try adding a new cell! You can either keep the new blank cell you just added, or remove it by selecting it and then cutting it with the scissors icon next to the + in the menu.


***
# Python basics

Now that we've covered the basics of how Jupyter notebooks work, we're going to switch gears and talk about Python. Python is a widely used computer programming language. We'll cover a few Python basics here, just enough so you can understand some core concepts and run several pre-constructed text analyses to work with your corpus in the next part of this workshop. If you'd like to learn more, Constellate has published [lessons](https://docs.tdm-pilot.org/) running from beginner to intermediate, and there are many additional resources online for learning Python, such as [Python for Everybody](https://www.py4e.com/).

## Expressions and Operators

One very simple form of Python programming is an [expression](https://docs.tdm-pilot.org/key-terms/#expression) using an [operator](https://docs.tdm-pilot.org/key-terms/#operator). For example, you might have a simple mathematical statement like:

> 1 + 3

The [operator](https://docs.tdm-pilot.org/key-terms/#operator) in this case is `+`, sometimes called "plus" or "addition". This particular **[expression](https://docs.tdm-pilot.org/key-terms/#expression)** is a combination of two **values** (1 and 3) and an **operator** (`+`). In Python, **expressions** are combinations of values, operators, and variables (more on this last item soon!). 

In the code block below, try writing an expression that uses the addition operator.

You can also do subtraction, multiplication, and division, among other mathematical operations. To multiply in Python, you use an asterisk (\*) and to divide, you use a forward slash (/). 
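
The operators just described can be sketched as follows. The variable names (`difference`, `product`, `quotient`) and the numbers are assumptions picked for illustration.

```python
# Each line combines two values with one operator
difference = 10 - 4   # subtraction
product = 6 * 7       # multiplication
quotient = 9 / 2      # division

print(difference)  # 6
print(product)     # 42
print(quotient)    # 4.5
```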

You are probably not going to replace the calculator on your phone with Python! But even this simple example is showing you something about how Python works: here, you are creating an **expression** by combining **values** with an **operator** and running the code to produce **output**. 

We're not going to get into it now, but it's worth knowing that mathematical operations in Python follow the "PEMDAS" rules for order of operations: parentheses; exponents; multiplication and division, left to right; and addition and subtraction, left to right.
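
For the curious, here is a small sketch of order of operations in action. The numbers and variable names are assumptions for illustration only.

```python
# Multiplication happens before addition
without_parentheses = 2 + 3 * 4    # 2 + 12

# Parentheses are evaluated first
with_parentheses = (2 + 3) * 4     # 5 * 4

print(without_parentheses)  # 14
print(with_parentheses)     # 20
```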

## Data Types: Numerical (Integers & Floats) and Strings

All [expressions](https://docs.tdm-pilot.org/key-terms/#expression) evaluate to a single value. In the above examples, our expressions evaluated to a single numerical value. 

Numerical values come in two basic forms:

* [integer](https://docs.tdm-pilot.org/key-terms/#integer)
* [float](https://docs.tdm-pilot.org/key-terms/#float) (or floating-point number)

An [integer](https://docs.tdm-pilot.org/key-terms/#integer), what we sometimes call a "whole number", is a number without a decimal point that can be positive or negative. When a value uses a decimal, it is called a [float](https://docs.tdm-pilot.org/key-terms/#float) or floating-point number.

Python can also help us manipulate text. A snippet of text in Python is called a [string](https://docs.tdm-pilot.org/key-terms/#string). A string can be written with single or double quotes. A string can use letters, spaces, line breaks, and numbers. So 5 is an integer and 5.0 is a float, but '5' and '5.0' are strings. A string can also be blank, such as ''. 

The distinction between each of these data types may seem unimportant, but Python treats each one differently. For example, Python considers an integer and a float with the same numerical value to be equal, but it never considers a string equal to an integer or a float, even when they look alike.

To evaluate whether two values are equal, we can use two equals signs between them. The expression will evaluate to either `True` or `False`.

In [None]:
# Run this code cell to determine whether the values are equal
42 == 42.0

In [None]:
# Run this code cell to compare an integer with a string
15 == '15'

When we use the addition operator on integers or floats, they are added to create a sum. When we use the addition operator on strings, they are combined into a single, longer string. This is called [concatenation](https://docs.tdm-pilot.org/key-terms/#concatenation). 

In [None]:
# Combine the strings 'Hello' and 'World'
'Hello' + 'World'

Notice that the strings are combined exactly as they are written, meaning that there is no space between the strings. If we want to include a space, we need to add the space to the end of 'Hello' or the beginning of 'World'. We can also concatenate multiple strings.
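
Both approaches to adding a space can be sketched like this. The variable names are assumptions chosen for illustration.

```python
# Add the space to the end of the first string
with_trailing_space = 'Hello ' + 'World'

# Or concatenate three strings, with the space as its own string
with_separate_space = 'Hello' + ' ' + 'World'

print(with_trailing_space)   # Hello World
print(with_separate_space)   # Hello World
```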

When we use the addition operator, the values must be all numbers or all strings. Combining them will create an error.

In [None]:
# Try adding a string to an integer
'55' + 23

Here, we receive an error because Python doesn't know how to join a string to an integer. Putting this another way, Python is unsure if we want:

>'55' + 23 

to become
>'5523'

or 
>78
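
One way to tell Python which result we want is to convert one of the values first. The sketch below assumes Python's built-in `str()` and `int()` functions, which this notebook does not otherwise cover; the variable names are hypothetical.

```python
# Convert the integer to a string, then concatenate
joined_text = '55' + str(23)

# Convert the string to an integer, then add
numeric_sum = int('55') + 23

print(joined_text)   # 5523
print(numeric_sum)   # 78
```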

Because these data types operate differently, it can be useful to be able to check which type you're working with if a bit of code is giving an error. You can do this with the `type()` function. Try running the code blocks below to check the types for 15, 15.0 and "15".

In [None]:
#Check the type for 15
type(15)

In [None]:
#Check the type for "15"
type("15")
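
The text above also mentions 15.0. A minimal sketch of checking it follows; the variable name `float_type` is an assumption, used here so the result can be printed.

```python
# Check the type for 15.0
float_type = type(15.0)
print(float_type)   # <class 'float'>
```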

## Variables
We noted above that expressions are combinations of values, operators, and variables, and said that we'd be returning to variables. A [variable](https://docs.tdm-pilot.org/key-terms/#variable) is like a container that stores information. There are many kinds of information that can be stored in a variable, including the data types we have already discussed (integers, floats, and strings). We create (or **initialize**) a variable with an [assignment statement](https://docs.tdm-pilot.org/key-terms/#assignment-statement). The assignment statement gives the variable an initial value.


In [None]:
# Initialize an integer variable (note that this code doesn't produce any output; it just establishes the variable)
new_integer_variable = 6

In [None]:
# Add 36 to our integer variable
new_integer_variable + 36

The value of a variable can be overwritten with a new value. You can test this by changing the value in the first code block above, and then re-running both blocks. Or, we can replace the value within the same block, using comments:

In [None]:
# Overwrite the value of my_favorite_number when the commented-out line of code is executed. 
# Remove the # in the line "#my_favorite_number = 2" to turn the line into executable code.

my_favorite_number = 6
#my_favorite_number = 2
my_favorite_number

Whenever you create a new variable, you can always confirm what data type it is with the `type()` function. For example:

In [None]:
#Checking the type of the variable my_favorite_number
type(my_favorite_number)

You can create a variable with almost any name, but a few guidelines are recommended. First, variable names should be clear and descriptive. Consider this bit of code that will compute the number of seconds in 3 days:

In [None]:
# Compute the number of seconds in 3 days
days = 3
hours_in_day = 24
minutes_in_hour = 60
seconds_in_minute = 60

days * hours_in_day * minutes_in_hour * seconds_in_minute

In the code cell above, we created a variable that stores the number of minutes in an hour and called it `minutes_in_hour`. It is a kindness to anyone viewing your code, not to mention your future self, to give your variables names that clearly communicate what value is stored in your variables. From the computer's perspective, we could call the variable almost anything (`potato`, `bananafish`, `flat_tire`). As long as we are consistent, the code will execute the same. 
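
To see that the computer really does not care about names, here is a sketch of the same computation with deliberately unhelpful ones. The name `sock_drawer` is hypothetical and does not come from the text above.

```python
# Same computation as before, but with names that explain nothing
potato = 3
bananafish = 24
flat_tire = 60
sock_drawer = 60   # hypothetical name, added only to complete the example

seconds_in_three_days = potato * bananafish * flat_tire * sock_drawer
print(seconds_in_three_days)   # 259200
```

The result is identical, but a reader can no longer tell what the code is computing.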

You'll also notice that this code cell, like most other code cells in this notebook, is commented in the first line to explain its purpose by using the # symbol, another good practice.

## Variable Naming Rules

In addition to being descriptive, variable names must follow 3 basic rules:

1. Must be one word (no spaces allowed)
2. Only letters, numbers and the underscore character (\_) are allowed
3. Cannot begin with a number

In [None]:
# Which of these variable names are acceptable? 
# Comment out the variables that are not allowed in Python by adding a # in front of each line
# Then, run this cell to check if the variable assignment works. 
# If you get an error, the variable name is not allowed in Python.

a variable = 1
2variable = 2
a_variable = 3
VARIable = 4

## Functions

Many different kinds of programs often need to do very similar operations. Instead of writing the same code over and over again, you can use a [function](https://docs.tdm-pilot.org/key-terms/#function). Essentially, a function is a small snippet of code that can be quickly referenced and reused. 

One of the most common functions used in Python is the `print()` function, which simply prints a string. Above, we used the `print()` function to print 'Hello World!'; now, let's use it to print whatever you would like. Replace the text inside of the quotation marks below with whatever words you would like to print. 

In [None]:
# A print function that prints whatever you tell it to
print('Your words here')

We could also define a variable with our chosen input string and then pass that variable into the `print()` function. It is common for functions to take an input, called an [argument](https://docs.tdm-pilot.org/key-terms/#argument), that is placed inside the parentheses. 

In [None]:
# Define a string and then print it
our_string = 'Your words here'
print(our_string)

Now, let's try a slightly more complex example: defining our own function. Modify the cell below by:
1. Deleting the text that says <font color='red'>**'yourname'**</font> (but keep the quotation marks)
2. Typing your name inside the quotation marks instead, and finally
3. Running the code cell

In [None]:
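# Define a function that builds a greeting for the recipient
def print_hey(recipient):
    return 'Hey, {}!'.format(recipient)

# Call the function; replace 'yourname' with your own name
print_hey('yourname')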

Python 'out-of-the-box' comes with many different useful functions, but for more specialized operations, you will likely need to import **modules** with additional functions. In our next notebook, we will import several functions that will allow us to perform a range of analyses with our datasets.
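
As a small sketch of what importing a module looks like, the cell below uses `math`, a module from Python's standard library. It is chosen only for illustration and is not necessarily one of the modules used in the next notebook; the variable name `square_root` is also an assumption.

```python
# Import the math module from Python's standard library
import math

# Call a function that lives inside the module
square_root = math.sqrt(16)
print(square_root)   # 4.0
```

This is the same pattern as the `import time` line in the "Waiting 10 seconds" cell earlier in this notebook.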

# Next steps

That's the quick introduction to Jupyter and Python!  We've covered values, [expressions](https://docs.tdm-pilot.org/key-terms/#expression), [operators](https://docs.tdm-pilot.org/key-terms/#operator), [variables](https://docs.tdm-pilot.org/key-terms/#variable), [assignment statements](https://docs.tdm-pilot.org/key-terms/#assignment-statement), and [functions](https://docs.tdm-pilot.org/key-terms/#function). 

If you'd like to learn more Python, Constellate has published some great open Jupyter Notebook [lessons](https://docs.constellate.org/) including beginner and intermediate Python. There are many additional resources online for learning Python, such as [Python for Everybody](https://www.py4e.com/).

Next we'll start working with our datasets, using some slightly more complicated commands. If you built your own dataset in Constellate, you'll want to have the ID handy. It looks like the bit highlighted in yellow here:

![ConstellateDatasetID.png](attachment:ConstellateDatasetID.png)



We'll begin the text analysis portion after a short break.

