# JupyterHub

### Accessing JupyterHub

Go to https://jhub.dartmouth.edu/  
Log in using your university credentials  
Click "start my server" 
Select "COLT 70 -- Spring 2022"  
It may take some time for the server to start up  
The notebooks are in the "notebooks" directory on the left  
Double click the notebook you want to open

### Editing a Notebook

At this point you can only view a notebook. In order to be able to edit the notebook:  
> right click on the notebook file (listed on the left) > copy > open your home directory (by clicking the directory icon above the list of notebooks) > paste the notebook  

You should now be able to edit and run the cells of the notebook

### Uploading files

To upload files:  
> make sure you are in your home directory (where you paste the writable/editable notebooks) > click the upload icon > select your files to upload

# Introduction to Notebooks  

This is a notebook! A notebook is composed of different cells which can be used to write markdown text or computer code.

This is a "Markdown" cell for writing text, inserting images, videos, links etc. 

In [3]:
## This is a code cell

N.B. You can select the cell type at the top of this document with the drop down menu. 

Jupyter notebooks therefore allow you to combine code, text, images, visualizations etc. all in one place. You can edit and run code in a notebook which makes it an ideal place to play around and test out code. The possibilities of combining code and text afforded by Notebooks are not only useful as a pedagogical and learning environment, but can also offer a way to make our analytical process explicit and to reflect on our analytical process:

> "Notebooks are theory — not merely code as theory but theory as thoughtful engagement with the theoretical work and implications of the code itself. Disciplinary norms— including contextual framing, theory, and self or auto-critique— need to accompany, supplement, and inform any computational criticism. Revealing as much of the code, data, and methods as possible is essential to enable the ongoing disciplinary conversation. Compiling these together in a single object, one that can be exported, shared, examined, and executed by others, produces a dynamic type of theorization that is modular yet tightly bound up with its object." (Dobson, James. Critical Digital Humanities: The Search for a Methodology. Urbana-Champaign: University of Illinois Press. (2019) p. 40)

### Editing and running code in notebooks

To edit a cell double click the cell. 

To run a code cell (or render a Markdown cell):  
> select the cell (a blue line will appear on the left when the cell is selected) and click the "play" icon at the top of the notebook or press shift+enter.

Note that the code cells have a pair of square brackets with a colon next to them [ ]:  
Once you run a code cell a number will appear in the brackets. The number goes up by one each time any cell is run, so it shows the order in which cells were run and helps you keep track of which cells have already been run.  

Try running the code cell above a couple of times and see the number in square brackets change.  

Now run the cell below:

In [4]:
print('Waiting 5 seconds...')
import time
time.sleep(5)
print('Done')

Waiting 5 seconds...
Done


Did you notice the asterisk in the brackets when running the cell? An asterisk displays whilst code is still busy executing, and the number appears when it has finished executing.  

Beneath the code cell, any output from running the code will appear. In the case of the cell above, "Waiting 5 seconds..." and "Done" were printed beneath the cell.

### Clearing outputs: interrupting and restarting the kernel

If your code seems to be getting stuck, or if the flow of execution of the cells has become mixed up, it’s a good idea to make a fresh start and clear all the outputs. 

To clear the outputs of a single cell:  
> select the cell > click Edit tab > select clear output  

To clear the output from all cells:  
> Edit tab > Clear All Outputs

Interrupt and restart the kernel:  
> You can click the “stop” and “restart” icons at the top of the notebook (next to the “play” icon)
or  
> Select Kernel tab and select the options you want (Interrupt Kernel, Restart Kernel).  

Clearing outputs only removes the output displayed beneath the cells; any variables, imports and functions from cells you have already run stay in memory. Restarting the kernel restarts the backend component that actually runs the code written in the notebook, so everything held in memory is cleared and the cells need to be run again. Notebooks are documents that you view and edit in your browser, while the code itself runs on a server. On JupyterHub this is the server that was started for you after you logged in, not your personal computer (that’s why it sometimes takes some time to load). Because you work on your own copy of a notebook in your home directory, you can edit it without altering the “original” notebook. If you restart the kernel it refreshes the environment which runs your notebook.

# Introduction to Python

Writing code involves describing a series of steps (detailed instructions) to go through in order to perform a task. Code allows you to specify instructions that the computer will follow. Whilst programming languages are in some ways very rigid — the *exact* syntax and instructions need to be defined in order for it to work — this doesn’t mean there isn’t some degree of flexibility. There are multiple ways of achieving something through code, and people have different coding styles that might differ according to their goals (e.g. is the code efficient? is the code readable? etc.).

### Anatomy of a Python Script

Here is an example of a programme. The end goal (output) of this programme is to count the most frequent words in a given text. We're going to go through this example script and look at some basics of the Python programming language.

In [18]:
#Importing the libraries and modules we'll need
import re
from collections import Counter

#Defining a function to split our text into words
def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

#Defining variables we're going to need
#Giving the path to the text we're going to analyze
filepath_of_text = "gilman.txt"

number_of_desired_words = 40

#Defining a list of stopwords
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

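#Opening and reading the text to analyze it
#(the "with" block closes the file automatically once it has been read)
with open(filepath_of_text, encoding="utf-8") as text_file:
    full_text = text_file.read()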

#Splitting the text into words by using our previously defined function
all_the_words = split_into_words(full_text)

#Making a list of words with our stopwords removed
meaningful_words = []
for word in all_the_words:
    if word not in stopwords:
        meaningful_words.append(word)

""" Another way of writing this for loop
meaningful_words = [word for word in all_the_words if word not in stopwords] """

#Counting how many times each word in our "meaningful_words" list appears
meaningful_words_tally = Counter(meaningful_words)

#Pulling out the top 40 most frequently occurring words from our complete tally using the "most_common" method
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

#Display these top 40 most frequent words
most_frequent_meaningful_words

[('gutenberg', 98),
 ('project', 88),
 ('tm', 57),
 ('work', 49),
 ('1', 47),
 ('john', 45),
 ('one', 35),
 ('works', 33),
 ('said', 30),
 ('would', 27),
 ('see', 27),
 ('electronic', 27),
 ('foundation', 25),
 ('paper', 24),
 ('get', 24),
 ('room', 24),
 ('pattern', 24),
 ('e', 23),
 ('terms', 22),
 ('like', 21),
 ('little', 20),
 ('must', 20),
 ('copyright', 20),
 ('states', 19),
 ('license', 18),
 ('agreement', 18),
 ('wallpaper', 17),
 ('use', 17),
 ('may', 17),
 ('much', 17),
 ('good', 16),
 ('think', 16),
 ('full', 16),
 ('know', 16),
 ('united', 15),
 ('well', 15),
 ('go', 15),
 ('donations', 15),
 ('away', 14),
 ('things', 14)]
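
The script above mentions two ways of filtering out stopwords: a `for` loop and a one-line list comprehension. The sketch below tries both on a short made-up list of words with a tiny stopword list (both lists are assumptions for this example and are not taken from `gilman.txt`). The two versions should produce the same list.

```python
# Hypothetical sample data, used only for illustration
sample_words = ["the", "yellow", "wallpaper", "is", "yellow"]
sample_stopwords = ["the", "is"]

# Version 1: a for loop that appends each non-stopword to a new list
kept_with_loop = []
for word in sample_words:
    if word not in sample_stopwords:
        kept_with_loop.append(word)

# Version 2: a list comprehension that does the same filtering in one line
kept_with_comprehension = [word for word in sample_words if word not in sample_stopwords]

print(kept_with_loop)
print(kept_with_comprehension)

# Checking that both versions give the same result
print(kept_with_loop == kept_with_comprehension)
```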

#### Imports

There is a lot of code that has already been written, and writing your own code can mostly involve adapting code already written by others to your own needs.  

At the top of a program, it is common to import any libraries or modules you might need later on in the program.  

Python includes many libraries (importable packages of already written code) that can do different things. In this case we are importing the module `re` (for regular expressions, which will allow us to clean up our text) and from the `collections` module we’re importing `Counter` (which will allow us to count things).

In [None]:
import re
from collections import Counter
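
To get a feel for what these two imports do, here is a minimal sketch that uses them on a short made-up sentence (the sentence and the variable names are assumptions for this example only).

```python
import re
from collections import Counter

# A made-up sentence, used only for this example
sample_text = "the cat and the hat"

# re.split breaks the text wherever it finds one or more non-word characters
sample_words = re.split(r"\W+", sample_text)
print(sample_words)

# Counter tallies how many times each item appears in the list
sample_tally = Counter(sample_words)

# most_common(1) returns the single most frequent item with its count
print(sample_tally.most_common(1))
```

Here `the` appears twice and every other word once, so `most_common(1)` returns `the` with a count of 2.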

Remember the “Zen of Python” that we read when we were reading manifestos? The “Easter egg” referred to in that manifesto is a module in Python (called `this`) that prints the “Zen of Python” when you import it.

In [2]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


#### Functions

Functions are bundles of code that perform a particular task. Often a function takes some kind of input (called “arguments”, which you give to the function inside the parentheses after its name). The function then performs its task on the input and returns an output (the result of its task on this particular input).

Python has a number of prewritten (or built-in) functions. For example, the function `print()` will print to the screen the argument you give it.  

Run the cell below and then try adding your name to “Hello!” and run the cell again. 

In [5]:
print("Hello!")

Hello!


Functions perform different tasks. Look how the same input given to a different function returns a different result (can you guess what this `len()` function does from the returned output?):

In [4]:
len("Hello!")

6

You can also write your own functions. This is called defining a function. 

It is useful to make your own functions when you want to bundle together a series of steps to achieve a particular goal. This makes the code neater and easier to manage. It also means you can reuse that function anywhere in the code: you just have to “call” it by its name. 

In this case we have defined a function to split words from our text that takes `any_chunk_of_text` as its argument.

In [None]:
def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words
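
Defining a function does not run it; the code inside only runs when the function is called. As a small sketch, the cell below repeats the definition so that it runs on its own and then calls the function with a made-up string as its argument.

```python
import re

# The function from the script, repeated here so this cell runs on its own
def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Calling the function by its name, with a made-up string as the argument
example_words = split_into_words("The Yellow Wallpaper!")
print(example_words)
```

Because this example string ends with punctuation, the returned list ends with an empty string (`''`): `re.split` also returns whatever comes after the last separator, which here is nothing.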

#### Variables

Next in our program we see we have defined some “variables”. 

Variables are like containers that store information. You assign a value that you want to use later to a variable. Then you only need to use the name of the variable later on. The value of a variable can be overwritten if you assign a new value to the same name. 

You can give a variable almost any name you want, but there are a few limitations. The name can’t start with a number (it needs to start with a letter or an underscore), there can be no spaces in the name and no punctuation apart from the underscore (_), and it cannot be a Python reserved word (a word that already has a specific meaning in the Python language, such as `for` or `def`). 
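
One way to try out these naming rules is to ask Python itself. The sketch below checks a few made-up candidate names (chosen only for this example) using the string method `isidentifier()` and the standard `keyword` module.

```python
import keyword

# Candidate names, made up for this example
candidate_names = ["greeting", "number_of_desired_words", "2nd_word", "my word", "my-word", "for"]

# A dictionary to remember the result for each name
name_is_allowed = {}

for name in candidate_names:
    # isidentifier() checks the spelling rules (no leading digit, no spaces, no punctuation except _)
    # keyword.iskeyword() checks whether the word is reserved by Python
    name_is_allowed[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, name_is_allowed[name])
```

Only the first two names pass: `2nd_word` starts with a number, `my word` contains a space, `my-word` contains a hyphen, and `for` is a reserved word.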

For example, here we assign the value `"Hello!"` to a variable called `greeting`. We can then pass that variable to a function such as `print()` instead of writing out “Hello!” again.

In [6]:
greeting = "Hello!"
print(greeting)

Hello!
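
As a short sketch of overwriting, assigning a new value to the same name replaces the old one (the second value here is made up for the example).

```python
greeting = "Hello!"
print(greeting)

# Assigning a new value to the same name replaces the old value
greeting = "Goodbye!"
print(greeting)
```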


#### Data Types

Notice how the variables we have assigned in our program look a bit different. One is a number that appears in green. The other is some text that appears in red in between quotation marks. 

This is because numbers and words are different data types. Depending on what type of data something is, there are different rules about what you can do with it and how you can use it.

In [None]:
filepath_of_text = "notebooks/gilman.txt"

number_of_desired_words = 40

The function `type()` returns what type of data something is:

In [8]:
type("Hello!")

str

Let’s start by looking at four basic data types: 

*String*: strings store characters such as letters or numbers. Strings are essentially text. To specify that something is a string you need to write it in “quotation marks” (both ‘single’ and “double” quotation marks work).

Try finding out what data type these examples are below using the `type()` function:

In [9]:
example = 'forty'
another_example = "40"
a_final_example = 40
type(example)

str

As you can see, the final example is not a string (or text), because it was not written in quotation marks. It is an *integer* (i.e. a whole number). Decimal numbers in Python are called *floats*. 
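
The different rules for different data types can be seen by using the same operator on numbers and on strings. This is a small sketch with made-up variable names, not part of the word-counting program.

```python
# Three values that look similar but have different data types
whole_number = 40
decimal_number = 40.5
number_as_text = "40"

print(type(whole_number))
print(type(decimal_number))
print(type(number_as_text))

# The + operator adds numbers...
sum_of_numbers = whole_number + whole_number
print(sum_of_numbers)

# ...but joins strings together
joined_text = number_as_text + number_as_text
print(joined_text)

# int() converts a string of digits into an integer
converted_sum = int(number_as_text) + whole_number
print(converted_sum)
```

Adding the two integers gives 80, while "adding" the two strings gives the text `4040`. Trying to add `whole_number + number_as_text` directly would raise a `TypeError`, because Python does not mix the two types automatically.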

How would you find out what type of data our variables are below using `type()`?

In [10]:
filepath_of_text = "notebooks/gilman.txt"

number_of_desired_words = 40

In [11]:
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

type(stopwords)

list
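
A list stores several values in order, written between square brackets and separated by commas. The sketch below shows a few common things you can do with a list, using a short made-up list rather than the full stopword list.

```python
# A short made-up list, used only for this example
short_list = ["i", "me", "my"]

print(type(short_list))

# len() counts how many items are in the list
print(len(short_list))

# Items are numbered starting from 0, so [0] gives the first item
first_item = short_list[0]
print(first_item)

# "in" checks whether a value is in the list (the script uses "not in" for stopwords)
print("me" in short_list)

# append() adds a new item to the end of the list
short_list.append("myself")
print(short_list)
```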