# Learning the Environment

*This notebook is adapted from the HathiTrust Research Center training materials*

The interface that you're using is called _Jupyter_. It allows you to interact with code in your browser, through a format called a 'notebook'. In addition to running in the browser, Jupyter is pleasant because it's _interactive_. The traditional way of running code involves writing a script and running the whole thing; the interactive approach used by Jupyter is more segmented and better suited to data analysis, because you can explore, tinker, and converse with your data.

Jupyter notebooks are designed to support a literate programming approach, which means that each step is explained with text as well as code.

### Let's get comfortable with Jupyter

Jupyter is one of many execution environments for Python and R. Its specific vision of the world is that work is done iteratively, within an active session.

"Script-like" execution means that you write down all the code in a file, and that entire file is run in order.

"Interpreter-like" execution means that you type in commands one at a time. The session pauses after each, waiting for the next command. This is very similar to how the command line works.

Jupyter is a hybrid of the two. Notebooks are composed of cells, and each cell is executed as a unit (almost like a mini script). This gives you the advantage of keeping the session alive, so you don't have to repeat steps such as loading data, and the advantage of being able to execute multiple lines of code at once.
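To make the "session stays alive" idea concrete, here is a small sketch. The `# Cell` comments mark where separate notebook cells would begin; the sentence and variable names are made up for illustration.

```python
# Cell 1: a step you only want to run once (imagine loading a big file)
words = "the quick brown fox jumps over the lazy dog".split()

# Cell 2: run later; `words` is still in the session, so nothing is reloaded
word_count = len(words)
print(word_count)  # 9

# Cell 3: another follow-up question about the same data
longest = max(words, key=len)
print(longest)  # quick (the first of the five-letter words)
```

Each cell builds on what earlier cells left behind in the session, which is what lets you keep asking new questions without starting over.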

Jupyter is extremely powerful, but there are a few traps.

#### Cells

A Jupyter notebook is composed of individual cells. Everything is a cell. Some are just text, others are code, but each must be "run" for its content to appear. Text cells are written in Markdown, which is rendered as formatted text. Code cells are limited to the language that your notebook is in.

Cells are meant to run sequentially in a Jupyter notebook. Runtime tools in the menu help you troubleshoot when a cell is misbehaving; quite often, the cell is relying on something that came before it but was not run correctly.

A code cell generally shows output only when you tell it to. For a result to be visible to a human, it needs to be "printed" in some way, whether as text or as a visualization.

In [None]:
print("Ensure that this cell is active, you can do that by clicking inside here.")
print("Press shift+enter to execute this cell.")
print("Try using the right shift and return at the same time, with one hand.")
print("You can also press the 'play' button at the top.")

The cell above contains four print statements that will be executed sequentially once the cell is executed. You should see all the content printed out below that cell.

Let's look at a more complex code snippet.

In [None]:
title = "Jupyter and You"
author = "Human, A."
year_published = 2018

print("The book, " + title + ", by " + author + ", was published in " + str(year_published) + ".")

This code cell defines a few variables that describe a book, and has a print statement that summarizes them. Each line is executed and does something, but only the final one actually makes something appear on the screen.

Important to understand: Python has eyes on everything that exists within the session. You can have a ton of code working behind the scenes without anything being printed out. This is different from your human eyes. This is where print statements (and some fun Jupyter stuff) come in handy.

So if you want to see it with your human eyes, you have to explicitly make that happen somehow.
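Here is a small sketch of the difference between code that runs silently and code that shows you something. It reuses the `title` variable from the book example; `title_length` is a name made up for illustration. One piece of the "fun Jupyter stuff" is that Jupyter displays the value of the last expression in a cell, even without `print()`.

```python
title = "Jupyter and You"    # runs, but nothing appears
title_length = len(title)    # runs, but nothing appears

print(title_length)          # 15 (print makes the value visible)

# When this is the last line of a cell, Jupyter displays its value.
# In a plain script, this line would show nothing.
title.upper()
```

Relying on the last line is handy for quick checks, but `print()` works anywhere in a cell.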

If Python doesn't yell at you, the code executed.  Now, it may not have done the thing you wanted it to do, but it did do something!
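As a small illustration of code that runs without complaint but doesn't do what was intended, consider the sketch below. The variable names are made up for this example, and it assumes the year was stored as text rather than as a number.

```python
year_published = "2018"            # a string, not a number

# No error here: + joins two pieces of text instead of adding numbers
next_year = year_published + "1"
print(next_year)                   # 20181

# Converting to a number first gives the result we actually wanted
corrected = int(year_published) + 1
print(corrected)                   # 2019
```

Printing intermediate values like this is often the quickest way to notice that something quietly went wrong.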

**Take a few minutes to play with changing and executing the code above to get a feel for things.**

### Jupyter pain points

Powerfully flexible systems open up endless opportunities to powerfully tangle your code up.

Here are a few key tips as you are getting started:

* While Jupyter allows you to evaluate cells out of order, please try to run them in order.
* If you are getting errors that make no sense, sometimes going back to the top and starting over fixes it.
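The tips above can be illustrated with a small sketch. The `# Cell` comments mark where separate notebook cells would begin, and `total` is a made-up variable. The last two lines imitate what happens if you run Cell B a second time without going back to Cell A.

```python
# Cell A
total = 10

# Cell B
total = total * 2
print(total)  # 20

# Running Cell B again, without re-running Cell A first
total = total * 2
print(total)  # 40
```

The code in Cell B never changed, but its result did, because the session remembered the earlier value. Going back to the top and running Cell A again resets `total` to 10.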


### Installing and importing pre-requisites

Python has a default set of functions and tools, things like `print()`, that are so commonly used that you don't need to do anything special to use them. Other things come preloaded with standard Python (this is the Standard Library), but you have to ask for them specifically. Still others can be installed using tools such as pip.

Whether part of the Standard Library or installed separately, you use **import statements** to bring them into your current session. Once imported, you can use the functions and content from that toolkit.
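As a quick sketch of what import statements look like, the example below uses two modules from the Standard Library, so nothing needs to be installed first. The word being counted is just an illustration.

```python
# Bring in a whole module, then reach its functions with a dot
import math
print(math.sqrt(16))         # 4.0

# Or bring in just one named tool from a module
from collections import Counter
letter_counts = Counter("notebook")
print(letter_counts["o"])    # 3
```

Both styles are common; the second one is handy when you only need a single item from a larger toolkit.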


In [None]:
pip install htrc-feature-reader

In [None]:
from htrc_features import FeatureReader