# Learning the Environment

What is this thing that you're using right now?

For this workshop, we've set up a coding environment with the necessary tools and data preloaded. The interface that you're using is called _Jupyter_, which lets you work with code in your browser through a format called a 'notebook'. Beyond running in the browser, Jupyter is pleasant because it's _interactive_: you can run small pieces of code and see the results right away. The traditional way of running code involves writing a script and running the whole thing; the interactive approach used by Jupyter is better suited to data analysis, because you can explore, tinker, and converse with your data.

The HathiTrust data is too big to simply open and manipulate on your own machine, so we will be using various tools within Python to explore it. This means that we will have to ask questions of the data first, rather than just viewing it. Python is a flexible programming language. If you've never programmed before, don't worry: we give you many examples, which you can follow along with and modify for your own needs. You'll learn the main skills in this workshop!

You will be accessing the data through a set of custom Python tools written specifically for this dataset. Those tools rely on a widely used data-science library called *Pandas*. So, while you're learning to work with HathiTrust Extracted Features files, you're also secretly learning common and widely useful data processing and analysis skills.

This means that you will need to frame your questions through the affordances of your data model as well as the data tools you have access to. Some questions simply cannot be asked of this dataset, while others will need some transformation to accurately represent the data available. This is a totally normal process that many computational projects go through.

In this workshop, we're working in the browser, with Jupyter, Python, and all the corresponding data installed on somebody else's computer, but the same setup can also run on your own computer. While you're in the browser, keep in mind that your custom code is *ephemeral*: it won't stay saved for the long term, so copy out anything you want to keep.

## Exercises

### Let's get comfortable with Jupyter

Jupyter is one of many execution environments for Python. Its specific vision of the world is that work is done iteratively, within an active session.

"Script-like" execution means that you write down all the code in a file, and that entire file is run in order. 

"Interpreter-like" execution means that you type in commands one at a time. The session pauses after each one, waiting for the next command. This is very similar to working at the command line.

Jupyter is a hybrid of both. Notebooks are composed of cells, and each cell is executed as a unit (almost like a mini script). This gives you the advantage of keeping the session alive, so you don't have to repeat slow steps like loading data, along with the advantage of being able to execute multiple lines of code at once.
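To see why a live session matters, here is a minimal sketch that imitates two cells inside one block. In a real notebook you would put each part in its own cell and run them one after the other; the variable names and the toy data are purely illustrative.

```python
# --- "Cell 1": build something once ---
# Imagine this step is slow, such as reading a large data file.
word_counts = {"apple": 3, "banana": 5}  # hypothetical toy data

# --- "Cell 2": run later, reusing what Cell 1 left in memory ---
# The session is still alive, so word_counts does not need to be rebuilt.
total = sum(word_counts.values())
print("Total words counted:", total)
```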

Jupyter is extremely powerful, but there are a few traps.

Let's get comfortable with cells first.

```python
print("Make sure this cell is active; you can do that by clicking inside it.")
print("Press shift+enter to execute this cell.")
print("Try using the right shift and return at the same time, with one hand.")
print("You can also press the 'play' button at the top.")
```

The cell above contains four print statements that are executed in order when you run the cell. You should see all of their output printed below it.

Let's look at a more complex code snippet.

```python
title = "Jupyter and You"
author = "Human, A."
year_published = 2018

print("The book, " + title + ", by " + author + ", was published in " + str(year_published) + ".")
```

This code cell defines a few variables that describe a book, followed by a print statement that summarizes them. Each line is executed and does something, but only the final one actually makes something appear on the screen.
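The `str(year_published)` call matters: `year_published` is an integer, and Python will not glue an integer onto a string with `+`. The sketch below shows the error you get without it, plus an f-string (available in Python 3.6 and later), which is an alternative way to build the same sentence.

```python
title = "Jupyter and You"
author = "Human, A."
year_published = 2018

# Adding a string and an integer directly raises a TypeError.
try:
    print("Published in " + year_published)
except TypeError as err:
    print("TypeError:", err)

# An f-string converts the values for you inside the {} placeholders.
summary = f"The book, {title}, by {author}, was published in {year_published}."
print(summary)
```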

Important to understand: Python keeps track of everything that exists within the session, even when it shows you nothing. You can have a ton of code working behind the scenes without anything printed out. Your human eyes, on the other hand, only see what is explicitly displayed. This is where print statements (and some fun Jupyter features) come in handy.

So if you want to see it with your human eyes, you have to explicitly make that happen somehow.
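A minimal illustration: the first two lines below change what is stored in the session but show nothing; only the `print()` call puts the value in front of you. In Jupyter, a bare variable name on the last line of a cell is also displayed automatically, a convenience that plain Python scripts do not offer.

```python
pages = 250          # Python now knows about pages, but nothing is shown
pages = pages + 10   # still nothing shown; the value quietly changed

print(pages)         # this is what makes the value visible to you

# In a Jupyter cell, ending with a bare expression such as
#   pages
# would also display the value; in a plain script it would not.
```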

If Python doesn't yell at you, the code executed.  Now, it may not have done the thing you wanted it to do, but it did do something!
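Here is an example of code that runs without any complaint from Python but probably does not do what you meant: numbers stored as text sort character by character, not numerically.

```python
years_as_text = ["1900", "850", "2018"]
years_as_numbers = [1900, 850, 2018]

# No error is raised here, but the text version sorts character by character.
print(sorted(years_as_text))     # ['1900', '2018', '850']
print(sorted(years_as_numbers))  # [850, 1900, 2018]
```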

**Take a few minutes to play with changing and executing the code above to get a feel for things.**

### Jupyter pain points

Powerfully flexible systems open up endless opportunities to powerfully tangle your code up.

Here are a few key tips as you are getting started:

* While Jupyter allows you to evaluate cells out of order, please try to run them in order, from top to bottom.
* If you are getting errors that make no sense, sometimes restarting the session (the kernel) and running the cells again from the top fixes it.
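A small sketch of how running cells out of order can mislead you. Read the block as separate cells; the last part imitates scrolling back up and re-running an earlier cell. The names and numbers are illustrative.

```python
# "Cell A"
threshold = 100

# "Cell B": uses whatever threshold currently holds in the session
long_books = [p for p in [80, 120, 300] if p > threshold]
print(long_books)  # [120, 300]

# "Cell C": a later experiment changes threshold
threshold = 250

# Re-running only "Cell B" now gives a different answer,
# even though Cell A, which sits above it, still says 100.
long_books = [p for p in [80, 120, 300] if p > threshold]
print(long_books)  # [300]
```

Running everything again from the top would bring back the first result, which is why starting over often clears up confusing behavior.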

There are many tips and tricks, but for now just pay attention to how we are using Jupyter and try to match that. You need to get a feel for it before we can have a more detailed discussion.

### Importing prerequisites

Python users have specific conventions about how and where things are organized in code. We'll be highlighting a few, but there is pretty decent documentation about the rest of it if you continue on in Python.

Python has a default set of functions and tools, things like `print()` that are so commonly used that you don't need to do anything special to be able to use them.

Other tools come with standard Python (this collection is called the Standard Library), but you have to ask for them specifically. We'll be using some of those.
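To make the distinction concrete: built-in functions such as `len()` work immediately, while a standard-library module such as `math` has to be imported before its functions are available.

```python
# Built-ins need no import.
print(len("HathiTrust"))  # 10

# Standard-library modules ship with Python but must be imported first.
import math
print(math.sqrt(16))      # 4.0
```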

There are also tools that have to be installed separately. We won't be covering installation, because it is nuanced and worth a workshop of its own, but it is covered in Python books and courses. We have preloaded these tools for you within this Binder repository, so we can skip this step.

Whether part of the standard library or installed separately, you use **import statements** to bring them into your current session. Once imported, you have access to use the functions and content from that toolkit.  

There are many ways to import libraries into your Python session. Each style determines how you access the imported tools afterwards. Many tools have recommended conventions for importing.
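The common import styles look like this, shown with the standard-library `math` module so the example runs anywhere. Each style changes the name you use afterwards. Pandas, for instance, is conventionally imported with the alias style, as `import pandas as pd`.

```python
# 1. Import the whole module; use its contents with the module name as a prefix.
import math
print(math.pi)

# 2. Import one name from a module; use it directly, without a prefix.
from math import floor
print(floor(2.7))  # 2

# 3. Import a module under a shorter alias, as in the pandas convention.
import math as m
print(m.ceil(2.1))  # 3
```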

Below are the import statements that you'll need to have in your notebook.

```python
from htrc_features import FeatureReader
import os
```

We don't want to dive too deeply into the syntax here, but here are some brief explanations:

* `from htrc_features import FeatureReader` imports from a library installed from this package: https://pypi.org/project/htrc-feature-reader/. The import convention we are using here brings in just one name, `FeatureReader`. This means we can call `FeatureReader()` directly within our code, but nothing else from that library is available by name. This library was created by the HTRC team and is used for parsing the data from the Extracted Features files.
* `import os` brings in a module from the standard library that provides tools for working with your file system.
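A few `os` functions you may meet are sketched below. What they print depends on the machine and folder the notebook is running in, so no output is shown; the `data/example.json` path is hypothetical.

```python
import os

# Where is this notebook session running from?
current_folder = os.getcwd()
print(current_folder)

# What files and folders are in that location?
for name in sorted(os.listdir(current_folder)):
    print(name)

# Build a path in a way that works on any operating system.
example_path = os.path.join(current_folder, "data", "example.json")  # hypothetical file
print(example_path)
print(os.path.exists(example_path))  # False unless such a file exists
```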

As we move from notebook to notebook, these imports will be repeated for you. You may also see some other repeated code. That's because each notebook is a separate universe and session. You may have several active notebooks, but they don't share memory or information with each other.