Introduction
============

You should read most of https://merely-useful.tech/py-rse/index.html in parallel with this course. We will not follow it exactly, but it contains many useful things we won't have time to fully cover.

Course structure:

We will start with the shell, and use it to get some information from https://openalex.org/. This will introduce us to some basic concepts in how software works on a computer. Then, we will transition to using Python, which is more flexible, to create shell commands. Finally, we will transition that into a Jupyter notebook app of sorts, with reusable packaged code.

We need a little bit of setup for using JupyterLab. These notes are written in a Jupyter notebook. Some cells will run Python code. Other cells start with the `%%bash` cell magic; these cells execute shell commands, similar to running those commands in a terminal.

# Terminals and shells

A terminal is a command-line interface to your computer. In it you type commands to do things. In Windows it might be called the DOS prompt, or PowerShell, or if you have the Windows Subsystem for Linux installed, it is just a terminal. On a Mac/Linux system it is also often called a terminal. Terminals usually do not have a lot of functionality compared to a "Window/GUI" system. In fact, you can run a terminal on systems where there is no GUI present.

In the terminal, a *shell* runs. The shell provides some commands and functionality to the terminal. Some typical shells include:

- bash
- csh/tcsh
- zsh

Others exist too, but these are the most common ones.

You can find out what shell you are using by running this command:

    

In [1]:
%%bash
echo $SHELL

/bin/bash


Here, echo is a command, and $SHELL is called an environment variable. We will learn more about those later.    

# Paths and the working directory

We need to talk about the concept of *where we are in the computer*. A computer has a file system, and every file lives at a path in that system.

The root of the file system (on Linux) starts at /. We use the `ls` command to list files at a path.

In [2]:
%%bash
ls /

bin
boot
dev
etc
home
lib
lib32
lib64
libx32
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var


You do not always have permission to list every directory. You can run `ls /home`, but you probably cannot list the contents of most directories in /home; they belong to other people and are private. That can be changed, but not today.

The shell is always located somewhere in the file system; that location is called the working directory. It is often important to know where that is. Let's start with the `pwd` command, which prints the path of the working directory.

In [3]:
%%bash
pwd

/home/jovyan/work/00-introduction


    
    
To get the contents of this directory, simply type `ls`. 

## Absolute and relative paths

An absolute path starts with /, i.e. at the root of the file system.

A relative path does not. A relative path is interpreted starting from the current working directory rather than from the root. There are two special paths you can use:

    .   this means the current directory
    ..  this means one directory "up"
    
So `ls ..` lists the contents one directory up, and `ls ../..` lists the contents two directories up.    

In [4]:
%%bash
ls ..

00-introduction
01-rest-api-openalex
02-python-requests
03-python-packaging
04-more-python-packaging
05-python-classes
06-code-quality
07-github-intro
08-github-actions
09-wrapup
_build
_config.yml
content.md
intro.md
logo.png
markdown.md
markdown-notebooks.md
notebooks.ipynb
references.bib
_toc.yml
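
The difference between relative and absolute paths can be tried safely in a scratch directory. This is a minimal sketch: the directory names `parent` and `child` and the file `notes.txt` are made up for illustration, and `mktemp -d` and `touch` (which create a temporary directory and an empty file) are commands not otherwise used in these notes.

```shell
# Build a small throwaway tree so nothing in the course folder is touched.
start=$(pwd)
demo=$(mktemp -d)
mkdir -p "$demo/parent/child"
touch "$demo/parent/notes.txt"

cd "$demo/parent/child"
pwd        # prints an absolute path ending in parent/child
ls ..      # relative path: lists the contents of parent

# The relative path .. and the absolute path name the same directory.
via_relative=$(ls ..)
via_absolute=$(ls "$demo/parent")
echo "$via_relative"
echo "$via_absolute"

# Return to where we started and clean up.
cd "$start"
rm -rf "$demo"
```

Both listings should be identical, because `..` is resolved from the working directory while the absolute path is resolved from /.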


# Commands

Commands are typed at the shell prompt. There are several kinds of commands. Some are built into the shell. Some are called system commands. Finally, there are user-defined commands. We have already seen several commands:

- echo
- ls
- pwd

We can use the `type` command to tell what kind of command something is. Try it:

    

In [5]:
%%bash
type echo

echo is a shell builtin


You use commands to make the computer do something. That could include:

- change directory
- see contents of a file
- make a file or directory
- delete a file or directory
- run a python shell
- and many more things

Many commands have additional options. These are often documented in "manual" pages. You access this documentation with the `man` command. First see what kind of command `ls` is:

    

In [6]:
%%bash
type ls

ls is /usr/bin/ls


    
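In a terminal on the JupyterHub, you will find it is aliased to provide some default options; the notebook cell above reports the plain path because aliases are normally not active in non-interactive `%%bash` cells. We will see how that is done later. Next run: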

    man ls
    
to see the options that are available. Try using the -a option for showing all files. This shows a lot more files! It is a convention that files and directories that start with "." are *hidden* files. These files and directories often contain configuration information, or other kinds of information like a history. Let's look at some of these. We can use the `cat` command (from concatenate) to display the contents of a file. Try this:

In [1]:
%%bash
man ls

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, including manpages, you can run the 'unminimize'
command. You will still need to ensure the 'man-db' package is installed.


In [3]:
%%bash

ls -a ~/

.
..
.bash_logout
.bashrc
.cache
.conda
.config
.ipython
.jupyter
.local
.lsp_symlink
.npm
.profile
.wget-hsts
work
.yarn
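
To see the hidden-file convention in isolation, here is a small sketch that works in a temporary directory. The file names `visible.txt` and `.hidden-config` are invented for the example, and `mktemp -d` and `touch` are used only to create the scratch directory and two empty files.

```shell
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden-config"

plain=$(ls "$demo")       # the name starting with "." is left out
all=$(ls -a "$demo")      # -a also lists ., .. and the dotfile

echo "without -a:"
echo "$plain"
echo "with -a:"
echo "$all"

rm -rf "$demo"
```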


In [5]:
%%bash
cat ~/.bashrc


    

# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples

# If not running interactively, don't do anything
case $- in
    *i*) ;;
      *) return;;
esac

# don't put duplicate lines or lines starting with space in the history.
# See bash(1) for more options
HISTCONTROL=ignoreboth

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=1000
HISTFILESIZE=2000

# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize

# If set, the pattern "**" used in a pathname expansion context will
# match all files and zero or more directories and subdirectories.
#shopt -s globstar

# make less more friendly for non-text input files, see lesspipe(1)
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"

# set variable identifying the chroot you w

## Where are the commands?

The answer is it depends. Commands that are built into the shell simply exist. For example:

    type pwd
    
shows that `pwd is a shell builtin`. In contrast, `type cat` tells you `cat is hashed (/usr/bin/cat)`. That means the shell knows that `cat` is defined in a file at the path /usr/bin/cat. The `/usr/bin` directory contains hundreds of commands that do different things. 
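
As a quick sketch, you can ask the shell about several commands in one go. The loop below only uses commands already seen in these notes, plus `command -v`, a standard shell builtin (not mentioned above) that prints just the location the shell would use. The exact wording of the `type` output may differ between shells.

```shell
# Ask the shell how it would resolve a few of the commands used so far.
for cmd in pwd echo cat ls; do
    type "$cmd"
done

# Capture individual answers so they can be inspected or compared later.
kind_of_pwd=$(type pwd)
cat_location=$(command -v cat)
echo "$kind_of_pwd"
echo "$cat_location"
```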

We can tell what kind of file it is with the `file` command.

    file /usr/bin/cat
    
indicates it is a compiled program. A command can be a compiled program, or it can be a script written in some language, e.g. in the shell language, or in Python.

Here is a shell script:

> less /usr/bin/gunzip

And here is a Python script:

> less /usr/bin/pip



## What makes them a command?

There are a few things that make the files in /usr/bin available as a command. First, the files are *executable*. Every file is owned by a user and a group. Every file also carries read, write and execute permissions, and each of these is set separately for three categories: the user (owner), the group, and others. To see who owns a file, and the permissions it grants, use the `-l` option of ls.

> ls -l /usr/bin/pip

This yields

    -rwxr-xr-x 1 root root 365 Feb 28 09:41 /usr/bin/pip

First, this means the file is owned by the root user, and is in the root group. The string at the start of the line is broken up into four parts:

    -    the file type; ignore this for now (a d here marks a directory)
    rwx  means that the root user can read, write and execute
    r-x  means users in the root group can read and execute, but not write
    r-x  means others can read and execute, but not write
    
To work as a command, a file must be executable for you.
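
Besides reading the permission string by eye, a script can check whether a file is readable or executable *for you*. The sketch below uses the standard `[ -r FILE ]` and `[ -x FILE ]` tests, which are not covered in these notes; the file name `example.txt` is made up. It assumes a typical setup where newly created files do not get execute permission.

```shell
demo=$(mktemp -d)
f="$demo/example.txt"
echo "some text" > "$f"

ls -l "$f"     # a freshly created file normally shows no x bits

# [ -r FILE ] and [ -x FILE ] succeed if you may read / execute FILE.
if [ -r "$f" ]; then readable=yes; else readable=no; fi
if [ -x "$f" ]; then executable=yes; else executable=no; fi
echo "readable: $readable, executable: $executable"

rm -rf "$demo"
```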

Second, to work as a command by name, the command must exist in a file by that name in one of several special places defined by the $PATH environment variable. This variable holds a colon-separated list of directories in which the shell looks for commands:

    echo $PATH
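
The colon-separated value can be hard to read. One way to make it easier, sketched here, is to turn each colon into a newline with `tr`. The `tr` and `command -v` commands are not introduced in these notes; `command -v NAME` prints the file the shell would run for NAME.

```shell
# Show each $PATH directory on its own line, in the order the shell searches them.
echo "$PATH" | tr ':' '\n'

# Count the entries and ask which file would run for the name "ls".
n_dirs=$(echo "$PATH" | tr ':' '\n' | wc -l | tr -d ' ')
ls_location=$(command -v ls)
echo "$n_dirs directories searched; ls resolves to $ls_location"
```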
    
Finally, to work as a command the shell must know how to run the file found with that name. For a script, this is specified in the first line of the file. You can see the first line like this:

    head -n 1 /usr/bin/pip
    
This shows the path to the executable that is supposed to run this file: `#!/usr/bin/python3`. 
    
Note that if you run this on a binary, or compiled file, you will see gobbledygook characters. These files are not meant to be viewed by you.

If a file is executable, and the first line shows how to run, you can always call it using an absolute path, even when it is not in one of the $PATH directories.



# Combining commands

So far we have looked at using commands one at a time. The shell commands are much more powerful when they are combined. Many commands are designed so they can take the output of another command as input. Suppose we want the list of files in a directory, but sorted. We use a *pipe* to take the output of `ls` and feed it to `sort`. The pipe operator is "|".

    ls | sort
    
Want them in reverse order? Check the sort man page to find the option for reversing the sort. 

Want to know how many entries there are? You can pipe the output to `wc -l`; `wc` is the word count command, and the `-l` option makes it count lines. It is not necessary to sort here, but I show it to indicate you can use multiple pipes.

    ls | sort | wc -l
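
Because the output of `ls` depends on your directory, here is the same idea as a small sketch with known input. `printf` stands in for `ls` and prints three made-up lines, so you can predict what each stage of the pipeline receives.

```shell
# printf produces three known lines, standing in for the output of ls.
printf 'pear\napple\nfig\n' | sort

# Each pipe feeds the previous command's output to the next command.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
count=$(printf 'pear\napple\nfig\n' | sort | wc -l | tr -d ' ')
echo "first: $first, count: $count"
```

The `tr -d ' '` at the end only strips padding spaces that some versions of `wc` add around the number.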
    



# Redirecting output

So far we have been seeing output on the *standard output*, or stdout, of the shell. That output is ephemeral; if you close the terminal it may disappear forever. You can often look back in the .bash_history file to see what commands were run, but it may be undesirable to run them again. In the shell, we can use redirect operators to put the output into files. The ">" operator will redirect the output from stdout to a file. Here is an example where we create a file, and then run some additional commands on the saved file.

    ls | sort > sorted-list.dat
    wc -l sorted-list.dat
    head -n3 sorted-list.dat
    
When you are done with the file, you can delete it with:

    rm sorted-list.dat
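
One detail worth knowing is that ">" replaces whatever was already in the file. The sketch below shows this in a temporary directory, and contrasts it with the related ">>" operator, which appends instead. The ">>" operator is not covered in these notes, and the file name `listing.dat` is made up.

```shell
demo=$(mktemp -d)
out="$demo/listing.dat"

printf 'b\na\n' | sort > "$out"    # the file now holds two lines
echo "only line" > "$out"          # > replaces the previous contents
lines_after_overwrite=$(wc -l < "$out" | tr -d ' ')

echo "second line" >> "$out"       # >> adds to the end instead
lines_after_append=$(wc -l < "$out" | tr -d ' ')

echo "after >: $lines_after_overwrite line(s), after >>: $lines_after_append line(s)"
rm -rf "$demo"
```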



# Make your own shell command!

Create a new file in this directory called hello.sh. You can do that in JupyterLab. Add these lines:

    #!/bin/bash
    echo "Hello world!"
    
Now try to run it. This directory is not in your $PATH, so we specify the path to it: 

    ./hello.sh
    
You should see

    bash: ./hello.sh: Permission denied
    
Let's check the permissions:

    ls -l hello.sh
    
Sure enough, you don't see any "x" bits indicating it is executable, and if you list the file, it is not colored green. We can make the file executable like this:

    chmod +x hello.sh 
    
Now you can see the "x" bits for everyone, and you can run it with `./hello.sh`.
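
The same steps can also be done entirely from the shell. This is a sketch that works in a temporary directory so it does not interfere with your own hello.sh; it creates the file with a here-document (`cat > file <<'EOF' ... EOF`), a shell feature not covered in these notes, and uses the `[ -x FILE ]` test to check the executable bit instead of triggering the error message.

```shell
start=$(pwd)
demo=$(mktemp -d)
cd "$demo"

# A here-document writes the two lines into hello.sh.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello world!"
EOF

if [ -x hello.sh ]; then before=yes; else before=no; fi
chmod +x hello.sh
if [ -x hello.sh ]; then after=yes; else after=no; fi
ls -l hello.sh

greeting=$(./hello.sh)
echo "executable before chmod: $before, after chmod: $after"
echo "$greeting"

cd "$start"
rm -rf "$demo"
```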



## Make your own bin directory and add it to the path

We can make our own bin directory with the `mkdir` command.

    mkdir ~/bin
    
Next, we can add that directory to our $PATH. This is done temporarily for now. That means if you close the terminal and reopen it, you will have to run this command again.

    export PATH=$PATH:~/bin
    
Next, we move our command to ~/bin

    mv hello.sh ~/bin
    
Now in a shell, from any directory you can simply type 

    hello.sh
    
to run your command. You can also use the `which` and `type` commands to find it.

When naming commands you have to be careful to give them different names from other commands. If you don't use an absolute path to your command, whichever matching file is found first in the $PATH directories will be run!
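
Here is a sketch of both points, using a temporary directory as a stand-in for ~/bin so your real setup is untouched. The script name `hello-demo.sh` is made up, the files are written with here-documents, and `command -v` (not introduced in these notes) prints the file the shell would run for a name. The second half places a fake `ls` in the new directory to show that, because the directory was added at the *end* of $PATH, the existing `ls` is still found first.

```shell
mybin=$(mktemp -d)    # hypothetical stand-in for ~/bin

cat > "$mybin/hello-demo.sh" <<'EOF'
#!/bin/bash
echo "Hello from my bin!"
EOF
chmod +x "$mybin/hello-demo.sh"

# Add the directory to the end of the search path for this session.
export PATH="$PATH:$mybin"

found=$(command -v hello-demo.sh)
result=$(hello-demo.sh)
echo "$found"
echo "$result"

# A file named like an existing command does not win if it comes later in $PATH.
cat > "$mybin/ls" <<'EOF'
#!/bin/bash
echo "not the real ls"
EOF
chmod +x "$mybin/ls"
ls_found=$(command -v ls)
echo "ls still resolves to $ls_found"

rm -rf "$mybin"
```

Had the new directory been put at the front of $PATH instead, the fake `ls` would have been found first.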



## Making the PATH modification permanent

We have to go back a little to understand how to make modifications to the shell, e.g. to add something to the path. When you start/open your shell it reads the contents of some "dotfiles". Specifically, here the .bashrc file is read. Check out the contents now:

    cat ~/.bashrc
    
There is already a lot being set for you by default. We can add a line to this file so that our $PATH will be set each time we open a terminal. It is a little tricky editing dotfiles in JupyterLab. The dotfiles are not shown in the File manager. Instead, we have to edit them in the terminal. You can use many editors for this: vim, emacs, etc. We will use `nano`. It is pretty simple.

    nano ~/.bashrc
    
This will open the .bashrc file in your terminal. Use the arrows or page down to get to the end of the file. Add exactly this text (no leading spaces):

    export PATH=$PATH:~/bin
    # END
    
Then type Ctrl-O followed by Enter to write (save) the file, then Ctrl-X to exit.
    
Run this command to check that the text you added is there.
    
    cat ~/.bashrc
    
We still have to load the file in this terminal session. Do that here to check for errors. If you get no output, there is no error.
    
    source ~/.bashrc
    
Finally, echo the $PATH to see if your directory got added. Now, any executable files you add to ~/bin will be on your path, and available as commands.   
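
If you prefer not to edit the file by hand, one possible alternative is to append the line from the shell, but only when it is not already there. The sketch below practices on a temporary file standing in for ~/.bashrc (the variable names are made up), so your real dotfile is not modified. It relies on `grep -qxF`, which quietly checks whether a file contains an exact, whole-line match, and on the `||` operator, which runs the second command only if the first one fails; neither is covered in these notes.

```shell
rcfile=$(mktemp)    # stand-in for ~/.bashrc, so the real file is not touched
line='export PATH=$PATH:~/bin'

# Append the line only if an identical line is not already present.
grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"

# Running the same command again does not add a duplicate.
grep -qxF "$line" "$rcfile" || echo "$line" >> "$rcfile"

occurrences=$(grep -cxF "$line" "$rcfile")
echo "the line appears $occurrences time(s)"
cat "$rcfile"

rm -f "$rcfile"
```

The single quotes around the line keep `$PATH` from being expanded when the line is written, so the file contains the text exactly as shown.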
    



# Do I have to learn the shell?

Sort of. You eventually land in it when developing software. It is where you install and uninstall software, and every software package runs in a shell somewhere. So you have to understand some things about the shell to know how software works.

Personally, I limit the way I use shell commands. Where practical, I write short shell scripts to document



In [1]:
! ls 


00-introduction-terminals.ipynb
