# A brief introduction to UNIX and BASH

Class: Introducción a Python para Astronomía.
Programa de Doctorado en Astronomía.

Instructor: Guillermo Damke - Universidad de La Serena.

### Operating Systems: Unix and Linux

#### What is an Operating System?


"An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs."

![OS_structure.png](attachment:OS_structure.png)

#### What is Unix?

"Unix (/ˈjuːnɪks/; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, development starting in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others."

"Unix systems are characterized by a modular design that is sometimes called the "Unix philosophy". According to this philosophy, the operating system should provide a set of simple tools, each of which performs a limited, well-defined function. A unified filesystem (the Unix filesystem) and an inter-process communication mechanism known as "pipes" serve as the main means of communication, and a shell scripting and command language (the Unix shell) is used to combine the tools to perform complex workflows."

This Unix philosophy is summarized by Peter H. Salus in A Quarter-Century of Unix (1994):

* Write programs that do one thing and do it well.
* Write programs to work together.
* Write programs to handle text streams, because that is a universal interface.
    
#### What is GNU-Linux?

"Linux (/ˈlinʊks/ LEEN-uuks or /ˈlɪnʊks/ LIN-uuks) is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged in a Linux distribution."

"Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy."

"GNU is an operating system that is free software—that is, it respects users' freedom. The GNU operating system consists of GNU packages (programs specifically released by the GNU Project) as well as free software released by third parties. The development of GNU made it possible to use a computer without software that would trample your freedom."

![linux_os.gif](attachment:linux_os.gif)


"GNU is a Unix-like operating system. That means it is a collection of many programs: applications, libraries, developer tools, even games. The development of GNU, started in January 1984, is known as the GNU Project. Many of the programs in GNU are released under the auspices of the GNU Project; those we call GNU packages."

"The name “GNU” is a recursive acronym for “GNU's Not Unix.” “GNU” is pronounced g'noo, as one syllable, like saying “grew” but replacing the r with n."

"The program in a Unix-like system that allocates machine resources and talks to the hardware is called the “kernel”. GNU is typically used with a kernel called Linux. This combination is the GNU/Linux operating system. GNU/Linux is used by millions, though many call it “Linux” by mistake."

**In this class, we will use the terms Unix/Linux/GNU-Linux interchangeably.**

### The Unix Shell

"A Unix shell is a command-line interpreter or shell that provides a command line user interface for Unix-like operating systems. The shell is both an interactive command language and a scripting language, and is used by the operating system to control the execution of the system using shell scripts."

"Users typically interact with a Unix shell using a terminal emulator; however, direct operation via serial hardware connections or Secure Shell are common for server systems. All Unix shells provide filename wildcarding, piping, here documents, command substitution, variables and control structures for condition-testing and iteration."

### BASH: The Bourne Again SHell

"Bash is the GNU Project's Bourne Again SHell, a complete implementation of the IEEE POSIX and Open Group shell specification with interactive command line editing, job control on architectures that support it, csh-like features such as history substitution and brace expansion, and a slew of other features."

### BASH and the Linux Terminal

When you open a terminal, it displays a ***prompt***, which means the shell is ready to accept a command. The prompt usually looks like this:

`username@hostname:~$`


![terminal.png](attachment:terminal.png)

For instance, in the image the username is "gdamke" and the hostname (i.e., your machine name in a network) is "chileno".

In addition, there are two symbols (`~` and `$`). The `~` sign shows where you are in the directory structure. In Unix, `~` means "your home directory". The `$` sign tells you that you are a regular user (not root).

Linux allows you to configure your prompt in several ways, for example, to display the time, the full path, relative path, use colors, etc.
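
As a small illustration of prompt configuration: in BASH the prompt is controlled by the `PS1` variable. The sketch below assumes BASH's standard prompt escapes, and the particular layout is just one arbitrary choice:

```shell
# Show the time, user, host and working directory in the prompt.
# \t = current time (HH:MM:SS)   \u = username   \h = hostname
# \w = current directory (with ~ standing for the home directory)
# \$ = "$" for a regular user, "#" for root
PS1='[\t] \u@\h:\w\$ '
```

Typed at the prompt, this only affects the current terminal. Placing the same line in `~/.bashrc` (described later in this class) would apply it to every new terminal.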

#### Some "popular" commands

Finally, let's learn some commands in our shell. Probably the most popular command is `ls`.

In [11]:
ls

class1_BASH.ipynb  Images


The `ls` command gave an output showing "class1_BASH.ipynb" and "Images". In a terminal, the two names appear in different colors. Why?

More importantly, how can we learn what the command does?

Let's try a **very** useful command: `man`.

In [12]:
man ls

LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci‐
       fied.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters

       --block-size=SIZE
              scale sizes by SIZE before printing them; e.g., '--block-size=M'
              prints sizes in units of 1,048,576 bytes; see SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

 

Wow! What is this? This is a "manual" page. Most Unix commands have manual pages that will tell you how to use the command. They are quite useful, because they literally teach you how to use the OS!

There are three important sections listed in the page.

* "NAME": Tells you the name and a short description of the command. Now we know what `ls` does: "list directory contents".


* "SYNOPSIS": Tells you how to use the command. In this case, we see that the command name is followed by [OPTION] and then [FILE]. The square brackets mean that both are optional, and the `...` means that more than one can be given.


* "DESCRIPTION": This is self-explanatory. However, it mentions "arguments". What are arguments?


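Arguments are the extra words typed after the command name. They tell the command what to do or what to act on. For example, let's pass the argument "all". Options like this one can be passed in two ways:

* Long option: uses a double dash `--` followed by a word, as in `--all`.
* Abbreviation (short option): uses a single dash `-` followed by a single letter, as in `-a`.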



In [14]:
ls --all

[0m[01;34m.[0m  [01;34m..[0m  class1_BASH.ipynb  [01;34mImages[0m  [01;34m.ipynb_checkpoints[0m


In [15]:
ls -a

[0m[01;34m.[0m  [01;34m..[0m  class1_BASH.ipynb  [01;34mImages[0m  [01;34m.ipynb_checkpoints[0m


Interestingly, multiple abbreviated options can be joined together after a single dash. For example, let's combine `a` with the "long listing format" option `l`.


In [17]:
ls -la

total 240
drwxr-xr-x 4 gdamke gdamke   4096 Sep 21 01:54 .
drwxr-xr-x 4 gdamke gdamke   4096 Sep 20 21:55 ..
-rw-r--r-- 1 gdamke gdamke 226725 Sep 21 01:54 class1_BASH.ipynb
drwxr-xr-x 2 gdamke gdamke   4096 Sep 21 01:15 Images
drwxr-xr-x 2 gdamke gdamke   4096 Sep 20 21:39 .ipynb_checkpoints


### Exercise: Can you look for the options so that the most recent file is shown last? Can you display the file size in a "human readable" format?

*Aside: Interestingly, even `man` has its manual page!*

In [19]:

man man


MAN(1)                        Manual pager utils                        MAN(1)

NAME
       man - an interface to the on-line reference manuals

SYNOPSIS
       locale] [-m system[,...]] [-M path] [-S list]  [-e  extension]  [-i|-I]
       [--regex|--wildcard]   [--names-only]  [-a]  [-u]  [--no-subpages]  [-P
       pager] [-r prompt] [-7] [-E encoding] [--no-hyphenation] [--no-justifi‐
       cation]  [-p  string]  [-t]  [-T[device]]  [-H[browser]] [-X[dpi]] [-Z]
       [[section] page[.section] ...] ...
       man -k [apropos options] regexp ...
       man -K [-w|-W] [-S list] [-i|-I] [--regex] [section] term ...
       man -f [whatis options] page ...
       locale]  [-P  pager]  [-r  prompt]  [-7] [-E encoding] [-p string] [-t]
       [-T[device]] [-H[browser]] [-X[dpi]] [-Z] file ...
       man -w|-W [-C file] [-d] [-D] page ...
       man -c [-C file] [-d] [-D] page ...
       man [-?V]

DESCRIPTION
       man is the system's manual pager.  Each page argument given to  man  is
 

       International  support is available with this package.  Native language
       manual pages are accessible (if available on your system)  via  use  of
       locale  functions.   To  activate  such support, it is necessary to set
       either $LC_MESSAGES, $LANG  or  another  system  dependent  environment
       variable to your language locale, usually specified in the POSIX 1003.1
       based format:

       <language>[_<territory>[.<character-set>[,<version>]]]

       If the desired page is available in your locale, it will  be  displayed
       in lieu of the standard (usually American English) page.

       Support  for  international message catalogues is also featured in this
       package and can be activated in the same way, again if  available.   If
       you  find  that  the  manual pages and message catalogues supplied with
       this package are not available in your native language  and  you  would
       like  to supply them, please contact the maintainer w

              variables, possibly including $LC_MESSAGES and $LANG.  To tempo‐
              rarily  override the determined value, use this option to supply
              a locale string directly to man.  Note that  it  will  not  take
              effect  until the search for pages actually begins.  Output such
              as the help message will always be displayed  in  the  initially
              determined locale.

       -m system[,...], --systems=system[,...]
              If  this  system  has  access to other operating system's manual
              pages, they can be accessed using this option.  To search for  a
              manual  page from NewOS's manual page collection, use the option
              -m NewOS.

              The system specified can be a  combination  of  comma  delimited
              operating system names.  To include a search of the native oper‐
              ating system's manual pages, include the system name man in  the
              argument st

              man sets the -ix8 options.

              The $MANLESS environment variable described below may be used to
              set  a  default prompt string if none is supplied on the command
              line.

       -7, --ascii
              When viewing a pure ascii(7) manual page on a 7 bit terminal  or
              terminal  emulator,  some  characters  may not display correctly
              when using the latin1(7)  device  description  with  GNU  nroff.
              This  option  allows  pure ascii manual pages to be displayed in
              ascii with the latin1 device.  It will not translate any  latin1
              text.   The  following  table  shows the translations performed:
              some parts of it may only be displayed properly when  using  GNU
              nroff's latin1(7) device.

              Description      Octal   latin1   ascii
              ────────────────────────────────────────
              continuation      255      ‐        -
     

              option  (so any occurrences of the text $MAN_PN will be expanded
              in the same way).  For example, if you want to  set  the  prompt
              string  unconditionally  to  “my prompt string”, set $MANLESS to
              ‘-Psmy prompt string’.  Using the -r option overrides this envi‐
              ronment variable.

       BROWSER
              If  $BROWSER is set, its value is a colon-delimited list of com‐
              mands, each of which in turn is used  to  try  to  start  a  web
              browser  for  man  --html.  In each command, %s is replaced by a
              filename containing the HTML output from groff, %%  is  replaced
              by a single percent sign (%), and %c is replaced by a colon (:).

       SYSTEM If  $SYSTEM  is  set,  it will have the same effect as if it had
              been specified as the argument to the -m option.

       MANOPT If $MANOPT is set, it will be parsed prior to man's command line
              and 

### Files and Directories (and useful commands):

Directories in Linux follow a "tree" organization. The directories are organized as follows:


![directory_tree.png](attachment:directory_tree.png)


(image credit: Ppgardne on Wikipedia)

The two most important directories, for a user, are:

- The "root" directory: It is the root of the tree. It is denoted by `/`.

- The `home` directory: It contains the user files. This is where you will work. Your home directory is `/home/<username>`. There may be many users in a system, each of whom has their own directory within `/home`. This is part of the multiuser nature of Unix-like systems.

When you open a shell, it will be in *your* home directory.

* Use the `pwd` command to print the current-working directory.
* Use the `cd <directory>` command to change directory to `<directory>`.
* If you are not in your home directory, just type `cd` or `cd ~` to return to it.
* **Tip:** Type `cd -` to return to the previous directory.

In [1]:
pwd

/home/gdamke/astro/teaching/2020B_PythonAstro/IntroPythonAstroPhD/01_Unix_and_BASH


In [2]:
cd
pwd

/home/gdamke


In [3]:
cd -

/home/gdamke/astro/teaching/2020B_PythonAstro/IntroPythonAstroPhD/01_Unix_and_BASH


### Important commands for files and directories

This list contains the commands that you **must** know to manipulate files.

* `mkdir`: create a directory.
* `rm`: remove a file or directory. *USE WITH CAUTION, DELETED FILES ARE USUALLY NOT RECOVERABLE*.
   * To remove a directory, you need to pass the *recursive* or `-r` option.
* `mv`: move a file or directory.
   * Used also for renaming a file!
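
The commands above can be tried safely with a short sketch like the following. It works inside a throwaway directory created with `mktemp`, and all file and directory names are hypothetical:

```shell
# Work inside a temporary directory so nothing important can be deleted.
workdir=$(mktemp -d)
cd "$workdir"

mkdir data                       # create a directory
touch data/notes.txt             # create an empty file
mv data/notes.txt data/log.txt   # rename the file
mv data archive                  # rename (move) the directory
ls archive                       # shows: log.txt

rm archive/log.txt               # remove a single file
mkdir -p archive/old/older       # -p also creates missing parent directories
rm -r archive                    # remove a directory and everything inside it
```

Note that `rm -r` removes the whole directory tree without asking, so it is worth checking the path twice before pressing Enter.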

### The .bashrc file and BASH aliases


The `~/.bashrc` file is executed automatically every time you open a terminal. On many systems, it looks for a file called `~/.bash_aliases`, where you can define personal aliases, and loads it if it exists.
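
On Debian-based distributions, for example, the default `~/.bashrc` typically contains lines like the following. This is a sketch of the usual pattern, and your own file may differ:

```shell
# Load personal aliases if the file exists.
# The "." command reads the file and runs its lines in the current shell.
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
```

If your `~/.bashrc` has no such lines, aliases can also be written directly in `~/.bashrc`.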

#### What are aliases?

Aliases are abbreviations for commands that we type often. They also allow us to set some arguments to commands by default. For example, some of my aliases are:



In [8]:
cat ~/.bash_aliases | head -n 8


# Some useful aliases:
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias research='cd ~/astro/research'
alias lc='ls --color=auto'
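
An alias can also be defined by hand in the current session. In this minimal sketch, the alias name `ll` is a hypothetical choice and not one of the aliases listed above:

```shell
# Define an alias for the current shell session only.
alias ll='ls -la'

# Show how the alias is defined.
definition=$(alias ll)
echo "$definition"

# Remove the alias again.
unalias ll
```

While the alias is defined, typing `ll` at an interactive prompt runs `ls -la`. Aliases defined this way disappear when the terminal is closed, which is why they are usually saved in `~/.bash_aliases`. If a command such as `rm` is aliased to `rm -i`, typing `\rm` runs the original command without the alias.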



### Pipes and redirections, and more commands.

Unix was designed to work with text (basically, everything is text). Let's review some commands that are useful for working with text.

`cat`: concatenate text.

`head`: print the first `n` lines of a file.

`tail`: print the last `n` lines of a file.

`paste`: join two files line-by-line horizontally.
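
A short sketch of these four commands, run on two small files created only for illustration (the file names and contents are made up for this example):

```shell
# Work inside a temporary directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create two small example files, one entry per line.
printf 'M31\nM33\nM51\nM81\nM87\n' > names.txt
printf 'spiral\nspiral\nspiral\nspiral\nelliptical\n' > types.txt

cat names.txt                 # print the whole file
head -n 2 names.txt           # first two lines: M31, M33
tail -n 2 names.txt           # last two lines: M81, M87
paste names.txt types.txt     # both files side by side, separated by a tab
cat names.txt types.txt       # both files, one after the other (10 lines)
```

When `head` or `tail` is called without `-n`, it prints 10 lines by default.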


There are three (well, two) very powerful tools in the terminal:

`|`: a pipe passes the output of one command to the input of another command.

`>`: redirect the output of a command to a **new file, or overwrite the existing file**.

`>>`: append the output of a command to an existing file.
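
The three operators can be sketched as follows. The file name `notes.txt` is hypothetical, and `sort` (not covered above) is used only to give the pipe something to do:

```shell
# Work inside a temporary directory.
workdir=$(mktemp -d)
cd "$workdir"

# ">" creates the file, or overwrites it if it already exists.
echo "first line" > notes.txt
echo "replaced" > notes.txt       # notes.txt now holds only "replaced"

# ">>" appends to the end of the file instead.
echo "appended" >> notes.txt      # notes.txt now has two lines

# "|" connects commands: sort the lines, then keep only the first one.
printf 'M87\nM31\nM51\n' | sort | head -n 1     # prints: M31
```

Because `>` silently overwrites, it is easy to lose a file by mistake. When in doubt, use `>>` or write to a new file name.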

In [10]:
ls

class1_BASH.ipynb  Images  RC3_catalog.txt


How many lines are in the file `RC3_catalog.txt`?

There is a Unix command for that! Check the `wc` command. What is the output?

In [12]:
wc RC3_catalog.txt

  23016  195496 1541906 RC3_catalog.txt
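
By default, `wc` prints three numbers: lines, words and bytes. The catalog therefore has 23016 lines. The sketch below shows how to ask for each number separately, using a tiny file created only for this example:

```shell
# Work inside a temporary directory with a two-line example file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'NGC 224 spiral\nNGC 4486 elliptical\n' > sample.txt

wc sample.txt        # 2 lines, 6 words, 35 bytes, then the filename
wc -l sample.txt     # only the line count
wc -w sample.txt     # only the word count
wc -c sample.txt     # only the byte count
wc -l < sample.txt   # reading from standard input omits the filename
```

The last form is convenient when only the number is wanted, for example to store it in a variable.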


### Exercises:

* Count the number of files in a directory.
* Create a list of all the files in a directory.
* Create a list of all the filenames that begin with a number.
* Get the first 100 galaxies and the last 100 galaxies in the catalog, and put them in a new text table.

### Wildcards

The second and third exercises are probably not easy to do unless you know about wildcards.

Let's review some of the most useful *wildcards*, which are expressions that the shell interprets and expands to match filenames.

* Use `*` to match any sequence of characters (that is, "all").
* Use brackets to specify a set or range of characters, such as `[0-9]` for any single digit.
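
A minimal sketch of both wildcards, using hypothetical filenames in a throwaway directory. `echo` is used so that we see exactly what the shell expands each pattern to before any command runs:

```shell
# Work inside a temporary directory with a few empty example files.
workdir=$(mktemp -d)
cd "$workdir"
touch img1.fits img2.fits imgA.fits notes.txt

echo *.fits          # img1.fits img2.fits imgA.fits
echo img[0-9].fits   # img1.fits img2.fits
echo img[12A].fits   # brackets can also list individual characters
echo *               # every file in the directory
```

The same patterns work with any command that takes filenames, such as `ls`, `cat` or `wc`.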

### Some other useful commands and tips:

Some other useful tips:

* Programs can be run in the foreground or in the background. Add an ampersand `&` at the end of a command to run it in the background and keep the prompt.
* Use "ctrl + z" to stop (suspend) a process. Then you can use the command `bg` to resume it in the background.
* Use "ctrl + c" to cancel or quit a command.
* Use `jobs` to list the programs being run in the terminal.
* Use `top` to list all running processes.
* Use `kill -9 <process_id>` to kill a non-responsive process.
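
Some of these tips can be sketched in a few lines. Here `sleep 30` is only a stand-in for a real long-running program:

```shell
# Start a command in the background; the prompt returns immediately.
sleep 30 &
pid=$!            # $! holds the process ID of the last background command

jobs              # list the background jobs of this shell

kill "$pid"       # ask the process to terminate (sends the TERM signal)

# Wait for the process to finish; the status is non-zero because it was killed.
wait "$pid" 2>/dev/null || true
```

A plain `kill` gives the program a chance to clean up before exiting. `kill -9` cannot be caught by the program, so it is best kept as a last resort for processes that do not respond.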

Another useful command is `wget`. It allows you to download a file from the Internet given its URL.

For example, to download the latest Miniconda Python distribution, you can do:

`wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh`



If enough time: "tarballs", packing, compressing and decompressing files with tar and gzip.
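
A minimal sketch of packing and compressing with `tar` and `gzip`, assuming hypothetical directory and file names in a throwaway directory:

```shell
# Work inside a temporary directory with one small example file.
workdir=$(mktemp -d)
cd "$workdir"
mkdir spectra
echo "example" > spectra/obs1.txt

# Pack and compress a directory: c = create, z = gzip, f = archive filename.
tar -czf spectra.tar.gz spectra

# List the contents without extracting: t = list.
tar -tzf spectra.tar.gz

# Extract into another directory: x = extract, -C = target directory.
mkdir restored
tar -xzf spectra.tar.gz -C restored

# gzip compresses a single file in place (obs1.txt becomes obs1.txt.gz),
# and gunzip reverses it.
gzip spectra/obs1.txt
gunzip spectra/obs1.txt.gz
```

A "tarball" is the resulting `.tar.gz` file: `tar` packs many files into one archive, and `gzip` compresses that archive.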