# Linux

**Linux** is an open source, **Unix-like operating system**. Because it is open source, many different versions of it exist. These versions are called **distributions**, often abbreviated to **distros**. Some popular Linux distributions are **Debian**, **Red Hat**, **Fedora**, and **Ubuntu**.

The portion of the operating system that handles the computer resources (CPU, RAM and devices) is referred to as the **kernel**. Strictly speaking, Linux is the kernel; when people refer to the Linux operating system they usually mean the kernel together with the accompanying set of tools and libraries, many of which come from the **GNU** project.

## Shell

The **Command Line Interface** (**CLI**) is one way to interact with our computer, and the program that provides it is called a **shell**. The default shell on Linux systems is usually **Bash**. Other shells include the Bourne shell (**sh**), the Korn shell (**ksh**), **tcsh**, **zsh**, and **fish**.

A **shell** is a powerful user interface for **Unix-like operating systems**. It can interpret commands and run other programs. It also enables access to files, utilities, and applications, and is an interactive scripting language. Additionally, we can use a shell to automate tasks. Linux shell commands are used for navigating and working with files and directories. We can also use them for file compression and archiving.


The **command line prompt** looks something like this:
```bash
shaur@JAMWINE:~/workspace$
```
* The first part of the command prompt, `shaur`, is the **username**, followed by the **@** symbol.
* The second part, `JAMWINE`, after the **@** sign is the **hostname** (or **machine name**). On cloud instances, it is often generated automatically, for example from a unique combination of words.
* The information after the **colon** is an abbreviation of the current working directory (here `~/workspace`).
* The command line prompt ends with a **$** for a regular user (the root user typically sees a **#** instead).

## Basic Linux and Bash Commands

* The **~** or the **tilde symbol** represents the **home directory**.


* **Dots** are shorthand directory names that help with navigating the file system using the CLI. In an `ls -a` listing:
    * **.** refers to the **current directory**.
    * **..** refers to the **parent directory**.


* A **relative path** is interpreted relative to the **current working directory**.


* An **absolute path** is represented as starting from the **root directory** by leading with the **/**.
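
The path shorthands above can be tried out safely in a throwaway directory. This is a minimal sketch; the directory names `projects` and `notes` are made up for illustration.

```bash
#!/usr/bin/env bash
# Work inside a temporary directory so no real files are touched.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/projects/notes"

cd "$demo_dir/projects/notes"    # absolute path: starts at the root directory /
cd ..                            # relative path: .. is the parent directory
parent_name=$(basename "$PWD")   # projects
cd ./notes                       # relative path: . is the current directory
child_name=$(basename "$PWD")    # notes
cd ~                             # ~ expands to the home directory

echo "parent=$parent_name child=$child_name now_in=$PWD"
rm -r "$demo_dir"                # clean up
```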


### Getting information

#### return your user name
`whoami`

#### return your user and group id
`id`

#### return kernel name, hostname, kernel version, and other system info
`uname -a`

#### display reference manual for a command
`man top`

#### get help on a command
`curl --help`

#### return the current date and time
`date`

#### find out the path of the command 'bash'
`which bash`

### Navigating and working with directories

#### A regular sorted list of files in the directory
`ls`

#### A more complete list with permissions
`ls -l`

#### A list that includes hidden files
`ls -a`

#### A comma separated list
`ls -m`

#### A recursive list that includes subdirectories and their contents
`ls -R`

#### list files and directories by date, newest last
`ls -lrt`

#### return present working directory
`pwd`

#### make a new directory
`mkdir new_folder`

#### change the current directory: up one level
`cd ..`

#### change the current directory: home
`cd ~` or `cd`

#### change the current directory: some other path
`cd another_directory`

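#### remove directory and its contents (recursive)
`rm -r temp_directory`

* `rmdir temp_directory` removes a directory only if it is empty; it has no `-r` option.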

#### find files in home directory with suffix 'sh'
`find ~/ -name '*.sh'`

#### find files in home directory that are greater in size than 100 bytes
To specify bigger than, we need to put a **+** in front of the 100. The **c** suffix means bytes; without a suffix, `find` counts in 512-byte blocks.

`find ~/ -size +100c`

* We may specify **k** for **kilobytes**, **M** for **megabytes** next to the size (e.g. `find ~/ -size +10k`).
* For a range, we can specify both a minimum and maximum file size.

#### find files between 10 and 50 bytes
`find ~/ -size +10c -size -50c`

#### Find files or directories in ~/workspace/code that have been modified in the last year
`find ~/workspace/code -mtime -365`
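
To see the `find` tests in action without scanning a real home directory, the sketch below creates three files of known size in a temporary directory. The file names are hypothetical, and the `c` suffix is used so that sizes are compared in bytes.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
head -c 5   /dev/zero > "$demo_dir/tiny.dat"    # 5 bytes
head -c 30  /dev/zero > "$demo_dir/small.sh"    # 30 bytes
head -c 200 /dev/zero > "$demo_dir/large.dat"   # 200 bytes

# Quote the pattern so the shell passes it to find unexpanded.
scripts=$(find "$demo_dir" -type f -name '*.sh')

# Larger than 10 bytes and smaller than 50 bytes.
mid_sized=$(find "$demo_dir" -type f -size +10c -size -50c)

# Larger than 100 bytes.
big=$(find "$demo_dir" -type f -size +100c)

echo "scripts:   $(basename "$scripts")"
echo "mid-sized: $(basename "$mid_sized")"
echo "big:       $(basename "$big")"
rm -r "$demo_dir"
```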

### Working with files

#### copy a file
`cp file.txt new_path/new_name.txt`

#### copy a directory
`cp -r my_directory/ new_path/new_copied_directory`

#### change file name or path
`mv this_file.txt that_path/that_file.txt`

#### move all our .txt files into the code directory
`mv *.txt code`

#### remove a file verbosely
`rm -v this_old_file.txt`

#### create an empty file, or update existing file's timestamp
`touch a_new_file.txt`

#### change/modify file permissions to 'execute' for all users
`chmod +x my_script.sh`

#### change/modify file permissions to add the execute permission for the owner only
`chmod u+x my_script.sh`

#### get count of lines in file
`wc -l table_of_data.csv`

#### get count of words in file
`wc -w my_essay.txt`

#### get count of characters in file
`wc -m some_document.txt`

#### return lines matching a pattern from files matching a filename pattern - whole words only
`grep -w 'hello' *.txt`

#### return lines matching a pattern from files matching a filename pattern - case insensitive and whole words only
`grep -iw 'hello' *.txt`

#### return file names with lines matching the pattern 'hello' from files matching a filename pattern
`grep -l 'hello' *.txt`

#### return lines that do not contain printf in the c files in the code/src/ directory
`grep -v 'printf' code/src/*.c`
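
The `grep` options above are easiest to compare on a tiny input. The following sketch uses two made-up files, `a.txt` and `b.txt`, and the `-c` option to count matching lines instead of printing them.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
printf 'hello world\nHello there\nothello\n' > "$demo_dir/a.txt"
printf 'nothing to see\n' > "$demo_dir/b.txt"
cd "$demo_dir"

whole_word=$(grep -w 'hello' a.txt)    # only "hello world"; "othello" is not a whole word
any_case=$(grep -iwc 'hello' a.txt)    # counts "hello world" and "Hello there"
files=$(grep -l 'hello' *.txt)         # names of files with at least one match
inverted=$(grep -vc 'hello' a.txt)     # counts lines that do not contain "hello"

echo "whole_word=$whole_word any_case=$any_case files=$files inverted=$inverted"
cd - > /dev/null
rm -r "$demo_dir"
```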

#### provide access information about a file
`stat sample.txt`

#### provide information about the file system that holds a file
`stat -f sample.txt`

### Printing file and string contents

#### print file contents
`cat my_shell_script.sh`

#### print file contents page-by-page, displays text by one screen at a time
`more ReadMe.txt`

`more -d ReadMe.txt`

* The `-d` parameter instructs the more command to put a prompt at the bottom telling you to press space for more text or q to quit.

#### displays text by allowing scrolling
`less ReadMe.txt`

#### print first N lines of file
`head -10 data_table.csv`

#### print last N lines of file
`tail -10 data_table.csv`

#### print string
`echo "I am not a robot"`

#### print variable value
`echo "I am $USER"`

---
The **printf** command formats and prints data. Here is a summary of the common parameters used with `printf`:

* `%d` integer number printed in decimal 
* `%f` floating point number 
* `%c` character 
* `%s` string 

Unlike `echo`, `printf` does not add a newline automatically at the end of the output. In Bash, the `-v` option of `printf` lets us optionally assign the result to a variable rather than printing it to the screen (which is useful in Bash scripts).

#### print a formatted string
`printf "%s got %s wrong answer(s)\n" "Jane" "1"`

#### Print the result of a math equation
`printf "%d\n" $((8+4))`

#### Print only the first 4 digits beyond the decimal point of a floating point number
`printf "%.4f\n" 3.1415926535`
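
As a small illustration of storing formatted output in a variable, the sketch below uses `printf -v`, which is specific to Bash. The variable names `report` and `pi_short` are arbitrary, and the locale is set to `C` on the assumption that a dot should be the decimal separator.

```bash
#!/usr/bin/env bash
export LC_ALL=C   # assumption: use a dot as the decimal separator

# -v stores the formatted text in a variable instead of printing it.
printf -v report "%s got %d wrong answer(s)" "Jane" 1
printf -v pi_short "%.4f" 3.1415926535   # rounds to four decimal places

# The format string is reused for each remaining argument.
printf '%s\n' "$report" "$pi_short"
```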



### Compression and archiving

#### archive a set of files
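`tar -cvf my_archive.tar file1 file2 file3`

* Without `-z` the archive is not compressed. To archive and compress with gzip in one step, use `tar -czvf my_archive.tar.gz file1 file2 file3`.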

#### compress a set of files
`zip my_zipped_files.zip file1 file2`

`zip -r my_zipped_folders.zip directory1 directory2`

#### extract files from a compressed zip archive
`unzip my_zipped_file.zip`

`unzip my_zipped_file.zip -d extract_to_this_directory`
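
A full round trip with `tar` might look like the sketch below: create a gzip-compressed archive, list it, and extract it elsewhere. The file and directory names are placeholders, and `-C` is used so the commands work without changing directory.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
echo 'alpha' > "$demo_dir/file1"
echo 'beta'  > "$demo_dir/file2"

# c = create, z = gzip, f = archive file name; -C switches directory first.
tar -czf "$demo_dir/packed.tar.gz" -C "$demo_dir" file1 file2

# t = list the archive contents without extracting.
listing=$(tar -tzf "$demo_dir/packed.tar.gz")

# x = extract, here into a separate directory.
mkdir "$demo_dir/restored"
tar -xzf "$demo_dir/packed.tar.gz" -C "$demo_dir/restored"
restored_text=$(cat "$demo_dir/restored/file1")

echo "$listing"
echo "restored file1 contains: $restored_text"
rm -r "$demo_dir"
```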

### Performing network operations

#### print hostname
`hostname`

#### send packets to a host and print the response
`ping www.google.com`

#### display or configure system network interfaces
`ifconfig`

`ip addr`

#### display contents of file at a URL
`curl <url>`

#### download file from a URL
`wget <url>`

### Pipes and Filters

#### chain filter commands using the pipe operator
`ls | sort -r`

#### pipe the output of manual page for ls to head to display the first 20 lines
`man ls | head -20`

### Shell and Environment Variables

#### list all shell variables
`set`

#### define a shell variable called my_planet and assign value Earth to it
`my_planet=Earth`

#### display shell variable
`echo $my_planet`

#### list all environment variables
`env`

#### environment vars: define/extend variable scope to child processes
`export my_planet`

`export my_galaxy='Milky Way'`
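
The difference between a shell variable and an environment variable shows up when a child process is started. In this sketch a child `bash` is asked for `my_planet` before and after `export`; `unset` is a placeholder printed when the child cannot see the variable.

```bash
#!/usr/bin/env bash
unset my_planet
my_planet=Earth   # shell variable: visible only in this shell

# The single quotes keep this shell from expanding the variable itself.
in_child_before=$(bash -c 'echo "${my_planet:-unset}"')

export my_planet  # now part of the environment passed to child processes
in_child_after=$(bash -c 'echo "${my_planet:-unset}"')

echo "before export: $in_child_before"
echo "after export:  $in_child_after"
```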

### Metacharacters

* **\*** represents **any number of characters**
* **?** represents **any single character**
* **\[ \]** represents a **set or range** of characters, such as `[1-3]` or `[123]`

#### comments
`#The shell will not respond to this message`

#### command separator
`echo 'here are some files and folders'; ls`

#### file name expansion wildcard
`ls *.json`

#### single character wildcard
`ls file_2021-06-??.json`
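
The wildcards can be checked against a few empty files. This sketch assumes hypothetical file names in the same style as the example above and uses `echo` so that the shell's expansion is visible directly.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch file_2021-06-01.json file_2021-06-15.json file_2021-07-01.json notes.txt

all_json=$(echo *.json)                          # * matches any number of characters
june=$(echo file_2021-06-??.json)                # each ? matches exactly one character
first_of_month=$(echo file_2021-0[6-7]-01.json)  # [6-7] matches one character in the range

echo "all:   $all_json"
echo "june:  $june"
echo "first: $first_of_month"
cd - > /dev/null
rm -r "$demo_dir"
```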

### Quoting

#### single quotes - interpret literally
`echo 'My home directory can be accessed by entering: echo $HOME'`

#### double quotes - interpret literally, but expand variables and command substitutions
`echo "My home directory is $HOME"`

#### backslash - escape metacharacter interpretation
`echo "This dollar sign should render: \$"`
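
The three quoting styles can be compared side by side by storing the results in variables. The variable `fruit` is just an example name.

```bash
#!/usr/bin/env bash
fruit=apple

single='I like $fruit'   # single quotes: $fruit stays as literal text
double="I like $fruit"   # double quotes: $fruit is replaced by its value
escaped="Price: \$5"     # backslash: the dollar sign is kept literally

printf '%s\n' "$single" "$double" "$escaped"
```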

### I/O Redirection

The **>** character or **stdout redirection operator** changes the **stdout** to be a file with the specified name rather than the console.


The **>> redirection operator** does an append rather than an overwrite. It will create the file if it doesn’t exist; if the file does exist, the output is appended to it.

#### redirect output to file
`echo 'Write this text to file x' > x.txt`

#### append output to file
`echo 'Add this line to file x' >> x.txt`

#### redirect standard error to file
`bad_command_1 2> error.log`

#### append standard error to file
`bad_command_2 2>> error.log`

#### redirect file contents to standard input
`tr "[a-z]" "[A-Z]" < a_text_file.txt`

#### the input redirection above is equivalent to
`cat a_text_file.txt | tr "[a-z]" "[A-Z]"`
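
The redirection operators can be exercised together in a temporary directory. In this sketch the file names are placeholders, and `ls` on a missing file stands in for a failing command that writes to standard error.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)

echo 'first line'  >  "$demo_dir/x.txt"   # > creates or overwrites the file
echo 'second line' >> "$demo_dir/x.txt"   # >> appends to it

# 2> sends standard error to a file; there is no space between 2 and >.
ls "$demo_dir/no_such_file" 2> "$demo_dir/error.log"
if [ -s "$demo_dir/error.log" ]; then captured_error=yes; else captured_error=no; fi

# < feeds the file to the command's standard input.
upper=$(tr 'a-z' 'A-Z' < "$demo_dir/x.txt")

echo "$upper"
echo "error captured: $captured_error"
rm -r "$demo_dir"
```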

### Command Substitution

#### capture output of a command and echo its value
`THE_PRESENT=$(date)`

`echo "There is no time like $THE_PRESENT"`
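
Command substitutions can also be nested or fed from a pipeline. The sketch below uses fixed inputs so the results are predictable; the path `/usr/local/bin` is only treated as a string and does not need to exist.

```bash
#!/usr/bin/env bash
# The inner substitution runs first and its output becomes an argument of the outer command.
parent=$(basename "$(dirname /usr/local/bin)")

# The output of a whole pipeline can be captured as well.
line_total=$(printf 'a\nb\nc\n' | grep -c '')

echo "parent=$parent lines=$line_total"
```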

### Command line arguments

`./My_Bash_Script.sh arg1 arg2 arg3`
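
Inside a script, the arguments are available as positional parameters. This sketch writes a small hypothetical script to a temporary file and runs it with `bash`, which avoids depending on execute permission in the temporary directory.

```bash
#!/usr/bin/env bash
script=$(mktemp)

# The quoted 'EOF' keeps $#, $1 and $* from being expanded while the file is written.
cat > "$script" <<'EOF'
#!/usr/bin/env bash
# $# is the number of arguments, $1 is the first one, $* is all of them.
echo "count=$# first=$1 all=$*"
EOF

output=$(bash "$script" arg1 arg2 arg3)
echo "$output"
rm "$script"
```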

### Batch vs. concurrent modes

#### run commands sequentially
`start=$(date);`

`./MyBigScript.sh;`

`end=$(date)`

#### run commands in parallel
`./ETL_chunk_one_on_these_nodes.sh & ./ETL_chunk_two_on_those_nodes.sh`
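
When jobs are started in the background with `&`, the `wait` builtin is one way to pause until they have all finished. In this sketch the function `slow_task` is a made-up stand-in for a long-running script.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)

# Stand-in for a long-running job: sleeps, then records that it finished.
slow_task() {
    sleep 1
    echo "$1 finished" > "$demo_dir/$1.log"
}

slow_task chunk_one &   # & starts the job in the background
slow_task chunk_two &   # so both jobs run at the same time
wait                    # block until all background jobs are done

finished=$(cat "$demo_dir/chunk_one.log" "$demo_dir/chunk_two.log")
echo "$finished"
rm -r "$demo_dir"
```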

### Scheduling jobs with Cron

#### open crontab editor
`crontab -e`

#### job scheduling syntax (minute, hour, day of month, month, day of week)
`m h dom mon dow command`

(**\*** means any)

#### append the date/time to file every Sunday at 6:15 pm
`15 18 * * 0 date >> sundays.txt`

#### run a shell script at 00:01 on the first day of each month
`1 0 1 * * ./My_Shell_Script.sh`

#### back up your home directory every Monday at 3 am
`0 3 * * 1 tar -czvf my_backup_path/my_archive.tar.gz $HOME/`

#### deploy your cron job
- Close the crontab editor and save the file

#### list all cron jobs
`crontab -l`

### User Input

#### Wait for user to enter a name, and save the entered name into the variable 'name'
`echo -n "Enter your name :";`

`read name;`

`echo "Welcome $name"`
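
For testing, `read` can be given its input from a here-string instead of the keyboard, which is a Bash feature. The names used here are examples only.

```bash
#!/usr/bin/env bash
# -r keeps backslashes in the input as they are.
read -r name <<< "Ada"
greeting="Welcome $name"

# With several variable names, the line is split on whitespace.
read -r first last <<< "Ada Lovelace"

echo "$greeting"
echo "first=$first last=$last"
```

In an interactive script, `read -r -p "Enter your name: " name` shows the prompt and reads the answer in one step.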

### Monitoring performance and status

#### list selection of or all running processes and their PIDs
`ps`

`ps -e`

`ps au`

#### display resource usage, provides a dynamic real-time view of a running system
`top`

#### estimates and displays the disk space used by files
`du -h`

#### list mounted file systems and usage
`df -h`

#### displays the total amount of free and used memory
`free`

#### Display the amount of memory in megabytes
`free -m`

#### Display the amount of memory in gigabytes
`free -g`

### Stream Editor (sed)

The name **sed** is short for **stream editor**. The `sed` command is most often used for finding and replacing, or searching and deleting. The sed utility in its simplest form works like `grep`, where we can use regular expressions to find a string in a file. It can also provide a substitute string for the string we found, or delete every line that matches a string.

#### find a string in a file
`sed -n '/text/p' sample.txt`

* The `-n` means it will only produce output when explicitly told to via the `p` command (the default is to print each line)
* The `'/text/p'` tells it to print out lines that have the word `text` in them.

#### delete underscores in a file (replace the `_` with nothing)
`sed 's%_%%g' myfile.txt`

* `sed` is the command.
* `'s%_%%g'` is the pattern where:
    * **s** means **substitute**.
    * The **%** is the **delimiter** (almost any single character can be used here; `/` is the most common choice).
    * The **search pattern** follows the first **%**.
    * The **replace pattern** follows the second **%**.
    * The **g** stands for **global replace** - replace all occurrences in the file.
    
#### delete underscores in a file (replace the `_` with nothing) and send the output to a file instead of the console
`sed 's%_%%g' myfile.txt > newfile.txt`

#### delete underscores in a file (replace the `_` with nothing) and save it in the original file
`sed -i 's%_%%g' myfile.txt`

#### Change the name of one of the characters
`sed 's%Christopher%Chris%g' myfile.txt`

#### deleting lines
`sed '45,54d' myfile.txt`

* The `'45,54'` represents lines from 45 to 54. The comma indicates a range.
* The **d** lets `sed` know it will be a deletion

#### deleting all blank lines
`sed '/^$/d' myfile.txt`

* **^** indicates the beginning of the line 
* **$** indicates the end of the line 

(Since there is nothing between the two, it means a blank line.)

#### deleting from a specific line to the end of the file
`sed '/So I tried/,$d' myfile.txt`

#### deleting everything after a specific line and save it “in place”
`sed -i '1,/So I tried/!d' myfile.txt`

* The `-i` command line option means edit in place
* The `1,/So I tried/` specifies a range from the first line up to and including “So I tried”.
* The **!d** means things in this range will not be deleted but everything else will
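
The `sed` commands above can be tried on a small sample without modifying any file in place. The sample text in this sketch is invented, and `grep -c ''` is used only to count the lines that remain.

```bash
#!/usr/bin/env bash
sample=$(mktemp)
printf 'snake_case_name\n\nChristopher Robin\nSo I tried\nleftover\n' > "$sample"

# Substitute every underscore with nothing; show the first line of the result.
no_underscores=$(sed 's%_%%g' "$sample" | head -n 1)

# -n with the p flag prints only the lines where a substitution happened.
renamed=$(sed -n 's/Christopher/Chris/p' "$sample")

# Delete blank lines, then count what is left.
non_blank=$(sed '/^$/d' "$sample" | grep -c '')

# Keep line 1 through the first line matching "So I tried"; delete the rest.
last_kept=$(sed '1,/So I tried/!d' "$sample" | tail -n 1)

echo "$no_underscores | $renamed | $non_blank | $last_kept"
rm "$sample"
```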

### awk

The **awk** command provides a way to search for a pattern, and perform actions on the found text. `awk` reads lines from the input file one at a time. The line is scanned for each pattern in the program and if there is a match the associated action is executed.

Either the pattern or the action can be omitted, but not both.
* If the pattern is omitted, then the action is performed for every line.
* If the action is omitted, then all lines that match the pattern will be printed out.

A few built in AWK variables are:

* `$0` The entire line - not including the newline at the end
* `$1..$n` The fields in a line (delimited by the field separator)
* `FNR` Current line number - just spans the current file
* `FS` Field separator - default is space
* `NF` Number of fields
* `NR` Current line number - spans multiple files
* `RS` Record separator - the default is newline


AWK special patterns are:
* `BEGIN` Startup actions
* `END` Cleanup actions

#### count the number of lines containing the word “Pooh” in the text
`awk '/Pooh/{x++}END{print x}' myfile.txt`

* `awk` is the command
* `/Pooh/` is the pattern
* `{x++}` is the action run for each matching line, and `END{print x}` prints the count after the last line has been read

#### number of words in a file
`awk '{ total = total + NF }; END {print total}' myfile.txt`

#### number of lines in a file
`awk 'END{print NR}' myfile.txt`

#### pull out items in field separated data (such as csv)
`awk -F"," '{if (NR!=1){ print $1 " wrong answers " $5 " out of total " $8 }}' data.csv`

* The option `-F` sets the field separator, the default is space. 
* `$` specifies which column is required (such as `$5`)

#### print out just the second column of the file
`awk -F"," '{print $2}' data.csv`

#### Print out the number of answered assessments for “Jane Smith”
`awk -F"," '/Jane Smith/{print $6}' data.csv`
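
The pattern and action structure is easier to follow on a small data set. The CSV below is made up for this sketch (a header row plus three records), so the column meanings are assumptions and not those of the `data.csv` used above.

```bash
#!/usr/bin/env bash
data=$(mktemp)
cat > "$data" <<'EOF'
name,score,city
Jane Smith,7,Lyon
Sam Jones,9,Oslo
Jane Smith,4,Rome
EOF

# Skip the header row (NR is the current line number) and print the third field.
cities=$(awk -F',' 'NR != 1 { print $3 }' "$data")

# Add up the second field on lines matching the pattern; print the sum at the end.
jane_total=$(awk -F',' '/Jane Smith/ { sum += $2 } END { print sum }' "$data")

# In the END block, NR holds the total number of lines read.
row_count=$(awk 'END { print NR }' "$data")

echo "$cities"
echo "jane_total=$jane_total rows=$row_count"
rm "$data"
```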

## Shell Scripting

The `.sh` extension is a convention used to indicate that the file is a shell script.

A **shell script** is an executable text file in which the first line usually has the form of an interpreter directive. The **interpreter directive** is also known as a `shebang directive`, and has the following form:
```bash
#!interpreter [optional argument]
```
**Interpreter** is an absolute path to an `executable program`, and the **optional argument** is a string representing a `single argument`.

Shell scripts are scripts that invoke a shell program. For example:
* `#!/bin/sh` invokes the Bourne shell or other compatible shell program, from the bin directory.
* `#!/bin/bash` aka the **bash shebang** invokes the Bash shell. 

**Shebang directives** aren't limited to shell programs. For example, we could create a python script with the following directive:

* `#!/usr/bin/env python3`

## Filesystem Hierarchy Standard (FHS)

The **Filesystem Hierarchy Standard** ensures that software packages running on a Linux system will know where to find essential files and directories.

To view the contents of the root directory, type:
```bash
cd /
ls
```
The contents of the root directory include:
* **/bin** : Binaries or executables that are essential for functionality
* **/boot** : Files needed to boot the system such as the Linux kernel
* **/dev** : Device files - interface with hardware drivers
* **/etc** : Host-specific system configuration - editable text
* **/home** : User directories live under here
* **/lib** : Common libraries
* **/lib64** : Common 64 bit libraries
* **/media** : Mount point for removable media
* **/mnt** : Mount point for mounting a filesystem temporarily
* **/opt** : Optional add on software
* **/proc** : Keeps track of running processes
* **/root** : Home directory for root user
* **/run** : Data relevant to running processes
* **/sbin** : System binaries or executables that are essential for functionality
* **/srv** : Data for services provided by this system
* **/sys** : Virtual file system that exposes kernel information about devices and drivers
* **/tmp** : Temporary files that won't be persistent between reboots
* **/var** : Variable files - things that will change as the operating system is being run such as logs and cache files


The **/bin directory** contains these commands:

| Command | Description |
|---|---|
|cat |Concatenate files to standard output|
|chgrp| Change file group ownership|
|chmod| Change file access permissions|
|chown| Change file owner and group|
|cp| Copy files and directories|
|date| Print or set the system date and time|
|dd| Convert and copy a file|
|df |Report filesystem disk space usage|
|dmesg |Print or control the kernel message buffer|
|echo |Display a line of text|
|false |Do nothing, unsuccessfully|
|hostname |Show or set the system’s host name|
| kill | Send signals to processes|
|ln |Make links between files|
|login |Begin a session on the system|
|ls |List directory contents|
|mkdir |Make directories|
|mknod |Make block or character special files|
|more |Page through text|
|mount |Mount a filesystem|
|mv |Move/rename files|
|ps |Report process status|
|pwd |Print name of current working directory|
|rm |Remove files or directories|


Filters are shell commands, and the pipe operator allows you to chain filter commands.

Command substitution is used to replace a command with its output.