# Linux

**Linux** is an open source, **Unix-like operating system**. The open source nature of this operating system has led to many different versions of it. These versions are called **distributions**, often abbreviated as **distros**. Some popular Linux distributions are **Debian**, **Red Hat**, **Fedora**, and **Ubuntu**.

The portion of the operating system that handles the computer resources (CPU, RAM and devices) is referred to as the **kernel**. When people refer to the Linux operating system they usually mean the kernel together with the accompanying set of tools and libraries, many of which come from the **GNU** project.

## Shell

The **Command Line Interface** (**CLI**), also known as **Shell**, is one way to interact with our computer. The default shell on Linux systems is usually **Bash**. Other shells include the Bourne shell (**sh**), the Korn shell (**ksh**), **tcsh**, **zsh**, and **fish**.

A **shell** is a powerful user interface for **Unix-like operating systems**. It can interpret commands and run other programs. It also enables access to files, utilities, and applications, and is an interactive scripting language. Additionally, we can use a shell to automate tasks. Linux shell commands are used for navigating and working with files and directories. We can also use them for file compression and archiving.


The **command line prompt** looks something like this:
```bash
shaur@JAMWINE:~/workspace$
```
* The first part of the command prompt `shaur` is the **username** followed by the **@** symbol.

* The second part of the command prompt `JAMWINE` is the **server address** (or **machine name**) after the **@** sign. On cloud instances, it is randomly specified using a unique combination of words. 

* The information after the **colon** is an abbreviation of the current working directory.
* The command line prompt ends with a **$**.

## Basic Linux and Bash Commands

* The **~** or the **tilde symbol** represents the **home directory**.


* **Dots** are special directory entries that help with navigating the file system from the CLI. In an `ls -a` listing:
    * **.** refers to the **current directory**.
    * **..** refers to the **parent directory**.


* A **relative path** is interpreted relative to the **current working directory**.


* An **absolute path** is represented as starting from the **root directory** by leading with the **/**.
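
The difference between relative and absolute paths can be sketched as follows. This is a minimal illustration that works inside a throwaway directory; the names `base`, `project` and `src` are made up for the example.

```bash
# Build a scratch directory tree so the example does not touch real files.
base=$(mktemp -d)
mkdir -p "$base/project/src"

cd "$base/project/src"       # absolute path: starts from the root directory /
here=$(pwd)

cd ..                        # relative path: .. is the parent directory
parent=$(pwd)

cd ./src                     # relative path: . is the current directory
back=$(pwd)

echo "started in: $here"
echo "parent:     $parent"
echo "back in:    $back"
```

After the last `cd`, `back` holds the same directory as `here`, reached once through an absolute path and once through relative ones.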


### Getting information

#### return your user name
`whoami`

#### return your user and group id
`id`

#### return operating system name, hostname, kernel version, and other info
`uname -a`

#### display reference manual for a command
`man top`

#### get help on a command
`curl --help`

#### return the current date and time
`date`

#### find out the path of the command 'bash'
`which bash`

### Navigating and working with directories

#### A regular sorted list of files in the directory
`ls`

#### A more complete list with permissions
`ls -l`

#### Hidden files included
`ls -a`

#### A comma separated list
`ls -m`

#### Directories and their contents included (recursive)
`ls -R`

#### list files and directories by date, newest last
`ls -lrt`

#### return present working directory
`pwd`

#### make a new directory
`mkdir new_folder`

#### change the current directory: up one level
`cd ..`

#### change the current directory: home
`cd ~` or `cd`

#### change the current directory: some other path
`cd another_directory`

#### remove directory (recursive)
`rm -r temp_directory`

* `rmdir temp_directory` only removes a directory that is already empty.

#### find files in home directory with suffix 'sh'
`find ~/ -name '*.sh'`

#### find files in home directory that are greater in size than 100 bytes
To specify bigger than, we need to put a **+** in front of the 100. The **c** suffix means bytes; without a suffix, `find` counts in 512-byte blocks.

`find ~/ -size +100c`

* We may specify **k** for **kibibytes** (1024 bytes) or **M** for **mebibytes** next to the size (e.g. `find ~/ -size +10k`).
* For a range, we can specify both a minimum and maximum file size.

#### find files between 10 and 50 bytes
`find ~/ -size +10c -size -50c`

#### Find files or directories in ~/workspace/code that have been modified in the last year
`find ~/workspace/code -mtime -365`
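
The size and time tests above can be tried safely in a scratch directory. The sketch below is only an illustration: the file names are invented, and it uses the `c` suffix so that sizes are counted in bytes.

```bash
# Create a scratch directory with files of known sizes.
dir=$(mktemp -d)
printf 'tiny' > "$dir/small.txt"             # 4 bytes
head -c 30 /dev/zero > "$dir/medium.bin"     # 30 bytes
head -c 2048 /dev/zero > "$dir/large.bin"    # 2048 bytes

# Larger than 10 bytes and smaller than 50 bytes.
in_range=$(find "$dir" -type f -size +10c -size -50c)

# Larger than 1 KiB (find rounds sizes up to whole units before comparing).
over_1k=$(find "$dir" -type f -size +1k)

# Modified less than 24 hours ago.
recent_count=$(find "$dir" -type f -mtime -1 | wc -l)

echo "between 10 and 50 bytes: $in_range"
echo "over 1 KiB:              $over_1k"
echo "modified in last day:    $recent_count"
```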

### Working with files

#### copy a file
`cp file.txt new_path/new_name.txt`

#### copy a directory
`cp -r my_directory/ new_path/new_copied_directory`

#### change file name or path
`mv this_file.txt that_path/that_file.txt`

#### move all our .txt files into the code directory
`mv *.txt code`

#### remove a file verbosely
`rm -v this_old_file.txt`

#### create an empty file, or update existing file's timestamp
`touch a_new_file.txt`

#### change/modify file permissions to 'execute' for all users
`chmod +x my_script.sh`

####  add write permission for all users for the file sample.txt
`sudo chmod ugo+w sample.txt`

* `sudo` (**superuser do**) runs `chmod` with root privileges, which is needed when we do not own the file.
* `ugo` stands for `user`, `group` and `other`.

#### Giving different privileges to user, group, other
`chmod g-w,o-x sample.txt`

`chmod u=rwx,g=r,o=r sample.txt`

`chmod 756 sample.txt`

* Permissions can be reduced to a **3 digit number**:

<img src='imgs/permissions.png' alt='permissions' width=500 height=500>

    * The total value of 7 equals the rwx permissions (sum of the read, write and execute permission)
    * The read permission has a value of 4, the write a value of 2 and execute a value of 1 for a total of 7.
    * The total value of 5 equals the r-x, 4 for read, - is 0, 1 for execute.
    * The total value of 6 equals the rw-, 4 for read, 2 for write and 0 for execute.
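
To check the arithmetic above, we can set a mode and read it back. This is a sketch that assumes GNU `stat` (the `-c` format option), as found on most Linux distributions; the temporary file is created only for the demonstration.

```bash
f=$(mktemp)

# 7 = rwx for the user, 5 = r-x for the group, 6 = rw- for others.
chmod 756 "$f"
octal=$(stat -c '%a' "$f")       # permissions as octal digits
symbolic=$(stat -c '%A' "$f")    # permissions as an ls-style string

# The symbolic form sets the same bits by name: rw- r-- --- is 6 4 0.
chmod u=rw,g=r,o= "$f"
octal_after=$(stat -c '%a' "$f")

echo "$octal $symbolic"
echo "$octal_after"
```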


#### remove the execution permissions
`chmod u-x my_script.sh`

#### get count of lines in file
`wc -l table_of_data.csv`

#### get count of words in file
`wc -w my_essay.txt`

#### get count of characters in file
`wc -m some_document.txt`

#### return lines matching a pattern from files matching a filename pattern - whole words only
`grep -w 'hello' *.txt`

#### return lines matching a pattern from files matching a filename pattern - case insensitive and whole words only
`grep -iw 'hello' *.txt`

#### return file names with lines matching the pattern 'hello' from files matching a filename pattern
`grep -l 'hello' *.txt`

#### return lines that do not contain printf in the c files in the code/src/ directory
`grep -v 'printf' code/src/*.c`

#### provide access information about a file
`stat sample.txt`

#### provide information about the file system that holds a file
`stat -f sample.txt`

#### information about the file type
`file sample.txt`

### Printing file and string contents

#### print file contents
`cat my_shell_script.sh`

#### print file contents page-by-page, displays text by one screen at a time
`more ReadMe.txt`

`more -d ReadMe.txt`

* The `-d` parameter instructs the more command to put a prompt at the bottom telling you to press space for more text or q to quit.

#### displays text by allowing scrolling
`less ReadMe.txt`

#### print first N lines of file
`head -n 10 data_table.csv`

#### print last N lines of file
`tail -n 10 data_table.csv`

#### print string
`echo "I am not a robot"`

#### print variable value
`echo "I am $USER"`

---
The **printf** command formats and prints data. Here is a summary of the common parameters used with `printf`:

* `%d` integer number printed in decimal 
* `%f` floating point number 
* `%c` character 
* `%s` string 

Unlike `echo`, `printf` does not append a newline automatically at the end of the output. In Bash, the `-v` option lets us assign the result to a variable rather than printing it to the screen (which is useful in Bash scripts).

#### print a formatted string
`printf "%s got %s wrong answer(s)\n" "Jane" "1"`

#### Print the result of a math equation
`printf "%d\n" $((8+4))`

#### Print only the first 4 digits beyond the decimal point of a floating point number
`printf "%.4f\n" 3.1415926535`
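
As a small sketch of storing formatted output instead of printing it, Bash's `printf -v` writes the result into a named variable. The variable names `report` and `pi_short` are made up for this example.

```bash
name="Jane"
wrong=1

# -v stores the formatted text in the variable "report" instead of printing it.
printf -v report "%s got %d wrong answer(s)" "$name" "$wrong"

# %.4f keeps four digits after the decimal point (the value is rounded).
printf -v pi_short "%.4f" 3.1415926535

# No newline was added automatically, so we add one when printing.
printf '%s\n' "$report"
printf 'pi is roughly %s\n' "$pi_short"
```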



### Compression and archiving

#### gzip compression tool
* Zip the large text file: `gzip file1.txt`
* Get information about the compression using the `-l` option: `gzip -l file1.txt.gz`
* Unzip the file: `gzip -d file1.txt.gz`

The above commands are similar for **bzip2** and **xz** compression tools.

#### archive a set of files
`tar -czvf my_archive.tar.gz file1 file2 file3`

Some common **tar** flags are:
* `-c`	Create an archive
* `-f`	Use archive file
* `-r`	Append to an archive
* `-t`	List contents of an archive
* `-v`	Verbose output
* `-x`	Extract contents of an archive
* `-z`	Compress the archive using gzip

#### list the contents of the images.tar file
`tar -tvf images.tar`

#### Archive all the files in the images directory to a file named imagedir.cpio
`cd images`

`ls | cpio -o > imagedir.cpio`

`ls -l imagedir.cpio`

* The **cpio command** is used to copy files into and out of archives. The name refers to its functionality, **copy in copy out**. It is similar to `tar` but there are some distinct differences:
    * `cpio` is **not recursive** by default
    * `cpio` will **not overwrite newer data file** whereas the default for `tar` is to overwrite
    
    
* Common cpio flags are:
    * **-o**:	copy out mode copies files into an archive
    * **-i**:	copy in mode copies files out of an archive
    * **-p**:	copy pass copies files from one directory tree to another

#### Extract a compressed archive to a different directory - the directory must already exist
`mkdir extracted`

`tar -zxvf images.tar.gz -C extracted`
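
The create, list and extract steps can be combined into one round trip. The following is a minimal sketch in a scratch directory; the directory and file names are invented for the example.

```bash
work=$(mktemp -d)
mkdir "$work/images" "$work/extracted"
echo "first"  > "$work/images/a.txt"
echo "second" > "$work/images/b.txt"

# -c create, -z compress with gzip, -f name of the archive file.
# -C changes into the given directory before adding "images".
tar -czf "$work/images.tar.gz" -C "$work" images

# -t lists the contents without extracting anything.
listing=$(tar -tzf "$work/images.tar.gz")

# -x extracts; -C selects the (already existing) target directory.
tar -xzf "$work/images.tar.gz" -C "$work/extracted"
restored=$(cat "$work/extracted/images/a.txt")

echo "$listing"
echo "restored content: $restored"
```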

#### compress a set of files
`zip my_zipped_files.zip file1 file2`

`zip my_zipped_folders.zip directory1 directory2`

* The **zip** command archives and compresses a set of files. It is a popular cross-platform tool, which makes it ideal for sharing files with people on different systems. `zip` is **not recursive** by default; if we want it to zip everything in a directory, we need to specify the `-r` flag.

`zip -r my_zipped_folders.zip directory`

#### extract files from a compressed zip archive
`unzip my_zipped_file.zip`

`unzip my_zipped_file.zip -d extract_to_this_directory`



### Performing network operations

#### print hostname
`hostname`

#### send packets to a host and print response
`ping www.google.com`

#### display or configure system network interfaces
`ifconfig`

`ip addr`

#### display contents of file at a URL
`curl <url>`

#### download file from a URL
`wget <url>`

### Pipes and Filters

#### chain filter commands using the pipe operator
`ls | sort -r`

#### pipe the output of manual page for ls to head to display the first 20 lines
`man ls | head -20`
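
Longer pipelines follow the same idea: each filter reads the output of the one before it. As an illustrative sketch with made-up input words, the pipeline below finds the most frequent word in a list.

```bash
# Stages of the pipeline, in order:
#   printf    emit one word per line
#   sort      put identical lines next to each other
#   uniq -c   collapse duplicates and prefix each line with its count
#   sort -rn  order numerically, highest count first
#   head      keep only the first line
#   awk       print the second field (the word), dropping the count
top_word=$(
  printf '%s\n' pear apple pear fig apple pear |
    sort |
    uniq -c |
    sort -rn |
    head -n 1 |
    awk '{ print $2 }'
)

echo "most frequent word: $top_word"
```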

### Shell and Environment Variables

#### list all shell variables
`set`

#### define a shell variable called my_planet and assign value Earth to it
`my_planet=Earth`

#### display shell variable
`echo $my_planet`

#### list all environment variables
`env`

#### environment vars: define/extend variable scope to child processes
`export my_planet`

`export my_galaxy='Milky Way'`
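
The effect of `export` can be observed by asking a child process whether it sees the variable. This is a small sketch; the child is simply another `bash` started with `-c`.

```bash
unset my_planet                 # start from a clean state for the demo

my_planet=Earth                 # shell variable: known only to this shell
child_before=$(bash -c 'echo "${my_planet:-unset}"')

export my_planet                # now passed on to child processes
child_after=$(bash -c 'echo "${my_planet:-unset}"')

echo "before export, child sees: $child_before"
echo "after export, child sees:  $child_after"
```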

### Metacharacters

* **\*** represents **any number of characters**
* **?** represents **any single character**
* **\[ \]** represents a **range**, can be `[1-3]` or `[1,2,3]`

#### comments
`#The shell will not respond to this message`

#### command separator
`echo 'here are some files and folders'; ls`

#### file name expansion wildcard
`ls *.json`

#### single character wildcard
`ls file_2021-06-??.json`

### Quoting

#### single quotes - interpret literally
`echo 'My home directory can be accessed by entering: echo $HOME'`

#### double quotes - interpret literally, but still expand `$` variables and command substitutions
`echo "My home directory is $HOME"`

#### backslash - escape metacharacter interpretation
`echo "This dollar sign should render: \$"`
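
The three quoting styles can be compared side by side. In this sketch the variable `fruit` is a made-up example.

```bash
fruit="apple"

single='The value of $fruit'     # single quotes: $fruit stays as typed
double="The value of $fruit"     # double quotes: $fruit is expanded
escaped="The value of \$fruit"   # backslash: the $ is taken literally

echo "$single"
echo "$double"
echo "$escaped"
```

The first and third lines print the same text, because the backslash switches off the special meaning of `$` inside double quotes.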

### I/O Redirection

The **>** character or **stdout redirection operator** changes the **stdout** to be a file with the specified name rather than the console.


The **>> redirection operator** does an append rather than an overwrite. It will create the file if it doesn’t exist, but if it does, it will append to it.

#### redirect output to file
`echo 'Write this text to file x' > x.txt`

#### append output to file
`echo 'Add this line to file x' >> x.txt`

#### redirect standard error to file
`bad_command_1 2> error.log`

#### append standard error to file
`bad_command_2 2>> error.log`

#### redirect file contents to standard input
`tr "[a-z]" "[A-Z]" < a_text_file.txt`

#### the input redirection above is equivalent to
`cat a_text_file.txt | tr "[a-z]" "[A-Z]"`
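
The redirection operators can be exercised together in a scratch directory. This is a minimal sketch: the file names are invented, and `ls` on a missing path is used only as a convenient way to produce an error message.

```bash
out=$(mktemp -d)

echo 'first line'  >  "$out/x.txt"    # > creates or overwrites the file
echo 'second line' >> "$out/x.txt"    # >> appends to it

# No space between 2 and >: file descriptor 2 is stderr.
# "|| true" keeps the demo going even though ls reports a failure.
ls "$out/no_such_file"  2>  "$out/error.log" || true
ls "$out/still_missing" 2>> "$out/error.log" || true

# < feeds the file to the command on standard input.
upper=$(tr 'a-z' 'A-Z' < "$out/x.txt")

line_count=$(wc -l < "$out/x.txt")
error_count=$(wc -l < "$out/error.log")

echo "$upper"
echo "lines in x.txt: $line_count, lines in error.log: $error_count"
```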

### Command Substitution

#### capture output of a command and echo its value
`THE_PRESENT=$(date)`

`echo "There is no time like $THE_PRESENT"`

### Command line arguments

`./My_Bash_Script.sh arg1 arg2 arg3`
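
Inside the script, the arguments are available as positional parameters. The sketch below writes a small hypothetical script to a temporary file and runs it, so the whole example stays self-contained.

```bash
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
# $1, $2, ... are the individual arguments; $# is how many were passed.
echo "count: $#"
echo "first: $1"
# "$@" expands to every argument as a separate word, keeping spaces intact.
for arg in "$@"; do
  echo "arg: $arg"
done
EOF

result=$(bash "$script" arg1 "arg two" arg3)
echo "$result"
```

Quoting `"arg two"` on the command line passes it as a single argument, so the script reports three arguments rather than four.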

### Batch vs. concurrent modes

#### run commands sequentially
`start=$(date);`

`./MyBigScript.sh;`

`end=$(date)`

#### run commands in parallel
`./ETL_chunk_one_on_these_nodes.sh & ./ETL_chunk_two_on_those_nodes.sh`

### Scheduling jobs with Cron

The **cron daemon** (**crond**) is a system-managed executable that runs in memory and is used to schedule tasks. The job scheduling syntax is: (**minute, hour, day of month, month, day of week**)
`m h dom mon dow command`

#### open crontab editor
`crontab -e`

#### append the date/time to file every Sunday at 6:15 pm
`15 18 * * 0 date >> sundays.txt`

(**\*** means any)

#### run a shell script on the first minute of the first day of each month
`1 0 1 * * ./My_Shell_Script.sh`

#### back up your home directory every Monday at 3 am
`0 3 * * 1 tar -czvf my_backup_path/my_archive.tar.gz $HOME/`

#### deploy your cron job
- Close the crontab editor and save the file

#### list all cron jobs
`crontab -l`

### User Input

#### Wait for user to enter a name, and save the entered name into the variable 'name'
`echo -n "Enter your name :";`

`read name;`

`echo "Welcome $name"`

### Monitoring performance and status

#### list selection of or all running processes and their PIDs
`ps`

`ps -e`

`ps au`

#### display resource usage, provides a dynamic real-time view of a running system
`top`

Running the `top` command provides the following information:
- **Tasks**: A summary of the number of processes on the system, broken down by state. The `top` program itself is typically the one labeled as **running**, while other software such as the Bash shell is **sleeping**. **Zombie processes** are ones that have terminated but whose parent has not yet collected their exit status.


- **PID**: The **process identification number**. A running process can be killed by using this unique id.


- **USER**: The user that started the task. For example, tasks owned by the root user (also called the **superuser**) often run in the background.


- **%CPU** & **%MEM**: Display the percentage of the computer’s Central Processing Unit usage and the Random Access Memory or RAM usage. These indicators can help us identify which program is making the computer laggy or slow.


- **COMMAND**: The name of the program (software)

#### estimates and displays the disk space used by files
`du -h`

#### list mounted file systems and usage
`df -Th`

* `-T` shows the file system type and `-h` displays the output in a more human readable form

#### displays the total amount of free and used memory
`free`

#### Display the amount of memory in megabytes
`free -m`

#### Display the amount of memory in gigabytes
`free -g`

#### find processes that are an exact match of the search string
`pidof abc`

(assuming `abc` is the process)

#### find processes that contain the string
`pgrep xyz`

(assuming `xyz` is the process)

#### kill processes that contain the string
`pkill xyz`

(assuming `xyz` is the process)

####  find information about all available block devices
`lsblk`

* In the case of the virtual environment, we mostly see **loop** in this list. These are **loop devices** (also called loopback devices): **virtual block devices** that are backed by a regular file, such as a disk image, rather than by a physical disk.

### Stream Editor (sed)

The name **sed** is short for **stream editor**. The `sed` command is most often used for finding and replacing, or searching and deleting. The sed utility in its simplest form works like `grep`, where we can use regular expressions to find a string in a file. It can also provide a substitute string for the string we found, or delete every line that matches a string.

#### find a string in a file
`sed -n '/text/p' sample.txt`

* The `-n` means it will only produce output when explicitly told to via the `p` command (the default is to print each line)
* The `'/text/p'` tells it to print out lines that have the word `text` in them.

#### delete underscores in a file (replace the `_` with nothing)
`sed 's%_%%g' myfile.txt`

* `sed` is the command.
* `'s%_%%g'` is the pattern where:
    * **s** means **substitute**.
    * The **%** is the **delimiter** (we can use any characters here).
    * The **search pattern** follows the first **%**.
    * The **replace pattern** follows the second **%**.
* The **g** stands for **global replace** - replace all occurrences on each line, not just the first one.
    
#### delete underscores in a file (replace the `_` with nothing) and send the output to a file instead of the console
`sed 's%_%%g' myfile.txt > newfile.txt`

#### delete underscores in a file (replace the `_` with nothing) and save it in the original file
`sed -i 's%_%%g' myfile.txt`

#### Change the name of one of the characters
`sed 's%Christopher%Chris%g' myfile.txt`

#### deleting lines
`sed '45,54d' myfile.txt`

* The `'45,54'` represents lines from 45 to 54. The comma indicates a range.
* The **d** lets `sed` know it will be a deletion

#### deleting all blank lines
`sed '/^$/d' myfile.txt`

* **^** indicates the beginning of the line 
* **$** indicates the end of the line 

(Since there is nothing between the two, it means a blank line.)

#### deleting from a specific line to the end of the file
`sed '/So I tried/,$d' myfile.txt`

#### deleting everything after a specific line and save it “in place”
`sed -i '1,/So I tried/!d' myfile.txt`

* The `-i` command line option means edit in place
* The `1,/So I tried/` specifies a range from the first line up to and including “So I tried”.
* The **!d** means things in this range will not be deleted but everything else will
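
The `sed` commands above can be tried on a small sample without risking a real file. In this sketch the sample text is invented, and none of the commands use `-i`, so the input file is left unchanged.

```bash
f=$(mktemp)
printf '%s\n' 'my_first_line' '' 'So I tried again' 'trailing_text' > "$f"

# Substitute every underscore with nothing.
no_underscores=$(sed 's%_%%g' "$f")

# Delete blank lines, then count what is left.
non_blank=$(sed '/^$/d' "$f" | wc -l)

# Keep line 1 through the first line matching "So I tried"; delete the rest.
up_to_match=$(sed '1,/So I tried/!d' "$f")

# Print only the lines that match a pattern.
matches=$(sed -n '/tried/p' "$f")

echo "$no_underscores"
echo "non-blank lines: $non_blank"
echo "$up_to_match"
echo "$matches"
```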

### awk

The **awk** command provides a way to search for a pattern, and perform actions on the found text. `awk` reads lines from the input file one at a time. The line is scanned for each pattern in the program and if there is a match the associated action is executed.

Either the pattern or the action can be omitted, but not both.
* If the pattern is omitted, then the action is performed for every line.
* If the action is omitted, then all lines that match the pattern will be printed out.

A few built in AWK variables are:

* `$0` The entire line - not including the newline at the end
* `$1..$n` The fields in a line (delimited by the field separator)
* `FNR` Current line number - just spans the current file
* `FS` Field separator - default is space
* `NF` Number of fields
* `NR` Current line number - spans multiple files
* `RS` Record separator - the default is newline


AWK special patterns are:
* `BEGIN` Startup actions
* `END` Cleanup actions

#### count the number of lines that contain the word “Pooh” in the text
`awk '/Pooh/{x++}END{print x}' myfile.txt`

* `awk` is the command
* `/Pooh/` is the pattern
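* `{x++}` is the action, run for every line that matches the pattern
* `END{print x}` is a special pattern with its own action, run once after the last line has been read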

#### number of words in a file
`awk '{ total = total + NF }; END {print total}' myfile.txt`

#### number of lines in a file
`awk 'END{print NR}' myfile.txt`

#### pull out items in field separated data (such as csv)
`awk -F"," '{if (NR!=1){ print $1 " wrong answers " $5 " out of total " $8 }}' data.csv`

* The option `-F` sets the field separator, the default is space. 
* `$` specifies which column is required (such as `$5`)

#### print out just the second column of the file
`awk -F"," '{print $2}' data.csv`

#### Print out the number of answered assessments for “Jane Smith”
`awk -F"," '/Jane Smith/{print $6}' data.csv`

#### Create a lower case version of the file testupper.txt called testlower.txt
`awk '{print tolower($0)}' < testupper.txt > testlower.txt`
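
A short sketch on made-up data shows fields, `NR`, `NF` and `END` working together. The CSV columns (`name`, `score`, `attempts`) are hypothetical. The last command also shows one way to count every occurrence of a word rather than every matching line: `gsub` returns the number of replacements it made.

```bash
csv=$(mktemp)
cat > "$csv" <<'EOF'
name,score,attempts
Jane Smith,7,10
Joe Bloggs,9,12
EOF

# Skip the header row (NR is the current line number) and sum column 2.
total=$(awk -F',' 'NR != 1 { sum += $2 } END { print sum }' "$csv")

# Pattern plus action: print column 3 only for lines matching the pattern.
jane=$(awk -F',' '/Jane Smith/ { print $3 }' "$csv")

# NF is the number of fields on the current line.
fields=$(awk -F',' 'NR == 1 { print NF }' "$csv")

# Lines containing a word versus total occurrences of that word.
text='Pooh met Pooh
no bear here
Pooh again'
matching_lines=$(printf '%s\n' "$text" | awk '/Pooh/ { x++ } END { print x }')
occurrences=$(printf '%s\n' "$text" | awk '{ n += gsub(/Pooh/, "&") } END { print n }')

echo "total=$total jane=$jane fields=$fields"
echo "lines=$matching_lines occurrences=$occurrences"
```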

### Soft and Hard links

**Hard links** can only be created for regular files, not for directories or special files. They cannot span multiple file systems.

#### Create a hard link to myfile.txt using `ln`
`ln myfile.txt test/myfile.txt`

* This will create a new hard link of myfile.txt to test/myfile.txt. On modifying test/myfile.txt file, we will notice that the changes also appear on the original myfile.txt file as both are the same files by nature.


* The same is true for changing permissions on one of the file names, it will change them for both.

A **soft link** is a special file that points to an existing file. Soft links are also called symbolic links and they can span multiple file systems. The `ln` command with the `-s` argument will create a symbolic link.

#### Create a soft link to myfile.txt
`ln -s myfile.txt myfilesym.txt`

The **soft links** and **hard links** seemingly work the same. The one drawback with soft links is that if the original file is deleted, the soft link is broken and we are left with a dangling link.

A hard link is an additional name for the original file; both names reference the same inode (index node). A soft link, on the other hand, is a separate file with its own inode that stores the path of the target. A hard link remains valid even if the original name is deleted.

* The **inode** number refers to the data structure that stores all the file information (metadata) other than the name. The number is assigned when a file is created.
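
The different behaviour of the two link types shows up once the original name is removed. This is a minimal sketch in a scratch directory with invented file names; it uses Bash's `-ef` test, which is true when two names refer to the same inode on the same device.

```bash
d=$(mktemp -d)
echo "original content" > "$d/myfile.txt"

ln    "$d/myfile.txt" "$d/hard.txt"    # hard link: a second name for the inode
ln -s "$d/myfile.txt" "$d/soft.txt"    # soft link: a file that stores a path

if [ "$d/myfile.txt" -ef "$d/hard.txt" ]; then same_inode=yes; else same_inode=no; fi

rm "$d/myfile.txt"                     # remove the original name

# The data is still reachable through the hard link.
hard_content=$(cat "$d/hard.txt")

# -e follows the soft link, so it fails once the target is gone.
if [ -e "$d/soft.txt" ]; then soft_state=ok; else soft_state=dangling; fi

echo "same inode: $same_inode"
echo "hard link content: $hard_content"
echo "soft link: $soft_state"
```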

### Other Commands

####  Lists open files
`lsof`

#### displays all processes running on the system, along with their command line arguments
`htop`

#### sleep for 30 seconds
`sleep 30`

#### view a list of all the possible signals available with the kill command
`kill -l`

* All processes should contain code that will handle a kill signal, these are called **signal handlers**. A typical action of a signal handler might be to delete temporary files or prompt to save changes. The exceptions to this are **SIGKILL** and **SIGSTOP** which are immediate and cannot be handled, ignored or blocked.


* If we do not specify a signal to the **kill** (or **pkill**) command, the default is **SIGTERM** which allows the application to clean up before quitting.


* The most commonly used signals are:
    * **SIGINT** - has a value `2` and is similar to pressing `Ctrl+C`, it may be ignored by the process
    * **SIGHUP** - has a value `1`. This is a **hang-up signal** and is used to report that the user’s terminal is disconnected
    * **SIGQUIT** - has a value `3` and is similar to **SIGINT** but with the ability to produce a core dump
    * **SIGKILL** - has a value `9`. This forces the process to terminate immediately and cannot be ignored
    * **SIGTERM** - has a value `15`. This gives the application time to shutdown gracefully and may be ignored
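
In a shell script, a signal handler can be installed with the `trap` builtin. The sketch below is only an illustration: a child shell registers a handler for `SIGTERM`, the parent sends the default signal with `kill`, and the handler writes a note to a hypothetical log file before exiting cleanly.

```bash
log=$(mktemp)

# The child installs a handler for SIGTERM and then loops until signalled.
# $1 inside the child is the log file path passed after the "_" placeholder.
bash -c '
  trap "echo cleaning up > \"$1\"; exit 0" TERM
  while true; do sleep 0.1; done
' _ "$log" &
child=$!

sleep 0.5            # give the child time to install its handler
kill "$child"        # no signal named, so SIGTERM is sent
wait "$child"        # collect the exit status chosen by the handler
status=$?

handler_output=$(cat "$log")
echo "child exit status: $status"
echo "handler wrote: $handler_output"
```

Sending `kill -9` (SIGKILL) instead would end the child immediately, and the handler would never run.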

#### Pause the running jobs
Use `CTRL-Z` 


#### filters out the “Permission Denied” errors
For example, on executing this command: `find / -name "*.conf" 2>/dev/null`, the **2>/dev/null** at the end of the command filters out the “Permission Denied” errors. The `2` means **stderr** and the `/dev/null` is the **null device** which means throw it away.

#### Add a user
`sudo useradd testname`

* The new user can be seen in the `/etc/passwd` configuration file. Files that are copied over to a new user’s home directory are stored in `/etc/skel` (for skeleton). System-wide settings for login shells live in the `/etc/profile` file.

* The file **.bash_profile** is read and executed when Bash is invoked as an interactive login shell. The **.bashrc** file is executed for an interactive non-login shell.

* The **adduser** command wraps the **useradd** functionality with a script and adds other useful features we would want when adding a user such as prompting us for a password and other details and creating a directory for the user in the home directory.


#### Setup a password
`sudo passwd testname`

* Hashed passwords are stored in the file `/etc/shadow`. Account expiration information is also stored in that file.

#### switch to different user
`su testname`

#### Delete a user
`sudo userdel testname`

#### displays a list of currently logged in users and time of login
`who`

#### displays information about the users currently on a machine, and the processes they are running
`w`

#### displays user and group ids
`id`

#### Create a new Group
`sudo groupadd mygroup`

#### Add users to a group
`sudo adduser test1 mygroup`

#### View the group information
`cat /etc/group`

* Group information is stored in the `/etc/group` file. Newest additions are at the bottom.

#### Delete a user from a group
`sudo deluser test3 mygroup`

#### Change a group’s name
`sudo groupmod -n mynewgroup mygroup`

#### Delete a group
`sudo groupdel mynewgroup`

#### Default Permissions - umask
The **umask** command sets the default file permissions for newly created files in our current session.

`umask u=rw,g=r,o=r`

#### view the umask value in symbolic form
`umask -S`
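
The effect of a umask can be checked by creating a file and a directory and reading their modes back. This sketch assumes GNU `stat` (the `-c` option) and runs `umask` inside a subshell so that the current session keeps its own setting.

```bash
d=$(mktemp -d)

(
  # 027 removes write for the group and all permissions for others.
  umask 027
  touch "$d/file.txt"     # new files start from 666, so 666 minus 027 gives 640
  mkdir "$d/subdir"       # new directories start from 777, giving 750
)

file_mode=$(stat -c '%a' "$d/file.txt")
dir_mode=$(stat -c '%a' "$d/subdir")

echo "file: $file_mode  directory: $dir_mode"
```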

#### view attributes of a file
`lsattr`

#### change attributes of a file
`sudo chattr +i myfile`

* `chattr`, or **change attribute** is mostly used to keep files secure and prevent them from being deleted accidentally or overwritten. The `chattr` command can only be run by super users and therefore must be preceded by the command `sudo`. These file attributes are stored in a file’s metadata properties.


* A few attributes of `chattr` are:
    * `a` - File can only be opened in append mode
    * `i` - Make a file immutable (uneditable)
    * `S` - File will be synchronously updated on the disk
    * `u` - Makes it possible to undelete the file
    
    
* Use a `+` to add to the existing attributes of the files, a `-` to remove an attribute, and an `=` to overwrite the existing attributes.


#### setup users and a group
```
sudo useradd janedoc
sudo useradd joedoc
sudo groupadd documentation
sudo adduser janedoc documentation
sudo adduser joedoc documentation
```

#### view the groups
`cat /etc/group`

#### get ACL entries for file
`getfacl file1.txt`

* An **Access Control List** (**ACL**) allows us to apply specific privileges to files and directories without needing to change owners or groups. It is a way to specify privileges for people/groups who are not the owner.


#### Give the group documentation all access to file1.txt
`setfacl -m "g:documentation:rwx" file1.txt`

#### Remove the ACL settings 
`setfacl -b file1.txt`

#### Change the owner of file
`sudo chown user1 file1.txt`

`sudo chown -R user2 mydir`

#### Change the group owner of a file or folder
`sudo chgrp group1 file1.txt`

#### Set the SUID for the file
`chmod u+s file1.sh`

* **Special permissions** are a level above the user, group and other permissions.


* **SUID** stands for **set owner user id**. A file with an SUID permission will always execute as if run by the user who owns the file. 


* For example, `/usr/bin/passwd` has an SUID permission set by default. There's an `s` in place of an `x` in the execute permission of the user, and the file name may be shown with a red background in a colorized `ls` listing. This is important because users need to be able to change their own passwords, and with this bit set they don’t have to be the root user to do so.


#### Set the SGID for the file
`chmod g+s file1.sh`

* This is similar to the **SUID**: a file with an **SGID permission** will always execute as if run by the group that owns the file. In some versions of UNIX, this setting is called **GUID**. If this permission is set on a directory, any files created in the directory will have their group ownership set to the group that owns the directory.


* For example, `/usr/bin/crontab` or the crontab executable file has the **SGID** permission.


* If there is a capital `S` in place of the execute permission for the group, the **SGID** was set while the group had **no execution privileges**. To fix this, execute `chmod g+x file1.sh`.


* There's another special permission called the **sticky bit permission**, which applies to directories only. When it is set, only the owner of a file and the root user can delete a file in that directory. This setting is particularly useful for shared directories. For example, the `/tmp` directory.

## Process States

In Linux, there are five possible states a process may be in:

<img src='imgs/process_states.png' alt='process_states' width=500 height=500>

A process always starts off in the **Running** or **Runnable state**. After it starts, it might change to one of the **sleeping states** if it needs to wait for resources or signals.

### Running or Runnable R
When the process is running, it is using the CPU to execute instructions. It can also be in the **runnable state** which means it is in the scheduling queue for using the CPU.

### Uninterruptible Sleep D and Interruptible Sleep S
A process which needs to wait for resources such as IO or data or network requests will enter sleep mode. This allows other processes to use the CPU while the process waits for the resources. The difference between the two types of sleep states is that a process in the **Interruptible Sleep** state reacts both to the availability of the resources it needs and to signals. A process in the **Uninterruptible Sleep** state only reacts to the availability of the resources.

### Stopped T
A process will enter the **stopped state** when it receives either the **SIGSTOP** or **SIGTSTP** signals. The **SIGSTOP** is not ignorable but a process can choose to ignore the **SIGTSTP** signal. The **SIGCONT** signal returns the process to a running state.

### Zombie Z
When a process is terminated or has completed, it sends a **SIGCHLD** signal to the parent process and enters a **Zombie state**. The parent process is responsible for clearing the process from the process table.
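
On Linux, the state letter of a process can be read from the `State:` line of `/proc/<pid>/status`. The sketch below relies on that Linux-specific file and uses a background `sleep` as a stand-in process to show the sleeping and stopped states.

```bash
# Helper (hypothetical name): print the one-letter state of a process.
state_of() {
  awk '/^State:/ { print $2 }' "/proc/$1/status"
}

sleep 30 &
pid=$!
sleep 0.2
sleeping=$(state_of "$pid")      # S: interruptible sleep, waiting on a timer

kill -STOP "$pid"                # SIGSTOP cannot be ignored
sleep 0.2
stopped=$(state_of "$pid")       # T: stopped

kill -CONT "$pid"                # SIGCONT resumes the process
sleep 0.2
resumed=$(state_of "$pid")

kill "$pid"                      # clean up the stand-in process
wait "$pid" 2>/dev/null || true

echo "sleeping=$sleeping stopped=$stopped resumed=$resumed"
```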

## Types of processes

There are two types of computer processes: 
* Foreground processes
* Background processes

#### Foreground processes
A **foreground process** is different from a **background process** in 2 ways:
1. Some foreground processes show an interface, through which the user can interact with the program. In Linux, the command line interface (CLI) is that user interface.


2. The user must wait for one foreground process to complete before running another one.

#### Background processes
Unlike with a foreground process, the shell does not have to wait for a background process to end before it can run more processes. We can enter as many background commands one after another within the limit of the amount of memory available.

#### Example: Using `&` ampersand
For instance, after executing `sleep 30`, we cannot type another command until the `sleep 30` command is complete.

However, if we execute `sleep 30 & ps`, the **& ampersand** specifies that the command should run in the background and we have access to the command prompt right away. The `ps` command will list the running processes and we can see that `sleep` is one of them.
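
In a script, the special variable `$!` holds the PID of the most recent background command. As a small sketch, we can start a background `sleep`, confirm it is still alive while the shell carries on, and then wait for it; `kill -0` sends no signal and only checks that the process exists.

```bash
sleep 2 &                 # & returns control to the shell immediately
bg_pid=$!

if kill -0 "$bg_pid" 2>/dev/null; then during=running; else during=gone; fi

wait "$bg_pid"            # block until the background process finishes

if kill -0 "$bg_pid" 2>/dev/null; then after=running; else after=gone; fi

echo "while sleeping: $during"
echo "after wait:     $after"
```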

## Job versus Process

A **process** is a program that is running on our computer. For example, the command `ls` is a process.

Sometimes we may combine two processes in one command line by **piping** the output of one to the input of the next. These combined processes are called a **job**. 

For example, `grep abc sample.txt | sort &` shows us a sorted list of all the lines that contain the string `abc` in the file `sample.txt`. The `grep` command returns every line that contains `abc`, and that output is piped to the `sort` command, which returns a sorted list. The two processes in this job are `grep` and `sort`, but they are run as one job. The ampersand (`&`) at the end causes this job to run in the background, and the shell prints a notification when the job is done.

Linux provides the following commands to control running jobs:
* **bg** - Resume specified job in the background
* **fg** - Resume specified job in the foreground
* **jobs** - List all current jobs
* **kill** - Can be used to kill a job or a process
* **wait** - Wait until the specified job is complete
* **disown** - Removes specified job from the table of active jobs

#### The `bg` Command
This is useful for activities that might tie up our interface for too long. We can suspend a foreground job using `CTRL-Z` and then send it to the background. If we don’t specify a job number, the `bg` command defaults to the most recently stopped job. For example, run the following command in the foreground:

`(sleep 30; printf "I am awake\n")`

* Type `CTRL-Z` to suspend execution of the job.
* Type `jobs` to view the status.
* Type `bg %1` to resume the job in the background.
* Type `jobs` to see that it is running again, in the background.

#### The `fg` Command
We can use the `fg` command to move a job to the foreground. If we don’t specify a job number, it operates on the most recently backgrounded job. Start the following job in the background:
`COUNTER=0; while true; do printf "Hello World %d\n" $COUNTER >> temp; sleep 1; let COUNTER++; done &`

This command first initializes `COUNTER` to `0` and then sets up an endless `while true` loop. Each iteration appends `Hello World` followed by the current value of `COUNTER` to the file `temp`, sleeps for one second (to slow it down), and increments `COUNTER`. The `done` keyword marks the end of the loop.

* Type `cat temp` at any time to see how many iterations it has gone through.
* Type `fg` at the command line to see that the job ties up the interface when it runs in the foreground. At any point, we can type `CTRL-Z` to stop execution and then `cat temp` to see how many more iterations it went through.
* Type `jobs` to see that the job is stopped.

#### The kill command for jobs
The `kill` command can be used to kill a job by putting the `%` prefix before the job number. For example, `kill %1`.

#### The `wait` command
This command suspends script execution until all jobs running in the background have terminated or, if a specific job or process number is given, until that one terminates. It returns the exit status of the command it waited for.

We can use the `wait` command to prevent a script from continuing before a background job finishes executing.
* Start this process in the background. It creates a file with the numbers `1` to `20`:
```for i in `seq 1 20`; do echo $i >> temp3; sleep 1; done &```
* If we enter `cat temp3` right away, we’ll see that the job hasn’t finished and not all the numbers are there yet.
* Now, after entering the command `wait %1; cat temp3`, the file won’t be listed until the job is complete. All the numbers will be there.
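
In a script, the same idea can be sketched as follows. This is a shortened variant under stated assumptions: it writes three numbers instead of twenty, uses a temporary file instead of `temp3`, and waits on the PID from `$!` rather than on a job number.

```bash
numbers=$(mktemp)

# Background job: append one number per second (3 numbers instead of 20).
for i in 1 2 3; do
    echo "$i" >> "$numbers"
    sleep 1
done &
writer_pid=$!

# Without wait, the file is usually still incomplete at this point.
lines_before=$(wc -l < "$numbers" | tr -d ' ')

# wait blocks until the background job ends and returns its exit status.
writer_status=0
wait "$writer_pid" || writer_status=$?
lines_after=$(wc -l < "$numbers" | tr -d ' ')

echo "before wait: $lines_before line(s), after wait: $lines_after line(s)"
rm -f "$numbers"
```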

#### The `disown` command
The `disown` command removes a job from the table of active jobs. If we don’t specify a job number, it will disown the most recently launched job. Paste the commands below to set up four jobs running in the background: 
```
sleep 100 &
sleep 100 &
sleep 100 &
sleep 100 &
jobs
```
On executing `disown %2` and then `jobs`, we will observe that jobs 1, 3, and 4 remain and job 2 is gone. Typing `disown` again (this time without a job number) disowns the last job run, so the remaining jobs are 1 and 3.


### Sending a signal to a running job
#### The `CTRL-Z` command
The `CTRL-Z` key combination sends the `SIGTSTP` signal, which suspends the foreground process. Suspending a foreground job gives us back access to the command line. Once we have access to the terminal, we can do other things, such as send the job to the background with `bg` or kill it. `CTRL-Z` does not work on background processes.

#### The `CTRL-C` command
We can use `CTRL-C`, which sends the `SIGINT` signal, to kill a foreground process. To kill a background job this way, we must first bring it into the foreground with `fg` and then type `CTRL-C`. It is simpler to use the `kill` command on the job or process. We can get a list of the running jobs with the `jobs` command.

#### Example: Compare the outcomes of these commands
Run this command: ```for i in `seq 1 100`; do echo $i; sleep 1; done```

This sets up a loop that runs from `1` to `100`. Each iteration prints the current number and then sleeps for one second (to slow it down). The `done` keyword marks the end of the loop. While this is running, type `CTRL-C` and then `jobs` to see which jobs are running. We will not see any jobs in the list.

Now, run it again: ```for i in `seq 1 100`; do echo $i; sleep 1; done```. While this is running, type `CTRL-Z` and then `jobs` to see which jobs are running. With `CTRL-Z`, the job is still there but stopped.
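
The difference between stopping and killing can also be observed from a script. The sketch below is an approximation under stated assumptions: no terminal is involved, so it uses `kill -STOP` and `kill -TERM` as stand-ins for `CTRL-Z` (`SIGTSTP`) and `CTRL-C` (`SIGINT`), because background commands started from a non-interactive shell ignore `SIGINT`. Reading `/proc/<pid>/status` is Linux-specific.

```bash
sleep 30 &
pid=$!

# Stop the process, much as CTRL-Z stops a foreground job.
kill -STOP "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then after_stop="still present"; else after_stop="gone"; fi
# On Linux, /proc shows the process state; fall back quietly elsewhere.
state_line=$(grep '^State:' "/proc/$pid/status" 2>/dev/null) || state_line="State: unavailable"
echo "after STOP: $after_stop ($state_line)"

# Resume the process, then terminate it, much as CTRL-C ends a foreground job.
kill -CONT "$pid"
kill -TERM "$pid"
exit_status=0
wait "$pid" 2>/dev/null || exit_status=$?
if kill -0 "$pid" 2>/dev/null; then after_term="still present"; else after_term="gone"; fi
echo "after TERM: $after_term (wait returned $exit_status)"
```

A stopped process still exists and can be resumed, while a terminated one is gone. The value returned by `wait` is 128 plus the signal number, which is 143 for `SIGTERM`.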

#### The `CTRL-D` command
The `CTRL-D` key combination signals the end of input. Unlike `CTRL-Z` and `CTRL-C`, it does not send a signal to the process. Instead, it lets a command that reads from the terminal know that no more input is coming.
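
In a script there is no keyboard, but the same end-of-input condition occurs when a pipe is closed. The following sketch assumes that `printf` stands in for the text we would type before pressing `CTRL-D`.

```bash
# wc -l reads standard input until it sees end of input, then prints the count.
# Here the input ends when printf exits, as it would after CTRL-D at a terminal.
line_count=$(printf 'first line\nsecond line\nthird line\n' | wc -l | tr -d ' ')
echo "wc counted $line_count lines"

# cat behaves the same way: it copies input to output until input ends.
copied=$(printf 'typed text\n' | cat)
echo "cat copied: $copied"
```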


### Setting process priority

The **nice** and **renice** commands are used to set and change the priority of a Linux process. The term `nice` refers to the fact that high priority tasks are less nice because they don’t share resources as well as low priority tasks.
Nice values range from **-20** (**highest priority**) to **19** (**lowest priority**).

Using the `-l` flag with the `ps` command lists the priority of each process in the **NI column**. We can also see the `nice` values in the NI column when we use the `top` command.

#### Set the priority of a process before it is started
`nice -n 19 sleep 100 &`

* The `renice` command is used to change the priority of an already running process.
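
The two commands can be tried together in a short script. This is a sketch under stated assumptions: it relies on `nice` printing the current nice value when called without arguments (as GNU coreutils and BusyBox do), and it only raises nice values, which needs no special privileges.

```bash
# nice with no arguments prints the nice value of the current shell
# (assumed behaviour of GNU coreutils and BusyBox).
base=$(nice)

# Run a command 5 steps "nicer"; the inner nice reports the value it runs with.
lowered=$(nice -n 5 nice)
echo "current nice value: $base, inside 'nice -n 5': $lowered"

# Start a low-priority background process, then lower its priority further
# with renice. 19 is the lowest priority, so this never requires root.
nice -n 10 sleep 2 &
pid=$!
renice_status=0
renice -n 19 -p "$pid" >/dev/null 2>&1 || renice_status=$?
echo "renice exit status: $renice_status"
wait "$pid"
```

Lowering a nice value (raising the priority) generally requires root privileges, so the sketch avoids doing that.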

## Filesystem Hierarchy Standard (FHS)

The **Filesystem Hierarchy Standard** ensures that software packages running on a Linux system will know where to find essential files and directories.

To view the directories under the root directory, type:
```bash
cd /
ls
```
The contents of the root directory include:
* **/bin** : Binaries or executables that are essential for functionality
* **/boot** : Files needed to boot the system such as the Linux kernel
* **/dev** : Device files - interface with hardware drivers
* **/etc** : Host-specific system configuration - editable text
* **/home** : User directories live under here
* **/lib** : Common libraries
* **/lib64** : Common 64 bit libraries
* **/media** : Mount point for removable media
* **/mnt** : Mount point for mounting a filesystem temporarily
* **/opt** : Optional add on software
* **/proc** : Keeps track of running processes
* **/root** : Home directory for root user
* **/run** : Data relevant to running processes
* **/sbin** : System binaries or executables that are essential for functionality
* **/srv** : Data for services provided by this system
* **/sys** : Virtual filesystem with information about devices, drivers, and other kernel components
* **/tmp** : Temporary files that won't be persistent between reboots
* **/usr** : Read-only user programs and data
* **/var** : Variable files - things that will change as the operating system is being run such as logs and cache files
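
A quick way to see which of these directories exist on a given machine is sketched below. The directory list is copied from the description in this section, and the variable names are illustrative. Some distributions omit a few of the entries.

```bash
present=""
missing=""

# Check each standard top-level directory; -d also follows symbolic links,
# so a /bin that links to /usr/bin still counts as present.
for dir in /bin /boot /dev /etc /home /lib /media /mnt /opt /proc \
           /root /run /sbin /srv /sys /tmp /usr /var; do
    if [ -d "$dir" ]; then
        present="$present $dir"
    else
        missing="$missing $dir"
    fi
done

echo "present:$present"
echo "missing:${missing:- none}"
```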


The **/bin directory** contains these commands:

| Command | Description |
|---|---|
|cat |Concatenate files to standard output|
|chgrp| Change file group ownership|
|chmod| Change file access permissions|
|chown| Change file owner and group|
|cp| Copy files and directories|
|date| Print or set the system date and time|
|dd| Convert and copy a file|
|df |Report filesystem disk space usage|
|dmesg |Print or control the kernel message buffer|
|echo |Display a line of text|
|false |Do nothing, unsuccessfully|
|hostname |Show or set the system’s host name|
|kill |Send signals to processes|
|ln |Make links between files|
|login |Begin a session on the system|
|ls |List directory contents|
|mkdir |Make directories|
|mknod |Make block or character special files|
|more |Page through text|
|mount |Mount a filesystem|
|mv |Move/rename files|
|ps |Report process status|
|pwd |Print name of current working directory|
|rm |Remove files or directories|

The **/boot directory** contains everything required for the boot process. The exceptions are **configuration files** not needed at boot time and the **map installer**. The operating system kernel must be located in **/** or **/boot**.

The **/dev directory** contains special or device files. It may also contain some **symbolic links** (**symlinks**).

* A **symlink** is a type of file in Linux that points to a different file or folder. Symlinks allow multiple access points to a file without needing multiple copies.
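
A minimal sketch of creating and inspecting a symlink follows. The file names `target.txt` and `link.txt` are hypothetical, and everything happens in a temporary directory.

```bash
workdir=$(mktemp -d)
echo "original content" > "$workdir/target.txt"

# Create a symbolic link that points at the file.
ln -s "$workdir/target.txt" "$workdir/link.txt"

# Reading through the link returns the target's content; no copy is made.
via_link=$(cat "$workdir/link.txt")

# readlink prints the path stored in the link itself.
points_to=$(readlink "$workdir/link.txt")

# -L tests whether a path is a symbolic link.
if [ -L "$workdir/link.txt" ]; then is_link="yes"; else is_link="no"; fi

echo "link -> $points_to (symlink: $is_link, content: $via_link)"
rm -rf "$workdir"
```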

The **/etc directory** contains all system-related configuration files. **Configuration files** are editable text files. Executable files should not be placed in this directory. The configuration files should be placed in subdirectories of the `/etc` folder, grouped by the application they serve.

The **/home directory** contains user directories.

The **/lib directory** contains essential shared libraries and kernel modules. The `/lib` directory contains shared library images needed to boot the system and run the commands in `/bin` and `/sbin`.

The **/media directory** is used for removable media such as USB drives and CD-ROMs. This is typically used by the system.

The **/mnt directory** is used for temporarily mounted file systems, mostly for user-mounted items.

External devices are **mounted** to the Linux file system at `/media` or `/mnt`.

The **/opt directory** is reserved for the installation of add-on application software packages. 

The **/proc directory**, more often referred to as the **proc filesystem**, is built every time the system starts. It contains information about currently running processes, hardware, and memory management. It represents the current state of the kernel.
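
The sketch below reads a few entries from `/proc`. It assumes a Linux system, where each process has a directory named after its PID, and it falls back to placeholder values elsewhere. The variable names are illustrative.

```bash
# Each running process has a directory under /proc named after its PID.
# $$ is the PID of the current shell.
if [ -d "/proc/$$" ]; then
    shell_name=$(cat "/proc/$$/comm")   # short command name of this process
    proc_entry="found"
else
    shell_name="unknown"
    proc_entry="not found"              # for example, on a non-Linux system
fi
echo "/proc/$$: $proc_entry, command name: $shell_name"

# Kernel-wide information is exposed as plain files too (Linux only).
if [ -r /proc/uptime ]; then
    read -r uptime_seconds idle_seconds < /proc/uptime
    echo "system has been up for $uptime_seconds seconds"
fi
```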

The **/run directory** contains information describing the system since it was booted. This directory must be cleared at the beginning of the boot process. This directory contains **.pid** files, or **process identifier** (**PID**) files.

* A **PID** file consists of a process identifier in ASCII-encoded decimal, followed by a newline character.
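
How a program might write and later read such a file can be sketched as follows. A temporary directory stands in for `/run` (which normally requires root), and the file name `example.pid` is hypothetical.

```bash
pid_dir=$(mktemp -d)               # stand-in for /run
pid_file="$pid_dir/example.pid"    # hypothetical file name

# Write this shell's PID as decimal text followed by a newline.
printf '%s\n' "$$" > "$pid_file"

# Another program can read the file back and check that the process exists.
read -r recorded_pid < "$pid_file"
if kill -0 "$recorded_pid" 2>/dev/null; then
    status="running"
else
    status="not running"
fi

echo "PID file says $recorded_pid, process is $status"
rm -rf "$pid_dir"
```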

The **/sbin directory** is for system binaries. Binary files are executable programs and may also be referred to as **commands**. These binary files are essential for booting, restoring, recovering, and repairing the system. The `/sbin` directory must not contain any subdirectories. At the very least the `/sbin` directory must contain the shutdown command.

* The **/bin directory** contains binaries (commands) that are for users as well as items needed to bring the system up or repair it. The **/sbin directory** contains binaries that the system uses for booting up. These are generally not run by ordinary users. We usually need `sudo` privileges to run them.

The **/srv directory** is used for data for services provided by the system.

The **/sys directory** is a virtual file system where we can find information about devices, drivers, and other kernel components.

The **/tmp directory** may be used by applications to store temporary files, that is, files that an application does not expect to remain after it stops running. It is recommended (but not required) that the files in `/tmp` be deleted whenever the system is rebooted.

The data in the **/usr directory** is read-only. It is where user-runnable programs and user-accessible data are located. The required directories, or symbolic links to directories, in `/usr` are:

| Directory | Description |
|---|---|
|bin | Most user commands|
|lib| Libraries|
|local| Local hierarchy (empty after main installation)|
|sbin| Non-vital system binaries|
|share| Architecture-independent data|

The optional directories, or symbolic links to directories in `/usr` are:

| Directory | Description |
|---|---|
|games | Games and educational binaries|
|include| Header files included by C programs|
|`lib<qual>`| Alternate format libraries|
|libexec| Binaries run by other programs|
|src| Source code|
 
* The **/bin directory** contains executable commands that are required by the system, and **/usr/bin** contains executable files that are not required for booting or repairing it.
 
* One of the useful commands in the **/usr/bin directory** is **whereis**. Running `whereis python3` shows all the locations of python3-related files.
  
* The **/sbin directory** holds commands needed to boot the system. The **/usr/sbin directory** contains program binaries for system administration which are not essential for the boot process.
    

The **/usr/local directory** is for use by the system administrator when installing software locally. The following directories, or symbolic links to directories, must be in `/usr/local`:
   
| Directory| Description |
|---|---|
| bin | Local binaries |
| etc | Host-specific system configuration for local binaries |
| games | Local game binaries |
| include | Local C header files |
| lib | Local libraries |
| man | Local online manuals |
| sbin | Local system binaries |
| share | Local architecture-independent hierarchy |
| src | Local source code |
    
    
The **/var** or **variable data** hierarchy contains files to which the system writes data during the course of its operation. The following directories, or symbolic links to directories, are required in `/var`:
    
| Directory | Description |
|---|---|
| cache | Application cache data |
| lib | Variable state information |
| local | Variable data for /usr/local |
| lock | Lock files |
| log | Log files and directories |
| opt | Variable data for /opt |
| run | Data relevant to running processes |
| spool | Application spool data |
| tmp | Temporary files preserved between system reboots |

## System Services

**System services** are processes that continuously run in the background, waiting for requests to come in.

In Unix, **init** is the first process that starts, and it starts up other processes.

For the most part, Linux is UNIX-like or UNIX-compatible. With only a few exceptions, the Linux and UNIX systems are very similar and it is easy to move between the two. Linux’s use of **systemd** instead of `init` is one of the few exceptions to this.

### init
**init** starts the machine in one of the **7 run levels** (from `0` to `6`), which indicate the machine state.
Examples of standard Linux run levels:
* `0` – Shut down
* `1` – Single user mode
* `3` – Multiple user mode with command line interface
* `5` – Multiple user mode under GUI
* `6` – Reboot/Restart

Run level `5` is the standard run level for most Linux-based systems.

### systemd
In Linux, **systemd** is the system and service manager. It is a **daemon**, that is, a process that runs in the background. After the Linux kernel is booted, `systemd` is activated to manage the user space components, each of which is known as a **unit**. The `systemd` tools are used to start, stop, enable, and disable services and to retrieve their status.

The move away from `init` was, and remains, highly debated because it departs from the traditional Unix approach. Many disliked how bloated and interconnected `systemd` is, which is in direct opposition to the Unix philosophy of doing one thing well. The uncharacteristically large and interconnected code base of `systemd` raises concerns about both reliability and security. Others embraced it as a fix for the existing, Unix-inspired `init` solution, since it addressed many long-standing issues.

If we run the `top` command and look for PID `1`, we will see that it belongs to `systemd`, the first process that is started.

The **systemctl** program is the tool used to manage `systemd`. It allows us to manage services, check statuses, and change system states.

### Systemctl commands

#### Show system status
`systemctl status`

#### Show all unit files
`systemctl list-unit-files`

#### Show the status of a particular service
`systemctl status cron.service`

### Install a service
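On Debian-based distributions, the **sudo apt update** command refreshes the local list of the latest available package versions. After that, `sudo apt install` can install the package that provides a service. For example: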
```
sudo apt update
sudo apt install mariadb-server
```

#### View service status
`systemctl status mariadb.service`

#### Activate a service
`sudo systemctl start mariadb.service`

#### Deactivate a service immediately
`sudo systemctl stop mariadb.service`

#### Restart a service
`sudo systemctl restart mariadb.service`

#### Enable a service to start at boot
`sudo systemctl enable mariadb.service`

#### Disable a service from starting at boot
`sudo systemctl disable mariadb.service`

#### Mask a service so it can’t be started
`sudo systemctl mask mariadb.service`
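
For use in scripts, `systemctl` also offers the `is-active` and `is-enabled` subcommands, which print a one-word state. The sketch below is a guarded illustration: the unit name `cron.service` and the variable names are assumptions, and the script reports a fallback message on systems without `systemctl`.

```bash
unit="cron.service"    # assumed unit name; adjust for the service of interest

if command -v systemctl >/dev/null 2>&1; then
    # is-active and is-enabled print a one-word state and set the exit status.
    # "|| true" keeps the script going when the unit is inactive or unknown.
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    summary="$unit: active=${active:-unknown}, enabled=${enabled:-unknown}"
else
    summary="$unit: systemctl is not available on this system"
fi

echo "$summary"
```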

### Scheduling Services

* **cron** is a daemon that can be used to schedule recurring tasks on our computer. The **crontab** (**cron table**) contains information about the date and time at which cron should run something. We can create or edit a cron table using the command `crontab -e`, and we may be presented with a choice of editors.


* The **at** utility can be used for scheduling one-off tasks. It does not output to the console when it runs. It is meant for tasks we might run when we are not logged in. The time and date syntax is very flexible, but the time must come before the day. If the day is not specified, the **current day** is the default. The following words are recognized: `now`, `midnight`, `noon`, `teatime` (4 PM), `AM`, and `PM`. We can use 24-hour time as well. We can also designate a time relative to `now` by using the **+** sign with one of the following units: `minutes`, `hours`, `days`, `weeks`, `months`, `years`. For example: `at now +1 minute -f timed.sh`
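
Each line of a cron table consists of five time fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below only takes such an entry apart to show the fields. It does not install anything, and the script path `/home/user/backup.sh` is hypothetical.

```bash
# A crontab entry: run a (hypothetical) backup script every day at 02:30.
entry='30 2 * * * /home/user/backup.sh'

# Split the entry into its five time fields and the command.
# read does not expand the * characters, so they stay literal.
read -r minute hour day_of_month month day_of_week command <<EOF
$entry
EOF

echo "minute=$minute hour=$hour day-of-month=$day_of_month month=$month day-of-week=$day_of_week"
echo "command=$command"
```

An asterisk in a time field means "every value", so this entry matches every day of every month.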

## Shell Scripting

The `.sh` extension is a convention used to indicate that the file is a shell script.

A **shell script** is an executable text file in which the first line usually has the form of an interpreter directive. The **interpreter directive** is also known as a `shebang directive`, and has the following form:
```bash
#!interpreter [optional argument]
```
**Interpreter** is an absolute path to an `executable program`, and the **optional argument** is a string representing a `single argument`.

Shell scripts are scripts that invoke a shell program. For example:
* `#!/bin/sh` invokes the Bourne shell or other compatible shell program, from the bin directory.
* `#!/bin/bash` aka the **bash shebang** invokes the Bash shell. 

**Shebang directives** aren't limited to shell programs. For example, we could create a Python script with the following directive:

* `#!/usr/bin/env python3`

**NOTE**: To run a script located in the current directory, we need a **./** before the file name. The current directory is usually not included in the **PATH** environment variable, so the `./` prefix tells the shell exactly where to find the executable file.
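
Putting these pieces together, the sketch below creates a small script, makes it executable, and runs it with the `./` prefix. The file name `hello.sh` and its contents are made up for illustration, and the script lives in a temporary directory.

```bash
workdir=$(mktemp -d)

# Write a two-line script whose first line is the shebang directive.
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello from $0"
EOF

# The file must be executable before it can be run by name.
chmod +x "$workdir/hello.sh"

# ./ gives an explicit path, so the shell does not search PATH for the file.
# The subshell keeps the directory change local to this one command.
output=$(cd "$workdir" && ./hello.sh)
echo "$output"

rm -rf "$workdir"
```

Inside the script, `$0` holds the path that was used to invoke it, which is why the output shows `./hello.sh`.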