MiniShell is a Unix-like shell implemented in the C programming language by os-moussao and awbx. It draws inspiration from the Bash Unix shell.
$> false && echo success || echo failure
$> cat /etc/passwd | cut -d':' -f1 | sort | uniq # get unique users
$> cat << "EOF" > outfile ; < infile cat >> outfile
$> cd / ; (cd bin && ./ls -G .); pwd # The current working directory is still "/"
echo # Write arguments to the standard output.
cd # Change the shell working directory.
pwd # Print the name of the current working directory.
export # Set export attribute for shell variables.
unset # Unset values and attributes of shell variables.
env # Set environment and execute command, or print environment.
exit # Exit the shell.
Before accepting any input, MiniShell handles Ctrl-C, Ctrl-D, Ctrl-Z, and Ctrl-\ signals, mimicking the behavior of Bash. Once initialized, the program continuously prompts for a new command line input. The command line goes through several stages before execution and displaying the result.
First, it passes through the lexical analyzer, which divides the input into a "valid" list of tokens. Then, the parser uses this token list to generate an abstract syntax tree. Finally, the executor recursively traverses this tree and executes each node accordingly.
The lexical analyzer, or lexer, is the initial stage of parsing the command line input. It consists of three important steps:
- Tokenizer: Converts the command line into an "initial" list of tokens, preserving all details of the
cmdline
string. The tokenizer does not check for syntax errors within thecmdline
; this task is left to the syntax analyzer. - Syntax Analyzer: Iterates through the doubly linked list of tokens, checking for syntax errors at each node. It examines the left and right contexts of each node, looking for unexpected or incorrect tokens. If any errors are encountered, they are printed to
stderr
, and a falsy value is returned to the lexer function. See syntax rules for more details. - Expander: If the syntax analyzer does not encounter any errors, the token list is passed to an expander, which removes quotes and whitespace, expands tilde and wildcard patterns, and performs variable expansion during execution.
The syntax analyzer follows the following rules:
* AND, OR, PIPE, FG, BG:
- left: [WSPACE] (STRING | CPAR)
- right: [WSPACE] (STRING | REDIRECT | OPAR | if <FG, BG> ENDOFCMD)
* OPAR "(":
- left: CMDBEGIN | [WSPACE] (AND | OR | PIPE | OPAR)
- right: [WSPACE] (STRING | REDIRECT | OPAR)
* CPAR ")":
- left: [WSPACE] (STRING | CPAR)
- right: [WSPACE] (AND | OR | PIPE | CPAR | ENDOFCMD)
* REDIRECT:
- right: [WSPACE] STRING
* PARENTHESES MATCHING AND QUOTING:
- Inside each pair of parentheses should not be an empty command.
- Every open parenthesis must have a matching closing parenthesis.
- Every single or double quote must be closed.
// Inside the lexer function
tokens = tokenizer(cmdline);
if (!validate_syntax(tokens))
{
set_status(2);
return (NULL);
}
tokens = expander(tokens);
The parser in MiniShell uses a recursive descent parsing algorithm. Its purpose is to traverse the list of tokens produced by the lexer and validate each recognized rule defined in the grammar.
The grammar for the MiniShell, written in Extended Backus–Naur form, is as follows:
<block> ::= <pipeline> {("&&" | "||") <pipeline>}
<pipeline> ::= <command> {"|" <command>}
<command> ::= <cmdlist>
| "(" <cmdline> ")" <redirect>
<cmdlist> ::= <cmd> {<redirection>}
<cmd> ::= <filename> | <builtin>
| <filename> <arglist>
| <builtin> <arglist>
| <filename> <arglist> <redirection>
<redirection> ::= "<" <filename>
| "<<" <filename>
| ">" <filename>
| ">>" <filename>
<arglist> ::= <arg> {<arg>}
<arg> ::= <filename> | <string>
<filename> ::= <string>
<builtin> ::= "cd" | "pwd" | "export" | "unset" | "env" | "echo" | "exit"
The parser uses recursive functions to match each rule in the grammar. If a match is found, the parser generates an abstract syntax tree (AST) representing the command line input.
The executor in MiniShell is responsible for executing the commands represented by the abstract syntax tree (AST) generated by the parser. It traverses the AST recursively and performs the necessary actions for each node.
The executor differentiates between different types of nodes: command nodes, pipeline nodes, and block nodes. For each node, it handles the execution, I/O redirection, piping, and control flow.
The executor relies on system calls such as fork
, execvp
, dup2
, close
, and waitpid
to execute commands and manage I/O redirection and piping.