Learned from: https://brennan.io/2015/01/16/write-a-shell-in-c/
My own notes for future reference (via Notion)
Going to be adding more functionality, thinking of using it to help with AI development
- Initialize: Reads and executes config files
- Interpret: Reads and executes interactive commands or files
- Terminate: Executes shutdown commands, frees memory, then terminates
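#include <sys/wait.h> // waitpid() and the W* status macros
#include <unistd.h>   // fork(), execvp(), chdir()
#include <stdlib.h>   // malloc(), realloc(), free(), exit(), EXIT_SUCCESS
#include <stdio.h>    // printf(), fprintf(), getchar(), perror()
#include <string.h>   // strtok(), strcmp()
void shell_loop(void); // defined below; declared here so main can call it
int main(int argc, char **argv){
    //load config files
    //run command loop
    shell_loop();
    //shutdown/terminate
    return EXIT_SUCCESS;
}
The loop itself has three steps:
- Read: Read command line input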
- Parse: Separate command string into program and arguments
- Execute: Run the parsed command
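void shell_loop(void){
    char *line;  // points to the heap-allocated string read from the user
    char **args; // points to an array of char pointers, one per argument
    int status;  // nonzero keeps the loop running; 0 (from shell_exit) stops it
    do{
        printf("> ");
        line = lsh_read_line();       // the read-line function below (named as in the tutorial)
        args = lsh_split_line(line);
        status = shell_execute(args);
        free(line);                   // both were malloc'd, so free them every iteration
        free(args);
    } while(status);
}
- We do not know how much text a user will enter, so we start with a block of memory and allocate more if needed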
- getchar() returns an int rather than a char so that EOF (end of file, a negative value) can be told apart from every valid character, so we store each character in an int
- If we read a newline or reach EOF, we terminate the current string with '\0' and return it
- Otherwise, we add the current character to the string
- If the next character would exceed our buffer size, we reallocate the buffer with more space
#define SHELL_BUFSIZE 1024 // initial buffer size; 1024 is an assumed value
int bufsize = SHELL_BUFSIZE;
int position = 0;
char *buffer = malloc(sizeof(char) * bufsize);
int c; // int, not char, so it can hold EOF
if(!buffer){
    fprintf(stderr, "shell: memory allocation error\n");
    exit(EXIT_FAILURE);
}
- SHELL_BUFSIZE is a constant that would probably be 1024 (initial room for 1024 chars)
- position tracks the index where the next character will be placed, which is also how much of the buffer is filled
- buffer is a pointer that stores the starting address of the memory block
  - malloc asks for a block of memory (from the heap, which gets more from the OS as needed) large enough to hold bufsize chars
  - We used malloc instead of calloc because calloc initializes every byte to 0, which is unnecessary here since we write each position before we ever read it
- if(!buffer) checks whether malloc returned NULL (the allocation failed), so we print an error and exit instead of crashing when we write to the buffer
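while(1){
    c = getchar();
    if (c == EOF || c == '\n'){
        buffer[position] = '\0';
        return buffer;
    } else {
        buffer[position] = c;
    }
    position++;
    // the resize check below goes here, inside the loop
}
- getchar() fetches the next character from standard input
- '\0' is the null character, which marks the end of a C string
- while(1) keeps filling the buffer forever; the only way out is the return when we hit a newline or EOF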
if (position >= bufsize){
    bufsize += SHELL_BUFSIZE;
    buffer = realloc(buffer, bufsize);
    if (!buffer){
        fprintf(stderr, "shell: error reallocating memory\n");
        exit(EXIT_FAILURE);
    }
}
- bufsize is expanded by adding the constant to it
- realloc is used to resize the existing memory block
  - It takes a pointer (buffer) and a new size in bytes (bufsize) and returns a pointer to a block of that size, with the old contents preserved
  - If the space right after the current block is not free, realloc moves the data to a new address and frees the old block, so we must assign the returned pointer back to buffer
- We check for NULL once again so a failed reallocation exits cleanly instead of crashing (on failure realloc returns NULL and leaves the old block allocated, which is fine here because we exit right away)
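Putting the read-line fragments above together, a minimal sketch of the whole function might look like this. Assumptions: the name read_line_from and its FILE * parameter are hypothetical (the shell itself reads stdin with getchar(), which behaves like getc(stdin)), SHELL_BUFSIZE is set to 8 instead of 1024 only so that short lines already exercise the realloc path, and the realloc result goes into a temporary first so the old buffer can be freed if growing fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Deliberately small (assumed value) so short lines force the buffer to grow. */
#define SHELL_BUFSIZE 8

/* Hypothetical variant of the shell's read-line function: reads one line from
   `in` and returns a heap-allocated, '\0'-terminated string without the '\n'. */
char *read_line_from(FILE *in)
{
    int bufsize = SHELL_BUFSIZE;
    int position = 0;
    char *buffer = malloc(sizeof(char) * bufsize);
    int c; /* int, so EOF can be told apart from real characters */

    if (!buffer) {
        fprintf(stderr, "shell: memory allocation error\n");
        exit(EXIT_FAILURE);
    }

    while (1) {
        c = getc(in);
        if (c == EOF || c == '\n') {
            buffer[position] = '\0'; /* terminate the string */
            return buffer;
        }
        buffer[position] = (char)c;
        position++;

        /* Grow before the next write could run past the end. */
        if (position >= bufsize) {
            bufsize += SHELL_BUFSIZE;
            char *bigger = realloc(buffer, bufsize);
            if (!bigger) {
                free(buffer); /* realloc failed; the old block is still ours */
                fprintf(stderr, "shell: error reallocating memory\n");
                exit(EXIT_FAILURE);
            }
            buffer = bigger;
        }
    }
}
```

In the shell this would be called as read_line_from(stdin). The caller owns the returned string and must free() it, which is what shell_loop does with line. At EOF with nothing read, it returns an empty string.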
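- We use whitespace to separate arguments from each other (so this simple version doesn't support quoting or escaping spaces)
- We can use the standard library function strtok to tokenize the string, with whitespace characters as the delimiters
  - Signature: char *strtok(char *str, const char *delim)
  - strtok works in place: it writes '\0' over the delimiter that ends each token and returns a pointer into the original string
  - Calling strtok(NULL, delim) continues splitting the same string from where the previous call stopped, instead of starting over
- The function is structured much like the read-line function: start with a buffer and grow it when it fills up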
#define LSH_TOK_BUFSIZE 64
#define LSH_TOK_DELIM " \t\r\n\a" // space, tab, carriage return, newline, bell
char **lsh_split_line(char *line)
{
    int bufsize = LSH_TOK_BUFSIZE, position = 0;
    char **tokens = malloc(bufsize * sizeof(char*));
    char *token;
    if (!tokens) {
        fprintf(stderr, "lsh: allocation error\n");
        exit(EXIT_FAILURE);
    }
- We use ** because tokens is a pointer to an array of char pointers (a list of addresses)
  - We don't know how long each token will be, so instead of copying the tokens we just store each token's address
- We use sizeof(char*) to get the size of one char pointer (one address)
    token = strtok(line, LSH_TOK_DELIM);
    while (token != NULL) {
        tokens[position] = token;
        position++;
        if (position >= bufsize) {
            bufsize += LSH_TOK_BUFSIZE;
            tokens = realloc(tokens, bufsize * sizeof(char*));
            if (!tokens) {
                fprintf(stderr, "lsh: allocation error\n");
                exit(EXIT_FAILURE);
            }
        }
        token = strtok(NULL, LSH_TOK_DELIM);
    }
    tokens[position] = NULL;
    return tokens;
}
- We call strtok once before the while loop to split up the line and get the first token
- We use realloc(tokens, bufsize * sizeof(char*)) instead of realloc(tokens, bufsize) because realloc takes a size in bytes, and a char pointer is more than 1 byte (usually 8 on 64-bit systems)
  - In the read line function, we can just do realloc(buffer, bufsize) because sizeof(char) is always 1 byte, so the number of chars equals the number of bytes
- We call strtok(NULL, LSH_TOK_DELIM) at the end of each iteration to move token to the next item in the line
- tokens[position] = NULL ends the list with a NULL pointer; execvp expects its argument array to be NULL-terminated, and it's how we know where the list ends
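A small demonstration of strtok's in-place behavior, which is why shell_loop only frees line and args rather than each token: the tokens are pointers into line itself. show_tokens is a hypothetical helper written for this note; it reuses the same LSH_TOK_DELIM delimiters.

```c
#include <stdio.h>
#include <string.h>

#define LSH_TOK_DELIM " \t\r\n\a"

/* Hypothetical helper: splits `line` in place, prints each token with its
   offset into `line`, and returns how many tokens were found. */
int show_tokens(char *line)
{
    int count = 0;
    char *token = strtok(line, LSH_TOK_DELIM);   /* first call: pass the string */
    while (token != NULL) {
        printf("token %d: \"%s\" starts at line[%ld]\n",
               count, token, (long)(token - line));
        count++;
        token = strtok(NULL, LSH_TOK_DELIM);     /* later calls: NULL continues */
    }
    return count;
}
```

For example, with char line[] = "ls  -l\t/tmp"; show_tokens(line) finds 3 tokens starting at offsets 0, 4 and 7, and afterwards line itself reads as just "ls" because strtok wrote '\0' at index 2. The line has to be a modifiable array rather than a string literal, since strtok writes into it.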
- There are 2 ways of starting processes in Unix
- Init: init is the first process the kernel starts when a Unix system boots, and it runs the whole time the computer is on. It manages starting the rest of the processes we will use
- Fork: fork() is a system call
  - When called, it makes a duplicate of the calling process, and both copies continue running from the point of the call
  - The original process is the parent, and the duplicate is the child
  - fork() returns 0 in the child and the child's process ID (PID) in the parent (or -1 in the parent if no child could be created)
- Apart from init, the only way for a process to start is for an existing one to duplicate itself with fork()
- exec() replaces the program a process is running with a new one
  - exec stops running the current program in that process, loads the new program, and starts it in its place; no new process is created and the PID stays the same, which is why a shell forks first and has the child call exec
- wait() can be used by the parent process to keep tabs on its children, pausing until one of them terminates
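A minimal sketch of fork() and waitpid() working together, assuming a POSIX system. fork_and_collect is a hypothetical helper: the child exits immediately with a chosen code instead of calling exec, and the parent reads that code back with WEXITSTATUS, a standard macro from <sys/wait.h> that these notes don't otherwise use. The child calls _exit() rather than exit() so it doesn't flush its copy of the parent's stdio buffers a second time.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that immediately exits with `code`. The parent waits for it
   and returns the exit status it observed, or -1 on error. */
int fork_and_collect(int code)
{
    pid_t pid = fork();
    if (pid < 0) {            /* fork failed: no child exists */
        perror("fork");
        return -1;
    }
    if (pid == 0) {           /* child: fork() returned 0 */
        _exit(code);          /* _exit skips flushing the copied stdio buffers */
    }

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only the low 8 bits of an exit code reach the parent (WEXITSTATUS yields 0 to 255), so small codes like 0 or 42 come back unchanged.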
int lsh_launch(char **args){
    pid_t pid, wpid;
    int status;
    pid = fork();
    if(pid == 0){
        // child process: replace this copy of the shell with the requested program
        if(execvp(args[0], args) == -1){
            perror("shell");
        }
        exit(EXIT_FAILURE);
    }
...
- We declare pid, which holds the return value of fork(), and wpid (the "wait PID"), which holds the PID returned by waitpid()
- After fork() there are 2 processes: one where pid == 0 (the child) and one where pid is the child's PID, a positive number (the parent, our original shell process)
- The child process takes the first branch, where pid == 0
  - execvp(program, array) is a variant of exec (the underlying system call is execve)
    - Different variants take different parameters
    - The 'v' means it takes a vector (a NULL-terminated array) of arguments, and the 'p' means we only give the program name and the OS searches the directories in PATH for it, so we don't have to specify the full path
  - exec only returns if it fails (returning -1), so in that case we print an error message with perror and exit; otherwise the child would keep running as a second copy of the shell
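    } else if (pid < 0) {
        // fork failed: no child was created
        perror("shell");
    }
...
- This branch checks whether fork() failed (pid < 0); we print the error but don't exit, so the shell keeps running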
    } else {
        // parent process: wait for the child to finish
        do{
            wpid = waitpid(pid, &status, WUNTRACED);
        } while (!WIFEXITED(status) && !WIFSIGNALED(status));
    }
    return 1;
} // end of lsh_launch
- The parent process lands here
- Since the child runs the command, the parent waits until the child is done running
- waitpid() is called by the parent process to wait for state changes in a specified child process and retrieve information about them, such as how the child terminated
  - Signature: pid_t waitpid(pid_t pid, int *status, int options);
  - &status is a pointer to an int in which the kernel stores status information about the child; it is the output parameter of waitpid()
  - WUNTRACED (passed as options) is a flag that tells waitpid() to also return for children that have been stopped (e.g. with Ctrl+Z), not only ones that have terminated
- A process's state can change in several ways (for example, it can be stopped and later continued), so we keep calling waitpid() until the child has either exited or been killed
  - WIFEXITED returns true if the child terminated normally, e.g. by calling exit() or returning from main
  - WIFSIGNALED returns true if the child was terminated by a signal it didn't catch, such as SIGINT (Ctrl+C), SIGSEGV (segfault), or SIGKILL (kill -9)
  - status is an encoded int (a bitfield) that these macros decode
- The function then returns 1, which tells shell_loop to prompt for input again
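To see the two macros from the loop above in action, here is a sketch that waits the same way lsh_launch does and then reports how the child ended, assuming a POSIX system. wait_for_child and spawn_demo_child are hypothetical helpers; WEXITSTATUS and WTERMSIG are standard <sys/wait.h> macros (not used in the notes) that extract the exit code and the signal number. The demo child kills itself with SIGKILL because that signal cannot be caught or ignored.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Waits for `pid` with the same loop as lsh_launch, then reports how it ended:
   *exited = 1 and *value = exit code, or *exited = 0 and *value = signal number.
   Returns 1 on success, 0 if waitpid() failed. */
int wait_for_child(pid_t pid, int *exited, int *value)
{
    int status;
    do {
        if (waitpid(pid, &status, WUNTRACED) == -1) {
            perror("waitpid");
            return 0;
        }
        /* With WUNTRACED, a stopped child also returns here; keep waiting. */
    } while (!WIFEXITED(status) && !WIFSIGNALED(status));

    if (WIFEXITED(status)) {
        *exited = 1;
        *value = WEXITSTATUS(status); /* the code passed to exit() */
    } else {
        *exited = 0;
        *value = WTERMSIG(status);    /* the signal that killed it */
    }
    return 1;
}

/* Demo child: exits with code 7, or kills itself with SIGKILL.
   Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t spawn_demo_child(int kill_itself)
{
    pid_t pid = fork();
    if (pid == 0) {
        if (kill_itself)
            raise(SIGKILL);
        _exit(7);
    }
    return pid;
}
```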
- Most commands a shell executes are programs, but not all
- For example, cd cannot be its own program: it would run in a child process, change the child's directory, then exit, leaving the shell's directory unchanged
- The shell process itself must execute cd so that it can change its own directory and then have child processes inherit the new directory
- Similarly, if there were a separate program called exit, it could only exit itself, not the shell that called it
- Commands in configuration scripts that change how the shell operates also need to be built in, since they can only change the shell if they run inside the shell process itself
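The cd reasoning above can be checked directly with a small sketch, assuming a POSIX system: a chdir() done in a forked child never changes the parent's working directory, while a chdir() done in the parent is inherited by children forked afterward. Both functions are hypothetical helpers written for this note, and the 4096-byte path buffers are an assumed size.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a chdir() done inside a child changed the parent's directory,
   0 if it did not, or -1 on error. */
int child_cd_affects_parent(const char *dir)
{
    char before[4096], after[4096];
    if (!getcwd(before, sizeof before)) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: this is all a separate "cd" program could do. */
        _exit(chdir(dir) == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;

    if (!getcwd(after, sizeof after)) return -1;
    return strcmp(before, after) != 0;
}

/* Changes this process's directory (like the cd builtin), then returns 1 if a
   child forked afterward sees `dir` as its working directory, 0 if not, -1 on error. */
int child_inherits_cwd(const char *dir)
{
    if (chdir(dir) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        char buf[4096];
        _exit(getcwd(buf, sizeof buf) && strcmp(buf, dir) == 0 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The second helper really does change the calling process's directory, just as the cd builtin does for the shell.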
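int shell_cd(char **args);
int shell_help(char **args);
int shell_exit(char **args);
- Forward declarations (usually in .h files) tell the compiler each function's signature before its definition appears, so we can refer to these functions (e.g. in the arrays below) without ordering or dependency issues when compiling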
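char *builtin_str[] = {
    "cd",
    "help",
    "exit"
};
int (*builtin_func[])(char**) = {
    &shell_cd,
    &shell_help,
    &shell_exit
};
int shell_num_builtins(void){
    return sizeof(builtin_str) / sizeof(char *);
}
- builtin_str is an array of pointers to strings (char *), which is why it's declared with both * and []
- (*builtin_func[])(char**) breakdown:
  - builtin_func is the name of the variable
  - [] specifies that it's an array
  - (*...) the asterisk inside the parentheses means each element is a pointer; the parentheses are required because without them, int *builtin_func[](char**) would declare an array of functions returning int *, which C does not allow
  - int (...)(char**) around it means each of those pointers points to a function that takes a char** and returns an int
  - char** means the function takes an array of strings; in a declaration it means the same as char **args, just without a parameter name (names are optional there)
- The two arrays must list the builtins in the same order, since shell_execute uses the same index for both
- To calculate the number of builtins, we take the size of the whole builtin_str array in bytes and divide it by the size of one element, a char pointer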
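A common way to make the function-pointer declaration easier to read is a typedef. This sketch uses hypothetical demo builtins (demo_hello, demo_quit) rather than the shell's real ones so it compiles on its own, and counts elements with sizeof(array) / sizeof(array[0]), which works for any element type.

```c
#include <stdio.h>
#include <string.h>

/* builtin_fn is "pointer to a function that takes a char ** and returns int",
   the same element type as builtin_func in the notes. */
typedef int (*builtin_fn)(char **);

/* Hypothetical stand-in builtins so the sketch compiles on its own. */
static int demo_hello(char **args)
{
    printf("hello, %s\n", args[1] ? args[1] : "world");
    return 1;   /* nonzero: keep the shell running */
}

static int demo_quit(char **args)
{
    (void)args; /* unused */
    return 0;   /* zero: stop the shell loop */
}

static const char *demo_names[] = { "hello", "quit" };
static builtin_fn demo_funcs[]  = { demo_hello, demo_quit }; /* & is optional for function names */

/* sizeof(array) / sizeof(array[0]) counts elements for any element type. */
#define DEMO_NUM_BUILTINS ((int)(sizeof(demo_names) / sizeof(demo_names[0])))

/* Runs the matching builtin, or returns -1 if args[0] is not one
   (shell_execute would call lsh_launch at that point instead). */
int demo_dispatch(char **args)
{
    for (int i = 0; i < DEMO_NUM_BUILTINS; i++) {
        if (strcmp(args[0], demo_names[i]) == 0)
            return demo_funcs[i](args); /* same as (*demo_funcs[i])(args) */
    }
    return -1;
}
```

With the typedef, the array declaration reads left to right ("an array of builtin_fn"), which avoids the inside-out parsing of int (*builtin_func[])(char**).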
int shell_cd(char **args){
    if (args[1] == NULL){
        fprintf(stderr, "Expected argument after \"cd\"\n");
    } else {
        if (chdir(args[1]) != 0){
            perror("shell: cd failure");
        }
    }
    return 1;
}
int shell_help(char **args){
    //custom message
    return 1; // keep the shell running
}
int shell_exit(char **args){
    return 0; // 0 makes shell_loop stop
}
- chdir takes a path and returns 0 if successful (and -1 on failure), which is why we check for 0
- shell_exit just returns 0 to end the program, since shell_loop keeps running while status is nonzero; main then returns and the shell terminates
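The help message is left to be written; one option, along the lines of the tutorial's help builtin, is to explain usage and print the list of builtins. To keep the sketch self-contained, help_builtin_str and help_num_builtins are hypothetical copies of builtin_str and shell_num_builtins(); in the shell itself, shell_help would use those directly.

```c
#include <stdio.h>

/* Hypothetical copies of builtin_str and shell_num_builtins() from the notes,
   repeated so this sketch compiles on its own. */
static char *help_builtin_str[] = { "cd", "help", "exit" };

static int help_num_builtins(void)
{
    return (int)(sizeof(help_builtin_str) / sizeof(help_builtin_str[0]));
}

/* One possible "custom message": explain usage and list the builtins. */
int shell_help(char **args)
{
    (void)args; /* help ignores its arguments */
    printf("Type program names and arguments, and hit enter.\n");
    printf("The following are built in:\n");
    for (int i = 0; i < help_num_builtins(); i++) {
        printf("  %s\n", help_builtin_str[i]);
    }
    printf("Use the man command for information on other programs.\n");
    return 1;   /* nonzero keeps shell_loop running */
}
```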
- shell_execute launches either a builtin or a program
- It checks whether the command matches a builtin, and if not, it calls the launch function
- If the line was empty (args[0] == NULL), it just returns 1 so the shell prompts again
int shell_execute(char **args){
    int i;
    if(args[0] == NULL){
        return 1; // empty command: nothing to run
    }
    for (i = 0; i < shell_num_builtins(); i++){
        if (strcmp(args[0], builtin_str[i]) == 0){
            return (*builtin_func[i])(args);
        }
    }
    return lsh_launch(args);
}