In [19]:
%run -i ../python/common.py

# Executables

In this chapter we will explore what "native" binary programs are and begin our journey to learning how to create them through assembly programming.

This chapter follows two approaches to this material.  In the first part we take a self-guided discovery approach.  Here we use our knowledge of and access to UNIX to follow our noses and poke around an executable to see what we can learn. In the second part of the chapter we take a more traditional textbook approach and present the conceptual model for how executables and processes relate to each other.


**The following chapter includes several manual page entries.  A reader is not expected to read these completely.  They are mainly here to illustrate how we can learn about the details and to document the precise way we can look them up later when we need to.  In general you should skip the first few paragraphs.  If there are details that you should pick up on now, the text will point you to them.**

## Processes and Executables

Perhaps the most basic thing we do on a computer is run programs.  As we have seen, on UNIX, one of the main purposes of the shell is to let us start and manage running programs -- processes.  As a recap, remember that when we type a command into a shell that is not a built-in command (eg. `ls`), the shell will look to see if a file with a matching name exists in the list of directories specified by the `PATH` environment variable.  If one is found (eg. `/bin/ls`), and its metadata marks it as "executable", the shell process will make calls to the UNIX kernel to create a new child process and try to "run" the file within the new process.
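To make the lookup step concrete, here is a minimal Python sketch of the search just described.  The function name `find_executable` is our own invention for illustration, and a real shell does more than this (for example, it remembers where it found commands and treats names containing a `/` differently), so read it as an approximation of the idea rather than as what bash actually does.

```python
import os
import stat


def find_executable(name, path=None):
    """Return the first executable regular file called `name` found on `path`.

    `path` is a list of directories joined by os.pathsep (':' on UNIX);
    it defaults to the PATH environment variable.  Returns None if no
    suitable file is found.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty entry in PATH conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        try:
            mode = os.stat(candidate).st_mode
        except OSError:
            continue  # nothing by that name in this directory
        # It must be a regular file that we have permission to execute.
        if stat.S_ISREG(mode) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_executable("ls"))
```

On the system used in this chapter we would expect this to report a path such as `/bin/ls`, but the exact answer depends on how `PATH` is set.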

In [13]:
display(HTML(htmlFig(
    [
        [
#            {'src':"/files/work/UndertheCovers/underthecovers/images/Processes/Processes.003.png",
#             'caption':'A: Press Enter'
#             'border': '1px solid black',
#             'padding':'1px',
#             'cellwidth':'33.33%'
#            },
            {'src':"../images/Processes/Processes.004.png",
             'caption':'A: Bash calls kernel functions.'  
#             'border': '1px solid black',
#             'padding':'1px',
#             'cellwidth':'33.33%'
            },
         {'src':"../images/Processes/Processes.005.png",
             'caption':'B: Kernel runs the program in new process.'
#             'border': '1px solid black',
#             'padding':'1px',
#             'cellwidth':'33.33%'
           },
        ]
    ],
    id="fig:shell-blankline",
    caption="<center> Figure: Shell calls kernel functions, fork and exec, to create a new process and 'runs' the 'executable' within it</center>"
)))

[Figure panels. A: Bash calls kernel functions. B: Kernel runs the program in new process.]


As the figures state, there are two basic kinds of files that the kernel knows how to "execute" within a process.  One is an ASCII file that has a special string at its beginning -- `#!<path of interpreter>` -- and the other is an **executable**.  The former is just a convenient way to allow programs like the shell to be started automatically with the contents of the file passed to them as a script to interpret.  This makes it easy to write "scripts" that behave as if they were programs of their own, when in reality they are being interpreted as commands to the "real" program specified on the first line of the file.  But the question, of course, is what exactly are real programs, or **executables**?
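The script case is simple enough to sketch in a few lines of Python.  The helper below, `parse_shebang` (a name made up for this example), looks at the first bytes of a file and, if they begin with `#!`, pulls out the interpreter path and the optional argument.  It is only an illustration of the convention, not the kernel's actual code.

```python
def parse_shebang(head):
    """Return (interpreter, optional_arg) if `head` starts with '#!', else None.

    `head` is the first chunk of bytes read from a file.
    """
    if not head.startswith(b"#!"):
        return None
    # Only the first line matters; drop the leading '#!'.
    first_line = head.split(b"\n", 1)[0][2:]
    text = first_line.decode("ascii", errors="replace").strip()
    if not text:
        return None
    # The interpreter path comes first; whatever follows it is a single
    # optional argument.
    parts = text.split(None, 1)
    interpreter = parts[0]
    optional_arg = parts[1] if len(parts) > 1 else None
    return interpreter, optional_arg


def shebang_of(path):
    """Read the start of the file at `path` and parse its '#!' line, if any."""
    with open(path, "rb") as f:
        return parse_shebang(f.read(256))
```

Calling `shebang_of` on a shell script should give back something like `('/bin/bash', None)`, while a file that does not start with `#!` gives `None`, which leaves us with the question of what that other kind of file is.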

## What's inside an executable

Let's explore the `/bin/ls` file using our UNIX skills to see what we can figure out.

### What does ls tell us about ls ;-)

In [29]:
TermShellCmd("ls -l /bin/ls", markdown=False)

$ ls -l /bin/ls
-rwxr-xr-x. 1 root root 142144 Sep  5  2019 /bin/ls
$ 


Using `ls` to list the metadata of the file `/bin/ls`, we see that it contains a sizable number of bytes (142144).  We also see that the permissions clearly mark it as being executable by all users of the system: `-rwxr-xr-x` (if you don't remember how to read this output see `man ls`).
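If the permission string is hard to read, it can help to see how it is derived from the mode bits stored in the file's metadata.  The sketch below uses Python's standard `stat` module; the value `0o755` is our reading of the `rwxr-xr-x` string above, and `who_can_execute` is a helper invented for this example.

```python
import stat

# A regular file (S_IFREG) with permission bits 755, which is what the
# string -rwxr-xr-x encodes.  For a real file, os.stat(path).st_mode
# would supply this value.
mode = stat.S_IFREG | 0o755


def who_can_execute(mode):
    """List which classes of user have the execute bit set in `mode`."""
    classes = [
        ("owner", stat.S_IXUSR),
        ("group", stat.S_IXGRP),
        ("other", stat.S_IXOTH),
    ]
    return [name for name, bit in classes if mode & bit]


print(stat.filemode(mode))    # the ten-character string that ls -l prints
print(who_can_execute(mode))  # which of owner, group, other may execute
```

The three `x` characters in the string correspond to the three execute bits tested here, one each for the owner, the group, and everyone else.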

### Can we display its contents to the Terminal with `cat`?

We encourage you to open a terminal and give this a shot.  What happened?  Well, remember that all bytes that are sent to the terminal are interpreted by the terminal as ASCII encoded information.  It should be quickly apparent to you that whatever `/bin/ls` is, it is NOT predominantly ASCII encoded information!  Rather, the bytes in it must be some other kind of binary representation.



### Let's look at the byte values of `/bin/ls` using `xxd`

So while the data in `/bin/ls` does not seem to be encoded in ASCII, we can use other UNIX tools to translate the individual bytes of the file into an ASCII rendering of their numeric values, so that we can at least see what the values of the bytes of the file are.  There are several such tools we could use; examples include `od` (octal dump), `hexdump`, and `xxd`.  We will use `xxd`.

In [33]:
TermShellCmd("man xxd", wait=False, markdown=False, noposttext=True)

$ man xxd
XXD(1)                      General Commands Manual                     XXD(1)

NAME
       xxd - make a hexdump or do the reverse.

SYNOPSIS
       xxd -h[elp]
       xxd [options] [infile [outfile]]
       xxd -r[evert] [options] [infile [outfile]]

DESCRIPTION
       xxd  creates a hex dump of a given file or standard input.  It can also
       convert a hex dump back to its original binary form.  Like  uuencode(1)
       and  uudecode(1)  it allows the transmission of binary data in a `mail-
       safe' ASCII representation, but has the advantage of decoding to  stan‐
       dard output.  Moreover, it can be used to perform binary file patching.

OPTIONS
       If  no infile is given, standard input is read.  If infile is specified
       as a `-' character, then input is taken from  standard  input.   If  no
       outfile is given (or a `-' character is in its place), results are sent
       to standard output.

       Note that a "lazy" parser is used which does not c

`xxd` conveniently lets us look at the values of a file's bytes represented in base 2 binary digits or base 16 hexadecimal digits.   We will use the following command to display the first 256 bytes of the file in binary: `xxd -l 256 -g 1 -c 8 -b /bin/ls`

Where: 
 - `-l 256` is used to restrict ourselves to the first 256 bytes
 - `-g 1` is used to tell xxd to work on units/groups of single bytes
 - `-c 8` is used to print 8 units/groups per line
 - `-b` means display the values in base 2 (binary) notation

This causes `xxd` to open `/bin/ls` and read the first 256 bytes.  It examines the value of each byte read and translates it into a string of eight ASCII characters, each either `0` or `1`, depending on the value of the bits of the byte.  In this way we can use `xxd` to display the byte values of a file.  The left hand column of the output encodes the byte position in the file that the line of data corresponds to.  These position values start at zero and are in hexadecimal notation (eg. `00000010` is 16 in decimal).  On the far right of each line `xxd` prints an ASCII interpretation for any byte values that correspond to printable ASCII characters (otherwise it prints a `.`).
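The translation described above is easy to imitate, which is a good way to convince ourselves there is no magic involved.  The following Python sketch is our own approximation of what `xxd -b -g 1 -c 8` produces, not the real implementation; the function name `binary_dump` is made up for this example.

```python
def binary_dump(data, cols=8):
    """Format `data` roughly the way `xxd -b -g 1 -c cols` would.

    Returns a list of text lines: offset in hex, each byte as eight
    binary digits, then the printable-ASCII view of the same bytes.
    """
    width = cols * 9 - 1  # eight digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), cols):
        chunk = data[offset:offset + cols]
        bits = " ".join(format(byte, "08b") for byte in chunk)
        # Printable ASCII is shown as-is; everything else becomes '.'.
        text = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        # Pad the bit column so a short final line still lines up.
        lines.append(f"{offset:08x}: {bits:<{width}}  {text}")
    return lines


def dump_file(path, length=256):
    """Print a binary dump of the first `length` bytes of the file at `path`."""
    with open(path, "rb") as f:
        for line in binary_dump(f.read(length)):
            print(line)
```

Running `dump_file("/bin/ls")` should print lines in the same shape as the `xxd` output that follows.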

In [37]:
TermShellCmd("xxd -l 256 -g 1 -c 8 -b /bin/ls", wait=True, markdown=False, noposttext=True)

$ xxd -l 256 -g 1 -c 8 -b /bin/ls
00000000: 01111111 01000101 01001100 01000110 00000010 00000001 00000001 00000000  .ELF....
00000008: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  ........
00000010: 00000011 00000000 00111110 00000000 00000001 00000000 00000000 00000000  ..>.....
00000018: 11010000 01100111 00000000 00000000 00000000 00000000 00000000 00000000  .g......
00000020: 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  @.......
00000028: 11000000 00100011 00000010 00000000 00000000 00000000 00000000 00000000  .#......
00000030: 00000000 00000000 00000000 00000000 01000000 00000000 00111000 00000000  ....@.8.
00000038: 00001101 00000000 01000000 00000000 00011110 00000000 00011101 00000000  ..@.....
00000040: 00000110 00000000 00000000 00000000 00000100 00000000 00000000 00000000  ........
00000048: 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  @.......
00000050: 01000000 00000000 00000000 00000000 

Using hexadecimal notation we get a more concise visual representation.

In [38]:
TermShellCmd("xxd -l 256 -g 1 -c 8 /bin/ls", wait=True, markdown=False, noposttext=True)

$ xxd -l 256 -g 1 -c 8 /bin/ls
00000000: 7f 45 4c 46 02 01 01 00  .ELF....
00000008: 00 00 00 00 00 00 00 00  ........
00000010: 03 00 3e 00 01 00 00 00  ..>.....
00000018: d0 67 00 00 00 00 00 00  .g......
00000020: 40 00 00 00 00 00 00 00  @.......
00000028: c0 23 02 00 00 00 00 00  .#......
00000030: 00 00 00 00 40 00 38 00  ....@.8.
00000038: 0d 00 40 00 1e 00 1d 00  ..@.....
00000040: 06 00 00 00 04 00 00 00  ........
00000048: 40 00 00 00 00 00 00 00  @.......
00000050: 40 00 00 00 00 00 00 00  @.......
00000058: 40 00 00 00 00 00 00 00  @.......
00000060: d8 02 00 00 00 00 00 00  ........
00000068: d8 02 00 00 00 00 00 00  ........
00000070: 08 00 00 00 00 00 00 00  ........
00000078: 03 00 00 00 04 00 00 00  ........
00000080: 18 03 00 00 00 00 00 00  ........
00000088: 18 03 00 00 00 00 00 00  ........
00000090: 18 03 00 00 00 00 00 00  ........
00000098: 1c 00 00 00 00 00 00 00  ........
000000a0: 1c 00 00 00 00 00 00 00  ........
000000a8: 01 00 00 00 00 00 00 00  ........
0

So while it might look cool, without knowing how to interpret the byte values it really does not provide us much insight as to what makes this file a program that lists the contents of directories.

### Using the UNIX `file` command on `/bin/ls`

While there are no explicit file types in UNIX that tell us what kind of information is in a file (we are expected to know), there is a command that is very good at examining a file and guessing what kind of information is encoded in it, based on a large database of tests.  This command is called `file`.  Here is its manual page.

In [42]:
TermShellCmd("man file", wait=False, markdown=False, noposttext=True, tmout=2)

$ man file
FILE(1)                   BSD General Commands Manual                  FILE(1)

NAME
     file — determine file type

SYNOPSIS
     file [-bcdEhiklLNnprsSvzZ0] [--apple] [--extension] [--mime-encoding]
          [--mime-type] [-e testname] [-F separator] [-f namefile]
          [-m magicfiles] [-P name=value] file ...
     file -C [-m magicfiles]
     file [--help]

DESCRIPTION
     This manual page documents version 5.38 of the file command.

     file tests each argument in an attempt to classify it.  There are three
     sets of tests, performed in this order: filesystem tests, magic tests,
     and language tests.  The first test that succeeds causes the file type to
     be printed.

     The type printed will usually contain one of the words text (the file
     contains only printing characters and a few common control characters and
     is probably safe to read on an ASCII terminal), executable (the file con‐
     tains the result of compiling a program in a form und

Well let's see what `file` has to say about `/bin/ls`.

In [43]:
TermShellCmd("file /bin/ls", markdown=False, noposttext=True)

$ file /bin/ls
/bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=2f15ad836be3339dec0e2e6a3c637e08e48aacbd, for GNU/Linux 3.2.0, stripped



Ok cool!  `file` tells us `/bin/ls` is an **ELF** file.  You might have noticed that the xxd output showed the ASCII characters `ELF` near the beginning of the file.  These characters are part of the `ELF` standard format, and they are there to make recognition of such files easier.

So what exactly is an ELF file?  Let's see what the manuals have to say.

### ELF Files - Executable and Linking Format Files

In [44]:
TermShellCmd("man elf", wait=False, markdown=False, noposttext=True)

$ man elf
ELF(5)                     Linux Programmer's Manual                    ELF(5)

NAME
       elf - format of Executable and Linking Format (ELF) files

SYNOPSIS
       #include <elf.h>

DESCRIPTION
       The  header  file  <elf.h>  defines the format of ELF executable binary
       files.  Amongst these files are normal  executable  files,  relocatable
       object files, core files, and shared objects.

       An executable file using the ELF file format consists of an ELF header,
       followed by a program header table or a section header table, or  both.
       The  ELF  header  is  always  at  offset zero of the file.  The program
       header table and the section header table's offset in the file are  de‐
       fined  in the ELF header.  The two tables describe the rest of the par‐
       ticularities of the file.

       This header file describes the above mentioned headers as C  structures
       and  also includes structures for dynamic sections, relocation sec

Wow, that's a lot of information that probably does not make much sense at this point.  However, it is nice to see that it seems to be a format for encoding "executable" files ;-)

Now as it turns out there are several tools, such as `readelf` and `objdump`, that we could learn about that are designed for working with `elf` files.  But it is not clear that this is going to help that much unless we get a more conceptual understanding of how they are used to encode a program for execution in a process.

For your interest here is the output for `readelf --all /bin/ls` and `objdump --all /bin/ls` which dump summary information about the `/bin/ls` executable.   Also included is `objdump -d`, which hints at what is coming.
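One thing the manual page does make concrete is that an ELF header always sits at offset zero, so we can try decoding the bytes we already dumped with `xxd`.  The Python sketch below assumes the 64-bit little-endian header layout (`Elf64_Ehdr` in `<elf.h>`) and uses the first 64 bytes of `/bin/ls` copied from the hexadecimal dump earlier in this chapter.  The names `HEADER`, `FIELDS` and `parse_elf64_header` are our own, and the sketch deliberately ignores every other ELF variant.

```python
import struct

# First 64 bytes of /bin/ls, copied from the xxd output shown earlier.
HEADER = bytes.fromhex(
    "7f454c4602010100" "0000000000000000"
    "03003e0001000000" "d067000000000000"
    "4000000000000000" "c023020000000000"
    "0000000040003800" "0d0040001e001d00"
)

# Layout of Elf64_Ehdr, little endian: 16 identification bytes followed
# by the fixed-size fields, in the order they are named in FIELDS.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(raw):
    """Decode a 64-bit little-endian ELF header into a dict of its fields."""
    if raw[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 is the class (2 means 64-bit); byte 5 is the data encoding
    # (1 means little endian).
    if raw[4] != 2 or raw[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return dict(zip(FIELDS, ELF64_HEADER.unpack(raw[:ELF64_HEADER.size])))


header = parse_elf64_header(HEADER)
print("entry point:", hex(header["e_entry"]))
print("program headers start at byte", header["e_phoff"])
print("section headers start at byte", header["e_shoff"])
```

These are the same quantities that `readelf` reports as the entry point address and the start of the program and section headers, so the sketch is a useful cross-check when reading that tool's output.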

In [45]:
TermShellCmd("readelf --all /bin/ls", wait=False, markdown=False, noposttext=True, tmout=2)

$ readelf --all /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x67d0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          140224 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 29

Section Heade

In [46]:
TermShellCmd("objdump --all /bin/ls", wait=False, markdown=False, noposttext=True, tmout=2)

$ objdump --all /bin/ls

/bin/ls:     file format elf64-x86-64
/bin/ls
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x00000000000067d0

Program Header:
    PHDR off    0x0000000000000040 vaddr 0x0000000000000040 paddr 0x0000000000000040 align 2**3
         filesz 0x00000000000002d8 memsz 0x00000000000002d8 flags r--
  INTERP off    0x0000000000000318 vaddr 0x0000000000000318 paddr 0x0000000000000318 align 2**0
         filesz 0x000000000000001c memsz 0x000000000000001c flags r--
    LOAD off    0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**12
         filesz 0x00000000000036a8 memsz 0x00000000000036a8 flags r--
    LOAD off    0x0000000000004000 vaddr 0x0000000000004000 paddr 0x0000000000004000 align 2**12
         filesz 0x0000000000013581 memsz 0x0000000000013581 flags r-x
    LOAD off    0x0000000000018000 vaddr 0x0000000000018000 paddr 0x0000000000018000 align 2**12
         filesz 0x0000000000008b50 memsz 0x0

As a teaser here is some actual "content" that objdump can extract from `/bin/ls`.  Specifically this command, `objdump -d /bin/ls` 'disassembles' the binary.

In [47]:
TermShellCmd("objdump -d /bin/ls", wait=False, markdown=False, noposttext=True, tmout=2)

$ objdump -d /bin/ls

/bin/ls:     file format elf64-x86-64


Disassembly of section .init:

0000000000004000 <.init>:
    4000:	f3 0f 1e fa          	endbr64 
    4004:	48 83 ec 08          	sub    $0x8,%rsp
    4008:	48 8b 05 c9 ef 01 00 	mov    0x1efc9(%rip),%rax        # 22fd8 <__gmon_start__>
    400f:	48 85 c0             	test   %rax,%rax
    4012:	74 02                	je     4016 <free@plt-0x6ba>
    4014:	ff d0                	callq  *%rax
    4016:	48 83 c4 08          	add    $0x8,%rsp
    401a:	c3                   	retq   

Disassembly of section .plt:

0000000000004020 <.plt>:
    4020:	ff 35 3a ec 01 00    	pushq  0x1ec3a(%rip)        # 22c60 <quoting_style_args@@Base+0x260>
    4026:	f2 ff 25 3b ec 01 00 	bnd jmpq *0x1ec3b(%rip)        # 22c68 <quoting_style_args@@Base+0x268>
    402d:	0f 1f 00             	nopl   (%rax)
    4030:	f3 0f 1e fa          	endbr64 
    4034:	68 00 00 00 00       	pushq  $0x0
    4039:	f2 e9 e1 ff ff ff    	bnd jmpq 4020 <free@plt-0x6b0>
  

## Executing an Executable in a Process

Let's try this from the other direction.  We know that there is a call to the OS to run an executable.  Let's see what we can find out by examining the OS documentation.

Let's start by looking up the manpage for the operating system call `exec`.  At this point we are going to ignore the programming syntax and mechanics and rather focus on what we can learn in broad strokes from the manual page.


In [50]:
TermShellCmd("man 3 exec | cat -n", wait=False, markdown=False, noposttext=True, tmout=2)

$ man 3 exec | cat -n
     1	EXEC(3)                    Linux Programmer's Manual                   EXEC(3)
     2	
     3	NAME
     4	       execl, execlp, execle, execv, execvp, execvpe - execute a file
     5	
     6	SYNOPSIS
     7	       #include <unistd.h>
     8	
     9	       extern char **environ;
    10	
    11	       int execl(const char *pathname, const char *arg, ...
    12	                       /* (char  *) NULL */);
    13	       int execlp(const char *file, const char *arg, ...
    14	                       /* (char  *) NULL */);
    15	       int execle(const char *pathname, const char *arg, ...
    16	                       /*, (char *) NULL, char *const envp[] */);
    17	       int execv(const char *pathname, char *const argv[]);
    18	       int execvp(const char *file, char *const argv[]);
    19	       int execvpe(const char *file, char *const argv[],
    20	                       char *const envp[]);
    21	
    22	   Feature Test Macro Requirements for glibc 

We want to focus on the first two paragraphs of the description (lines 27 - 33).  These sentences imply that running a program loads a new "process image" over the current one.  Remember that in the Introduction we used the term memory image; it is not a random coincidence that we are seeing the same terminology here.  Further, reading between the lines, the "file" to be executed contains or is the base of the new process image.  Given that this man page tells us that `exec` is really just a front end to `execve`, let's look at that man page and see if we can learn a little more.

In [52]:
TermShellCmd("man 2 execve | cat -n", wait=False, markdown=False, noposttext=True, tmout=2)

$ man 2 execve | cat -n
     1	EXECVE(2)                  Linux Programmer's Manual                 EXECVE(2)
     2	
     3	NAME
     4	       execve - execute program
     5	
     6	SYNOPSIS
     7	       #include <unistd.h>
     8	
     9	       int execve(const char *pathname, char *const argv[],
    10	                  char *const envp[]);
    11	
    12	DESCRIPTION
    13	       execve() executes the program referred to by pathname.  This causes the
    14	       program that is currently being run by the calling process  to  be  re‐
    15	       placed  with  a  new  program,  with newly initialized stack, heap, and
    16	       (initialized and uninitialized) data segments.
    17	
    18	       pathname must be either a binary executable, or a script starting  with
    19	       a line of the form:
    20	
    21	           #!interpreter [optional-arg]
    22	
    23	       For details of the latter case, see "Interpreter scripts" below.
    24	
    25	       argv  is  an  

Let's focus on lines 13-21 and 41-43. Again we see that the wording is all about replacing the contents of an existing process with the values from the executable file.  Further, we see that some parts of the new process will be `newly initialized`: *stack*, *heap* and *data* *segments*. In lines 41-43 we are told that `execve`, assuming success, will overwrite certain parts of the *text*, *data* and *stack* of the process with the contents of the executable file (newly loaded program).  So vaguely we are getting the picture that an executable encodes values that get "loaded" into a process to initialize the execution of the program contained within it.

Our task now is to start putting the pieces together.  We need to get a better idea of what a process is and how we go about encoding a program into an executable.
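We can watch this "replacement" happen from Python, whose `os` module exposes thin wrappers around the same kernel calls on UNIX systems.  The sketch below is a simplified stand-in for what a shell does, and `run_in_child` is a name we made up.  The important detail is that nothing after a successful `os.execv` ever runs in the child, because the program that contained those lines is no longer what the process holds.

```python
import os
import sys


def run_in_child(path, argv):
    """Fork, replace the child's program with `path`, return its exit status.

    Returns None if the child did not exit normally (eg. it was killed
    by a signal).
    """
    pid = os.fork()
    if pid == 0:
        # Child process: on success execv does not return, since this
        # Python program has been replaced by the new one.
        try:
            os.execv(path, argv)
        finally:
            # Only reached if execv failed (eg. no such file).
            os._exit(127)
    # Parent process: wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    # Have the child become a fresh Python interpreter that exits with 7.
    code = run_in_child(sys.executable,
                        [sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

The exit status of 127 for a failed `exec` is only a convention borrowed from the shell; any value would do for the purposes of this sketch.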


In [None]:
## Processes