In [None]:
%run -i ../python/common.py

# Executables

In this chapter we will explore what "native" binary programs are and begin our journey toward learning how to create them through assembly programming.

This chapter follows two approaches to this material.  In the first part we take a self-guided discovery approach.  Here we use our knowledge of and access to UNIX to follow our noses and poke around an executable to see what we can learn.  In the second part of the chapter we take a more traditional textbook approach and present the conceptual model for how executables and processes relate to each other.


**The following chapter includes several manual page entries.  A reader is not expected to read these completely.  They are mainly here to illustrate how we can learn about the details, and to document precisely where we can look them up later when we need to.  In general you should skim the first few paragraphs.  If there are details that you should pick up on now, the text will point you to them.**

## Processes and Executables

Perhaps the most basic thing we do on a computer is run programs.  As we have seen, on UNIX one of the main purposes of the shell is to let us start and manage running programs -- processes.  As a recap, remember that when we type a command into a shell that is not a shell built-in (eg. `ls`), the shell looks for a file with a matching name in the list of directories specified by the `PATH` environment variable.  If one is found (eg. `/bin/ls`) and its metadata marks it as "executable", the shell process makes calls to the UNIX kernel to create a new child process and tries to "run" the file within the new process.
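
The `PATH` search described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not how any particular shell implements it; the function name `find_executable` is a hypothetical helper introduced here.

```python
import os

def find_executable(name, path=None):
    """Return the first executable file called `name` found on PATH, or None.

    Hypothetical helper: a simplified version of the search a shell performs.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        # A match must be a regular file that we are permitted to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

print(find_executable("ls"))
```

The standard library function `shutil.which` performs a similar search and is what you would normally use in practice.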

In [None]:
display(HTML(htmlFig(
    [
        [
#            {'src':"/files/work/UndertheCovers/underthecovers/images/Processes/Processes.003.png",
#             'caption':'A: Press Enter'
#             'border': '1px solid black',
#             'padding':'1px',
#             'cellwidth':'33.33%'
#            },
            {'src':"../images/Processes/Processes.004.png",
             'caption':'A: Bash calls kernel functions.',  
#             'border': '1px solid black',
#             'padding':'1px',
          'cellwidth':'50%'
            },
         {'src':"../images/Processes/Processes.005.png",
             'caption':'B: Kernel runs the program in new process.',
#             'border': '1px solid black',
#             'padding':'1px',
             'cellwidth':'50%'
           },
        ]
    ],
    id="fig:shell-blankline",
    caption="<center> Figure: Shell calls kernel functions, fork and exec, to create a new process and 'run' the 'executable' within it</center>"
)))

As the figures state, there are two basic kinds of files that the kernel knows how to "execute" within a process.  One is an ASCII file that has a special string at its beginning -- `#!<path of interpreter>` -- and the other is an **executable**.  The former is just a convenient way to allow programs like the shell to be started automatically, with the contents of the file passed to them as a script to interpret.  This makes it easy to write "scripts" that behave as if they were programs of their own, when in reality they are being interpreted as commands to the "real" program specified on the first line of the file.  But the question, of course, is what exactly are real programs, or **executables**?
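
To make the `#!` convention concrete, here is a minimal sketch that reads the first line of a file and reports the interpreter it names.  The function name `shebang_interpreter` is hypothetical, and the parsing is deliberately simpler than what the kernel actually does.

```python
def shebang_interpreter(path):
    """Return the interpreter named on a '#!' first line, or None if there is none.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        first_line = f.readline(256)
    if not first_line.startswith(b"#!"):
        return None
    # The text after '#!' is the interpreter path, optionally followed by an argument.
    parts = first_line[2:].decode("ascii", errors="replace").split()
    return parts[0] if parts else None
```

Calling this on a shell script that begins with `#!/bin/sh` would return `/bin/sh`, while calling it on a file that does not begin with `#!` returns `None`.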

## What's inside an executable

Let's explore the `/bin/ls` file using our UNIX skills to see what we can figure out.

### What does ls tell us about ls ;-)

In [None]:
TermShellCmd("ls -l /bin/ls", markdown=False)

Using `ls` to list the metadata of the file `/bin/ls`, we see that it contains a sizable number of bytes.  We also see that the permissions, `-rwxr-xr-x`, clearly mark it as being executable by all users of the system (if you don't remember how to read this output see `man ls`).
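
The same metadata can be read from a program.  The sketch below, assuming a hypothetical helper named `describe_permissions`, uses Python's `os.stat` and `stat.filemode` to produce the permission string in the style of `ls -l` along with the size in bytes.

```python
import os
import stat

def describe_permissions(path):
    """Return (permission string in `ls -l` style, size in bytes) for a file.

    Hypothetical helper for illustration only.
    """
    info = os.stat(path)
    # stat.filemode turns the numeric mode bits into text such as '-rw-r--r--'.
    return stat.filemode(info.st_mode), info.st_size
```

For example, `describe_permissions("/bin/ls")` should report the same permission string and byte count that `ls -l /bin/ls` displays on your system.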

### Can we display its contents to the Terminal with `cat`?

We encourage you to open a terminal and give this a shot.  What happened?  Well, remember that all bytes sent to the terminal are interpreted by the terminal as ASCII encoded information.  It should quickly become apparent that whatever `/bin/ls` is, it is NOT predominantly ASCII encoded information!  Rather, the bytes in it must be some other kind of binary representation.



### Let's look at the byte values of `/bin/ls` using `xxd`

So while the data in `/bin/ls` does not seem to be encoded in ASCII, we can use other UNIX tools to translate each byte of the file into ASCII text that spells out its numeric value, so that we can at least see what the byte values are.  There are several such tools we could use; examples include `od` (octal dump), `hexdump`, and `xxd`.  We will use `xxd`.

In [None]:
TermShellCmd("man xxd | cat ", prompt='', pretext='$ man xxd', wait=False, markdown=False, noposttext=True)

`xxd` conveniently lets us look at the values of a file's bytes represented in base 2 binary digits or base 16 hexadecimal digits.  We will use the following command to display the first 256 bytes of the file in binary: `xxd -l 256 -g 1 -c 8 -b /bin/ls`

Where: 
 - `-l 256` is used to restrict ourselves to the first 256 bytes
 - `-g 1` is used to tell xxd to work on units/groups of single bytes
 - `-c 8` is used to print 8 units/groups per line
 - `-b` means display the values in base 2 (binary) notation

This causes `xxd` to open `/bin/ls` and read the first 256 bytes.  It examines the value of each byte read and translates it into a string of eight ASCII characters, each either `0` or `1`, one per bit of the byte.  In this way we can use `xxd` to display the byte values of a file.  The left-hand column of the output gives the byte position in the file that the line of data corresponds to.  These position values start at zero and are in hexadecimal notation (eg. `00000010` is 16 in decimal).  On the far right of each line `xxd` prints an ASCII interpretation of any byte values that correspond to printable ASCII characters (otherwise it prints a `.`).
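
The translation described above is simple enough to sketch ourselves.  The following is a rough, `xxd`-like illustration and not a reimplementation of `xxd`; the function name `dump_bits` is hypothetical, and the exact spacing of the output is an assumption.

```python
def dump_bits(data, per_line=8):
    """Format bytes as lines of: hex offset, eight-bit binary groups, ASCII column.

    Hypothetical helper for illustration only.
    """
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        # Each byte becomes eight characters, each '0' or '1'.
        bits = " ".join(format(b, "08b") for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {bits}  {text}")
    return lines

for line in dump_bits(b"Hi!\n"):
    print(line)
```

To try it on a real file, read some bytes in binary mode first, for example `open("/bin/ls", "rb").read(16)`, and pass them to `dump_bits`.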

In [None]:
TermShellCmd("xxd -l 256 -g 1 -c 8 -b /bin/ls", wait=True, markdown=False, noposttext=True)

Using hexadecimal notation we get a more concise visual representation.

In [None]:
TermShellCmd("xxd -l 256 -g 1 -c 8 /bin/ls", wait=True, markdown=False, noposttext=True)

So while it might look cool, without knowing how to interpret the byte values it does not really give us much insight into what makes this file a program that lists the contents of directories.

### Using the UNIX `file` command on `/bin/ls`

While there are no explicit file types in UNIX that tell us what kind of information is in a file (we are expected to know), there is a command that is very good at examining a file and guessing what kind of information is encoded in it, based on a large database of tests.  This command is called `file`.  Here is its manual page.

In [None]:
TermShellCmd("man file | cat", prompt='', pretext='$ man file', wait=False, markdown=False, noposttext=True, tmout=2)

Well let's see what `file` has to say about `/bin/ls`.

In [None]:
TermShellCmd("file /bin/ls", markdown=False, noposttext=True)

Ok cool!  `file` tells us `/bin/ls` is an **ELF** file.  You might have noticed that the `xxd` output showed the ASCII characters `ELF` near the beginning of the file.  That is because the ELF format requires every ELF file to start with these identifying bytes, which makes such files easy to recognize.
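
We can do a small part of what `file` does ourselves.  An ELF file begins with the byte `0x7f` followed by the ASCII characters `ELF`, and the next two bytes of the header record whether the file is 32-bit or 64-bit and its byte order.  The sketch below checks just those fields; the function name `elf_ident` is hypothetical, and real tools examine far more of the header.

```python
ELF_MAGIC = b"\x7fELF"

def elf_ident(path):
    """Return (word size, byte order) for an ELF file, or None if it is not ELF.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        ident = f.read(16)  # the ELF identification block is 16 bytes long
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        return None
    # Byte 4 is the file class, byte 5 is the data encoding.
    word_size = {1: "32-bit", 2: "64-bit"}.get(ident[4], "unknown")
    byte_order = {1: "little-endian", 2: "big-endian"}.get(ident[5], "unknown")
    return word_size, byte_order
```

Running `elf_ident("/bin/ls")` on a Linux system should agree with the class and byte order that `file /bin/ls` reports.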



### ELF Files - Executable and Linking Format Files

So what exactly is an ELF file?  Let's see what the manuals have to say.  P.S. You are not expected to understand what they are saying at this point.

In [None]:
TermShellCmd("man elf|cat", prompt='', pretext='$ man elf', wait=False, markdown=False, noposttext=True)

Wow that's a lot of information that does not make much sense at this point.  However, it is nice to see that it seems to be a format for encoding "executable" files ;-)

Now as it turns out there are several tools, such as `readelf` and `objdump`, that we could read about that are designed for decoding ELF files.  But it is not clear that this is going to help much unless we get a more conceptual understanding of what it means to encode a program for execution in a process.

For your interest here is the output for `readelf --all /bin/ls` and `objdump --all /bin/ls` which dump summary information about the `/bin/ls` executable.  

In [None]:
TermShellCmd("readelf --all /bin/ls", wait=False, markdown=False, noposttext=True, tmout=2)

In [None]:
TermShellCmd("objdump --all /bin/ls", wait=False, markdown=False, noposttext=True, tmout=2)

As a teaser here is some actual "content" that objdump can extract and decode from `/bin/ls`.  Specifically this command, `objdump -d /bin/ls` 'disassembles' the binary.

In [None]:
TermShellCmd("objdump -d /bin/ls", wait=False, markdown=False, noposttext=True, tmout=2)

## Executing an Executable in a Process

Let's try this from the other direction.  We know that there is a call to the OS to run an executable.  Let's see what we can find out by examining the OS documentation.

Let's start by looking up the manpage for the operating system call `exec`.  At this point we are going to ignore the programming syntax and mechanics and instead focus on what we can learn in broad strokes from the manual page.


In [None]:
TermShellCmd("man 3 exec | cat -n", wait=False, markdown=False, noposttext=True, tmout=2)

> <img style="margin: 1px 5px 0px 0px;" align="left" width="40" src="../images/fyi.svg"> <p style="background-color:powderblue;"> Notice that the above output includes line numbers for the man page. The `man` command itself does not support line numbers, but the `cat` program does if you pass it the `-n` flag.  So instead of using the command `man 3 exec` on its own, we have sent its output to `cat -n` using the pipe syntax of the shell: `|`.  Our combined shell command is: `man 3 exec | cat -n`.  Remember to notice these things, as UNIX can teach many good programming habits, like the value of breaking our software down into small reusable parts and having a standard way of combining those parts (eg. a pipe). 

We want to focus on the first two paragraphs of the description (lines 27 - 33).  These sentences imply that running a program loads a new "process image" over the current one.  Remember that in the Introduction we used the term memory image; it is no coincidence that we are seeing similar terminology here.  Further, reading between the lines, the "file" to be executed contains, or is the basis of, the new process image.  Given that this man page tells us that the `exec` functions are really just front ends for `execve`, let's look at that man page and see if we can learn a little more.

In [None]:
TermShellCmd("man 2 execve | cat -n", wait=False, markdown=False, noposttext=True, tmout=2)

Let's focus on lines 13-21 and 41-43.  Again we see that the wording is all about replacing the contents of an existing process with the values from the executable file.  Further, we see that some parts of the new process will be `newly initialized`: *stack*, *heap* and *data* *segments*.  In lines 41-43 we are told that `execve`, assuming success, will overwrite certain parts of the *text*, *data* and *stack* of the process with the contents of the executable file (the newly loaded program).  So, vaguely, we are getting the picture that an executable encodes values that get "loaded" into a process to initialize the execution of the program contained within it.
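
To see the fork-then-exec pattern in action, here is a minimal sketch in Python, whose `os.fork` and `os.execv` functions are thin wrappers around the corresponding operating system calls.  The function name `run_in_child` is hypothetical, and a real shell does much more (searching `PATH`, redirections, signal handling and so on).

```python
import os
import sys

def run_in_child(argv):
    """Fork a child, replace its process image with argv[0], and return its exit code.

    Hypothetical helper for illustration only; argv[0] must be a full path.
    """
    pid = os.fork()
    if pid == 0:
        # Child: on success execv never returns, because this Python
        # interpreter's image is overwritten by the new program.
        try:
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed, so the old image is still running
    # Parent: wait for the child to finish and decode its status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```

For example, `run_in_child(["/bin/ls", "-l", "/bin/ls"])` would list the file from a new child process while the parent waits, much as the shell does.  Note that nothing after `os.execv` runs in the child unless the call fails, which is exactly the "replacing the process image" behavior the manual page describes.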

Our task now is to start putting the pieces together.  We need to get a better idea of what a process is and how we go about encoding a program into an executable.


## Processes