# Implement a combination of `head` and `tail`
While going through a [series of tutorials on awk](https://blog.jpalardy.com/posts/awk-tutorial-part-1/) and reproducing the examples in a [notebook](2020-01-24-why-learn-awk.ipynb), I was looking for a way to process the often lengthy output such that:
* the first few lines of the output are always shown,
* the last line or the last few lines are always shown, and
* if any lines in the middle are omitted, this is indicated in the output.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Fixed-number-of-head-lines,-single-tail-line" data-toc-modified-id="Fixed-number-of-head-lines,-single-tail-line-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Fixed number of head lines, single tail line</a></span><ul class="toc-item"><li><span><a href="#First-working-version" data-toc-modified-id="First-working-version-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>First working version</a></span></li><li><span><a href="#Improved-version" data-toc-modified-id="Improved-version-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Improved version</a></span></li></ul></li><li><span><a href="#More-flexibility:-arbitrary-numbers-of-lines-at-head-and-tail" data-toc-modified-id="More-flexibility:-arbitrary-numbers-of-lines-at-head-and-tail-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>More flexibility: arbitrary numbers of lines at head and tail</a></span></li></ul></div>

## Fixed number of head lines, single tail line

### First working version
I did some research and found some inspiration in these questions and answers:
* https://stackoverflow.com/questions/11454343/pipe-output-to-bash-function/11457183#11457183
* https://unix.stackexchange.com/questions/139089/how-to-read-first-and-last-line-from-cat-output/139099#139099

I then wrote this bash function that can process files or input from stdin:

In [1]:
snip-v1() {
    awk '{ line = $0 }; NR <= 5 { line = ""; print }; NR == 7 { print "... snip ..." }; END { print line }' "$@"
}

Note:
* Because `"$@"` (which can be empty) is passed to awk, this function can work both with input from stdin and with file names that are passed as arguments.
* I thought that I needed the extra variable `line` to prevent the last line from being printed twice when the input has 5 lines or fewer. As a side effect, the `END` block prints an empty line for such inputs.
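
The first note can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper `count_lines` (not part of the notebook) and a temporary file created with `mktemp`: awk reads stdin when `"$@"` expands to nothing and reads the named files otherwise.

```shell
# Hypothetical helper for illustration only: prints the number of input lines.
count_lines() {
    awk 'END { print NR }' "$@"
}

tmpfile=$(mktemp)
seq 1 3 > "$tmpfile"

seq 1 7 | count_lines              # no arguments: awk reads stdin
count_lines "$tmpfile"             # one file name: awk reads that file
count_lines "$tmpfile" "$tmpfile"  # two file names: NR keeps counting across files

rm -f "$tmpfile"
```

The three calls print 7, 3 and 6, respectively.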

Let's check that it works as expected:

In [2]:
seq 1 5 | snip-v1

1
2
3
4
5



In [3]:
seq 1 6 | snip-v1

1
2
3
4
5
6


In [4]:
seq 1 7 | snip-v1

1
2
3
4
5
... snip ...
7


In [5]:
snip-v1 /usr/share/dict/words

A
A's
AA's
AB's
ABM's
... snip ...
études


### Improved version
When I found out that awk supports many of the control structures known from C inside its action blocks, I realized that the variable `line` is unnecessary.

In [6]:
snip-v2() {
    awk 'NR <= 5; NR == 7 { print "... snip ..." }; END { if (NR > 5) print }' "$@"
}

In [7]:
seq 1 5 | snip-v2

1
2
3
4
5


In [8]:
seq 1 6 | snip-v2

1
2
3
4
5
6


In [9]:
seq 1 7 | snip-v2

1
2
3
4
5
... snip ...
7
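
`snip-v2` relies on `$0` still holding the last record inside `END`. POSIX specifies this, and gawk and mawk behave that way, but the gawk manual notes that some older awk implementations do not. If that matters, one possible sketch keeps an explicit variable while still avoiding the empty line that `snip-v1` prints for short inputs. The name `snip_portable` and the variable `last` are assumptions made for this example.

```shell
# Hypothetical variant of snip-v2 (the name is chosen for illustration).
# It stores the last record explicitly instead of reading $0 inside END.
snip_portable() {
    awk '
    { last = $0 }                        # remember the most recent line
    NR <= 5                              # head: print the first five lines
    NR == 7 { print "... snip ..." }     # line 6 was skipped, so say so
    END { if (NR > 5) print last }       # tail: print the final line once
    ' "$@"
}

seq 1 5 | snip_portable
seq 1 7 | snip_portable
```

The first call prints the five lines without a trailing empty line; the second prints lines 1 to 5, the snip marker, and 7.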


## More flexibility: arbitrary numbers of lines at head and tail
The following function takes two arguments, `HEAD` and `TAIL`, which specify the number of lines printed at the beginning and at the end, respectively. Because the positional parameters are used for these counts, the function can only read from stdin, not from files.

After the first `HEAD` lines have been read and printed, the script keeps the `TAIL` most recently seen lines in a buffer.

At the end of the input, the number of omitted lines (if any) is printed, followed by the lines in the buffer.

In [10]:
head-tail() {
    awk -v HEAD="${1:-5}" -v TAIL="${2:-1}" '
    
    # If we are still inside the head, print the line and go to the next one
    NR <= HEAD { print; next }
    
    # Update the array "last", which contains the most recently seen lines
    { 
      if (tail_lines < TAIL)
        # If fewer than TAIL lines are in the array, add a new one
        tail_lines++
      else
        # If TAIL lines are in the array already, discard the
        # oldest, and shift all other lines by one position
        for (i = 1; i < tail_lines; i++)
          last[i] = last[i+1]
      
      # Store the current line
      last[tail_lines] = $0
    }

    END {
      omitted = NR - (HEAD + TAIL)
      if (omitted == 1)
        print "# 1 line omitted"
      else if (omitted > 0)
        print "# " omitted " lines omitted"

      for (i = 1; i <= tail_lines; i++)
        print last[i]
    }'
}

Test that it works with some examples:

In [11]:
seq 1 5 | head-tail 4 1

1
2
3
4
5


In [12]:
seq 1 5 | head-tail 4 2

1
2
3
4
5


In [13]:
seq 1 5 | head-tail 3 2

1
2
3
4
5


In [14]:
seq 1 5 | head-tail 2 2

1
2
# 1 line omitted
4
5


In [15]:
seq 1 5 | head-tail 1 2

1
# 2 lines omitted
4
5
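
Shifting the array costs up to `TAIL - 1` assignments per input line, which is fine for a handful of tail lines. For larger values of `TAIL`, one alternative is to overwrite the oldest slot directly by using a modulo index. The sketch below is an assumption-laden variant, not the function defined above: the name `head_tail_ring` and the variables `seen` and `kept` are introduced only for this example.

```shell
# Hypothetical variant of head-tail; the name and the modulo indexing are
# illustrative assumptions, not the implementation from the notebook.
head_tail_ring() {
    awk -v HEAD="${1:-5}" -v TAIL="${2:-1}" '
    # Head: print the line and go to the next one
    NR <= HEAD { print; next }

    # Tail: overwrite the slot that holds the oldest buffered line
    TAIL > 0 { last[(NR - HEAD - 1) % TAIL] = $0 }

    END {
      seen = NR - HEAD                      # lines read after the head
      kept = (seen < TAIL) ? seen : TAIL    # lines still in the buffer
      if (kept < 0) kept = 0
      omitted = seen - kept

      if (omitted == 1)
        print "# 1 line omitted"
      else if (omitted > 0)
        print "# " omitted " lines omitted"

      # Print the buffered lines from oldest to newest
      for (k = seen - kept + 1; k <= seen; k++)
        print last[(k - 1) % TAIL]
    }'
}

seq 1 100 | head_tail_ring 3 4
```

This prints lines 1 to 3, then `# 93 lines omitted`, then lines 97 to 100.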
