<h1>AWK Intro</h1>
<p>Syntax and basics of AWK, based on the <a href="https://www.tutorialspoint.com/awk/index.htm">tutorialspoint</a> AWK tutorial.</p>

<h2>Create Some Data</h2>

In [2]:
f_contents = """1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89"""
with open("marks.txt", "w") as f:
    f.write(f_contents)

In [3]:
!ls

AWK Intro.ipynb LICENSE         README.md       marks.txt


In [4]:
!cat marks.txt

1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89

<h2>AWK overall structure</h2>
<code>
    awk '<span style="color:blue">BEGIN</span>{print "FirstName"} <span style="color:red">/[AS]/</span> {print $2} <span style="color:green">END</span>{print "All Done!"}' marks.txt
</code>
<ul>
    <li><code><span style="color:blue">BEGIN</span></code> OPTIONAL keyword: the action in { } that follows it runs once, before any input line is read</li>
    <li><code><span style="color:red">/[AS]/</span></code> OPTIONAL regular expression pattern, written between forward slashes as /<em>pattern</em>/: only lines that match it are passed to the action that follows</li>
    <li><code><span style="color:green">END</span></code> OPTIONAL keyword: the action in { } that follows it runs once, after the last input line has been processed</li>
    <li><code>{<em>action</em>}</code> statements to run; in the example above, the middle action runs once for each line that matches the regular expression</li>
    <li><code>marks.txt</code> input file, read one line at a time</li>
</ul>
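
Because each part is optional, a program can consist of a pattern alone, an action alone, or a BEGIN block alone. The following is a minimal sketch of those three cases; it recreates marks.txt so that it runs on its own, and the variable names (`pattern_only`, `action_only`, `begin_only`) are only illustrative.

```shell
# Recreate the sample data so this block is self-contained.
cat > marks.txt <<'EOF'
1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89
EOF

# Pattern only: the default action prints every matching line.
pattern_only=$(awk '/[AS]/' marks.txt)
printf '%s\n' "$pattern_only"

# Action only: with no pattern, the action runs for every line.
action_only=$(awk '{print $3}' marks.txt)
printf '%s\n' "$action_only"

# BEGIN only: awk does not read any input, so no file is needed.
begin_only=$(awk 'BEGIN{print "no input read"}')
printf '%s\n' "$begin_only"
```

A pattern with no action is a common shorthand for filtering lines, much like grep.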

In [20]:
!awk 'BEGIN{print "FirstName"} /[AS]/ {print $2} END{print "All Done!"}' marks.txt

FirstName
Amit
Shyam
All Done!


<h2>Action commands</h2>
<p>An action is built from one or more of the following statements, separated by semicolons or newlines.</p>
<pre><code>
if( expression ) statement [ else statement ]
while( expression ) statement
for( expression ; expression ; expression ) statement
for( var in array ) statement
do statement while( expression )
break
continue
{ [ statement ... ] }
expression                  # commonly var = expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
return [ expression ]
next                        # skip remaining patterns on this input line
nextfile                    # skip rest of this file, open next, start at top
delete array[ expression ]  # delete an array element
delete array                # delete all elements of array
exit [ expression ]         # exit immediately; status is expression
</code></pre>
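
A few of these statements combined may make the list more concrete. The sketch below uses `next`, `if`/`else`, `printf` and a C-style `for` loop on the sample data; the 88-mark threshold, the grade labels and the shell variable names are assumptions chosen only for illustration.

```shell
# Recreate the sample data so this block is self-contained.
cat > marks.txt <<'EOF'
1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89
EOF

# next skips Kedar's line; if/else picks a label; printf aligns the columns.
graded=$(awk '
$2 == "Kedar" { next }
{
    if ($4 >= 88) {
        grade = "high"
    } else {
        grade = "ok"
    }
    printf "%-6s %3d %s\n", $2, $4, grade
}' marks.txt)
printf '%s\n' "$graded"

# A C-style for loop walks the fields of each line from last to first.
reversed=$(awk '{
    line = $NF
    for (i = NF - 1; i >= 1; i--)
        line = line " " $i
    print line
}' marks.txt)
printf '%s\n' "$reversed"
```

Here `printf` takes the format as its first argument and the values after it, which keeps the data out of the format string.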

print a field: \$0 is the whole line, \$1 the first field, \$2 the second field, and so on (fields are separated by runs of whitespace by default)

In [26]:
!awk '{print $2}' marks.txt

Amit
Rahul
Shyam
Kedar
Hari
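
Two built-in variables go naturally with field access: NR is the number of the current line and NF is the number of fields on it, so \$NF is the last field. A small sketch, using two inline lines instead of marks.txt (the second line is made up to show a different field count):

```shell
# Runs of spaces count as a single separator, so the first line has 4 fields.
fields=$(printf '%s\n' '1) Amit     Physics    80' '2) Rahul Maths' |
    awk '{ print NR ": " NF " fields, last is " $NF }')
printf '%s\n' "$fields"
```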


print field if another field matches a condition

In [40]:
!awk '{if ($4 > 88) print $2}' marks.txt

Rahul
Hari


In [42]:
!awk '{if ($3 == "Maths") print $2 "\t" $3}' marks.txt

Rahul	Maths
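
The same filters can also be written as a pattern in front of the action instead of an `if` inside it. The sketch below compares the two forms and combines two conditions with `&&`; it recreates marks.txt so that it runs on its own, and the shell variable names are illustrative.

```shell
# Recreate the sample data so this block is self-contained.
cat > marks.txt <<'EOF'
1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89
EOF

# Condition inside the action, as in the cells above.
with_if=$(awk '{if ($4 > 88) print $2}' marks.txt)

# Same condition used as the pattern; the action runs only when it is true.
with_pattern=$(awk '$4 > 88 {print $2}' marks.txt)
printf '%s\n' "$with_pattern"

# Conditions can be combined with && and ||.
combined=$(awk '$4 > 85 && $3 != "Maths" {print $2}' marks.txt)
printf '%s\n' "$combined"
```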


print lines longer than 10 characters

In [39]:
!awk '{if (length($0) > 10) print $0}' marks.txt

1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89


count lines matching a regular expression

In [29]:
!awk '/[AS]/{++i} END {print i, "lines"}' marks.txt

2 lines
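
One detail worth knowing about this counter: a variable that was never assigned prints as an empty string, so when no line matches, the count comes out blank rather than 0. Adding 0 forces a numeric value. A short sketch with inline sample names (the pattern `/[XZ]/` is chosen only because it matches none of them):

```shell
names=$(printf '%s\n' 'Amit' 'Rahul' 'Shyam')

# No line matches, so i is never set and prints as an empty string.
unsafe=$(printf '%s\n' "$names" | awk '/[XZ]/{++i} END {print i, "lines"}')
printf '[%s]\n' "$unsafe"

# i + 0 converts the unset variable to the number 0.
safe=$(printf '%s\n' "$names" | awk '/[XZ]/{++i} END {print i + 0, "lines"}')
printf '[%s]\n' "$safe"

# With a matching pattern the result is the same as before.
matched=$(printf '%s\n' "$names" | awk '/[AS]/{++i} END {print i + 0, "lines"}')
printf '[%s]\n' "$matched"
```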


<h2>Command Line Options</h2>

<h3>-f</h3>
<p>-f reads the awk program from a file instead of from the command line, e.g. <code>awk -f command.awk marks.txt</code></p>

In [21]:
with open("command.awk", "w") as f:
    f.write("{print}")

In [22]:
!awk -f command.awk marks.txt

1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89
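
A program file becomes more useful once the program no longer fits comfortably on one line. The sketch below puts a BEGIN block, a per-line action and an END block into a file and runs it with -f; `report.awk` is a hypothetical file name used only for this example.

```shell
# Recreate the sample data so this block is self-contained.
cat > marks.txt <<'EOF'
1) Amit     Physics    80
2) Rahul    Maths      90
3) Shyam    Biology    87
4) Kedar    English    85
5) Hari     History    89
EOF

# One rule per line; no shell quoting is needed inside the file.
cat > report.awk <<'EOF'
BEGIN { print "Name\tMark" }
{ print $2 "\t" $4; total += $4 }
END { print "Total\t" total }
EOF

report=$(awk -f report.awk marks.txt)
printf '%s\n' "$report"
```

Keeping the program in a file also avoids clashes between awk's quotes and the shell's.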
