# Capturing command output

It can be useful to save the output of a command to a variable for later use.

## Syntax
```BASH
     varname=value
     varname=$( command )
```
A common beginner error is to put spaces around the equal sign. That's OK in some languages, but not in BASH, where a space ends the assignment and the rest is treated as a separate word.

**WRONG:**
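```
varname =value        # runs a command named "varname" with the argument "=value"
varname= value        # runs the command "value" with varname set to empty

varname =$(command)
varname= $(command)

varname=(command)     # no $: creates an array instead of running "command"
```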

Notice that capturing output needs a `$` on the right side of the equal sign `=`. It must sit directly in front of the opening parenthesis: `$( command )`.
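
To make the difference between the two forms in the Syntax block concrete, here is a minimal sketch. The variable names `literal` and `captured` are made up for illustration, and `printf` stands in for any command.

```BASH
# Plain assignment: the text on the right is stored as-is, nothing runs.
literal=date

# Command substitution: the command runs and its output is stored.
captured=$( printf 'hello' )

echo "literal holds: $literal"      # prints: literal holds: date
echo "captured holds: $captured"    # prints: captured holds: hello
```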
 
--- 
Capture the output of the command `whoami` to the variable `myname`. 

In [14]:
myname=...

Check the outcome of this expression with the command `echo`

In [None]:
echo ...

You can now wrap the variable in other text. Use `echo` to print:
```BASH
"My name is $myname. Hello!"
```

In [None]:
echo ...

You can capture many kinds of data this way. Save the output of `hostname` to the variable `whereami`.

Now print a message using both of those variables.

In [None]:
echo ...

## More practice

Save `pwd` to varname `dir` and echo the following statement:

    "My current directory is $dir"


In [None]:
dir=...
echo ...

---
Save `date +%Y` to varname `year` and echo the following statement:
    
    "The best year so far is: $year"
    

In [None]:
year=...
echo ...

What does `+%Y` do? What would I use instead to get the Day of the Week?

---
Save `du -h -d 0` to `usage` and echo the following statement:

    "My disk usage is $usage"

In [None]:
usage=...
echo ...

There is extra output (a '.' for the current dir) that is a little messy. We'll deal with parsing output later.

What are these arguments for `du`?
 * `-h`
 * `-d 0`
 
 ---

# Capturing output from pipes

It's OK to capture the outcome of a pipe, just like a single command.
```BASH
        pipeoutput=$( command1 | command2 )
```

Let's sort our directory contents by size. Get directory contents in detail form below: `ls -l`

Now, pipe those results to `sort`, `sort -k1`, `sort -k5`, and `sort -k5n`:

What did each version do?

Now save the pipe to a variable, using the `sort -k5n` version of the command.

Save it to the variable `contentsBySize`.

Use `echo` with `$contentsBySize` to inspect what the command captured.

In [None]:
contentsBySize=...
echo ...
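
When a variable holds several lines, how you pass it to `echo` matters. The sketch below uses a small made-up two-line value (the name `lines` is hypothetical) rather than real `ls -l` output. Unquoted, the shell splits the value into words and `echo` joins them with single spaces. In double quotes, the line breaks survive.

```BASH
# Capture two lines of sample text, sorted alphabetically.
lines=$( printf 'b 2\na 1\n' | sort )

echo $lines      # unquoted: prints "a 1 b 2" on a single line
echo "$lines"    # quoted: prints "a 1" and "b 2" on separate lines
```

If your `echo $contentsBySize` output looks like one long line, try `echo "$contentsBySize"`.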

Let's add a 3rd command.

Let's get rid of the output of the line that starts with `total`.

```BASH

      pipeoutput=$( command1 | command2 | command3 )  
```

Add a 3rd command to your pipe: `grep -v total`

`echo` the results.
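
As a rough illustration of a three-stage pipe, the sketch below runs on made-up sample lines instead of a real directory listing. The variable name `smallestFirst` and the sample data are assumptions for the demo, and the size sits in column 2 here rather than column 5.

```BASH
# Stage 1 prints sample lines, stage 2 sorts numerically on column 2,
# stage 3 drops any line containing "total".
smallestFirst=$( printf 'total 8\nfile_b 20\nfile_a 3\n' | sort -k2n | grep -v total )

echo "$smallestFirst"
# prints:
# file_a 3
# file_b 20
```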

# Practical Exercise

Let's apply these constructs to something you can use in a script. We're going to download files from the NCBI directory for the honey bee genome. 

Navigate to https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/7460/104/GCF_003254395.2_Amel_HAv3.1 to see a directory listing (clicking will open in a new tab).

It looks like:

<img src="https://onishdata.bmb.colostate.edu/jupyterlab_icons/html_dirlisting.png" width="534px">


Let's save the url in a variable.

```BASH

baseUrl='https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/7460/104/GCF_003254395.2_Amel_HAv3.1'

```

**I'm quoting mine in single quotes** so the shell treats every character literally, and I don't have to check the URL for characters the shell would otherwise process.
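
A quick sketch of the difference between the two quote styles, using a made-up variable `animal`: single quotes keep the text exactly as typed, while double quotes still expand variables.

```BASH
animal='bee'

single='The animal is $animal'   # single quotes: $animal is kept literally
double="The animal is $animal"   # double quotes: $animal is replaced by its value

echo "$single"    # prints: The animal is $animal
echo "$double"    # prints: The animal is bee
```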

In [4]:
baseUrl='https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/7460/104/GCF_003254395.2_Amel_HAv3.1'

The different files have different data, in a variety of formats. This lecture is more about Linux mechanics, so we're just grabbing a small one.

Choose by saving one of the file names to a variable called `datafile`. I'm going to use "GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz".

In [2]:
datafile=...
echo $datafile
# fully qualified URL
echo $baseUrl/$datafile
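
Joining variables into a longer string works the same way for any values. The sketch below uses placeholder values (`demoBase`, `demoFile`, and the `example.org` address are hypothetical, not the NCBI paths). It also shows the `${name}` form, which marks where the variable name ends when other text follows it directly.

```BASH
# Placeholder values for illustration only.
demoBase='https://example.org/data'
demoFile='sample.gff.gz'

# Double quotes keep the joined value as one word.
fullUrl="$demoBase/$demoFile"
echo "$fullUrl"       # prints: https://example.org/data/sample.gff.gz

# Braces separate the variable name from the text that follows.
backupName="${demoFile}.bak"
echo "$backupName"    # prints: sample.gff.gz.bak
```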

If you have defined the variables correctly, the following command downloads the specified file. This will place a file in your directory with the same name as the string stored in `$datafile`

In [5]:
wget $baseUrl/$datafile

--2020-08-26 16:26:38--  https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/7460/104/GCF_003254395.2_Amel_HAv3.1/GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 2607:f220:41e:250::12, 2607:f220:41e:250::10, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6755398 (6.4M) [application/x-gzip]
Saving to: ‘GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz’


2020-08-26 16:26:39 (17.2 MB/s) - ‘GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz’ saved [6755398/6755398]



Now, get the checksum of the file by running the command `md5sum` on `$datafile`

In [20]:
md5sum $datafile

c081b74001f46055b9f5710be2c67f33  GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz

real	0m0.015s
user	0m0.014s
sys	0m0.000s


**What is md5sum?** It is a command that computes the MD5 *cryptographic hash* of a file. The hash is a fixed-length signature of the file's contents that you can compare against an expected value. A hash value used to check file integrity this way is called a "checksum".
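
To see why a hash works as a signature, here is a small sketch that hashes two made-up strings differing in one letter. It assumes the `md5sum` command is available, as in the cell above. The names `hashA` and `hashB` are hypothetical, and `cut` keeps only the hash column of the output.

```BASH
# Hash two inputs that differ by a single character.
hashA=$( printf 'honey bee\n' | md5sum | cut -d ' ' -f 1 )
hashB=$( printf 'honey bea\n' | md5sum | cut -d ' ' -f 1 )

echo "$hashA"
echo "$hashB"

# Each hash is 32 hexadecimal characters, and the two values differ.
echo "length: ${#hashA}"
```

A single changed character gives a different hash, which is why a matching checksum is good evidence that a download arrived intact.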

**But, how do we check if it's right?**  We need the checksum file. Looking at the directory listing above, the filename is `md5checksums.txt`. Sometimes it's called `checksum.txt`, or `checksum.md5` or similar.

Set this name, `md5checksums.txt`, to the variable `checksumfile`. 

In [7]:
checksumfile=md5checksums.txt

Now download the file as you did with `wget $baseUrl/$datafile`, but use `$checksumfile` instead of `$datafile`.

In [8]:
wget $baseUrl/$checksumfile

--2020-08-26 16:29:34--  https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/7460/104/GCF_003254395.2_Amel_HAv3.1/md5checksums.txt
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.11, 2607:f220:41e:250::7, 2607:f220:41e:250::10, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19101 (19K) [text/plain]
Saving to: ‘md5checksums.txt’


2020-08-26 16:29:35 (692 KB/s) - ‘md5checksums.txt’ saved [19101/19101]



In [10]:
head $checksumfile

4e40d7b3a70329dee14aeb2f658564ea  ./Annotation_comparison/GCF_003254395.2_Amel_HAv3.1_compare_prev.gbp.gz
67a2f793585e04aab9ec5d059b9b4130  ./Annotation_comparison/GCF_003254395.2_Amel_HAv3.1_compare_prev.txt.gz
d3b0bf799204e1b34020a1dac1f5686c  ./Evidence_alignments/GCF_003254395.2_Amel_HAv3.1_cross_species_tx_alns.gff.gz
390a28099c7478fcb914efbe2cf3a855  ./Evidence_alignments/GCF_003254395.2_Amel_HAv3.1_same_species_tx_alns.gff.gz
a8349bcc6a5733d23e5d289dea93e720  ./GCF_003254395.2_Amel_HAv3.1_assembly_report.txt
0fad70b254b53b42894b1132a6def686  ./GCF_003254395.2_Amel_HAv3.1_assembly_stats.txt
6e1ec842ae5a24ed53a6d866ac613d17  ./GCF_003254395.2_Amel_HAv3.1_assembly_structure/Primary_Assembly/assembled_chromosomes/AGP/chrLG1.agp.gz
671eb4b73fa352819654beefe17f465a  ./GCF_003254395.2_Amel_HAv3.1_assembly_structure/Primary_Assembly/assembled_chromosomes/AGP/chrLG1.comp.agp.gz
c60d55f350e165e09ce79ec9fbab3851  ./GCF_003254395.2_Amel_HAv3.1_assembly_structure/Primary_Assembly/assembled_c

---
Is the checksum right? How do we tell? There's too much information to check by eye.

Try using `grep` with information you got from the `md5sum` command above. 

It will take the form `grep PATTERN $checksumfile`.

In [13]:
grep c081b74001f46055b9f5710be2c67f33 $checksumfile

c081b74001f46055b9f5710be2c67f33  ./GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz


**Challenge!** Once you figure out how to get the information with grep, can you run the commands in succession to get a more readable answer?

In [23]:
# md5sum command
md5sum $datafile
# grep command
grep $datafile $checksumfile

c081b74001f46055b9f5710be2c67f33  GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz
c081b74001f46055b9f5710be2c67f33  ./GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz
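
One possible next step is to capture both hashes in variables and let the shell compare them. This is only a sketch: it builds a small stand-in file and checksum list in a temporary directory so it runs without the download, it assumes `md5sum` is available, and every name in it (`workdir`, `demofile`, `demochecksums`, `actual`, `expected`, `result`) is made up for the demo.

```BASH
# Stand-in data so the sketch runs on its own.
workdir=$( mktemp -d )
demofile="$workdir/demo.txt"
demochecksums="$workdir/demo_checksums.txt"
printf 'hello bees\n' > "$demofile"
( cd "$workdir" && md5sum ./demo.txt > demo_checksums.txt )

# Capture just the hash column from each source.
actual=$( md5sum "$demofile" | cut -d ' ' -f 1 )
expected=$( grep -F 'demo.txt' "$demochecksums" | cut -d ' ' -f 1 )

# Compare the two strings and report.
if [ "$actual" = "$expected" ]; then
    result="OK"
else
    result="MISMATCH"
fi
echo "$result: demo.txt"

# Clean up the temporary directory.
rm -r "$workdir"
```

With the real files, the same pattern would use `$datafile` and `$checksumfile` in place of the stand-ins.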
