# Terminal Commands and Bash Scripting

Adapted from http://people.duke.edu/~ccc14/duke-hts-2018/cliburn/Bash_in_Jupyter.html 

## echo: print out a string

In [172]:
echo "is there anybody out there?"

is there anybody out there?


## ls: check out the contents of a folder

In [162]:
ls

Backup
Terminal Commands and Bash Scripting.ipynb
chr21.fa.gz
hello.sh
hello_you.sh
sequences.txt


#### Extra flags get you more information (-l gives a long listing, -h prints human-readable sizes)

In [56]:
ls -lh

total 40
-rw-r--r--  1 erickofman  staff   5.4K Dec 22 20:49 Terminal Commands and Bash Scripting.ipynb
-rw-r--r--  1 erickofman  staff    32B Dec 22 10:40 hello.sh
-rw-r--r--  1 erickofman  staff    38B Dec 22 10:40 hello_you.sh
-rw-r--r--  1 erickofman  staff   224B Dec 22 20:24 sequences.txt


#### what if we only want to list files with a certain extension?

In [57]:
# The asterisk is a "wildcard" that matches any sequence of characters (including none)
ls *.sh

hello.sh	hello_you.sh
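Wildcards come in more than one flavor. The sketch below runs in a throwaway directory made with `mktemp -d`, and the file names are made up for illustration. It compares `*` (any run of characters) with `?` (exactly one character).

```shell
# Work in a scratch directory so nothing in your real folder is touched
orig_dir=$PWD
scratch=$(mktemp -d)
cd "$scratch"
touch a1.txt a2.txt ab12.txt notes.sh   # hypothetical file names

star_matches=$(echo a*.txt)   # * matches zero or more characters
one_char=$(echo a?.txt)       # ? matches exactly one character

echo "with *: $star_matches"
echo "with ?: $one_char"

# Clean up
cd "$orig_dir"
rm -r "$scratch"
```

Using `echo` with a pattern is a handy way to preview what a wildcard will match before handing it to another command.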


In [58]:
### YOU TRY: list only the files that end in .txt

## mkdir: make a directory

In [59]:
mkdir shiny_new

##### Look it's there!

In [60]:
ls

Terminal Commands and Bash Scripting.ipynb
hello.sh
hello_you.sh
sequences.txt
shiny_new
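If you need several nested folders at once, the `-p` flag is useful. A minimal sketch, using made-up folder names inside a scratch directory:

```shell
scratch=$(mktemp -d)

# -p creates any missing parent directories along the way
mkdir -p "$scratch/project/data/raw"    # hypothetical folder names

# Running it again is harmless: -p does not complain if the folder exists
mkdir -p "$scratch/project/data/raw"

if [ -d "$scratch/project/data/raw" ]; then nested="yes"; else nested="no"; fi
echo "nested folder created? $nested"

rm -r "$scratch"
```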


## cp: Copy a file

In [61]:
cp sequences.txt shiny_new/sequences_copy.txt
ls shiny_new

sequences_copy.txt


#### cp -r to copy a directory and all of its contents

In [62]:
cp -r shiny_new same_old
ls same_old

sequences_copy.txt


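In [63]:
# same_old did not exist yet, so cp -r created it as a copy of shiny_new; there is no nested shiny_new folder inside it
ls same_old/shiny_new

ls: same_old/shiny_new: No such file or directory
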
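What `cp -r` does depends on whether the destination already exists. The sketch below shows both cases in a scratch directory; all folder and file names are made up for illustration.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/src"
touch "$scratch/src/file.txt"

# Destination does not exist yet: it becomes a copy of src
cp -r "$scratch/src" "$scratch/new_dest"

# Destination is an existing directory: src is copied *inside* it
mkdir "$scratch/existing"
cp -r "$scratch/src" "$scratch/existing"

first=$(ls "$scratch/new_dest")
second=$(ls "$scratch/existing")
echo "new_dest contains: $first"
echo "existing contains: $second"

rm -r "$scratch"
```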

## touch: make an empty file

In [73]:
touch empty_inside.txt

In [74]:
# Trust me, it's empty. Or go check it yourself later!
ls 

Terminal Commands and Bash Scripting.ipynb
empty_inside.txt
hello.sh
hello_you.sh
same_old
sequences.txt
shiny_new


## mv: Move or rename a file
#### If the destination is in the same folder, mv simply renames the file

In [77]:
mv empty_inside.txt fulfilled.txt

In [78]:
ls

Terminal Commands and Bash Scripting.ipynb
fulfilled.txt
hello.sh
hello_you.sh
same_old
sequences.txt
shiny_new


### If you use mv where the destination is in a different folder you will move the file

In [81]:
mv fulfilled.txt shiny_new

In [82]:
# Not here anymore...
ls

Terminal Commands and Bash Scripting.ipynb
hello.sh
hello_you.sh
same_old
sequences.txt
shiny_new


In [83]:
# Here!
ls shiny_new

fulfilled.txt		sequences_copy.txt


### To continue the existential theme:
## pwd: where am I?


In [85]:
pwd

/Users/erickofman/UCSD/TAing262/BashScripting


### With great power comes great responsibility... use this carefully! There is no "undo" in the terminal...
## rm: delete a file

In [93]:
rm shiny_new/fulfilled.txt

In [94]:
ls shiny_new

sequences_copy.txt


### rm -r: delete directory and all of its contents

In [96]:
rm -r shiny_new
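Since there is no undo, it pays to look before you delete. This is one cautious pattern, not the only one, and the directory name is hypothetical.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/old_results"             # hypothetical directory name
touch "$scratch/old_results/run1.txt"

target="$scratch/old_results"

# Look first: list exactly what is about to be removed
ls "$target"

# Quote the variable so a name with spaces stays in one piece,
# and only delete if the target really is a directory
if [ -d "$target" ]; then
    rm -r "$target"
fi

if [ -e "$target" ]; then still_there="yes"; else still_there="no"; fi
echo "still there? $still_there"

rm -r "$scratch"
```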


In [98]:
### YOU TRY: delete the directory "same_old" 

In [99]:
ls

Terminal Commands and Bash Scripting.ipynb
hello.sh
hello_you.sh
same_old
sequences.txt


# Now let's get fancy

#### You can do quite a lot of text processing and execute more complex file operations in the terminal. Check out the contents of the file sequences.txt in this directory by using the "cat" command:

In [102]:
cat sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


### You can look at just the beginning or just the end of a file

In [112]:
head -2 sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC


In [113]:
tail -2 sequences.txt 

seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


### You can split up a file by columns! (cut treats tabs as the column separator by default)

In [108]:
cut -f1 sequences.txt

seqA
seqB
seqC
seqD
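Not every file is tab-separated. The sketch below uses a small made-up comma-separated file to show `-d`, which sets the delimiter, and how to select more than one field.

```shell
# Hypothetical comma-separated data, written to a temporary file
tmp=$(mktemp)
printf 'seqA,50,human\nseqB,50,mouse\n' > "$tmp"

# -d sets the delimiter (the default is a tab); -f picks the fields
names=$(cut -d, -f1 "$tmp")
names_and_species=$(cut -d, -f1,3 "$tmp")

echo "$names"
echo "$names_and_species"

rm "$tmp"
```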


In [111]:
### YOU TRY: print out just the second column

### Piping 

#### You can use pipes ("|") to send the output of one command to another command as input

In [117]:
### What do you think will be output using this combination of commands?
head -3 sequences.txt | tail -2

seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqC	GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
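`head -3` keeps the first three lines and `tail -2` keeps the last two of those, which leaves lines 2 and 3. The same trick can pull out any single line. Here is a small sketch; the helper name `nth_line` and the sample file are made up for illustration.

```shell
tmp=$(mktemp)
printf 'first\nsecond\nthird\nfourth\n' > "$tmp"

# Print only line number $1 of file $2:
# keep the first N lines, then keep the last one of those
nth_line() {
    head -n "$1" "$2" | tail -n 1
}

third=$(nth_line 3 "$tmp")
echo "$third"

rm "$tmp"
```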


### grep: searching for specific text within a file

In [123]:
grep "TAAC" sequences.txt

seqA	CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG


#### You can use regular expressions to filter more flexibly. For example, square brackets match any single character from the set inside them, so "[BD]" means "B or D"

In [127]:
grep "[BD]" sequences.txt

seqB	GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
seqD	TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
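A few other grep options come up often. The sketch below uses a small made-up file with shortened sequences. It shows `-E` (extended regular expressions, where `|` means "or"), `-c` (count matching lines) and `-v` (keep the lines that do not match).

```shell
tmp=$(mktemp)
printf 'seqA\tCCCTAACC\nseqB\tGTCATATG\nseqC\tGAACCGGG\nseqD\tTCTCCGTC\n' > "$tmp"

either=$(grep -E "seqB|seqD" "$tmp" | cut -f1)   # -E: | means "or"
count=$(grep -c "TAAC" "$tmp")                   # -c: count matching lines
others=$(grep -v "TAAC" "$tmp" | cut -f1)        # -v: lines that do NOT match

echo "$either"
echo "lines containing TAAC: $count"
echo "$others"

rm "$tmp"
```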


### tr: translate (substitute) characters

#### Let's use tr to turn our DNA sequences into RNA sequences instead!

In [129]:
cat sequences.txt | tr T U

seqA	CCCUAACCCCCUAACCCCCUAACCCUCAGUCGGGGAGGCGACAAUAGCUG
seqB	GUCAUAUGUUCUGUACGUUAUUGGCCAACUGAUCAUACCUGAAUCGAGCC
seqC	GAACCGGGAUUAUCAAAGACGAACAUGGUCGGGUCCUUGAACCAAACGAA
seqD	UCUCCGUCCGCUGGCGUGUUUUUCUUUUCUCAAGUGGGCAAGUUACCCGG


#### We can use tr to find the complement of DNA sequences

In [179]:
cut -f2 sequences.txt | tr ACGT TGCA


GGGATTGGGGGATTGGGGGATTGGGAGTCAGCCCCTCCGCTGTTATCGAC
CAGTATACAAGACATGCAATAACCGGTTGACTAGTATGGACTTAGCTCGG
CTTGGCCCTAATAGTTTCTGCTTGTACCAGCCCAGGAACTTGGTTTGCTT
AGAGGCAGGCGACCGCACAAAAAGAAAAGAGTTCACCCGTTCAATGGGCC
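tr can also delete characters and work with ranges. The sketch below uses a short made-up sequence. It counts G and C bases by deleting everything else, then lowercases the sequence.

```shell
seq="CCCTAACCCCCTAACC"   # hypothetical short sequence

# -d deletes the listed characters; -c complements the set,
# so -cd 'GC' deletes everything that is NOT a G or a C
gc_only=$(printf '%s' "$seq" | tr -cd 'GC')
gc_count=${#gc_only}     # length of what is left

# tr also accepts ranges, for example to lowercase a sequence
lower=$(printf '%s' "$seq" | tr 'A-Z' 'a-z')

echo "G/C bases: $gc_count out of ${#seq}"
echo "$lower"
```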


## rev: reverse

#### We can tack on a "reverse" command to get the reverse complement of a DNA sequence

In [182]:
cut -f2 sequences.txt | tr ACGT TGCA | rev


CAGCTATTGTCGCCTCCCCGACTGAGGGTTAGGGGGTTAGGGGGTTAGGG
GGCTCGATTCAGGTATGATCAGTTGGCCAATAACGTACAGAACATATGAC
TTCGTTTGGTTCAAGGACCCGACCATGTTCGTCTTTGATAATCCCGGTTC
CCGGGTAACTTGCCCACTTGAGAAAAGAAAAACACGCCAGCGGACGGAGA
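The same pipeline can be wrapped in a small function for reuse. This is a sketch: the function name `revcomp` is made up, and it assumes uppercase A, C, G and T only.

```shell
# Reverse complement of one DNA string (uppercase A, C, G, T assumed)
revcomp() {
    printf '%s\n' "$1" | tr ACGT TGCA | rev
}

rc=$(revcomp "AACG")
echo "$rc"

# Applying it twice should give back the original sequence
back=$(revcomp "$rc")
echo "$back"
```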


## wc: word/line/character count

In [184]:
wc sequences.txt

       4       8     224 sequences.txt


#### Get just lines

In [185]:
wc -l sequences.txt

       4 sequences.txt


#### Get just words

In [186]:
wc -w sequences.txt

       8 sequences.txt


#### Get just bytes (-c counts bytes; for plain ASCII text like this, that equals the number of characters)

In [187]:
wc -c sequences.txt

     224 sequences.txt
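When you want only the number, for example to store it in a variable, you can feed the file to wc through `<` so that no file name is printed. A minimal sketch with a made-up three-line file:

```shell
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# Redirecting the file into wc means no file name appears in the output
n_lines=$(wc -l < "$tmp")

# Some systems pad the number with spaces; strip them for clean use in scripts
n_lines=$(echo "$n_lines" | tr -d ' ')

echo "the file has $n_lines lines"

rm "$tmp"
```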


## sort: sort values in a column

In [202]:
cut -f2 sequences.txt | sort 

CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG


## uniq: check unique values in a column

### uniq only removes duplicate lines that sit next to each other, which is why we sort first. Here the two identical sequences at the end of the sorted output collapse into one

In [203]:
cut -f2 sequences.txt | sort | uniq

CCCTAACCCCCTAACCCCCTAACCCTCAGTCGGGGAGGCGACAATAGCTG
GAACCGGGATTATCAAAGACGAACATGGTCGGGTCCTTGAACCAAACGAA
GTCATATGTTCTGTACGTTATTGGCCAACTGATCATACCTGAATCGAGCC
TCTCCGTCCGCTGGCGTGTTTTTCTTTTCTCAAGTGGGCAAGTTACCCGG
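The sketch below uses a tiny made-up file to show why sorting comes before uniq, plus two handy flags: `-c` prefixes each line with how many times it occurred, and `-d` prints only the lines that were repeated.

```shell
tmp=$(mktemp)
printf 'GATTACA\nCCGG\nGATTACA\n' > "$tmp"

# The duplicates are not adjacent here, so uniq alone removes nothing
unsorted=$(uniq "$tmp" | wc -l | tr -d ' ')

# After sorting, the duplicates sit next to each other and collapse
sorted=$(sort "$tmp" | uniq | wc -l | tr -d ' ')

# -d prints only the repeated lines
dups=$(sort "$tmp" | uniq -d)

# -c shows a count in front of each distinct line
sort "$tmp" | uniq -c

echo "without sort: $unsorted lines, with sort: $sorted lines, duplicated: $dups"

rm "$tmp"
```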


# Common commands

### wget: downloading files -- for example genome reference files for the hg38 reference: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/

In [130]:
# This will download chromosome 21 (chr21)
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz

--2020-12-22 21:52:09--  https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr21.fa.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)... 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12709705 (12M) [application/x-gzip]
Saving to: ‘chr21.fa.gz’


2020-12-22 21:52:26 (1.13 MB/s) - ‘chr21.fa.gz’ saved [12709705/12709705]



#### Now we have it locally

In [131]:
ls 

Backup
Terminal Commands and Bash Scripting.ipynb
chr21.fa.gz
hello.sh
hello_you.sh
sequences.txt


### gunzip and gzip: unzipping and zipping files

In [132]:
gunzip chr21.fa.gz

In [133]:
ls

Backup
Terminal Commands and Bash Scripting.ipynb
chr21.fa
hello.sh
hello_you.sh
sequences.txt


In [154]:
# Let's sneak a peek
head -194500 chr21.fa | tail

TAGACCTTCAACAGTAAGTCAGTTTCACAATACTATTTTTAAATTTCCTA
TTAAAATATCACTCTATTTCTTAGTATATCACTTTGGCATATCTGCTTCT
TTCTCTGTATTAATAAATAGCGCATACAGTTTGCCTTTGGTACTTTGTAC
AATGTTGTTTATCTCAGTGTAAATTGGTAGCGTGTCCACAAAGGCGATTG
GAGTGTGAGGCGTGAGTCCTTAGGAGCCTGTCTGCCATCTAAGCCCTGTT
AGCATTTTCCTTTACTAATGTTGGGGTGGGGGGACCTCAGAAGGGGCACA
GCAAGCATATGAAAGTTTTGTTACAGAGATGCCAGTATTTGTCCTTAGAA
CAGGTCCAGTTGACAAAGGCACTGCAGGATATGAAAGATTCTCATTACAA
TGTCACGGCAACATGACTGAAATTATTAACTCTCCACGTGGGATGATGGA
TGGTATAGGGTGGAGATGTCCTTGGCAGAACATGTTGCTTAATTATCTTC


#### Ok back in the box you go

In [155]:
gzip chr21.fa

In [156]:
ls

Backup
Terminal Commands and Bash Scripting.ipynb
chr21.fa.gz
hello.sh
hello_you.sh
sequences.txt
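You can also peek inside a compressed file without unzipping it on disk. The sketch below uses a tiny made-up FASTA file. `gzip -dc` decompresses to standard output, so the result can be piped into head while the `.gz` file stays as it is.

```shell
scratch=$(mktemp -d)
printf '>chrTest\nACGTACGT\nTTGGCCAA\n' > "$scratch/tiny.fa"   # hypothetical tiny FASTA
gzip "$scratch/tiny.fa"              # replaces tiny.fa with tiny.fa.gz

# -d decompresses, -c writes to standard output and leaves the .gz file untouched
header=$(gzip -dc "$scratch/tiny.fa.gz" | head -n 1)
echo "$header"

if [ -f "$scratch/tiny.fa.gz" ]; then still_zipped="yes"; else still_zipped="no"; fi
echo "still compressed on disk? $still_zipped"

rm -r "$scratch"
```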


# Advanced scripting

## For loops

In [212]:
# Let the shell expand * directly and quote "$FILE" so that names containing spaces stay in one piece
for FILE in *
do
    echo "check out this sweet little file called" "$FILE"
done

check out this sweet little file called Backup
check out this sweet little file called Terminal Commands and Bash Scripting.ipynb
check out this sweet little file called chr21.fa.gz
check out this sweet little file called hello.sh
check out this sweet little file called hello_you.sh
check out this sweet little file called sequences.txt
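Loops become more useful when each pass does some work on the file. The sketch below uses made-up file names in a scratch directory. It puts a wildcard directly in the `for` line and quotes `"$FILE"`, which keeps names with spaces intact.

```shell
scratch=$(mktemp -d)
printf 'a\nb\n' > "$scratch/first file.txt"    # hypothetical names; note the space
printf 'c\n'    > "$scratch/second.txt"

count=0
for FILE in "$scratch"/*.txt
do
    # Count the lines in each file; tr strips any padding spaces from wc
    lines=$(wc -l < "$FILE" | tr -d ' ')
    echo "$(basename "$FILE") has $lines line(s)"
    count=$((count + 1))
done

echo "looped over $count files"

rm -r "$scratch"
```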


## If statements

In [239]:
you="superenthusiastic"
# Quote the variable so the test still works if it is empty or contains spaces
if [ "$you" = "bored" ]
then
    echo "okay let's switch things up!"
fi
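The example above prints nothing because the condition is false. Adding `elif` and `else` branches handles the other cases. In the sketch below, the function name `describe_mood` and the extra messages are made up for illustration.

```shell
describe_mood() {
    if [ "$1" = "bored" ]; then
        echo "okay let's switch things up!"
    elif [ "$1" = "superenthusiastic" ]; then
        echo "great, let's keep going!"
    else
        echo "not sure how you feel about this"
    fi
}

msg=$(describe_mood "superenthusiastic")
echo "$msg"

other=$(describe_mood "sleepy")
echo "$other"
```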