Due: 5pm on Feb 9
- To start, fork the repository.
- Clone the repository to your computer.
- Modify the files and commit changes to complete your solution.
- Push/sync the changes up to GitHub.
- Make corrections until the Travis CI build status icon turns green (passing).
- Create a pull request on the original repository to turn in the assignment.
For each problem, you should use simple Unix commands to arrive at the
correct answer. Put all your work in a run.sh file that generates an
answers.yml file.
# answers.yml should look like this:
answer-1: 123
answer-2: 456
Which state has the highest population?
# On Linux, use zcat in place of gzcat
high_pop=$(gzcat states.tab.gz | cut -f1,2 | sort -k2n | tail -n1 | cut -f1)
echo "answer-example: $high_pop"
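One way run.sh could produce answers.yml is to redirect each echo line into the file. The following is only a minimal sketch: the two pipelines are stand-ins that run on made-up inline text, and the values they yield are placeholders, not answers to any problem.

```shell
#!/bin/sh
# Sketch of a possible run.sh layout (placeholder pipelines, not real answers).
set -eu

out=answers.yml
: > "$out"   # start from an empty file on every run

# Stand-in pipeline: count the lines of some inline text.
# tr -d ' ' strips the padding that some wc implementations add.
answer_1=$(printf 'a\nb\nc\n' | wc -l | tr -d ' ')
echo "answer-1: $answer_1" >> "$out"

# Stand-in pipeline: count the words of some inline text.
answer_2=$(printf 'x y z\n' | wc -w | tr -d ' ')
echo "answer-2: $answer_2" >> "$out"

cat "$out"
```

Truncating the file at the top keeps repeated runs from appending duplicate entries.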
Each problem below is worth 5 points. Use the files in the data-sets repository.
Which state in states.tab.gz has the lowest murder rate?
How many sequence records are in the sample.fa file?
How many unique CpG IDs are in cpg.bed.gz?
How many sequence records are in the SP1.fq file?
How many words are on lines containing the word bloody in hamlet.txt? (Hint: use wc to count words.)
What is the length of the sequence in the first record of sample.fa? (Hint: use wc to count characters.)
What is the name of the longest gene in genes.hg19.bed.gz?
How many unique chromosomes are in genes.hg19.bed.gz?
How many intervals are associated with CTCF (not CTCFL) in peaks.chr22.bed.gz?
On what chromosome is the largest interval in lamina.bed?
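The wc hints in the problems above can be tried on toy input before running them on the real files. The sketch below uses made-up inline text, not the course data, and shows two details that are easy to miss: wc -w totals the words across every line it reads, and wc -c counts newline characters too, so they may need to be removed first with tr.

```shell
# wc -w sums the words over all input lines (4 words + 2 words here).
words=$(printf 'to be or not\nto be\n' | wc -w | tr -d ' ')
echo "words: $words"

# wc -c counts every byte, including the newline at the end of each line.
with_newlines=$(printf 'ACGT\nAC\n' | wc -c | tr -d ' ')

# Deleting newlines first leaves only the sequence characters.
without_newlines=$(printf 'ACGT\nAC\n' | tr -d '\n' | wc -c | tr -d ' ')
echo "chars with newlines: $with_newlines, without: $without_newlines"
```

The trailing tr -d ' ' only removes the leading padding that some wc implementations print, which keeps the values clean when they are written into a YAML file.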