# Heroin Project Notes
This document logs my notes, thoughts, and comments for the R01 project referred to as "fwGWAS Heroin".

## Unfamiliar concepts

### split PARs (Pseudoautosomal Regions)
In the QC pipeline, there is a section of code (below) that I need clarification on. We break the genomic data up into autosomes and chromosome X. For chromosome X we include the split PARs, i.e. the pseudoautosomal variants that plink stores under chromosome code 25 (XY) instead of code 23 (X). See [this article](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435358/) for a description of these split PARs.

In [1]:
# ChrX (include split PARs)
/shared/bioinformatics/software/third_party/plink-1.90-beta-4.10-x86_64/plink \
    --bfile genotypes_b37_dbsnp138 \
    --chr 23,25 \
    --make-bed \
    --out chrX/genotypes_b37_dbsnp138_unmerged

# Combine split chrX and PARs
/shared/bioinformatics/software/third_party/plink-1.90-beta-4.10-x86_64/plink \
    --bfile chrX/genotypes_b37_dbsnp138_unmerged \
    --merge-x \
    --make-bed \
    --out chrX/genotypes_b37_dbsnp138

ERROR: Error in parse(text = x, srcfile = src): <text>:2:1: unexpected '/'
1: # ChrX (include split PARs)
2: /
   ^
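A quick way to check whether a data set still has its PAR variants split out is to tally the chromosome codes in the `.bim` file. In plink 1.9's numeric coding, 23 is X, 24 is Y, 25 is XY (the pseudoautosomal regions), and 26 is MT; `--merge-x` folds code 25 back into 23. The sketch below uses a made-up four-variant `.bim` (the variant IDs and positions are placeholders, not project data):

```shell
# Hypothetical .bim content: chr, variant ID, cM, bp, allele 1, allele 2
bim=$(mktemp)
printf '23\trs_x1\t0\t3000000\tA\tG\n'     >  "$bim"
printf '23\trs_x2\t0\t5000000\tC\tT\n'     >> "$bim"
printf '25\trs_par1\t0\t200000\tA\tC\n'    >> "$bim"
printf '25\trs_par2\t0\t155000000\tG\tT\n' >> "$bim"

# Count variants per chromosome code (column 1)
chr_counts=$(awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$bim" | sort -n)
echo "$chr_counts"
rm -f "$bim"
```

If code 25 shows up in the tally, the PARs are still split; after `--merge-x` only code 23 should remain for chromosome X.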


In [None]:
truncate -s 0 structure/input_files/input_afr_eas_eur
# Group IDs: AFR = 1, EAS = 2, EUR = 3 (groupID has to be initialized before the loop)
groupID=1
for pop in {AFR,EAS,EUR}; do
    cat structure/1000g_data/${pop}_10k_snp_random_sample.final.ped | \
    /shared/bioinformatics/software/perl/boneyard/ped2structure.pl 1 ${groupID} \
    >> structure/input_files/input_afr_eas_eur
    groupID=$((groupID + 1))
done

## Unfamiliar code or software

### Awk

In [None]:
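# Set the output field separator to a tab. If field 6 (the missing call rate) equals 1,
# i.e. every genotype is missing, append fields 1 and 2 to the remove list.
# Input is read from stdin (the upstream command is not part of this snippet).
awk 'BEGIN { OFS="\t" } $6 == 1 { print $1, $2 }' >> autosomes/missing_whole_autosome.remove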
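To see what this filter keeps, here is a minimal sketch on a made-up table. I am assuming the input is laid out like a plink `.imiss` file (FID, IID, MISS_PHENO, N_MISS, N_GENO, F_MISS); the sample IDs and counts are invented for illustration.

```shell
# Hypothetical input: FID IID MISS_PHENO N_MISS N_GENO F_MISS
imiss=$(mktemp)
printf 'FID IID MISS_PHENO N_MISS N_GENO F_MISS\n' >  "$imiss"
printf 'fam1 id1 N 0 1000 0\n'                     >> "$imiss"
printf 'fam2 id2 N 1000 1000 1\n'                  >> "$imiss"
printf 'fam3 id3 N 250 1000 0.25\n'                >> "$imiss"

# Print fields 1 and 2 (tab-separated) only where field 6 equals 1
to_remove=$(awk 'BEGIN { OFS="\t" } $6 == 1 { print $1, $2 }' "$imiss")
echo "$to_remove"
rm -f "$imiss"
```

Only `fam2 id2` is printed. The header line is skipped without any special handling because the text `F_MISS` does not compare equal to 1.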

In [None]:
# Prepare results for triangle plot

# Annotation:
# - When a line containing "%Miss" is reached, the per-individual table starts: set the flag $in
#   and print the header line for the triangle plot input.
# - While $in is set, process every line that does not contain "Label" and is not
#   whitespace-only (/^\s+$/ matches a line made up entirely of whitespace).
# - For each such line, strip the leading whitespace and split the line on whitespace into the array @F.
#   Note that $_ is the default variable, which here holds the line currently being read.
# - @datasets maps the population code in $F[3] (1-4) to AFR, EAS, EUR, or Study_AA.
#   Only rows with a code greater than 3 (the study samples) are printed.
# - A blank line ends the table: $in is reset to 0.

perl -ne 'if (/%Miss/) {  
              $in=1;
              print "num\tID\tpop\tcluster1\tcluster2\tcluster3\n";
          }
          if ($in==1 && !/Label/ && !/^\s+$/) {
              @datasets=("AFR","EAS","EUR","Study_AA");
              s/^\s+//g;
              @F=split /\s+/;
              # Grab only study data set groups by ID
              if ($F[3] > 3) {
                  print $F[0]."\t".$F[1]."\t".$datasets[$F[3]-1]."\t".$F[5]."\t".$F[6]."\t".$F[7]."\n";
              }
          } 
          s/\s+//g;
          if ($_ eq "") { $in=0; }' structure/output_files/output_afr_eas_eur_f > \
    structure/triangle_plots/afr_eas_eur.triangle_input

### Perl
**Links to helpful resources**
* [The command-line options](https://users.cs.cf.ac.uk/Dave.Marshall/PERL/node161.html#SECTION001820000000000000000)
* [Uncommon* but Useful Perl Command Line Options for One-liners](http://www.perlmonks.org/?node_id=324749)
* [Predefined names](https://www.cs.cmu.edu/afs/cs/usr/rgs/mosaic/pl-predef.html)
* [The top 10 tricks of Perl one-liners](https://blogs.oracle.com/ksplice/the-top-10-tricks-of-perl-one-liners)
* [Information about the default variable](https://perlmaven.com/the-default-variable-of-perl)

Also note that `chomp` removes the trailing newline character from the end of a string (by default `$_`, the line currently being read), not from the end of the file.

In [None]:
# Create triangle plot input with potential outliers filtered
head -1 structure/triangle_plots/afr_eas_eur.triangle_input_master > \
    structure/triangle_plots/afr_eas_eur_filtered.triangle_input_master
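# Append the filtered rows below the header (output file assumed to be the filtered file named above)
tail -n +2 structure/triangle_plots/afr_eas_eur.triangle_input_master | \
    perl -lane 'if ($F[2] eq "Study_AA" && ($F[3] >= 0.25 && $F[4] <= 0.4)) { print $_; }' \
    >> structure/triangle_plots/afr_eas_eur_filtered.triangle_input_master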


# another example which prints to screen a list of files and their link counts
ls -l | perl -lane 'print "$F[8] $F[1]"'

##### -l parameter
-- This option turns on line-ending processing. When used with -n or -p, it chomps the line terminator from each input line. It also sets the output record separator (`$\`): if an octal value is given (for example `-l012` for a newline), `$\` is set to that character; if no octal number is given, `$\` is set to the current input record separator (`$/`), i.e. `$\ = $/`.

**Mnemonic:** this specifies what the end of the line should be

#####  -a parameter
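-- This option splits each line in `$_` on whitespace into the array `@F`, so the fields are available as `$F[0]`, `$F[1]`, and so on. It is used together with -n or -p.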

##### -n parameter
-- This option wraps the code in an implicit loop over the input, equivalent to `while (<>) { ... }`: each line is read into `$_` and the code runs once per line. Nothing is printed unless the code prints it.

**Note** that the -p parameter is similar, differing only in that it also prints `$_` after each iteration.

#####  -e parameter
-- This option tells perl that the program is given on the command line and not in a script file: the string that follows -e is the perl code to execute. This is what makes one-liners possible.