# Installation

To install these scripts I suggest doing the following:

1. Create a folder called pybin in your home directory. Open a terminal, then type

    ```mkdir pybin```

2. Clone this GitHub repo into pybin.
    
    ```cd pybin``` 
    
    ```git clone https://github.com/jsolvason/js```
    
3. Go back to your home directory 

    ```cd ~```

4. Find your shell profile (it will be called ```.bash_profile```, ```.bashrc```, or ```.zshrc```).

    ```ls -a ~``` 

5. Open your shell profile in a terminal text editor (this assumes it is named ```.zshrc```)

    ```nano ~/.zshrc```

6. Add the following line to your shell profile

    ```export PYTHONPATH=~/pybin/js:~/pybin/:$PYTHONPATH```


7. Test that this works by opening a new terminal window and typing the following. If it does not return an error message, then it works!

    ```python```

    ```import js``` 
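
If the import fails, it can help to check what Python actually sees. The sketch below is not part of this repo. It uses only the standard library to report which of the modules imported later in this README can be located on the current search path, without raising an error. The helper name `find_missing` is hypothetical.

```python
import importlib.util
import os

def find_missing(module_names):
    """Return the names that cannot be located on the module search path."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Module names taken from the imports used later in this README.
wanted = ["js", "jsAff", "jsDna", "jsGenome"]
missing = find_missing(wanted)

print("PYTHONPATH =", os.environ.get("PYTHONPATH", "(not set)"))
print("Missing modules:", missing if missing else "none")
```

If a module is listed as missing, the `PYTHONPATH` line printed above should show whether the export from step 6 was picked up by the new terminal.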


# Load all packages

In [None]:
import js
import jsAff as jsa
import jsDna as jsd
import jsGenome as jsg

help(js)

In [None]:
# List all modules in the package
js.listModules()

# Affinity scripts

This module loads affinity reference files as a dictionary with ```key=dna_sequence``` and ```value=affinity```, where the maximum affinity is 1.0.

In [None]:
help(jsa)

## Downloading reference files

Files can be found at this Google Drive: 

## Loading dictionary

In [None]:
# Load Ets1 affinity data
seq2aff=jsa.loadEts(ref='/Users/joe/code/ref/binding_affinity/ets/parsed_Ets1_8mers.txt')
js.dprint(seq2aff,0)

In [None]:
# Max and min affinities
round(min(seq2aff.values()),3),max(seq2aff.values())

In [None]:
# Sequence with the maximum affinity
seq2aff['CCGGAAGT']

In [None]:
# Note that you can look up the forward sequence or its reverse complement and get the same answer.
seq2aff[jsd.revcomp('CCGGAAGT')]
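
One way a lookup table can answer for either strand is to store each sequence together with its reverse complement. The sketch below is a hypothetical illustration with made-up scores. It does not reproduce how `jsa.loadEts` is implemented or the format of the reference file, and the names `revcomp` and `build_affinity_table` are assumptions for this example only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def build_affinity_table(raw_scores):
    """Scale scores so the maximum is 1.0 and index both strands."""
    top = max(raw_scores.values())
    table = {}
    for seq, score in raw_scores.items():
        relative = score / top
        table[seq] = relative
        table[revcomp(seq)] = relative
    return table

# Made-up scores, for illustration only.
toy = build_affinity_table({"CCGGAAGT": 200.0, "AAAAAAAA": 20.0})
print(toy["CCGGAAGT"], toy[revcomp("CCGGAAGT")])
```

Storing both orientations doubles the number of keys but keeps every lookup a single dictionary access.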

# Dna scripts

This module provides various operations on DNA sequences.

In [None]:
import jsDna as jsd
help(jsd)

## Hamming Distance

In [None]:
str1='AATTGGCC'
str2='TTTTGGCC'
jsd.hamming(str1, str2)    
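
For reference, the Hamming distance is the number of positions at which two equal-length strings differ. The following is a minimal standalone sketch of that definition, not the implementation of `jsd.hamming`. Raising an error on unequal lengths is an assumption made here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("AATTGGCC", "TTTTGGCC"))  # 2
```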

## Reverse Complement

In [None]:
dna='ATGC'
jsd.revcomp(dna)    

## GC Content

In [None]:
seq='ATGGCCAT'
jsd.gc_content(seq)    
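
GC content is the share of bases that are G or C. The sketch below assumes the result is reported as a fraction between 0 and 1. `jsd.gc_content` may use a different convention (for example a percentage), so treat `gc_fraction` as a hypothetical stand-in rather than the module's code.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_fraction("ATGGCCAT"))  # 0.5
```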

## Iterating over kmers

In [None]:
string='AATTGGCC'
k=3
jsd.get_kmers(string, k)    

In [None]:
for kmer in jsd.get_kmers(string, k):
    print(kmer) 

In [None]:
list(jsd.get_kmers(string, k))
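
The cells above suggest that `get_kmers` returns something you iterate over rather than a list. A k-mer iterator of that kind can be written as a short generator. This is a minimal sketch assuming k-mers are overlapping windows taken one base apart. `iter_kmers` is a hypothetical name, not the implementation of `jsd.get_kmers`.

```python
def iter_kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for start in range(len(seq) - k + 1):
        yield seq[start:start + k]

print(list(iter_kmers("AATTGGCC", 3)))
# ['AAT', 'ATT', 'TTG', 'TGG', 'GGC', 'GCC']
```

A generator avoids building the full list in memory, which matters when scanning long sequences such as whole chromosomes.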

## Generating random DNA

In [None]:
length=5
jsd.GenerateRandomDNA(length)    

In [None]:
template='AANAA'
jsd.GenerateSingleRandomSequence(template)    
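
As an illustration of the two calls above, random sequences can be drawn with the standard `random` module. The sketch assumes that `N` in a template means "any base" and that all four bases are equally likely. Both function names are hypothetical and the code is not taken from `jsDna`.

```python
import random

def random_dna(length, rng=random):
    """Return a random sequence of A/C/G/T with the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def fill_template(template, rng=random):
    """Replace each N in the template with a random base."""
    return "".join(rng.choice("ACGT") if base == "N" else base
                   for base in template)

print(random_dna(5))
print(fill_template("AANAA"))
```

Passing a seeded `random.Random(0)` instance as `rng` makes the output reproducible.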

In [None]:
template='AANAA'
jsd.GenerateAllPossibleSequences(template)

In [None]:
jsd.Iupac2AllNt

In [None]:
dna='AYN'
jsd.IupacToAllPossibleSequences(dna)    

In [None]:
dna='AYN'
pattern=jsd.IupacToRegexPattern(dna)    
pattern

In [None]:
jsd.revcomp_regex(pattern)
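
The IUPAC helpers above can be understood as a table lookup followed by either a Cartesian product (to list every sequence) or a character class per position (to build a regex). The sketch below uses the standard IUPAC nucleotide codes. The function names are hypothetical, and the exact output format of the `jsd` functions may differ.

```python
import itertools
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_iupac(template):
    """List every concrete sequence matching an IUPAC template."""
    choices = [IUPAC[base] for base in template]
    return ["".join(combo) for combo in itertools.product(*choices)]

def iupac_to_regex(template):
    """Build a regex string with one character class per ambiguous base."""
    return "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in template
    )

print(expand_iupac("AYN"))
print(iupac_to_regex("AYN"))  # A[CT][ACGT]
print(bool(re.fullmatch(iupac_to_regex("AYN"), "ATG")))  # True
```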

# Genome scripts

This module is used to load a genome as a dictionary with ```key=chromosome_name``` and ```value=chromosome_sequence```.
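
To make the dictionary layout concrete, here is a minimal FASTA reader that produces the same kind of mapping. It is a sketch under stated assumptions, not the code behind `jsg.loadGenome`: it takes the first word of each header line as the key, and `read_fasta` is a hypothetical name.

```python
import io

def read_fasta(handle):
    """Parse FASTA text into {record_name: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the key.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

toy_fasta = io.StringIO(">chr1 toy record\nACGT\nACGT\n>chr2\nTTTT\n")
chr2seq_toy = read_fasta(toy_fasta)
print(chr2seq_toy["chr1"][2:6])  # GTAC
```

Because each value is a plain string, a genomic interval is just a slice, as in the `chr2seq['chr1'][500000:500100]` example in this section.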

## Download genome

In [2]:
import jsGenome as jsg
help(jsg)

Help on module jsGenome:

NAME
    jsGenome

FUNCTIONS
    loadCi08(file_genome='/Users/joe/code/ref/genomes/ciona/2008/JoinedScaffold.fasta')
        Load 2008 ciona genome
    
    loadCi19(file_genome='/Users/joe/code/ref/genomes/ciona/2019/HT.Ref.fasta')
        Load 2019 ciona genome
    
    loadCi19_beta(file_genome='/Users/joe/code/ref/genomes/ciona/2019/HT.Ref.forBetaTesting.fasta')
        Load 2019 ciona genome (for beta testing, first 1kb of each chrom)
    
    loadDr11(file_genome='/Users/joe/code/ref/genomes/zebrafish/danRer11/danRer11.fa')
        Load Zebrafish danRer11 genome
    
    loadGenome(file_genome)
        Loads arbitrary genome
    
    loadHg19(file_genome='/Users/joe/code/ref/genomes/human/hg19/hg19.fa')
        Load hg19  genome
    
    loadHg38(file_genome='/Users/joe/code/ref/genomes/human/hg38/hg38.fa')
        Load hg38  genome
    
    loadMm10(file_genome='/Users/joe/code/ref/genomes/mouse/mm10/mm10.fa')
        Load mm10 mouse genome

FILE
    /U

## Load genome

In [3]:
chr2seq=jsg.loadHg38()

In [6]:
# Inspect keys of dictionary
list(chr2seq.keys())[0]

'chr1'

In [7]:
# Print an arbitrary 100 bp region of the genome
chr2seq['chr1'][500000:500100]

'AGGTATCCTCTCATCTCAGCTTCCCTAGTAGTTGGAACTCTAGGTGCACAACACCACACCAGTTATTATTATTATTTTTTAATTTTTTATAGAGACAGGT'