In [1]:
using Pkg, Revise

In [2]:
pwd()

"/Users/hunt/GitHub/AgoUtils.jl"

In [1]:
Pkg.activate("../")
using AgoUtils

  Activating project at `~/GitHub/AgoUtils.jl`


---

---

## Creating Samplers

### General sampler creation

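A weighted sampler is built from the elements to draw and the probabilities of all but the last element; the last probability is inferred so that the weights sum to 1: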

In [4]:
using BioSequences

const sampler_order = RNA[RNA_G, RNA_C, RNA_A, RNA_U]
SamplerWeighted(sampler_order, [0.125, 0.125, 0.75/2])

SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.125, 0.125, 0.375, 0.375])
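The call above passes three weights for four bases, and the printed sampler carries a fourth weight of 0.375. The following Base-only sketch illustrates that completion and a simple cumulative-weight draw. `complete_weights` and `draw` are hypothetical helpers written for this illustration; they are not part of BioSequences and are not claimed to match its internals.

```julia
using Random

# Hypothetical helper: append the implied final probability so the weights sum to 1.
complete_weights(p) = [p; 1 - sum(p)]

# Hypothetical helper: draw one element by walking the cumulative weights.
function draw(rng, elems, p)
    w = complete_weights(p)
    u = rand(rng)          # uniform number in [0, 1)
    acc = 0.0
    for (e, wi) in zip(elems, w)
        acc += wi
        u < acc && return e
    end
    return last(elems)     # guards against floating-point round-off
end

weights = complete_weights([0.125, 0.125, 0.75 / 2])
base = draw(MersenneTwister(1), ['G', 'C', 'A', 'U'], [0.125, 0.125, 0.75 / 2])
```

Here `weights` comes out as `[0.125, 0.125, 0.375, 0.375]`, the same four values shown in the sampler output.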

The goal, however, is to build several samplers at once from a single collection of probabilities.

---

### Mapped Sampler Creation

In [3]:
nms = ([Symbol("gc50_", string(i)) for i = 1:2]...,)
# assuming G, C, A, and U for order
vals = (repeat([fill(0.5/2, 3)], 2))
# probabilities = NamedTuple{nms}(vals)
probabilities = merge(
    (gc40_1 = [0.2, 0.2, 0.3],), # trailing comma defines the tuple type
    (; zip(nms, vals)...), # programmatic zip splat kwarg style construction
    (; :gc60_1 => [0.3, 0.3, 0.2]) # kwarg style named tuple key/value pair
)

(gc40_1 = [0.2, 0.2, 0.3], gc50_1 = [0.25, 0.25, 0.25], gc50_2 = [0.25, 0.25, 0.25], gc60_1 = [0.3, 0.3, 0.2])

In [4]:
([Symbol("gc50_", string(i)) for i = 1:4]...,) |> typeof

    ╭─────────────────────────────────────────────────────────────────────────╮
    │                                                                         │
    │                              NTuple <: Any                              │
    │                                │1 ::Symbol                              │
    │                                │2 ::Symbol                              │
    │                                │3

The same named tuple can be built with the abbreviated splat syntax shown below, without first collecting the names into a tuple.

In [5]:
(; zip(nms, vals)...)

(gc50_1 = [0.25, 0.25, 0.25], gc50_2 = [0.25, 0.25, 0.25])

In [6]:
(; zip(
    ([Symbol("gc50_", string(i)) for i = 1:4]), 
    repeat([fill(0.5/2, 3)], 4)
)...)

(gc50_1 = [0.25, 0.25, 0.25], gc50_2 = [0.25, 0.25, 0.25], gc50_3 = [0.25, 0.25, 0.25], gc50_4 = [0.25, 0.25, 0.25])

In [7]:
probabilities

(gc40_1 = [0.2, 0.2, 0.3], gc50_1 = [0.25, 0.25, 0.25], gc50_2 = [0.25, 0.25, 0.25], gc60_1 = [0.3, 0.3, 0.2])

In [9]:
samplers = map(((a,b,c),) -> SamplerWeighted(sampler_order, [a,b,c]), probabilities)

(gc40_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.2, 0.2, 0.3, 0.30000000000000004]), gc50_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25]), gc50_2 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25]), gc60_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.3, 0.3, 0.2, 0.19999999999999996]))

This can also be done using a `do` syntax block as shown below.

In [10]:
map(probabilities) do (a,b,c)
    SamplerWeighted(sampler_order, [a,b,c])
end

(gc40_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.2, 0.2, 0.3, 0.30000000000000004]), gc50_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25]), gc50_2 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25]), gc60_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.3, 0.3, 0.2, 0.19999999999999996]))

Or with formal function syntax:

In [11]:
#=
Equivalent forms:
  map(function f((a,b,c)) ... end, probabilities)   # named
  map(function ((a,b,c),) ... end, probabilities)   # anonymous
=#
map(function f((a,b,c),) # named, with the trailing comma as well
    SamplerWeighted(rna"GCAU", [a,b,c])
end, probabilities)

(gc40_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.2, 0.2, 0.3, 0.30000000000000004]), gc50_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25]), gc50_2 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25]), gc60_1 = SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.3, 0.3, 0.2, 0.19999999999999996]))

**Note for above:** *An anonymous function needs a comma after the destructured tuple, as in `((a,b,c),) -> ...`, because in Julia it is the comma, not the parentheses, that makes a one-element tuple; without it, `(a,b,c)` is read as three separate arguments. Naming the function also works, presumably because the parentheses after the name are then unambiguously the argument list. Doing both, as shown above, is the safe option.* 😉
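The destructuring behaviour in the note can be checked with a small Base-only example. The named tuple `demo` and the function `total` are made up for this illustration; the point is only to compare the one-argument destructuring forms with a plain three-argument lambda.

```julia
# Illustrative named tuple (values made up for the demo).
demo = (x = [1, 2, 3], y = [4, 5, 6])

# One argument, destructured into a, b, c: note the comma after the inner tuple.
anon = map(((a, b, c),) -> a + b + c, demo)

# The same thing with a named function and a destructured argument.
function total((a, b, c))
    return a + b + c
end
named = map(total, demo)

# Without the trailing comma the lambda takes three separate arguments,
# so calling it with a single vector raises a MethodError.
three_args = (a, b, c) -> a + b + c
fails = try
    three_args([1, 2, 3])
    false
catch err
    err isa MethodError
end
```

Both `anon` and `named` evaluate to `(x = 6, y = 15)`, and `fails` is `true`.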

---

---

## Generating Random (RNA) Sequences

First, set the length of the sequences to generate. The maximum length IDT will provide for a custom RNA oligo is 60nt/bp for a *standard* synthesis and 120nt/bp for an *ultramer* synthesis. Shorter substrates will probably be preferable for managing plates of guides. A 38mer allows 24 guides of 16 nt each: 38 - 16 + 1 = 23 guides that lie fully within the substrate, plus one guide that overhangs the 3ʹ end. This works out well for screening variation in the 5ʹ-phosphorylated base: with all four bases at the 5ʹ position of each guide, there are 24 × 4 = 96 guides per substrate.
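The guide count can be worked through explicitly. The variable names below are chosen for this sketch only, so they do not clash with the constants defined in the notebook cells.

```julia
oligo_len = 38   # substrate length from the text
guide_len = 16   # guide length from the text

# Guides that lie entirely within the substrate: one per start position.
full_windows = oligo_len - guide_len + 1     # 23

# Plus the single guide that overhangs the 3′ end.
n_guides = full_windows + 1                  # 24

# Four possible 5′ bases per guide.
guides_per_substrate = n_guides * 4          # 96

# Four substrates fill one 384-well plate.
total_wells = guides_per_substrate * 4       # 384
```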

In [12]:
const oligoLength = 38

38

In [13]:
using Random

In [14]:
myseeds = [1234, 2345, 4567, 7890]
myrngs = Random.MersenneTwister.(myseeds)

╭─────────────────────────────────────╮
│                                     │
│     (1)   MersenneTwister(1234)     │
│     (2)   MersenneTwister(2345)     │
│     (3)   MersenneTwister(4567)     │
│     (4)   MersenneTwister(7890)

In [15]:
myrngs[1]

MersenneTwister(1234)

In [16]:
for i in 1:4
    println(rand(myrngs[i], samplers[1], 10))
end

RNA[RNA_A, RNA_U, RNA_A, RNA_A, RNA_U, RNA_U, RNA_C, RNA_C, RNA_C, RNA_A]
RNA[RNA_C, RNA_C, RNA_U, RNA_C, RNA_U, RNA_A, RNA_A, RNA_A, RNA_C, RNA_G]
RNA[RNA_A, RNA_G, RNA_G, RNA_U, RNA_G, RNA_A, RNA_G, RNA_U, RNA_C, RNA_U]
RNA[RNA_A, RNA_G, RNA_U, RNA_U, RNA_A, RNA_A, RNA_A, RNA_C, RNA_U, RNA_C]
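Seeding the generators makes these draws repeatable, but only relative to the generator's current state. The Base-only sketch below (drawing integers rather than bases, purely for illustration) shows that two fresh generators with the same seed agree, while a generator that has already been used continues from where it left off. The same applies to `myrngs`: drawing from them here advances them, so later cells that reuse them start from the advanced state.

```julia
using Random

# Two generators built from the same seed produce the same draws.
a = rand(MersenneTwister(1234), 1:4, 10)
b = rand(MersenneTwister(1234), 1:4, 10)

# A reused generator keeps its state between calls: the first call matches a
# fresh generator, the second call continues the stream instead of restarting it.
rng = MersenneTwister(1234)
first_draw  = rand(rng, 1:4, 10)
second_draw = rand(rng, 1:4, 10)
```

To reproduce a result regardless of which cells ran earlier, one option is to construct a new `MersenneTwister(seed)` right where the draw happens.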


In [17]:
myrng = Random.MersenneTwister()

MersenneTwister(0xc03e17f97973c86d8b07144e2209596c)

In [18]:

substrates_MT = map(samplers, myrngs) do smplr, rng
    println(rng)
    println(smplr)
    randseq(rng, RNAAlphabet{4}(), smplr, oligoLength)
end
# map(sampler -> randseq(Random.MersenneTwister(1234), RNAAlphabet{4}(), sampler, oligoLength), samplers)

MersenneTwister(1234, (0, 1002, 0, 10))
SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.2, 0.2, 0.3, 0.30000000000000004])
MersenneTwister(2345, (0, 1002, 0, 10))
SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25])
MersenneTwister(4567, (0, 1002, 0, 10))
SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.25, 0.25, 0.25, 0.25])
MersenneTwister(7890, (0, 1002, 0, 10))
SamplerWeighted{RNA}(RNA[RNA_G, RNA_C, RNA_A, RNA_U], [0.3, 0.3, 0.2, 0.19999999999999996])


╭──────────────────────────────────────────────────────╮
│                                                      │
│     (1)   AGGUAGCAGUUUUUUGGCGCUACGGAUCAUGACCCGGC     │
│     (2)   GUAGGGUCCAUGAUCUUAUUGCAUUCUUGUCAUUGGCU     │
│     (3)   UAAUACAGAUGUUGCUGAUCAAUGUAGCUGAUGUGCGU     │
│     (4)   CCCGUCUGGCGCUGUCCCGGGCUUGCUCAGUUGCACGA

In [31]:
substrates = map(sampler -> randseq(RNAAlphabet{4}(), sampler, oligoLength), samplers)

(gc40_1 = GAACUCCGCUAGUUCACAGGUAAUACAAGGAUACAUGA, gc50_1 = CAUAGUCGAGCUUGAUUCUCCCCCCGGUACCAAAUGAA, gc50_2 = CCGUACUCCUUUACUGUUUCCCUAAGUCUCUCCCAACA, gc60_1 = GCGUCCGGGAUUGCCUCGGAGAGCGCUGGCACCGCAGU)

---

---

## Generating Guides for the Substrates

In [32]:
using AgoUtils

In [33]:
const guideLength = 16

16

In [34]:
guides = map(substrates) do seq
    makeguides(seq, guideLength, DNAAlphabet{2})
end

(gc40_1 = NucleicAcidGuide[GuideDNA{2}(DNAAlphabet{2}, TGAACTAGCGGAGTTC, 16, Float16(0.5), DNA_T, NucleicAcid[DNA_A, DNA_C, DNA_G], LongSequence{DNAAlphabet{2}}[AGAACTAGCGGAGTTC, CGAACTAGCGGAGTTC, GGAACTAGCGGAGTTC]), GuideDNA{2}(DNAAlphabet{2}, GTGAACTAGCGGAGTT, 16, Float16(0.5), DNA_G, NucleicAcid[DNA_A, DNA_C, DNA_T], LongSequence{DNAAlphabet{2}}[ATGAACTAGCGGAGTT, CTGAACTAGCGGAGTT, TTGAACTAGCGGAGTT]), GuideDNA{2}(DNAAlphabet{2}, TGTGAACTAGCGGAGT, 16, Float16(0.5), DNA_T, NucleicAcid[DNA_A, DNA_C, DNA_G], LongSequence{DNAAlphabet{2}}[AGTGAACTAGCGGAGT, CGTGAACTAGCGGAGT, GGTGAACTAGCGGAGT]), GuideDNA{2}(DNAAlphabet{2}, CTGTGAACTAGCGGAG, 16, Float16(0.5625), DNA_C, NucleicAcid[DNA_A, DNA_G, DNA_T], LongSequence{DNAAlphabet{2}}[ATGTGAACTAGCGGAG, GTGTGAACTAGCGGAG, TTGTGAACTAGCGGAG]), GuideDNA{2}(DNAAlphabet{2}, CCTGTGAACTAGCGGA, 16, Float16(0.5625), DNA_C, NucleicAcid[DNA_A, DNA_G, DNA_T], LongSequence{DNAAlphabet{2}}[ACTGTGAACTAGCGGA, GCTGTGAACTAGCGGA, TCTGTGAACTAGCGGA]), GuideDNA{2}(DNAAl

In [35]:
substrates.gc50_1

38nt RNA Sequence:
CAUAGUCGAGCUUGAUUCUCCCCCCGGUACCAAAUGAA

In [42]:
guides.gc50_1[1]

16nt DNA Sequence:
ATCAAGCTCGACTATG
43.75% GC
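The first guide printed above is the DNA reverse complement of the first 16 nt of the substrate. The sketch below reproduces that relationship with plain strings. `dna_revcomp`, `window_guides`, and `five_prime_variants` are hypothetical helpers written for this illustration, not the AgoUtils implementation; the sketch also assumes each guide corresponds to one sliding window and leaves out the extra guide that overhangs the 3ʹ end.

```julia
# Hypothetical helper: DNA reverse complement of an RNA string.
const RNA_TO_DNA_COMPLEMENT = Dict('A' => 'T', 'U' => 'A', 'G' => 'C', 'C' => 'G')
dna_revcomp(rna::AbstractString) = join(RNA_TO_DNA_COMPLEMENT[c] for c in reverse(rna))

# Hypothetical helper: reverse complements of every full-length window.
function window_guides(rna::AbstractString, k::Integer)
    return [dna_revcomp(rna[i:i+k-1]) for i in 1:(length(rna) - k + 1)]
end

# Hypothetical helper: the three guides that differ only in the 5′ base.
five_prime_variants(g::AbstractString) = [string(b, g[2:end]) for b in "ACGT" if b != g[1]]

substrate = "CAUAGUCGAGCUUGAUUCUCCCCCCGGUACCAAAUGAA"  # gc50_1 from above
gs = window_guides(substrate, 16)

first_guide = gs[1]                                        # "ATCAAGCTCGACTATG"
gc = count(in("GC"), first_guide) / length(first_guide)    # 0.4375
variants = five_prime_variants(first_guide)
```

The first window gives `ATCAAGCTCGACTATG` at 43.75% GC, matching the output of `guides.gc50_1[1]`.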


In [37]:
using DataFrames

In [38]:
AgoUtils._wells_96

╭───────────────────╮
│                   │
│      (1)   A1     │
│      (2)   B1     │
│      (3)   C1     │
│      (4)   D1     │
│      (5)   E1     │

In [39]:
wellseqs = reshape(mapreduce(AgoUtils._fetchseqs, vcat, vcat(guides...)), (24,16))

╭──── Matrix {LongSequence  ────────────────────────────────╮
│                                                                             │
│                                                                             │
│              (1)            (2)            (3)            (4)        (16)   │
│                                                                             │

In [40]:
wellids = reshape(AgoUtils._wells_384, (24,16))

╭──── Matrix   ───────────────────╮
│                                         │
│                                         │
│          (1)   (2)   (3)   (4)   (16)   │
│                                         │
│   (1)    A1    A4

In [41]:
Dict(wellids[i] => wellseqs[i] for i=1:384)

╭──── Dict {String, LongSequence  ──────╮
│                                                         │
│      I5  => GGGGGAGAATCAAGCT      │
│      I7  => ACCGGGGGGAGAATCA