Specifying coordinates in TOML #58

kevbrick · 2020-07-13T19:04:49Z

Readuntil target enrichment is working great if I specify individual chromosomes (either in the ru_generator TOML or in a separate txt file). However, if I try to specify a specific region of a chromosome (as detailed in the README and in issue #22), I get no enrichment. At the moment, I'm working with the fast5 files suggested in the readuntil README. Apologies if I'm doing something silly here, but any suggestions?

mattloose · 2020-07-13T19:11:27Z

Hi - can you post your toml file here so we can have a look?

alexomics · 2020-07-13T19:12:30Z

Could you give an example of the format that your targets are in? For entire contigs/chromosomes you can specify just the target name; for targets with coordinates you need to specify as contig,start,stop,strand. See the target formats in of TOML.md.

There is also a validator script that will let you know if your targets are recognised.

kevbrick · 2020-07-13T19:15:27Z

Hi Matt, Sorry for posting that prematurely. TOML is as follows: The validate script tells me all is OK.
[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "127.0.0.1"
port = 5556

[conditions]
reference = "/home/kevbrick/genomes/hg38/Minimap2Index/genome.mmi"

[conditions.0]
name = "select_chr_12_start"
control = false
min_chunks = 0
max_chunks = 12
targets = ["chr12,1,50000000,+","chr12,1,50000000,-"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

alexomics · 2020-07-13T19:26:19Z

What is the output from ru_validate?

kevbrick · 2020-07-13T19:27:58Z

ru_validate output:

😻 Looking good!
Generating experiment description - please be patient!
This experiment has 1 region on the flowcell

Using reference: /home/kevbrick/genomes/hg38/Minimap2Index/genome.mmi

Region 'select_chr_12_start' (control=False) has 1 target of which 1
are in the reference. Reads will be unblocked when classed as
single_off or multi_off; sequenced when classed as single_on or
multi_on; and polled for more data when classed as no_map or no_seq.

mattloose · 2020-07-13T19:29:59Z

I see @alexomics is on - I'll let him take this one!

But can you confirm - are you running with our bulk file here?

kevbrick · 2020-07-13T19:30:31Z

OK, great. Yep. I'm running with your bulk fast5 file.

alexomics · 2020-07-13T19:32:33Z

You are attempting to select 1.5% of the genome. How long are you running this test for? Are you seeing all reads being unblocked; are some reads being sequenced (no unblock sent)?

kevbrick · 2020-07-13T19:42:46Z

Yeah. I realise that a better test case would be to specify more of the genome. I ran this for ~2hr and get about 93,000 reads. How do I check if the reads are being unblocked / sequenced?

alexomics · 2020-07-13T19:59:03Z

There are a couple of log files that you can check. The first is the chunk_log.log (unless changed) and will be in the same directory read until was run from. You can run the following command to see how many of each mode was seen throughout the run. You are looking for single_on or multi_on which are the modes for when reads matched your range.

cut -f8 chunk_log.log | sort | uniq -c

Or in the run output directory (generated by MinKNOW) there will be a file called unblocked_read_ids.txt this is where all the read ids that read until sent are logged. You can search for all read ids that are in the sequencing summary that an unblock signal was not sent for:

grep -vFf unblocked_read_ids.txt sequencing_summary.txt

This last command might be very intensive/take a long time depending on the size of the sequencing summary or the unblocked read ids files.

kevbrick · 2020-07-13T20:33:52Z

Thanks @alexomics. Very helpful. Just ~1,000 reads were sequenced without being unblocked; that's close to what I'd expect for 1.5% of the genome. I minimapped these reads to the genome and saw about 4x enrichment on chr12, so looks like it worked fine! I'll run a bigger test later today to verify ....

github-actions · 2023-10-20T02:00:31Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2023-10-25T02:00:34Z

This issue was closed because there has been no response for 5 days after becoming stale.

kevbrick closed this as completed Jul 13, 2020

kevbrick reopened this Jul 13, 2020

alexomics added the question Further information is requested label Jul 13, 2020

github-actions bot added the Stale label Oct 20, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specifying coordinates in TOML #58

Specifying coordinates in TOML #58

kevbrick commented Jul 13, 2020 •

edited

Loading

mattloose commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020 •

edited

Loading

mattloose commented Jul 13, 2020

kevbrick commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020

github-actions bot commented Oct 20, 2023

github-actions bot commented Oct 25, 2023

Specifying coordinates in TOML #58

Specifying coordinates in TOML #58

Comments

kevbrick commented Jul 13, 2020 • edited Loading

mattloose commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020 • edited Loading

mattloose commented Jul 13, 2020

kevbrick commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020

alexomics commented Jul 13, 2020

kevbrick commented Jul 13, 2020

github-actions bot commented Oct 20, 2023

github-actions bot commented Oct 25, 2023

kevbrick commented Jul 13, 2020 •

edited

Loading

kevbrick commented Jul 13, 2020 •

edited

Loading