Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying coordinates in TOML #58

Closed
kevbrick opened this issue Jul 13, 2020 · 13 comments
Closed

Specifying coordinates in TOML #58

kevbrick opened this issue Jul 13, 2020 · 13 comments
Labels
question Further information is requested Stale

Comments

@kevbrick
Copy link

kevbrick commented Jul 13, 2020

Readuntil target enrichment is working great if I specify individual chromosomes (either in the ru_generator TOML or in a separate txt file). However, if I try to specify a specific region of a chromosome (as detailed in the README and in issue #22), I get no enrichment. At the moment, I'm working with the fast5 files suggested in the readuntil README. Apologies if I'm doing something silly here, but any suggestions?

@kevbrick kevbrick reopened this Jul 13, 2020
@mattloose
Copy link
Contributor

Hi - can you post your toml file here so we can have a look?

@alexomics
Copy link
Contributor

Could you give an example of the format that your targets are in? For entire contigs/chromosomes you can specify just the target name; for targets with coordinates you need to specify as contig,start,stop,strand. See the target formats in of TOML.md.

There is also a validator script that will let you know if your targets are recognised.

@alexomics alexomics added the question Further information is requested label Jul 13, 2020
@kevbrick
Copy link
Author

Hi Matt, Sorry for posting that prematurely. TOML is as follows: The validate script tells me all is OK.
[caller_settings]
config_name = "dna_r9.4.1_450bps_fast"
host = "127.0.0.1"
port = 5556

[conditions]
reference = "/home/kevbrick/genomes/hg38/Minimap2Index/genome.mmi"

[conditions.0]
name = "select_chr_12_start"
control = false
min_chunks = 0
max_chunks = 12
targets = ["chr12,1,50000000,+","chr12,1,50000000,-"]
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "unblock"
multi_off = "unblock"
no_seq = "proceed"
no_map = "proceed"

@alexomics
Copy link
Contributor

What is the output from ru_validate?

@kevbrick
Copy link
Author

kevbrick commented Jul 13, 2020

ru_validate output:

😻 Looking good!
Generating experiment description - please be patient!
This experiment has 1 region on the flowcell

Using reference: /home/kevbrick/genomes/hg38/Minimap2Index/genome.mmi

Region 'select_chr_12_start' (control=False) has 1 target of which 1
are in the reference. Reads will be unblocked when classed as
single_off or multi_off; sequenced when classed as single_on or
multi_on; and polled for more data when classed as no_map or no_seq.

@mattloose
Copy link
Contributor

I see @alexomics is on - I'll let him take this one!

But can you confirm - are you running with our bulk file here?

@kevbrick
Copy link
Author

OK, great. Yep. I'm running with your bulk fast5 file.

@alexomics
Copy link
Contributor

You are attempting to select 1.5% of the genome. How long are you running this test for? Are you seeing all reads being unblocked; are some reads being sequenced (no unblock sent)?

@kevbrick
Copy link
Author

Yeah. I realise that a better test case would be to specify more of the genome. I ran this for ~2hr and get about 93,000 reads. How do I check if the reads are being unblocked / sequenced?

@alexomics
Copy link
Contributor

There are a couple of log files that you can check. The first is the chunk_log.log (unless changed) and will be in the same directory read until was run from. You can run the following command to see how many of each mode was seen throughout the run. You are looking for single_on or multi_on which are the modes for when reads matched your range.

cut -f8 chunk_log.log | sort | uniq -c

Or in the run output directory (generated by MinKNOW) there will be a file called unblocked_read_ids.txt this is where all the read ids that read until sent are logged. You can search for all read ids that are in the sequencing summary that an unblock signal was not sent for:

grep -vFf unblocked_read_ids.txt sequencing_summary.txt

This last command might be very intensive/take a long time depending on the size of the sequencing summary or the unblocked read ids files.

@kevbrick
Copy link
Author

Thanks @alexomics. Very helpful. Just ~1,000 reads were sequenced without being unblocked; that's close to what I'd expect for 1.5% of the genome. I minimapped these reads to the genome and saw about 4x enrichment on chr12, so looks like it worked fine! I'll run a bigger test later today to verify ....

@github-actions
Copy link

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 20, 2023
@github-actions
Copy link

This issue was closed because there has been no response for 5 days after becoming stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested Stale
Projects
None yet
Development

No branches or pull requests

3 participants