Commit

Update PatMatch nucleic handling flags demystified.ipynb
fomightez committed Sep 17, 2018
1 parent d4218b7 commit 88ef335
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions notebooks/PatMatch nucleic handling flags demystified.ipynb
@@ -173,7 +173,8 @@
"\n",
"And so it looks strongly suggestive that the `-c` flag option means it is **for the complementary strand in addition to the strand in the dataset**. But can we prove that here? Can we see what the result is for just the complementary strand?\n",
"\n",
-"As discussed above we seem to be out of options that we can use for nucleic acid with the command line `patmatch` command, but independent of anything involving PatMatch we can actually convert our sample sequence to a reverse complement using the popular [Biopython module](http://biopython.org/wiki/Seq), and then try with that where we again use the `-n` flag."
+"As discussed above we seem to be out of options that we can use for nucleic acid with the command line `patmatch` command, but independent of anything involving PatMatch we can actually convert our sample sequence to a reverse complement using the popular [Biopython module](http://biopython.org/wiki/Seq), and then try with that where we again use the `-n` flag. \n",
+"(This process has now been written up into a proper script; however, this form is left here as it is nicely contained and a good illustration of some basic use of the biopython module.)"
]
},
{
@@ -183,6 +184,7 @@
"outputs": [],
"source": [
"## CONVERT TO REVERSE COMPLEMENT USING BIOPYTHON **\n",
+"# this code block has now been adapted into a proper script at https://github.com/fomightez/sequencework/blob/master/ConvertSeq/convert_fasta_to_reverse_complement.py\n",
"from Bio import SeqIO\n",
"from Bio.Seq import Seq # for reverse complement\n",
"\n",
@@ -206,10 +208,14 @@
"#print(sequence_record) #FOR DEBUGGING\n",
"\n",
"# Get reverse complement\n",
-"seq_rev_compl_record = sequence_record.reverse_complement()\n",
-"seq_rev_compl_record.id = sequence_record.id #record needs id for writing FASTA\n",
+"#seq_rev_compl_record = sequence_record.reverse_complement()\n",
+"#seq_rev_compl_record.id = sequence_record.id #record needs id for writing FASTA\n",
+"seq_rev_compl_record = sequence_record.reverse_complement(id=True,description=True) #better\n",
+"# way to do above, see\n",
+"# https://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html#reverse_complement \n",
"#print(seq_rev_compl_record) #FOR DEBUGGING\n",
"\n",
+"\n",
"# Save FASTA file for reverse complement\n",
"output_file_name = \"chr15.revcompl.fa\"\n",
"SeqIO.write(seq_rev_compl_record, output_file_name, \"fasta\");"
@@ -256,7 +262,7 @@
"\n",
"That proves that that `-c` means it is for the complementary strand **in addition to** the strand in the dataset. And perhaps, `-b`, along the lines of `-both`, might have been a better letter choice for the flag designation.\n",
"\n",
-"(I didn't see an option in the USAGE infromation in the README to actually do the third option that the web-based PatMatch implementations typically offer, that being searching for the pattern in the reverse complement strand only. The sites must supply the reverse complement data and trigger using that with the `-n` option internally when users choose just to look at the reverse complement, it seems. And we also have shown here that converting to the reverse complement is easily done within Python as well, and so those options are available from the command line, too. It's just not as obvious how to do that here as with the web-based implementations.)\n",
+"(I didn't see an option in the USAGE information in the README to actually do the third option that the web-based PatMatch implementations typically offer, that being searching for the pattern in the reverse complement strand only. The sites must supply the reverse complement data and trigger using that with the `-n` option internally when users choose just to look at the reverse complement, it seems. And we also have shown here that converting to the reverse complement is easily done within Python as well, and so those options are available from the command line, too. It's just not as obvious how to do that here as with the web-based implementations.)\n",
"\n",
"In summary, this is why I point out in the main notebook that typically you probably want the `-c` flag for nucleic acid pattern matching. Here is that note repeated here:\n",
"\n",
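The notebook text in this diff concludes that `-c` searches the complementary strand in addition to the strand in the dataset. A minimal standard-library sketch of that idea follows. It is not PatMatch's implementation: the function names `reverse_complement` and `find_on_both_strands` are hypothetical, a Python regular expression stands in for PatMatch's own pattern syntax, and only unambiguous A/C/G/T bases are handled.

```python
import re

# Complement table for unambiguous bases only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_both_strands(seq, pattern):
    """Search the given strand and its reverse complement for `pattern`.

    Returns (strand, start, end, matched_text) tuples. Coordinates are
    0-based, half-open, and always expressed on the given (forward) strand.
    """
    hits = []
    # Strand as it appears in the dataset (what a forward-only search sees).
    for m in re.finditer(pattern, seq):
        hits.append(("+", m.start(), m.end(), m.group()))
    # Complementary strand: search the reverse complement, then map the
    # match coordinates back onto the forward strand.
    n = len(seq)
    for m in re.finditer(pattern, reverse_complement(seq)):
        hits.append(("-", n - m.end(), n - m.start(), m.group()))
    return hits


if __name__ == "__main__":
    # "ATG" is absent from the forward strand but present on the other one.
    print(find_on_both_strands("CATAAA", "ATG"))  # [('-', 0, 3, 'ATG')]
```

In the notebook itself the reverse complement is produced with Biopython's `SeqRecord.reverse_complement(id=True, description=True)` and written to a FASTA file; the string translation here only stands in for that step so the sketch runs without extra packages.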
