# PatMatch Use on Custom Sequences and Integrating with Python 

The [previous notebook](PatMatch initial demo and introduction.ipynb) used prepared data supplied by the software authors in a `test` directory. The web-based PatMatch offerings listed [here](https://github.com/fomightez/patmatch-binder/#usage) lock you into matching patterns against specific sequences. With the stand-alone, command line-based PatMatch, you can run pattern matching on any sequence you'd like. Here, we will start at square one and work through such an example.

This notebook will retrieve example sequence data from an external source, prepare it, and then analyze it. Subsequent steps will parse the resulting data into something useful in Python and even cover how to export it to Excel.

### Preparing to use PatMatch software on raw sequence data

First, an example nucleic acid sequence is retrieved. According to the [USAGE](PatMatch initial demo and introduction.ipynb#Usage) information, PatMatch works with FASTA-formatted sequence files.

Click on the cell below and press `Shift-Enter`, or press `Run` on the toolbar above, to get an example file.  
(In Jupyter notebooks running in the Python kernel as indicated in the upper right of this notebook, commands to the shell are prefaced with exclamation points. You'll notice when we switch to dealing with Python directly, this will not be needed.)

In [1]:
!curl -O https://downloads.yeastgenome.org/sequence/S288C_reference/chromosomes/fasta/chrmt.fsa



<font color="#999">(Alternative ways to import data using the Jupyter environment's graphical user interface will be covered below.)</font>

Check the file listing by executing the next cell to confirm that the FASTA-formatted file has been retrieved. Work through the following cells similarly.

In [2]:
!ls

chrmt.fsa
PatMatch initial demo and introduction.ipynb
PatMatch nucleic handling flags demystified.ipynb
PatMatch with Python.ipynb


Further following the PatMatch [USAGE](PatMatch initial demo and introduction.ipynb#Usage) information, sequences should be processed so that each sequence is on a single line for handling by PatMatch. The PatMatch authors have provided a utility script for this preparation step. The following cell runs it on the example data.

In [3]:
!perl ../patmatch_1.2/unjustify_fasta.pl chrmt.fsa

That will produce a file with `.prepared` appended to the end of the supplied file name. 
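For readers curious about what this preparation step amounts to, here is a rough Python sketch. It assumes that `unjustify_fasta.pl` simply joins each record's wrapped sequence lines into a single line and writes the result to a file ending in `.prepared`; the function names `unjustify_fasta` and `prepare_file` are hypothetical, and the authors' Perl script remains the tool to use with PatMatch.

```python
# Sketch only. ASSUMPTION: this mirrors what unjustify_fasta.pl appears to do
# (one line per sequence); it is not the authors' script.

def unjustify_fasta(fasta_text):
    """Return FASTA text with each record's sequence on a single line."""
    out_lines = []
    seq_parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # flush the sequence collected for the previous record
            if seq_parts:
                out_lines.append("".join(seq_parts))
                seq_parts = []
            out_lines.append(line)  # keep the header line as-is
        else:
            seq_parts.append(line)
    if seq_parts:
        out_lines.append("".join(seq_parts))
    return "\n".join(out_lines) + "\n"


def prepare_file(in_path):
    """Write `<in_path>.prepared`, mirroring the naming the Perl script uses."""
    with open(in_path) as fh:
        prepared = unjustify_fasta(fh.read())
    out_path = in_path + ".prepared"
    with open(out_path, "w") as fh:
        fh.write(prepared)
    return out_path


example = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"
print(unjustify_fasta(example))
```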

Check that the file was produced by viewing the file listing again with `!ls`.

In [4]:
!ls

chrmt.fsa
chrmt.fsa.prepared
PatMatch initial demo and introduction.ipynb
PatMatch nucleic handling flags demystified.ipynb
PatMatch with Python.ipynb


Having verified the prepared data file exists, you are ready to run the program to search for a pattern.

### Running PatMatch

The PatMatch [USAGE](PatMatch initial demo and introduction.ipynb#Usage) information says `-n` is for nucleotide pattern match and `-c` is for complementary strand; however, [based on my tests](PatMatch nucleic handling flags demystified.ipynb) it seems that `-c` means it is for the complementary strand **in addition to** the strand in the dataset. 

**<font color="red">Therefore, if you want the pattern search to be performed on BOTH strands of the supplied sequence, as is the default of the web-based PatMatch tools, you actually want to use the `-c` flag when authoring the command.</font>**

If you are curious about this aspect further, I demonstrate it [here](PatMatch nucleic handling flags demystified.ipynb) and, in the course of that, cover how to replicate the three strand options typically offered by the web-based PatMatch tools. Feel free to examine and run that notebook, or simply use the `-c` flag if you want to scan both strands.
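To make concrete what searching both strands means, the following is a minimal sketch, not PatMatch's own implementation. It translates an IUPAC nucleotide pattern such as `DDWDWTAWAAGTARTADDDD` into a regular expression, searches the supplied sequence and its reverse complement, and reports minus-strand hits with the larger coordinate first, imitating the intervals PatMatch reports with `-c`. The helper names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(pattern):
    """Translate an IUPAC nucleotide pattern into a regular expression."""
    return "".join(IUPAC[base] for base in pattern.upper())


def search_both_strands(pattern, seq):
    """Return (first, second, matched_text) tuples in 1-based coordinates.

    Plus-strand hits are (low, high); minus-strand hits are (high, low).
    Sketch only: exact matches, no mismatches.
    """
    seq = seq.upper()
    # the lookahead makes overlapping hits visible to finditer
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    n = len(seq)
    hits = []
    for m in regex.finditer(seq):
        s = m.start()
        hits.append((s + 1, s + len(m.group(1)), m.group(1)))
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    for m in regex.finditer(rev_comp):
        s = m.start()
        length = len(m.group(1))
        # map reverse-complement positions back onto the supplied strand
        high = n - s
        low = high - length + 1
        hits.append((high, low, m.group(1)))
    return hits


print(search_both_strands("GAATTC", "GAATTCAAA"))
```

Because `GAATTC` is its own reverse complement, it is reported once on each strand over the same stretch of sequence.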


In [5]:
# !perl ../patmatch_1.2/patmatch.pl -n "DDWDWTAWAAGTARTADDDD" chrmt.fsa.prepared #dataset strand only
!perl ../patmatch_1.2/patmatch.pl -c "DDWDWTAWAAGTARTADDDD" chrmt.fsa.prepared

>ref|NC_001224|:[54852,54833]
AGATATATAAGTAATAGGGG 
>ref|NC_001224|:[78310,78291]
ATTTTTATAAGTAGTATATT 
>ref|NC_001224|:[6458,6477]
TATTATATAAGTAATAAATA 
>ref|NC_001224|:[13345,13364]
ATTGATATAAGTAATAGATA 
>ref|NC_001224|:[32205,32224]
AAATATATAAGTAATAAATT 
>ref|NC_001224|:[34874,34893]
TATTATATAAGTAATATATA 
>ref|NC_001224|:[46067,46086]
ATTAATATAAGTAATATATA 
>ref|NC_001224|:[57996,58015]
ATATATATAAGTAGTAAAAA 
>ref|NC_001224|:[65380,65399]
TGTTATATAAGTAATAATAT 
>ref|NC_001224|:[69701,69720]
TTTATTATAAGTAATAATAT 
>ref|NC_001224|:[73692,73711]
TTAATTAAAAGTAGTATTAA 
>ref|NC_001224|:[77394,77413]
TATATTATAAGTAATAATAA 
>ref|NC_001224|:[82317,82336]
AAATATATAAGTAATAGGGG 
>ref|NC_001224|:[84996,85015]
GATTTTATAAGTAATATAAT 


In [6]:
!ls

chrmt.fsa
chrmt.fsa.prepared
PatMatch initial demo and introduction.ipynb
PatMatch nucleic handling flags demystified.ipynb
PatMatch with Python.ipynb


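That covers the basics of running PatMatch. There are also options you can add to control the number of mismatches allowed and whether insertions, deletions, or substitutions count toward those mismatches. Example (here, the trailing `1 ids` allows up to one mismatch, which may be an insertion, deletion, or substitution):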

In [7]:
!perl ../patmatch_1.2/patmatch.pl -c "DDWDWTAWAAGTARTADDDD" chrmt.fsa.prepared 1 ids

>ref|NC_001224|:[175,157]
AATGATAAAATAATAAATA 
>ref|NC_001224|:[713,695]
TATAATAAAATAATAAAAA 
>ref|NC_001224|:[970,952]
ATAATTATAATAATAATAA 
>ref|NC_001224|:[994,976]
ATAATTATAATAATAATTA 
>ref|NC_001224|:[1018,1000]
TTAATTATAATAATAATTA 
>ref|NC_001224|:[1195,1177]
TTTTATAAAAGAATATATA 
>ref|NC_001224|:[1478,1459]
GAAATTAAAAATAATAATAA 
>ref|NC_001224|:[2929,2910]
TTATTTATAATTAATAATTT 
>ref|NC_001224|:[3170,3151]
TTATATAAAAATAATATTAA 
>ref|NC_001224|:[3219,3200]
TTATATAAAAATAATATTAA 
>ref|NC_001224|:[3482,3463]
ATTATTAAAAATAATAATAT 
>ref|NC_001224|:[3821,3803]
AAAAAATAAGTAATAGATT 
>ref|NC_001224|:[4004,3986]
AAAATTAAAATAATAATTA 
>ref|NC_001224|:[4098,4080]
TTTAATAAAATAATAAATG 
>ref|NC_001224|:[4119,4100]
TAAATTAAAAATAATAATAA 
>ref|NC_001224|:[4224,4206]
TATATTATAAGAATATAAT 
>ref|NC_001224|:[5880,5861]
ATAATTATAAATAATAAATT 
>ref|NC_001224|:[5923,5904]
TAAAATAAAAATAATAATAA 
>ref|NC_001224|:[6270,6251]
AAAAATATAAATAATATTAA 
>ref|NC_001224|:[6538,6519]
...

>ref|NC_001224|:[48,67]
TATTATAAAAATAATATTTA 
>ref|NC_001224|:[289,308]
ATAATTATAAATAATATAAA 
>ref|NC_001224|:[1586,1604]
TTATATATAATAATATTAT 
>ref|NC_001224|:[1771,1790]
AAATATATAAATAATATAAT 
>ref|NC_001224|:[1798,1816]
AAAAATATAATAATAATAA 
>ref|NC_001224|:[1959,1977]
TAATATAAAATAATAATTA 
>ref|NC_001224|:[2171,2190]
ATTATTAAAAATAATAAAAA 
>ref|NC_001224|:[2199,2219]
TTTAATAAGAAGTAATATTTA 
>ref|NC_001224|:[2922,2941]
ATAAATAAAAATAATAATTT 
>ref|NC_001224|:[3023,3042]
AGTTTTAAAAGTGATAATAT 
>ref|NC_001224|:[4444,4462]
TTATATATAATAATAATAT 
>ref|NC_001224|:[4606,4624]
TATAATATAATAATAATAT 
>ref|NC_001224|:[5031,5050]
TTTAATAAAAATAATAATAT 
>ref|NC_001224|:[5084,5102]
TTATATAAAATAATAATAA 
>ref|NC_001224|:[5346,5364]
TTAAATATAATAATAATTA 
>ref|NC_001224|:[6042,6060]
ATTTATATAATAATAATAT 
>ref|NC_001224|:[6458,6477]
TATTATATAAGTAATAAATA 
>ref|NC_001224|:[6482,6500]
TTTTATATAATAATAATAA 
>ref|NC_001224|:[6551,6569]
AAATTTATAAGAATATGAT 
>ref|NC_001224|:[6996,7014]
...

See the [USAGE](PatMatch initial demo and introduction.ipynb#Usage) for more information about those options.
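The mismatch options are easiest to picture for substitutions alone. The sketch below counts positions where a candidate sequence falls outside the bases allowed by each IUPAC code, then scans plus-strand windows of the pattern's length. It deliberately ignores insertions and deletions, which PatMatch can also allow, so treat it as an illustration of the idea rather than a reimplementation; the function names are hypothetical.

```python
# IUPAC codes as the set of bases each one allows (sketch helper).
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def substitution_mismatches(pattern, candidate):
    """Count positions where `candidate` is outside the IUPAC code in `pattern`.

    Only substitutions are counted; insertions and deletions are not modelled.
    """
    if len(pattern) != len(candidate):
        raise ValueError("pattern and candidate must be the same length")
    return sum(
        1 for code, base in zip(pattern.upper(), candidate.upper())
        if base not in IUPAC_SETS[code]
    )


def windows_within(pattern, seq, max_mismatches):
    """Return (1-based start, window) for plus-strand windows within the limit."""
    k = len(pattern)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if substitution_mismatches(pattern, window) <= max_mismatches:
            hits.append((i + 1, window))
    return hits


print(windows_within("GAATTC", "GAATTA", 1))
```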

With the basics in hand and the power of the command line, searches of more sequences, or of more sequences with more patterns, become possible. However, you'll quickly encounter problems handling all those results. As a simple example, we'll use the pattern-matching search developed above to illustrate integrating with Python for more efficient handling of the results, and to touch upon the advantages offered by combining PatMatch with a scripting language.
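One way to scale up is to drive PatMatch from Python rather than typing each command. The sketch below assumes the PatMatch path used in this notebook and uses the standard library's `subprocess.run` to run one `-c` search per pattern, writing each set of results to its own file. The output-file naming scheme and the function names are hypothetical, and any trailing mismatch arguments (such as `1 ids`) are simply passed through as given.

```python
import subprocess

PATMATCH = "../patmatch_1.2/patmatch.pl"  # path used throughout this notebook


def build_patmatch_cmd(pattern, seq_file, extra_args=()):
    """Return the argument list for one PatMatch search on both strands (`-c`)."""
    # extra_args are appended verbatim, e.g. ("1", "ids") as in the example above
    return ["perl", PATMATCH, "-c", pattern, seq_file] + list(extra_args)


def run_patterns(patterns, seq_file, extra_args=()):
    """Run PatMatch once per pattern; return the names of the result files."""
    out_files = []
    for i, pattern in enumerate(patterns, start=1):
        out_name = "patmatch_results_{}.out".format(i)  # hypothetical naming scheme
        with open(out_name, "w") as fh:
            # stdout goes straight to the file, like redirecting with `>` in the shell
            subprocess.run(build_patmatch_cmd(pattern, seq_file, extra_args),
                           stdout=fh, check=True)
        out_files.append(out_name)
    return out_files


# Show the command that would be run, without running it
print(build_patmatch_cmd("DDWDWTAWAAGTARTADDDD", "chrmt.fsa.prepared", ("1", "ids")))
```

Passing the arguments as a list, rather than one shell string, avoids having to quote patterns by hand.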

## Importing PatMatch Results into a Pandas Dataframe and Exporting to Excel

Now that you have seen what PatMatch returns as results, you'll probably note that while the output is easy for a human to read, it isn't very computer friendly. Indeed, if you have used the web-based PatMatch offerings, you'll note that they return the results in a more useful table form.

Other ways to add data to the running Binder are available through the Jupyter file browser dashboard, which offers an upload button for transferring files from your own computer.

In [8]:
# from https://stackoverflow.com/a/42703609/8508004
# Capture the shell command's output as a Python object instead of only displaying it
output = !perl ../patmatch_1.2/patmatch.pl -c "DDWDWTAWAAGTARTADDDD" chrmt.fsa.prepared 1 ids
# `output` is an IPython `SList`; see http://ipython.readthedocs.io/en/stable/api/generated/IPython.utils.text.html#IPython.utils.text.SList
print(type(output))
# `.n` joins the captured lines back together, separated by newlines
print(output.n)

<class 'IPython.utils.text.SList'>
>ref|NC_001224|:[175,157]
AATGATAAAATAATAAATA 
>ref|NC_001224|:[713,695]
TATAATAAAATAATAAAAA 
>ref|NC_001224|:[970,952]
ATAATTATAATAATAATAA 
>ref|NC_001224|:[994,976]
ATAATTATAATAATAATTA 
>ref|NC_001224|:[1018,1000]
TTAATTATAATAATAATTA 
>ref|NC_001224|:[1195,1177]
TTTTATAAAAGAATATATA 
>ref|NC_001224|:[1478,1459]
GAAATTAAAAATAATAATAA 
>ref|NC_001224|:[2929,2910]
TTATTTATAATTAATAATTT 
>ref|NC_001224|:[3170,3151]
TTATATAAAAATAATATTAA 
>ref|NC_001224|:[3219,3200]
TTATATAAAAATAATATTAA 
>ref|NC_001224|:[3482,3463]
ATTATTAAAAATAATAATAT 
>ref|NC_001224|:[3821,3803]
AAAAAATAAGTAATAGATT 
>ref|NC_001224|:[4004,3986]
AAAATTAAAATAATAATTA 
>ref|NC_001224|:[4098,4080]
TTTAATAAAATAATAAATG 
>ref|NC_001224|:[4119,4100]
TAAATTAAAAATAATAATAA 
>ref|NC_001224|:[4224,4206]
TATATTATAAGAATATAAT 
>ref|NC_001224|:[5880,5861]
ATAATTATAAATAATAAATT 
>ref|NC_001224|:[5923,5904]
TAAAATAAAAATAATAATAA 
>ref|NC_001224|:[6270,6251]
AAAAATATAAATAATATTAA 
>ref|NC_001224|:[6538,6519]
...
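Because the captured `output` is already a list of lines, it could also be parsed directly without writing a file. The sketch below is an alternative to the script further below, not part of it: it assumes each header has the form `>ID:[first,second]` followed by one sequence line, as in the output above, and it records the sequence identifier from the header rather than numbering the hits. The function name `parse_patmatch_lines` is hypothetical.

```python
import pandas as pd


def parse_patmatch_lines(lines, pattern="?"):
    """Parse PatMatch output lines into a dataframe (sketch).

    ASSUMPTION: headers look like '>ID:[first,second]' and are followed by a
    single sequence line; a larger first coordinate marks a minus-strand hit.
    """
    records = []
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            continue
        if header is None:
            continue  # a sequence line without a header; skip it
        sequence_id, _, coords = header.rpartition(":")
        # convert to int so the strand comparison is numeric
        first, second = (int(x) for x in coords.strip("[]").split(",")[:2])
        records.append({
            "sequence_id": sequence_id,
            "start": min(first, second),
            "end": max(first, second),
            "strand": -1 if first > second else 1,
            "matching pattern": line,
            "query pattern": pattern,
        })
        header = None
    return pd.DataFrame(records)
```

In the notebook, this would be called as `parse_patmatch_lines(output, pattern="DDWDWTAWAAGTARTADDDD")`.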

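The next cell saves the results of the original search (run without allowing mismatches) to a file named `test.out` so that they can be parsed.

In [13]: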
!perl ../patmatch_1.2/patmatch.pl -c "DDWDWTAWAAGTARTADDDD" chrmt.fsa.prepared > test.out

In [5]:
#!/usr/bin/env python
# patmatch_results_to_df.py
__author__ = "Wayne Decatur" #fomightez on GitHub
__license__ = "MIT"
__version__ = "0.1.0"


# patmatch_results_to_df.py by Wayne Decatur
# ver 0.1
#
#*******************************************************************************
# Verified compatible with both Python 2.7 and Python 3.6; written initially in 
# Python 3. (See below.)
#
#
# PURPOSE: Takes output from command line-based PatMatch and brings it into 
# Python as a dataframe and saves a file of that dataframe for use elsewhere. 
# Optionally, it can also return that dataframe for use inside a Jupyter 
# notebook.
#
# This script is meant to be a utility script for working with command 
# line-based PatMatch and Python, see a demonstration of use in
# https://github.com/fomightez/patmatch-binder/blob/master/notebooks/PatMatch%20with%20Python.ipynb
# 
# Assumes for nucleic acid patterns, it was run with `-c` flag and tries to 
# assign strand information.
# 
# See https://github.com/fomightez/patmatch-binder about PatMatch.
#
# Written to run from command line or pasted/loaded inside a Jupyter notebook 
# cell. 
#
#
#
#
#
#
# Verified compatible with both Python 2.7 and Python 3.6; written initially in 
# Python 3. 
#
#
# Dependencies beyond the mostly standard libraries/modules:
#
#
#
# VERSION HISTORY:
# v.0.1. basic working version

#
# To do:
# - verify compatible with 2.7 (use output from patmatch-binder) OTHERWISE, FIX TWO NOTES ABOVE ABOUT 2.7
# - add way to bring in pattern searched
# - add in way to signal nucleic or protein because with protein data won't need
#   strand handling
# - incorporate in demo notebook in patmatch-binder; also(?) in that demo binder 
#   show how to bring into Python the web-based PatMatch data from the xls file?
#
#
#
#
# TO RUN:
# Examples,
# Enter on the command line of your terminal, the line
#-----------------------------------
# python patmatch_results_to_df.py RESULTS_FILE
#-----------------------------------
# Issue `patmatch_results_to_df.py -h` for details.
# 
#
# When using in a notebook, supply the name of the PatMatch results file, and
# optionally the pattern that was searched, in the call to the main function.
# To use this after pasting or loading into a cell in a Jupyter notebook,
# call the main function in the next cell similar to below:
# pattern= "DDWDWTAWAAGTARTADDDD"
# df = patmatch_results_to_df("test.out", pattern=pattern)
# df
#
#
#
#
# 
#
'''
CURRENT ACTUAL CODE FOR RUNNING/TESTING IN A NOTEBOOK WHEN LOADED OR PASTED IN 
ANOTHER CELL:
pattern= "DDWDWTAWAAGTARTADDDD"
df = patmatch_results_to_df("test.out", pattern=pattern)
df
'''
#
#
#*******************************************************************************
#





#*******************************************************************************
##################################
#  USER ADJUSTABLE VALUES        #

##################################
#

## Settings and options for output
df_save_as_name = 'patmatch_pickled_df.pkl' # name for saving pickled dataframe

#
#*******************************************************************************
#**********************END USER ADJUSTABLE VARIABLES****************************


















#*******************************************************************************
#*******************************************************************************
###DO NOT EDIT BELOW HERE - ENTER VALUES ABOVE###

import sys
import os
import pandas as pd




#*******************************************************************************
###------------------------'main' function of script---------------------------##

def patmatch_results_to_df(
    results_file, pattern="?", return_df = True, pickle_df=True):
    '''
    Main function of script. 
    It will take a file of results from command line-based PatMatch and make
    a dataframe that will be more useful with Python/other genetics-oriented 
    scripts.
    Optionally also returns a dataframe of the results data. Meant for use in 
    a Jupyter notebook.
    '''
    # Bring in the necessary data:
    #---------------------------------------------------------------------------

    with open(results_file, 'r') as the_file:
        results = the_file.read()

    # feedback
    sys.stderr.write("Provided results read...")


    # Parse:
    #---------------------------------------------------------------------------
    results = results.split('>')
    # remove blanks
    results = [x for x in results if x]

    # prepare to give some unique identifiers to each match
    identifiers=[]
    if pattern == "?":
        id_prefix = "pattern-"
    elif len(pattern) > 29:
        id_prefix = pattern[:25] + "...-"
    else:
        id_prefix = pattern + "-"

    matching_patterns = []
    starts = []
    ends = []
    strand_info = []
    for indx,each in enumerate(results):
        each_part = each.split()
        first_line, matching_pattern = each_part[0].strip(),each_part[1].strip()
        # Because I wanted to include `.strip()`, it seemed I couldn't do last
        # two lines all as `first_line, matching_pattern= each.split()[:2]`

        # Parse numbers between the brackets in first line(first, I will split
        # on the ':' just in case the extracted first line has `[]` in first 
        # part)
        second_half = first_line.split(":")[1]
        nums_str = second_half[second_half.find("[")+1:second_half.find("]")]
        nums = nums_str.split(",")[:2]
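        # convert to integers so the comparisons below are numeric, not
        # lexicographic (as strings, "998" > "1017" would be True)
        start, end = int(nums[0]), int(nums[1])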
        # Determine strand. 
        # I noticed when `-c` flag is used the first number in the interval 
        # returned to indicate the location will be larger than the second for 
        # those on the negative strand. 
        if start > end:
            strand = -1
            # fix start and end so start is actually lowest value to be 
            # consistent with system I have been using of late (like Ensembl)
            start, end = end,start

        else:
            strand = 1
        assert start < end ,"The 'start' value should be lower; strand \
        information is handled by `strand` property."
        identifiers.append(id_prefix+str(indx+1)) #`+1` so numbering more 
        # typical than Python zero-indexing
        matching_patterns.append(matching_pattern)
        starts.append(start)
        ends.append(end)
        strand_info.append(strand)

    # Make collected results into dataframe and improve on it
    #---------------------------------------------------------------------------
    df = pd.DataFrame(list(zip(
        identifiers, starts, ends,strand_info, matching_patterns)),
        columns=['seq_id', 'start','end','strand','matching pattern'])
    # add query pattern as a column
    df['query pattern'] = pattern

    # better re-order the columns(?)


    #print(df) # originally for debugging during development
    # Document the full set of data collected in the terminal or 
    # Jupyter notebook display in some manner. 
    # Using `df.to_string()` because more universal than `print(df)` 
    # or Jupyter's `display(df)`.
    sys.stderr.write( "\nFor documenting purposes, the following lists the "
        "parsed data:\n")
    #with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    #    display(df)
    sys.stderr.write(df.to_string())

    # Handle pickling the parsed results dataframe
    if not pickle_df:
        sys.stderr.write("\n\nA dataframe of the parsed data shown above "
        "was not stored for use\nelsewhere "
        "because `no_pickling` was specified in place of the output file name.")
    else:
        df.to_pickle(df_save_as_name )
        # Let user know
        sys.stderr.write( "\n\nA dataframe of the parsed data shown above "
        "has been\nsaved as a file in a manner where other "
        "Python programs\ncan access it (pickled form).\n"
        "RESULTING DATAFRAME is stored as ==> '{}'".format(df_save_as_name ))

    # optionally, return df
    if return_df:
        sys.stderr.write( "\n\nReturning a dataframe with the information "
                "as well.")
        return df

###--------------------------END OF MAIN FUNCTION----------------------------###
###--------------------------END OF MAIN FUNCTION----------------------------###












#*******************************************************************************
###------------------------'main' section of script---------------------------##

def main():
    """ Main entry point of the script """
    # placing actual main action in a 'helper'script so can call that easily 
    # with a distinguishing name in Jupyter notebooks, where `main()` may get
    # assigned multiple times depending how many scripts imported/pasted in.
    kwargs = {}
    if args.pattern:
        kwargs['pattern'] = args.pattern
    else:
        kwargs['pattern'] = '?'
    if df_save_as_name == 'no_pickling':
        kwargs['pickle_df'] = False
    kwargs['return_df'] = False #probably don't want dataframe returned if 
    # calling script from command line
    patmatch_results_to_df(results_file,**kwargs)
    # using https://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/#calling-a-function
    # to build keyword arguments to pass to the function above
    # (see https://stackoverflow.com/a/28986876/8508004 and
    # https://stackoverflow.com/a/1496355/8508004 
    # (maybe https://stackoverflow.com/a/7437238/8508004 might help too) for 
    # related help)





if __name__ == "__main__" and '__file__' in globals():
    """ This is executed when run from the command line """
    # Code with just `if __name__ == "__main__":` alone will be run if pasted
    # into a notebook. The addition of ` and '__file__' in globals()` is based
    # on https://stackoverflow.com/a/22923872/8508004
    # See also https://stackoverflow.com/a/22424821/8508004 for an option to 
    # provide arguments when prototyping a full script in the notebook.
    ###-----------------for parsing command line arguments-----------------------###
    import argparse
    parser = argparse.ArgumentParser(prog='patmatch_results_to_df.py',
        description="patmatch_results_to_df.py \
        Takes output from command line-based PatMatch and brings it into \
        Python as a dataframe and saves a file of that dataframe for use \
        elsewhere. Optionally, it can also return that dataframe for use \
        inside a Jupyter notebook. Meant to be a utility script for working \
        with command line-based PatMatch and Python.\
        Assumes for nucleic acid patterns, it was run with `-c` flag and tries \
        to assign strand. \
        **** Script by Wayne Decatur   \
        (fomightez @ github) ***")

    parser.add_argument("results_file", help="Name of file of PatMatch results \
        file to parse.\
        ", metavar="RESULTS_FILE")
    parser.add_argument('-patt', '--pattern', action='store', type=str, 
        help="**OPTIONAL** Pattern used to perform the pattern matching search \
        that generated the results. The resulting dataframe will be more \
        informative if one is provided; however, it is not essential. To \
        provide the pattern simply enter the text after the flag. For example, \
        if the search had been for an EcoRI site, include `--pattern GAATTC`, \
        without quotes or ticks, in the call to the script.")
    parser.add_argument("-p", "--protein_results",help=
    "add this flag to indicate the data are from a pattern match of protein \
    sequences. Otherwise it is assumed the results are from pattern matching on \
    nucleic acid sequences.", action="store_true")
    parser.add_argument('-dfo', '--df_output', action='store', type=str, 
    default= df_save_as_name, help="OPTIONAL: Set file name for saving pickled \
    dataframe. If none provided, '{}' will be used. To force no dataframe to \
    be saved, enter `-dfo no_pickling` without quotes as output file \
    (ATYPICAL).".format(df_save_as_name))




    # I would also like help to display if no arguments are provided, because
    # at least one (the results file) is needed
    if len(sys.argv)==1:    #from http://stackoverflow.com/questions/4042452/display-help-message-with-python-argparse-when-script-is-called-without-any-argu
        parser.print_help()
        sys.exit(1)
    args = parser.parse_args()
    results_file= args.results_file
    df_save_as_name = args.df_output


    main()

#*******************************************************************************
###-***********************END MAIN PORTION OF SCRIPT***********************-###
#*******************************************************************************
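One detail worth keeping in mind with the strand logic above: the coordinates pulled out of each header are strings until they are converted. Comparing them as strings is lexicographic, which can flip the strand call for hits whose coordinates differ in number of digits, as this small example with made-up coordinates shows. The `strand_from_coords` helper is hypothetical.

```python
def strand_from_coords(first, second):
    """Return -1 if the first coordinate is larger (minus strand), else 1."""
    return -1 if int(first) > int(second) else 1


# Made-up coordinates for a plus-strand hit spanning position 1,000
first, second = "998", "1017"
print(first > second)                     # True: strings compare character by character
print(strand_from_coords(first, second))  # 1: numeric comparison gives the plus strand
```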




In [8]:
pattern= "DDWDWTAWAAGTARTADDDD"
df = patmatch_results_to_df("test.out", pattern=pattern)
df

Provided results read...
For documenting purposes, the following lists the parsed data:
                     seq_id  start    end  strand      matching pattern         query pattern
0    DDWDWTAWAAGTARTADDDD-1  54833  54852      -1  AGATATATAAGTAATAGGGG  DDWDWTAWAAGTARTADDDD
1    DDWDWTAWAAGTARTADDDD-2  78291  78310      -1  ATTTTTATAAGTAGTATATT  DDWDWTAWAAGTARTADDDD
2    DDWDWTAWAAGTARTADDDD-3   6458   6477       1  TATTATATAAGTAATAAATA  DDWDWTAWAAGTARTADDDD
3    DDWDWTAWAAGTARTADDDD-4  13345  13364       1  ATTGATATAAGTAATAGATA  DDWDWTAWAAGTARTADDDD
4    DDWDWTAWAAGTARTADDDD-5  32205  32224       1  AAATATATAAGTAATAAATT  DDWDWTAWAAGTARTADDDD
5    DDWDWTAWAAGTARTADDDD-6  34874  34893       1  TATTATATAAGTAATATATA  DDWDWTAWAAGTARTADDDD
6    DDWDWTAWAAGTARTADDDD-7  46067  46086       1  ATTAATATAAGTAATATATA  DDWDWTAWAAGTARTADDDD
7    DDWDWTAWAAGTARTADDDD-8  57996  58015       1  ATATATATAAGTAGTAAAAA  DDWDWTAWAAGTARTADDDD
...

| | seq_id | start | end | strand | matching pattern | query pattern |
|---|---|---|---|---|---|---|
| 0 | DDWDWTAWAAGTARTADDDD-1 | 54833 | 54852 | -1 | AGATATATAAGTAATAGGGG | DDWDWTAWAAGTARTADDDD |
| 1 | DDWDWTAWAAGTARTADDDD-2 | 78291 | 78310 | -1 | ATTTTTATAAGTAGTATATT | DDWDWTAWAAGTARTADDDD |
| 2 | DDWDWTAWAAGTARTADDDD-3 | 6458 | 6477 | 1 | TATTATATAAGTAATAAATA | DDWDWTAWAAGTARTADDDD |
| 3 | DDWDWTAWAAGTARTADDDD-4 | 13345 | 13364 | 1 | ATTGATATAAGTAATAGATA | DDWDWTAWAAGTARTADDDD |
| 4 | DDWDWTAWAAGTARTADDDD-5 | 32205 | 32224 | 1 | AAATATATAAGTAATAAATT | DDWDWTAWAAGTARTADDDD |
| 5 | DDWDWTAWAAGTARTADDDD-6 | 34874 | 34893 | 1 | TATTATATAAGTAATATATA | DDWDWTAWAAGTARTADDDD |
| 6 | DDWDWTAWAAGTARTADDDD-7 | 46067 | 46086 | 1 | ATTAATATAAGTAATATATA | DDWDWTAWAAGTARTADDDD |
| 7 | DDWDWTAWAAGTARTADDDD-8 | 57996 | 58015 | 1 | ATATATATAAGTAGTAAAAA | DDWDWTAWAAGTARTADDDD |
| 8 | DDWDWTAWAAGTARTADDDD-9 | 65380 | 65399 | 1 | TGTTATATAAGTAATAATAT | DDWDWTAWAAGTARTADDDD |
| 9 | DDWDWTAWAAGTARTADDDD-10 | 69701 | 69720 | 1 | TTTATTATAAGTAATAATAT | DDWDWTAWAAGTARTADDDD |
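The section heading promises Excel output. pandas can write a dataframe like `df` to an Excel file with `DataFrame.to_excel`, which relies on an optional engine such as `openpyxl` being installed. The sketch below, with hypothetical file and function names, falls back to a CSV file (which Excel also opens) if no engine is available.

```python
import pandas as pd


def export_results(df, excel_name="patmatch_results.xlsx",
                   csv_name="patmatch_results.csv"):
    """Write `df` to Excel if an engine is installed, otherwise to CSV.

    Returns the name of the file that was written.
    """
    try:
        # `to_excel` needs an optional engine (such as openpyxl) for .xlsx files
        df.to_excel(excel_name, index=False)
        return excel_name
    except ImportError:
        # CSV needs no extra packages and also opens in Excel
        df.to_csv(csv_name, index=False)
        return csv_name
```

For example, call `export_results(df)` after running the cell above. In a later session, the pickled copy saved by the script can be reloaded first with `pd.read_pickle('patmatch_pickled_df.pkl')`.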


In [24]:
# %load https://gist.githubusercontent.com/fomightez/b012e51ebef6ec58c1515df3ee0c850a/raw/300da6c67ceeaf5384a3e500648b993345c361cb/run_every_eight_mins.py
import time

def executeSomething():
    #code here
    print ('.')
    time.sleep(480) #60 seconds times 8 minutes

while True:
    executeSomething()

.


KeyboardInterrupt: 
