This repository has been archived by the owner on Mar 14, 2023. It is now read-only.

Palindrome tree - tool for analyzing inverted repeats in various DNA sequences using decision trees

patrikkaura/palindrome-tree

Palindrome tree



Palindrome tree is a tool for analyzing inverted repeats in DNA sequences using decision trees. It takes the provided sequences and uses a decision tree to find regions with a high probability of containing a palindrome, which filters out a large portion of the data. The remaining regions are then analyzed using the API from Palindrome Analyzer. DNA Analyser is a web-based server for nucleotide sequence analysis, developed in cooperation between the Department of Informatics, Mendel University in Brno, and the Institute of Biophysics, Academy of Sciences of the Czech Republic.

Requirements

Palindrome tree was built with Python 3.7+.

Installation

To install palindrome tree, use the PyPI repository.

pip install palindrome-tree

Usage

First, create a PalindromeTree analyzer instance. The class is imported from the main package, palindrome_tree.

from palindrome_tree import PalindromeTree

tree = PalindromeTree()

Predict regions (without API validation)

To predict regions with possible palindromes, run analyse without setting the validate_with_api parameter.

from palindrome_tree import PalindromeTree

# Read the whole sequence; the with block closes the file afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

tree = PalindromeTree()

tree.analyse(
    sequence=sequence,
)

tree.results

The results are then stored in the results attribute as a pd.DataFrame.

   position  sequence
0  8         TTTGTAGAGACAGGGTCTTGCTGTGTTTCC
1  10        TGTAGAGACAGGGTCTTGCTGTGTTTCCCA
2  49        CGAACTCCTGGCCTCTAGGCAATCCTCCCA
3  102       ATCCCACTCTTTTTTGAAAAATAAAATCTA
4  105       CCACTCTTTTTTGAAAAATAAAATCTACCA
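
Each row in the table above is a 30-base region identified by its start position, and neighbouring rows overlap (rows 0 and 1 start two bases apart). The following is a minimal sketch of how such fixed-length windows could be enumerated. It is not the package's actual implementation: the helper name iter_windows, the window length of 30 (inferred from the example output), and the 0-based start positions are all assumptions made for illustration.

```python
from typing import Iterator, Tuple

# Assumption: 30 is inferred from the length of the sequences in the example table.
WINDOW_LENGTH = 30


def iter_windows(sequence: str, length: int = WINDOW_LENGTH) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every full-length window, with 0-based starts."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


# Rows 0 and 1 of the table joined into one 32-base fragment.
fragment = "TTTGTAGAGACAGGGTCTTGCTGTGTTTCCCA"
windows = dict(iter_windows(fragment))

print(windows[0])  # the region reported in row 0
print(windows[2])  # the region reported in row 1, two bases further on
```

In a filtering workflow like the one described, a classifier would score each such window and only the promising ones would be kept for further analysis.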

Predict regions (with API validation)

To predict regions with possible palindromes and then validate them, run analyse with the validate_with_api parameter set.

from palindrome_tree import PalindromeTree

# Read the whole sequence; the with block closes the file afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

tree = PalindromeTree()

tree.analyse(
    sequence=sequence,
    validate_with_api=True,
)

tree.validated_results

The validated results are stored in the validated_results attribute, also as a pd.DataFrame.

   original_index  after  before  mismatches  opposite  position  sequence  signature  spacer   stability_NNModel
0  0               CC     TTTGT   2           CTGTGTTT  5         AGAGACAG  8-7-2      GGTCTTG  {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}
1  0               TGCTG  TTTGT   2           GGGTCT    5         AGAGAC    6-1-2      A        {'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3}
2  0               GTGTT  TGTAG   2           CTTGCT    7         AGACAG    6-3-2      GGT      {'cruciform': -1.94, 'linear': -17.509999999999998, 'delta': 15.569999999999999}
3  0               TTCC   TAGAG   2           CTGTGT    9         ACAGGG    6-5-2      TCTTG    {'cruciform': -3.7399999999999998, 'linear': -20.99, 'delta': 17.25}
4  1               CCCA   TGT     2           CTGTGTTT  3         AGAGACAG  8-7-2      GGTCTTG  {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}
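
To make the columns above easier to read, here is a small sketch that re-derives some of the values in the first row. It rests on interpretations that fit the example rows but are not stated in this README: that sequence and opposite are the two arms of the inverted repeat, that signature encodes arm length, spacer length, and mismatch count, and that delta equals cruciform minus linear. The helper names are hypothetical and not part of palindrome_tree.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def count_mismatches(arm: str, opposite: str) -> int:
    """Count positions where `opposite` differs from the reverse complement of `arm`."""
    if len(arm) != len(opposite):
        raise ValueError("both arms must have the same length")
    return sum(a != b for a, b in zip(reverse_complement(arm), opposite))


def parse_signature(signature: str) -> tuple:
    """Split a signature such as '8-7-2' into three integers (assumed meaning:
    arm length, spacer length, mismatches)."""
    arm_length, spacer_length, mismatches = (int(part) for part in signature.split("-"))
    return arm_length, spacer_length, mismatches


# Values copied from the first row of the table above.
row = {
    "sequence": "AGAGACAG",
    "spacer": "GGTCTTG",
    "opposite": "CTGTGTTT",
    "signature": "8-7-2",
    "stability_NNModel": {"cruciform": -5.74, "linear": -27.590000000000003, "delta": 21.85},
}

arm_length, spacer_length, mismatches = parse_signature(row["signature"])
observed_mismatches = count_mismatches(row["sequence"], row["opposite"])
stability = row["stability_NNModel"]
derived_delta = stability["cruciform"] - stability["linear"]

print(arm_length == len(row["sequence"]))
print(spacer_length == len(row["spacer"]))
print(mismatches == observed_mismatches)
print(math.isclose(derived_delta, stability["delta"]))
```

A perfect inverted repeat would have an opposite arm equal to the reverse complement of sequence; the mismatch count measures how far a row departs from that.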

Dependencies

  • xgboost = "^1.5.1"
  • pandas = "^1.3.5"
  • scikit-learn = "^1.0.2"
  • requests = "^2.26.0"
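
The version strings above appear to follow the caret convention used by Poetry, where a constraint such as ^1.5.1 allows any version from 1.5.1 up to, but not including, 2.0.0. The sketch below illustrates that rule for plain three-part versions. The function names are hypothetical, and real tooling handles many more cases (pre-releases, shorter versions, combined constraints).

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def caret_upper_bound(base: Version) -> Version:
    """Return the exclusive upper bound: bump the leftmost non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(constraint: str, candidate: str) -> bool:
    """Check whether `candidate` falls inside a caret constraint such as '^1.5.1'."""
    base = parse_version(constraint.lstrip("^"))
    return base <= parse_version(candidate) < caret_upper_bound(base)


print(satisfies_caret("^1.5.1", "1.7.6"))  # inside the allowed range
print(satisfies_caret("^1.5.1", "2.0.0"))  # next major version is excluded
```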

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.
