CompareFASTA_or_FASTQ utilities

Repo for my own computational resources for comparing sequences in FASTA or FASTQ format (both nucleic acid and protein), plus a place to reference other handy resources.

  • compare_organisms_in_two_files_of_fasta_entries.py

    See my comparing lists resources for related items.

  • score_differences_between_sequences_by_pairwise_alignment.py

    This script takes sequences in FASTA format (a single multi-FASTA file) and scores the differences between each sequence and each of the others. It makes a matrix of the differences.

    This is located here in my alignment-utilities sub-repo.

    This script should probably be here in this sub-repo; however, I was thinking about it in the context of comparing sequences in a rough manner prior to making a full sequence alignment, so I put it there.

  • roughly_score_relationships_to_subject_seq_pairwise_premsa.py

    This script takes sequences in FASTA format (a single multi-FASTA file) and makes a quick assessment of the similarity of the first sequence to each of the others. It makes a matrix of the differences.

    This is located here in my alignment-utilities sub-repo.

    This script should probably be here in this sub-repo; however, I was thinking about it in the context of comparing sequences in a rough manner prior to making a full sequence alignment, so I put it there.
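To make the idea of a "matrix of the differences" concrete, here is a minimal standard-library sketch. It is not the code of either script above: the function names (`parse_fasta`, `edit_distance`, `difference_matrix`) are hypothetical, and plain Levenshtein edit distance is swapped in as a simple stand-in for whatever pairwise alignment scoring the scripts actually use.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a list of (identifier, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def edit_distance(a, b):
    """Levenshtein distance (assumed stand-in for an alignment-based score)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def difference_matrix(records):
    """Return a symmetric matrix of pairwise edit distances."""
    seqs = [seq for _, seq in records]
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = edit_distance(seqs[i], seqs[j])
    return matrix


# Made-up example data, for illustration only.
EXAMPLE_FASTA = """>seq1
ACGTACGT
>seq2
ACGTTCGT
>seq3
ACGACGT
"""

if __name__ == "__main__":
    records = parse_fasta(EXAMPLE_FASTA)
    for (name, _), row in zip(records, difference_matrix(records)):
        print(name, row)
```

The first row of the matrix holds the differences between the first sequence and each of the others, which is the kind of comparison the second script focuses on.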

Related utilities in my other repositories

  • matches_a_patmatch_pattern.py

    This script can take a sequence in FASTA format or as a text string and tell whether it contains a match to a pattern in PatMatch syntax. It doesn't expect multi-FASTA file entries, though, and will only use the first sequence if one is used as input. It just reports whether there is a match or not. It is really meant for comparing against a general pattern in PatMatch syntax.

    This is located here in my patmatch-utilities sub-repo.

    See my patmatch-binder if you need to locate matches in FASTA sequences and learn the details.

  • score_differences_between_sequences_by_pairwise_alignment.py

    This script takes sequences in FASTA format (a single multi-FASTA file) and scores the differences between each sequence and each of the others. It makes a matrix of the differences.

    This is located here in my alignment-utilities sub-repo.

    This script should probably be here in this sub-repo; however, I was thinking about it in the context of comparing sequences in a rough manner prior to making a full sequence alignment, so I put it there.

  • roughly_score_relationships_to_subject_seq_pairwise_premsa.py

    This script takes sequences in FASTA format (a single multi-FASTA file) and makes a quick assessment of the similarity of the first sequence to each of the others. It makes a matrix of the differences.

    This is located here in my alignment-utilities sub-repo.

    This script should probably be here in this sub-repo; however, I was thinking about it in the context of comparing sequences in a rough manner prior to making a full sequence alignment, so I put it there.

  • report_diff_between_two_seq_strings.py

    This script directly compares two Python strings using an approach meant for aligning biological strings. This is like a single, direct version of my script score_differences_between_sequences_by_pairwise_alignment.py.

    This is located here in my compare_biological_seq_strings utilities sub-repo.

    It was put there because this sub-repo is meant for when sequences are already in FASTA or FASTQ format, and I had occasion to compare sequences that are strings in dataframe cells that were pulled out as hits by PatMatch.
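As a rough illustration of comparing two sequence strings directly, the sketch below uses `difflib.SequenceMatcher` from the Python standard library. This is an assumption made for illustration, not how report_diff_between_two_seq_strings.py works: `difflib` is a general-purpose text matcher with no biological scoring scheme, and the function name `report_differences` is hypothetical.

```python
from difflib import SequenceMatcher


def report_differences(seq_a, seq_b):
    """List the non-matching regions between two sequence strings.

    Each entry is (operation, position in seq_a, text in seq_a, text in seq_b),
    where operation is 'replace', 'delete', or 'insert'.
    """
    # autojunk=False stops difflib from treating frequent characters as junk,
    # which matters for long sequences built from a small alphabet.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (tag, a_start, seq_a[a_start:a_end], seq_b[b_start:b_end])
            )
    return differences


if __name__ == "__main__":
    # Made-up strings, standing in for values pulled from dataframe cells.
    for difference in report_differences("ACGTACGT", "ACGTTCGT"):
        print(difference)
```

Because the inputs are plain strings, a function like this can be applied to values taken straight from dataframe cells without first writing them out as FASTA.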