# Sequence Quality and Basic Analysis

Before performing any functional prediction or similarity search, the sequence must be evaluated for basic quality and biological plausibility.

This step confirms that the selected protein sequence is suitable for downstream bioinformatics analysis by checking the record's identity, its length, and its amino acid composition.

In [1]:
from Bio import SeqIO
from collections import Counter

In [2]:
# SeqIO.read expects exactly one record in the file and raises ValueError otherwise
record = SeqIO.read("../data/input_sequence.fasta", "fasta")
record

SeqRecord(seq=Seq('MIWKKLFNFGRSKETEFIVEQSGISQEALAIVKRSINGRIHPFYKVDLYTEKPA...WWD'), id='tr|A0A0A8X4J6|A0A0A8X4J6_MESS1', name='tr|A0A0A8X4J6|A0A0A8X4J6_MESS1', description='tr|A0A0A8X4J6|A0A0A8X4J6_MESS1 Hypothetical cytosolic protein OS=Mesobacillus selenatarsenatis (strain DSM 18680 / JCM 14380 / FERM P-15431 / SF-1) OX=1321606 GN=SAMD00020551_3039 PE=4 SV=1', dbxrefs=[])

In [3]:
print("Sequence ID:", record.id)
print("Description:", record.description)
print("Sequence length:", len(record.seq))
print("First 30 amino acids:", record.seq[:30])

Sequence ID: tr|A0A0A8X4J6|A0A0A8X4J6_MESS1
Description: tr|A0A0A8X4J6|A0A0A8X4J6_MESS1 Hypothetical cytosolic protein OS=Mesobacillus selenatarsenatis (strain DSM 18680 / JCM 14380 / FERM P-15431 / SF-1) OX=1321606 GN=SAMD00020551_3039 PE=4 SV=1
Sequence length: 201
First 30 amino acids: MIWKKLFNFGRSKETEFIVEQSGISQEALA


In [4]:
# Tally each one-letter residue code across the full sequence
aa_counts = Counter(record.seq)
aa_counts

Counter({'I': 23,
         'E': 22,
         'K': 20,
         'L': 14,
         'G': 12,
         'F': 11,
         'S': 11,
         'D': 11,
         'V': 10,
         'A': 10,
         'Q': 9,
         'Y': 9,
         'N': 8,
         'R': 8,
         'P': 7,
         'W': 5,
         'T': 5,
         'C': 3,
         'M': 2,
         'H': 1})
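
Raw counts are easier to compare across proteins when expressed as percentages. The following is a minimal sketch, not part of the original notebook, that re-enters the counts printed above as a literal so it runs without the FASTA file. The names `total` and `percent` are illustrative.

```python
from collections import Counter

# Counts copied from the notebook output above (not re-read from the FASTA file).
aa_counts = Counter({
    'I': 23, 'E': 22, 'K': 20, 'L': 14, 'G': 12, 'F': 11, 'S': 11,
    'D': 11, 'V': 10, 'A': 10, 'Q': 9, 'Y': 9, 'N': 8, 'R': 8,
    'P': 7, 'W': 5, 'T': 5, 'C': 3, 'M': 2, 'H': 1,
})

# The counts should add up to the reported sequence length.
total = sum(aa_counts.values())

# Percentage of the sequence made up by each residue, most frequent first.
percent = {aa: round(100 * n / total, 1) for aa, n in aa_counts.most_common()}

print("Total residues:", total)
for aa, pct in percent.items():
    print(f"{aa}: {pct}%")
```

In the notebook itself, `aa_counts` from the cell above can be used directly instead of the literal.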

## Sequence Quality Assessment

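The protein sequence is 201 amino acids long, which is long enough for homology-based analysis such as database similarity searches.
The amino acid composition is biologically plausible: the counts sum to 201 and cover exactly the 20 standard amino acids, with no ambiguity codes (such as X, B, or Z) and no stop symbols.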

Based on these observations, the sequence passes basic quality checks and is suitable for downstream analysis.
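
The residue check described above can be made explicit rather than read off the counts by eye. This is a minimal sketch under stated assumptions: `STANDARD_AA` and `find_nonstandard` are hypothetical helpers, not Biopython functions, and the demo uses the first 30 residues printed earlier instead of the full record.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard(sequence):
    """Map each non-standard symbol to the 0-based positions where it occurs."""
    hits = {}
    for pos, aa in enumerate(str(sequence).upper()):
        if aa not in STANDARD_AA:
            hits.setdefault(aa, []).append(pos)
    return hits


# First 30 residues, as printed earlier in this notebook.
prefix = "MIWKKLFNFGRSKETEFIVEQSGISQEALA"
print(find_nonstandard(prefix))

# A made-up string with an ambiguity code (X), an ambiguous residue (B) and a stop symbol (*).
print(find_nonstandard("MKXLB*"))
```

An empty dictionary means every character is a standard residue. In the notebook, `find_nonstandard(record.seq)` would run the same check on the full sequence.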