# Bioinformatics - Biomedical Engineering - Analysis of two genes of interest potentially related to type 2 diabetes - 2022/2023

Work carried out by: Ana Luísa Moreira de Sá (PG49857), José Miguel Moreira Santos (PG51190), Liliana Lima Brito (PG49866)

## Sequence Analysis (GenBank)

### Gene KCNJ11

In [45]:
from Bio import SeqIO

In [60]:
record = SeqIO.read("../sequences/KCNJ11.gb", "genbank")

In [61]:
# Analysis of a few specific fields of the record

print("ID: ", record.id)
print("Accession code: ", record.name)
print("Size: ", len(record.seq))
print("Description: ", record.description)
print("Organism: ", record.annotations["organism"])
print("Taxonomy: ", record.annotations["taxonomy"])
print("Molecule type: ", record.annotations["molecule_type"])
print("\nSequence: ", record.seq)


# General analysis of the features

print("\nNumber of features: ", len(record.features))

# dict.fromkeys keeps the first occurrence of each type, in order of appearance
features_types = list(dict.fromkeys(feature.type for feature in record.features))

print("Types of features: ", features_types)


# Analysis of coding-sequence (CDS) features

featcds = [feature for feature in record.features if feature.type == "CDS"]
print("\nNumber of CDSs in sequence: ", len(featcds))
# enumerate numbers the CDSs directly instead of looking each one up with list.index
for n, cds in enumerate(featcds, start=1):
    print(n, ": CDS starts at position:", cds.location.start, "and ends at position ", cds.location.end, "(", cds.location, ")")


# Analysis of the proteins encoded by the CDSs identified above

print("\n\nProteins encoded by the CDSs listed above: ")
for n, cds in enumerate(featcds, start=1):
    # .get returns None instead of raising KeyError when a qualifier is absent
    print("\n", n, "- Protein/biological meaning:", cds.qualifiers.get("product"))
    print("\nExternal references: ", cds.qualifiers.get("db_xref"))
    print("\nProtein translation (aa sequence): ", cds.qualifiers.get("translation"))
    

# Analysis of gene-type features

feat_gene = [feature for feature in record.features if feature.type == "gene"]
print("\n\nNumber of annotated genes: ", len(feat_gene))

for n, gene in enumerate(feat_gene, start=1):
    # strand is 1 (forward), -1 (reverse) or None
    print("\n", n, "- Strand location of gene: ", gene.location.strand)
    print("Gene: ", gene.qualifiers["gene"])

ID:  NC_000011.10
Accession code:  NC_000011
Size:  4099
Description:  Homo sapiens chromosome 11, GRCh38.p14 Primary Assembly
Organism:  Homo sapiens
Taxonomy:  ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo']
Molecule type:  DNA

Sequence:  CCCGTTCCTCTCCTCGTGCGCCCCCCTCCCGCCGTCCTAGACCCCTGCCTAGCCCAGGTCGGTCTCCGCGGACCCACGGACGGACAGACAGACGGGAGGACGGCCAGCCGCGAGCGCCCGGGCGGCGGGAGGGGGCGGGGAGGCGACGGCCGTGGCGTGAGGAGCAGGAGCAGGTGCAGCGGCGGCGGCGGGCGGGGCCGGGACCCGGCGCGGAGCGGGAGCCGCGGCGCGGGCGGGCGGCAGGGACCGGGAGGCCGCGACTCGGAGTCAGCCCCGCCGGGTCGCGCGCAGGTCCGGGGAGCCGCGGTTGAGCCGGGTGGGGTGGTGACTCCAGAGAACGCAGGATCCCAAGGAGACAGAGAGGACGAGAGCTGGAGGGGGATCCGGAAAGCGGCGGGGGCGCTCCGGGAGGGGTGGAGTAGGACATAGGGGGCGCACCTGGAGGAGAGACGGGGCGGGGGTGGCCAGGACCTGAGCTGGAGCCTGGGAGCCCGAAGGCCAGACAGGTGAGGCGGGAGACCCGGAGGTGGGGGTGAGGTCCGGTTAGTGGGAGAGATCCGGAGGTGTTAAGTTCTGAGCTGGGCTGGGAAGGCAGGCTGGGCGGGGAGAAGGGCTCTTAGC
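
The same feature inspection is repeated for each gene, so it may be convenient to factor it into small helpers. The sketch below is only illustrative: `count_feature_types`, `cds_summary` and `fake_feature` are hypothetical names that do not appear in the original notebook, and the demo runs on stand-in objects with made-up coordinates rather than on the real KCNJ11 record. It assumes only that each feature exposes `.type`, `.location.start`, `.location.end` and `.qualifiers`, as Biopython `SeqFeature` objects do.

```python
from collections import Counter
from types import SimpleNamespace


def count_feature_types(features):
    """Count features per type, in order of first appearance."""
    return dict(Counter(feature.type for feature in features))


def cds_summary(features):
    """Return (start, end, product) for every CDS feature."""
    rows = []
    for feature in features:
        if feature.type != "CDS":
            continue
        # .get avoids a KeyError when a CDS has no /product qualifier
        product = feature.qualifiers.get("product", ["unknown"])[0]
        rows.append((int(feature.location.start), int(feature.location.end), product))
    return rows


def fake_feature(kind, start, end, **qualifiers):
    """Build a stand-in exposing only .type, .location and .qualifiers."""
    return SimpleNamespace(
        type=kind,
        location=SimpleNamespace(start=start, end=end),
        qualifiers={key: [value] for key, value in qualifiers.items()},
    )


# Made-up coordinates and names, for illustration only (not real annotation data)
demo_features = [
    fake_feature("source", 0, 300),
    fake_feature("gene", 10, 290, gene="demoGene"),
    fake_feature("CDS", 50, 200, product="demo protein"),
    fake_feature("CDS", 210, 280),
]

print(count_feature_types(demo_features))
print(cds_summary(demo_features))
```

With a record loaded through `SeqIO.read`, the equivalent calls would be `count_feature_types(record.features)` and `cds_summary(record.features)`.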

### Gene KCNQ1

In [62]:
record = SeqIO.read("../sequences/KCNQ1.gb", "genbank")

In [63]:
# Analysis of a few specific fields of the record

print("ID: ", record.id)
print("Accession code: ", record.name)
print("Size: ", len(record.seq))
print("Description: ", record.description)
print("Organism: ", record.annotations["organism"])
print("Taxonomy: ", record.annotations["taxonomy"])
print("Molecule type: ", record.annotations["molecule_type"])
print("\nSequence: ", record.seq[:5000], "...")


# General analysis of the features

print("\nNumber of features: ", len(record.features))

# dict.fromkeys keeps the first occurrence of each type, in order of appearance
features_types = list(dict.fromkeys(feature.type for feature in record.features))

print("Types of features: ", features_types)


# Analysis of coding-sequence (CDS) features

featcds = [feature for feature in record.features if feature.type == "CDS"]
print("\nNumber of CDSs in sequence: ", len(featcds))
# enumerate numbers the CDSs directly instead of looking each one up with list.index
for n, cds in enumerate(featcds, start=1):
    print(n, ": CDS starts at position:", cds.location.start, "and ends at position ", cds.location.end, "(", cds.location, ")")


# Analysis of the proteins encoded by the CDSs identified above

print("\n\nProteins encoded by the CDSs listed above: ")
for n, cds in enumerate(featcds, start=1):
    # .get returns None instead of raising KeyError when a qualifier is absent
    print("\n", n, "- Protein/biological meaning:", cds.qualifiers.get("product"))
    print("\nExternal references: ", cds.qualifiers.get("db_xref"))
    print("\nProtein translation (aa sequence): ", cds.qualifiers.get("translation"))
    

# Analysis of gene-type features

feat_gene = [feature for feature in record.features if feature.type == "gene"]
print("\n\nNumber of annotated genes: ", len(feat_gene))

for n, gene in enumerate(feat_gene, start=1):
    # strand is 1 (forward), -1 (reverse) or None
    print("\n", n, "- Strand location of gene: ", gene.location.strand)
    print("Gene: ", gene.qualifiers["gene"])

ID:  NC_000011.10
Accession code:  NC_000011
Size:  404103
Description:  Homo sapiens chromosome 11, GRCh38.p14 Primary Assembly
Organism:  Homo sapiens
Taxonomy:  ['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Euarchontoglires', 'Primates', 'Haplorrhini', 'Catarrhini', 'Hominidae', 'Homo']
Molecule type:  DNA

Sequence:  AGTGGCTGCCCGCACTGCGCCCGGGCGCTCGCCTTCGCTGCAGCTCCCGGTGCCGCCGCTCGGGCCGGCCCCCCGGCAGGCCCTCCTCGTTATGGCCGCGGCCTCCTCCCCGCCCAGGGCCGAGAGGAAGCGCTGGGGTTGGGGCCGCCTGCCAGGCGCCCGGCGGGGCAGCGCGGGCCTGGCCAAGAAGTGCCCCTTCTCGCTGGAGCTGGCGGAGGGCGGCCCGGCGGGCGGCGCGCTCTACGCGCCCATCGCGCCCGGCGCCCCAGGTCCCGCGCCCCCTGCGTCCCCGGCCGCGCCCGCCGCGCCCCCAGTTGCCTCCGACCTTGGCCCGCGGCCGCCGGTGAGCCTAGACCCGCGCGTCTCCATCTACAGCACGCGCCGCCCGGTGTTGGCGCGCACCCACGTCCAGGGCCGCGTCTACAACTTCCTCGAGCGTCCCACCGGCTGGAAATGCTTCGTTTACCACTTCGCCGTGTGAGTATCGCCACCGGCGACGGCCGGCACGAAGGTGCTTCCTGAGAGCTGGTGTGGGGGAGCTCTGTCCCAGCGCCACCTGCCCCGTCGGAGCTGCGACCCCGGAGCAGAGGAGGGAAGGAAGTGGGGAAACGCAGAAACA
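
Printing thousands of bases on a single line, as in the output above, is hard to read. A minimal sketch of a preview helper follows, assuming that a wrapped, truncated view is enough for inspection. `preview_sequence` and its `max_bases` and `width` parameters are hypothetical and not part of the original notebook; the demo uses a toy sequence, not real data.

```python
def preview_sequence(seq, max_bases=120, width=60):
    """Return the first max_bases characters of seq, wrapped at width columns."""
    text = str(seq)  # also accepts objects whose str() is the sequence
    shown = text[:max_bases]
    lines = [shown[i:i + width] for i in range(0, len(shown), width)]
    if len(text) > max_bases:
        # Say how much was left out instead of printing a bare "..."
        lines.append(f"... ({len(text) - max_bases} more bases)")
    return "\n".join(lines)


demo = "ACGT" * 50  # 200-base toy sequence, for illustration only
print(preview_sequence(demo))
```

In the notebook, the call would presumably be `print(preview_sequence(record.seq, max_bases=5000))`, mirroring the 5000-base slice already used for KCNQ1.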