Name: Rika Ajeng Finatih

Student ID (NIM): 121450036

Class: Bioinformatika RB

**Assignment**

Based on the previous exercise, you should already have the GenBank data for all COVID-19 variants. Your task is to combine the data for all variants into a single dataframe with the following columns:
1. Accession ID
2. Organism
3. Source
4. Submission Year
5. Sequence Length
6. CDS Number
7. Sequence
8. Reference Number

Submit the answer as an **.ipynb** file on Classroom.

## **Install**

In [4]:
! pip install Biopython
! sudo apt-get install -y clustalw
# No chmod step is needed: the apt package installs a `clustalw` executable,
# and there is no local `clustalw2` file to make executable.

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
clustalw is already the newest version (2.1+lgpl-7).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.


## **Import**

In [5]:
from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML
from Bio.Align.Applications import ClustalwCommandline
import os

## **Topic: COVID-19, sequence ID NC_045512**

In [8]:
from Bio import Entrez, SeqIO
import pandas as pd

# Set the Entrez email (required by NCBI)
Entrez.email = "rikaaja517@gmail.com"  # Replace with your own email

# Fetch the GenBank records that match a search term
def fetch_genbank_data(search_term, max_records=10):
    # Search the NCBI nucleotide database
    handle = Entrez.esearch(db="nucleotide", term=search_term, retmax=max_records)
    record = Entrez.read(handle)
    handle.close()

    # Get the list of GenBank IDs
    genbank_ids = record["IdList"]

    # Download the GenBank record for each ID
    sequences = []
    for genbank_id in genbank_ids:
        handle = Entrez.efetch(db="nucleotide", id=genbank_id, rettype="gb", retmode="text")
        seq_record = SeqIO.read(handle, "genbank")
        sequences.append(seq_record)
        handle.close()

    return sequences

# Convert SeqRecord objects into a DataFrame with the required columns
def genbank_to_dataframe(seq_records):
    data = []
    for seq_record in seq_records:
        # Count the CDS features (0 if there are none)
        cds_count = len([feat for feat in seq_record.features if feat.type == "CDS"])

        # Count the references (0 if there are none)
        reference_number = len(seq_record.annotations.get("references", []))

        # Take the organism qualifier of the "source" feature, if present
        source = next((feat.qualifiers.get("organism", ["Unknown"])[0] for feat in seq_record.features if feat.type == "source"), "Unknown")

        # Extract the submission year from the "date" annotation (e.g. "18-JUL-2020" -> "2020")
        submission_date = seq_record.annotations.get("date", "Unknown")
        submission_year = submission_date.split('-')[-1] if submission_date != "Unknown" else "Unknown"

        # Append the relevant fields
        data.append({
            "Accession ID": seq_record.id,
            "Organism": seq_record.annotations.get("organism", "Unknown"),
            "Source": source,
            "Submission Year": submission_year,
            "Sequence Length": len(seq_record.seq),
            "CDS Number": cds_count,
            "Sequence": str(seq_record.seq),
            "Reference Number": reference_number
        })
    return pd.DataFrame(data)

# Example usage
search_term = "SARS-CoV-2[Organism]"  # Replace with your actual search term
sequences = fetch_genbank_data(search_term, max_records=10)  # Fetch 10 records
df = genbank_to_dataframe(sequences)

# Display the DataFrame
print(df)


  Accession ID                                         Organism  \
0   OZ187850.1  Severe acute respiratory syndrome coronavirus 2   
1   OZ187849.1  Severe acute respiratory syndrome coronavirus 2   
2   OZ187848.1  Severe acute respiratory syndrome coronavirus 2   
3   OZ187847.1  Severe acute respiratory syndrome coronavirus 2   
4   OZ187846.1  Severe acute respiratory syndrome coronavirus 2   
5   OZ187845.1  Severe acute respiratory syndrome coronavirus 2   
6   OZ187844.1  Severe acute respiratory syndrome coronavirus 2   
7   OZ187843.1  Severe acute respiratory syndrome coronavirus 2   
8   OZ187842.1  Severe acute respiratory syndrome coronavirus 2   
9   OZ187841.1  Severe acute respiratory syndrome coronavirus 2   

                                            Source Submission Year  \
0  Severe acute respiratory syndrome coronavirus 2            2024   
1  Severe acute respiratory syndrome coronavirus 2            2024   
2  Severe acute respiratory syndrome coronavirus 2  
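The loop in `fetch_genbank_data` sends one `efetch` request per ID. NCBI's E-utilities also accept a comma-separated list of IDs, so a sketch like the one below downloads all hits in a single request and parses the multi-record reply with `SeqIO.parse`. The name `fetch_genbank_batch` is hypothetical, not part of the original notebook; it assumes `Entrez.email` has already been set (as in the cell above) and does not handle very large ID lists, for which NCBI recommends its history server.

```python
from Bio import Entrez, SeqIO


def fetch_genbank_batch(search_term, max_records=10):
    """Hypothetical helper: search nucleotide and fetch all hits in one efetch call.

    Assumes Entrez.email was set beforehand, as NCBI requires.
    """
    with Entrez.esearch(db="nucleotide", term=search_term, retmax=max_records) as handle:
        id_list = Entrez.read(handle)["IdList"]
    if not id_list:
        return []  # nothing matched the search term
    # efetch accepts several IDs at once; the reply is one multi-record GenBank text
    with Entrez.efetch(
        db="nucleotide", id=",".join(id_list), rettype="gb", retmode="text"
    ) as handle:
        return list(SeqIO.parse(handle, "genbank"))
```

Usage would mirror the cell above: `sequences = fetch_genbank_batch("SARS-CoV-2[Organism]", max_records=10)` followed by `genbank_to_dataframe(sequences)`.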

## **Topic: COVID-19, sequence ID NC_045512**

In [9]:
from Bio import SeqIO
import pandas as pd
from datetime import datetime

### **Explore the structure of a single GenBank file**

**1. Read the GenBank file**

In [10]:
file_path = "/content/121450036_Rika Ajeng Finatih.gb"

In [11]:
with open(file_path, "r") as handle:
    seq_record = SeqIO.read(handle, "genbank")

**2. Data Structure**

In [12]:
# The annotations field: a dict holding various properties of the sequence record
print('Annotations dictionary:\n')
print(seq_record.annotations)

print('\nKeys:')
print(seq_record.annotations.keys())

Annotations dictionary:

{'molecule_type': 'ss-RNA', 'topology': 'linear', 'data_file_division': 'VRL', 'date': '18-JUL-2020', 'accessions': ['NC_045512'], 'sequence_version': 2, 'gi': '1798174254', 'keywords': ['RefSeq'], 'source': 'Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)', 'organism': 'Severe acute respiratory syndrome coronavirus 2', 'taxonomy': ['Viruses', 'Riboviria', 'Orthornavirae', 'Pisuviricota', 'Pisoniviricetes', 'Nidovirales', 'Cornidovirineae', 'Coronaviridae', 'Orthocoronavirinae', 'Betacoronavirus', 'Sarbecovirus', 'Severe acute respiratory syndrome-related coronavirus'], 'references': [Reference(title='A new coronavirus associated with human respiratory disease in China', ...), Reference(title='Programmed ribosomal frameshifting in decoding the SARS-CoV genome', ...), Reference(title='The structure of a rigorously conserved RNA element within the SARS virus genome', ...), Reference(title="A phylogenetically conserved hairpin-type 3' untranslated reg

**3. Get Specific Values**

In [13]:
# Get the values of specific keys
print('\nGet specific parts of the annotation:\n')
print('Taxonomy:')
print(seq_record.annotations['taxonomy'])

print('\nSource:')
print(seq_record.annotations['source'])

print('\nDate:')
print(seq_record.annotations['date'])

print('\nReferences:')
print(seq_record.annotations['references'])

print('\nComment:')
print(seq_record.annotations['comment'])


Get specific parts of the annotation:

Taxonomy:
['Viruses', 'Riboviria', 'Orthornavirae', 'Pisuviricota', 'Pisoniviricetes', 'Nidovirales', 'Cornidovirineae', 'Coronaviridae', 'Orthocoronavirinae', 'Betacoronavirus', 'Sarbecovirus', 'Severe acute respiratory syndrome-related coronavirus']

Source:
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

Date:
18-JUL-2020

References:
[Reference(title='A new coronavirus associated with human respiratory disease in China', ...), Reference(title='Programmed ribosomal frameshifting in decoding the SARS-CoV genome', ...), Reference(title='The structure of a rigorously conserved RNA element within the SARS virus genome', ...), Reference(title="A phylogenetically conserved hairpin-type 3' untranslated region pseudoknot functions in coronavirus RNA replication", ...), Reference(title='Direct Submission', ...), Reference(title='Direct Submission', ...)]

Comment:
REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference seque
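Indexing with `seq_record.annotations['comment']` raises a `KeyError` for records that have no COMMENT block, which becomes likely once many variant records are processed together. A hedged sketch that reads the same fields with `dict.get` and defaults is shown below. `annotation_summary` is a hypothetical helper name; reading `title` from each reference relies on Biopython's `Reference` objects exposing a `title` attribute, and a `SimpleNamespace` stands in for one here.

```python
from types import SimpleNamespace


def annotation_summary(annotations):
    """Hypothetical helper: read common GenBank annotations without risking KeyError."""
    references = annotations.get("references", [])
    return {
        "taxonomy": annotations.get("taxonomy", []),
        "source": annotations.get("source", "Unknown"),
        "date": annotations.get("date", "Unknown"),
        "comment": annotations.get("comment", ""),  # COMMENT is optional in GenBank
        "reference_titles": [getattr(ref, "title", "") for ref in references],
    }


# Stand-in annotations dict with no 'comment' key; in the notebook pass seq_record.annotations
example = {
    "date": "18-JUL-2020",
    "references": [SimpleNamespace(title="Direct Submission")],
}
summary = annotation_summary(example)
print(summary["comment"] == "", summary["reference_titles"])
```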

**3.1 Extract Year**

In [14]:
if "date" in seq_record.annotations:
    date_str = seq_record.annotations["date"]
    try:
        submission_year = datetime.strptime(date_str, "%d-%b-%Y").year
    except ValueError:
        submission_year = "Unknown"  # If date parsing fails
else:
    submission_year = "Unknown"
submission_year

2020
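This cell parses the date with `datetime.strptime` and yields the integer `2020`, whereas the `genbank_to_dataframe` cell earlier splits the string and yields the text `"2020"`; mixing both in one DataFrame gives a column with two types. One option, sketched below, is a single helper used everywhere. The name `parse_submission_year` is hypothetical, and returning `None` instead of `"Unknown"` is a design choice so pandas treats unparseable dates as missing. Also keep in mind that this `date` comes from the GenBank LOCUS line, which NCBI updates when a record is revised, so it can be later than the original submission; the "Submitted (...)" date in a "Direct Submission" reference may be a closer proxy.

```python
from datetime import datetime


def parse_submission_year(date_str):
    """Hypothetical helper: '18-JUL-2020' -> 2020; anything unparseable -> None."""
    try:
        # %b matches month abbreviations case-insensitively ('JUL' == 'Jul');
        # this assumes an English/C locale for month names
        return datetime.strptime(date_str, "%d-%b-%Y").year
    except (TypeError, ValueError):  # TypeError covers None, ValueError covers bad strings
        return None


print(parse_submission_year("18-JUL-2020"))  # 2020
print(parse_submission_year("Unknown"))      # None
```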

**4. Get Features**

In [15]:
seq_record.features[:5]

[SeqFeature(SimpleLocation(ExactPosition(0), ExactPosition(29903), strand=1), type='source', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(0), ExactPosition(265), strand=1), type="5'UTR"),
 SeqFeature(SimpleLocation(ExactPosition(265), ExactPosition(21555), strand=1), type='gene', qualifiers=...),
 SeqFeature(CompoundLocation([SimpleLocation(ExactPosition(265), ExactPosition(13468), strand=1), SimpleLocation(ExactPosition(13467), ExactPosition(21555), strand=1)], 'join'), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(265), ExactPosition(805), strand=1), type='mat_peptide', qualifiers=...)]
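Biopython stores locations as 0-based, half-open intervals, while the GenBank flat file uses 1-based, inclusive coordinates, so `[265:21555]` above is the span written as `266..21555` in the file. A small conversion sketch (the helper `genbank_span` is hypothetical) that works with any location exposing `start` and `end`; for a `join` (`CompoundLocation`) it returns the outer span, whereas `len(location)` gives the summed length of the parts.

```python
from types import SimpleNamespace


def genbank_span(location):
    """Convert a 0-based half-open location to 1-based inclusive GenBank coordinates."""
    start, end = int(location.start), int(location.end)  # ExactPosition is an int subclass
    return {"first_base": start + 1, "last_base": end, "length": end - start}


# Stand-in for the ORF1ab gene location [265:21555] shown above;
# in the notebook pass e.g. seq_record.features[2].location
orf1ab_gene = SimpleNamespace(start=265, end=21555)
print(genbank_span(orf1ab_gene))  # {'first_base': 266, 'last_base': 21555, 'length': 21290}
```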

**5. Gene Features**

In [16]:
# Keep only the features of type 'gene'
gene_features = [feature for feature in seq_record.features if feature.type == 'gene']

print(f'Number of gene features: {len(gene_features)}')
gene_features

Number of gene features: 11


[SeqFeature(SimpleLocation(ExactPosition(265), ExactPosition(21555), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(21562), ExactPosition(25384), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(25392), ExactPosition(26220), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(26244), ExactPosition(26472), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(26522), ExactPosition(27191), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(27201), ExactPosition(27387), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(27393), ExactPosition(27759), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(27755), ExactPosition(27887), strand=1), type='gene', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(27893), ExactPosition(28259), strand=1), type='gene', qualifiers=
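Instead of writing one loop per feature type, all types can be tallied at once with `collections.Counter`. This is a sketch with a hypothetical helper name, `feature_type_counts`; applied to `seq_record.features`, its `'gene'` and `'CDS'` entries should agree with the 11 gene and 12 CDS features counted in this notebook.

```python
from collections import Counter
from types import SimpleNamespace


def feature_type_counts(features):
    """Count how many features of each type (gene, CDS, mat_peptide, ...) a record has."""
    return Counter(feature.type for feature in features)


# Stand-in features; in the notebook pass seq_record.features instead
demo = [SimpleNamespace(type=t) for t in ["source", "gene", "CDS", "CDS", "mat_peptide"]]
counts = feature_type_counts(demo)
print(counts["CDS"], counts["gene"], counts["stem_loop"])  # 2 1 0 (missing types count as 0)
```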

In [17]:
# Gene qualifiers
gene_features[0].qualifiers

{'gene': ['ORF1ab'],
 'locus_tag': ['GU280_gp01'],
 'db_xref': ['GeneID:43740578']}

**6. Feature CDS (Coding Sequence)**

In [18]:
# Keep only the features of type 'CDS'
CDS_features = [feature for feature in seq_record.features if feature.type == 'CDS']

print(f"Number of CDS features: {len(CDS_features)}")
CDS_features

Number of CDS features: 12


[SeqFeature(CompoundLocation([SimpleLocation(ExactPosition(265), ExactPosition(13468), strand=1), SimpleLocation(ExactPosition(13467), ExactPosition(21555), strand=1)], 'join'), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(265), ExactPosition(13483), strand=1), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(21562), ExactPosition(25384), strand=1), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(25392), ExactPosition(26220), strand=1), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(26244), ExactPosition(26472), strand=1), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(26522), ExactPosition(27191), strand=1), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(27201), ExactPosition(27387), strand=1), type='CDS', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(27393), ExactPosition(27759), strand=1), type='CDS', qualifiers=...),
 SeqFeature(Simple

**7. Choose a CDS and Explore Its Qualifiers**

In [19]:
print(f'CDS Qualifier Keys: {CDS_features[0].qualifiers.keys()}\n')

print('Showing First CDS Feature')
print(CDS_features[0].qualifiers)  # dict mapping qualifier name -> list of values

CDS Qualifier Keys: dict_keys(['gene', 'locus_tag', 'ribosomal_slippage', 'note', 'codon_start', 'product', 'protein_id', 'db_xref', 'translation'])

Showing First CDS Feature
{'gene': ['ORF1ab'], 'locus_tag': ['GU280_gp01'], 'ribosomal_slippage': [''], 'note': ['pp1ab; translated by -1 ribosomal frameshift'], 'codon_start': ['1'], 'product': ['ORF1ab polyprotein'], 'protein_id': ['YP_009724389.1'], 'db_xref': ['GI:1796318597', 'GeneID:43740578'], 'translation': ['MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQENWNTKHSSGVTRELMRELNGGAYTRYVDNNFCGPDGYPLECIKDLLARAGKASCTLSEQLDFIDTKRGVYCCREHEHEIAWYTERSEKSYELQTPFEIKLAKKFDTFNGECPNFVFPLNSIIKTIQPRVEKKKLDGFMGRIRSVYPVASPNECNQMCLSTLMKCDHCGETSWQTGDFVKATCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSLAEYHNESGLKTILRKGGRTIAFGGCVFSYVGCHNKCAYWVPRASANIGCNHTGVVGEGSEGLNDNLLEILQKEKVNINIVGDFKLNEEIAIILASFSASTSAFVETVKGLDYKAFKQIVESCGNFKVTKGKAKKGAWNIGEQKSILS
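The `translation` qualifier is the protein stored by NCBI. To check it against the genome, a CDS can be cut out with `SeqFeature.extract`, which follows the strand and any `join` parts (including the one-base overlap that encodes the -1 frameshift of ORF1ab), and then translated. The sketch below runs on a toy sequence; `cds_protein` is a hypothetical helper, and it ignores the `codon_start` qualifier (which is `'1'` for this CDS).

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, SimpleLocation


def cds_protein(feature, parent_seq):
    """Hypothetical helper: extract a CDS from its parent sequence and translate it."""
    nucleotides = feature.extract(parent_seq)  # respects strand and join() parts
    return str(nucleotides.translate(to_stop=True))  # stop at the first stop codon


# Toy example: the CDS is the 0-based, half-open interval [2:14] -> ATG AAA TTT TAG
toy_seq = Seq("GGATGAAATTTTAGCC")
toy_cds = SeqFeature(SimpleLocation(2, 14, strand=1), type="CDS")
toy_protein = cds_protein(toy_cds, toy_seq)
print(toy_protein)  # MKF

# A check one might run in the notebook (result not shown here):
# cds_protein(CDS_features[0], seq_record.seq) == CDS_features[0].qualifiers["translation"][0]
```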

**8. Enumerate The Structure**

In [20]:
for key, value in CDS_features[0].qualifiers.items():
    print(f'{key} : {value}')

gene : ['ORF1ab']
locus_tag : ['GU280_gp01']
ribosomal_slippage : ['']
note : ['pp1ab; translated by -1 ribosomal frameshift']
codon_start : ['1']
product : ['ORF1ab polyprotein']
protein_id : ['YP_009724389.1']
db_xref : ['GI:1796318597', 'GeneID:43740578']
translation : ['MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQENWNTKHSSGVTRELMRELNGGAYTRYVDNNFCGPDGYPLECIKDLLARAGKASCTLSEQLDFIDTKRGVYCCREHEHEIAWYTERSEKSYELQTPFEIKLAKKFDTFNGECPNFVFPLNSIIKTIQPRVEKKKLDGFMGRIRSVYPVASPNECNQMCLSTLMKCDHCGETSWQTGDFVKATCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSLAEYHNESGLKTILRKGGRTIAFGGCVFSYVGCHNKCAYWVPRASANIGCNHTGVVGEGSEGLNDNLLEILQKEKVNINIVGDFKLNEEIAIILASFSASTSAFVETVKGLDYKAFKQIVESCGNFKVTKGKAKKGAWNIGEQKSILSPLYAFASEAARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFGTVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFFKLVNKFLALCADSIIIGGAKLKALNLGETFVTHSKGLYRKCVKSRE
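Every qualifier value is a list, even when it holds a single string (`gene : ['ORF1ab']`), and some hold several (`db_xref`). To place qualifiers in DataFrame cells, one option is to join each list into one string. A minimal sketch; the name `flatten_qualifiers` and the `"; "` separator are arbitrary choices, not from the notebook.

```python
def flatten_qualifiers(qualifiers, sep="; "):
    """Join each list of qualifier values into a single string."""
    return {key: sep.join(values) for key, values in qualifiers.items()}


# Values copied from the first CDS shown above
first_cds = {
    "gene": ["ORF1ab"],
    "ribosomal_slippage": [""],
    "db_xref": ["GI:1796318597", "GeneID:43740578"],
}
flat = flatten_qualifiers(first_cds)
print(flat["db_xref"])  # GI:1796318597; GeneID:43740578
```

With pandas, `pd.DataFrame([flatten_qualifiers(f.qualifiers) for f in CDS_features])` should give one row per CDS, with NaN where a CDS lacks a qualifier.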

### **Open a single-record GenBank file using a function**

In [23]:
from Bio import SeqIO

# Open a single-record GenBank file and print the required attributes
def open_single_record_genbank(file_path):
    # Use SeqIO.read to read exactly one GenBank record
    with open(file_path, "r") as handle:
        seq_record = SeqIO.read(handle, "genbank")

        # Count the CDS features (0 if there are none)
        cds_count = len([feat for feat in seq_record.features if feat.type == "CDS"])

        # Count the references (0 if there are none)
        reference_number = len(seq_record.annotations.get("references", []))

        # Take the organism qualifier of the "source" feature, if present
        source = next((feat.qualifiers.get("organism", ["Unknown"])[0] for feat in seq_record.features if feat.type == "source"), "Unknown")

        # Extract the submission year from the "date" annotation
        submission_date = seq_record.annotations.get("date", "Unknown")
        submission_year = submission_date.split('-')[-1] if submission_date != "Unknown" else "Unknown"

        # Print the requested information
        print(f"Accession ID: {seq_record.id}")
        print(f"Organism: {seq_record.annotations.get('organism', 'Unknown')}")
        print(f"Source: {source}")
        print(f"Submission Year: {submission_year}")
        print(f"Sequence Length: {len(seq_record.seq)}")
        print(f"CDS Number: {cds_count}")
        print(f"Sequence: {seq_record.seq[:50]}...")  # Print the first 50 bases of the sequence
        print(f"Reference Number: {reference_number}")

        # Print features
        print("\nFeatures:")
        for feature in seq_record.features:
            print(f" - {feature.type} at location {feature.location}")
        print("-" * 40)

# Example usage
file_path = "/content/121450036_Rika Ajeng Finatih.gb"  # Replace with the path to your single-record GenBank file
open_single_record_genbank(file_path)


Accession ID: NC_045512.2
Organism: Severe acute respiratory syndrome coronavirus 2
Source: Severe acute respiratory syndrome coronavirus 2
Submission Year: 2020
Sequence Length: 29903
CDS Number: 12
Sequence: ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTC...
Reference Number: 6

Features:
 - source at location [0:29903](+)
 - 5'UTR at location [0:265](+)
 - gene at location [265:21555](+)
 - CDS at location join{[265:13468](+), [13467:21555](+)}
 - mat_peptide at location [265:805](+)
 - mat_peptide at location [805:2719](+)
 - mat_peptide at location [2719:8554](+)
 - mat_peptide at location [8554:10054](+)
 - mat_peptide at location [10054:10972](+)
 - mat_peptide at location [10972:11842](+)
 - mat_peptide at location [11842:12091](+)
 - mat_peptide at location [12091:12685](+)
 - mat_peptide at location [12685:13024](+)
 - mat_peptide at location [13024:13441](+)
 - mat_peptide at location join{[13441:13468](+), [13467:16236](+)}
 - mat_peptide at location [16236:18039](+)
 - 
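The printing functions and the DataFrame functions in this notebook repeat the same extraction steps. One way to avoid that, sketched below, is a single function that returns the eight assignment columns as a dict: printing becomes a loop over its items, and a DataFrame becomes `pd.DataFrame([record_summary(r) for r in records])`. `record_summary` is a hypothetical name; its logic mirrors the cells above, including the string-split year, and a `SimpleNamespace` stands in for a `SeqRecord` in the demo.

```python
from types import SimpleNamespace


def record_summary(seq_record):
    """Hypothetical helper: the assignment's eight columns for one SeqRecord."""
    features = seq_record.features
    source = next(
        (f.qualifiers.get("organism", ["Unknown"])[0] for f in features if f.type == "source"),
        "Unknown",
    )
    date_str = seq_record.annotations.get("date", "")
    return {
        "Accession ID": seq_record.id,
        "Organism": seq_record.annotations.get("organism", "Unknown"),
        "Source": source,
        "Submission Year": date_str.split("-")[-1] if date_str else "Unknown",
        "Sequence Length": len(seq_record.seq),
        "CDS Number": sum(1 for f in features if f.type == "CDS"),
        "Sequence": str(seq_record.seq),
        "Reference Number": len(seq_record.annotations.get("references", [])),
    }


# Stand-in record with the attributes a Bio.SeqRecord provides
toy = SimpleNamespace(
    id="TOY0001.1",
    seq="ATGAAATAG",
    annotations={"organism": "Toy virus", "date": "18-JUL-2020", "references": ["r1", "r2"]},
    features=[
        SimpleNamespace(type="source", qualifiers={"organism": ["Toy virus"]}),
        SimpleNamespace(type="CDS", qualifiers={}),
    ],
)
for column, value in record_summary(toy).items():
    print(f"{column}: {value}")
```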

### **Open a multi-record GenBank file using a function**

In [26]:
from Bio import SeqIO

# Open a GenBank file (one or more records) and print the required attributes
def open_genbank_file(file_path):
    with open(file_path, "r") as handle:
        for seq_record in SeqIO.parse(handle, "genbank"):
            # Count the CDS features (0 if there are none)
            cds_count = len([feat for feat in seq_record.features if feat.type == "CDS"])

            # Count the references (0 if there are none)
            reference_number = len(seq_record.annotations.get("references", []))

            # Take the organism qualifier of the "source" feature, if present
            source = next((feat.qualifiers.get("organism", ["Unknown"])[0] for feat in seq_record.features if feat.type == "source"), "Unknown")

            # Extract the submission year from the "date" annotation
            submission_date = seq_record.annotations.get("date", "Unknown")
            submission_year = submission_date.split('-')[-1] if submission_date != "Unknown" else "Unknown"

            # Print the requested information
            print(f"Accession ID: {seq_record.id}")
            print(f"Organism: {seq_record.annotations.get('organism', 'Unknown')}")
            print(f"Source: {source}")
            print(f"Submission Year: {submission_year}")
            print(f"Sequence Length: {len(seq_record.seq)}")
            print(f"CDS Number: {cds_count}")
            print(f"Sequence: {seq_record.seq[:50]}...")  # Print the first 50 bases of the sequence
            print(f"Reference Number: {reference_number}")

            # Print features
            print("\nFeatures:")
            for feature in seq_record.features:
                print(f" - {feature.type} at location {feature.location}")
            print("-" * 40)

# Example usage
file_path = "/content/121450036_Rika Ajeng Finatih.gb"  # Replace with the path to your GenBank file
open_genbank_file(file_path)


Accession ID: NC_045512.2
Organism: Severe acute respiratory syndrome coronavirus 2
Source: Severe acute respiratory syndrome coronavirus 2
Submission Year: 2020
Sequence Length: 29903
CDS Number: 12
Sequence: ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTC...
Reference Number: 6

Features:
 - source at location [0:29903](+)
 - 5'UTR at location [0:265](+)
 - gene at location [265:21555](+)
 - CDS at location join{[265:13468](+), [13467:21555](+)}
 - mat_peptide at location [265:805](+)
 - mat_peptide at location [805:2719](+)
 - mat_peptide at location [2719:8554](+)
 - mat_peptide at location [8554:10054](+)
 - mat_peptide at location [10054:10972](+)
 - mat_peptide at location [10972:11842](+)
 - mat_peptide at location [11842:12091](+)
 - mat_peptide at location [12091:12685](+)
 - mat_peptide at location [12685:13024](+)
 - mat_peptide at location [13024:13441](+)
 - mat_peptide at location join{[13441:13468](+), [13467:16236](+)}
 - mat_peptide at location [16236:18039](+)
 - 

### **Save to a DataFrame**

In [28]:
from Bio import SeqIO
import pandas as pd
from datetime import datetime

# Parse a GenBank file and extract the relevant data
def genbank_to_dataframe_by_year(file_path):
    # Empty list that collects one dict per record
    data = []

    # Read the GenBank file (may contain multiple records)
    with open(file_path, "r") as handle:
        for seq_record in SeqIO.parse(handle, "genbank"):
            # Extract the sequence metadata
            accession = seq_record.id
            organism = seq_record.annotations.get("organism", "Unknown")

            # Take the organism qualifier of the "source" feature
            source = next((feat.qualifiers.get("organism", ["Unknown"])[0] for feat in seq_record.features if feat.type == "source"), "Unknown")

            sequence_length = len(seq_record.seq)
            sequence = str(seq_record.seq)

            # Extract the submission year (from the "date" annotation)
            if "date" in seq_record.annotations:
                date_str = seq_record.annotations["date"]
                try:
                    submission_year = datetime.strptime(date_str, "%d-%b-%Y").year
                except ValueError:
                    submission_year = "Unknown"  # If date parsing fails
            else:
                submission_year = "Unknown"

            # Count the CDS features
            cds_count = len([feat for feat in seq_record.features if feat.type == "CDS"])

            # Count the references
            reference_number = len(seq_record.annotations.get("references", []))

            # Append this record's data
            data.append({
                "Accession ID": accession,
                "Organism": organism,
                "Source": source,
                "Submission Year": submission_year,
                "Sequence Length": sequence_length,
                "CDS Number": cds_count,
                "Sequence": sequence,
                "Reference Number": reference_number
            })

    # Build a DataFrame from the collected data
    df = pd.DataFrame(data)

    # Sort by year. The key converts "Unknown" to NaN, so a column mixing
    # int years and "Unknown" does not raise a TypeError; those rows go last.
    df = df.sort_values(
        by="Submission Year",
        key=lambda col: pd.to_numeric(col, errors="coerce"),
        na_position="last",
    )

    return df

# Example usage
file_path = "/content/121450036_Rika Ajeng Finatih.gb"  # Replace with the path to your GenBank file
df = genbank_to_dataframe_by_year(file_path)

# Display the DataFrame
print(df)


  Accession ID                                         Organism  \
0  NC_045512.2  Severe acute respiratory syndrome coronavirus 2   

                                            Source  Submission Year  \
0  Severe acute respiratory syndrome coronavirus 2             2020   

   Sequence Length  CDS Number  \
0            29903          12   

                                            Sequence  Reference Number  
0  ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGA...                 6  


In [29]:
df

|   | Accession ID | Organism | Source | Submission Year | Sequence Length | CDS Number | Sequence | Reference Number |
|---|---|---|---|---|---|---|---|---|
| 0 | NC_045512.2 | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 2020 | 29903 | 12 | ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGA... | 6 |


In [31]:
# Save the DataFrame to a CSV file
df.to_csv("genbank_data_for.csv", index=False)
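The assignment asks for the records of all variants in one DataFrame, while `genbank_to_dataframe_by_year` handles a single file. A hedged sketch of the combining step applies a per-file parser to each path and stacks the results with `pd.concat`. The name `combine_variant_files`, the de-duplication on `Accession ID`, and the `/content/*.gb` pattern and `all_variants.csv` name in the usage comments are assumptions, not part of the original notebook.

```python
import pandas as pd


def combine_variant_files(paths, parse_file):
    """Hypothetical helper: run parse_file on every path and stack the DataFrames."""
    frames = [parse_file(path) for path in paths]
    if not frames:
        return pd.DataFrame()  # no input files
    combined = pd.concat(frames, ignore_index=True)
    # The same accession may appear in several variant files; keep its first occurrence
    return combined.drop_duplicates(subset="Accession ID", ignore_index=True)


# Usage in the notebook (assumes one .gb file per variant under /content):
# import glob
# all_variants = combine_variant_files(sorted(glob.glob("/content/*.gb")), genbank_to_dataframe_by_year)
# all_variants.to_csv("all_variants.csv", index=False)
```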