In [1]:
!pip install -q biopython

**Accessing NCBI Databases from a Colab notebook**

NCBI databases (e.g., PubMed, GenBank) can be accessed through NCBI's data retrieval system, Entrez.

When using Entrez, please observe the following guidelines:


*   Provide your email address to NCBI
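*   Make no more than three requests per second (ten if you supply an NCBI API key)
*   For series of more than 100 requests, run them on weekends or outside US peak hours (NCBI specifies 9 pm to 5 am Eastern Time on weekdays)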
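
The request-rate guideline above can be sketched as a small throttle. Biopython's Entrez module is documented as pacing its own requests, so this is likely only needed for code that calls NCBI by other means. `Throttle` and `fetch_stub` are hypothetical names used for illustration and are not part of Biopython; the accession IDs are placeholders.

```python
import time


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, max_per_second=3):
        self.min_interval = 1.0 / max_per_second
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self):
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


def fetch_stub(accession):
    # Hypothetical stand-in for a real request such as Entrez.efetch(...)
    return f"fetched {accession}"


throttle = Throttle(max_per_second=3)
results = []
for accession in ["ID_1", "ID_2", "ID_3", "ID_4"]:  # placeholder IDs
    throttle.wait()  # sleeps only if the previous call was too recent
    results.append(fetch_stub(accession))

print(results)
```

Replacing `fetch_stub` with a real request keeps the loop under three calls per second without changing anything else.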

In [2]:
from Bio import Entrez

Entrez.email = input("Enter your e-mail address: ")

Enter your e-mail address: jacquelinekgrimm@gmail.com


In [3]:
# Download a genome using an accession number
from Bio import SeqIO

accession = input("Enter an accession number: ")
handle = Entrez.efetch(db="nucleotide", id=accession, rettype="fasta", retmode="text")
record = SeqIO.read(handle, "fasta")
handle.close()

Enter an accession number: CP102931


In [4]:
# Print the first 99 bases of the genome
first_99 = record.seq[:99]
print(first_99)

CCGATGAACTCCCGAGAATTTTCAATCGGACTGACACGCGGCACTGCCACGATGAACGGGTTGACTGCCTTTGATGTTCTTCGTTGTTCTTGGAGTGGA


In [5]:
# Count the number of each base in the genome
number_a = record.seq.count("A")
number_t = record.seq.count("T")
number_g = record.seq.count("G")
number_c = record.seq.count("C")

print(f"A: {number_a}")
print(f"T: {number_t}")
print(f"G: {number_g}")
print(f"C: {number_c}")

# Calculate the GC content
gc_content = (number_g + number_c) / len(record.seq) * 100
print(f"GC content: {gc_content:.2f}%")

A: 1233586
T: 1234679
G: 2412254
C: 2414223
GC content: 66.16%
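
The cell above scans the sequence once per base and matches uppercase letters only. A possible alternative, sketched here, counts every symbol in one pass with `collections.Counter` and ignores letter case. `base_composition` is a hypothetical helper, and dividing by the A/C/G/T total rather than the full length is an assumption of this sketch, so results can differ slightly from the cell above when the sequence contains ambiguous bases such as N.

```python
from collections import Counter


def base_composition(sequence):
    """Return (counts, gc_percent) for a DNA string, ignoring letter case."""
    counts = Counter(str(sequence).upper())
    # Assumption: only unambiguous bases go into the denominator
    acgt_total = sum(counts[base] for base in "ACGT")
    if acgt_total == 0:
        return counts, 0.0
    gc_percent = (counts["G"] + counts["C"]) / acgt_total * 100
    return counts, gc_percent


# Toy input with lowercase and ambiguous bases
counts, gc_percent = base_composition("ccgATGAACNNtcc")
print(dict(counts))
print(f"GC content: {gc_percent:.2f}%")
```

In the notebook, `base_composition(record.seq)` would apply the same logic to the downloaded genome.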


**Find the Complement/Reverse Complement of a DNA sequence**

To find the complement or reverse complement of a DNA string, you can create a Seq object, which has the methods "complement()" and "reverse_complement()".

In [6]:
from Bio.Seq import Seq

# Create a Seq object
my_seq = Seq(first_99)

print(f"DNA Sequence: {first_99}")
print(f"Complement: {my_seq.complement()}")
print(f"Reverse Complement: {my_seq.reverse_complement()}")

DNA Sequence: CCGATGAACTCCCGAGAATTTTCAATCGGACTGACACGCGGCACTGCCACGATGAACGGGTTGACTGCCTTTGATGTTCTTCGTTGTTCTTGGAGTGGA
Complement: GGCTACTTGAGGGCTCTTAAAAGTTAGCCTGACTGTGCGCCGTGACGGTGCTACTTGCCCAACTGACGGAAACTACAAGAAGCAACAAGAACCTCACCT
Reverse Complement: TCCACTCCAAGAACAACGAAGAACATCAAAGGCAGTCAACCCGTTCATCGTGGCAGTGCCGCGTGTCAGTCCGATTGAAAATTCTCGGGAGTTCATCGG
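
To see what these two methods compute, here is a plain-Python sketch that does not use Biopython. It handles only the unambiguous bases A, C, G and T; `COMPLEMENT_TABLE`, `complement` and `reverse_complement` are names chosen for this illustration, not Biopython's implementation.

```python
# Map each base to its Watson-Crick partner (unambiguous bases only)
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Swap every base for its partner, keeping the original order."""
    return dna.translate(COMPLEMENT_TABLE)


def reverse_complement(dna):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(dna)[::-1]


example = "CCGATGAACT"  # first ten bases of the sequence used above
print(complement(example))
print(reverse_complement(example))
```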


**Translate a DNA or RNA sequence**

You can use Biopython's translate() function to translate DNA or RNA strings:

translate(sequence, table='Standard', stop_symbol='*', to_stop=False)

*   sequence: the DNA/RNA string
*   table: the codon table (see full list [here](https://www.bioinformatics.org/JaMBW/2/3/TranslationTables.html))
*   stop_symbol: symbol indicating a terminator
*   to_stop: if True, stops translating when it hits the first stop codon
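
The effect of `stop_symbol` and `to_stop` can be illustrated with a plain-Python sketch. This is not Biopython's code: `translate_sketch` is a hypothetical function, it supports only the standard table, it ignores any trailing partial codon, and it raises a `KeyError` on ambiguous bases.

```python
from itertools import product

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(sequence, stop_symbol="*", to_stop=False):
    """Translate a DNA/RNA string codon by codon with the standard table."""
    dna = sequence.upper().replace("U", "T")  # treat RNA like DNA
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored here
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            if to_stop:
                break  # stop before the first terminator
            amino_acid = stop_symbol
        protein.append(amino_acid)
    return "".join(protein)


print(translate_sketch("ATGAAATAGGGC"))                   # stop codon shown as '*'
print(translate_sketch("ATGAAATAGGGC", to_stop=True))     # translation ends at the stop
print(translate_sketch("AUGAAAUAGGGC", stop_symbol="@"))  # RNA input, custom symbol
```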

In [7]:
from Bio.Seq import translate

print(f"DNA Sequence: {first_99}")

translation = translate(first_99)
print(f"Translation: {translation}")

DNA Sequence: CCGATGAACTCCCGAGAATTTTCAATCGGACTGACACGCGGCACTGCCACGATGAACGGGTTGACTGCCTTTGATGTTCTTCGTTGTTCTTGGAGTGGA
Translation: PMNSREFSIGLTRGTATMNGLTAFDVLRCSWSG


**Saving a FASTA file to Google Drive**

To save a FASTA file to Google Drive, you can temporarily save a version locally, mount Google Drive, and then transfer the file to the desired directory.

In [8]:
# Temporarily save the file locally
with open("genome.fasta", "w") as output_handle:
    SeqIO.write(record, output_handle, "fasta")

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [9]:
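# Move the file to Google Drive
import os
import shutil

# Specify where you'd like to save the file
path = '/content/drive/My Drive/fasta/genome.fasta'

# Create the destination folder if it doesn't exist yet, then move the file
os.makedirs(os.path.dirname(path), exist_ok=True)
shutil.move("genome.fasta", path)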

'/content/drive/My Drive/fasta/genome.fasta'
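
After the move, you may want to confirm that the file arrived intact. The sketch below counts FASTA header lines with plain Python; `count_fasta_records` is a hypothetical helper, and the demo writes a throwaway file so it runs without Google Drive mounted.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Demonstrate on a throwaway file; in the notebook, pass the Drive path instead
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.fasta")
    with open(demo_path, "w") as handle:
        handle.write(">demo_record example\nCCGATGAACT\n")
    file_exists = os.path.exists(demo_path)
    n_records = count_fasta_records(demo_path)

print(file_exists, n_records)
```

For the genome saved above, calling `count_fasta_records(path)` should report a single record, since one sequence was written.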