# 🔎 Accessing NCBI Databases with Biopython

This notebook demonstrates how to use Biopython's `Entrez` module to access biological data from the **NCBI (National Center for Biotechnology Information)**.

We'll:
- Explore available databases
- Search for the **BRCA1** gene in **Homo sapiens**
- Fetch nucleotide sequence records
- Save results locally to avoid repeated querying


## 📡 Step 1: Setup Entrez and Required Modules

In [None]:
from Bio import Entrez, SeqIO
Entrez.email = "your.email@example.com"  # Replace with your real email (NCBI asks for a contact address)
# Entrez.api_key = "<YOUR API KEY HERE>"  # Optional: raises the limit from 3 to 10 requests per second
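
Hard-coding an email or API key in a shared notebook makes it easy to leak or forget to change. A minimal sketch of one alternative, assuming the credentials live in environment variables with the hypothetical names `NCBI_EMAIL` and `NCBI_API_KEY` (neither name is defined by Biopython or NCBI):

```python
import os


def entrez_settings(env=None):
    """Collect Entrez settings from environment variables.

    NCBI_EMAIL and NCBI_API_KEY are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    email = env.get("NCBI_EMAIL", "").strip()
    if not email:
        raise ValueError("Set NCBI_EMAIL to a real contact address")
    settings = {"email": email}
    api_key = env.get("NCBI_API_KEY", "").strip()
    if api_key:
        settings["api_key"] = api_key  # Optional; omitted when not set
    return settings


# Usage in the notebook (not run here):
#   for name, value in entrez_settings().items():
#       setattr(Entrez, name, value)  # sets Entrez.email / Entrez.api_key
```

The helper only builds a dictionary, so it can be checked without network access before the values are applied to `Entrez`.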

## 📚 Step 2: View Available NCBI Databases

In [None]:
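handle = Entrez.einfo()
record = Entrez.read(handle)
handle.close()  # Release the connection once the response is parsed
databases = record["DbList"]
print("Total databases:", len(databases))
print(databases[:10], "...")  # Show only the first 10 names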
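
Calling `Entrez.einfo` with a `db` argument returns details about a single database, including the search fields used later in query terms such as `[Gene Name]` and `[Organism]`. The sketch below is an illustration rather than part of the original notebook: `field_table` and `fetch_db_info` are hypothetical helper names, and the `sample` dictionary is a hand-written stand-in for a parsed response, not real NCBI output.

```python
def field_table(db_info):
    """Return (Name, FullName) pairs from a parsed EInfo 'DbInfo' mapping."""
    return [(f["Name"], f["FullName"]) for f in db_info.get("FieldList", [])]


def fetch_db_info(db):
    """Ask EInfo about one database. Needs network access and Entrez.email."""
    from Bio import Entrez  # Imported here so field_table works without Biopython

    handle = Entrez.einfo(db=db)
    try:
        return Entrez.read(handle)["DbInfo"]
    finally:
        handle.close()


# Hand-written stand-in that mimics the shape of a parsed response:
sample = {
    "DbName": "nucleotide",
    "FieldList": [
        {"Name": "GENE", "FullName": "Gene Name"},
        {"Name": "ORGN", "FullName": "Organism"},
    ],
}
for name, full_name in field_table(sample):
    print(f"{name:6} {full_name}")
```

With network access, `field_table(fetch_db_info("nucleotide"))` would list the fields the live server reports.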

## 🔍 Step 3: Search for BRCA1 Gene in Humans
We'll use the **nucleotide** database to search for BRCA1 gene sequences.

In [None]:
search_handle = Entrez.esearch(db="nucleotide", term="BRCA1[Gene Name] AND Homo sapiens[Organism]", retmax=20)
search_results = Entrez.read(search_handle)
search_handle.close()
id_list = search_results["IdList"]  # At most retmax (20) IDs are returned
print("Found IDs:", id_list)
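
The search above returns only the first 20 IDs, while the parsed result also carries the total hit count under `"Count"`. One way to walk through more hits is to repeat the search with increasing `retstart` offsets. This is a sketch under assumptions: `page_starts` and `search_all_ids` are hypothetical helpers, and the `limit` default of 100 is an arbitrary cap chosen for illustration.

```python
def page_starts(total, page_size):
    """Return the retstart offsets needed to cover `total` hits."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total, page_size))


def search_all_ids(term, db="nucleotide", page_size=20, limit=100):
    """Collect up to `limit` IDs page by page. Needs network access."""
    from Bio import Entrez  # Imported here so page_starts works without Biopython

    def one_page(start):
        handle = Entrez.esearch(db=db, term=term, retstart=start, retmax=page_size)
        try:
            return Entrez.read(handle)
        finally:
            handle.close()

    first = one_page(0)
    total = min(int(first["Count"]), limit)  # "Count" is parsed as a string
    ids = list(first["IdList"])
    for start in page_starts(total, page_size)[1:]:
        ids.extend(one_page(start)["IdList"])
    return ids[:limit]


print(page_starts(45, 20))  # [0, 20, 40]
```

Keeping the offset arithmetic in its own function makes it checkable offline; only `search_all_ids` contacts NCBI.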

## 📥 Step 4: Fetch and Save Records
We fetch the GenBank records and save them to a local file to avoid hitting the NCBI server repeatedly.

In [None]:
import os

out_path = "../data/brca1_records.gb"
if id_list:
    os.makedirs(os.path.dirname(out_path), exist_ok=True)  # Create ../data if it is missing
    fetch_handle = Entrez.efetch(db="nucleotide", id=",".join(id_list), rettype="gb", retmode="text")
    with open(out_path, "w") as out_handle:
        out_handle.write(fetch_handle.read())
    fetch_handle.close()
    print(f"Saved fetched records to {out_path}")
else:
    print("No records found.")
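
Saving the file only helps if later runs skip the download when the file is already there. A minimal sketch of that check, assuming a hypothetical helper `fetch_once` that takes the download step as a callable, so the caching logic can be exercised without contacting NCBI:

```python
import os


def fetch_once(path, fetch_text):
    """Write fetch_text() to `path` unless the file already exists.

    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.exists(path):
        return False
    text = fetch_text()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as out_handle:
        out_handle.write(text)
    return True


def fetch_genbank(ids):
    """Download GenBank text for the given IDs. Needs network access."""
    from Bio import Entrez  # Imported here so fetch_once works without Biopython

    handle = Entrez.efetch(db="nucleotide", id=",".join(ids), rettype="gb", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


# Usage in the notebook (not run here):
#   fetch_once("../data/brca1_records.gb", lambda: fetch_genbank(id_list))
```

Deleting the cached file forces a fresh download on the next run.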

## 📖 Step 5: Read and Inspect Records
Use `SeqIO` to parse the saved file and print sequence info.

In [None]:
records = list(SeqIO.parse("../data/brca1_records.gb", "gb"))
print(f"Loaded {len(records)} records.")
for rec in records:
    print(f"\nID: {rec.id}")
    print(f"Description: {rec.description}")
    print(f"Length: {len(rec.seq)} bp")
    print(f"First 100 bases: {rec.seq[:100]}")
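
When many records are loaded, a compact summary can be easier to scan than the per-record printout. The sketch below assumes a hypothetical helper `summarize` that relies only on the `id`, `description`, and `seq` attributes used above. The `demo` objects are stand-ins with made-up values, not real GenBank records.

```python
from types import SimpleNamespace


def summarize(records, min_length=0):
    """Return one dict per record with at least `min_length` bases, longest first."""
    rows = [
        {"id": rec.id, "length": len(rec.seq), "description": rec.description}
        for rec in records
        if len(rec.seq) >= min_length
    ]
    return sorted(rows, key=lambda row: row["length"], reverse=True)


# Stand-in objects exposing the same attribute names as the parsed records:
demo = [
    SimpleNamespace(id="rec1", description="demo A", seq="ACGT"),
    SimpleNamespace(id="rec2", description="demo B", seq="ACGTACGT"),
]
for row in summarize(demo):
    print(row["id"], row["length"])
```

In the notebook, `summarize(records, min_length=1000)` would keep only the longer entries from the saved file.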