# Putative Human – Mouse BRCA1 Orthologs  
Write a program using NCBI's E-Utilities to retrieve the ids of RefSeq human BRCA1 proteins from NCBI. 
* Use the query: "Homo sapiens"[Organism] AND BRCA1[Gene Name] AND REFSEQ
* Extend your program to search these protein ids (one at a time) vs RefSeq proteins (refseq_protein) using the NCBI blast web-service.
* Further extend your program to filter the results for significance (E-value < 1.0e-5) and to extract mouse sequences (match "Mus musculus" in the description). Note: you may need to do one of the following:
  * Request at least 200 alignments from qblast to see the first mouse protein (keyword parameter hitlist_size, default is 50), or
  * __Restrict the qblast search to mouse RefSeq proteins (keyword parameter entrez_query)__
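
The filtering step described above can be sketched independently of the network calls. This is a minimal illustration, not the assignment's reference solution: `filter_mouse_hits` and the `Hit` namedtuple are hypothetical names, and `Hit` only stands in for any object exposing a `title` string and an `e` E-value, such as the descriptions yielded by Biopython's BLAST XML parser.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed BLAST description:
# only the .title and .e attributes are used below.
Hit = namedtuple("Hit", ["title", "e"])


def filter_mouse_hits(descriptions, max_evalue=1.0e-5, organism="Mus musculus"):
    """Return the hits that pass the E-value cutoff and name the organism."""
    return [
        d for d in descriptions
        if d.e < max_evalue and organism in d.title
    ]
```

Keeping the cutoff and the organism string as parameters makes the same function usable whether the search was restricted with entrez_query or widened with hitlist_size.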

Need to use the [Biopython guide](http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc91)

Questions
* extend to refseq: search both gb and refseq? or search genes then proteins all in refseq?

In [6]:
from Bio import Entrez, SeqIO
from Bio.Blast import NCBIWWW, NCBIXML
import os.path

Entrez.email = 'michael.chambers2@nih.gov'

handle = Entrez.esearch(
    db='protein',
    term='"Homo sapiens"[Organism] AND BRCA1[Gene Name] AND REFSEQ',
    usehistory='y'
)

result = Entrez.read(handle)
handle.close()

id_list = ','.join(result['IdList'])
# Fetch the matching records as GenBank-format text
handle = Entrez.efetch(
    db='protein',
    id=id_list,
    rettype='gb',
    retmode='text'
)

for gi,r in zip(result['IdList'], SeqIO.parse(handle, 'genbank')):
    print(f'\n*** START: {gi} ***\n')
    print('GI:', gi)
    print('Accession:', r.id)
    print('Description:', r.description)

    print(f"\nBLAST for GI {gi}...\n")
    result_handle = NCBIWWW.qblast(
        'blastp',
        'refseq_protein',
        gi,
        expect=1e-5,
        entrez_query='"Mus musculus"[Organism]'
    )
    
    # The handle can only be read once, so save the XML first
    # and then parse the saved copy.
    file = f'blastp-np-{gi}.xml'
    with open(file, 'w') as save_file:
        save_file.write(result_handle.read())
    result_handle.close()

    # Report significant mouse hits (E-value < 1.0e-5)
    with open(file) as saved_handle:
        for blast_result in NCBIXML.parse(saved_handle):
            for desc in blast_result.descriptions:
                if desc.e < 1e-5 and 'Mus musculus' in desc.title:
                    print('***Alignment***')
                    print('Sequence:', desc.title)
                    print('Evalue:', desc.e)
                    print()
    print(f'*** END {gi} ***')
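
Once the BLAST output has been saved, the hits can be re-examined without submitting the search again. The sketch below uses only the standard library and assumes that the saved `blastp-np-{gi}.xml` files hold the complete output in NCBI's BLAST XML format, with `Hit`, `Hit_def`, `Hit_accession` and `Hsp_evalue` elements. The function name `mouse_hits_from_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def mouse_hits_from_xml(xml_text, max_evalue=1.0e-5, organism="Mus musculus"):
    """Return (accession, definition, best E-value) for significant organism hits."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        definition = hit.findtext("Hit_def", default="")
        accession = hit.findtext("Hit_accession", default="")
        # A hit can have several HSPs; keep the smallest E-value.
        evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
        if not evalues:
            continue
        best = min(evalues)
        if best < max_evalue and organism in definition:
            hits.append((accession, definition, best))
    return hits
```

To use it, read one of the saved files into a string, for example with `open(file).read()`, and pass the text to the function.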