## Analysis of human mitochondrial genome accession records downloaded from Mitobank (genbank_ids.txt) and from the NCBI Nucleotide database using the query:
```Homo sapiens[ORGN] AND complete genome[TITLE] AND mitochondrion[FILT] AND 015400:016700[SLEN] NOT (unverified OR Homo sp. Altai OR Denisova hominin OR neanderthalensis OR heidelbergensis OR consensus)```

### Reading accession records downloaded from the Nucleotide database

In [1]:
# Open the file in read mode
with open('homosapian_nucleotide_accession_list.txt', 'r') as file:
    # Read all lines and split them based on newline
    accession_content = file.read()
    accession_record_list_from_query = accession_content.splitlines()


print(f"Total count: {len(accession_record_list_from_query)}")
print(f"Few sample records: {accession_record_list_from_query[:10]}")

Total count: 62167
Few sample records: ['PQ963025.1', 'PQ868607.1', 'PQ868590.1', 'KF040496.1', 'PQ727057.1', 'PQ631132.1', 'MZ895063.1', 'MZ895062.1', 'MZ895061.1', 'MZ895060.1']


When the `esearch` query was executed, it reported 62173 records, but the fetched accession list contains only 62167, so 6 records are missing.

- **Script**:
```
esearch -db nucleotide -query "\"Homo sapiens\"[ORGN] \
    AND \"complete genome\"[TITLE] AND mitochondrion[FILT] AND 015400:016700[SLEN] \
    NOT (unverified OR \"Homo sp. Altai\" OR \"Denisova hominin\" OR neanderthalensis OR heidelbergensis OR consensus)"
```

- **Result**: 
```
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_67a439bc3fec16fa56052504</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>62173</Count>
  <Step>1</Step>
  <Elapsed>1</Elapsed>
</ENTREZ_DIRECT>
```
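To make the discrepancy check reproducible instead of comparing numbers by eye, a minimal sketch can parse the `<Count>` field out of the EDirect summary and subtract the length of the fetched list. The helper names `reported_count` and `missing_records` are hypothetical, and the XML string is simply the result shown above pasted in.

```python
import xml.etree.ElementTree as ET

# ENTREZ_DIRECT summary returned by the esearch command above (copied verbatim)
esearch_xml = """<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_67a439bc3fec16fa56052504</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>62173</Count>
  <Step>1</Step>
  <Elapsed>1</Elapsed>
</ENTREZ_DIRECT>"""


def reported_count(xml_text: str) -> int:
    """Return the <Count> value from an EDirect esearch summary."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def missing_records(xml_text: str, fetched_ids: list) -> int:
    """Number of records esearch reported minus the number of IDs actually fetched."""
    return reported_count(xml_text) - len(fetched_ids)
```

With the lists in this notebook, `missing_records(esearch_xml, accession_record_list_from_query)` evaluates to 62173 - 62167 = 6.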

### Reading accession IDs downloaded from Mitobank

In [2]:
# Open the file in read mode
with open('genbank_ids.txt', 'r') as file:
    # Read all lines and split them based on newline
    genbank_content = file.read()
    accession_record_list_from_mitobank = genbank_content.splitlines()

# Remove the column header line ("genbank_id") from the Mitobank export
item_to_remove = "genbank_id"
if item_to_remove in accession_record_list_from_mitobank:
    accession_record_list_from_mitobank.remove(item_to_remove)

print(f"Total count: {len(accession_record_list_from_mitobank)}")
print(f"Few sample records: {accession_record_list_from_mitobank[:10]}")

Total count: 61844
Few sample records: ['AB055387.1', 'AB626609.1', 'AB626610.1', 'AF346963.1', 'AF346964.1', 'AF346965.1', 'AF346966.1', 'AF346967.1', 'AF346968.1', 'AF346969.1']
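The comparisons below treat each accession ID as appearing once per file, so it may be worth checking for blank lines and duplicates while loading. A minimal sketch, assuming one ID per line and that the header (here the literal `genbank_id` used above) is the first entry; `read_accession_ids` is a hypothetical helper, and unlike the cell above it only drops the header when it is the first line.

```python
from collections import Counter
from pathlib import Path


def read_accession_ids(path, header=None):
    """Read one accession ID per line, dropping blank lines and an optional header.

    Returns (ids, duplicates), where duplicates maps each repeated ID to its count.
    """
    lines = Path(path).read_text().splitlines()
    # Strip surrounding whitespace and skip empty lines
    ids = [line.strip() for line in lines if line.strip()]
    # Drop the header only if it is actually the first entry
    if header is not None and ids and ids[0] == header:
        ids = ids[1:]
    counts = Counter(ids)
    duplicates = {acc: n for acc, n in counts.items() if n > 1}
    return ids, duplicates
```

For example, `read_accession_ids('genbank_ids.txt', header='genbank_id')`; an empty `duplicates` dict means list lengths and set sizes will agree.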


### Common records in both **genbank_ids.txt** and **homosapian_nucleotide_accession_list.txt**

In [3]:
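# Convert the query list to a set once: `in` on a set is O(1), whereas `in` on a list rescans all ~62,000 IDs
query_id_set = set(accession_record_list_from_query)
common_accession_ids_in_mitobank_and_query = [item for item in accession_record_list_from_mitobank if item in query_id_set]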

In [4]:
print(f"Total count of accession IDs present in both GenBank (Mitobank) and the extracted records:\n{len(common_accession_ids_in_mitobank_and_query)}")


Total count of accession IDs present in both GenBank (Mitobank) and the extracted records:
60676
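The three comparisons in this notebook can also be expressed directly as set operations, which adds a built-in consistency check: each source must equal its shared part plus its unique part. With the counts reported here, 61844 = 60676 + 1168 and 62167 = 60676 + 1491, which holds as long as neither file contains duplicate IDs. `compare_accessions` is a hypothetical helper, sketched as an alternative to the list comprehensions.

```python
def compare_accessions(mitobank_ids, query_ids):
    """Split two accession lists into shared and one-sided sets.

    Set membership is O(1), so this scales to ~62,000 IDs per side,
    unlike `item in list`, which rescans the list for every item.
    """
    mitobank_set, query_set = set(mitobank_ids), set(query_ids)
    common = mitobank_set & query_set
    mitobank_only = mitobank_set - query_set
    query_only = query_set - mitobank_set
    # Each side must be exactly its shared part plus its unique part
    assert len(mitobank_set) == len(common) + len(mitobank_only)
    assert len(query_set) == len(common) + len(query_only)
    return common, mitobank_only, query_only
```

Sets do not preserve order, so wrap the results in `sorted(...)` before printing them.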


### Accession IDs present in GenBank (Mitobank) but not in the records extracted from the Nucleotide DB using the query

In [5]:
# Set lookup keeps this fast; the list comprehension still preserves Mitobank's order
query_ids = set(accession_record_list_from_query)
accession_ids_in_genbank_only = [item for item in accession_record_list_from_mitobank if item not in query_ids]

print(f"Total count of accession IDs present in GenBank (Mitobank) but not in the extracted records:\n {len(accession_ids_in_genbank_only)}")
print(f"Records unique to GenBank (Mitobank):\n {accession_ids_in_genbank_only}")

Total count of accession IDs present in GenBank (Mitobank) but not in the extracted records:
 1168
Records unique to GenBank (Mitobank):
 ['AJ842744.1', 'AJ842745.1', 'AJ842746.1', 'AJ842747.1', 'AJ842748.1', 'AJ842749.1', 'AJ842750.1', 'AJ842751.1', 'AM260558.1', 'AM260559.1', 'AM260560.1', 'AM260561.1', 'AM260562.1', 'AM260563.1', 'AM260564.1', 'AM260565.1', 'AM260566.1', 'AM260567.1', 'AM260568.1', 'AM260569.1', 'AM260570.1', 'AM260571.1', 'AM260572.1', 'AM260573.1', 'AM260574.1', 'AM260575.1', 'AM260576.1', 'AM260577.1', 'AM260578.1', 'AM260579.1', 'AM260580.1', 'AM260581.1', 'AM260582.1', 'AM260583.1', 'AM260584.1', 'AM260585.1', 'AM260586.1', 'AM260587.1', 'AM260588.1', 'AM260589.1', 'AM260590.1', 'AM260591.1', 'AM260592.1', 'AM260593.1', 'AM260594.1', 'AM260595.1', 'AM260596.1', 'AM260597.1', 'AM260598.1', 'AM260599.1', 'AM260600.1', 'AM260601.1', 'AM260602.2', 'AM260603.1', 'AM260604.1', 'AM260605.1', 'AM260606.1', 'AM260607.1', 'AM260608.1', 'AM260609.1', 'AM260610.1', 'AM260611.1',
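One hypothesis worth testing, which this notebook does not establish, is that some one-sided IDs differ only in their version suffix (the Mitobank-only list already contains a `.2` version, `AM260602.2`). A minimal sketch compares accessions by their base number with the version stripped; `split_version` and `version_mismatches` are hypothetical helpers.

```python
def split_version(accession):
    """Split 'AM260602.2' into ('AM260602', 2); IDs without a numeric version get None."""
    base, sep, version = accession.partition(".")
    return base, (int(version) if sep and version.isdigit() else None)


def version_mismatches(one_sided_ids, other_ids):
    """Map each one-sided ID to the other list's IDs that share its base accession."""
    other_by_base = {}
    for acc in other_ids:
        other_by_base.setdefault(split_version(acc)[0], []).append(acc)
    return {
        acc: other_by_base[split_version(acc)[0]]
        for acc in one_sided_ids
        if split_version(acc)[0] in other_by_base
    }
```

For example, `version_mismatches(accession_ids_in_genbank_only, accession_record_list_from_query)`; an empty result would rule out version differences as the cause.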

### Accession IDs present in the records extracted from the Nucleotide DB using the query but not in GenBank (Mitobank)

In [6]:
# Set lookup keeps this fast; the list comprehension still preserves the query order
mitobank_ids = set(accession_record_list_from_mitobank)
accession_ids_in_extracted_accession_only = [item for item in accession_record_list_from_query if item not in mitobank_ids]

In [10]:
print(f"Total count of accession IDs present in the extracted records but not in GenBank (Mitobank):\n {len(accession_ids_in_extracted_accession_only)}")
print(f"Records unique to the query-extracted records:\n {sorted(accession_ids_in_extracted_accession_only)}")

Total count of accession IDs present in the extracted records but not in GenBank (Mitobank):
 1491
Records unique to the query-extracted records:
 ['CP068254.1', 'EF657579.1', 'EF657704.1', 'EF660970.1', 'EF660972.1', 'EF660976.1', 'EF660979.1', 'EF660980.1', 'EF660984.1', 'EF660988.1', 'EF660989.1', 'EF660990.1', 'EF660991.1', 'EF660996.1', 'EF660997.1', 'EF660998.1', 'EF660999.1', 'EF661001.1', 'EF661002.1', 'EF661012.1', 'EF661013.1', 'EU170362.1', 'EU370391.1', 'EU370392.1', 'EU370393.1', 'EU370394.1', 'EU370395.1', 'EU370396.1', 'EU370397.1', 'EU597486.1', 'EU597489.1', 'EU597490.1', 'EU597491.1', 'EU597492.1', 'EU597493.1', 'EU597494.1', 'EU597495.1', 'EU597496.1', 'EU597497.1', 'EU597498.1', 'EU597501.1', 'EU597502.1', 'EU597503.1', 'EU597504.1', 'EU597505.1', 'EU597506.1', 'EU597507.1', 'EU597508.1', 'EU597509.1', 'EU597510.1', 'EU597513.1', 'EU597514.1', 'EU597515.1', 'EU597516.1', 'EU597517.1', 'EU597518.1', 'EU597519.1', 'EU597520.1', 'EU597521.1', 'EU597525.1', 'EU597526.
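Both one-sided lists are too long to inspect by eye. A coarse first summary is to count accessions by their leading letters, since records deposited together often share a prefix and a consecutive number range (for example the `EU5974…` run above). A minimal sketch; `prefix_counts` is a hypothetical helper.

```python
import re
from collections import Counter


def prefix_counts(accessions):
    """Count accessions by their leading uppercase letters (e.g. 'EU' in 'EU597486.1')."""
    counts = Counter()
    for acc in accessions:
        match = re.match(r"[A-Z]+", acc)
        counts[match.group(0) if match else ""] += 1
    return counts
```

For example, `prefix_counts(accession_ids_in_genbank_only).most_common(10)`, and likewise for the query-only list.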