# Get NMDC Biosample geolocation data

You can use this notebook to generate a list of the `id` and geographic origin coordinates of each biosample in the NMDC database.

##### Install and import dependencies

In [None]:
%pip install requests

In [None]:
import csv

import requests

##### Fetch the `id` and geographical origin coordinates of each biosample

In this cell, we fetch the `id` and geographical origin coordinates of each biosample in the NMDC database.

The NMDC API endpoint we use here returns at most 2000 biosamples per request. Since the NMDC database contains more than 2000 biosamples, we request one page at a time until we have received all of them.

In [None]:
lat_lons_by_biosample_id = dict()

page_num = 1
while True:
    request_params = dict(per_page=2000, fields="lat_lon", page=page_num)
    response = requests.get(
        "https://api.microbiomedata.org/biosamples", params=request_params, timeout=60
    )
    response.raise_for_status()  # Stop with a clear error if the request failed.

    # Collect the `id` and `lat_lon` value of each biosample in the response.
    # Note: Once we have it locally, we can explore it without Internet access.
    response_payload = response.json()
    results = response_payload["results"]
    for biosample in results:
        biosample_id = biosample["id"]
        # Use `.get()` in case a biosample has no `lat_lon` value (stored as `None`).
        biosample_lat_lon = biosample.get("lat_lon")
        lat_lons_by_biosample_id[biosample_id] = biosample_lat_lon

    # If we haven't fetched all the biosamples yet, prepare to fetch the next batch.
    # Note: In the NMDC database, each biosample has a unique `id` value.
    # We also stop on an empty page, so the loop cannot run forever.
    if results and len(lat_lons_by_biosample_id) < response_payload["meta"]["count"]:
        page_num += 1
    else:
        break

print(f"Fetched lat_lon data for {len(lat_lons_by_biosample_id)} biosamples")
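The loop above decides when to stop by comparing the number of distinct ids collected so far with `meta.count`. Another way to structure the same paging is to read `meta.count` from the first page and compute the number of pages up front, so the number of requests is fixed in advance. This is a minimal sketch, not part of any NMDC client library: the helper `iter_paged_results` and the fake endpoint in the demo are hypothetical, and it assumes every page payload has the `{"meta": {"count": ...}, "results": [...]}` shape used in the cell above and that the `per_page` you pass matches the one your fetcher requests.

```python
import math
from typing import Callable, Dict, Iterator


def iter_paged_results(fetch_page: Callable[[int], Dict], per_page: int) -> Iterator[Dict]:
    """Yield every result from a paged endpoint (hypothetical helper).

    `fetch_page` takes a 1-based page number and returns the decoded JSON
    payload for that page, assumed to look like
    {"meta": {"count": <total>}, "results": [...]}.
    """
    first_page = fetch_page(1)
    total = first_page["meta"]["count"]
    # Number of pages needed to cover `total` records (at least one request).
    num_pages = max(1, math.ceil(total / per_page))
    yield from first_page["results"]
    for page_num in range(2, num_pages + 1):
        yield from fetch_page(page_num)["results"]


# Offline demo: a fake endpoint serving 5 hypothetical records, 2 per page.
DEMO_PER_PAGE = 2
DEMO_RECORDS = [{"id": f"demo-{i}", "lat_lon": None} for i in range(5)]


def fake_fetch_page(page_num: int) -> Dict:
    start = (page_num - 1) * DEMO_PER_PAGE
    return {
        "meta": {"count": len(DEMO_RECORDS)},
        "results": DEMO_RECORDS[start:start + DEMO_PER_PAGE],
    }


collected = list(iter_paged_results(fake_fetch_page, per_page=DEMO_PER_PAGE))
print(len(collected))  # 5
```

To use it against the real endpoint, pass a function that calls `requests.get` with the same `per_page`, `fields`, and `page` parameters as the cell above, calls `raise_for_status()`, and returns `response.json()`.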

##### Dump the `id` and geographical origin coordinates to a CSV file

In this cell, we dump the fetched data to a CSV file. The CSV file will have two columns: `id` and `lat_lon`.

You can modify this cell to break up each `lat_lon` value to suit your application (for example, into separate latitude and longitude columns).

In [None]:
OUTFILE_PATH = "./lat_lons_by_biosample_id.csv"

# The `csv` module expects files opened with `newline=""`; this avoids blank rows on Windows.
with open(OUTFILE_PATH, "w", newline="") as file:
    fieldnames = ["id", "lat_lon"]
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    for biosample_id, lat_lon in lat_lons_by_biosample_id.items():
        writer.writerow({"id": biosample_id, "lat_lon": lat_lon})

print(f"Dumped data to: {OUTFILE_PATH}")
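One way to break up the `lat_lon` values, as suggested above, is to write separate `latitude` and `longitude` columns. The sketch below assumes each `lat_lon` value is a dict with `latitude` and `longitude` keys, or `None` when a biosample has no coordinates; inspect a few fetched values before relying on that. The functions `split_lat_lon` and `write_split_csv`, the sample ids, and the demo file name are all hypothetical.

```python
import csv
from typing import Dict, Optional


def split_lat_lon(lat_lon: Optional[Dict]) -> Dict[str, Optional[float]]:
    """Return separate latitude/longitude values for one biosample.

    Assumes `lat_lon` is a dict with "latitude" and "longitude" keys,
    or None/empty when coordinates are missing.
    """
    if not lat_lon:
        return {"latitude": None, "longitude": None}
    return {"latitude": lat_lon.get("latitude"), "longitude": lat_lon.get("longitude")}


def write_split_csv(path: str, lat_lons_by_id: Dict[str, Optional[Dict]]) -> None:
    """Write one row per biosample with `id`, `latitude`, and `longitude` columns."""
    with open(path, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["id", "latitude", "longitude"])
        writer.writeheader()
        for biosample_id, lat_lon in lat_lons_by_id.items():
            # csv writes `None` as an empty cell.
            writer.writerow({"id": biosample_id, **split_lat_lon(lat_lon)})


# Demo with hypothetical values shaped like the assumption above.
demo_data = {
    "demo-1": {"latitude": 46.37, "longitude": -119.27},
    "demo-2": None,
}
write_split_csv("lat_lons_split_demo.csv", demo_data)
```

With the data fetched earlier in this notebook, you could call `write_split_csv("./lat_lons_by_biosample_id_split.csv", lat_lons_by_biosample_id)`.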