<a target="_blank" href="https://colab.research.google.com/github/ChuBL/How-to-Use-Mindat-API/blob/main/How_to_Use_Mindat_API.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# How to Use the iSamples API to Query and Download Sample Descriptions


## 0. Access Your iSamples JWT Token

All of the export services are protected by authentication and require a JWT. Log in at [iSamples Central](https://central.isample.xyz/isamples_central/manage/login?raw_jwt=true) to obtain one. Make sure your ORCID is authorized by checking with your iSamples in a Box administrator.

Once you've obtained your JWT, include it in each API request.

The examples in this notebook require a network connection to the iSamples API.

In [6]:
from pathlib import Path
import os
import sys
import json
import re
import pprint
import requests
from jsonschema import validate
# import google

You should **avoid** placing your JWT explicitly in your code if you plan to share it. That includes working with a notebook that lives in a public GitHub repo, like this one. The token must be renewed every day.

The solution here is to save the token in a file accessible from your notebook environment (e.g. in the same directory) and add that file to your `.gitignore` so it is not copied to the public GitHub repo. This file will need to be updated daily.

In [3]:

jwt_file_dir = "local/jwt.txt"  # path to the file holding the token
api_jwt = ""
try:
    with open(jwt_file_dir, 'r') as f:
        # strip the trailing newline so it does not end up in the Authorization header
        api_jwt = f.read().strip()
except FileNotFoundError:
    print("JWT token file not found. Please create a text file containing your token and place it in the correct directory.")

# stop early if no token was loaded
if not api_jwt:
    raise Exception("Please set a JWT token before the start!")
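
Because the token has to be renewed daily, it can help to check its expiry before sending requests. The sketch below assumes the token is a standard three-part JWT whose payload carries an `exp` claim (seconds since the epoch); this notebook does not confirm that iSamples tokens include one. `jwt_expiry` is a hypothetical helper, and it only decodes the payload without verifying the signature.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token):
    """Return the expiry time of a JWT as an aware datetime, or None.

    Assumes a standard header.payload.signature layout with an optional
    'exp' claim. The signature is NOT verified here.
    """
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload_b64 = parts[1]
    # JWTs use unpadded URL-safe base64; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

If `jwt_expiry(api_jwt)` returns a time earlier than `datetime.now(timezone.utc)`, the token file probably needs to be refreshed before running the remaining cells.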

### Basic access pattern

In [4]:
# API root entry point
# no trailing slash: the request cells below add "/" before the resource name
iSamples_API_baseURL = "https://central.isample.xyz/isamples_central"

# authorization header that must be included with each request.
headers = {'Authorization': 'Bearer '+ api_jwt}
#headers


In [5]:
# see https://central.isample.xyz/isamples_central/docs for documentation on using the iSamples API
# get list of things https://central.isample.xyz/isamples_central/thing/?offset=0&limit=10&status=200&authority=SESAR


params = {
    'offset':'0',
    'limit':'10',
    'authority':'SESAR'
}

resource = 'thing'


# use python requests package to GET results from iSamples
response = requests.get(iSamples_API_baseURL+"/" + resource + "/",
                    params=params,
                    headers=headers)
print (response.status_code)
json_array = []
json_out={}

if 200 <= response.status_code <= 299:
    json_out = response.json()
#    print (json_out)
    # keep only the identifier and source authority of each record
    for anitem in json_out["data"]:
        json_array.append({"id":anitem["id"],"source":anitem["authority_id"]})
else:
    print ('problem-- ', response)



# echo the list of ids and sources built from the query results
json_array

200


[{'id': 'IGSN:ODP01WPZK', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TLJ', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TLL', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TLO', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TLR', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TLU', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TLX', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TM0', 'source': 'SESAR'},
 {'id': 'IGSN:ODP02RYR2', 'source': 'SESAR'},
 {'id': 'IGSN:NHB000TM3', 'source': 'SESAR'}]
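
The `offset` and `limit` parameters used above suggest a way to walk through a larger result set in fixed-size pages. The sketch below is one way to do that under stated assumptions: each response carries its records under `data` (as in the loop above), and a page shorter than `limit` marks the end of the listing. `iter_things` and `fetch_page` are hypothetical names; `fetch_page` stands in for a function that performs the `requests.get` call shown above with `offset` and `limit` filled in and returns the decoded JSON.

```python
def iter_things(fetch_page, limit=10, max_pages=100):
    """Yield records page by page using offset/limit parameters.

    fetch_page(offset, limit) must return the decoded JSON response as a
    dict; records are assumed to live under the 'data' key.
    """
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        page = fetch_page(offset, limit).get("data", [])
        yield from page
        if len(page) < limit:  # short (or empty) page: nothing more to fetch
            break
        offset += limit
```

Passing the fetch step in as a callable keeps the paging logic separate from authentication and error handling, and lets it be exercised without a network connection.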

### Get the Items with Selected Fields


In [12]:

# load the iSamples core JSON schema used for validation below
with open('../src/schemas/iSamplesSchemaCore1.0.json') as f:
    theschema = json.load(f)


# set up request for iSamples export

params = {
    'offset':'0',
    'limit':'10',
    'authority':'SESAR'
}
resource = 'thing'

# use python requests package to GET results
response = requests.get(iSamples_API_baseURL+"/" + resource + "/",
                    params=params,
                    headers=headers)
print (response.status_code)

In [16]:
params = {
    'format':'core'
}
resource='thing'

for asample in json_array:
    print (asample["id"])
    # use python requests package to GET results
    response = requests.get(iSamples_API_baseURL+"/" + resource + "/" + asample["id"],
                    params=params,
                    headers=headers)
    print (response.status_code)
    if 200 <= response.status_code <= 299:
        json_out = response.json()
        print (json_out)
        try:
            validate(instance=json_out, schema=theschema)
        except Exception as e:
            print(e)
    else:
        print ('problem-- ', response)
  

IGSN:ODP01WPZK
200
{'$schema': 'iSamplesSchemaCore1.0.json', '@id': 'https://data.isamples.org/digitalsample/igsn/ODP01WPZK', 'label': 'Sample 115-709A-17H-2 (90-92 cm.)', 'sample_identifier': 'IGSN:ODP01WPZK', 'description': '', 'has_context_category': [{'label': 'Any sampled feature', 'identifier': 'https://w3id.org/isample/vocabulary/sampledfeature/1.0/anysampledfeature'}], 'has_context_category_confidence': [1.0], 'has_material_category': [{'label': 'Natural Solid Material', 'identifier': 'https://w3id.org/isample/vocabulary/material/1.0/earthmaterial'}], 'has_material_category_confidence': [1.0], 'has_specimen_category': [{'label': 'Physical specimen', 'identifier': 'https://w3id.org/isample/vocabulary/specimentype/1.0/physicalspecimen'}], 'has_specimen_category_confidence': [1.0], 'informal_classification': [''], 'keywords': ['Individual Sample'], 'produced_by': {'@id': 'IGSN:ODP01ASA7', 'label': '', 'description': 'cruiseFieldPrgrm:ODP Leg 115. Janus sample_id: 250571', 'has_fea

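Get the count of localities for each group. This cell and the following query cells use the Mindat API rather than iSamples; `MINDAT_API_URL` and the Mindat `headers` are set in the text-search cell further down, and `grouplist` is not defined in this notebook, so it must be supplied before running.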

In [None]:
import pandas
import time

idlist=[]
fields_str = 'id','name', 'locality','longid','groupid'
#, 'name', 'locality','longid','groupid'

json_array = []
json_array2 = []
json_out={}
summary={}
#        'id__in': idstr,
#    'expand':'locality'
for agroup in grouplist:
    params = {
        'fields': fields_str, # put your selected fields here
        'format': 'json',
        'groupid': agroup,
        'page_size': 1000
    }
    #params = {}
    response = requests.get(MINDAT_API_URL+"/geomaterials/",
                params=params,
                headers=headers)
    print (response.status_code)
    
    idlist2=[]  # reset for each group so a failed request leaves nothing stale
    if 200 <= response.status_code <= 299:
        json_out = response.json()
        print (json_out)
        for anitem in json_out["results"]:
            idlist2.append(str(anitem["id"]))
       #     summary={"results":[{"id":anitem["id"], "name":anitem["name"], "count":len(anitem["locality"])}]}
       #     #print(summary)
        #    json_array = json_array + [{"id":anitem["id"], "name":anitem["name"], "count":len(anitem["locality"])}]
    else:
        # this request has no 'id__in' parameter, so report the group instead
        print ('problem-- ', agroup)
    
    fields_str2 = 'id', 'name', 'locality','longid'
    count = 0
    localities=[]
    for idstr in idlist2:
        params = {
            'fields': fields_str2, # put your selected fields here
            'id__in': idstr, # restrict the results to this id
            'format': 'json',
            'expand':'locality'
        }
        response = requests.get(MINDAT_API_URL+"/geomaterials/",
                        params=params,
                        headers=headers)
        if 200 <= response.status_code <= 299:
            json_out = response.json()
            #print (json_out)
            for anitem in json_out["results"]:
                localities = localities + anitem["locality"]
                print(anitem["name"], 'count:', len(anitem["locality"]))
        else:
            print ('problem-- ', idstr)
    
    localities=list(set(localities))  #remove duplicates
    count=len(localities)
    print('group:',agroup,', count:',count)
    json_array = json_array + [{"group":agroup, "count":count}]
    time.sleep(2)

print ('Done')  

json_array
#json_array2
df_result = pandas.DataFrame(json_array)
df_result.to_csv('GroupCountDrillDown.csv') 



query for items by id in an id list

In [None]:
import pandas
import time

json_array = []
fields_str2 = 'id', 'name', 'locality','longid', 'groupid'

idlist = ['8598']

for idstr in idlist:

    params = {
        'fields': fields_str2, # put your selected fields here
        'id__in': idstr, # restrict the results to this id
        'format': 'json',
        'expand':'locality'
    }

    response = requests.get(MINDAT_API_URL+"/geomaterials/",
                    params=params,
                    headers=headers)

    if 200 <= response.status_code <= 299:
        json_out = response.json()
        #print (json_out)
        json_array = json_array + json_out["results"]
    else:
        print ('problem-- ', idstr)
 
    time.sleep(3)
    
    
print ('Done')    
json_array


use the text search interface

# get all records that have a meteoritical_code value
# have to use cursor pagination

import pandas
import time

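MINDAT_API_URL = "https://api.mindat.org"
# YOUR_API_KEY is not defined in this notebook; set it to your Mindat API token first.
# Note that this replaces the iSamples 'headers' defined earlier.
headers = {'Authorization': 'Token '+ YOUR_API_KEY}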

#fields_str = 'id','longid','guid','name'

json_array = []

params = {
    'fields': fields_str, # put your selected fields here
    'format': 'json',
    'meteoritical_code_exists':'true'
}

response = requests.get(MINDAT_API_URL+"/geomaterials/",
                params=params,
                headers=headers)

if 200 <= response.status_code <= 299:
    json_out = response.json()
#    print (json_out)
    json_array = json_array + json_out["results"]
else:
    print ('problem')



In [None]:
json_array


In [None]:
params = {}

while json_out["next"] is not None :
    next_url = json_out["next"]
    response = requests.get(next_url,
                params=params,
                headers=headers)
    
    print (response.status_code)
    if 200 <= response.status_code <= 299:
        json_out = response.json()
    #    print (json_out)
        json_array = json_array + json_out["results"]
    else:
        print ('problem-- ', next_url)
        break  # json_out["next"] would never change, so stop instead of looping forever

print ('Done')
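
The loop above follows the `next` link in each response until none is left. A sketch of the same idea as a reusable generator is given here; it assumes each page is a dict with `results` and `next` keys, as in the responses above. `follow_next` and `get_json` are hypothetical names: `get_json(url)` stands in for the live request and is expected to return the decoded JSON, or `None` when the request fails.

```python
def follow_next(first_page, get_json, max_pages=1000):
    """Yield every record from a listing paginated with 'next' links.

    first_page: decoded JSON of the first response (dict with 'results', 'next').
    get_json(url): returns the decoded JSON for url, or None on failure.
    """
    page = first_page
    for _ in range(max_pages):  # upper bound in case the links form a cycle
        yield from page.get("results", [])
        next_url = page.get("next")
        if not next_url:
            return
        page = get_json(next_url)
        if page is None:  # request failed: stop instead of retrying forever
            return
```

Records collected this way, for example with `list(follow_next(json_out, get_json))`, form a flat list that can be passed to `pandas.DataFrame`.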

In [None]:
df_result = pandas.DataFrame(json_array)

df_result.to_csv('timestamp.csv') 


In [None]:

# json_array is already a flat list of result records (it has no 'results' key),
# so it can be normalized directly into a DataFrame
df_nested_list = pandas.json_normalize(json_array)

# Display the DataFrame
print(df_nested_list)

In [None]:
df_nested_list.to_csv('49089.csv') 

extract a list of ids and long-id for all minerals.

In [None]:
fields_str = 'id','longid'

params = {
    'fields': fields_str, # put your selected fields here
    'format': 'json'
}

response = requests.get(MINDAT_API_URL+"/geomaterials/",
                params=params,
                headers=headers)

if 200 <= response.status_code <= 299:
    json_out = response.json()
#    json_array.append(json_out)
else:
    print ('problem ')

same operation, but iterate through all pages.

In [None]:
# Create a DataFrame from results
df_nested_list = pandas.json_normalize(json_out, record_path =['results'])

In [None]:
json_out


In [None]:
df_nested_list.to_csv('id-longid.csv') 

In [None]:
# mindat check sum algorithm
# from Jolyon,2023-06-18

def mindat_longid(authority, type, id):
    out = "{}:{}:{}:".format(authority, type, id)
    out2 = "{}{}{}".format(authority, type, id)
    t = 0
    for i in range(len(out2)):
        if i % 2 == 1:
            t += int(out2[i]) * 3
        else:
            t += int(out2[i])
    ck = t % 10
    if ck:
        ck = 10 - ck
    out += str(ck)
    return out

In [None]:
# run checksum function
mindat_longid(1,1,49602)
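
As a worked example of the checksum above: for `mindat_longid(1, 1, 49602)` the digit string is `1149602`. The digits at odd positions (1, 9, 0) are tripled and added to the rest (1, 4, 6, 2), giving 30 + 13 = 43, so the check digit is 10 - 3 = 7 and the result is `1:1:49602:7`. The sketch below applies the same rule to validate an existing long id. `check_digit` and `is_valid_longid` are hypothetical helpers, not part of the Mindat API, and they assume the `authority:type:id:check` layout produced by the function above.

```python
def check_digit(digits):
    """Check digit for a string of decimal digits (odd 0-based positions weigh 3)."""
    total = sum(int(ch) * (3 if i % 2 == 1 else 1) for i, ch in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_longid(longid):
    """True if longid looks like 'authority:type:id:check' with a matching check digit."""
    parts = longid.split(":")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    return check_digit("".join(parts[:3])) == int(parts[3])
```

For instance, `is_valid_longid("1:1:49602:7")` is true, while changing the final digit makes the check fail.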

In [None]:
#from transformers import BertTokenizer

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM, BertTokenizer,BertForSequenceClassification

#tokenizer = AutoTokenizer.from_pretrained('daven3/k2_fp_delta')
tokenizer = BertTokenizer.from_pretrained('allenai/scibert_scivocab_uncased', do_lower_case=True, use_fast=True)
#tokenizer = BertTokenizer.from_pretrained(//path to tokenizers)
sample = 'where is Himalayas in the world map?'
encoding = tokenizer.encode(sample)
print(encoding)
print(tokenizer.convert_ids_to_tokens(encoding))


