## Getting started

To use the client, do one of the following:
1. Set the environment variables `ICSD_CLIENT_USERNAME` and `ICSD_CLIENT_PASSWORD`. These can be exported in a shell startup file (`.bashrc`, `.zshrc`, etc.) or placed in a `~/.env` file using the `dotenv` package (`pip install python-dotenv`).
2. Initialize the client with your credentials directly:

```py
from xtalxd.icsd.client import IcsdClient

client = IcsdClient(username="your_username_here", password="your_password_here")
```
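If you rely on environment variables, it can help to confirm they are set before opening a client. The sketch below uses a hypothetical helper, `get_icsd_credentials` (not part of `xtalxd`). It loads `~/.env` with `python-dotenv` when that package is installed, then reads the two variables named above and fails early if either is missing.

```py
import os


def get_icsd_credentials() -> tuple[str, str]:
    """Hypothetical helper (not part of xtalxd): return (username, password).

    Assumes credentials live in ICSD_CLIENT_USERNAME / ICSD_CLIENT_PASSWORD,
    either exported in the shell or listed in ~/.env.
    """
    try:
        from dotenv import load_dotenv  # installed via `pip install python-dotenv`

        # Values already set in the shell win: load_dotenv does not override by default.
        load_dotenv(os.path.expanduser("~/.env"))
    except ImportError:
        pass  # python-dotenv not installed; use the shell environment only

    username = os.environ.get("ICSD_CLIENT_USERNAME")
    password = os.environ.get("ICSD_CLIENT_PASSWORD")
    missing = [
        name
        for name, value in (
            ("ICSD_CLIENT_USERNAME", username),
            ("ICSD_CLIENT_PASSWORD", password),
        )
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return username, password
```

The returned pair can then be passed explicitly, e.g. `username, password = get_icsd_credentials()` followed by `IcsdClient(username=username, password=password)`.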

## Ex. 1: Getting all of the ICSD

```py
from xtalxd.icsd.client import IcsdClient
from xtalxd.icsd.client.enums import IcsdSubset
from xtalxd.icsd.client.schemas import IcsdPropertyDoc

import pandas as pd
from tqdm import tqdm

# All 230 crystallographic space groups
space_group_numbers = list(range(1, 231))
```

The client's output document model, `IcsdPropertyDoc`, is designed to be parquet-friendly.
To write parquet, run `pip install pyarrow` and uncomment the `to_parquet` line.

Otherwise, the results are written as gzip-compressed JSON Lines.
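A `.jsonl.gz` file is gzip-compressed JSON Lines: one JSON object (one DataFrame row) per line. As a sketch, such files can be read and written with only the standard library; `pd.read_json(path, lines=True)` is the pandas equivalent for reading. The function names `iter_jsonl_gz` and `write_jsonl_gz` are illustrative and not part of `xtalxd`.

```py
import gzip
import json
from collections.abc import Iterator
from typing import Any


def iter_jsonl_gz(path: str) -> Iterator[dict[str, Any]]:
    """Yield one record per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate a trailing newline or blank lines
                yield json.loads(line)


def write_jsonl_gz(path: str, records: list[dict[str, Any]]) -> None:
    """Write records one JSON object per line, gzip-compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
```

Because `iter_jsonl_gz` is a generator, `sum(1 for _ in iter_jsonl_gz(path))` counts the entries in an output file without building a DataFrame.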

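```py
for subset in [
    IcsdSubset.EXPERIMENTAL_METALORGANIC,
    IcsdSubset.EXPERIMENTAL_INORGANIC,
]:
    data = []
    # use_document_model=False yields plain dicts, validated into IcsdPropertyDoc below
    with IcsdClient(use_document_model=False) as icsd_client:
        data += icsd_client.search(
            subset=subset,
            space_group_number=(1, 230),
            include_cif=True,
            include_metadata=False,
        )

    df = pd.DataFrame([IcsdPropertyDoc(**doc).model_dump() for doc in data])
    df = df.sort_values("collection_code")
    df = df.reset_index(drop=True)  # drop the pre-sort index instead of keeping it as a column
    # df.to_parquet(f"{subset.value}.parquet")
    # lines=True requires orient="records"; gzip compression is inferred from the extension
    df.to_json(f"{subset.value}.jsonl.gz", orient="records", lines=True)
```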
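The setup cell defines `space_group_numbers` and imports `tqdm`, while the loop above requests the full `(1, 230)` range in one call. If one large request proves impractical, an alternative is one search per space group, concatenating the results. The sketch below assumes only that the search callable accepts a `space_group_number` keyword, as `icsd_client.search` does above; `search_per_space_group` is a hypothetical helper, and a stand-in search function lets it run without credentials.

```py
from collections.abc import Callable, Iterable
from typing import Any


def search_per_space_group(
    search: Callable[..., list[Any]],
    space_groups: Iterable[int],
    **search_kwargs: Any,
) -> list[Any]:
    """Hypothetical helper: call `search` once per space group and concatenate results.

    `search` stands in for something like `icsd_client.search`; only the
    `space_group_number` keyword is assumed. Wrap `space_groups` in tqdm(...)
    to get a progress bar.
    """
    results: list[Any] = []
    for sg in space_groups:
        results += search(space_group_number=sg, **search_kwargs)
    return results


# Offline stand-in for a real client: returns one fake record per call.
def fake_search(space_group_number: int, **kwargs: Any) -> list[dict[str, Any]]:
    return [{"space_group_number": space_group_number, **kwargs}]


records = search_per_space_group(fake_search, range(1, 4), include_cif=True)
print(len(records))  # 3
```

With a real client this might read `search_per_space_group(icsd_client.search, tqdm(space_group_numbers), subset=subset, include_cif=True, include_metadata=False)`. Note that it multiplies the number of requests, which matters if the API rate-limits them.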

## Ex. 2: Getting only the ICSD entries that are valid (geo-)minerals

The following example shows how to extract only those materials that the International Mineralogical Association (IMA) recognizes as valid minerals.
One way to do this is to query the `MineralNameIma` property directly from the ICSD, as follows; note, however, that the ICSD limits the number of property queries.

```py
with IcsdClient(use_document_model=True) as client:
    full_mats = client.search(
        subset="experimental_inorganic",
        space_group_number=1,  # only space group 1 in this example
        include_cif=True,
        include_metadata=True,
        properties=[
            "MineralNameIma",
        ],
    )

# Keep only entries that carry a non-empty IMA mineral name
ima_mats = pd.DataFrame([doc.model_dump() for doc in full_mats if doc.mineral_name_ima])
ima_mats.to_json("ima_materials.jsonl.gz", orient="records", lines=True)
```

An alternative approach first uses the Mindat API client included in `xtalxd_mindat` to obtain a list of valid IMA mineral names:

```py
from tqdm import tqdm

from xtalxd.icsd.client import IcsdClient
from xtalxd.mindat.client import MindatClient

# Fetch every page of the Mindat "minerals-ima" endpoint
with MindatClient() as mindat_client:
    ima = mindat_client.get_mindat_data_by_endpoint(
        "minerals-ima",
        paginate=True,
    )

# Unique IMA-approved mineral names
ima_mineral_names = {doc["name"] for doc in ima}
```

Next, query the ICSD for each IMA mineral name using the `xtalxd_icsd` client:

```py
ima_mats = []
with IcsdClient(use_document_model=True) as client:
    # One search per IMA mineral name
    for mineral_name in tqdm(ima_mineral_names):
        ima_mats += client.search(
            subset="experimental_inorganic",
            mineral_name=mineral_name,
            include_cif=True,
            include_metadata=False,
        )
```
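Because this loop issues one search per mineral name, it may be worth checking for duplicate entries before saving, for instance if the name search matches more loosely than exact equality (an assumption; how `mineral_name` is matched is not specified here). A minimal sketch, using a hypothetical `dedupe_by_collection_code` helper, keeps the first document per `collection_code`, reading an attribute from document models or a key from plain dicts.

```py
from collections.abc import Iterable
from typing import Any


def dedupe_by_collection_code(docs: Iterable[Any]) -> list[Any]:
    """Hypothetical helper: keep the first document seen for each ICSD collection code.

    Accepts pydantic-style document models (attribute access) or plain dicts.
    """
    seen: set[Any] = set()
    unique: list[Any] = []
    for doc in docs:
        code = doc["collection_code"] if isinstance(doc, dict) else doc.collection_code
        if code not in seen:
            seen.add(code)
            unique.append(doc)
    return unique


# Illustrative input only: two entries share collection code 101.
docs = [
    {"collection_code": 101, "mineral_name": "A"},
    {"collection_code": 202, "mineral_name": "B"},
    {"collection_code": 101, "mineral_name": "A"},
]
print([d["collection_code"] for d in dedupe_by_collection_code(docs)])  # [101, 202]
```

The deduplicated documents could then be saved as in the first approach, e.g. `pd.DataFrame([doc.model_dump() for doc in dedupe_by_collection_code(ima_mats)])`.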