## Getting started

To use the client, do one of the following:
1. Set the environment variables `ICSD_CLIENT_USERNAME` and `ICSD_CLIENT_PASSWORD`. These can be set in a `.bashrc`, `.zshrc`, etc., or in a `~/.env` file using the `dotenv` package.
2. Initialize the client manually with your credentials:

```py
from xtalxd.icsd.client import IcsdClient
client = IcsdClient(username="your_username_here", password="your_password_here")
```
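
Before relying on the environment-variable route, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch using only the standard library; `credentials_from_env` is a hypothetical helper written for this illustration and is not part of `xtalxd`, and how the client itself reads these variables is not shown here.

```python
import os

# Variable names taken from the text above.
REQUIRED_VARS = ("ICSD_CLIENT_USERNAME", "ICSD_CLIENT_PASSWORD")


def credentials_from_env(environ=None):
    """Return (username, password), or raise if either variable is unset.

    Hypothetical helper for illustration only; not part of xtalxd.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]]


# Demo with a stand-in mapping, so no real credentials are needed.
demo_env = {
    "ICSD_CLIENT_USERNAME": "your_username_here",
    "ICSD_CLIENT_PASSWORD": "your_password_here",
}
username, password = credentials_from_env(demo_env)
print(username)
```

Called without arguments, the helper reads `os.environ`, and the returned pair can be passed to `IcsdClient(username=..., password=...)` as in the manual route above.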

## Ex. 1: Getting all of the ICSD

```py
from xtalxd.icsd.client import IcsdClient
from xtalxd.icsd.enums import IcsdSubset
from xtalxd.icsd.schemas import IcsdPropertyDoc

import pandas as pd
```

The client's output data format, `IcsdPropertyDoc`, is designed to be parquet-friendly.
To use parquet, `pip install pyarrow` and uncomment the `to_parquet` line.

Otherwise, JSON output is supported.

```py
for subset in [
    IcsdSubset.EXPERIMENTAL_METALORGANIC,
    IcsdSubset.EXPERIMENTAL_INORGANIC,
]:
    data = []
    with IcsdClient(use_document_model=False) as icsd_client:
        # Space groups 1 through 230 cover every space group.
        data += icsd_client.search(
            subset=subset,
            space_group_number=(1, 230),
            include_cif=True,
            include_metadata=False,
        )

    df = pd.DataFrame([IcsdPropertyDoc(**doc).model_dump() for doc in data])
    df = df.sort_values("collection_code")
    # Drop the old, unsorted index instead of keeping it as an extra column.
    df = df.reset_index(drop=True)
    # df.to_parquet(f"{subset.value}.parquet")
    # pandas only accepts `lines=True` together with `orient="records"`.
    df.to_json(f"{subset.value}.jsonl.gz", orient="records", lines=True)
```
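
The gzipped JSON Lines files written above hold one JSON object per line, so they can be read back without pandas. The following is a minimal sketch using only the standard library; `read_jsonl_gz` is a hypothetical helper, and the two demo records are made up (only the field names `collection_code` and `cif` come from the example above).

```python
import gzip
import json
import os
import tempfile


def read_jsonl_gz(path):
    """Read a gzipped JSON Lines file into a list of dicts (illustrative helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Made-up records standing in for real client output.
records = [
    {"collection_code": 1, "cif": "data_1"},
    {"collection_code": 2, "cif": "data_2"},
]

# Write a small demo file in the same one-object-per-line layout.
path = os.path.join(tempfile.mkdtemp(), "demo.jsonl.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    for record in records:
        handle.write(json.dumps(record) + "\n")

loaded = read_jsonl_gz(path)
print(len(loaded))
```

With pandas installed, `pd.read_json(path, lines=True)` reads the same file directly into a DataFrame.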

## Ex. 2: Getting only the subset of the ICSD which are valid (geo-)minerals

The following example shows how to extract only those materials which are recognized by the International Mineralogical Association (IMA) as valid minerals.
One way to do this is to query the ICSD by property.
For a list of available properties, see `xtalxd.icsd.enums.IcsdAdvancedSearchKeys`.

Note that the `_chemical_name_mineral` [CIF key](https://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Ichemical_name_mineral.html) is reserved for IMA mineral names.

```py
from xtalxd.icsd.client import IcsdClient
from xtalxd.icsd.schemas import IcsdPropertyDoc
import pandas as pd
import re

# Wildcard search on the IMA mineral name field.
with IcsdClient(use_document_model=False) as client:
    _ima_mats = client.search(
        subset="experimental_inorganic",
        mineral_name_ima="*",
        include_cif=True,
        include_metadata=False,
    )

ima_mats = pd.DataFrame(
    [
        IcsdPropertyDoc(
            **doc,
            # Read the mineral name from the CIF text and remove its quote characters.
            mineral_name_ima=re.findall(r"_chemical_name_mineral\s+(.*)", doc["cif"])[0]
            .replace('"', "")
            .replace("'", "")
        ).model_dump()
        for doc in _ima_mats
    ]
)
# pandas only accepts `lines=True` together with `orient="records"`.
ima_mats.to_json("ima_materials.jsonl.gz", orient="records", lines=True)
```
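
The mineral-name extraction above indexes `[0]` into the regex matches, which raises an `IndexError` for any CIF that lacks the `_chemical_name_mineral` key. The following sketch shows a more defensive variant that can be tested in isolation; `extract_mineral_name` is a hypothetical helper and the CIF snippets are made up for illustration. Unlike the pattern above, it only accepts a value on the same line as the key, and it strips only the surrounding quotes rather than every quote character.

```python
import re
from typing import Optional

# Match the key at the start of a line, with the value on the same line.
_MINERAL_RE = re.compile(r"^_chemical_name_mineral[ \t]+(.*)$", re.MULTILINE)


def extract_mineral_name(cif: str) -> Optional[str]:
    """Return the unquoted mineral name from CIF text, or None if the key is absent.

    Hypothetical helper for illustration only; not part of xtalxd.
    """
    match = _MINERAL_RE.search(cif)
    if match is None:
        return None
    # Trim whitespace first, then any surrounding single or double quotes.
    return match.group(1).strip().strip("'\"")


# Made-up CIF fragments, not real ICSD entries.
with_name = "data_demo\n_chemical_name_mineral 'Quartz'\n_cell_length_a 4.9\n"
without_name = "data_demo\n_cell_length_a 4.9\n"

print(extract_mineral_name(with_name))
print(extract_mineral_name(without_name))
```

Returning `None` lets the caller decide whether to skip such entries or raise, instead of failing partway through the list comprehension.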