# SoilSpecData


> A Python package for handling soil spectroscopy data, with a focus on the [Open Soil Spectral Library (OSSL)](https://explorer.soilspectroscopy.org/).

## Installation

```sh
pip install soilspecdata
```

## Features

- Easy loading and handling of the OSSL dataset
- Support for both VISNIR (Visible Near-Infrared) and MIR (Mid-Infrared) spectral data
- Flexible wavelength range filtering
- Convenient access to soil properties and metadata
- Automatic caching of downloaded data
- Get aligned spectra and target variable(s)
- *Further datasets to come ...*

## Quick Start


```python
# Import the package
from soilspecdata.datasets.ossl import get_ossl
```

Load the OSSL dataset:

```python
ossl = get_ossl()
```

* Get MIR spectra (600-4000 cm⁻¹):


```python
mir_data = ossl.get_mir(require_valid=True)
```

* Get VISNIR spectra with a custom wavelength range:

```python
visnir_data = ossl.get_visnir(wmin=500, wmax=1000, require_valid=True)
```

* Get soil properties (e.g., CEC):

```python
properties = ossl.get_properties(['cec_usda.a723_cmolc.kg'], require_complete=True)
```

For more details on the OSSL dataset and its variables, see the [OSSL documentation](https://soilspectroscopy.github.io/ossl-manual/database-description.html).



* Get metadata (e.g., geographical coordinates):


```python
metadata = ossl.get_properties(['longitude.point_wgs84_dd', 'latitude.point_wgs84_dd'], require_complete=False)
```

* Or get aligned spectra and target variable(s) directly:

```python
X, y, ids = ossl.get_aligned_data(
    spectra_data=mir_data,
    target_cols='cec_usda.a723_cmolc.kg'
)

X.shape, y.shape, ids.shape
# Output: ((57062, 1701), (57062, 1), (57062,))
```
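To make "aligned" concrete, here is a minimal, package-independent sketch of one way to pair spectra with a target variable by sample ID. It is not the implementation behind `get_aligned_data`: the `align` helper, its arguments, and the toy values are hypothetical, and treating `None` as a missing measurement is an assumption made for illustration.

```python
def align(sample_ids, spectra, targets):
    """Keep only samples that have both a spectrum and a non-missing target.

    sample_ids: list of IDs, one per row of `spectra`
    spectra:    list of rows (one list of floats per sample)
    targets:    dict mapping sample ID -> target value (None if missing)
    """
    X, y, ids = [], [], []
    for sid, row in zip(sample_ids, spectra):
        value = targets.get(sid)
        if value is None:  # no target measured for this sample: skip it
            continue
        X.append(row)
        y.append([value])  # one column per target variable
        ids.append(sid)
    return X, y, ids


# Toy data: three spectra, but only two samples have a target value.
sample_ids = ["s1", "s2", "s3"]
spectra = [[0.10, 0.20], [0.30, 0.40], [0.50, 0.60]]
targets = {"s1": 12.5, "s2": None, "s3": 8.1}

X, y, ids = align(sample_ids, spectra, targets)
print(len(X), len(y), ids)  # 2 2 ['s1', 's3']
```

The point of the sketch is that rows of `X`, rows of `y`, and entries of `ids` share the same order, so row `i` always refers to the same sample in all three.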

## Data Structure

The package returns spectral data in a structured format containing:

- Wavenumbers
- Spectra measurements
- Measurement type (reflectance/absorbance)
- Sample IDs

Properties and metadata are returned as pandas DataFrames indexed by sample ID.
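As a rough illustration of the four fields listed above, the sketch below models them with a plain dataclass and shows how a range filter could act on such a structure. The class name `SpectraSketch`, its field names, and the `filter_range` helper are hypothetical and do not describe the package's actual types; inclusive bounds are also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpectraSketch:
    """Hypothetical stand-in for the structured spectra container."""
    wavenumbers: List[float]      # one entry per spectral column
    spectra: List[List[float]]    # one row per sample
    measurement_type: str         # "reflectance" or "absorbance"
    sample_ids: List[str]         # one entry per row of `spectra`


def filter_range(data: SpectraSketch, wmin: float, wmax: float) -> SpectraSketch:
    """Keep only the columns whose wavenumber lies in [wmin, wmax] (assumed inclusive)."""
    keep = [i for i, w in enumerate(data.wavenumbers) if wmin <= w <= wmax]
    return SpectraSketch(
        wavenumbers=[data.wavenumbers[i] for i in keep],
        spectra=[[row[i] for i in keep] for row in data.spectra],
        measurement_type=data.measurement_type,
        sample_ids=list(data.sample_ids),
    )


example = SpectraSketch(
    wavenumbers=[600.0, 1000.0, 2000.0, 4000.0],
    spectra=[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    measurement_type="absorbance",
    sample_ids=["s1", "s2"],
)
subset = filter_range(example, wmin=1000, wmax=2000)
print(subset.wavenumbers)  # [1000.0, 2000.0]
print(subset.spectra)      # [[0.2, 0.3], [0.6, 0.7]]
```

Filtering touches only the columns; the sample IDs and measurement type carry over unchanged, which keeps the rows traceable to their samples.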


## Cache Management

By default, the OSSL dataset is cached in `~/.soilspecdata/`. To force a fresh download:

```python
ossl = get_ossl(force_download=True)
```
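For readers unfamiliar with the pattern, the following sketch shows the general "use the cached copy unless told otherwise" logic that a `force_download` flag typically implies. It is an assumption-laden illustration, not the package's code: `load_cached`, `fake_fetch`, and the file name `ossl.bin` are hypothetical, and a temporary directory stands in for the real cache location.

```python
from pathlib import Path
import tempfile


def load_cached(cache_file: Path, fetch, force_download: bool = False) -> bytes:
    """Return cached bytes, calling `fetch()` only when needed."""
    if force_download or not cache_file.exists():
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_bytes(fetch())  # refresh the local copy
    return cache_file.read_bytes()


# Stand-in for a real download; records how often it is called.
calls = []


def fake_fetch() -> bytes:
    calls.append(1)
    return b"spectra"


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "ossl.bin"  # hypothetical file name
    load_cached(cache_file, fake_fetch)                       # first call: downloads
    load_cached(cache_file, fake_fetch)                       # second call: cache hit
    load_cached(cache_file, fake_fetch, force_download=True)  # forced refresh
    print(len(calls))  # 2
```

Only the first call and the forced call reach the download function; the second call is served from disk.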


## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

Apache 2.0

## Citation

TBC