# LUCAS Processing

<div class="alert alert-block alert-info">
    This notebook accompanies the studies presented in [3] and <b>Chapter 4</b> of the Ph.D. thesis [4].
    It processes the raw LUCAS 2012 soil dataset so that it can be used with the 1D CNNs [2].
    We cannot guarantee the completeness or correctness of the code.
    If you find bugs or have suggestions for improving the code, we encourage you to post them as a <a href="https://github.com/felixriese/lucas-processing/issues">GitHub issue</a>.
</div>
    

## Imports

In [1]:
import lucas_processing as lucas

## Config

In [None]:
# CHANGE path to data
path_to_data = r"../data/"

# CHANGE verbosity
verbose = 1

In [None]:
# DO NOT CHANGE unless you know what you are doing
# Raw input files
path_to_lucas_csv = path_to_data+"0_LUCAS_TOPSOIL_v1_spectral.csv"
path_to_datafile = path_to_data+"0_LUCAS_TOPSOIL_v1.csv"
# Intermediate and final outputs, numbered in processing order
path_to_chunk_concat = path_to_data+"1_lucas_fromchunks.csv"
path_to_combined = path_to_data+"2_lucas_fromchunks_combined.csv"
path_to_full = path_to_data+"3_lucas_full.csv"
path_to_final = path_to_data+"4_lucas_final.csv"
path_to_subsets = path_to_data+"5_lucas_final_"  # prefix for the split files
# Directory for the chunk files written in step 1
path_to_chunks = path_to_data+"chunks/"
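
Before starting the processing, it can help to confirm that the two raw input files are in place. The following is a minimal sketch that is not part of `lucas_processing`; the helper `find_missing_inputs` and the constant `REQUIRED_INPUTS` are hypothetical names introduced here, and the file names are copied from the config cell above.

```python
from pathlib import Path

# Input file names as used in the config cell above.
REQUIRED_INPUTS = (
    "0_LUCAS_TOPSOIL_v1_spectral.csv",
    "0_LUCAS_TOPSOIL_v1.csv",
)


def find_missing_inputs(data_dir, required=REQUIRED_INPUTS):
    """Return the required input files that are not present in data_dir.

    Hypothetical helper for illustration, not part of lucas_processing.
    """
    data_dir = Path(data_dir)
    return [name for name in required if not (data_dir / name).is_file()]
```

Calling `find_missing_inputs(path_to_data)` returns an empty list when both files are present.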

## Processing
All of the following steps are needed to process the LUCAS dataset. Run them in order: each step reads the file written by the previous one.

### 1. Dimensionality reduction

In [None]:
lucas.dimension_reduction(
    csv_path=path_to_lucas_csv,
    path_to_chunks=path_to_chunks,
    chunksize=1000,
    verbose=verbose)
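
The `chunksize` argument suggests that the large spectral CSV is processed in blocks of rows rather than loaded at once. The sketch below illustrates only that general chunking pattern with the standard library; it is not the implementation of `lucas.dimension_reduction`, it does not perform any dimensionality reduction, and the function name `write_csv_chunks` and the chunk file naming are assumptions made for this example.

```python
import csv
from itertools import islice
from pathlib import Path


def write_csv_chunks(csv_path, chunk_dir, chunksize=1000):
    """Split a CSV file into files of at most `chunksize` data rows each.

    Illustrative only: the file naming scheme is an assumption.
    """
    chunk_dir = Path(chunk_dir)
    chunk_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file
            return written
        index = 0
        while True:
            # Pull the next block of rows without reading the whole file.
            rows = list(islice(reader, chunksize))
            if not rows:
                break
            out_path = chunk_dir / f"chunk_{index:04d}.csv"
            with open(out_path, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # repeat the header in every chunk
                writer.writerows(rows)
            written.append(out_path)
            index += 1
    return written
```

Only one block of rows is held in memory at a time, so peak memory use scales with `chunksize` rather than with the size of the input file.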

### 2. Concatenate chunks

In [None]:
lucas.concat_chunks(
    path_to_chunks=path_to_chunks,
    output_file=path_to_chunk_concat,
    verbose=verbose)
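
Conceptually, this step joins the per-chunk files back into a single CSV. A minimal sketch of such a concatenation is shown below, assuming every chunk file carries the same header row; it is not the code of `lucas.concat_chunks`, and the name `concat_csv_chunks` is hypothetical.

```python
import csv
from pathlib import Path


def concat_csv_chunks(chunk_dir, output_file):
    """Concatenate all CSV files in chunk_dir into output_file.

    Assumes all chunks share the same header, which is written only once.
    Returns the number of data rows written.
    """
    # Sort by name so the chunks are joined in a reproducible order.
    chunk_paths = sorted(Path(chunk_dir).glob("*.csv"))
    n_rows = 0
    header_written = False
    with open(output_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        for chunk_path in chunk_paths:
            with open(chunk_path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:  # skip empty chunk files
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

Sorting the file names keeps the row order stable across runs, provided the chunk names sort in the order they were written.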

### 3. Combine hyperspectral samples

In [None]:
lucas.combine_hyp_samples(
    hyp_csv_path=path_to_chunk_concat,
    output_path=path_to_combined,
    verbose=verbose)

### 4. Combine CSV and XLS files

In [None]:
lucas.combine_csv_xls(
    hyp_csv_path=path_to_combined,
    other_csv_path=path_to_datafile,
    output_path=path_to_full,
    verbose=verbose)

### 5. Add categories and superclasses

In [None]:
lucas.add_categories_and_superclasses(
    input_path=path_to_full,
    output_path=path_to_final,
    verbose=verbose)

### 6. Split dataset

In [None]:
lucas.split_lucas_dataset(
    full_dataset_path=path_to_final,
    output_path_prefix=path_to_subsets,
    random_state=42,
    train_frac=0.8,
    val_frac=0.2,
    verbose=verbose)
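
The notebook does not state how `train_frac` and `val_frac` are interpreted. The sketch below shows one possible reading, stated here purely as an assumption: `train_frac` of all samples go to a combined training and validation pool, `val_frac` of that pool becomes the validation set, and the remainder is the test set. The function `split_indices` is hypothetical and is not the implementation of `lucas.split_lucas_dataset`.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.2, random_state=42):
    """Split range(n_samples) into train, validation, and test indices.

    Assumed semantics (not confirmed by the notebook):
    - train_frac: share of all samples used for training plus validation
    - val_frac: share of that pool held out for validation
    """
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle reproducible.
    random.Random(random_state).shuffle(indices)
    n_trainval = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_trainval))
    val = indices[:n_val]
    train = indices[n_val:n_trainval]
    test = indices[n_trainval:]
    return train, val, test
```

Under this reading, 100 samples with the values used above give 64 training, 16 validation, and 20 test samples, and a fixed `random_state` yields the same split on every run.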