# Example: Sudoku augmented design

First, load all required packages.

In [None]:
using SudokuPlantDesign
using DataFrames
using XLSX
using PyPlot

## 1) Generate (optimized) Sudoku configuration

We start by generating a new configuration of the plant design, called `conf`. It consists of `2` blocks in the horizontal and `4` blocks in the vertical direction, each block measuring `9` x `2` plots, so the field has 8 blocks and 18 x 8 = 144 plots. The configuration uses `3` different types of checks.
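
To make the block layout concrete, here is a small plain-Julia sketch that maps a plot `(x, y)` to its block. It assumes that `x` runs horizontally over the two blocks of width `9`, that `y` runs vertically over the four blocks of height `2`, and that blocks are numbered row by row starting at `(1, 1)`; the package's own coordinate and numbering conventions may differ. `block_of` and `interval_index` are hypothetical helpers, not part of SudokuPlantDesign.

```julia
# Hypothetical sketch: which block does plot (x, y) belong to?
# Assumes 2 blocks of width 9 along x, 4 blocks of height 2 along y,
# and blocks numbered row by row starting at (1, 1).
const BLOCK_WIDTHS  = [9, 9]
const BLOCK_HEIGHTS = [2, 2, 2, 2]

# Index of the interval that contains coordinate c, given the interval sizes.
function interval_index(c::Int, sizes::Vector{Int})
    edges = cumsum(sizes)                 # upper edges, e.g. [9, 18]
    i = findfirst(e -> c <= e, edges)
    (c < 1 || i === nothing) && throw(ArgumentError("coordinate $c is outside the field"))
    return i
end

# Block number of plot (x, y): row of blocks first, then position within the row.
block_of(x::Int, y::Int) =
    (interval_index(y, BLOCK_HEIGHTS) - 1) * length(BLOCK_WIDTHS) + interval_index(x, BLOCK_WIDTHS)

n_plots  = sum(BLOCK_WIDTHS) * sum(BLOCK_HEIGHTS)        # 18 * 8 = 144
n_blocks = length(BLOCK_WIDTHS) * length(BLOCK_HEIGHTS)  # 2 * 4 = 8

# Count the plots in every block; each block should contain 9 * 2 = 18 plots.
all_plots = Iterators.product(1:sum(BLOCK_WIDTHS), 1:sum(BLOCK_HEIGHTS))
plots_per_block = [count(p -> block_of(p...) == b, all_plots) for b in 1:n_blocks]
```

Under these assumptions, `block_of(10, 1)` is the second block in the bottom row and `block_of(18, 8)` is the last of the 8 blocks.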

Each plot of the design is one of the following:
- missing (empty) plot
- check variety (replicated)
- entry (unreplicated)

Note that at this stage, entries cannot be distinguished from each other; such distinctions are only made once the placement of the checks has been finalized.

In our example, we leave all plots in the rectangle spanned by `[1,3]` and `[2,4]` empty. Then, `119` entries are initialized, and the remaining plots are filled with random checks. Finally, the configuration is plotted, showing its initial state before optimization.

In [None]:
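# 2 x 4 blocks of 9 x 2 plots each, 3 check types
conf = get_configuration([9,9],[2,2,2,2],3)

# leave the plots from [1,3] to [2,4] empty, then place 119 entries
empty_plots!(conf, 1:2,3:4)
initialize_entries!(conf, 119)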

show_configuration(conf, zoom=0.2, show_coordinates=true)

mkpath("output/")
savefig("output/augmented_design_checks_initial.pdf")
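
The numbers in this example fit together neatly. The short sketch below, in plain Julia without the package, recomputes the plot budget, assuming the block layout described above (2 x 4 blocks of 9 x 2 plots), that the empty rectangle covers exactly the four plots `1:2` x `3:4`, and that every plot that is neither empty nor an entry becomes a check.

```julia
# Plot budget for the example (plain arithmetic, no package calls).
# Assumes 2 x 4 blocks of 9 x 2 plots each and 3 check types, as above.
n_plots   = (9 + 9) * (2 + 2 + 2 + 2)   # 144 plots in total
n_empty   = length(1:2) * length(3:4)    # 4 plots removed by empty_plots!
n_entries = 119                          # unreplicated entries
n_types   = 3                            # check types

n_checks = n_plots - n_empty - n_entries           # plots left for checks
per_type, remainder = divrem(n_checks, n_types)    # even split across types?

println("checks: $n_checks, per type: $per_type, remainder: $remainder")
```

This yields 21 checks, i.e. exactly 7 per type. With 8 blocks, each check type can in principle appear at most once per block, which is what the `K_checks_per_type_per_block(c, 1)` term of the cost function appears to encourage.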

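Next, we define the cost function of the optimization as a sum of cost terms provided by the package. In our case, the cost function contains the following parts:
- costs when the number of checks differs between check types (`K_num_checks_equal_per_type`)
- costs for more than one check of the same type within a block, weighted by a factor of `20` (`K_checks_per_type_per_block`)
- costs for clustering of checks (`K_neighbors_different_check_functional` and `K_neighbors_same_check_functional`), each given a weighting function of the distance `d`; here, nearby checks of the same type are penalized twice as strongly as nearby checks of different types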

In [None]:
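# weighted sum of the package's cost terms
cost_function(c) = K_num_checks_equal_per_type(c) +
                   K_checks_per_type_per_block(c, 1)*20 +                     # at most 1 check per type and block
                   K_neighbors_different_check_functional(c, d->0.5/(d^3)) +  # nearby checks of different types
                   K_neighbors_same_check_functional(c, d->1/(d^3))           # nearby checks of the same type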
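
To illustrate what such cost terms measure, here is a minimal plain-Julia sketch on a toy label grid, assuming the encoding `0` = empty plot, `-1` = entry and `k > 0` = check of type `k`, and assuming that `d` is the Euclidean distance between two plots. The functions `cost_equal_per_type`, `cost_per_type_per_block`, `cost_neighbors` and `total_cost` are hypothetical stand-ins; the package's `K_*` functions may be defined quite differently.

```julia
# Hypothetical sketch of the cost terms on a plain label grid:
#   0 = empty plot, -1 = entry, k > 0 = check of type k.

# Squared deviation of the per-type check counts from their mean.
function cost_equal_per_type(grid::Matrix{Int}, ntypes::Int)
    counts = [count(==(k), grid) for k in 1:ntypes]
    m = sum(counts) / ntypes
    return sum((n - m)^2 for n in counts)
end

# Surplus checks: checks of one type beyond `maxper` within one block.
function cost_per_type_per_block(grid::Matrix{Int}, blocks::Matrix{Int}, ntypes::Int, maxper::Int)
    cost = 0
    for b in unique(blocks), k in 1:ntypes
        n = count(i -> blocks[i] == b && grid[i] == k, eachindex(grid))
        cost += max(n - maxper, 0)
    end
    return cost
end

# Sum of f(d) over all pairs of checks, d = Euclidean distance between the plots;
# f_same is used for pairs of the same type, f_diff for pairs of different types.
function cost_neighbors(grid::Matrix{Int}, f_same, f_diff)
    checks = [(Tuple(I), grid[I]) for I in CartesianIndices(grid) if grid[I] > 0]
    cost = 0.0
    for a in eachindex(checks), b in a+1:lastindex(checks)
        (p, ka), (q, kb) = checks[a], checks[b]
        d = hypot(p[1] - q[1], p[2] - q[2])
        cost += ka == kb ? f_same(d) : f_diff(d)
    end
    return cost
end

# Same weights as in cost_function above.
total_cost(grid, blocks; ntypes = 3) =
    cost_equal_per_type(grid, ntypes) +
    20 * cost_per_type_per_block(grid, blocks, ntypes, 1) +
    cost_neighbors(grid, d -> 1 / d^3, d -> 0.5 / d^3)

grid   = [1 -1 2; -1 1 -1]   # toy 2 x 3 field with three checks
blocks = [1  1 2;  1 1  2]   # two blocks
total_cost(grid, blocks; ntypes = 2)
```

In this toy grid, the two type-1 checks share block 1, so the block term dominates (20 of roughly 21 cost units); moving one of them into block 2 would remove that penalty.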

We then define the list of updates used in the optimization: assigning a new label to a given check, swapping two checks with each other, and swapping a check with an entry.

In [None]:
updates = [UpdateNewCheckLabel(),UpdateSwapCheckCheck(),UpdateSwapCheckEntry()]
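
The three kinds of updates can be pictured as small random moves on the design. The following plain-Julia sketch uses the same toy encoding as before (`0` = empty, `-1` = entry, `k > 0` = check of type `k`); the functions `new_check_label`, `swap_check_check` and `swap_check_entry` are hypothetical analogues of the package's `Update*` types, which may be implemented differently.

```julia
using Random

# Hypothetical sketch of the three move types on a plain label grid
# (0 = empty, -1 = entry, k > 0 = check of type k). Each move returns a
# modified copy and leaves the input unchanged.
checks_of(g)  = findall(>(0), g)
entries_of(g) = findall(==(-1), g)

# New label: give one random check a different check type (needs ntypes >= 2).
function new_check_label(rng::AbstractRNG, g::Matrix{Int}, ntypes::Int)
    h = copy(g)
    i = rand(rng, checks_of(h))
    h[i] = rand(rng, setdiff(1:ntypes, [h[i]]))
    return h
end

# Check-check swap: exchange two random checks (a no-op if both have the same type).
function swap_check_check(rng::AbstractRNG, g::Matrix{Int})
    h = copy(g)
    i, j = rand(rng, checks_of(h)), rand(rng, checks_of(h))
    h[i], h[j] = h[j], h[i]
    return h
end

# Check-entry swap: move a random check onto the position of a random entry.
function swap_check_entry(rng::AbstractRNG, g::Matrix{Int})
    h = copy(g)
    i, j = rand(rng, checks_of(h)), rand(rng, entries_of(h))
    h[i], h[j] = h[j], h[i]
    return h
end

rng = MersenneTwister(1)
g = [1 -1 2; -1 3 -1; 0 -1 1]
moved = swap_check_entry(rng, g)
```

Note that the two swap moves only rearrange labels, so the number of checks per type stays fixed; only the relabeling move can change how many checks of each type there are, which is why the cost function also needs a term for equal counts per type.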

With these ingredients, the optimization is run for `500000` updates; that is, `500000` consecutive candidate configurations are generated, and `conf` is modified in place to hold the final design.

In [None]:
optimize_design!(
    conf,
    updates,
    cost_function,
    500000
);
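
How such a move-based optimization can work is sketched below in plain Julia. The acceptance rule (keep a move whenever the cost does not increase) is an assumption made for illustration only; `optimize_design!` may use a different rule, for example simulated annealing. `optimize_sketch!`, the toy move and the toy cost `adjacent_checks` are hypothetical.

```julia
using Random

# Hypothetical sketch of a move-based optimization loop with greedy acceptance.
function optimize_sketch!(g::Matrix{Int}, moves, cost, n_updates::Int; rng = Random.default_rng())
    current = cost(g)
    for _ in 1:n_updates
        candidate = rand(rng, moves)(rng, g)   # propose a random move
        c = cost(candidate)
        if c <= current                        # accept: overwrite the design in place
            g .= candidate
            current = c
        end
    end
    return current
end

# Toy move: swap a random check (k > 0) with a random entry (-1).
function swap_check_entry(rng, g)
    h = copy(g)
    i, j = rand(rng, findall(>(0), h)), rand(rng, findall(==(-1), h))
    h[i], h[j] = h[j], h[i]
    return h
end

# Toy cost: number of checks that are direct horizontal or vertical neighbours.
function adjacent_checks(g)
    n = 0
    for I in CartesianIndices(g), J in (I + CartesianIndex(1, 0), I + CartesianIndex(0, 1))
        checkbounds(Bool, g, J) && g[I] > 0 && g[J] > 0 && (n += 1)
    end
    return n
end

g = fill(-1, 6, 6)
g[1:2, 1:2] .= 1                  # start from a tight 2 x 2 cluster of checks
initial_cost = adjacent_checks(g) # 4 adjacent pairs
final_cost = optimize_sketch!(g, [swap_check_entry], adjacent_checks, 2_000; rng = MersenneTwister(42))
```

Accepting moves of equal cost lets the design drift across plateaus of the cost landscape; a purely greedy rule can still get stuck in local minima, which is one reason real optimizers often accept occasional worse moves.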

Information about the optimized configuration can be printed, and the configuration can be inspected visually by plotting it.

In [None]:
print_info(conf)

In [None]:
show_configuration(conf, zoom=0.2, show_coordinates=true)
mkpath("output/")
savefig("output/augmented_design_checks_final.pdf")

## 2) Save design data with field plan

With an optimized configuration `conf` at hand, one can proceed to create a field plan for the design. For such a field plan, additional information on the genotypes in the trial is added. This information is provided as two dataframes, one for the checks and one for the entries. Both must have the following structure:
- first column: genotype name
- further columns: additional information (optional, but the same columns must appear in both dataframes)

In our example, the sheets `checks` and `entries` of the Excel file `input_augmented.xlsx` are read and converted into the two dataframes. Each sheet contains the genotype name in the first column and further properties in the following columns. All values are converted to strings, and missing cells are marked as `NA`.

In [None]:
entrydata = string.(DataFrame(XLSX.readtable("input_augmented.xlsx", "entries")));
replace!.(eachcol(entrydata), "missing" => "NA");

In [None]:
checkdata = string.(DataFrame(XLSX.readtable("input_augmented.xlsx", "checks")));
replace!.(eachcol(checkdata), "missing" => "NA");
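
The two-step cleanup above relies on a small detail: `string(missing)` returns the text `"missing"`. The following plain-Julia illustration (no Excel file needed, with a made-up column) shows the effect on a single column containing a missing cell.

```julia
# A made-up column as it might come from a sheet: text, a missing cell, a number.
col = Any["G001", missing, 42]

s = string.(col)                 # every value becomes a String: ["G001", "missing", "42"]
replace!(s, "missing" => "NA")   # the former missing cell is now "NA"
```

A side effect of this approach is that a cell that genuinely contains the text `missing` would also be turned into `NA`.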

The field plan is based on an extended version of the configuration `conf`, a so-called *labeled check configuration* named `lconf`. In addition to the original configuration, it stores an index (position ID) and a label for each plot.

Below, indices are assigned in a snake pattern along the y-direction, with no indices for empty plots (`index_for_empty=false`), and labels are filled in from the previously created dataframes. Then, the labeled configuration is visualized.

In [None]:
lconf = LabeledCheckConfiguration(conf)

fill_indices_snake_y!(lconf, 1,1, index_for_empty=false)
fill_labels!(lconf, checkdata, entrydata)

show_configuration(lconf, check_labels=true, show_coordinates=true, text_zoom=0.9)
mkpath("output/")
savefig("output/augmented_design_final_design.pdf")
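
A snake (serpentine) numbering along y walks up one column of plots and back down the next, so that consecutive indices are always physical neighbours in the field. The plain-Julia sketch below assumes numbering starts at `(1, 1)` with increasing `y` and that empty plots are skipped; the start corner, the direction, and the meaning of the arguments `1,1` in `fill_indices_snake_y!` are not documented here, so `snake_indices_y` is only a hypothetical illustration.

```julia
# Hypothetical sketch of snake numbering along y on an nx x ny field:
# column x = 1 is numbered with increasing y, column 2 with decreasing y, and so on.
# Plots flagged in empty_mask get index 0 and are skipped (cf. index_for_empty=false).
function snake_indices_y(nx::Int, ny::Int; empty_mask = falses(nx, ny))
    idx = zeros(Int, nx, ny)
    k = 1
    for x in 1:nx
        ys = isodd(x) ? (1:ny) : (ny:-1:1)   # reverse direction in every other column
        for y in ys
            empty_mask[x, y] && continue
            idx[x, y] = k
            k += 1
        end
    end
    return idx
end

mask = falses(3, 4)
mask[1:2, 3:4] .= true       # an empty rectangle like the one in the example
snake_indices_y(3, 4; empty_mask = mask)
```

For this 3 x 4 toy field, the first column receives indices 1 and 2, the second column is numbered downwards (4, 3), and the third column runs from 5 to 8, with the four empty plots left at 0.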

For export as a trial plan, the optimized Sudoku-augmented design can now be converted back into a dataframe. Besides genotype names and properties, this dataframe contains each plot's position index, xy-coordinates, and block. It can be modified further in Julia before exporting it.

In this example, the generic property columns (`property_1`, `property_2`, ...) are renamed after the corresponding columns of `checkdata`, and two additional columns, `year` and `extra_info`, are added to the dataframe.

In [None]:
df = get_dataframe(lconf)

# property_i takes the name of the (i+1)-th column of checkdata (column 1 is the genotype name)
for (i,name) in enumerate(names(checkdata)[2:end])
    rename!(df,Symbol("property_"*string(i)) => Symbol(name))
end

df[:, :year]       .= 2023
df[:, :extra_info] .= "myextrainfo"
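
The renaming rule in the loop above can also be expressed as a list of `old => new` pairs. The plain-Julia sketch below builds such a list for a made-up check sheet header (`check_columns` is hypothetical; in the notebook it would be `names(checkdata)`).

```julia
# Hypothetical header of the checks sheet; column 1 holds the genotype name.
check_columns = ["name", "origin", "maturity"]

# property_i => name of the (i+1)-th column
rename_map = ["property_$i" => name for (i, name) in enumerate(check_columns[2:end])]
```

Since DataFrames.jl's `rename!` also accepts a vector of pairs, `rename!(df, rename_map)` should apply all renamings in one call instead of the explicit loop.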

Finally, the trial plan is created by writing the dataframe into an Excel file.

In [None]:
mkpath("output/")
XLSX.writetable("output/augmented_design_final_design.xlsx", collect(eachcol(df)), names(df),overwrite=true)