# Spectral Analysis: Plant Physiology Case Study

author: F.Feenstra 

### Background
The KCBBE research group is conducting precise plant‑physiology measurements in a controlled laboratory setting with the ultimate goal of applying these methods in the field. Plants are grown under regulated light, temperature, humidity, and nutrient conditions. The current focus is on hyperspectral leaf‑reflectance measurements to detect physiological responses to environmental stressors, especially salt (NaCl) treatments.

Hyperspectral leaf‑reflectance spectra are well suited to studying salt‑induced stress because the optical properties of a leaf change in predictable ways as its physiology responds to elevated sodium chloride levels. Salinity disrupts chlorophyll synthesis and accelerates pigment degradation, which manifests as reduced absorption in the red region (approximately 630–690 nm) and consequently increased reflectance in that band. Simultaneously, osmotic imbalance and reduced water content modify the strength of water‑absorption features and affect the near‑infrared plateau (750–900 nm) that is governed by leaf internal structure. Moreover, stress‑related shifts in the red edge (around 690–740 nm) and subtle variations in the first derivative of the spectrum highlight rapid changes in leaf biochemistry and morphology that precede visible wilting.

Because hyperspectral sensors capture reflectance at fine wavelength intervals across the entire visible‑near‑infrared range, they can simultaneously resolve these multiple stress signatures, allowing researchers to detect early physiological responses, quantify severity, and compare short‑term osmotic shocks with longer‑term structural adaptations in a non‑destructive, high‑throughput manner. This makes hyperspectral spectroscopy a powerful tool for monitoring plant health under saline conditions and for developing robust spectral biomarkers of salt stress.

Over the coming two weeks we will use spectral data to learn how to evaluate this type of data.

### About the data

Two experimental timelines are used:

1. Short‑term: The plant stays in the measurement rig. Baseline reflectance is recorded, then a salt solution is applied to the roots, and the leaf reflectance is measured immediately without moving the sensor. A potato plant is used for this experiment.

2. Long‑term: Plants are grown at a given salt concentration for about a week. Reflectance is then measured again (the sensor is placed near the same spot on the leaf, but slight positional variation is possible). Chinese cabbage is used for this experiment.

Reflectance values are calibrated against a dark reference (DARK) and a white reference (WR).

The spreadsheet contains the raw reflectance data, reference values, and some derived features. The data can be found at `assemblix2019:/data/datasets/Programming/reflectance.xlsx`


### Assignment Goal
Design and execute a data‑analysis workflow that evaluates whether hyperspectral reflectance can reliably indicate salt‑induced physiological stress in plants. You will compare short‑term and long‑term experiments, explore spectral features, and assess the usefulness of vegetation indices and derivative metrics.

### Learning outcomes
- Apply Python proficiently to clean and structure raw signal data, ensuring it is in a format conducive to analysis
- Develop a maintainable and effective preprocessing and evaluation pipeline for spectral series data
- Implement mathematical algorithms in Python to discern and interpret patterns
- Design and develop visually appealing and functional data visualizations for spectral data
- Deliver a well-organized solution that not only solves the given problem but also showcases a systematic and structured approach throughout the entire data processing and analysis pipeline

### Instructions
1. Copy the original data from assemblix to your data directory
2. In the first week, read part 1 and conduct the data inspections according to the part 1 instructions. Enhance and expand your inspection when needed, but keep a balance between essential analysis and nice-to-haves. Be prepared to discuss and demonstrate the solution in next week's first session.
3. In the second week, read part 2 and conduct the data exploration according to the part 2 instructions. Enhance and expand your exploration when needed, but keep a balance between essential analysis and nice-to-haves. Be prepared to discuss and demonstrate the solution in the third week's first session.
4. Upload your solution for this case study to a repository and submit the link in the Brightspace assignment. Make sure that your repository is private and invite your teachers and tutors. Please submit your work (even unfinished) before the deadline to receive feedback.
5. You are welcome to collaborate in small groups, but please ensure that you acknowledge each member's contributions and engage in discussions to collectively assess the outcomes.

---

## Part 1 (week 1): Data Inspection and Quality Enhancement


### 1. Data Familiarisation
Load the spreadsheet and familiarize yourself with the data. 

Deliverable: a short notebook cell that prints a tidy summary table and a paragraph interpreting the experimental design (e.g., “The dataset comprises 2 salt levels (0‑60 mM) with .. biological replicates each, yielding .. short‑term pairs and .. long‑term observations.”)
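
A minimal sketch of such a summary table is given below. The layout of `reflectance.xlsx` is not described here, so the column names `timeline`, `salt_mM` and `plant_id` are hypothetical, and the toy rows are invented purely to make the example runnable.

```python
import pandas as pd

def summarise_design(meta: pd.DataFrame) -> pd.DataFrame:
    """Count spectra and distinct plants per timeline and salt level.

    Assumes (hypothetically) one row per measured spectrum with the
    columns 'timeline', 'salt_mM' and 'plant_id'.
    """
    return (
        meta.groupby(["timeline", "salt_mM"])
        .agg(n_spectra=("plant_id", "size"), n_plants=("plant_id", "nunique"))
        .reset_index()
    )

# Toy metadata standing in for the real sheet (all values invented).
meta = pd.DataFrame({
    "timeline": ["short", "short", "long", "long", "long"],
    "salt_mM": [0, 60, 0, 60, 60],
    "plant_id": ["p1", "p1", "c1", "c2", "c3"],
})
summary = summarise_design(meta)
print(summary.to_string(index=False))
```

With the real file, `meta` would typically come from `pd.read_excel`, after checking which sheets and columns the spreadsheet actually contains.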

### 2. Pre‑processing

- Verify that all spectra (% reflectance) are correctly *normalised* to the dark and white references: `R = (R_raw - R_dark) / (R_white - R_dark)`. If they are not, apply the formula.
- Apply a filter to *smooth* each spectrum before derivative computation. Document why you chose that filter and support the choice with a literature reference.
- To *ensure comparability* across runs, interpolate all spectra onto a common wavelength grid (e.g., 400‑1100 nm at 1 nm intervals).
- Research how to *determine outliers* in spectral data, implement the algorithm, and inspect the outliers. Decide whether you want to remove them.
- Impute isolated *missing wavelengths*; if > 5 % of a spectrum is missing, discard the sample. Justify your chosen imputation algorithm.
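
The normalisation, imputation, interpolation and smoothing steps above could be sketched as follows. This is one possible arrangement, not a prescribed solution: linear interpolation for gap filling and a Savitzky-Golay filter for smoothing are assumptions, as are the window length, polynomial order and the synthetic spectrum used for the demonstration.

```python
import numpy as np
from scipy.signal import savgol_filter

def normalise(raw, dark, white):
    """Reflectance R = (R_raw - R_dark) / (R_white - R_dark), per wavelength."""
    raw, dark, white = (np.asarray(a, dtype=float) for a in (raw, dark, white))
    return (raw - dark) / (white - dark)

def preprocess(wavelengths, reflectance, grid, max_missing=0.05,
               window=11, polyorder=2):
    """Fill gaps, resample onto `grid`, then smooth.

    Returns None when more than `max_missing` of the spectrum is missing.
    `wavelengths` must be sorted in increasing order.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    valid = np.isfinite(reflectance)
    if 1.0 - valid.mean() > max_missing:
        return None  # too many missing wavelengths: discard the sample
    # np.interp fills isolated gaps linearly and resamples in one step.
    # Grid points outside the measured range take the nearest end value.
    resampled = np.interp(grid, wavelengths[valid], reflectance[valid])
    return savgol_filter(resampled, window_length=window, polyorder=polyorder)

# Synthetic demonstration data (invented, not taken from the spreadsheet).
rng = np.random.default_rng(0)
wl = np.linspace(400.0, 1100.0, 351)                      # 2 nm sampling
true = 0.1 + 0.4 / (1.0 + np.exp(-(wl - 715.0) / 15.0))   # red-edge-like curve
dark = np.full_like(wl, 50.0)
white = np.full_like(wl, 1050.0)
raw = dark + true * (white - dark) + rng.normal(0.0, 5.0, wl.size)

refl = normalise(raw, dark, white)
refl[[40, 200]] = np.nan                                  # two isolated gaps
grid = np.arange(400.0, 1101.0, 1.0)                      # common 1 nm grid
smooth = preprocess(wl, refl, grid)
print(smooth.shape)
```

Smoothing after resampling keeps the filter window defined in nanometres on the common grid; whether that order suits the real data is a design decision to justify in the notebook.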


Deliverable: a notebook section with code, a short justification paragraph, and a figure showing a raw vs. processed (smoothed) spectrum.
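
For the outlier step in the pre-processing list, one simple option is to measure how far each spectrum lies from the median spectrum and flag unusually large distances with a robust (median/MAD) z-score. The sketch below illustrates that idea only; the threshold of 3.5 and the synthetic spectra are assumptions, and other methods from the literature may fit the data better.

```python
import numpy as np

def flag_outliers(spectra, threshold=3.5):
    """Flag spectra that lie unusually far from the median spectrum.

    spectra: 2-D array, one row per sample on a common wavelength grid.
    Returns (boolean flags, robust z-scores of the Euclidean distances).
    """
    spectra = np.asarray(spectra, dtype=float)
    median_spectrum = np.median(spectra, axis=0)
    dist = np.linalg.norm(spectra - median_spectrum, axis=1)
    centre = np.median(dist)
    mad = np.median(np.abs(dist - centre))
    if mad == 0:
        raise ValueError("MAD is zero; distances are (nearly) identical")
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
    robust_z = 0.6745 * (dist - centre) / mad
    return robust_z > threshold, robust_z

# Synthetic spectra (invented): 20 noisy copies of one curve, one shifted.
rng = np.random.default_rng(1)
grid = np.arange(400.0, 1101.0, 1.0)
base = 0.1 + 0.4 / (1.0 + np.exp(-(grid - 715.0) / 15.0))
spectra = base + rng.normal(0.0, 0.005, (20, grid.size))
spectra[7] += 0.1                      # simulated offset, e.g. a bad reference

flags, z = flag_outliers(spectra)
print(np.flatnonzero(flags))
```

Flagged spectra are candidates for inspection, not automatic removal; plotting them against the median spectrum helps decide whether they are measurement errors or real biological variation.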

### 3. Exploratory Analysis


#### Reflectance curves

- Plot mean ± 95 % CI for each salt level, separated by timeline (short vs. long).
- Shade the 680‑700 nm window and annotate any systematic peaks/troughs.
- Automate peak detection inside this window using `scipy.signal.find_peaks` (store peak wavelength & amplitude).
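
The confidence band and the peak search can be sketched as below, assuming the spectra of one group are stacked as rows on a common wavelength grid. The t-based interval, the prominence value and the toy spectrum with an artificial bump at 690 nm are all illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.signal import find_peaks

def mean_ci(spectra, confidence=0.95):
    """Per-wavelength mean and t-based CI half-width across replicate rows."""
    spectra = np.asarray(spectra, dtype=float)
    n = spectra.shape[0]
    mean = spectra.mean(axis=0)
    sem = spectra.std(axis=0, ddof=1) / np.sqrt(n)
    half = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1) * sem
    return mean, half

def peaks_in_window(wavelengths, spectrum, lo=680.0, hi=700.0, prominence=None):
    """Return (peak wavelengths, peak amplitudes) found inside [lo, hi] nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    idx, _ = find_peaks(spectrum[mask], prominence=prominence)
    return wavelengths[mask][idx], spectrum[mask][idx]

# Toy data (invented): flat spectrum with a bump at 690 nm, six replicates.
rng = np.random.default_rng(3)
grid = np.arange(400.0, 1101.0, 1.0)
toy = 0.2 + 0.05 * np.exp(-((grid - 690.0) / 4.0) ** 2)
replicates = toy + rng.normal(0.0, 0.002, (6, grid.size))

mean, half = mean_ci(replicates)
peak_wl, peak_amp = peaks_in_window(grid, mean, prominence=0.01)
print(peak_wl, peak_amp)
```

For plotting, `mean - half` and `mean + half` can be passed to matplotlib's `fill_between`, and `axvspan(680, 700)` shades the window. `find_peaks` does not report maxima at the edges of the window, and troughs can be found by running it on the negated spectrum.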

---

## Part 2 (week 2): Analyze Data


### 4. Extract features

Spectral (hyperspectral) data measure how much light a plant leaf reflects at many different wavelengths. Each wavelength region (band) is related to specific biophysical or biochemical properties of the plant.

- Red (~630–690 nm) → chlorophyll absorption
- Near-infrared (NIR) (750–900 nm) → leaf structure and biomass
- Water absorption bands (~970, 1200, 1450 nm) → hydration
- Red edge (~690–740 nm) → stress and chlorophyll changes

From the spectrum we derive features that summarize what happens in those important bands. Instead of using many wavelengths separately, we compute *vegetation indices* (ratios between bands), slopes and *derivatives* (shape of the spectrum), and the *red-edge position*. These features summarize plant physiological information and can be used in machine learning models to detect stress, diseases, nutrient status, or salinity effects.

#### Derivative inspection
The first-derivative spectrum describes how fast reflectance changes with wavelength. Instead of looking at the reflectance value itself, you look at the slope of the spectrum at each wavelength. Compute the first derivative and try to evaluate the results. If you have time left, consider the second‑derivative analysis. 
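
A minimal sketch of the derivative computation follows, assuming a uniformly spaced wavelength grid. It uses a Savitzky-Golay filter with `deriv=1`, which smooths and differentiates in one step, and takes the wavelength of the maximum derivative in the red-edge window as the red-edge position. That definition is one common choice among several, and the window settings and synthetic spectrum are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

def first_derivative(wavelengths, spectrum, window=11, polyorder=2):
    """Smoothed dR/d(lambda); assumes uniform wavelength spacing."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    step = wavelengths[1] - wavelengths[0]
    return savgol_filter(spectrum, window_length=window, polyorder=polyorder,
                         deriv=1, delta=step)

def red_edge_position(wavelengths, derivative, lo=690.0, hi=740.0):
    """Wavelength of the maximum first derivative inside [lo, hi] nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    derivative = np.asarray(derivative, dtype=float)
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[mask][np.argmax(derivative[mask])]

# Synthetic spectrum (invented): sigmoid red edge centred at 715 nm.
grid = np.arange(400.0, 1101.0, 1.0)
spectrum = 0.1 + 0.4 / (1.0 + np.exp(-(grid - 715.0) / 15.0))

d1 = first_derivative(grid, spectrum)
rep = red_edge_position(grid, d1)
print(rep, d1.max())
```

Calling `savgol_filter` with `deriv=2` gives a second derivative in the same way, usually with a higher `polyorder` or a wider window to keep the noise under control.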

#### Vegetation indices
Vegetation indices are combinations (usually ratios) of reflectance values at specific wavelengths, designed to extract biologically meaningful information from spectral data.
Determine which vegetation indices you can use for this salinity analysis. Argue your choice and justify it with literature. Compute the indices and evaluate them.
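
As an illustration of how an index is computed from a spectrum, the sketch below implements NDVI, (NIR - Red) / (NIR + Red), and a simple 900/970 nm ratio often used as a water index. The band centres (670, 800, 900 and 970 nm) and the 5 nm averaging half-width are assumptions; the definitions you adopt should come from the literature you cite.

```python
import numpy as np

def band_mean(wavelengths, spectrum, centre, half_width=5.0):
    """Mean reflectance in [centre - half_width, centre + half_width] nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    mask = np.abs(wavelengths - centre) <= half_width
    return spectrum[mask].mean()

def ndvi(wavelengths, spectrum, red=670.0, nir=800.0):
    """Normalised difference vegetation index from two narrow bands."""
    r = band_mean(wavelengths, spectrum, red)
    n = band_mean(wavelengths, spectrum, nir)
    return (n - r) / (n + r)

def water_index(wavelengths, spectrum, ref=900.0, water=970.0):
    """Ratio of a reference band to the ~970 nm water-absorption band."""
    return band_mean(wavelengths, spectrum, ref) / band_mean(wavelengths, spectrum, water)

# Toy step spectrum (invented): 0.05 below 700 nm, 0.45 from 700 nm upward.
grid = np.arange(400.0, 1101.0, 1.0)
toy = np.where(grid < 700.0, 0.05, 0.45)

print(ndvi(grid, toy), water_index(grid, toy))
```

Averaging over a small window rather than reading a single wavelength makes the index less sensitive to residual noise; the same `band_mean` helper can be reused for any other two-band index.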


### 5. Statistical Testing

- Test whether vegetation indices differ significantly among salt concentrations.
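
One way to run such a test is sketched below with a one-way ANOVA and its non-parametric counterpart, the Kruskal-Wallis test, both from `scipy.stats`. The three salt levels and the NDVI values are invented for illustration; which test is appropriate depends on the design and on whether the assumptions (normality, equal variances, independence) hold.

```python
import numpy as np
from scipy import stats

def compare_groups(groups):
    """Test whether an index differs among groups.

    groups: dict mapping a salt level to a 1-D array of index values.
    """
    samples = [np.asarray(v, dtype=float) for v in groups.values()]
    f_stat, p_anova = stats.f_oneway(*samples)
    h_stat, p_kruskal = stats.kruskal(*samples)
    return {"anova_F": f_stat, "anova_p": p_anova,
            "kruskal_H": h_stat, "kruskal_p": p_kruskal}

# Invented NDVI values for three hypothetical salt levels (mM), 8 plants each.
rng = np.random.default_rng(2)
ndvi_by_salt = {
    0: rng.normal(0.80, 0.02, 8),
    30: rng.normal(0.78, 0.02, 8),
    60: rng.normal(0.70, 0.02, 8),
}
result = compare_groups(ndvi_by_salt)
print(result)
```

In the short‑term experiment the same plant is measured before and after treatment, so a paired test may describe that design better than a comparison of independent groups.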



### 6. Feature Evaluation

- Discuss which spectral features (raw reflectance, indices, derivatives) provide the strongest discrimination of salt stress.
- Compare the reliability of short‑term versus long‑term measurements.

NB: Make sure that you design publication‑quality figures (with captions) and a brief narrative interpreting trends (e.g., “NDVI decreases monotonically with increasing NaCl, whereas the first derivative at 692 nm shows a pronounced inflection at 60 mM”).


### Documentation
Document all decisions made during the explorations to provide transparency and aid reproducibility. Reflect on the outcomes using captions when appropriate. Deliver your solution in a repository with a README file.

### Challenges
In this case study there are several advanced topics you could consider implementing:

- An interactive Bokeh plot to dynamically examine the different band aspects.
- A parser object (or set of objects) that loads and preprocesses the data.


Good luck, and enjoy exploring the hidden spectral signatures of plant salt stress!