Add functional data section #101
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,237 @@ | ||
# Functional Data | ||
|
||
Functional data provides information about curves varying over a continuum, such as time. | ||
This type of data is often present when analyzing measurements at various time points. | ||
Such curves are usually interdependent, which means that the measurement at a point $t_{i + 1}$ usually depends on measurements at earlier points $\{t_1, \dots, t_i\}$, $i \in \mathbb{N}$. | ||
|
||
As traditional machine learning techniques usually do not emphasize the interdependence between features, | ||
they are often not _well suited_ for such tasks, which can lead to poor performance. | ||
Functional data analysis, on the other hand, tries to address this either by using algorithms specifically tailored to functional data or by transforming the functional covariates into a non-time-dependent feature space. | ||
For a more in-depth introduction to functional data analysis see e.g. [When the data are functions](http://rd.springer.com/article/10.1007/BF02293704) (Ramsay, J.O., 1982). | ||
|
||
Each observation of a functional covariate in the data consists of evaluations of a functional, i.e. measurements of a scalar value at various time points. | ||
A single observation might then look like this: | ||
```{r} | ||
# Plot the NIR curve for the first observation | ||
library(FDboost) | ||
data(fuelSubset) | ||
library(ggplot2) | ||
# NIR_Obs1 holds the NIR measurements of the first observation. | ||
# lambda holds the time points at which the data was measured. | ||
df = data.frame("NIR_Obs1" = fuelSubset$NIR[1, ], | ||
"lambda" = fuelSubset$nir.lambda) | ||
ggplot(df) + | ||
geom_line(aes(y = NIR_Obs1, x = lambda)) | ||
``` | ||
|
||
## How to model functional data | ||
|
||
There are two commonly used approaches for analysing functional data. | ||
|
||
* Directly analyze the functional data using a [learner](&Learner.md) that is suitable for functional data on a [Task](&makeTask). Those learners have the prefixes __classif.fda__ and __regr.fda__. | ||
For more info on learners see [fda learners](functional_data.Rmd#constructing-a-learner). | ||
For this purpose, the functional data has to be saved as a matrix column in the data.frame used | ||
for constructing the [Task](&makeTask). For more info on functional tasks see the following section. | ||
|
||
* Transform the task into a format suitable for standard __classification__ or __regression__ [learners](&Learner.md). | ||
This is done by [extracting](functional_data.Rmd#feature-extraction) non-temporal/non-functional features from the curves. Non-temporal features do not have any interdependence between each other, similar to features in traditional machine learning. This is explained in more detail [below](functional_data.Rmd#feature-extraction). | ||
|
||
|
||
## Creating a Task that contains functional features | ||
|
||
The first step is to get the data in the right format. [%mlr] expects a [data.frame](&base::data.frame) which consists of the functional features and the target variable as input. Functional data, in contrast to __numeric__ data, has to be stored as a matrix column in the data.frame. | ||
After that, a [Task](&makeTask) that contains the data in a well-defined format is created. [Tasks](&makeTask) come in different flavours, such as [ClassifTask](&makeClassifTask) and [RegrTask](&makeRegrTask), which are chosen according to the type of the target variable. | ||
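
As a minimal base R sketch of what a "matrix column" means here: the names `toy`, `flat`, `curve` and `y` are made up for illustration and are not part of mlr. The example contrasts a data.frame that holds a whole matrix in one column with one where the matrix has been split into plain numeric columns.

```r
# Toy example: 3 observations, each with a curve sampled at 4 points.
curves = matrix(seq_len(12), nrow = 3, ncol = 4)

# Start with the scalar column, then attach the matrix as ONE column.
toy = data.frame(y = c(0.1, 0.2, 0.3))
toy$curve = curves

ncol(toy)             # number of columns: 'y' and the matrix column 'curve'
is.matrix(toy$curve)  # the column itself is still a matrix
dim(toy$curve)        # one row per observation, one column per sample point

# By contrast, data.frame() splits a matrix into separate numeric columns
# named curve.1, curve.2, ...
flat = data.frame(y = c(0.1, 0.2, 0.3), curve = curves)
names(flat)
```

The second form, with one numeric column per measurement point, is the layout that a helper such as `makeFunctionalData` converts back into matrix columns.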
|
||
In the following example, the data is first stored as matrix columns using the | ||
helper function [makeFunctionalData](&makeFunctionalData) for the [fuelSubset](fuelSubset.task) | ||
data from package [%FDboost]. | ||
|
||
The data is provided in the following structure: | ||
|
||
```{r} | ||
str(fuelSubset) | ||
``` | ||
|
||
* __heatan__ is the target variable, in this case a numeric value. | ||
* __h2o__ is an additional scalar variable. | ||
* __NIR__ and __UVVIS__ are matrices containing the curve data. Each column corresponds to a single time point the data was sampled at. Each row indicates a single curve. __NIR__ was measured at $231$ time points, while __UVVIS__ was measured at $134$ time points. | ||
* __nir.lambda__ and __uvvis.lambda__ are vectors of length $231$ and $134$ that indicate the time points the data was measured at. Each entry corresponds to one column of __NIR__ and __UVVIS__ respectively. For now we ignore this additional information in mlr. | ||
|
||
Our data already contains functional features as matrices in a list. | ||
In order to showcase how such a matrix can | ||
be created from arbitrary numeric columns, we transform the list into a data.frame with a set of numeric columns for each matrix. These columns refer to the matrix columns in the list, i.e. | ||
__UVVIS.1__ is the first column of the UVVIS matrix. | ||
|
||
```{r} | ||
## Put all values into a data.frame | ||
df = data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")]) | ||
str(df[, 1:5]) | ||
``` | ||
|
||
Before constructing the [Task](&makeTask), the data is reformatted again so that it contains matrix columns. | ||
This is done by providing a list __fd.features__ that identifies the functional covariates. | ||
All columns not mentioned in the list are kept as-is. In our case the column indices 3:136 correspond to the columns of the UVVIS matrix and the indices 137:367 to those of the NIR matrix. Alternatively, we could also specify the respective column names. | ||
|
||
```{r} | ||
# fd.features is a named list, where each name corresponds to the name of the | ||
# functional feature and the values to the respective column indices or column names. | ||
fd.features = list("UVVIS" = 3:136, "NIR" = 137:367) | ||
fdf = makeFunctionalData(df, fd.features = fd.features) | ||
``` | ||
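
Hard-coded index ranges such as `3:136` are easy to get wrong. The following base R sketch shows one way to derive them from the column names instead. It uses a small made-up data.frame (`toy`) with the same `UVVIS.1`, `NIR.1`, ... naming scheme, and `fd.features.toy` is a hypothetical name chosen for this illustration.

```r
# Toy data.frame with the same naming scheme as the fuel data (made-up values).
toy = data.frame(heatan = c(1, 2), h2o = c(3, 4),
  UVVIS = matrix(0, nrow = 2, ncol = 3),
  NIR = matrix(0, nrow = 2, ncol = 4))
names(toy)

# Look up the column indices by name prefix instead of typing them by hand.
fd.features.toy = list(
  "UVVIS" = grep("^UVVIS\\.", names(toy)),
  "NIR" = grep("^NIR\\.", names(toy))
)
fd.features.toy
```

The resulting list has the same shape as `fd.features` above, so under that assumption it could be passed to `makeFunctionalData` in the same way.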
|
||
[makeFunctionalData](&makeFunctionalData) returns a data.frame, where the functional features are | ||
matrices. | ||
|
||
```{r} | ||
str(fdf) | ||
``` | ||
|
||
Now, with a data.frame containing the functionals as matrices, a [Task](&makeTask) can be created: | ||
|
||
```{r} | ||
# Create a regression task; classification tasks behave analogously. | ||
# The first argument is the task id, the target is given by its column name. | ||
tsk1 = makeRegrTask("fuelsubset", data = fdf, target = "heatan") | ||
tsk1 | ||
``` | ||
|
||
|
||
## Constructing a learner | ||
|
||
For functional data, [learners](&Learner.md) are constructed using | ||
`makeLearner("classif.<R_method_name>")` or | ||
`makeLearner("regr.<R_method_name>")` depending on the target variable. | ||
|
||
Learners can be applied to a [Task](&makeTask) in the following ways: | ||
|
||
* Use a [learner](&Learner.md) that supports functional features: | ||
|
||
+ For regression: | ||
|
||
```{r} | ||
# The following learners can be used for the task. | ||
listLearners(tsk1, properties = "functionals") | ||
# Create a FDboost learner | ||
fdalrn = makeLearner("regr.FDboost") | ||
``` | ||
|
||
+ Or alternatively for classification: | ||
|
||
```{r} | ||
# knn learner | ||
knn.lrn = makeLearner("classif.fdausc.knn") | ||
``` | ||
|
||
* Use a _standard_ [learner](&Learner.md): | ||
In this case the temporal structure is disregarded. | ||
|
||
```{r} | ||
## Decision Tree learner | ||
rpartlrn = makeLearner("classif.rpart") | ||
``` | ||
|
||
* Alternatively, transform the functional data into a non-temporal/non-functional space by [extracting](functional_data.Rmd#feature-extraction) features before training. | ||
In this case, a normal regression- or classification-[learner](&Learner.md) | ||
can be applied. | ||
|
||
This is explained in more detail in the [feature extraction](functional_data.Rmd#feature-extraction) | ||
section below. | ||
|
||
|
||
## Train the learner | ||
|
||
The resulting learner can now be trained on the task created in section [Creating a Task](functional_data.Rmd#creating-a-task) above. | ||
|
||
```{r} | ||
# Train the fdalrn on the constructed task | ||
m = train(learner = fdalrn, task = tsk1) | ||
m | ||
p = predict(m, tsk1) | ||
performance(p, rmse) | ||
|
||
# Or simply resample (3-fold Cross-Validation) | ||
resample(fdalrn, tsk1, resampling = cv3, measures = mse) | ||
``` | ||
|
||
Alternatively, learners that do not specifically treat functional covariates can | ||
be applied. In this case the temporal structure is completely disregarded, and all | ||
columns are treated as independent. | ||
|
||
```{r} | ||
# Train a normal learner on the constructed task. | ||
# Note that we get a message that functionals have been converted to numerics. | ||
rpart.lrn = makeLearner("regr.rpart") | ||
m = train(learner = rpart.lrn, task = tsk1) | ||
m | ||
``` | ||
|
||
## Feature Extraction | ||
|
||
Instead of applying a learner that works on a [Task](&makeTask) containing functional features, | ||
the [Task](&makeTask) can be converted to a normal [Task](&makeTask). | ||
This works by transforming the functional features into a | ||
non-functional domain, e.g. by extracting wavelet coefficients. | ||
|
||
The currently supported preprocessing functions are: | ||
* discrete wavelet transform | ||
note: unrelated to tutorial
We kinda have that.
I think it makes sense to keep
||
* fast fourier transform | ||
* functional principal component analysis | ||
* multi-resolution feature extraction | ||
|
||
In order to do this, we specify a method for each functional feature in the task in a __list__. | ||
In this case we extract the Fourier-transformed features from each __UVVIS__ functional and the FPCA scores from each __NIR__ functional. Additional arguments can be passed to the respective extraction function. | ||
|
||
```{r} | ||
# feat.methods specifies what to extract from which functional | ||
# from the first functional we extract the fourier transformation, from the second the fpca scores | ||
feat.methods = list("UVVIS" = extractFDAFourier(), "NIR" = extractFDAFPCA()) | ||
|
||
# Create a new task from an existing task | ||
extracted = extractFDAFeatures(tsk1, feat.methods = feat.methods) | ||
extracted | ||
``` | ||
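
To give some intuition for what FPCA scores are, here is a rough base R sketch that runs an ordinary principal component analysis on discretized toy curves. This is an illustration under assumptions, not necessarily what `extractFDAFPCA()` does internally, and all names (`grid`, `curves`, `scores`) are hypothetical.

```r
set.seed(1)
# 20 toy curves on a grid of 50 points: random mixtures of two smooth shapes.
grid = seq(0, 1, length.out = 50)
a = rnorm(20)
b = rnorm(20)
curves = outer(a, sin(2 * pi * grid)) + outer(b, cos(2 * pi * grid))

# PCA on the discretized curves; each row (curve) gets one score per component.
pca = prcomp(curves, center = TRUE)
scores = pca$x[, 1:2]
dim(scores)

# Share of variance captured by the first two components. It is close to 1
# here because the toy curves are built from only two shapes.
sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)
```

Each curve of 50 measurements is thereby summarized by two numbers, which is the kind of non-functional feature a standard learner can work with.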
|
||
|
||
### Wavelets | ||
|
||
In this case, the discrete wavelet transform is applied. | ||
The feature extraction method is specified by calling _extractFDAWavelets()_; additional parameters (e.g. the filter and the boundary) are passed as arguments. | ||
_extractFDAFeatures_ then returns a regular regression task, since the raw data contained temporal structure but the transformed data no longer does. | ||
For more information on wavelets see the documentation of [wavelets](dwt). | ||
|
||
```{r, eval = FALSE} | ||
## Specify the feature extraction method and generate new task. | ||
## Here, we use the Haar filter: | ||
feat.methods = list("UVVIS" = extractFDAWavelets(filter = "haar")) | ||
task.w = extractFDAFeatures(tsk1, feat.methods = feat.methods) | ||
|
||
# Use the Daubechies wavelet with filter length 4. | ||
feat.methods = list("NIR" = extractFDAWavelets(filter = "d4")) | ||
task.wd4 = extractFDAFeatures(tsk1, feat.methods = feat.methods) | ||
``` | ||
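
For readers unfamiliar with wavelets, the following base R sketch computes a single level of the Haar transform by hand. It is only meant to convey the idea of splitting a curve into a smooth part and a detail part. The helper `haar.step` is made up for this illustration, sign and ordering conventions differ between implementations, and the code is not taken from the wavelets package.

```r
# One level of the Haar transform for a single curve of even length.
haar.step = function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd = x[seq(1, length(x), by = 2)]
  even = x[seq(2, length(x), by = 2)]
  list(
    approx = (odd + even) / sqrt(2),  # smooth part: scaled pairwise sums
    detail = (even - odd) / sqrt(2)   # detail part: scaled pairwise differences
  )
}

x = c(4, 6, 10, 12, 8, 8, 2, 0)
res = haar.step(x)
res$approx
res$detail

# The transform is orthonormal, so the sum of squares is preserved.
all.equal(sum(x^2), sum(res$approx^2) + sum(res$detail^2))
```

Applying the same step repeatedly to the smooth part gives coefficients at coarser and coarser resolutions, which then serve as features.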
|
||
|
||
### Fourier transformation | ||
|
||
Now, we use the Fourier feature transformation. Either the amplitude or the phase of the complex Fourier coefficients can be used for analysis. This can be specified in the additional _trafo.coeff_ argument: | ||
|
||
```{r, eval = FALSE} | ||
# Specify the feature extraction method and generate new task. | ||
# We use the fourier features and the amplitude for NIR, as well as the phase for UVVIS | ||
feat.methods = list("NIR" = extractFDAFourier(trafo.coeff = "amplitude"), | ||
"UVVIS" = extractFDAFourier(trafo.coeff = "phase")) | ||
task.fourier = extractFDAFeatures(tsk1, feat.methods = feat.methods) | ||
task.fourier | ||
``` | ||
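
The difference between amplitude and phase can be seen on a single toy curve using base R's `fft()`. This is a sketch of the underlying idea only, with made-up variable names (`tt`, `x`, `coefs`). It does not claim to reproduce the exact features returned by `extractFDAFourier()`.

```r
# A toy curve: a sine wave with 3 full cycles over 64 sample points.
n = 64
tt = seq(0, 1, length.out = n + 1)[-(n + 1)]
x = sin(2 * pi * 3 * tt)

coefs = fft(x)          # complex Fourier coefficients
amplitude = Mod(coefs)  # magnitude of each coefficient
phase = Arg(coefs)      # angle of each coefficient (in radians)

# Position of the largest amplitude in the first half of the spectrum.
# Index 1 is the constant term, so a wave with 3 cycles shows up at index 4.
which.max(amplitude[1:(n / 2)])
```

The amplitude describes how strongly each frequency is present in the curve, while the phase describes where along the axis that oscillation is shifted.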
|
||
### Wrappers | ||
Additionally, we can wrap the preprocessing around a standard learner such as __classif.rpart__. | ||
For additional information, please see the __Wrappers__ section. | ||
|
||
```{r} | ||
# Wrap a learner with makeExtractFDAFeatsWrapper | ||
feat.methods = list("UVVIS" = extractFDAMultiResFeatures(), "NIR" = extractFDAFourier()) | ||
wrapped.lrn = makeExtractFDAFeatsWrapper("classif.rpart", feat.methods = feat.methods) | ||
wrapped.lrn | ||
``` | ||
|
How well known is functional data analysis? I would give a bit more of a primer on how, when, and why functional analysis is used. Like a paragraph or so
Also, for all of my comments keep in mind I am coming from the perspective of someone who knows little about FDA. I may not be your target audience
Agree that there should be just a paragraph of explanation, especially on the how and why of FDA
Done, see next commit.
"I may not be your target audience" -> You are exactly the target audience!