Add functional data section #101

Closed
wants to merge 23 commits into from

Collaborator

@pfistfl pfistfl commented Mar 14, 2017

This is a first draft of the tutorial on functional data.
The section depends on several pull requests that are not yet merged into mlr.

The text and structure can be reviewed, though.

@@ -0,0 +1,177 @@
# Functional Data

Functional data provides information about curves varying over a continuum, such as time.
Collaborator

How well known is functional data analysis? I would give a bit more of a primer on how, when, and why functional analysis is used. Like a paragraph or so

Collaborator

Also, for all of my comments keep in mind I am coming from the perspective of someone who knows little about FDA. I may not be your target audience

Collaborator

Agree that there should be just a paragraph of explanation, especially on the how and why of FDA

Collaborator Author

@pfistfl pfistfl Mar 20, 2017

Done, see next commit.

> I may not be your target audience

You are exactly the target audience!

There are two commonly used approaches for analysing functional data.

* Directly analyse the functional data using a [learner](&Learner.md) that is suitable for functional data on a [FDATask](&makeFDATask). Those learners have the prefixes __FDAClassif.__ and __FDARegr.__.
Collaborator

analyse should be analyze? I think?

* Transform the task into a format suitable for standard __classification__ or __regression__ [learners](&Learner.md).
This is done by [extracting](functional_data.Rmd#feature-extraction) non-temporal/non-functional features from the curves.
Collaborator

I would define what you mean by temporal and functional features

Collaborator Author

Done

## Creating a FDATask

The first step is to get the data in the right format. [%mlr] expects a [data.frame](&base::data.frame) which consists of the functional features and the target variable as input. The rows are considered independent, while column-ranges, such as $3:136$ in the example below, correspond to a functional feature.
After that a [FDATask](&makeFDATask) that contains the data in a well-defined format is created. [FDATasks](&makeFDATask), just like normal [Tasks](&Task.md) come in different flavours, such as [FDAClassifTask](&makeFDAClassifTask) and [FDARegrTask](&makeFDARegrTask), which can be used according to the class of the target variable.
Collaborator

that is ambiguous. I would explain what is the well-defined format that you use. Also try to use present tense

After coalescing the features to a data.frame, the data object needs to pass into an FDATask, where the data is parsed into ____ format. This structure of data is used inside an FDATask because ________

Collaborator

> After that a FDATask that contains the data in a well-defined format is created. FDATasks, just like normal Tasks come in different flavours, such as FDAClassifTask and FDARegrTask, which can be used according to the class of the target variable.

Break into multiple sentences

In the following example, this is done for the [fuelSubset](fuelSubset.task) data from package [%FDboost].
Collaborator

this is ambiguous

```{r}
## Put all values into a data.frame
df = data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")])
## Change row names to V1 to V367
```
Collaborator

Are you not changing the columns? Also is this necessary? Kind of odd

Collaborator Author

@pfistfl pfistfl Mar 20, 2017

Ah, now I get it...
Xudong suspected a bug when not renaming, at the time I wrote the tutorial.
This is now fixed.

OLD:
The original structure is a list.
It has one list element for each feature, such as
* h2o, which is a vector of length NObs
* NIR and UVVIS, which are matrices of size NObs x P1 and NObs x P2
* NIR.lambda and UVVIS.lambda, which are vectors of length P1 and P2

What I do is cbind those vectors and matrices, which come from a list of objects that do not have equal dimensions.
If you have a better idea, feel free to tell me :)
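
For readers following along, the column-binding described here can be sketched on a toy list. This is only an illustration under assumptions: `toy`, `n_obs`, `p1`, `p2` and `fd_cols` are made-up names, the values are random numbers rather than the real fuelSubset data, and no mlr function is involved.

```r
## Toy stand-in for a list shaped like fuelSubset (random values, not the real data).
set.seed(1)
n_obs = 5  # number of observations (rows)
p1 = 4     # grid length of the first functional feature
p2 = 3     # grid length of the second functional feature
toy = list(
  heatan = rnorm(n_obs),                          # target, vector of length n_obs
  h2o    = rnorm(n_obs),                          # scalar covariate, length n_obs
  UVVIS  = matrix(rnorm(n_obs * p1), n_obs, p1),  # curves, n_obs x p1
  NIR    = matrix(rnorm(n_obs * p2), n_obs, p2)   # curves, n_obs x p2
)

## data.frame() column-binds vectors and matrices that share the number of rows.
df = data.frame(toy[c("heatan", "h2o", "UVVIS", "NIR")])

## Remember which column ranges belong to which functional feature.
fd_cols = list(UVVIS = 3:(2 + p1), NIR = (3 + p1):(2 + p1 + p2))

dim(df)
names(df)[fd_cols$UVVIS]
```

The elements only need the same number of rows; the grid vectors (such as NIR.lambda) have a different length and therefore cannot go into the same data.frame.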

`makeLearner("fdaclassif.<R_method_name>")` or
`makeLearner("fdaregr.<R_method_name>")` depending on the target variable.

Applying learners to a [FDATask](&makeFDATask) works in two ways:
Collaborator

List both ways here, then explain each

Collaborator Author

Done, I think

non-functional domain, e.g. by extracting wavelets.

The currently supported preprocessing functions are:
* discrete wavelet transform
Collaborator

note: unrelated to tutorial
It would be really nice if these preprocessing methods could be used in forecasting as well. Maybe we could construct a sub preprocessing method like createWaveletFeatures() that can be used on TimeTasks?

Collaborator Author

We kinda have that.
My proposal for the API is contained in the fda_pull1_task_featExtract branch.
I am not entirely sure this is how it is going to be, but we can build on that.

Collaborator Author

@pfistfl pfistfl Mar 20, 2017

I think it makes sense to keep
"The currently supported preprocessing functions are:"
as we can extend this list along with an example whenever a new method is added.
Or do we want a getFDAFeatureExtractors() or getFDAFeaturePreprocessingMethods()?
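
To make the wavelet item concrete for readers new to FDA, here is a minimal base-R sketch of a Haar discrete wavelet transform applied row-wise to a matrix of curves. It is an assumption-laden illustration, not the code in the branch: `haarStep` and `haarDwt` are hypothetical helper names, only the Haar wavelet is covered, and curve lengths must be a power of two.

```r
## One Haar step: pairwise scaled sums (approximation) and differences (detail).
## haarStep and haarDwt are hypothetical helpers, not mlr functions.
haarStep = function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd = x[seq(1, length(x), by = 2)]
  even = x[seq(2, length(x), by = 2)]
  list(approx = (odd + even) / sqrt(2), detail = (odd - even) / sqrt(2))
}

## Full decomposition: keep transforming the approximation part until one value is left.
haarDwt = function(x) {
  n = length(x)
  stopifnot(n >= 2, log2(n) %% 1 == 0)  # length must be a power of two
  details = numeric(0)
  approx = x
  while (length(approx) > 1) {
    step = haarStep(approx)
    details = c(step$detail, details)  # coarser details go in front
    approx = step$approx
  }
  c(approx, details)
}

## Each row is one curve observed on 8 grid points (random toy data).
set.seed(1)
curves = matrix(rnorm(24), nrow = 3)

## One row of wavelet coefficients per curve; these can serve as ordinary features.
features = t(apply(curves, 1, haarDwt))
dim(features)
```

The resulting columns no longer depend on the ordering of the grid, so a standard classification or regression learner can be trained on them.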

Collaborator Author

pfistfl commented Jul 24, 2017

Update:
The branch now contains all relevant features from the current master branch.
Maybe someone could review?

@SteveBronder
Collaborator

Did you guys rebase this yet to be up to date with gh-pages? travis works on gh-pages now

Collaborator Author

pfistfl commented Nov 13, 2017

I do not get why travis fails here and yet works on gh-pages....

...
Knitting file 'nested_resampling.Rmd' ...
Quitting from lines 312-338 () 
Error in requirePackages(package, why = stri_paste("learner", id, sep = " "),  : 
  For learner classif.kknn please install the following packages: kknn
Calls: lapply ... addClasses -> makeRLearnerInternal -> requirePackages -> stopf
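
A quick way to check locally for this kind of failure is to test whether the learner packages are installed before knitting. The sketch below is an assumption, not part of the build scripts: `missingPackages` is a hypothetical helper, and `kknn` is the only package name taken from the log above.

```r
## Return the subset of package names that cannot be loaded.
## missingPackages is a hypothetical helper, not part of mlr or the tutorial build.
missingPackages = function(pkgs) {
  ok = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

## kknn is the package named in the error message.
notInstalled = missingPackages(c("kknn"))
if (length(notInstalled) > 0) {
  message("Install before knitting: ", paste(notInstalled, collapse = ", "))
}
```

This only reports what is missing; whether the Travis configuration installs the packages is a separate question.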

@SteveBronder
Collaborator

Not sure, I did a rebase on mine and was able to get it to work

Collaborator Author

pfistfl commented Nov 14, 2017

Continued in #125

@pfistfl pfistfl closed this Nov 14, 2017
5 participants