Add functional data section #101
can not build without merged branch in mlr.
feature extraction, as it is not in the PR yet.
@@ -0,0 +1,177 @@ | |||
# Functional Data | |||
|
|||
Functional data provides information about curves varying over a continuum, such as time. |
How well known is functional data analysis? I would give a bit more of a primer on how, when, and why functional analysis is used. Like a paragraph or so
Also, for all of my comments keep in mind I am coming from the perspective of someone who knows little about FDA. I may not be your target audience
Agree that there should be just a paragraph of explanation, especially on the how and why of FDA
Done, see next commit.
"I may not be your target audience" -> You are exactly the target audience!
src/functional_data.Rmd
Outdated
Functional data provides information about curves varying over a continuum, such as time. | ||
There are two commonly used approaches for analysing functional data. | ||
|
||
* Directly analyse the functional data using a [learner](&Learner.md) that is suitable for functional data on a [FDATask](&makeFDATask). Those learners have the prefixes __FDAClassif.__ and __FDARegr.__. |
analyse should be analyze? I think?
src/functional_data.Rmd
Outdated
* Directly analyse the functional data using a [learner](&Learner.md) that is suitable for functional data on a [FDATask](&makeFDATask). Those learners have the prefixes __FDAClassif.__ and __FDARegr.__. | ||
|
||
* Transform the task into a format suitable for standard __classification__ or __regression__ [learners](&Learner.md). | ||
This is done by [extracting](functional_data.Rmd#feature-extraction) non-temporal/non-functional features from the curves. |
I would define what you mean by temporal and functional features
Done
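To make the second approach in the quoted hunk concrete: extracting non-functional features means reducing each curve to a handful of scalar columns that a standard learner can consume. The following is only a base-R sketch on synthetic curves. The helper `extractSummary` and the chosen summaries (mean, min, max, least-squares slope) are hypothetical illustrations, not the feature-extraction API proposed for mlr.

```r
## Synthetic curves: 6 observations measured on a common grid of 20 points
set.seed(42)
grid = seq(0, 1, length.out = 20)
curves = t(sapply(1:6, function(i) i * sin(2 * pi * grid) + rnorm(length(grid), sd = 0.1)))

## Hypothetical helper: collapse one curve into scalar summary features
extractSummary = function(y, grid) {
  c(mean = mean(y), min = min(y), max = max(y),
    slope = cov(y, grid) / var(grid))  # least-squares slope of y on grid
}

## One row per observation, one column per extracted feature
feats = as.data.frame(t(apply(curves, 1, extractSummary, grid = grid)))
```

The resulting `feats` no longer has a functional structure, so it could be combined with the target column and handed to an ordinary classification or regression learner.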
src/functional_data.Rmd
Outdated
## Creating a FDATask | ||
|
||
The first step is to get the data in the right format. [%mlr] expects a [data.frame](&base::data.frame) which consists of the functional features and the target variable as input. The rows are considered independent, while column-ranges, such as $3:136$ in the example below correspond to a functional feature. | ||
After that a [FDATask](&makeFDATask) that contains the data in a well-defined format is created. [FDATasks](&makeFDATask), just like normal [Tasks](&Task.md) come in different flavours, such as [FDAClassifTask](&makeFDAClassifTask) and [FDARegrTask](&makeFDARegrTask), which can be used according to the class of the target variable. |
`that` is ambiguous. I would explain what the well-defined format is that you use. Also try to use present tense:
After coalescing the features to a `data.frame`, the data object needs to pass into an FDATask, where the data is parsed into ____ format. This structure of data is used inside an FDATask because ________
src/functional_data.Rmd
Outdated
## Creating a FDATask | ||
|
||
The first step is to get the data in the right format. [%mlr] expects a [data.frame](&base::data.frame) which consists of the functional features and the target variable as input. The rows are considered independent, while column-ranges, such as $3:136$ in the example below correspond to a functional feature. | ||
After that a [FDATask](&makeFDATask) that contains the data in a well-defined format is created. [FDATasks](&makeFDATask), just like normal [Tasks](&Task.md) come in different flavours, such as [FDAClassifTask](&makeFDAClassifTask) and [FDARegrTask](&makeFDARegrTask), which can be used according to the class of the target variable. |
After that a FDATask that contains the data in a well-defined format is created. FDATasks, just like normal Tasks come in different flavours, such as FDAClassifTask and FDARegrTask, which can be used according to the class of the target variable.
Break into multiple sentences
src/functional_data.Rmd
Outdated
The first step is to get the data in the right format. [%mlr] expects a [data.frame](&base::data.frame) which consists of the functional features and the target variable as input. The rows are considered independent, while column-ranges, such as $3:136$ in the example below correspond to a functional feature. | ||
After that a [FDATask](&makeFDATask) that contains the data in a well-defined format is created. [FDATasks](&makeFDATask), just like normal [Tasks](&Task.md) come in different flavours, such as [FDAClassifTask](&makeFDAClassifTask) and [FDARegrTask](&makeFDARegrTask), which can be used according to the class of the target variable. | ||
|
||
In the following example, this is done for the [fuelSubset](fuelSubset.task) data from package [%FDboost]. |
`this` is ambiguous
src/functional_data.Rmd
Outdated
```{r} | ||
## Put all values into a data.frame | ||
df = data.frame(fuelSubset[c("heatan", "h2o", "UVVIS", "NIR")]) | ||
## Change row names to V1 to V367 |
Are you not changing the columns? Also is this necessary? Kind of odd
Ah, now I get it...
Xudong suspected a bug when not renaming, back when I wrote the tutorial.
This is now fixed.
OLD:
The original structure is a list with one element per feature, such as:
* h2o, which is a vector of length NObs
* NIR and UVVIS, which are matrices of size NObs x P1 and NObs x P2
* NIR.lambda and UVVIS.lambda, which are vectors of length P1 and P2
-> What I actually do is cbind those vectors and matrices, taken from a list of objects that do not have equal dimensions.
If you have a better idea feel free to tell me :)
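For readers following the thread, the binding step described above can be sketched in base R. The list `fuel.like` below is a synthetic stand-in with made-up sizes and values, not the real `fuelSubset` data, and the names `widths` and `col.ranges` are hypothetical helpers. The sketch also shows how the column range of each functional feature could be derived rather than hard-coded.

```r
## Synthetic stand-in for a list like fuelSubset (hypothetical sizes and values)
set.seed(1)
n.obs = 5
fuel.like = list(
  heatan = rnorm(n.obs),                            # scalar target, length NObs
  h2o = rnorm(n.obs),                               # scalar feature, length NObs
  UVVIS = matrix(rnorm(n.obs * 4), nrow = n.obs),   # NObs x P1
  NIR = matrix(rnorm(n.obs * 3), nrow = n.obs),     # NObs x P2
  UVVIS.lambda = 1:4,                               # grid of length P1, not per observation
  NIR.lambda = 1:3                                  # grid of length P2, not per observation
)

## Keep only the per-observation elements; data.frame() binds them column-wise
parts = fuel.like[c("heatan", "h2o", "UVVIS", "NIR")]
df = data.frame(parts)

## Number of columns each element contributes, and the resulting column ranges
widths = vapply(parts, function(x) if (is.matrix(x)) ncol(x) else 1L, integer(1))
ends = cumsum(widths)
starts = ends - widths + 1L
col.ranges = Map(seq, starts, ends)
```

Here `col.ranges$UVVIS` and `col.ranges$NIR` give the column indices of the two functional features in `df`, which is the kind of range the tutorial text refers to.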
src/functional_data.Rmd
Outdated
`makeLearner("<fdaclassif.<R_method_name>")` or | ||
`makeLearner("<fdaregr.<R_method_name>")` depending on the target variable. | ||
|
||
Applying learners to a [FDATask](&makeFDATask) works in two ways: |
List both ways here, then explain each
Done, I think
non-functional domain, e.g by extracting wavelets. | ||
|
||
The currently supported preprocessing functions are: | ||
* discrete wavelet transform |
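As a rough illustration of what a discrete wavelet transform does to a curve, here is a single-level Haar transform written in base R. This is an assumption-laden sketch: `haarStep` is a hypothetical helper, the curves are toy data, and nothing here claims to match the wavelet implementation the tutorial relies on.

```r
## Hypothetical helper: one level of the Haar wavelet transform.
## Pairs of neighbouring points are turned into a scaled sum (approximation)
## and a scaled difference (detail).
haarStep = function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd = x[seq(1, length(x), by = 2)]
  even = x[seq(2, length(x), by = 2)]
  list(approx = (odd + even) / sqrt(2), detail = (odd - even) / sqrt(2))
}

## Two toy curves of length 4, one per row
curves = rbind(c(1, 1, 2, 2), c(4, 2, 0, 2))

## Coefficients become ordinary, non-functional feature columns
feats = t(apply(curves, 1, function(x) unlist(haarStep(x))))
```

The first toy curve is piecewise constant on each pair, so its detail coefficients are zero. The transform is orthonormal, which means each row of `feats` has the same sum of squares as the corresponding curve.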
note: unrelated to tutorial
It would be really nice if these preprocessing methods could be used in forecasting as well. Maybe we could construct a sub preprocessing method like `createWaveletFeatures()` that can be used on `TimeTasks`?
We kinda have that.
My proposal for the API is contained in the fda_pull1_task_featExtract branch.
I am not entirely sure this is how it is going to be, but we can build upon that.
I think it makes sense to keep `The currently supported preprocessing functions are:` as we can extend this list along with an example whenever a new method is added.
Or do we want a `getFDAFeatureExtractors()` or `getFDAFeaturePreprocessingMethods()`?
Update:
Did you guys rebase this yet to be up to date with gh-pages? Travis works on gh-pages now.
I do not get why Travis fails here and yet works on gh-pages...
Not sure, I did a rebase on mine and was able to get it to work.
Continued in #125
This is a first draft for the tutorial on functional data.
The section depends on several pull requests that are not merged into mlr yet.
The text and structure can be reviewed, though.