This repository has been archived by the owner on Jul 15, 2022. It is now read-only.
Thanks for the write-up. In general, I don't mind adding a new feature
namespace if you think it makes more sense in terms of the API and ease of
use. Some things to consider:
- I was also thinking about stacking word features with different
pretrained weights, although I think stacking spacy features would be more
useful and thus of higher priority.
- For deciding between option 1 and option 2, please consider the impact on
HPO definitions and the readability of logging in wandb/mlflow. I am not
sure which one is better. I remember another option we discussed, a dict
with the spacy features as keys; have you considered that one too?
- For custom features, we need further discussion. I fail to see how this
automatic feature creation would work and how the head config would look.
Also, even if this is done under the hood, is it conceptually a
spacy feature? In general I would prefer more explicit configuration of
features, not ones added automatically under the hood. Conceptually,
entities is a feature you will be passing and will want to configure like
the other features. It might also be useful outside the relation
classifier, for example for a text classifier that can leverage entities or
other custom features; I am unsure about the design if this responsibility
goes to the head.
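For reference, the dict-keyed variant mentioned above could look roughly like this (a sketch only; the exact key names are assumptions, not a settled API):

```python
# Sketch of the third option discussed: spacy attributes as dict keys.
# The "spacy" key and the per-attribute sub-dicts are hypothetical.
config = {
    "features": {
        "word": {"embedding_dim": 300, "weights_file": "fasttext"},
        "spacy": {
            "pos": {"embedding_dim": 16},
            "dep": {"embedding_dim": 32},
        },
    }
}

# One advantage of this shape: each attribute can appear only once,
# and lookup is direct.
pos_dim = config["features"]["spacy"]["pos"]["embedding_dim"]
```

This shape might also log more readably in wandb/mlflow, since each attribute becomes a distinct config key rather than an index into a list.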
On Mon, Jan 4, 2021, 16:34, David Fidalgo <notifications@github.com>
wrote:
This is a follow-up idea to the discussion we had before Christmas
regarding the spacy token features.
I think in the end we decided to go with the following scheme:
"features": {
    "word": [
        {"embedding_dim": 300, "weights_file": "fasttext"},  # default is the "text" feature
        {"embedding_dim": 16, "feature": "pos"},
        {"embedding_dim": 32, "feature": "dep"},
        {"embedding_dim": 8, "feature": "shape"},  # the token's orthographic string features, like Xxxx
    ]  # note that "word" is no longer the vocab namespace; each feature will have its own namespace
}
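The "each feature gets its own namespace" note above could be sketched like this (a hypothetical helper; the namespace naming scheme is an assumption):

```python
def feature_namespaces(word_features):
    """Derive one vocab namespace per stacked word feature.

    Hypothetical sketch: entries without a "feature" key default to the
    raw "text" feature; every entry then gets its own namespace instead
    of all sharing the former "word" namespace.
    """
    return [entry.get("feature", "text") for entry in word_features]


word_features = [
    {"embedding_dim": 300, "weights_file": "fasttext"},
    {"embedding_dim": 16, "feature": "pos"},
    {"embedding_dim": 32, "feature": "dep"},
    {"embedding_dim": 8, "feature": "shape"},
]
```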
There are two things that still feel a bit itchy to me:
- the WordFeatures options weights_file, trainable and, to some extent,
lowercase_tokens only really make sense for the text feature_name
- when providing pre-tokenized text, all the spacy pipeline features
(pos, dep, ...) are not available.
I thought that maybe we should be more explicit about where the features
come from, that is, create a new *spacy* feature:
"features": {
    "word": {"embedding_dim": 300, "weights_file": "fasttext"},
    "spacy": {"attributes": ["pos", "dep"], "embedding_dims": [16, 32]}  # I chose "attributes" in order not to repeat "features" inside the "features" key ...
}
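One caveat of this first shape is that the two parallel lists must stay aligned; a config loader would likely have to validate that (a sketch with a hypothetical function name):

```python
def parse_spacy_feature(spacy_config):
    """Pair up the parallel "attributes"/"embedding_dims" lists.

    Hypothetical validation sketch for option 1: the two lists must
    have the same length, otherwise the config is ambiguous.
    """
    attributes = spacy_config["attributes"]
    dims = spacy_config["embedding_dims"]
    if len(attributes) != len(dims):
        raise ValueError(
            f"Got {len(attributes)} attributes but {len(dims)} embedding dims"
        )
    return dict(zip(attributes, dims))
```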
or
"features": {
    "word": {"embedding_dim": 300, "weights_file": "fasttext"},
    "spacy": [
        {"attribute": "pos", "embedding_dim": 16},
        {"attribute": "dep", "embedding_dim": 32},
    ],
}
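Both options carry the same information and can be normalized to the same internal mapping, so the choice is mostly about readability and HPO/logging ergonomics (a sketch; the normalizer is hypothetical):

```python
def normalize_spacy_feature(spacy_config):
    """Normalize both proposed shapes to {attribute: embedding_dim}.

    Hypothetical sketch: option 1 is a dict of parallel lists,
    option 2 is a list of one-attribute dicts.
    """
    if isinstance(spacy_config, dict):  # option 1
        return dict(zip(spacy_config["attributes"], spacy_config["embedding_dims"]))
    # option 2
    return {entry["attribute"]: entry["embedding_dim"] for entry in spacy_config}


option_1 = {"attributes": ["pos", "dep"], "embedding_dims": [16, 32]}
option_2 = [
    {"attribute": "pos", "embedding_dim": 16},
    {"attribute": "dep", "embedding_dim": 32},
]
```

Option 2's list-of-dicts shape only buys something if the per-attribute entries grow more options (e.g. stacking), which is exactly the question raised below.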
I would only choose the latter configuration type if we also want to
support stacking the other features, that is, several transformers or word
features (with different weights_file). @dvsrepo I mention this since I do
not know if, for you, the main motivation for stacking features was
actually stacking spacy features, was it?
The spacy feature could also accept a custom attribute, but I think in the
use case of the RelationClassification head we should actually add this
feature automatically to the config. Meaning, the PipelineConfiguration
class, for example, automatically adds a
"spacy": {"attributes": ["bilou_from_relation_entity"], "embedding_dims": [32]}  # the embedding_dim is taken from the `RelationClassification` head config
feature if a RelationClassification head is detected. I think this is
less error-prone; if a head absolutely requires a certain feature, it
should be added automatically.
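The proposed injection could be sketched roughly as follows (all names here, the head "type" key, "bilou_from_relation_entity", and the dim fallback of 32, come from the discussion above and are assumptions, not a settled API):

```python
def add_required_features(config):
    """Sketch of the proposed automatic feature injection.

    Hypothetical: if a RelationClassification head is detected, inject
    the custom spacy attribute the head depends on, taking the
    embedding dim from the head's own config (key name assumed).
    """
    head = config.get("head", {})
    if head.get("type") == "RelationClassification":
        config.setdefault("features", {})["spacy"] = {
            "attributes": ["bilou_from_relation_entity"],
            "embedding_dims": [head.get("entities_embedding_dim", 32)],
        }
    return config
```

The counterpoint raised in the reply above still applies: doing this under the hood makes the feature invisible in the user's explicit config.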
What do you think, @dvsrepo @frascuchon?