-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for HOCON config. Fixes #29. #31
Conversation
I can help review it once the tests are fixed, thank you! |
@zhangxuhong Frustratingly, the tests pass locally and fail on Travis. I've got Travis running Docker with failing tests, so hopefully I'll be able to figure out what's going on. |
29a57a5
to
abd5f4b
Compare
* @param inputFeatureInfoConfig | ||
* @return | ||
*/ | ||
private def parseInputFeatureInfo(inputFeatureInfoConfig: Config): InputFeatureInfo = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide some comment here why we need a separate parser for InputFeatureInfo
. is it mainly because we want to convert some fields to Option
type? Then how about the shape
field in OutputTensorInfo
? do we need a special handling here? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because the transform config has type Map[String, Map[String, Any]]
--none of the Scala libraries for typesafe config seem to support decoding to Map[String, Any]
. I think the right solution is to have HashInfo
and (a new)TokenizationInfo
extend the same sealed trait, call it TransformType
. Then the transform config becomes Map[String, Map[String, TransformType]]
and you should be able to decode, complete with nice exceptions.
I'm happy to make this change, but it involves touching other parts of the codebase beyond just the config, so I wanted to talk about it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, looking at this some more, the transform config should probably be a case class with two optional fields hashInfo
and tokenization
. That gets rid of the sealed trait and enforces the field names in the config as it's parsed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! I left a minor comment.
Changes:
Because the transform config has type
Map[String, Map[String, Any]]
, the decoding ofFeature
andInputFeatureInfo
has to be done fairly manually. I was unable to find a Scala library that supported decoding typesafe Config objects to aMap[String, Any]
. The downside to this is that you lose a nice error message in the case of a malformedInputFeatureInfo
. (OutputTensorInfo
will still give a nice message.) I can think of three solutions to this:Any
could be anEither[HashInfo, TokenizationInfo]]
(it doesn't look likeTokenizationInfo
exists today). This obviously doesn't extend nicely if there are other transformations available in the future.HashInfo
and (a new)TokenizationInfo
extend the same sealed trait. I think this is the best option, but it involves changing the code where the transformers were used, so I didn't want to go ahead and do it without discussion.