Scala yaml parsing #17

jakemannix · 2020-04-03T05:33:12Z

Scala makes simple data structs very concise, and it plays nicely with Jackson, including the nice:

@JsonIgnoreProperties(ignoreUnknown = true)

annotation!
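A minimal sketch of what that annotation buys you, assuming jackson-databind, jackson-dataformat-yaml and jackson-module-scala are on the classpath. The `Feature` class, its fields and the YAML keys are hypothetical and not taken from this PR; only the annotation itself is.

```scala
import com.fasterxml.jackson.annotation.{JsonIgnoreProperties, JsonProperty}
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical config entry (not a class from this PR).
// ignoreUnknown = true makes Jackson skip YAML keys that have no matching field.
@JsonIgnoreProperties(ignoreUnknown = true)
case class Feature(@JsonProperty("name") name: String,
                   @JsonProperty("dtype") dtype: String)

object YamlParsingSketch {
  // A YAML-backed ObjectMapper; DefaultScalaModule adds case class support.
  private val mapper = new ObjectMapper(new YAMLFactory()).registerModule(DefaultScalaModule)

  def parse(yaml: String): Feature = mapper.readValue(yaml, classOf[Feature])

  def main(args: Array[String]): Unit = {
    // "trainable" has no field in Feature, so it is ignored while parsing.
    val yaml =
      """name: query_text
        |dtype: string
        |trainable: false
        |""".stripMargin
    println(parse(yaml))
  }
}
```

Without the annotation, Jackson's default `FAIL_ON_UNKNOWN_PROPERTIES` setting would reject the extra `trainable` key with an exception, so every new key on the Python side would break the JVM parser.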

jvm/ml4ir-inference/src/main/scala/ml4ir/inference/tensorflow/data/FeaturePreprocessor.scala

jvm/ml4ir-inference/src/main/scala/ml4ir/inference/tensorflow/data/FeatureProcessors.scala

jvm/ml4ir-inference/src/main/scala/ml4ir/inference/tensorflow/data/FeaturesConfigHelper.scala

darshshah · 2020-04-06T21:16:32Z

jvm/ml4ir-inference/src/main/scala/ml4ir/inference/tensorflow/data/ModelFeaturesConfig.scala

+case class FeatureConfig(@JsonProperty("node_name") nodeName: String,
+                         @JsonProperty("dtype") dTypeString: String,
+                         @JsonProperty("serving_info") servingConfig: ServingConfig,
+                         @JsonProperty("default_value") defaultValue: String,


With this, we are NOT moving default_value under serving_info. Is this something we stick with for now? We can update it in v2 but this will have a cascading effect later on (i.e. changes to yaml and backward compatibility).

Ah, we can go either way. What did we say we wanted to do? I can change it to whichever way we want the config to be. I didn't want to break any current configs.

We don't need to worry about backwards compatibility until this ships. Even then, we can keep changing it until a major version of ml4ir ships, as long as we know models go out with new jars as needed.

Let's move this under serving_info to make it future-proof, since we may also have defaults at training time.

Also update the model_features.yaml to move the default_value under serving_info.
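For illustration, the agreed layout could look roughly like this on the Scala side. This is a sketch, not the code that was merged: the `name` field of `ServingConfig` and the sample YAML values are assumptions, and only `node_name`, `dtype`, `serving_info` and `default_value` come from the diff above. It assumes the same Jackson YAML and Scala modules are available.

```scala
import com.fasterxml.jackson.annotation.{JsonIgnoreProperties, JsonProperty}
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// default_value now lives inside serving_info.
// The "name" field is a hypothetical placeholder for whatever else serving_info holds.
@JsonIgnoreProperties(ignoreUnknown = true)
case class ServingConfig(@JsonProperty("name") servingName: String,
                         @JsonProperty("default_value") defaultValue: String)

// FeatureConfig no longer carries default_value at the top level.
@JsonIgnoreProperties(ignoreUnknown = true)
case class FeatureConfig(@JsonProperty("node_name") nodeName: String,
                         @JsonProperty("dtype") dTypeString: String,
                         @JsonProperty("serving_info") servingConfig: ServingConfig)

object NestedDefaultSketch {
  private val mapper = new ObjectMapper(new YAMLFactory()).registerModule(DefaultScalaModule)

  def parse(yaml: String): FeatureConfig = mapper.readValue(yaml, classOf[FeatureConfig])

  def main(args: Array[String]): Unit = {
    // Sample entry with made-up values; the default is quoted so it stays a string.
    val yaml =
      """node_name: query_text
        |dtype: string
        |serving_info:
        |  name: q
        |  default_value: "0.0"
        |""".stripMargin
    val feature = parse(yaml)
    println(feature.servingConfig.defaultValue)
  }
}
```

Keeping the default inside `ServingConfig` leaves room for a separate training-time default later without another breaking change to the YAML.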

darshshah

@jakemannix I have reviewed this code and left some comments. It looks to be in good shape. One thing to update is the ModelExecutorConfig, and also both poms (make it 0.0.2-SNAPSHOT so that I can publish 0.0.2 and send a PR for 0.0.3-SNAPSHOT). Otherwise, you can also merge #11 into your branch.

darshshah · 2020-04-07T20:00:35Z

jvm/ml4ir-inference/pom.xml

@@ -10,6 +10,10 @@
  <artifactId>ml4ir-inference</artifactId>
  <packaging>jar</packaging>

+  <properties>
+    <jackson.version>2.7.5</jackson.version>


Update this to 2.10.0

We can reuse the Jackson version from the parent pom instead of declaring it here again. But we have to use 2.10.0, since the version here (2.7.5) is not compatible with Spring.

darshshah · 2020-04-07T20:03:02Z

jvm/pom.xml

@@ -13,6 +13,7 @@
    <scala.version>2.11.8</scala.version>
    <scala.artifact.suffix>2.11</scala.artifact.suffix>
    <tensorflow.version>1.15.0</tensorflow.version>
+    <jackson.version>2.7.5</jackson.version>


2.10.0 here as well
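A sketch of the suggested setup, assuming ml4ir-inference declares jvm/pom.xml as its `<parent>`. Maven properties are inherited by child modules, so the version only needs to be stated once. The `jackson-databind` dependency is an assumed example; the module's actual Jackson artifacts are not shown in this thread.

```xml
<!-- jvm/pom.xml (parent): declare the version once -->
<properties>
  <jackson.version>2.10.0</jackson.version>
</properties>

<!-- jvm/ml4ir-inference/pom.xml (child): no <jackson.version> of its own;
     the property is inherited from the parent pom.
     The artifact below is an assumed example. -->
<dependencies>
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version>
  </dependency>
</dependencies>
```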

jakemannix

@darshshah this now hopefully has all of your suggestions, and is rebased against master. Could you try building it and see how it works for you?

jvm/ml4ir-inference/src/main/scala/ml4ir/inference/tensorflow/data/SequenceExampleBuilder.scala

jvm/ml4ir-inference/src/main/scala/ml4ir/inference/tensorflow/data/TFRecordIO.scala

darshshah · 2020-04-10T18:59:27Z

Some of the code in GenericSequenceExampleBuildingTest.scala is not used in any tests (e.g. serializeTestData/sampleSequenceExamples). Ideally we should remove such code, but I think we are planning to create a test case using it in the next iteration, so we can keep it for now.

darshshah

This PR has code/changes from all the currently open JVM PRs. Approving this. Great job @jakemannix!

jakemannix · 2020-04-10T19:05:06Z

I can remove it and put it on another branch. It's cleaner that way.

darshshah · 2020-04-10T19:13:41Z

Sounds good. Let's do that so that we have a clean 0.0.2

darshshah and others added 30 commits March 11, 2020 14:07

Freezing 0.0.1 and bumping the current version to 0.0.2-SNAPSHOT

afde5ad

wip: refactor proto building

e2b50ca

failing unit test updated: still fails, but now truly encodes what we *want* the proto builder to be doing.

e6f421c

updated WIP

f21c4ec

update SavedModelBundle to use the FeatureConfig

bf0a05c

[WIP] adding yaml parsing and FeatureConfig builder

e89cea9

Updating ModelFeatures and adding a test to parse the yaml file

dd74647

wip

e9076fc

Merge branch 'sequence_example_building_update' into sequence_example_building_update_wip

0cbe890

unit tests passing again. start on an interface for feature extraction

d5faefc

remove doubly-added files

f7b01cd

Merge branch 'master' into sequence_example_building_no_yaml

128ec9b

fix broken test compile

8058303

tests passing, but need to wire together the FeaturePreprocessor and get it a test

3203de5

forgot to add this

b54fd0f

wip (failing tests currently!)

12cd393

fmt

6a6273b

extract to public outer classes

931f5dc

minor cleanup

313f24f

scala-based yaml parsing POC

a355a85

less aligning of scala code

ba599b5

reorg!

3e5e62e

reminder why it's failing

50dcc07

Adding functionality to load weights from SavedModel and resave with new signature

8db8bc5

Adding separate serving signature for inference which omits fields with required=False

763d91e

Checkpointing resaving trained model works

e61457e

Updating parse_args help

a9eda44

Adding dynamic num_records for inference, i.e., no padding

9464d84

Checkpoint save and reload of SavedModel with padding removed from inference signature

5339bc8

Fixing tests, adding pad_records_at_inference arg

e53ab6a

salesforce-cla bot removed the cla:missing label Apr 4, 2020

darshshah reviewed Apr 6, 2020

darshshah reviewed Apr 6, 2020

darshshah reviewed Apr 6, 2020

darshshah reviewed Apr 6, 2020

lastmansleeping and others added 4 commits April 6, 2020 15:41

Adding support for queries with 1 record

1d43034

Need a newly trained model utilizing the latest and greatest from the python side, then these unit tests should pass.

b2638f3

Merge branch 'version-update' into scala_yaml_parsing

80b9e89

Merge branch 'ashish/serving_signature' into integration_0.0.2

6fac6a0

darshshah reviewed Apr 7, 2020

Jake added 4 commits April 9, 2020 11:57

wip

690894b

working unit tests!

2730617

saved model bundle now checking scores sensibly

0d6cfb6

Merge branch 'master' into scala_yaml_parsing

0ed67ea

salesforce-cla bot added the cla:signed label Apr 10, 2020

jakemannix changed the base branch from model_features_parsing_cleanup to master April 10, 2020 02:40

fix build break by adding experimental tfrecord writing code

9156571

jakemannix commented Apr 10, 2020

darshshah reviewed Apr 10, 2020

darshshah reviewed Apr 10, 2020

addressing PR comments

2706dbe

darshshah approved these changes Apr 10, 2020

cleanup

14692b5

jakemannix merged commit bdf1355 into master Apr 10, 2020

lastmansleeping deleted the scala_yaml_parsing branch February 15, 2022 20:01
