Language detection #77

Open
adambabik opened this issue Dec 9, 2022 · 1 comment

@adambabik
Collaborator

In the VS Code extension, we have a way to detect a language using https://github.com/microsoft/vscode-languagedetection. It embeds a model that comes from https://github.com/yoeo/guesslang and executes it with https://github.com/tensorflow/tfjs, which supports various backends such as CPU or WebGL.
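For reference, the underlying guesslang model can also be exercised directly from Python via the guesslang package; a minimal sketch, following its README (requires pip install guesslang):

from guesslang import Guess

guess = Guess()

# Name of the most probable language for a source snippet.
name = guess.language_name("""
def hello():
    print('Hello, world!')
""")
print(name)  # e.g. 'Python'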

For runme, which is built and distributed as a statically linked binary, the best option would be TensorFlow Lite, which is fairly small (<6MB). Unfortunately, the original guesslang model might not be easily convertible to TF Lite. I tried the conversion and got the following output:

2022-12-09 18:49:10.637176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-09 18:49:17.220826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-09 18:49:20.342003: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-12-09 18:49:20.342041: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-12-09 18:49:20.343338: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /Users/adambabik/projects/github.com/yoeo/guesslang/guesslang/data/model
2022-12-09 18:49:20.346015: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-12-09 18:49:20.346046: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /Users/adambabik/projects/github.com/yoeo/guesslang/guesslang/data/model
2022-12-09 18:49:20.354134: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
2022-12-09 18:49:20.372631: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT64
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}

	while inferring type of node 'dnn/zero_fraction/cond/output/_44'
2022-12-09 18:49:20.378047: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-12-09 18:49:20.409092: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /Users/adambabik/projects/github.com/yoeo/guesslang/guesslang/data/model
2022-12-09 18:49:20.425946: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 82615 microseconds.
2022-12-09 18:49:20.538034: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-12-09 18:49:20.820819: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2046] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexBincount, FlexCast, FlexConcatV2, FlexSparseFillEmptyRows, FlexSparseReshape, FlexSparseSegmentMean, FlexSparseSegmentSum, FlexStringSplit, FlexStringToHashBucketFast, FlexTensorListReserve, FlexTensorListSetItem, FlexTensorListStack
Details:
	tf.Bincount(tensor<?xi32>, tensor<i32>, tensor<0xi64>) -> (tensor<?xi64>) : {device = ""}
	tf.Cast(tensor<!tf_type.variant<tensor<*x!tf_type.string>>>) -> (tensor<!tf_type.variant>) : {Truncate = false}
	tf.ConcatV2(tensor<?x!tf_type.string>, tensor<10000x!tf_type.string>, tensor<i32>) -> (tensor<?x!tf_type.string>) : {device = ""}
	tf.SparseFillEmptyRows(tensor<?x2xi64>, tensor<?xi64>, tensor<2xi64>, tensor<i64>) -> (tensor<?x2xi64>, tensor<?xi64>, tensor<?xi1>, tensor<?xi64>) : {device = ""}
	tf.SparseReshape(tensor<?x2xi64>, tensor<2xi64>, tensor<2xi64>) -> (tensor<?x2xi64>, tensor<2xi64>) : {device = ""}
	tf.SparseSegmentMean(tensor<?x70xf32>, tensor<?xi32>, tensor<?xi64>) -> (tensor<?x70xf32>) : {device = ""}
	tf.SparseSegmentSum(tensor<?x54xf32>, tensor<?xi32>, tensor<?xi64>) -> (tensor<?x54xf32>) : {device = ""}
	tf.StringSplit(tensor<1x!tf_type.string>, tensor<!tf_type.string>) -> (tensor<?x2xi64>, tensor<?x!tf_type.string>, tensor<2xi64>) : {device = "", skip_empty = false}
	tf.StringToHashBucketFast(tensor<?x!tf_type.string>) -> (tensor<?xi64>) : {device = "", num_buckets = 5000 : i64}
	tf.TensorListReserve(tensor<i32>, tensor<i32>) -> (tensor<!tf_type.variant<tensor<*x!tf_type.string>>>) : {device = ""}
	tf.TensorListSetItem(tensor<!tf_type.variant>, tensor<i32>, tensor<?x!tf_type.string>) -> (tensor<!tf_type.variant<tensor<*x!tf_type.string>>>) : {device = ""}
	tf.TensorListStack(tensor<!tf_type.variant<tensor<*x!tf_type.string>>>, tensor<1xi32>) -> (tensor<?x?x!tf_type.string>) : {device = "", num_elements = -1 : i64}
See instructions: https://www.tensorflow.org/lite/guide/ops_select
2022-12-09 18:49:20.820999: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2057] The following operation(s) need TFLite custom op implementation(s):
Custom ops: StringNGrams
Details:
	tf.StringNGrams(tensor<?x!tf_type.string>, tensor<2xi64>) -> (tensor<?x!tf_type.string>, tensor<2xi64>) : {Tsplits = i64, device = "", left_pad = "", ngram_widths = [2], pad_width = 0 : i64, preserve_short_sequences = false, right_pad = "", separator = " "}
See instructions: https://www.tensorflow.org/lite/guide/ops_custom

Using the following conversion script:

from pathlib import Path

import tensorflow as tf

DATA_DIR = Path(__file__).absolute().parent.joinpath('data')
DEFAULT_MODEL_DIR = DATA_DIR.joinpath('model')

# Convert the guesslang SavedModel to a TF Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(str(DEFAULT_MODEL_DIR))
# Emit ops without a TF Lite implementation (e.g. StringNGrams) as custom ops.
converter.allow_custom_ops = True
converter.experimental_new_converter = True
# Fall back to the regular TF kernels (Flex delegate) for unsupported ops.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # ops with native TF Lite implementations
    tf.lite.OpsSet.SELECT_TF_OPS,    # TF ops executed via the Flex delegate
]

tflite_model = converter.convert()

# Save the model.
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)

It actually generated a model.tflite, which might be possible to execute if an implementation of StringNGrams is provided.
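For the record, a minimal sketch of how one might try to run the converted model with the Python TF Lite interpreter (assuming the full tensorflow pip package, which bundles the Flex delegate for SELECT_TF_OPS models; the input dtype/shape here is an assumption based on the model's string input):

import numpy as np
import tensorflow as tf

# Load the converted flatbuffer; the full tensorflow package registers
# the Flex delegate automatically for models using SELECT_TF_OPS.
interpreter = tf.lite.Interpreter(model_path='model.tflite')

# An unresolved custom op such as StringNGrams surfaces here
# ("failed to prepare").
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
# Feed a batch of source-code strings (assumed input format).
interpreter.set_tensor(input_details[0]['index'],
                       np.array([b'print("hello")'], dtype=object))
interpreter.invoke()

output_details = interpreter.get_output_details()
print(interpreter.get_tensor(output_details[0]['index']))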

To be continued...

@adambabik
Collaborator Author

After sorting out all the linking etc., I got to this point:

INFO: Created TensorFlow Lite delegate for select TF ops.
2023-02-24 09:09:31.976946: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: TfLiteFlexDelegate delegate: 10 nodes delegated out of 75 nodes with 6 partitions.

ERROR: Encountered unresolved custom op: StringNGrams.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom 
ERROR: Node number 36 (StringNGrams) failed to prepare.
ERROR: Node number 4 (WHILE) failed to invoke.

We would need to provide a StringNGrams custom operator, which is not a trivial task. Additionally, I needed to link against libtensorflowlite_flex.dylib, which weighs over 200MB...

At this point, it feels like TensorFlow Lite is kind of a dead end for our use case, unfortunately.

adambabik removed their assignment Feb 24, 2023