Language detection #77

Open
adambabik opened this issue Dec 9, 2022 · 1 comment

@adambabik
Collaborator

In the VS Code extension, we have a way to detect a language using https://github.com/microsoft/vscode-languagedetection. It embeds a model that comes from https://github.com/yoeo/guesslang and executes it with https://github.com/tensorflow/tfjs, which supports various backends such as CPU or WebGL.
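For reference, the underlying guesslang model can also be exercised directly from Python via the guesslang package; a minimal sketch, following its README (requires pip install guesslang):

from guesslang import Guess

guess = Guess()

# Name of the most probable language for a source snippet.
name = guess.language_name("""
def hello():
    print('Hello, world!')
""")
print(name)  # e.g. 'Python'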

For runme, which is built and distributed as a statically linked binary, the best option would be TensorFlow Lite, which is fairly small (<6MB). Unfortunately, the original guesslang model might not be easily convertible to TF Lite. I tried the conversion and got the following output:

2022-12-09 18:49:10.637176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-09 18:49:17.220826: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-09 18:49:20.342003: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-12-09 18:49:20.342041: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-12-09 18:49:20.343338: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /Users/adambabik/projects/github.com/yoeo/guesslang/guesslang/data/model
2022-12-09 18:49:20.346015: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-12-09 18:49:20.346046: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /Users/adambabik/projects/github.com/yoeo/guesslang/guesslang/data/model
2022-12-09 18:49:20.354134: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
2022-12-09 18:49:20.372631: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT64
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}

	while inferring type of node 'dnn/zero_fraction/cond/output/_44'
2022-12-09 18:49:20.378047: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-12-09 18:49:20.409092: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /Users/adambabik/projects/github.com/yoeo/guesslang/guesslang/data/model
2022-12-09 18:49:20.425946: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 82615 microseconds.
2022-12-09 18:49:20.538034: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2022-12-09 18:49:20.820819: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2046] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexBincount, FlexCast, FlexConcatV2, FlexSparseFillEmptyRows, FlexSparseReshape, FlexSparseSegmentMean, FlexSparseSegmentSum, FlexStringSplit, FlexStringToHashBucketFast, FlexTensorListReserve, FlexTensorListSetItem, FlexTensorListStack
Details:
	tf.Bincount(tensor<?xi32>, tensor<i32>, tensor<0xi64>) -> (tensor<?xi64>) : {device = ""}
	tf.Cast(tensor<!tf_type.variant<tensor<*x!tf_type.string>>>) -> (tensor<!tf_type.variant>) : {Truncate = false}
	tf.ConcatV2(tensor<?x!tf_type.string>, tensor<10000x!tf_type.string>, tensor<i32>) -> (tensor<?x!tf_type.string>) : {device = ""}
	tf.SparseFillEmptyRows(tensor<?x2xi64>, tensor<?xi64>, tensor<2xi64>, tensor<i64>) -> (tensor<?x2xi64>, tensor<?xi64>, tensor<?xi1>, tensor<?xi64>) : {device = ""}
	tf.SparseReshape(tensor<?x2xi64>, tensor<2xi64>, tensor<2xi64>) -> (tensor<?x2xi64>, tensor<2xi64>) : {device = ""}
	tf.SparseSegmentMean(tensor<?x70xf32>, tensor<?xi32>, tensor<?xi64>) -> (tensor<?x70xf32>) : {device = ""}
	tf.SparseSegmentSum(tensor<?x54xf32>, tensor<?xi32>, tensor<?xi64>) -> (tensor<?x54xf32>) : {device = ""}
	tf.StringSplit(tensor<1x!tf_type.string>, tensor<!tf_type.string>) -> (tensor<?x2xi64>, tensor<?x!tf_type.string>, tensor<2xi64>) : {device = "", skip_empty = false}
	tf.StringToHashBucketFast(tensor<?x!tf_type.string>) -> (tensor<?xi64>) : {device = "", num_buckets = 5000 : i64}
	tf.TensorListReserve(tensor<i32>, tensor<i32>) -> (tensor<!tf_type.variant<tensor<*x!tf_type.string>>>) : {device = ""}
	tf.TensorListSetItem(tensor<!tf_type.variant>, tensor<i32>, tensor<?x!tf_type.string>) -> (tensor<!tf_type.variant<tensor<*x!tf_type.string>>>) : {device = ""}
	tf.TensorListStack(tensor<!tf_type.variant<tensor<*x!tf_type.string>>>, tensor<1xi32>) -> (tensor<?x?x!tf_type.string>) : {device = "", num_elements = -1 : i64}
See instructions: https://www.tensorflow.org/lite/guide/ops_select
2022-12-09 18:49:20.820999: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2057] The following operation(s) need TFLite custom op implementation(s):
Custom ops: StringNGrams
Details:
	tf.StringNGrams(tensor<?x!tf_type.string>, tensor<2xi64>) -> (tensor<?x!tf_type.string>, tensor<2xi64>) : {Tsplits = i64, device = "", left_pad = "", ngram_widths = [2], pad_width = 0 : i64, preserve_short_sequences = false, right_pad = "", separator = " "}
See instructions: https://www.tensorflow.org/lite/guide/ops_custom

Using the following conversion script:

from pathlib import Path

import tensorflow as tf

DATA_DIR = Path(__file__).absolute().parent.joinpath('data')
DEFAULT_MODEL_DIR = DATA_DIR.joinpath('model')

# Convert the guesslang SavedModel to a TF Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(str(DEFAULT_MODEL_DIR))
# Emit ops without a TF Lite implementation (e.g. StringNGrams) as custom ops.
converter.allow_custom_ops = True
converter.experimental_new_converter = True
# Fall back to the regular TF kernels (Flex delegate) for unsupported ops.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # ops with native TF Lite implementations
    tf.lite.OpsSet.SELECT_TF_OPS,    # TF ops executed via the Flex delegate
]

tflite_model = converter.convert()

# Save the model.
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)

It actually generated a model.tflite, which might be possible to execute if an implementation of StringNGrams is provided.
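For the record, a minimal sketch of how one might try to run the converted model with the Python TF Lite interpreter (assuming the full tensorflow pip package, which bundles the Flex delegate for SELECT_TF_OPS models; the input dtype/shape here is an assumption based on the model's string input):

import numpy as np
import tensorflow as tf

# Load the converted flatbuffer; the full tensorflow package registers
# the Flex delegate automatically for models using SELECT_TF_OPS.
interpreter = tf.lite.Interpreter(model_path='model.tflite')

# An unresolved custom op such as StringNGrams surfaces here
# ("failed to prepare").
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
# Feed a batch of source-code strings (assumed input format).
interpreter.set_tensor(input_details[0]['index'],
                       np.array([b'print("hello")'], dtype=object))
interpreter.invoke()

output_details = interpreter.get_output_details()
print(interpreter.get_tensor(output_details[0]['index']))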

To be continued...

@adambabik
Collaborator Author

After sorting out all the linking etc., I got to this point:

INFO: Created TensorFlow Lite delegate for select TF ops.
2023-02-24 09:09:31.976946: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: TfLiteFlexDelegate delegate: 10 nodes delegated out of 75 nodes with 6 partitions.

ERROR: Encountered unresolved custom op: StringNGrams.
See instructions: https://www.tensorflow.org/lite/guide/ops_custom 
ERROR: Node number 36 (StringNGrams) failed to prepare.
ERROR: Node number 4 (WHILE) failed to invoke.

We would need to provide a StringNGrams custom operator, which is not a trivial task. Additionally, I needed to link against libtensorflowlite_flex.dylib, which weighs over 200MB...

At this point, it feels like TensorFlow Lite is kind of a dead end for our use case, unfortunately.

adambabik removed their assignment Feb 24, 2023