Skip to content

This project is currently in the refactoring stage, but it can already make it clear how to combine Onnx and C#.

License

Notifications You must be signed in to change notification settings

eedudka/TextClassification_on_CSharp

Repository files navigation

TextClassification on C#

A project of a classical classification problem built on C# WinFroms using CatBoost converted to Onnx, as well as FastText.NetWrapper.

User Interface

img1 img2

Third-party librarys used

  1. FastText NetWrapper
  2. Yandex.CatBoost
  3. Onnx.Runtime
  4. MaterialSkin

How it works (in the form of a diagram)

Text

DataSet

BBC News DataSet

Solutions that I have been looking for for a long time and wanted to share

  • Connecting and using the onnx model
    • This is the most difficult of all stages in the course of solving this problem, since there is documentation on the Internet, but it is not exhaustive. Therefore, here is the part of the code that is responsible for the operation of the Onnx model.
 var modelInputLayerName = session.InputMetadata.Keys.Single(); // we get the name of the first (input) layer of the model

                    var dimensions = new int[] { 1, 300 }; // we set the size of the input data (if we draw analogies, it's like in Tensorflow input_shame)
                    var inputTensor = new DenseTensor<float>(TextVector, dimensions); // description of the input Tensor
                    var modelInput = new List<NamedOnnxValue>() //create input data
                    {
                        NamedOnnxValue.CreateFromTensor(modelInputLayerName, inputTensor)
                    };
                    return int.Parse(session.Run(modelInput) // processing and return of the result
                                        .ToList()[0]
                                        .AsTensor<long>()
                                        .ToArray()[0]
                                        .ToString());
  • Сontextual removal of homonymy and application to a certain form of the word
    • For ease of processing the input text, the MyStem(Yandex) was used. Of course, deeper semantic analysis requires the use of more modern solutions and approaches.
await File.WriteAllTextAsync("Data/input.txt", textFromPath.ToString(), Encoding.Default);
                using (var MySt = new Process())
                {
                    MySt.StartInfo.FileName = "Data/mystem.exe";
                    MySt.StartInfo.Arguments = "-lc Data/input.txt Data/output.txt";
                    MySt.StartInfo.UseShellExecute = false;
                    MySt.StartInfo.CreateNoWindow = true;
                    MySt.Start();
                    MySt.WaitForExit();
                    MySt.Close();
                }
                var textAfterMyStem =await File.ReadAllTextAsync("Data/output.txt", Encoding.Default);

Updates

  • 02 july 2022 - Update class TextSegmaentation : after refactoring the TS class, processing decreased from 01:52 min (380 files) until 00:56.
  • 03 jule 2022 - Update class Classification : processing time has improved slightly, but now it has become more readable.
  • 05 jule 2022 - Update class DataGridDataAdd : А bit of a shift from procedural to oop. There was a need to change the idea of presenting data in tables. This change will increase the speed of work.
  • 09 jule 2022 - During the revision of the HttpPostRequest class, it became clear that it was necessary to abandon the use of var with types where the system selects a Nullable option for them. Nullable types can certainly be modern, but I don't think they are correct.
  • 17 jule 2022 - Updated the data display in DataGridView, also removed unnecessary TaskRun. All this led to a reduction in the "processing + display" operation time, but left the same principle of "processed - showed the result".

About

This project is currently in the refactoring stage, but it can already make it clear how to combine Onnx and C#.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages