C# Wrapper System.AccessViolationException #1280

Open
Cavan09 opened this issue Jan 16, 2023 · 4 comments
Labels: help wanted (maintainers want help because they don't have the knowledge or the time, or for another reason)

Comments


Cavan09 commented Jan 16, 2023

I was wondering if there are any C# devs here who have experience using LibLouis through DllImport to run translations. I've put together a basic sample of what I'm doing (leaving out some logging and some integration code).

Generally the code works just fine. However, after running multiple translations in a longer process, I tend to get hit with exceptions that cannot be caught with try/catch. These exceptions are:

System.AccessViolationException - reading/writing protected memory.

I have tried this on several different machines, as well as VMs. All yield the same results. I call the lou_free() method after each process has finished. For testing, I have also tried calling it after each translation (I know that is not correct; I was just hoping it might clear any potential memory issues).

The versions I have tested are:
liblouis 3.24.0
liblouis 3.22.0

If there is any additional info needed please let me know.

The code below is the wrapper.

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

namespace Sample
{
    public static class SampleWrapper
    {
        [DllImport(@"liblouis\bin\liblouis.dll", CharSet = CharSet.Unicode)]
        private static extern unsafe int lou_translateString(
            [MarshalAs(UnmanagedType.LPStr)]
            [In] string tableList,
            [In] byte[] inbuf,
            [In, Out] IntPtr inlen,
            [Out] byte[] outbuf,
            [In, Out] IntPtr outlen,
            [In] Typeforms[] typeform,
            [MarshalAs(UnmanagedType.LPStr)]
            string spacing,
            int mode);

        [DllImport(@"liblouis\bin\liblouis.dll", EntryPoint = "lou_charSize")]
        private static extern int lou_charSize();

        [DllImport(@"liblouis\bin\liblouis.dll", EntryPoint = "lou_free")]
        private static extern void lou_free();

        public static string TranslateString(string text, Typeforms[] sourceTypeformMap)
        {
            //Get the encoding type based on the lou_charSize.
            var size = lou_charSize();
            var encoding = GetEncoding(size);

            //Encode the input string and set up buffers and int pointers.
            var converted = encoding.GetBytes(text);
            var maxInSize = text.Length * size;

            //Set up the output buffers.
            var maxOutSize = Math.Max(text.Length * (size * 2), 4096);
            var outBuff = new byte[maxOutSize];

            var translation = "";

            //Get the translation table
            var tables = @"liblouis\tables\en-ueb-g2.ctb";


            unsafe
            {
                var intPtr = new IntPtr(&maxInSize);
                var outPtr = new IntPtr(&maxOutSize);

                //Note: The liblouis docs on typeforms say the input buffer should be the size of the max output buffer.
                //Yet this currently works at the size of the input.
                //Run the translation
                lou_translateString(tables, converted, intPtr,
                    outBuff, outPtr, sourceTypeformMap, null, 128);

                //Trim the buffer to the translated length (maxOutSize now holds the output character count).
                Array.Resize(ref outBuff, maxOutSize * size);
                //Decode the translation
                translation = encoding.GetString(outBuff);
            }

            return translation;
        }

        /// <summary>
        /// Gets the encoding based on the character size from liblouis
        /// </summary>
        /// <param name="size">Character size, 4 bytes is UTF-32, anything else is UTF-16</param>
        /// <returns>Character encoding</returns>
        private static Encoding GetEncoding(int size)
        {
            if (size == 4)
            {
                return Encoding.GetEncoding("UTF-32");
            }

            return Encoding.GetEncoding("UTF-16");
        }

        public enum Typeforms : ushort
        {
            None = 0,
            Italic = 1,
            Underline = 2,
            Bold = 4,
            Script = 8,
            TNEmbed = 16,
        }
    }
}
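
One detail that may be worth checking, offered as a sketch under assumptions rather than a confirmed diagnosis: as far as the liblouis documentation describes it, inlen and outlen are counts of characters (widechar units), not bytes, and a non-NULL typeform array needs room for outlen entries because it is also written to. The wrapper above passes text.Length * size as the input length and a byte count as the output length. The helper below (LouBufferPlan is a hypothetical name, not part of liblouis) keeps character counts and byte counts separate. The 2x output factor and the 4096 minimum are carried over from the wrapper above; they are not a liblouis requirement.

```csharp
using System;

// Hypothetical helper: not part of liblouis or of the wrapper above.
public sealed class LouBufferPlan
{
    public int InChars { get; }        // value to pass through inlen (characters)
    public int OutChars { get; }       // value to pass through outlen (characters)
    public int InBytes { get; }        // size of the managed input byte[]
    public int OutBytes { get; }       // size of the managed output byte[]
    public int TypeformLength { get; } // entries the typeform array should hold

    private LouBufferPlan(int inChars, int charSize)
    {
        InChars = inChars;
        // Same heuristic as the wrapper, but expressed in characters.
        OutChars = Math.Max(inChars * 2, 4096);
        InBytes = InChars * charSize;
        OutBytes = OutChars * charSize;
        // Assumption from the liblouis docs: typeform must cover outlen entries.
        TypeformLength = OutChars;
    }

    // encodedByteCount is the length of the byte[] produced by Encoding.GetBytes;
    // charSize is the value returned by lou_charSize() (2 or 4).
    public static LouBufferPlan For(int encodedByteCount, int charSize)
    {
        if (charSize != 2 && charSize != 4)
            throw new ArgumentOutOfRangeException(nameof(charSize));
        if (encodedByteCount < 0 || encodedByteCount % charSize != 0)
            throw new ArgumentOutOfRangeException(nameof(encodedByteCount));
        return new LouBufferPlan(encodedByteCount / charSize, charSize);
    }
}
```

With such a plan, the output byte[] would be allocated with OutBytes, the integers behind inlen and outlen would be initialised to InChars and OutChars, and the typeform array would be padded to TypeformLength before the call.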

bertfrees (Member) commented

@Cavan09 I have no idea what could be causing this. And I do not have experience with C#. Have you become any wiser in the meantime?

bertfrees added the help wanted label Jun 12, 2023

JensJensenPublic commented

I think the problem is that the garbage collector moves the buffers "after running multiple translations".
I have included the following "fixed" statement in my own code.
The two byte* variables are not used for anything, but this construction seems to prevent the GC from moving the buffers.

fixed (byte* pInBuf = converted, pOutBuff = outBuff) // Prevents GarbageCollector from moving the buffers
{
// Existing logic goes here
}
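
For comparison, here is a minimal sketch of explicit pinning that does not need an unsafe context. PinnedBuffer is a hypothetical helper name; GCHandle comes from System.Runtime.InteropServices. As far as I know, the P/Invoke marshaller already pins blittable arrays such as byte[] for the duration of a call, so explicit pinning would mainly matter if native code kept the pointer after the call returned. Treat this as background, not as a diagnosis.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: owns a byte[] and keeps it pinned until disposed.
public sealed class PinnedBuffer : IDisposable
{
    private GCHandle _handle;

    public byte[] Data { get; }

    // Stable address of Data[0]; valid only until Dispose is called.
    public IntPtr Address => _handle.AddrOfPinnedObject();

    public PinnedBuffer(int sizeInBytes)
    {
        Data = new byte[sizeInBytes];
        // A pinned handle prevents the GC from relocating the array.
        _handle = GCHandle.Alloc(Data, GCHandleType.Pinned);
    }

    public void Dispose()
    {
        if (_handle.IsAllocated)
        {
            _handle.Free();
        }
    }
}
```

A caller would wrap it in a using statement and pass Address to a DllImport signature that takes IntPtr instead of byte[].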

tibbsa (Contributor) commented Sep 30, 2024

I tried reproducing this and suspect it has something to do with longer strings, perhaps with exceeding the #define MAXSTRING 2048 limit set in internal.h. I could not reproduce the problem with shorter strings, even after thousands of iterations, but once you get past 2048, weird things begin to happen. The C# runtime doesn't necessarily pick up on it right away; sometimes it takes a few instances of this happening before it throws an exception. Note the odd output that begins toward the end of the returned translation strings in the following examples.

All is well for these:

Input string at length 504:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?" So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting
Input size: 2016
Output buffer size: 4096
Translation:
`#7`.<,alice 0 2g9n+ to get v tir$ ( sitt+ by h} si/} on ! bank1 & ( hav+ no?+ to d3 once or twice %e _h peep$ 9to ! book h} si/} 0 r1d+1 b x _h no pictures or 3v}sa;ns 9 x1 8& :at is ! use ( a book10 ?"| ,alice1 8)|t pictures or 3v}sa;ns80 ,s %e 0 3sid}+ 9 h} {n m9d "<z well z %e cd1 = ! hot "d made h} f`.>`#'eel v sleepy & /upid">1 :e!r ! pl1sure ( mak+ a daisy-*a9 wd 2 wor? ! tr|ble ( gett+

But once we exceed an input buffer size of 2048, garbage starts to appear at the end:

Input string at length 519:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?" So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking
Input size: 2076
Output buffer size: 4152
Translation:
`#7`.<,alice 0 2g9n+ to get v tir$ ( sitt+ by h} si/} on ! bank1 & ( hav+ no?+ to d3 once or twice %e _h peep$ 9to ! book h} si/} 0 r1d+1 b x _h no pictures or 3v}sa;ns 9 x1 8& :at is ! use ( a book10 ?"| ,alice1 8)|t pictures or 3v}sa;ns80 ,s %e 0 3sid}+ 9 h} {n m9d "<z well z %e cd1 = ! hot "d made h} feel v sleepy an`.>`#'d /upid">1 :e!r ! pl1sure ( mak+ a daisy-*a9 wd 2 wor? ! tr|ble ( gett+ u~2_2p a_2nd p`#2i`2_2ck+

... somewhat later ...

Input string at length 554:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?" So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White
Input size: 2216
Output buffer size: 4432
Translation:
`#7`.<,alice 0 2g9n+ to get v tir$ ( sitt+ by h} si/} on ! bank1 & ( hav+ no?+ to d3 once or twice %e _h peep$ 9to ! book h} si/} 0 r1d+1 b x _h no pictures or 3v}sa;ns 9 x1 8& :at is ! use ( a book10 ?"| ,alice1 8)|t pictures or 3v}sa;ns80 ,s %e 0 3sid}+ 9 h} {n m9d "<z well z %e cd1 = ! hot "d made h} feel v sleepy & /upid">1 :e!r ! pl1sure ( mak+ a daisy-*a9 wd 2 wor? ! tr|ble ( gett+ up & pick+ ! daisies1 :5 sudd5ly`.>`#' .#2_#2`2~2a  `.>~1~#2_2.2h_#2i~'`'.#'te

The last quasi-successful try (it doesn't crash, but the returned translation is garbage at the end):

Input string at length 586:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?" So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close
Input size: 2344
Output buffer size: 4688
Translation:
`#7`.<,alice 0 2g9n+ to get v tir$ ( sitt+ by h} si/} on ! bank1 & ( hav+ no?+ to d3 once or twice %e _h peep$ 9to ! book h} si/} 0 r1d+1 b x _h no pictures or 3v}sa;ns 9 x1 8& :at is ! use ( a book10 ?"| ,alice1 8)|t pictures or 3v}sa;ns80 ,s %e 0 3sid}+ 9 h} {n m9d "<z well z %e cd1 = ! hot "d made h} feel v sleepy & /upid">1 :e!r ! pl1sure ( mak+ a daisy-*a9 wd 2 wor? ! tr|ble ( gett+ up & pick+ ! daisies1 :5 sudd5ly`.>`#' .#2_#2`2~2a  `.>~1~#2_2.2h_#2i~'`'.#'te ,ra2it "#1~#1`#1_2wi`.>`#'~#'"#'th p9k ey"#1`1.1~2_2es r`.>~1_2.2a~#'_#2n c`.>`1~1_2.2l~#'_#2o~'`'.#'s`2e

And then it throws an exception on this one:

Input string at length 589:
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice, "without pictures or conversations?" So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by
Input size: 2356
Output buffer size: 4712
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at LibLouis.LLWrapper.lou_translateString(System.String, Byte[], IntPtr, Byte[], IntPtr, Typeforms[], System.String, Int32)
--------------------------------
   at LibLouis.LLWrapper.TranslateString(System.String, Typeforms[])
   at Program.Main(System.String[])

Your "fix" did not help me. Nor did trying to avoid the memory manager by allocating buffers directly from the marshaller (Marshal.AllocHGlobal).

Do you know the size of your inputs? Are they relatively small or might something similar be happening there?
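
The sizes printed in the runs above are consistent with the wrapper's arithmetic for a 4-byte character size. That is an inference from the numbers, since the value returned by lou_charSize() is not shown. A short sketch that reproduces them (ReportedSizes is a hypothetical name):

```csharp
using System;

// Hypothetical helper that mirrors the size arithmetic of the posted wrapper.
public static class ReportedSizes
{
    // Mirrors maxInSize: text.Length * size.
    public static int InputSize(int textLength, int charSize)
    {
        return textLength * charSize;
    }

    // Mirrors maxOutSize: Math.Max(text.Length * (size * 2), 4096).
    public static int OutputBufferSize(int textLength, int charSize)
    {
        return Math.Max(textLength * (charSize * 2), 4096);
    }

    // True when the value handed to liblouis as the input length exceeds the limit.
    public static bool ExceedsLimit(int textLength, int charSize, int limit)
    {
        return InputSize(textLength, charSize) > limit;
    }
}
```

With charSize 4, a length of 504 gives 2016 and 4096, and a length of 519 gives 2076 and 4152, matching the report. The 2048 threshold is therefore crossed in bytes at 513 characters, while the character count itself stays far below 2048.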

JensJensenPublic commented

Without having looked much further into it, I think you are right: this problem has to do with the string length.
A quick and dirty practical solution could be to split the input string into shorter substrings, for instance on each period or line feed.
This would probably do for all practical purposes.
I think a more detailed analysis of the root problem would require building the LibLouis DLL on your own machine, together with a small test program in C/C++, to avoid the marshalling and allow detailed native debugging. Even if the marshalling is not the cause of the problem, it certainly blurs the symptoms.
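
The splitting workaround could be sketched roughly as follows. TextChunker and its maxChars parameter are hypothetical names. The sketch cuts after the last period or line feed inside each window and falls back to a hard cut when no such break exists. A hard cut can land inside a word or a surrogate pair, and translating chunks separately may change context-dependent braille output at the boundaries, so this is a workaround under those assumptions, not a fix.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits text into pieces of at most maxChars characters.
public static class TextChunker
{
    private static readonly char[] Breaks = { '.', '\n' };

    public static List<string> Split(string text, int maxChars)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            if (text.Length - start <= maxChars)
            {
                // The remainder fits in one chunk.
                chunks.Add(text.Substring(start));
                break;
            }

            // Search backwards through the window [start, start + maxChars).
            int cut = text.LastIndexOfAny(Breaks, start + maxChars - 1, maxChars);

            // Keep the break character with the chunk; hard cut if none was found.
            int end = cut >= start ? cut + 1 : start + maxChars;
            chunks.Add(text.Substring(start, end - start));
            start = end;
        }
        return chunks;
    }
}
```

Each chunk would then be passed to the translation call separately and the results concatenated. Concatenating the chunks themselves always reproduces the original text, which makes the helper easy to check in isolation.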
