e05ee62 Jun 8, 2016
SimpleHelpers.FileEncoding

Detects the charset encoding of any text file using the Mozilla Universal Charset Detector (UDE.CSharp).

FileEncoding supports almost all charset encodings (utf-8, utf-7, utf-32, ISO-8859-1, ...). It first checks whether the file has a BOM header. If it does not, FileEncoding loads and analyzes the file bytes and tries to determine the charset encoding.
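
The BOM check described above can be illustrated with a minimal sketch. This is not the code in FileEncoding.cs: the class `BomSniffer` and its method `DetectBom` are hypothetical names used only for illustration, and the sketch covers only the common Unicode marks.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of SimpleHelpers.FileEncoding.
public static class BomSniffer
{
    // Returns the encoding implied by a leading byte order mark,
    // or null when the data does not start with a known mark.
    public static Encoding DetectBom (byte[] data)
    {
        if (data == null) return null;

        // UTF-32 is tested before UTF-16 because the UTF-32 LE mark
        // (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
        if (StartsWith (data, 0xFF, 0xFE, 0x00, 0x00)) return Encoding.UTF32;                  // UTF-32 LE
        if (StartsWith (data, 0x00, 0x00, 0xFE, 0xFF)) return new UTF32Encoding (true, true);  // UTF-32 BE
        if (StartsWith (data, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith (data, 0xFF, 0xFE)) return Encoding.Unicode;                            // UTF-16 LE
        if (StartsWith (data, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;                   // UTF-16 BE

        return null;
    }

    // True when data begins with exactly the given bytes.
    private static bool StartsWith (byte[] data, params byte[] mark)
    {
        if (data.Length < mark.Length) return false;
        for (int i = 0; i < mark.Length; i++)
        {
            if (data[i] != mark[i]) return false;
        }
        return true;
    }
}
```

When a helper like this returns null, there is no BOM and the content itself has to be analyzed, which is the part FileEncoding delegates to UDE.CSharp.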

Features

  • Byte order mark (BOM) detection
  • Analyse file content
  • Comprehensive charset encoding detection
  • Large file support

Installation

NuGet Package Details

You can install the package using NuGet; see SimpleHelpers.FileEncoding at NuGet.org.

PM> Install-Package SimpleHelpers.FileEncoding

The NuGet package contains C# source code.

The source code will be installed in your project with the following file system structure:

|-- <project root>
    |-- SimpleHelpers
        |-- FileEncoding.cs

Download

If you prefer, you can also download the source code: FileEncoding.cs

Dependencies

Compiled version of "C# port of Mozilla Universal Charset Detector"

This useful library can detect the charset encoding by analysing a byte array.

API

DetectFileEncoding

Tries to detect the file encoding by first checking for a byte order mark (BOM). If none is found, it loads part of the file and tries to detect the charset using UDE.CSharp.

    var encoding = FileEncoding.DetectFileEncoding ("./my_text_file.txt");

TryLoadFile

Tries to load file content with the correct encoding. This is a shortcut that uses System.IO.File.ReadAllText to load the file content, but first it detects the correct encoding.

If the file doesn't exist or it couldn't be loaded, the provided defaultValue (second parameter) will be returned.

    var content = FileEncoding.TryLoadFile ("./my_text_file.txt", "");
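
The "detect first, then read, fall back to a default" behaviour described above can be sketched roughly as follows. This is an illustration of the pattern, not the library's implementation: `SafeTextLoader` and `TryLoadText` are hypothetical names, the detector is passed in as a delegate so the sketch stays self-contained, and the UTF-8 fallback when detection returns null is an assumption.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of SimpleHelpers.FileEncoding.
public static class SafeTextLoader
{
    // Reads the whole file using the encoding chosen by detectEncoding.
    // Returns defaultValue when the file is missing or cannot be read.
    public static string TryLoadText (string path, string defaultValue, Func<string, Encoding> detectEncoding)
    {
        try
        {
            if (!File.Exists (path)) return defaultValue;

            // Assumption: fall back to UTF-8 when the detector cannot decide.
            var encoding = detectEncoding (path) ?? Encoding.UTF8;
            return File.ReadAllText (path, encoding);
        }
        catch (Exception)
        {
            // Any I/O or decoding failure yields the caller's default value.
            return defaultValue;
        }
    }
}
```

Assuming DetectFileEncoding returns a System.Text.Encoding, the helper could be called as `SafeTextLoader.TryLoadText ("./my_text_file.txt", "", path => FileEncoding.DetectFileEncoding (path))`.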

Detect

Detects the encoding of the textual data in the specified input.

    var det = new FileEncoding ();
    using (var stream = new System.IO.FileStream (inputFilename, System.IO.FileMode.Open))
    {
        // feed the opened stream to the detector
        det.Detect (stream);
    }

    // Finalize the detection phase and get the detected encoding name
    var encoding = det.Complete ();

    // Check the results
    Console.WriteLine ("IsText = {0}", det.IsText);
    Console.WriteLine ("HasByteOrderMark = {0}", det.HasByteOrderMark);
    Console.WriteLine ("EncodingName = {0}", det.EncodingName);

Project Information