# Working with file data in C#

Everything we need to work with files and the file system lives in the `System.IO` namespace, so we start by importing it:

In [8]:
using System.IO;

## Text files

One of the most common tasks is reading and writing strings. For small amounts of data (small enough to be held in memory at once) there are the high-level APIs `File.WriteAllText` and `File.ReadAllText`. They write or read a whole string with a single line of code:

In [9]:
// WriteAllText will write string to a text file
// Existing file is silently overwritten, non-existing file is created
var s1 = "Hello, world!";
File.WriteAllText("demo.txt", s1);

// ReadAllText will read string from a text file
var s2 = File.ReadAllText("demo.txt");
s2

Hello, world!

An attempt to read a non-existent file results in a `FileNotFoundException`:

In [10]:
File.ReadAllText("notfound.txt")

Error: System.IO.FileNotFoundException: Could not find file 'c:\Users\Altair\Source\Repos\CSharp-Notebooks\Concepts\notfound.txt'.
File name: 'c:\Users\Altair\Source\Repos\CSharp-Notebooks\Concepts\notfound.txt'
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, Nullable`1 unixCreateMode)
   at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)
   at System.IO.File.ReadAllText(String path, Encoding encoding)
   at Submission#11.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
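
If a missing file is an expected situation rather than a bug, the exception can be caught. The following is only a minimal sketch; the fallback to an empty string is an assumption made for illustration, not a recommendation for every program:

```csharp
using System;
using System.IO;

// Hypothetical file name, chosen so that the file does not exist
var missingPath = "notfound.txt";

string contents;
try {
    contents = File.ReadAllText(missingPath);
}
catch (FileNotFoundException ex) {
    // ex.FileName holds the path of the file that could not be found
    Console.WriteLine($"File not found: {Path.GetFileName(ex.FileName)}");
    contents = string.Empty;
}

Console.WriteLine($"Read {contents.Length} characters.");
```

Checking `File.Exists` beforehand is also possible, but the file can still disappear between the check and the read, so the `try`/`catch` is the more robust of the two.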

With text files we often want to process the contents line by line. That's what the `ReadAllLines` and `WriteAllLines` methods are for. They work like the previous pair, but `ReadAllLines` returns a `string[]` and `WriteAllLines` accepts a `string[]` or an `IEnumerable<string>`:

In [11]:
File.ReadAllLines("Files-CRLF.txt")

Windows uses the `CRLF` sequence as the line separator, while Linux and macOS use `LF`. The `WriteAllLines` method uses the separator appropriate for the platform the code is running on. The `ReadAllLines` method accepts both. The previous file used `CRLF`; the next one uses just `LF`, and the result is identical:

In [12]:
File.ReadAllLines("Files-LF.txt")
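
The separator behaviour described above can be checked with a small self-contained sketch. The file names below are hypothetical and exist only for this illustration; the two input files are written with explicit `\r\n` and `\n` separators so the example does not depend on the sample files used elsewhere in this notebook:

```csharp
using System;
using System.IO;

// Hypothetical file names used only for this illustration
File.WriteAllText("sketch-crlf.txt", "one\r\ntwo\r\nthree\r\n");
File.WriteAllText("sketch-lf.txt", "one\ntwo\nthree\n");

var crlfLines = File.ReadAllLines("sketch-crlf.txt");
var lfLines = File.ReadAllLines("sketch-lf.txt");

// Both arrays hold the same three strings, without any separator characters
Console.WriteLine(string.Join("|", crlfLines));
Console.WriteLine(string.Join("|", lfLines));

// WriteAllLines terminates every line with the platform separator (Environment.NewLine)
File.WriteAllLines("sketch-native.txt", crlfLines);
var raw = File.ReadAllText("sketch-native.txt");
Console.WriteLine(raw.EndsWith(Environment.NewLine));
```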

There is also the `ReadLines` method, which returns `IEnumerable<string>` instead of `string[]`. It reads the file lazily, so the lines can be processed one by one with an enumerator, i.e. in a `foreach` loop:

In [13]:
var lines = File.ReadLines("Files-CRLF.txt");
foreach(var line in lines) {
    Console.WriteLine(line);
}

Lorem ipsum
Dolor sit amet
Consectetur adipiscing elit
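
Because `ReadLines` produces lines on demand, it combines well with LINQ when only part of a file is needed. A minimal sketch, assuming a hypothetical file `sketch-many-lines.txt` that is generated just for this example:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical file with 1000 lines, created only for this illustration
File.WriteAllLines("sketch-many-lines.txt", Enumerable.Range(1, 1000).Select(i => $"Line {i}"));

// Take(3) stops the enumeration after three lines;
// the remaining lines are never turned into strings
var firstThree = File.ReadLines("sketch-many-lines.txt").Take(3).ToList();

foreach (var line in firstThree) {
    Console.WriteLine(line);
}
```

With `ReadAllLines` the same query would first load all 1000 lines into an array and only then discard most of them.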


The `AppendAllText` and `AppendAllLines` methods append to the end of an existing file instead of overwriting it. If the file does not exist, it is created. The following code appends the current date and time to the end of the file every time it is run:

In [18]:
// Append line with current date and time
File.AppendAllLines("append.txt", new[] { DateTime.Now.ToString() });

// Display contents of the file
File.ReadAllText("append.txt")

02.08.2023 20:37:56
02.08.2023 20:37:59
02.08.2023 20:38:23
02.08.2023 20:38:25
02.08.2023 20:38:26
02.08.2023 20:54:42
02.08.2023 22:41:45
02.08.2023 22:41:48
02.08.2023 22:41:49
02.08.2023 22:41:50
02.08.2023 22:41:51


### Character encoding


The above methods all use UTF-8 encoding without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) by default. To use a different encoding, call the overloads that accept a `System.Text.Encoding` instance:

In [19]:
// .NET supports only US-ASCII, Unicode (UTF) and Latin1 encodings by default
// To use a legacy single-byte encoding (like Windows-1250), we have to register the code pages provider explicitly:
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

var s = "Žluťoučký kůň úpěl ďábelské ódy.";
File.WriteAllText("Demo-UTF8-without-BOM.txt", s);
File.WriteAllText("Demo-UTF8-with-BOM.txt", s, System.Text.Encoding.UTF8);
File.WriteAllText("Demo-1250.txt", s, System.Text.Encoding.GetEncoding("Windows-1250"));

The UTF encodings are detected automatically from the BOM, and a file without a BOM is read as UTF-8. A legacy encoding has to be specified explicitly for the text to be decoded correctly:

In [20]:
// The following will work
Console.WriteLine(File.ReadAllText("Demo-UTF8-without-BOM.txt"));
Console.WriteLine(File.ReadAllText("Demo-UTF8-with-BOM.txt"));

// The following won't, because the file was written using Windows-1250 encoding, but is read as UTF-8:
Console.WriteLine(File.ReadAllText("Demo-1250.txt"));

// To read it correctly, we have to specify the encoding explicitly
Console.WriteLine(File.ReadAllText("Demo-1250.txt", System.Text.Encoding.GetEncoding("Windows-1250")));

Žluťoučký kůň úpěl ďábelské ódy.
Žluťoučký kůň úpěl ďábelské ódy.
�lu�ou�k� k�� �p�l ��belsk� �dy.
Žluťoučký kůň úpěl ďábelské ódy.
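
The difference between the two UTF-8 variants is visible in the raw bytes. The sketch below uses hypothetical file names and simply compares the beginning and the length of both files:

```csharp
using System;
using System.IO;
using System.Text;

var text = "Žluťoučký kůň";

// Hypothetical file names used only for this illustration
File.WriteAllText("sketch-no-bom.txt", text);              // default: UTF-8 without BOM
File.WriteAllText("sketch-bom.txt", text, Encoding.UTF8);  // Encoding.UTF8 emits a BOM

var noBom = File.ReadAllBytes("sketch-no-bom.txt");
var withBom = File.ReadAllBytes("sketch-bom.txt");

// The UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the file
Console.WriteLine(Convert.ToHexString(withBom, 0, 3));

// Apart from the BOM, both files contain the same bytes
Console.WriteLine(withBom.Length - noBom.Length);
```

To write UTF-8 without a BOM through an overload that takes an encoding, an instance created with `new UTF8Encoding(false)` can be passed instead of `Encoding.UTF8`.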


### Asynchronous methods


The methods used so far are all synchronous. Asynchronous methods are generally advisable, because they do not block the calling thread while it waits for the I/O to complete. Luckily, all these methods have their asynchronous variants:

In [21]:
var s1 = "Hello, world!";
await File.WriteAllTextAsync("demo.txt", s1);

var s2 = await File.ReadAllTextAsync("demo.txt");
s2

Hello, world!

This is also true for most other file access methods mentioned in this tutorial, regardless of whether they work with text, binary data, streams or readers.
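
As a brief illustration, the line-based and appending methods follow the same naming pattern. A minimal sketch, assuming a hypothetical file `sketch-async.txt`:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical file name used only for this illustration
var asyncPath = "sketch-async.txt";

// Each asynchronous variant returns a Task that has to be awaited
await File.WriteAllLinesAsync(asyncPath, new[] { "first", "second" });
await File.AppendAllTextAsync(asyncPath, "third" + Environment.NewLine);

var asyncLines = await File.ReadAllLinesAsync(asyncPath);
Console.WriteLine(asyncLines.Length);
```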

## Binary files


The above methods work with text strings. If we want to write arbitrary binary data (represented as `byte[]` or similar), there are methods for that as well. They are called `WriteAllBytes` and `ReadAllBytes`:

In [22]:
// Generate 32 bytes (256 bits) of random data
var buffer = System.Security.Cryptography.RandomNumberGenerator.GetBytes(32);

// Write those bytes to a file
File.WriteAllBytes("random.bin", buffer);

// Read those bytes
var readBuffer = File.ReadAllBytes("random.bin");

readBuffer

Of course they also have their async variants:

In [23]:
await File.WriteAllBytesAsync("random.bin", buffer);
var readBufferAsync = await File.ReadAllBytesAsync("random.bin");
readBufferAsync

## Streams


The above methods all work with the data at once. That is fine for data small enough to fit into RAM, but bigger data has to be split into more manageable chunks. That's what streams are for.

The abstract class `System.IO.Stream` allows you to work with any kind of data source that can be read or written sequentially. If the implementation supports it, you can move the virtual "cursor" through the stream back and forth. Using classes derived from `Stream` you can work with memory (`MemoryStream`), files (`FileStream`), network connections (`NetworkStream`) or more abstract concepts like archives or encryption algorithms.

Let's create a new file and work with its contents a bit using `FileStream`:

In [24]:
// Let's create some data -- bytes with values from 0x01 to 0x0F:
var buffer = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F };

// Now we will create new file and access it as a stream
var fs = File.Create("data.bin");

// Write first five bytes
for(var i = 0; i < 5; i++){
    fs.WriteByte(buffer[i]);
}

// Move cursor to the beginning of stream
fs.Seek(0, SeekOrigin.Begin);

// Read and display all bytes until the end of file
while(true) {
    var b = fs.ReadByte();
    if(b == -1) {
        // The ReadByte returns int; it's either value read (0-255) or -1 if at end of stream
        Console.WriteLine();
        break;
    }
    Console.Write(b.ToString("X2") + " ");
}

// If we opened the stream, we should close it to flush all changes and release the file
fs.Close();

01 02 03 04 05 
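
Reading byte by byte is simple but slow for larger files, and an explicit `Close` call is skipped if an exception is thrown earlier. A common alternative, sketched below under the assumption of a hypothetical 10 000-byte file `sketch-chunks.bin`, is to read into a fixed-size buffer inside a `using` block:

```csharp
using System;
using System.IO;

// Hypothetical file name; 10 000 zero bytes serve as sample data
var samplePath = "sketch-chunks.bin";
File.WriteAllBytes(samplePath, new byte[10_000]);

long total = 0;
var chunk = new byte[4096];

// The using block disposes (and thereby closes) the stream even if an exception is thrown
using (var input = File.OpenRead(samplePath)) {
    int read;
    // Read returns the number of bytes actually placed in the buffer, or 0 at end of stream
    while ((read = input.Read(chunk, 0, chunk.Length)) > 0) {
        total += read;
    }
}

Console.WriteLine(total);
```

Only one 4096-byte buffer is held in memory at any time, regardless of the size of the file.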


There are various low-level methods for working with streams; you can find them in the [documentation](https://learn.microsoft.com/en-us/dotnet/api/system.io.stream).

### Readers and writers


Working with streams directly is usually not convenient. There are a number of abstraction classes that simplify reading and writing specific types of data from and to streams (and some other sources).

The most low-level of them are the [`System.IO.BinaryReader`](https://learn.microsoft.com/en-us/dotnet/api/system.io.binaryreader) and [`System.IO.BinaryWriter`](https://learn.microsoft.com/en-us/dotnet/api/system.io.binarywriter) classes.

They simplify reading and writing of simple data types (like boolean and numerics) from and to binary files.
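
A minimal sketch of a round trip through both classes follows. The file name and the values are hypothetical and chosen only for illustration:

```csharp
using System;
using System.IO;

// Hypothetical file name used only for this illustration
var binPath = "sketch-binary.bin";

using (var writer = new BinaryWriter(File.Create(binPath))) {
    writer.Write(42);       // int, 4 bytes
    writer.Write(3.14);     // double, 8 bytes
    writer.Write(true);     // bool, 1 byte
    writer.Write("hello");  // string, prefixed with its length
}

int number;
double real;
bool flag;
string word;

using (var reader = new BinaryReader(File.OpenRead(binPath))) {
    // Values have to be read in the same order and with the same types as they were written
    number = reader.ReadInt32();
    real = reader.ReadDouble();
    flag = reader.ReadBoolean();
    word = reader.ReadString();
}

Console.WriteLine($"{number} {real} {flag} {word}");
```

The file contains no type information, so the reading code has to know the layout in advance.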

But what you'll probably use most are the `TextReader` and `TextWriter` classes. These are abstract classes that simplify reading and writing of text strings. Their implementations are `StreamReader` and `StreamWriter` (for working with streams) and `StringReader` and `StringWriter` (for working with strings the same way).
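
The string-based implementations need no file at all, which makes them handy for a quick experiment. A minimal sketch with made-up sample lines:

```csharp
using System;
using System.IO;

// StringWriter collects everything written to it in memory
var sw = new StringWriter();
sw.WriteLine("alpha");
sw.WriteLine("beta");
var combined = sw.ToString();

// StringReader reads from a string through the same TextReader API
var sr = new StringReader(combined);
var count = 0;
for (var current = sr.ReadLine(); current != null; current = sr.ReadLine()) {
    count++;
    Console.WriteLine($"{count}: {current}");
}
```

Code written against `TextReader` and `TextWriter` can therefore be exercised with strings and later pointed at files without changes.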

> **Note:** There are various other classes called _reader_ in .NET, like `System.Xml.XmlReader` or `System.Text.Json.Utf8JsonReader`. They are not related to what we discuss here.

Let's use these classes to write and read some lines:

In [25]:
var w = File.CreateText("reader-writer.txt");
for(var i = 1; i < 10; i++) {
    w.WriteLine($"This is line number {i}.");
}
w.Close();

var r = File.OpenText("reader-writer.txt");
var lineNumber = 0;
while(!r.EndOfStream) {
    var line = r.ReadLine();
    lineNumber++;
    Console.WriteLine($"{lineNumber,3}: {line}");
}
r.Close();

  1: This is line number 1.
  2: This is line number 2.
  3: This is line number 3.
  4: This is line number 4.
  5: This is line number 5.
  6: This is line number 6.
  7: This is line number 7.
  8: This is line number 8.
  9: This is line number 9.


> **Note:** With regard to `CRLF` and `LF`, `WriteLine` and `ReadLine` behave exactly the same way as `WriteAllLines` and `ReadAllLines` do. That is only logical, because the latter use the former.

We can read the data character by character using the `Read` method, which returns the current character (as `int`) and moves the cursor to the next one. The `Peek` method returns the next character (as `int`) or `-1` if no character is available, without moving the cursor.

In [26]:
var r = File.OpenText("reader-writer.txt");
while(r.Peek() != -1) {
    var c = (char)r.Read();
    Console.Write(c);
}
r.Close();

This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 7.
This is line number 8.
This is line number 9.


### Methods of the File class


The `System.IO.File` class has a bunch of methods to work with file data:

* **Binary access** (uses `FileStream`)
  * `Open` - the most universal method; it can open a file in any mode, depending on the `FileMode` value
  * `Create` - create the file, overwrite it if it exists
  * `OpenRead` - open for reading, fail if the file does not exist
  * `OpenWrite` - open for writing, create the file if it does not exist
* **Text access** (uses `StreamReader` or `StreamWriter`)
  * `CreateText` - create the file, overwrite it if it exists
  * `OpenText` - open for reading, fail if the file does not exist
  * `AppendText` - open for appending, create the file if it does not exist
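
Several of the methods listed above can be combined as in the following minimal sketch. The file name `sketch-log.txt` is hypothetical and used only for this illustration:

```csharp
using System;
using System.IO;

// Hypothetical file name used only for this illustration
var logPath = "sketch-log.txt";

// CreateText truncates an existing file or creates a new one
using (var writer = File.CreateText(logPath)) {
    writer.WriteLine("first");
}

// AppendText keeps the existing content and writes at the end
using (var writer = File.AppendText(logPath)) {
    writer.WriteLine("second");
}

// File.Open with an explicit FileMode and FileAccess;
// FileMode.Open fails if the file does not exist
using (var stream = File.Open(logPath, FileMode.Open, FileAccess.Read))
using (var reader = new StreamReader(stream)) {
    Console.Write(reader.ReadToEnd());
}

var logLines = File.ReadAllLines(logPath);
```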