# 🧐 Numerical Stability

Machine learning algorithms involve many mathematical operations, such as matrix multiplications, optimizations, and iterative updates. In this context, numerical stability refers to how well an algorithm tolerates floating-point arithmetic errors during calculation. This is crucial because small rounding errors can accumulate over many operations and produce incorrect results.

__Key Concepts__
- _Floating point precision_: Computers represent real numbers in binary (base-2) using a limited number of bits, commonly single precision (32 bits) or double precision (64 bits). Because precision is limited, some real numbers cannot be represented exactly, which leads to small errors in calculations. For example, 1/3 cannot be represented exactly, and the resulting rounding errors accumulate during iterative processes.
- _Overflow_ occurs when a calculation results in a number too large to be represented within the range of the chosen number format (e.g., 32-bit or 64-bit). For instance, multiplying two large numbers might exceed the maximum number that can be stored.
- _Underflow_ happens when a calculation results in a number too close to zero to be represented in the chosen format, so it is rounded to zero. This often happens in computations involving very small probabilities, as in softmax or normalization functions.

In [None]:
// Precision Loss Example
float sum = 0;
for (int i = 0; i < 1000000; i++)
    sum += 1e-6f;  // Small increment
Console.WriteLine($"Precision Loss Example: {sum}");  // Expected to be close to 1, but may be slightly different
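
Overflow and underflow can be reproduced in a few lines. The sketch below is illustrative only: the helper names `NaiveSoftmax` and `StableSoftmax` and the sample values are assumptions, not part of any library. It shows the common "subtract the maximum" trick that keeps softmax away from overflow.

```csharp
using System;
using System.Linq;

// Overflow: the product exceeds float.MaxValue (about 3.4e38) and becomes Infinity.
float big = float.MaxValue;
float overflow = big * 2f;

// Underflow: exp(-200) is far smaller than the smallest positive float, so it rounds to 0.
float underflow = MathF.Exp(-200f);

// Naive softmax: exp(1000) overflows to Infinity, and Infinity / Infinity is NaN.
float[] NaiveSoftmax(float[] values)
{
    var exps = values.Select(v => MathF.Exp(v)).ToArray();
    float total = exps.Sum();
    return exps.Select(e => e / total).ToArray();
}

// Stable softmax: subtracting the maximum keeps every exponent <= 0, so exp(...) stays in (0, 1].
float[] StableSoftmax(float[] values)
{
    float largest = values.Max();
    var exps = values.Select(v => MathF.Exp(v - largest)).ToArray();
    float total = exps.Sum();
    return exps.Select(e => e / total).ToArray();
}

float[] input = { 1000f, 1001f, 1002f };
float[] naive = NaiveSoftmax(input);
float[] stable = StableSoftmax(input);

Console.WriteLine($"Overflow: {overflow}, Underflow: {underflow}");
Console.WriteLine($"Naive softmax:  [{string.Join(", ", naive)}]");
Console.WriteLine($"Stable softmax: [{string.Join(", ", stable)}]");
```

Both versions are mathematically identical; only the stable one keeps intermediate values inside the representable range.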

# 🎁 NumFlat

- https://github.com/sinshu/numflat

The goal of this project is to provide a lightweight package for handling various mathematical and computational tasks, including linear algebra, multivariable analysis, clustering, and signal processing, using only C#

<img src=images/euclidean-manhattan.png width=400>

In [None]:
#r "nuget: NumFlat"

In [None]:
// https://github.com/sinshu/numflat/blob/main/NumFlatTest/DistanceTests.cs

using NumFlat;

Vec<double> x = [1, 2, 3]; // caveat: libraries that ship their own Vector and Tensor types
Vec<double> y = [1, 3, 7]; // often do not benefit from newer runtime optimizations/support

var euclidean = Distance.Euclidean.GetDistance(x, y); // equivalent: (x - y).Norm()
var manhattan = Distance.Manhattan.GetDistance(x, y); // equivalent: (x - y).L1Norm()

Console.WriteLine($"Euclidean: {euclidean}, Manhattan: {manhattan}");
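
To see what these two distances compute, here is a dependency-free sketch using plain arrays and LINQ on the same sample values. The variable names are illustrative and it does not use NumFlat.

```csharp
using System;
using System.Linq;

double[] p = { 1, 2, 3 };
double[] q = { 1, 3, 7 };

// Euclidean (L2): square root of the sum of squared differences.
double euclideanPlain = Math.Sqrt(p.Zip(q, (a, b) => (a - b) * (a - b)).Sum());

// Manhattan (L1): sum of absolute differences.
double manhattanPlain = p.Zip(q, (a, b) => Math.Abs(a - b)).Sum();

Console.WriteLine($"Euclidean: {euclideanPlain}"); // sqrt(0 + 1 + 16) = sqrt(17)
Console.WriteLine($"Manhattan: {manhattanPlain}"); // 0 + 1 + 4 = 5
```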

In [None]:
string[] words = { "clinic", "hospital" };
string userInput = "hopital";

Func<string, int, Vec<double>> vectorize = (word, maxLength) =>
{
    var paddedWord = word.PadRight(maxLength, '\0'); // Pad with null characters
    return new Vec<double>(paddedWord.Select(c => (double)c).ToArray()); // each character's numeric code becomes one component
}; 

int maxLength = Math.Max(userInput.Length, words.Max(w => w.Length)); // Determining the maximum length
var wordVectors = words.Select(word => vectorize(word, maxLength)).ToArray();
var userVector = vectorize(userInput, maxLength);

// Calculate Euclidean distances
var distances = wordVectors.Select(
    (vector, index) => (Word: words[index], Distance: Distance.Euclidean.GetDistance(userVector, vector)))
    .OrderBy(result => result.Distance);

foreach (var distance in distances) // ordered from closest to farthest
    Console.WriteLine($"Word: {distance.Word}, Distance: {distance.Distance}");

In [None]:
static int DamerauLevenshteinDistance(this string s, string t)
{
    var bounds = new { Height = s.Length + 1, Width = t.Length + 1 };

    int[,] matrix = new int[bounds.Height, bounds.Width];

    // Base cases: turning a prefix into the empty string costs its length
    for (int height = 0; height < bounds.Height; height++) matrix[height, 0] = height;
    for (int width = 0; width < bounds.Width; width++) matrix[0, width] = width;

    for (int height = 1; height < bounds.Height; height++)
    {
        for (int width = 1; width < bounds.Width; width++)
        {
            int cost = (s[height - 1] == t[width - 1]) ? 0 : 1;
            int insertion = matrix[height, width - 1] + 1;
            int deletion = matrix[height - 1, width] + 1;
            int substitution = matrix[height - 1, width - 1] + cost;

            int distance = Math.Min(insertion, Math.Min(deletion, substitution));

            if (height > 1 && width > 1 && s[height - 1] == t[width - 2] && s[height - 2] == t[width - 1])
                distance = Math.Min(distance, matrix[height - 2, width - 2] + cost);

            matrix[height, width] = distance;
        }
    }

    return matrix[bounds.Height - 1, bounds.Width - 1];
}

string[] words = { "clinic", "hospital" };
string userInput = "hopital";

var q = from w in words
        select new { word = w, Distance = w.DamerauLevenshteinDistance(userInput) };
foreach(var r in q.OrderBy(w => w.Distance))
    Console.WriteLine($"{r.word} {r.Distance}");

- ChatGPT agrees; you don't need to be a mathematical guru and can easily figure this out with some basic prompts
- https://chatgpt.com/share/678e0c58-c820-800b-aff3-7a2a6ebfd8a2

<img src=images/levenshtein-distance.png>

Vectors can also be created from objects that implement IEnumerable<T>. Since the vector itself is an IEnumerable<T>, it is also possible to call LINQ methods on the vector if needed.

In [None]:
using NumFlat;

// Some enumerable.
var enumerable = Enumerable.Range(0, 10).Select(i => i / 10.0);

// Create a vector from an enumerable.
var vector = enumerable.ToVector();

// Show the vector.
Console.WriteLine(vector);

# 📐 Vectors

<img src=images/whats-vector.png width=900>

- __Binary Vectors__ contain only binary values (0 or 1)
    - often used in feature engineering to represent the presence or absence of a feature
- __One-Hot Vectors__ are binary vectors with a single 1 and the rest 0s 👈
    - used to represent categorical data; this is how we encode categorical variables for machine learning models
    - e.g. a house is Urban or Rural; we can encode this categorical data as either [1, 0] or [0, 1]

- __Count Vector__ represents the frequency of terms in a document or dataset
    - used in text processing (e.g., bag-of-words feature extraction technique)
- __Frequency Vectors__ represent the occurrence or frequency of events or features
    - Used in text analysis and signal processing

- __Unit Vector__ has a magnitude (length) of 1, so each component lies in [-1, 1]; we obtain one by normalizing an input vector (dividing it by its length)
    - used in scenarios where the direction of the vector matters more than its magnitude (e.g. cosine similarity)
    - [0.6, 0.8] is the normalized form of [3, 4]
- __Probability Vector__ contains non-negative values that sum to 1, representing probabilities
    - used in classification tasks (e.g. output of a softmax layer in neural networks)
    - [0.1, 0.7, 0.2] (probabilities of three classes)
- __Normalized Vectors__ are scaled to have a specific norm (e.g. L2 norm of 1)
    - these are used to standardize vector magnitudes for comparison (e.g. cosine similarity)
    - [0.6, 0.8] (L2-normalized form of [3, 4])
        - The L2 norm (Euclidean length) is the straight-line distance from the origin to the point the vector describes
        - sqrt(3^2 + 4^2) = 5
        - [3/5, 4/5] = [0.6, 0.8]
        - sqrt(0.6^2 + 0.8^2) = 1
    - L2 normalization divides a vector by its Euclidean norm so that its L2 norm (Euclidean length) becomes 1. It is commonly used in machine learning, NLP, and deep learning
        - __Prevents dominance__: Helps ensure no single vector dominates due to magnitude
        - __Stabilizes optimization__: Can help gradient-based algorithms converge faster
        - __Improves generalization__: Reduces sensitivity to feature scaling
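
The [3, 4] example can be checked in code. This is a minimal sketch with plain arrays; the helper names `L2Norm`, `Normalize` and `Cosine` are made up for illustration.

```csharp
using System;
using System.Linq;

// Euclidean length of a vector.
double L2Norm(double[] v) => Math.Sqrt(v.Sum(c => c * c));

// Divide every component by the length; a zero vector has no direction, so return it unchanged.
double[] Normalize(double[] v)
{
    double norm = L2Norm(v);
    return norm == 0 ? v : v.Select(c => c / norm).ToArray();
}

// Cosine similarity: dot product divided by the product of the lengths.
double Cosine(double[] u, double[] w) =>
    u.Zip(w, (a, b) => a * b).Sum() / (L2Norm(u) * L2Norm(w));

double[] unit = Normalize(new double[] { 3, 4 });
double sameDirection = Cosine(new double[] { 3, 4 }, new double[] { 6, 8 });
double perpendicular = Cosine(new double[] { 1, 0 }, new double[] { 0, 1 });

Console.WriteLine($"Normalized: [{string.Join(", ", unit)}], length: {L2Norm(unit)}");
Console.WriteLine($"Same direction: {sameDirection}, perpendicular: {perpendicular}");
```

Because cosine similarity divides by both lengths, [3, 4] and [6, 8] score (up to rounding) 1 even though their magnitudes differ.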

- __Embedding Vectors__ are dense, lower-dimensional representations of high-dimensional data (e.g., words, images, or categories) learned by models
    - Often used in natural language processing (e.g., Word2Vec, GloVe) and recommendation systems
    - [0.25, -0.1, 0.7] (a 3D embedding of a word)
    - __Composite Vectors__ are formed by combining multiple vectors (eg concatenation, averaging) 👈
        - these are used in feature engineering and multi-modal learning
        - Concatenating word embeddings to form a sentence embedding

- __Latent Vectors__ are learned representations in a lower-dimensional space that capture underlying patterns in the data
    - these are used in unsupervised learning (eg autoencoders, generative models)
    - eg [0.3, -0.5, 0.2] can be a latent representation of an image
- __Weight Vectors__ represent the parameters of a model (eg, coefficients in linear regression, weights in a neural network)
    - Used in model training and inference
    - [0.5, -0.2, 0.3] (weights of a linear model)
- __Gradient Vector__ contains the partial derivatives of a function with respect to its parameters
    - these are used in optimization algorithms (e.g. gradient descent), which is how neural networks learn
- __Eigenvectors__ are special vectors in linear algebra that stay on the same line when a linear transformation (like a matrix multiplication) is applied to them. They only get scaled by a constant factor called the eigenvalue (a negative eigenvalue flips them)
    - these are used in dimensionality reduction (eg Principal Component Analysis / PCA) and spectral clustering
    - also used in Graph algorithms, Computer Vision and Quantum mechanics
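
The eigenvector property (multiplying by the matrix only rescales the vector) can be observed with power iteration. This sketch is an illustration under assumptions: the 2x2 matrix and the `Multiply` helper are made up for the example, and power iteration is just one simple way to find the dominant eigenvector.

```csharp
using System;
using System.Linq;

// Symmetric sample matrix with eigenvalues 3 (direction [1, 1]) and 1 (direction [1, -1]).
double[,] A = { { 2, 1 }, { 1, 2 } };

double[] Multiply(double[,] m, double[] v)
{
    var product = new double[m.GetLength(0)];
    for (int r = 0; r < m.GetLength(0); r++)
        for (int c = 0; c < m.GetLength(1); c++)
            product[r] += m[r, c] * v[c];
    return product;
}

// Power iteration: repeatedly apply A and renormalize; the vector drifts toward
// the eigenvector with the largest eigenvalue.
double[] vec = { 1, 0 };
for (int step = 0; step < 50; step++)
{
    double[] next = Multiply(A, vec);
    double length = Math.Sqrt(next.Sum(x => x * x));
    vec = next.Select(x => x / length).ToArray();
}

// For a unit vector v, the Rayleigh quotient v . (A v) estimates the eigenvalue.
double eigenvalue = vec.Zip(Multiply(A, vec), (x, y) => x * y).Sum();

Console.WriteLine($"Eigenvector: [{string.Join(", ", vec)}], eigenvalue: {eigenvalue}");
```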

# 📐 Sparse and Dense Vectors

<img src=images/vectors-sparse-dense.webp width=800>

- Sparse Vectors can be generated according to application / business need
- Dense Vectors are generally generated using Neural Networks

In [None]:
record Person(int ID, string Name);
record Movie(int ID, string Name);
record WatchedMovie(int PersonID, int MovieID);

IEnumerable<Person> persons = [
    new Person(1, "Khurram"),
    new Person(2, "Mohammad"),
    new Person(3, "Abdullah")
];
IEnumerable<Movie> movies = [
    new Movie(1, "The Shawshank Redemption"),
    new Movie(2, "Top Gun"),
    new Movie(3, "Man of Steel")
];
IEnumerable<WatchedMovie> watchedMovies = [
    new WatchedMovie(1, 1), new WatchedMovie(1, 3),
    new WatchedMovie(2, 2), new WatchedMovie(2, 3)
];

### Sparse Vector

In [None]:
var sparseVectors = persons.Select(person =>
    new
    {
        PersonID = person.ID,
        SparseVector = movies.Select(movie =>
            watchedMovies.Any(w => w.PersonID == person.ID && w.MovieID == movie.ID) ? 1 : 0
        ).ToList()
    });

foreach (var vector in sparseVectors)
    Console.WriteLine($"PersonID: {vector.PersonID}, Sparse Vector: [{string.Join(", ", vector.SparseVector)}]");

### Dense Vector

To create a "dense vector" representation, we assign a real number to each movie, watched or skipped, rather than a binary value:
- we can map each person's watched movie pattern into a continuous vector space
- we can project the sparse vectors into a continuous space
- we can use a simple mathematical transformation without needing a full neural network

#### Idea 1: One-Hot Vector

- Represent each movie as a unique one-hot vector based on the number of movies
- Map the sparse vector (list of watched movie IDs) to a single dense vector by averaging the one-hot vectors of the watched movies

In [None]:
var oneHotVectors = movies.Select(movie =>
    new 
    { 
        MovieID = movie.ID, 
        OneHotVector = movies.Select(m => m.ID == movie.ID ? 1 : 0).ToList() 
    }).ToDictionary(x => x.MovieID, x => x.OneHotVector);

var denseVectors = persons.Select(person =>
{
    var watched = watchedMovies.Where(w => w.PersonID == person.ID).Select(w => oneHotVectors[w.MovieID]).ToList();

    return new
    {
        PersonID = person.ID,
        DenseVector = watched.Any()
            ? watched.Aggregate(new int[movies.Count()],
                (acc, vector) => acc.Zip(vector,
                    (x, y) => x + y).ToArray()
                ).Select(x => (double)x / watched.Count).ToList()
            : Enumerable.Repeat(0.0, movies.Count()).ToList()
    };
});

foreach (var vector in denseVectors)
    Console.WriteLine($"PersonID: {vector.PersonID}, Dense Vector: [{string.Join(", ", vector.DenseVector)}]");

__What does this mean?__
- Each dimension in the dense vector represents a particular movie
    - The number of dimensions in the dense vector depends on the problem context and the dataset
    - If required, we can reduce the number of dimensions using techniques like __Principal Component Analysis__
- The value in a dimension is the normalized "contribution" of that movie to the person’s watched movies
    - It reflects which movies a person watched together
    - It reflects the relative magnitude of interaction with each movie
    - If someone has watched two out of three movies, say the first and third, we get [0.5, 0, 0.5]
        - This means the person is equally influenced by Movie 1 and Movie 3 in this representation

#### Idea 2: Weighted Movie Embedding

Sounds controversial; but mathematically / statistically it is not

- Assign a unique weight (e.g., a random number) to each movie, which represents its contribution to the vector
- These weights could simulate learned parameters in a neural network
- Aggregate Weights per Person: For each person, build their dense vector by placing each watched movie's weight in that movie's dimension (0.0 for movies they have not watched)

In [None]:
var movieWeights = movies.ToDictionary(
    movie => movie.ID,
    movie => new Random(movie.ID).NextDouble() // Assign random weights based on Movie ID
);

var denseVectors = persons.Select(person =>
    new
    {
        PersonID = person.ID,
        DenseVector = movies.Select(movie =>
            watchedMovies.Any(w => w.PersonID == person.ID && w.MovieID == movie.ID)
                ? movieWeights[movie.ID]
                : 0.0 // If not watched, weight is 0
        ).ToList()
    });

foreach (var vector in denseVectors)
    Console.WriteLine($"PersonID: {vector.PersonID}, Dense Vector: [{string.Join(", ", vector.DenseVector)}]");

- This approach mimics embedding layers in neural networks where categorical features are mapped to a dense vector representation
- This is not truly "learned" as in a neural network but serves as a conceptual visualization of how dense vectors might look
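
A step closer to a real embedding layer is a lookup table that maps each movie to a short vector, followed by averaging (mean pooling) over the movies a person watched. The sketch below is hypothetical: the 2-D embedding values are invented, whereas a trained model would learn them, and `MeanPool` is an illustrative helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical 2-D embedding per movie ID (made-up values, not learned).
var movieEmbeddings = new Dictionary<int, double[]>
{
    [1] = new[] { 0.9, 0.1 },
    [2] = new[] { 0.2, 0.8 },
    [3] = new[] { 0.4, 0.6 },
};

// Movie IDs watched per person, mirroring the sample data used earlier.
var watchedByPerson = new Dictionary<int, int[]>
{
    [1] = new[] { 1, 3 },
    [2] = new[] { 2, 3 },
    [3] = Array.Empty<int>(),
};

// Average the embeddings of the watched movies; nobody watched means a zero vector.
double[] MeanPool(int[] movieIds, int dimensions)
{
    var pooled = new double[dimensions];
    if (movieIds.Length == 0) return pooled;
    foreach (int id in movieIds)
        for (int d = 0; d < dimensions; d++)
            pooled[d] += movieEmbeddings[id][d] / movieIds.Length;
    return pooled;
}

foreach (var (personId, movieIds) in watchedByPerson)
    Console.WriteLine($"PersonID: {personId}, Embedding: [{string.Join(", ", MeanPool(movieIds, 2))}]");
```

Unlike the per-movie weight vector, the result has a fixed, small number of dimensions regardless of how many movies exist.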

### Sparse Representation

Sparse vectors look inefficient on the surface, but libraries (and vector databases) internally optimize them using "sparse representations"

__Sparse Representation__
- Dense storage (every element kept, zeros included): [0, 0, 3, 0, 0, 7, 0]
- Sparse representation: (indices: [2, 5], values: [3, 7])
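
A minimal sketch of that conversion, plus a dot product that only touches the stored entries. The names `indices`, `values` and `SparseDot` are illustrative and do not come from any particular library or vector database.

```csharp
using System;
using System.Linq;

double[] denseStorage = { 0, 0, 3, 0, 0, 7, 0 };

// Keep only the non-zero entries as parallel (index, value) arrays.
int[] indices = Enumerable.Range(0, denseStorage.Length)
    .Where(i => denseStorage[i] != 0)
    .ToArray();
double[] values = indices.Select(i => denseStorage[i]).ToArray();

// The dot product with a dense vector visits only the stored entries.
double SparseDot(int[] idx, double[] vals, double[] other)
{
    double total = 0;
    for (int k = 0; k < idx.Length; k++)
        total += vals[k] * other[idx[k]];
    return total;
}

double[] weights = { 1, 1, 2, 1, 1, 0.5, 1 };
double dot = SparseDot(indices, values, weights); // 3 * 2 + 7 * 0.5

Console.WriteLine($"indices: [{string.Join(", ", indices)}], values: [{string.Join(", ", values)}]");
Console.WriteLine($"dot: {dot}");
```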

# 🔢 Floating-point numeric types

- https://en.wikibooks.org/wiki/A-level_Computing/AQA/Paper_2/Fundamentals_of_data_representation/Floating_point_numbers
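    - Mantissa x Base^Exponent
    - e.g. 6.63 x 10^-34
    - Computers use base 2 for hardware efficiency (shift operations)
        - so 1/3, 2/3, and even 1/10 cannot be represented exactly
            - a fraction such as 1/5 or 1/10 is exact in binary only if its denominator is a power of 2, and 5 is not
    - C#'s __decimal__ uses base 10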
- Floating Point Units
- Recent processors can do multiple floating-point operations per clock cycle per core
    - Recent Intel processors with AVX-512 and FMA can do 32 ops on FP64 and 64 ops on FP32
    - AMD Zen 2/3 can do 16 ops on FP64 and 32 ops on FP32
    - Qualcomm Kryo 4xx and 5xx can do 8 ops on FP64 and 16 ops on FP32
    - Samsung Exynos M3 and M4 can do 3 ops on FP64 and 12 ops on FP32
    - https://en.wikipedia.org/wiki/Floating_point_operations_per_second
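
The base-2 versus base-10 difference is easy to observe. A small sketch comparing __double__ and __decimal__ on the classic 0.1 + 0.2 case (variable names are illustrative):

```csharp
using System;

// 0.1 and 0.2 have no exact base-2 representation, so the sum is slightly off.
double binarySum = 0.1 + 0.2;
Console.WriteLine(binarySum == 0.3);  // False

// decimal stores base-10 digits, so the same sum is exact.
decimal decimalSum = 0.1m + 0.2m;
Console.WriteLine(decimalSum == 0.3m); // True

// Fractions whose denominators are powers of 2 are exact in binary.
double powersOfTwo = 0.5 + 0.25;
Console.WriteLine(powersOfTwo == 0.75); // True
```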

- __FP8__ 1 Byte; no .NET native representation
    - Roughly 1-2 significant decimal digits; used in AI/ML (NVIDIA Tensor Cores)
- __FP16__ 2 Bytes; System.Half
    - No C# keyword for it; 3-5 digits precision; introduced in .NET 5
- __FP32__ 4 Bytes; System.Single
    - __float__ has 6-9 digits precision
- __FP64__ 8 Bytes; System.Double
    - __double__ has 15-17 digits precision
- __FP128__ 16 Bytes; no .NET native representation
    - System.Decimal is a 16-byte / 128-bit floating-point type, but it is base 10 and not compatible with IEEE 754 binary128 (FP128)
    - __decimal__ has 28-29 digits precision; FP128 has about 34; decimal is suited for financial, accounting, and other precise decimal calculations
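
To get a feel for these sizes and precisions, the sketch below stores 1/3 in each binary type and measures the rounding error against the double value. It assumes .NET 5 or later (for System.Half); the variable names are illustrative.

```csharp
using System;
using System.Runtime.CompilerServices;

double third = 1.0 / 3.0;

// Narrowing conversions round to the nearest representable value.
Half asHalf = (Half)third;
float asFloat = (float)third;

double halfError = Math.Abs((double)asHalf - third);
double floatError = Math.Abs((double)asFloat - third);

Console.WriteLine($"Half    ({Unsafe.SizeOf<Half>()} bytes): {asHalf}, error {halfError:E2}");
Console.WriteLine($"float   ({sizeof(float)} bytes): {asFloat}, error {floatError:E2}");
Console.WriteLine($"double  ({sizeof(double)} bytes): {third}");
Console.WriteLine($"decimal ({sizeof(decimal)} bytes): {1m / 3m}");
```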

In [None]:
float f = -5.75f;
int intBits = BitConverter.SingleToInt32Bits(f); // IEEE 754 representation of the float, reinterpreted as an int
// once we have the reinterpreted bits, we can use bit operations to extract sign, exponent and mantissa

Console.WriteLine($"Float: {f}");
Console.WriteLine($"Binary: {Convert.ToString(intBits, 2).PadLeft(32, '0')}");
Console.WriteLine($"Hex: 0x{intBits:X8}");
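
Continuing from the reinterpreted bits, the three fields can be pulled apart with shifts and masks. This is a sketch for normal (non-subnormal, finite) values; the variable names are illustrative.

```csharp
using System;

float number = -5.75f;
int rawBits = BitConverter.SingleToInt32Bits(number);

int signBit = (rawBits >> 31) & 0x1;        // 1 means negative
int exponentField = (rawBits >> 23) & 0xFF; // 8 bits, biased by 127
int fractionField = rawBits & 0x7FFFFF;     // 23 stored fraction bits

// For normal numbers: value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)
double rebuilt = (signBit == 1 ? -1 : 1)
    * (1 + fractionField / (double)(1 << 23))
    * Math.Pow(2, exponentField - 127);

Console.WriteLine($"Sign: {signBit}, Exponent: {exponentField} (unbiased {exponentField - 127}), Fraction: 0x{fractionField:X6}");
Console.WriteLine($"Rebuilt: {rebuilt}");
```

Here -5.75 is -1.4375 x 2^2, so the exponent field is 127 + 2 = 129 and the fraction encodes 0.4375.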

## Floats and Vectors

__Normalizing to [-1, 1]__

- Better Distribution for Activation Functions
    - Neural networks (especially those with sigmoid or tanh activations) tend to train better with inputs in this range, because large inputs saturate these activations and contribute to vanishing gradients
    - The hyperbolic tangent function (tanh) naturally outputs values in (-1, 1), making it a good fit
- Numerical Stability in Floating-Point Arithmetic
    - Floats are spaced more finely around small values (closer to 0)
    - A standard IEEE 754 32-bit float has finer absolute resolution between -1 and 1 than for very large numbers
- Prevention of Overflow and Underflow
    - Many ML/AI algorithms involve exponentiation, which can easily overflow with large numbers
    - Keeping numbers small helps prevent such issues
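
One common way to bring data into [-1, 1] is min-max scaling. The sketch below is one option among several (standardization is another); the helper name `ScaleToUnitRange` is made up for illustration.

```csharp
using System;
using System.Linq;

// Maps the smallest value to -1 and the largest to 1: x' = 2 * (x - min) / (max - min) - 1
double[] ScaleToUnitRange(double[] data)
{
    double min = data.Min(), max = data.Max();
    // All values identical: there is no spread to scale, so map everything to 0.
    if (max == min) return data.Select(_ => 0.0).ToArray();
    return data.Select(v => 2 * (v - min) / (max - min) - 1).ToArray();
}

double[] scaled = ScaleToUnitRange(new double[] { 0, 5, 10 });
Console.WriteLine($"[{string.Join(", ", scaled)}]"); // -1, 0, 1
```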

__Float Precision in the [-1, 1] Range__

Floating-point numbers (IEEE 754) allocate bits as:
- 1 bit for sign
- 8 bits for exponent
- 23 bits for the fraction (mantissa)

For floats in [-1, 1]:
- Every float has the same 23 fraction bits; what changes with magnitude is the gap between adjacent values, which shrinks as the exponent gets smaller
- Absolute precision is therefore highest when numbers are close to zero
- A 32-bit float carries approximately 7 significant decimal digits, which is generally enough for ML computations
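
The changing gap between adjacent floats can be measured directly with `MathF.BitIncrement`, which returns the next representable float. The `Spacing` helper is an illustrative name.

```csharp
using System;

// Distance from x to the next larger representable float.
float Spacing(float x) => MathF.BitIncrement(x) - x;

Console.WriteLine($"Near 0.001:     {Spacing(0.001f)}");
Console.WriteLine($"Near 1:         {Spacing(1f)}");         // 2^-23
Console.WriteLine($"Near 1,000,000: {Spacing(1_000_000f)}"); // 2^-4 = 0.0625
```

Around one million, a float cannot distinguish values closer than 0.0625, while near 1 the gap is about 1.2e-7.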

## FP8 Implementation

FP8 FORMATS FOR DEEP LEARNING
- https://arxiv.org/pdf/2209.05433

In [None]:
using System;

public readonly struct FP8 : IEquatable<FP8>
{
    readonly byte value;

    public FP8(byte value) => this.value = value;

    // Convert from float (FP32) to FP8
    public static explicit operator FP8(float f)
    {
        if (float.IsNaN(f)) return new FP8(0x7F); // NaN
        if (float.IsPositiveInfinity(f)) return new FP8(0x7C); // +Inf
        if (float.IsNegativeInfinity(f)) return new FP8(0xFC); // -Inf
        if (f == 0f) return new FP8(0x00); // Zero

        int sign = f < 0 ? 1 : 0;
        f = Math.Abs(f);

        /*
         * E5M2 5 bits for exponent and 2 bits for mantissa
         * E4M3 will have more precision but lesser range
         */

        int bias = 15; // E5M2 bias
        int maxExp = 31; // Max exponent for E5M2
        int mantissaBits = 2; // Mantissa size for E5M2

        int floatBits = BitConverter.SingleToInt32Bits(f);
        int floatExp = (floatBits >> 23) & 0xFF; // Extracting the exponent (bits 23-30)
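        int floatMantissa = floatBits & 0x7FFFFF; // Lower 23 bits hold the mantissa (fraction)

        if (floatExp == 0) return new FP8((byte)(sign << 7)); // FP32 subnormal: flush to signed zero

        int exponent = floatExp - 127 + bias;
        // Exponent field 0 is reserved for FP8 subnormals, which this conversion does not produce
        if (exponent <= 0) return new FP8((byte)(sign << 7)); // Underflow to signed zero
        // Exponent field 31 is reserved for Inf/NaN, so any value reaching it overflows
        if (exponent >= maxExp) return new FP8((byte)((sign << 7) | (maxExp << mantissaBits))); // Overflow to Inf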

        int mantissa = floatMantissa >> (23 - mantissaBits); // Truncate mantissa
        return new FP8((byte)((sign << 7) | (exponent << mantissaBits) | mantissa));
    }

    // Convert from FP8 to float (FP32)
    public static explicit operator float(FP8 fp8)
    {
        int sign = (fp8.value & 0x80) != 0 ? -1 : 1;
        int exponent = (fp8.value >> 2) & 0x1F; // Extract exponent
        int mantissa = fp8.value & 0x03; // Extract mantissa
        int bias = 15; // E5M2 bias

        if (exponent == 0) return sign * (mantissa / 4f) * (float)Math.Pow(2, 1 - bias); // Subnormal: no implicit leading 1
        if (exponent == 31) return mantissa == 0 ? sign * float.PositiveInfinity : float.NaN; // Infinity or NaN

        return sign * (1 + mantissa / 4f) * (float)Math.Pow(2, exponent - bias);
    }

    // Arithmetic operations (via float conversions)
    public static FP8 operator +(FP8 a, FP8 b) => (FP8)((float)a + (float)b);
    public static FP8 operator -(FP8 a, FP8 b) => (FP8)((float)a - (float)b);
    public static FP8 operator *(FP8 a, FP8 b) => (FP8)((float)a * (float)b);
    public static FP8 operator /(FP8 a, FP8 b) => (FP8)((float)a / (float)b);

    public override bool Equals(object obj) => obj is FP8 other && Equals(other);
    public bool Equals(FP8 other) => value == other.value;
    public override int GetHashCode() => value.GetHashCode();
    public override string ToString() => $"{(float)this:F6} (0x{value:X2})";
}

FP8 a = (FP8)1.5f;
FP8 b = (FP8)2.75f;

Console.WriteLine($"a = {a}");
Console.WriteLine($"b = {b}");

FP8 sum = a + b;
Console.WriteLine($"a + b = {sum} (as float: {(float)sum})");

FP8 difference = a - b;
Console.WriteLine($"a - b = {difference} (as float: {(float)difference})");

FP8 product = a * b;
Console.WriteLine($"a * b = {product} (as float: {(float)product})");

FP8 quotient = a / b;
Console.WriteLine($"a / b = {quotient} (as float: {(float)quotient})");
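
The comment in the conversion code notes that E4M3 trades range for precision. The sketch below computes the extremes of both formats from their bit layouts as described in the FP8 paper linked above. The bias values (15 and 7) and the E4M3 convention (no infinities, only the all-ones mantissa at the top exponent is NaN) are taken from that paper; the variable names are illustrative.

```csharp
using System;

// E5M2 follows IEEE conventions: exponent field 31 is reserved for Inf/NaN,
// so the largest normal value uses exponent field 30 and mantissa 1.11b = 1.75.
double e5m2Max = 1.75 * Math.Pow(2, 30 - 15);                    // 57344
double e5m2MinSubnormal = Math.Pow(2, -2) * Math.Pow(2, 1 - 15); // 2^-16
double e5m2StepNearOne = Math.Pow(2, -2);                        // 2 mantissa bits

// E4M3 keeps exponent field 15 for normal values and reserves only mantissa 111b for NaN,
// so the largest normal value uses mantissa 1.110b = 1.75.
double e4m3Max = 1.75 * Math.Pow(2, 15 - 7);                     // 448
double e4m3MinSubnormal = Math.Pow(2, -3) * Math.Pow(2, 1 - 7);  // 2^-9
double e4m3StepNearOne = Math.Pow(2, -3);                        // 3 mantissa bits

Console.WriteLine($"E5M2: max {e5m2Max}, min subnormal {e5m2MinSubnormal}, step near 1: {e5m2StepNearOne}");
Console.WriteLine($"E4M3: max {e4m3Max}, min subnormal {e4m3MinSubnormal}, step near 1: {e4m3StepNearOne}");
```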

In [None]:
using System;
using System.Numerics;
using System.Globalization;

public readonly struct FP8 : INumber<FP8>
{
    const int Bias = 15, MaxExp = 31, MantissaBits = 2;
    readonly byte value;

    public FP8(byte value) => this.value = value;

    public static explicit operator FP8(float f)
        => float.IsNaN(f) ? new FP8(0x7F)
         : float.IsPositiveInfinity(f) ? new FP8(0x7C)
         : float.IsNegativeInfinity(f) ? new FP8(0xFC)
         : f == 0f ? new FP8(0x00)
         : CreateFromFloat(f);

    private static FP8 CreateFromFloat(float f)
    {
        int sign = f < 0 ? 1 : 0;
        f = Math.Abs(f);
        int bits = BitConverter.SingleToInt32Bits(f);
        int exp = (bits >> 23) & 0xFF;
        int man = bits & 0x7FFFFF;
        if (exp == 0) return new FP8((byte)(sign << 7));
        int newExp = exp - 127 + Bias;
        if (newExp <= 0) return new FP8((byte)(sign << 7)); // underflow: FP8 subnormals are not produced
        if (newExp >= MaxExp) return new FP8((byte)((sign << 7) | (MaxExp << MantissaBits))); // field 31 is reserved: overflow to Inf
        int newMan = man >> (23 - MantissaBits);
        return new FP8((byte)((sign << 7) | (newExp << MantissaBits) | newMan));
    }

    public static explicit operator float(FP8 fp8)
    {
        int sign = (fp8.value & 0x80) != 0 ? -1 : 1;
        int exp = (fp8.value >> MantissaBits) & 0x1F;
        int man = fp8.value & 0x03;
        if (exp == 0) return sign * (man / 4f) * (float)Math.Pow(2, 1 - Bias); // subnormal: no implicit leading 1
        if (exp == MaxExp) return man == 0 ? sign * float.PositiveInfinity : float.NaN;
        return sign * (1 + man / 4f) * (float)Math.Pow(2, exp - Bias);
    }

    public static FP8 operator +(FP8 a, FP8 b) => (FP8)((float)a + (float)b);
    public static FP8 operator -(FP8 a, FP8 b) => (FP8)((float)a - (float)b);
    public static FP8 operator *(FP8 a, FP8 b) => (FP8)((float)a * (float)b);
    public static FP8 operator /(FP8 a, FP8 b) => (FP8)((float)a / (float)b);
    public static FP8 operator %(FP8 a, FP8 b) => (FP8)((float)a % (float)b);

    public static FP8 operator +(FP8 value) => value;
    public static FP8 operator -(FP8 value) => (FP8)(-((float)value));
    public static FP8 operator ++(FP8 value) => value + One;
    public static FP8 operator --(FP8 value) => value - One;

    public static bool operator ==(FP8 left, FP8 right) => left.Equals(right);
    public static bool operator !=(FP8 left, FP8 right) => !left.Equals(right);
    public static bool operator <(FP8 left, FP8 right) => ((float)left) < ((float)right);
    public static bool operator <=(FP8 left, FP8 right) => ((float)left) <= ((float)right);
    public static bool operator >(FP8 left, FP8 right) => ((float)left) > ((float)right);
    public static bool operator >=(FP8 left, FP8 right) => ((float)left) >= ((float)right);

    public int CompareTo(FP8 other) => ((float)this).CompareTo((float)other);
    int IComparable.CompareTo(object? obj)
        => obj is FP8 other ? CompareTo(other) : throw new ArgumentException("Not a FP8");

    public bool Equals(FP8 other) => value == other.value;
    public override bool Equals(object? obj) => obj is FP8 other && Equals(other);
    public override int GetHashCode() => value.GetHashCode();

    public override string ToString() => ((float)this).ToString("G", CultureInfo.InvariantCulture);
    public string ToString(string? format, IFormatProvider? provider) => ((float)this).ToString(format, provider);
    public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider)
        => ((float)this).TryFormat(destination, out charsWritten, format, provider);

    public static FP8 Parse(string s, IFormatProvider? provider) => (FP8)float.Parse(s, provider);
    public static FP8 Parse(string s, NumberStyles style, IFormatProvider? provider) => (FP8)float.Parse(s, style, provider);
    public static FP8 Parse(ReadOnlySpan<char> s, IFormatProvider? provider) => (FP8)float.Parse(s.ToString(), provider);
    public static FP8 Parse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider)
        => (FP8)float.Parse(s.ToString(), style, provider);

    public static bool TryParse(string? s, IFormatProvider? provider, out FP8 result)
        => float.TryParse(s, NumberStyles.Float, provider, out float f)
            ? (result = (FP8)f, true).Item2
            : (result = default, false).Item2;
    public static bool TryParse(string? s, NumberStyles style, IFormatProvider? provider, out FP8 result)
        => float.TryParse(s, style, provider, out float f)
            ? (result = (FP8)f, true).Item2
            : (result = default, false).Item2;
    public static bool TryParse(ReadOnlySpan<char> s, IFormatProvider? provider, out FP8 result)
        => float.TryParse(s.ToString(), NumberStyles.Float, provider, out float f)
            ? (result = (FP8)f, true).Item2
            : (result = default, false).Item2;
    public static bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider? provider, out FP8 result)
        => float.TryParse(s.ToString(), style, provider, out float f)
            ? (result = (FP8)f, true).Item2
            : (result = default, false).Item2;

    public static FP8 Abs(FP8 value) => (FP8)Math.Abs((float)value);
    public static bool IsCanonical(FP8 value) => true;
    public static bool IsComplexNumber(FP8 value) => false;
    public static bool IsEvenInteger(FP8 value) => IsInteger(value) && (((long)(float)value) & 1L) == 0;
    public static bool IsFinite(FP8 value) => !IsInfinity(value) && !IsNaN(value);
    public static bool IsImaginaryNumber(FP8 value) => false;
    public static bool IsInfinity(FP8 value)
    {
        int exp = (value.value >> MantissaBits) & 0x1F;
        int man = value.value & 0x03;
        return exp == MaxExp && man == 0;
    }
    public static bool IsInteger(FP8 value) => IsFinite(value) && Math.Floor((float)value) == (float)value;
    public static bool IsNaN(FP8 value)
    {
        int exp = (value.value >> MantissaBits) & 0x1F;
        int man = value.value & 0x03;
        return exp == MaxExp && man != 0;
    }
    public static bool IsNegative(FP8 value) => (value.value & 0x80) != 0;
    public static bool IsNegativeInfinity(FP8 value) => IsInfinity(value) && IsNegative(value);
    public static bool IsNormal(FP8 value) => ((value.value >> MantissaBits) & 0x1F) is not 0 and not MaxExp;
    public static bool IsOddInteger(FP8 value) => IsInteger(value) && (((long)(float)value) & 1L) != 0;
    public static bool IsPositive(FP8 value) => !IsNegative(value); // sign-bit test, so +0 counts as positive
    public static bool IsPositiveInfinity(FP8 value) => IsInfinity(value) && !IsNegative(value);
    public static bool IsRealNumber(FP8 value) => true;
    public static bool IsSubnormal(FP8 value) => ((value.value >> MantissaBits) & 0x1F) == 0 && (value.value & 0x7F) != 0;
    public static bool IsZero(FP8 value) => (value.value & 0x7F) == 0;

    public static FP8 MaxMagnitude(FP8 x, FP8 y) => Math.Abs((float)x) >= Math.Abs((float)y) ? x : y;
    public static FP8 MaxMagnitudeNumber(FP8 x, FP8 y) => MaxMagnitude(x, y);
    public static FP8 MinMagnitude(FP8 x, FP8 y) => Math.Abs((float)x) <= Math.Abs((float)y) ? x : y;
    public static FP8 MinMagnitudeNumber(FP8 x, FP8 y) => MinMagnitude(x, y);

    public static int Radix => 2;
    public static FP8 AdditiveIdentity => Zero;
    public static FP8 MultiplicativeIdentity => One;

    public static bool TryConvertFromChecked<TOther>(TOther value, out FP8 result) where TOther : INumberBase<TOther>
    {
        try { result = (FP8)Convert.ToSingle(value); return true; } catch { result = default; return false; }
    }
    public static bool TryConvertFromSaturating<TOther>(TOther value, out FP8 result) where TOther : INumberBase<TOther>
        => TryConvertFromChecked(value, out result);
    public static bool TryConvertFromTruncating<TOther>(TOther value, out FP8 result) where TOther : INumberBase<TOther>
        => TryConvertFromChecked(value, out result);
    public static bool TryConvertToChecked<TOther>(FP8 value, out TOther result) where TOther : INumberBase<TOther>
    {
        try { result = (TOther)Convert.ChangeType((float)value, typeof(TOther)); return true; }
        catch { result = default!; return false; }
    }
    public static bool TryConvertToSaturating<TOther>(FP8 value, out TOther result) where TOther : INumberBase<TOther>
        => TryConvertToChecked(value, out result);
    public static bool TryConvertToTruncating<TOther>(FP8 value, out TOther result) where TOther : INumberBase<TOther>
        => TryConvertToChecked(value, out result);

    public static FP8 One => (FP8)1f;
    public static FP8 Zero => (FP8)0f;
    public static FP8 NegativeOne => (FP8)(-1f);
}

FP8 a = (FP8)1.5f;
FP8 b = (FP8)2.75f;

Console.WriteLine($"a = {a}");
Console.WriteLine($"b = {b}");

FP8 sum = a + b;
Console.WriteLine($"a + b = {sum} (as float: {(float)sum})");

FP8 difference = a - b;
Console.WriteLine($"a - b = {difference} (as float: {(float)difference})");

FP8 product = a * b;
Console.WriteLine($"a * b = {product} (as float: {(float)product})");

FP8 quotient = a / b;
Console.WriteLine($"a / b = {quotient} (as float: {(float)quotient})");

## FP1.58b Playground

In [None]:
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class TernaryLutAvx
{
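    // Trits are stored as 2-bit codes: 0 => -1, 1 => 0, 2 => +1 (3 is reserved)
    // Sums saturate: -1 + -1 stays -1 and 1 + 1 stays 1
    static readonly byte[] AddLut = new byte[9] {
        0, 0, 1,  // -1 + {-1, 0, 1}
        0, 1, 2,  //  0 + {-1, 0, 1}
        1, 2, 2   //  1 + {-1, 0, 1}
    };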

    public static byte AddPackedByte(byte a, byte b)
    {
        byte result = 0;

        for (int i = 0; i < 4; i++)
        {
            int va = (a >> (i * 2)) & 0b11;
            int vb = (b >> (i * 2)) & 0b11;

            int idx = va * 3 + vb;
            byte sum = AddLut[idx];

            result |= (byte)(sum << (i * 2));
        }

        return result;
    }

    public static void AddPackedTernary(byte[] a, byte[] b, byte[] result)
    {
        if (!Avx2.IsSupported) throw new PlatformNotSupportedException("AVX2 not supported");
        int len = a.Length;
        int i = 0;

        unsafe
        {
            fixed (byte* pa = a, pb = b, pr = result)
            {
                for (; i <= len - 32; i += 32)
                {
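                    // NOTE: these loads are placeholders; the vectors are not used yet, and the
                    // additions below still go one byte at a time through the scalar LUT
                    var va = Avx.LoadVector256(pa + i);
                    var vb = Avx.LoadVector256(pb + i);

                    for (int j = 0; j < 32; j++)
                        pr[i + j] = AddPackedByte(pa[i + j], pb[i + j]);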
                }
            }
        }

        for (; i < len; i++)
            result[i] = AddPackedByte(a[i], b[i]);
    }

    public static int[] Unpack(byte packed)
    {
        int[] result = new int[4];
        for (int i = 0; i < 4; i++)
        {
            int val = (packed >> (i * 2)) & 0b11;
            result[i] = val switch
            {
                0 => -1,
                1 => 0,
                2 => 1,
                _ => 0 // reserved
            };
        }
        return result;
    }

    public static byte Pack(params int[] values)
    {
        byte packed = 0;
        for (int i = 0; i < values.Length && i < 4; i++)
        {
            byte val = values[i] switch
            {
                -1 => 0,
                0 => 1,
                1 => 2,
                _ => throw new ArgumentException()
            };
            packed |= (byte)(val << (i * 2));
        }
        return packed;
    }
}

var a = new byte[]
{
    TernaryLutAvx.Pack(-1, 0, 1, -1),
    TernaryLutAvx.Pack(0, 1, -1, 0)
};
var b = new byte[]
{
    TernaryLutAvx.Pack(1, -1, 0, 1),
    TernaryLutAvx.Pack(-1, 0, 1, -1)
};

var result = new byte[a.Length];
// Perform ternary addition
TernaryLutAvx.AddPackedTernary(a, b, result);

// Display unpacked result
Console.WriteLine("Result:");
foreach (var packed in result)
{
    var unpacked = TernaryLutAvx.Unpack(packed);
    Console.WriteLine(string.Join(", ", unpacked));
}

Result:
0, -1, 1, 0
-1, 1, 0, -1
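
The "1.58 bits" in the heading comes from log2(3): a ternary value carries about 1.585 bits of information. The playground above spends 2 bits per value (4 values per byte). The sketch below shows a denser alternative, base-3 packing of 5 values per byte (3^5 = 243 fits in 256). `PackFive` and `UnpackFive` are hypothetical helpers, not part of the code above.

```csharp
using System;
using System.Linq;

double bitsPerTrit = Math.Log2(3); // about 1.585

// Base-3 packing: treat the 5 trits as digits of a base-3 number (0..242).
byte PackFive(int[] trits)
{
    if (trits.Length != 5) throw new ArgumentException("Expected exactly 5 trits");
    int packedValue = 0;
    for (int i = 4; i >= 0; i--)
    {
        if (trits[i] < -1 || trits[i] > 1) throw new ArgumentException("Trit must be -1, 0 or 1");
        packedValue = packedValue * 3 + (trits[i] + 1); // map {-1, 0, 1} to digits {0, 1, 2}
    }
    return (byte)packedValue;
}

// Reverse the process: peel off base-3 digits, lowest first.
int[] UnpackFive(byte packedByte)
{
    var trits = new int[5];
    int rest = packedByte;
    for (int i = 0; i < 5; i++)
    {
        trits[i] = rest % 3 - 1;
        rest /= 3;
    }
    return trits;
}

int[] original = { -1, 0, 1, 1, -1 };
byte packedFive = PackFive(original);
int[] roundTrip = UnpackFive(packedFive);

Console.WriteLine($"Bits per trit: {bitsPerTrit:F3}; 2-bit packing uses 2.0, base-3 packing uses {8.0 / 5}");
Console.WriteLine($"Packed: {packedFive}, unpacked: {string.Join(", ", roundTrip)}");
```

The trade-off is that base-3 packing needs division and modulo to unpack, while the 2-bit layout only needs shifts and masks.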


# 📚 Resources

- https://qdrant.tech/articles/sparse-vectors
- https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp

## ⚙️ Tools

- https://github.com/PowerShell/PowerShell
- https://github.com/microsoft/terminal
- https://dotnet.microsoft.com
    - https://github.com/dotnet
- https://code.visualstudio.com/
    - https://github.com/microsoft/vscode
    - https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode
    - https://github.com/dotnet/interactive