Merge pull request #68 from milankl/mk/conversion
Match SoftPosit with 2022 posit standard
milankl committed Jun 10, 2022
2 parents aa307d5 + add266c commit d55118a
Showing 40 changed files with 1,129 additions and 1,086 deletions.
6 changes: 3 additions & 3 deletions Project.toml
@@ -1,13 +1,13 @@
name = "SoftPosit"
uuid = "0775deef-a35f-56d7-82da-cfc52f91364d"
version = "0.4.0"
version = "0.5.0"

[deps]
SoftPosit_jll = "f9aa12f2-fb2a-5e38-99be-91dba0a1f972"

[compat]
SoftPosit_jll = "0.4.1"
julia = "1.6"
SoftPosit_jll = "0.4.1, 0.4.2"
julia = "1.6, 1.7"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
35 changes: 25 additions & 10 deletions README.md
@@ -4,9 +4,14 @@

[Julia](https://julialang.org/) types for the C-based [SoftPosit](https://gitlab.com/cerlane/SoftPosit) library - a posit arithmetic emulator.

Posit numbers are an alternative to floating-point numbers. Posits extend floats by introducing regime bits, that allow for a higher precision around one, yet a wide dynamic range of representable numbers. For further information see https://posithub.org
Posit numbers are an alternative to floating-point numbers. Posits extend floats by introducing regime bits
that allow for a higher precision around one, yet a wide dynamic range of representable numbers.
For further information see [posithub.org](https://posithub.org).

If this library doesn't support a desired functionality or for anything else, please raise an issue.
If this library doesn't support a desired functionality or for anything else, please
[raise an issue](https://github.com/milankl/SoftPosit.jl/issues).

Note: This library does not yet conform to the [2022 Standard for Posit Arithmetic](https://posithub.org/docs/posit_standard-2.pdf).

# Installation

@@ -20,15 +25,23 @@ where `]` opens the package manager. Then simply `using SoftPosit` which enables

# 8, 16 and 32bit posit formats

SoftPosit.jl emulates the following Posit number formats `Posit(n,es)`, with `n` number of bits including `es` exponent bits: Posit(8,0), Posit(16,1), Posit(32,2) as primitive types called
SoftPosit.jl emulates the following Posit number formats `Posit(n,es)`, with `n` number of bits including `es` exponent bits:
Posit(8,0), Posit(16,1), Posit(32,2) as primitive types called

Posit8, Posit16, Posit32

following the [draft standard](https://posithub.org/docs/posit_standard.pdf). Additionally, the following off-standard formats are defined as primitive types, which are internally stored as 32bit (the remaining bits are kept as zeros): Posit(8,1), Posit(8,2), Posit(16,1), Posit(16,2), Posit(24,1), and Posit(24,2) called `Posit8_1`, `Posit8_2`, `Posit16_1`, `Posit16_2`, `Posit24_1`, and `Posit24_2`.
following the [draft standard](https://posithub.org/docs/posit_standard.pdf) although this will be changed to always 2 exponent
bits to match the [2022 standard](https://posithub.org/docs/posit_standard-2.pdf). Additionally, the following formats are defined
as primitive types, which are internally stored as 32bit (the remaining bits are kept as zeros): Posit(8,1), Posit(8,2), Posit(16,1),
Posit(16,2), Posit(24,1), and Posit(24,2) called `Posit8_1`, `Posit8_2`, `Posit16_1`, `Posit16_2`, `Posit24_1`, and `Posit24_2`.

For all the types `Posit8, Posit16, Posit32, Posit8_2, Posit16_2, Posit24_2` conversions between Integers and Floats and basic arithmetic operations `+`, `-`, `*`, `/` and `sqrt` (among others) are defined. Unfortunately, `Posit8_1, Posit16_1, Posit24_1` are not yet fully supported by the underlying C library.
For all the types `Posit8, Posit16, Posit32, Posit8_2, Posit16_2, Posit24_2` conversions between Integers and Floats and
basic arithmetic operations `+`, `-`, `*`, `/` and `sqrt` (among others) are defined.
Unfortunately, `Posit8_1, Posit16_1, Posit24_1` are not fully supported by the underlying C library.

To support quires, `Quire8`, `Quire16` and `Quire32` are implemented as 32 / 128 / 512bit types for fused multiply-add and fused multiply-subtract. Additional math functions like `exp`,`log`,`sin`,`cos`,`tan` are defined via conversion to `Float64` (no support yet of the C library) and therefore do not have error-free rounding.
To support quires, `Quire8`, `Quire16` and `Quire32` are implemented as 32 / 128 / 512bit types for fused multiply-add and
fused multiply-subtract. Additional math functions like `exp`,`log`,`sin`,`cos`,`tan` are defined via conversion to `Float64`
(no support yet of the C library) and therefore do not have error-free rounding.

# Examples

@@ -54,10 +67,12 @@ please read [softposit_examples.ipynb](https://github.com/milankl/SoftPosit.jl/b

# Rounding mode

Following the posit standard, posits should underflow between [-minpos/2,0) and (0,minpos/2] and
never overflow in (-∞,-maxpos] and [maxpos,∞). Float ±infinity and Not-a-Number are mapped to posit
Not-a-Real NaR, and NaR is mapped back to NaN. In the current version v0.4 the implementation of this
rounding mode deviates in a few ways
Following the [2022 posit standard](https://posithub.org/docs/posit_standard-2.pdf),
posits should never underflow nor overflow. Consequently, `(0,minpos/2]` and `[-minpos/2,0)`
are mapped to `±minpos`, respectively, and `[maxpos,∞)` and `(-∞,-maxpos]` to `±maxpos`.
Float ±infinity and Not-a-Number are mapped to posit Not-a-Real NaR, and NaR is mapped back to NaN.
In the current version v0.4 the implementation of this rounding mode deviates in a few ways that
will be resolved in upcoming releases.

- **Posit8 and Posit32**: No underflow, all numbers in [-minpos,0) are mapped to -minpos (and (0,minpos] to minpos)
- **Posit16**: Underflow occurs at minpos/4 and overflow occurs at floatmax(Float32)/4
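
The no-underflow, no-overflow rule described above can be illustrated with a minimal sketch on plain `Float64` values. The helper `saturate` is hypothetical and not part of SoftPosit.jl; it only clamps magnitudes into `[minpos, maxpos]` and does not perform the actual rounding to the nearest posit. The values `2.0^-24` and `2.0^24` are assumed here as `minpos` and `maxpos` of an 8-bit posit with 2 exponent bits.

```julia
# Hypothetical helper, not part of SoftPosit.jl.
# Clamps the magnitude of a nonzero finite float into [minpos, maxpos],
# keeping the sign. NaN is returned as a stand-in for NaR.
function saturate(x::Float64, minpos::Float64, maxpos::Float64)
    (isnan(x) || isinf(x)) && return NaN    # ±Inf and NaN map to NaR
    iszero(x) && return 0.0                 # zero stays exactly zero
    return copysign(clamp(abs(x), minpos, maxpos), x)
end

# Assumed limits of an 8-bit posit with 2 exponent bits
minpos8 = 2.0^-24
maxpos8 = 2.0^24

tiny = saturate(1e-30, minpos8, maxpos8)    # saturates at minpos instead of 0
huge = saturate(-1e30, minpos8, maxpos8)    # saturates at -maxpos instead of -Inf
```

Only exact zero maps to zero in this sketch, so a nonzero input never loses its sign through underflow.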
46 changes: 10 additions & 36 deletions src/SoftPosit.jl
@@ -1,43 +1,17 @@
module SoftPosit

import SoftPosit_jll
# For compatibility with previous versions of SoftPosit.jl which used the name
# `SoftPositPath`.
const SoftPositPath = SoftPosit_jll.softposit
# import SoftPosit_jll
# # For compatibility with previous versions of SoftPosit.jl which used the name
# # `SoftPositPath`.
# const SoftPositPath = SoftPosit_jll.softposit

export AbstractPosit, Posit8, Posit16, Posit32,
Posit8_1, Posit16_1, Posit24_1,
Posit8_2, Posit16_2, Posit24_2,
notareal, minusone,
AbstractQuire, Quire8, Quire16, Quire32, fms,
Posit16_old, Float32_old
export AbstractPosit, Posit8, Posit16, Posit32, Posit16_1,
notareal

import Base: Float64, Float32, Float16, Int32, Int64,
UInt8, UInt16, UInt32,
(+), (-), (*), (/), (<), (<=), (==), sqrt,
bitstring, round, one, zero, promote_rule, eps,
floatmin, floatmax, signbit, sign, isfinite,
nextfloat, prevfloat, fma,
exp, exp2, exp10, log, log2, log10, cos, sin, tan,
expm1,log1p

include("typedef.jl")
include("conversionFloatToPosit.jl")
include("conversionPositToFloat.jl")
include("conversionPositToPosit.jl")
include("conversionIntToPosit.jl")
include("conversionPositToInt.jl")
include("conversionHexBinToPosit.jl")
include("conversionBoolToPosit.jl")
include("conversionQuire.jl")
include("arithmetic.jl")
include("comparison.jl")
include("type_definitions.jl")
include("comparisons.jl")
include("constants.jl")
include("round.jl")
include("conversions.jl")
include("arithmetics.jl")
include("print.jl")
include("nextprevfloat.jl")
include("eps.jl")
include("quire.jl")
include("explog_trigonometric.jl")

end
64 changes: 0 additions & 64 deletions src/arithmetic.jl

This file was deleted.

49 changes: 49 additions & 0 deletions src/arithmetics.jl
@@ -0,0 +1,49 @@
# NEGATION via two's complement
Base.:(-)(x::T) where {T<:AbstractPosit} = reinterpret(T,-unsigned(x))

# SIGNBIT from the corresponding Int
Base.signbit(x::AbstractPosit) = signbit(signed(x))

# SIGN redefine, not x/|x| as in Base for floats
function Base.sign(x::T) where {T<:AbstractPosit}
    iszero(x) && return zero(T)
    isnan(x) && return notareal(T)
    return signbit(x) ? minusone(T) : one(T)
end

# TWO ARGUMENT ARITHMETIC, addition, subtraction, multiplication, division via conversion
for op in (:(+), :(-), :(*), :(/))
    @eval begin
        Base.$op(x::T,y::T) where {T<:AbstractPosit} = convert(T,$op(float(x),float(y)))
    end
end

# ONE ARGUMENT ARITHMETIC, sqrt, exp, log, etc. via conversion
for op in (:sqrt, :exp, :exp2, :exp10, :expm1, :log, :log2, :log10, :log1p,
           :sin, :cos, :tan)
    @eval begin
        Base.$op(x::T) where {T<:AbstractPosit} = convert(T,$op(float(x)))
    end
end

Base.sincos(x::AbstractPosit) = sin(x),cos(x) # not in eval loop because of convert

# complex trigonometric functions
for P in (:Posit8, :Posit16, :Posit16_1, :Posit32)
    @eval begin
        sin(x::Complex{$P}) = Complex{$P}(sin(Complex{Base.floattype($P)}(x)))
        cos(x::Complex{$P}) = Complex{$P}(cos(Complex{Base.floattype($P)}(x)))
        exp(x::Complex{$P}) = cos(im*x) - im*sin(im*x)
    end
end

# nextfloat, prevfloat have a wrap-around behaviour nextfloat(maxpos) = NaR, nextfloat(NaR) = -maxpos
Base.nextfloat(x::T) where {T<:AbstractPosit} = reinterpret(T,reinterpret(Base.uinttype(T),x)+one(Base.uinttype(T)))
Base.prevfloat(x::T) where {T<:AbstractPosit} = reinterpret(T,reinterpret(Base.uinttype(T),x)-one(Base.uinttype(T)))

# precision (taken from lookup table)
eps(::Type{Posit8}) = reinterpret(Posit8,0x28)
eps(::Type{Posit16}) = reinterpret(Posit16,0x0a00)
eps(::Type{Posit16_1}) = reinterpret(Posit16_1,0x0100)
eps(::Type{Posit32}) = reinterpret(Posit32,0x00a0_0000)
eps(x::AbstractPosit) = max(x-prevfloat(x),nextfloat(x)-x)
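
The two's complement negation and the wrap-around of `nextfloat` in the file above can be tried out on raw 8-bit patterns without any posit type. The following is a minimal sketch: the names `negate`, `next`, `ZERO`, `NAR`, `MINPOS` and `MAXPOS` are made up for illustration and are not part of SoftPosit.jl, and the bit patterns assume the usual 8-bit posit layout (`0x00` zero, `0x80` NaR, `0x7f` maxpos, `0x01` minpos).

```julia
# Raw UInt8 patterns stand in for 8-bit posits; no SoftPosit types are used.
negate(bits::UInt8) = -bits          # two's complement negation, wraps modulo 2^8
next(bits::UInt8) = bits + 0x01      # neighbouring bit pattern, wraps around

const ZERO   = 0x00
const NAR    = 0x80
const MINPOS = 0x01
const MAXPOS = 0x7f

neg_minpos = negate(MINPOS)          # 0xff, the pattern of -minpos
after_max  = next(MAXPOS)            # 0x80, stepping past maxpos lands on NaR
after_nar  = next(NAR)               # 0x81, which equals negate(MAXPOS)
```

Zero and NaR are the only two patterns that are their own two's complement, which is why negation needs no special cases for them.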
34 changes: 0 additions & 34 deletions src/comparison.jl

This file was deleted.

9 changes: 9 additions & 0 deletions src/comparisons.jl
@@ -0,0 +1,9 @@
# == for posits only true if and only if types and bits match exactly (===, egality)
Base.:(==)(x::AbstractPosit,y::AbstractPosit) = x === y
Base.isnan(x::AbstractPosit) = x == notareal(x) # use isnan for "is NaR?" check
Base.isfinite(x::AbstractPosit) = ~isnan(x) # finite if not NaR
Base.iszero(x::AbstractPosit) = x == zero(x)

# COMPARISONS via two's complement (- of uints)
Base.:(<)(x::T,y::T) where {T<:AbstractPosit} = -unsigned(x) > -unsigned(y)
Base.:(<=)(x::T,y::T) where {T<:AbstractPosit} = -unsigned(x) >= -unsigned(y)
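
Posit bit patterns are ordered like two's complement signed integers, with NaR as the lowest pattern. One way to realise that ordering, shown here as a sketch on raw `UInt8` patterns rather than as the package's implementation, is to reinterpret the bits as a signed integer before comparing. The helper `isless_bits` is a hypothetical name.

```julia
# Hypothetical helper on raw 8-bit patterns, not the SoftPosit.jl API.
# Compares two posit bit patterns by reading them as signed integers.
isless_bits(a::UInt8, b::UInt8) = reinterpret(Int8, a) < reinterpret(Int8, b)

# -maxpos, -minpos, zero, minpos, maxpos in shuffled order
patterns = UInt8[0x01, 0xff, 0x00, 0x7f, 0x81]
sorted = sort(patterns, lt=isless_bits)
```

After sorting, the patterns run from `-maxpos` (`0x81`) through zero to `maxpos` (`0x7f`).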
