Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uint64_t is not an alphabet? #200

Open
14 tasks
marehr opened this issue Aug 25, 2020 · 5 comments
Open
14 tasks

uint64_t is not an alphabet? #200

marehr opened this issue Aug 25, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@marehr
Copy link
Member

marehr commented Aug 25, 2020

Description

#include <seqan3/alphabet/all.hpp>

static_assert(seqan3::alphabet<uint64_t>);

fails to compile.

The code states the following reason:

 * \attention
 * Note that `uint64_t` is absent from the list, because there is no corresponding
 * character type.

Is that really an issue? We have at least the alphabet_variant alphabet that can have a larger rank than a char can represent.

Some problems: size of uint64_t is not representable in the same type => size = 0.

Acceptance Criteria

  • test 1
  • test 2
  • test 3

Tasks

  • task 1
  • task 2
  • task 3

Definition of Done

  • Implementation and design approved
  • Unit tests pass
  • Test coverage = 100%
  • Microbenchmarks added and/or affected microbenchmarks < 5% performance drop
  • API documentation added
  • Tutorial/teaching material added
  • Test suite compiles in less than 30 seconds (on travis)
  • Changelog entry added
@marehr marehr added the bug Something isn't working label Aug 25, 2020
@smehringer
Copy link
Member

Is that really an issue?

What's the result of seqan3::to_char(std::numeric_limits<uint64_t>::max()) gonna be then?

@marehr
Copy link
Member Author

marehr commented Aug 25, 2020

A truncated char. We can also just define it as a semi_alphabet.

@marehr
Copy link
Member Author

marehr commented Apr 12, 2021

Core-Meeting 2021-04-12:

We talked about some solution approaches, like changing the meaning of alphabet_size:

alphabet_size = uint16_t; // 256
rank_type = uint8_t
rank_type max_rank = 255; // numeric_limits::max
for (size_t rank = 0u; rank < alphabet_size; ++rank)

// be-aware: this is an endless-loop; case would be uint8_t for example
for (rank_type rank = 0u; rank <= max_rank; ++rank) // max_rank is constexpr, compiler should warn about endless-loop

for (alphabet && alph: enumerate_alphabet<alphabet_t>()) // 
{}

The compiler does not give any warning. (even though I saw one at some point in time)

Another approach would be to use compiler extensions, like [u]int128 to extend the size. The problem would be again having a generic way to iterate over the alphabet (enumerate_alphabet<alphabet_t>()).

@marehr
Copy link
Member Author

marehr commented May 3, 2021

This would allow utf-32 on 32bit machines. (And in general make our library compile on 32bit machines.)

@marehr
Copy link
Member Author

marehr commented May 18, 2021

Moved to Release 3.2.

@marehr marehr removed this from the 3.0.3 2nd Release Sprint milestone May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants