Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bounded integer parsing #72

Closed
studoot opened this issue May 4, 2022 · 3 comments
Closed

Add bounded integer parsing #72

studoot opened this issue May 4, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@studoot
Copy link

studoot commented May 4, 2022

I'm trying out lexy with a re-implementation of a parser I've already got which, in part, parses register names of the form r<0..31> and f<0..15> - i.e. there are 32 general purpose registers (named r0, r1, ... r31) and 16 floating point registers.

I'd like to verify that the register number is in range when parsed, so I'd like to have something like lexy::code_point, which has traits that check the parsed integer is in range. I've come up with the following generic bounded integer struct (kind of modelled on unbounded).

template <typename T, T Min, T Max>
struct bounded
{
};
template <typename T, T Min, T Max>
struct lexy::integer_traits<bounded<T, Min, Max>>
{
   using type                       = typename integer_traits<T>::type;
   static constexpr auto is_bounded = true;

   template <int Radix>
   static constexpr void add_digit_unchecked(type& result, unsigned digit)
   {
      integer_traits<T>::template add_digit_unchecked<Radix>(result, digit);
   }
   template <int Radix>
   static constexpr std::size_t max_digit_count = lexy::_digit_count(Radix, Max);

   template <int Radix>
   static constexpr bool add_digit_checked(type& result, unsigned digit)
   {
      return integer_traits<T>::add_digit_checked<Radix>(result, digit) && result >= Min && result <= Max;
   }
};

This can be used like this:

namespace grammar
{
   namespace dsl = lexy::dsl;

   template <char Type, uint8_t RegCount>
   struct reg_t
   {
      using reg_num_t = bounded<uint8_t, 0, RegCount-1>;
      static constexpr auto rule = dsl::ascii::case_folding(dsl::lit_c<Type>)
                                 + dsl::integer<reg_num_t>(dsl::digits<dsl::decimal>);
      static constexpr auto value = lexy::forward<std::uint8_t>;
   };
   using r_reg = reg_t<'r', 32>;
   using f_reg = reg_t<'f', 16>;
} // namespace grammar

Is this something that would make a useful addition to lexy? If so, I'm happy enough to create a PR based on this...

@foonathan
Copy link
Owner

You're just needing a lower upper limit, right? If so, I'd be happy with lexy::bounded<T, Max>.

Your proposed bounded<T, Min, Max> has a couple of issues. The first is a bug: add_digit_checked() is only called for the last digit: if the number has fewer than max digits, it won't be called and as such, the Min check never happens. The fix would require an additional validate function that is called by the parser after the entire number has been constructed. The second is philosophical: the only reason dsl::integer does overflow checking is that the digits won't fit into the integer otherwise - there is nothing it can do. In principle, it would be cleaner if lexy doesn't need to produce any overflow errors, those aren't really syntax errors, but more semantic errors that should be handled at a later stage.

As such,I feel that checking whether e.g. a register is out of bounds isn't something that lexy should do, it's something that should be handled later, just like any other semantic errors. I'm already making an exception for overflow in general and lexy::code_point in particular, so checking for a maximal value is probably okay, but I don't like the min check. Does it make sense to you what I'm saying? :D

@studoot
Copy link
Author

studoot commented May 5, 2022

Your proposed bounded<T, Min, Max> has a couple of issues. The first is a bug: add_digit_checked() is only called for the last digit: if the number has fewer than max digits, it won't be called and as such, the Min check never happens.

You're quite right - I hadn't fully thought that part of it through, and as I hadn't done any test cases for the minimum bound checking, I hadn't discovered it that way yet!

As such,I feel that checking whether e.g. a register is out of bounds isn't something that lexy should do, it's something that should be handled later, just like any other semantic errors. I'm already making an exception for overflow in general and lexy::code_point in particular, so checking for a maximal value is probably okay, but I don't like the min check. Does it make sense to you what I'm saying? :D

Yeah, that makes sense - I think my preference for doing some semantic checking within the parser (and potentially getting better locality of where the error occurred within the input text) probably comes from ANTLR's semantic predicates, as ANTLR was the first parser generator I used. And, to be honest, most of the range checking I want to do in a parser tends to be of unsigned integers, so the Min bound is less useful.

I'll put together a PR with a bounded<T, Max> and take it from there.

@foonathan
Copy link
Owner

I'll put together a PR with a bounded<T, Max> and take it from there.

Thanks, please also try and use it to implement the specialization for lexy::code_point.

@foonathan foonathan added the enhancement New feature or request label May 7, 2022
@foonathan foonathan added help wanted Extra attention is needed good first issue Good for newcomers labels Jun 20, 2022
@foonathan foonathan removed help wanted Extra attention is needed good first issue Good for newcomers labels Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants