Add bounded integer parsing #72

studoot · 2022-05-04T16:15:58Z

I'm trying out lexy with a re-implementation of a parser I've already got which, in part, parses register names of the form r<0..31> and f<0..15> - i.e. there are 32 general purpose registers (named r0, r1, ... r31) and 16 floating point registers.

I'd like to verify that the register number is in range when parsed, so I'd like to have something like lexy::code_point, which has traits that check the parsed integer is in range. I've come up with the following generic bounded integer struct (kind of modelled on unbounded).

template <typename T, T Min, T Max>
struct bounded
{
};
template <typename T, T Min, T Max>
struct lexy::integer_traits<bounded<T, Min, Max>>
{
   using type                       = typename integer_traits<T>::type;
   static constexpr auto is_bounded = true;

   template <int Radix>
   static constexpr void add_digit_unchecked(type& result, unsigned digit)
   {
      integer_traits<T>::template add_digit_unchecked<Radix>(result, digit);
   }
   template <int Radix>
   static constexpr std::size_t max_digit_count = lexy::_digit_count(Radix, Max);

   template <int Radix>
   static constexpr bool add_digit_checked(type& result, unsigned digit)
   {
      return integer_traits<T>::add_digit_checked<Radix>(result, digit) && result >= Min && result <= Max;
   }
};

This can be used like this:

namespace grammar
{
   namespace dsl = lexy::dsl;

   template <char Type, uint8_t RegCount>
   struct reg_t
   {
      using reg_num_t = bounded<uint8_t, 0, RegCount-1>;
      static constexpr auto rule = dsl::ascii::case_folding(dsl::lit_c<Type>)
                                 + dsl::integer<reg_num_t>(dsl::digits<dsl::decimal>);
      static constexpr auto value = lexy::forward<std::uint8_t>;
   };
   using r_reg = reg_t<'r', 32>;
   using f_reg = reg_t<'f', 16>;
} // namespace grammar

Is this something that would make a useful addition to lexy? If so, I'm happy enough to create a PR based on this...

The text was updated successfully, but these errors were encountered:

foonathan · 2022-05-04T16:33:17Z

You're just needing a lower upper limit, right? If so, I'd be happy with lexy::bounded<T, Max>.

Your proposed bounded<T, Min, Max> has a couple of issues. The first is a bug: add_digit_checked() is only called for the last digit: if the number has fewer than max digits, it won't be called and as such, the Min check never happens. The fix would require an additional validate function that is called by the parser after the entire number has been constructed. The second is philosophical: the only reason dsl::integer does overflow checking is that the digits won't fit into the integer otherwise - there is nothing it can do. In principle, it would be cleaner if lexy doesn't need to produce any overflow errors, those aren't really syntax errors, but more semantic errors that should be handled at a later stage.

As such,I feel that checking whether e.g. a register is out of bounds isn't something that lexy should do, it's something that should be handled later, just like any other semantic errors. I'm already making an exception for overflow in general and lexy::code_point in particular, so checking for a maximal value is probably okay, but I don't like the min check. Does it make sense to you what I'm saying? :D

studoot · 2022-05-05T12:24:13Z

Your proposed bounded<T, Min, Max> has a couple of issues. The first is a bug: add_digit_checked() is only called for the last digit: if the number has fewer than max digits, it won't be called and as such, the Min check never happens.

You're quite right - I hadn't fully thought that part of it through, and as I hadn't done any test cases for the minimum bound checking, I hadn't discovered it that way yet!

As such,I feel that checking whether e.g. a register is out of bounds isn't something that lexy should do, it's something that should be handled later, just like any other semantic errors. I'm already making an exception for overflow in general and lexy::code_point in particular, so checking for a maximal value is probably okay, but I don't like the min check. Does it make sense to you what I'm saying? :D

Yeah, that makes sense - I think my preference for doing some semantic checking within the parser (and potentially getting better locality of where the error occurred within the input text) probably comes from ANTLR's semantic predicates, as ANTLR was the first parser generator I used. And, to be honest, most of the range checking I want to do in a parser tends to be of unsigned integers, so the Min bound is less useful.

I'll put together a PR with a bounded<T, Max> and take it from there.

foonathan · 2022-05-05T13:05:30Z

I'll put together a PR with a bounded<T, Max> and take it from there.

Thanks, please also try and use it to implement the specialization for lexy::code_point.

foonathan added the enhancement New feature or request label May 7, 2022

foonathan added help wanted Extra attention is needed good first issue Good for newcomers labels Jun 20, 2022

foonathan closed this as completed in 1a99b36 Jul 13, 2022

foonathan removed help wanted Extra attention is needed good first issue Good for newcomers labels Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add bounded integer parsing #72

Add bounded integer parsing #72

studoot commented May 4, 2022

foonathan commented May 4, 2022

studoot commented May 5, 2022 •

edited

foonathan commented May 5, 2022

Add bounded integer parsing #72

Add bounded integer parsing #72

Comments

studoot commented May 4, 2022

foonathan commented May 4, 2022

studoot commented May 5, 2022 • edited

foonathan commented May 5, 2022

studoot commented May 5, 2022 •

edited