Skip to content

Conversation

arjenmarkus
Copy link
Member

The changes I made allow the code to work properly on a system that uses a different encoding than ASCII, for isntance systems that use EBCDIC.
Note: the conversion to lower case and upper case should be regarded with care! I am uncertain that they would work as intended.

Correct the is_lower function - I overlooked that one.
Solve a glitch in the function is_printable
@14NGiestas
Copy link
Member

14NGiestas commented Sep 16, 2020

The to_lower and to_upper functions assumes that the input char. is in ASCII. The EBCDIC support is trickier, since it's not compatible with ASCII table conversions.
EDIT: I just realized... the whole module is called stdlib_ascii it would be weird to assume or support other tables
EDIT2: It seems in the current code, both functions will work with EBCDIC chars reference here

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 17, 2020 via email

@ivan-pi
Copy link
Member

ivan-pi commented Sep 17, 2020

Thank you @arjenmarkus for fixing this.

I was aware of these issues since posting about the character validation functions at the Fortran Discourse (see https://fortran-lang.discourse.group/t/character-validation-functions/131) but hadn't taken the time to fix them. In fact at some point I would like to see all of these routines replaced by static lookup tables (https://github.com/ivan-pi/fortran-ascii/blob/master/fortran_ascii_bit.f90), as this solution appears to be the most performant one.

For the functions to_upper and to_lower, I've seen a dozen of different possibilities on comp.lang.fortran. The solution I used here was adapted from @certik's library: https://github.com/certik/fortran-utils/blob/master/src/utils.f90#L18
Are you worried about non-ascii characters or other encodings?

+1 to merge

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 18, 2020 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 18, 2020 via email

@LKedward
Copy link
Member

LKedward commented Sep 18, 2020

Hi @arjenmarkus, I noticed in the benchmark code you posted that you appear to be testing the same implementation twice as opposed to two different implementations - is this a typo? On my machine I see the lookup implementation as being almost two orders of magnitude slower than the shift implementation.
As an aside, the lookup approach can be implemented much more efficiently by using the character code directly as an index into a static table instead of using index. EDIT: Apologies, I see you've already considered this.

I am aware performance is not the primary concern for this PR or the stdlib reference implementation but I thought I would point it out.

@ivan-pi
Copy link
Member

ivan-pi commented Sep 18, 2020

So, a simple robust and general implementation should be enough, certainly for a first version.

I agree.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 18, 2020 via email

@14NGiestas
Copy link
Member

It makes sense since in the lookup method in the worst case scenario (letter 'z') has to check 26 chars, on the other side the iachar method is constant for all cases. This explains the difference in the execution time.

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me, thank you Arjen. I'm also in favor of removing extra sets of parentheses throughout, but don't feel strongly about it.

arjenmarkus and others added 2 commits September 24, 2020 21:00
Co-authored-by: Ian Giestas Pauli <iangiestaspauli@gmail.com>
Co-authored-by: Ian Giestas Pauli <iangiestaspauli@gmail.com>
@arjenmarkus
Copy link
Member Author

Closing this after committing the suggestions. I will make a few more edits (especially remove the extra parentheses to make the code a bit more consistent) and I will also implement the to_lower and to_upper functions differently.

@milancurcic
Copy link
Member

@arjenmarkus Did you mean to merge it? You only closed the PR without merging it.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 24, 2020 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 24, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Sep 24, 2020

Hm, I see a failure from CI, but I do not see what went wrong - the code I committed works fine on my machine. Or at least I am convinced it does. How can I see the reason? Op do 24 sep. 2020 om 21:28 schreef Arjen Markus

You can go in Actions to find the details. Here is the link

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 24, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Sep 24, 2020 via email

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 24, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Sep 24, 2020

Well, I see that you opened a new PR #235 from the same branch update_ascii, and that the CI for this PR #235 is fine. So, I would say that the problem is solved by himself (at least temporarily) ;)
Furthermore this PR cannot be reopened because #235 exists.

@arjenmarkus
Copy link
Member Author

arjenmarkus commented Sep 24, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Sep 24, 2020

So: once you have made a pull request for a particular branch, you have to create a new branch to continue work. Seems reasonably simple.

I am not sure what you mean and what were your intentions. However, for me, #235 seems to be #233 with some additional commits that were pushed to update_ascii after opening #233.
If #233 was not closed (before pushing the additonal commits to update_ascii), these additional commits would have appear in #233. And #235 would have not been needed (since #233 would include the additional commits).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants