update fortran keywords #3656

Merged
merged 1 commit into from Apr 21, 2024

cx384
Contributor

@cx384 cx384 commented Oct 25, 2023

This fixes #3362.

I added the keywords from Fortran 2018 with the help of a Python script, included below in case this has to be done again in the future.
The current standard and keywords can be found here:
https://j3-fortran.org/doc/year/18/18-007r1.pdf
https://github.com/cdslaborg/FortranKeywords

Script
old_f = open("filetypes.fortran", "r")
old_lines = old_f.read().splitlines()

old_primary = set()
old_intrinsic_functions = set()
old_user_functions = set()
for line in old_lines:
    s = line.split("=")
    if s[0] == "primary":
        for w in s[1].split():
            old_primary.add(w)
    elif s[0] == "intrinsic_functions":
        for w in s[1].split():
            old_intrinsic_functions.add(w)
    elif s[0] == "user_functions":
        for w in s[1].split():
            old_user_functions.add(w)
old_f.close()


f = open("FortranKeywords2018.txt", "r")
lines = f.read().splitlines()

primary = set()
intrinsic_functions = set()
user_functions = set()

types_primary = {"specifier", "statement", "attribute"}
types_intrinsic_functions = {
    "function_elemental",
    "function_transformational",
    "function_inquiry",
    "function_void",
    "subroutine",
    "subroutine_atomic",
    "subroutine_pure",
    "constant",
    "subroutine_collective",
    "type_derived",
    "module_intrinsic",
}
types_user_functions = set()  # empty set; a bare {} would create a dict

old_maping = {
    "primary": set(),
    "intrinsic_functions": set(),
    "user_functions": set(),
    "unspecified": set(),
}

for line in lines:
    s = line.split(",")
    ws = s[0].replace("(", " ").replace(")", " ").replace(".", " ").split()

    # check old mapping
    for w in ws:
        wl = w.lower()
        if wl in old_intrinsic_functions and wl not in old_primary:
            old_maping["intrinsic_functions"].add(s[1])
        elif wl in old_primary and wl not in old_intrinsic_functions:
            old_maping["primary"].add(s[1])
        # elif wl in old_user_functions:
            # old_maping["user_functions"].add(s[1])
        old_maping["unspecified"].add(s[1])

    old_maping["unspecified"] = (
        old_maping["unspecified"]
        - old_maping["primary"]
        - old_maping["intrinsic_functions"]
        - old_maping["user_functions"]
    )

    if s[1] in types_primary:
        for w in ws:
            primary.add(w.lower())
    elif s[1] in types_intrinsic_functions:
        for w in ws:
            intrinsic_functions.add(w.lower())
    elif s[1] in types_user_functions:
        for w in ws:
            user_functions.add(w.lower())
    else:
        print("not added:", s)
        if s[0] in old_primary:
            print("old_primary")
        # report membership in the old intrinsic list, not old_primary again
        if s[0] in old_intrinsic_functions:
            print("old_intrinsic_functions")
f.close()

old_dubs = old_intrinsic_functions.intersection(old_primary)
wrong = (
    primary.intersection(old_intrinsic_functions).union(
        intrinsic_functions.intersection(old_primary)
    )
    - old_dubs
)
print("wrong:", wrong)
# print("old duplicates:", old_dubs)
# dubs = primary.intersection(intrinsic_functions)-old_dubs
# print("dubs:",' '.join(dubs))

print("\nmissing primary:", " ".join(primary - old_primary))
print(
    "\nmissing intrinsic_functions:",
    " ".join(intrinsic_functions - old_intrinsic_functions),
)

print("\napproximate old mapping:", old_maping)

# print("legacy:")
# print("primary:", old_primary-primary)
# print("intrinsic_functions:", old_intrinsic_functions-intrinsic_functions)

Some keywords end up in both primary and intrinsic_functions, but this was already the case before. I mapped the keyword categories (from the standard) to "primary" and "intrinsic_functions" as they were mapped in the past, and assigned new categories such as "module_intrinsic" to the best-fitting list.
As far as I can tell, the result is correct and all keywords are handled. I also didn't remove any existing keywords, to keep compatibility with older Fortran versions.

Category Mapping
primary = {"specifier", "statement", "attribute"}
intrinsic_functions = {
    "function_elemental",
    "function_transformational",
    "function_inquiry",
    "function_void",
    "subroutine",
    "subroutine_atomic",
    "subroutine_pure",
    "constant",
    "subroutine_collective",
    "type_derived",
    "module_intrinsic",
}
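
The overlap between the two lists mentioned above can be listed with a few lines of Python. This is only a minimal sketch, assuming the same `name=word word ...` line format that the script above reads; the helper names `parse_keyword_lists` and `shared_keywords` and the sample input are hypothetical and not part of the PR.

```python
def parse_keyword_lists(text):
    """Parse 'name=word word ...' lines into a dict of name -> set of words."""
    lists = {}
    for line in text.splitlines():
        name, sep, words = line.partition("=")
        if sep:  # skip lines that contain no '='
            lists[name.strip()] = set(words.split())
    return lists


def shared_keywords(lists, a="primary", b="intrinsic_functions"):
    """Return the sorted keywords that appear in both named lists."""
    return sorted(lists.get(a, set()) & lists.get(b, set()))


# Hypothetical sample input, not the real filetypes.fortran content.
sample = "primary=allocate call real\nintrinsic_functions=abs real sin\n"
print(shared_keywords(parse_keyword_lists(sample)))
```

Running it against the real file would mean reading `filetypes.fortran` into `text` first; which list a shared keyword should stay in remains a judgment call.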

@waynelapierre

waynelapierre commented Jan 10, 2024

When will this PR be merged?

@elextr
Member

elextr commented Jan 10, 2024

"Somebody" who knows Fortran should check it, but none of the devs use Fortran AFAIK, so it's waiting for any contributor.


@gnikit gnikit left a comment

Things look fine, albeit a bit hard to read.

@elextr
Member

elextr commented Feb 19, 2024

@gnikit thanks.

I just noticed a comment on the OP that some names are in more than one list.

That won't break anything, but it will affect performance. The lexer has to search all lists for every identifier it finds: all your variable names, function names, etc., as well as keywords, since it doesn't know they are keywords until it finds them in a list. Duplicating names makes the lists bigger, which slows the lexer down when searching them, especially for identifiers that are not in any list (it does a linear search of the names with the same start character in each list).
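
A rough way to picture that cost is the toy model below. It is only a sketch under stated assumptions, not the actual lexer code: each list is bucketed by first character and searched linearly, and the names `build_buckets` and `lookup` and the sample words are made up for illustration.

```python
from collections import defaultdict


def build_buckets(words):
    """Group a keyword list by first character, keeping any duplicates."""
    buckets = defaultdict(list)
    for w in words:
        buckets[w[0]].append(w)
    return buckets


def lookup(keyword_lists, identifier):
    """Search every list; return (matching list name or None, comparisons made)."""
    comparisons = 0
    for name, buckets in keyword_lists.items():
        # Linear scan over the names sharing the identifier's first character.
        for candidate in buckets.get(identifier[0], []):
            comparisons += 1
            if candidate == identifier:
                return name, comparisons
    return None, comparisons


# Hypothetical lists; "real" is deliberately present in both.
lists = {
    "primary": build_buckets(["read", "real", "return"]),
    "intrinsic_functions": build_buckets(["real", "reshape"]),
}
print(lookup(lists, "real"))    # a keyword, found in the first list searched
print(lookup(lists, "result"))  # not a keyword: every 'r' entry in every list is compared
```

In this model an identifier that is not a keyword pays for every entry with the same first character in every list, so each duplicated name adds one comparison to all such misses.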

Maybe somebody might want to "optimise" it; it won't hurt to delay for a bit since there is no release on the horizon.

@gnikit

gnikit commented Feb 19, 2024

@elextr Truth be told, a lot of these intrinsic variables/methods should be conditional on module imports, precisely to relieve pressure on the lexer/parser. In the fortls language server, we start parsing for certain tokens only if the corresponding modules are USEd. Our serialised data looks somewhat different from yours. In the simple grammar rules, I think we don't parse some of these variables/methods at all.
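
As an illustration of the idea only (this is not how fortls is implemented), the sketch below enables a module's names only when the source contains a matching `use` statement. The `MODULE_KEYWORDS` table, the `active_keywords` helper, and the regular expression are assumptions made for this example.

```python
import re

# Hypothetical table: a few names exported by two intrinsic modules.
MODULE_KEYWORDS = {
    "iso_fortran_env": {"int32", "real64"},
    "iso_c_binding": {"c_int", "c_ptr"},
}

# Matches "use mod" and "use, intrinsic :: mod" at the start of a line.
USE_RE = re.compile(
    r"^\s*use\b(?:\s*,\s*(?:non_)?intrinsic)?\s*(?:::)?\s*(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def active_keywords(source):
    """Return the keywords of every known module that the source USEs."""
    active = set()
    for module in USE_RE.findall(source):
        active |= MODULE_KEYWORDS.get(module.lower(), set())
    return active


code = "program demo\n  use, intrinsic :: iso_fortran_env, only: int32\nend program\n"
print(sorted(active_keywords(code)))
```

This needs a pass over the file to know what is imported, which is exactly the kind of semantic information a purely list-based lexer does not have.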

@elextr
Member

elextr commented Feb 19, 2024

> these intrinsic variables/methods should be conditional to module imports,

The code doing the highlighting is called a lexer because it is just that: pure syntax. No semantics are available, so it has no knowledge of what is imported. It's just what's in the lists.

There are some experiments with various LSPs (C/C++, Go, Python, not Fortran; remember what I said above about no devs using it) which replace the lexers for some things, but not for basic syntax. But that's not merged yet, and it doesn't replace the lexers.

@gnikit

gnikit commented Feb 23, 2024

Completely understandable @elextr about the lexer. We have a GSoC project this year that aims to add highlighting support to our language server, so fingers crossed Fortran can soon join the experiment.

@eht16
Member

eht16 commented Apr 21, 2024

Don't we want to merge this anyway? Maybe the performance impact is tolerable and then it is still an improvement for Fortran users.

@elextr
Member

elextr commented Apr 21, 2024

@eht16 sure, doesn't break anything if nobody wants to fix it.

@eht16 eht16 merged commit 2c42615 into geany:master Apr 21, 2024
@b4n b4n added this to the 2.1 milestone Apr 21, 2024

Successfully merging this pull request may close these issues.

Fortran function random_init not highlighted
7 participants