Rewrite Python standard library tags creation script for Python 3 #3039

Merged: eht16 merged 3 commits into geany:master from py3_tags_v2 on May 21, 2023

Conversation

eht16
Member

@eht16 eht16 commented Dec 2, 2021

This is a follow-up to #2630 that fully ports the scripts/create_py_tags.py script, which generates tags for the Python standard library, to Python 3.

While continuing @claudep's work, I noticed that a plain port is harder than more or less rewriting the script. The script now works by fully importing the modules where possible and using Python's inspect.Signature API to extract symbols. Where that is not possible, the existing regular-expression-based parser is used as a fallback.
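A minimal sketch of this approach, not the actual code of scripts/create_py_tags.py: import a module and read signatures via inspect.signature(), or scan the source text with a regular expression when importing is not an option. The helper names (`tags_via_inspect`, `tags_via_regex`) and the regular expression are assumptions made up for illustration.

```python
import importlib
import inspect
import re

# Hypothetical fallback pattern: matches simple "def name(args):" and
# "class Name(bases):" lines; it is deliberately rough.
DEF_RE = re.compile(r"^(def|class)\s+(\w+)\s*(\(.*?\))?\s*:", re.MULTILINE)


def tags_via_inspect(module_name):
    """Import the module and map public functions/classes to signature strings."""
    module = importlib.import_module(module_name)
    tags = {}
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        try:
            tags[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Some builtins and extension types expose no signature.
            tags[name] = ""
    return tags


def tags_via_regex(source):
    """Fallback: extract names and argument lists without importing anything."""
    return {m.group(2): m.group(3) or "" for m in DEF_RE.finditer(source)}
```

For a class, inspect.signature() reports the constructor arguments without `self`, which is where the richer argument lists come from.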

Deprecated modules are ignored completely, as are a couple of special modules such as the bundled IDLE IDE and executable modules in general.
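One way such a check could look, purely as an assumption about the technique and not a description of what the script does: treat a DeprecationWarning raised at import time as a signal to skip the module. The function name `import_unless_deprecated` is hypothetical.

```python
import importlib
import warnings


def import_unless_deprecated(module_name):
    """Return the imported module, or None if it cannot be imported or
    announces its own deprecation while being imported."""
    with warnings.catch_warnings():
        # Turn deprecation warnings emitted during import into exceptions.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            return importlib.import_module(module_name)
        except DeprecationWarning:
            return None
        except ImportError:
            # Missing or platform-specific modules are skipped as well.
            return None
```

A module that is already present in sys.modules is not executed again, so this only catches warnings on the first import in a process.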

I have been using the resulting tags file for a few weeks and it feels fine: much better than before, especially because of the better extracted argument lists of functions and methods.

@eht16 eht16 added this to the 1.39/2.0 milestone Dec 2, 2021
@techee
Member

techee commented Dec 7, 2021

The question now is whether to switch to the ctags file format for the languages that have an identical implementation (and therefore identical kind letters) in both Geany and ctags. I don't know exactly what the script does, but if we switched to the ctags file format, wouldn't it be sufficient to run the ctags binary on the corresponding source directory?

The "proprietary" half-binary format is still useful for our unit tests, since it already contains ctags kinds mapped to our internal representation, so we can verify that this mapping is done correctly. But for the tag files shipped with Geany, I think we are more or less ready to switch to the ctags file format and suggest that users use ctags to generate it.

@elextr
Member

elextr commented Dec 7, 2021

> ctags file format and suggest that users use ctags to generate it.

What about the other languages in c.c?

@techee
Member

techee commented Dec 7, 2021

> What about the other languages in c.c?

Yes, Vala for instance is missing, and those will still have to be generated by Geany if users want them. But all the parsers for the tags under geany/data/tags will be the upstream ones.

@techee
Member

techee commented Dec 12, 2021

To clarify: as outlined in #3049 (comment), I would suggest switching to the ctags file format. That doesn't necessarily mean we have to use ctags to generate such files. We can still use Geany or whatever scripts to write the tag files, just in a different format (I think Geany processes includes and parses the included files automatically, which I think ctags doesn't do, and it might be a shame to lose that functionality). The ctags file format is pretty simple, and generating it ourselves shouldn't be hard.
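To illustrate how simple the format is, here is a rough sketch of writing one tag line in the extended ctags format (name, file, Ex address followed by `;"`, then tab-separated `key:value` fields). The function name `format_ctags_line` and the placeholder file and address values are assumptions for illustration, not what any Geany script emits.

```python
def format_ctags_line(name, kind, signature="", scope="", path="-", address="1"):
    """Build one line of an extended-format ctags file:
    {name}<Tab>{file}<Tab>{address};"<Tab>{field}<Tab>{field}...

    `path` and `address` default to placeholders because a global tags
    file has no meaningful source location. Values containing tabs or
    backslashes would need escaping, which this sketch does not do.
    """
    fields = [f"kind:{kind}"]
    if scope:
        fields.append(f"class:{scope}")
    if signature:
        fields.append(f"signature:{signature}")
    return "\t".join([name, path, f'{address};"'] + fields)


# Pseudo-tag announcing the extended (version 2) format.
HEADER = "!_TAG_FILE_FORMAT\t2\t/extended format/"

print(format_ctags_line("dumps", "f", signature="(obj)"))
```

Tag files are conventionally sorted by name, so a writer would usually collect all lines first and sort them before writing.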

@eht16
Member Author

eht16 commented Oct 27, 2022

I followed your suggestion and changed the output format of the tags file to ctags.
In my tests the tags worked, but I don't know the format that well, so it would be cool if you could take a look at it, @techee.

@techee
Member

techee commented Oct 31, 2022

> I followed your suggestion and changed the output format of the tags file to ctags.

Well, it isn't something I'm one hundred percent sure we should do, but rather something I wanted to discuss. Also, I had something different in mind - to use ctags directly to generate the tag files instead of doing it by ourselves in the script (so there wouldn't be the need for messing with the ctags file format on our side). I haven't checked what exactly the script does and whether something like this would be possible though - what do you think?

Also, if we want to use the ctags format, we should merge #3049, otherwise not all the fields are parsed correctly.

On the pros and cons of using the ctags file format, these are the advantages I can think of:

  • we could use ctags directly to generate tag files as mentioned above
  • currently the tagmanager format doesn't escape characters 200-215 which could break tag file parsing (it is fixable though)
  • ctags file format is "standard" while the tagmanager format is "proprietary" to geany (and also binary which isn't very nice)

On the other hand, the cons of the ctags format are:

  • the tag files are bigger
  • they are slower to parse
  • command line ctags may be less flexible in generating tag files than some specific-purpose script
  • if Geany ctags is out of sync with the ctags command-line that produces tags, we may not be able to read all of the tags

@eht16
Member Author

eht16 commented Nov 13, 2022

> > I followed your suggestion and changed the output format of the tags file to ctags.
>
> Well, it isn't something I'm one hundred percent sure we should do, but rather something I wanted to discuss. Also, I had something different in mind - to use ctags directly to generate the tag files instead of doing it by ourselves in the script (so there wouldn't be the need for messing with the ctags file format on our side). I haven't checked what exactly the script does and whether something like this would be possible though - what do you think?

Regarding whether to create the Python tags with ctags instead of this script:
I gave it a try, and there are a couple of differences and problems with ctags:

  • ctags finds far more tags, many of which we are not interested in for a global tags file, such as private and special methods (_* and __*) and variables. Those could be filtered out afterwards, though.
  • ctags adds the path and the search pattern or line number of the source file, which doesn't make sense for global tags. Those could be filtered out afterwards, though.
  • Classes found by ctags have no signature (the one of the corresponding __init__ method), while the ones from my script do.
  • ctags includes deprecated tags as well, while my script filters them out (even more than the manually defined ones).

Overall, for me, the generated tags of the script look cleaner and more sane than the ctags ones.
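The "filtered out afterwards" step for private and special names could be sketched like this. It is only an illustration of post-processing a ctags file, and the function name `filter_public` is made up.

```python
def filter_public(lines):
    """Keep tag lines whose tag name does not start with an underscore.

    The tag name is the first tab-separated column. Pseudo-tag header
    lines start with "!_TAG_", i.e. with "!", so they are kept as well.
    """
    return [line for line in lines if not line.split("\t", 1)[0].startswith("_")]


sample = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    'dumps\tjson/__init__.py\t183;"\tf',
    '_default_encoder\tjson/__init__.py\t110;"\tv',
    '__init__\tjson/decoder.py\t284;"\tm\tclass:JSONDecoder',
]
for line in filter_public(sample):
    print(line)
```

The sample lines are invented; only the column layout follows the ctags format.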

For reference, the ctags command I tried:

ctags \
	--exclude=encodings \
	--exclude=dist-packages \
	--exclude=distutils \
	--exclude=idlelib \
	--exclude=ensurepip/_bundled \
	--exclude=test \
	--exclude=Tools \
	--exclude=turtledemo \
	--exclude=site-packages \
	--exclude=turtle.py \
	--exclude=asyncio/windows_utils.py \
	--exclude=asyncio/windows_events.py \
	--exclude=antigravity.py \
	--exclude=ctypes/wintypes.py \
	--recurse \
	--languages=Python \
	--excmd=number \
	--totals=extra \
	--fields=+tS /home/enrico/.pyenv/versions/3.10.8/lib/python3.10

While playing with the ctags command, I noticed that I had erroneously set a class's base as its parent, which is wrong in this context, and that methods used the wrong kind. Both are fixed.

> Also, if we want to use the ctags format, we should merge #3049, otherwise not all the fields are parsed correctly.

+1

> On the pros and cons of using the ctags file format, these are the advantages I can think of:
>
> * we could use `ctags` directly to generate tag files as mentioned above
> * currently the tagmanager format doesn't escape characters 200-215 which could break tag file parsing (it is fixable though)
> * `ctags` file format is "standard" while the tagmanager format is "proprietary" to geany (and also binary which isn't very nice)
>
> On the other hand, the cons of the ctags format are:
>
> * the tag files are bigger
> * they are slower to parse
> * command line `ctags` may be less flexible in generating tag files than some specific-purpose script
> * if Geany ctags is out of sync with the `ctags` command-line that produces tags, we may not be able to read all of the tags

I'd prefer the ctags format because, as you say, it's the standard format and probably less error prone than the custom tagmanager format.

@techee
Member

techee commented Nov 14, 2022

> I'd prefer the ctags format because, as you say, it's the standard format and probably less error prone than the custom tagmanager format.

OK, it probably makes sense to use the python script also because of all the additional problems you mentioned.

> Classes found by ctags have no signature (the one of the corresponding __init__ method), while the ones from my script do.

Curious about this one - how does it behave when there are multiple corresponding __init__ functions with a different signature? Will it pick just one of them for calltip? I'm asking because we now have this code

constructor_method = tm_parser_get_constructor_method(tag->lang);

which can look up all __init__ functions for a class and display a multi-calltip (with arrows on the side to scroll among the found calltips) containing all the constructors.

One more thing - wouldn't it make sense to factor-out the tag writing code to a separate file so it can be reused by other tag writing scripts? For instance, there's also create_php_tags.py which I think could reuse this code too. And maybe this tag-writing code could be configurable to either output the ctags format or the tag manager format - I can imagine that having the tagmanager format could be useful for debugging. What do you think?

@eht16
Member Author

eht16 commented Nov 15, 2022

> > Classes found by ctags have no signature (the one of the corresponding __init__ method), while the ones from my script do.
>
> Curious about this one - how does it behave when there are multiple corresponding __init__ functions with a different signature? Will it pick just one of them for calltip? I'm asking because we now have this code
>
> constructor_method = tm_parser_get_constructor_method(tag->lang);
>
> which can look up all __init__ functions for a class and display a multi-calltip (with arrows on the side to scroll among the found calltips) containing all the constructors.

In Python, there is no point in having multiple __init__ methods. While technically possible, it makes no sense because the later definition replaces the earlier one. The script would probably pick one of them; I don't know which, as that is decided by the inspect library.
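A small illustration (the class `Point` is made up): defining __init__ twice in one class body simply rebinds the name, so only the last definition exists by the time inspect looks at the class.

```python
import inspect


class Point:
    def __init__(self, a):
        self.a = a

    # This second definition rebinds the name and replaces the first one.
    def __init__(self, a, b):
        self.a = a
        self.b = b


print(inspect.signature(Point))  # (a, b)
```

Calling `Point(1)` fails with a TypeError, because the one-argument constructor no longer exists.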

> One more thing - wouldn't it make sense to factor-out the tag writing code to a separate file so it can be reused by other tag writing scripts? For instance, there's also create_php_tags.py which I think could reuse this code too. And maybe this tag-writing code could be configurable to either output the ctags format or the tag manager format - I can imagine that having the tagmanager format could be useful for debugging. What do you think?

Sure, we can do that. But IMO both ideas would be better handled in separate PRs so as not to bloat this one even more.

@techee
Member

techee commented Nov 15, 2022

> In Python, there is no point in having multiple __init__ methods. While technically possible, it makes no sense because the later definition replaces the earlier one. The script would probably pick one of them; I don't know which, as that is decided by the inspect library.

Ah, OK, I thought you could have __init__(self, a) and __init__(self, a, b), but after checking, there can only be one __init__() in Python.

> Sure, we can do that. But IMO both ideas would be better handled in separate PRs so as not to bloat this one even more.

Yeah, sure.

@techee
Member

techee commented Nov 16, 2022

> In my tests the tags worked, but I don't know the format that well, so it would be cool if you could take a look at it, @techee.

I just had a look and it looks good to me.

@eht16
Member Author

eht16 commented May 7, 2023

I just cleaned the commit history and would like to merge this in a few days if there are no objections.

Member

@b4n b4n left a comment

Not tested or properly reviewed, but I trust you :)

 # If called without command line arguments, a preset of common Python libs is used.
 #
 # WARNING
-# Be aware that running this script will actually *import* modules in the specified directory
+# Be aware that running this script will actually *import* all modules given on the command line
 # or in the standard library path of your Python installation. Dependent on what Python modules
 # you have installed, this might not be want you want and can have weird side effects.
Member

[…] what* you want […]

not that it changed in this PR though

Member Author

Amazing, someone actually read the docs :). Thank you for spotting, fixed.

-# Parses all files given on command line for Python classes or functions and write
-# them into data/tags/std.py.tags (internal tagmanager format).
+# Parses all files in the directories given on command line for Python classes or functions and
+# write them into data/tags/std.py.tags (internal tagmanager format).
Member

looks like it's not in tagmanager format anymore, is it?

Member Author

Thank you for spotting, fixed.

@eht16 eht16 merged commit d6ce258 into geany:master May 21, 2023
@eht16 eht16 deleted the py3_tags_v2 branch May 21, 2023 17:21

5 participants