Fix capitalization among headings in documentation files #32550

Open
tonywu1999 opened this issue Mar 9, 2020 · 48 comments

Comments

@tonywu1999
Contributor

tonywu1999 commented Mar 9, 2020

In #26933, we made the capitalization of titles consistent. For example, titles used to be capitalized like "This is the Section Title", and many of the titles in the pandas documentation were changed to the correct format, like "This is the section title".

In #31114, we made a script called scripts/validate_rst_title_capitalization.py that extracts all titles in the documentation and checks that only the first letter of each title is uppercase, apart from words defined in a short list of exceptions, like Series, DataFrame, etc. The script also outputs how to fix each title.
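A minimal sketch of the rule described above, assuming titles are split on spaces and compared word by word against an exception set. The function name and the three-word set are illustrative only, not the script's actual implementation or its real exception list.

```python
# Hypothetical, simplified version of the capitalization rule; the real
# script's exception list and tokenization may differ.
EXCEPTIONS = {"Series", "DataFrame", "Python"}


def correct_title_capitalization(title: str, exceptions=EXCEPTIONS) -> str:
    """Return the title with only the first letter (and exception words) uppercase."""
    corrected = []
    for i, word in enumerate(title.split(" ")):
        if word in exceptions:
            # Keep known proper nouns and class names exactly as written
            corrected.append(word)
        elif i == 0:
            # First word: uppercase the first character, lowercase the rest
            corrected.append(word[:1].upper() + word[1:].lower())
        else:
            corrected.append(word.lower())
    return " ".join(corrected)


print(correct_title_capitalization("Version Policy"))  # Version policy
print(correct_title_capitalization("Python Support"))  # Python support
```

Words missing from the exception set (for example IntervalIndex) get lowercased by a rule like this, which is why extending the list comes up later in this thread.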

We want to validate that capitalization stays correct by integrating this script into CI (continuous integration). The idea is that the script runs through ci/code_checks.sh, and when title capitalization errors show up on CI, the contributor fixes those errors in the specified files.

To verify that the script works on your side, the command below validates the doc/source/development/contributing.rst file. It should produce no output, as this file has no capitalization errors:

./scripts/validate_rst_title_capitalization.py doc/source/development/contributing.rst

The command below validates both the doc/source/index.rst and doc/source/development/policies.rst files.

./scripts/validate_rst_title_capitalization.py doc/source/index.rst doc/source/development/policies.rst

This command produces the output below:

doc/source/development/policies.rst:9:Heading capitalization formatted incorrectly. Please correctly capitalize "Version Policy" to "Version policy" 
doc/source/development/policies.rst:51:Heading capitalization formatted incorrectly. Please correctly capitalize "Python Support" to "Python support"

The goal of this issue is to correct the title capitalization in all files of the pandas documentation.
To see every title in the documentation folder that fails validation, run the following command:

./scripts/validate_rst_title_capitalization.py doc/source

This validates all RST files in the doc/source folder. Once all titles pass validation, we would like to add the above command to the ci/code_checks.sh file.

Here's a checklist of all the files that had at least one incorrectly capitalized heading:

- [ ] doc/source/user_guide/timedeltas.rst
- [ ] doc/source/whatsnew/v0.7.0.rst
- [ ] doc/source/whatsnew/v0.23.4.rst
- [ ] doc/source/whatsnew/v0.6.0.rst
- [ ] doc/source/whatsnew/v1.0.2.rst
- [ ] doc/source/whatsnew/v0.18.0.rst
- [ ] doc/source/whatsnew/v0.16.2.rst
- [ ] doc/source/whatsnew/v0.7.1.rst
- [ ] doc/source/whatsnew/v0.8.0.rst
- [ ] doc/source/user_guide/integer_na.rst
- [ ] doc/source/reference/io.rst
- [ ] doc/source/user_guide/computation.rst
- [ ] doc/source/whatsnew/v0.16.0.rst
- [ ] doc/source/whatsnew/v0.23.2.rst
- [ ] doc/source/whatsnew/v0.12.0.rst
- [ ] doc/source/getting_started/10min.rst
- [ ] doc/source/user_guide/advanced.rst
- [ ] doc/source/reference/arrays.rst
- [ ] doc/source/development/maintaining.rst
- [ ] doc/source/user_guide/groupby.rst
- [ ] doc/source/user_guide/cookbook.rst
- [ ] doc/source/development/developer.rst
- [ ] doc/source/development/meeting.rst
- [ ] doc/source/getting_started/intro_tutorials/03_subset_data.rst
- [ ] doc/source/whatsnew/v0.4.x.rst
- [ ] doc/source/whatsnew/v0.16.1.rst
- [ ] doc/source/whatsnew/v1.0.0.rst
- [ ] doc/source/whatsnew/v0.23.1.rst
- [ ] doc/source/getting_started/tutorials.rst
- [ ] doc/source/reference/series.rst
- [ ] doc/source/getting_started/intro_tutorials/02_read_write.rst
- [ ] doc/source/whatsnew/v0.6.1.rst
- [ ] doc/source/whatsnew/v0.13.1.rst
- [ ] doc/source/whatsnew/v0.21.0.rst
- [ ] doc/source/reference/frame.rst
- [ ] doc/source/whatsnew/v0.20.0.rst
- [ ] doc/source/getting_started/intro_tutorials/09_timeseries.rst
- [ ] doc/source/whatsnew/index.rst
- [ ] doc/source/user_guide/merging.rst
- [ ] doc/source/whatsnew/v0.18.1.rst
- [ ] doc/source/user_guide/enhancingperf.rst
- [ ] doc/source/development/contributing_docstring.rst
- [ ] doc/source/whatsnew/v0.9.0.rst
- [ ] doc/source/whatsnew/v0.25.2.rst
- [ ] doc/source/development/extending.rst
- [ ] doc/source/reference/window.rst
- [ ] doc/source/whatsnew/v0.7.3.rst
- [ ] doc/source/user_guide/options.rst
- [ ] doc/source/ecosystem.rst
- [ ] doc/source/getting_started/intro_tutorials/01_table_oriented.rst
- [ ] doc/source/user_guide/categorical.rst
- [ ] doc/source/whatsnew/v0.14.1.rst
- [ ] doc/source/whatsnew/v0.19.0.rst
- [ ] doc/source/whatsnew/v0.20.2.rst
- [ ] doc/source/whatsnew/v0.24.0.rst
- [ ] doc/source/development/roadmap.rst
- [ ] doc/source/whatsnew/v0.17.0.rst
- [ ] doc/source/user_guide/boolean.rst
- [ ] doc/source/getting_started/comparison/comparison_with_r.rst
- [ ] doc/source/whatsnew/v0.17.1.rst
- [ ] doc/source/whatsnew/v0.22.0.rst
- [ ] doc/source/reference/indexing.rst
- [ ] doc/source/user_guide/missing_data.rst
- [ ] doc/source/getting_started/install.rst
- [ ] doc/source/user_guide/index.rst
- [ ] doc/source/user_guide/visualization.rst
- [ ] doc/source/getting_started/comparison/comparison_with_stata.rst
- [ ] doc/source/whatsnew/v0.19.1.rst
- [ ] doc/source/whatsnew/v0.15.1.rst
- [ ] doc/source/whatsnew/v0.10.0.rst
- [ ] doc/source/whatsnew/v0.19.2.rst
- [ ] doc/source/whatsnew/v0.25.3.rst
- [ ] doc/source/user_guide/gotchas.rst
- [ ] doc/source/whatsnew/v0.14.0.rst
- [ ] doc/source/user_guide/reshaping.rst
- [ ] doc/source/reference/groupby.rst
- [ ] doc/source/whatsnew/v0.23.3.rst
- [ ] doc/source/user_guide/timeseries.rst
- [ ] doc/source/whatsnew/v0.9.1.rst
- [ ] doc/source/getting_started/comparison/comparison_with_sql.rst
- [ ] doc/source/whatsnew/v0.24.1.rst
- [ ] doc/source/reference/index.rst
- [ ] doc/source/development/policies.rst
- [ ] doc/source/whatsnew/v0.21.1.rst
- [ ] doc/source/whatsnew/v0.20.3.rst
- [ ] doc/source/development/code_style.rst
- [ ] doc/source/user_guide/sparse.rst
- [ ] doc/source/whatsnew/v0.24.2.rst
- [ ] doc/source/whatsnew/v0.15.2.rst
- [ ] doc/source/whatsnew/v1.1.0.rst
- [ ] doc/source/reference/offset_frequency.rst
- [ ] doc/source/whatsnew/v1.0.1.rst
- [ ] doc/source/getting_started/basics.rst
- [ ] doc/source/whatsnew/v0.5.0.rst
- [ ] doc/source/user_guide/text.rst
- [ ] doc/source/user_guide/indexing.rst
- [ ] doc/source/whatsnew/v0.11.0.rst
- [ ] doc/source/whatsnew/v0.8.1.rst
- [ ] doc/source/getting_started/comparison/comparison_with_sas.rst
- [ ] doc/source/whatsnew/v0.23.0.rst
- [ ] doc/source/user_guide/io.rst
- [ ] doc/source/whatsnew/v0.25.1.rst
- [ ] doc/source/whatsnew/v0.13.0.rst
- [ ] doc/source/whatsnew/v0.25.0.rst
- [ ] doc/source/whatsnew/v0.15.0.rst
- [ ] doc/source/whatsnew/v0.10.1.rst
@datapythonista
Member

@tonywu1999 do you mind editing the description and providing more context? Imagine that a random user who wants to contribute to pandas lands here. We would like to explain what the problem is, why it's useful to fix it, and give step-by-step information on what to do (e.g. we want to add the fixed files to ci/code_checks.sh).

Also, it would help if you could get the list of files to check and add it to the description (you can use - [ ] docs/source/whatever.rst, so we can easily check off the ones that are fixed).

Thanks!

@themien
Contributor

themien commented Mar 9, 2020

take

@themien
Contributor

themien commented Mar 10, 2020

@tonywu1999 while working on the issue, I am getting some output that I am not sure is valid. For example, if I run the script on /doc/source/whatsnew/v0.25.0.rst, I get this as part of the output:
/doc/source/whatsnew/v0.25.0.rst:561:Heading capitalization formatted incorrectly. Please correctly capitalize "Indexing an IntervalIndex with Interval objects" to "Indexing an intervalindex with Interval objects"
or:
/doc/source/whatsnew/v0.25.0.rst:1087:Heading capitalization formatted incorrectly. Please correctly capitalize "-" to ""

Can you confirm that this is an expected output?

@datapythonista
Member

We just developed this validation script, so it's expected that we find some false positives. Can you find where this error is being generated, so we can see what the problem is?

@tonywu1999
Contributor Author

tonywu1999 commented Mar 10, 2020

It looks like those lines in the .rst files are used as bullet points rather than headings. However, those bullet points appear to be empty (i.e. they may have been inserted into the .rst file by accident). You can refer to the following website to see what I mean by empty bullet points:

https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html

Press Ctrl+F and search for

when passing a dict of columns and types the

to find the empty bullet points and to give context on what's going on.

Hope this helps.

@datapythonista
Member

There is a condition in the script that checks that a line contains only dashes (or another specific character), and that the analysed line and the previous line have the same length. I guess we want to add another condition: the length should be greater than one, and the previous line shouldn't consist of one of these specific characters.

Maybe that can be implemented in a separate PR, or together with fixing a single line where this happens.
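The extra conditions proposed above could look roughly like this. This is a sketch under assumptions: titles are detected by comparing each line with the previous one, and the function name is_title_underline and the SYMBOLS set are hypothetical, not taken from the script.

```python
# Hypothetical set of RST underline characters; the real script defines its own.
SYMBOLS = set("=-~^'`\"#*+:.")


def is_title_underline(line: str, previous_line: str, symbols=SYMBOLS) -> bool:
    """Return True if `line` looks like an RST underline for `previous_line`."""
    line_chars = set(line)
    return (
        # Underline consists of a single repeated symbol
        len(line_chars) == 1
        and next(iter(line_chars)) in symbols
        # Underline length matches the title length
        and len(line) == len(previous_line)
        # Proposed: a one-character line (e.g. an empty "-" bullet) is not an underline
        and len(line) > 1
        # Proposed: the "title" must not itself be made only of symbols
        and not set(previous_line) <= symbols
    )


print(is_title_underline("-----", "Title"))  # True
print(is_title_underline("-", "-"))  # False
```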

@themien
Contributor

themien commented Mar 10, 2020

@tonywu1999 @datapythonista I believe a few capitalization exceptions are missing, like IntervalIndex or RangeIndex.

Also in find_titles() this statement is selecting the empty bullet points.

line_chars = set(line)
if (
    len(line_chars) == 1
    and line_chars.pop() in symbols
    and len(line) == len(previous_line)
):
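For illustration, here is a self-contained reproduction of that condition running over a made-up snippet with two empty bullet points in a row. Only the if-condition comes from the excerpt above; the function name find_titles_sketch and the SYMBOLS set are hypothetical stand-ins for the script's own code.

```python
SYMBOLS = set("=-~^'`\"#*+:.")  # hypothetical stand-in for the script's `symbols`


def find_titles_sketch(text: str, symbols=SYMBOLS):
    """Yield (line_number, title) pairs using the condition quoted above."""
    previous_line = ""
    for line_no, line in enumerate(text.splitlines(), start=1):
        line_chars = set(line)
        if (
            len(line_chars) == 1
            and line_chars.pop() in symbols
            and len(line) == len(previous_line)
        ):
            # The previous line is treated as the heading text
            yield line_no - 1, previous_line
        previous_line = line


sample = """Bug fixes
---------

- Fixed a bug when passing a dict of columns and types
-
-
"""
print(list(find_titles_sketch(sample)))  # [(1, 'Bug fixes'), (5, '-')]
```

The second lone "-" has the same length as the first, so the first one is reported as a heading, which would explain the "-" to "" message reported earlier.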

@datapythonista
Member

No problem on changing whatever is needed in the script.

@cleconte987
Contributor

Hello, I'd like to work on it.

@cleconte987
Contributor

Though I don't understand exactly what the goal of this issue is. The script already does a good job of finding all titles that need to be decapitalized; is the goal to actually make the changes to the documentation?

@tonywu1999
Contributor Author

The goal of this issue is to actually make the changes to the documentation.

@cleconte987
Contributor

Yes, but there are words that you don't want to lowercase and that are not in CAPITALIZATION_EXCEPTIONS. What should be done with them? Should the list be extended?

@datapythonista
Member

Yes, the script will validate most cases correctly, but if anything needs to be changed there, like adding new keywords, you can do it.

@datapythonista
Member

Better not to open a huge PR: take a few documents (e.g. five) and just fix those. If you want to fix more (surely appreciated), keep opening PRs; there's no problem in opening many.

Thanks!

@cleconte987
Contributor

I'm not very used to git yet; how do I push to the remote repository? I have pulled the repository on my local machine, modified some files in doc, and committed, but pushing to the GitHub URL doesn't work. What is the URL I should push to?

@datapythonista
Member

@cleconte987 you need to open a pull request. It's a bit tricky the first time, but there are resources out there to help you know how it works. If you don't find anything better, you can see these slides https://docs.google.com/presentation/d/1rOSYXZPyMe9KXnbVK_xbJzw_-ijxd6bIxndmvPU6L2o/edit?usp=sharing and this video (sorry the audio is awful): https://www.youtube.com/watch?v=LCTk0leNH1g

@tonywu1999
Contributor Author

https://dev.pandas.io/docs/development/contributing.html

I started contributing 2 months ago, and I found that this link helped me a lot.

@cleconte987
Contributor

OK, thank you.

@themien
Contributor

themien commented Mar 19, 2020

@cleconte987 I am already on the issue. I'll open a pull request with all the updated documentation soon.

themien added a commit to themien/pandas that referenced this issue Mar 19, 2020
@cleconte987
Contributor

Well, what should I do now, @tonywu1999 @datapythonista? I started committing changes to the documentation. I guess you are the assignee. I'm here if I can help.

@datapythonista
Member

As said earlier, you should be working in small batches, so keep opening small pull requests with the fixes, and we'll merge them. There are many titles to fix; try to coordinate if possible, but more than one person can work on this, no problem.

@cleconte987
Contributor

cleconte987 commented Mar 19, 2020

Also, I don't think it's correct to lowercase words like DataFrame to Dataframe; shouldn't they keep their capitalization?

@SomtochiUmeh
Contributor

take

@SomtochiUmeh
Contributor

Hey,
RadViz should be kept as is, right?

@SomtochiUmeh
Contributor

Also SparseArray and SparseDtype?

@INDIG0N
Contributor

INDIG0N commented Aug 12, 2022

Hey @datapythonista, I was starting to work on this issue and came across a weird scenario, specifically with the stumpy package mentioned in ecosystem.rst.

So, ecosystem.rst refers to the package in all caps, "STUMPY". The script catches this, of course, and says to correct it to "Stumpy".
It looks like the authors of the package refer to it in all caps in their documentation, which matches the capitalization we currently use, but when importing and using the package, it's all lowercase.

In situations like this, should I use the capitalization the script suggests, change it to all lowercase to match how it's imported, add the package name to the list of exceptions in the script itself, or do the last two combined?

@datapythonista
Member

You can add it to the list of exceptions. Or, if you think it's reasonable and not too complicated, just skip that level of header (probably h3) of the ecosystem page, as everything in it should be a package name if I'm not wrong.

@INDIG0N
Contributor

INDIG0N commented Aug 12, 2022

@datapythonista Thanks, I had another question though. It looks like the script is asking me to change the capitalization in one of the URLs.

For reference this is the original url: https://github.com/TDAmeritrade/stumpy

It wants me to make the link all lowercase. The link works fine as it is, but, oddly, the all-lowercase version also seems to work, and I have no idea why. Is there some reason I shouldn't change the capitalization in links, or am I good to go?

@datapythonista
Member

Domain names are not case sensitive, and afaik GitHub resolves repository paths case-insensitively, so making the URL all lowercase shouldn't be a problem when clicking on it (URL paths in general can be case sensitive, though). I guess the capitalization is more for branding, and it'd probably be nice to keep it and not validate links in the titles. If it doesn't introduce much extra complexity to the validation, and you want to give it a try, that would be great.
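One way to keep links out of the validation is to replace each inline hyperlink with its link text before checking the title. A minimal sketch, assuming the ecosystem.rst titles use standard RST inline hyperlink syntax such as `STUMPY <https://github.com/TDAmeritrade/stumpy>`__; the regex and the function name title_text_for_validation are hypothetical, not part of the script.

```python
import re

# Matches an inline RST hyperlink such as `STUMPY <https://...>`__ and
# captures only the link text (hypothetical pattern, not from the script).
RST_LINK = re.compile(r"`([^`<]+?)\s*<[^>]+>`_{1,2}")


def title_text_for_validation(title: str) -> str:
    """Replace RST hyperlinks with their link text so URLs are never validated."""
    return RST_LINK.sub(r"\1", title)


print(title_text_for_validation("`STUMPY <https://github.com/TDAmeritrade/stumpy>`__"))
# STUMPY
```

The cleaned text would then go through the usual capitalization check, so a name like STUMPY would still need an exception entry (or the header level skipped, as suggested above), while the URL itself is left untouched.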

@harsimran44

Can I work on this issue?

@suresh33661

take

@harsimran44

Take

@skregas

skregas commented Nov 1, 2023

Hi, can an admin take a look at #55685? Not sure how to make the tests pass. I didn't make any changes to anything that's being tested in the checks.
Thanks

kajor3k added a commit to kajor3k/pandas-kajor3k that referenced this issue Jun 29, 2024
…snew doc files. Sorted exceptions list alphabetically, for better maintainability, proposed name change from CAPITALIZATION_EXCEPTIONS to CAPITALIZATION_EXCLUSIONS. (pandas-dev#32550)
@kajor3k

kajor3k commented Jun 29, 2024

Hey everyone -
I've created my PR. I've seen that this story was considered too broad to squeeze into one PR, so I covered only the 3 most recent files in the whatsnew directory for now (2.2.1 didn't require any changes).

In the original comment in this issue, I saw that the proposed way of running the script was:
./scripts/validate_rst_title_capitalization.py doc/source,
but unfortunately that won't work anymore, as the script expects a list of files; in other words, one needs to pass specific files, e.g.
scripts/validate_rst_title_capitalization.py doc/source/whatsnew/v2.2.1.rst doc/source/whatsnew/v2.2.2.rst
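Until the script accepts a directory again, one workaround is to expand the file list yourself. A minimal sketch, assuming it is run from the repository root and that the script accepts any number of file paths as positional arguments, as in the command above; the helper names build_validation_command and run_validation are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Path of the validator relative to the repository root (from the command above)
SCRIPT = "scripts/validate_rst_title_capitalization.py"


def build_validation_command(doc_root: str) -> List[str]:
    """Return a command that passes every .rst file under doc_root to the script."""
    # rglob walks doc_root and all of its subdirectories
    rst_files = sorted(str(path) for path in Path(doc_root).rglob("*.rst"))
    return [sys.executable, SCRIPT, *rst_files]


def run_validation(doc_root: str = "doc/source") -> int:
    """Run the validator from the repository root and return its exit code."""
    return subprocess.run(build_validation_command(doc_root)).returncode
```

Calling run_validation() from the root of a pandas checkout should print one line per incorrectly capitalized heading, assuming the output format shown at the top of this issue hasn't changed.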

I also tried to reuse exclusions wherever possible, e.g. instead of adding "I/O" to the list, I edited the .rst file to use "IO", since the latter was already on the list.

I also think there's a need for suppressing some of the validations, as exclusions may not be enough. For example, "pandas" is added to the exclusions with underscore; however, it can also be used at the beginning of a title, and then this particular exclusion entry doesn't work as expected.

I'll be happy to pick up other files as well and trigger some discussions, but before I do so, I just wanted to confirm with you whether that's the expected way of working.

Potential future stories:
1. From what I see, this script has never been enabled in code_checks.sh. That's something I could tackle as well. To achieve that, I think a good predecessor story would be to allow the script to run on all files in docs and its subdirectories. I see that the validator is being run on the PR; however, it is not configured in code_checks.sh.
2. I think some more sophisticated logic for exclusions should be introduced as well. Maybe a "rule" approach would be a good choice here? The first example to tackle could be the "pandas" case I've described above.
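One possible shape for the "rule" approach, sketched under assumptions: each rule is a function that looks at a word and its position and either returns the replacement or defers to the next rule. All names here (apply_rules, keep_lowercase_brand, and so on) and the exception set are hypothetical, not the script's code.

```python
from typing import Callable, List, Optional

# A rule inspects (index, word) and returns the word to use, or None to defer.
Rule = Callable[[int, str], Optional[str]]


def keep_lowercase_brand(index: int, word: str) -> Optional[str]:
    # "pandas" is always lowercase, even as the first word of a title
    return "pandas" if word.lower() == "pandas" else None


def keep_exception(index: int, word: str) -> Optional[str]:
    # Proper nouns and class names are kept as written (illustrative list)
    return word if word in {"DataFrame", "Series", "Python"} else None


def default_rule(index: int, word: str) -> Optional[str]:
    # Sentence case: capitalize the first word, lowercase the rest
    return word[:1].upper() + word[1:].lower() if index == 0 else word.lower()


RULES: List[Rule] = [keep_lowercase_brand, keep_exception, default_rule]


def apply_rules(title: str, rules: List[Rule] = RULES) -> str:
    corrected = []
    for index, word in enumerate(title.split(" ")):
        # Use the first rule that returns a replacement for this word
        for rule in rules:
            result = rule(index, word)
            if result is not None:
                corrected.append(result)
                break
    return " ".join(corrected)


print(apply_rules("pandas Ecosystem"))  # pandas ecosystem
```

Because keep_lowercase_brand runs before the sentence-case default, "pandas" stays lowercase even as the first word, which a plain exception list cannot express.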

kajor3k added a commit to kajor3k/pandas-kajor3k that referenced this issue Jun 30, 2024