Fix capitalization among headings in documentation files #32550

tonywu1999 · 2020-03-09T02:53:20Z

In #26933, we made the capitalization of titles consistent. For example, a title used to be capitalized like "This is the Section Title", and many of the titles in the pandas documentation were changed to the correct format, like "This is the section title".

In #31114, we added a script, scripts/validate_rst_title_capitalization.py, that extracts all titles in the documentation and checks that the only capitalized words are the first word of the title and words from a short list of exceptions, such as Series and DataFrame. The script also prints how to fix each incorrect title.
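
The rule described above can be sketched in a few lines of Python. This is only an illustration, not the contents of scripts/validate_rst_title_capitalization.py: the function name `correct_title_capitalization` and the tiny `CAPITALIZATION_EXCEPTIONS` set used here are assumptions made for the example.

```python
import re

# Hypothetical, shortened exception list; the real script keeps a longer one.
CAPITALIZATION_EXCEPTIONS = {"Series", "DataFrame", "Python"}


def correct_title_capitalization(title: str) -> str:
    """Return the title in sentence case, leaving exception words untouched."""
    fixed = []
    for position, word in enumerate(title.split(" ")):
        # Compare without surrounding punctuation so "DataFrame," still matches.
        bare = re.sub(r"^\W+|\W+$", "", word)
        if bare in CAPITALIZATION_EXCEPTIONS:
            fixed.append(word)
        elif position == 0:
            # First word of the title: capitalize only its first letter.
            fixed.append(word[:1].upper() + word[1:].lower())
        else:
            fixed.append(word.lower())
    return " ".join(fixed)


if __name__ == "__main__":
    for title in ("Version Policy", "Merging a DataFrame With a Series"):
        print(f"{title!r} -> {correct_title_capitalization(title)!r}")
```

A title is reported as incorrect whenever the returned string differs from the original one.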

We want to validate that capitalization is correct by integrating this script into CI (continuous integration). The idea is to run this script through ci/code_checks.sh, so that when title capitalization errors show up on CI, the contributor fixes them in the specified files.

To verify the script works on your side, the command below validates the doc/source/development/contributing.rst file. There should be no output from this command, as this file has no capitalization errors:

./scripts/validate_rst_title_capitalization.py doc/source/development/contributing.rst

The command below validates both the doc/source/index.rst and doc/source/development/policies.rst files:

./scripts/validate_rst_title_capitalization.py doc/source/index.rst doc/source/development/policies.rst

This command produces the output below:

doc/source/development/policies.rst:9:Heading capitalization formatted incorrectly. Please correctly capitalize "Version Policy" to "Version policy" 
doc/source/development/policies.rst:51:Heading capitalization formatted incorrectly. Please correctly capitalize "Python Support" to "Python support"

The goal of this issue is to correct the title capitalization of all files in the pandas documentation.
To see all titles that need to be fixed in the documentation folder, run the following command:

./scripts/validate_rst_title_capitalization.py doc/source

This validates all RST files in the doc/source folder. Once all titles are correctly capitalized, we would like to add the above command to the ci/code_checks.sh file.
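
For a folder argument such as doc/source, the folder has to be expanded into individual .rst files before they can be checked. A minimal sketch of that step is shown here; the helper name `find_rst_files` is hypothetical and is not taken from the script.

```python
from pathlib import Path
from typing import Iterable, Iterator


def find_rst_files(source_paths: Iterable[str]) -> Iterator[str]:
    """Yield every .rst file named directly or found under a given folder."""
    for source in source_paths:
        path = Path(source)
        if path.is_dir():
            # rglob walks the folder and all of its subfolders.
            for rst_file in sorted(path.rglob("*.rst")):
                yield str(rst_file)
        elif path.suffix == ".rst":
            yield str(path)


if __name__ == "__main__":
    for file_name in find_rst_files(["doc/source"]):
        print(file_name)
```

Accepting both folders and single files keeps the two ways of calling the script shown above working with the same code path.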

Here's a checklist of all the files that had at least one incorrectly capitalized heading:

- [ ] doc/source/user_guide/timedeltas.rst
- [ ] doc/source/whatsnew/v0.7.0.rst
- [ ] doc/source/whatsnew/v0.23.4.rst
- [ ] doc/source/whatsnew/v0.6.0.rst
- [ ] doc/source/whatsnew/v1.0.2.rst
- [ ] doc/source/whatsnew/v0.18.0.rst
- [ ] doc/source/whatsnew/v0.16.2.rst
- [ ] doc/source/whatsnew/v0.7.1.rst
- [ ] doc/source/whatsnew/v0.8.0.rst
- [ ] doc/source/user_guide/integer_na.rst
- [ ] doc/source/reference/io.rst
- [ ] doc/source/user_guide/computation.rst
- [ ] doc/source/whatsnew/v0.16.0.rst
- [ ] doc/source/whatsnew/v0.23.2.rst
- [ ] doc/source/whatsnew/v0.12.0.rst
- [ ] doc/source/getting_started/10min.rst
- [ ] doc/source/user_guide/advanced.rst
- [ ] doc/source/reference/arrays.rst
- [ ] doc/source/development/maintaining.rst
- [ ] doc/source/user_guide/groupby.rst
- [ ] doc/source/user_guide/cookbook.rst
- [ ] doc/source/development/developer.rst
- [ ] doc/source/development/meeting.rst
- [ ] doc/source/getting_started/intro_tutorials/03_subset_data.rst
- [ ] doc/source/whatsnew/v0.4.x.rst
- [ ] doc/source/whatsnew/v0.16.1.rst
- [ ] doc/source/whatsnew/v1.0.0.rst
- [ ] doc/source/whatsnew/v0.23.1.rst
- [ ] doc/source/getting_started/tutorials.rst
- [ ] doc/source/reference/series.rst
- [ ] doc/source/getting_started/intro_tutorials/02_read_write.rst
- [ ] doc/source/whatsnew/v0.6.1.rst
- [ ] doc/source/whatsnew/v0.13.1.rst
- [ ] doc/source/whatsnew/v0.21.0.rst
- [ ] doc/source/reference/frame.rst
- [ ] doc/source/whatsnew/v0.20.0.rst
- [ ] doc/source/getting_started/intro_tutorials/09_timeseries.rst
- [ ] doc/source/whatsnew/index.rst
- [ ] doc/source/user_guide/merging.rst
- [ ] doc/source/whatsnew/v0.18.1.rst
- [ ] doc/source/user_guide/enhancingperf.rst
- [ ] doc/source/development/contributing_docstring.rst
- [ ] doc/source/whatsnew/v0.9.0.rst
- [ ] doc/source/whatsnew/v0.25.2.rst
- [ ] doc/source/development/extending.rst
- [ ] doc/source/reference/window.rst
- [ ] doc/source/whatsnew/v0.7.3.rst
- [ ] doc/source/user_guide/options.rst
- [ ] doc/source/ecosystem.rst
- [ ] doc/source/getting_started/intro_tutorials/01_table_oriented.rst
- [ ] doc/source/user_guide/categorical.rst
- [ ] doc/source/whatsnew/v0.14.1.rst
- [ ] doc/source/whatsnew/v0.19.0.rst
- [ ] doc/source/whatsnew/v0.20.2.rst
- [ ] doc/source/whatsnew/v0.24.0.rst
- [ ] doc/source/development/roadmap.rst
- [ ] doc/source/whatsnew/v0.17.0.rst
- [ ] doc/source/user_guide/boolean.rst
- [ ] doc/source/getting_started/comparison/comparison_with_r.rst
- [ ] doc/source/whatsnew/v0.17.1.rst
- [ ] doc/source/whatsnew/v0.22.0.rst
- [ ] doc/source/reference/indexing.rst
- [ ] doc/source/user_guide/missing_data.rst
- [ ] doc/source/getting_started/install.rst
- [ ] doc/source/user_guide/index.rst
- [ ] doc/source/user_guide/visualization.rst
- [ ] doc/source/getting_started/comparison/comparison_with_stata.rst
- [ ] doc/source/whatsnew/v0.19.1.rst
- [ ] doc/source/whatsnew/v0.15.1.rst
- [ ] doc/source/whatsnew/v0.10.0.rst
- [ ] doc/source/whatsnew/v0.19.2.rst
- [ ] doc/source/whatsnew/v0.25.3.rst
- [ ] doc/source/user_guide/gotchas.rst
- [ ] doc/source/whatsnew/v0.14.0.rst
- [ ] doc/source/user_guide/reshaping.rst
- [ ] doc/source/reference/groupby.rst
- [ ] doc/source/whatsnew/v0.23.3.rst
- [ ] doc/source/user_guide/timeseries.rst
- [ ] doc/source/whatsnew/v0.9.1.rst
- [ ] doc/source/getting_started/comparison/comparison_with_sql.rst
- [ ] doc/source/whatsnew/v0.24.1.rst
- [ ] doc/source/reference/index.rst
- [ ] doc/source/development/policies.rst
- [ ] doc/source/whatsnew/v0.21.1.rst
- [ ] doc/source/whatsnew/v0.20.3.rst
- [ ] doc/source/development/code_style.rst
- [ ] doc/source/user_guide/sparse.rst
- [ ] doc/source/whatsnew/v0.24.2.rst
- [ ] doc/source/whatsnew/v0.15.2.rst
- [ ] doc/source/whatsnew/v1.1.0.rst
- [ ] doc/source/reference/offset_frequency.rst
- [ ] doc/source/whatsnew/v1.0.1.rst
- [ ] doc/source/getting_started/basics.rst
- [ ] doc/source/whatsnew/v0.5.0.rst
- [ ] doc/source/user_guide/text.rst
- [ ] doc/source/user_guide/indexing.rst
- [ ] doc/source/whatsnew/v0.11.0.rst
- [ ] doc/source/whatsnew/v0.8.1.rst
- [ ] doc/source/getting_started/comparison/comparison_with_sas.rst
- [ ] doc/source/whatsnew/v0.23.0.rst
- [ ] doc/source/user_guide/io.rst
- [ ] doc/source/whatsnew/v0.25.1.rst
- [ ] doc/source/whatsnew/v0.13.0.rst
- [ ] doc/source/whatsnew/v0.25.0.rst
- [ ] doc/source/whatsnew/v0.15.0.rst
- [ ] doc/source/whatsnew/v0.10.1.rst

datapythonista · 2020-03-09T09:38:52Z

@tonywu1999 do you mind editing the description and providing more context? Imagine a random user wanting to contribute to pandas lands here. We would like to explain what the problem is, why it's useful to fix it, and give step-by-step information on what to do (e.g. we want to add fixed files to ci/code_checks.sh).

Also, it would help if you could get the list of files to check and add it to the description (you can use - [ ] docs/source/whatever.rst, so we can easily check off the ones that are fixed).

Thanks!

themien · 2020-03-09T12:19:02Z

take

themien · 2020-03-10T15:15:34Z

@tonywu1999 while working on the issue I am getting some output that I am not sure is valid. If I run the script on /doc/source/whatsnew/v0.25.0.rst, for example, I get this as part of the output:
/doc/source/whatsnew/v0.25.0.rst:561:Heading capitalization formatted incorrectly. Please correctly capitalize "Indexing an IntervalIndex with Interval objects" to "Indexing an intervalindex with Interval objects"
or:
/doc/source/whatsnew/v0.25.0.rst:1087:Heading capitalization formatted incorrectly. Please correctly capitalize "-" to ""

Can you confirm that this is an expected output?

datapythonista · 2020-03-10T15:38:10Z

We just developed this validation script, so it's expected that we find some false positives. Can you find where this error is being generated, so we can see what's the problem?

tonywu1999 · 2020-03-10T15:40:06Z

It looks like those lines in the .rst files are used as bullet points rather than headings. However, those bullet points appear to be empty (i.e. they may have been inserted into the .rst file by accident). You can refer to the following website to see what I mean by empty bullet points:

https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html

control-f for

when passing a dict of columns and types the

to find the empty bullet points and to give context on what's going on.

Hope this helps.

datapythonista · 2020-03-10T15:47:04Z

There is a condition in the script where we check that a line contains only dashes (or other specific characters), and that the analysed line and the previous line have the same length. I guess we want to add another condition: the length should be greater than one, and the previous line shouldn't consist of one of these specific characters.

Maybe that can be implemented in a separate PR, or together with fixing a single line where this happens.

themien · 2020-03-10T15:51:12Z

@tonywu1999 @datapythonista I believe there are a few capitalization exceptions missing like IntervalIndex or RangeIndex.

Also in find_titles() this statement is selecting the empty bullet points.

line_chars = set(line)
if (
    len(line_chars) == 1
    and line_chars.pop() in symbols
    and len(line) == len(previous_line)
):
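
One way to add the extra conditions suggested earlier in the thread (an underline longer than one character, and a previous line that is not itself a run of underline symbols) is sketched below. The name `find_titles` comes from this thread, but the body, the contents of `symbols`, and the return values are assumptions for illustration rather than the script's actual code.

```python
from typing import Iterable, Iterator, Tuple

# Assumed set of RST underline characters; the real script may use another set.
symbols = ("*", "=", "-", "^", "~", "#", '"')


def find_titles(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (title, line_number) for each RST heading found in lines."""
    previous_line = ""
    for index, raw_line in enumerate(lines):
        line = raw_line.rstrip("\n")
        line_chars = set(line)
        previous_chars = set(previous_line)
        if (
            len(line_chars) == 1
            and line_chars.pop() in symbols
            and len(line) == len(previous_line)
            # Extra condition 1: a lone "-" is an empty bullet, not an underline.
            and len(line) > 1
            # Extra condition 2: the candidate title must not be a symbol run.
            and not (len(previous_chars) == 1 and previous_chars.pop() in symbols)
        ):
            # The title is the line above the underline; index is its
            # 1-based line number because enumerate starts at 0.
            yield previous_line, index
        previous_line = line
```

With these two checks, a pair of consecutive "-" lines (the empty bullet points reported for v0.25.0.rst) is no longer treated as a heading.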

datapythonista · 2020-03-10T15:54:55Z

No problem on changing whatever is needed in the script.

cleconte987 · 2020-03-18T00:37:35Z

Hello, I'd like to work on it.

cleconte987 · 2020-03-18T00:40:18Z

Though I don't understand exactly what the goal of this issue is. The script already finds all occurrences of titles that need to be decapitalized, so is the goal to actually make the changes to the documentation?

tonywu1999 · 2020-03-18T02:36:19Z

The goal of this issue is to actually make the changes to the documentation.

cleconte987 · 2020-03-18T10:24:31Z

Yes but there are exceptions that you don't want to lower and that are not in CAPITALIZATION_EXCEPTIONS. What do you do with it? Should you extend it?

datapythonista · 2020-03-18T10:30:05Z

Yes, the script will validate most cases all right, but if there is anything that needs to be changed there, like adding new keywords, you can do it.

datapythonista · 2020-03-18T10:32:28Z

It's better not to open a huge PR. Take a few documents (e.g. five) and just fix those. If you want to fix more (surely appreciated), then keep opening PRs; there is no problem in opening many.

Thanks!

cleconte987 · 2020-03-18T14:12:03Z

I am not very used to git yet. How do I push to the remote repository? I have pulled the repository to my local machine, modified some files in doc, and committed, but it doesn't work when I push to the GitHub URL. What is the URL I should push to?

datapythonista · 2020-03-18T19:14:32Z

@cleconte987 you need to open a pull request. It's a bit tricky the first time, but there are resources out there to help you know how it works. If you don't find anything better, you can see these slides https://docs.google.com/presentation/d/1rOSYXZPyMe9KXnbVK_xbJzw_-ijxd6bIxndmvPU6L2o/edit?usp=sharing and this video (sorry the audio is awful): https://www.youtube.com/watch?v=LCTk0leNH1g

tonywu1999 · 2020-03-18T19:55:57Z

https://dev.pandas.io/docs/development/contributing.html

I started contributing 2 months ago, and I found that this link helped me a lot.

cleconte987 · 2020-03-18T22:58:53Z

Ok thank you

themien · 2020-03-19T17:58:56Z

@cleconte987 I am already on the issue. Will do a pull request with all the updated documentation soon

cleconte987 · 2020-03-19T21:37:02Z

Well, what should I do now, @tonywu1999 @datapythonista? I started committing changes to the documentation. I guess you are the assignee. I'm here if I can help.

datapythonista · 2020-03-19T21:47:37Z

As said earlier, you should be working in small batches, so keep opening small pull requests with the fixes, and we'll be merging them. There are many titles to fix. Try to coordinate if possible, but more than one person can work on this, no problem.

cleconte987 · 2020-03-19T21:51:55Z

And I think it's not correct to lower words like DataFrame to Dataframe, shouldn't it be kept with capitalization?

SomtochiUmeh · 2022-07-14T13:12:57Z

take

SomtochiUmeh · 2022-07-14T13:19:16Z

Hey,
RadViz should be kept as is right?

SomtochiUmeh · 2022-07-14T17:40:39Z

Also SparseArray and SparseDtype?

INDIG0N · 2022-08-12T05:50:42Z

Hey @datapythonista, I was starting to work on this issue and came across a weird scenario, specifically with the stumpy package mentioned in ecosystem.rst.

So, ecosystem.rst refers to the package in all caps, "STUMPY". The script catches this of course and says to correct it to "Stumpy".
It looks like the authors of the package refer to it in all caps in their documentation, which matches the current capitalization we use, but when importing the package and using it, it's all lowercase.

In situations like this, should I use the capitalization the script suggests, correct the capitalization to all lowercase to match with how it's imported, add the package name to the list of exceptions in the script itself, or the last 2 combined?

datapythonista · 2022-08-12T06:03:18Z

You can add it to the list of exceptions. Or, if you think it's reasonable and not too complicated, just skip that level of header (probably h3) of the ecosystem page, as everything in it should be a package name if I'm not wrong.

INDIG0N · 2022-08-12T19:45:36Z

@datapythonista Thanks, I had another question though. It looks like the script is asking me to change the capitalization in one of the URLs.

For reference this is the original url: https://github.com/TDAmeritrade/stumpy

It wants me to make the link all lowercase. The link works fine as it is, but weirdly enough putting the link in all lowercase also seems to work fine, and I have no idea why. Is there some kind of weird behavior that means I shouldn't change the capitalization in links, or am I good to go?

datapythonista · 2022-08-14T06:44:48Z

URLs are not case sensitive afaik. So, making the URL all lowercase shouldn't be a problem when clicking on it. I guess the capitalization is more for branding, and it'd probably be nice to keep it and not validate links in the titles. If it doesn't introduce much extra complexity to the validation, and you want to give it a try, that would be great.
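
Leaving link targets out of the validation could be sketched as follows. Because the path part of a URL can be case sensitive on some servers, keeping URLs exactly as written is the safer choice. The regular expression and the helper name `apply_outside_links` are assumptions for this example and do not come from the script.

```python
import re
from typing import Callable

# Matches the target part of an inline RST hyperlink, e.g. "<https://...>".
LINK_TARGET = re.compile(r"<[^<>\s]+://[^<>\s]+>")


def apply_outside_links(title: str, transform: Callable[[str], str]) -> str:
    """Apply transform to the title text while leaving link targets unchanged."""
    pieces = []
    last_end = 0
    for match in LINK_TARGET.finditer(title):
        # Text before the link target may be changed by the transform.
        pieces.append(transform(title[last_end:match.start()]))
        # The URL itself is copied over exactly as written.
        pieces.append(match.group(0))
        last_end = match.end()
    pieces.append(transform(title[last_end:]))
    return "".join(pieces)


if __name__ == "__main__":
    heading = "`STUMPY <https://github.com/TDAmeritrade/stumpy>`_"
    print(apply_outside_links(heading, str.lower))
```

Here `str.lower` only stands in for whatever capitalization fix is applied; a sentence-case transform would need to know which segment contains the first word of the title.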

harsimran44 · 2023-09-12T06:29:08Z

Can I work on this issue?

suresh33661 · 2023-09-26T03:51:24Z

take

harsimran44 · 2023-09-26T16:18:17Z

Take

skregas · 2023-11-01T02:03:49Z

Hi, can an admin take a look at #55685? Not sure how to make the tests pass. I didn't make any changes to anything that's being tested in the checks.
Thanks

kajor3k · 2024-06-29T08:57:21Z

Hey everyone -
I've created my PR. I've seen that this story has been considered too wide to squeeze into one PR, hence I covered only the 3 most recent files from the whatsnew directory for now (2.2.1 didn't require any changes).

In the original description of this issue, the proposed way of running the script was:
./scripts/validate_rst_title_capitalization.py doc/source
Unfortunately that no longer works, because the script requires a list of file paths. In other words, one needs to provide particular files, e.g.:
scripts/validate_rst_title_capitalization.py doc/source/whatsnew/v2.2.1.rst doc/source/whatsnew/v2.2.2.rst

I also tried to reuse exclusions wherever it was possible, i.e. instead of adding "I/O" to the list I've edited rst to use "IO" as the second one was already on the list.

I also think that there's a need for suppressing some of the validations, and exclusions may not be enough. For example, "pandas" is added to the exclusions with an underscore; however, it can also be used at the beginning of a title, and then this particular exclusion entry doesn't work as expected.

I'll be happy to pick up other files as well and trigger some discussions, but before I do so, I just wanted to confirm with you if that's an expected way of working.

Potential future stories:
1. From what I see, this script has never been turned on in code_checks.sh. That's something I could tackle as well. To achieve that, I think a good predecessor story would be to allow this script to run for all files in docs and its subdirectories. I see that the validator is being run on the PR; however, it is not configured in code_checks.sh.
2. I think some more sophisticated logic for exclusions should be introduced as well. Maybe a "rule" approach would be a good choice here? The first case to tackle could be the "pandas" example I described above.
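
The "rule" idea in point 2 could take many forms. As one hedged sketch, each rule below is a small function that may claim a word and decide its final spelling. The names `Rule`, `keep_exact`, `sentence_case`, and `fix_title` are invented for this example and do not exist in the script.

```python
from typing import Callable, List, Optional

# A rule receives (word, position) and returns the word to use, or None
# when it has no opinion and the next rule should be tried.
Rule = Callable[[str, int], Optional[str]]


def keep_exact(*names: str) -> Rule:
    """Build a rule that keeps the given words unchanged at any position."""
    allowed = set(names)

    def rule(word: str, position: int) -> Optional[str]:
        return word if word in allowed else None

    return rule


def sentence_case(word: str, position: int) -> Optional[str]:
    """Fallback rule: capitalize the first word, lowercase the rest."""
    return word.capitalize() if position == 0 else word.lower()


def fix_title(title: str, rules: List[Rule]) -> str:
    """Run each word through the rules in order; the first answer wins."""
    fixed = []
    for position, word in enumerate(title.split(" ")):
        for rule in rules:
            result = rule(word, position)
            if result is not None:
                fixed.append(result)
                break
        else:
            fixed.append(word)  # no rule matched: leave the word unchanged
    return " ".join(fixed)


# Order matters: exact-spelling words are checked before the fallback.
RULES = [keep_exact("pandas", "DataFrame", "IntervalIndex"), sentence_case]
```

Because the exact-spelling rule runs before the fallback, a title that begins with "pandas" keeps its lowercase first word instead of being "corrected" to "Pandas".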

tonywu1999 mentioned this issue Mar 9, 2020

CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) #31114

Merged

datapythonista added Docs good first issue labels Mar 9, 2020

github-actions bot assigned themien Mar 9, 2020

datapythonista mentioned this issue Mar 18, 2020

HDF file compression not working #29310

Open

cleconte987 mentioned this issue Mar 19, 2020

DOC: Updating capitalization in folder doc/source/reference #32824

Closed

1 task

themien added a commit to themien/pandas that referenced this issue Mar 19, 2020

BUG: pandas-dev#32550 Fix capitalization among headings in documentation files

79d9109

themien mentioned this issue Mar 19, 2020

DOC: Fix capitalization among headings in documentation files (#32550) #32843

Closed

4 tasks

github-actions bot assigned SomtochiUmeh Jul 14, 2022

SomtochiUmeh added a commit to SomtochiUmeh/pandas that referenced this issue Jul 15, 2022

DOC: Updating some capitalization in doc/source/user_guide pandas-dev#32550

a802d86

SomtochiUmeh mentioned this issue Jul 15, 2022

DOC: Updating some capitalization in doc/source/user_guide #32550 #47732

Merged

2 tasks

jreback added this to the 1.5 milestone Jul 15, 2022

SomtochiUmeh added a commit to SomtochiUmeh/pandas that referenced this issue Jul 17, 2022

DOC: Updating some capitalization in doc/source/user_guide pandas-dev#32550

93299ba

SomtochiUmeh added a commit to SomtochiUmeh/pandas that referenced this issue Jul 18, 2022

DOC: Updating some capitalization in doc/source/user_guide pandas-dev#32550

9c3bf7d

jreback pushed a commit that referenced this issue Jul 22, 2022

DOC: Updating some capitalization in doc/source/user_guide #32550 (#47732)

d8bb752

mroeschke removed this from the 1.5 milestone Aug 15, 2022

INDIG0N mentioned this issue Aug 15, 2022

DOC: Altered capitalization validation script to handle edge cases #48100

Closed

github-actions bot assigned suresh33661 Sep 26, 2023

github-actions bot assigned harsimran44 Sep 26, 2023

suresh33661 mentioned this issue Sep 27, 2023

Valid capitalization errors #32550 #55304

Closed

3 tasks

skregas mentioned this issue Oct 25, 2023

DOC: Update title caps validation script to step through directories #55685

Closed

2 tasks

kajor3k mentioned this issue Jun 29, 2024

DOC: Fixed validate_title_capitalization warnings in most recent what… #59146

Open

3 tasks

kajor3k added a commit to kajor3k/pandas-kajor3k that referenced this issue Jun 30, 2024

BUG: initial commit for running script recursively (pandas-dev#32550)

e818a03

kajor3k added a commit to kajor3k/pandas-kajor3k that referenced this issue Jun 30, 2024

DOC: post review comments (pandas-dev#32550)

c06839a

kajor3k added a commit to kajor3k/pandas-kajor3k that referenced this issue Jun 30, 2024

DOC: removing unnecessary exclusions from the list (pandas-dev#32550)

1bb1b78
