Sentence-based word wrapping #4

Open
hukkin opened this issue Jul 15, 2020 · 3 comments
Labels
enhancement New feature or request plugin A plugin should be created or updated research Research needs to be done

Comments

@hukkin
Member

hukkin commented Jul 15, 2020

Experiment with something like

from nltk import tokenize
sentences = tokenize.sent_tokenize(paragraph)

and find out if we can implement sentence-based word wrapping to reduce diffs.

Implement as an option; don't change the default mode (which preserves wrapping).
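
To illustrate why sentence-based wrapping could reduce diffs, here is a minimal sketch that counts the lines a line-based diff would touch after a small edit. It is only an illustration: the helper names (`wrap_fixed`, `wrap_sentences`, `changed_lines`) and the sample text are made up for this example, and the naive regex split is an assumed stand-in for `sent_tokenize` so that the snippet runs with the standard library alone.

```python
import difflib
import re
import textwrap


def wrap_fixed(paragraph: str, width: int = 40) -> list[str]:
    """Conventional wrapping: fill each line up to `width` characters."""
    return textwrap.wrap(paragraph, width=width)


def wrap_sentences(paragraph: str) -> list[str]:
    """One sentence per line, using a naive split after '.', '!' or '?'.

    Assumption: this regex stands in for a real sentence tokenizer and
    will mis-split abbreviations such as "Dr.".
    """
    return re.split(r"(?<=[.!?])\s+", paragraph.strip())


def changed_lines(old: list[str], new: list[str]) -> int:
    """Count lines that a line-based diff marks as removed or added."""
    return sum(
        1
        for line in difflib.ndiff(old, new)
        if line.startswith(("- ", "+ "))
    )


before = (
    "Wrapping moves words between lines. A small edit can reflow a whole "
    "paragraph. Sentence breaks keep the change local."
)
after = before.replace("A small edit", "Even a very small edit")

fixed_count = changed_lines(wrap_fixed(before), wrap_fixed(after))
sentence_count = changed_lines(wrap_sentences(before), wrap_sentences(after))
print("fixed-width wrapping:", fixed_count, "changed lines")
print("sentence wrapping:", sentence_count, "changed lines")
```

With sentence wrapping only the edited sentence shows up in the diff, whereas fixed-width wrapping reflows the lines that follow the edit.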

@hukkin hukkin added the enhancement New feature or request label Jul 15, 2020
@hukkin hukkin added the research Research needs to be done label Jan 21, 2021
@hukkin hukkin changed the title Implement word wrapping that optimizes for smaller diffs Word wrapping that optimizes for smaller diffs Jan 21, 2021
@hukkin
Member Author

hukkin commented Apr 27, 2021

Due to the nltk dependency and the many, many bugs that I expect, I think any work should be started (and most likely stay) in a plugin.

@hukkin hukkin added the plugin A plugin should be created or updated label Apr 27, 2021
@choldgraf
Member

choldgraf commented Jun 9, 2021

100% plugin is the right place to experiment with this. I suspect it will lead to many unexpected or unpredictable side effects, which is the last thing you want in a black-style program :-)

@hukkin hukkin changed the title Word wrapping that optimizes for smaller diffs Sentence-based word wrapping Jun 9, 2021
@jspaezp

jspaezp commented Sep 15, 2022

Hello there!

I thought A LOT about this issue in the last couple of days and wanted to pitch an idea.
Inspired by the way that black handles docstrings, where a lot of the time "if it can fit in a line of less than 88 chars, it should", could a "lazy" implementation of the problem be to split on punctuation unless the resulting section is shorter than X characters?

I think it would give predictable behaviour and greatly reduce diff sizes. It will sometimes lead to uglier docstrings, but ... well ... they will not look ugly when rendered to HTML ...

start with an empty chunk
for a given paragraph, start from the end
    append to the chunk until a punctuation mark is found
        if the chunk is longer than X (... I don't know ... 42 characters)
            yield the chunk (as a separate line) and start a new chunk
yield whatever is left over as the first line

Let me know what you think (I am trying to find problems with my approach).

Temple of Doom was discovered by Dr. Jones.
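
A rough sketch of the pseudocode above, under a few assumptions: the function name `lazy_break`, the set of punctuation marks in `BREAK_AFTER`, and the choice to measure the chunk in characters after joining words with single spaces are all mine for illustration. This is not the code in the plugin linked below.

```python
# Punctuation marks that may end a line; this exact set is an assumption.
BREAK_AFTER = (".", ",", ";", ":", "!", "?")


def lazy_break(paragraph: str, min_chars: int = 42) -> list[str]:
    """Break after punctuation, but only once a chunk exceeds `min_chars`."""
    lines: list[str] = []
    chunk: list[str] = []
    # Walk from the end of the paragraph, as in the pseudocode.
    for word in reversed(paragraph.split()):
        # `word` ends with punctuation, so a break could go right after it.
        can_break = bool(chunk) and word.endswith(BREAK_AFTER)
        if can_break and len(" ".join(chunk)) > min_chars:
            lines.append(" ".join(chunk))
            chunk = []
        chunk.insert(0, word)
    if chunk:
        # The leftover chunk is the start of the paragraph and may be short.
        lines.append(" ".join(chunk))
    lines.reverse()
    return lines


sentence = "Temple of Doom was discovered by Dr. Jones."
print(lazy_break(sentence))
print(lazy_break(sentence, min_chars=5))
```

With the default threshold the example sentence stays on one line, because the text after "Dr." is shorter than 42 characters. Lowering the threshold to 5 splits it after "Dr.", which shows that the length check only masks the abbreviation problem rather than solving it.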

I ended up implementing a version of this: regex-based, and including support for some other stuff.
I want to play with it a bit more before publishing it, but it looks promising.
LMK if there are things you feel it should support that I have not tested.

https://github.com/jspaezp/mdformat-sentencebreak

3 participants