Announcement: Textstat organisation #167

Closed
alxwrd opened this issue Aug 18, 2021 · 3 comments

Comments

@alxwrd
Member

alxwrd commented Aug 18, 2021

Hello! I am very pleased to announce the creation of the Textstat GitHub organisation: https://github.com/textstat.

I have been reviewing the current state of Textstat, and recently there has been a lot of interest in additional language support. We've accepted a number of contributions to add Spanish, German, Italian, and Arabic!

However, as more languages are getting added, the core of Textstat is becoming harder to maintain. With the addition of more languages, the base calculations are being pulled in more directions, some of which expect conflicting results.

With the introduction of the organisation, I am beginning the work to separate each language implementation into its own module. This will give each language the space to deviate from the core when applicable, but still be able to default back to solid base calculations.

Let me know your thoughts, either below, or via email: alxwrd@googlemail.com.

Details

  • My hope is to migrate shivam5992/textstat → textstat/textstat soon.
  • Contributions will still be accepted for the current implementation of Textstat. Caveat: language additions (a new Spanish index, for example) will be migrated to their new repositories and may not be available straight away in the new version.
  • The changes are planned for a version 1.0 release of Textstat.
  • Until 1.0, there may still be releases for the current version.
    • With 1.0, 0.x will become unsupported.
  • I am not planning on maintaining backwards compatibility with the current API.
  • Target Python version is currently planned at 3.8+.

Planned architecture

The following diagram shows the planned "architecture" of Textstat.

[Diagram: planned Textstat architecture. A user layer exposes the library API and manages the CLI. A language layer implements language specifics: readability tests and any overrides to core functionality. A core layer implements base text statistics: character count, word count, etc.]
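To make the three layers concrete, here is a minimal sketch of how they could fit together. It is not the planned Textstat API: the names `CoreStats`, `English`, `Arabic`, and `get_language` are hypothetical, and the Arabic override (treating "؟", U+061F, as a sentence terminator) is just one illustration of a language deviating from a core default.

```python
import re
from typing import Dict, List, Type


class CoreStats:
    """Core layer: base text statistics shared by every language."""

    # Default sentence terminators; language layers may override this.
    sentence_terminators = r"[.!?]+"

    def words(self, text: str) -> List[str]:
        # Runs of letters/digits, optionally joined by apostrophes ("it's").
        return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text)

    def word_count(self, text: str) -> int:
        return len(self.words(text))

    def char_count(self, text: str) -> int:
        # Count every non-whitespace character.
        return sum(1 for ch in text if not ch.isspace())

    def sentence_count(self, text: str) -> int:
        # Naive default: split on terminators and ignore empty pieces.
        pieces = re.split(self.sentence_terminators, text)
        return len([p for p in pieces if p.strip()])


class English(CoreStats):
    """Language layer: falls back to every core default unchanged."""


class Arabic(CoreStats):
    """Language layer: overrides one core default (hypothetical example)."""

    # Also treat the Arabic question mark (U+061F) as a terminator.
    sentence_terminators = r"[.!?\u061f]+"


# User layer: a single entry point that picks the language implementation.
_LANGUAGES: Dict[str, Type[CoreStats]] = {"en": English, "ar": Arabic}


def get_language(code: str) -> CoreStats:
    try:
        return _LANGUAGES[code]()
    except KeyError:
        raise ValueError(f"unsupported language: {code!r}") from None
```

With this shape, `get_language("en").word_count(...)` uses the core calculation as-is, while `get_language("ar").sentence_count(...)` picks up the override without the core needing to know about Arabic.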

@alxwrd alxwrd pinned this issue Aug 18, 2021
@LKirst
Contributor

LKirst commented Aug 18, 2021

Great idea!

If there is going to be a language layer, we could try using the CMU Pronouncing Dictionary (imported from nltk) for the syllable count in the English implementation, because the CMU dictionary is more precise than the pyphen syllable count.
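As a rough sketch of the idea: CMU-style entries list a word's phonemes, and vowel phonemes carry a stress digit, so counting those digits gives the syllable count. The example below uses a tiny hand-written sample in that format rather than the real dictionary (with nltk, the mapping would come from `nltk.corpus.cmudict.dict()` after downloading the corpus). The function names are hypothetical, and the vowel-group fallback is only a stand-in for a hyphenation-based count such as pyphen's.

```python
import re
from typing import Dict, List

# Hand-written sample entries in the CMU dictionary format (assumption:
# illustrative only, not loaded from the real dictionary).
SAMPLE_PRONUNCIATIONS: Dict[str, List[List[str]]] = {
    "readability": [["R", "IY2", "D", "AH0", "B", "IH1", "L", "IH0", "T", "IY0"]],
    "statistics": [["S", "T", "AH0", "T", "IH1", "S", "T", "IH0", "K", "S"]],
    "text": [["T", "EH1", "K", "S", "T"]],
}


def syllables_from_phonemes(phonemes: List[str]) -> int:
    # Vowel phonemes end in a stress digit (0, 1, or 2); one per syllable.
    return sum(1 for p in phonemes if p[-1].isdigit())


def fallback_syllables(word: str) -> int:
    # Crude stand-in for a hyphenation-based count: count vowel groups.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def count_syllables(
    word: str,
    pronunciations: Dict[str, List[List[str]]] = SAMPLE_PRONUNCIATIONS,
) -> int:
    entry = pronunciations.get(word.lower())
    if entry:
        # Use the first listed pronunciation when a word has several.
        return syllables_from_phonemes(entry[0])
    # Words missing from the dictionary fall back to the heuristic.
    return fallback_syllables(word)
```

Keeping the fallback matters because any fixed dictionary misses names and new words, so the English implementation would still need a default for those.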

We could also use the migration to improve the documentation.

@GuillemGSubies
Contributor

Sounds good. More details about the API would be interesting. It would also be interesting to make textstat compatible with spaCy Docs, so that the generated analyses are extra properties of those objects.

@alxwrd
Member Author

alxwrd commented Oct 24, 2021

Update:

I had initially been working on getting the template repository in good order. There were a lot of community and helper files I wanted to add: code of conduct, issue templates, etc. I felt this was a good place to start, as it would be a pain to update multiple repositories if any of these files in the template ever needed to change. However, it seems the best place for them is the .github repo.

This frees up the template repo, which means I can use it to generate the first language repo: English. Once generated, the English formulas from Textstat will need to be reimplemented in textstat-en.

I will try to get some issues written up, and something basic implemented, ASAP.

@alxwrd alxwrd closed this as completed Aug 6, 2022
@alxwrd alxwrd unpinned this issue Aug 6, 2022