Change package name from "transformers" to something less generic #24934

Closed
geajack opened this issue Jul 19, 2023 · 7 comments

Comments

@geajack

geajack commented Jul 19, 2023

Feature request

I repeatedly find myself in situations where I want a module called datasets.py or evaluate.py in my code and can't have one, because those names are taken by Hugging Face packages. While I can understand how (even from the user's perspective) it's aesthetically pleasing to have nice terse library names, ultimately a library hogging simple names like this is something I find short-sighted, impractical and, at my most irritable, frankly rude.

My preference would be a pattern like what you get with all the other big libraries like numpy or pandas:

import huggingface as hf
# hf.transformers, hf.datasets, hf.evaluate

or things like

import huggingface.transformers as tf
# tf.load_model(), etc

If this isn't possible for some technical reason, at least just call the packages something like hf_transformers and so on.

I realize this is a very big change that's probably been discussed internally already, but I'm making this issue and sister issues on each huggingface project just to start the conversation and begin tracking community feeling on the matter, since I suspect I'm not the only one who feels like this.

Sorry if this has been requested already on this issue tracker, I couldn't find anything looking for terms like "package name".

Sister issues:

Motivation

Not taking up package names the user is likely to want to use.

Your contribution

No - more a matter of internal discussion among core library authors.

@sgugger
Collaborator

sgugger commented Jul 19, 2023

You do realize this would break the existing code of many many people?

@geajack
Author

geajack commented Jul 19, 2023

Yes

My theory/suggestion is that HF is still a relatively young library used by a relatively niche community used to having to move in a rapidly developing field (we're not talking about the C standard lib or something), that a lot of people likely feel this way, and that if this change were implemented it would be looked back on as a good decision ten years later (not as if we're new to breaking changes in the Python community - hell even HF has pushed breaking changes before)

@henk717

henk717 commented Jul 21, 2023

That kind of stuff would be hell for projects like ours, we have many low level patches in place to extend HF.

@SoyGema
Contributor

SoyGema commented Jul 28, 2023

Hello there.
I would like to share some mental models about this.

General TL;DR: No, because for now the libraries are consistent and helpful in becoming the standard.

The following comments have sections

  • Impact
    TL;DR: 17M monthly downloads, 1700 MAUs, 100K+ repositories impacted
    Q: can I help you brainstorm other names, syntactically and semantically aligned, that could help solve your problem?
  • Considerations in the matter
    TL;DR: Other standard names are also taken.
    Balancing Makers and Takers to scale and sustain Open Source is a line of thought to take into account
    Q: would it be worth thinking deeply about the trade-off between what the libraries give and what they take? Can I help you brainstorm the utilities you put in evaluate.py and datasets.py in your code and submit a contribution, so we can cover your needs for all coders and avoid frustration?
  • Responsibility when becoming the standard
    TL;DR: The owners' motivation might be to become the standard. They seem concerned about that responsibility in many dimensions.
    Q: do you think we should take this dimension into account for this matter?
  • Bibliography and Openness

Impact

TL;DR: 17M monthly downloads, 1700 MAUs, 100K+ repositories impacted
Hypothesis limitations: this data could change with other insights about MAU funnel conversion and about actively maintained repositories plus private repositories. Total MAUs have not been calculated, because the information is incomplete and any data-driven conclusion would be too intuitive.

To gain some data-driven perspective on the impact of this change, I checked the PyPI downloads for the 3 libraries and summed last month's downloads, giving a total of roughly 17M. I'm assuming there is a clear funnel here that separates newcomers, explorers, and MAUs (Monthly Active Users). My analysis focuses on the last group, as they use the code regularly or might be using the libraries in production or in a work-dependent project. Taking out 4 orders of magnitude, as a pessimistic estimate, the hypothesis gives about 1700 MAUs.

[Image: screenshot, 2023-07-27 12:47:21] [Image: screenshot, 2023-07-27 12:48:20] [Image: screenshot, 2023-07-27 12:47:50]

The impact exploration then took me to the "Used by" count on each repository's main page, which reports the number of repositories that depend on the library. Transformers is reported as used by 84.4K repositories, datasets by 20.4K, and evaluate by 2.9K. This gives a total of 100K+ repositories that this change could affect.
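The arithmetic behind these two estimates can be checked in a few lines. This is only a sketch using the rounded figures quoted in this comment; the variable names are made up for illustration, and the four-orders-of-magnitude funnel is the comment's own pessimistic assumption, not a measured conversion rate.

```python
# Rounded figures quoted in the comment above (not fetched live).
monthly_downloads = 17_000_000          # summed PyPI downloads for the 3 libraries
funnel_orders_of_magnitude = 4          # pessimistic downloads-to-MAU funnel (assumption)

estimated_maus = monthly_downloads // 10 ** funnel_orders_of_magnitude

# "Used by" counts for the three libraries, in the order listed in the comment.
used_by_counts = [84_400, 20_400, 2_900]
total_dependents = sum(used_by_counts)

print(estimated_maus)      # 1700
print(total_dependents)    # 107700
```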

Hypothesis limitations: this data could change with other insights about MAUs funnel conversion and maintained active repositories + private repositories.

Before going further, and I guess this is a question directly for @geajack , can I help you brainstorm other names - syntactically and semantically aligned - that could help solve your problem?

Considerations in the matter

TL;DR: Other standard names are also taken.
Balancing Makers and Takers to scale and sustain Open Source is a line of thought to take into account

What I understood from the issue is that the generic package names cause interference and cognitive dissonance with respect to the naming standard used by other libraries. I then used the check-availability package to see whether other standard names could solve your problem. I tried dataset and evaluation, and neither was available.

check-availability pypi dataset --verbose 3
GET https://pypi.org/project/dataset
Got status code 200
The name dataset is not available on pypi
GET https://pypi.org/project/evaluation
Got status code 200
The name evaluation is not available on pypi
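The check in the output above can be approximated by hand. The following is a minimal sketch, not the implementation of the check-availability tool: it assumes, as the output suggests, that a 200 response on the project page means the name is taken and a 404 means it is free. The function names `fetch_status` and `name_is_available` are hypothetical, and the status lookup is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def name_is_available(name, get_status=fetch_status):
    """Heuristic: treat a 404 on the PyPI project page as 'name not registered'."""
    status = get_status(f"https://pypi.org/project/{name}")
    return status == 404


# Offline usage with stubbed status codes mirroring the output above.
print(name_is_available("dataset", get_status=lambda url: 200))     # False
print(name_is_available("evaluation", get_status=lambda url: 200))  # False
```

Calling `name_is_available("dataset")` without the stub would query pypi.org directly. Note that this heuristic only covers the PyPI registry, which is a separate question from a local file shadowing an installed package.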

I really, really tried to benchmark your motivations against Open Source research insights [1] [2] [3] to get an empathetic, general view of this concern. I'm still maturing it, but what I'm taking away is that you might find it beneficial, and aligned with some Open Source ideas (yet to be proven representative), that generic names are not proprietary, beyond your individual code problem.

However, I invite you to go deeper into the motivations behind Open Source, as there seem to be equally important motivations that drive contributors and users. I encourage you to share with me mature ideas that might not be aligned with my mental model. If we can go beyond one individual and try to capture a community or a more general mental model, that would be amazing.

On the other hand, putting myself in Hugging Face's shoes, I couldn't stop thinking broadly about their Open Source sustainability contribution with respect to other companies and proprietary software. Really recommend this reading!

Before going further, and I guess this is a question for @geajack, would it be worth thinking deeply about the trade-off between what the libraries give and what they take? Can I help you brainstorm the utilities you put in evaluate.py and datasets.py in your code and submit a contribution, so we can cover your needs for all coders and avoid frustration?

Responsibility when becoming the standard

TL;DR: The owners' motivation might be to become the standard. They seem concerned about that responsibility in many dimensions.

It might be fair to think that the naming in this case reflects an aim to become the standard. I leave it to the reader to analyze whether the owners of the libraries are meeting their Open Source duties in a way that earns that recognition beyond the naming, in order to judge coherence. On my side, the trust level system and contributor management, together with the proactive response to other Open Source responsibilities, speak for themselves. This doesn't entail that they should have a present and future concern on this matter.

I guess this is a question for @geajack: do you think we should take this dimension into account for this matter?

Bibliography and Openness

Beyond the cited readings, I really recommend this book.

I acknowledge that this response might be dense, so I would like to thank the reader, the owner of this issue, the contributors, and the maintainer for going through this material. As an emotional openness exercise, and following the bravery of @geajack, I must confess it has taken me a significant amount of courage to press Comment on this one.
I just hope this offers a glimpse of another logical perspective, new possible paths arising from questions, and other thoughts that might change with new shreds of evidence.

@geajack
Author

geajack commented Jul 28, 2023

@SoyGema thanks for the detailed breakdown. First of all I just want to say that I don't intend to present myself as some kind of sponsor for these issues - I just want there to be a place in the issue tracker for people to voice this concern if it is indeed a common concern.

I do think you may have misunderstood the issue at a couple of points, though. In your second section, it sounds like you think the complaint is that because HF is taking up evaluate on PyPI, I or somebody else can't have our own package on PyPI. That isn't the issue - the issue is that if I want to use HF's evaluate locally, I can't have my own local evaluate.py.

My most recent use-case for this was wanting a script called evaluate.py that I would actually run from the command line to run evaluation of my results - I had to change it to something more awkward like evaluation.py, which is annoying because it is after all a command and should ideally have the form of an imperative verb. I also routinely have a package in my codebases to provide utility functions for managing my own datasets. As it happens, I've always called that package data, but I could imagine another programmer wanting to call it datasets and being annoyed that they can't.
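The local clash described here comes from Python's module search order: the directory of the script being run is searched before site-packages, so a local evaluate.py is found first. A minimal sketch of that behaviour follows. The two temporary directories and the helper `resolve` are stand-ins invented for illustration; no real Hugging Face package is installed or imported.

```python
import importlib.machinery
import tempfile
from pathlib import Path


def resolve(name, search_path):
    """Return the file `import name` would load when searching search_path in order."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as project_dir, tempfile.TemporaryDirectory() as site_dir:
    # Stand-in for an installed package (hypothetical layout, not the real library).
    installed_pkg = Path(site_dir) / "evaluate"
    installed_pkg.mkdir()
    (installed_pkg / "__init__.py").write_text("SOURCE = 'installed package'\n")

    # The user's own command-line script in the project directory.
    local_script = Path(project_dir) / "evaluate.py"
    local_script.write_text("SOURCE = 'local script'\n")

    # The project directory comes first, as it does when running a script from it.
    shadowed = resolve("evaluate", [project_dir, site_dir])
    # Without the local file on the path, the installed package is found.
    unshadowed = resolve("evaluate", [site_dir])

    shadowed_is_local = Path(shadowed) == local_script
    unshadowed_is_installed = Path(unshadowed) == installed_pkg / "__init__.py"

print(shadowed_is_local, unshadowed_is_installed)
```

Under these assumptions, `import evaluate` from inside the project would pick up the local script rather than the installed library, which is why renaming the script to evaluation.py avoids the clash.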

I'm not under the impression that this is a change that can be made tomorrow or even this year. When I opened these issues I pictured them (assuming they didn't just get buried) being the kinds of issues that sit open for years and years accumulating hundreds of comments, acting as an informal community forum before anything is done about them. The only place on the internet I could find someone expressing a similar sentiment was this highly upvoted /r/Python comment, but I suspect a fair few people feel this way.

@SoyGema
Contributor

SoyGema commented Jul 28, 2023

Hey @geajack, thanks for your response and for the clarification. Thanks also for the reddit link, that wasn't on my radar until now. As feedback, if you could share a line with the motivations and links behind this issue when opening it, that would be great! 🙂

I'm happy that you already have a workaround for this. Yes, you are correct. I thought this went beyond local use of a script and was more library oriented, given the impact of the change and what usually sparks an 'annoying' naming scenario for me.
I agree with your impression, and let's see what time brings 🙂

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.


4 participants