Consider renaming to nld #138
Comments
I would suggest |
Chiming in to second everything @honnibal said, and to add that I think the current name is going to impact the discoverability of this library. People who are looking for "NLP Datasets" through a search engine are going to see a library called The names of the other huggingface libraries work because they're the only game in town: there are not very many robust, distinct libraries for |
I'm also not sure whether the naming of |
Interesting, thanks for sharing your thoughts. As we’ll move toward a first non-beta release, we will poll the community of contributors/users of the library for their opinions on a good final name (like when we renamed the beautifully (?) named In the meantime, using |
I feel like we are conflating two distinct subjects here:
(let me know if I mischaracterize your point) I'll chime in to say that the first point is a bit silly IMO. As Python developers due to the limitations of the import system we already have to share:
If we add the constraint that this flat namespace also be shared with variable names this gets intractable pretty fast :) I also think all Python software developers/ML engineers/scientists are capable of at least a subset of:
|
By the way, I see it as a laboratory for testing several long-term ideas about how we could do NLP in terms of research as well as open-source and community sharing, most of these ideas being too experimental/big to fit in Some of the directions we would like to explore are about sharing, traceability and more experimental models, as well as seeing a model as the community-based process of creating a composite entity from data, optimization, and code. We'll see how these ideas end up being implemented and we'll better know how we should define the library when we start to dive into these topics. I'll try to get the |
I'm sort of confused by your point here. The namespace is shared by variable names. You should not use local variables that are named the same as modules, because then you cannot use the module within the scope of your function. For instance,

```python
import nlp
import transformers
nlp = transformers.pipeline("sentiment-analysis")
```

This is a bug: you've just overwritten the module, so now you can't use it. Or instead:

```python
import transformers
nlp = transformers.pipeline("sentiment-analysis")
# (Later, e.g. in a notebook)
import nlp
```

This is also a bug: you've overwritten your variable with an import. If you have a module named
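For anyone who wants to reproduce both failure modes without installing either library, here is a minimal sketch. It assumes the standard-library `json` module as a stand-in for a short-named module such as `nlp`, and a plain dict as a stand-in for a pipeline object; neither stand-in comes from the thread. The last few lines show one possible workaround (importing under an alias), which is an illustration rather than a recommendation from the maintainers.

```python
import types
import json  # stdlib stand-in for a short module name such as `nlp`

# Case 1: assigning to the name rebinds it, so the module is no longer
# reachable under that name in this scope.
json = {"task": "sentiment-analysis"}  # stand-in for a pipeline object
module_lost = not isinstance(json, types.ModuleType)

# Case 2: a later import (e.g. a notebook cell run out of order) rebinds the
# name to the module again, silently discarding the object.
import json
object_lost = isinstance(json, types.ModuleType)

# Possible workaround: import under an alias so the short name stays free.
import json as json_module
json = {"task": "sentiment-analysis"}
round_trip = json_module.loads(json_module.dumps(json))

print(module_lost, object_lost, round_trip)
```

Neither rebinding raises an error at the point where it happens, which is why the resulting bugs tend to surface later and far from the cause.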
Okay but the same logic applies to naming the module literally anything else. There's absolutely no point in having a module name that's 3 letters if you always plan to do And finally:
So...it isn't a datasets library? https://twitter.com/Thom_Wolf/status/1261282491622731781 I'm confused 😕 |
Dropping by as I noticed that the library has been renamed |
I guess indeed |
I'd argue that |
I can't speak for the HF team @jramapuram, but as a member of the community it looks to me that HF wanted to avoid the past path of changing names as scope broadened over time: Remember ;) Jokes aside, it seems that the library is growing in a multi-modal direction (#363), so the current name is not that implausible. Possibly HF's ambition is really to grow its community and bring here a large chunk of the world's datasets (including tabular / vision / audio?). |
Yea I see your point. However, wouldn't scoping solve the entire problem?

```python
import huggingface.datasets as D
import huggingface.transformers as T
```

Calling something |
Sorry to reply to an old thread, but the name issue has caused real trouble in my project recently. I had no idea there was a package called "datasets". My first thought was that such a general term would be safe to use freely. Having to avoid such a common name because of its ambiguity is quite weird. As we know, in Python it's not easy to differentiate system-wide and project-wide imports the way you can in C and C++. On the other hand, I fully understand the challenge of renaming a popular library. So providing a "huggingface" wrapper library, as suggested above by @jramapuram, may be a happy medium for both developers and users. Best Regards. |
Hey :)
Just making a thread here recording what I said on Twitter, as it's impossible to follow discussion there. It's also just really not a good way to talk about this sort of thing.
The issue is that modules go into the global namespace, so you shouldn't use variable names that conflict with module names. This means the package makes `nlp` a bad variable name everywhere in the codebase. I've always used `nlp` as the canonical variable name of spaCy's `Language` objects, and this is a convention that a lot of other code has followed (Stanza, flair, etc). And actually, your `transformers` library uses `nlp` as the name for its `Pipeline` instance in your readme.

If you stick with the `nlp` name for this package, if anyone uses it then they should rewrite all of that code. If `nlp` is a bad choice of variable anywhere, it's a bad choice of variable everywhere, because you shouldn't have to notice whether some other function uses a module when you're naming variables within a function. You want to have one convention that you can stick to everywhere.

If people use your `nlp` package and continue to use the `nlp` variable name, they'll find themselves with confusing bugs. There will be many many bits of code cut-and-paste from tutorials that give confusing results when combined with the data loading from the `nlp` library. The problem will be especially bad for shadowed modules (people might reasonably have a module named `nlp.py` within their codebase) and notebooks, as people might run notebook cells for data loading out-of-order.

I don't think it's an exaggeration to say that if your library becomes popular, we'll all be answering issues around this about once a week for the next few years. That seems pretty unideal, so I do hope you'll reconsider.

I suggest `nld` as a better name. It more accurately represents what the package actually does. It's pretty unideal to have a package named `nlp` that doesn't do any processing, and contains data about natural language generation or other non-NLP tasks. The name is equally short, and is sort of a visual pun on `nlp`, since a d is a rotated p.
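The shadowed-module case mentioned above (a local `nlp.py` sitting next to an installed package of the same name) can be illustrated with a rough sketch. Everything named here is an assumption made for the demonstration: the module name `nlp_demo` is hypothetical and chosen so it cannot clash with a real package, and the two temporary directories merely play the roles of a project folder and an installed-packages folder. The sketch relies only on the documented behavior that Python imports the first match found along `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "nlp_demo"  # hypothetical name, not a real package

with tempfile.TemporaryDirectory() as root:
    installed = Path(root) / "site-packages"  # plays the installed library
    project = Path(root) / "project"          # plays the user's codebase
    for directory, label in ((installed, "installed"), (project, "project")):
        directory.mkdir()
        # Each copy records where it came from so we can tell which one loads.
        (directory / f"{MODULE_NAME}.py").write_text(f"ORIGIN = {label!r}\n")

    # A script's own directory usually precedes installed packages on sys.path,
    # so the project directory is placed first here to mimic that ordering.
    saved_path = list(sys.path)
    sys.path[:0] = [str(project), str(installed)]
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
        origin = module.ORIGIN
    finally:
        # Restore interpreter state so the demo leaves no trace.
        sys.path[:] = saved_path
        sys.modules.pop(MODULE_NAME, None)

print(origin)
```

With this ordering the local file is the one that gets imported, so code expecting the installed library's API would fail with attribute errors that do not mention shadowing at all.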