Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Settle on strings vs. atoms for column names #58

Closed
cigrainger opened this issue Sep 13, 2021 · 4 comments
Closed

Settle on strings vs. atoms for column names #58

cigrainger opened this issue Sep 13, 2021 · 4 comments
Labels
note:discussion Further information is requested note:help wanted Extra attention is needed

Comments

@cigrainger
Copy link
Member

One more thing that came up, most noticeable in the dtypes and names test is the inconsistency of referring to columns sometimes as atoms and sometimes as strings. with_columns and names both takes the column names as strings, while dtypes expects a keyword list where they're atoms. Should I add another commit to settle on one or the other? I could also make either work, but that gives me flashbacks to Rails where controller params keys could be either strings or atoms, which caused a lot of confusion, and which Phoenix rightly (IMO) fixed by standardizing on just strings.

On this, I think the entire API is a bit split on this. I've tried to make both work in many cases where I felt it was more ergonomic, but I'm feeling the same as you that they should be standardised. I think the obvious way to do that is to settle on strings. Let's leave this the way it is right now and we can revisit this in a broader context.

Originally posted by @cigrainger in #48 (comment)

@cigrainger cigrainger added note:discussion Further information is requested note:help wanted Extra attention is needed labels Sep 13, 2021
@josevalim
Copy link
Member

They definitely have to be strings because I can see people using dataframes to work on external data and you don't want to convert to atoms. On the other hand, for things like dtypes, having them as atoms is definitely more convenient. So I would go with supporting both but normalizing all of them to strings on the facade API instead of the backend, as this is shared logic across all backends and all shared logic goes in the facade.

@cigrainger
Copy link
Member Author

This is pretty much the approach I've taken for column names -- support both but normalize on the facade API. There are a couple places where I think strings aren't supported and I'll try to root those out (the dtypes/schema approach above definitely is one of those). @losvedir what do you think about everything accepting strings for column names but some places also accepting atoms where it may be more convenient or readable (e.g. keyword lists as arguments)?

@losvedir
Copy link
Contributor

losvedir commented Sep 18, 2021

@losvedir what do you think about everything accepting strings for column names but some places also accepting atoms where it may be more convenient or readable (e.g. keyword lists as arguments)?

Sounds good to me! As long as the whole interface accepts one or the other then I'm happy. And it makes sense to me for that to be strings not only because that works more easily with different backends per what José said, but also because that's what the library drives you towards in the core interaction with DataFrames: you do df["my_col"] and not df[:my_col] or df.my_col. From what I've come across I've only run into dtypes not supporting strings, and that's an easy fix.

Personally, I don't find [col: :string] that much more convenient than [{"col", :string}] or %{"col" => :string} and so think the conceptual inconsistency isn't worth the benefit, but I think reasonable people can disagree here. If others like the keyword list approach (and it sounds like that's the case) then also supporting atoms for column names there makes sense. Though, actually, I think maybe then it would make sense in that case to support atoms everywhere column names are used? I think this is what José suggested, though contrary to your question to me which says "some places".

But don't give undue weight to my thoughts. This is your library, and José is the unparalleled genius in development experience. I'm only chiming in here because you asked directly and so, well, here are my two cents. 😄

@cigrainger
Copy link
Member Author

This was actually closed with #62!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
note:discussion Further information is requested note:help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants