let vs set in new columns #6

Closed
max-sixty opened this issue Jan 24, 2022 · 11 comments

Comments

@max-sixty
Member

max-sixty commented Jan 24, 2022

justinpombrio makes a good point:

Another suggestion around let: consider splitting it into two operations, for creating a new column and for modifying an existing one. E.g. called let and set. Those are in effect pretty different operations: you need to know which one is happening to know how many columns the table will have, and renaming a table column can with your current system change which operation is happening.

Splitting them into separate operations would make things easier on the reader: they can tell what's happening without having to know all the column names of the table. And it shouldn't really be harder for the writer, who ought to already know which they're doing.

I think this is a good idea! dplyr has something similar with mutate & transmute.

It can mostly be enforced by PRQL. There's a case where we transpile to:

select *, x+1 as x_plus_one

...where we don't know whether or not we're overwriting an existing column. But it's a minority of cases, and the contract could stand within PRQL.

let & set seem reasonable, but I'm also open to other syntax.
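
To make the proposed contract concrete, here is a minimal sketch in Python rather than PRQL. It assumes the compiler tracks the known column names as a dict from column name to defining expression; the names `let`, `set_`, and `employees` are hypothetical and only illustrate the check, they are not part of any PRQL implementation. When the columns are unknown (the `select *` case above), this check could not be performed.

```python
def let(columns, name, expr):
    """Create a new column; refuse to overwrite an existing one."""
    if name in columns:
        raise ValueError(f"let: column {name!r} already exists; use set")
    return {**columns, name: expr}


def set_(columns, name, expr):
    """Modify an existing column; refuse to create a new one."""
    if name not in columns:
        raise ValueError(f"set: column {name!r} does not exist; use let")
    return {**columns, name: expr}


# Hypothetical schema: each known column maps to the expression defining it.
employees = {
    "first_name": "first_name",
    "salary": "salary",
    "payroll_tax": "payroll_tax",
}

step1 = let(employees, "gross_salary", "salary + payroll_tax")   # adds a column
step2 = set_(step1, "first_name", "uppercase(first_name)")       # same column count
```

With this split, a reader can tell from the keyword alone whether the column count changes: `let` always adds one column and `set_` never does.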

@RCHowell

How about removing the keywords and using Golang's solution for this? It's rather concise.

  • := for declaration i.e. creating/declaring a new column
  • = for assignment i.e. modifying an existing column

@max-sixty
Member Author

I had the same initial thought, but there was fairly broad feedback that always leading with a keyword was more consistent and easier to scan, and after making the change I empathize (max-sixty#2)...

(Though I agree that the Golang approach would be nice had we not decided to lead with a keyword; thanks for the suggestion, @RCHowell.)

@RCHowell

Makes sense. Just my opinion, but I feel the declarative nature of assignment interrupts the functional flow. I'm curious what you think of having an explicit projection function to maintain this functional nature. Then the projection fits in just like any other transformation/operation/iterator:

from employees
project [
  gross_salary := salary + payroll_tax,
  first_name = uppercase(first_name)
]

from employees
project [
  let gross_salary = salary + payroll_tax,
  set first_name = uppercase(first_name)
]

@max-sixty
Member Author

Yes, that's compelling:

from employees
project [
  gross_salary := salary + payroll_tax,
  first_name = uppercase(first_name)
]

...could be great. And if we figured out whether a single item can be a list, then it could just be project gross_salary := salary + payroll_tax for a single item.

I wonder what people think of project vs. let / assign / etc.? I don't think I've used a language that uses it, but it does make sense.

@max-sixty
Member Author

FYI @hadley likes derive: https://news.ycombinator.com/item?id=30067418, from arquero

@qharlie
Contributor

qharlie commented Jan 25, 2022

I like the distinction of := and = in Golang, but personally don't think it belongs in PRQL, because people other than programmers will be using this. The set/let keywords flow and look better with the language, and I think it's more obvious what they do (for non-programmers or people not used to assignment vs. declaration operators).

I like derive; it's probably the best word for this operation. But let/set just seems to fit so nicely here that I still vote let/set.

@hadley

hadley commented Jan 25, 2022

IMO let and set are likely to be confusing to non-programmers. And even I would need to think whether set creates or overrides, since I don't commonly work in a language that makes that distinction.

@max-sixty
Member Author

Thanks @hadley. Would you recommend we have derive do both creating & overriding? How helpful have you found the mutate / transmute distinction in dplyr?

@hadley

hadley commented Jan 26, 2022

I don't find the distinction that important, although I can see that it might be nice to be explicit about overwriting an existing column (but you'd need to figure out if that explicitness is something that SQL programmers enjoy or find annoying).

In hindsight, I'm not sure I'd keep transmute() since it's now a variation of mutate() with .keep = "none". I'd suggest you take a look at that argument as well as .before/.after, as they were popular requests from users. The ability to control where the new variables go was particularly popular, although that may be partly because tibble only displays the first columns that fit on screen, so if you're adding new variables to the end you might not be able to see them to check that you've done the computation correctly. .keep = "used" is particularly nice for this use case.
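
For readers unfamiliar with the .keep argument, here is a simplified Python model of three of its values ("all", "used", "none"). It is a sketch under assumptions, not dplyr's implementation: `derive_keep` is a hypothetical name, a single row is modelled as a dict, each derivation declares the columns it uses explicitly, and grouping columns (which dplyr retains even with "none") are ignored.

```python
def derive_keep(row, derivations, keep="all"):
    """Apply derivations to one row and filter columns according to `keep`.

    row:         dict of column name -> value
    derivations: dict of new column name -> (input column names, function)
    keep:        "all", "used", or "none"
    """
    if keep not in ("all", "used", "none"):
        raise ValueError(f"unsupported keep value: {keep!r}")

    out = dict(row)
    used = set()
    for name, (inputs, fn) in derivations.items():
        out[name] = fn(*(out[col] for col in inputs))
        used.update(inputs)

    if keep == "all":
        return out

    kept = set(derivations)      # derived columns always survive
    if keep == "used":
        kept |= used             # plus the columns they were computed from
    return {col: val for col, val in out.items() if col in kept}


row = {"first_name": "ada", "salary": 100, "payroll_tax": 20}
derivations = {"gross_salary": (("salary", "payroll_tax"), lambda s, t: s + t)}

all_cols = derive_keep(row, derivations, keep="all")
used_cols = derive_keep(row, derivations, keep="used")
none_cols = derive_keep(row, derivations, keep="none")
```

In this model, "used" leaves only the inputs next to the result, which is what makes it handy for checking a computation on a wide table.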

@hadley

hadley commented Jan 26, 2022

One more thought: it's also really nice to be able to write x2 = x + 1, x3 = x2 + 1 and not have to worry about the subqueries. This is one of my personal pain points when I write SQL by hand.
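
One way a transpiler could hide those subqueries is sketched below in Python with the standard-library sqlite3 module. It assumes the simplest possible strategy, wrapping every assignment in its own nested SELECT so that later expressions can refer to earlier ones; `compile_derivations` and the table `t` are hypothetical, and a real compiler would likely merge independent assignments into one SELECT.

```python
import sqlite3


def compile_derivations(table, assignments):
    """Build nested SELECTs so each assignment can use the ones before it.

    assignments: list of (new_column_name, sql_expression) pairs, in order.
    """
    query = f"SELECT * FROM {table}"
    for i, (name, expr) in enumerate(assignments):
        # Wrap the query so far in a subquery and add one derived column.
        query = f"SELECT *, {expr} AS {name} FROM ({query}) AS step_{i}"
    return query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (10,)])

sql = compile_derivations("t", [("x2", "x + 1"), ("x3", "x2 + 1")])
rows = conn.execute(sql).fetchall()
```

The user writes two assignments in order, and the generated SQL carries the nesting that would otherwise be written by hand.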

@RCHowell

In my opinion, derive reads better than project. @max-sixty what's your opinion on grouping assignments into a projection operator to maintain the pipelined composition?

Something to consider is the argument to this operator:
A. Block - challenges if this language is declarative
B. List - list of assignment statements
C. Map - output column name to lambda with relations in scope, like arquero derive
D. Other
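
To illustrate option C, here is a small Python sketch loosely modelled on the shape of arquero's derive: a map from output column name to a function of the row. The function name `derive`, the list-of-dicts table representation, and the sample data are assumptions for illustration only. Because each function sees the row as updated so far, an existing name overwrites its column and a new name appends one.

```python
def derive(rows, new_columns):
    """Return new rows with each mapped column computed from the row so far.

    rows:        list of dicts (one dict per row)
    new_columns: dict of output column name -> function taking the row dict
    """
    result = []
    for row in rows:
        new_row = dict(row)                 # leave the input rows untouched
        for name, fn in new_columns.items():
            new_row[name] = fn(new_row)     # later entries see earlier ones
        result.append(new_row)
    return result


employees = [{"first_name": "ada", "salary": 100, "payroll_tax": 20}]

out = derive(
    employees,
    {
        "gross_salary": lambda d: d["salary"] + d["payroll_tax"],
        "first_name": lambda d: d["first_name"].upper(),
    },
)
```

Here the operator takes one argument and composes like any other pipeline step, at the cost of not distinguishing creation from overwriting.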


4 participants