Function to show the translation to pandas #116

Closed
GitHunter0 opened this issue May 4, 2022 · 9 comments

@GitHunter0
Collaborator

GitHunter0 commented May 4, 2022

Hey @pwwang, does datar have a function that displays how its commands translate to pandas?

I believe it would be a very useful addition for many reasons, one being that it would make it much easier for datar and pandas users to work together on a project.

In R, an analogous function would be dplyr::show_query(), which shows the translation to SQL, like in the example below:

df <- dbplyr::lazy_frame(mtcars)
df |> dplyr::select(mpg) |> dplyr::show_query()
#> <SQL>
#> SELECT `mpg`
#> FROM `df`

Thank you

@pwwang
Owner

pwwang commented May 4, 2022

No, not now. We actually had a discussion before: #48 (comment)
Need some investigation on implementation, but it's definitely in the plan.

@GitHunter0
Collaborator Author

@pwwang, I think that discussion is about a different issue: porting dbplyr to datar.

What I'm proposing here is a different matter, unrelated to SQL or dbplyr. It is just a function that shows how datar commands translate to pandas commands. The translation already happens in the backend; show_query() would simply display it to the user.

Did you get what I mean?

@pwwang
Owner

pwwang commented May 4, 2022

Nope.

Could you define "datar commands" and "pandas commands"?
From the example you gave in the main post, it looked like a dbplyr function.
Could you give an example without dbplyr, if it is what you meant?

@GitHunter0
Collaborator Author

GitHunter0 commented May 4, 2022

@pwwang, basically, the show_query() function would display the datar syntax converted into the equivalent pandas syntax.
Here's a simple example:

import datar.all as d
from datar.all import f
import pandas as pd
# Note: show_query() is the proposed function; it does not exist in datar yet.
d.tibble(x=range(1,5), y=range(2,6)) >> \
    d.filter(f.x.isin([2,4])) >> \
    d.show_query()
# Would return something like:
'''
df = pd.DataFrame({'x': range(1,5), 'y': range(2,6)})
df[df['x'].isin([2,4])]
'''

Please ask if it is still not clear what I mean.
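
As a sanity check on the pandas side of that example, here is a minimal sketch that runs the two pandas lines directly. It only shows what the hand-written pandas equivalent produces; it says nothing about what datar actually executes internally.

```python
import pandas as pd

# The pandas code that the proposed show_query() would print for the example.
df = pd.DataFrame({"x": list(range(1, 5)), "y": list(range(2, 6))})

# Keep only the rows whose x value is 2 or 4.
result = df[df["x"].isin([2, 4])]
print(result)
```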

@pwwang
Owner

pwwang commented May 4, 2022

I see what you mean now. But I am not sure if this is feasible since the operations are not 1:1 or 1:N mappings. It could be way more complicated than you think. Quite a lot of verbs/functions are not simply composed of just a couple of pandas operations.

For example:

from datar.all import glimpse
from datar.datasets import iris

iris >> glimpse()

# Rows: 150
# Columns: 5
# . Sepal_Length <float64> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0,…
# . Sepal_Width  <float64> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4,…
# . Petal_Length <float64> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5,…
# . Petal_Width  <float64> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2,…
# . Species      <object>  'setosa', 'setosa', 'setosa', 'setosa',…

It's almost "impossible" to remap it to simple and clean pandas commands.
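
To illustrate the point, here is a rough sketch of a glimpse-style summary written with plain pandas calls. The helper name `glimpse_like` and its `max_values` parameter are made up for this illustration, and this is not how datar implements glimpse(). Even this simplified version needs a loop, width bookkeeping, and string formatting rather than a single pandas command.

```python
import pandas as pd


def glimpse_like(df: pd.DataFrame, max_values: int = 5) -> str:
    """Return a rough, glimpse-style summary (hypothetical helper, not datar's code)."""
    lines = [f"Rows: {df.shape[0]}", f"Columns: {df.shape[1]}"]
    # Pad column names to a common width so the dtype column lines up.
    name_width = max((len(str(col)) for col in df.columns), default=0)
    for col in df.columns:
        dtype = f"<{df[col].dtype}>"
        # Preview only the first few values of each column.
        preview = ", ".join(repr(v) for v in df[col].head(max_values).tolist())
        lines.append(f". {str(col):<{name_width}} {dtype} {preview}")
    return "\n".join(lines)


demo = pd.DataFrame(
    {"Sepal_Length": [5.1, 4.9, 4.7], "Sepal_Width": [3.5, 3.0, 3.2]}
)
summary = glimpse_like(demo)
print(summary)
```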

@GitHunter0
Collaborator Author

For sure @pwwang, I'm aware of the complexity and that a one-to-one mapping is impossible.

I was not expecting to consistently get clean pandas commands, but rather a look at which operations are being executed in the backend (pandas or not).
Using the example you gave, glimpse() does not have a pandas equivalent, but a sequence of Python operations was executed to reach the result. show_query() would display that sequence to the user, even if it is complex.

It is just an idea for seeing what datar does under the hood, which I think would be helpful. But feel free to close this issue if you believe it's not worth the effort.

@pwwang
Owner

pwwang commented May 4, 2022

It looks like you want to reach the implementation of how datar is wrapping the operations. I think the best way is to take a look at datar's source code of that verb/function.
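
For reading an implementation without leaving the Python session, the standard library's inspect module can print a function's source. The sketch below uses a stdlib function as the target; whether it works unchanged on a datar verb is an assumption, since verbs may be wrapper objects that need unwrapping first. `show_source` is a hypothetical helper name.

```python
import inspect
import textwrap


def show_source(func) -> str:
    """Return the source code of a function (hypothetical helper)."""
    # Follow any __wrapped__ chain left by functools.wraps-style decorators.
    target = inspect.unwrap(func)
    return inspect.getsource(target)


# Demonstrated on a standard-library function, not on a datar verb.
source = show_source(textwrap.shorten)
print(source)
```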

I wouldn't say it's really impossible to provide the source code of the implementation, but making it runnable in a raw pandas/Python setup, without attaching datar's configuration and other preparations, is a completely different question. If we don't insist on making it runnable (which frees us from attaching that configuration and preparation), I am not sure how abstract the pseudo-code should be. We would definitely have to find a rule for it, as we can't write pseudo-code for those verbs/functions manually, one by one.

I am closing it for now, but we can keep an eye on this for sure.

pwwang closed this as completed May 4, 2022
@GitHunter0
Collaborator Author

GitHunter0 commented May 31, 2022

@pwwang, just for the sake of completeness, I will leave below an example of an R package that shows the translation from dplyr to data.table syntax.
Of course, it is a different case from datar and pandas, but maybe it can serve as a reference for future work.

library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

mtcars2 <- lazy_dt(mtcars)

mtcars2 %>% 
  filter(wt < 5) %>% 
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>% 
  summarise(l100k = mean(l100k))
#> Source: local data table [3 x 2]
#> Call:   `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)), keyby = .(cyl)]

It shows that the dplyr code above is equivalent to the data.table code

`_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)), keyby = .(cyl)]
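
A dtplyr-style lazy approach could be sketched in Python roughly as follows: instead of executing each verb, a recorder object stores the pandas code it would run, and show_query() joins the recorded steps. Everything here (`LazyPipeline`, its methods, and the string-based expressions) is hypothetical and only covers two verbs; it is not datar's API or design.

```python
from dataclasses import dataclass, field


@dataclass
class LazyPipeline:
    """Hypothetical recorder: stores pandas code strings instead of running them."""

    source: str
    steps: list = field(default_factory=list)

    def filter(self, condition: str) -> "LazyPipeline":
        # Record a row selection as a callable passed to .loc.
        step = f".loc[lambda d: {condition}]"
        return LazyPipeline(self.source, self.steps + [step])

    def mutate(self, **exprs: str) -> "LazyPipeline":
        # Record new columns as keyword arguments to .assign.
        args = ", ".join(f"{name}=lambda d: {expr}" for name, expr in exprs.items())
        return LazyPipeline(self.source, self.steps + [f".assign({args})"])

    def show_query(self) -> str:
        # Join the recorded steps into one pandas method chain.
        return self.source + "".join(self.steps)


query = (
    LazyPipeline("df")
    .filter("d['wt'] < 5")
    .mutate(l100k="235.21 / d['mpg']")
    .show_query()
)
print(query)
```

Each verb needs its own translation rule in this design, so it does not remove the need for a general rule covering complex verbs.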

@pwwang
Owner

pwwang commented May 31, 2022

Good mention.
