This repository has been archived by the owner on Jan 28, 2023. It is now read-only.

Nice to have toMap function in DataFrame #86

Closed
alshan opened this issue Jun 29, 2020 · 8 comments

@alshan

alshan commented Jun 29, 2020

This code works for me:

```kotlin
val mpg_df = DataFrame.readCSV("https://jetbrains.bintray.com/lets-plot/mpg.csv")

val dat = (mpg_df.names.filter {it.isNotBlank()}.map { Pair(it, mpg_df.get(it).values())}).toMap()
```

but it would be nice to just use:

```kotlin
mpg_df.toMap()
```
@holgerbrandl
Owner

Thanks for sharing. You could define an extension function to support this:

```kotlin
fun DataFrame.asMap() = names.filter { it.isNotBlank()}.map { Pair(it, get(it).values())}.toMap()
```
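To try the same conversion without krangl, here is a minimal sketch using Kotlin's standard `associateWith`. The `Column` and `Table` classes and the `toColumnMap` function are hypothetical stand-ins that only mimic the `names`, `get(name)` and `values()` calls used above; they are not krangl's API. The `dropBlankNames` flag makes the `isNotBlank()` filter optional, so an empty column name can be either dropped or preserved.

```kotlin
// Hypothetical stand-in for a krangl column: a name plus its values.
data class Column(val name: String, val values: List<Any?>)

// Hypothetical stand-in for a krangl DataFrame: an ordered list of columns.
class Table(private val cols: List<Column>) {
    val names: List<String> get() = cols.map { it.name }

    // Looks up a column by name; throws NoSuchElementException if absent.
    operator fun get(name: String): Column = cols.first { it.name == name }
}

// Builds a column-name -> values map, one entry per column.
// associateWith preserves the order of `names` in the returned map.
// With dropBlankNames = true, columns with blank names are skipped,
// mirroring the isNotBlank() filter in the snippets above.
fun Table.toColumnMap(dropBlankNames: Boolean = false): Map<String, List<Any?>> =
    names
        .filter { !dropBlankNames || it.isNotBlank() }
        .associateWith { get(it).values }
```

For a table whose columns are named `""`, `"displ"` and `"hwy"`, `toColumnMap()` keeps all three keys, while `toColumnMap(dropBlankNames = true)` returns only `"displ"` and `"hwy"`.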

krangl's API clearly needs to be extended to cover more use cases, but unless you can elaborate on the specific use case, I think such an extension does not need to be part of the core API.

@alshan
Author

alshan commented Jul 1, 2020

Here is my use-case: https://github.com/JetBrains/lets-plot-kotlin/blob/master/docs/examples/jupyter-notebooks/geom_smooth.ipynb

I read a CSV file into a krangl DataFrame and then have to convert it to a regular Map to pass it to Lets-Plot as the data parameter.
Lets-Plot does not depend on krangl, but even if it did, creating such an extension would look like a leaky abstraction to me.

IMO, the most reasonable place for the toMap extension would be krangl itself.

@alshan
Author

alshan commented Jul 1, 2020

Another use case, with the pain point right here:

https://jaxenter.com/kotlin-jupyter-notebook-172202.html

@holgerbrandl
Owner

Great examples. I took the liberty of disabling the isNotBlank filter. IMHO, either krangl should not support empty column names at all (similar to dplyr), or an empty column name (only one, since names must be unique by design) is legitimate and should be preserved by toMap. But I guess that would be a different ticket.

@alshan
Copy link
Author

alshan commented Jul 2, 2020

My understanding was that the unnamed column was the index. Since a map has no index, I dropped it.

@holgerbrandl
Owner

There is no equivalent to pandas indices in krangl. I have always found the index concept confusing compared to dplyr, which works just as well (if not better) without confronting the user with a similar index concept.

So krangl already enforces unique column names (following dplyr's lead), but it currently allows an empty string as a column name. However, I can't imagine any use case for that, so I'd rather enforce non-null, non-empty column names.
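A check along those lines could look roughly like the following sketch. `requireValidColumnNames` is a hypothetical helper, not part of krangl. With Kotlin's non-nullable `String` type the not-null part is already guaranteed by the compiler, so the function only rejects blank names (empty or whitespace-only, consistent with the `isNotBlank()` filter used earlier in the thread) and duplicates.

```kotlin
// Hypothetical helper, not part of krangl: validates a list of column names.
// Throws IllegalArgumentException if any name is blank or appears more than once.
fun requireValidColumnNames(names: List<String>) {
    // Non-null is enforced by the String type; only blank names need checking.
    val blankPositions = names.indices.filter { names[it].isBlank() }
    require(blankPositions.isEmpty()) {
        "Column names must not be empty or blank (positions: $blankPositions)"
    }

    // Count occurrences of each name and keep those seen more than once.
    val duplicates = names.groupingBy { it }.eachCount().filterValues { it > 1 }.keys
    require(duplicates.isEmpty()) { "Duplicate column names: $duplicates" }
}
```

Calling it with `listOf("displ", "hwy")` returns normally, while `listOf("displ", "")` or `listOf("hwy", "hwy")` fails fast with an `IllegalArgumentException`.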

@holgerbrandl
Owner

Btw, how did you write https://jetbrains.bintray.com/lets-plot/mpg.csv? I've noticed that the latest version of commons-csv, the CSV parsing library in use, cannot handle it because of https://issues.apache.org/jira/browse/CSV-257, and I'd like to comment on that issue by pointing to your example file.

@alshan
Copy link
Author

alshan commented Jul 2, 2020

I honestly can't recall now. Please feel free to point to it. I believe we checked the license before using it.
