rename docs and order #6

Open
brentp opened this issue Jun 24, 2021 · 5 comments

@brentp

brentp commented Jun 24, 2021

I have a header line that starts with '#' so I want to do:

var df = readCsv(tsv, sep='\t').rename(f{"mode" <- "#mode"})

this works, but the docs say to use ~ which does not work.

Another issue is that it changes the column ordering by adding the new column at the end (as expected using OrderedTable), but I would expect it to retain the column order but keep a new name.

I know it's early days, but just wanted to flag this as I saw it.

Thanks for the dataframe lib!

@HugoGranstrom
Member

Good catch, that is definitely a typo in the docs; it should be <- as you noted :)

> Another issue is that it changes the column ordering by adding the new column at the end (as expected using OrderedTable), but I would expect it to retain the column order but keep a new name.

It should be possible by reconstructing the OrderedTable with the replaced key(s), and the columns have reference semantics so there is no expensive copying of the data involved in recreating it.

@Vindaar
Member

Vindaar commented Jun 24, 2021

> Thanks for the dataframe lib!

You're welcome!

> I have a header line that starts with '#' so I want to do:

First of all, you can just use the header argument of readCsv:

var df = readCsv(tsv, sep='\t', header = "#")

(Note: it takes a string, but the implementation currently only works on the first char of it)

> this works, but the docs say to use ~ which does not work.

@HugoGranstrom is almost right. It's not actually a typo, but a leftover from the first, fully runtime-based data frame implementation. That one had formulas based on ~ without any f{} macro, so one could write:

let fn = "mode" ~ "#mode"

and pass such a thing to the procs. I threw out the non-scoped formulas when I rewrote the data frame implementation, because it seemed too foreign. Better to encapsulate it.

I really need to go through the datamancer code and fix up the docstrings and add runnable examples everywhere. Didn't do any announcements about a first datamancer release because the docs are still in the current state.

For reference on the old implementation. The DF + formula implementation lives here:
https://github.com/Vindaar/ggplotnim/blob/fixFormulaImpl/src/ggplotnim/dataframe/fallback/formula.nim
and here are some examples of what was possible with that.
https://github.com/Vindaar/ggplotnim/blob/fixFormulaImpl/tests/tests.nim#L207-L224

> Another issue is that it changes the column ordering by adding the new column at the end (as expected using OrderedTable), but I would expect it to retain the column order but keep a new name.

Yup. I haven't paid much attention to the order of columns so far aside from making sure the order is as initially inserted. It feels like bad style to depend on the order of columns. It should only be important not to confuse people when printing / viewing them or writing them to file.

It seems misguided to provide access to columns based on indices like pandas allows, but maybe I'm missing something.

I do agree though that the order should stay the same when renaming and mutating existing columns (for mutating, I'm not sure off the top of my head whether the order is actually kept, but it should be even now).

Again @HugoGranstrom has a good point. In principle we can reconstruct a new table and assign the columns, as they are ref objects.

Maybe the standard library could grow a replace procedure for OrderedTable though. Seems like a useful thing to have in general.
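
The rebuild approach discussed above could look roughly like the following. This is only a minimal sketch using std/tables: `renameKey` is a hypothetical helper name (it is not part of the standard library or Datamancer), and plain `seq[int]` values stand in for Datamancer's ref-based columns, so here the values are copied rather than shared.

```nim
import std/tables

# Hypothetical helper, not part of std/tables or Datamancer.
proc renameKey[K, V](t: OrderedTable[K, V], oldKey, newKey: K): OrderedTable[K, V] =
  ## Builds a new table in which `oldKey` is replaced by `newKey`,
  ## keeping the original insertion order of all entries.
  ## Assumes `newKey` is not already present in `t`.
  result = initOrderedTable[K, V]()
  for k, v in t.pairs:
    if k == oldKey:
      result[newKey] = v   # same position, new name
    else:
      result[k] = v

# Stand-in for a data frame's column table (assumption: seq[int] columns).
var cols = initOrderedTable[string, seq[int]]()
cols["#mode"] = @[1, 2]
cols["count"] = @[3, 4]
cols["depth"] = @[5, 6]

let renamed = cols.renameKey("#mode", "mode")
for k in renamed.keys:
  echo k   # keys come out in the original column order
```

With ref-typed column values, the loop would only copy references, which is why recreating the table should be cheap.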

@Vindaar
Member

Vindaar commented Jul 5, 2021

This should hopefully be addressed once #15 is merged. Feel free to provide feedback. Otherwise I'll close this issue in a couple of weeks. :)

edit: Now live here: https://scinim.github.io/Datamancer/dataframe.html

@carlosrup

Hi, I was trying to open a CSV file and got the following error:

Error: unhandled exception: /home/carlos/.nimble/pkgs/datamancer-0.2.5/datamancer/io.nim(518, 14) row + skippedLines == lineCnt - 1 Bad file. Please report an issue. [AssertionDefect]

I used the following line of code: var df = readCsv("base_Mdiciembre.csv", sep=';')

I'm not sure what to do.

@Vindaar
Member

Vindaar commented Jul 19, 2022

As already mentioned on discord/matrix, could you please provide the first few lines of the CSV file so I can reproduce the problem?

(you could have opened a new issue for this, btw)
