dataframe to tibble #55

Mrostgaard · 2021-09-14T16:48:43Z

Hello

I'm trying to read a CSV file with pandas and pass it to a tibble so I can work with it in datar. I couldn't find any documentation on this.

What I want to do is:

Read a CSV file (currently using pandas for this) and convert it to a dataframe
Take that dataframe and do:

df >> group_by(f.col1, f.col2) >> mutate(newCol1 = min(f.col-value), newCol2 = max(f.col-value))

When I try to do it with a pandas dataframe, I get this error:

/python/lib/python3.8/site-packages/pipda/utils.py:161: UserWarning: Failed to fetch the node calling the function, call it with the original function.
  warnings.warn(
NotImplementedError: 'group_by' is not registered for type: <class 'pipda.symbolic.DirectRefAttr'>.

followed by a traceback of the most recent calls.

How should I properly load my CSV file so that it works with datar?
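For reference, the grouped mutate in the question keeps every row and adds, for each row, the minimum and maximum of the value column within that row's (col1, col2) group. The dependency-free sketch below uses only Python's standard library to show the result the datar pipeline is expected to produce; it is not how datar computes it. The sample data and the helper `grouped_min_max()` are hypothetical, and the value column is assumed to be named `value` (the question's `f.col-value` is ambiguous).

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; in practice this would be the CSV file on disk.
CSV_TEXT = """col1,col2,value
a,x,3
a,x,7
a,y,5
b,x,2
b,x,9
"""

def grouped_min_max(rows, keys, value_col):
    """Return a copy of every row with newCol1/newCol2 set to the min/max of
    value_col within the row's group (grouping by the `keys` columns)."""
    groups = defaultdict(list)
    # First pass: collect the numeric values of each group.
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(float(row[value_col]))
    # Second pass: broadcast the per-group min/max back onto every row.
    out = []
    for row in rows:
        vals = groups[tuple(row[k] for k in keys)]
        out.append({**row, "newCol1": min(vals), "newCol2": max(vals)})
    return out

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
result = grouped_min_max(rows, ["col1", "col2"], "value")
for r in result:
    print(r["col1"], r["col2"], r["value"], r["newCol1"], r["newCol2"])
```

Every row of a group receives the same newCol1/newCol2, which is what distinguishes group_by() >> mutate() from group_by() >> summarise(), where each group collapses to a single row.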


pwwang · 2021-09-14T16:50:02Z

Are you running in a raw Python REPL?

Mrostgaard · 2021-09-15T07:51:42Z

I'm running in Databricks on Azure.

pwwang · 2021-09-15T08:14:00Z

Related: #45, #54

datar relies on the source code to detect the AST node of each call, so that it knows whether a verb is being called with piping syntax.
The AST node cannot be detected at runtime in Databricks notebooks or in the raw Python REPL, since the source code is not available there.
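The detection idea can be illustrated with the standard-library ast module. This is a simplified sketch with a hypothetical helper `call_style()`, not pipda's actual implementation: given the source text of a statement, it reports whether a verb call is the right operand of `>>` (piping) or an ordinary call. Without source text there is nothing to parse, so a check of this kind cannot be made.

```python
import ast

def call_style(source: str, verb_name: str) -> str:
    """Classify how `verb_name` is called in `source`: 'piping' if the call is
    the right operand of `>>`, 'regular' otherwise (simplified illustration)."""
    tree = ast.parse(source)
    piped_calls = set()
    # Record every call node that sits on the right-hand side of a `>>`.
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.RShift):
            if isinstance(node.right, ast.Call):
                piped_calls.add(id(node.right))
    # Find the first call to the verb and check whether it was piped into.
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == verb_name):
            return "piping" if id(node) in piped_calls else "regular"
    raise ValueError(f"no call to {verb_name!r} found")

print(call_style("df >> group_by(f.col1, f.col2)", "group_by"))    # piping
print(call_style("group_by(df, f.col1, f.col2)", "group_by"))      # regular
print(call_style("mutate(group_by(df, f.col1), x=1)", "group_by"))  # regular
```

The last case shows why nested regular calls are unambiguous once the source is available: group_by receives the data as its first argument rather than through `>>`.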

You have two solutions in such a case:

Use regular calling:

mutate(group_by(df, f.col1, f.col2), newCol1 = min(f.col-value), newCol2 = max(f.col-value))

Use "all piping" mode:

from pipda import options
options.assume_all_piping = True

# imports and data loading
df >> group_by(f.col1, f.col2) >> mutate(newCol1 = min(f.col-value), newCol2 = max(f.col-value))

In this "blind" environment, regular calling and piping calling are mutually exclusive. This means that in "all piping" mode you even have to call df >> nrow() instead of nrow(df), since nrow is registered as a verb. However, min and max in the example above are fine, because they are registered as functions rather than verbs.
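A toy model in plain Python (all names here, including `Deferred`, `verb` and the `ASSUME_ALL_PIPING` flag, are hypothetical and not pipda's real code) shows why the two calling styles become mutually exclusive once the calling node cannot be inspected: a verb can no longer tell nrow(df) from df >> nrow(), so it has to commit to one interpretation. With the flag switched on, every verb call returns a deferred object that waits for data to arrive through `>>`, so nrow(df) no longer returns a number, while plain functions such as min and max are untouched.

```python
import functools

class Deferred:
    """A verb call waiting for its data; evaluated when data arrives via `>>`."""
    def __init__(self, func, args, kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __rrshift__(self, data):
        # Python falls back to this when the left operand (e.g. a list)
        # does not implement `>>` itself: `data >> deferred`.
        return self.func(data, *self.args, **self.kwargs)

    def __repr__(self):
        return f"Deferred(func={self.func.__name__})"

ASSUME_ALL_PIPING = True  # hypothetical stand-in for the "all piping" option

def verb(func):
    """Register `func` as a verb whose first argument is the data."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if ASSUME_ALL_PIPING:
            # No AST node to inspect: treat every call as piping and keep
            # all given arguments, expecting the data to come via `>>`.
            return Deferred(func, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

@verb
def nrow(data):
    return len(data)

rows = [{"col1": "a"}, {"col1": "b"}, {"col1": "a"}]
print(rows >> nrow())  # 3: the piping style works
print(nrow(rows))      # Deferred(func=nrow): the regular style is not evaluated
```

Setting the flag to False restores regular calls, but then rows >> nrow() fails with a TypeError, because nrow() runs immediately without any data. A single global switch cannot serve both styles.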

Mrostgaard · 2021-09-15T08:55:14Z

It no longer fails on group_by, so thanks for that, and amazing response time!
Option one doesn't seem to do quite what I want. After running the mutate function, df is now of type:
Verb(func=mutate, dataarg=True)
and it isn't mutated. When I run
print(df[newCol1])
it raises NameError: name 'newCol1' is not defined

Option two just fails with:
ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements

This might be a problem with my implementation rather than with datar, though?

Is there anything I need to do to turn the Verb back into a dataframe, or something similar?

pwwang · 2021-09-15T15:16:04Z

Could you provide a minimal reproducible example (code and data)?

Mrostgaard · 2021-09-15T15:24:04Z

Sure, I will try.

Mrostgaard · 2021-09-15T17:38:33Z

I have looked into it, and it is definitely the fault of my own merging. It works fine with smaller inputs, so the mistake is somewhere on my side; this is not the library's fault.

Thank you for a great library!

* 📝 Add documentation for the "blind" environments (#45, #54, #55)
* 🩹 Fix trimws not importable from datar.all/datar.base
* ✨ Make as_date() return pd datetime types; Add as_pd_date() as an alias of pd.to_datetime() (#56)
* 🔖 0.5.1
* 🚨 Fix linting
* 👷 Deploy the docs on dev branch as well
* 💚 Fix docs deploy in CI

pwwang added the "raw python repl" label (Issues with piping syntax in raw python REPL) on Sep 15, 2021

Mrostgaard closed this as completed Sep 15, 2021

pwwang added a commit that referenced this issue Sep 16, 2021

📝 Add documentation for the "blind" environments (#45, #54, #55)

cbf5e3b

pwwang mentioned this issue Sep 16, 2021

0.5.1 #59

Merged

sthagen mentioned this issue Sep 17, 2021

0.5.1 (#59) sthagen/pwwang-datar#1

Merged

pwwang mentioned this issue Feb 26, 2024

PipeableCallCheckWarning: Failed to detect AST node calling xxx, assuming a normal call #206

Open
