feat: Add bind_rows and bind_cols #411

Techzune · 2022-03-31T15:03:56Z

A very rudimentary implementation of the dplyr equivalent.
Similar to join, when piping you must specify all involved dataframes.

e.g. `one >> bind_rows(_, two)` or `bind_rows(one, two)`
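At its simplest, binding rows means stacking data frames vertically, matching columns by name. The sketch below shows that in plain pandas, assuming a thin wrapper over `pd.concat`. The name `simple_bind_rows` is hypothetical, and this is not the code in this PR.

```python
import pandas as pd


def simple_bind_rows(*dfs):
    """Stack DataFrames vertically, like a bare-bones dplyr::bind_rows.

    Columns are matched by name. A column missing from one input is
    filled with NaN. The inputs' row indexes are discarded, so the
    result is numbered 0..n-1.
    """
    return pd.concat(dfs, ignore_index=True)


one = pd.DataFrame({"a": [1, 2]})
two = pd.DataFrame({"a": [3], "b": ["z"]})
result = simple_bind_rows(one, two)
print(result)
```

Passing `ignore_index=True` is an assumption here. It drops the inputs' indexes, which seems closer to dplyr's row-oriented behavior than keeping duplicate index labels such as two rows labelled `0`.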

machow · 2022-04-01T14:47:32Z

Hey--thanks for your PR! I hope it's okay, I added a couple commits for...

- Allowing `from siuba import bind_rows`
- Basic tests. I marked the dplyr behaviors that seem useful with `pytest.mark.xfail`!
- A page in the docs, largely translated from the dplyr bind docs

Any chance you are interested in trying to implement the last couple pieces? :) It seems like there are just a couple dplyr behaviors left to get most of its bind_rows functionality!

I've listed out some of their key features below, and am happy to help with whatever is useful!

bind_rows

- `_id` argument to create a column indicating which dataframe each row came from
- support for dictionaries
- ~~support for lists of DataFrames~~ (this seems unnecessary in Python, which can unpack a list with `*`, e.g. `bind_rows(*[df1, df2])`)
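The first two features can be sketched on top of `pd.concat`. The names `bind_rows_sketch` and `_as_frame` are hypothetical, and two behaviors are assumptions rather than dplyr's exact semantics. First, a dict whose values are scalars becomes a single row, while list or tuple values become columns. Second, the `_id` column stores each input's 1-based position. dplyr's `.id` uses input names when available and falls back to positions, but this sketch only handles positions.

```python
import pandas as pd


def _as_frame(obj):
    """Coerce one input to a DataFrame (hypothetical helper)."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, dict):
        # Assumption: scalar values become a single row, lists/tuples become columns.
        return pd.DataFrame(
            {k: list(v) if isinstance(v, (list, tuple)) else [v] for k, v in obj.items()}
        )
    raise TypeError(f"cannot bind object of type {type(obj).__name__}")


def bind_rows_sketch(*args, _id=None):
    """Stack DataFrames/dicts row-wise, optionally recording each row's source.

    When _id is given, a column with that name holds the 1-based position
    of the input each row came from.
    """
    frames = [_as_frame(a) for a in args]
    if _id is not None:
        # Tag every row of input i with the value i.
        frames = [f.assign(**{_id: i}) for i, f in enumerate(frames, start=1)]
    out = pd.concat(frames, ignore_index=True)
    if _id is not None:
        # Layout choice: move the id column to the front.
        out = out[[_id] + [c for c in out.columns if c != _id]]
    return out


one = pd.DataFrame({"x": [1, 2]})
two = {"x": 3}
print(bind_rows_sketch(one, two, _id="source"))

# A list of frames can simply be unpacked with *, so no list support is needed.
frames = [one, one]
print(bind_rows_sketch(*frames))
```

Because of the `*` unpacking shown at the end, the sketch only needs to accept positional arguments. It never inspects a list argument, and passing one raises `TypeError`.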

bind_cols

I think that `pd.concat([...], axis=1)` aligns on the index, while `bind_cols` aligns on the row number (see https://stackoverflow.com/a/48253339/1144523).
Overall, I think `bind_cols` is a lot less important than `bind_rows`, so focusing on `bind_rows` seems like it'd be a huge boost for siuba!
- bind_rows(): accept lists? tidyverse/dplyr#1104
- Consider bind_cols() alternatives tidyverse/dplyr#5063

```python
import pandas as pd

df1 = pd.DataFrame({'x': [0, 1]}, index=[0, 1])
df2 = pd.DataFrame({'y': [1, 2]}, index=[1, 2])

# note that ignore_index=True doesn't help either: with axis=1 it only
# relabels the columns, and rows are still aligned on the index
# pd.concat([df1, df2], axis=1, ignore_index=True)

pd.concat([df1, df2], axis=1)
```

```
     x    y
0  0.0  NaN
1  1.0  1.0
2  NaN  2.0
```
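Aligning on row number instead of the index can be sketched by dropping each input's index before concatenating. The name `bind_cols_sketch` is hypothetical and this is not the PR's implementation. The sketch raises an error when row counts differ and does not attempt any recycling. It also does not handle duplicate column names.

```python
import pandas as pd


def bind_cols_sketch(*dfs):
    """Combine DataFrames side by side, matching rows by position.

    Each input's index is dropped before concatenating, so row i of every
    input ends up in row i of the result, whatever its index label was.
    """
    if len({len(df) for df in dfs}) > 1:
        raise ValueError("all inputs must have the same number of rows")
    return pd.concat([df.reset_index(drop=True) for df in dfs], axis=1)


df1 = pd.DataFrame({"x": [0, 1]}, index=[0, 1])
df2 = pd.DataFrame({"y": [1, 2]}, index=[1, 2])
print(bind_cols_sketch(df1, df2))
```

Unlike the index-aligned `pd.concat` call above, no rows are padded with `NaN`, so `x` and `y` keep their integer dtype.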

machow

Thanks for submitting this! Added some feedback in a comment. I'm still feeling out which dplyr bind_rows behaviors are most useful, and would love to get your feedback on which pieces are most important to have.

Techzune · 2022-04-09T13:21:42Z

Ignore those previous bind_rows commits. It was a long week, and I didn't read the docs you added! 😅
Let me implement that real quick.

Techzune · 2022-04-09T14:06:00Z

✨ there we go!

machow · 2022-04-10T22:40:38Z

Ah, thanks a ton! I'm running the tests, and can take a closer look tonight or tomorrow!

I noticed there were a few places (like mutate) where the variable result was changed to df_result, was that to make it easier to understand at a glance?

Techzune · 2022-04-10T22:45:05Z

Gah crud! That was a mistake on my end. I renamed my "result" variable, and I guess VS Code said "oooh rename this one!" In my data science work I tend to name variables after what they are: for example, a dataframe always starts with df_ and a list starts with list_. However, I'm not always perfect at it. Anyway, I'm sure that line change should be omitted from the commit.

Added bind_rows and bind_cols

280132c

Techzune requested a review from machow as a code owner March 31, 2022 15:03

machow added 4 commits April 1, 2022 10:24

fix(pandas): allow toplevel importing of bind funcs

f99806c

data: add starwars dataset

6fc210f

docs: add stubbed out bind docs

46ae4c8

tests: add stubbed out bind tests

d500eaf

machow requested changes Apr 1, 2022

machow mentioned this pull request Apr 3, 2022

Does siuba have functions to bind rows and columns ? #348

Open

Techzune added 4 commits April 8, 2022 15:51

Merge branch 'machow:main' into main

fe24b50

added support for _id column and dict

899af0d

align bind_cols on row number instead of index

37ce740

removed inplace (oops)

c08ce47

support dplyr definition of bind_rows

dfa0758

Techzune added 2 commits April 9, 2022 09:19

Code cleanup

3dc7fb3

update docstring, needs examples

9c6a253

Techzune requested a review from machow April 9, 2022 17:43

Techzune added 2 commits April 10, 2022 20:04

support dplyr definition of bind_rows

81225a2

Merge remote-tracking branch 'origin/main'

575bffe

# Conflicts: # siuba/dply/verbs.py