compatibility with tidygraph #252

Closed
gvdr opened this Issue Nov 22, 2018 · 4 comments

gvdr commented Nov 22, 2018

I'm trying to explore the behaviour of janitor functions such as clean_names() on tidygraph objects (tbl_graph). (I wrote about this previously here.)

As an example, let's create a tiny test graph with badly named variables:

library(tidygraph)
#> 
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(janitor)
test_graph <- play_erdos_renyi(10, 0.5) %>% 
  mutate(Strange_Name  = 1,
         `Other bad Name` = 2)

The following call silently does nothing (the column names are left unchanged):

test_graph %>%
  janitor::clean_names()
#> # A tbl_graph: 10 nodes and 50 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Node Data: 10 x 2 (active)
#>   Strange_Name `Other bad Name`
#>          <dbl>            <dbl>
#> 1            1                2
#> 2            1                2
#> 3            1                2
#> 4            1                2
#> 5            1                2
#> 6            1                2
#> # ... with 4 more rows
#> #
#> # Edge Data: 50 x 2
#>    from    to
#>   <int> <int>
#> 1     4     1
#> 2     6     1
#> 3    10     1
#> # ... with 47 more rows

On the other hand, and without much surprise, clean_names() works on the exported tibble:

test_graph %>%
  as_tibble() %>%
  janitor::clean_names()
#> # A tibble: 10 x 2
#>    strange_name other_bad_name
#>           <dbl>          <dbl>
#>  1            1              2
#>  2            1              2
#>  3            1              2
#>  4            1              2
#>  5            1              2
#>  6            1              2
#>  7            1              2
#>  8            1              2
#>  9            1              2
#> 10            1              2

It is not clear to me why that happens. A plain rename(), as exported from tidygraph, does the job, but that requires manual intervention:

test_graph %>%
  rename(lower_name = "Strange_Name",
         better_name = "Other bad Name")
#> # A tbl_graph: 10 nodes and 47 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Node Data: 10 x 2 (active)
#>   lower_name better_name
#>        <dbl>       <dbl>
#> 1          1           2
#> 2          1           2
#> 3          1           2
#> 4          1           2
#> 5          1           2
#> 6          1           2
#> # ... with 4 more rows
#> #
#> # Edge Data: 47 x 2
#>    from    to
#>   <int> <int>
#> 1     1    10
#> 2     4     1
#> 3     5     1
#> # ... with 44 more rows

and that is the case even with a direct call to dplyr::rename().

I am clearly missing something here. The same holds for the other janitor functions I tried.

Examples created on 2018-08-28 by the reprex package (v0.2.0).

Contributor

Tazinho commented Nov 22, 2018

This happens because janitor::clean_names() works only on data frames. There is an ongoing discussion about whether it should become a generic, with methods also implemented for other objects such as sf objects and possibly tbl_graph objects.

Until then, you can work on the names as strings instead: use janitor::make_clean_names() from the dev version of this package together with one of the rename() helpers, such as rename_all(), rename_at(), or rename_if(). For example:

test_graph %>% dplyr::rename_all(janitor::make_clean_names)
#> # A tbl_graph: 10 nodes and 46 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Node Data: 10 x 2 (active)
#>   strange_name other_bad_name
#>          <dbl>          <dbl>
#> 1            1              2
#> 2            1              2
#> 3            1              2
#> 4            1              2
#> 5            1              2
#> 6            1              2
#> # ... with 4 more rows
#> #
#> # Edge Data: 46 x 2
#>    from    to
#>   <int> <int>
#> 1     1    10
#> 2     2     1
#> 3     5     1
#> # ... with 43 more rows

Owner

sfirke commented Nov 25, 2018

Yep, this is very similar to #247, as @Tazinho notes. I'd be open to a generic method to make clean_names() work on a tbl_graph object if others think there's demand for it. Anyone who wishes could author a pull request similar to what was done in #249; it will probably be simpler than that, as the S3 generic has now been created.
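For readers unfamiliar with S3, here is a rough base-R sketch of the generic-plus-method idea discussed above. Everything in it is hypothetical and invented for illustration: tidy_names(), tidy_strings(), and the toy_graph class are stand-ins, not janitor's or tidygraph's actual code, and the string cleaner is far cruder than janitor::make_clean_names(). It only shows how one generic can dispatch to a data frame method and to a method for a graph-like class, and how a default method can fail loudly instead of silently returning the input unchanged.

```r
# Hypothetical generic; deliberately NOT named clean_names() to avoid
# any confusion with janitor's real function.
tidy_names <- function(x, ...) UseMethod("tidy_names")

# Crude string-level cleaner (an assumption for this sketch only):
# collapse runs of non-alphanumeric characters to "_" and lower-case.
tidy_strings <- function(s) {
  tolower(gsub("[^A-Za-z0-9]+", "_", s))
}

# Method for plain data frames: rewrite the column names.
tidy_names.data.frame <- function(x, ...) {
  names(x) <- tidy_strings(names(x))
  x
}

# Method for a toy graph class (a list with a node table and an edge table).
# It delegates to the data frame method for the node table.
tidy_names.toy_graph <- function(x, ...) {
  x$nodes <- tidy_names(x$nodes)
  x
}

# Default method: raise an error rather than silently doing nothing.
tidy_names.default <- function(x, ...) {
  stop("no tidy_names() method for class '", class(x)[1], "'", call. = FALSE)
}

# Build a toy graph with the same badly named columns as in the issue.
nodes <- data.frame(Strange_Name = 1, `Other bad Name` = 2, check.names = FALSE)
edges <- data.frame(from = 1L, to = 1L)
g <- structure(list(nodes = nodes, edges = edges), class = "toy_graph")

names(tidy_names(g)$nodes)
#> [1] "strange_name"   "other_bad_name"
```

Once the generic exists, supporting a new class only takes one more method, which is why adding tbl_graph support should be a small change.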

Contributor

JosiahParry commented Nov 25, 2018

Hey, this was a super simple fix. I just created a PR (#254) that adds a method that works on tbl_graph objects.

Author

gvdr commented Dec 5, 2018

Thank you all!

gvdr closed this Dec 5, 2018
