Paper: pyjanitor - clean APIs for data cleaning #475

Open
wants to merge 22 commits into
base: 2019
from

@ericmjl

commented May 22, 2019

No description provided.

@deniederhut

Member

commented Jun 13, 2019

Hi @ericmjl! It looks like none of our reviewers were able to complete their reviews of your paper by the deadline, so I'll be stepping in to complete the review myself. Please expect comments from me within the next few days 🙂.

@deniederhut
Member

left a comment

Thanks for the paper submission! Pyjanitor certainly sounds like a useful library, and I do enjoy the chained method syntax.

I've left a few inline comments where I had something specific to say about the content of the paper. I think my general impression is that your submission tells the reader that pyjanitor is useful, but could be more convincing if it showed the reader exactly why. For instance, I feel it's a bit light on details about:

  1. the capabilities of the library
  2. the implementation of those capabilities
  3. the benefit of your implementation over pandas, or over another pandas-interfacing library

I look forward to seeing your revisions!

)
)
As of now, because symbolic parsing is unavailable, this fluent and declarative

@deniederhut

deniederhut Jun 14, 2019

Member

Have you looked into patsy?

@ericmjl

Author

commented Jun 14, 2019

@deniederhut, thanks for taking the time to do the review. I suspect I accidentally resolved a review point on my phone when I didn’t intend to. I’ll go hunting for it when I'm back at my computer!

@stargaser
Contributor

left a comment

@ericmjl thank you for this interesting submission. I read it before seeing @deniederhut's review comments. I agree with those and I added a few more in-line.

tools of statistical modelling and machine learning: the ``DataFrame``.
However, there are inconsistencies in the `pandas` application programming
interface (API) that, while now idiomatic due to historical use, prevent use of
expressive, fluent programming idioms that enable self-documenting data science

@stargaser

stargaser Jun 14, 2019

Contributor

Later in the paper, I find a reference to the 2005 post by Martin Fowler in which he coined the term "fluent interface". I haven't encountered this phrase as formal terminology before. It would be helpful to describe earlier in the paper what "fluent" means and why it is important. The Wikipedia page https://en.wikipedia.org/wiki/Fluent_interface gives a succinct description, "with the goal of making the readability of the source code close to that of ordinary written prose, essentially creating a domain-specific language within the interface", which makes the term more understandable.

@ericmjl

ericmjl Jun 15, 2019

Author

Yes, thanks for pointing this out, @stargaser; the current omission of this point is this paper’s main weakness, IMO. This will be addressed.
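
For readers of this thread who have not met the term, the contrast between an imperative style and a fluent (method-chained) style can be sketched with plain pandas. This is a minimal illustration, not code from the manuscript: the column names and data are made up, and only standard pandas methods (`rename`, `assign`, `dropna`) are used.

```python
import pandas as pd

# Made-up example data; the column names are assumptions for illustration.
raw = pd.DataFrame({"Sample ID": [1, 2, 3], "Value": [0.5, None, 1.5]})

# Imperative style: each step rebinds or mutates an intermediate variable.
df = raw.copy()
df.columns = [c.lower().replace(" ", "_") for c in df.columns]
df["doubled"] = df["value"] * 2
df = df.dropna(subset=["value"])

# Fluent style: the same steps form one chained expression that reads
# top to bottom, because every method returns a new DataFrame.
fluent = (
    raw
    .rename(columns=lambda c: c.lower().replace(" ", "_"))
    .assign(doubled=lambda d: d["value"] * 2)
    .dropna(subset=["value"])
)

print(fluent)
```

Both versions produce the same table; the fluent one avoids intermediate names and keeps each cleaning step on its own line.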

return df
Underneath each data cleaning function, we are free to use both the imperative
and functional APIs. What is exposed, though, is a functional and fluent API

@stargaser

stargaser Jun 14, 2019

Contributor

This is the part of the paper where I first realized that "fluent" meant something formal, especially by seeing a reference. I would like the term to be defined more explicitly, sooner in the paper (per comment in the Introduction).
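
The quoted passage (a function that is imperative inside but returns the DataFrame so it can be chained) might look roughly like the sketch below. The function names `snake_case_columns` and `drop_constant_columns` are hypothetical and are not taken from pyjanitor or the manuscript; chaining here goes through the standard `DataFrame.pipe` method rather than whatever registration mechanism the library itself uses.

```python
import pandas as pd


def snake_case_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: imperative inside, chainable outside."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    new_names = []
    for name in df.columns:  # plain imperative loop over column labels
        new_names.append(str(name).strip().lower().replace(" ", "_"))
    df.columns = new_names
    return df  # returning the DataFrame is what makes chaining possible


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step written in a functional style."""
    keep = [c for c in df.columns if df[c].nunique(dropna=False) > 1]
    return df[keep]


# Made-up example data for illustration only.
raw = pd.DataFrame(
    {"Sample ID": [1, 2], " Batch ": ["a", "a"], "Value": [0.1, 0.2]}
)

# Each step takes and returns a DataFrame, so the steps compose fluently.
cleaned = raw.pipe(snake_case_columns).pipe(drop_constant_columns)
print(list(cleaned.columns))
```

The caller never sees how each step is written internally; the only contract is "DataFrame in, DataFrame out", which is what lets the steps be chained.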
