Add log names and types #397
One solution might be to make Something like
That could easily be extended to include |
We could even extend this more such that any arbitrary function that takes a df and produces a string could be used. For example if I want to count the number of
|
One thing to be mindful of: pandas just had a new release and I think this is causing some of the old tests to fail. Will have a look today but it's good to be aware. |
It turns out there's one failing test due to numpy, it shouldn't influence this PR. |
I was working in the direction indicated by @pim-hoeven, but I didn't find a trivial way to create parametrized decorators. I finally went with the first answer to this question on Stack Overflow. I've renamed the Also, the tests are failing, but they are also failing on other PRs and I don't understand the reason. |
The tests were failing because of a breaking |
I don't know how @koaning feels about this, so please don't implement this before his approval, but I think it would be really nice to have some standard options that we can switch on or off as arguments and an extra option where the user can supply their own logging function. This would keep usage simple and clean (we only need a single decorator per function).
|
I'd like to get @MBrouns in on this one but when I think about one of the big life lessons of api design:
Then I might propose a slight edit (again, would appreciate @MBrouns' opinion here). Maybe we can generalize into two functions.
def log_step(func, *, time_taken=True, shape=True, names=False, dtypes=False, level=logging.INFO):
    ...
def log_step_custom(func, extra, level=logging.INFO):
    ...
I'm assuming that most people won't need the custom part. And if they need it, it most likely Another part of the thinking here is that |
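A minimal sketch of how the proposed `log_step` signature could be implemented, offered as an illustration and not as the code that was merged. It assumes the decorated step returns an object with `shape`, `columns` and `dtypes` attributes (as a pandas DataFrame does). The `func=None` default, which lets the decorator be used both bare and with arguments, is an assumption added here; the proposal above lists `func` as a required positional argument.

```python
import functools
import logging
import time


def log_step(func=None, *, time_taken=True, shape=True, names=False,
             dtypes=False, level=logging.INFO):
    """Log selected facts about the object returned by a pipeline step."""
    if func is None:
        # Used with arguments, e.g. @log_step(names=True): return a decorator
        # that already carries the chosen options.
        return functools.partial(log_step, time_taken=time_taken, shape=shape,
                                 names=names, dtypes=dtypes, level=level)

    logger = logging.getLogger(func.__module__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start

        # Build the log line from the options that are switched on.
        parts = [func.__name__]
        if time_taken:
            parts.append(f"time={elapsed:.3f}s")
        if shape:
            parts.append(f"shape={result.shape}")
        if names:
            parts.append(f"names={list(result.columns)}")
        if dtypes:
            parts.append(f"dtypes={dict(result.dtypes)}")
        logger.log(level, " ".join(parts))
        return result

    return wrapper
```

With this shape, `@log_step` keeps the default behaviour (time and shape), while `@log_step(names=True, dtypes=True)` switches on the extra fields, so a single decorator per function is enough.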
@MBrouns (who I think is now back from holiday) got an opinion on this? After mulling it over I think this might be what is best;
def log_step(func, *, time_taken=True, shape=True, names=False, dtypes=False, level=logging.INFO):
    ...
def log_step_custom(func, level=logging.INFO, **kwargs):
    ...
The idea with
@log_step_custom(n_user=lambda d: d['uid'].nunique(), n_sess=lambda d: d['session'].nunique())
def remove_outliers(dataf, max_days_per_user=100, max_rows_per_session=100):
    ...
This way you also have the name (the key) as well as what to log (the lambda function). You can then have a logline like;
|
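The keyword-based `log_step_custom` idea could be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: it is written as a decorator factory (so it matches the `@log_step_custom(n_user=...)` usage and does not take `func` directly), the parameter name `extractors` is hypothetical, and the example operates on a list of dicts so that it runs without pandas. With a DataFrame the lambdas would be the ones from the comment above, such as `lambda d: d['uid'].nunique()`.

```python
import functools
import logging
from collections import Counter


def log_step_custom(level=logging.INFO, **extractors):
    """Each keyword maps a label to a function applied to the step's result."""
    def decorator(func):
        logger = logging.getLogger(func.__module__)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # The key becomes the label, the callable produces the logged value.
            parts = [f"{name}={extract(result)}"
                     for name, extract in extractors.items()]
            logger.log(level, "%s %s", func.__name__, " ".join(parts))
            return result

        return wrapper
    return decorator


logging.basicConfig(level=logging.INFO)


@log_step_custom(n_user=lambda rows: len({r["uid"] for r in rows}),
                 n_sess=lambda rows: len({r["session"] for r in rows}))
def remove_outliers(rows, max_rows_per_session=100):
    # Stand-in step: drop every session that has too many rows.
    counts = Counter(r["session"] for r in rows)
    return [r for r in rows if counts[r["session"]] <= max_rows_per_session]
```

One consequence of taking labels as keyword arguments is that `level` is reserved and cannot be used as a label.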
It's been a while (I got distracted) but I just want to check: @david26694, are you still interested in working on this feature? I have a bit more bandwidth now and would like to get this PR in :) |
Hi, sorry for the delay, I'm coming back from holidays. I'm a bit busier now, so if you want to get it in soon it might be better for somebody else to work on it. |
@koaning I would like to work on this coming Wednesday or Thursday |
@pim-hoeven cool! Is it possible for you to start a new branch from the branch made by @david26694? There's been some appreciated effort on his part, and that way his name also makes it to the contributor list. |
aw, thanks for that! |
This got fixed in another PR. |
I feel that by copying the log_step function many times and slightly changing the logging section I'm repeating a lot of code. Do you have any suggestions to avoid this?
This is WIP, some TODOs:
log_step -> log_shape
.log_dtypes