Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] move content from pydata global 2022 (transformers, pipelines tutorial) to sktime main repo #4381

Merged
merged 8 commits into from
Apr 24, 2023

Conversation

dashapetr
Copy link
Contributor

@dashapetr dashapetr commented Mar 23, 2023

Reference Issues/PRs

Fixes #4373

What does this implement/fix? Explain your changes.

Adds the content from the tutorials at pydata global 2022 on pipelines,
transformers to the sktime main repo:
https://github.com/sktime/sktime-tutorial-pydata-global-2022
Content is notebooks 2 and 3 from pydata global.
Added in 03_transformers.ipynb, 03a_transformers_cheat_sheet, and 03b_forecasting_transformers_pipelines_tuning.ipynb accordingly.

Does your contribution introduce a new dependency? If yes, which one?

no

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@dashapetr dashapetr marked this pull request as ready for review March 23, 2023 21:08
@dashapetr dashapetr requested a review from fkiraly as a code owner March 23, 2023 21:08
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, much appreciated!

  • the notebooks should be linted, the linter is complaining... you can see the issues when going on "details" next to the failing check
  • have you checked whether there is content in one that's not there in the other, perhaps it makes sense to keep parts of the original notebook? I'm not sure myself, your opinion is appreciated.

@dashapetr
Copy link
Contributor Author

Thanks, @fkiraly. I will resolve it during the weekend.

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 24, 2023

thanks! as said, would be interested in your opinion more generally, which material and which order of material is most useful to readers looking for a tutorial.

@dashapetr
Copy link
Contributor Author

dashapetr commented Mar 29, 2023

Hi @fkiraly, sorry for taking a while.
I took a closer look at transformers: the existing notebook and the one from pydata.

The biggest part of them is equal or almost equal: What are transformers, Different types of transformers, Broadcasting, Transformers as pipeline components, Searching for transformers.

What is different: the existing notebook contains a big section 'Transformers types and signatures'
Pydata notebook has a very good prelude 'Wherefore transformers?', also 'Combining transformers, feature engineering' and 'Appendix - cheat sheets'

IMHO there are two options:

  1. Combine missing parts and form 1 complete notebook dedicated to transformers.
  2. Make the pydata notebook shorter and lighter (like the intro), and move more heavy stuff to the existing one.

Let me know which one would be better to choose.

PS. regarding the pipelines notebook - I believe it could fit logically well as number 5 into examples (after benchmarking)

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 29, 2023

sorry for taking a while.

No worries, and thanks a lot for having a look!

I would go with option 1 in order to not lose content.

My further recommendations would be based on these high-level principles:

  • the notebook should start as much as possible with overview and gentle introduction, and content should be ordered by "how useful to the general user reading this" (with a focus on new users and beginners)
  • the sequence should respect conditionalities, e.g., if you need content A to explain B, A should be presented further up than B
  • the notebook should be "complete", i.e., also contain reference material or links (but possibly further down, given the other two points)

So, based on what you say, it seems the following would make sense?

  • start with the intro section "wherefore transformers"
  • continue with the "equal" part, but taken from the pydata global notebook (it's newer and probably more polished), and include "combining transformers and feature engineering" (these two seem like commonly requested content)
  • somehow splice the appendix (perhaps convert into sections with numbers) with the "types and signatures" section. Worst case, if it's not clear how it fits, just sequence it (but let's not lose it)

Questions:

  • based on your reading and understanding, are my recommendations a good translations of the principles applied to what you were saying?
  • what about another option to split the material into more notebooks, instead? like the forecasting material which has one main notebook and then notebooks for the more specialized material? Not sure how that would be end up (and it might not be obvious).

For the pipelines notebook, I would actually put it after transformers but before benchmarking - because of the "conditionality" in the sense that you need transformers to build pipelines, and you want to benchmark any kind of estimators (including pipelines).

@dashapetr
Copy link
Contributor Author

Hi @fkiraly,

Thanks for the detailed response.
I agree that putting all transformers' information into 1 notebook would be a better option.

I also agree with this:

  • For the pipelines notebook, I would actually put it after transformers but before benchmarking.

Regarding the question

  • based on your reading and understanding, are my recommendations a good translations of the principles applied to what you were saying?

I would say that the proposed structure is fine. The only possible question I have is about cheat sheets (they are the part of 2.4 Appendix of pydata transformers). I am not sure what is the best place for them. On the one hand, it is good to have consolidated information after the concept studying (so placing them in Appendix is fine). On the other hand, typically cheat sheets are frequently used, so maybe moving them higher (before the intro part) and placing them as part 0 would be a better option.

Let me know what are your thoughts on that.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 4, 2023

On the one hand, it is good to have consolidated information after the concept studying (so placing them in Appendix is fine). On the other hand, typically cheat sheets are frequently used, so maybe moving them higher (before the intro part) and placing them as part 0 would be a better option.

Hmm - I see the point!

The drawback of part 0 would be, the "new user" reader who expected a gradual introduction instead sees a summary of content they haven't seen yet.

How about moving the cheat sheet/notebook to a separate file, e.g., 03a_transformers_cheat_sheet, that way this is visible directly next to the intro (e.g., in the panel vignettes, in the examples folder, in the binder), and the "returning reader" who wants to see this just needs to click on that instead of the intro?

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 22, 2023

Hi @dashapetr, how is this going? Are you still working on this? Would you like some input or help?

Copy link
Contributor Author

@dashapetr dashapetr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes were made according to the discussion

@dashapetr
Copy link
Contributor Author

Hi @dashapetr, how is this going? Are you still working on this? Would you like some input or help?

Hi @fkiraly, I am so sorry for taking so long to proceed.

I have made changes as we discussed: a separate cheat sheet is created, pipelines are moved to the 4th position (between transformers and benchmarking), and transformers themselves are modified.

Could you please let me know if I need to do something else here? I see this red button 'Changes requested', but I only managed to verify my changes and add a comment.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 22, 2023

I am so sorry for taking so long to proceed.

Don't worry, we all have our lives and day jobs etc.

The help here is much appreciated, and I'm certain that this will also be of substantial help for our users!

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 23, 2023

Thanks a lot!

Tests were failing due to the tutorial importing from transformations.series.compose which was since deprecated in favour of transformations.compose - I just fixed this.

@dashapetr
Copy link
Contributor Author

Tests were failing due to the tutorial importing from transformations.series.compose which was since deprecated in favour of transformations.compose - I just fixed this.

Thank you!

I still don't get the requested changes. Are they just pending and waiting for review?

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 23, 2023

I still don't get the requested changes. Are they just pending and waiting for review?

I still have to take some time review this, the requested changes are still open from last time, but please disregard for now.

fkiraly
fkiraly previously approved these changes Apr 24, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, looks great!

I've added back a section that got lost along the way (the one with more tech details), at the end, apart from that, all perfect from my side!

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above. Thanks!

@fkiraly fkiraly merged commit 8df3e7e into sktime:main Apr 24, 2023
@fkiraly fkiraly added the documentation Documentation & tutorials label Apr 24, 2023
@dashapetr
Copy link
Contributor Author

Thanks a lot @fkiraly! I am really excited that my first PR to sktime got approved and merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation & tutorials
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] move content from pydata global 2022 (transformers, pipelines tutorial) to sktime main repo
2 participants