feature request: RBFRepeater #20

Closed
koaning opened this issue Mar 1, 2019 · 11 comments
Labels
sprint-material This is something that can be done in a single day sprint.

Comments

@koaning
Owner

koaning commented Mar 1, 2019

feature generation that can be used for timeseries. trick from the london talk.

@koaning changed the title from "feature request: RBF features" to "feature request: RBFRepeater" on Mar 5, 2019
@MaxHalford

Hey @koaning,

I stumbled upon your talk a few days ago and really enjoyed many of your talking points. I was curious about the RBF kernel trick, so I decided to implement it in an online learning library that some friends and I are working on. From what I understand, the idea is simply to compute the distance between, say, a month and all 12 months of the year using an RBF. This way September is closer to August than it is to March, which isn't taken into account if one simply one-hot encodes the month. Is this correct? If you're interested, I coded it at the end of this notebook.
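For readers following along, here is a minimal sketch of the idea in plain Python. It is not the notebook's code: the function name `rbf_month_features`, the `width` parameter, the Gaussian form of the RBF, and the circular distance (so that December and January count as neighbours) are all assumptions made for illustration.

```python
import math


def rbf_month_features(month, width=1.0, n_months=12):
    """Return one RBF activation per month of the year for a month in 1..12.

    Hypothetical helper: assumes a Gaussian RBF and a circular distance.
    """
    features = []
    for center in range(1, n_months + 1):
        diff = abs(month - center)
        # Circular distance: wrap around the end of the year.
        dist = min(diff, n_months - diff)
        features.append(math.exp(-(dist ** 2) / (2 * width ** 2)))
    return features


september = rbf_month_features(9)
december = rbf_month_features(12)
```

With this encoding the August feature of `september` is larger than its March feature, whereas a one-hot encoding would treat both as equally far from September.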

@koaning
Owner Author

koaning commented Mar 15, 2019

  1. That creme stuff sounds cool beans. I'll give it a spin. Also: PyData Amsterdam has a CFP open at the moment. I'm still on the committee and that creme library sounds like something we'd love to host.
  2. The goal here is to make an sklearn-compatible transformer. Your example is good, but we want something very general: for example, being able to supply a date column and the number of RBFs you'd like per year, or a column you specify that denotes the time window. There's going to be a sprint this Wednesday, so I'll keep this thread up to date.
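A rough sketch of what such a transformer could look like, using only the standard library so it runs anywhere. Everything here is an assumption for illustration, not the implementation that was eventually merged: the class name `RepeatingRBF`, the `n_periods` and `width` parameters, and the choice to measure position as the fraction of the year elapsed. A real version would likely inherit from sklearn's `BaseEstimator` and `TransformerMixin` and accept arrays or data frames.

```python
import calendar
import math
from datetime import date


class RepeatingRBF:
    """Hypothetical sketch: encode dates as RBF features that repeat yearly."""

    def __init__(self, n_periods=12, width=1.0):
        self.n_periods = n_periods  # number of RBFs per year
        self.width = width          # in units of the spacing between centers

    def fit(self, X, y=None):
        # Nothing is learned from the data; centers are spread evenly over [0, 1).
        self.centers_ = [i / self.n_periods for i in range(self.n_periods)]
        return self

    def transform(self, X):
        rows = []
        for d in X:
            days_in_year = 366 if calendar.isleap(d.year) else 365
            # Position within the year, in [0, 1).
            pos = (d.timetuple().tm_yday - 1) / days_in_year
            row = []
            for center in self.centers_:
                diff = abs(pos - center)
                # Circular distance, expressed in center spacings.
                dist = min(diff, 1 - diff) * self.n_periods
                row.append(math.exp(-(dist ** 2) / (2 * self.width ** 2)))
            rows.append(row)
        return rows


dates = [date(2019, 1, 1), date(2019, 12, 31)]
features = RepeatingRBF(n_periods=4).fit(dates).transform(dates)
```

Because the distance wraps around, December 31 activates the first RBF almost as strongly as January 1 does, which is the "repeating" behaviour the feature request is after.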

@MaxHalford

MaxHalford commented Mar 15, 2019

  1. Sounds great! I've started making some slides (written in English) for an upcoming data science Meetup back here in Toulouse, so maybe I can reuse them.
  2. Okay, good to know. I just wanted to make sure the maths were right. Indeed, I think that having a transformer to extract date features would be nice because it could then pipeline into an RBFTransformer.

Good stuff!

Edit: if you're going to try creme I suggest you install the latest version from GitHub using pip install git+https://github.com/creme-ml/creme as there is a lot of stuff that isn't on PyPI yet.

@koaning
Owner Author

koaning commented Mar 15, 2019

Question about creme: is most of the learning just a small SGD step per datapoint, or is there something more happening? sklearn has a passive-aggressive API here, but creme is not doing that at the moment?

I like the idea of doing a rolling mean on an intercept by the way.

@MaxHalford

I'm not 100% sure what you mean, but here goes: you can provide an optimizer to LinearRegression and LogisticRegression. The default optimizer for both is called VanillaSGD and simply performs textbook online gradient descent. There are many optimizers you can use, such as PassiveAggressiveI, PassiveAggressiveII, Adam, etc. sklearn's SGDClassifier and SGDRegressor can only use plain gradient descent because they use a special trick for the intercept that isn't generic. Because we use a running statistic to compute the intercept, we're "allowed" to use any optimizer we wish.
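To make the intercept point concrete, here is a small standalone sketch. It is not creme's code or API: the class names `RunningMean`, `PlainSGD` and `OnlineLinearRegression`, and the `step`, `learn_one` and `predict_one` methods, are hypothetical, and it assumes the "running statistic" is a running mean of the target. That choice only gives the right intercept when the features are roughly centered around zero.

```python
class RunningMean:
    """Incrementally updated mean of the values seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n


class PlainSGD:
    """Textbook gradient step; any object with a `step` method could replace it."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, weights, grad):
        return [w - self.lr * g for w, g in zip(weights, grad)]


class OnlineLinearRegression:
    def __init__(self, n_features, optimizer):
        self.weights = [0.0] * n_features
        self.optimizer = optimizer
        # The intercept is tracked separately, so the optimizer only sees weights.
        self.intercept = RunningMean()

    def predict_one(self, x):
        return self.intercept.mean + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        error = self.predict_one(x) - y
        # Gradient of 0.5 * error**2 with respect to the weights.
        grad = [error * xi for xi in x]
        self.weights = self.optimizer.step(self.weights, grad)
        self.intercept.update(y)
        return self


# Stream a toy relationship y = 3 + 2x with x alternating between -1 and 1.
model = OnlineLinearRegression(n_features=1, optimizer=PlainSGD(lr=0.1))
for i in range(400):
    x = [-1.0 if i % 2 == 0 else 1.0]
    model.learn_one(x, 3.0 + 2.0 * x[0])
```

Since the optimizer never touches the intercept, swapping `PlainSGD` for another update rule requires no special handling of the bias term.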

I hope I'm clear! I'm going to write an explanatory notebook when I get some time!

@koaning
Owner Author

koaning commented Mar 15, 2019

Yep. This is all I wanted to know. Thanks!

Do consider sending that cfp tho: https://pydata.org/amsterdam2019/cfp/

@MaxHalford

I just did :)

@koaning koaning added the sprint-material This is something that can be done in a single day sprint. label Mar 23, 2019
@MaxHalford

@koaning when are the speakers for PyData Amsterdam announced? I have to book a plane ticket early if I come.

@MBrouns
Collaborator

MBrouns commented Apr 11, 2019

@MaxHalford tomorrow, but you're in! We're looking forward to seeing your talk!

@MaxHalford

Cheers @MBrouns, I'm really excited! I'll book my ticket ASAP :)

@koaning
Owner Author

koaning commented Aug 24, 2019

This feature has now been implemented. Documentation will follow.

@koaning koaning closed this as completed Aug 24, 2019

3 participants