Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General improvement opportunities #4

Merged
merged 22 commits into from
Jun 10, 2020

Conversation

DanielAtKrypton
Copy link
Contributor

Changed file name to csv2npz.

@DanielAtKrypton
Copy link
Contributor Author

The main improvement here is the automatic download of challenge files. See README for instructions on how to configure credentials.

There are other improvements though.
For example:

  • Improved README.md.
  • Hard coded values inside benchmark.ipynb are now variables.
  • Implementation of get_x_shape and get_y_shape methods.
  • Add option to use the transformer in benchmark.ipynb.
  • Code linted.

Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, again a good initiative, there are several issues in this repo that need solving. I have added a few comments where we need changes before merging, the rest in good.

The main points to attend to:

  • Removing the transformer folder (and the modifications to the benchmark notebook regarding transformer use). For your personal use, you can add the other repo as dependency, see the comment.
  • Adding parameters instead of hard coded values is an improvement, but we should add default values.

.gitignore Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
src/dataset.py Outdated Show resolved Hide resolved
src/model.py Outdated Show resolved Hide resolved
time_series_transformer/__init__.py Outdated Show resolved Hide resolved
src/dataset.py Outdated Show resolved Hide resolved
src/utils.py Show resolved Hide resolved
README.md Show resolved Hide resolved
@DanielAtKrypton
Copy link
Contributor Author

there are several issues in this repo that need solving

I tried to address opportunities from my point of view. Adding GitHub issues might be worth the time in order to allow developers to better understand current concerns.

The main improvement as of now I could think of was to add pytests.
Next step would be to add continuous integration so whenever a commit is made, it automatically triggers pytest and the training would take place within the server.

I believe the evolution of this repository can help the evolution of transformer repository too.
In my opinion, the creation of the TimeSeriesPredictor and LSTMTimeSeriesPredictor classes isolated concerns nicely.

After your guidance, maybe it is time to get back to the transformer repository and add pytests and continuous integration. I was reflecting on the later and I think there we could simply use a seaborn dataset for py testing. Once that is alright we can remove all transformer related stuff from this repository as per your request. TransformerTimeSeriesPredictor class can be useful there.

It may be also worth to create a python package for the TimeSeriesPredictor.

@DanielAtKrypton DanielAtKrypton changed the title Add output_path and filename parameters General improvement opportunities May 24, 2020
@DanielAtKrypton
Copy link
Contributor Author

I notice how faster the LSTMTimeSeriesPredictor performs when compared to the TransformerTimeSeriesPredictor. I think that might be a small detail related to GPU processing in the transformer repository.

The following image gives and overview of the TransformerTimeSeriesPredictor resources consumption even when using cuda:0:

Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for the delay in answering your updates. There are still a few modifications needed before merging, mainly:

  • removing LSTMTimeSeriesPredictor and likes : these classes neither improve readability or results. They focus on improving the code itself, which is beyond the scope of this repo, as we aim here at demonstrating a benchmark solution to the challenge. Don't forget the target audience for this challenge are students in statistics that may not have extensive development knowledge.
  • removing test : same as for the previous point, they are beyond the scope of this repo. We aim at keeping the code as simple to understand for non programmers as possible.
  • removing transformer ref : these two repos should really stay independant. Anyone interested in using transformer for the challenge can navigate through our github account and easily make the connection. This is especially true now that you've added instructions as to convert csv to npz, etc.

Concerning implementation of tests, I'll consider adding them to the transformer repo, where they would be a nice addition. As for issues, they seem to be activated on both repos.

Concerning LSTM being faster than transformer, I have witnessed the same results. This could be due to a number of factors, for instance:

  • LSTM having a linear complexity in the time dimension, while the transformer in quadratic
  • LSTM are coded in the pytorch library, and thus optimized to run on the GPU, which is not the case for the transformer written in python. If you want to speed up the process, you may want to use pytorch's Multi Head Attention block.

requirements.txt Outdated Show resolved Hide resolved
src/lstm_tsp.py Outdated Show resolved Hide resolved
src/model.py Outdated Show resolved Hide resolved
src/time_series_predictor.py Outdated Show resolved Hide resolved
src/transformer_tsp.py Outdated Show resolved Hide resolved
@DanielAtKrypton
Copy link
Contributor Author

Hi, sorry for the delay in answering your updates.

I completely understand and respect your timings. I am very happy to be able to contribute and learn with such state-of-the-art technology when it comes to time series forecast. Kudos to you for that!

src/loss.py Outdated Show resolved Hide resolved
@maxjcohen
Copy link
Owner

Okay, this PL seems good now, so without further ado, i'll merge it. Thanks again for your contribution !

@maxjcohen maxjcohen merged commit 4a0cb1d into maxjcohen:master Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants