Data preprocessing and feature engineering 🚜 #30

Closed
8 tasks done
KarelZe opened this issue Nov 12, 2022 · 0 comments · Fixed by #104

KarelZe commented Nov 12, 2022

  • Write a proposal for feature sets based on EDA / literature research
  • Write a script for feature engineering
  • Create more features from quote and price data. See, e.g., Rosenthal or Prado
  • Combine relative bid and ask into one measure. Look at the distribution first. Look at the information density feature found in the Rosenthal paper.
  • Perform adversarial validation on newly created feature sets, e.g., with min-max scaling, w/o log-transform, etc., and get feature importances
  • Create features that are hard to learn for neural nets and gradient boosting machines
  • When loading data, verify wandb hashes
  • Add economic intuition to each feature. Which paper suggests the feature, and why does it make sense? (Feedback from @CaroGrau.) Research why a log-transform on prices makes sense from a theoretical perspective.
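
The adversarial validation item above could be prototyped roughly as follows. This is a minimal, dependency-free sketch and not the project's implementation: instead of training a classifier on all features, it scores each feature on its own by how well its raw value separates training rows from test rows (rank-based AUC). The function names `auc_from_scores` and `adversarial_auc_per_feature` and the row-as-dict layout are assumptions made for illustration. In practice one would likely fit a gradient-boosted classifier on the combined data and read off its feature importances.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC (Mann-Whitney U), using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def adversarial_auc_per_feature(train_rows, test_rows):
    """Per-feature AUC for telling train (label 0) from test (label 1).

    Hypothetical helper: rows are dicts mapping feature name to a number.
    Values near 0.5 suggest a similar distribution in both sets; values
    near 1.0 suggest the feature drifts between train and test.
    """
    labels = [0] * len(train_rows) + [1] * len(test_rows)
    result = {}
    for name in train_rows[0]:
        scores = [row[name] for row in train_rows + test_rows]
        auc = auc_from_scores(scores, labels)
        result[name] = max(auc, 1 - auc)  # direction of the shift is irrelevant
    return result


# Toy data: "stable" has the same values in both sets, "shifted" does not.
train = [{"stable": x, "shifted": x} for x in range(10)]
test = [{"stable": x, "shifted": x + 100} for x in range(10)]
drift = adversarial_auc_per_feature(train, test)
```

Running the same check with and without a given transformation (for example min-max scaling or the log-transform) would show whether that preprocessing step makes the sets easier or harder to tell apart.
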
KarelZe added the code (Everything related to code) label Nov 12, 2022
KarelZe self-assigned this Nov 12, 2022
KarelZe added a commit that referenced this issue Nov 12, 2022
KarelZe added a commit that referenced this issue Nov 30, 2022
KarelZe added a commit that referenced this issue Nov 30, 2022
KarelZe added a commit that referenced this issue Dec 1, 2022
KarelZe added a commit that referenced this issue Dec 1, 2022
* Add new features `issue_type`, `myn`, `sec_OM`🏖️

Relates to #30

* Update data splitting for new features 🎠

Addresses #6 and #30
KarelZe added a commit that referenced this issue Dec 18, 2022
KarelZe added a commit that referenced this issue Jan 7, 2023
KarelZe added a commit that referenced this issue Jan 7, 2023
* investigate cyclical encoding

* fixed cyclical encoding

* Finalize feature engineering script 🪄

* Add sample weighting to `TransformerClassifier` 🏋️ (#100)

Relates to #7.

* Early stopping based on accuracy for `TransformerClassifier`🧁 (#102)

Relates to #7

* Improve robustness and tests of `TabDataset` 🚀 (#101)

Addresses #7.

* Add instructions on using `SLURM` 🐧 (#103)

Relates to #7.

* rerun feature generation notebook

* Add refs to wandb 🪄

* Renamed notebooks for consistency 🍫

* Simplify notebook 🍫

* Add log-transform and encode day ⏰

* Add adversarial validation after feature engineering 🪄

* Update `build_features.py` to new features 🪄

* Update notes to feature set definition

* Update `Feature Sets.md`

Addresses #30.
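
For readers unfamiliar with the cyclical encoding and log-transform mentioned in the commit messages above, here is a minimal sketch of what such transformations commonly look like. It is not taken from `build_features.py`. The function names, the period of 7 for a day-of-week feature, and the use of `log1p` instead of a plain logarithm are all assumptions made for illustration.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. day of week) onto the unit circle.

    Returns a (sin, cos) pair so that the last and the first value of
    the cycle end up close together, unlike with a plain integer code.
    """
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def log_transform(price):
    """Compress the right tail of a non-negative price.

    Assumption: log1p (log(1 + x)) is used so that a price of 0 is valid.
    """
    return math.log1p(price)


def euclidean(a, b):
    """Distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical day-of-week feature with period 7 (0 = first day, 6 = last).
first_day = encode_cyclical(0, 7)
second_day = encode_cyclical(1, 7)
last_day = encode_cyclical(6, 7)

# On the circle, the last day is as close to the first day as the second is.
wraparound_gap = euclidean(last_day, first_day)
neighbour_gap = euclidean(first_day, second_day)
```

The sine and cosine pair is needed because a single sine value alone would map two different days to the same number.
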