Data preprocessing and feature engineering 🚜 #30

KarelZe · 2022-11-12T08:07:28Z

Write a proposal for feature sets based on EDA and literature research
Write a script for feature engineering
Create more features from quote and price data. See, e.g., Rosenthal or Prado
Combine relative bid and ask into one measure. Look at the distribution first. Look at the information density feature found in the Rosenthal paper.
Perform adversarial validation on newly created feature sets, e.g., with min-max scaling, without log-transform, etc., and get feature importances
Create features that are hard to learn for neural nets and gradient boosting machines
When loading data, verify wandb hashes
Add economic intuition to each feature. Which paper suggests the feature and why does it make sense? (Feedback from @CaroGrau). Research why a log-transform on prices makes sense from a theoretical perspective.
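
The task list names log-transform and min-max scaling as preprocessing variants to compare. Below is a minimal stdlib-only sketch of both, not the repository's implementation. The function names `log_transform` and `min_max_scale`, the choice of `log1p`, and the sample prices are all assumptions made for illustration.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to non-negative values; log1p keeps zeros finite."""
    if any(v < 0 for v in values):
        raise ValueError("log1p transform expects non-negative inputs")
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up, illustrative prices (hypothetical, not taken from the data set).
trade_prices = [0.05, 1.20, 3.50, 250.0]

scaled_raw = min_max_scale(trade_prices)
scaled_log = min_max_scale(log_transform(trade_prices))
```

With a single large price in the sample, min-max scaling on raw prices pushes the small prices close to zero. Applying the log first spreads them out, which is one practical argument for the transform on heavy-tailed price data.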

* investigate cyclical encoding
* fixed cyclical encoding
* Finalize feature engineering script 🪄
* Add sample weighting to `TransformerClassifier` 🏋️ (#100) Relates to #7.
* Early stopping based on accuracy for `TransformerClassifier`🧁 (#102) Relates to #7
* Improve robustness and tests of `TabDataset` 🚀 (#101) Addresses #7.
* Add instructions on using `SLURM` 🐧 (#103) Relates to #7.
* rerun feature generation notebook
* Add refs to wandb 🪄
* Renamed notebooks for consistency 🍫
* Simplify notebook 🍫
* Add log-transform and encode day ⏰
* Add adversarial validation after feature engineering 🪄
* Update `build_features.py` to new features 🪄
* Update notes to feature set definition
* Update `Feature Sets.md` Addresses #30.
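
The commit notes above mention cyclical encoding and encoding the day. A common way to do this is a sine/cosine mapping onto the unit circle. Whether `build_features.py` uses exactly this form is not stated here, so the sketch below is an assumption, and `encode_cyclical` is a hypothetical helper name.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. day of week) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Assumed convention for this example: day of week as 0..6 with period 7.
days = [encode_cyclical(d, 7) for d in range(7)]

# The last and the first day are as close as any two neighbouring days.
wrap_gap = distance(days[6], days[0])
neighbour_gap = distance(days[0], days[1])
```

An integer encoding would place day 6 and day 0 six units apart, while the sine/cosine pair keeps the wrap-around gap equal to the gap between any two adjacent days.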

KarelZe added the code label (Everything related to code) Nov 12, 2022

KarelZe added this to the Implementation / Results ↪️ milestone Nov 12, 2022

KarelZe self-assigned this Nov 12, 2022

KarelZe added a commit that referenced this issue Nov 12, 2022

Add proposal for feature sets 🧃

cdee528

Relates to #30

KarelZe mentioned this issue Nov 12, 2022

Add proposal for feature sets 🧃 #31

Merged

KarelZe mentioned this issue Nov 22, 2022

Exploratory data analysis 🌋 #6

Closed

11 tasks

KarelZe added a commit that referenced this issue Nov 30, 2022

Add new features issue_type, myn, sec_OM🏖️

eb8234e

Relates to #30

KarelZe mentioned this issue Nov 30, 2022

Add new features issue_type, myn, sec_OM🏖️ #50

Merged

KarelZe added a commit that referenced this issue Nov 30, 2022

Add new features issue_type, myn, sec_OM🏖️ (#50)

1a3a146

Relates to #30

KarelZe added a commit that referenced this issue Dec 1, 2022

Update data splitting for new features 🎠

5021a50

Addresses #6 and #30

KarelZe mentioned this issue Dec 1, 2022

Add new features to train, validation and test set 🏖️ #51

Merged

KarelZe added a commit that referenced this issue Dec 1, 2022

Add new features to train, validation and test set 🏖️ (#51)

28e8d3d

* Add new features `issue_type`, `myn`, `sec_OM`🏖️ Relates to #30
* Update data splitting for new features 🎠 Addresses #6 and #30

KarelZe added a commit that referenced this issue Dec 18, 2022

Improve accuracy [~1.2 %] (#79)

436c1ed

Addresses #7 and #30

KarelZe mentioned this issue Dec 21, 2022

Add accuracy for rev lr on test set 🆕 #89

Merged

KarelZe mentioned this issue Jan 6, 2023

Finalize feature engineering🪄 #104

Merged

KarelZe added a commit that referenced this issue Jan 7, 2023

Update Feature Sets.md

31f2662

Addresses #30.

KarelZe closed this as completed in #104 Jan 7, 2023
