
Conversation

josiahls (Owner) commented Oct 26, 2019

Version 0.9.0 is a near-release version of the repo. It will benchmark model performance and show basic repo use. One of the last major objects still missing is the Interpretation object. The previous one was a bunch of spaghetti code fit for a high-end Italian restaurant. We need to simplify it for now. One of the most important requirements is letting an interpretation object merge with another interpretation object so that reward graphs can be compared (a rough sketch of the merge idea follows the goals list below). The goal is to do this in parallel with writing the Jupyter notebooks. We will also begin finalizing the README.

Goals:

  • Jupyter Notebooks
    • DQN (ER, PER)
    • Fixed Target DQN (ER, PER)
    • Dueling DQN (ER, PER)
    • Double DQN (ER, PER)
    • DDDQN (ER, PER)
    • DDPG (ER, PER)
  • Misc
    • Add Reward Metric
  • Interpretation
    • Reward Logging (Cleaner)
    • Reward (value) overlay, so we can train 2 models and then compare their reward values.
    • Heat Mapping (Cleaner)
    • Q Value Estimation (Cleaner)
  • README
    • Exhaustive benchmark (5-run average) of 2-3 environments per model.
      Figure out how to host the results somewhere not tied to GitHub.
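
Since merging interpretation objects is the headline feature here, below is a minimal sketch of the idea using the Interpreter / GroupInterpreter names that were eventually added; the merge and plotting methods, and the per-episode reward storage, are illustrative assumptions rather than the final API.

```python
# Illustrative sketch only: the real Interpreter / GroupInterpreter API may differ.
from dataclasses import dataclass, field
from typing import List
import matplotlib.pyplot as plt


@dataclass
class Interpreter:
    """Holds the per-episode reward sums for a single training run."""
    name: str
    rewards: List[float]

    def plot_rewards(self, ax=None, **kwargs):
        ax = ax or plt.gca()
        ax.plot(self.rewards, label=self.name, **kwargs)
        ax.set_xlabel('episode')
        ax.set_ylabel('sum of rewards')
        return ax


@dataclass
class GroupInterpreter:
    """Combines several Interpreters so their reward curves can be compared."""
    interpreters: List[Interpreter] = field(default_factory=list)

    def merge(self, other: 'GroupInterpreter') -> 'GroupInterpreter':
        # Merging simply concatenates the underlying runs.
        return GroupInterpreter(self.interpreters + other.interpreters)

    def plot_rewards(self):
        ax = plt.gca()
        for interp in self.interpreters:
            interp.plot_rewards(ax=ax)
        ax.legend()
        plt.show()


# Usage: train two models (not shown), wrap their reward logs, then overlay:
# group = GroupInterpreter([dqn_interp]).merge(GroupInterpreter([ddpg_interp]))
# group.plot_rewards()
```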

Edit 10/27/2019:

  • Added Interpreter and GroupInterpreter for easy reward logging

Edit 12/14/2019:

  • README changes will be made for the 1.0 pull request, since hopefully the code will be more consistent by then

Edit 12/15/2019:

  • Will probably push this to PyPI so it is easier for people to download / install. Interested in testability.

Josiah Laivins added 6 commits October 25, 2019 22:37
- Interpreter (Cleaner) with cleaner code / closer to fastai
- To and From Pickle
- to_csv

Notes:
- I tried doing a from_csv implementation, however I am seeing that
something like this might not be possible without filesystem-level work.
Not sure when I will ever get to this. I have some ideas about saving images
/ states as files and keeping only their file paths... Maybe to_csv could generate a file structure as well? (A rough sketch of this idea follows.)
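
A minimal sketch of that "states as files, paths in the CSV" idea, assuming numpy-array states and a made-up directory layout; this is not the actual fast_rl to_csv / from_csv implementation.

```python
# Hypothetical layout: states are saved as .npy files and the CSV only keeps
# their paths, which is what would make a from_csv round trip feasible.
import csv
from pathlib import Path

import numpy as np


def to_csv(steps, out_dir: Path):
    """steps: iterable of dicts with 'state' (np.ndarray), 'action', 'reward'."""
    out_dir = Path(out_dir)
    state_dir = out_dir / 'states'
    state_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / 'log.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['state_path', 'action', 'reward'])
        for i, step in enumerate(steps):
            state_path = state_dir / f'state_{i}.npy'
            np.save(state_path, step['state'])  # the state lives on disk
            writer.writerow([state_path, step['action'], step['reward']])


def from_csv(out_dir: Path):
    """Rebuild the step stream by loading each state back from its file path."""
    with open(Path(out_dir) / 'log.csv', newline='') as f:
        for row in csv.DictReader(f):
            yield {'state': np.load(row['state_path']),
                   'action': row['action'],          # stays a string here
                   'reward': float(row['reward'])}
```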
- Group Interpreter for combining model runs
- Initial fixed dqn notebook (sort of)

Fixed:
- recorder callback ordering
- renaming. It seems that fastai has some cool in-notebook test widgets
that we might want to use in the future
- Group Interpreter merging
- DQN base notebook
- Interpreters now close envs by default

Fixed:
- env closing
josiahls added the enhancement label Oct 28, 2019
josiahls and others added 23 commits October 28, 2019 12:40
- setup.py: fastai needs to be at minimum 1.0.59
- cpu / device issues.
- DQN Group Results
- Reward Metric
- Pipeline object
    - The idea is that you define each fast_rl pipeline as a single function
    call, and the Pipeline object then runs those calls on separate threads
    (a minimal sketch follows the notes below)

Notes:
- I am realizing that we need sum reward smoothing. The graphs are way
too messy.
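
A minimal sketch of that Pipeline idea, assuming each experiment is wrapped in a zero-argument function; the constructor and run() shown here are made up for illustration, not the committed API.

```python
# Sketch: each entry is a zero-argument callable wrapping a full fast_rl run,
# and the Pipeline simply executes them on separate threads.
import threading
from typing import Callable, List


class Pipeline:
    def __init__(self, runs: List[Callable[[], None]]):
        self.runs = runs

    def run(self) -> None:
        threads = [threading.Thread(target=fn) for fn in self.runs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # block until every training run has finished


# Usage: wrap each experiment in a function, then run them side by side.
# def dqn_cartpole(): ...
# def dqn_lunarlander(): ...
# Pipeline([dqn_cartpole, dqn_lunarlander]).run()
```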
- Analysis property to the group interpretation
- Pipeline is demoable
- PER crashing due to containing 0 items
- Group Interpretation value smoothing
- Value smoothing making the reward values way too big
- Tests take too long. If Image input, just do a shorter fit cycle
- PER batch size not updating
- cuda issues
- Bounds n_possible_values is only calculated when used.
Should make iteration faster.
Added:
- Smoothing for the scalar plotting (a moving-average sketch follows this list)
- cuda issues
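
The scalar smoothing mentioned above can be as simple as a normalized moving average; a plain windowed sum would inflate the values, which may be why the earlier attempt made the rewards "way too big". A sketch, not the exact code that landed:

```python
# Moving-average smoothing for the reward plots. Dividing by the window length
# keeps smoothed values on the same scale as the raw rewards.
import numpy as np


def smooth(values, window: int = 10) -> np.ndarray:
    values = np.asarray(values, dtype=float)
    if window <= 1 or values.size < window:
        return values
    kernel = np.ones(window) / window
    # 'valid' avoids edge artifacts where the window is only partially filled.
    return np.convolve(values, kernel, mode='valid')


# Example: plot(smooth(episode_rewards, window=20)) instead of the raw series.
```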
Josiah Laivins and others added 29 commits December 15, 2019 22:56
- old code from README. Revisions coming.
- batch norm toggling. For now / forever defaulted to false
- revised test script
- package caching. Reference https://docs.microsoft.com/en-us/azure/devops/pipelines/caching/?view=azure-devops. Not easy to understand since the docs use a Yarn example (weird example, right?)
- Slowly adding tests.
- somehow trained_learner method in test was completely broken
- Interpreter edge control. Can also show an average line
- models performing badly. Apparently, batch norm really hurts them: if you use batch norm, the batch size needs to be massive (128 wasn't large enough). By default you can mostly turn batch_norm off in the Tabular models, but when given a continuous input they still apply an entry batch norm. I overrode it and now they work significantly better :) (a sketch of the override idea follows the commit list)
- gitignore
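
The batch norm override described in the commit above could look roughly like this. It assumes the tabular model's entry batch norm is an ordinary nn.BatchNorm1d; the helper name and the blanket replace-with-Identity approach are illustrative, not the exact fix that was committed.

```python
# Sketch: neutralize batch norm by swapping every BatchNorm1d for an identity
# op, so small batch sizes (well under 128) no longer destabilize training.
import torch.nn as nn


def disable_batch_norm(model: nn.Module) -> nn.Module:
    for name, child in model.named_children():
        if isinstance(child, nn.BatchNorm1d):
            setattr(model, name, nn.Identity())
        else:
            disable_batch_norm(child)  # recurse into submodules
    return model


# e.g. disable_batch_norm(learn.model) after building the learner.
```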
josiahls merged commit 0abb10a into master Dec 22, 2019