Plan - WIP #85

Closed
7 of 19 tasks
ardunn opened this issue Oct 2, 2018 · 5 comments

Comments

@ardunn
Contributor

ardunn commented Oct 2, 2018

CODING TODOS, IN ORDER OF IMPORTANCE/STATE OF NEGLECT - 10.2.2018

1. (tie) Heuristic-based Featurizer Selection - Qi

[given df, return a set of featurizers]

  • A scheme to abandon featurizers unworkable for a given composition dataset.
  • A scheme to abandon featurizers unworkable for a given structure dataset.
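
One way the "given df, return a set of featurizers" step could look is sketched below. This is only an illustration under stated assumptions: `select_featurizers`, the `precheck` predicates, and the `min_fraction` threshold are hypothetical names, not existing matminer or matbench API, and the toy heuristics stand in for real composition or structure checks.

```python
from typing import Callable, Dict, List

# Hypothetical: each candidate featurizer is represented only by a cheap
# "precheck" predicate that says whether it can handle a single entry.
Precheck = Callable[[str], bool]


def select_featurizers(entries: List[str],
                       candidates: Dict[str, Precheck],
                       min_fraction: float = 0.9) -> List[str]:
    """Keep featurizers whose precheck passes on at least min_fraction of entries."""
    if not entries:
        return []
    kept = []
    for name, precheck in candidates.items():
        n_ok = sum(1 for entry in entries if precheck(entry))
        if n_ok / len(entries) >= min_fraction:
            kept.append(name)
    return kept


# Toy composition strings and made-up heuristics, for illustration only.
compositions = ["Fe2O3", "NaCl", "Si", "TiO2"]
candidates = {
    "works_on_everything": lambda comp: True,
    "needs_oxygen": lambda comp: "O" in comp,
}
selected = select_featurizers(compositions, candidates, min_fraction=0.9)
print(selected)
```

Here the oxygen-only candidate is abandoned because it applies to only half of the entries. The same pattern would apply to a structure dataset with structure-level prechecks.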

1. (tie) Top level class - Alex D

[given dataframe, give reports and final models] (+ generating final reports)

  • A method for benchmarking
  • A method for predicting
  • Way to determine regression or classification
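
For the regression-vs-classification item, a crude heuristic on the target column might be enough as a first pass. The sketch below is an assumption about how this could work, not the planned implementation; `guess_problem_type` and `max_classes` are hypothetical names.

```python
from numbers import Real
from typing import Sequence


def guess_problem_type(target: Sequence, max_classes: int = 10) -> str:
    """Guess 'classification' or 'regression' from the target values alone."""
    values = list(target)
    if not values:
        raise ValueError("target is empty")
    # Booleans, strings, and other non-numeric labels are treated as classes.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "classification"
    # Any non-integer-valued number suggests a continuous target.
    if any(not float(v).is_integer() for v in values):
        return "regression"
    # Integer-valued targets: few distinct values look like class labels.
    n_unique = len(set(values))
    return "classification" if n_unique <= max_classes else "regression"
```

The integer case is the ambiguous one (for example, a small dataset of integer-valued measurements would be misread as classes), so a user override on the top level class would probably still be needed.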

2. (tie) Featurization - Alex G

[given df and set of featurizers, featurize a df robustly]

  • Having good sets of featurizers, working on transfer learning part, cached features, etc.

2. (tie) Preprocessing - Alireza

[given featurized df, return an ML ready df]
Coming up with good methods for preprocessing and feature reduction

  • preprocess should not have a new method: separation of concerns
  • adding RandomForest feature selection
  • adding sensitivity analysis selection as an option

3. Data - Daniel

[given a request for data, return a nice dataframe, citations, etc.]
Moving the datasets to matminer, adding them to figshare, making sure all columns are numeric, making sure correct citation data is present, and storing everything in JSON format

  • update matminer to support seaborn style dataset loading including tests
  • convert existing dataset metadata dict to json file and make a gui or cli interface to update it
  • write functions to interface with dataset metadata and give user info
  • convert existing matminer datasets to json
  • remove deprecated matminer functions
  • convert matbench datasets to json, add to figshare, add to matminer
  • update matbench interface with datasets to use the matminer interface
  • ensure all datasets have their dataframes formatted properly, (numeric data, etc.)
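
The metadata items above (dict to JSON file, plus functions that give the user info) could be sketched roughly as follows. Everything here is a placeholder: the dataset name, the metadata fields, and the helper names `load_metadata`, `get_available_datasets`, and `get_dataset_citations` are assumptions for illustration, not the actual matminer interface.

```python
import json
from typing import Dict, List

# Hypothetical metadata; the dataset name, fields, and values are placeholders.
metadata = {
    "example_dataset": {
        "description": "Placeholder description.",
        "columns": {"formula": "Chemical formula", "target": "Target property"},
        "citations": ["<placeholder citation>"],
    }
}

# Serialize the dict as it might be written to a metadata JSON file.
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)


def load_metadata(text: str) -> Dict[str, dict]:
    """Parse the metadata JSON text back into a dict."""
    return json.loads(text)


def get_available_datasets(meta: Dict[str, dict]) -> List[str]:
    """Return the dataset names in sorted order."""
    return sorted(meta)


def get_dataset_citations(meta: Dict[str, dict], name: str) -> List[str]:
    """Return the citations for one dataset, with a readable error if unknown."""
    if name not in meta:
        raise KeyError(f"Unknown dataset {name!r}; available: {get_available_datasets(meta)}")
    return list(meta[name]["citations"])


loaded = load_metadata(metadata_json)
```

Keeping the metadata in one JSON file means a CLI or GUI editor only has to read, modify, and rewrite that file.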

4. ML pipeline - Qi?

[given an ML ready df (or just X and y), return a model]
Checking defaults; adding or testing a neural network?

  • Check Tpot defaults
  • Add other adaptor classes for other backends?
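
If adaptor classes for other backends are added, a shared minimal interface would let the top level class stay backend-agnostic. The sketch below assumes a hypothetical `BackendAdaptor` base class with `fit` and `predict`; the `MeanBaselineAdaptor` is a stand-in backend for illustration and does not wrap TPOT or any real library.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import List, Sequence


class BackendAdaptor(ABC):
    """Hypothetical common interface: given X and y, produce a fitted model."""

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "BackendAdaptor":
        ...

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        ...


class MeanBaselineAdaptor(BackendAdaptor):
    """Stand-in backend that always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self._mean = None

    def fit(self, X, y):
        self._mean = mean(y)
        return self

    def predict(self, X):
        if self._mean is None:
            raise RuntimeError("call fit() before predict()")
        return [self._mean for _ in X]


model = MeanBaselineAdaptor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
```

A real adaptor would delegate `fit` and `predict` to the wrapped backend; the baseline is also handy as a sanity check when benchmarking.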

5. Analysis + Visualization + Interpretability - Daniel

[given a model and dataframe, return cool informative stats (and graphs)]

@ardunn changed the title from "Roadmap - WIP" to "Plan - WIP" on Oct 2, 2018

@Doppe1g4nger
Contributor

I'm working on Data, can dive into whatever else once that's finished. Maybe Tpot ML.

@albalu
Contributor

albalu commented Oct 2, 2018

I'm working on Preprocessing

@Qi-max
Contributor

Qi-max commented Oct 2, 2018

I will work on Metalearning, maybe also optimizing Tpot ML.

@ardunn
Contributor Author

ardunn commented Oct 2, 2018

I'll work on the top level class, and Analysis and Visualization. I'll also be dabbling in each subpackage to help the top level class work with them.

@ardunn
Contributor Author

ardunn commented Oct 19, 2018

gonna close this issue in favor of issues + the project board

@ardunn closed this as completed on Oct 19, 2018
4 participants