Support Event Culling via Experiment Engine #118

Closed
DavidBerdik opened this issue Sep 20, 2021 · 10 comments

@DavidBerdik
Member

DavidBerdik commented Sep 20, 2021

Event culling is currently only available when using JGAAP via the user interface. Support for event culling should be added to the CLI-based experiment engine as well.

Warning: This change will break compatibility with CSV files built for the experiment engine up to this point, because it will introduce the need for an additional entry in the CSV.

@DavidBerdik DavidBerdik self-assigned this Sep 20, 2021
@DavidBerdik
Member Author

Useful starting point for researching how to implement this: https://github.com/evllabs/JGAAP/blob/master/src/com/jgaap/backend/ExperimentEngine.java#L183

@mvryan
Member

mvryan commented Sep 20, 2021

It is supported in the experiment engine.

@mvryan
Member

mvryan commented Sep 20, 2021

for (String event : events) {

As you can see here, you can attach canonicizers and cullers to individual events: canonicizers are prefixed with @ and cullers with #. There was documentation outlining all of this, but it must have been lost with the wiki.
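
For anyone reading later, here is a minimal sketch of how an event entry carrying those prefixes could be split apart. It is not the code from ExperimentEngine.java: the `EventSpec` class, the assumption that the event driver name comes first, and the component names used in `main` are all hypothetical, so check the real parser for the exact syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one parsed event entry; not a JGAAP class. */
final class EventSpec {
    final String eventDriver;
    final List<String> canonicizers = new ArrayList<>();
    final List<String> cullers = new ArrayList<>();

    private EventSpec(String eventDriver) {
        this.eventDriver = eventDriver;
    }

    /**
     * Parses an entry such as "Driver@Canonicizer#Culler".
     * Assumption: the event driver name comes first, and every later
     * segment is introduced by '@' (canonicizer) or '#' (culler).
     */
    static EventSpec parse(String entry) {
        int firstMarker = indexOfMarker(entry, 0);
        String driver = (firstMarker < 0 ? entry : entry.substring(0, firstMarker)).trim();
        if (driver.isEmpty()) {
            throw new IllegalArgumentException("Missing event driver name: " + entry);
        }
        EventSpec spec = new EventSpec(driver);
        int pos = firstMarker;
        while (pos >= 0) {
            char marker = entry.charAt(pos);
            int next = indexOfMarker(entry, pos + 1);
            String name = (next < 0 ? entry.substring(pos + 1)
                                    : entry.substring(pos + 1, next)).trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Empty name after '" + marker + "': " + entry);
            }
            // '@' introduces a canonicizer, '#' introduces a culler.
            (marker == '@' ? spec.canonicizers : spec.cullers).add(name);
            pos = next;
        }
        return spec;
    }

    /** Returns the index of the next '@' or '#' at or after 'from', or -1 if there is none. */
    private static int indexOfMarker(String s, int from) {
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '@' || c == '#') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Component names below are placeholders for illustration only.
        EventSpec spec = parse("Words@Normalize Whitespace@Unify Case#Most Common Events");
        System.out.println("driver=" + spec.eventDriver);
        System.out.println("canonicizers=" + spec.canonicizers);
        System.out.println("cullers=" + spec.cullers);
    }
}
```

The sketch keeps any parameters attached to a name as part of that name; it only separates the three kinds of component so that each event can carry its own canonicizers and cullers.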

@mvryan
Member

mvryan commented Sep 20, 2021

I started a branch that uses much clearer and expensive JSON for experiments, if you’re looking for something fun to build on.
Here is an example experiment in that framework. It also adds validation as a first-class setting.
https://github.com/evllabs/JGAAP/blob/643f677f5fc5173a99099ba82b29381922c181e3/src/test/resources/experiment.json

@DavidBerdik
Member Author

> for (String event : events) {
>
> As you can see here, you can include canonicizers and cullers on individual events. Canonicizers prefixed with @ and cullers prefixed with #. There was documentation outlining all of this but it must have been lost with the wiki.

Oh wow! I did not know about this. Even looking at an archived page of the wiki, it looks like that information is not available: http://web.archive.org/web/20100810065633/http://server8.mathcomp.duq.edu/jgaap/w/index.php/Command_line

Thank you for pointing this out so I didn't have to go hunting myself. This is a better implementation than what I was going to do, actually. I was going to have the canonicizers work the same way as they do in the UI, where they are applied to all events.

> I started a branch that uses much clearer and expensive JSON for experiments if you’re looking for something fun to build on.
> Here is an example experiment in that framework. It also adds validation as a first class setting.
> https://github.com/evllabs/JGAAP/blob/643f677f5fc5173a99099ba82b29381922c181e3/src/test/resources/experiment.json

Ah I did not know about this either. How significant is the performance hit? I think that we should consider replacing the CSV approach with this or at least making it available in addition to the CSV format. It's far cleaner and more user-friendly.

@mvryan
Member

mvryan commented Sep 21, 2021 via email

@DavidBerdik
Member Author

> https://web.archive.org/web/20160527010715/http://evllabs.com/jgaap/w/index.php/Experiment_Engine

Thanks for sharing! I did not realize that a more up-to-date version was available. I'm going to scrape it and add recreating it, with updates where necessary, to our to-do list.

Also, what do you think of my suggestion?

> How significant is the performance hit? I think that we should consider replacing the CSV approach with this or at least making it available in addition to the CSV format. It's far cleaner and more user-friendly.

@mvryan
Member

mvryan commented Sep 21, 2021

Yes, it takes a bit of a performance hit, but it’s not too bad. In my opinion it is the way to go because it enforces best practices better and produces more readable results.

If you start here, you can see it’s part of an effort to rebuild JGAAP around Spring: https://github.com/evllabs/JGAAP/blob/643f677f5fc5173a99099ba82b29381922c181e3/src/main/java/com/jgaap/rest/JGAAPApplication.java

My thought is that with this REST-based experiment engine, you can spin up a bunch of JGAAP instances and have them work like a cluster or a set of microservices.

@mvryan
Member

mvryan commented Sep 21, 2021 via email

@DavidBerdik
Member Author

Nice! I like the idea of a Spring-based JGAAP. Being able to do it like this would be much nicer than the hacky methods that I have used to do large-scale experimenting in the past. (Populate a database table with experiment configurations and use some wrapper code to pull experiment configs from the table.)

It's not something that I can play with right now (too many other, higher priorities), but I definitely would be interested in exploring this further if you do not move forward with it yourself.

> Here you can see we build a confusion matrix that’s way more informative than the hard to parse text files.

Cool! Even if the plan is to phase out the hard-to-parse text files, I think it would still be worth making the results available in a JSON format. Whenever I have a large set of results from an experiment, I use a set of Python scripts to read them. Those scripts work, but they rely on awkward string manipulation and sometimes need slight modifications depending on the scenario. With JSON output, these kinds of tools could still be used when necessary without any of that string manipulation.
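
As a rough illustration of the JSON output idea, here is a minimal sketch that writes a confusion matrix as JSON using only the Java standard library. The `ConfusionMatrixJson` class, the map layout (actual author, then predicted author, then count), and the numbers in `main` are made up for illustration; nothing in this thread shows the format the JSON branch really produces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: serializes a confusion matrix to JSON without external libraries. */
final class ConfusionMatrixJson {

    /** Wraps a string in quotes and escapes characters that JSON does not allow raw. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Assumed layout: matrix.get(actual).get(predicted) is a document count. */
    static String toJson(Map<String, Map<String, Integer>> matrix) {
        StringJoiner rows = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Map<String, Integer>> row : matrix.entrySet()) {
            StringJoiner cells = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                cells.add(quote(cell.getKey()) + ":" + cell.getValue());
            }
            rows.add(quote(row.getKey()) + ":" + cells);
        }
        return rows.toString();
    }

    public static void main(String[] args) {
        // Made-up counts, purely for illustration.
        Map<String, Map<String, Integer>> matrix = new LinkedHashMap<>();
        Map<String, Integer> rowA = new LinkedHashMap<>();
        rowA.put("Author A", 3);
        rowA.put("Author B", 1);
        matrix.put("Author A", rowA);
        System.out.println(toJson(matrix));
    }
}
```

Output in this shape can be loaded with a single `json.load` call in a Python script, which is the kind of tooling described above.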
