Harness #28

Merged: 239 commits into irit-melodi:master on May 18, 2015

Conversation

@kowey (Contributor) commented on May 18, 2015

Notes:

  • This is part of a cross-repo refactor (including irit-rst-dt and irit-stac), so there will be multiple pull requests with the same name.
  • It only looks like there are a lot of patches: what actually happened is that I merged the irit-rst-dt repo into attelo and did some git mv (irit-rst-dt still stands alone; I just wanted to preserve history). Only the changes after 646b99b matter.

I may just merge this myself, as I understand Mathieu is busy.

kowey added 30 commits July 14, 2014 07:45
Hide any extra arguments we may have to pass along
No need for our own stack trace.
It's confusing.
Makes it more convenient for saving results
Instead of using the monolithic attelo evaluate command,
break the evaluation process down into

- extracting folds (attelo enfold)
- looping over folds:
  - learning and saving model (attelo learn)
  - decoding with saved model (shared!) (attelo decode)
- summarising the results (attelo report)

This may also one day open the way to running folds concurrently.
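As a rough sketch, the split commands make the cross-validation loop explicit. The subcommand names below come from the message above, but the argument lists and the fold count are placeholders, not attelo's actual flags:

```python
import subprocess

def attelo(*args):
    """Run an attelo subcommand; real invocations also take corpus, model
    and fold arguments, which are omitted in this sketch."""
    subprocess.check_call(("attelo",) + args)

N_FOLDS = 10  # illustrative fold count

attelo("enfold")             # assign documents to folds, once
for fold in range(N_FOLDS):
    attelo("learn")          # fit and save the model for this fold
    attelo("decode")         # decode the held-out fold with the saved model
attelo("report")             # summarise the scores across all folds
```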
Only Pandoc supports definition lists, I suppose
We want to maximise code sharing between the various experiments
Pick up an evaluation where we left off
Kill:
- any scratch directories
- any incomplete eval dirs

It's also tempting to get rid of feature directories without
evaluations, but that's not likely to be a common case in future
development.
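Purely as an illustration of that cleanup, here is a sketch under assumed naming conventions; the scratch-*/eval-* glob patterns and the report marker file are hypothetical, not the harness's real layout:

```python
import shutil
from pathlib import Path

def clean(data_dir="TEST"):
    """Remove scratch directories and evaluation directories that were
    never finished (hypothetical conventions: see the glob patterns)."""
    data = Path(data_dir)
    for scratch in data.glob("scratch-*"):
        shutil.rmtree(scratch)
    for evaldir in data.glob("eval-*"):
        if not (evaldir / "report").exists():  # no final report => incomplete
            shutil.rmtree(evaldir)
```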
I want it to be as easy as possible to know where we are at a glance.
kowey and others added 28 commits April 13, 2015 17:53
ENH add random forest and decision tree classifiers
Oops! I must have done this only locally on the cluster
Oops! I must have done this only locally on the cluster
Conflicts:
	irit_rst_dt/local.py
I just noticed that it seems possible to use joblib parallel
in a sort of producer/consumer pattern: we don't need to have
all our jobs ready in one go

So if I understand correctly, this means we can have a generator
expression that yields decoder jobs as soon as the parser for
them has been fit.

Outcome:

1. No more waiting for all the learners to complete before we
   decode; start decoding as soon as the relevant learner is
   done
2. But still apply parallelism across decoders for multiple
   configurations

So if all learners are fast, that's great, we get to run lots
of jobs in parallel.  If some learners are slow, that's hopefully
still OK because we can still work on decoding while they're
crunching away.

I hope this means less dead space, and more cores humming away in
parallel.
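A minimal, self-contained sketch of that producer/consumer idea with joblib; the fit/decode functions and the learner/decoder names are placeholders rather than attelo's API:

```python
from joblib import Parallel, delayed

def fit(learner):
    """Placeholder: fit one learner and return its model."""
    return "model-for-" + learner

def decode_one(model, decoder):
    """Placeholder: decode one configuration with an already-fitted model."""
    return (model, decoder)

def decoder_jobs(learners, decoders):
    # Yield each decoding job as soon as its learner has been fit, so a
    # slow learner only delays its own decoders, not everyone else's.
    for learner in learners:
        model = fit(learner)
        for decoder in decoders:
            yield delayed(decode_one)(model, decoder)

# Parallel consumes the generator lazily (see its pre_dispatch argument),
# so workers can start decoding while later learners are still being fit.
results = Parallel(n_jobs=4)(decoder_jobs(["maxent", "svm"],
                                          ["mst", "astar"]))
```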
FIX update and fix calls to perceptrons
Conflicts:
	.gitignore
	README.md
	requirements.txt
	setup.py
You will need to define a HarnessConfig, which contains the settings,
file name conventions, and evaluations for the particular harness.

This was taken from irit_rst_dt.
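To make the idea concrete, here is a hypothetical sketch of what such a configuration object could look like; the field names and values are illustrative only, and the real HarnessConfig in the code is the authoritative interface:

```python
from collections import namedtuple

# Illustrative only: settings, file name conventions, and the evaluations
# (learner/decoder combinations) that a particular harness wants to run.
HarnessConfig = namedtuple("HarnessConfig",
                           ["dataset_dir",   # where corpora/features live
                            "eval_dir",      # where results should be written
                            "evaluations"])  # learner/decoder pairs to evaluate

CONFIG = HarnessConfig(dataset_dir="data",
                       eval_dir="TEST",
                       evaluations=[("maxent", "mst"),
                                    ("maxent", "astar")])
```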
Don't try to hash models we may not have built
kowey added a commit that referenced this pull request May 18, 2015
kowey merged commit 96468d6 into irit-melodi:master on May 18, 2015