Removing scope
atz committed Aug 29, 2018
1 parent c78393b commit eb4b24e
Showing 26 changed files with 64 additions and 507 deletions.
70 changes: 19 additions & 51 deletions README.md
@@ -177,14 +177,11 @@ subdirectory.
# Normal run. Will restart and create a new log file, overwriting any existing log file for that project.
bin/pre-assemble YAML_FILE

# Run in resume mode, which will automatically pick up where left off based on the log file. Passing the --resume flag overrides the actual value of resume from the YAML config.
bin/pre-assemble YAML_FILE --resume
# Run in limit mode (default of 200), which will automatically limit the number of items pre-assembled to 200 regardless of what is set in the YAML file.
bin/pre-assemble YAML_FILE --limit

# Run in limit mode (default of 200), which will automatically limit the number of items pre-assembled to 200 regardless of what is set in the YAML file. Useful with resume.
bin/pre-assemble YAML_FILE --limit --resume

# Run in limit mode (set to 100), which will automatically limit the number of items pre-assembled regardless of what is set in the YAML file. Useful with resume.
bin/pre-assemble YAML_FILE --limit=100 --resume
# Run in limit mode (set to 100), which will automatically limit the number of items pre-assembled regardless of what is set in the YAML file.
bin/pre-assemble YAML_FILE --limit=100
```
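
The `--limit` behavior described above (a bare flag falls back to 200, `--limit=100` sets an explicit value) can be sketched with Ruby's standard `OptionParser`. This is only an illustration under assumptions: `parse_limit` and `DEFAULT_LIMIT` are hypothetical names, and the real `bin/pre-assemble` script may parse its arguments differently.

```ruby
require 'optparse'

# Default taken from the README text; the constant name is hypothetical.
DEFAULT_LIMIT = 200

# Hypothetical helper, not part of pre-assembly.
# Returns [limit, remaining_args]; limit is nil when --limit is not given.
def parse_limit(argv)
  limit = nil
  parser = OptionParser.new do |opts|
    # "[=N]" marks the value as optional, so a bare --limit yields nil here.
    opts.on('--limit[=N]', Integer, 'Cap the number of items pre-assembled') do |n|
      limit = n || DEFAULT_LIMIT
    end
  end
  remaining = parser.parse(argv) # non-option arguments, e.g. the YAML file
  [limit, remaining]
end

p parse_limit(%w[project.yaml --limit])
p parse_limit(%w[project.yaml --limit=100])
```

Returning `nil` when the flag is absent lets a caller tell "no limit requested" apart from the default of 200.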

Again, you can add RAILS_ENV=XXXX to the beginning of the command
@@ -195,7 +192,6 @@ bin/pre-assemble YAML_FILE --limit=100 --resume
* Navigate to the production box, in the pre-assembly area.
* Set `RAILS_ENV=production`
* Run pre-assembly with nohup and in the background (`&`).
* Optionally, include the `--resume` option to override the resume parameter and set to true.
* Optionally, include the `--limit` option to override the limit
parameter. You can specify the limit, or you can let it default to 200.

@@ -217,18 +213,14 @@ bin/pre-assemble YAML_FILE --limit=100 --resume
3. `tail -999f log/production.log # Detailed logging info for the pre-assembly project itself.`
4. `tail -999f nohup.out # Errors, etc from unix output (or "another_nohup_filename.out" in the example above)`


Be sure to keep your progress log file somewhere useful and be aware if
you restart pre-assembly without using the `--resume` switch, it will be
overwritten. You will need the progress log for cleanup and restarting.
You will need the progress log for cleanup and restarting.

9. Running in batch mode, automatically splitting a large run in groups of
smaller jobs, using limits and resume:
smaller jobs, using limits:

bin/batch_run YAML_CONFIG [LIMIT]

bin/batch_run YAML_CONFIG [LIMIT]

This will run pre-assembly multiple times sequentially, using resume and
This will run pre-assembly multiple times sequentially, using
limits, allowing the process to end and restart each time. This is useful to
prevent memory errors on the server when running large jobs. It will
automatically compute the number of items remaining to be run, split the job
Expand Down Expand Up @@ -498,11 +490,10 @@ pre-assembly to terminate immediately (if the failure is non-recoverable) or
it will continue and log the errors. The progress log file you specified in
your YAML configuration will contain information about which bundles failed.
You can re-start pre-assembly and ask it to re-try the failed objects and
continue with any other objects that it hadn't done yet. To do this, use the
--resume flag when you run pre-assembly:
continue with any other objects that it hadn't done yet.

```bash
RAILS_ENV=production bin/pre-assemble YAML_FILE --resume
RAILS_ENV=production bin/pre-assemble YAML_FILE
```

## Post Accessioning Reports
@@ -647,15 +638,13 @@ If you would like to test your MODs template prior to actually accessioning,
you can run a "mods report", passing in the YAML config file, which references
your manifest and MODs template, and a writable output folder location. The
report will then generate a MODs file for each row in your manifest so you can
examine the results. You can limit the number of rows run by temporarily
modifying the "limit_n" parameter in the YAML file. Note that the output
examine the results. Note that the output
folder MUST exist and must be writable. Be aware it will become filled with
MODs files, one per object. So if you have a large number of rows in your
manifest, you will end up with many files in your output directory.

```bash
RAILS_ENV=production bundle exec bin/mods_report YAML_CONFIG_FILE
OUTPUT_DIRECTORY
RAILS_ENV=production bundle exec bin/mods_report YAML_CONFIG_FILE OUTPUT_DIRECTORY
```

## Accession of Specific Objects
@@ -670,48 +659,27 @@ For projects with a manifest (e.g. like Revs):
For projects that do not use a manifest (e.g. like Rumsey):

1. Create a new project config YAML file and set the parameter
'accession_items' using either the 'only' or

'except' parameter as needed. You can include only specific objects (useful
when you only want to run a few objects) or you can exclude specific objects
(useful when you want to run most). Set the 'reaccession' parameter to false
or nil. Also set a different progress log file so you can store the results of
your second run separately. See the `TEMPLATE.yaml` for some examples.

`accession_items` using either the `only` or `except` parameter as needed.
You can include only specific objects (useful when you only want to run a few objects)
or you can exclude specific objects (useful when you want to run most).
Also set a different progress log file so you can store the results of
your second run separately. See the `TEMPLATE.yaml` for some examples.
1. Run pre-assembly.
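
As a rough illustration of how `only` and `except` narrow a run, the sketch below filters a list of bundle folder names against an `accession_items` setting. It is not the pre-assembly implementation: `select_bundles` is a hypothetical helper, and raising an error when both lists are present is an assumption based on the template's advice not to specify both.

```ruby
require 'yaml'

# Hypothetical helper, not part of pre-assembly: picks which bundle
# folder names to process for a given accession_items setting.
def select_bundles(all_names, accession_items)
  return all_names if accession_items.nil? # "~" in YAML: process all items

  only   = accession_items['only']
  except = accession_items['except']
  # Assumption: reject configs that set both lists.
  raise ArgumentError, 'specify "only" or "except", not both' if only && except

  return all_names & only if only     # keep just the listed names
  return all_names - except if except # drop the listed names

  all_names
end

config = YAML.safe_load(<<~CONFIG)
  accession_items:
    only:
      - 'aa111aa1111'
      - 'bb222bb2222'
CONFIG

bundle_names = %w[aa111aa1111 bb222bb2222 cc333cc3333]
puts select_bundles(bundle_names, config['accession_items'])
```

The names must match the folder names in the bundle directory exactly, since the comparison here is plain string equality.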

## Re-Accession of Specific Objects

Very similar to above, if you need to re-accession a batch of material (for
example, after remediating some files in your bundle), you can do this in two
ways, depending on your project setup.

For projects with a manifest (e.g. like Revs):
example, after remediating some files in your bundle), for projects with a manifest (e.g. like Revs):

1. Create a new manifest with only the objects you need re-accessioned.
2. Create a new project config YAML file referencing the new manifest and
write to a new progress log file.
3. "Cleanup" your existing objects that you will be re-accessioning using the
`Assembly::Utils.cleanup` method on a Ruby console as described below.
Since you will be re-registering objects, you will get new DRUIDs, and you
should therefore be sure to completely delete your old objects.
should therefore be sure to **completely delete** your old objects.
4. Re-run pre-assembly.


For projects that do not use a manifest (e.g. like Rumsey):

1. Create a new project config YAML file and set the parameter
'accession_items' and the 'only' parameter to an array of bundle names
(e.g. druid folder names) that you want to re-accession. Set the
'reaccession' parameter to true. Also set a different progress log file
so you can store the results of your second run separately. See the
`TEMPLATE.yaml` for some examples.
2. Re-run pre-assembly.


This process will perform an automatic cleanup on the items being
re-accessioned (but will leave your objects registered).

## Cleanup

### Removing Items From DOR and other locations
54 changes: 2 additions & 52 deletions config/projects/TEMPLATE.yaml
@@ -21,8 +21,7 @@ progress_log_file: ~ # Optional - if left as nil a progre
# NOTE: you probably won't be able to write to the thumper drives. Beware if that's where your config file is.
# In that case, you can specify /dor/preassembly, which is a good alternative and writable.
# Typically based on project name. A fully qualified path.
# Be sure to keep your progress log file somewhere useful and be aware
# if you restart pre-assembly without using the --resume switch, it will be overwritten.
# Be sure to keep your progress log file somewhere useful.
# You will need the progress log for cleanup and restarting.
# PLEASE DO NOT PLACE THIS IN THE LOG FOLDER OF THE PRE-ASSEMBLY CODE FOLDER ON THE SERVER. IT MAY BE DELETED IF YOU DO THIS.
'/dor/preassembly/progress_foo.yaml' # this is an example of specifying an alternate location
@@ -199,17 +198,14 @@ accession_items: ~ # Only valid for projects that do *not* use a manif
# In the "only" and "except" list, you should use names that exactly match the folder names in your bundle_dir,
# one per line, indented under "only" or "except" and preceded by a dash as shown in the examples below.
# For a normal run, set "accession_items" to ~, which will process all items.
# Note that you can run a full re-accession for all items by simply leaving off the "only" and "except" lists, but
# still specifying a "reaccession: true". Do not specify both "only" and "except" unless you like flying experimental homebuilt aircraft.
# Do not specify both "only" and "except" unless you like flying experimental homebuilt aircraft.
# Examples below:
only:
- 'aa111aa1111'
- 'bb222bb2222' # this is an example of two objects that will be accessioned, put them one per line, prefixed by a space, a dash, and space, add quotes around each item
except:
- 'aa111aa1111'
- 'bb222bb2222' # this is an example of two objects that will be ignored, put them one per line, prefixed by a space, a dash, and space, add quotes around each item

reaccession: true # If running a re-accession, set this to true so that a cleanup will be performed, if this is a first accession attempt for these objects, set it to false or ~ or leave it off
# If set to true, the files will be removed from /dor/assembly, /dor/workspace and the stacks before accessioning again. Objects will *not* be re-registered.

####
@@ -283,49 +279,3 @@ publish_attr: ~ # Most projects should set this to nil. If not specified or n
publish: 'no'
shelve: 'no'
preserve: 'yes'

####
# Run options.
#
# The typical values used in production are shown.
####

resume: false # If true, pre-assembly will skip objects that were
# already successfully pre-assembled, as indicated by
# the information in the project's progress_log_file.
# Normally, this option is false in the YAML file and
# is set to true on the command line with the --resume
# option.

limit_n: ~ # Set to an integer if you want to process only a limited
# number of the discovered objects. Useful for testing.

init_assembly_wf: true # Whether pre-assembly should initiate the assembly
# workflow for the object. Should always be true except for testing purposes.
# If set to false, the assembly robots will not operate.

####
# Other run options, mainly relevant for developers.
#
# The typical values used in production are shown.
####
throttle_time : ~ # The number of seconds to sleep between each object. Can be used to throttle the speed at which
# pre-assembly runs. If set to nil (or not set at all), no throttling is performed.

new_druid_tree_format: true # Determines druid tree directory format (defaults to "true").
# Check with Lyberteam to determine appropriate style to use. As of August 20, 2012, only old style is supported in production.
# old style: /oo/000/oo/0001
# new style /oo/000/oo/0001/oo000oo0001/content and /oo/000/oo/0001/oo000oo0001/metadata

compute_checksum: true # Whether pre-assembly should compute checksums.

validate_usage: true # Whether pre-assembly should confirm that all expected
# YAML parameters have been supplied.

show_progress: true # Whether to print druids as they are pre-assembled on the command line.

uniqify_source_ids: false # If true, pre-assembly attaches a timestamp to source
# IDs. Used during integration testing to avoid duplicate source ID errors that come from DOR.

cleanup: false # If true, pre-assembly deletes objects from DOR after
# pre-assembly finishes. Relevant only during development.
9 changes: 0 additions & 9 deletions config/projects/reg_example.yaml
@@ -39,12 +39,3 @@ content_md_creation:
style: 'default'

publish_attr: ~

resume: false
limit_n: ~
init_assembly_wf: true
compute_checksum: true
validate_usage: true
show_progress: true
uniqify_source_ids: false
cleanup: false
2 changes: 0 additions & 2 deletions integration/run_preassembly_spec.rb
@@ -144,8 +144,6 @@ def setup_bundle(proj)

# Override some params.
@params[:staging_dir] = @temp_dir
@params[:show_progress] = false
@params[:cleanup] = false
@params[:bundle_dir] = File.join(PRE_ASSEMBLY_ROOT, @params[:bundle_dir])
# Create the bundle.
@b = PreAssembly::Bundle.new(PreAssembly::BundleContext.new(@params))