README: remove v3-legacy branch info #545

Merged
merged 1 commit into from
Sep 12, 2019
188 changes: 5 additions & 183 deletions README.md
@@ -10,40 +10,21 @@ assembled and then accessioned into the SUL digital library.

See the [RELEASES](./RELEASES.md) list.

### Web app
## Basics

The contemporary `Pre-Assembly` is a rails web-app.

The strongly preferred way of working with this code is to use it as a web app at https://sul-preassembly-prod.stanford.edu/. There is a link in the upper right to "Usage Instructions" which goes to the GitHub wiki pages: https://github.com/sul-dlss/pre-assembly/wiki.

### Legacy

The legacy command-line version of this code is represented by the `v3-legacy` branch. That branch in -stage is deployed to `sul-lyberservices-test`
and in -prod is deployed to `sul-lyberservices-prod` and is still actively used by a small number of power users in the PSM group. Until the
[desired functionality](https://github.com/sul-dlss/pre-assembly/issues/221) in the `v3-legacy` branch has been ported to the web application in `master`,
we should continue to maintain the `v3-legacy` branch.

Note that we hope to retire the legacy branch; if you are writing a new script, please surface it in #dlss-infrastructure channel to see if there is a different way to get the desired result, without adding to our maintenance burden.

More about Running the legacy application below.
`Pre-Assembly` is a Rails web-app at https://sul-preassembly-prod.stanford.edu/. There is a link in the upper right to "Usage Instructions" which goes to the GitHub wiki pages: https://github.com/sul-dlss/pre-assembly/wiki.

## Deployment

Deploy the Web app version from the `master` branch and the legacy version from the `v3-legacy` branch:
Deploy the Web app version in the usual capistrano manner:

```bash
cap stage deploy
cap prod deploy
```

Enter the branch or tag you want deployed. For the Web app, this should usually be `master` (the default).
For the legacy CLI, it will always be `v3-legacy`.

See the `Capfile` for more info.

# Web app

### Documentation for the contemporary (web) app is in the wiki: https://github.com/sul-dlss/pre-assembly/wiki

## Setting up code for local development

@@ -103,7 +84,7 @@ brew install exiftool
bundle exec rspec
```

## Running the (contemporary web) application for local development
## Local development

Just the usual:

@@ -123,170 +104,11 @@ Because the application looks for user info in an environment variable, and beca
an Apache module setting that environment variable per request based on headers from Webauth/Shibboleth, dev just always
sets a single value in that env var at start time. So laptop dev instances basically only allow one fake login at a time.


# Running the legacy application

1. Gather information about your project, including:
* The location of the materials. You will need read access to this
location from the servers you will be accessioning in (e.g. test and production).
* Confirm all objects are already registered.
* The location of any descriptive metadata.
* Whether you will be flattening the folder structure of each object
when accessioning (e.g. discarding any folder structure provided to you in each object).
* The DRUID of the project's APO.
* The DRUID of the set object you will be associating your objects with (if any).
* If you are using a manifest file in CSV format and want to create
descriptive metadata, create a MODs XML template. See the
"descriptive metadata" section below for more details.

2. Create a project-configuration YAML file using the data you gathered
above. Store this file in a location where it can be accessed by the
server (test or production). You should create a YAML file for each
environment specifying the parameters as appropriate. Use the convention
of `projectname_environment.yaml`, e.g. `revs_test.yaml`. If you have
multiple collections to associate your objects with, you will need to run
in multiple batches with multiple YAML files. You can add your collection
name to the end of each YAML filename to keep track (e.g. `revs_test_craig.yaml`)

The YAML file can be stored anywhere that is accessible to the server you
are running the code on. However, for simplicity, we recommend you store
the YAML at the root of your bundle directory, or create a new project
folder, place your YAML file into it and then place your bundle directory
into your new project folder.

Example:

* Your content is on `/thumpers/dpgthumper-staging/Hummel`
* Create a YAML file at `/thumpers/dpgthumper-staging/Hummel/hummel_test.yaml`
* Move your content (if you can) into `/thumpers/dpgthumper-staging/Hummel/content`

If you cannot move your content, be sure your YAML bundle discovery glob
and/or regex are specific enough to correctly ignore your YAML file during
discovery. Or, alternatively, place your YAML file in a location other
than the bundle.

* See [`TEMPLATE.yaml`](spec/test_data/exemplar_templates/TEMPLATE.yaml) for a fully documented example of a configuration file.
* See [`reg_example.yaml`](spec/test_data/exemplar_templates/reg_example.yaml) for a specific example using a file system crawl.
* TODO: link example using manifest.

3. Check the permissions on the bundle directory, recursively. You need read
permissions on all the bundle directory folders and files. You also need
write permissions in the location where you plan to write the log file
(often this cannot be the thumper drives, since they are mounted
read-only).

4. You may benefit from running some objects in a local or test environment.
If your objects are already registered, this may require pre-registering a
sample set in test as well as production using the same DRUIDs that are
identified with your content. You may also have to move a small batch of
test content to a location that is visible to the stage server.
Since the thumper drives are not mounted on the test server, you can use
the `/dor/content` mount on test for this purpose.

5. Make sure you have an APO for your object, and that the
administrativeMetadata data stream has the `<assemblyWF>` defined in it.
If it does not, go to https://consul.stanford.edu/display/APO/Home and
find the "Current FoXML APO template" link at the bottom of the page.
Download and open the template, find the `<assembly>` node and copy it. Go
to Fedora admin for each relevant environment (test/production) and add this
node to the administrativeMetadata stream. If you don't have this workflow
defined in your APO, then the assembly robots will never operate and
accessioning will not operate. This APO should be defined using the same
DRUID in test and production if you intend to run in both locations.
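
The read-permission check from step 3 can be sketched in plain Ruby. This is only an illustration using the standard library, not part of pre-assembly itself; the helper name `unreadable_paths` is hypothetical.

```ruby
require 'find'

# Hypothetical helper: returns every path under +root+ that the current
# user cannot read. Directories also need the execute bit to be traversed.
def unreadable_paths(root)
  problems = []
  Find.find(root) do |path|
    if File.directory?(path)
      problems << path unless File.readable?(path) && File.executable?(path)
    else
      problems << path unless File.readable?(path)
    end
  end
  problems
end
```

Calling `unreadable_paths` with the full path to your bundle directory returns an empty array when everything is readable. `File.writable?` can be used in the same way to check the directory where the log file will be written.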


# Legacy Project Notes

The assembly robots will automatically create jp2 derivatives from any TIFFs,
JP2s, or JPEGs. If you are working on a legacy project that has JP2s already
that were generated from source TIFFs, you should **not** stage those files
during pre-assembly, or else you will end up with two JP2s for each TIFF. You
can do this by using a regex to exclude .JP2 files or by only staging certain
subfolders. If you do stage the JP2 files and they have the same filename as
the TIFF (but with a different extension) they will be kept as is (i.e. they
will NOT have JP2s re-generated from the source TIFFs). If you do stage the
JP2 files and they have a different basename than the TIFFs, they WILL be
re-generated, and you will end up with two copies, in two different resources.
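
The regex-based exclusion mentioned above can be sketched in plain Ruby. This is an illustration only, not the pre-assembly configuration syntax: the file list and the `jp2_pattern` variable are hypothetical.

```ruby
# Hypothetical list of files found in one object's folder.
files = %w[
  xx000yy1111_00_01.tif
  xx000yy1111_00_01.jp2
  xx000yy1111_00_02.TIF
  xx000yy1111_05_02.JP2
]

# Case-insensitive match on a trailing .jp2 extension.
jp2_pattern = /\.jp2\z/i

# Keep everything that is not a JP2, so only the source images are staged.
to_stage = files.reject { |name| name.match?(jp2_pattern) }

puts to_stage
```

Anchoring the pattern at the end of the filename avoids accidentally excluding a TIFF whose name happens to contain "jp2" elsewhere.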


## Troubleshooting (the legacy application)

### Seeing an error like this when you try to run pre-assembly or a discovery report?
```
Psych::SyntaxError: (<unknown>): mapping values are not allowed in this
context at line 37 column 14
```
It's probably because your YAML configuration file is malformed. YAML is very
picky, particularly about tabs, spacing, and other formatting details. You can verify
your YAML file inside `rails console` or `irb`:
```ruby
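require 'yaml' # already loaded in rails console, needed in plain irb

yaml_config = '/full/path/to/your/config.yaml'
params = YAML.load(File.read(yaml_config)) # raises Psych::SyntaxError if malformed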
```

If you get a hash of values back, it parsed correctly. If you get the
`Psych::SyntaxError`, it did not. The line number referenced in the error
should help you locate the part of your file that is having issues. Edit and
try loading the YAML again on the console to confirm.
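
The same check can be wrapped so that a parse failure reports its position instead of raising. This is a minimal sketch using only Ruby's standard `yaml` library; the `check_yaml` helper and the sample keys (`project_name`, `bundle_dir`) are illustrative assumptions, not taken from a real project configuration.

```ruby
require 'yaml'

# Hypothetical helper: returns the parsed Hash, or nil after reporting
# where parsing failed.
def check_yaml(text)
  YAML.load(text)
rescue Psych::SyntaxError => e
  warn "YAML problem at line #{e.line}, column #{e.column}: #{e.problem}"
  nil
end

# Illustrative inputs: one well-formed, one with a stray second colon.
good = "project_name: revs\nbundle_dir: /path/to/bundle\n"
bad  = "project_name: revs: oops\n"

p check_yaml(good)
p check_yaml(bad)
```

To check a real file, pass `File.read('/full/path/to/your/config.yaml')` to the helper.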

1. If you don't see all of your objects being discovered or no files are
found in discovered objects, check the permissions on the bundle
directory. You need read permissions on all the bundle directory folders
and files.

3. Be sure you have read access to the YAML file you created from the server
you are running on.

6. If you don't see JP2s being created (or recreated) for your content, this
is probably due to one of the following problems:

1. The content metadata generated by pre-assembly didn't set a resource
type or set a resource type other than "image" or "page". Assembly
will only create jp2s for images contained in resources marked as
"image" or "page". Pre-assembly will do this automatically for
:simple_image and :simple_book projects, but check the output of the
content metadata to be sure.

2. The image was not a mimetype of 'image/jpeg' or 'image/tiff'. Any
other mimetype will be ignored.

3. Your input image was corrupt or missing a color space profile. This
will usually cause the jp2-create robot to fail and throw an error in
that workflow step.

4. You had an existing JP2 in the directory that matched a tiff or jpeg.
In this case the jp2-create robot will not overwrite any existing
files just to be safe.

5. You had an existing JP2 in the directory that matched a DPG style
filename. For example, if you had an existing tiff called `xx000yy1111_00_01.tif`
and a jp2 called `xx000yy1111_05_01.jp2`, you will not get another jp2
from that tiff even though there would not be a filename clash, under
the principle that both refer to the same image.

It is possible to force add color profiles to a single image or all of the
images in a given directory:

source_img = Assembly::Image.new('/input/path_to_file.tif') # add to a single image
source_img.add_exif_profile_description('Adobe RGB 1998')

or

Assembly::Images.batch_add_exif_profile_description('/full_path_to_tifs','Adobe RGB 1998') # add to multiple images

8. If you see incorrect content metadata being generated, note that the 'Process : Content Type' tag for each
existing object will be examined. If a known type is set in this tag, it
will be used to create content metadata instead of the default set in
`project_style[:content_structure]`. Check the tag for each object if the
style is not matching what you have set in the YAML configuration file.
Also note that if `content_md_creation[:style]` is set to 'none', then no
content metadata will be generated.

## Post Accessioning Reports

Use [Argo](https://argo.stanford.edu/).

## Manifests (applicable to both the legacy command line app and the contemporary web app)
## Manifests

Manifests are a way of indicating which objects you will be accessioning. A
manifest file is a CSV, UTF-8 encoded file and works for projects which have