
need test data to exercise all expected wrinkles for automated tests #199

Closed
ndushay opened this issue Sep 5, 2018 · 8 comments

@ndushay
Contributor

ndushay commented Sep 5, 2018

While writing the automatable "smoke test" of run_preassembly_job, we learned a few things about image processing (e.g., exif needs to be installed on the new VMs), and it seems like a grand idea to have test data, with manifests, that exercises as many of the wrinkles as we're likely to encounter in real data.

Is it sufficient to create manifest files for the bundle_input_xx directories here https://github.com/sul-dlss/pre-assembly/tree/master/spec/test_data so we can use them? Or do we want to list out the different cases and match the test data to that list?

BTW, I intend to rename those directories from names like bundle_input_g to descriptive names like image_data, audio_data, and so on, so listing out the different cases may make sense.

We are currently using bundle_input_g in our end-to-end test; it seems to have .gif and .tif files organized in two different ways.

@blalbrit
Contributor

blalbrit commented Sep 6, 2018

Hi @ndushay - I agree that we want tests that match our input patterns. @peetucket and I were chatting about this today, especially the exif installation. I'd be interested to know how preassembly exercises exif, since the heavy exif lifting happens in assembly on the robots machines. I don't doubt it might be used in preassembly as well; I'd just like to know how and where.

bundle_input_g is organized in the three ways we often used to receive content from one supplier, DPG. It is more complicated than our real-world use cases as they have evolved.

The content organization models we want to support in this app are:

  • a single flat directory, with files mapped to druids in the manifest (bundle_input_a)
  • object-level folders with content flattened inside them, with the folders mapped to druids in the manifest (bundle_input_f)
  • smpl-style (bundle_input_e)

We would not expect a mixture of those styles; that is, the content for a given preassembly job would be organized in just one of those three ways.
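The "one job, one organization style" rule above could be enforced when a job's config is loaded. A minimal sketch, assuming a hypothetical `:content_structure` config key and hypothetical symbols for the three styles (neither comes from pre-assembly itself); a missing value, an unknown value, or a list of several styles (a mixture) is rejected:

```ruby
# Hypothetical sketch: these symbols and the :content_structure key are
# illustrative names, not pre-assembly's actual configuration.
CONTENT_STRUCTURES = {
  flat: 'single flat directory, files mapped to druids in the manifest (cf. bundle_input_a)',
  object_folders: 'object-level folders mapped to druids in the manifest (cf. bundle_input_f)',
  smpl: 'smpl-style (cf. bundle_input_e)'
}.freeze

# Returns the job's single declared structure, or raises if the value is
# missing, unknown, or a list of several structures (a mixture).
def validate_content_structure!(job_config)
  structure = job_config[:content_structure]
  return structure if CONTENT_STRUCTURES.key?(structure)

  raise ArgumentError,
        "content_structure must be exactly one of #{CONTENT_STRUCTURES.keys.join(', ')}; " \
        "got #{structure.inspect}"
end

validate_content_structure!({ content_structure: :flat }) # => :flat
```

Keying test fixtures off the same three symbols would also make it easy to check that each supported style has exactly one test data directory.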

Each of those three bundle_input directories could have a config file associated with it. I may have misunderstood what you said above about manifest files, but each of those three directories already has a good manifest file.

Hope that helps - don't hesitate to ping me for further clarification.

B.

@ndushay ndushay self-assigned this Sep 10, 2018
@ndushay
Contributor Author

ndushay commented Sep 10, 2018

@blalbrit following up: exif is called here: https://github.com/sul-dlss/pre-assembly/blob/v3-legacy/lib/pre_assembly/bundle.rb#L594-L611

This Travis build shows the lovely (read: unhelpful) error we get when exif is not installed: https://travis-ci.org/sul-dlss/pre-assembly/builds/424596567

Our code still seems to treat file validation as optional rather than required; I hope that's correct.
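One way to avoid the opaque failure when exif is missing is to check for the dependency up front and fail fast with a readable message. A minimal sketch, assuming the dependency is a command-line binary named `exif` found on PATH (an assumption; substitute whatever the linked bundle.rb code actually invokes). Both helper names are hypothetical and not part of pre-assembly:

```ruby
# Hypothetical helper (not part of pre-assembly): true if `name` is an
# executable regular file in one of the directories listed in `path`.
def executable_on_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Hypothetical helper: raise a clear error instead of letting a later
# shell-out fail with an unhelpful message.
def require_executable!(name)
  return if executable_on_path?(name)

  raise "Required executable '#{name}' was not found on PATH. Is it installed on this VM?"
end
```

Since file validation is optional here, a call like `require_executable!('exif')` would belong only on the code path that actually validates files, or in a spec `before` hook for the smoke test.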

@blalbrit
Contributor

Thanks @ndushay - validating files should be an option in the legacy branch, but not in the new branch (image issues get caught in assembly anyway).

@ndushay
Copy link
Contributor Author

ndushay commented Sep 11, 2018

@blalbrit can you see any value in keeping any of these?

  • bundle_input_b (seems to be Rumsey data without a manifest)
  • bundle_input_c (seems to be Reid Dennis data with a manifest)
  • bundle_input_d (seems to be Gould data without a manifest)

Or in any of the wrinkles exercised by the project_config_files that use them?

@blalbrit
Contributor

It looks like bundle_input_b is testing getting the druid from the container AND passing object-level descMD through from the container.

bundle_input_c is testing selecting some (maybe not all?) content based on YAML config choices.

bundle_input_d is testing getting content from DPG-style legacy folder structures.

In all cases, the app is not going to use those options/structures, so they probably aren't worth keeping. They are likely (or definitely) still useful on the Legacy branch.

@ndushay
Contributor Author

ndushay commented Sep 11, 2018

Thanks for that answer, @blalbrit. I ain't touching nuthin on the Legacy branch.

@ndushay
Copy link
Contributor Author

ndushay commented Sep 15, 2018

I'm calling this done.

@ndushay ndushay closed this as completed Sep 15, 2018
@ndushay ndushay removed the review label Sep 15, 2018