This repository was archived by the owner on May 17, 2026. It is now read-only.

Pacific migration process

Paul Danelli edited this page Jul 12, 2021 · 1 revision

Prepare files for import (copying files out of the Fedora data directory to usable locations)

  1. Get Francesco to update fcrepo.binary.directory with the latest dump of files
  2. Create a hash of file checksums to filenames:
files_hash = {}
# Don't include file sets that don't have a depositor because they're blank (and don't have titles or checksums)
response = ActiveFedora.solr.conn.get 'select', params: { q: 'has_model_ssim:FileSet AND depositor_ssim:[* TO *] AND digest_ssim:[* TO *]', fl: ["id", "digest_ssim", "title_tesim"], rows: 100000 }
response["response"]["docs"].each do |d|
  hash = d["digest_ssim"].first.split(':').last
  files_hash[hash] ||= []
  files_hash[hash] += [d["title_tesim"].first]
end
outfile = File.new('/tmp/files_hash.txt', 'w+')
outfile.puts files_hash
outfile.close
  3. Download this hash: kubectl --namespace pac-r-one cp <pod-name>:/tmp/files_hash.txt ./files_hash.<date>.txt
  4. Compare this with the previous hash and find only the new items
  5. Create gsutil commands for copying/renaming:
# Build the directory path from the first three character pairs of the checksum, e.g. "abcdef12..." -> "ab/cd/ef"
def pair_tree(str)
  [str.slice(0,2), str.slice(2,2), str.slice(4,2)].join('/')
end

# Expects one comma-separated line per checksum: <hash>,<filename>[,<filename>...]
infile = File.open('files_hash.txt')
outfile = File.open('gsutil_commands', 'w+')
lines = infile.readlines
infile.close
lines.each do |line|
  parts = line.split(',')
  hash = parts[0]
  parts[1..].each do |f|
    # The echo line records each command in the log before it runs
    outfile.puts "echo gsutil cp gs://us-statictenant-chris-import/fcrepo.binary.directory/#{pair_tree(hash)}/#{hash} \"gs://us-statictenant-chris-import/#{f.strip}\""
    outfile.puts "gsutil cp gs://us-statictenant-chris-import/fcrepo.binary.directory/#{pair_tree(hash)}/#{hash} \"gs://us-statictenant-chris-import/#{f.strip}\""
  end
end
outfile.close
  6. Run and capture output: bash gsutil_commands > gsutil_commands.log
  7. Don't forget to make blank files for any empty rows
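
The "compare with the previous hash" step is not scripted above. The following is a minimal sketch of one way to do it, assuming both dumps have already been parsed into Ruby hashes of checksum to filenames. The helper names new_entries and gsutil_cp_command are hypothetical and not part of the original scripts. The sketch also builds each copy command with Shellwords from the standard library instead of wrapping the destination in double quotes, so filenames containing spaces or quotes should survive the shell.

```ruby
require 'shellwords'

BUCKET = 'gs://us-statictenant-chris-import'

# Same pair-tree layout as the script above: first three character pairs of the checksum.
def pair_tree(str)
  [str.slice(0, 2), str.slice(2, 2), str.slice(4, 2)].join('/')
end

# Hypothetical helper: keep only the filenames that were not in the previous dump.
def new_entries(previous, current)
  current.each_with_object({}) do |(checksum, names), added|
    fresh = names - previous.fetch(checksum, [])
    added[checksum] = fresh unless fresh.empty?
  end
end

# Hypothetical helper: build one shell-escaped gsutil cp command.
def gsutil_cp_command(checksum, filename)
  source = "#{BUCKET}/fcrepo.binary.directory/#{pair_tree(checksum)}/#{checksum}"
  ['gsutil', 'cp', source, "#{BUCKET}/#{filename}"].shelljoin
end

# Made-up sample data, for illustration only.
previous = { 'aabbccdd' => ['report.pdf'] }
current  = { 'aabbccdd' => ['report.pdf', 'copy of report.pdf'], '11223344' => ['map.tif'] }

new_entries(previous, current).each do |checksum, names|
  names.each { |name| puts gsutil_cp_command(checksum, name) }
end
```

Only the entries returned by new_entries need to go into gsutil_commands, which keeps repeat runs from re-copying files that were already renamed.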

Copy users

  1. Dump important user information from Hyku 1:
user_array = User.all.collect {|u| { id: u.id, email: u.email, display_name: u.display_name } }
  2. Load user_array into the new Hyku, generating new passwords:
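# Use strip, not strip!: strip! returns nil when a line has nothing to strip, which would break eval
users = File.readlines('/tmp/users.txt').map(&:strip)
user_array = users.map {|u| eval(u) }
# Look up existing emails once instead of once per user
existing_emails = User.pluck(:email)
user_array.reject {|u| u[:email].in? existing_emails }.each do |u|
  User.create!(id: u[:id], email: u[:email], display_name: u[:display_name], password: Devise.friendly_token.first(8)) unless User.exists?(u[:id])
end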
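
The loaders on this page run eval on every line of a dump file, which executes whatever the file happens to contain. A hedged alternative, assuming you control both the dump step and the load step, is to exchange one JSON object per line instead. This is a sketch of a swapped-in technique, not what the original migration did. The helper names dump_records and load_records are hypothetical.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: write one JSON object per line.
def dump_records(records, path)
  File.open(path, 'w') do |file|
    records.each { |record| file.puts(JSON.generate(record)) }
  end
end

# Hypothetical helper: read the records back with symbol keys, skipping blank lines.
def load_records(path)
  File.readlines(path).map(&:strip).reject(&:empty?).map do |line|
    JSON.parse(line, symbolize_names: true)
  end
end

# Made-up sample record, for illustration only.
path = File.join(Dir.tmpdir, 'users.example.jsonl')
dump_records([{ id: 1, email: 'someone@example.org', display_name: 'Someone' }], path)
p load_records(path)
```

Because load_records returns hashes with symbol keys, the result can be used where user_array is used above (u[:id], u[:email], u[:display_name]).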

Copy Admin Sets

  1. Dump important admin set information (id, title) from Hyku 1:
admin_set_array = ActiveFedora::SolrService.get('has_model_ssim:AdminSet', fl: [:id, 'title_tesim'], rows: 1000)['response']['docs'] 
  2. Load any new admin sets into new Hyku:
admin_set_array = File.readlines('/tmp/admin_sets.txt').map(&:strip).map {|l| eval(l) }
Hyrax::Workflow::WorkflowImporter.load_workflows
admin_set_id = AdminSet.find_or_create_default_admin_set_id
AdminSet.find(admin_set_id).update_index
# Skip the default admin set since it already exists (this assumes it is the first entry in the dump)
admin_set_array.slice(1..).each do |a|
  Hyrax::AdminSetCreateService.call(admin_set: AdminSet.new(id: a['id'], title: a['title_tesim']), creating_user: nil)
end

Copy Collections

  1. Dump important collection information (id, title, description, visibility) from Hyku 1:
ActiveFedora::SolrService.get('has_model_ssim:Collection', fl: ['id', 'title_tesim', 'description_tesim', 'visibility_ssi'], rows: 1000)['response']['docs']
  2. Load any new collections into new Hyku:
collection_array = File.readlines('/tmp/collections.txt').map(&:strip).map {|l| eval(l) }
collection_type = Hyrax::CollectionType.first
user = User.find_by(email: "elisa.barrett@ubiquitypress.com")
collection_array.each do |ca|
  c = Collection.new({id: ca['id'], title: ca['title_tesim'], description: ca['description_tesim'], visibility: ca['visibility_ssi'], collection_type: collection_type})
  c.apply_depositor_metadata("elisa.barrett@ubiquitypress.com")
  c.save
  Hyrax::Collections::PermissionsCreateService.create_default(collection: c, creating_user: user)
end

Run the migration!

  1. Export from Pacific Live (full database, NewsClipping, and TextWork)
  2. Transform exports to prepare them for import:
bundle exec rails r bin/transform_export_batch.rb -i ~/Downloads/all_database.20210412.csv -o ~/Downloads/all_database.20210412.mod.csv
bundle exec rails r bin/transform_export_batch.rb -i ~/Downloads/news_clipping.20210412.csv -o ~/Downloads/news_clipping.20210412.mod.csv
bundle exec rails r bin/transform_export_batch.rb -i ~/Downloads/text_work.20210412.csv -o ~/Downloads/text_work.20210412.mod.csv
  3. Import these

Post-migration data cleanup

  1. Gather important information about files from the old server:
AccountElevator.switch!(Account.find_by(name: 'live').cname)
ids = File.readlines('/tmp/ids.txt').map(&:strip)
mapping = ids.collect do |id|
  doc = ActiveFedora::SolrService.get("id:#{id}", fl: [:id, :hasRelatedMediaFragment_ssim, :hasRelatedImage_ssim, :file_set_ids_ssim])['response']['docs'][0]
  fs_docs = []
  fs_docs = ActiveFedora::SolrService.get("id:(#{doc['file_set_ids_ssim'].join(" OR ")})", rows:1000, fl: [:id, :label_ssi, :description_tesim, :visibility_ssi, :date_uploaded_dtsi, :visibility_during_embargo_ssim, :visibility_after_embargo_ssim, :embargo_release_date_dtsi, :embargo_history_ssim])['response']['docs'] if doc['file_set_ids_ssim'].present?
  files = fs_docs.collect do |fs|
    {
      id: fs['id'],
      label: fs['label_ssi'],
      description: fs.fetch('description_tesim', []),
      visibility: fs['visibility_ssi'],
      thumbnail: (fs['id'] == doc['hasRelatedImage_ssim']&.first),
      representative: (fs['id'] == doc['hasRelatedMediaFragment_ssim']&.first),
      date_uploaded: (DateTime.parse(fs['date_uploaded_dtsi']) rescue nil).to_s,
      visibility_during_embargo: fs['visibility_during_embargo_ssim']&.first,
      visibility_after_embargo: fs['visibility_after_embargo_ssim']&.first,
      embargo_release_date: (DateTime.parse(fs['embargo_release_date_dtsi']) rescue nil).to_s,
      embargo_history: fs.fetch('embargo_history_ssim', [])
    }
  end
  { id => files }
end
fileout = File.open('/tmp/file_mapping.txt', 'w+')
fileout.puts mapping
fileout.close
  2. Also dump work visibility:
work_visibility = ActiveFedora::SolrService.get('generic_type_sim:Work', fl: ['id', 'visibility_ssi'], rows: 100000)['response']['docs']
  3. Load information on the new Hyku:
AccountElevator.switch!(Account.find_by(name: 'pacific').cname)
mapping = File.readlines('/tmp/file_mapping.txt').map { |m| eval(m.strip) }
work_visibility = File.readlines('/tmp/work_visibility_hash.txt').map {|m| eval(m.strip) }.first
mapping.each do |map|
  HykuAddons::PostMigrationFixWorkJob.perform_later(map.keys.first, map.values, work_visibility[map.keys.first])
end
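
The dump step returns an array of Solr docs, while the load step looks visibility up by work id. The page does not show how work_visibility_hash.txt was produced. One plausible conversion, assuming the file holds a single hash of work id to visibility, is sketched here. The helper name visibility_by_id is hypothetical.

```ruby
# Hypothetical helper: index Solr docs by id so visibility can be looked up per work.
# Assumes each doc has the 'id' and 'visibility_ssi' keys requested in the dump query.
def visibility_by_id(docs)
  docs.to_h { |doc| [doc['id'], doc['visibility_ssi']] }
end

# Made-up sample docs, for illustration only.
docs = [
  { 'id' => 'work-1', 'visibility_ssi' => 'open' },
  { 'id' => 'work-2', 'visibility_ssi' => 'restricted' }
]
puts visibility_by_id(docs)
```

Writing the result with puts produces a single line, which matches how the loader evaluates each line and takes the first.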

Post-migration: split the work source field on | for any work with a multi-valued source. This was only an issue with the sample test work Tom created.
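
The split itself can be sketched in plain Ruby as below. This is a minimal sketch that only shows the string handling; the helper name split_source is hypothetical, and writing the result back to the work is left out because the page does not show how that was done.

```ruby
# Hypothetical helper: turn pipe-joined source values into separate, trimmed values.
def split_source(values)
  Array(values).flat_map { |value| value.split('|') }.map(&:strip).reject(&:empty?)
end

# Made-up sample value, for illustration only.
p split_source(['Archive A | Archive B'])
```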

Post-migration: make users admins again, looking up who was an admin in the old system: u.add_role :admin, Site.instance

Post-migration: set the titles of collections again, because they will have reverted to ids in the migration. Fix collection titles:

collection_array = File.readlines('/tmp/collections.txt').map(&:strip).map {|l| eval(l) }
collection_array.each do |ca|
  coll = Collection.find(ca['id'])
  next if coll.title == ca['title_tesim']
  coll.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX
  coll.update(title: ca['title_tesim'])
end

Then reindex works:

AccountElevator.switch!(Account.find_by(name: 'pacific').cname)
need_indexing = ActiveFedora::SolrService.get('generic_type_sim:Work AND member_of_collections_ssim:/[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}/', fl: [:id, :member_of_collections_ssim], rows:10000)['response']['docs']
need_indexing.each {|l| HykuAddons::ReindexWorkJob.perform_later(l['id'])}
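
The Solr query above selects works whose indexed collection name still looks like a UUID rather than a title. The same check can be sketched in plain Ruby, for example to spot-check a list of docs offline. The names UUID_V4 and works_with_id_titles are hypothetical. Solr regular expressions match the whole term, so the Ruby pattern needs explicit \A and \z anchors to behave the same way.

```ruby
# Same pattern as the Solr query, anchored to the whole string.
UUID_V4 = /\A[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}\z/

# Hypothetical helper: keep docs where any collection value is still a bare UUID.
def works_with_id_titles(docs)
  docs.select do |doc|
    Array(doc['member_of_collections_ssim']).any? { |value| value.match?(UUID_V4) }
  end
end

# Made-up sample docs, for illustration only.
docs = [
  { 'id' => 'work-1', 'member_of_collections_ssim' => ['0e1a2b3c-4d5e-4f6a-8b7c-9d0e1f2a3b4c'] },
  { 'id' => 'work-2', 'member_of_collections_ssim' => ['Pacific Newspapers'] }
]
p works_with_id_titles(docs).map { |doc| doc['id'] }
```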

Post-migration: create missing workflow records:

user = User.first
missing_workflows = ActiveFedora::SolrService.get('generic_type_sim:Work AND -workflow_state_name_ssim:[* TO *]', rows:1000, fl: [:id])['response']['docs'].pluck('id')
missing_workflows.each do |work_id|
  work = ActiveFedora::Base.find(work_id)
  next if Sipity::Entity.find_by(proxy_for_global_id: work.to_global_id.to_s).present?
  ::Hyrax::Workflow::WorkflowFactory.create(work, work.attributes, user)
end

Post-migration: look for files with spaces in their names and remigrate them. Also look for any collisions between files named "Main File".
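
Both checks can be run against the checksum-to-filenames hash built at the start of this page. Collisions matter because the gsutil step copies each binary to a destination named after its title, so two different binaries with the same title end up at the same destination path. The following is a minimal sketch; the helper names names_with_spaces and colliding_names are hypothetical.

```ruby
# Hypothetical helper: list every filename that contains whitespace.
def names_with_spaces(files_hash)
  files_hash.values.flatten.select { |name| name.match?(/\s/) }.uniq
end

# Hypothetical helper: map each filename used by more than one checksum to those checksums.
def colliding_names(files_hash)
  owners = Hash.new { |hash, key| hash[key] = [] }
  files_hash.each do |checksum, names|
    names.uniq.each { |name| owners[name] << checksum }
  end
  owners.select { |_name, checksums| checksums.size > 1 }
end

# Made-up sample data, for illustration only.
files_hash = {
  'aa11' => ['Main File', 'scan.pdf'],
  'bb22' => ['Main File'],
  'cc33' => ['field notes.pdf']
}
p names_with_spaces(files_hash)
p colliding_names(files_hash)
```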

Post-migration: copy over featured works. There are no featured works in the old system.

Turn Import mode off on the tenant.
