This repository was archived by the owner on May 17, 2026. It is now read-only.
Pacific migration process
Paul Danelli edited this page Jul 12, 2021 · 1 revision
- Get Francesco to update fcrepo.binary.directory with latest dump of files
- Create a hash of file checksums to filenames:
files_hash = {}
# Skip file sets that don't have a depositor: they're blank (no titles or checksums)
response = ActiveFedora.solr.conn.get 'select', params: { q: 'has_model_ssim:FileSet AND depositor_ssim:[* TO *] AND digest_ssim:[* TO *]', fl: ["id", "digest_ssim", "title_tesim"], rows: 100000 }
response["response"]["docs"].each do |d|
  # Keep only the part of the digest after the last ':' (the checksum itself)
  hash = d["digest_ssim"].first.split(':').last
  files_hash[hash] ||= []
  files_hash[hash] += [d["title_tesim"].first]
end
# The block form closes the file automatically
File.open('/tmp/files_hash.txt', 'w') { |outfile| outfile.puts files_hash }
- Download this hash:
kubectl --namespace pac-r-one cp <pod-name>:/tmp/files_hash.txt ./files_hash.<date>.txt
- Compare this with the previous hash and find only the new items
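The comparison itself is not scripted on this page. A minimal sketch, assuming both dumps have been loaded back into Ruby hashes of checksum => [filenames]; the `new_entries` helper and the sample data are hypothetical:

```ruby
# Hypothetical helper: returns only the checksum => filenames pairs that are
# new in `current`, or whose filename list gained entries since `previous`.
def new_entries(previous, current)
  current.each_with_object({}) do |(checksum, names), added|
    fresh = names - previous.fetch(checksum, [])
    added[checksum] = fresh unless fresh.empty?
  end
end

# Made-up sample data, not real checksums
previous = { 'aaa111' => ['report.pdf'] }
current  = { 'aaa111' => ['report.pdf', 'report-copy.pdf'], 'bbb222' => ['photo.jpg'] }

p new_entries(previous, current)
```

Only the entries returned here would need to go into the file that the gsutil step reads.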
- Create gsutil commands for copying/renaming:
def pair_tree(str)
[str.slice(0,2), str.slice(2,2), str.slice(4,2)].join('/')
end
# Expects one line per checksum: <hash>,<filename>[,<filename>...]
lines = File.readlines('files_hash.txt')
outfile = File.open('gsutil_commands', 'w+')
lines.each do |line|
parts = line.split(',')
hash = parts[0]
parts[1..].each do |f|
outfile.puts "echo gsutil cp gs://us-statictenant-chris-import/fcrepo.binary.directory/#{pair_tree(hash)}/#{hash} \"gs://us-statictenant-chris-import/#{f.strip}\""
outfile.puts "gsutil cp gs://us-statictenant-chris-import/fcrepo.binary.directory/#{pair_tree(hash)}/#{hash} \"gs://us-statictenant-chris-import/#{f.strip}\""
end
end
outfile.close
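For reference, `pair_tree` turns the first six characters of a checksum into three two-character directory levels, which is how the script locates each binary under fcrepo.binary.directory. A small sketch with a made-up checksum (not a real digest):

```ruby
# Same slicing as pair_tree above, with the argument made explicit.
def pair_tree(str)
  [str.slice(0, 2), str.slice(2, 2), str.slice(4, 2)].join('/')
end

checksum = 'a1b2c3d4e5f6' # made-up value for illustration only
puts "fcrepo.binary.directory/#{pair_tree(checksum)}/#{checksum}"
```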
- Run and capture output:
bash gsutil_commands > gsutil_commands.log
- Don't forget to make blank files for any empty rows
- Dump important user information from Hyku1:
user_array = User.all.collect {|u| { id: u.id, email: u.email, display_name: u.display_name } }
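The load step reads /tmp/users.txt line by line and evals each line, so the dump presumably needs to be written as one Hash literal per line. A minimal sketch under that assumption; the helper names, sample users, and temporary path are hypothetical:

```ruby
require 'tmpdir'

# Hypothetical helper: writes one Hash literal per line.
def dump_records(records, path)
  File.open(path, 'w') { |f| records.each { |r| f.puts r.inspect } }
end

# Same read pattern as the load steps on this page.
# eval executes whatever is in the file, so only use it on files you produced yourself.
def load_records(path)
  File.readlines(path).map(&:strip).map { |line| eval(line) }
end

# Made-up stand-in for the array built from User.all above
user_array = [
  { id: 1, email: 'first@example.com', display_name: 'First User' },
  { id: 2, email: 'second@example.com', display_name: 'Second User' }
]

path = File.join(Dir.tmpdir, "users_sketch_#{Process.pid}.txt")
dump_records(user_array, path)
puts load_records(path) == user_array
File.delete(path)
```

The same one-record-per-line format would also suit the admin set and collection dumps, which are read back the same way.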
- Load user_array into new Hyku, generating new passwords:
users = File.readlines('/tmp/users.txt').map(&:strip)
user_array = users.map {|u| eval(u) }
user_array.reject {|u| u[:email].in? User.pluck(:email) }.each do |u|
User.create!(id: u[:id], email: u[:email], display_name: u[:display_name], password: Devise.friendly_token.first(8)) unless User.exists?(u[:id])
end
- Dump important admin set information (id, title) from Hyku 1:
admin_set_array = ActiveFedora::SolrService.get('has_model_ssim:AdminSet', fl: [:id, 'title_tesim'], rows: 1000)['response']['docs']
- Load any new admin sets into new Hyku:
admin_set_array = File.readlines('/tmp/admin_sets.txt').map(&:strip).map {|l| eval(l) }
Hyrax::Workflow::WorkflowImporter.load_workflows
admin_set_id = AdminSet.find_or_create_default_admin_set_id
AdminSet.find(admin_set_id).update_index
# Skip the default admin set since it already exists
admin_set_array.slice(1..).each do |a|
Hyrax::AdminSetCreateService.call(admin_set: AdminSet.new(id: a['id'], title: a['title_tesim']), creating_user: nil)
end
- Dump important collection information (id, title, description, visibility) from Hyku 1:
ActiveFedora::SolrService.get('has_model_ssim:Collection', fl: ['id', 'title_tesim', 'description_tesim', 'visibility_ssi'], rows: 1000)['response']['docs']
- Load any new collections into new Hyku:
collection_array = File.readlines('/tmp/collections.txt').map(&:strip).map {|l| eval(l) }
collection_type = Hyrax::CollectionType.first
user = User.find_by(email: "elisa.barrett@ubiquitypress.com")
collection_array.each do |ca|
c = Collection.new({id: ca['id'], title: ca['title_tesim'], description: ca['description_tesim'], visibility: ca['visibility_ssi'], collection_type: collection_type})
c.apply_depositor_metadata("elisa.barrett@ubiquitypress.com")
c.save
Hyrax::Collections::PermissionsCreateService.create_default(collection: c, creating_user: user)
end
- Export from Pacific Live (full database, NewsClipping, and TextWork)
- Transform exports to prepare them for import:
bundle exec rails r bin/transform_export_batch.rb -i ~/Downloads/all_database.20210412.csv -o ~/Downloads/all_database.20210412.mod.csv
bundle exec rails r bin/transform_export_batch.rb -i ~/Downloads/news_clipping.20210412.csv -o ~/Downloads/news_clipping.20210412.mod.csv
bundle exec rails r bin/transform_export_batch.rb -i ~/Downloads/text_work.20210412.csv -o ~/Downloads/text_work.20210412.mod.csv
- Import these
- Gather important information about files from the old server:
AccountElevator.switch!(Account.find_by(name: 'live').cname)
ids = File.readlines('/tmp/ids.txt').map(&:strip)
mapping = ids.collect do |id|
doc = ActiveFedora::SolrService.get("id:#{id}", fl: [:id, :hasRelatedMediaFragment_ssim, :hasRelatedImage_ssim, :file_set_ids_ssim])['response']['docs'][0]
fs_docs = []
fs_docs = ActiveFedora::SolrService.get("id:(#{doc['file_set_ids_ssim'].join(" OR ")})", rows:1000, fl: [:id, :label_ssi, :description_tesim, :visibility_ssi, :date_uploaded_dtsi, :visibility_during_embargo_ssim, :visibility_after_embargo_ssim, :embargo_release_date_dtsi, :embargo_history_ssim])['response']['docs'] if doc['file_set_ids_ssim'].present?
{ id => fs_docs.collect {|fs| { id: fs['id'], label: fs['label_ssi'], description: fs.fetch('description_tesim', []), visibility: fs['visibility_ssi'], thumbnail: (fs['id'] == doc['hasRelatedImage_ssim']&.first), representative: (fs['id'] == doc['hasRelatedMediaFragment_ssim']&.first), date_uploaded: (DateTime.parse(fs['date_uploaded_dtsi']) rescue nil).to_s, visibility_during_embargo: fs['visibility_during_embargo_ssim']&.first, visibility_after_embargo: fs['visibility_after_embargo_ssim']&.first, embargo_release_date: (DateTime.parse(fs['embargo_release_date_dtsi']) rescue nil).to_s, embargo_history: fs.fetch('embargo_history_ssim', []) } } }
end
fileout = File.open('/tmp/file_mapping.txt', 'w+')
fileout.puts mapping
fileout.close
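The mapping leans on two small idioms: `(DateTime.parse(...) rescue nil).to_s`, which turns a missing date into an empty string, and `&.first`, which makes the thumbnail and representative flags false when the work has no such field. A sketch of both in isolation; the helper names and sample values are hypothetical:

```ruby
require 'date'

# Returns the parsed date as a string, or "" when parsing raises
# (for example when the Solr field is missing and the value is nil).
def safe_date(value)
  (DateTime.parse(value) rescue nil).to_s
end

# True only when the file set id is the first entry of the given work field.
def flagged?(fs_id, work_field)
  fs_id == work_field&.first
end

puts safe_date('2021-04-12T10:00:00Z').inspect
puts safe_date(nil).inspect
puts flagged?('fs-1', ['fs-1'])
puts flagged?('fs-1', nil)
```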
- Also gather work visibility:
work_visibility = ActiveFedora::SolrService.get('generic_type_sim:Work', fl: ['id', 'visibility_ssi'], rows: 100000)['response']['docs']
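The load step evals the first line of /tmp/work_visibility_hash.txt and then looks works up by id, so the array of Solr docs above presumably has to be turned into a single id => visibility Hash before it is written out. A sketch under that assumption; the helper name and sample docs are hypothetical:

```ruby
# Hypothetical helper: turns Solr docs into { work_id => visibility }.
def visibility_hash(docs)
  docs.to_h { |doc| [doc['id'], doc['visibility_ssi']] }
end

# Made-up sample docs in the shape returned by the query above
docs = [
  { 'id' => 'work-1', 'visibility_ssi' => 'open' },
  { 'id' => 'work-2', 'visibility_ssi' => 'restricted' }
]

# Printed on a single line, which is what the `.first` in the load step relies on.
puts visibility_hash(docs).inspect
```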
- Load information on the new Hyku:
AccountElevator.switch!(Account.find_by(name: 'pacific').cname)
mapping = File.readlines('/tmp/file_mapping.txt').map { |m| eval(m.strip) }
work_visibility = File.readlines('/tmp/work_visibility_hash.txt').map {|m| eval(m.strip) }.first
mapping.each do |map|
HykuAddons::PostMigrationFixWorkJob.perform_later(map.keys.first, map.values, work_visibility[map.keys.first])
end
Post-migration split work source field (split on |) for any work with a multi-valued source. Only an issue with the sample test work Tom created.
Post-migration make users admins again (looking up this info from old system):
u.add_role :admin, Site.instance
Post-migration set titles of collections again because they will have reverted to ids in the migration. Fix collection titles:
collection_array = File.readlines('/tmp/collections.txt').map(&:strip).map {|l| eval(l) }
collection_array.each do |ca|
coll = Collection.find(ca['id'])
next if coll.title == ca['title_tesim']
coll.reindex_extent = Hyrax::Adapters::NestingIndexAdapter::LIMITED_REINDEX
coll.update(title: ca['title_tesim'])
end
Then reindex works:
AccountElevator.switch!(Account.find_by(name: 'pacific').cname)
need_indexing = ActiveFedora::SolrService.get('generic_type_sim:Work AND member_of_collections_ssim:/[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}/', fl: [:id, :member_of_collections_ssim], rows:10000)['response']['docs']
need_indexing.each { |l| HykuAddons::ReindexWorkJob.perform_later(l['id']) }
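The regex in the query selects works whose member_of_collections_ssim still holds a version-4 UUID (a collection id) instead of a title. A sketch of the same pattern as a Ruby Regexp, useful for checking which values would match; the helper name and sample values are made up:

```ruby
# Ruby equivalent of the pattern in the Solr query. Anchors are added because
# Solr/Lucene regex queries match the whole term, while Ruby matches substrings.
UUID_V4 = /\A[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12}\z/

# Hypothetical helper: true when the indexed value still looks like a collection id.
def stale_collection_value?(value)
  UUID_V4.match?(value)
end

values = [
  '3f2b8c1e-9a4d-4c6f-8b2a-1d5e7f9a0b3c', # made-up id: would be picked up for reindexing
  'Pacific Photographs'                   # made-up title: would be skipped
]
values.each { |v| puts "#{v}: #{stale_collection_value?(v)}" }
```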
Post-migration create missing workflow records:
user = User.first
missing_workflows = ActiveFedora::SolrService.get('generic_type_sim:Work AND -workflow_state_name_ssim:[* TO *]', rows:1000, fl: [:id])['response']['docs'].pluck('id')
missing_workflows.each do |work_id|
work = ActiveFedora::Base.find(work_id)
next if Sipity::Entity.find_by(proxy_for_global_id: work.to_global_id.to_s).present?
::Hyrax::Workflow::WorkflowFactory.create(work, work.attributes, user)
end
Post-migration look for files with spaces and remigrate them! Also look for any collisions for files named "Main File"!!!
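Both checks could be run over the checksum => filenames hash built in the first step. A minimal sketch, assuming that shape; the helper names and sample data are hypothetical:

```ruby
# Filenames containing whitespace, which the note above says need remigrating.
def names_with_spaces(files_hash)
  files_hash.values.flatten.uniq.select { |name| name.match?(/\s/) }
end

# Filenames attached to more than one checksum, i.e. different binaries
# that would be copied to the same destination name.
def name_collisions(files_hash)
  by_name = Hash.new { |h, k| h[k] = [] }
  files_hash.each { |checksum, names| names.uniq.each { |n| by_name[n] << checksum } }
  by_name.select { |_name, checksums| checksums.size > 1 }
end

# Made-up sample data, not real checksums
files_hash = {
  'aaa111' => ['Main File'],
  'bbb222' => ['Main File', 'scan 01.tif'],
  'ccc333' => ['report.pdf']
}

p names_with_spaces(files_hash)
p name_collisions(files_hash)
```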
Post-migration copy over featured works. Not needed here: there are no featured works in the old system.
Turn Import mode off on the tenant.