ImportUrlJob Fails on Common Urls #3467

orangewolf · 2019-01-04T17:49:13Z

Descriptive summary

In both 2.x and master, the ImportUrlJob fails any time a URL's path ends in a directory. An example: "http://commons.ptsem.edu/?cover=temperancerallyi00hull" will fail because, even though it has params, its path is "/". "http://commons.ptsem.edu/somepath/?cover=temperancerallyi00hull" would also fail because the path would be "/somepath/".

Rationale

There is a bug in https://github.com/samvera/hyrax/blob/master/app/jobs/import_url_job.rb#L65 which is where we pick a file name. Another convention entirely needs to be used or fallbacks should be provided.

Expected behavior

Any URL that returns a file should be uploadable using the ImportUrlJob.

Actual behavior

Is a directory @ rb_sysopen - /tmp/d20190104-32433-ff4tes/
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:69:in `initialize'
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:69:in `open'
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:69:in `copy_remote_file'
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:34:in `perform'

Steps to reproduce the behavior

Create a work with remote_files set to ["http://commons.ptsem.edu/?cover=temperancerallyi00hull"]
Job will fail

Related work

Link to related tickets or prior related work here.

no-reply · 2019-01-04T18:03:48Z

@orangewolf: to make sure I understand correctly: this is because URI#path returns '/' for these URL strings?
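
For reference, a minimal sketch of what Ruby's standard `URI` parser reports for the two URLs from the summary. It only shows how the path and query are split; it makes no claim about how the job uses them.

```ruby
require 'uri'

# The two URLs quoted in the issue summary.
root_uri   = URI('http://commons.ptsem.edu/?cover=temperancerallyi00hull')
nested_uri = URI('http://commons.ptsem.edu/somepath/?cover=temperancerallyi00hull')

# The query string is kept separate from the path,
# so the path alone carries no file name here.
root_path   = root_uri.path    # "/"
nested_path = nested_uri.path  # "/somepath/"
root_query  = root_uri.query   # "cover=temperancerallyi00hull"

puts root_path, nested_path, root_query
```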

orangewolf · 2019-01-04T18:15:24Z

Yep, it causes the tmp file to be a directory, which it then cannot write to. Happy to submit a fix for this, but would like to get some idea of how we want to fix it first =)

This is what we did to get around the problem:

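    # Make sure the file we write has a usable name
    # @param uri [URI] the uri of the file to download
    # @return [String] the basename of the path, or the file set id if the path has none
    def safe_filename(uri)
      # File.basename('/') returns '/', so strip any slash that is left over
      filename = File.basename(uri.path).delete('/')
      filename.present? ? filename : file_set.id
    end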
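
A standalone sketch of the same fallback idea that can be run outside Rails. The `fallback` argument and the `'fs-123'` value are hypothetical stand-ins for `file_set.id`, and `empty?` replaces ActiveSupport's `present?`; `http://example.com/files/report.pdf` is an illustrative URL, not one from this issue.

```ruby
require 'uri'

# Standalone variant of the workaround above.
# `fallback` is a hypothetical stand-in for file_set.id.
def safe_filename(uri, fallback)
  # File.basename drops trailing slashes but returns "/" for the root path,
  # so remove any slash that survives.
  filename = File.basename(uri.path.to_s).delete('/')
  filename.empty? ? fallback : filename
end

root   = safe_filename(URI('http://commons.ptsem.edu/?cover=temperancerallyi00hull'), 'fs-123')
nested = safe_filename(URI('http://commons.ptsem.edu/somepath/?cover=temperancerallyi00hull'), 'fs-123')
file   = safe_filename(URI('http://example.com/files/report.pdf'), 'fs-123')

puts root    # fs-123
puts nested  # somepath
puts file    # report.pdf
```

Only the bare root path falls through to the fallback here, because `File.basename('/somepath/')` already yields `somepath`.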

no-reply · 2019-01-04T18:29:22Z

What about using the #path + #query + #fragment?

It seems like the intention is to populate the file metadata with a name as close to the original as possible(?)
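
One way that suggestion might look, as a rough sketch only. The method name `name_from_uri`, the `fallback` argument, and the choice to replace unsafe characters with underscores are all assumptions for illustration, not anything Hyrax does.

```ruby
require 'uri'

# Hypothetical helper: build a file name from path + query + fragment.
# `fallback` stands in for whatever default the job would use.
def name_from_uri(uri, fallback)
  parts = [uri.path, uri.query, uri.fragment].compact.reject(&:empty?)
  name = parts.join('_')
              .gsub(/[^\w.\-]+/, '_') # replace slashes, "=", "&", etc.
              .squeeze('_')           # collapse runs of underscores
              .gsub(/\A_+|_+\z/, '')  # trim leading and trailing underscores
  name.empty? ? fallback : name
end

with_query = name_from_uri(URI('http://commons.ptsem.edu/?cover=temperancerallyi00hull'), 'fallback')
with_path  = name_from_uri(URI('http://commons.ptsem.edu/somepath/?cover=temperancerallyi00hull'), 'fallback')
bare       = name_from_uri(URI('http://commons.ptsem.edu/'), 'fallback')

puts with_query  # cover_temperancerallyi00hull
puts with_path   # somepath_cover_temperancerallyi00hull
puts bare        # fallback
```

The result stays recognizably close to the original URL while never containing a slash, so it cannot resolve to a directory when joined onto a temp dir.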

orangewolf · 2019-01-18T06:17:38Z

I like it.

jlhardes · 2021-01-27T22:11:46Z

@orangewolf this is still open, but it looks like there is an agreed-upon path forward. Is this done, or is it still possible for you to submit the fix for this?

orangewolf · 2021-01-28T06:45:41Z

I vaguely remember this =-) I can take time Friday (my next day with community time carved out) to see where this is at and either point it at the finished work or see if I can get it done to no-reply's specification.

orangewolf · 2021-03-03T19:05:38Z

@jlhardes I can confirm this is no longer an issue in 3.x; the way file names are determined has completely changed. This can be closed, and samvera/bulkrax#270 will track the changes needed in Bulkrax to accommodate this change.

no-reply added the bug label Jan 4, 2019

no-reply added this to the 3.x series milestone Jan 4, 2019

jlhardes added this to To Do in Hyrax Maintenance WG - January-June 2021 Jan 25, 2021

orangewolf self-assigned this Jan 28, 2021

orangewolf closed this as completed Mar 3, 2021

jlhardes moved this from To Do to Done in Hyrax Maintenance WG - January-June 2021 Mar 3, 2021
