Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ImportUrlJob Fails on Common Urls #3467

Closed
orangewolf opened this issue Jan 4, 2019 · 7 comments
Closed

ImportUrlJob Fails on Common Urls #3467

orangewolf opened this issue Jan 4, 2019 · 7 comments

Comments

@orangewolf
Copy link
Member

Descriptive summary

In both 2.x and master, the ImportUrlJob fails any time a URL ends in a directory. An example: "http://commons.ptsem.edu/?cover=temperancerallyi00hull" will fail, because even though it has params, its path is "/". "http://commons.ptsem.edu/somepath/?cover=temperancerallyi00hull" would also fail because the path would be "/somepath/".

Rationale

There is a bug in https://github.com/samvera/hyrax/blob/master/app/jobs/import_url_job.rb#L65 which is where we pick a file name. Another convention entirely needs to be used or fallbacks should be provided.

Expected behavior

Any url that returns a file should be able to be uploaded using the ImportUrlJob

Actual behavior

Is a directory @ rb_sysopen - /tmp/d20190104-32433-ff4tes/
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:69:in `initialize'
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:69:in `open'
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:69:in `copy_remote_file'
/opt/atla/shared/vendor/bundle/ruby/2.5.0/gems/hyrax-2.3.3/app/jobs/import_url_job.rb:34:in `perform'

Steps to reproduce the behavior

  1. Create a work with remote_files set to ["http://commons.ptsem.edu/?cover=temperancerallyi00hull"]
  2. Job will fail

Related work

Link to related tickets or prior related work here.

@no-reply no-reply added the bug label Jan 4, 2019
@no-reply no-reply added this to the 3.x series milestone Jan 4, 2019
@no-reply
Copy link
Member

no-reply commented Jan 4, 2019

@orangewolf: to make sure I understand correctly: this is because URI#path returns '/' for these URL strings?

@orangewolf
Copy link
Member Author

orangewolf commented Jan 4, 2019

Yep, it causes the tmp file to be a directory which it then can not write too. Happy to submit a fix for this, but would like to get some idea of how we want to fix it first =)

this is what we did to get around the problem:

    # Make sure the file we write has a usable name
    # @param uri [URI] the uri of the file to download
    def safe_filename(uri)
      filename = File.basename(uri.path)
      filename.gsub!('/', '')
      filename.present? ? filename : file_set.id
    end

@no-reply
Copy link
Member

no-reply commented Jan 4, 2019

What about using the #path + #query + #fragment?

It seems like the intention is to populate the file metadata with a name as close to the original as possible(?)

@orangewolf
Copy link
Member Author

I like it.

@jlhardes
Copy link
Contributor

@orangewolf this is still open but it looks like there is an agreed upon path forward. Is this done or is it still possible for you to submit the fix for this?

@orangewolf
Copy link
Member Author

I vaguely remember this =-) I can take time Friday (my next day with community time carved out) to see where this is at and either point it at the finished work or see if I can get it done to no-replys specification.

@orangewolf orangewolf self-assigned this Jan 28, 2021
@orangewolf
Copy link
Member Author

@jlhardes I can confirm this is not longer an issue in 3.x - the way file names are determined has completely changed. This can be closed and samvera/bulkrax#270 will track the changes needed to Bulkrax to accommodate this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

No branches or pull requests

3 participants