Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WORK IN PROGRESS] - preparsing job recognises uploaded archive better #492

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

philippbayer
Copy link
Member

@philippbayer philippbayer commented Mar 27, 2018

  • use bash tools to check file type

  • use the right gem to extract from the archive

  • make sure everything downstream still works, needs more testing from me

  • Add more test files in different formats

end

# now that they are unzipped, check if they're actually proper files
file_is_ok = false
fh = File.open(genotype.genotype.path)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this ever work?? It parses the original uploaded file the way this looks like, that can be anything before extraction

end

# now that they are unzipped, check if they're actually proper files
file_is_ok = false
fh = File.open(genotype.genotype.path)
fh = File.open "#{Rails.root}/public/data/#{genotype.fs_filename}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Rails.root.join('path', 'to') instead.

rescue
logger.info "nothing to unzip, seems to be a text-file in the first place"
else
system("cp #{genotype.genotype.path} #{Rails.root}/public/data/#{genotype.fs_filename}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Rails.root.join('path', 'to') instead.

is_collection = true
end


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line detected.

is_collection = true
when /Zip archive data/
logger.info "File is zip"
reader = Zip::File.method("open")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer single-quoted strings when you don't need string interpolation or special symbols.

reader = Gem::Package::TarReader.method("new")
is_collection = true
when /Zip archive data/
logger.info "File is zip"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer single-quoted strings when you don't need string interpolation or special symbols.

is_collection = false
when /gzip compressed data, was/
reader = Zlib::GzipReader.method("open")
logger.info "File is gz"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer single-quoted strings when you don't need string interpolation or special symbols.

reader = File.method("open")
is_collection = false
when /gzip compressed data, was/
reader = Zlib::GzipReader.method("open")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer single-quoted strings when you don't need string interpolation or special symbols.

case filetype
when /ASCII text/
logger.info "File is flat text"
reader = File.method("open")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer single-quoted strings when you don't need string interpolation or special symbols.

filetype = %x{file #{genotype.genotype.path}}
case filetype
when /ASCII text/
logger.info "File is flat text"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer single-quoted strings when you don't need string interpolation or special symbols.

#
# There are two possible outcomes - file is a collection of files (tar, tar.gz, zip)
# or file is a single file (ASCII, gz)
filetype = %x{file #{genotype.genotype.path}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use backticks around command string.

rescue
logger.info "nothing to unzip, seems to be a text-file in the first place"
else
system("cp #{genotype.genotype.path} #{Rails.root.join('public', 'data', genotype.fs_filename)}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Rails.root.join('path', 'to') instead.
Line is too long. [103/100]

system("mv #{Rails.root}/tmp/#{genotype.fs_filename}.csv #{Rails.root}/public/data/#{genotype.fs_filename}")
logger.info "copied file"
zipfile.extract(biggest, Rails.root.join('tmp', "#{genotype.fs_filename}.csv"))
system("mv #{Rails.root.join('tmp', "#{genotype.fs_filename}.csv")} #{Rails.root.join('public', 'data',genotype.fs_filename)}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Rails.root.join('path', 'to') instead.
Line is too long. [135/100]
Space missing after comma.

logger.info 'file is gz'
is_collection = false
when /gzip compressed data, last modified/
reader = -> (zipfile) { Gem::Package::TarReader.new(Zlib::GzipReader.open(zipfile)) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use spaces between -> and opening brace in lambda literals

rescue
logger.info "nothing to unzip, seems to be a text-file in the first place"
else
system("cp #{genotype.genotype.path} \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Rails.root.join('path', 'to') instead.

logger.info "copied file"
zipfile.extract(biggest, Rails.root.join('tmp', "#{genotype.fs_filename}.csv"))
system("mv #{Rails.root.join('tmp', "#{genotype.fs_filename}.csv")} \
#{Rails.root.join('public', 'data',genotype.fs_filename)}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space missing after comma.

logger.info 'file is gz'
is_collection = false
when /gzip compressed data, last modified/
reader = ->(zipfile){ Gem::Package::TarReader.new(Zlib::GzipReader.open(zipfile)) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space missing to the left of {.

#
# There are two possible outcomes - file is a collection of files (tar, tar.gz, zip)
# or file is a single file (ASCII, gz)
filetype = `file #{genotype.genotype.path}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw some .docx files as well.. not sure if after extraction or before..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yeah, some folks aren't great at uploading the right file types. I'd say we should reject anything that's not ASCII post-unzip :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I also saw a few post-genotype analyses from Promethease.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants