Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OJS3] Filename problems #2003

Closed
pkel opened this issue Nov 16, 2016 · 18 comments
Closed

[OJS3] Filename problems #2003

pkel opened this issue Nov 16, 2016 · 18 comments
Assignees
Milestone

Comments

@pkel
Copy link
Contributor

pkel commented Nov 16, 2016

This article was inserted in the system by using the native XML import on this file.

If you try to download the the R Source Package or the R example code, you will see that the file extensions are not preserved. .tar.gz is replaced by .gz and .R by .r.

Perhaps this is related to #1924, where the same extensions cause problems on migration.

@asmecher asmecher added this to the OJS 3.0.2 milestone Nov 16, 2016
@pkel
Copy link
Contributor Author

pkel commented Nov 23, 2016

This problem is a general problem and is not due xml import. I will add a different issue.

Edit. Since this is in milestone, I add information here.

1. No filename set on public download

If the download is triggered from a readers view, ojs does not specify a filename. downloadFile() in lib/pkp/classes/file/FileManager.inc.php then defaults to server-side name.

This can be solved by adding the line

if(!isset($filename)) $filename = $submissionFile->getClientFileName();

to downloadFile() in lib/pkp/classes/file/SubmissionFileManager.inc.php

2. getClientFileName() is not capable of handling some extensions

.R -> .r
.RData -> .rdata
.tar.gz -> .gz

This wrong behaviour happens also on generating server-side filenames.

3. Use original filename instead

I adopted getClientFileName() to return getOriginalFilename(). We need this feature because we are dealing with software packages where generated filenames don't make sense. Any chance of pushing this back? Can you give a hint where I can read off how to insert a new option to the config and then use it in a library function?

@pkel pkel closed this as completed Nov 23, 2016
@pkel pkel reopened this Nov 23, 2016
@pkel pkel changed the title [OJS3] Wrong file extensions after native XML import [OJS3] Filename problems Nov 23, 2016
@asmecher
Copy link
Member

@bozana, did #1924 resolve this?

@bozana
Copy link
Collaborator

bozana commented Jan 25, 2017

No this is a general problem, I think, that the file names are saved only with the 'wrong' ending, e.g. .gz instead of .tar.gz, and thus, when downloading the suggested file name to save is with that 'wrong' or as mentioned above without an ending.
I will have to take a deeper look to see what are the solution possibilities...

@pkel
Copy link
Contributor Author

pkel commented Jan 25, 2017

No this is a general problem, I think, that the file names are saved only with the 'wrong' ending, e.g. .gz instead of .tar.gz, and thus, when downloading the suggested file name to save is with that 'wrong' or as mentioned above without an ending.

I came to the same conclusion, while patching the code. My solution was

I adopted getClientFileName() to return getOriginalFilename().

Another suggestion is to change the pkp naming scheme to a prefix scheme. This would circumvent all problems with file endings. Currently package-v1.tar.gz is renamed to a weird looking concatenation of staging information XY-11-foo-bar.gz. If you rename to XY-11-foo-bar.package-v1.tar.gz instead, you can easily go back to the original filename afterwards by splitting on the first dot (assuming your prefix does not contain dots). Reading the file extension from a file name will be problematic and probably requires a sort of (incomplete) whitelist.

If fiddling with the server side file naming scheme is to much for a minor release, getClientName() could be tweaked to append the original filename from the database to some staging indentifiers to make the filename unique. Ideally this happens optionally and configurable as proposed before.

@bozana
Copy link
Collaborator

bozana commented Jan 25, 2017

I believe we cannot use the original name, especially not in the backend, because the users could use unanonymized names... Thus I will have to take a look...
and maybe @asmecher has immediately an idea how to best solve this?

@asmecher
Copy link
Member

It would be a mistake (I think) to alter server-side filenames in the general case -- though it might be good to correct incorrectly-suffixed double-barreled suffixes (e.g. .tar.gz) for server-side storage. (The server-side filename doesn't have to have any relevance to the download filename or make any sense at all.)

There seems to be general confusion about how to handle compound extensions. See exhibit 1 and exhibit 2. The pathinfo tools (rightly, but maybe small-mindedly) consider the .gz to be the only "true" extension, and in UNIX dogma, file extensions are informational only and the system shouldn't care about them.

I would suggest adding a special case for filenames ending in .tar.gz -- we may have to add others -- and otherwise continuing to look only at the extension as defined by pathinfo.

@pkel
Copy link
Contributor Author

pkel commented Jan 26, 2017

I believe we cannot use the original name, especially not in the backend, because the users could use unanonymized names... Thus I will have to take a look...

Is this the reason for building this complex workaround? Can't we assume that users who are able to avoid name leakage within a file also can set an anonymized filename?

What about the case, where non original filenames render the workflow into a nightmare? I talk about software packages, where (a) the filename is the name of the package or (b) a script for reproduction assumes specific filenames on i.e. datasets. Can you consider an option to switch between generated and original filenames?

@bozana bozana modified the milestones: OJS 3.1, OJS 3.0.2 Jan 30, 2017
@bozana
Copy link
Collaborator

bozana commented Mar 24, 2017

Hi @pkel, I'v just taken a look at this again. The actual problem is only in OJS 2.4.x, right? -- In OJS 3.x the original file name is part of the suggested default file name and the file name can be renamed/edited. Thus, that problem seems to be solved in 3.x, right?
For the current stable OJS 2.4.8-2 version: the lower and upper cases in extensions are preserved, thus I could not reproduce this:

.R -> .r
.RData -> .rdata

The problem with .tar.gz still exist in OJS 2.4.8-2 and we will try to consider that...

@bozana
Copy link
Collaborator

bozana commented Mar 24, 2017

@asmecher, could you take a look at this PR: #2382
It only considers .tar.gz for now...
lower and upper cases seem to be always preserved...

bozana added a commit to bozana/pkp-lib that referenced this issue Mar 24, 2017
@pkel
Copy link
Contributor Author

pkel commented Mar 24, 2017

My post definitly was about ojs3. We delayed migration, that's why I'm not in the topic any more. I can reconstruct the followong from above:

  • Handling of some file extensions during server side file name generation was broken. (.R, .RData, .tar.gz)
  • Server side names where exposed to users when downloading files from the public frontend. I fixed this by defaulting to client file name as decribed in my second post. This fix was not applied. A pull request will follow. It may be that this was fixed differently.
  • It would be nice to have an option to set download names to original names for selected "Article Components". We then could apply this to our software packages where generated file names add no value. Currently we have to enforce this for all downloads by modification of getClientFilename().

pkel pushed a commit to pkel/pkp-lib that referenced this issue Mar 24, 2017
As proposed in pkp#2003. This is a fix I applied on Nov 28 to my OJS3 installation. I am not sure whether this was fixed somewhere else.
@bozana
Copy link
Collaborator

bozana commented Mar 24, 2017

Thanks @pkel. I will then take a look at OJS 3.x too and will have to ask @asmecher for some decisions...

bozana added a commit that referenced this issue Mar 27, 2017
#2003 consider TAR GZ as extension for the server file name
@bozana
Copy link
Collaborator

bozana commented Mar 27, 2017

@asmecher, we have to do here (in OJS 3.x) with several things:

  1. we always provide the files for download with lower case extension, which apparently is not always good -- e.g. apparently, in this example, .R and .RData should stay as they are in order to be correctly interpreted. Thus shall we change it to provide the correct/original extension, i.e. with upper and lower cases?
  2. we provide server file name for galley files download. Is this correct? Should we change it to the client file name (function getClientFileName also used elsewhere, for the download of the files in the backend)? Or even to something else?
  3. Apparently there is also a need to sometimes (i.e. as an option) provide the original file names for download. Is this something we would like consider?

Would it maybe make sense to provide the user entered localized file name also for download?
THANKS!!!

bozana added a commit to bozana/pkp-lib that referenced this issue Mar 29, 2017
bozana added a commit that referenced this issue Mar 29, 2017
#2003 consider TAR GZ as extension for the server file name
@bozana
Copy link
Collaborator

bozana commented Mar 29, 2017

Consider .tar.gz PRs:
ojs-stable-2_4_8: #2382
master: #2394

@bozana
Copy link
Collaborator

bozana commented Mar 29, 2017

Download file name to client file name and preservation of the upper and lower cases in the extension PR:
master: #2395

@bozana
Copy link
Collaborator

bozana commented Mar 29, 2017

@asmecher, could you take a look at this PR: #2395 ? -- Above all, if we want it to be so, or maybe to just one of the features/commits there?
Also, what about a setting option to either provide the client file name or original name as requested in this issue?

@asmecher
Copy link
Member

Thanks, @bozana -- this looks OK to me, though I haven't tested it. The server-side filenames are generated using SubmissionFile::_generateFilename, which still includes a strtolower_codesafe -- thus content added before/after this change should still map to an existing file on the server-side.

bozana added a commit that referenced this issue Mar 30, 2017
#2003 download file names and their extensions
@bozana
Copy link
Collaborator

bozana commented Apr 11, 2017

@pkel, the first two issues are solved and merged. We will not provide a setting for the file names (at least not for now), so may I close this issue?

@pkel
Copy link
Contributor Author

pkel commented Apr 17, 2017

Thanks.

@pkel pkel closed this as completed Apr 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants