Upload a document that is already existing but that we cannot find #40

Closed
shenriod opened this issue Feb 12, 2018 · 11 comments

@shenriod
Contributor

shenriod commented Feb 12, 2018

Short Description of the issue

When you upload a document that is already in the K-Box, but in a place where you don't have access (someone else's personal collection, trash, project where you don't have access, etc.), the system refuses the upload but also doesn't let you see or locate the existing document.

Expected behavior

The system should let me locate the document and add it to a collection where I have access, for example by customizing the error message:

  • The document you are trying to upload already exists [in the collection XXX | in a collection you do not have access to | in a private collection of another user ]. Would you like to add it to the collection YYY?
  • The document you are trying to upload already exists [in the trash]. Would you like to take it out of the trash and add it to the collection YYY?

And, in any case, the error message should include a link to the edit or preview page of the document

Steps to reproduce the incorrect behavior

  1. Upload a document that is already in the K-Box, but in a location where you don't have access

  2. The system gives you an error message but doesn't inform you where the document is located
    screenshot

  3. You can neither upload your document nor see the one that is already in the system

General information

Version of your K-Box: 0.18 - 0.20

URL of the page where the problem has occurred: Any upload

Type and version of your browser: Firefox 58

In what language are you using your K-Box? English

@shenriod
Contributor Author

NB (related to concerns about disclosing information about private collections): The personal collections should remain private. But I think that the following solution is ok:

• “The document you are trying to upload already exists [in the collection XXX | in a collection you do not have access to | in a private collection of another user ]. Would you like to add it to the collection YYY?”

Because:
• The error message doesn’t disclose to which user it belongs
• If I upload a document that is already in a private collection of another user, it means that I already have a physical copy of the document. It thus cannot be a private / sensitive document of this user and it is thus not problematic if I add it to any other collection

Also, the current behavior is quite ironic: if I don't want a document to be available to my colleagues in the K-Box, the best way to achieve that is to upload it to my personal collections, so that nobody else can upload it and nobody can identify where it is hiding :-)

@shenriod
Contributor Author

Related to #7

@xamanu
Contributor

xamanu commented May 30, 2018

Thanks, @shenriod for reporting!

We analysed the problem in depth and see two possible scenarios:

1. Fix this bug

To fix this bug, we suggest removing the "unique constraint" on documents. This means that, in principle, a document can be uploaded twice. This will happen if a user uploads a document that already exists somewhere they do not have access to.

In order to achieve this we:

  1. Get rid of the strong unique constraint
  2. Change preview and sharing URLs to be based on the UUID instead of the hash (and establish redirects for existing URLs). This way it is always clear which file (of the N uploaded) to show.
  3. Include a check after the upload of a document for whether there is already a document with the same hash that the user has access to. In this case, give a proper message saying where to find it.
  4. In case the user uploads a file which already exists but which they have no access to, just save the second file and handle it separately.
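
As a rough illustration of the four steps above, here is a minimal Python sketch. It is not K-Box code: the `Document` and `Library` classes, the `readers` field, and the use of SHA-256 are all assumptions made for the example. Every upload is stored under a fresh UUID, the hash is no longer unique, and the post-upload check only reports duplicates the uploader can already access.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Document:
    uuid: str        # identifies the document in preview/sharing URLs
    file_hash: str   # content hash, no longer required to be unique
    title: str
    owner: str
    readers: Set[str] = field(default_factory=set)  # hypothetical access list

    def accessible_to(self, user: str) -> bool:
        return user == self.owner or user in self.readers


class Library:
    """Hypothetical in-memory stand-in for the document table."""

    def __init__(self) -> None:
        self.documents: Dict[str, Document] = {}

    def upload(self, user: str, title: str, content: bytes) -> Tuple[Document, List[Document]]:
        # SHA-256 is an assumption; any content hash works for the sketch.
        file_hash = hashlib.sha256(content).hexdigest()
        doc = Document(str(uuid.uuid4()), file_hash, title, user)
        self.documents[doc.uuid] = doc  # always saved, even if a duplicate

        # Post-upload check: only duplicates the uploader may already see.
        visible_duplicates = [
            d for d in self.documents.values()
            if d.uuid != doc.uuid
            and d.file_hash == file_hash
            and d.accessible_to(user)
        ]
        return doc, visible_duplicates
```

With this shape, a second user uploading the same bytes gets an empty duplicate list and their own copy, so nothing about the first user's document is revealed.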

This solution will not make it possible to produce a message like this (it would rather just save the second file):

The document you are trying to upload already exists [in the collection XXX | in a collection you do not have access to | in a private collection of another user ].

It is worth reflecting on whether this message makes sense at all: it reveals information about other users and their documents, which I would not like to provide, also for privacy reasons.

2. Improve architecture to support several documents to refer to the same file

This is the way to improve the K-Box on a deeper level while still maintaining the "unique constraint" (a duplicated document should be referenced instead of uploaded twice). This is definitely a feature request, which requires planning and implementing significant changes to the database structure.
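
To make the idea of "several documents referring to the same file" more concrete, here is one possible shape, sketched in Python. The names `StoredFile`, `DocumentRef`, and `Store` are hypothetical and say nothing about how the K-Box schema would actually be changed. The file is stored once per hash, while each upload gets its own document record with its own owner and title.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoredFile:
    file_hash: str   # unique: one stored file per distinct content
    content: bytes


@dataclass
class DocumentRef:
    uuid: str
    file: StoredFile  # several documents may point at the same file
    owner: str
    title: str        # metadata lives on the document, not on the file


class Store:
    def __init__(self) -> None:
        self.files: Dict[str, StoredFile] = {}
        self.documents: List[DocumentRef] = []

    def add(self, owner: str, title: str, content: bytes) -> DocumentRef:
        file_hash = hashlib.sha256(content).hexdigest()
        # Reuse the stored file if the same content was uploaded before.
        stored = self.files.setdefault(file_hash, StoredFile(file_hash, content))
        ref = DocumentRef(str(uuid.uuid4()), stored, owner, title)
        self.documents.append(ref)
        return ref
```

In this sketch the uniqueness lives on the file, so two users can each hold a document with separate access rules and metadata without the bytes being stored twice.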

@xamanu
Contributor

xamanu commented May 30, 2018

For now, we plan on fixing the bug (solution 1) in this issue, and will see if the deeper changes to the database can be done later.

@xamanu xamanu added the bug label May 30, 2018
@xamanu xamanu added this to the 0.22 milestone May 30, 2018
@shenriod
Contributor Author

Thanks @xamanu for the detailed explanation.

As usual, I probably underestimate the actual complication of the task but wouldn't it be simpler and more consistent to have only the message

The document you are trying to upload already exists [in the collection XXX | in a collection you do not have access to | in a private collection of another user ]. Would you like to add it to the collection YYY?

?

In other words, this would just emulate what the K-Box Librarian, who has access to all collections, can already do manually: adding the existing document to another project / collection.

I see several advantages:

  1. We do not need to take away the "unique constraint" which, I think, is quite a nice feature to increase the consistency of the library
  2. If the document is already in a collection to which the user does not have access, we do not disclose where it is or to whom it belongs. We only inform the user that "it is already somewhere", so I do not see any issue with privacy.
  3. The document can be added to the desired collection in one click

While I perceive the following disadvantages with your option (1):

  1. This is clearly a temporary work-around, that (I assume?) will have to be reverted when we can develop a more consistent solution
  2. Allowing a duplication of files can clearly lead to issues when a document is being updated (new version uploaded), since users might not all be aware that there exist 2 parallel versions of the seemingly same document

@xamanu
Contributor

xamanu commented May 30, 2018

Would you like to add it to the collection YYY?

Unfortunately, in the current implementation of the database schema we cannot have one file referenced in two locations with different access rules and metadata (for both, see the description of Problem 1 and Problem 2 in the next comment).

The only options are to take away the unique constraint or to enhance the database.

The document you are trying to upload already exists [ in the collection XXX | in a collection you do not have access to | in a private collection of another user ].

A message saying that the uploaded file is in another collection the user has no access to, or in some private collection, reveals some indirect information. Yes, you have to think of edge-case scenarios, but it can be inconvenient. We can keep this, but I think this information doesn't help the user anyway, so I would try not to reveal any data, even indirectly.

@xamanu
Contributor

xamanu commented May 31, 2018

The complexity is probably easier to understand with an example: if we didn't go with either of our proposals and just implemented your suggestion, this kind of scenario would emerge:

  • Alice uploads document D1 into her private collection P1

  • Bob uploads the same document D1 into his private collection P2.

  • The system would ask Bob to add it also into his collection P2. For Bob to have access, the K-Box "shares" D1 with his user account.

  • Bob renames the document D1 to DXZ.

  • Problem 1: Alice and Bob can see each others names in the sharing settings.

  • Problem 2: Alice, who is not aware of her document being shared, cannot expect its name to suddenly change and would probably not find the document anymore.

You can vary the scenario: Bob uploads the file into a project collection, and for Alice a formerly private document suddenly seems to be exposed to a whole group; or Alice renames the document before Bob uploads, so after the upload Bob is confused, because he has to look for a document with a different name than he expects.

@xamanu
Contributor

xamanu commented May 31, 2018

And I have another interesting one:

  • Alice uploads D1 to the K-Box
  • Alice shares an accessible link with Charlotte, her client. Let's assume D1 contains important information.
  • Bob uploads the same document D1 into his private collection P2
  • Bob uploads a new version of the document (with different content, or even some comments)
  • Charlotte downloads the document and receives Bob's version instead of the one Alice wanted to share with her. Critical information might be spoiled.
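
The scenario above can be replayed in a few lines of Python. This is a toy model with hypothetical names (`SharedDocument`, `upload`, `shared_link_download`), assuming that a duplicate upload simply hands the second user the one existing document record and that a shared link always serves the latest version.

```python
import hashlib
from typing import Dict, List


class SharedDocument:
    """One record shared by everyone who uploaded the same content."""

    def __init__(self, content: bytes) -> None:
        self.versions: List[bytes] = [content]

    def add_version(self, content: bytes) -> None:
        self.versions.append(content)

    def latest(self) -> bytes:
        return self.versions[-1]


documents_by_hash: Dict[str, SharedDocument] = {}


def upload(content: bytes) -> SharedDocument:
    # A duplicate upload returns the existing record instead of a new one.
    key = hashlib.sha256(content).hexdigest()
    if key not in documents_by_hash:
        documents_by_hash[key] = SharedDocument(content)
    return documents_by_hash[key]


def shared_link_download(doc: SharedDocument) -> bytes:
    # Assumption: the link resolves to the newest version of the record.
    return doc.latest()


alice_doc = upload(b"D1: important information")   # Alice uploads D1
bob_doc = upload(b"D1: important information")     # Bob uploads the same D1
bob_doc.add_version(b"D1 with Bob's comments")     # Bob uploads a new version
charlotte_gets = shared_link_download(alice_doc)   # Charlotte follows Alice's link
```

Because `alice_doc` and `bob_doc` are the same object here, `charlotte_gets` holds Bob's version, which is exactly the leak described in the list.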

@xamanu
Contributor

xamanu commented Jun 11, 2018

Instead of "give a proper message, where to find it", this fix should probably include a "Click on OK if you want to add D1 to YYY, or click here if you want to see D1 in XXX" prompt. That would allow user B to add D1 directly to the collection where he originally wanted to upload it.

@xamanu
Contributor

xamanu commented Jul 6, 2018

During implementation, the constraints of the bug fix, as opposed to a solid fix that improves the database structure, are becoming concrete:

Expectation on the user interaction.

  • The FileAlreadyExists messages will be removed. All file uploads will succeed even if the document is a duplicate
  • When a user uploads a document that is a duplicate
    • of another document the user doesn't have access to: no action
    • of an old version of an existing document: no action
    • of a document the user has in the trash: inform the user that the document is a duplicate of a trashed one
  • When a user uploads a document D1 that is a duplicate of a document the user has access to: inform the user and
    • offer to reference the document in the collection where the upload is happening, if the document is in a project collection (no matter which collection I'm currently uploading to). If the user accepts, the new upload will be trashed. Please keep in mind that the owner of the document will be the owner of the already existing document.
    • if the upload is performed in Personal (not in a collection under Personal), present the duplicate notice but offer no action, since Personal (with no collection) implies that the user is the owner of the document, and a document cannot have two owners.
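
The rules above can be read as a small decision function. The Python sketch below is only one interpretation: the `Existing` fields, the `Action` names, the order in which the cases are checked, and the fallback for cases the list does not mention are all assumptions. "The document is in a project collection" is read here as referring to the already existing document.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_ACTION = "no action"
    NOTIFY_TRASHED = "notify: duplicate of a trashed document"
    NOTIFY_ONLY = "notify: duplicate, no action offered"
    OFFER_REFERENCE = "notify and offer to reference the existing document"


@dataclass
class Existing:
    """What we know about the already stored document with the same hash."""
    user_has_access: bool
    is_old_version: bool = False
    in_user_trash: bool = False
    in_project_collection: bool = False


def duplicate_action(existing: Existing, upload_in_personal_root: bool) -> Action:
    # The upload itself always succeeds; this only decides the follow-up.
    if not existing.user_has_access:
        return Action.NO_ACTION
    if existing.is_old_version:
        return Action.NO_ACTION
    if existing.in_user_trash:
        return Action.NOTIFY_TRASHED
    if upload_in_personal_root:
        # Personal (no collection) implies ownership; no second owner.
        return Action.NOTIFY_ONLY
    if existing.in_project_collection:
        # If the user accepts, the new upload would be trashed.
        return Action.OFFER_REFERENCE
    # Not covered by the list above; notifying without an offer is an assumption.
    return Action.NOTIFY_ONLY
```

For example, `duplicate_action(Existing(user_has_access=True, in_project_collection=True), upload_in_personal_root=False)` yields `Action.OFFER_REFERENCE`.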

Please note that this change increases the chance of getting duplicated documents in the search results (the documents might have different titles, but are the same).

@xamanu
Contributor

xamanu commented Jul 6, 2018

For architectural reasons, the duplicate check cannot be done during the upload process; it will be triggered after the user gets the OK message. To inform the user that the same document already exists in the system, the K-Box will send the uploading user an email notification a few minutes after the upload. This email contains information about the document the user uploaded (filename and the collection it is in), information about the identical document the user has access to (also filename and collection), and a link to the edit page of the newly uploaded document. On the edit page of this document, and only for the user who uploaded it, the K-Box will show the information about the duplication and the option to replace the current document with a reference to the other.
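
The deferred flow described here could look roughly like the following Python sketch. Everything in it is illustrative: `Box`, `Doc`, the in-memory `pending` queue and `outbox`, the `can_access` callback, and the `/documents/<uuid>/edit` path are hypothetical stand-ins, not the K-Box implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Doc:
    uuid: str
    file_hash: str
    filename: str
    collection: str
    uploader: str


class Box:
    def __init__(self) -> None:
        self.docs: List[Doc] = []
        self.pending: Deque[int] = deque()       # indexes awaiting the check
        self.outbox: List[Tuple[str, str]] = []  # (recipient, email body)

    def upload(self, doc: Doc) -> str:
        # The upload succeeds immediately; the duplicate check is deferred.
        self.docs.append(doc)
        self.pending.append(len(self.docs) - 1)
        return "ok"

    def run_pending_checks(self, can_access: Callable[[str, Doc], bool]) -> None:
        # Meant to run some minutes later, e.g. from a background job.
        while self.pending:
            index = self.pending.popleft()
            new = self.docs[index]
            for other in self.docs[:index]:  # only earlier uploads
                if other.file_hash != new.file_hash:
                    continue
                if not can_access(new.uploader, other):
                    continue  # never mention documents the user cannot see
                body = (
                    f"You uploaded '{new.filename}' to '{new.collection}'. "
                    f"The same document already exists as '{other.filename}' "
                    f"in '{other.collection}'. "
                    f"Edit page: /documents/{new.uuid}/edit"  # hypothetical path
                )
                self.outbox.append((new.uploader, body))
                break
```

Running the check only against earlier uploads keeps the original document's uploader from being notified about later copies, which matches the "only to the user who uploaded" wording.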
