Improve indexing flow #398

Open
4 tasks
pdelboca opened this issue May 23, 2024 · 4 comments

Comments

@pdelboca
Member

pdelboca commented May 23, 2024

Overview

This ticket is a continuation of #391 and #397. It aims to improve our index workflow to fix some current issues.

User Story

AS a User
WHEN I manually fix errors in the datagrid and save the changes,
I WANT the error report to be refreshed to reflect the latest changes.

Technical details

Currently, files are indexed at read time: a single method, fileIndex, is called on load. It is probably a good idea to migrate to a more traditional approach: run the report and store the results when creating or saving the file, then read the stored results when opening the file.

This might already be working, but it is not quite clear. We should create new actions and methods that explicitly create and get the index, instead of a single fileIndex method that handles multiple scenarios.

Proposal

  • Separate fileIndex into two methods: setIndex and getIndex.
  • Call setIndex when creating the file or saving it.
  • Call getIndex when loading the file (and after saving it).
  • Rename index? This is quite a technical term and it's definitely confusing for end users. (This could be done in a separate ticket.)

This way we can have specific actions that can be called independently to better manage the synchronization of file <-> index.
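A minimal sketch of the split proposed above, written in Python for illustration only. Everything here is an assumption rather than the project's actual code: `IndexStore`, `validate`, and the snake_case names `set_index` / `get_index` are hypothetical stand-ins for the proposed setIndex / getIndex, and the toy validator only flags blank cells.

```python
from dataclasses import dataclass, field


@dataclass
class IndexStore:
    """Hypothetical in-memory store: file path -> stored error report."""
    reports: dict = field(default_factory=dict)


def validate(rows):
    """Toy validation (assumption): report every blank cell."""
    errors = []
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            if cell.strip() == "":
                errors.append(f"row {r}, column {c}: blank cell")
    return errors


def set_index(store, path, rows):
    """Run the report and store it. Called when creating or saving a file."""
    store.reports[path] = validate(rows)


def get_index(store, path):
    """Read the stored report without re-running validation.

    Returns None if the file has never been indexed.
    """
    return store.reports.get(path)


# Usage: index on create, fix the error, index again on save.
store = IndexStore()
set_index(store, "data.csv", [["id", "name"], ["1", ""]])
print(get_index(store, "data.csv"))
set_index(store, "data.csv", [["id", "name"], ["1", "Ada"]])
print(get_index(store, "data.csv"))
```

Keeping the write path (`set_index`) separate from the read path (`get_index`) means loading a file never recomputes the report, and the stored report can only go stale if a save skips `set_index`.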

@pdelboca
Member Author

pdelboca commented May 23, 2024

@roll @romicolman @guergana let me know your thoughts!

@guergana
Collaborator

I am with Romina on this... What do we mean by indexing? I also don't get it. Why do the users need to index the files? Indexing is usually a way of optimizing the file for faster searches, but I see some validation for errors going on in the server code. 🙈 What is the original intention of this button? I agree with @pdelboca that this terminology is too technical for end users. If this is table optimization, it should be hidden from the users, and if we are using this button to validate, then we should give it a clearer name.

@roll
Collaborator

roll commented May 27, 2024

I think we discussed with Romina that for end users it needs to be called "Validate" instead of "Index". @guergana, it's a technical term from frictionless-py, and it's not for optimizing; it's basically the whole process of ingesting the file into the system.

@romicolman
Collaborator

romicolman commented May 27, 2024

Hi all! A couple of comments from my side:

  • The name of the errors button (Validate) will probably change. We ran a survey, got ideas for names and we are waiting for the UX consultant to make a final decision. As you know, right now, INDEX and VALIDATE are two different buttons. However, here I suggest the ideal workflow so we can discuss if this is technically possible.

Ideal workflow

  • The user opens a tabular file with errors in the ODE.
  • The user edits cells to correct errors. Cells can be edited one by one (fix error 1, fix error 2, then click SAVE to apply the changes).
  • As @pdelboca mentioned, validation currently runs ONLY ONCE: when the user opens the file in the ODE. It would be great if validation could be re-run every time the user clicks SAVE, so the remaining errors are shown. If this is not possible:
  • If validation is not automatic on SAVE, the user edits one or more cells, clicks SAVE and then clicks the VALIDATE button (name to be confirmed).
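The two variants of the workflow above could be sketched roughly as follows. This is an illustration under stated assumptions, not the ODE's implementation: `EditorSession`, `blank_cells`, and the `auto_validate` flag are hypothetical names, and "saving" is modelled as taking a snapshot of the table.

```python
class EditorSession:
    """Hypothetical editing session for one tabular file."""

    def __init__(self, table, validator, auto_validate=True):
        self.table = [row[:] for row in table]   # working copy being edited
        self.saved = [row[:] for row in table]   # last saved snapshot
        self.validator = validator
        self.auto_validate = auto_validate
        self.errors = validator(self.saved)      # validation on open

    def edit(self, row, col, value):
        """Change one cell in the working copy (not yet saved)."""
        self.table[row][col] = value

    def save(self):
        """Persist the edits; re-run validation only if it is automatic."""
        self.saved = [row[:] for row in self.table]
        if self.auto_validate:
            self.validate()
        return self.errors

    def validate(self):
        """Explicit VALIDATE action: refresh the report from the saved data."""
        self.errors = self.validator(self.saved)
        return self.errors


def blank_cells(table):
    """Toy validator (assumption): one error per blank cell."""
    return [
        f"row {r}, column {c}: blank cell"
        for r, row in enumerate(table, start=1)
        for c, cell in enumerate(row, start=1)
        if cell.strip() == ""
    ]


# Automatic variant: each SAVE shows the remaining errors.
session = EditorSession([["id", "name"], ["", ""]], blank_cells)
session.edit(1, 0, "1")
print(session.save())
```

With `auto_validate=False` the report stays unchanged after `save()` until `validate()` is called, which matches the fallback where the user clicks SAVE and then VALIDATE.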

Again, I understand that INDEX and VALIDATE are two different buttons right now. If we cannot make both functions work together, we need to rename INDEX to make it understandable to users. For me, INDEX is a kind of RELOAD/REPROCESS data.

One more thing to add to this discussion. I checked the Data Curator documentation here to check how they addressed this issue, but maybe you see something in the code that is useful.

Projects
Status: On hold
Development

No branches or pull requests

4 participants