Use a vector of uids to navigate the units #8
With all the PRs merged into the feature branch, the outstanding issues are:
Separators between stores in edit mode are the last missing feature that prevents the new code from being merged into master. Here is how I suggest we implement it:
Example:
```
{
  "uids": [1001, 1002, 1025, 2007, 2250, 3140, 4096],
  "headers": [1001, 2007, 3140, 4096]
}
```

Internally in the client code, "headers" needs to be converted into an object with the corresponding keys for fast lookups.
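For illustration, a small sketch of that conversion (the names are mine, not the actual client code):

```ts
// Turn the "headers" array into a keyed object so checking whether a uid
// starts a new store is an O(1) lookup instead of an array scan per row.
interface UidsResponse {
  uids: number[];
  headers: number[];
}

function toHeaderLookup(response: UidsResponse): Record<number, boolean> {
  const lookup: Record<number, boolean> = {};
  for (const uid of response.headers) {
    lookup[uid] = true;
  }
  return lookup;
}

// Usage:
// const headers = toHeaderLookup({ uids: [1001, 1002], headers: [1001] });
// headers[1001] === true; headers[1002] === undefined
```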
The returned result will look something like:

```
{
  "1001": {
    "source": "...",
    "source_lang": "...",
    "target": "...",
    "target_lang": "...",
    "project_id": "foobar",
    "project": "Foo Bar",
    "path": "/path/to/file1.html"
  },
  "1002": {
    "source": "...",
    "source_lang": "...",
    "target": "...",
    "target_lang": "..."
  },
  "1025": {
    "source": "...",
    "source_lang": "...",
    "target": "...",
    "target_lang": "..."
  },
  "2007": {
    "source": "...",
    "source_lang": "...",
    "target": "...",
    "target_lang": "...",
    "project_id": "foobar",
    "project": "Foo Bar",
    "path": "/path/to/file2.html"
  }
}
```

Notice that units "1001" and "2007" have more properties. Once this data gets into a React component, it will not only render the view row by itself, but also a separator row above it.
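As a rough sketch of that rendering logic (the component shape and markup are hypothetical, not the project's actual code), a row component could emit the separator row whenever the store-level properties are present:

```tsx
import React from "react";

// A unit with optional store-level metadata; only the first unit of each
// store carries project/path, which is what triggers the separator row.
type Unit = {
  source: string;
  source_lang: string;
  target: string;
  target_lang: string;
  project_id?: string;
  project?: string;
  path?: string;
};

function UnitRow({ unit }: { unit: Unit }) {
  return (
    <>
      {unit.path && (
        <tr className="store-separator">
          <td colSpan={2}>{unit.project}: {unit.path}</td>
        </tr>
      )}
      <tr>
        <td>{unit.source}</td>
        <td>{unit.target}</td>
      </tr>
    </>
  );
}
```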
This commit adds rendering of header rows between stores. This is WIP, because it doesn't display any text in the header, just the blank separators; XHR and backend need to be adjusted to pass extra header data for rendering. This is a partial implementation of #8 (comment)
This commit adds rendering of header rows between stores. This is an implementation of #8 (comment)
I have checked the implementation, and while it works for our current use case, I have the feeling that the fact we currently want to display this information in the header is leaking through the implementation: we are mixing what information we get (the unit's store-related metadata) with how and where we display it (the header). It won't be long before we want to use the file-related information we are fetching in other places, for instance in the header of the editing unit or anywhere else in the UI. How can units know which language/project/file they belong to? As far as I can see, there is no direct way to know that without extra code, as this information is only available to the first unit in the store for the result set. One way to avoid the leakage would be to return the list of uids grouped in lists; e.g. the example above would become:

```
{
  "uids": [[1001, 1002, 1025], [2007, 2250], [3140], [4096]]
}
```

Likewise, this endpoint could potentially already provide store-related metadata which will be shared across the units in each group:

```
[
  {"source_lang": "ab", "target_lang": "cd", <...metadata...>, "uids": [1001, 1002, 1025]},
  {"source_lang": "ab", "target_lang": "cd", <...metadata...>, "uids": [2007, 2250]},
  {"source_lang": "ab", "target_lang": "cd", <...metadata...>, "uids": [3140]},
  {"source_lang": "ab", "target_lang": "cd", <...metadata...>, "uids": [4096]}
]
```

So, at the expense of a few more bytes in the initial response, the store-related metadata would be available to every unit up front.
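A minimal sketch of how a client could consume this grouped shape (the type and function names are my own, not from the codebase): flatten the groups for navigation while recording, per uid, which store group it belongs to, so any unit can resolve its language/project/file:

```ts
interface StoreGroup {
  source_lang: string;
  target_lang: string;
  uids: number[];
  // ...other store metadata
}

function indexGroups(groups: StoreGroup[]): {
  uids: number[];
  storeByUid: Map<number, StoreGroup>;
} {
  const uids: number[] = [];
  const storeByUid = new Map<number, StoreGroup>();
  for (const group of groups) {
    for (const uid of group.uids) {
      uids.push(uid);          // flat list preserves navigation order
      storeByUid.set(uid, group); // any unit can look up its store metadata
    }
  }
  return { uids, storeByUid };
}
```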
I force-pushed fixes for the concerns and issues outlined above and in the line comments, except for the last part about using nested lists. Speaking of the API:
We definitely don't want to do this, because it would prefetch header data for the entire slice (up to 1000 uids) and, in the worst-case scenario, return the header data for up to 1000 stores. We only want to fetch header data that is about to become visible to the user, and usually we do this just one row at a time as the user advances through the list of units, which allows us to break the payload into smaller chunks. Grouping units as a list of lists will definitely be more compact when sending the vector of uids from the server to the client, so I'll experiment with that.
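To illustrate the chunked fetching described above, here is a rough sketch; the endpoint path and query parameter are assumptions for the example, not the project's actual XHR API:

```ts
// Fetch unit (and, where applicable, header) data for a small batch of uids
// rather than prefetching the entire slice.
async function fetchUnits(uids: number[]): Promise<Record<string, unknown>> {
  const response = await fetch(`/xhr/units/?uids=${uids.join(",")}`);
  return response.json();
}

// As the user scrolls, request only the next small window of rows:
// await fetchUnits(allUids.slice(visibleEnd, visibleEnd + 5));
```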
The latest pushes introduce this format.
A bit aside from the point I was making: I think we want to seriously reconsider the limit of 1000 units — it's pretty high, probably way above the number of units a translator will work through continuously without taking a break. Halving it wouldn't be bad.
We should probably make a distinction between when/how much data is fetched and how it is stored. With the current implementation, in all cases, regardless of whether units are part of the same language pair/project/store or not, the network request is fetching [...]. All of this is then being accumulated in [...].
The idea was exactly to make sure people won't reach the end of the list if the list is long, giving the impression that we're not limiting them in any way.
I agree that source/target language belong to the store, not to the unit, and don't have to be transferred with each view row. You have also correctly noted that we only store this information for the units that are visible on the screen (plus a few extra units at the edges) and clean up unused rows immediately as we move up or down the list, so the overall impact of optimizing this (both in terms of transferred data and in-client memory use) is arguably small. So I'd say this is not worth the trouble for now, unless you have a better picture of what this optimization would bring to the table.
Let's estimate that a translator translates 2000 source words per day; if we analyze the data of our most proficient translators, even this doesn't happen that often. To reach 500 units at that rate, each unit would average about 4 words, which is likely lower than our average number of words per unit (I haven't done the math, but skimming through the data I would say so). Even if this were a constant daily throughput, it is highly unlikely translators will go through all these units in one go, as strings are scattered over different projects, which allows them to group and logically split the work. Personally, in my 10+ years of experience as a localizer, I have never gone through 500 units in one go. I am pretty sure full-time professional localizers are more efficient than I am; even so, stepping through 500 consecutive units is reason enough to encourage people to take a break. So, unless I missed some important detail in my rough estimate, I would need a very convincing argument to believe that 500 units is not enough and that people will be hitting the end of the list.
I wonder what is special about the [...]. In this regard, I have made a few tests to measure the impact of changing the data the [...] returns. When it comes to DB performance, there is only a negligible difference in my tests, and all queries run below 5ms. This is because even if we select a few more or fewer fields, the tables that need to be joined to retrieve such fields are joined anyway, because of permission checking etc. Regarding network overhead, I measured two types of requests (the initial one, with a sample of 35 units; subsequent ones, with a sample of 5 units), comparing payload sizes for the lightest shape and the heaviest shape, which includes all fields.
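For reference, a tiny sketch of how such a size comparison could be made in Node (the payload shapes are placeholders, not the actual measured data):

```ts
import { gzipSync } from "zlib";

// Compare raw vs gzip-compressed sizes of a payload shape.
function sizes(payload: unknown): { raw: number; gzipped: number } {
  const body = Buffer.from(JSON.stringify(payload));
  return { raw: body.length, gzipped: gzipSync(body).length };
}

// Stand-in "light" and "heavy" shapes for one unit:
const light = { "1002": { source: "...", target: "..." } };
const heavy = {
  "1002": {
    source: "...", source_lang: "ab",
    target: "...", target_lang: "cd",
    project_id: "foobar", project: "Foo Bar",
    path: "/path/to/file.html",
  },
};

console.log(sizes(light), sizes(heavy));
```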
So the network impact is less of a concern with gzip, which compresses duplicates pretty well. This means we can opt for the heavy variant, which keeps the endpoint straightforward while not compromising performance. It is also not attached to any UI concepts, and there is no need for extra query string parameters. In terms of storing the data, having store metadata in a separate entity is not only about a smaller memory footprint (which is good regardless of any internal garbage cleanups), but also about being able to query the data differently, enabling future use cases, and not having it tied to specific UI views (in this aspect, [...]). This is not very different from a traditional backend, where data is split into logical entities and managed by relations: the list of uids we get at the beginning would allow us to build the relation between units and stores, and unit retrieval payloads would allow us to normalize the data into both entities.
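As a sketch of the normalized client-side shape being argued for here (all names are illustrative, not from the codebase), units and stores could live as separate entities related by a store id:

```ts
interface StoreEntity {
  id: string;
  source_lang: string;
  target_lang: string;
  project_id: string;
  project: string;
  path: string;
}

interface UnitEntity {
  uid: number;
  store_id: string; // relation back to the store entity
  source: string;
  target: string;
}

interface ClientState {
  stores: Record<string, StoreEntity>;
  units: Record<number, UnitEntity>;
}

// Any view (header row, editing unit, etc.) can resolve store metadata
// for a unit through the relation, instead of it being baked into rows.
function storeForUnit(state: ClientState, uid: number): StoreEntity | undefined {
  const unit = state.units[uid];
  return unit ? state.stores[unit.store_id] : undefined;
}
```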
These are non-goals, actually. These XHRs are not part of some generic public API — they exist to support a particular client-side UI with the data the client needs. If we have new requirements from the client side in the future, we will update our API as needed.
There's no trouble now. What our API does is serve exactly the right data needed to render regular rows (which includes the source/target string and source/target language) and, for header rows, the extra data required to render those. No more, no less. Yes, the data is somewhat denormalized, which makes it easy to work with without complicating the client-side code further. First you're saying that the data needs to be normalized, but then you suggest another — completely opposite — route of denormalizing all the data and sending headers with every view unit, knowing that they won't be used anyway. And all of this to avoid adding the notion of two kinds of rows (ones with and without headers) into the internal API contract, whose sole purpose is to support this particular edit view. Both approaches are extremes that have their downsides. Again, what we currently have serves exactly the data needed to render rows, without complicating the client or backend code much.
After a voice conversation, we decided to settle on the following:
Done in 68f88a3
I addressed both the lead-in row label positioning and the ability to sort search results.
This is now fixed and part of v0.2.0.
Bugs like translate/pootle#4983 show that the current unit retrieval approach can't reliably work by design. We want to experiment with reverting the offset-related changes and, at the same time, use this as an opportunity to rethink the API between the client and the server, making it a foundation for future translation UI improvements, e.g. continuous (non-paginated) scrolling. We also want to be mindful of the memory used by the browser, because this was the primary concern of the Pootle team with the old code.
Moving relevant bits from this comment here:
Let's keep the snapshot of uids on the client and not redo any search. It would work as follows:
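A minimal sketch of the snapshot idea, assuming the client simply keeps the uid vector and an index into it (the class and method names are hypothetical):

```ts
// The search runs once; the client keeps the resulting vector of uids.
// Navigating between units is then just index arithmetic, with unit data
// fetched on demand, and no search is ever re-run.
class UnitNavigator {
  private index = 0;

  constructor(private readonly uids: number[]) {}

  current(): number {
    return this.uids[this.index];
  }

  next(): number | undefined {
    if (this.index + 1 >= this.uids.length) return undefined;
    this.index += 1;
    return this.uids[this.index];
  }

  previous(): number | undefined {
    if (this.index === 0) return undefined;
    this.index -= 1;
    return this.uids[this.index];
  }
}
```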