
Load data selectively #623

Closed · fkleedorfer opened this issue Jun 10, 2016 · 19 comments


Instead of loading everything on startup, determine what needs to be loaded and load only that.


pheara commented Jul 8, 2016

I think we should really break this down into separate tasks and merge work on them into a feat_load_selectively branch.

A breakdown of steps might be as follows:

  1. Chat messages: only load when the connection is opened.
  2. Chat messages:
    • only load latest 10 messages
    • "see more"-action
    • spinning-wheel while it's loading
  3. in the list-views: load matches/requests/conversations of a post only when it's folded-out
  4. in the list-views:
    • only load the first 10
    • "see more matches/requests/conversations"-action
    • spinning-wheel while it's loading
  5. in the grid/tiles view for matches:
    • only load the 10 matches that are the latest across all needs (will need implementation on the owner-server as the node doesn't know which needs belong to the same user).
    • "see more"-action
    • spinning-wheel while it's loading
  6. post-overview
    • only load 10 posts with the newest updates (where unread(?) messages trump unread(?) requests trump matches). This probably will need to be implemented on the owner-server as the node doesn't know which needs belong to the same user.
    • "See more posts"-action
    • spinning-wheel while it's loading
  7. feed:
    • load 5 posts with the latest events – and 3 of the latter for each – with prioritization/sorting as specified above
    • "see more events"-action for a given post
    • "see more posts"-action
    • spinning-wheels in all of the above places while they're loading

These interactions should be introduced as new action-creators that load the data and then dispatch new actions of those new action-types. Some of these actions are triggered by clicks on the "more"-links, some by scrolling (e.g. in the feed).

A hook into the routing-change/stateGo-action-creator (or, if that's not possible, a wrapper around it) would probably be the best place to trigger the loading of the first items visible in a given view. A third option would be an agent that looks at the route, checks whether these first items are present and, if they aren't, dispatches via the respective action-creator.

The spinning wheels can be created by tagging the ownNeeds/post/connection-objects in the state with an isFetchingMore boolean – which would also be a better way to handle the "Pending…" in create-post.
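A minimal sketch of how such an isFetchingMore flag could flow through a reducer; the action types and state shape here are made up for illustration:

```javascript
// Minimal sketch of a connection-reducer that tracks a per-connection
// isFetchingMore flag. Action types and state shape are hypothetical.
const MORE_MESSAGES_REQUESTED = 'MORE_MESSAGES_REQUESTED';
const MORE_MESSAGES_RECEIVED = 'MORE_MESSAGES_RECEIVED';

function connectionsReducer(state = {}, action) {
  switch (action.type) {
    case MORE_MESSAGES_REQUESTED:
      return {
        ...state,
        [action.connectionUri]: {
          ...state[action.connectionUri],
          isFetchingMore: true, // drives the spinning wheel in the view
        },
      };
    case MORE_MESSAGES_RECEIVED: {
      const conn = state[action.connectionUri] || { messages: [] };
      return {
        ...state,
        [action.connectionUri]: {
          ...conn,
          isFetchingMore: false,
          messages: [...(conn.messages || []), ...action.messages],
        },
      };
    }
    default:
      return state;
  }
}
```

The view then only needs to check `state[connectionUri].isFetchingMore` to decide whether to render the spinner.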

@pheara pheara added in progress and removed ready labels Jul 8, 2016
@pheara pheara self-assigned this Jul 8, 2016

pheara commented Jul 13, 2016

The root-level items in the list above should be separate pull requests.


pheara commented Jul 13, 2016

The bottleneck at the moment: the crawling on the server:
deep requests

counterparts and events that aren't included in the deep request:
the rest


pheara commented Jul 13, 2016

Two different goals:

  • As few requests as possible (which is important on low-bandwidth networks)
  • minimal crawling effort on server (use the paging feature)
    • we can use the timeof parameter with the original page-load time to make sure the ranking and thus the slicing stays constant
    • we need to make sure to load new items as well, however.
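A sketch of how the timeof idea could look when building request URIs. Only the parameter names (p, deep, timeof) follow the paging API; the helper itself and the URI layout are hypothetical:

```javascript
// Hypothetical helper for building paged linked-data request URIs.
// Pinning `timeof` to the initial page-load time keeps the server-side
// ranking (and therefore the page slicing) stable across requests.
const pageLoadTime = new Date('2016-07-13T12:00:00Z'); // captured once at startup

function pagedRequestUri(containerUri, { page, deep = true, timeof = pageLoadTime }) {
  const params = [
    'p=' + page,
    'deep=' + deep,
    'timeof=' + encodeURIComponent(timeof.toISOString()),
  ];
  return containerUri + '?' + params.join('&');
}
```

Fetching page 1 and page 2 with the same timeof then yields two non-overlapping slices of the same ranking.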


pheara commented Jul 14, 2016

[in progress]
Search for solutions:

  • redux-api-middleware takes care of making http requests and dispatching the results as actions. Probably not useful for us, as most http-requests are made via the linked-data specific utilities in won.js and linkeddata-service-won.js that utilize a client-side rdfstore in the background.
  • redux-saga might simplify our action-creator mess. It works by creating effect descriptions that get evaluated by the middleware. Architecture-wise, the 'Saga's would directly react to actions and generate new actions by themselves. Though I think the benefit we'd gain from the library is rather slim. It would allow us to write async code in a sync style similar to async-await. And it would make it immensely easier to write unit tests for the code that lives in our action-creators at the moment; however, tests aren't at the front of our priorities right now.
  • graphql roughly does the same as what sparql / the server-side linkeddata-stores we use do.
  • relay uses an architecture where each component knows which data it needs (by attaching a GraphQL query). Every component's query includes its children's. The framework loads the data as needed. It only works with React though, so it can only serve as a study/reference for us.


pheara commented Jul 14, 2016

Other thoughts:

  • The fetching-flags should be true in the initial state.
  • Only use reselect-selectors, and read that info in a routing-actioncreator-wrapper or agent to load the corresponding parts of the state (?)
  • agent that looks at the route and knows which views need which initial data + a dedicated action to fetch more
  • components dispatch an action on load (components shouldn't have side-effects though)
  • paging info should be in the state
  • "which data is required to be loaded" should be specified in the component somehow (and recursively combined) to keep the dependency information in one place (for reasons of maintainability). On route-change this information should be extracted from the angular-directives/views, and the respective data loaded and dispatched.
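The agent idea from the list could be sketched like this; the route names, the store shape and the action type are invented for illustration:

```javascript
// Hypothetical route-watching agent: it knows which initial data each view
// needs and dispatches a fetch action for whatever is still missing.
const requiredDataByRoute = {
  feed: ['posts'],
  post: ['posts', 'connections'],
};

function routeAgent(state, dispatch) {
  const needed = requiredDataByRoute[state.route] || [];
  needed
    .filter(key => state.loaded.indexOf(key) === -1) // skip what's already there
    .forEach(key => dispatch({ type: 'FETCH_REQUESTED', key }));
}
```

The agent would be called on every route-change and only dispatch for the delta, so already-loaded views don't trigger redundant requests.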


pheara commented Aug 11, 2016

It probably would make sense to start with the feed as pain-points will most likely show up there first anyway.


pheara commented Aug 17, 2016

Documentation for our linkeddata-paging API

This was referenced Aug 17, 2016

pheara commented Aug 17, 2016

#749 introduces the necessary code in won.fetch. However, several parameters don't work yet (e.g. deep for event-containers, type, timeof), but at least they can now be tested more easily.

Example usage that fetches page 2, with pages of seven events each:

won.fetch(eventContainerUri, {
  requesterWebId: reqWebId,
  pagingSize: 7,
  queryParams: { p: 2, deep: true }
})
.then(args => {
  const uris = args['@graph'][0]['rdfs:member'].map(e => e['@id']);
  console.log(uris);
});


pheara commented Aug 17, 2016

Discussion Notes

The notes I took while discussing this issue with @fkleedorfer:

Misc

The most difficult cases: "only load 10 posts with the newest updates" and "last 10 matches over all needs".

The owner has to cache the need info (e.g. what has been seen, when the last update of which type happened, etc.) necessary to implement "only load 10 posts with the newest updates".

Does this ultimately require the owner to cache the entire node-state for that user? How to avoid a huge startup-cost, i.e. loading the needs for every user? Don't load data for users that haven't logged in for a while? Only load the delta since the last poll?

client-to-server = marking things as seen: POST a list of eventUris.
server-to-client: a bloom-filter? A list of unread uris (might be long)? A list of the latest unread plus aggregated numbers? A special message if another client-instance marks the event as seen?
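A toy sketch of the bloom-filter option: the server would hand the client a compact bit-field over the seen eventUris, and the client can then cheaply test "definitely unseen" vs. "probably seen". Sizes and the hash function are illustrative only:

```javascript
// Toy bloom filter over seen eventUris. Bit-field size and hash count
// are illustrative; a real deployment would size them for the expected
// number of events and acceptable false-positive rate.
class BloomFilter {
  constructor(bits = 1024, hashes = 3) {
    this.bits = bits;
    this.hashes = hashes;
    this.field = new Uint8Array(bits);
  }
  // simple string hash, salted per hash-function index
  _hash(value, seed) {
    let h = seed;
    for (let i = 0; i < value.length; i++) {
      h = (h * 31 + value.charCodeAt(i)) >>> 0;
    }
    return h % this.bits;
  }
  add(value) {
    for (let k = 0; k < this.hashes; k++) this.field[this._hash(value, k + 1)] = 1;
  }
  // false means "definitely unseen", true means "probably seen"
  mightContain(value) {
    for (let k = 0; k < this.hashes; k++) {
      if (!this.field[this._hash(value, k + 1)]) return false;
    }
    return true;
  }
}
```

A false answer is guaranteed correct ("definitely unseen"); a true answer may rarely be a false positive, which here would only mean an unread event is shown as already read.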

When the previous requests have finished and no new action was triggered, the client could load preemptively, e.g. load connections with new unread events (as they might grab the user's attention), or, after the list of conversations has finished loading, start fetching the first few conversations. We just need to make sure user-triggered actions always have precedence. The agent can publish multiple actions for each of these loaded data-packages.

Most of the loading should happen through the more declarative construct queries!!

always use deep=true to resolve collections

Note that "the last 10 events" (N) isn't the same as "the last 10 events the user gets to see" (N'), the difference being, for example, success messages. Until the server-API reflects this distinction, N = 3 * N' can be used as a heuristic.

architectural approaches on the client:

  • Complex agent that looks at the route and the data already loaded and fetches missing data as required.
    • agent that makes sure data is loaded based on the route. Split the agent and put the dependency info into the components (getRequiredData)? Never mind, "only load latest" is a separate concern -- the components should only deal with translating state to html.
    • Generic load-action? Kinda makes redux pointless, or duplicates the information about how the state should look. Unless the load action can use a wide range of payload structures and the reducers can deal with all of them.
  • Components that call asynchronous action-creators in their constructors (or in watches on the specific routing parameters)
    • loading additional pages will already happen there
    • using the crawlableQuery-structure (and extending it a bit) will allow us to specify the data-dependencies once, if we extend the selector-functions a bit as well.
  • Elm-style reducers that produce commands + a generic fetch-agent? (action, state) → (cmd, state') as opposed to (action, state) → state'
    • see the agent option above for generic points
    • Can we replace our current action-creators with these? The ACs contain far too much logic for something that should just encapsulate side-effects.
    • At the moment the messaging-agent breaks the single-responsibility principle quite a bit. It should only be a thin binding layer between the websocket and redux – and not do any parsing.
    • commands:
      • construct_query + query + paging-parameters
      • getNode / getNeed + paging-parameters
      • fetch + params
  • FB data-centric apps architecture? components with construct-queries + things-to-load? successive accumulate queries for nested components.
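The Elm-style option from the list could be sketched as reducers that return the next state plus a command; all names and the command format are hypothetical:

```javascript
// Elm-style reducer sketch: (state, action) → [state', cmd]. A generic
// fetch-agent would execute the returned command; the reducer itself
// stays pure. Action types and command shape are made up.
function conversationReducer(state, action) {
  switch (action.type) {
    case 'CONVERSATION_OPENED':
      return [
        { ...state, isFetchingMore: true },
        { cmd: 'fetch', uri: action.connectionUri, pagingSize: 10 }, // executed by the agent
      ];
    case 'MESSAGES_RECEIVED':
      return [
        { ...state, isFetchingMore: false, messages: action.messages },
        null, // no side-effect
      ];
    default:
      return [state, null];
  }
}
```

The appeal is that all side-effects become data that can be inspected and unit-tested; the cost is that every reducer signature changes, which is the refactoring concern raised later in this thread.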

Caching

the events received via the websocket should be pushed to the rdf-store, including rdfs:member entries, and they should be marked as cached.

If dirty: only load beginning with the latest member automatically, to avoid unnecessary server-load.

selective loading and caching:

  • "last 10": reload anyway
  • "last 10 @ time_1": reload if time_1 > time_0 (+c ?)
  • "TYPE": only load if this type hasn't been loaded yet
  • "resumebefore x1": load if x1 is newer than x0
  • "resumeafter x1": always load
  • "page number x1": always load (as needs get added to page 1 and push others back)

But actually we shouldn't need this smart caching, as we get all necessary information through the websocket. Everything should either be fetched initially at page-load or, in case the user clicks "more", be connections/events older than the previously oldest uri – thus a cache-miss anyway. We could implement checks like the ones above to detect non-well-behaved code, though. The only exception is connection-messages we send ourselves, which we need to invalidate and fetch after posting to get the timestamps from our owner. So the simplified variant is: "always reload if it's a mutable resource (i.e. a collection)".
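For reference, the rules above written out as a decision function; the request/previous descriptors are made-up shapes:

```javascript
// Decision function implementing the caching rules listed above.
// `request` describes a pending linked-data request; `previous` is what
// was fetched before (undefined means a plain cache-miss).
function shouldRefetch(request, previous) {
  if (!previous) return true; // cache miss: nothing loaded yet
  switch (request.kind) {
    case 'last10':       return true;                         // reload anyway
    case 'last10At':     return request.time > previous.time;  // newer snapshot
    case 'type':         return false;                         // type already loaded
    case 'resumebefore': return request.resume > previous.resume;
    case 'resumeafter':  return true;
    case 'page':         return true; // new needs get added to page 1 and push others back
    default:             return true;
  }
}
```

As the comment above notes, in practice this collapses to "always reload mutable resources", so a check like this would mainly serve to detect non-well-behaved code.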


pheara commented Aug 18, 2016

Note: the won.deleteNode(uri) somewhere around linkeddata-service-won.js:~790 might be problematic, as in its current form it deletes the entire container when fetching a new page.

It's ignored if the loaded data was only partial, i.e. paging was used. Thankfully the store doesn't add duplicate nodes, so we simply add the triples the usual way. The only possible tripping hazard is blank nodes; these are always given unique identifiers and thus always result in unique triples.


pheara commented Aug 24, 2016

Here's an example redux-app that uses pagination (on github-API content): https://github.com/reactjs/redux/tree/master/examples/real-world


pheara commented Aug 24, 2016

The following relay-example on paging might provide inspiration on how to embed crawlable queries in components: https://www.reindex.io/blog/redux-and-relay/#relay-4


pheara commented Aug 25, 2016

For the decisions:

The Elm-Architecture would need too much refactoring right now, or would introduce another style and thus increase the complexity of the code-base. Implementing it as an agent would require specifying the data-dependencies twice, or traversing the currently visible component tree on every update. Also, an agent can only look at the state, in particular the routing parameters, not at actions (e.g. a person clicking on "Show more" or scrolling down). Thus the loading will continue to happen in asynchronous action-creators. But instead of having one automatic call as part of the page-load, the components will call some kind of requestData-action-creator every time (additional) data is required, e.g. when the component is initialized, when a critical routing parameter changes, or when a person requests more data. These action-creators should be kept as small as possible, though!

Uris will not be marked as cached if they were only fetched partially, i.e. using pagination – even if multiple partial fetches would cover the entirety of a collection.

Components will now not only know what data they need and where it is in the state (encoded in the select-statements they use), but also where the data lives on the server and in the rdf-graph. This information will be provided to the action-creator mentioned above. Ideally it's encoded purely declaratively, so that both the crawlableQuery and the select-statement can be derived from that info.

The order of operations is:

  1. the component's constructor
  2. calls actionCreators.ensureLoaded(<dependencies>)
  3. which calls:
    1. executeCrawlableQuery(<query>).then(data => { … dispatch(<dataRequestedAction>) … })
  4. state update
  5. the component's select, which is derived from the component's dependency declaration.
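Steps 2–3 could be sketched as a thunk-style action-creator; the dependency descriptor, the DATA_REQUESTED action type and the stubbed query execution are placeholders for the real implementations:

```javascript
// Stand-in for the real executeCrawlableQuery; here it just resolves
// immediately with a dummy result.
const executeCrawlableQuery = query => Promise.resolve({ query });

// Hypothetical ensureLoaded action-creator: checks which declared
// dependencies are missing from the state and fetches only those,
// dispatching a data-requested action per result.
function ensureLoaded(dependencies) {
  return (dispatch, getState) => {
    const missing = dependencies.filter(dep => !dep.isInState(getState()));
    return Promise.all(
      missing.map(dep =>
        executeCrawlableQuery(dep.query).then(data =>
          dispatch({ type: 'DATA_REQUESTED', dep: dep.name, data })
        )
      )
    );
  };
}
```

Because the dependency declaration carries both the query and the state-check, this is the single place where the "where is my data" knowledge lives.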


pheara commented Sep 2, 2016

#782 Spinners!


pheara commented Sep 2, 2016

There's a spaghetti-code snippet in the branch that makes sure all events of the connection are loaded whenever the conversation is accessed (I'll refactor it to conform to the architecture specified above once all edge-cases have been found).

One of these problems: unless all queries to a connection pass along the paging-size, all event-uris get loaded into the rdfstore. Consequently, all queries that go on to load all referenced events will then load all events, even if a paging-size was specified for the query. This will be a tough one to avoid:

  • Make all crawlableQueries that use a paging-size use a temporary rdfstore first and only recurse among the triples contained therein. The obvious downside is that everything in the crawlable query will always be loaded again, e.g. also the connection when querying for the connection plus its events.
  • Make executeCrawlableQuery aware (maybe inside the sparql-query?) of how paging works on the backend, so it only recurses into the nodes the server would have returned. That would mean two implementations which must be kept perfectly in sync.
  • EDIT: Always fetch deep and with paging; then the rdfs:members in the store should map directly to already-loaded member-nodes. In any other case the container shouldn't be resolved anyway. As won.getConnection currently resolves the event-container, all uses of that function need to be changed so they'd also work with won.getNode directly; that, and the handling of the hasEventContainer-property in the connection-reducer.


pheara commented Sep 7, 2016

The connectionMessages received via the websocket should be added to the rdf-store as well (including an rdfs:member triple).

fkleedorfer added a commit that referenced this issue Sep 9, 2016
fkleedorfer added a commit that referenced this issue Sep 9, 2016
@fkleedorfer fkleedorfer removed the ready label Feb 15, 2017
quasarchimaere (Contributor) commented:

Pretty sure we can close this, since we implemented some of that with the skeleton screens and so on. I think this issue is obsolete now, or it should be boiled down to something more specific (as its own issue). @fkleedorfer, if you agree please close this issue.

quasarchimaere (Contributor) commented:

Closing this; if we need to do more, we will create separate issues for it.
