This repository has been archived by the owner on Apr 28, 2022. It is now read-only.

Optimize protocol for server or client? #5

Open
jesper-sjovall opened this issue Oct 24, 2013 · 12 comments

Comments

@jesper-sjovall
Contributor

A question that came out of issue #2, on how to allow custom results on a per-query basis.

What is our goal when we discuss optimization? I can see two mindsets for
handling the problem:

  • Optimize for ease of reading data from the system.
    This makes it easy for clients to fetch and handle results,
    but it increases the load on the server, because the requested data is harder to predict.
  • Optimize for exporting results from the server with minimal load.
    This lets the server create a well-defined result that can be cached on disk for best performance, but it limits how much the fetched result can be customized.
    For example, the server exports results on a day-to-day basis.
@morozstepan

Hi!
I definitely appreciate putting feed files on disk and not going to the database during requests.
So the whole offset-and-limit approach is fine for small data, but it's difficult to keep it consistent.
Let's throw out these params, always return a fixed number of apps, and give only a link to the next page?
If there are only a few applications in the feed, I believe several thousand could be placed into a single page.
What do you think?

@jesper-sjovall
Contributor Author

Yeah. As I said before, this is a crossroads: at one end we keep it as it is now, at the other end we optimize the flow for the server.
And as you say, if we go this way, we want to remove every parameter except "packagename" and "date". In short,
the only selection a client can make is the app and the day the client wants data for.
We would also use "nextPage" if the result has more than, say, 100, 500 or 1000 entries.
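
To make the proposal concrete, here is a minimal sketch of how a server might pre-build the pages for one app and one day. The function name, the URL layout and the JSON field names ("items", "nextPage") are assumptions for illustration only; nothing in this thread fixes them.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 1000  # assumed fixed page size; the thread floats 100, 500 or 1000


def build_pages(package: str, date: str, records: List[dict],
                page_size: int = PAGE_SIZE) -> Dict[str, dict]:
    """Split one day's records for one app into fixed-size pages.

    Returns a mapping from a hypothetical page path to the page body.
    Every page except the last one carries a "nextPage" link.
    """
    # Always produce at least one (possibly empty) page.
    chunks = [records[i:i + page_size]
              for i in range(0, len(records), page_size)] or [[]]
    pages: Dict[str, dict] = {}
    for index, chunk in enumerate(chunks):
        path = f"/{package}/{date}/{index}.json"  # hypothetical URL layout
        is_last = index == len(chunks) - 1
        next_page: Optional[str] = (
            None if is_last else f"/{package}/{date}/{index + 1}.json")
        pages[path] = {"items": chunk, "nextPage": next_page}
    return pages
```

Because the only inputs are the package and the date, each page body could be written to disk once and served as a static file.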

I think it would be good if Oleg also gave an opinion here, as this would affect Open AEP very much.

@oorlov
Contributor

oorlov commented Dec 2, 2013

I have to agree with the suggestion to simplify everything:

  • use a single limit of about 1000 (assume 1 purchase is about 2k of plain text, ~500b compressed, so 0.5M of data per request if gzip is enabled - easy even if your server is running on an Android device :)
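
The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the per-purchase sizes are the rough figures quoted in the comment, and "2k" is read as 2000 bytes, which is an assumption.

```python
ITEMS_PER_PAGE = 1000
PLAIN_BYTES_PER_PURCHASE = 2_000   # "about 2k of plain text" (assumed decimal)
GZIP_BYTES_PER_PURCHASE = 500      # "~500b compressed"


def page_size_bytes(items: int, bytes_per_item: int) -> int:
    """Approximate payload size of one page, ignoring any envelope overhead."""
    return items * bytes_per_item


plain = page_size_bytes(ITEMS_PER_PAGE, PLAIN_BYTES_PER_PURCHASE)
gzipped = page_size_bytes(ITEMS_PER_PAGE, GZIP_BYTES_PER_PURCHASE)
print(f"plain: {plain / 1e6:.1f} MB, gzipped: {gzipped / 1e6:.1f} MB")
# prints "plain: 2.0 MB, gzipped: 0.5 MB"
```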

To track and generate a sequential id for every purchase there would have to be a single counter, which is a single point of failure (SPOF), so let's use
softer requirements for ordering:

  • file A can include purchases for second X and second X+1 and a reference to file B that also includes purchases for second X and second X+1 (data could be collected on multiple servers in different countries, so the service doesn't need to wait until all nodes provide their own piece for a particular second)
  • file A created for day M must not contain purchases from day M-1 or from day M+1
  • file A that contains a reference to file A+1 is considered read-only. If the reference to file A+1 is not specified, file A can still change (and has to be re-read next time)
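
As a rough illustration of the read-only rule, a client could cache every file that already points to a successor and re-download only the tail of the chain. The sketch below assumes a hypothetical "next" field holding the reference and a caller-supplied `fetch` function; neither is defined by the protocol in this thread.

```python
from typing import Callable, Dict, Optional, Set


def sync_chain(first: str,
               fetch: Callable[[str], dict],
               cache: Dict[str, dict]) -> Dict[str, dict]:
    """Walk a chain of feed files, re-downloading only the mutable ones.

    A file whose body has a "next" reference is treated as read-only and
    is served from `cache` on later syncs. The tail file (no "next") may
    still change, so it is fetched on every sync and never cached.
    """
    result: Dict[str, dict] = {}
    visited: Set[str] = set()  # guard against accidental reference cycles
    name: Optional[str] = first
    while name is not None and name not in visited:
        visited.add(name)
        body = cache.get(name)
        if body is None:
            body = fetch(name)
            if body.get("next") is not None:
                cache[name] = body  # has a successor, so it is final
        result[name] = body
        name = body.get("next")
    return result
```

On a second call with the same `cache`, only files that had no successor during the previous run are requested again.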

Would it work?

I suggest not waiting long for my opinion. If several specialists agree on some point, let's just do it. It's better to create a new version of the protocol later if this one turns out not to be good enough.

@jesper-sjovall
Contributor Author

Ok.

1000 items per request is good; as you say, this will scale very well on both big and small servers.
To handle the problem with time and next items, I suggest the system only allows us to fetch data up to yesterday.
That automatically solves the problem of fetching data that changes in real time.
(I guess we do not have a lot of downloads or payments in the past.)

From this I will create an updated version of the protocol.

@jesper-sjovall
Contributor Author

Please check the updated version of the spec here.
https://github.com/jesper-sjovall/OpenAEP/blob/a64c5c95c02cfdf02ba05c382c933c59915f7b27/specification/openaep_spec_1_0.md
In short, I have removed a lot of select options; only the "package" and "date" options are left.

@oorlov
Contributor

oorlov commented Dec 2, 2013

I suggest not restricting access to today's downloads, and letting the Source store decide how often it needs to retrieve data and how fresh the data should be. If you don't want to rescan the last file all the time, download only the previous day. If you want to show application growth to the developer immediately, request the latest updates every minute.

It's also possible to use the reference in the last file as an "end-of-day" flag. When all your servers have finished registering purchases for the day, put this flag in the last file of the day and nobody will poll you after that.

@oorlov
Contributor

oorlov commented Dec 2, 2013

Pull requests are easier to review and merge, so don't hesitate to add one.

@jesper-sjovall
Contributor Author

Yes, it is possible to allow the system to fetch data for days that have not ended yet,
and to use an "end-of-day" flag to distinguish data that is not yet complete from complete data that will get no more entries in the future.

@jesper-sjovall
Contributor Author

Updated version of the spec.
#10

@morozstepan

Hi!
I suggest not using special flags like "end of day", and writing records sorted by time, from newer to older.
So the client just crawls through the files until it sees a record (or file) that it has already processed.
If we support ETags, it will be much easier: the client just verifies the checksum and that's it.
Let me try to prepare a renewed specification and show it to you in another pull request.
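
The crawl described above can be sketched as follows. This is a minimal illustration that assumes each record carries a unique "id", that pages expose "items" and "nextPage" fields, and that the caller passes in a `fetch` function; all of these names are hypothetical. It also assumes strict newest-to-oldest ordering, so it may miss records if files overlap in time.

```python
from typing import Callable, List, Optional, Set


def collect_new_records(first_page: str,
                        fetch: Callable[[str], dict],
                        seen_ids: Set[str]) -> List[dict]:
    """Read pages newest-first and stop at the first already-processed record."""
    new_records: List[dict] = []
    page: Optional[str] = first_page
    while page is not None:
        body = fetch(page)
        for record in body["items"]:
            if record["id"] in seen_ids:
                # Everything after this point is older and already known.
                return new_records
            new_records.append(record)
        page = body.get("nextPage")
    return new_records
```

With ETags, an unchanged first page could be detected before any body is downloaded, so this loop would only run when something has actually changed.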

@jesper-sjovall
Contributor Author

Accepted. All responses are sorted from newer to older. This makes an "end of day" flag obsolete.
Yes, ETag is good to use if supported. My problem is that I don't know whether the server supports it.
In short, use of ETag must be optional.

I think we have now solved all the key points and are waiting for an update of the spec to be committed.

@jesper-sjovall
Contributor Author

Hi morozstepan, how is the work on the renewed specification progressing? Is this something I can help you with?
Or would it be easier for you if I create a pull request for the specification that you can check?
