
Missing data in S3 archive? #80

Closed
mrdavidlaing opened this issue Jun 2, 2012 · 6 comments

@mrdavidlaing
Collaborator

I'm trying to plot a latency scatterplot from the raw data collected on May 23rd (when the new TradingAPI was released and we see the increase in measured latency).

I'm pulling my data from https://s3.amazonaws.com/cityindex.appmetrics/CiapiLatencyCollector/2012-05/*.zip

However, my plot seems to be missing data for a number of dates.

Is it possible that S3 is missing some source data?
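Rather than reading the gaps off the plot, you can list exactly which days are absent by comparing the dates present in the data against the full calendar range. This is a minimal Python sketch. It assumes the records have already been extracted from the zip files and reduced to a set of `date` values. `missing_dates` is a hypothetical helper and is not part of CiapiLatencyCollector.

```python
from datetime import date, timedelta


def missing_dates(observed: set[date], start: date, end: date) -> list[date]:
    """Return every calendar day in [start, end] that has no records.

    `observed` is assumed to hold the dates of all records extracted
    from the downloaded archives (a hypothetical input for this sketch).
    """
    gaps = []
    day = start
    while day <= end:
        if day not in observed:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


if __name__ == "__main__":
    seen = {date(2012, 5, 1), date(2012, 5, 2), date(2012, 5, 5)}
    # Days 3 and 4 have no records in this made-up example.
    print(missing_dates(seen, date(2012, 5, 1), date(2012, 5, 5)))
```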

ghost assigned fandrei Jun 2, 2012
@fandrei
Owner

fandrei commented Jun 4, 2012

Yes, it's possible.
The backup sends a file to S3 only once its last update is 7 days old, so until then none of the records in that file are present in S3 storage.
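The 7-day rule accounts for the gaps. Anything written during the last week, or still being written, is not yet in S3. This is a minimal Python sketch of that eligibility check. It assumes the backup compares each file's last-modified time against the current time. `ready_for_s3`, `visible_in_s3` and the file names are hypothetical and are not taken from the AppMetrics code.

```python
from datetime import datetime, timedelta

# Threshold taken from the 7-day rule described above.
UPLOAD_AFTER = timedelta(days=7)


def ready_for_s3(last_modified: datetime, now: datetime,
                 threshold: timedelta = UPLOAD_AFTER) -> bool:
    """True once a file has gone at least `threshold` without an update."""
    return now - last_modified >= threshold


def visible_in_s3(files: dict[str, datetime], now: datetime) -> list[str]:
    """Sorted names of the files the backup would have archived by `now`."""
    return sorted(name for name, modified in files.items()
                  if ready_for_s3(modified, now))


if __name__ == "__main__":
    now = datetime(2012, 6, 2)
    files = {
        "session-a.txt": datetime(2012, 5, 20),  # idle 13 days: archived
        "session-b.txt": datetime(2012, 5, 30),  # idle 3 days: not yet
    }
    print(visible_in_s3(files, now))  # ['session-a.txt']
```

A file that keeps receiving updates never passes this check. For that reason, a long-running session can hold its data back from S3 indefinitely.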

@mrdavidlaing
Collaborator Author

Could we make the archiving happen daily?


David Laing
Open source @ City Index - github.com/cityindex
http://davidlaing.com
Twitter: @davidlaing

@fandrei
Owner

fandrei commented Jun 7, 2012

Sure, or even more frequently. The problem, though, is that a file won't be sent to S3 until the corresponding user session has finished, and some sessions are many days long. It works this way because it's impossible to "append" data to an S3 object, only to rewrite it completely. But I can change this behavior.
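S3 objects can only be rewritten, not appended to. One common workaround is to stop storing a session as a single object. Instead, split the session's records into fixed time buckets and write each bucket once it is complete. This is a sketch of the daily variant. It assumes each record carries a timestamp. The key layout (`<prefix>/<YYYY-MM>/<session>/<YYYY-MM-DD>.txt`) and `daily_chunks` are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_chunks(records: list[tuple[datetime, str]], session_id: str,
                 today: date,
                 prefix: str = "CiapiLatencyCollector") -> dict[str, list[str]]:
    """Group a session's records into one object key per completed day.

    Days before `today` can no longer receive new records, so each one
    can be written exactly once as its own object. Records from `today`
    are held back until the day ends. The key layout is hypothetical.
    """
    chunks: dict[str, list[str]] = defaultdict(list)
    for timestamp, line in records:
        day = timestamp.date()
        if day < today:
            key = f"{prefix}/{day:%Y-%m}/{session_id}/{day.isoformat()}.txt"
            chunks[key].append(line)
    return dict(chunks)


if __name__ == "__main__":
    records = [
        (datetime(2012, 5, 22, 23, 59), "a"),
        (datetime(2012, 5, 23, 0, 1), "b"),
        (datetime(2012, 5, 23, 12, 0), "c"),
        (datetime(2012, 5, 24, 9, 0), "d"),  # today: not uploaded yet
    ]
    for key, lines in daily_chunks(records, "s1", date(2012, 5, 24)).items():
        print(key, lines)
```

With this approach, each completed day becomes its own immutable object. Archiving can then run daily without waiting for the session to end. The trade-off is that each session produces more, smaller objects.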

@mrdavidlaing
Collaborator Author

I think we should ensure that sessions recycle every few hours, to better simulate real users. Also, if we have sessions running for days, we're going to start triggering the anti-data-theft system I'm supposed to be developing...


@fandrei
Owner

fandrei commented Jun 7, 2012

By "session" here I mean an AppMetrics session, not a CIAPI session.

@fandrei
Owner

fandrei commented Jun 19, 2012

Getting raw data directly from AppMetrics server:
https://github.com/fandrei/AppMetrics/wiki/Getting-raw-data

fandrei closed this as completed Jun 19, 2012