Is it possible to add a timestamp as a parameter?
I am currently working on a script that processes e-mail bounce statuses. The script runs every half hour, but I do have the timestamp at which each bounce status was received. I'd like to use these timestamps for my metrics, because they give a better view of what happens when.
So is it possible to send the timestamps (which are probably in the past) as well?
We also collect logs from mobile devices, which arrive when the device connects to the internet. Is there a way to specify a timestamp parameter?
This is not possible at the moment: the timestamp that is sent is generated at flush time and is not stored with each metric. Adding this feature would require some rather big changes to StatsD. That doesn't mean I'm opposed to it; I think it could be useful in some cases.
So, more conceptually: an "offline" mode where you process historical data rather than real-time data. If we can assume the data is fed in in timestamp order (which seems reasonable), we can build an implementation that partitions time into intervals based on the given timestamps, rather than on a real-time clock, and then processes the data as usual.
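A minimal sketch of that "offline" partitioning, assuming events arrive already sorted by timestamp and are plain counter increments in a hypothetical `(unix_timestamp, metric_name, increment)` format. The function name `bucket_counters` is illustrative, not part of StatsD. Interval boundaries come from the events' own timestamps; a bucket is emitted as soon as an event from a later interval shows up, which only works because of the in-order assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event format: (unix_timestamp, metric_name, increment).
Event = Tuple[float, str, float]


def bucket_counters(events: Iterable[Event],
                    interval: int = 10) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Sum counter increments per fixed interval, using event timestamps
    instead of a wall clock.

    Assumes events are in timestamp order; an out-of-order event would
    wrongly open a new bucket. Yields (interval_start, {name: total}).
    """
    current_start = None
    totals: Dict[str, float] = defaultdict(float)
    for ts, name, value in events:
        start = int(ts) - int(ts) % interval  # align to the interval boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # An event from a later interval arrived, so the previous one is complete.
            yield current_start, dict(totals)
            current_start, totals = start, defaultdict(float)
        totals[name] += value
    if current_start is not None:
        # Flush whatever is left once the input is exhausted.
        yield current_start, dict(totals)
```

For example, with the 10-second interval used here, events at 100, 103.5, 112 and 125 seconds produce three buckets starting at 100, 110 and 120, regardless of when the script happens to run.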
+1, this would be great for cases where producers such as Logstash have an event but the actual messages have been persisted for later processing.
This would be very useful. We need to correlate real-time data that is pushed to StatsD immediately with real-time data that is not pushed immediately (but carries timestamps indicating when it was created).
+1 This would be great!
I started working on an implementation of historical processing in statsdaemon, but realized that doing it this way (i.e. adding timestamp support to a StatsD network service) is not the right approach.
Remember, a StatsD network service is responsible for sending the summarized stats to a backend (Graphite) as soon as it can; if it has to keep listening for old timestamps, that won't work very well. Out-of-order input could mess things up as well. You could alleviate some of this by marking session boundaries, but then you realize that UDP is an unreliable protocol.
I think a better approach would be to make a StatsD "library" instead of a network service: you iterate over your logs and emit the metrics into the library, and you can then call API functions to close timeframes (for example, once you're sufficiently past them).
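One way such a library could look, as a hedged sketch: the class `OfflineStats` and its methods `incr` and `close_before` are hypothetical names, not an existing StatsD API, and only counters are handled. The caller decides when a timeframe is final by calling `close_before`. Events that land in an already-closed timeframe are set aside in `late` instead of being silently merged, which is one possible answer to the out-of-order concern raised above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class OfflineStats:
    """Hypothetical in-process "statsd library" for historical data.

    Counters are recorded under the timeframe their own timestamp falls in;
    a timeframe is only summarized when the caller explicitly closes it.
    """

    def __init__(self, interval: int = 10):
        self.interval = interval
        # timeframe start -> {metric name -> summed value}
        self.open: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        self.closed_until: Optional[int] = None  # frames starting before this are final
        self.late: List[Tuple[float, str, float]] = []

    def _start(self, ts: float) -> int:
        return int(ts) - int(ts) % self.interval

    def incr(self, name: str, value: float, ts: float) -> None:
        start = self._start(ts)
        if self.closed_until is not None and start < self.closed_until:
            # Timeframe already closed and emitted: keep the event aside.
            self.late.append((ts, name, value))
            return
        self.open[start][name] += value

    def close_before(self, ts: float) -> List[Tuple[int, Dict[str, float]]]:
        """Close every timeframe ending at or before ts; return their totals."""
        cutoff = self._start(ts)
        ready = sorted(s for s in self.open if s < cutoff)
        out = [(s, dict(self.open.pop(s))) for s in ready]
        if self.closed_until is None or cutoff > self.closed_until:
            self.closed_until = cutoff
        return out
```

Usage: after `incr("bounces", 1, 100)`, `incr("bounces", 1, 105)` and `incr("bounces", 1, 121)`, calling `close_before(120)` returns `[(100, {"bounces": 2.0})]`; a subsequent `incr("bounces", 1, 101)` ends up in `late`. Whatever "sufficiently past" means (a fixed lag, end of file, and so on) is left to the caller.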
@Dieterbe, you might want to look into https://github.com/elasticsearch/logstash
In my opinion, there is another important use case that needs this feature. I want to contact several web services and download from each one a huge XML file of metrics and their generation timestamps. Some servers cache the metrics they generate for a certain period and others throttle the download speed, so it might take a minute to download the whole file. For these reasons I am not interested in the timestamp at which the download finished, but rather in the timestamp contained in the XML payload. And I want to use StatsD because of the UDP protocol.
What makes adding this feature so difficult? I don't know how the program is structured, but all that seems to be needed is to parse one more input parameter and send it along with the other data for gauge metrics.
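For gauges specifically, the idea can be sketched in a few lines. The `|T<unix_seconds>` suffix below is a hypothetical extension of the `name:value|g` line, not something StatsD accepts today, and the `stats.gauges` prefix is an illustrative assumption. The output uses Graphite's plaintext protocol, where each line is `path value timestamp`. Without the suffix, the parser falls back to `now`, i.e. the flush-time timestamp StatsD uses today.

```python
from typing import Tuple


def parse_gauge(line: str, now: int) -> Tuple[str, float, int]:
    """Parse 'name:value|g' with an optional HYPOTHETICAL '|T<unix_seconds>' suffix.

    Signed gauge deltas (e.g. '+5') are not treated specially here.
    """
    name, rest = line.strip().split(":", 1)
    fields = rest.split("|")
    if len(fields) < 2 or fields[1] != "g":
        raise ValueError(f"not a gauge: {line!r}")
    value = float(fields[0])
    ts = now  # default: the flush-time timestamp
    for extra in fields[2:]:
        if extra.startswith("T"):
            ts = int(extra[1:])  # caller-supplied timestamp, possibly in the past
    return name, value, ts


def to_graphite(name: str, value: float, ts: int, prefix: str = "stats.gauges") -> str:
    """Format one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    return f"{prefix}.{name} {value:g} {ts}\n"
```

For example, `parse_gauge("queue.depth:42|g|T1400000000", now=1400000600)` yields `("queue.depth", 42.0, 1400000000)`, which `to_graphite` turns into `stats.gauges.queue.depth 42 1400000000`. Gauges are the easy case because the last value simply wins; counters and timers are aggregated per flush interval, which is where the buffering and out-of-order problems discussed above come in.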
You may want to take a look at https://github.com/etsy/logster, which can be used to parse log files and send metrics to Graphite, using the timestamp from the log file.