
Include timestamps in download data #2416

Open
purcell opened this issue Jan 17, 2015 · 10 comments
Comments

@purcell
Member

purcell commented Jan 17, 2015

I'd like to be able to show charts of download activity over time for individual packages, which would involve dumping per-package json which includes timestamps (or at least dates). Do we have all the old logfiles in order to produce this? The sqlite aggregation would need to be changed, of course.

@milkypostman
Member

We have it. There is no SQLite that I know of going on. I'd have to rewrite it to use SQLite but can do that easily. Cool idea. I'll work on the backend part.


@purcell
Member Author

purcell commented Jan 18, 2015

I was pretty sure the logs get slurped into sqlite, and then results queried from there get splooged into downloads.json.
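The pipeline described here (access logs slurped into SQLite, then aggregated counts dumped to downloads.json) can be sketched roughly as follows. This is purely illustrative: the log format, table name, and package-name parsing below are assumptions, not MELPA's actual code.

```python
# Hypothetical sketch of logs -> SQLite -> downloads.json.
# Log format, table name, and filename parsing are made up for illustration.
import json
import sqlite3

LOG_LINES = [
    "1.2.3.4 GET /packages/magit-2.1.0.tar",
    "5.6.7.8 GET /packages/magit-2.1.0.tar",
    "1.2.3.4 GET /packages/evil-1.0.8.tar",
]

def package_of(path):
    # "/packages/magit-2.1.0.tar" -> "magit" (naive split on the hyphen)
    name = path.rsplit("/", 1)[-1]
    return name.rsplit("-", 1)[0]

def aggregate(log_lines):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE downloads (pkg TEXT, ip TEXT)")
    for line in log_lines:
        ip, _, path = line.split()
        conn.execute("INSERT INTO downloads VALUES (?, ?)", (package_of(path), ip))
    # Count distinct IPs per package, as downloads.json does per package
    return dict(conn.execute(
        "SELECT pkg, COUNT(DISTINCT ip) FROM downloads GROUP BY pkg"))

print(json.dumps(aggregate(LOG_LINES)))
```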

@milkypostman
Member

Did you make that change? 'Cause I didn't. We talked about it but then I fixed storing all dates from every IP. That fixed my naive memory-only version.


@purcell
Member Author

purcell commented Jan 18, 2015

Looks like you did it in October: 8acf077

LOL.

@milkypostman
Member

OMG MY MEMORY IS GOING!!!!! Arrrgh

Ok


@purcell
Member Author

purcell commented Jan 18, 2015

You must have been on a 3-day coding vision quest at the time, and now you have only patchy memories...

@milkypostman
Member

Yes, now I vaguely remember a conversation about how I probably didn't need to use SQLite but would just do it anyway 'cause it seemed like the better way to do it.


@purcell
Member Author

purcell commented Jan 18, 2015

And you were right!

@cpitclaudel
Contributor

Hi people,

Some thoughts about this:

  • The data doesn't currently seem to be available: the downloads database appears to only store packages and IP addresses, not download times (is that right?). If so, the schema would need to change, with an extra column added.
  • This would probably be separate from the existing download_counts file; the download data for every package would be too large to fetch in full upon visiting melpa.org (loading the website already sends every client about 1MB of data, though gzipped and thus reduced to about 200k). Given the design of the current website (no dynamic queries, IIUC), it could make sense to have one "download-stats" file per package. Each of these files could be a simple JSON dictionary of date → count.
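If the first point holds, the schema change could be a single ALTER TABLE. A minimal sketch, assuming the current table is something like pkg_ip with only (pkg, ip) columns (the real schema and names may differ); note that SQLite's ADD COLUMN disallows non-constant defaults such as CURRENT_TIMESTAMP, hence the NULL default:

```python
# Hypothetical migration: add a dl_time column to an assumed (pkg, ip) table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pkg_ip (pkg TEXT NOT NULL, ip INT)")  # assumed current schema
# SQLite forbids CURRENT_TIMESTAMP as an ADD COLUMN default, so default to NULL;
# the insertion code would supply the timestamp for new rows.
conn.execute("ALTER TABLE pkg_ip ADD COLUMN dl_time DATETIME DEFAULT NULL")
cols = [row[1] for row in conn.execute("PRAGMA table_info(pkg_ip)")]
print(cols)  # ['pkg', 'ip', 'dl_time']
```

Existing rows would simply keep dl_time = NULL, which dovetails with allowing NULLs to import old data.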

Here is some code that shows a prototype of how this might work:

import collections
import datetime
import json
import random
import sqlite3
import string
import time

# TEXT rather than STRING: "STRING" gets NUMERIC affinity in SQLite.
CREATE_QUERY = "CREATE TABLE pkg_ip_time (pkg TEXT NOT NULL, ip INT, dl_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP)"
COUNT_QUERY = "SELECT pkg, DATE(dl_time), COUNT(ip) FROM pkg_ip_time GROUP BY pkg, DATE(dl_time)"

def main():
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()

    print("0. Creating table")
    cursor.execute(CREATE_QUERY)

    print("1. Creating random records")
    # One single-letter "package" per ASCII letter; each record is a
    # (package, random IP, download time within the last 30 days) triple.
    packages = string.ascii_letters
    stats = [(random.choice(packages), random.randint(0, 2 ** 32),
              datetime.datetime.now() - datetime.timedelta(seconds=random.randint(0, 60*60*24*30)))
             for _ in range(100*1000)]

    print("2. Inserting created records")
    cursor.executemany("INSERT INTO pkg_ip_time VALUES (?, ?, ?)", stats)
    connection.commit()

    print("3. Generating statistics")
    start = time.time()
    # Pivot the per-day counts into {package: {date: downloads}} ...
    stats_per_package = collections.defaultdict(dict)
    for package, date, downloads in cursor.execute(COUNT_QUERY):
        stats_per_package[package][date] = downloads
    # ... and dump one small JSON file per package.
    for package, stats in stats_per_package.items():
        with open("melpa-{}-stats.json".format(package), mode="w") as stats_file:
            json.dump(stats, stats_file)
    print(time.time() - start)

if __name__ == "__main__":
    main()

I imagine the dl_time field would in fact allow NULLs, so as to permit importing the existing data. On the client side, I imagine we'd want to load an extra charting library and just feed it that data :)
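Allowing NULLs in dl_time means the aggregation should not silently drop legacy rows. One option (a sketch only, reusing the table name from the prototype above) is to bucket NULL dates under a sentinel key:

```python
# Sketch: bucket legacy rows (dl_time = NULL, imported from pre-timestamp
# data) under an "unknown" key so their counts are preserved.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pkg_ip_time (pkg TEXT NOT NULL, ip INT, dl_time DATETIME)")
conn.executemany("INSERT INTO pkg_ip_time VALUES (?, ?, ?)", [
    ("magit", 1, "2015-01-17 10:00:00"),
    ("magit", 2, None),  # legacy row with no timestamp
])
# DATE(NULL) is NULL, so COALESCE maps legacy rows to 'unknown'.
rows = conn.execute(
    "SELECT COALESCE(DATE(dl_time), 'unknown') AS day, COUNT(*) "
    "FROM pkg_ip_time WHERE pkg = ? GROUP BY day", ("magit",)).fetchall()
print(dict(rows))
```

The charting code could then either skip the "unknown" bucket or show it as a caption ("N downloads before date tracking began").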

Let me know if there are other things I can look into to help :)

@cpitclaudel
Contributor

Ping? I'd love to get such a feature.
