Generate analytics data from npm downloads #17

Merged
merged 5 commits into from
Apr 23, 2020

@jasonkarns jasonkarns commented Apr 23, 2020

Generic Rake extensions:

  • Extends Rake's FileTask to be IO-readable (and Pathname-able).
  • Extracts the logic for any task to represent a JSON file (can be
    included in any IO-readable).
  • Defines HttpResourceTask as an IO-readable with an HTTP-specific
    definition for its timestamp.
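
As a rough illustration of the "JSON file" logic that can be included in any IO-readable, here is a minimal sketch. The names `JsonReadable` and `InMemoryResource` are hypothetical and not taken from this PR; the only assumption is that the host object responds to `read`.

```ruby
require "json"
require "stringio"

# Hypothetical mixin: adds JSON parsing to any object that responds to #read.
module JsonReadable
  def json
    JSON.parse(read)
  end
end

# Hypothetical stand-in for an IO-readable task. A FileTask extended with
# #read would read from its file instead of an in-memory string.
class InMemoryResource
  include JsonReadable

  def initialize(body)
    @io = StringIO.new(body)
  end

  def read
    @io.read
  end
end

resource = InMemoryResource.new('{"downloads": 42}')
data = resource.json
```

Because the mixin only depends on `read`, the same module could in principle be shared by file-backed and HTTP-backed tasks.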

Extracted Analytics-specific classes:

  • JsonFileTask's timestamp depends on an internal property (unique
    to Homebrew's analytics JSON structure).
  • NpmApiTask is an HttpResourceTask for JSON resources, but it needs a
    custom timestamp because the API doesn't provide a Last-Modified header.
  • BrewApiTask is an HttpResourceTask that represents JSON resources.
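
The two timestamp strategies described above might look roughly like this. This is a sketch under assumptions, not the code in this PR: the helper names are hypothetical, headers are modeled as a plain Hash with lowercase keys, and the `"end"` key holding an ISO 8601 date is assumed for illustration only.

```ruby
require "date"
require "time"

# Hypothetical helper: derive a timestamp from a Last-Modified header.
# Returns nil when the header is absent.
def header_timestamp(headers)
  value = headers["last-modified"]
  value && Time.httpdate(value)
end

# Hypothetical helper: derive a timestamp from a date stored inside the
# JSON body, for resources whose server sends no Last-Modified header.
def body_timestamp(json, key)
  date = Date.iso8601(json.fetch(key))
  Time.utc(date.year, date.month, date.day)
end

with_header = header_timestamp("last-modified" => "Thu, 23 Apr 2020 15:07:00 GMT")
without_header = header_timestamp({})
from_body = body_timestamp({ "end" => "2020-04-22" }, "end") # assumed key name
```

Either value could then be compared against a local file's modification time to decide whether the resource needs to be fetched again.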

Also includes an initial scrape of npm data and a bugfix for Homebrew's analytics.

This is step 1 for #9

Also for reference: https://github.com/npm/registry/blob/master/docs/download-counts.md

Apparently, OpenURI#read can only be called once: the value is not
buffered or preserved, and subsequent reads return the empty string.
Somewhere, read was being called a second time, so the very first HTTP
task was returning "" for its contents.

This change memoizes the results of read and json so the body is not
lost (and the JSON isn't parsed more often than necessary). This
introduces a tradeoff between memory consumption and performance, since
the JSON data for all files is kept in memory. Memory-sensitive
scenarios will want to override this.
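
A minimal sketch of the problem and the memoization fix, using StringIO as a stand-in for the stream returned by an HTTP read. The class name `MemoizedResource` is hypothetical and the real tasks in this PR may be structured differently.

```ruby
require "json"
require "stringio"

# A plain stream is exhausted after the first read.
io = StringIO.new('{"downloads": 42}')
first_read = io.read
second_read = io.read # the stream is at EOF, so this is ""

# Hypothetical wrapper that memoizes both the raw body and the parsed JSON.
class MemoizedResource
  def initialize(io)
    @io = io
  end

  # The body is read from the stream once and then kept in memory.
  def read
    @read ||= @io.read
  end

  # The JSON is parsed once, on first use.
  def json
    @json ||= JSON.parse(read)
  end
end

resource = MemoizedResource.new(StringIO.new('{"downloads": 42}'))
body_once = resource.read
body_twice = resource.read
parsed = resource.json
```

The memory cost mentioned above comes from `@read` and `@json` living as long as the object does; a memory-sensitive caller could override `read` to re-fetch instead.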

Also, improve the filter/map to use grep, which filters and maps at the
same time (needing only a single pass over the array).
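
For illustration, the grep form compared with the two-step version it replaces. The file names here are invented for the example and are not taken from the repository.

```ruby
# Example input; these paths are made up for illustration.
paths = ["analytics/npm.json", "README.md", "analytics/brew.json"]

# Two steps: filter with select, then transform with map.
two_step = paths.select { |path| path =~ /\.json\z/ }
                .map { |path| File.basename(path, ".json") }

# One step: grep keeps the matching elements and, when given a block,
# returns the block's result for each match.
one_step = paths.grep(/\.json\z/) { |path| File.basename(path, ".json") }
```

Both forms produce the same array; grep simply avoids building the intermediate filtered array.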
@jasonkarns jasonkarns merged commit d0cdb4a into master Apr 23, 2020
@jasonkarns jasonkarns deleted the npm branch April 23, 2020 15:07
@jasonkarns jasonkarns mentioned this pull request Apr 24, 2020