clean up npm downloads data for graphs #1241

@patak-dev

removing outliers

See for example npmx.dev/package/vite

Image

A rogue CI job probably created the spike you see in this graph. The spike carries no meaningful information and destroys the scale of the graph.

We should add a cleanup step so we can remove these kinds of spikes. Several packages have graphs that are ruined by them, and more so in full-history graphs. It may require some tweaking, and we may need to reach out to some data experts to help us, but the vast majority of these spikes are clear outliers that should be easy to remove.
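To make the cleanup step concrete, here is a minimal sketch of one possible approach: flag a point as a spike when it sits far above the median of its neighbours, measured in units of the median absolute deviation (a Hampel-style filter). The issue does not prescribe a method, so the `Point` shape, the `findSpikes` name, and the `halfWindow` and `threshold` defaults are all assumptions that would need tuning against real data.

```typescript
// Hypothetical shape: one entry per day of download counts.
type Point = { day: string; downloads: number };

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Returns the indices of points that lie far above the median of the
// surrounding window. Only upward spikes are flagged.
function findSpikes(points: Point[], halfWindow = 7, threshold = 5): number[] {
  const spikes: number[] = [];
  for (let i = 0; i < points.length; i++) {
    const start = Math.max(0, i - halfWindow);
    const end = Math.min(points.length, i + halfWindow + 1);
    const neighbours = points.slice(start, end).map((p) => p.downloads);
    const med = median(neighbours);
    // Median absolute deviation, scaled by 1.4826 so it is comparable to a
    // standard deviation for normally distributed data.
    const mad = 1.4826 * median(neighbours.map((v) => Math.abs(v - med)));
    // A zero MAD means the window is essentially flat, so nothing is flagged.
    if (mad > 0 && points[i].downloads - med > threshold * mad) {
      spikes.push(i);
    }
  }
  return spikes;
}
```

A median-based window is used here because a plain mean and standard deviation would be dragged up by the spike itself. Packages with real, sustained growth should be unaffected as long as the window is short relative to the trend.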

We could do a few things after the cleanup:

  1. remove those data points and not show them
  2. interpolate across the missing region so we get a full graph
  3. interpolate across the missing region and draw it with a dotted line so users know that this part of the graph is estimated

I think we should go with the last option.
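A rough sketch of what the last option could look like on the data side: flagged points get a value filled in linearly from their nearest unflagged neighbours, and carry an `estimated` flag that the chart layer could use to draw those segments dotted. The `ChartPoint` type, the `fillFlagged` name, and the choice of linear filling are assumptions for illustration, not something the current code base is known to have.

```typescript
// Hypothetical shapes: the chart would read `estimated` to pick a line style.
type Point = { day: string; downloads: number };
type ChartPoint = Point & { estimated: boolean };

// Replaces the value of every flagged index with a linear estimate between
// the nearest unflagged neighbours, and marks it as estimated.
function fillFlagged(points: Point[], flagged: Set<number>): ChartPoint[] {
  const result: ChartPoint[] = points.map((p, i) => ({
    ...p,
    estimated: flagged.has(i),
  }));
  let i = 0;
  while (i < result.length) {
    if (!result[i].estimated) {
      i++;
      continue;
    }
    // Find the end (exclusive) of this run of flagged points.
    let end = i;
    while (end < result.length && result[end].estimated) end++;
    const left = i > 0 ? result[i - 1].downloads : undefined;
    const right = end < result.length ? result[end].downloads : undefined;
    for (let k = i; k < end; k++) {
      if (left !== undefined && right !== undefined) {
        const t = (k - i + 1) / (end - i + 1);
        result[k].downloads = Math.round(left + t * (right - left));
      } else {
        // The run touches the start or end of the series: hold the only
        // known neighbour, or keep the raw value if there is none.
        result[k].downloads = left ?? right ?? result[k].downloads;
      }
    }
    i = end;
  }
  return result;
}
```

For example, with values `[100, 5000, 5000, 130]` and indices 1 and 2 flagged, the filled values would be 110 and 120, both marked `estimated: true`.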

remove zeroed data

Another issue with npm data is that it sometimes gets borked: for periods lasting a few days, npm returns zero downloads. See for example the Nov 9, 2024 incident. We can interpolate when the data is zeroed out.
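Detecting these periods could be as simple as the sketch below: collect short runs of zero-download days that have real data on both sides. The "bounded on both sides" rule and the `maxRun` limit are assumptions, added so that days before a package was first published, or a package that genuinely has no downloads, are not mistaken for an outage. The `findZeroedDays` name and `Point` shape are hypothetical.

```typescript
// Hypothetical shape: one entry per day of download counts.
type Point = { day: string; downloads: number };

// Returns the indices of zero-valued days that form a short run surrounded
// by non-zero data, which is what an npm data outage would look like.
function findZeroedDays(points: Point[], maxRun = 14): number[] {
  const zeroed: number[] = [];
  let i = 0;
  while (i < points.length) {
    if (points[i].downloads !== 0) {
      i++;
      continue;
    }
    // Find the end (exclusive) of this run of zeros.
    let end = i;
    while (end < points.length && points[end].downloads === 0) end++;
    const bounded = i > 0 && end < points.length;
    if (bounded && end - i <= maxRun) {
      for (let k = i; k < end; k++) zeroed.push(k);
    }
    i = end;
  }
  return zeroed;
}
```

The returned indices can then be treated the same way as outlier spikes: estimated from the surrounding days and shown as estimated in the graph.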

Metadata

    Labels

    backServer, Data
