Statistics Pipeline Service

This repository contains code that processes NDT data and provides daily aggregate metrics for standard global geographies and some national ones. The resulting aggregations are made available in JSON format for use by other applications.

The stats-pipeline service is written in Go, runs on GKE, and generates and updates daily aggregate statistics. The results are available in public BigQuery tables and in per-year, JSON-formatted files hosted on GCS.

Documentation Provided for the Statistics Pipeline Service

(This document) Overview of the stats-pipeline service, fields provided (schema), output formats, available geographies, and API URL structure.
What Statistics are Provided by stats-pipeline, and How are They Calculated?
Geographic Precision in stats-pipeline
Statistics Output Format, Schema, and Field Descriptions
Statistics API URL Structure, Available Geographies & Aggregations

General Recommendations for All Aggregations of NDT data

In general, our recommendations for research aggregating NDT data are:

Don't oversimplify
Aggregate by ASN in addition to time/date and location
Be aware of, and illustrate, multimodal distributions
Use histograms and logarithmic scales
Take into account, and compensate for, client bias and population drift

Roadmap

Below we list additional features, methods, geographies, etc. which may be considered for future versioned releases of stats-pipeline.

Geographies

US Zip Codes, US Congressional Districts, Block Groups, Blocks

Output Formats

histogram_daily_stats.csv - Same data as the JSON, but in CSV. Useful for importing into a spreadsheet.
histogram_daily_stats.sql - A SQL query which returns the same rows as the corresponding .json and .csv files. Useful for verifying the exported data against the source and for tweaking the query to suit different use cases.
