samkenxstream/SamKenX_public-datasets-pipelines

Google Cloud Datasets: Data Pipelines and Documentation Set

public-datasets-pipelines

This repository contains the following:

  • A cloud-native data pipeline architecture for onboarding public datasets to Google Cloud Datasets.
  • A documentation set of tutorials, samples, and other articles that use the datasets hosted by the program.

For detailed documentation, please see the Wiki Pages.

Datasets

Here are some of the featured datasets onboarded using this repository/architecture.
