docs: reorganize ordering of connect data (#1726)
ryscheng committed Jun 27, 2024
1 parent 7f07b58 commit 9f76dc3
Showing 8 changed files with 30 additions and 12 deletions.
3 changes: 2 additions & 1 deletion apps/docs/docs/contribute/connect-data/airbyte.md
@@ -1,6 +1,7 @@
---
title: 🏗️ Connect via Airbyte
-sidebar_position: 2
+sidebar_position: 7
+sidebar_class_name: hidden
---

## Replicating external databases
4 changes: 4 additions & 0 deletions apps/docs/docs/contribute/connect-data/api.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Crawl an API
+sidebar_position: 3
+---
3 changes: 2 additions & 1 deletion apps/docs/docs/contribute/connect-data/cloudquery.md
@@ -1,6 +1,7 @@
---
title: Connect via CloudQuery
-sidebar_position: 3
+sidebar_position: 8
+sidebar_class_name: hidden
---

[CloudQuery](https://cloudquery.io) can be used to integrate external data sources
4 changes: 4 additions & 0 deletions apps/docs/docs/contribute/connect-data/dagster.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Custom Dagster Assets
+sidebar_position: 5
+---
4 changes: 4 additions & 0 deletions apps/docs/docs/contribute/connect-data/database.md
@@ -0,0 +1,4 @@
+---
+title: 🏗️ Replicate a Database
+sidebar_position: 2
+---
17 changes: 10 additions & 7 deletions apps/docs/docs/contribute/connect-data/index.md
@@ -10,18 +10,21 @@ We're always looking for new data sources to integrate with OSO and deepen our c
There are currently the following patterns for integrating new data sources into OSO,
in order of preference:

-1. [BigQuery public datasets](./bigquery.md): If you can maintain a BigQuery public dataset, this is the preferred and easiest route.
-2. [Airbyte plugins](./airbyte.md): Airbyte plugins are the preferred method for crawling APIs.
-3. [Database replication via Airbyte](./airbyte.md): Airbyte maintains off-the-shelf plugins for database replication (e.g. from Postgres).
-4. [CloudQuery plugins](./cloudquery.md): CloudQuery offers another, more flexible avenue for writing data import plugins.
-5. [Files into Google Cloud Storage (GCS)](./gcs.md): You can drop Parquet/CSV files in our GCS bucket for loading into BigQuery.
-6. Static files: If the data is high quality and can only be imported via static files, please reach out to us on [Discord](https://www.opensource.observer/discord) to coordinate hand-off. This path is predominantly used for [grant funding data](./funding-data.md).
+1. [**BigQuery public datasets**](./bigquery.md): If you can maintain a BigQuery public dataset, this is the preferred and easiest route.
+2. [**Database replication**](./database.md): Replicate your database into an OSO dataset (e.g. from Postgres).
+3. [**API crawling**](./api.md): Crawl an API by writing a plugin.
+4. [**Files into Google Cloud Storage (GCS)**](./gcs.md): You can drop Parquet/CSV files in our GCS bucket for loading into BigQuery.
+5. [**Custom Dagster assets**](./dagster.md): Write a custom Dagster asset for other unique data sources.
+6. **Static files**: If the data is high quality and can only be imported via static files, please reach out to us on [Discord](https://www.opensource.observer/discord) to coordinate hand-off. This path is predominantly used for [grant funding data](./funding-data.md).
+7. (deprecated) [Airbyte plugins](./airbyte.md): Airbyte plugins are the preferred method for crawling APIs.
+8. (deprecated) [CloudQuery plugins](./cloudquery.md): CloudQuery offers another, more flexible avenue for writing data import plugins.

We generally prefer to work with data partners that can help us regularly
index live data that can feed our daily data pipeline.
All data sources should be defined as
[software-defined assets](https://docs.dagster.io/concepts/assets/software-defined-assets) in our Dagster configuration.

ETL is the messiest, most high-touch part of the OSO data pipeline.
-Please reach out to us for help on [Discord](https://www.opensource.observer/discord).
+Please reach out to us for help on
+[Discord](https://www.opensource.observer/discord).
We will happily work with you to get it working.
1 change: 1 addition & 0 deletions apps/docs/docs/integrate/fork-pipeline.md
@@ -1,6 +1,7 @@
---
title: 🏗️ Fork the Data Pipeline
sidebar_position: 6
+sidebar_class_name: hidden
---

:::warning
6 changes: 3 additions & 3 deletions apps/docs/docs/integrate/index.md
@@ -8,9 +8,9 @@ That means all source code, data, and infrastructure is publicly available for u

- [**Get Started**](../get-started/index.mdx): to setup your Google account for data access and run your first query
- [**Data Overview**](./overview/index.mdx): for an overview of all data available
-- [**BigQuery Studio Guide**](./query-data.mdx): to quickly query and download any data
-- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**SQL Query Guide**](./query-data.mdx): to quickly query and download any data
- [**Python notebooks**](./python-notebooks.md): to do more in-depth data science and processing
-- [**Fork the data pipeline**](./fork-pipeline.md): to setup your own data pipeline off any OSO model
- [**Connect OSO to 3rd Party tools**](./3rd-party.mdx): like Hex.tech, Tableau, and Metabase
+- [**API access**](./api.md): to integrate OSO metrics into a live production application
+- [**Fork the data pipeline**](./fork-pipeline.md): to setup your own data pipeline off any OSO model
- [**oss-directory**](./oss-directory.md): to leverage [oss-directory](https://github.com/opensource-observer/oss-directory) data separate from OSO
