The `fs-bq-import-collection` script is for use with the official Firebase Extension Stream Firestore to BigQuery.

The import script (`fs-bq-import-collection`) can read all existing documents in a Cloud Firestore collection and insert them into the raw changelog table created by the Stream Firestore to BigQuery extension. The script adds a special changelog row for each document with the operation `IMPORT` and the timestamp of epoch. This ensures that any subsequent operation on an imported document supersedes the import record.
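To see the superseding behavior concretely, the sketch below lists each document's newest changelog row; because `IMPORT` rows carry the epoch timestamp, any real write to a document outranks its import record. The query assumes the raw changelog's `document_name`, `operation`, and `timestamp` columns, and shell variables that you set to your installation values.

```
# A sketch: show the newest changelog row per document. IMPORT rows carry
# the epoch timestamp, so any later operation on a document wins this
# ranking. Assumes PROJECT_ID, DATASET_ID, COLLECTION_PATH are set.
bq query --use_legacy_sql=false "
  SELECT document_name, operation, timestamp
  FROM (
    SELECT *, ROW_NUMBER() OVER (
      PARTITION BY document_name ORDER BY timestamp DESC) AS rn
    FROM \`${PROJECT_ID}.${DATASET_ID}.${COLLECTION_PATH}_raw_changelog\`
  )
  WHERE rn = 1"
```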
You may pause and resume the import script from the last batch at any point.
- You must run the import script over the entire collection after installing the Stream Firestore to BigQuery extension; otherwise, writes to your database during the import might not be exported to the dataset.
- The import script can take up to O(collection size) time to finish. If your collection is large, consider loading data from a Cloud Firestore export into BigQuery instead.
- You will see redundant rows in your raw changelog table if either of the following happens:
  - Document changes occur between installing the extension and running the import script.
  - You run the import script multiple times over the same collection.
- You can use wildcard notation in the collection path. Suppose, for example, you have collections `users/user1/pets` and `users/user2/pets`, but also `admins/admin1/pets`. If you set `${COLLECTION_GROUP_QUERY}` to `true` and provide the collection path as `users/{uid}/pets`, the import script will import the former two collections but not the latter, and will populate the `path_params` column of the BigQuery table with the relevant `uid`s (see the query sketch after the warnings below).
- You can also use a `collectionGroup` query. To use a `collectionGroup` query, provide the collection name as the value of `${COLLECTION_PATH}`, and set `${COLLECTION_GROUP_QUERY}` to `true`. For example, if you are trying to import `/collection/{document}/sub_collection`, the value for `${COLLECTION_PATH}` should be `sub_collection`. Keep in mind that if you have another subcollection with the same name (e.g., `/collection2/{document}/sub_collection`), it will be imported too.
- Warning: The import operation is not idempotent; running it twice, or running it after documents have already been imported, will likely produce duplicate data in your BigQuery table.
Warning: A `collectionGroup` query will target every collection in your Firestore project with the provided `${COLLECTION_PATH}`. For example, if you have 10,000 documents with a subcollection named `landmarks`, the import script will query every document in 10,000 `landmarks` collections.
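As a concrete illustration of the wildcard bullet above, the sketch below lists the distinct `uid` values captured in `path_params` during an import. It assumes `path_params` holds a JSON-formatted string and that the shell variables match your installation values; verify both against your own table before relying on it.

```
# A sketch: list the wildcard values captured during a users/{uid}/pets
# import. Assumes path_params is a JSON-formatted string column and that
# PROJECT_ID, DATASET_ID, COLLECTION_PATH match your installation.
bq query --use_legacy_sql=false "
  SELECT DISTINCT JSON_EXTRACT_SCALAR(path_params, '\$.uid') AS uid
  FROM \`${PROJECT_ID}.${DATASET_ID}.${COLLECTION_PATH}_raw_changelog\`
  WHERE operation = 'IMPORT'"
```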
The import script requires several values from your installation of the extension:

- `${PROJECT_ID}`: the project ID for the Firebase project in which you installed the extension
- `${BIGQUERY_PROJECT_ID}`: the project ID for the GCP project in which the BigQuery instance is located. Defaults to the Firebase project ID.
- `${COLLECTION_PATH}`: the collection path that you specified during extension installation
- `${COLLECTION_GROUP_QUERY}`: uses a `collectionGroup` query if this value is `"true"`. For any other value, a `collection` query is used.
- `${DATASET_ID}`: the ID that you specified for your dataset during extension installation
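If you plan to script the steps below, one optional convenience is to export these values as shell variables so that later commands can reference them. The values shown are placeholders; substitute the values from your installation:

```
# Placeholder values; replace each with the value from your installation.
export PROJECT_ID="my-firebase-project"
export BIGQUERY_PROJECT_ID="${PROJECT_ID}"   # defaults to the Firebase project
export COLLECTION_PATH="users"
export COLLECTION_GROUP_QUERY="false"        # "true" enables a collectionGroup query
export DATASET_ID="firestore_export"
```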
Run the import script using `npx` (the Node Package Runner) via `npm` (the Node Package Manager).
- Make sure that you've installed the required tools to run the import script:
  - To access the `npm` command tools, install Node.js.
  - If you use `npm` v5.1 or earlier, you need to explicitly install `npx`: run `npm install --global npx`.
- Set up credentials. The import script uses Application Default Credentials to communicate with BigQuery.

  One way to set up these credentials is to run the following command using the gcloud CLI:

  ```
  gcloud auth application-default login
  ```

  Alternatively, you can create and use a service account. This service account must be assigned a role that grants the `bigquery.datasets.create` permission.
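  As a sketch of the service-account route: the account name below is illustrative, and BigQuery Data Editor is simply one predefined role that includes the `bigquery.datasets.create` permission.

  ```
  # Illustrative only: create a service account, grant it a role that
  # includes bigquery.datasets.create, download a key, and point Application
  # Default Credentials at it. "fs-bq-import" is a placeholder name.
  gcloud iam service-accounts create fs-bq-import --project "${PROJECT_ID}"
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member "serviceAccount:fs-bq-import@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role "roles/bigquery.dataEditor"
  gcloud iam service-accounts keys create key.json \
    --iam-account "fs-bq-import@${PROJECT_ID}.iam.gserviceaccount.com"
  export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/key.json"
  ```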
- Run the import script interactively via `npx` by running the following command:

  ```
  npx @firebaseextensions/fs-bq-import-collection
  ```

  Note: The script can also be run non-interactively. To see its usage, run the above command with `--help`.
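  For example, a non-interactive invocation might look like the sketch below. Treat the flag names as assumptions and confirm each one against the `--help` output before relying on it:

  ```
  # A sketch of a non-interactive run; verify every flag against --help,
  # since the spellings here are assumptions rather than guarantees.
  npx @firebaseextensions/fs-bq-import-collection \
    --non-interactive \
    --project "${PROJECT_ID}" \
    --source-collection-path "${COLLECTION_PATH}" \
    --dataset "${DATASET_ID}" \
    --table-name-prefix "${COLLECTION_PATH}" \
    --query-collection-group "${COLLECTION_GROUP_QUERY}" \
    --batch-size 300
  ```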
- (Optional) When prompted, you can enter the BigQuery project ID to use a BigQuery instance located in a GCP project other than your Firebase project.
- When prompted, enter the Cloud Firestore collection path that you specified during extension installation, `${COLLECTION_PATH}`.
- (Optional) You can pause and resume the import at any time:

  - Pause the import: enter `CTRL+C`. The import script records the name of the last successfully imported document in a cursor file called `from-${COLLECTION_PATH}-to-${PROJECT_ID}:${DATASET_ID}:${rawChangeLogName}`, which lives in the directory from which you invoked the import script.
  - Resume the import from where you left off: re-run `npx @firebaseextensions/fs-bq-import-collection` from the same directory in which you previously invoked the script.

  Note that when an import completes successfully, the import script automatically cleans up the cursor file it was using to keep track of its progress.
To validate that the import succeeded:

- In the BigQuery web UI, navigate to the dataset created by the extension. The extension named your dataset using the Dataset ID that you specified during extension installation, `${DATASET_ID}`.
- From your raw changelog table, run the following query:

  ```
  SELECT COUNT(*)
  FROM `${PROJECT_ID}.${DATASET_ID}.${COLLECTION_PATH}_raw_changelog`
  WHERE operation = "IMPORT"
  ```

  The result set will contain the number of documents in your source collection.
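  The same check can be run from the command line with the `bq` CLI, assuming the shell variables are set to your installation values:

  ```
  # Run the validation count without opening the BigQuery web UI.
  bq query --use_legacy_sql=false "
    SELECT COUNT(*)
    FROM \`${PROJECT_ID}.${DATASET_ID}.${COLLECTION_PATH}_raw_changelog\`
    WHERE operation = 'IMPORT'"
  ```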