Clean up documentation (#63)
* add content to faq pages

* add autosummary to module pages

* update add-new-schema instructions

* fix bug in example

* add toctree to api home page

* comment out a TODO

* update CHANGELOG
troyraen committed Jul 22, 2024
1 parent c78449c commit acdf6b4
Showing 18 changed files with 142 additions and 64 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
### Changed

- Reorganize and update data listings.
- Add FAQ content.
- Clean up docs. Remove 'TODO's. Add autosummary to module pages.

## \[v0.3.10\] - 2024-07-22

24 changes: 12 additions & 12 deletions docs/source/api-reference/index.rst
@@ -3,16 +3,16 @@
pittgoogle
==========

[FIXME] This lists a subset of classes the user will interact with most.
Is this what we want?
Should at least add some text to clarify.
.. toctree::
:caption: API Reference
:maxdepth: 1

.. autosummary::

pittgoogle.alert.Alert
pittgoogle.bigquery.Table
pittgoogle.pubsub.Consumer
pittgoogle.pubsub.Subscription
pittgoogle.pubsub.Topic
pittgoogle.registry.ProjectIds
pittgoogle.registry.Schemas
alert
auth
bigquery
exceptions
pubsub
registry
schema
types_
utils
7 changes: 6 additions & 1 deletion docs/source/faq/what-is-bigquery.rst
@@ -1,4 +1,9 @@
What is BigQuery?
=================

Google Cloud's BigQuery is ... # [TODO] I've written this several times before -- find them.
Google `BigQuery <https://cloud.google.com/bigquery/docs/introduction>`__ is a fully managed data warehouse with
a SQL-based analytics engine.
It is optimized for complex analytical queries on large datasets.
It uses a columnar storage format and relational table structure with support for nested and repeated fields.
Data can be loaded via batch jobs or streaming inserts.
Streamed data is typically available to queries immediately.
13 changes: 10 additions & 3 deletions docs/source/faq/what-is-cloud-run.rst
@@ -1,4 +1,11 @@
What is Cloud Run?
==================
What are Cloud Functions and Cloud Run?
=======================================

Google Cloud's Cloud Run is ... # [TODO] I've written this several times before -- find them.
Google `Cloud Functions <https://cloud.google.com/functions/docs/concepts/overview>`__ and
Google `Cloud Run <https://cloud.google.com/run/docs/overview/what-is-cloud-run>`__
are managed-compute services run by Google Cloud.
They both run containers that are configured as HTTP endpoints.
They can be used to process live message streams by attaching Pub/Sub push subscriptions.
Incoming requests (i.e., messages) are processed in parallel.
The number of container instances scales automatically and nearly instantaneously to meet incoming demand.
Differences between the services are essentially tradeoffs between efficiency (at large scale) and ease of use.
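As a rough illustration of the push-subscription flow described above: Pub/Sub push delivery wraps each message in a JSON envelope whose ``message.data`` field is base64-encoded. The sketch below decodes such an envelope using only the standard library. The function name ``decode_push_envelope`` and the sample payload are hypothetical and are not part of pittgoogle. A real endpoint would receive the body from its web framework.

```python
import base64
import json


def decode_push_envelope(body: bytes) -> dict:
    """Return the payload bytes and attributes from a Pub/Sub push request body.

    Hypothetical helper for illustration only; not part of pittgoogle.
    """
    envelope = json.loads(body)
    if not isinstance(envelope, dict) or "message" not in envelope:
        raise ValueError("invalid Pub/Sub push envelope: missing 'message'")
    message = envelope["message"]
    # The message payload arrives base64-encoded inside the JSON envelope.
    data = base64.b64decode(message.get("data", ""))
    return {"data": data, "attributes": message.get("attributes", {})}


# Illustrative request body (project and subscription names are made up).
sample_body = json.dumps(
    {
        "message": {
            "data": base64.b64encode(b"alert bytes").decode("ascii"),
            "attributes": {"schema": "example"},
            "messageId": "1",
        },
        "subscription": "projects/my-project/subscriptions/my-sub",
    }
).encode("utf-8")

decoded = decode_push_envelope(sample_body)
print(decoded["data"])
```

Rejecting a malformed envelope early lets the endpoint return an error response instead of failing later during processing.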
7 changes: 6 additions & 1 deletion docs/source/faq/what-is-cloud-storage.rst
@@ -1,4 +1,9 @@
What is Cloud Storage?
======================

Google Cloud's Cloud Storage is ... # [TODO] I've written this several times before -- find them.
Google `Cloud Storage <https://cloud.google.com/storage/docs/introduction>`__ is Google's object
(file) storage service.
Objects are stored in buckets.
Buckets have a flat namespace (meaning there is no such thing as a directory or folder), but
folder-style functionality is provided by most of the access tools (e.g., console and APIs) which
interpret folder hierarchies from slashes in the object name.
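To make the flat-namespace point concrete, here is a minimal sketch of how a tool might derive a folder view from object names alone, by splitting on a prefix and a delimiter. The function ``list_folder`` and the object names are hypothetical; this only loosely mirrors the prefix/delimiter style of listing and makes no calls to Cloud Storage.

```python
def list_folder(object_names, prefix="", delimiter="/"):
    """Split names under `prefix` into direct objects and apparent subfolders.

    Hypothetical helper for illustration; the bucket itself stores only flat names.
    """
    objects, folders = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix, so this looks like a subfolder.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(name)
    return sorted(objects), sorted(folders)


# Made-up object names in a single bucket.
names = [
    "readme.txt",
    "alerts/2024/a.avro",
    "alerts/2024/b.avro",
    "alerts/index.json",
]

top_objects, top_folders = list_folder(names)
alerts_objects, alerts_folders = list_folder(names, prefix="alerts/")
print(top_objects, top_folders)
print(alerts_objects, alerts_folders)
```

No folder exists anywhere in this model: "alerts/" appears only because several object names share that prefix.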
9 changes: 8 additions & 1 deletion docs/source/faq/what-is-pubsub.rst
@@ -1,4 +1,11 @@
What is Pub/Sub?
=================

Google Cloud's Pub/Sub is ... # [TODO] I've written this several times before -- find them.
Google `Pub/Sub <https://cloud.google.com/pubsub/docs/overview>`__ is a messaging service that
uses the publish-subscribe pattern.
Publishers and subscribers communicate asynchronously, with the Pub/Sub service handling all message storage and delivery.
Publishers send messages to a topic, and Pub/Sub immediately delivers them to all attached subscriptions.
Subscriptions can be configured to either push messages to a client automatically or to wait for a client to make a pull request.
The owner of the topic sets the access rights that determine who is allowed to attach a subscription.
Messages published to a topic prior to a subscription being created will not be available to the subscriber.
By default, Pub/Sub messages are not ordered.
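The delivery rules above can be sketched with a toy in-memory model. This is not the Pub/Sub API; ``ToyTopic`` and its methods are hypothetical names used only to show that each attached subscription gets its own copy of every message published after it was attached, and nothing published earlier.

```python
from collections import deque


class ToyTopic:
    """Toy stand-in for a topic; each subscription is an independent queue."""

    def __init__(self):
        self.subscriptions = {}

    def attach(self, name):
        # A new subscription starts empty: earlier messages are not replayed.
        self.subscriptions[name] = deque()
        return self.subscriptions[name]

    def publish(self, message):
        # Every currently attached subscription receives its own copy.
        for queue in self.subscriptions.values():
            queue.append(message)


topic = ToyTopic()
topic.publish("early")           # no subscriptions yet, so nobody receives this
first = topic.attach("first")
topic.publish("m1")
second = topic.attach("second")  # attached after "m1" was published
topic.publish("m2")

print(list(first))
print(list(second))
```

The toy queues happen to preserve publish order; as noted above, the real service does not guarantee ordering by default.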
44 changes: 13 additions & 31 deletions docs/source/for-developers/add-new-schema.rst
@@ -1,33 +1,22 @@
Add a new schema to the registry
================================

[FIXME] This information is old. It needs to be updated to describe to the SchemaHelpers and Schema
child classes.

This page contains instructions for adding a new schema to the registry so that it can be loaded
using :meth:`pittgoogle.Schemas.get` and used to serialize and deserialize the alert bytes.

You will need to update at least the "Required" files listed below, and potentially one or more of the
others. The schema format is expected to be either Avro or Json.
Only Avro and JSON schemas have been implemented so far.

First, a naming guideline:

- Schema names are expected to start with the name of the survey. If the survey has more than one schema,
the survey name should be followed by a "." and then schema-specific specifier(s).

Required
--------

pittgoogle/registry_manifests/schemas.yml
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------------------------------

*pittgoogle/registry_manifests/schemas.yml* is the manifest of registered schemas.

Add a new section to the manifest following the template provided there. The fields are the same as
those of a :class:`pittgoogle.schema.Schema`. The ``helper`` field must point to code that can find and load
the new schema definition; more information below.

[FIXME]
those of a :class:`pittgoogle.schema.Schema`.

Case 1: The schema definition is not needed in order to deserialize the alert bytes. This is true for
all Json, and the Avro streams which attach the schema in the data header. You should be able to use the
@@ -37,34 +26,27 @@ The rest of the cases assume the schema definition is required. This is true for
which do not attach the schema to the data packet.

Case 2: You can write some code that will get the schema definition from an external repository. You will
probably need to write your own ``helper`` method (more below). Follow ``lsst`` as an example. This is
probably need to write your own helper method (more below). Follow LSST as an example. This is
preferable to Case 3 because it's usually easier to access new schema versions as soon as the survey
releases them.

Case 3: You want to include schema definition files with the ``pittgoogle-client`` package. Follow
``elasticc`` as an example. (1) Commit the files to the repo under the *pittgoogle/schemas* directory. It
is recommended that the main filename follow the syntax "<schema_name>.avsc". (2) Point ``path``
Case 3: You want to include schema definition files with the pittgoogle-client package. Follow
ELAsTiCC as an example. (1) Commit the files to the repo under the *pittgoogle/schemas* directory. It
is recommended that the main filename follow the syntax "<schema_name>.avsc". (2) Point 'path'
at the main file, relative to the package root. If the Avro schema is split into multiple files, you
usually only need to point to the main one. (3) If you've followed the recommendations then the default
``helper`` should work, but you should check (more below). If you need to implement your own helper
helper should work, but you should check (more below). If you need to implement your own helper
or update the existing, do it.

Potentially Required
--------------------

pittgoogle/schema.py
^^^^^^^^^^^^^^^^^^^^

# [FIXME]
*pittgoogle/schema.py* is the file containing the :class:`pittgoogle.schema.Schema` class.
--------------------

If ``schemaless_alert_bytes='false'``, the defaults (mostly null/None) should work and you can ignore
this file (skip to the next section).
*pittgoogle/schema.py* is the file containing the :class:`pittgoogle.schema.Schema` class and helpers.

A "helper" method must exist in :class:`pittgoogle.schema.Schema` that can find and load your new schema
definition. The ``helper`` field in the yaml manifest (above) must be set to the name of this method. If a
A "helper" method must exist in :class:`pittgoogle.schema.SchemaHelpers` that can find and load your new schema
definition. The 'helper' field in the yaml manifest (above) must be set to the name of this method. If a
suitable helper method does not already exist for your schema, add one to this file by following
existing helpers like :meth:`pittgoogle.schema.Schema.default_schema_helper` as examples. **If your helper
existing helpers like :meth:`pittgoogle.schema.SchemaHelpers.default_schema_helper` as examples. **If your helper
method requires a new dependency, be sure to add it following
:doc:`/main/for-developers/manage-dependencies-poetry`.**

4 changes: 1 addition & 3 deletions docs/source/for-developers/get-alerts-for-testing.rst
@@ -1,8 +1,6 @@
Get alerts for testing
======================

[FIXME] Everyone needs this, not just developers. Move this page to the user-demos repo.

Setup
-----

@@ -26,7 +24,7 @@ Here are examples that get an alert from each of our "loop" streams:
loop_sub.touch()
alert = loop.pull_batch(max_messages=1)[0]
alert = loop_sub.pull_batch(max_messages=1)[0]
Get alerts from a file on disk
-------------------------------
3 changes: 2 additions & 1 deletion docs/source/one-time-setup/google-sdk.rst
@@ -54,4 +54,5 @@ Instruct gcloud to authenticate using your key file containing
You may want to `create a configuration <https://cloud.google.com/sdk/docs/configurations>`__ if you use multiple projects or want to control settings like the default region.

# [TODO] give instructions to add the ``gcloud auth`` command to the conda activation file and/or to create a configuration and activate it with the conda env.
..
# [TODO] give instructions to add the ``gcloud auth`` command to the conda activation file and/or to create a configuration and activate it with the conda env.
9 changes: 8 additions & 1 deletion pittgoogle/alert.py
@@ -1,5 +1,12 @@
# -*- coding: UTF-8 -*-
"""Classes for working with astronomical alerts."""
"""Classes for working with astronomical alerts.
.. autosummary::
Alert
----
"""
import base64
import datetime
import importlib.resources
7 changes: 7 additions & 0 deletions pittgoogle/auth.py
@@ -7,6 +7,13 @@
:doc:`/one-time-setup/authentication`. The recommendation is to use a
:ref:`service account <service account>` and :ref:`set environment variables <set env vars>`.
In that case, you will not need to call this module directly.
.. autosummary::
Auth
----
"""
import logging
import os
10 changes: 9 additions & 1 deletion pittgoogle/bigquery.py
@@ -1,5 +1,13 @@
# -*- coding: UTF-8 -*-
"""Classes to facilitate connections to BigQuery datasets and tables."""
"""Classes to facilitate connections to BigQuery datasets and tables.
.. autosummary::
Client
Table
----
"""
import logging
from typing import TYPE_CHECKING, Optional

10 changes: 10 additions & 0 deletions pittgoogle/exceptions.py
@@ -1,4 +1,14 @@
# -*- coding: UTF-8 -*-
"""Exceptions.
.. autosummary::
BadRequest
CloudConnectionError
SchemaError
----
"""
class BadRequest(Exception):
"""Raised when a Flask request json envelope (e.g., from Cloud Run) is invalid."""

12 changes: 11 additions & 1 deletion pittgoogle/pubsub.py
@@ -1,5 +1,15 @@
# -*- coding: UTF-8 -*-
"""Classes to facilitate connections to Google Cloud Pub/Sub streams."""
"""Classes to facilitate connections to Google Cloud Pub/Sub streams.
.. autosummary::
Consumer
Response
Subscription
Topic
----
"""
import concurrent.futures
import datetime
import importlib.resources
10 changes: 9 additions & 1 deletion pittgoogle/registry.py
@@ -1,5 +1,13 @@
# -*- coding: UTF-8 -*-
"""Pitt-Google registries."""
"""Pitt-Google registries.
.. autosummary::
ProjectIds
Schemas
----
"""
import importlib.resources
import logging
from typing import Final
17 changes: 12 additions & 5 deletions pittgoogle/schema.py
@@ -1,5 +1,13 @@
# -*- coding: UTF-8 -*-
"""Classes to manage alert schemas."""
"""Classes to manage alert schemas.
.. autosummary::
Schema
SchemaHelpers
----
"""
import importlib.resources
import io
import json
@@ -68,11 +76,10 @@ def elasticc_schema_helper(schema_dict: dict) -> "Schema":

@staticmethod
def lsst_schema_helper(schema_dict: dict) -> "Schema":
"""Load the Avro schema definition for lsst.v7_1.alert.
"""Load the Avro schema definition for lsst.v7_1.alert."""
# [FIXME] This is a hack to get the latest schema version into pittgoogle-client
# until we can get :meth:`SchemaHelpers.lsst_auto_schema_helper` working.

[FIXME] This is a hack to get the latest schema version into pittgoogle-client
until we can get :meth:`SchemaHelpers.lsst_auto_schema_helper` working.
"""
if not schema_dict["name"] == "lsst.v7_1.alert":
raise NotImplementedError("Only 'lsst.v7_1.alert' is supported for LSST.")

9 changes: 8 additions & 1 deletion pittgoogle/types_.py
@@ -1,5 +1,12 @@
# -*- coding: UTF-8 -*-
"""Classes defining new types."""
"""Classes defining new types.
.. autosummary::
PubsubMessageLike
----
"""
import datetime
import importlib.resources
import logging
9 changes: 8 additions & 1 deletion pittgoogle/utils.py
@@ -1,5 +1,12 @@
# -*- coding: UTF-8 -*-
"""Classes and functions to support working with alerts and related data."""
"""Classes and functions to support working with alerts and related data.
.. autosummary::
Cast
----
"""
import base64
import collections
import io