### BigFrames StreamingDataFrame
`bigframes.streaming.StreamingDataFrame` is a specialized DataFrame type that supports simple operations and can launch streaming jobs. These jobs process real-time data and write the output (reverse ETL) to Bigtable and Pub/Sub using [BigQuery continuous queries](https://cloud.google.com/bigquery/docs/continuous-queries-introduction).

In this notebook, we will:
* Create a StreamingDataFrame from a BigQuery table
* Apply simple operations such as selecting columns, filtering rows, and previewing the content
* Create and manage streaming jobs that write to Bigtable and to Pub/Sub

In [1]:
import bigframes
# make sure bigframes version >= 1.12.0
bigframes.__version__

'1.31.0'
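The version requirement above can also be checked programmatically. Comparing the raw strings is unreliable (as strings, `"1.9.2" > "1.12.0"` is true), so the sketch below compares numeric tuples instead. `version_tuple` and `meets_minimum` are hypothetical helper names introduced for illustration, not part of bigframes, and any pre-release suffix such as `rc1` is simply ignored.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn the leading numeric part of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "1.12.0") -> bool:
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("1.31.0"))  # True
print(meets_minimum("1.9.2"))   # False
```

In the notebook, the call would be `meets_minimum(bigframes.__version__)`.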

In [2]:
import bigframes.pandas as bpd
import bigframes.streaming as bst
# Use the public options object rather than the private _bigquery_options attribute.
bpd.options.bigquery.project = "bigframes-load-testing" # Change to your own project ID
job_id_prefix = "test_streaming_"

In [3]:
# Copy a table from the public dataset for streaming jobs. Any changes to the table can be reflected in the streaming destination.
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
df.to_gbq("birds.penguins_bigtable_streaming", if_exists="replace")

'birds.penguins_bigtable_streaming'

### Create, select, filter and preview
Create the StreamingDataFrame from a BigQuery table, select certain columns, filter rows, and preview the output.

In [4]:
sdf = bst.read_gbq_table("birds.penguins_bigtable_streaming")



In [5]:
sdf = sdf[["species", "island", "body_mass_g"]]
sdf = sdf[sdf["body_mass_g"] < 4000]
# Bigtable needs a rowkey column
sdf = sdf.rename(columns={"island": "rowkey"})
print(type(sdf))
sdf



<class 'bigframes.streaming.dataframe.StreamingDataFrame'>


| | species | rowkey | body_mass_g |
|---|---|---|---|
| 0 | Adelie Penguin (Pygoscelis adeliae) | Torgersen | 3875.0 |
| 1 | Adelie Penguin (Pygoscelis adeliae) | Torgersen | 2900.0 |
| 2 | Adelie Penguin (Pygoscelis adeliae) | Biscoe | 3725.0 |
| 3 | Adelie Penguin (Pygoscelis adeliae) | Dream | 2975.0 |
| 4 | Adelie Penguin (Pygoscelis adeliae) | Torgersen | 3050.0 |
| 5 | Chinstrap penguin (Pygoscelis antarctica) | Dream | 2700.0 |
| 6 | Adelie Penguin (Pygoscelis adeliae) | Dream | 3900.0 |
| 7 | Adelie Penguin (Pygoscelis adeliae) | Biscoe | 3825.0 |
| 8 | Chinstrap penguin (Pygoscelis antarctica) | Dream | 3775.0 |
| 9 | Adelie Penguin (Pygoscelis adeliae) | Dream | 3350.0 |


### Bigtable
Create a Bigtable streaming job.

In [6]:
job = sdf.to_bigtable(
    instance="streaming-testing-instance", # Change to your own Bigtable instance name
    table="garrettwu-no-col-family", # Change to your own Bigtable table name
    service_account_email="streaming-testing-admin@bigframes-load-testing.iam.gserviceaccount.com", # Change to your own service account
    app_profile=None,
    truncate=True,
    overwrite=True,
    auto_create_column_families=True,
    bigtable_options={},
    job_id=None,
    job_id_prefix=job_id_prefix,
)



In [7]:
print(job.running())
print(job.error_result)

True
None


In [8]:
job.cancel()

True
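A streaming job does not finish on its own, so it is easy to leave one running after an experiment. The sketch below wraps the three job members used in this notebook (`running()`, `error_result`, `cancel()`) in a helper that polls for a bounded time and then cancels. `run_for` and `FakeJob` are hypothetical names introduced here; `FakeJob` is only a stand-in so the helper can be tried without a cloud project.

```python
import time


def run_for(job, seconds, poll_interval=1.0):
    """Poll `job` for up to `seconds`, cancel it if still running, return its error_result."""
    deadline = time.monotonic() + seconds
    try:
        # Stop early if the job ends by itself (for example, because it failed).
        while job.running() and time.monotonic() < deadline:
            time.sleep(poll_interval)
        return job.error_result
    finally:
        # Always clean up, even if polling raised an exception.
        if job.running():
            job.cancel()


class FakeJob:
    """Offline stand-in exposing only running(), error_result and cancel()."""

    def __init__(self):
        self.cancelled = False
        self.error_result = None

    def running(self):
        return not self.cancelled

    def cancel(self):
        self.cancelled = True
        return True


fake = FakeJob()
print(run_for(fake, seconds=0.2, poll_interval=0.05))  # None
print(fake.running())  # False
```

With a real job, the call would look like `run_for(job, seconds=60)`; a non-`None` return value means the job reported an error before the time limit.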

### Pub/Sub
Create a Pub/Sub streaming job.

In [9]:
# Pub/Sub requires a single column
sdf = sdf[["rowkey"]]
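The two column constraints noted in this notebook (Bigtable needs a `rowkey` column, Pub/Sub requires a single column) can be checked before a job is submitted. The following is a minimal sketch working on a plain list of column names; `check_sink_columns` is a hypothetical helper, not a bigframes function, and it assumes the column names can be read from the frame (for example via `sdf.columns`, as on a regular DataFrame).

```python
def check_sink_columns(columns, sink):
    """Raise ValueError if `columns` breaks the constraint noted for `sink`."""
    columns = list(columns)
    if sink == "bigtable":
        # Constraint from the notebook: Bigtable needs a rowkey column.
        if "rowkey" not in columns:
            raise ValueError(f"Bigtable needs a 'rowkey' column, got {columns}")
    elif sink == "pubsub":
        # Constraint from the notebook: Pub/Sub requires a single column.
        if len(columns) != 1:
            raise ValueError(f"Pub/Sub requires a single column, got {columns}")
    else:
        raise ValueError(f"unknown sink: {sink!r}")
    return columns


check_sink_columns(["species", "rowkey", "body_mass_g"], "bigtable")
check_sink_columns(["rowkey"], "pubsub")
```

Failing early in Python gives a clearer message than waiting for the streaming job itself to report the problem.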



In [10]:
job = sdf.to_pubsub(
    topic="penguins", # Change to your own Pub/Sub topic ID
    service_account_email="streaming-testing@bigframes-load-testing.iam.gserviceaccount.com", # Change to your own service account
    job_id=None,
    job_id_prefix=job_id_prefix,
)



In [11]:
print(job.running())
print(job.error_result)

True
None


In [12]:
job.cancel()

True