### BigFrames StreamingDataFrame
`bigframes.streaming.StreamingDataFrame` is a special DataFrame type that supports simple operations and can create streaming jobs to Bigtable and Pub/Sub.

In this notebook, we will:
* Create a StreamingDataFrame from a BigQuery table
* Perform some operations, such as select and filter, and preview the content
* Create and manage streaming jobs to both Bigtable and Pub/Sub

In [1]:
import bigframes
import bigframes.streaming as bst

In [2]:
# Set the project through the public options accessor rather than the private attribute
bigframes.options.bigquery.project = "bigframes-load-testing"
job_id_prefix = "test_streaming_"

### Create, select, filter and preview
Create the StreamingDataFrame from a BigQuery table, select certain columns, filter rows, and preview the output.

In [3]:
sdf = bst.read_gbq_table("birds.penguins_bigtable_streaming")



In [4]:
sdf = sdf[["species", "island", "body_mass_g"]]
sdf = sdf[sdf["body_mass_g"] < 4000]
# Bigtable needs a rowkey column, so rename "island" to serve as the row key
sdf = sdf.rename(columns={"island": "rowkey"})
print(type(sdf))
sdf



<class 'bigframes.streaming.dataframe.StreamingDataFrame'>




| | species | rowkey | body_mass_g |
|---|---|---|---|
| 0 | Adelie Penguin (Pygoscelis adeliae) | Torgersen | 3875 |
| 1 | Adelie Penguin (Pygoscelis adeliae) | Torgersen | 2900 |
| 2 | Adelie Penguin (Pygoscelis adeliae) | Biscoe | 3725 |
| 3 | Adelie Penguin (Pygoscelis adeliae) | Dream | 2975 |
| 4 | Adelie Penguin (Pygoscelis adeliae) | Torgersen | 3050 |
| 5 | Chinstrap penguin (Pygoscelis antarctica) | Dream | 2700 |
| 6 | Adelie Penguin (Pygoscelis adeliae) | Dream | 3900 |
| 7 | Adelie Penguin (Pygoscelis adeliae) | Biscoe | 3825 |
| 8 | Chinstrap penguin (Pygoscelis antarctica) | Dream | 3775 |
| 9 | Adelie Penguin (Pygoscelis adeliae) | Dream | 3350 |
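
To make the row-level effect of the select, filter, and rename steps in cell 4 concrete, the same three steps can be mirrored on a few in-memory rows. This is a minimal sketch in plain Python, not BigFrames code: the sample rows are made up for illustration (only the column names `species`, `island`, and `body_mass_g` come from the notebook), the `sex` column is an illustrative extra, and `select_filter_rename` is a hypothetical helper.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Made-up sample rows; "sex" is an illustrative extra column that the
# select step is expected to drop.
ROWS: List[Row] = [
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3875, "sex": "MALE"},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000, "sex": "FEMALE"},
    {"species": "Chinstrap", "island": "Dream", "body_mass_g": 2700, "sex": "FEMALE"},
]


def select_filter_rename(rows: List[Row]) -> List[Row]:
    """Mirror the notebook cell: select columns, filter rows, rename a column."""
    keep = ["species", "island", "body_mass_g"]
    # Select: keep only the three columns of interest.
    selected = [{col: row[col] for col in keep} for row in rows]
    # Filter: keep rows with a body mass below 4000 g.
    filtered = [row for row in selected if row["body_mass_g"] < 4000]
    # Rename: "island" becomes "rowkey"; column order is preserved.
    return [
        {("rowkey" if col == "island" else col): val for col, val in row.items()}
        for row in filtered
    ]


result = select_filter_rename(ROWS)
for row in result:
    print(row)
```

The sketch evaluates everything eagerly on a Python list, which is only meant to show which rows and columns survive each step.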


### Bigtable
Create a Bigtable streaming job.

In [5]:
job = sdf.to_bigtable(
    instance="streaming-testing-instance",
    table="garrettwu-no-col-family",
    service_account_email="streaming-testing-admin@bigframes-load-testing.iam.gserviceaccount.com",
    app_profile=None,
    truncate=True,
    overwrite=True,
    auto_create_column_families=True,
    bigtable_options={},
    job_id=None,
    job_id_prefix=job_id_prefix,
)



In [6]:
print(job.running())
print(job.error_result)

True
None
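
The two checks in cell 6 (`job.running()` and `job.error_result`) can be folded into a single status string. The following is a minimal sketch under stated assumptions: `describe_job` and `FakeJob` are hypothetical names introduced here, `FakeJob` stands in for the real job so the example runs without a cloud project, and the error is assumed to be either `None` or a dict that may carry a `"message"` key.

```python
from typing import Any, Dict, Optional


class FakeJob:
    """Hypothetical stand-in exposing only running() and error_result."""

    def __init__(
        self, is_running: bool, error_result: Optional[Dict[str, Any]] = None
    ) -> None:
        self._is_running = is_running
        self.error_result = error_result

    def running(self) -> bool:
        return self._is_running


def describe_job(job: Any) -> str:
    """Return 'failed: ...', 'running', or 'stopped' for a job-like object."""
    # Check the error first, so a failed job is never reported as running.
    if job.error_result is not None:
        # Assumption: the error is a dict; fall back to the whole dict
        # when it has no "message" key.
        detail = job.error_result.get("message", job.error_result)
        return "failed: {}".format(detail)
    return "running" if job.running() else "stopped"


print(describe_job(FakeJob(is_running=True)))
print(describe_job(FakeJob(is_running=False)))
print(describe_job(FakeJob(False, {"reason": "invalid", "message": "bad topic"})))
```

With the job created in cell 5, the call would presumably be `describe_job(job)`, since the helper only relies on the two members the notebook already uses.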


In [7]:
job.cancel()

True
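
The job in this notebook reports `running()` as `True` until `job.cancel()` is called, so an error between creating the job and cancelling it could leave the job running. One way to guard against that is a small context manager that cancels on exit. This is a sketch, not part of BigFrames: `cancel_on_exit` and `FakeJob` are hypothetical names, and the only assumption about the real job is that it offers the `running()` and `cancel()` methods used above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class FakeJob:
    """Hypothetical stand-in exposing only running() and cancel()."""

    def __init__(self) -> None:
        self._is_running = True

    def running(self) -> bool:
        return self._is_running

    def cancel(self) -> bool:
        self._is_running = False
        return True


@contextmanager
def cancel_on_exit(job: Any) -> Iterator[Any]:
    """Yield the job, then cancel it on exit if it is still running."""
    try:
        yield job
    finally:
        # The finally clause runs on normal exit and on exceptions alike.
        if job.running():
            job.cancel()


demo_job = FakeJob()
with cancel_on_exit(demo_job) as active_job:
    print(active_job.running())  # job is live inside the block
print(demo_job.running())  # job has been cancelled after the block
```

Wrapping the inspection steps in `with cancel_on_exit(job):` keeps the explicit `job.cancel()` cell optional rather than required for cleanup.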

### Pub/Sub
Create a Pub/Sub streaming job.

In [8]:
sdf = sdf[["rowkey"]]



In [9]:
job = sdf.to_pubsub(
    topic="penguins",
    service_account_email="streaming-testing@bigframes-load-testing.iam.gserviceaccount.com",
    job_id=None,
    job_id_prefix=job_id_prefix,
)



In [10]:
print(job.running())
print(job.error_result)

True
None


In [11]:
job.cancel()

True