# Tutorial 3: Connecting to Your Data Source

<div class="alert alert-block alert-info"> <b>Before we get started: </b> 
    <ul style="list-style-type: none;margin: 0;padding: 0;">
        <li>✍️ To run this notebook, you need to have Ponder installed and set up on your machine. If you have not done so already, please refer to our <a href="https://docs.ponder.io/getting_started/quickstart.html">Quickstart guide</a> to get started.</li>
        <li>📖 Otherwise, if you're just interested in browsing through the tutorial, keep reading below!</li>
    </ul>
</div>

In [1]:
import ponder; ponder.init()
import modin.pandas as pd
from google.cloud import bigquery
from google.cloud.bigquery import dbapi
from google.oauth2 import service_account
import json
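# Load the service-account key, closing the file once it has been read.
with open("../credential.json") as f:
    credentials = service_account.Credentials.from_service_account_info(
        json.load(f),
        scopes=["https://www.googleapis.com/auth/bigquery"],
    )
# Wrap the BigQuery client in a DB-API connection that Ponder can use.
bigquery_con = dbapi.Connection(bigquery.Client(credentials=credentials))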

2023-05-11 19:48:21 - Creating session f63zYxNE0LCTfkmsJEu2_9j5kadjjtlHKHioGUga7g
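
If the key file is missing a field, the error from the client library can be hard to read. As a minimal sketch, the key file could be checked before it is handed to `from_service_account_info`. The helper name `load_service_account_info` and the `REQUIRED_FIELDS` list are assumptions made for this illustration and are not part of Ponder or the BigQuery client.

```python
import json
from pathlib import Path

# Fields this sketch checks for (an assumption); real key files contain more.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(path):
    """Hypothetical helper: read a service-account key file into a dict.

    Raises ValueError if the file does not look like a service-account key.
    """
    # Path.read_text opens and closes the file for us.
    info = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(info, dict):
        raise ValueError(f"{path}: expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"{path}: missing fields {missing}")
    if info["type"] != "service_account":
        raise ValueError(f"{path}: 'type' is {info['type']!r}, not 'service_account'")
    return info
```

The returned dict can then be passed to `service_account.Credentials.from_service_account_info` in place of the `json.load` result used in the cell above.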


Before we can start our analysis, we first need to connect to a data source. Ponder currently supports `read_csv` for CSV files, `read_parquet` for Parquet files, and `read_sql` for tables that are already stored in BigQuery.

## ``read_sql:`` Working with existing tables

To work with data stored in an existing BigQuery table, we call ``read_sql`` with the name of the table, ``PONDER_CUSTOMER`` (qualified here by its dataset as ``TEST.PONDER_CUSTOMER``), and the connection object we created earlier.

In [2]:
df = pd.read_sql("TEST.PONDER_CUSTOMER", bigquery_con)

Now that we have a Ponder DataFrame that points to the ``PONDER_CUSTOMER`` table in your data warehouse, you can work with ``df`` just as you would with any pandas DataFrame, with all the computation happening in your warehouse!

In [3]:
df

| | C_CUSTKEY | C_NAME | C_ADDRESS | C_NATIONKEY | C_PHONE | C_ACCTBAL | C_MKTSEGMENT | C_COMMENT |
|---|---|---|---|---|---|---|---|---|
| 0 | 60082 | Customer#000060082 | x3V6vEbLSeUjYdjS1MvR2,u4gB0S 9d8UEJ | 0 | 10-729-863-1818 | 3645.47 | BUILDING | the accounts. furiously unusual |
| 1 | 60080 | Customer#000060080 | g7cKdEj2mzUQLSKFFnWsmL,3GaOIrBmfi | 0 | 10-192-161-6631 | 689.24 | BUILDING | slyly pending, permanent packages. special fo... |
| 2 | 60018 | Customer#000060018 | lQ8PB9FGW53C36XQX2uq0 | 0 | 10-310-354-8579 | 5759.83 | BUILDING | ckly bold deposits. carefully bold accounts in... |
| 3 | 60062 | Customer#000060062 | 1SI,x4F9 zO22 F7OGksMBSUWu5AUpP | 0 | 10-604-525-3386 | 6210.99 | FURNITURE | ons cajole blithely. bold theodolites along |
| 4 | 60022 | Customer#000060022 | I2XoZQLC,63R3zIG z6i3VMCS | 0 | 10-513-498-1045 | -759.74 | FURNITURE | across the blithely ironic sentiments. thinly... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 60058 | Customer#000060058 | X9NS,0Ddki | 23 | 33-146-680-6559 | 6672.12 | MACHINERY | ess requests. special requests wake blit |
| 96 | 60079 | Customer#000060079 | dwwsJWhDr0fnRJnyhe6gtls | 24 | 34-197-192-3607 | 3329.55 | BUILDING | ly special somas poach carefully. furiously un... |
| 97 | 60059 | Customer#000060059 | dZISBokE9NWaz13 b5WbOHrd8DifA,e2yict0 | 24 | 34-348-323-9173 | 2337.46 | HOUSEHOLD | ndencies. excuses sleep. quickly daring dugout... |
| 98 | 60033 | Customer#000060033 | fwvb5ua8ZcB | 24 | 34-142-708-2404 | -493.59 | MACHINERY | lithely final packages. quickly regular reques... |


<div class="alert alert-block alert-info"> <b>Note: </b> <span> Unlike in pandas, the data ingestion (read_*) commands in Ponder do not load the data into an in-memory dataframe. Instead, you can think of the Ponder DataFrame as a pointer to the BigQuery table that stores the data; it relays all operations to be performed on that table in BigQuery. </span></div>
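
To make the "pointer" idea concrete, here is a minimal conceptual sketch. It is not Ponder's implementation: the `TablePointer` class and its methods are hypothetical, and an in-memory SQLite database stands in for BigQuery so the example runs anywhere. The object only records which table and filters you asked for, and sends a query to the database when the rows are actually requested.

```python
import sqlite3


class TablePointer:
    """Hypothetical stand-in for a DataFrame that only points at a table."""

    def __init__(self, con, table, filters=None):
        self.con = con
        self.table = table
        # Recorded (column, value) equality filters; nothing has run yet.
        self.filters = list(filters or [])

    def filter_eq(self, column, value):
        # Return a new pointer with one more filter; still no query is sent.
        return TablePointer(self.con, self.table, self.filters + [(column, value)])

    def to_sql(self):
        # Build the SQL text and parameters that describe this pointer.
        sql = f'SELECT * FROM "{self.table}"'
        if self.filters:
            sql += " WHERE " + " AND ".join(f'"{col}" = ?' for col, _ in self.filters)
        params = [value for _, value in self.filters]
        return sql, params

    def fetch(self):
        # Only here does the database do any work.
        sql, params = self.to_sql()
        return self.con.execute(sql, params).fetchall()


# SQLite stands in for the warehouse; the rows are taken from the output above.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "PONDER_CUSTOMER" (C_CUSTKEY INTEGER, C_NAME TEXT, C_MKTSEGMENT TEXT)')
con.executemany(
    'INSERT INTO "PONDER_CUSTOMER" VALUES (?, ?, ?)',
    [
        (60082, "Customer#000060082", "BUILDING"),
        (60062, "Customer#000060062", "FURNITURE"),
        (60058, "Customer#000060058", "MACHINERY"),
    ],
)

customers = TablePointer(con, "PONDER_CUSTOMER")
building = customers.filter_eq("C_MKTSEGMENT", "BUILDING")  # no query yet
rows = building.fetch()  # the query runs in the database
```

In this sketch, `building.to_sql()` shows the query that would be sent, and `rows` holds the single `BUILDING` customer once `fetch()` is called.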

To read from files rather than existing tables, we first need to configure Ponder to use the BigQuery connection that we established earlier.

In [2]:
ponder.configure(bigquery_dataset='TEST', default_connection=bigquery_con)

## ``read_csv:`` Working with CSV files

With Ponder configured, we can pass the path to a CSV file to the ``read_csv`` command.

In [4]:
df = pd.read_csv("https://github.com/ponder-org/ponder-datasets/blob/main/tpch/orders.csv?raw=True", header=0)

Now that your data is loaded into a temporary table in your data warehouse and the Ponder DataFrame points to that table, you can work with ``df`` just as you would with any pandas DataFrame, with all the computation happening in your warehouse!

In [5]:
df

| | O_ORDERKEY | O_CUSTKEY | O_ORDERSTATUS | O_TOTALPRICE | O_ORDERDATE | O_ORDERPRIORITY | O_CLERK | O_SHIPPRIORITY | O_COMMENT |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 603014 | 60040 | O | 102891.88 | 2/15/1998 | 5-LOW | Clerk#000000337 | 0 | egular theodolites. always special ideas sleep... |
| 1 | 611105 | 60011 | F | 85107.80 | 6/28/1992 | 2-HIGH | Clerk#000000423 | 0 | platelets; dependencies |
| 2 | 612353 | 60085 | O | 174365.24 | 9/3/1997 | 5-LOW | Clerk#000000685 | 0 | g pending pinto beans according to the deposit... |
| 3 | 613283 | 60002 | F | 81616.21 | 10/6/1992 | 2-HIGH | Clerk#000000298 | 0 | ly unusual requests wake furiously atop the pa... |
| 4 | 617699 | 60022 | F | 253288.15 | 10/6/1994 | 5-LOW | Clerk#000000421 | 0 | ly express excuses sleep furiously packages. s... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 140 | 242343 | 60067 | O | 130940.07 | 2/24/1996 | 1-URGENT | Clerk#000000792 | 0 | ackages haggle fluffily against |
| 141 | 242722 | 60064 | F | 82821.03 | 4/7/1992 | 5-LOW | Clerk#000000514 | 0 | according to the silent |
| 142 | 243297 | 60085 | O | 279667.51 | 7/16/1997 | 4-NOT SPECIFIED | Clerk#000000985 | 0 | . regularly special packages |
| 143 | 244579 | 60085 | P | 159397.20 | 6/11/1995 | 4-NOT SPECIFIED | Clerk#000000404 | 0 | realms haggle blithely slyly permanent ideas. ... |
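
Conceptually, loading a CSV file into a temporary table looks something like the sketch below. It is an illustration under stated assumptions rather than what Ponder does internally: SQLite stands in for BigQuery, the helper `load_csv_into_temp_table` and the table name `orders_tmp` are hypothetical, and the sample rows are a subset of the columns from the output above.

```python
import csv
import io
import sqlite3

# A few rows and columns from the orders output above.
SAMPLE_CSV = """O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE
603014,60040,O,102891.88
611105,60011,F,85107.80
612353,60085,O,174365.24
"""


def load_csv_into_temp_table(con, csv_text, table):
    """Hypothetical helper: copy CSV text into a temporary table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # TEMP tables disappear when the connection is closed.
    con.execute(f'CREATE TEMP TABLE "{table}" ({columns})')
    # The remaining rows are inserted as text values.
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return header


con = sqlite3.connect(":memory:")
header = load_csv_into_temp_table(con, SAMPLE_CSV, "orders_tmp")
row_count = con.execute('SELECT COUNT(*) FROM "orders_tmp"').fetchone()[0]
```

After this runs, queries against `orders_tmp` are answered by the database, which mirrors how the Ponder DataFrame relays operations to the temporary table in your warehouse.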


## ``read_parquet:`` Working with Parquet files

To work with Parquet files, pass the path of the file you'd like to work with to the ``read_parquet`` command.

In [None]:
df = pd.read_parquet("https://github.com/ponder-org/ponder-datasets/blob/main/userdatasample.parquet?raw=True")

In [None]:
df

Ponder will automatically process your Parquet file and load it into a temporary table in your warehouse for analysis.

## Summary

In this tutorial, we learned how to use the same pandas `pd.read_*` API to work with your database tables, CSV files, and Parquet files.

In our [next tutorial](https://github.com/ponder-org/ponder-notebooks/blob/main/bigquery/tutorial/04-writing-data.ipynb), we will discuss how you can use `pd.to_*` to save your dataframes with Ponder.