# Tutorial 3: Connecting to Your Data Source

In [1]:
import ponder; ponder.init()
import modin.pandas as pd
import duckdb
duckdb_con = duckdb.connect("../ponder.db")
ponder.configure(default_connection=duckdb_con)



Before we can start our analysis, we first need to connect to a data source. Ponder currently supports `read_csv` for operating on CSV files and `read_sql` for operating on tables that are already stored in DuckDB.

## ``read_sql:`` Working with existing tables

To work with data stored in an existing table in DuckDB, we use the ``read_sql`` command, providing the name of the table, ``PONDER_CUSTOMER``, and passing the connection object ``duckdb_con`` that we created earlier as the connection parameter.

In [2]:
df = pd.read_sql("PONDER_CUSTOMER", duckdb_con)

Now that we have a Ponder DataFrame that points to the ``PONDER_CUSTOMER`` table in your database, you can work on your DataFrame ``df`` just as you would with any pandas dataframe, with all the computation happening in DuckDB!

In [3]:
df

Unnamed: 0,C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT
0,60001,Customer#000060001,9Ii4zQn9cX,14,24-678-784-9652,9957.56,HOUSEHOLD,l theodolites boost slyly at the platelets: pe...
1,60002,Customer#000060002,ThGBMjDwKzkoOxhz,15,25-782-500-8435,742.46,BUILDING,beans. fluffily regular packages
2,60003,Customer#000060003,"Ed hbPtTXMTAsgGhCr4HuTzK,Md2",16,26-859-847-7640,2526.92,BUILDING,fully pending deposits sleep quickly. blithely...
3,60004,Customer#000060004,"NivCT2RVaavl,yUnKwBjDyMvB42WayXCnky",10,20-573-674-7999,7975.22,AUTOMOBILE,furiously above the ironic packages. slyly br...
4,60005,Customer#000060005,"1F3KM3ccEXEtI, B22XmCMOWJMl",12,22-741-208-1316,2504.74,MACHINERY,express instructions sleep quickly. ironic bra...
...,...,...,...,...,...,...,...,...
95,60096,Customer#000060096,T9KQ0gc6NvnTSSsFkJOk,12,22-822-538-4011,4620.25,AUTOMOBILE,ial platelets wake carefully express theodolit...
96,60097,Customer#000060097,I55jg art2HQL8YEHwh8FgEx,21,31-526-630-1617,1626.61,FURNITURE,. even asymptotes sleep even dependencies. bli...
97,60098,Customer#000060098,"2y,ZeGm0u0 LYJ7waqsZkmWqmU8vn",0,10-972-910-3772,1449.68,AUTOMOBILE,al requests; packages cajole accounts; idly ev...
98,60099,Customer#000060099,Zc1GskAO8ANH8yGchAqhs31MrKzHbAlhpyy3,21,31-696-159-3613,8767.65,HOUSEHOLD,ns detect slyly quickly bold fox


<div class="alert alert-block alert-info"> <b>Note: </b> <span> Unlike in pandas, the data ingestion (read_*) commands in Ponder do not actually load the data into a dataframe in memory. Instead, you can think of the Ponder DataFrame as a pointer to the table in DuckDB that stores the data: it relays all operations to be performed on that table in DuckDB. </span></div>
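
To make the "pointer to a table" idea in the note more concrete, here is a minimal sketch of a lazy table handle. It is not Ponder's implementation: the ``TablePointer`` class, the ``customer`` table, and its values are all hypothetical, and the standard-library ``sqlite3`` module stands in for DuckDB so the snippet runs without extra dependencies. The point is only that filtering records an intent, and nothing is computed until the database is asked for results.

```python
import sqlite3


class TablePointer:
    """Hypothetical stand-in for a DataFrame that only references a database table."""

    def __init__(self, con, table, where=None):
        self.con = con
        self.table = table
        self.where = where  # pending filter; nothing has been executed yet

    def filter(self, condition):
        # Record the filter and return a new pointer; no query runs here.
        clause = condition if self.where is None else f"({self.where}) AND ({condition})"
        return TablePointer(self.con, self.table, clause)

    def to_sql(self):
        # Translate the recorded operations into a single SQL statement.
        sql = f"SELECT * FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql

    def collect(self):
        # Only now is the work relayed to the database.
        return self.con.execute(self.to_sql()).fetchall()


# Toy in-memory table with made-up values (SQLite used here instead of DuckDB).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (custkey INTEGER, acctbal REAL)")
con.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, 100.0), (2, 9000.0), (3, 50.5)],
)

ptr = TablePointer(con, "customer").filter("acctbal > 75")
generated_sql = ptr.to_sql()   # the query the pointer would send
rows = sorted(ptr.collect())   # the data is fetched only at this step
con.close()
```

In this sketch, ``filter`` never touches the data; only ``collect`` sends a query. That mirrors, in a very simplified way, why a ``read_*`` call can return immediately even for a large table.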

## ``read_csv:`` Working with CSV files

To work with ``CSV`` files, use the ``read_csv`` command and pass in the file path or URL of the CSV file.

In [4]:
df = pd.read_csv("https://github.com/ponder-org/ponder-datasets/blob/main/tpch/orders.csv?raw=True", header=0)

Now that your data is loaded into a temporary table in your database and the Ponder DataFrame points to that table, you can work on your DataFrame ``df`` just as you would with any pandas dataframe, with all the computation happening in DuckDB!

In [5]:
df

Unnamed: 0,O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT
0,603014,60040,O,102891.88,2/15/1998,5-LOW,Clerk#000000337,0,egular theodolites. always special ideas sleep...
1,611105,60011,F,85107.80,6/28/1992,2-HIGH,Clerk#000000423,0,platelets; dependencies
2,612353,60085,O,174365.24,9/3/1997,5-LOW,Clerk#000000685,0,g pending pinto beans according to the deposit...
3,613283,60002,F,81616.21,10/6/1992,2-HIGH,Clerk#000000298,0,ly unusual requests wake furiously atop the pa...
4,617699,60022,F,253288.15,10/6/1994,5-LOW,Clerk#000000421,0,ly express excuses sleep furiously packages. s...
...,...,...,...,...,...,...,...,...,...
140,242343,60067,O,130940.07,2/24/1996,1-URGENT,Clerk#000000792,0,ackages haggle fluffily against
141,242722,60064,F,82821.03,4/7/1992,5-LOW,Clerk#000000514,0,according to the silent
142,243297,60085,O,279667.51,7/16/1997,4-NOT SPECIFIED,Clerk#000000985,0,. regularly special packages
143,244579,60085,P,159397.20,6/11/1995,4-NOT SPECIFIED,Clerk#000000404,0,realms haggle blithely slyly permanent ideas. ...
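
The "load the CSV into a temporary table, then query it in the database" flow described for ``read_csv`` can be illustrated with a small sketch. This is an assumption-laden illustration rather than what Ponder does internally: it uses the standard-library ``csv`` and ``sqlite3`` modules in place of DuckDB, and the ``orders_tmp`` table name, the column names, and the CSV values are invented for the example.

```python
import csv
import io
import sqlite3

# Toy CSV text standing in for a real file (values are made up for illustration).
csv_text = "order_id,status,total\n1,O,10.5\n2,F,20.0\n3,O,5.25\n"

# Parse the header and the data rows.
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
rows = list(reader)

# Create a temporary table whose columns match the CSV header (SQLite, not DuckDB).
con = sqlite3.connect(":memory:")
columns = ", ".join(f'"{name}"' for name in header)
con.execute(f"CREATE TEMP TABLE orders_tmp ({columns})")

# Bulk-insert the rows; values arrive as text because no column types are declared.
placeholders = ", ".join("?" for _ in header)
con.executemany(f"INSERT INTO orders_tmp VALUES ({placeholders})", rows)

# The computation happens inside the database, not in Python.
open_total = con.execute(
    "SELECT SUM(CAST(total AS REAL)) FROM orders_tmp WHERE status = 'O'"
).fetchone()[0]
con.close()
```

Once the rows are in the temporary table, any further aggregation or filtering is expressed as SQL against that table, which is the same division of labor the tutorial describes between the DataFrame and DuckDB.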


In [6]:
duckdb_con.close()