# Tutorial 3: Connecting to Your Data Source

In [None]:
import os; os.chdir("..")
import credential
import ponder.snowflake
import modin.pandas as pd
snowflake_con = ponder.snowflake.connect(
    user=credential.params["user"], password=credential.params["password"],
    account=credential.params["account"], role=credential.params["role"],
    database=credential.params["database"], schema=credential.params["schema"],
    warehouse=credential.params["warehouse"],
)
ponder.snowflake.init(snowflake_con, enable_ssl=True)

Before we can start our analysis, we first need to connect to a data source. Ponder currently supports `read_csv` for operating on CSV files and `read_sql` for operating on tables that are already stored in Snowflake.
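
The setup cell above imports a `credential` module that this tutorial does not show. Below is a minimal sketch of what such a module might contain, assuming it exposes a `params` dict with the seven keys used in the `connect` call. The `SNOWFLAKE_*` environment variable names and the `load_params` helper are hypothetical and are not part of Ponder or Snowflake:

```python
# credential.py (hypothetical layout; only the `params` name and its keys
# come from the setup cell above)
import os

REQUIRED_KEYS = ("user", "password", "account", "role",
                 "database", "schema", "warehouse")

def load_params(env=os.environ):
    """Build the connection parameters from SNOWFLAKE_* environment variables."""
    # Collect every key whose environment variable is absent, so the error
    # message lists all of them at once.
    missing = [k for k in REQUIRED_KEYS if f"SNOWFLAKE_{k.upper()}" not in env]
    if missing:
        raise KeyError(f"missing environment variables for: {', '.join(missing)}")
    return {k: env[f"SNOWFLAKE_{k.upper()}"] for k in REQUIRED_KEYS}
```

With the variables exported, ending the file with `params = load_params()` would give the setup cell the `credential.params` dict it expects, without storing the password in the notebook.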

<img src="https://docs.ponder.io/_images/architecture.png"></img>

## ``read_sql:`` Working with existing tables

To work with data stored in an existing table in Snowflake, we use the ``read_sql`` command, providing the name of the table, ``PONDER_CUSTOMER``, and passing in the connection ``snowflake_con`` that we created earlier.

In [None]:
df = pd.read_sql("PONDER_CUSTOMER", snowflake_con)

Now that we have a Ponder DataFrame that points to the ``PONDER_CUSTOMER`` table in your data warehouse, you can work with ``df`` just as you would with any pandas dataframe, with all the computation happening in your warehouse.

In [None]:
df

<div class="alert alert-block alert-info"> <b>Note: </b> <span> Unlike in pandas, the data ingestion (read_*) commands in Ponder do not load the data into an in-memory dataframe. Instead, think of the Ponder DataFrame as a pointer to the table in Snowflake that stores the data: it relays every operation to Snowflake, where it is performed on the table. </span></div>
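
To make the pointer idea concrete, here is a toy sketch using SQLite from the Python standard library. It is not how Ponder is implemented. The `TablePointer` class and its methods are hypothetical and exist only to show an object that stores a table name and pushes each operation to the database as SQL rather than holding rows in memory:

```python
import sqlite3

class TablePointer:
    """Toy stand-in for a dataframe that only references a database table."""

    def __init__(self, con, table):
        self.con = con      # open database connection
        self.table = table  # table name only; no rows are fetched here

    def count(self):
        # The database does the counting; only one number comes back.
        return self.con.execute(f'SELECT COUNT(*) FROM "{self.table}"').fetchone()[0]

    def head(self, n=5):
        # Fetch just the first n rows, and only when the caller asks for them.
        query = f'SELECT * FROM "{self.table}" ORDER BY rowid LIMIT ?'
        return self.con.execute(query, (n,)).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
con.executemany("INSERT INTO customer VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace"), (3, "Edsger")])

ptr = TablePointer(con, "customer")  # cheap: nothing is read yet
print(ptr.count())   # 3
print(ptr.head(2))   # [(1, 'Ada'), (2, 'Grace')]
```

Creating `ptr` costs nothing regardless of table size; data moves only when a method runs a query, and then only the result of that query is transferred.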

## ``read_csv:`` Working with CSV files

### Working with remote CSV files
To work with ``CSV`` files, pass the path of the CSV file to the ``read_csv`` command. If the path is remote (e.g., a path to S3, GCS, or a public dataset URL), you can enter it directly as follows. Ponder will automatically process your CSV file and load it into a temporary table in your data warehouse account for analysis.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/ponder-org/ponder-datasets/main/orders.csv", header=0)

Now that your data is loaded into a temporary table in your data warehouse and the Ponder DataFrame points to that table, you can work with ``df`` just as you would with any pandas dataframe, with all the computation happening in your warehouse.

In [None]:
df
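
As a rough illustration of what loading a CSV into a temporary table involves, the sketch below parses CSV text and writes it to a temporary SQLite table using only the standard library. Ponder's actual loading path is not described in this tutorial, so the `csv_to_temp_table` helper, the all-`TEXT` column typing, the sample data, and the use of SQLite in place of Snowflake are all assumptions made for illustration:

```python
import csv
import io
import sqlite3

def csv_to_temp_table(con, csv_text, table):
    """Load CSV text (first row = header) into a TEMP table; return the row count."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]  # same convention as header=0 in read_csv
    cols = ", ".join(f'"{name}" TEXT' for name in header)  # every column stored as TEXT
    marks = ", ".join("?" for _ in header)
    con.execute(f'CREATE TEMP TABLE "{table}" ({cols})')
    con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
    return len(data)

con = sqlite3.connect(":memory:")
sample = "order_id,amount\n1,9.50\n2,20.00\n"
print(csv_to_temp_table(con, sample, "orders_tmp"))  # 2
print(con.execute("SELECT amount FROM orders_tmp WHERE order_id = '1'").fetchall())
```

A temporary table disappears when the connection closes, which is why this kind of load suits exploratory analysis rather than long-term storage.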

### Working with your own local CSV files

If you have a CSV file locally that you want to analyze with Ponder, we provide an interface that allows you to stage the file for analysis.

**1. Uploading to Ponder:** If you have a CSV file on your local machine, you must first upload it through the notebook interface. You can upload files to your Jupyter directory using the file upload functionality provided by Jupyter notebook.

<img src="https://docs.ponder.io/_images/upload2.png" width="50%"></img>

**2. Staging CSV file to a remote path:** After uploading your files to the Jupyter directory, you will need to stage the file to a remote path so that it is accessible by ``read_csv``. The cells below fetch a sample file, ``movies.csv``, to stand in for your uploaded file, then stage it:

In [6]:
!curl "https://raw.githubusercontent.com/ponder-org/ponder-datasets/main/movies.csv" > movies.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36626  100 36626    0     0   398k      0 --:--:-- --:--:-- --:--:--  441k
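
Shell tools such as `curl` are not installed on every machine, so the same download can also be done from Python itself. This is a minimal sketch using `urllib.request` from the standard library; the `download` helper name is ours and is not part of Ponder:

```python
import shutil
import urllib.request

def download(url, dest):
    """Stream the resource at `url` into the local file `dest` and return `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks rather than all at once
    return dest

# Usage (requires network access):
# download("https://raw.githubusercontent.com/ponder-org/ponder-datasets/main/movies.csv",
#          "movies.csv")
```

Because this runs inside the notebook's own Python process, it behaves the same way whichever shell the `!` commands would use.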


In [8]:
from ponder.utils.core import Teleporter
t = Teleporter()
remote_path = t.depulso("movies.csv")

2023-03-13 20:30:46,382 - depulso - INFO - Compression took 0.002811908721923828s


**3. Read your CSV file with ``read_csv``:** Once the file is staged to ``remote_path``, you can load it via `pd.read_csv` as usual.

In [None]:
df = pd.read_csv(remote_path, header=0)

In [None]:
df