# Tutorial 2: Primer to Ponder

In [1]:
import os; os.chdir("..")
import credential
import ponder; ponder.init()
import modin.pandas as pd
import snowflake.connector
snowflake_con = snowflake.connector.connect(
    user=credential.params["user"], password=credential.params["password"],
    account=credential.params["account"], role=credential.params["role"],
    database=credential.params["database"], schema=credential.params["schema"],
    warehouse=credential.params["warehouse"],
)
ponder.configure(default_connection=snowflake_con)



## What is Ponder?

Ponder lets you run your pandas code directly in your data warehouse. You keep writing pandas, and you gain the scalability and security benefits of a modern data warehouse.

### Key Features

- **Data science at all scales:** With Ponder's technology, the same pandas workflows can be run at all scales, from megabytes to terabytes, without changing a single line of code.

- **No change to user workflow:** Data scientists can keep running their existing pandas workflows and writing pandas code in their IDE of choice, while benefiting from seamless scalability improvements.

- **Simplify your data infrastructure:** There is no need to set up and maintain the compute infrastructure required by other parallel processing frameworks (e.g., Spark, Ray, or Dask) to perform large-scale data analysis with pandas.

- **Guaranteed security:** All your pandas workflows are executed in Snowflake, so they benefit from the rigorous security guarantees that Snowflake offers.

In the following sections, we will showcase some examples of how Ponder works and how it can be used in your work.

### Demo: Write SQL no more, Ponder in action!

Under the hood, Ponder automatically compiles pandas operations into SQL queries and pushes them to Snowflake. The queries execute directly in Snowflake, so you get the performance, scalability, and security of Snowflake as the computation engine.

Here is an architecture diagram showing how Ponder works:

<img src="https://ponder.io/wp-content/uploads/2023/04/ponder_architecture.png" width="75%"></img>


To confirm that the work is actually running in the data warehouse, log in to your [Snowflake web interface](https://app.snowflake.com/). The pandas operations you execute with Ponder correspond to the SQL queries shown on the `Query History` page of the Snowflake web interface.

In [2]:
df = pd.read_sql("PONDER_BOOKS", snowflake_con)

You can view the SQL queries that correspond to the pandas operations run in Ponder by going to `Activity` > `Query History` in your Snowflake web interface. The history page lets you view and drill into the details of queries executed in your Snowflake account in the last 14 days.

<img src="https://docs.ponder.io/_images/mon2.png" width="75%"></img>
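
If you prefer to stay in the notebook, the same history can also be read with SQL. The sketch below assumes Snowflake's `INFORMATION_SCHEMA.QUERY_HISTORY` table function and a DB-API style connection such as `snowflake_con` from the first cell. The helper name `recent_queries` is hypothetical, and the function needs a live connection with suitable privileges to return anything.

```python
def recent_queries(connection, limit=10):
    """Return (query_text, start_time) rows for the most recent queries.

    Assumes `connection` is a DB-API style connection to Snowflake and that
    the INFORMATION_SCHEMA.QUERY_HISTORY table function is available.
    """
    sql = (
        "SELECT query_text, start_time "
        "FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY("
        f"RESULT_LIMIT => {int(limit)})) "
        "ORDER BY start_time DESC"
    )
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()

# Usage (requires a live connection, so it is left commented out):
# for query_text, start_time in recent_queries(snowflake_con, limit=5):
#     print(start_time, query_text[:80])
```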

In this case, when we connected to the table via `pd.read_sql`, the following SQL query was generated:

```sql
CREATE TEMP TABLE "Ponder_zmyffjgcyh" AS SELECT *, ROW_NUMBER() OVER (ORDER BY 1) -1 AS _PONDER_ROW_NUMBER_, _PONDER_ROW_NUMBER_ AS _PONDER_ROW_LABELS_ FROM ("PONDER_BOOKS")
```
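
To make the role of `_PONDER_ROW_NUMBER_` concrete, here is a minimal sketch that reproduces the same idea locally. It uses Python's built-in `sqlite3` module as a stand-in for Snowflake, and the `books` table and its rows are made up for illustration, so this is not what Ponder itself executes. The point is only that a zero-based row number is attached to every row, which lets pandas-style positional order be recovered later.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for the warehouse (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE books (title TEXT, num_pages INTEGER)")
con.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Book A", 120), ("Book B", 340), ("Book C", 95)],  # hypothetical rows
)

# Materialize a temp table with a zero-based row number. The generated query
# above orders by a constant; here we order by SQLite's rowid so the result
# is deterministic in this local example.
con.execute(
    """
    CREATE TEMP TABLE books_numbered AS
    SELECT *, ROW_NUMBER() OVER (ORDER BY rowid) - 1 AS ponder_row_number
    FROM books
    """
)

numbered_rows = con.execute(
    "SELECT ponder_row_number, title FROM books_numbered ORDER BY ponder_row_number"
).fetchall()
print(numbered_rows)
```

Later operations can then sort or filter on the row-number column, much like the `WHERE "_PONDER_ROW_NUMBER_" IN (...)` clause that appears in the longer query further down.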

You might recall that in the last tutorial, we performed z-score normalization on all the numerical columns. 

In [3]:
x = df.select_dtypes(include='number').columns
(df[x] - df[x].mean())/df[x].std()

|       | bookID    | average_rating | isbn13   | num_pages | ratings_count | text_reviews_count |
|-------|-----------|----------------|----------|-----------|---------------|--------------------|
| 0     | -1.627423 | 1.814764       | 0.046421 | 1.308412  | 18.465698     | 10.495965          |
| 1     | -1.627347 | 1.586444       | 0.046420 | 2.212309  | 18.976518     | 11.128467          |
| 2     | -1.627194 | 1.386663       | 0.046420 | 0.064518  | -0.103230     | -0.115700          |
| 3     | -1.627118 | 1.786224       | 0.046420 | 0.408662  | 20.633287     | 13.885086          |
| 4     | -1.626888 | 2.414107       | 0.046420 | 9.758604  | 0.208673      | -0.146743          |
| ...   | ...       | ...            | ...      | ...       | ...           | ...                |
| 11114 | 1.857626  | 0.359219       | 0.048950 | 0.727928  | -0.158127     | -0.202621          |
| 11115 | 1.857779  | 0.416299       | 0.045744 | 1.237925  | -0.152555     | -0.188651          |
| 11116 | 1.857855  | 0.073818       | 0.045744 | 0.325735  | -0.152226     | -0.173518          |
| 11117 | 1.858237  | -0.611144      | 0.045565 | 0.404515  | -0.152679     | -0.156444          |
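
As a sanity check on what the one-liner computes, the sketch below applies the same formula, `(x - mean) / std`, to a single made-up column using only the Python standard library. It assumes the sample standard deviation (dividing by `n - 1`), which is the pandas default for `std()`. The input values are hypothetical.

```python
from statistics import mean, stdev

def zscore(values):
    """Return (x - mean) / sample std for each x, matching pandas' default ddof=1."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mu) / sigma for x in values]

pages = [100, 200, 300]  # hypothetical num_pages values
normalized = zscore(pages)
print(normalized)  # [-1.0, 0.0, 1.0]
```

A value of `0.0` means the entry equals the column mean, and `1.0` means it lies one standard deviation above it.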


Now take a look at Snowflake's `Query History`: the corresponding SQL query is more than 200 lines long!

```sql
SELECT 
  "_PONDER_ROW_LABELS_", 
  "bookID", 
  "average_rating", 
  "isbn13", 
  "  num_pages", 
  "ratings_count", 
  "text_reviews_count" 
FROM 
  (
    SELECT 
      * 
    FROM 
      (
        SELECT 
          "_PONDER_ROW_NUMBER_", 
          "_PONDER_ROW_LABELS_", 
          "bookID" / "bookID_ponder_right" AS "bookID", 
          "average_rating" / "average_rating_ponder_right" AS "average_rating", 
          "isbn13" / "isbn13_ponder_right" AS "isbn13", 
          "  num_pages" / "  num_pages_ponder_right" AS "  num_pages", 
          "ratings_count" / "ratings_count_ponder_right" AS "ratings_count", 
          "text_reviews_count" / "text_reviews_count_ponder_right" AS "text_reviews_count" 
        FROM 
          (
            SELECT 
              "_PONDER_ROW_NUMBER_", 
              "_PONDER_ROW_LABELS_", 
              "bookID" - "bookID_ponder_right" AS "bookID", 
              "average_rating" - "average_rating_ponder_right" AS "average_rating", 
              "isbn13" - "isbn13_ponder_right" AS "isbn13", 
              "  num_pages" - "  num_pages_ponder_right" AS "  num_pages", 
              "ratings_count" - "ratings_count_ponder_right" AS "ratings_count", 
              "text_reviews_count" - "text_reviews_count_ponder_right" AS "text_reviews_count" 
            FROM 
              (
                SELECT 
                  "_PONDER_ROW_NUMBER_", 
                  "_PONDER_ROW_LABELS_", 
                  "bookID", 
                  "average_rating", 
                  "isbn13", 
                  "  num_pages", 
                  "ratings_count", 
                  "text_reviews_count" 
                FROM 
                  (
                    SELECT 
                      "bookID", 
                      "title", 
                      "authors", 
                      "average_rating", 
                      "isbn", 
                      "isbn13", 
                      "language_code", 
                      "  num_pages", 
                      "ratings_count", 
                      "text_reviews_count", 
                      "publication_date", 
                      "publisher", 
                      "_PONDER_ROW_NUMBER_", 
                      "_PONDER_ROW_LABELS_" 
                    FROM 
                      "Ponder_zmyffjgcyh" 
                    ORDER BY 
                      "_PONDER_ROW_NUMBER_"
                  )
              ) AS _PONDER_LEFT_ CROSS 
              JOIN (
                SELECT 
                  "bookID" AS "bookID_ponder_right", 
                  "average_rating" AS "average_rating_ponder_right", 
                  "isbn13" AS "isbn13_ponder_right", 
                  "  num_pages" AS "  num_pages_ponder_right", 
                  "ratings_count" AS "ratings_count_ponder_right", 
                  "text_reviews_count" AS "text_reviews_count_ponder_right" 
                FROM 
                  (
                    SELECT 
                      0 AS _PONDER_ROW_NUMBER_, 
                      0 AS _PONDER_ROW_LABELS_, 
                      AVG("bookID") AS "bookID", 
                      AVG("average_rating") AS "average_rating", 
                      AVG("isbn13") AS "isbn13", 
                      AVG("  num_pages") AS "  num_pages", 
                      AVG("ratings_count") AS "ratings_count", 
                      AVG("text_reviews_count") AS "text_reviews_count" 
                    FROM 
                      (
                        SELECT 
                          "bookID" :: FLOAT AS "bookID", 
                          "average_rating" :: FLOAT AS "average_rating", 
                          "isbn13" :: FLOAT AS "isbn13", 
                          "  num_pages" :: FLOAT AS "  num_pages", 
                          "ratings_count" :: FLOAT AS "ratings_count", 
                          "text_reviews_count" :: FLOAT AS "text_reviews_count", 
                          "_PONDER_ROW_LABELS_", 
                          "_PONDER_ROW_NUMBER_" 
                        FROM 
                          (
                            SELECT 
                              "_PONDER_ROW_NUMBER_", 
                              "_PONDER_ROW_LABELS_", 
                              "bookID", 
                              "average_rating", 
                              "isbn13", 
                              "  num_pages", 
                              "ratings_count", 
                              "text_reviews_count" 
                            FROM 
                              (
                                SELECT 
                                  "bookID", 
                                  "title", 
                                  "authors", 
                                  "average_rating", 
                                  "isbn", 
                                  "isbn13", 
                                  "language_code", 
                                  "  num_pages", 
                                  "ratings_count", 
                                  "text_reviews_count", 
                                  "publication_date", 
                                  "publisher", 
                                  "_PONDER_ROW_NUMBER_", 
                                  "_PONDER_ROW_LABELS_" 
                                FROM 
                                  "Ponder_zmyffjgcyh" 
                                ORDER BY 
                                  "_PONDER_ROW_NUMBER_"
                              )
                          )
                      )
                  )
              ) AS _PONDER_RIGHT_
          ) AS _PONDER_LEFT_ CROSS 
          JOIN (
            SELECT 
              "bookID" AS "bookID_ponder_right", 
              "average_rating" AS "average_rating_ponder_right", 
              "isbn13" AS "isbn13_ponder_right", 
              "  num_pages" AS "  num_pages_ponder_right", 
              "ratings_count" AS "ratings_count_ponder_right", 
              "text_reviews_count" AS "text_reviews_count_ponder_right" 
            FROM 
              (
                SELECT 
                  0 AS _PONDER_ROW_NUMBER_, 
                  0 AS _PONDER_ROW_LABELS_, 
                  STDDEV("bookID") AS "bookID", 
                  STDDEV("average_rating") AS "average_rating", 
                  STDDEV("isbn13") AS "isbn13", 
                  STDDEV("  num_pages") AS "  num_pages", 
                  STDDEV("ratings_count") AS "ratings_count", 
                  STDDEV("text_reviews_count") AS "text_reviews_count" 
                FROM 
                  (
                    SELECT 
                      "_PONDER_ROW_NUMBER_", 
                      "_PONDER_ROW_LABELS_", 
                      "bookID", 
                      "average_rating", 
                      "isbn13", 
                      "  num_pages", 
                      "ratings_count", 
                      "text_reviews_count" 
                    FROM 
                      (
                        SELECT 
                          "bookID", 
                          "title", 
                          "authors", 
                          "average_rating", 
                          "isbn", 
                          "isbn13", 
                          "language_code", 
                          "  num_pages", 
                          "ratings_count", 
                          "text_reviews_count", 
                          "publication_date", 
                          "publisher", 
                          "_PONDER_ROW_NUMBER_", 
                          "_PONDER_ROW_LABELS_" 
                        FROM 
                          "Ponder_zmyffjgcyh" 
                        ORDER BY 
                          "_PONDER_ROW_NUMBER_"
                      )
                  )
              )
          ) AS _PONDER_RIGHT_
      ) 
    WHERE 
      "_PONDER_ROW_NUMBER_" IN (
        '0', '1', '2', '3', '4', '5', '6', '7', 
        '8', '9', '10', '11', '12', '13', '14', 
        '15', '16', '17', '18', '19', '20', 
        '21', '22', '23', '24', '25', '26', 
        '27', '28', '29', '30', '11088', '11089', 
        '11090', '11091', '11092', '11093', 
        '11094', '11095', '11096', '11097', 
        '11098', '11099', '11100', '11101', 
        '11102', '11103', '11104', '11105', 
        '11106', '11107', '11108', '11109', 
        '11110', '11111', '11112', '11113', 
        '11114', '11115', '11116', '11117', 
        '11118'
      )
  ) 
ORDER BY 
  _PONDER_ROW_NUMBER_ 
LIMIT 
  10001
```

In this example, we saw how something that takes a single line to express in pandas can take *many* lines of SQL to write.

Using Ponder can save significant time, since you can think and work natively in pandas when interacting with your data warehouse.
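
The overall shape of the generated query (compute one-row aggregates, `CROSS JOIN` them back onto every row, then apply the arithmetic) can be sketched with a toy generator. The helper `zscore_sql` below is hypothetical and is not Ponder's compiler. It only illustrates why the SQL grows with every column while the pandas expression stays the same length. It assumes a dialect with `AVG` and `STDDEV` aggregates, as in the query above, and the `_mean`, `_std`, and `_z` suffixes are made up for this example.

```python
def zscore_sql(table, columns):
    """Build a z-score query for `columns` of `table` (illustrative only)."""
    # One-row subquery holding the mean and standard deviation of each column.
    stats = ",\n    ".join(
        f'AVG("{c}") AS "{c}_mean", STDDEV("{c}") AS "{c}_std"'
        for c in columns
    )
    # Per-row arithmetic that references the statistics joined onto each row.
    exprs = ",\n  ".join(
        f'("{c}" - "{c}_mean") / "{c}_std" AS "{c}_z"'
        for c in columns
    )
    return (
        f"SELECT\n  {exprs}\n"
        f'FROM "{table}"\n'
        f"CROSS JOIN (\n  SELECT\n    {stats}\n"
        f'  FROM "{table}"\n) AS stats'
    )

sql = zscore_sql("PONDER_BOOKS", ["average_rating", "ratings_count"])
print(sql)
```

Even this hand-written form needs two extra expressions per column. The query Ponder generates is longer still because it also tracks row numbers and row labels to preserve pandas semantics.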

### Summary

In this tutorial, we saw how Ponder lets you run pandas on Snowflake. Ponder simplifies working with data by translating your pandas operations into corresponding SQL queries that run in your data warehouse. This gives you the flexibility of working in pandas directly, and many queries are easier to write in pandas than as hundreds of lines of handcrafted SQL!

As we have seen, there are many benefits to using the pandas API on your data warehouse instead of writing SQL directly, as summarized in this table.

|               | pandas | SQL | Ponder |
|---------------|--------|-----|--------|
| Easy to use   | ✅      | ❌   | ✅      |
| Flexible      | ✅      | ❌   | ✅      |
| Scalable      | ❌      | ✅   | ✅      |
| Secure access | ❌      | ✅   | ✅      |


To learn more about Ponder, check out our [product blog post](https://ponder.io/run-pandas-on-1tb-directly-in-your-data-warehouse/).