# Tutorial 4: Saving Your Results

<div class="alert alert-block alert-info"> <b>Before we get started: </b> 
    <ul style="list-style-type: none;margin: 0;padding: 0;">
        <li>✍️ To run this notebook, you need to have Ponder installed and set up on your machine. If you have not done so already, please refer to our <a href="https://docs.ponder.io/getting_started/quickstart.html">Quickstart guide</a> to get started.</li> 
        <li>📁 This tutorial makes use of the <code>ponder.db</code> database that we created in <a href="https://github.com/ponder-org/ponder-notebooks/blob/main/duckdb/tutorial/01-getting-started.ipynb">Tutorial #1</a>. You can also download the file <a href="https://github.com/ponder-org/ponder-datasets/raw/main/ponder.db">here</a>.</li> 
        <li>📖 Otherwise, if you're just interested in browsing through the tutorial, keep reading below!</li>
    </ul>
</div>

In [1]:
import ponder; ponder.init()
import modin.pandas as pd
import duckdb
duckdb_con = duckdb.connect("../ponder.db")

Let's say that you used Ponder to run some analysis and you want to store the results back in the database. In this tutorial, we will show how you can use the `to_sql` method to write your dataframe to a table in your database.

In [2]:
df = pd.read_sql("PONDER_CITIBIKE", con=duckdb_con)

After reading our `PONDER_CITIBIKE` table, we see that many records have missing values, so we drop those rows to clean up our dataset.

In [3]:
df_cleaned = df.dropna()

In [4]:
df_cleaned

Unnamed: 0,tripduration,starttime,stoptime,start_station_id,start_station_name,start_station_latitude,start_station_longitude,end_station_id,end_station_name,end_station_latitude,end_station_longitude,bikeid,usertype,birth_year,gender
0,730,2/9/15 8:37,2/9/15 8:49,72,W 52 St & 11 Ave,40.767272,-73.993929,520,W 52 St & 5 Ave,40.759923,-73.976485,18809,Subscriber,1975.0,male
1,704,11/20/13 20:21,11/20/13 20:33,72,W 52 St & 11 Ave,40.767272,-73.993929,470,W 20 St & 8 Ave,40.743453,-74.000040,20515,Subscriber,1981.0,male
2,425,1/6/16 17:01,1/6/16 17:08,72,W 52 St & 11 Ave,40.767272,-73.993929,469,Broadway & W 53 St,40.763441,-73.982681,17116,Subscriber,1947.0,male
3,373,11/9/15 12:50,11/9/15 12:56,72,W 52 St & 11 Ave,40.767272,-73.993929,469,Broadway & W 53 St,40.763441,-73.982681,20892,Subscriber,1947.0,male
4,1149,8/3/13 17:14,8/3/13 17:33,72,W 52 St & 11 Ave,40.767272,-73.993929,325,E 19 St & 3 Ave,40.736245,-73.984738,17711,Subscriber,1981.0,female
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10884,172,1/2/16 21:29,1/2/16 21:32,116,W 17 St & 8 Ave,40.741776,-74.001497,212,W 16 St & The High Line,40.743349,-74.006818,20945,Subscriber,1961.0,male
10885,692,7/20/16 9:29,7/20/16 9:40,116,W 17 St & 8 Ave,40.741776,-74.001497,2021,W 45 St & 8 Ave,40.759291,-73.988597,19345,Subscriber,1983.0,male
10886,432,8/26/15 10:14,8/26/15 10:21,116,W 17 St & 8 Ave,40.741776,-74.001497,509,9 Ave & W 22 St,40.745497,-74.001971,23512,Subscriber,1962.0,female
10887,827,9/2/14 21:38,9/2/14 21:52,116,W 17 St & 8 Ave,40.741776,-74.001497,439,E 4 St & 2 Ave,40.726281,-73.989780,21211,Subscriber,1985.0,male


As in pandas, when you are working with a dataframe, you are always working on a temporary copy of the data.

By default, pandas operations return a copy of the dataframe that was operated on. Ponder never makes any modifications to your original data. This is especially important in the data warehouse context, as the tables are often regarded as the “source of truth” and can be shared across many teams.

This also means that your dataframe `df` is only accessible during the session: once you exit the session, the dataframe is no longer available.
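The copy behavior described above can be checked on a small example. This is a minimal sketch that uses plain pandas rather than Ponder, with a made-up two-column dataframe standing in for `PONDER_CITIBIKE`; the column values are illustrative only.

```python
import numpy as np
import pandas as pd  # plain pandas here; the tutorial itself uses modin.pandas via Ponder

# Hypothetical toy data standing in for the PONDER_CITIBIKE table
df = pd.DataFrame({
    "tripduration": [730, 704, 425],
    "birth_year": [1975.0, np.nan, 1947.0],
})

# dropna() returns a new dataframe; df itself is not modified
df_cleaned = df.dropna()

print(len(df))          # the original still contains every row
print(len(df_cleaned))  # the copy no longer contains the row with a missing value
```

Because `df` keeps all of its rows, the cleaning step can be repeated or changed without re-reading the source table.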

So if we plan to use the cleaned-up data in a later analysis, we can persist it in a table via `to_sql`.

In [5]:
df_cleaned.to_sql("PONDER_CITIBIKE_CLEANED", con=duckdb_con, index=False, if_exists='replace')
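
The `if_exists` argument controls what happens when the target table already exists. The sketch below illustrates the three values that plain pandas accepts (`'fail'`, which is the pandas default, `'replace'`, and `'append'`). It uses an in-memory SQLite database and a hypothetical `TRIPS_DEMO` table instead of Ponder and DuckDB, so it only shows the pandas semantics; whether Ponder supports every mode on DuckDB is not confirmed here.

```python
import sqlite3
import pandas as pd

# Assumption: in-memory SQLite stands in for the DuckDB connection used in the tutorial
con = sqlite3.connect(":memory:")
df = pd.DataFrame({"tripduration": [730, 704]})

# 'replace' creates the table, or drops and recreates it if it already exists
df.to_sql("TRIPS_DEMO", con=con, index=False, if_exists="replace")

# 'append' adds the rows to the existing table
df.to_sql("TRIPS_DEMO", con=con, index=False, if_exists="append")
rows_after_append = int(pd.read_sql("SELECT COUNT(*) AS n FROM TRIPS_DEMO", con)["n"][0])

# 'replace' again discards the appended rows
df.to_sql("TRIPS_DEMO", con=con, index=False, if_exists="replace")
rows_after_replace = int(pd.read_sql("SELECT COUNT(*) AS n FROM TRIPS_DEMO", con)["n"][0])

# 'fail' raises a ValueError because the table already exists
try:
    df.to_sql("TRIPS_DEMO", con=con, index=False, if_exists="fail")
    failed = False
except ValueError:
    failed = True

con.close()
```

Using `'replace'`, as the tutorial does, makes the cell safe to re-run, at the cost of overwriting whatever was previously stored under that table name.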

Now we can access the new table and continue our analysis wherever we want:

In [6]:
pd.read_sql("PONDER_CITIBIKE_CLEANED", con=duckdb_con)

Unnamed: 0,tripduration,starttime,stoptime,start_station_id,start_station_name,start_station_latitude,start_station_longitude,end_station_id,end_station_name,end_station_latitude,end_station_longitude,bikeid,usertype,birth_year,gender
0,730,2/9/15 8:37,2/9/15 8:49,72,W 52 St & 11 Ave,40.767272,-73.993929,520,W 52 St & 5 Ave,40.759923,-73.976485,18809,Subscriber,1975.0,male
1,704,11/20/13 20:21,11/20/13 20:33,72,W 52 St & 11 Ave,40.767272,-73.993929,470,W 20 St & 8 Ave,40.743453,-74.000040,20515,Subscriber,1981.0,male
2,425,1/6/16 17:01,1/6/16 17:08,72,W 52 St & 11 Ave,40.767272,-73.993929,469,Broadway & W 53 St,40.763441,-73.982681,17116,Subscriber,1947.0,male
3,373,11/9/15 12:50,11/9/15 12:56,72,W 52 St & 11 Ave,40.767272,-73.993929,469,Broadway & W 53 St,40.763441,-73.982681,20892,Subscriber,1947.0,male
4,1149,8/3/13 17:14,8/3/13 17:33,72,W 52 St & 11 Ave,40.767272,-73.993929,325,E 19 St & 3 Ave,40.736245,-73.984738,17711,Subscriber,1981.0,female
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9571,172,1/2/16 21:29,1/2/16 21:32,116,W 17 St & 8 Ave,40.741776,-74.001497,212,W 16 St & The High Line,40.743349,-74.006818,20945,Subscriber,1961.0,male
9572,692,7/20/16 9:29,7/20/16 9:40,116,W 17 St & 8 Ave,40.741776,-74.001497,2021,W 45 St & 8 Ave,40.759291,-73.988597,19345,Subscriber,1983.0,male
9573,432,8/26/15 10:14,8/26/15 10:21,116,W 17 St & 8 Ave,40.741776,-74.001497,509,9 Ave & W 22 St,40.745497,-74.001971,23512,Subscriber,1962.0,female
9574,827,9/2/14 21:38,9/2/14 21:52,116,W 17 St & 8 Ave,40.741776,-74.001497,439,E 4 St & 2 Ave,40.726281,-73.989780,21211,Subscriber,1985.0,male
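
The last row label differs between the two outputs above (10887 for `df_cleaned`, 9574 for the table read back). In plain pandas this is the expected effect of `index=False`: the row labels are not written to the table, so the data is renumbered from 0 when it is read again. A minimal sketch of that behavior, again assuming plain pandas with an in-memory SQLite database and a hypothetical `TRIPS_CLEANED_DEMO` table:

```python
import sqlite3
import numpy as np
import pandas as pd

# Assumption: in-memory SQLite stands in for the DuckDB connection used in the tutorial
con = sqlite3.connect(":memory:")

# Hypothetical toy data with one missing value
df = pd.DataFrame({
    "tripduration": [730, 704, 425],
    "birth_year": [1975.0, np.nan, 1947.0],
})

df_cleaned = df.dropna()
labels_before = list(df_cleaned.index)  # dropna keeps the original row labels

# index=False: only the data columns are written, not the row labels
df_cleaned.to_sql("TRIPS_CLEANED_DEMO", con=con, index=False, if_exists="replace")

restored = pd.read_sql("SELECT * FROM TRIPS_CLEANED_DEMO", con)
labels_after = list(restored.index)  # a fresh 0-based index

con.close()
```

If the original row labels matter for later analysis, writing with `index=True` stores them as an extra column instead.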


# Summary

In this tutorial, we learned how to use the same pandas API, `df.to_sql`, to save a dataframe to your database. This is useful when you want to persist the work done on the dataframe beyond your current session.

In our [next tutorial](https://github.com/ponder-org/ponder-notebooks/blob/main/snowflake/tutorial/05-advanced-sql.ipynb), we will discuss how you can easily move between using Ponder and using SQL when developing your data workflows.

In [7]:
duckdb_con.close()