# Tutorial 4: Saving Your Results

<div class="alert alert-block alert-info"> <b>Before we get started: </b> 
    <ul style="list-style-type: none;margin: 0;padding: 0;">
        <li>✍️ To run this notebook, you need to have Ponder installed and set up on your machine. If you have not done so already, please refer to our <a href="https://docs.ponder.io/getting_started/quickstart.html">Quickstart guide</a> to get started.</li>
        <li>📖 Otherwise, if you're just interested in browsing through the tutorial, keep reading below!</li>
    </ul>
</div>

In [1]:
import os; os.chdir("..")  # the `credential` module lives in the parent directory
import credential
import ponder; ponder.init()
import modin.pandas as pd
import snowflake.connector

# Connect to Snowflake using the parameters stored in credential.py
p = credential.params
snowflake_con = snowflake.connector.connect(
    user=p["user"], password=p["password"], account=p["account"], role=p["role"],
    database=p["database"], schema=p["schema"], warehouse=p["warehouse"],
)

Let's say that you used Ponder to run some analysis and you want to store the results back in the database. In this tutorial, we show how to use the `to_sql` method to write your dataframe to a table in your database.

In [2]:
df = pd.read_sql("PONDER_CITIBIKE",con=snowflake_con)

After loading the `PONDER_CITIBIKE` table, we see that many records have missing values, so we drop those rows to clean up the dataset.
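One way to check for missing values before dropping them is to combine `isna` with `sum`. The sketch below uses plain pandas on a few made-up rows (not the real `PONDER_CITIBIKE` data). Because Ponder exposes the pandas API, the same calls should apply to `df`, though that is an assumption we have not verified here.

```python
import pandas  # plain pandas, imported without the `pd` alias so modin's `pd` is not shadowed

# Made-up sample rows; None marks a missing value
trips = pandas.DataFrame({
    "tripduration": [732.0, 313.0, None, 754.0],
    "usertype": ["Subscriber", "Subscriber", "Customer", None],
    "birth_year": [1994.0, None, 1985.0, 1981.0],
})

# Number of missing values in each column
missing_per_column = trips.isna().sum()
print(missing_per_column)

# Rows with at least one missing value: these are the rows dropna() removes
rows_with_missing = int(trips.isna().any(axis=1).sum())
print(rows_with_missing, "of", len(trips), "rows have missing values")
```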

In [3]:
df_cleaned = df.dropna()

In [4]:
df_cleaned

| | tripduration | starttime | stoptime | bikeid | usertype | gender | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | birth_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 732.0 | 2019-11-06 15:11:28.721 | 2019-11-06 15:23:41.2970 | 25785.0 | Subscriber | 1.0 | E 47 St & 1 Ave | 516.0 | E 43 St & 5 Ave | 3680.0 | 40.752069 | -73.967844 | 40.754121 | -73.980252 | 1994.0 |
| 1 | 313.0 | 2019-02-21 17:17:19.919 | 2019-02-21 17:22:33.9070 | 26506.0 | Subscriber | 2.0 | E 41 St & Madison Ave | 3235.0 | Broadway & W 32 St | 498.0 | 40.752165 | -73.979922 | 40.748549 | -73.988084 | 1960.0 |
| 3 | 754.0 | 2015-07-29 06:58:54.000 | 7/29/2015 07:11:29 | 23061.0 | Subscriber | 1.0 | W 15 St & 7 Ave | 482.0 | E 47 St & Park Ave | 359.0 | 40.739355 | -73.999318 | 40.755103 | -73.974987 | 1981.0 |
| 4 | 353.0 | 2015-07-29 17:39:44.000 | 7/29/2015 17:45:37 | 14927.0 | Subscriber | 1.0 | Lafayette St & E 8 St | 293.0 | St Marks Pl & 2 Ave | 236.0 | 40.730287 | -73.990765 | 40.728419 | -73.987140 | 1954.0 |
| 7 | 175.0 | 2016-04-20 11:58:42.000 | 4/20/2016 12:01:38 | 16823.0 | Subscriber | 1.0 | Reade St & Broadway | 330.0 | Duane St & Greenwich St | 276.0 | 40.714505 | -74.005628 | 40.717488 | -74.010455 | 1948.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 118858 | 379.0 | 2019-04-30 10:48:37.091 | 2019-04-30 10:54:56.5590 | 20959.0 | Customer | 1.0 | Canal St & Rutgers St | 307.0 | Cleveland Pl & Spring St | 151.0 | 40.714275 | -73.989900 | 40.722104 | -73.997249 | 1996.0 |
| 118859 | 354.0 | 2019-02-05 18:05:10.292 | 2019-02-05 18:11:04.9850 | 26144.0 | Subscriber | 1.0 | W 21 St & 6 Ave | 435.0 | 8 Ave & W 31 St | 3255.0 | 40.741740 | -73.994156 | 40.750585 | -73.994685 | 1978.0 |
| 118860 | 300.0 | 2019-02-01 18:28:37.926 | 2019-02-01 18:33:38.4590 | 19127.0 | Subscriber | 1.0 | Lispenard St & Broadway | 257.0 | Fulton St & Broadway | 319.0 | 40.719392 | -74.002472 | 40.711066 | -74.009447 | 1972.0 |
| 118862 | 1615.0 | 2017-06-09 14:15:17.000 | 2017-06-09 14:42:13 | 27010.0 | Subscriber | 1.0 | Madison Ave & E 82 St | 3362.0 | E 52 St & 2 Ave | 441.0 | 40.778131 | -73.960694 | 40.756014 | -73.967416 | 1986.0 |
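The row labels in this output skip values such as 2, 5 and 6. This is expected: `dropna` removes rows but keeps the original labels of the rows that remain. A minimal sketch in plain pandas on made-up data shows the effect; `reset_index(drop=True)` is one way to renumber the rows if you prefer a contiguous index.

```python
import pandas  # plain pandas, imported without the `pd` alias so modin's `pd` is not shadowed

# Made-up data: the rows labelled 1 and 2 contain missing values
sample = pandas.DataFrame({
    "tripduration": [732.0, None, 754.0, 353.0],
    "birth_year": [1994.0, 1960.0, None, 1954.0],
})

sample_cleaned = sample.dropna()
print(list(sample_cleaned.index))  # [0, 3]: the surviving rows keep their labels

# Optional: renumber the remaining rows 0..n-1 and discard the old labels
sample_renumbered = sample_cleaned.reset_index(drop=True)
print(list(sample_renumbered.index))  # [0, 1]
```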


As in pandas, when you work with a dataframe in Ponder, you are working on a temporary copy of the data.

By default, pandas operations such as `dropna` return a new dataframe rather than modifying the one they were called on, and Ponder never modifies the original data in your database. This is especially important in a data warehouse, where tables are often treated as the “source of truth” and may be shared across many teams.
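These copy semantics can be checked in plain pandas, with an in-memory SQLite database standing in for the warehouse (an assumption for this sketch; Ponder itself works against Snowflake, and the table name `trips` is made up). Calling `dropna` yields a new dataframe, while both the dataframe it was called on and the source table keep all of their rows.

```python
import sqlite3
import pandas  # plain pandas, imported without the `pd` alias so modin's `pd` is not shadowed

# In-memory SQLite stands in for the data warehouse (assumption for this sketch)
con = sqlite3.connect(":memory:")
pandas.DataFrame({"tripduration": [732.0, None, 754.0]}).to_sql("trips", con, index=False)

frame = pandas.read_sql("SELECT * FROM trips", con)
cleaned = frame.dropna()  # a new DataFrame; `frame` itself is not modified

rows_in_frame = len(frame)
rows_in_table = con.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
print(rows_in_frame, len(cleaned), rows_in_table)  # the table still holds all 3 rows
con.close()
```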

Because a dataframe such as `df` is a temporary copy, it exists only for the duration of your session: once you exit the session, the dataframe is no longer accessible.

So if we plan to use the cleaned-up data for later analysis, we can persist it to a table with `to_sql`.
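To illustrate what persisting buys you, the sketch below uses plain pandas with a temporary SQLite file as a stand-in for Snowflake (an assumption; the table name `CITIBIKE_CLEANED` and the sample rows are made up). A dataframe written with `to_sql` can be read back through a brand-new connection after the first one is closed, much like reading it in a later session. Note that with a raw `sqlite3` connection, pandas' `read_sql` expects a SQL query rather than a bare table name.

```python
import os
import sqlite3
import tempfile
import pandas  # plain pandas, imported without the `pd` alias so modin's `pd` is not shadowed

# A temporary SQLite file stands in for the Snowflake database (assumption for this sketch)
db_path = os.path.join(tempfile.mkdtemp(), "tutorial.db")

# First "session": clean the data and persist it as a table
con = sqlite3.connect(db_path)
raw = pandas.DataFrame({
    "bikeid": [25785.0, 26506.0, None],
    "tripduration": [732.0, 313.0, 754.0],
})
raw.dropna().to_sql("CITIBIKE_CLEANED", con, index=False)
con.close()

# Later "session": a brand-new connection can still read the table
con = sqlite3.connect(db_path)
restored = pandas.read_sql("SELECT * FROM CITIBIKE_CLEANED", con)
con.close()
print(restored)
```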

In [None]:
df_cleaned.to_sql("PONDER_CITIBIKE_CLEANED",con=snowflake_con, index=False)
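The `index=False` argument tells `to_sql` not to write the dataframe's row labels as an extra column. Also, in pandas, `to_sql` defaults to `if_exists="fail"`, so running the same write a second time raises an error because the table already exists; `"replace"` and `"append"` are the other options. The sketch below shows both behaviors in plain pandas against an in-memory SQLite database. We have not verified that Ponder handles `if_exists` identically, so treat that part as an assumption.

```python
import sqlite3
import pandas  # plain pandas, imported without the `pd` alias so modin's `pd` is not shadowed

con = sqlite3.connect(":memory:")  # in-memory SQLite stands in for Snowflake here
frame = pandas.DataFrame({"tripduration": [732.0, 754.0]}, index=[0, 3])

# Default index=True: the row labels are written as an extra "index" column
frame.to_sql("WITH_INDEX", con)
# index=False: only the dataframe's own columns are written
frame.to_sql("WITHOUT_INDEX", con, index=False)

with_index_cols = list(pandas.read_sql("SELECT * FROM WITH_INDEX", con).columns)
without_index_cols = list(pandas.read_sql("SELECT * FROM WITHOUT_INDEX", con).columns)
print(with_index_cols, without_index_cols)

# Default if_exists="fail": writing to an existing table raises ValueError
failed_on_existing = False
try:
    frame.to_sql("WITHOUT_INDEX", con, index=False)
except ValueError as err:
    failed_on_existing = True
    print("error:", err)

# if_exists="replace" drops the existing table and writes the dataframe again
frame.to_sql("WITHOUT_INDEX", con, index=False, if_exists="replace")
rows_after_replace = con.execute("SELECT COUNT(*) FROM WITHOUT_INDEX").fetchone()[0]
con.close()
```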

Now we can read the new table back and continue our analysis wherever we want:

In [None]:
pd.read_sql("PONDER_CITIBIKE_CLEANED",con=snowflake_con)

# Summary

In this tutorial, we learned how to use the same pandas API, `DataFrame.to_sql`, to save a dataframe to your database. This is useful whenever you want to persist the work done on a dataframe beyond your current session.

In our [next tutorial](https://github.com/ponder-org/ponder-notebooks/blob/main/snowflake/tutorial/05-advanced-sql.ipynb), we will discuss how you can easily move between using Ponder and using SQL when developing your data workflows.