# Saving a pandas Dataframe as a CSV File or a Pickle: *to_csv* and *to_pickle*   
- A pandas dataframe exists only in RAM, so to keep it between sessions we have to save it to a file.  
- Writing the dataframe out to a data file is called *serializing* it.  


- [**Saving a *pandas* dataframe as a *CSV* file**](#Saving-a-pandas-dataframe-as-a-CSV-file)  
  - A CSV file is a text file  
  - CSV stands for Comma Separated Values  
  
  
- [**Saving a *pandas* dataframe as a *pickle* file**](#Saving-a-pandas-dataframe-as-a-pickle-file)  
  - *pickle* is a module in the Python standard library that serializes Python objects; pandas uses it to save a dataframe to a file  
  - A pickle file is binary and has its own file format  
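
One practical consequence of the text-versus-binary difference is what happens to column types. The sketch below uses a small hypothetical dataframe (`toy`, not the olympics data) and a temporary folder, and assumes default `read_csv` settings: the CSV round trip brings a datetime column back as plain text, while the pickle round trip restores it unchanged.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy dataframe, used only for this illustration
toy = pd.DataFrame({
    "when": pd.to_datetime(["2016-08-05", "2016-08-21"]),
    "gold": [1, 2],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")
    pkl_path = os.path.join(tmp, "toy.pkl")

    # Save the same dataframe in both formats
    toy.to_csv(csv_path, index=False)
    toy.to_pickle(pkl_path)

    # Read both files back while the temporary folder still exists
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

# The CSV version of "when" is text; the pickle version is still a datetime
print(from_csv.dtypes)
print(from_pkl.dtypes)
```

If the CSV format is needed anyway, passing `parse_dates=["when"]` to `pd.read_csv` is one way to recover the datetime column on reading.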


In [1]:
import pandas as pd

import pickle 

### Read the csv file into a pandas dataframe

In [2]:
# Read the CSV file data into a pandas dataframe
df = pd.read_csv("Data/olympics.csv")

# Display the top rows of the dataframe
df.head()

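|   | Rank | Country | Gold | Silver | Bronze | Total |
|---|---|---|---|---|---|---|
| 0 | 1 | United States (USA) | 46 | 37 | 38 | 121 |
| 1 | 2 | Great Britain (GBR) | 27 | 23 | 17 | 67 |
| 2 | 3 | China (CHN) | 26 | 18 | 26 | 70 |
| 3 | 4 | Russia (RUS) | 19 | 17 | 19 | 55 |
| 4 | 5 | Germany (GER) | 17 | 10 | 15 | 42 |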


#### Display info about the new pandas dataframe

In [3]:
# Display the number of rows and columns in the dataframe as a double-check
print("Number of Rows:  ", df.shape[0])
print("Number of Columns:  ", df.shape[1])

Number of Rows:   87
Number of Columns:   6


In [4]:
# Display the datatypes of the columns in the dataframe
df.dtypes

Rank        int64
Country    object
Gold        int64
Silver      int64
Bronze      int64
Total       int64
dtype: object

# Saving a *pandas* dataframe as a *CSV* file 

### Save the dataframe as a CSV file

In [5]:
# Save the df dataframe as a CSV file in the Data folder
df.to_csv("Data/olympics_saved.csv", index=False)
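
The `index=False` argument matters here. As a minimal sketch, assuming a small hypothetical dataframe (`toy`) and in-memory text instead of a file on disk: when the index is written (the default), reading the CSV back produces an extra column named `Unnamed: 0`.

```python
import io

import pandas as pd

# Hypothetical two-row dataframe, used only for this illustration
toy = pd.DataFrame({"Rank": [1, 2], "Gold": [46, 27]})

# With no path argument, to_csv returns the CSV text as a string
with_index = toy.to_csv()                # index=True is the default
without_index = toy.to_csv(index=False)

# Read both versions back from the in-memory text
back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))

print(list(back_with.columns))     # ['Unnamed: 0', 'Rank', 'Gold']
print(list(back_without.columns))  # ['Rank', 'Gold']
```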

### Test the new CSV file by reading and displaying it

In [6]:
# Read the saved csv file 
df_test = pd.read_csv("Data/olympics_saved.csv")

In [7]:
# Display the pandas dataframe read from the CSV file
df_test.head(3)

|   | Rank | Country | Gold | Silver | Bronze | Total |
|---|---|---|---|---|---|---|
| 0 | 1 | United States (USA) | 46 | 37 | 38 | 121 |
| 1 | 2 | Great Britain (GBR) | 27 | 23 | 17 | 67 |
| 2 | 3 | China (CHN) | 26 | 18 | 26 | 70 |


In [8]:
# Display the number of rows and columns in the dataframe as a double-check
print("Number of Rows:  ", df_test.shape[0])
print("Number of Columns:  ", df_test.shape[1])

Number of Rows:   87
Number of Columns:   6
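
Comparing the shape and the first few rows is a spot check. A stricter test compares every value and dtype. The sketch below shows one way to do that with `pd.testing.assert_frame_equal`, assuming a small hypothetical dataframe (`toy`) built from two of the rows shown above and an in-memory round trip rather than the notebook's files.

```python
import io

import pandas as pd

# Hypothetical two-row dataframe, used only for this illustration
toy = pd.DataFrame({
    "Rank": [1, 2],
    "Country": ["United States (USA)", "Great Britain (GBR)"],
    "Total": [121, 67],
})

# Round trip through CSV text held in memory
csv_text = toy.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))

# Raises an AssertionError describing the first difference, if any
pd.testing.assert_frame_equal(toy, restored)
print("Round trip preserved the dataframe")
```

In the notebook itself the same idea would be `pd.testing.assert_frame_equal(df, df_test)`.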


# Saving a *pandas* dataframe as a *pickle* file 

### Save the dataframe as a pickle file

In [9]:
# Save the df dataframe as a pickle file in the Data folder
df.to_pickle("Data/olympics_saved.pkl")

### Test the new pickle file by reading and displaying it

In [10]:
# Read (unpickle) the saved dataframe from the pickle file
df_test = pd.read_pickle("Data/olympics_saved.pkl")

In [11]:
# Display the pandas dataframe read from the pickle file
df_test.head(3)

|   | Rank | Country | Gold | Silver | Bronze | Total |
|---|---|---|---|---|---|---|
| 0 | 1 | United States (USA) | 46 | 37 | 38 | 121 |
| 1 | 2 | Great Britain (GBR) | 27 | 23 | 17 | 67 |
| 2 | 3 | China (CHN) | 26 | 18 | 26 | 70 |


In [12]:
# Display the number of rows and columns in the dataframe as a double-check
print("Number of Rows:  ", df_test.shape[0])
print("Number of Columns:  ", df_test.shape[1])

Number of Rows:   87
Number of Columns:   6
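
The notebook imports the standard-library `pickle` module but only calls the pandas helpers. As a rough sketch of how the two relate, assuming a small hypothetical dataframe (`toy`) and a temporary folder: a dataframe can also be written with `pickle.dump`, and an uncompressed file written that way can be read back with either `pickle.load` or `pd.read_pickle`.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical two-row dataframe, used only for this illustration
toy = pd.DataFrame({"Rank": [1, 2], "Gold": [46, 27]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")

    # Serialize with the standard-library pickle module (binary mode)
    with open(path, "wb") as f:
        pickle.dump(toy, f)

    # Deserialize with the pickle module ...
    with open(path, "rb") as f:
        restored = pickle.load(f)

    # ... or with the pandas helper
    via_pandas = pd.read_pickle(path)

print(restored.equals(toy))
print(via_pandas.equals(toy))
```

Unpickling can execute arbitrary code, so pickle files should only be loaded when they come from a trusted source.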
