# Dataframes
## Why use dataframes? 
- They help us store, organize, and save data. 
- Using the Pandas module, we can also perform many different types of operations on columns of a dataframe.
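
As a small illustration of the column operations mentioned above, the sketch below builds a tiny dataframe by hand, computes a summary statistic, and derives a new column from two existing ones. The data values and the `Time_per_Memory` column name are invented for this example; they are not taken from the course data.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
toy_df = pd.DataFrame(
    {
        "Model": ["A", "B", "C"],
        "Time": [2.0, 4.0, 6.0],
        "Memory": [10.0, 20.0, 40.0],
    }
)

# Summary statistic over one column
mean_time = toy_df["Time"].mean()

# Element-wise arithmetic between two columns creates a new column
toy_df["Time_per_Memory"] = toy_df["Time"] / toy_df["Memory"]

print(mean_time)
print(toy_df)
```

Operations like these act on the whole column at once, so no explicit loop over rows is needed.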

In [1]:
# We will import pandas to make dataframes. It can also read .csv, .tsv, .parquet, and other file types into a dataframe
import pandas as pd

# We'll import pathlib to define where the data are
import pathlib


# Define path to data
data_path = pathlib.Path("data/Time_Memory.csv")
my_df = pd.read_csv(data_path)

# We now have our dataframe! The column all the way to the left is the index for the dataframe
# Using <name_of_dataframe>.head(), we can display the first five rows of the dataframe below
my_df.head()

      Model   Score    SSIM     PSNR  DS  Time  Memory
0  HiCARN-1  0.9154  0.9275  36.8328  16  3.54   21.66
1  HiCARN-1  0.9214  0.9080  35.0725  16  3.54   21.66
2  HiCARN-1  0.9054  0.9058  33.7143  16  3.54   21.66
3  HiCARN-1  0.9268  0.9064  34.7086  16  3.54   21.66
4  HiCARN-2  0.9124  0.9220  36.6044  16  6.09   23.47


In [2]:
# To select a single column of a dataframe
ssim = my_df["SSIM"]
print("Selecting one column of the dataframe")
print(ssim.head())

# To convert the column to a Python list
ssim = my_df["SSIM"].to_list()
print("\nSSIM column to list:")
print(ssim)
# We can now use this list for other things

# We can also call subsets of columns
my_df_subset = my_df[["Model", "SSIM", "PSNR"]]
print("\nSelecting multiple columns:")
print(my_df_subset.head())

# We can also subset by row values
my_df_subset_byRows = my_df.loc[my_df["Model"] == "HiCSR"]
print("\nSelecting rows where model == HiCSR:")
print(my_df_subset_byRows.head())

# Reset the index after subsetting by rows so the row labels start at 0 again.
# inplace=True modifies the dataframe directly instead of returning a new one.
# drop=True discards the old index instead of keeping it as a column
my_df_subset_byRows.reset_index(inplace=True, drop=True)
print("\nSee how the index is now reset:")
my_df_subset_byRows.head()

Selecting one column of the dataframe
0    0.9275
1    0.9080
2    0.9058
3    0.9064
4    0.9220
Name: SSIM, dtype: float64

SSIM column to list:
[0.9275, 0.908, 0.9058, 0.9064, 0.922, 0.9028, 0.8999, 0.9002, 0.9133, 0.8941, 0.8874, 0.895, 0.9244, 0.9003, 0.8732, 0.9041, 0.9205, 0.8973, 0.8783, 0.8989, 0.9005, 0.8742, 0.8502, 0.8758, 0.9132, 0.8945, 0.8763, 0.893, 0.9117, 0.8927, 0.8742, 0.8905, 0.9024, 0.8835, 0.8653, 0.8819, 0.8999, 0.8802, 0.8619, 0.8772]

Selecting multiple columns:
      Model    SSIM     PSNR
0  HiCARN-1  0.9275  36.8328
1  HiCARN-1  0.9080  35.0725
2  HiCARN-1  0.9058  33.7143
3  HiCARN-1  0.9064  34.7086
4  HiCARN-2  0.9220  36.6044

Selecting rows where model == HiCSR:
    Model   Score    SSIM     PSNR  DS   Time  Memory
12  HiCSR  0.8852  0.9244  32.3773  16  20.71   24.61
13  HiCSR  0.8925  0.9003  30.7684  16  20.71   24.61
14  HiCSR  0.7268  0.8732  29.5508  16  20.71   24.61
15  HiCSR  0.9056  0.9041  30.2303  16  20.71   24.61

See how the index is now reset:

   Model   Score    SSIM     PSNR  DS   Time  Memory
0  HiCSR  0.8852  0.9244  32.3773  16  20.71   24.61
1  HiCSR  0.8925  0.9003  30.7684  16  20.71   24.61
2  HiCSR  0.7268  0.8732  29.5508  16  20.71   24.61
3  HiCSR  0.9056  0.9041  30.2303  16  20.71   24.61
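
The row-subsetting pattern above also works with combined conditions, and `reset_index` can return a new dataframe instead of modifying one in place. The following is a minimal sketch on hypothetical toy data; the model names and values are invented for this example.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
toy_df = pd.DataFrame(
    {
        "Model": ["A", "A", "B", "B"],
        "SSIM": [0.91, 0.88, 0.93, 0.90],
    }
)

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
mask = (toy_df["Model"] == "B") & (toy_df["SSIM"] > 0.92)
subset = toy_df.loc[mask]

# Without inplace=True, reset_index returns a new dataframe
subset_reset = subset.reset_index(drop=True)

print(subset)        # keeps the original row label
print(subset_reset)  # row labels start again at 0
```

Assigning the result to a new name keeps the original subset unchanged, which can make it easier to compare the two versions while learning.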


In [3]:
# Making your own dataframe
import pandas as pd

my_dict = {"Names": ["Paker", "Ceanne", "Carson", "Michael"], "Ages": [23, 22, 20, 32]}

# We can create a dataframe from a dict
my_df_fromDict = pd.DataFrame.from_dict(my_dict)
my_df_fromDict.head()

     Names  Ages
0    Paker    23
1   Ceanne    22
2   Carson    20
3  Michael    32


## Your turn!
### Create a Pandas DataFrame from a dictionary with two columns that make sense to you
#### Use .head() to print the dataframe


In [None]:
# Your code here:

## Now you'll learn how to save a Pandas DataFrame as a `.parquet` file

The `.parquet` file format stores data in a binary, column-oriented layout. Saving and loading a dataframe as `.parquet` is typically faster than as `.csv`, and the file keeps each column's data type.
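
As a rough sketch of what saving and loading looks like, the example below writes a small dataframe to a temporary `.parquet` file and reads it back. The toy data and the file name `toy.parquet` are invented for this example. It also assumes an optional Parquet engine (`pyarrow` or `fastparquet`) is installed; if neither is available, pandas raises an `ImportError`, which the sketch catches.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical toy data, invented for this example
toy_df = pd.DataFrame({"Names": ["Ana", "Ben"], "Ages": [23, 22]})

round_trip = None
with tempfile.TemporaryDirectory() as tmp_dir:
    parquet_path = pathlib.Path(tmp_dir, "toy.parquet")
    try:
        # index=False leaves the row labels out of the file
        toy_df.to_parquet(parquet_path, index=False)
        round_trip = pd.read_parquet(parquet_path)
    except ImportError:
        # pandas needs an optional Parquet engine (pyarrow or fastparquet)
        print("Install pyarrow or fastparquet to read and write .parquet files")

if round_trip is not None:
    print(round_trip)
```

A temporary directory is used here only so the example cleans up after itself; in practice you would write to a path inside your own data directory.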

In [None]:
# Let's load our data again
# We're going to load Time_Memory.csv again from our data directory
# Follow the steps in code chunk 1 of this file to load the data

# Your code here:

# Step 1: import the pathlib and pandas modules

# Step 2: define the path to the data

# Step 3: read the .csv file into a Pandas dataframe