In [None]:
# To install a package, use "!pip install"  
!pip install pandas

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

In [None]:
___ = ''

In [None]:
pd.__version__

# Pandas DataFrames

## What is a DataFrame?

A DataFrame, simply put, is a **table** of data.  It contains multiple rows, and every row holds the same labelled fields, each field with its own data type.  For example, a DataFrame might look like this:

| (index) | Name | Age | Height | LikesIceCream |
| :---: | :--: | :--: | :--: | :--: |
| 0     | "Nick" | 22 | 3.4 | True |
| 1     | "Jenn" | 55 | 1.2 | True |
| 2     | "Joe"  | 25 | 2.2 | True |

Because each row contains the same data, DataFrames can also be thought of as a collection of same-length columns!
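
Both views can be checked directly.  The sketch below rebuilds the example table and pulls out one column and one row; the variable names (`people`, `ages`, `first_row`) are only illustrative and not part of Pandas.

```python
import pandas as pd

# Rebuild the example table from above, one list per column.
people = pd.DataFrame({
    "Name": ["Nick", "Jenn", "Joe"],
    "Age": [22, 55, 25],
    "Height": [3.4, 1.2, 2.2],
    "LikesIceCream": [True, True, True],
})

n_rows, n_cols = people.shape   # (number of rows, number of columns)
ages = people["Age"]            # one column: a Series with one value per row
first_row = people.loc[0]       # one row: a Series labelled by column name

print(n_rows, n_cols)
print(ages.tolist())
print(first_row["Name"])
```

Every column has exactly `n_rows` entries, which is what makes the "collection of same-length columns" picture work.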

**Pandas** is a Python package that has a DataFrame class.  Using either the **DataFrame** class constructor or one of Pandas' many **read_()** functions, you can make your own DataFrame from a variety of sources.  

## Making DataFrames Directly

#### From a List of Dicts

Dicts are named collections.  If you have a list of dicts that share the same keys, the DataFrame constructor can convert it to a DataFrame (a key missing from one dict becomes a missing value in that row):

In [None]:
friends = [
    {'Name': "Nick", "Age": 31, "Height": 2.9, "Weight": 20},
    {'Name': "Jenn", "Age": 55, "Height": 1.2},
    {"Name": "Joe", "Height": 1.2, "Age": 25, },
]
pd.DataFrame(friends)
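
Notice that only one of the dicts above has a "Weight" key.  The sketch below, using a smaller hypothetical list named `records`, shows how the constructor handles that: the column is still created, and rows without the key are filled with `NaN`.

```python
import pandas as pd

# Hypothetical records: only the first one has a "Weight" key.
records = [
    {"Name": "Nick", "Age": 31, "Weight": 20},
    {"Name": "Jenn", "Age": 55},
]
table = pd.DataFrame(records)

# Rows whose dict lacked the key hold NaN, which isna() reports as True.
weight_missing = table["Weight"].isna().tolist()

print(table)
print(weight_missing)
```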

#### From a Dict of Lists

In [None]:
df = pd.DataFrame({
    'Name': ['Nick', 'Jenn', 'Joe'], 
    'Age': [31, 55, 25], 
    'Height': [2.9, 1.2, 1.2],
})
df

#### From a List of Lists

If you have a collection of same-length sequences, you essentially have a rectangular data structure already!  All that's needed is to add some column labels.

In [None]:
friends = [
    ['Nick', 31, 2.9],
    ['Jenn', 55, 1.2],
    ['Joe',  25, 1.2],
]
pd.DataFrame(friends, columns=["Name", "Age", "Height"])

#### From an Empty DataFrame

If you prefer, you can also add columns one at a time, starting with an empty DataFrame:

In [None]:
df = pd.DataFrame()
df['Name'] = ['Nick', 'Jenn', 'Joe']
df['Age'] = [31, 55, 25]
df['Height'] = [2.9, 1.2, 1.2]
df
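
One thing to watch when adding columns this way: the first column fixes the number of rows, and every later list has to match it.  The sketch below (the name `roster` is only illustrative) shows what happens when a list is too short.

```python
import pandas as pd

roster = pd.DataFrame()
roster["Name"] = ["Nick", "Jenn", "Joe"]   # the first column fixes the row count at 3

try:
    roster["Age"] = [31, 55]               # only 2 values for 3 rows
    length_error = None
except ValueError as err:
    length_error = err                     # pandas rejects the mismatched length

print(type(length_error).__name__)
print(list(roster.columns))
```

The failed assignment leaves the DataFrame unchanged, so the "Age" column is not created.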

### Exercise: Making DataFrames from Scratch

Please recreate the table below as a DataFrame using one of the approaches detailed above:

| Year | Product | Cost |
| :--: | :----:  | :--: |
| 2015 | Apples  | 0.35 |
| 2016 | Apples  | 0.45 |
| 2015 | Bananas | 0.75 |
| 2016 | Bananas | 1.10 |

Show the column named "Cost"

Plot the column named "Cost"

Calculate the mean cost

Make a summary of the statistics of the cost

Create a new column named "Delicious" and set it to True for every row

**Discuss**: Which approach did you choose?  What did you like about it?

### Reading Data from Files into a DataFrame


| File Format | File Extension | `read_xxx()` Function | DataFrame Write Method |
| :--:  | :--: | :--: | :--: |
| Comma-Separated Values      | .csv           | `pd.read_csv()` | `df.to_csv()` |
| Tab-Separated Values        | .tsv, .tabular, .csv | `pd.read_csv(sep='\t')`, `pd.read_table()` | `df.to_csv(sep='\t')` |
| Excel Spreadsheet           |  .xls | `pd.read_excel()`                    | `df.to_excel()`  |
| Excel Spreadsheet (2007+)   | .xlsx | `pd.read_excel(engine='openpyxl')`   | `df.to_excel(engine='openpyxl')` |
| JSON                        | .json | `pd.read_json()`                     | `df.to_json()` |
| Tables in a Web Page (HTML) | .html | `pd.read_html()[0]`                  | `df.to_html()` |
| HDF5 | .hdf5, .h5 | `pd.read_hdf()` |  `df.to_hdf()` |
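
Each read function pairs with a write method, so a DataFrame can be written out and read back in.  The sketch below shows the pattern for CSV; it uses an in-memory `io.StringIO` buffer in place of a real file, and the names `original`, `buffer`, and `restored` are only illustrative.

```python
import io
import pandas as pd

original = pd.DataFrame({
    "Name": ["Nick", "Jenn", "Joe"],
    "Age": [31, 55, 25],
    "Height": [2.9, 1.2, 1.2],
})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer, index=False)   # index=False leaves the row labels out of the output

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(buffer.getvalue())
print(restored)
```

Passing a file path (for example, a hypothetical `"friends.csv"`) instead of the buffer works the same way.  Without `index=False`, the row labels are written as an extra unnamed column, which then shows up when the file is read back.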

In [None]:
import pandas as pd

### File Format Exercises: "Roundtripping" write-read

Run the code below to download the Titanic passengers dataset, then convert it into different file formats in the exercises that follow.

In [None]:
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv'
df = pd.read_csv(url)
df[:5]

#### Tab-Separated Values

Save the dataframe to a TSV file.

Read the TSV file into Pandas again.  

Open the file in Jupyter (right-click on it in the file browser, click open with editor).  What does the file look like?

#### JSON 

Save the dataframe to a JSON file.

Read the JSON file into Pandas again.  

Open the file in Jupyter (right-click on it in the file browser, click open with editor).  What does the file look like?

#### HTML 

Save the dataframe to an HTML file.

Read the HTML file into Pandas again.  

#### Excel 

Note: Pandas relies on optional packages to read and write Excel files, so you may need to install an extra package for this to work (code below)

In [None]:
!pip install openpyxl

Save the dataframe to an Excel file.

Read the Excel file into Pandas again.  

Download the file onto your computer, then open it in your spreadsheet program (Excel, LibreOffice Sheets, Google Sheets, etc).  What does it look like?
