___

<a href='https://www.youtube.com/FallinPython'> <img src="../_images/FallinPython_Jupyter-01.jpg" width="750" height="400" align="center"/></a>
___

# Pandas Library
* Website:       https://pandas.pydata.org/ 
* Install Pandas:       https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
* Documentation: https://pandas.pydata.org/docs/
* User Guide: https://pandas.pydata.org/docs/user_guide/index.html#user-guide

___
# Content

[[1] DataFrame Data Structure Definition](#section_DataFrame)<br>
[[2] How to Create DataFrames](#section_createDataFrame)<br>
[[3] Basic Operations on DataFrames](#section_BasicOperation)<br>
___

<a id='section_DataFrame'></a>

In [None]:
import pandas as pd
import numpy as np

# 1. DataFrame

A pandas DataFrame is a two-dimensional labeled data structure whose columns can hold any data type (integers, strings, floats, Python objects, etc.). <br>
* Spreadsheet-like object containing rows and columns.
* Each column can contain a different data type.
* The most important Pandas data structure.
* It's built on top of NumPy arrays.

<img src="_images/pandas_data_structures.png" width="1000" height="400" align="center" /><br>
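The small sketch below uses hypothetical toy data to show the "different data type per column" idea from the list above. Each column has its own dtype, and the values of a numeric column come back as a NumPy array. Depending on your pandas version, the text column may show up as `object` or as a dedicated string dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each column holds a different data type
df = pd.DataFrame({
    "model": ["corolla", "prelude"],  # strings (object or string dtype, depending on version)
    "doors": [4, 2],                  # integers -> int64
    "price": [7788.0, 8845.0],        # floats -> float64
})

print(df.dtypes)

# A numeric column is backed by a one-dimensional NumPy array
print(type(df["doors"].to_numpy()))
```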
___

<a id='section_createDataFrame'></a>

## 1.1 Creating DataFrame

There are several ways to create a pandas `DataFrame`. The most common ways are:
* Pandas I/O API: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
* Manually

### From Pandas I/O API 

In [None]:
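# Tip: type pd.read_ and press Tab to list the available readers
# (e.g. read_csv, read_excel, read_json)
#pd.read_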

In [None]:
# dataframe created using the read_csv() method
# source: https://archive.ics.uci.edu/ml/datasets/Automobile  (University of California)
df_cars = pd.read_csv("_data/CarPrice_Assignment.csv")
df_cars
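If the CSV file is not at hand, you can still try `read_csv()`: it accepts any file-like object, not only a path. This minimal sketch feeds it a hypothetical, in-memory CSV snippet through `io.StringIO` (the column names and values are made up for illustration).

```python
import io

import pandas as pd

# Hypothetical CSV content kept in memory instead of a file on disk
csv_text = """CarName,doornumber,price
toyota corolla,four,7788.0
honda prelude,two,8845.0
"""

# read_csv reads from any file-like object, such as io.StringIO
df_small = pd.read_csv(io.StringIO(csv_text))
print(df_small)
```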

### Manually: using the pandas.DataFrame( ) constructor

When creating a pandas `DataFrame` manually, you call the DataFrame constructor $\rightarrow$ **pandas.DataFrame( )**

In [None]:
# with no arguments, the constructor returns an empty DataFrame
pd.DataFrame()
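A quick way to confirm that a DataFrame built with no arguments really is empty is to check its `empty` and `shape` attributes:

```python
import pandas as pd

empty_df = pd.DataFrame()
print(empty_df.empty)  # True: no rows and no columns
print(empty_df.shape)  # (0, 0)
```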

When creating a DataFrame manually, the `dictionary` and `NumPy array` data types are your best friends. The most common ways to create a `DataFrame` manually are:
* DataFrame from a Dictionary of Lists $\rightarrow$ (Important to know)
* DataFrame from NumPy array $\rightarrow$ (Good to know)

**Creating DataFrame from `Dictionary of Lists`**

In [None]:
# subset of CarPrice dataset
data = {"Car Name"  : ["Honda Prelude", "Mitsubishi Outlander", "alfa-romero Quadrifoglio","toyota corolla"], 
        "Doors"     : [4, 4, 2, 4],
        "Fuel Type" : ["gas", "gas", "gas", "diesel"],
        "Price"     : [8845.0, 9279.0, 16500.0, 7788.0]}

df_from_list = pd.DataFrame(data)
df_from_list
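One thing to keep in mind with a dictionary of lists: every list becomes a column, so all lists must have the same length. This sketch uses hypothetical data with mismatched lengths to show that pandas refuses to build the DataFrame and raises a `ValueError`.

```python
import pandas as pd

# Hypothetical data: 2 car names but 3 door values
bad_data = {"Car Name": ["Honda Prelude", "toyota corolla"],
            "Doors": [4, 2, 4]}

raised = False
try:
    pd.DataFrame(bad_data)
except ValueError as err:
    # pandas requires every list (column) to have the same length
    raised = True
    print("ValueError:", err)
```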

**Creating DataFrame from `NumPy arrays`**

In [None]:
# quick way to build a numeric DataFrame: a 6x6 array with the values 0..35
data = np.arange(36).reshape(6,6)
print(data)

# default integer index (0-5); the column labels are passed explicitly
df_from_array = pd.DataFrame(data, columns=["a","B","c","D","e","F"])
df_from_array
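If you leave out the `columns=` argument as well, pandas falls back to integer labels for both axes. A minimal sketch with a small array:

```python
import numpy as np
import pandas as pd

# Without index= or columns=, pandas assigns integer labels 0, 1, 2, ...
df_default = pd.DataFrame(np.arange(6).reshape(2, 3))
print(df_default)
print(df_default.columns.to_list())  # [0, 1, 2]
print(df_default.index.to_list())    # [0, 1]
```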

<a id='section_BasicOperation'></a>

## 1.2 Basic Operations on DataFrames

In [None]:
# Original dataframe: https://archive.ics.uci.edu/ml/datasets/Automobile
# The dataset has been modified for teaching purposes (by Fall in Python)
df_cars = pd.read_csv("_data/CarPrice_modified.csv")
df_cars

### How to view our dataset?

In [None]:
# Return the first n rows
df_cars.head(3)

In [None]:
# Return the last n rows
df_cars.tail(3)

In [None]:
# Return n randomly selected rows
df_cars.sample(3)

**Do I modify my dataset when using `head()`, `tail()` and `sample()` methods?**

In [None]:
df_cars
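The answer is no: `head()`, `tail()` and `sample()` return new DataFrames and leave the original untouched. The sketch below checks this on a hypothetical 10-row DataFrame; `random_state` is an optional `sample()` parameter that makes the random draw reproducible.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

preview = df.head(3)                        # a new, smaller DataFrame
random_rows = df.sample(3, random_state=0)  # reproducible random draw

print(preview.shape)  # (3, 1)
print(df.shape)       # (10, 1): the original is unchanged
```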

### DataFrame: main components

A pandas DataFrame is a two-dimensional labeled data structure with three main components:
* Index
* Column Labels
* Values

**How to get the index?**

In [None]:
df_cars.head(3)

In [None]:
# get the index
df_cars.index.to_list()

**How to get the column labels?**

In [None]:
# get the column names (headers)
df_cars.columns.to_list()

**How to get the values (data)?**

In [None]:
df_cars.head(8)

In [None]:
# get values
df_cars.to_numpy()   # the older .values attribute also works, but to_numpy() is recommended
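A NumPy array has a single dtype, so when columns have different types `to_numpy()` has to fall back to a common one. This sketch with hypothetical data shows that mixing text and numbers gives an `object` array, while selecting only numeric columns keeps a numeric dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "price": [1.5, 2.0]})

# Mixed column types must share one NumPy dtype, so pandas falls back to object
mixed = df.to_numpy()
print(mixed.dtype)    # object

# Selecting only numeric columns keeps a numeric dtype
numeric = df[["price"]].to_numpy()
print(numeric.dtype)  # float64
```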

### How to inspect our dataset?

In [None]:
df_cars.head()

**How to get the shape?**

In [None]:
# shape attribute: a (rows, columns) tuple describing the DataFrame's dimensions
df_cars.shape
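Because `shape` is a plain tuple, it can be unpacked directly into the number of rows and columns. A small sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)      # 3 2
print(len(df) == n_rows)   # True: len() counts rows
```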

**How to get the data types?**

In [None]:
# Return the data types of each column in the DataFrame.
df_cars.dtypes

**How to get general information about the dataframe?**

In [None]:
# Return a concise summary of a DataFrame.
df_cars.info()

**How to get basic statistics of the dataframe?**

In [None]:
df_cars.head(4)

In [None]:
df_cars.describe(include="all")
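By default, `describe()` only summarises numeric columns; `include="all"` adds the non-numeric ones, with statistics such as `unique`, `top` and `freq`. A minimal sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"],
                   "price": [100.0, 200.0, 300.0]})

# Default: only the numeric column is summarised
print(df.describe().columns.to_list())  # ['price']

# include="all": text columns get count, unique, top and freq
summary = df.describe(include="all")
print(summary)
```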

### Accessing columns

The two basic ways to access a column are:
* bracket notation
* dot notation

In [None]:
df_cars.head(3)

In [None]:
# bracket notation
df_cars["Price [EUR]"].head(3)

In [None]:
# dot notation (works only when the column label is a valid Python identifier)
df_cars.CarName.head(3)
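Dot notation is handy but limited: a label such as `Price [EUR]` contains spaces and brackets, so only bracket notation can reach it, and new columns should always be created with brackets. A short sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"CarName": ["toyota corolla"], "Price [EUR]": [7788.0]})

print(df.CarName)         # works: the label is a valid Python identifier
print(df["Price [EUR]"])  # spaces and brackets require bracket notation

# Create new columns with brackets; df.Doors = [4] would not add a column
# (pandas emits a UserWarning and sets a plain attribute instead)
df["Doors"] = [4]
print(df.columns.to_list())
```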

### Creating and deleting columns

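Arithmetic operations are applied element-wise, since pandas is built on top of NumPy.
One detail to pay attention to: operations between two Series align on the index labels, so check that both Series share the same index. Labels present in only one of them produce `NaN`.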
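To see index alignment in action, here is a minimal sketch with two hypothetical Series whose indexes only partly overlap. Values are matched by label, not by position.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Values are matched by index label, not by position
result = a + b
print(result)
# "x" and "w" exist in only one Series, so their results are NaN
```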

In [None]:
df_cars.head()

In [None]:
# create a column: car length converted from inches to centimetres (1 in = 2.54 cm)
factor = 2.54
df_cars["CarLength [cm]"] = df_cars["CarLength [in]"] * factor
df_cars.head(5)

In [None]:
# delete a column (modifies df_cars in place)
del df_cars["CarLength [in]"]
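An alternative to `del` is `DataFrame.drop(columns=...)`, which returns a new DataFrame and leaves the original unchanged. A sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"CarLength [in]": [168.8, 171.2],
                   "Price": [13495.0, 16500.0]})

# drop() returns a new DataFrame; df itself keeps both columns
df_no_length = df.drop(columns=["CarLength [in]"])
print(df_no_length.columns.to_list())  # ['Price']
print(df.columns.to_list())            # ['CarLength [in]', 'Price']
```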

## Extra Exercise

In [None]:
df_cars.describe(include='all')

In [None]:
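# boolean mask: True where CarName equals "toyota corona", False elsewhere
test_boolean = df_cars["CarName"] == "toyota corona"
# keep only the rows where the mask is True
df_cars[test_boolean]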
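Boolean masks can also be combined. Each comparison returns a boolean Series; `&` means "and", `|` means "or", and each condition needs its own parentheses. A minimal sketch with hypothetical data (the `fueltype` column is assumed for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "CarName": ["toyota corona", "toyota corona", "honda civic"],
    "fueltype": ["gas", "diesel", "gas"],
})

# Both conditions must be True for a row to be kept
mask = (df["CarName"] == "toyota corona") & (df["fueltype"] == "gas")
print(df[mask])
```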

<a id='section_methodsAtributtes'></a>

## DataFrame Methods and Attributes

* Check them out in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/frame.html

Pandas `DataFrame` has plenty of methods and attributes. I will not go through all of them; instead, I will point out some useful ones that will help us during this course. You can check the complete list in the online documentation linked above or with Python's built-in function `dir`.
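As a sketch of the `dir` approach: it returns every attribute name, including internal ones starting with an underscore, so filtering those out leaves the public methods and attributes.

```python
import pandas as pd

# dir() lists every attribute name; drop the private ones starting with "_"
public_names = [name for name in dir(pd.DataFrame) if not name.startswith("_")]
print(len(public_names))
print(public_names[:10])
```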