## Data Manipulation with Pandas - Outline
#### Chapter 1
- Subsetting and Sorting
- Adding New Columns
#### Chapter 2
- Aggregating and Grouping
- Summary Statistics
#### Chapter 3
- Indexing
- Slicing
#### Chapter 4
- Visualizations
- Reading and Writing CSVs

In [None]:
## Missing values.

# df.isna() # Detecting missing values
# df.isna().any()  # Checking whether each column has any missing values
# df.isna().sum()  # Counting missing values in each column

## Visualizing Missing values.
# import matplotlib.pyplot as plt
# df.isna().sum().plot(kind="bar")
# plt.show()

# cols_missing = []  # Fill in the names of the columns that contain missing values
# df[cols_missing].hist()
# plt.show()
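
A minimal sketch of the detection calls above. The `dogs` frame and its values are hypothetical sample data, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; NaN marks a missing value.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "weight_kg": [25.0, np.nan, 23.0],
    "height_cm": [56.0, 43.0, np.nan],
})

missing_mask = dogs.isna()          # Boolean DataFrame, True where a value is missing
has_missing = dogs.isna().any()     # One Boolean per column
missing_counts = dogs.isna().sum()  # Number of missing values per column

print(missing_counts)
```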

In [None]:
# Handling missing values

# df.dropna()   # Remove rows that contain missing values
# df.fillna(0)  # Replace missing values with 0
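
A short sketch of both options on hypothetical sample data. Both calls return a new DataFrame and leave the original unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing weight.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy"],
    "weight_kg": [25.0, np.nan, 23.0],
})

complete_rows = dogs.dropna()  # Drops the row that contains a missing value
filled = dogs.fillna(0)        # Keeps every row, replacing the missing value with 0

print(complete_rows)
print(filled)
```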


Creating DataFrames
- Dictionaries
- List of Dictionaries
- Dictionary of Lists

Reading and Writing CSVs
- CSV: comma-separated values
- Designed for DataFrame-like data

In [None]:
# Creating a DataFrame

# Sample list of dictionaries (one dictionary per row)
# list_of_dicts = [
#     {"name": "Ginger", "breed": "Poodle", "age": 100},
#     {"name": "Scout", "breed": "ToyPoodle", "age": 100}
# ]

# Sample Dictionary of Lists
# dict_of_lists = {
#     "name" : ["Ginger", "Scout"],
#     "breed" : ["Poodle", "ToyPoodle"],
#     "age" : [100, 100]
# }
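
The two layouts can be sketched as follows: a list of dictionaries builds the frame row by row, a dictionary of lists builds it column by column. The values are hypothetical sample data.

```python
import pandas as pd

# Row by row: each dictionary is one row.
list_of_dicts = [
    {"name": "Ginger", "breed": "Dachshund", "height_cm": 22},
    {"name": "Scout", "breed": "Dalmatian", "height_cm": 59},
]
by_row = pd.DataFrame(list_of_dicts)

# Column by column: each key is a column name, each list holds that column's values.
dict_of_lists = {
    "name": ["Ginger", "Scout"],
    "breed": ["Dachshund", "Dalmatian"],
    "height_cm": [22, 59],
}
by_column = pd.DataFrame(dict_of_lists)

print(by_row)
print(by_column)
```

Both constructions should describe the same table here.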

In [None]:
# CSV
# import pandas as pd
# pd.read_csv("")
# df.to_csv("")
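
A minimal round trip through a CSV file. The file name `dogs.csv` and the temporary directory are assumptions made so the sketch cleans up after itself.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({"name": ["Ginger", "Scout"], "height_cm": [22, 59]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "dogs.csv")  # Hypothetical file name
    dogs.to_csv(csv_path, index=False)            # index=False leaves the row labels out of the file
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```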

## Chapter 1

In [None]:
### Chapter 1 - Introduction
# DataFrame or Tabular Data

# df.head()
# df.info()
# df.shape
# df.describe()
# df.values
# df.columns
# df.index

In [None]:
### Chapter 1 - Introduction 
# Sorting and Subsetting 

# df.sort_values("column_name", ascending=True)
# df.sort_values([list_of_cols])
# df["column_name"]
# df[[list_of_cols]]
# df["column"] > 50
# df["column"] > "Labrador"
# df[df["column"] > 50]

# Subsetting based on multiple conditions
# is_lab = df["breed"]=="Labrador"
# is_brown = df["color"]=="brown"
# df[is_lab & is_brown]     # Condition

# Subsetting using isin()
# is_black_or_brown = df["color"].isin(["black","brown"])
# df[is_black_or_brown]

# Another subsetting example: rows where either condition holds
# south_mid_atlantic = homelessness[
#     (homelessness["region"] == "South Atlantic")
#     | (homelessness["region"] == "Mid-Atlantic")
# ]
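
The sorting and subsetting patterns above, sketched on a hypothetical `dogs` frame (sample values, not from the original notes):

```python
import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Labrador"],
    "color": ["brown", "black", "brown", "tan"],
    "height_cm": [56, 43, 46, 59],
})

# Sort rows from tallest to shortest.
tallest_first = dogs.sort_values("height_cm", ascending=False)

# A comparison yields a Boolean Series; indexing with it keeps the True rows.
tall_dogs = dogs[dogs["height_cm"] > 50]

# Combine conditions with & (and) or | (or).
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "brown"
brown_labs = dogs[is_lab & is_brown]

# isin() tests membership in a list of values.
black_or_brown = dogs[dogs["color"].isin(["black", "brown"])]

print(brown_labs)
```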

## Chapter 2 - Introduction

In [None]:
### Chapter 2 - New Columns
# df["meters"] = df["cm"] / 100   # 100 cm = 1 m
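
A small sketch of deriving columns from existing ones. Converting centimetres to metres means dividing by 100. The column names and the `bmi` column are hypothetical illustrations.

```python
import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie"],
    "height_cm": [56, 43],
    "weight_kg": [25, 23],
})

# Element-wise arithmetic on a column creates a new column of the same length.
dogs["height_m"] = dogs["height_cm"] / 100

# New columns can be combined with existing ones right away.
dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2

print(dogs)
```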

In [None]:
### Chapter 2 - Summary Statistics

# Mean, Median, Mode, Min, Max, Var, Std, Sum
# df["columns"].mean()
# df["date"].min()

# df["col"].agg(pct30)
# df[[list_of_cols]].agg(pct30)
# df["cols"].agg([list_of_fn])

# cumsum, cummax, cummin, cumprod
# df["cols"].cumsum()


# Function for agg
# def pct30(column):
#     return column.quantile(0.30)
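
A sketch of the summary calls above on a hypothetical weight column. `pct30` is the custom function from the notes; the numbers are sample values.

```python
import pandas as pd


def pct30(column):
    """Return the 30th percentile of a column."""
    return column.quantile(0.30)


# Hypothetical sample data.
weights = pd.Series([10, 20, 30, 40], name="weight_kg")

mean_weight = weights.mean()
median_weight = weights.median()

# agg() accepts a custom function or a list of function names.
p30 = weights.agg(pct30)
low_high = weights.agg(["min", "max"])

# Cumulative statistics return one value per row rather than a single number.
running_total = weights.cumsum()

print(mean_weight, median_weight, p30)
print(running_total)
```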

In [None]:
### Chapter 2 - Counting
# df = None
# df.drop_duplicates(subset="name") # Dropping rows with duplicate data
# df.drop_duplicates(subset=["name","breed"]) # Dropping rows with a duplicate name/breed pair
# df["cols"].value_counts()
# df["cols"].value_counts(sort=True)
# df["cols"].value_counts(normalize=True)   # Proportions of the total instead of counts
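
Counting usually starts by dropping repeated rows so that each item is counted once. A sketch under that assumption, with a hypothetical `visits` frame in which the same dog appears several times:

```python
import pandas as pd

# Hypothetical sample data: one row per vet visit, so dogs repeat.
visits = pd.DataFrame({
    "name": ["Max", "Max", "Stella", "Bella", "Stella"],
    "breed": ["Labrador", "Labrador", "Chihuahua", "Labrador", "Chihuahua"],
})

# Keep the first row for each name/breed pair.
unique_dogs = visits.drop_duplicates(subset=["name", "breed"])

# Count dogs per breed, then express the counts as proportions.
breed_counts = unique_dogs["breed"].value_counts()
breed_share = unique_dogs["breed"].value_counts(normalize=True)

print(breed_counts)
print(breed_share)
```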

In [None]:
### Chapter 2 - Grouped Summary Statistics

# Group by color, select the weight_kg column, then take the mean of each group
# df.groupby("color")["weight_kg"].mean()
# df.groupby("color")["weight_kg"].agg(["min", "max", "sum"])
# df.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()
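
The grouped calls above, sketched on hypothetical sample data. The aggregations are named as strings in this sketch.

```python
import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({
    "color": ["brown", "brown", "black", "black"],
    "breed": ["Labrador", "Poodle", "Labrador", "Labrador"],
    "weight_kg": [20, 30, 10, 14],
    "height_cm": [50, 40, 56, 60],
})

# One statistic per group.
mean_by_color = dogs.groupby("color")["weight_kg"].mean()

# Several statistics per group.
stats_by_color = dogs.groupby("color")["weight_kg"].agg(["min", "max", "sum"])

# Group by two columns and summarize two columns.
by_color_breed = dogs.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()

print(stats_by_color)
print(by_color_breed)
```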

In [None]:
### Chapter 2 - Pivot Tables

# Equivalent to the groupby mean above: the default aggfunc is the mean
# df.pivot_table(values="weight_kg", index="color")
# df.pivot_table(values="weight_kg", index="color", aggfunc=["mean", "median"])

# df.pivot_table(values="weight_kg", index="color", columns="breed", 
#               fill_value=0, margins=True)
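
A sketch of the pivot table calls above on hypothetical sample data. `fill_value=0` replaces empty cells, and `margins=True` adds an `All` row and column computed from the underlying rows.

```python
import pandas as pd

# Hypothetical sample data; there is no black Poodle, so that cell would be empty.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Labrador", "Poodle", "Labrador"],
    "color": ["brown", "black", "brown", "brown"],
    "weight_kg": [30, 28, 20, 32],
})

# Mean weight per color (mean is the default aggregation).
by_color = dogs.pivot_table(values="weight_kg", index="color")

# Colors as rows, breeds as columns, with filled gaps and overall means.
wide = dogs.pivot_table(values="weight_kg", index="color", columns="breed",
                        fill_value=0, margins=True)

print(by_color)
print(wide)
```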

## Chapter 3

In [None]:
# Setting a column as the index
# df.set_index("name")
# df.reset_index(drop=True)   # Reset the index; drop=True discards the old index values

# Subsetting
# df[df["name"].isin(["Bella","Stella"])]
# vs
# df.loc[["Bella", "Stella"]]

# Multi-level indexes aka hierarchical indexes
# df.set_index(["breed", "color"])

# Subsetting the outer level with a list of labels
# df.loc[["Labrador", "Chihuahua"]]

# Subsetting the inner level with a list of (outer, inner) tuples
# df.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]
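
A sketch of label-based subsetting with single and multi-level indexes. The frame is hypothetical sample data. Note that `.loc` takes square brackets.

```python
import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({
    "name": ["Bella", "Stella", "Max", "Lucy"],
    "breed": ["Labrador", "Chihuahua", "Labrador", "Poodle"],
    "color": ["Brown", "Tan", "Black", "Brown"],
    "height_cm": [56, 18, 59, 43],
})

# Single-level index: look rows up by name.
by_name = dogs.set_index("name")
picked = by_name.loc[["Bella", "Stella"]]
restored = by_name.reset_index()  # Moves "name" back into a regular column

# Multi-level index: breed is the outer level, color the inner level.
by_breed_color = dogs.set_index(["breed", "color"])
outer = by_breed_color.loc[["Labrador", "Chihuahua"]]
inner = by_breed_color.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]

print(outer)
print(inner)
```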

In [None]:
# Controlling Sorting
# df.sort_index(level=["color", "breed"], ascending=[True, False])

# Subsetting and slicing with loc and iloc
# Slicing
# df[2:5]     # Rows at positions 2, 3 and 4 (the end position is excluded)
# df[:3]      # The first three rows
# df[:]       # All rows

# Slicing by index labels (outer level); sort the index first with df.sort_index()
# df.loc["Chow Chow":"Poodle"]

# Slicing by index labels (inner level) with tuples
# df.loc[("Labrador", "Brown"):("Something", "Something")]

# Slicing by index labels (inner level) and by column labels
# df.loc[("Labrador", "Brown"):("Something", "Something"),
#        "name":"height"]
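
A sketch of slicing a sorted multi-level index. Label slices with `.loc` include the end label, while positional slices with `.iloc` exclude the end position. The data and the name `dogs_srt` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, deliberately unsorted.
dogs = pd.DataFrame({
    "name": ["Bella", "Cooper", "Stella", "Max", "Lucy"],
    "breed": ["Labrador", "Poodle", "Chihuahua", "Labrador", "Chow Chow"],
    "color": ["Brown", "Black", "Tan", "Black", "Brown"],
    "height_cm": [56, 43, 18, 59, 46],
    "weight_kg": [25, 23, 2, 29, 22],
})

# Label slicing needs a sorted index.
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Outer level: every row from "Chow Chow" through "Labrador".
outer_slice = dogs_srt.loc["Chow Chow":"Labrador"]

# Inner level: slice between two (outer, inner) tuples.
inner_slice = dogs_srt.loc[("Chow Chow", "Brown"):("Labrador", "Black")]

# Rows and columns at once.
rows_and_cols = dogs_srt.loc[("Chow Chow", "Brown"):("Labrador", "Black"),
                             "name":"height_cm"]

# Positional slicing: rows at positions 1 and 2.
by_position = dogs_srt.iloc[1:3]

print(outer_slice)
```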

In [None]:
# Pivot Table
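# Assumes df is an existing DataFrame with "breed", "color" and "height_cm" columns
sample = df.pivot_table("height_cm",     # Values to aggregate; they fill the cells of the table
                        index="breed",   # One row per breed
                        columns="color") # One column per color

sample.loc["Chow Chow":"Poodle"]                   # Slice rows by breed label
sample.loc["Sample":"Sample", "Sample":"Sample"]   # Slice rows and columns by label

sample.mean(axis="index")   # Mean of each color column, taken across breeds
sample.mean(axis="columns") # Mean of each breed row, taken across colors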
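
A runnable sketch of slicing a pivot table and averaging along each axis. The data is hypothetical. Missing breed/color combinations become NaN, which `mean()` skips by default.

```python
import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({
    "breed": ["Chihuahua", "Chow Chow", "Labrador", "Labrador", "Poodle"],
    "color": ["Tan", "Brown", "Black", "Brown", "Black"],
    "height_cm": [18, 46, 59, 55, 43],
})

heights = dogs.pivot_table("height_cm", index="breed", columns="color")

# Slice rows by breed label, then rows and columns together.
rows = heights.loc["Chow Chow":"Poodle"]
block = heights.loc["Chow Chow":"Poodle", "Black":"Brown"]

# axis="index" collapses the rows: one mean per color.
mean_per_color = heights.mean(axis="index")

# axis="columns" collapses the columns: one mean per breed.
mean_per_breed = heights.mean(axis="columns")

print(mean_per_color)
print(mean_per_breed)
```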

## Chapter 4

In [None]:
import matplotlib.pyplot as plt

# Assumes df is an existing DataFrame with a "height_cm" column
df["height_cm"].hist()
plt.show()

In [None]:
# Get the average weight by breed
avg_weight_by_breed = df.groupby("breed")["weight_kg"].mean()
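
A grouped mean like this one is typically shown as a bar plot. The sketch below uses hypothetical data and selects the non-interactive `Agg` backend, an assumption made so it also runs without a display. In a notebook that line can be dropped and `plt.show()` called after plotting.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Labrador", "Poodle"],
    "weight_kg": [25, 29, 23],
})

# One mean weight per breed.
avg_weight_by_breed = dogs.groupby("breed")["weight_kg"].mean()

# Draw one bar per breed.
ax = avg_weight_by_breed.plot(kind="bar", title="Mean weight by breed")
ax.set_ylabel("weight_kg")
```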