In [None]:
import pandas as pd
import numpy as np

## Initial exploration

- `df.head()`
- `df.info()`
- `df["colA"].value_counts()`
- `df.describe()`

In [None]:
# Returns the first 5 rows of the DataFrame
df.head()

# Provides a summary of the DataFrame including the number of non-null entries in each column
df.info()

# Returns how many times each unique value occurs in the column "colA"
df["colA"].value_counts()

# Provides summary statistics for the numeric columns: count, mean, standard deviation, min, quartiles (50% is the median) and max
df.describe()
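
The cells above assume a DataFrame named `df` already exists. As a minimal sketch, the block below builds a small hypothetical one (the values are invented for illustration; only the column names come from this notebook) and runs the same exploration calls on it.

```python
import pandas as pd

# Hypothetical sample data; column names mirror those used in this notebook.
df = pd.DataFrame({
    "colA": ["x", "y", "x", "x"],
    "genre": ["Fiction", "Non Fiction", "Fiction", "Fiction"],
    "year": [2001.0, 2010.0, 2015.0, 2020.0],
    "rating": [4.5, 3.8, 4.1, 4.9],
})

first_rows = df.head(2)             # first two rows instead of the default five
counts = df["colA"].value_counts()  # occurrences of each distinct value, most frequent first
summary = df.describe()             # numeric columns only by default

print(counts)
print(summary.loc[["mean", "50%"]])
```

Note that `describe()` skips the text columns here, so only `year` and `rating` appear in the summary.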

## Data validation

- Detecting data types: `df.dtypes`, `df.info()`
- Validating ranges: `df["colA"].min()`, `df["colA"].max()`
- Updating data types: `df["year"] = df["year"].astype(int)`
- Validating categorical data: `df["genre"].isin(["Fiction", "Non Fiction"])`


In [None]:
# Returns the data types of each column in the DataFrame
df.dtypes

# Returns the smallest value in the column "colA"
df["colA"].min()

# Returns the largest value in the column "colA"
df["colA"].max()

# Changes the data type of the "year" column to int (raises an error if the column contains missing values)
df["year"] = df["year"].astype(int)

# Checks if every value in the "genre" column is either "Fiction" or "Non Fiction"
df["genre"].isin(["Fiction", "Non Fiction"]).all()
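
The checks above report whether the data is valid, but not which rows are at fault. A minimal sketch of locating the offending rows follows; the sample values and the 1900 to 2025 window are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with one unexpected genre and one out-of-range year.
df = pd.DataFrame({
    "year": [1999.0, 2015.0, 3020.0],
    "genre": ["Fiction", "Poetry", "Non Fiction"],
})

# Convert the float column to integers (this would raise if any value were missing).
df["year"] = df["year"].astype(int)

# Range check: True where the year lies inside the assumed valid window (inclusive).
valid_year = df["year"].between(1900, 2025)

# Membership check: True where the genre is one of the expected categories.
valid_genre = df["genre"].isin(["Fiction", "Non Fiction"])

# Keep the rows that fail at least one check.
bad_rows = df[~(valid_year & valid_genre)]
print(bad_rows)
```

Building boolean masks first keeps each rule readable and lets them be combined with `&`, `|` and `~`.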

## Data Summarization

Pandas provides several methods to summarize our data:

- `.groupby()`
- `.sum()`
- `.count()`
- `.min()`
- `.max()`
- `.var()`
- `.std()`

In [None]:
# Sums up the values in each column grouped by "genre"
df.groupby("genre").sum()

# Counts the number of non-null values in each column grouped by "genre"
df.groupby("genre").count()

# Returns the smallest value in each column grouped by "genre"
df.groupby("genre").min()

# Returns the largest value in each column grouped by "genre"
df.groupby("genre").max()

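# Returns the variance of each numeric column grouped by "genre"
# (numeric_only=True skips text columns, for which a variance is undefined)
df.groupby("genre").var(numeric_only=True)

# Returns the standard deviation of each numeric column grouped by "genre"
df.groupby("genre").std(numeric_only=True)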
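
To see what these grouped summaries return, here is a small sketch on hypothetical data (the values are invented for illustration). Selecting the columns of interest after `groupby()` restricts the summary to them.

```python
import pandas as pd

# Hypothetical sample data with two genres.
df = pd.DataFrame({
    "genre": ["Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "rating": [4.0, 5.0, 3.0, 4.0],
    "year": [2000, 2010, 2005, 2015],
})

# Restrict the grouped summaries to the two numeric columns.
grouped = df.groupby("genre")[["rating", "year"]]

totals = grouped.sum()   # one row per genre, one column per selected column
spread = grouped.std()   # sample standard deviation (ddof=1) per genre

print(totals)
print(spread)
```

The result is indexed by the group key, so a single value can be read with `totals.loc["Fiction", "rating"]`.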

## Aggregating Ungrouped Data
- The `agg()` method applies one or more aggregating functions across one or more DataFrame columns.

In [None]:
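# Applies each function to every numeric column; text columns are excluded because "mean" and "std" are undefined for them
df.select_dtypes("number").agg(["sum", "min", "max", "mean", "std"])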
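
`agg()` also accepts a dictionary when different columns need different functions. The sketch below contrasts the two forms on hypothetical data (the values are invented for illustration).

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"rating": [4.0, 5.0, 3.0], "year": [2000, 2010, 2005]})

# List form: every function is applied to every column.
# The result has one row per function and one column per input column.
all_stats = df.agg(["min", "max", "mean"])

# Dict form: each column gets its own list of functions.
# Combinations that were not requested show up as NaN.
per_column = df.agg({"rating": ["mean", "std"], "year": ["min", "max"]})

print(all_stats)
print(per_column)
```

The dict form avoids computing statistics that are not meaningful for a given column, at the cost of NaN cells in the combined result.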

## Aggregating Grouped Data

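We can use `agg()` with named aggregation to apply different functions to different columns of the grouped data. Each keyword becomes an output column name and maps to a `(column, function)` tuple.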

In [None]:
df.groupby("genre").agg(mean_rating=("rating", "mean"),
                        std_rating=("rating", "std"),
                        median_year=("year", "median"))
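
As a worked sketch of the named aggregation above, the block below runs the same call on a small hypothetical DataFrame (the values are invented for illustration).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "genre": ["Fiction", "Fiction", "Fiction", "Non Fiction", "Non Fiction"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
    "year": [2000, 2010, 2005, 2001, 2003],
})

# Each keyword names an output column; the tuple is (input column, function).
summary = df.groupby("genre").agg(
    mean_rating=("rating", "mean"),
    std_rating=("rating", "std"),
    median_year=("year", "median"),
)

print(summary)
```

The output has one row per genre and exactly the three named columns, which avoids the multi-level column index produced when a list of functions is passed instead.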