### Introduction


In [None]:
import pandas as pd

In [None]:
movies = pd.read_csv("../learn_pandas/csv/movies.csv")
movies.head(5)

### Display summary statistics for all columns


In [None]:
movies.describe(include="all")
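
To see what include="all" adds, the sketch below runs describe() on a tiny made-up DataFrame. The column names mirror the ones used in this notebook, but the rows are invented for illustration and are not taken from movies.csv.

```python
import pandas as pd

# Invented rows for illustration; not taken from movies.csv
toy = pd.DataFrame({
    "genre": ["Drama", "Comedy", "Drama"],
    "production_budget": [10.0, 20.0, 60.0],
})

summary = toy.describe(include="all")
print(summary)

# Categorical rows (unique, top, freq) are NaN for the numeric column,
# and numeric rows (mean, std, ...) are NaN for the text column.
top_genre = summary.loc["top", "genre"]                  # most frequent genre
mean_budget_toy = summary.loc["mean", "production_budget"]  # average budget
print(top_genre, mean_budget_toy)
```

Without include="all", describe() would report only the numeric column here.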

### Central Tendency for Quantitative Data

There are several common measures of central tendency:

- Mean: The average value of the variable, calculated as the sum of all values divided by the number of values.
- Median: The middle value of the variable when the values are sorted (the average of the two middle values when the count is even).
- Mode: The most frequent value of the variable.
- Trimmed mean: The mean after excluding a fixed percentage of the lowest and highest data points.


In [None]:
mean_budget = movies.production_budget.mean()
mean_budget

In [None]:
med_budget = movies.production_budget.median()
med_budget

In [None]:
mode_budget = movies.production_budget.mode()
mode_budget
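
Note that .mode() returns a Series rather than a single number, because several values can tie for most frequent. A small sketch with invented values (not from movies.csv):

```python
import pandas as pd

# Invented values: 1 and 2 both appear twice
values = pd.Series([1, 1, 2, 2, 3])

modes = values.mode()
print(modes.tolist())  # every tied value, in ascending order

# If a single number is needed, one option is to take the first entry
first_mode = modes.iloc[0]
print(first_mode)
```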

In [None]:
from scipy.stats import trim_mean

# proportiontocut=0.2 drops the lowest 20% and the highest 20% of values
trimmed_mean = trim_mean(movies.production_budget, proportiontocut=0.2)
trimmed_mean
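
To make the trimming step concrete, here is a minimal hand-rolled sketch in plain pandas. The helper name manual_trim_mean and the data are hypothetical, and the sketch assumes the number of trimmed points is rounded down, which is how scipy's trim_mean appears to behave.

```python
import pandas as pd

# Invented values with one extreme outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])


def manual_trim_mean(series, proportion):
    """Hypothetical helper: mean after dropping `proportion` of points from each end."""
    n_cut = int(proportion * len(series))  # points removed per side, rounded down
    ordered = series.sort_values()
    kept = ordered.iloc[n_cut:len(ordered) - n_cut]
    return kept.mean()


plain_mean = values.mean()
trimmed = manual_trim_mean(values, 0.2)
print(plain_mean, trimmed)
```

The outlier pulls the plain mean far above the bulk of the data, while the trimmed mean stays near the middle values.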

### Spread for Quantitative Data

The spread of a quantitative variable describes the amount of variability.

- Range: The difference between the maximum and minimum values of a variable.
- Interquartile range (IQR): The difference between the 75th and 25th percentile values.
- Variance: The average of the squared distances from each data point to the mean.
- Standard deviation (SD): The square root of the variance.
- Mean absolute deviation (MAD): The mean absolute value of the distance between each data point and the mean.


In [None]:
# Save the range to range_budget
range_budget = movies.production_budget.max() - movies.production_budget.min()
range_budget

In [None]:
# Save the interquartile range to iqr_budget
from scipy.stats import iqr

iqr_budget = iqr(movies.production_budget)
iqr_budget
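
The same quantity can also be computed without scipy, straight from the definition, using the pandas .quantile() method. The values below are invented for illustration.

```python
import pandas as pd

# Invented values for illustration
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1 = values.quantile(0.25)  # 25th percentile
q3 = values.quantile(0.75)  # 75th percentile
iqr_manual = q3 - q1
print(q1, q3, iqr_manual)
```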

In [None]:
# Save the variance to var_budget
# (pandas computes the sample variance by default, dividing by n - 1)

var_budget = movies.production_budget.var()
var_budget
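
The definition given earlier (the plain average of squared distances) is the population variance, while pandas defaults to the sample variance. The sketch below shows the difference on invented values, using the ddof parameter of .var() and .std().

```python
import pandas as pd

# Invented values with mean 5
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_var = values.var()            # default ddof=1: divide by n - 1
population_var = values.var(ddof=0)  # divide by n, matching the definition
population_std = values.std(ddof=0)  # square root of the population variance
print(sample_var, population_var, population_std)
```

For a dataset as large as a full movie table the two versions are usually very close; the gap matters mostly for small samples.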

In [None]:
# Save the standard deviation to std_budget

std_budget = movies.production_budget.std()
std_budget
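
Mean absolute deviation, the last measure in the list above, can be computed directly from its definition. The sketch below uses a small invented Series so that it runs on its own; with the real data, the same expression would apply to movies.production_budget.

```python
import pandas as pd

# Invented budget values for illustration only (not taken from movies.csv)
budgets = pd.Series([10.0, 20.0, 30.0, 40.0, 100.0])

# Mean absolute deviation: average absolute distance from the mean
mad_budget = (budgets - budgets.mean()).abs().mean()
print(mad_budget)
```

Writing the expression out by hand avoids depending on a built-in shortcut, which may not be available in every pandas version.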

### Visualizing Quantitative Variables

For quantitative variables, boxplots and histograms are two common visualizations. These plots are useful because they simultaneously communicate information about minimum and maximum values, central location, and spread. Histograms can additionally illuminate patterns that can impact an analysis (e.g., skew or multimodality).


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
sns.boxplot(x="production_budget", data=movies)
plt.show()
plt.close()

In [None]:
sns.histplot(x="production_budget", data=movies)
plt.show()
plt.close()

### Value Counts for Categorical Data

The pandas library offers the .value_counts() method for counting how often each unique value appears in a DataFrame column:


In [None]:
# Save the counts to genre_counts

genre_counts = movies.genre.value_counts()
genre_counts

### Value Proportions for Categorical Data

We can calculate the proportion for each category by dividing its count by the total number of values for that variable:


In [None]:
# Save the proportions to genre_props
genre_props = movies.genre.value_counts() / len(movies.genre)
genre_props
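
An alternative is to pass normalize=True to .value_counts(). The two approaches agree when the column has no missing values, but differ otherwise, since .value_counts() skips missing values by default while len() counts every row. A sketch with invented data:

```python
import pandas as pd

# Invented genres with one missing value
genres = pd.Series(["Drama", "Drama", "Comedy", "Action", None])

# Proportions among non-missing values only
props_nonmissing = genres.value_counts(normalize=True)

# Proportions out of all rows, missing included in the denominator
props_all = genres.value_counts() / len(genres)

print(props_nonmissing)
print(props_all)
```

Which denominator is appropriate depends on whether missing entries should count toward the total.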

### Visualizing Categorical Variables

Python’s seaborn library offers several functions that can create bar charts. The simplest for plotting the counts is countplot(). A pie chart of the same counts can be drawn with the pandas .plot.pie() method:


In [None]:
# Create a bar chart for movie genre
sns.countplot(x="genre", data=movies)
plt.show()
plt.close()

In [None]:
# Create a pie chart for movie genre

movies.genre.value_counts().plot.pie()
plt.show()
plt.close()

### Review

- We can use .describe(include='all') to quickly display common summary statistics for all columns in a pandas DataFrame.
- For quantitative variables, measures of central tendency (e.g., mean, median, mode) and spread (e.g., range, variance, standard deviation) are good ways to summarize the data. Boxplots and histograms are often used for visualization.
- For categorical variables, the frequency of each category can be summarized using a table of counts or proportions. Bar charts and pie charts are often used for visualization.