<img src="../../../images/banners/seaborn.png" width="500"/>

# <img src="../../../images/logos/seaborn.png" width="23"/>  Visualizing distributions of data (Problems)

**Question:**  
How can you change the kind of a `displot()` in Seaborn?

**Answer:**  
The `kind` parameter specifies the type of plot to be created by `displot()`, which can be one of the following values:
- **hist**: creates a histogram (default)
- **kde**: creates a kernel density estimate plot
- **ecdf**: creates an empirical cumulative distribution function (ECDF) plot, which shows the proportion or count of observations at or below each unique value in a dataset

Additionally, a rug plot can be added to any kind of plot by passing `rug=True`, which shows the individual observations as ticks along the axis.

---

**Question:**  
Which axes-level plotting functions underlie `displot()`?

**Answer:**  
- `histplot()`
- `kdeplot()`
- `ecdfplot()`
- `rugplot()`


---

**Question:**  
What is the use of `histplot()` in Seaborn?

**Answer:**  
In Seaborn, `histplot()` is an axes-level function that draws a histogram of one variable, or a bivariate histogram (drawn as a heatmap) when both `x` and `y` are given.

A histogram is a graphical representation of the distribution of a dataset, showing the frequency or proportion of observations that fall within certain ranges or "bins" of the variable being measured.

---

**Question:**  
What is the use of `kdeplot()` in Seaborn?

**Answer:**  
In Seaborn, `kdeplot()` is an axes-level function that draws a kernel density estimate (KDE) plot of one variable, or a bivariate density (drawn as contours) when both `x` and `y` are given.

A kernel density estimate plot is a non-parametric way to estimate the probability density function of a random variable. It provides a smoothed estimate of the distribution of the data, which can be useful for visualizing the shape of the distribution and identifying any patterns or trends in the data.


---

**Question:**  
What are the disadvantages of `kdeplot()` in Seaborn?

**Answer:**  
1. **Computationally intensive**: Calculating the kernel density estimate for a large dataset can be computationally intensive, especially when the density is evaluated on a fine grid. This can make KDE plots slower to generate than other types of plots, such as histograms.

2. **Sensitive to bandwidth choice**: The shape of the KDE plot is highly dependent on the bandwidth parameter, which controls the width of the kernels used to estimate the density. If the bandwidth is too narrow, the estimate is undersmoothed and shows too much noise. If the bandwidth is too wide, the estimate is oversmoothed and can miss important features of the distribution, such as multiple modes. Choosing an appropriate bandwidth can be challenging, especially if the data has a complex or multimodal distribution.

3. **Assumes a smooth and unbounded distribution**: KDE assumes that the underlying distribution has no discontinuities or sharp corners and that it extends infinitely in both directions. If the variable is naturally bounded (for example, it cannot be negative) or the distribution is not smooth, the estimate can be distorted, most visibly near the boundaries, and KDE may not be an appropriate method for estimating the probability density function.

---

**Question:**  
What is the use of `ecdfplot()` in Seaborn?

**Answer:**  
The `ecdfplot()` function in Seaborn is used to visualize the empirical cumulative distribution function (ECDF) of a dataset. The ECDF is a non-parametric way to describe the distribution of a variable.

It draws a step function: for each value on the x-axis, the height on the y-axis is the proportion (or, with `stat="count"`, the number) of observations that are less than or equal to that value.

---

**Question:**  
What is the use of `rugplot()` in Seaborn?

**Answer:**  
A `rugplot()` is a plot of data points on the x-axis or y-axis that gives a sense of the density of the data. Each data point is represented by a short vertical line or a "tick" on the axis, so the rug plot looks like a rug with the data points representing the fibers of the rug.


---

**Question:**  
What is the difference between univariate, bivariate, and multivariate analysis?

**Answer:**  
Univariate, bivariate, and multivariate are terms used to describe the number of variables or dimensions being analyzed in a statistical or data analysis context.

- **Univariate analysis**: It is the analysis of one variable at a time, without considering its relationship with other variables. Common univariate techniques include measures of central tendency (e.g., mean, median, mode), measures of dispersion (e.g., range, variance, standard deviation), and graphical displays such as histograms, box plots, and density plots.

- **Bivariate analysis**: It is the analysis of the relationship between two variables. Common bivariate techniques include correlation analysis, regression analysis, and scatter plots.

- **Multivariate analysis**: It is the analysis of the relationships among three or more variables at once. In Seaborn, `pairplot()` is a common way to plot them. `jointplot()`, by contrast, is a bivariate plot with added univariate marginal distributions.


---

**Question:**  
How can you use the `displot()` function to plot **bivariate** distributions?

**Answer:**  
By specifying both variables through the `x` and `y` parameters. When both are given, `displot()` creates a bivariate plot: a heatmap-style histogram with `kind="hist"` or density contours with `kind="kde"`. `kind="ecdf"` supports only univariate data.


---

**Question:**  
What is the **binwidth** parameter used for in `displot()`?

**Answer:**  
This parameter sets the width of the bins used in the histogram. By default, `displot()` chooses an appropriate bin width based on the data, but you can specify a custom bin width using this parameter.

---

**Question:**  
What is the **bins** parameter used for in `displot()`?

**Answer:**  
This parameter sets the number of bins used in the histogram. It also accepts a sequence of bin edges or the name of a reference rule. If `binwidth` is also specified, it overrides `bins`.

---

**Question:**  
What is the **discrete** parameter used for in `displot()`?

**Answer:**  
This parameter is a boolean that tells `displot()` to treat the data as discrete. If `discrete=True`, the histogram defaults to `binwidth=1` and the bars are centered on their corresponding data values, which avoids the gaps that can otherwise appear with discrete (integer) data. It does not affect the kernel density estimate, which is controlled separately by the `kde` parameter and is off by default.

---

**Question:**  
What is the **shrink** parameter used for in `displot()`?

**Answer:**  
This parameter is a number that scales the width of each bar relative to the bin width. It is most useful with `discrete=True`, where it separates the bars visually. The default is `shrink=1`, so adjacent bars touch. With `shrink=0.8`, for example, each bar is drawn at 80% of the bin width and the remaining 20% is empty space between the bars.

---

**Question:**  
What is the **element** parameter used for in `displot()`?

**Answer:**  
This parameter sets the visual representation of a univariate histogram and can take one of the following values:
- **"bars"**: This is the default value and draws one bar per bin.
- **"step"**: This draws a step outline that follows the tops of the bins, without the vertical edges between adjacent bins.
- **"poly"**: This draws a polygon whose vertices lie at the center of each bin.

---

**Question:**  
What is the **multiple** parameter used for in `displot()`?

**Answer:**  
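This parameter specifies how to resolve multiple elements when semantic mapping (for example `hue`) creates subsets. It is only relevant with univariate data. It is most often used with `kind='hist'`; `kind='kde'` accepts the same values except "dodge".

This parameter can take one of the following values:
- **"layer"**: draw the subsets on top of each other (the default)
- **"dodge"**: draw the bars of each subset side by side within each bin
- **"stack"**: stack the subsets on top of each other
- **"fill"**: stack the subsets and normalize each bin so that the total height is 1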

---

**Question:**  
What is the **stat** parameter used for in `displot()`?

**Answer:**  
This parameter is used to specify the type of statistic to compute for each bin in the histogram and can take following values:    
- **"count"**: show the number of observations in each bin
- **"frequency"**: show the number of observations divided by the bin width
- **"percent"**: normalize such that bar heights sum to 100
- **"density"**: normalize such that the total area of the histogram equals 1
- **"probability"** or **"proportion"**: normalize such that bar heights sum to 1

---

**Question:**  
What is the **bw_adjust** parameter used for in `displot()`?

**Answer:**  
This parameter is used to adjust the bandwidth of the kernel density estimate. A value greater than 1 will result in a wider bandwidth and a smoother density estimate, while a value less than 1 will result in a narrower bandwidth and a more jagged density estimate.

---

**Question:**  
What is the **fill** parameter used for in `displot()`?

**Answer:**  
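This parameter controls whether the area under the histogram or density curve is filled with color. For histograms, `fill=True` is the default, and `fill=False` draws only the outline of the bars. For KDE plots, a single univariate curve is drawn as an unfilled line by default, and `fill=True` shades the area under it.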

---

**Question:**  
What is the **log_scale** parameter used for in `displot()`?

**Answer:**  
This parameter sets one or both axes of a `displot()` to a logarithmic scale.

For univariate data, `log_scale=True` puts the data axis on a log scale. For bivariate data it applies to both axes, and a pair of values such as `(True, False)` sets each axis independently. A number is interpreted as the base of the logarithm (10 by default). This is useful for visualizing data that spans several orders of magnitude or that is heavily right-skewed.


---

**Question:**  
What is the **rug** parameter used for in `displot()`?

**Answer:**  
This parameter controls whether a rug plot is added to the plot.

When `rug=True` is set in `displot()`, a rug plot will be added to the plot. The rug plot can be used to show the density of data points and to identify any gaps or outliers in the data. 

---

**Question:**  
What is the use of `jointplot()` in Seaborn?

**Answer:**  
`jointplot()` shows the relationship between two variables together with the distribution of each variable on its own. The figure consists of three plots: the central one shows the joint distribution of x and y (a scatter plot by default), while the two plots along the top and right edges show the marginal distributions of x and y.


---

**Question:**  
What is the `JointGrid()` in Seaborn?

**Answer:**  
In Seaborn, `JointGrid()` is a class that allows you to create a custom joint plot with more flexibility and control than the `jointplot()` function. 

The `JointGrid()` class provides methods for customizing the plot, including `plot_joint()` and `plot_marginals()`.

For example:
```
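import seaborn as sns

# `data` is assumed to be a DataFrame with numeric columns "x" and "y"
grid = sns.JointGrid(data=data, x='x', y='y')
grid.plot_joint(sns.scatterplot)             # central joint axes
grid.plot_marginals(sns.histplot, kde=True)  # top and right marginal axes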
```

---

**Question:**  
What is the use of `pairplot()` in Seaborn?

**Answer:**  
In Seaborn, `pairplot()` is a powerful visualization tool that creates a matrix of **scatter plots** (above and below the diagonal) and **histograms** (on the diagonal) to visualize the pairwise relationships between multiple variables in a dataset. When `hue` is used, the diagonal shows KDE curves by default instead of histograms.

`pairplot()` can be used to quickly visualize the distribution of data, the relationships between variables, and potential patterns or trends in the data. It's particularly useful when you have many variables and want to explore how they are related to each other.

---

**Question:**  
What is the `PairGrid()` in Seaborn?

**Answer:**  
In Seaborn, `PairGrid()` is a class that allows you to create a custom pair plot with more flexibility and control than the `pairplot()` function. 

The `PairGrid()` class provides methods for mapping plotting functions onto the grid, including `map()`, `map_diag()`, `map_offdiag()`, `map_lower()`, and `map_upper()`.

---