# When and How to Use Plots



Plots covered: pie chart, countplot, histogram, heatmap, distplot, pairplot

# countplot

***A countplot is a type of data visualization that is commonly used in data analysis and data science to display the frequency or count of categorical variables. It's a useful tool for understanding the distribution of categorical data and identifying patterns or trends within the data.***

### When to Use a Countplot:


* Frequency Distribution: What is the distribution of categories within a dataset?

* Comparing Categories: How do the frequencies of different categories compare to each other?

* Identifying Dominant Categories: Are there any categories that dominate the dataset?

* Detecting Outliers or Anomalies: Are there any rare or unexpected categories?

### How to Use a Countplot:

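Step 1: Import the required libraries (Pandas, Seaborn, and Matplotlib).

Step 2: Load your data into a Pandas DataFrame.

Step 3: Create the countplot.

syntax: sns.countplot(data=data, x='Major')

Step 4: Add axis labels and a title (optional).

Step 5: Display the plot with plt.show().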

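### Example:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data: one row per student, holding that student's major
data = pd.DataFrame({'Major': ['Engineering', 'Biology', 'Business', 'Engineering',
                               'Business', 'Biology', 'Engineering', 'Business']})

# One bar per major; each bar's height is the number of rows in that category.
# In seaborn >= 0.13, palette without hue triggers a FutureWarning; there you can
# pass hue='Major', legend=False instead for the same colors.
sns.countplot(data=data, x='Major', palette='Set2')
plt.xlabel('Majors')
plt.ylabel('Count')
plt.title('Distribution of Student Majors')
plt.xticks(rotation=45)
plt.show()
```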


### Syntax of countplot:

A simplified version of the signature (common parameters only):

```python
seaborn.countplot(
    data=None,        # DataFrame or other data source (e.g., a Pandas DataFrame).
    x=None,           # Categorical variable on the x-axis.
    y=None,           # Categorical variable on the y-axis (gives horizontal bars).
    hue=None,         # Additional categorical variable for grouping and coloring.
    order=None,       # Order in which to display the categories.
    hue_order=None,   # Order of the hue levels.
    orient=None,      # 'v' (vertical) or 'h' (horizontal) orientation.
    palette=None,     # Color palette to use.
    saturation=0.75,  # Proportion of the original color saturation to use.
    ax=None,          # Matplotlib Axes to draw on.
    **kwargs          # Extra keyword arguments passed to matplotlib's Axes.bar.
)
```


* x: This parameter specifies the categorical variable that you want to display on the x-axis of the countplot. It represents the categories you want to count.

* y: If you provide this parameter, it specifies the variable for the y-axis. However, using only x is more common, and y is typically left as None. If you use y, the countplot will have a horizontal orientation.

* data: This parameter is where you provide the data source, typically in the form of a Pandas DataFrame or any other data structure that Seaborn can work with. It's the dataset that contains the categorical variable you're interested in.

* hue: Use this parameter to introduce an additional categorical variable that will be used to group and color the bars. For example, you might want to show the count of categories for different groups within your data.

* order and hue_order: These parameters allow you to specify the order in which the categories or groups should be displayed. You can use lists of category names to control the order.

* orient: This parameter specifies the orientation of the plot. Use 'v' for a vertical (default) orientation, or 'h' for a horizontal orientation.

* palette: You can set the color palette for the plot. Seaborn provides various built-in color palettes, or you can create your own custom palette.

* saturation: This parameter controls the intensity of the colors in the chosen palette.

* ax: If you want to draw the countplot on a specific Matplotlib Axes object, pass it with this parameter. If ax is not provided, the currently active Axes (plt.gca()) is used.

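* `**kwargs`: Additional keyword arguments are passed to matplotlib's Axes.bar and customize the bars themselves (for example, edgecolor or linewidth). Titles and axis labels are not set here; use plt.title(), plt.xlabel(), and plt.ylabel(), as in the example above.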
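Several of these parameters are usually combined in a single call. The sketch below uses a small hypothetical DataFrame (the 'Year' column and its values are invented for illustration) to show y for horizontal bars, hue and hue_order for grouping, order for a fixed category order, ax for drawing on a chosen Axes, and one `**kwargs` value (edgecolor) forwarded to the bars.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: each row is one student, with a major and a study year
students = pd.DataFrame({
    'Major': ['Engineering', 'Biology', 'Business', 'Engineering',
              'Business', 'Biology', 'Engineering', 'Business'],
    'Year': ['First', 'Second', 'First', 'Second',
             'Second', 'First', 'First', 'Second'],
})

fig, ax = plt.subplots(figsize=(7, 4))
sns.countplot(
    data=students,
    y='Major',                                     # categories on the y-axis -> horizontal bars
    hue='Year',                                    # split each category by study year
    order=['Engineering', 'Business', 'Biology'],  # fixed category order
    hue_order=['First', 'Second'],                 # fixed order of the hue levels
    palette='Set2',
    ax=ax,                                         # draw on this specific Axes
    edgecolor='black',                             # forwarded through **kwargs to the bars
)
ax.set_title('Students per Major and Year')
plt.tight_layout()
plt.show()
```

Because the bars are horizontal, each bar's length (its width in Matplotlib terms) is the count for one major/year combination.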

# Pie chart

A pie chart is a type of data visualization that's commonly used to represent the proportions or percentages of different categories within a dataset. It is a circular chart that's divided into slices or wedges, where each slice represents a category, and the size of each slice corresponds to the proportion or percentage of that category relative to the whole.

### When to Use a Pie Chart:


Pie charts are most useful when you want to display the parts of a whole and show how each part contributes to the whole. Here are some situations where pie charts can be beneficial:

* Showing a Composition: Pie charts are great for illustrating the composition of a single variable, such as the distribution of expenses in a budget, market share of different products, or the breakdown of student majors in a college.

* Comparing Proportions: You can use a pie chart to compare the relative sizes of different categories or components. It's easier to grasp proportions when they are represented visually.

* Highlighting a Dominant Category: If one category dominates the dataset, a pie chart can make it visually obvious.

### How to Use a Pie Chart 

* We first import the necessary libraries, which include Pandas and Matplotlib for data manipulation and visualization.

* We create a dictionary with sample data. In this case, it's the number of students in different majors.

* We create a Pandas DataFrame from the data.

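* Use Matplotlib to create the pie chart. Optionally set the explode parameter to emphasize a particular slice, and use colors to make the chart more visually appealing:

    explode = (0.1, 0, 0, 0)

    colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen']

syntax: plt.pie(expenses, labels=categories, explode=explode, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)

Here expenses holds the numeric slice sizes and categories the matching slice labels. explode needs one entry per slice (four in this case); colors is cycled if it has fewer entries than there are slices.

1. explode pulls the first slice out from the center by 10% of the radius so it stands out (optional).
2. colors specifies the colors for each slice (optional).
3. autopct adds percentage labels to the slices, using the given format string.
4. shadow draws a drop shadow beneath the pie.
5. startangle rotates the start of the pie counterclockwise from the positive x-axis by the given number of degrees.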

* We add a title to the chart using plt.title.

* Finally, we display the chart with plt.show()
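The steps above can be sketched end to end as follows, reusing the student-majors numbers from the example in this section. The fifth color ('plum') is an arbitrary choice added so every slice gets its own color; capturing the three return values of plt.pie is optional but gives access to the slice and label objects.

```python
import matplotlib.pyplot as plt

# Number of students in each major
majors = ['Engineering', 'Computer Science', 'Biology', 'Business', 'Psychology']
students = [150, 120, 80, 60, 40]

explode = (0.1, 0, 0, 0, 0)  # pull only the first slice out, by 10% of the radius
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen', 'plum']

plt.figure(figsize=(7, 7))
# With autopct set, plt.pie returns the wedges, the label texts, and the percentage texts
wedges, texts, autotexts = plt.pie(
    students,
    labels=majors,
    explode=explode,
    colors=colors,
    autopct='%1.1f%%',  # one decimal place followed by a percent sign
    shadow=True,
    startangle=140,
)
plt.title('Distribution of Students by Major')
plt.show()
```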

### Example

```python
import pandas as pd
import matplotlib.pyplot as plt

# Number of students in each major
data = {'Major': ['Engineering', 'Computer Science', 'Biology', 'Business', 'Psychology'],
        'Number of Students': [150, 120, 80, 60, 40]}
df = pd.DataFrame(data)

plt.figure(figsize=(8, 8))
plt.pie(df['Number of Students'], labels=df['Major'], autopct='%1.1f%%', startangle=140)
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.title('Distribution of Students by Major')
plt.show()
```


### Syntax for matplotlib.pyplot.pie:

A simplified version of the signature (common parameters only):

```python
matplotlib.pyplot.pie(
    x,                   # The values to be plotted as slices.
    labels=None,         # Labels for each slice (category names).
    autopct=None,        # Format for displaying percentages.
    colors=None,         # List of colors for the slices.
    explode=None,        # Fraction of the radius by which to offset each slice.
    startangle=0,        # Angle (degrees) at which the first slice starts.
    shadow=False,        # Draw a shadow beneath the pie.
    counterclock=True,   # Lay out slices counterclockwise (default) or clockwise.
    wedgeprops=None,     # Properties for the wedges (e.g., edge color).
    textprops=None,      # Properties for text labels (e.g., font size).
    center=(0, 0),       # Center of the pie chart.
    radius=1,            # Radius of the pie chart.
    frame=False,         # If True, draw the Axes frame along with the chart.
    labeldistance=1.1,   # Distance of labels from the center (None: labels not drawn).
    pctdistance=0.6,     # Distance of percentage labels from the center.
    normalize=True       # If True, normalize x so the slices form a full pie.
)
```


* x: This parameter represents the values to be plotted as slices in the pie chart. These values should typically be numerical and represent the proportions or sizes of the different categories you want to visualize.

* labels: You can provide a list of labels as strings to assign names to each slice in the pie chart. These labels appear as category names next to their respective slices.

* autopct: Use this parameter to format how percentages are displayed on each slice. You can specify a format string (e.g., '%1.1f%%') to control the appearance of the percentages.

* colors: If you want to specify custom colors for each slice, you can provide a list of colors. By default, Matplotlib will use a set of predefined colors.

* explode: To emphasize specific slices, provide a sequence with one value per slice. Each value is the fraction of the radius by which that slice is offset from the center, so entries greater than 0 separate the corresponding slices from the rest of the pie.

* startangle: This parameter determines the angle at which the first slice starts. By default, it's set to 0 degrees, but you can change it to control the rotation of the pie chart.

* shadow: If set to True, a drop shadow is drawn beneath the pie.

* counterclock: By default, the slices are laid out counterclockwise. Set this parameter to False to lay them out clockwise.

* wedgeprops: You can use this parameter to specify properties for the wedges, such as the edge color or other properties related to the individual slices.

* textprops: To customize the text labels, such as font size or style, you can provide properties using this parameter.

* center: This parameter defines the center of the pie chart. By default, it's at coordinates (0, 0), but you can change it if needed.

* radius: The radius parameter sets the radius of the pie chart. The default value is 1, which makes it a unit circle.

* frame: If set to True, it adds a frame around the pie chart.

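* labeldistance: The radial distance at which the slice labels are drawn (default 1.1, i.e. just outside the pie). A larger value moves the labels away from the center. If set to None, the labels are not drawn but are still stored for use in a legend.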

* pctdistance: Use this parameter to set the distance of percentage labels from the center of the pie chart. It's a fraction of the radius.

* title: plt.pie has no title parameter; add a title with plt.title(), as in the example above.

* normalize: If True (the default), the values of x are normalized so that the slices always form a full pie. If False and the values sum to less than 1, a partial pie is drawn; if they sum to more than 1, a ValueError is raised.
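A few of these parameters are easier to understand side by side. The sketch below draws a donut chart, a common trick that sets a width in wedgeprops, and a partial pie using normalize=False. The labels 'A' and 'B' and the fractions 0.25/0.25 are placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt

majors = ['Engineering', 'Computer Science', 'Biology', 'Business', 'Psychology']
students = [150, 120, 80, 60, 40]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 5))

# Left: donut chart. wedgeprops={'width': 0.4} draws each wedge as a ring segment
# 0.4 radius units thick; pctdistance moves the percentages into the ring.
wedges, texts, autotexts = ax1.pie(
    students,
    labels=majors,
    autopct='%1.0f%%',
    pctdistance=0.8,
    labeldistance=1.1,
    counterclock=False,   # lay the slices out clockwise
    startangle=90,        # first slice starts at the top
    wedgeprops={'width': 0.4, 'edgecolor': 'white'},
    textprops={'fontsize': 9},
)
ax1.set_title('Donut chart via wedgeprops')

# Right: with normalize=False, fractions summing to less than 1 give a partial pie
shares = [0.25, 0.25]
partial, _ = ax2.pie(shares, labels=['A', 'B'], normalize=False)
ax2.set_title('Partial pie (normalize=False)')

plt.show()
```

Because the two fractions sum to 0.5, the right-hand pie covers only half of the circle.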

# Histogram

A histogram is a type of data visualization that's used to understand the distribution of a continuous variable. It's a way of summarizing data into different "buckets" or "bins" to show how frequently values fall into each bin. Histograms are useful for visualizing the shape of a dataset and identifying patterns or characteristics of a variable. 

### When to Use a Histogram:


* Explore Data Distribution: Understand how your data is spread out. Are the values clustered around a particular range, or are they spread out evenly?

* Identify Patterns: Detect patterns such as peaks, valleys, and modes in your data. For example, you can see if a test score dataset has more students scoring in one range than another.

* Check for Symmetry or Skewness: Determine if your data is symmetric (evenly distributed) or skewed (lopsided).

* Find Outliers: Identify data points that are significantly different from the rest, which might be errors or indicate unique behavior.

### How to Use a Histogram 

* Import Matplotlib.
* Prepare your data.
* Create the histogram: use the plt.hist() function. Pass your data as the first argument, and customize the number of bins, the color, and other properties as needed.

    syntax: plt.hist(exam_scores, bins=5, color='blue', edgecolor='black')

    1. exam_scores is your data.
    2. bins specifies the number of bins (intervals) to divide your data into.
    3. color sets the color of the bars.
    4. edgecolor sets the color of the borders of the bars.

* Add labels and a title:

    plt.xlabel('Exam Scores')

    plt.ylabel('Number of Students')

    plt.title('Distribution of Exam Scores')

* Display the histogram: plt.show()
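Put together, the steps above look roughly like this. The exam scores are made up for illustration. plt.hist also returns the bin counts, the bin edges, and the bar artists, which is handy for checking what was plotted.

```python
import matplotlib.pyplot as plt

# Made-up exam scores for illustration
exam_scores = [55, 62, 68, 70, 71, 74, 75, 78, 80, 82, 85, 88, 90, 93, 97]

# plt.hist returns the count in each bin, the bin edges, and the bar artists
counts, edges, bars = plt.hist(exam_scores, bins=5, color='blue', edgecolor='black')

plt.xlabel('Exam Scores')
plt.ylabel('Number of Students')
plt.title('Distribution of Exam Scores')
plt.show()
```

With bins=5, the range from the lowest to the highest score is split into five equal-width intervals, so there are six bin edges.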



### Syntax of histogram

A simplified version of the signature (common parameters only):

```python
matplotlib.pyplot.hist(
    x,                       # The data to histogram (list, array, or sequence of data).
    bins=None,               # Number of bins or the specific bin edges (int, sequence, or array).
    range=None,              # The range of values to consider when constructing bins (tuple).
    density=False,           # If True, the histogram represents a probability density function (PDF).
    cumulative=False,        # If True, each bin also counts all smaller values (a CDF when density=True).
    bottom=None,             # The location of the bottom of the bars.
    histtype='bar',          # Type of histogram ('bar', 'barstacked', 'step', 'stepfilled').
    align='mid',             # Alignment of bars relative to the bins ('left', 'mid', 'right').
    orientation='vertical',  # Orientation of the histogram ('vertical' or 'horizontal').
    rwidth=None,             # Width of the bars as a fraction of the bin width.
    log=False,               # If True, the count axis uses a logarithmic scale.
    color=None,              # Color or list of colors for the bars.
    label=None,              # Label for the legend.
    stacked=False,           # If True, multiple datasets are stacked on top of each other.
    **kwargs                 # Additional keyword arguments passed to the bar patches.
)
```


* x: This parameter represents the dataset for which you want to create a histogram. It can be a list, array, or sequence of data values.

* bins: You can specify the number of bins (int) you want to divide your data into. Alternatively, you can provide an array or sequence that defines the bin edges, allowing for custom bin widths and intervals.

* range: The range parameter specifies the lower and upper bounds of the bins as a tuple. It determines the range of values considered when constructing bins. Values outside this range are ignored.

* density: If set to True, the histogram represents a probability density function (PDF) where the area under the histogram equals 1.

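* cumulative: If set to True, each bin shows the count in that bin plus all bins for smaller values, so the last bin holds the total number of data points. If density is also True, the histogram is normalized so the last bin equals 1, i.e. it represents a cumulative distribution function (CDF).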

* bottom: This parameter allows you to specify the location of the bottom of the bars. It is useful when creating stacked histograms.

* histtype: You can specify the type of histogram to create. Options include 'bar' (default), 'barstacked', 'step', and 'stepfilled'.

* align: Specifies how the bins and bars are aligned. Options include 'left', 'mid' (default), and 'right'.

* orientation: Determines the orientation of the histogram. Set it to 'vertical' for a vertical histogram (default) or 'horizontal' for a horizontal histogram.

* rwidth: If specified, it sets the width of the bars as a fraction of the bin width.

* log: If set to True, the y-axis of the histogram will be displayed in logarithmic scale.

* color: You can specify the color or a list of colors for the bars.

* label: Use this parameter to provide a label for the histogram, which will be displayed in the legend if you create one.

* stacked: If set to True, it stacks the bars on top of each other, useful for comparing the distribution of multiple datasets.

* `**kwargs`: Additional keyword arguments are applied to the bar patches (Matplotlib Patch properties such as edgecolor, linewidth, alpha, or hatch). Axis labels and the title are set separately with plt.xlabel(), plt.ylabel(), and plt.title().
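The sketch below exercises several of these parameters at once: explicit bin edges, density, cumulative, histtype, and label for comparing two datasets. The two groups are synthetic normal samples from a seeded NumPy generator, chosen only so the example is reproducible; values outside the 30 to 100 bin range would simply be ignored.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
group_a = rng.normal(loc=60, scale=10, size=200)  # synthetic scores, group A
group_b = rng.normal(loc=70, scale=8, size=200)   # synthetic scores, group B

edges = np.arange(30, 101, 10)  # explicit bin edges: 30, 40, ..., 100

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Left: two datasets drawn as outlines, each normalized so its area is 1
dens, _, _ = ax1.hist([group_a, group_b], bins=edges, density=True,
                      histtype='step', label=['Group A', 'Group B'])
ax1.legend()
ax1.set_title('density=True, histtype="step"')

# Right: cumulative histogram; with density=True the last bin reaches 1
cum, _, _ = ax2.hist(group_a, bins=edges, density=True, cumulative=True,
                     color='skyblue', edgecolor='black')
ax2.set_title('cumulative=True')

plt.show()
```

When several datasets are passed as a list, hist returns one array of bin values per dataset, which is why dens holds two arrays here.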

### Example

```python
import matplotlib.pyplot as plt

data = [35, 42, 38, 45, 50, 55, 63, 70, 78, 90, 100]
plt.hist(data, bins=5, color='skyblue', edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram of Sample Data')
plt.show()
```

# Heatmaps


A heatmap is a popular data visualization tool used to represent data in a tabular format by using colors to encode the values of the individual cells. Heatmaps are particularly useful when you want to visualize relationships and patterns in a dataset, especially when working with two-dimensional data like a correlation matrix or a grid of values.

### When to Use a Heatmap:



* Visualizing Relationships: Heatmaps are used when you want to visualize the relationships and patterns in a dataset, especially when dealing with two-dimensional data. They are commonly used to explore correlations, dependencies, and interactions between variables.

* Correlation Analysis: Heatmaps are an effective way to visualize the correlation matrix of a dataset. They help in identifying which variables are positively, negatively, or weakly correlated.

* Data Comparison: Heatmaps allow you to compare data across multiple categories or dimensions. For example, you can use them to visualize sales data by product category and time period.

* Highlighting Differences: Heatmaps can highlight differences, outliers, and anomalies in the data. They make it easy to spot values that stand out in a grid of numbers.

### How to Create a Heatmap:



* Import the necessary libraries.
* Load a sample dataset and compute its correlation matrix with DataFrame.corr(), using only the numeric columns.
* Create a heatmap using sns.heatmap. Provide the correlation matrix as the data, set annot to True to display the values in each cell, and specify a color map (cmap) for the visualization.
* Customize the title of the heatmap using plt.title.
* Display the heatmap using plt.show().



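### Example

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data: the iris dataset (load_dataset downloads it, so it needs an internet connection)
data = sns.load_dataset("iris")

# Correlation matrix of the numeric columns only; the text column 'species'
# makes DataFrame.corr() raise an error in pandas 2.0 and later
corr_matrix = data.select_dtypes(include="number").corr()

# Create a heatmap with the correlation value written in each cell
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")

# Customize the title
plt.title("Correlation Heatmap")

# Show the heatmap
plt.show()
```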


### Syntax

A simplified version of the signature (common parameters only):

```python
seaborn.heatmap(
    data,                   # The 2D data to be visualized.
    annot=None,             # If True, display the data values in each cell.
    fmt='.2g',              # Format specification for annot values.
    cmap=None,              # Colormap; if omitted, the default depends on whether center is set.
    center=None,            # The value in data that corresponds to the center of the colormap.
    linewidths=0,           # Width of the lines separating cells.
    linecolor='white',      # Color of the lines separating cells.
    cbar=True,              # Whether to display a colorbar.
    cbar_kws=None,          # Keyword arguments for configuring the colorbar.
    cbar_ax=None,           # Matplotlib Axes object to draw the colorbar.
    square=False,           # If True, make each cell square-shaped.
    xticklabels='auto',     # Whether and how to display x-axis tick labels.
    yticklabels='auto',     # Whether and how to display y-axis tick labels.
    mask=None,              # A mask for cells to hide.
    ax=None                 # Matplotlib Axes to draw the heatmap.
)
```


* data: This is the 2D dataset you want to visualize as a heatmap. It's typically a DataFrame, an array, or any data structure that can be represented as a grid.


* annot: If set to True, it displays the actual data values in each cell of the heatmap. You can also provide a 2D array of the same shape as data to specify the values manually.
    

* fmt: This parameter specifies the format of the annotation values. It uses Python's string formatting syntax. For example, '.2g' displays values with two significant figures.


* cmap: The colormap defines the color palette used for the heatmap. You can choose from various predefined colormaps, such as 'viridis', 'coolwarm', or 'Blues', or create a custom colormap. If cmap is not provided, seaborn's default depends on whether center is set.
    

* center: If you want to set a specific value in your data as the midpoint of the colormap, you can use this parameter.
    

* linewidths: It controls the width of the lines separating the cells in the heatmap. Setting it to a non-zero value adds lines to separate cells visually.
    

* linecolor: You can specify the color of the lines that separate the cells in the heatmap.

* cbar: If True, a colorbar is displayed next to the heatmap, showing the color-to-value mapping. You can set it to False if you don't want a colorbar.


* cbar_kws: This parameter allows you to pass keyword arguments to configure the colorbar, such as changing the label or adjusting its position.


* cbar_ax: You can specify a Matplotlib Axes object where the colorbar will be drawn.
    

* square: Setting this parameter to True sets the Axes aspect to "equal", so each cell is square-shaped.
    

* xticklabels and yticklabels: You can control the visibility of the x-axis and y-axis tick labels. Set to 'auto' to automatically determine if they should be displayed or use a list of labels to customize them.


* mask: If you want to hide specific cells in the heatmap, you can provide a Boolean mask of the same shape as your data.


* ax: This is the Matplotlib Axes object where the heatmap will be drawn. If not provided, the currently active Axes is used.
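A common use of mask with a correlation matrix is to hide the upper triangle, since it mirrors the lower one. The sketch below combines mask with center, fmt, linewidths, square, and cbar_kws. To avoid a download, it uses synthetic data with hypothetical column names 'a' to 'd'; column 'b' is built from 'a' so that at least one strong correlation shows up.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic numeric data with hypothetical column names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=['a', 'b', 'c', 'd'])
df['b'] += df['a']  # make 'a' and 'b' correlated
corr = df.corr()

# True = hide: the upper triangle including the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt='.2f',           # two decimal places in each cell
    cmap='coolwarm',
    center=0,            # 0 maps to the middle color of the diverging colormap
    linewidths=0.5,
    linecolor='white',
    square=True,
    cbar_kws={'label': 'Correlation'},
    ax=ax,
)
ax.set_title('Lower-triangle correlation heatmap')
plt.show()
```

Setting center=0 with a diverging colormap makes positive and negative correlations visually symmetric.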

# distplot 

A distplot, short for "distribution plot," is a data visualization tool used to explore the distribution of a single variable in a dataset. It combines a histogram with a kernel density estimate (KDE) and can optionally add a rug plot. It's a helpful way to understand the shape and characteristics of the data. Note that seaborn has deprecated distplot since version 0.11; newer code should use sns.histplot (with kde=True) or sns.displot instead.

### When to Use a Distplot:



* Exploring Data Distribution: Use a distplot when you want to understand how the values of a single variable are distributed. This is particularly useful for continuous variables like age, income, or test scores.

* Identifying Patterns: A distplot can help you identify patterns in the data, such as whether it's symmetric (evenly distributed), skewed (lopsided), or has multiple peaks (modes).

* Data Quality Check: It's a valuable tool in data exploration to check for outliers or unusual data points that may need further investigation.

* Choosing Analysis Techniques: When you're preparing for statistical analysis, a distplot can provide insight into the data's characteristics and help you decide which analysis techniques are appropriate.

### How to Use a Distplot:



* Import the necessary libraries, Seaborn and Matplotlib.
* Provide a sample dataset in the data list.
* Create a distplot using sns.distplot. Specify the data and the number of bins, and customize its appearance, such as setting the color and displaying a histogram, a KDE curve, and a rug plot.
* Customize the labels for the x and y axes and add a title for clarity.
* Display the distplot using plt.show().

The resulting plot combines a histogram, a KDE curve, and a rug plot, so you can spot patterns, identify outliers, and assess the quality of the dataset in a single view.

### Syntax for Distplot:

```python
seaborn.distplot(
    a,                  # The dataset or array of data to be visualized.
    bins=None,          # Number of bins for the histogram or a specific bin specification.
    hist=True,          # Whether to display a histogram (True by default).
    kde=True,           # Whether to display a kernel density estimate (KDE) curve (True by default).
    rug=False,          # Whether to display a rug plot along the data axis (False by default).
    fit=None,           # A parametric distribution to fit to the data (None by default).
    hist_kws=None,      # Additional keyword arguments for customizing the histogram.
    kde_kws=None,       # Additional keyword arguments for customizing the KDE curve.
    rug_kws=None,       # Additional keyword arguments for customizing the rug plot.
    fit_kws=None,       # Additional keyword arguments for customizing the fitted distribution curve.
    color=None,         # Color of the plot.
    vertical=False,     # If True, the observed values are placed on the y-axis (False by default).
    axlabel=None,       # Label for the x-axis.
    label=None,         # Label for the plot.
    ax=None             # Matplotlib Axes object to draw the plot.
)
```


Let's break down each parameter in the syntax:

* a: This is the dataset or array of data you want to visualize in the distplot. It is typically a one-dimensional sequence or array, like a list, NumPy array, or Pandas Series.

* bins: You can specify the number of bins for the histogram. Alternatively, you can provide a specific bin specification, such as a list of bin edges or an integer.

* hist: If set to True (which is the default), a histogram is displayed. You can set it to False to omit the histogram.

* kde: If set to True (default), a kernel density estimate (KDE) curve is displayed. Set it to False to exclude the KDE.

* rug: You can set this parameter to True to display a rug plot along the x-axis, which marks the data points. It's False by default.

* fit: This parameter lets you fit a parametric distribution to the data, such as scipy.stats.norm (an object with a fit method whose result can be passed to its pdf method). By default it's None, meaning no distribution is fitted.

* hist_kws, kde_kws, rug_kws, fit_kws: These are dictionaries of additional keyword arguments for customizing the appearance of the histogram, KDE curve, rug plot, and fitted distribution curve, respectively.

* color: You can specify the color of the plot. It accepts a variety of color specifications, such as names, RGB tuples, or hex codes.

* vertical: If set to True, the observed values are placed on the y-axis (so the histogram bars run horizontally); by default (False) they are placed on the x-axis.

* axlabel: This is a label for the x-axis. It's used to provide a meaningful description for the variable you're visualizing.

* label: You can set a label for the plot, which is useful when creating legends for multiple plots in the same figure.

* ax: If you have an existing Matplotlib Axes object, you can pass it to this parameter to draw the distplot on that Axes.

### Example

```python
import seaborn as sns
import matplotlib.pyplot as plt

data = [85, 92, 78, 90, 88, 76, 94, 89, 82, 93, 87, 91, 70, 83, 96]

# Create a distplot (deprecated since seaborn 0.11, so recent versions emit a warning)
sns.distplot(data, bins=5, hist=True, kde=True, rug=True, color='skyblue')

# Customize labels and title
plt.xlabel('Test Scores')
plt.ylabel('Density')
plt.title('Distribution Plot of Test Scores')

# Show the distplot
plt.show()
```
