##### _Data Visualization with Python_
---

## 2. Types of Plots

1. **Line Plot (Line Chart)**
    * Displays data as a series of points connected by straight lines. The independent variable is usually on the x-axis and the dependent variable on the y-axis.
    * **Uses:**
        * **Showing trends over time:** Ideal for time series data (e.g., stock prices, temperature changes, population growth).
        * **Comparing datasets with a continuous independent variable:** For example, showing how a response variable changes with age.
        * **Illustrating cause-and-effect relationships:** For example, how sales revenue changes with the marketing budget.
        * **Visualizing continuous data:** Such as a person's height measured over time.
    * **Potential Pitfalls:**
        * **Misleading scales:** If the axis scales aren't chosen carefully, the plot can exaggerate or minimize trends. _Always consider starting the y-axis at zero unless there's a strong reason not to_.
    * **Example:** A line plot showing a company's revenue over the past 10 years. You'd easily see if revenue is generally increasing, decreasing, or fluctuating.
2. **Bar Plot (Bar Chart)**
    * Uses rectangular bars to represent the data. The height (for vertical bars) or length (for horizontal bars) of each bar corresponds to the magnitude of the data.
    * **Types:**
        * **Vertical bar chart (column chart):** Bars are oriented vertically.
        * **Horizontal bar chart:** Bars are oriented horizontally.
    * **Uses:**
        * **Comparing categories or groups:** Excellent for comparing discrete data (e.g., sales by region, population by country, number of students in different majors).
        * **Showing contributions to a whole:** Like a pie chart, but individual categories are often easier to compare.
        * **Visualizing rankings:** Displaying the top-selling products, most popular movies, etc.
    * **Potential Pitfalls:**
        * **Truncated or inconsistent scales:** As with line plots, starting the value axis at something other than zero can distort the visual comparison, because the length of each bar is what encodes its magnitude.
    * **Example:** A bar chart comparing the sales of different product categories. You could easily see which category generates the most revenue.
3. **Scatter Plot**
    * Displays values for two variables as a collection of points. Each point's position is determined by its values on the horizontal (x) and vertical (y) axes.
    * **Uses:**
        * **Examining relationships between two continuous variables:** Is there a correlation? Is it positive, negative, or absent? (e.g., height vs. weight, advertising spending vs. sales).
        * **Investigating patterns and trends:** Looking for clusters, linear relationships, or other patterns.
        * **Detecting outliers:** Identifying unusual data points that deviate significantly from the overall pattern.
        * **Visualizing many observations at once:** With each observation drawn as a point, dense regions and clusters become visible.
    * **Example:** A scatter plot of study hours vs. exam scores. You might see a positive correlation, but there might also be outliers (students who studied a lot but did poorly, or vice versa).
4. **Box Plot (Box and Whisker Plot)**
    * Displays the distribution of a dataset, showing key statistical measures.
        * **Box:** Represents the interquartile range (IQR), which contains the middle 50% of the data.
        * **Line inside the box:** Represents the median (the middle value).
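        * **Whiskers:** Extend from the box to show the range of the data (excluding outliers). A common convention ends each whisker at the most extreme data point within 1.5 times the IQR of the box.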
        * **Outliers:** Often plotted as individual points beyond the whiskers.
    * **Uses:**
        * **Comparing distributions across categories:** Comparing the salaries of employees in different departments, the test scores of students in different schools, etc.
        * **Examining the spread and skewness:** How spread out is the data? Is it symmetrical or skewed to one side?
        * **Visualizing quartiles and outliers:** Quickly see the median, IQR, and potential outliers.
        * **Comparing multiple variables:** Distributions of multiple variables can be compared side by side.
    * **Potential Pitfalls:**
        * **Outliers:** As with scatter plots, a few extreme points can significantly affect the interpretation. Check whether they are data errors or genuine values before drawing conclusions.
    * **Example:** A box plot comparing exam scores across different classes. You could see which class has the higher median and which has the wider spread.
5. **Histogram**
    * Shows the distribution of a _single_ numerical variable. It divides the data into intervals (bins) and displays the frequency (count) or relative frequency (proportion) of data points within each bin using bars.
    * **Uses:**
        * **Understanding data distribution:** Is the data symmetrical, skewed, bimodal (having two peaks), or uniform?
        * **Assessing skewness:** Is the data skewed to the left (long tail on the left) or right (long tail on the right)?
        * **Identifying outliers (potentially):** Very sparsely populated bins at the extremes might indicate outliers.
        * **Showcasing data variability:** Observe concentrations, gaps, and clusters.
    * **Potential Pitfalls:**
        * **Bin choice:** The number and width of bins can significantly affect the appearance of the histogram. Too few bins can oversimplify the distribution; too many can make it look noisy. There's no single _right_ number of bins; it depends on the data.
        * **Scale and labeling:** Ensure clear labels for the axes and appropriate scales.
    * **Example:** A histogram of the ages of customers in a database. You could see if the customer base is mostly young, old, or evenly distributed.
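
The box plot and histogram descriptions above come down to a small amount of arithmetic, which can be sketched without committing to a particular plotting library. The following is a minimal illustration using only Python's standard library, not a reference implementation. The function names `box_plot_summary` and `histogram_counts` and the exam scores are made up for this example. It assumes the common 1.5 times IQR rule for whiskers and the `"inclusive"` quartile method; plotting libraries may define quartiles slightly differently.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot draws (hypothetical helper).

    Assumes the common convention that whiskers end at the most extreme
    data points within 1.5 * IQR of the box; anything beyond is an outlier.
    """
    data = sorted(values)
    # Quartiles: Q1, median, Q3. The "inclusive" method is an assumption;
    # other quartile definitions give slightly different values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    if width == 0:
        # All values identical: everything falls into the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up exam scores, for illustration only.
scores = [52, 61, 64, 67, 70, 72, 75, 78, 81, 99]

summary = box_plot_summary(scores)
print(summary)

# The same data looks different depending on the number of bins.
for bins in (2, 5, 10):
    print(bins, histogram_counts(scores, bins))
```

On this sample, the score of 99 lies beyond the upper fence and is reported as an outlier, so the upper whisker ends at 81. The bin counts show the pitfall described for histograms: two bins give `[7, 3]` and hide most of the shape, while ten bins leave several bins nearly or completely empty.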