##### Data Visualization with Python
---

# 2. Types of Plots

## 2.1. Line Plot (Line Chart)
A line plot, also known as a line chart, displays data as a series of points connected by straight line segments. It's a fundamental chart type used to visualize trends and changes in data over a continuous interval, most commonly *time*. The independent variable (often time) is plotted on the x-axis, and the dependent variable is plotted on the y-axis.

### Suitable Variable Types
* **X-axis (Independent Variable):** Usually a continuous or ordered variable, most often time (e.g., years, months, days). The data can be ordinal, interval, or ratio level. Less commonly, the x-axis represents categories that have a clear, inherent order (e.g., stages of a process), though a bar chart is often the better choice in those cases.
* **Y-axis (Dependent Variable):** Typically a numerical variable (interval or ratio level data). The y-axis shows the value of the variable that is changing in response to the independent variable.

### Use Cases
1. **Showing Trends Over Time (Time Series Data):** This is the most common use case. Examples include:
    * Stock prices over days, months, or years.
    * Temperature fluctuations over a period.
    * Population growth over time.
    * Company revenue or profit over quarters or years.
    * Website traffic over days or weeks.
2. **Visualizing Continuous Changes:** Line plots are effective for showing how a variable changes continuously in response to another, even if the independent variable isn't strictly time. Examples include:
    * The relationship between speed and fuel efficiency of a car.
    * The change in a chemical reaction rate with increasing temperature.
    * The growth of a plant in relation to the amount of sunlight it receives.
3. **Comparing Multiple Series:** Line plots can display multiple lines on the same graph, making it easy to compare trends across different groups or categories.  For example:
    * Comparing the stock prices of several different companies.
    * Tracking the sales of different product lines over time.
    * Comparing the growth rates of different populations.
4. **Highlighting Patterns:** Line plots make patterns visible at a glance: fluctuations, increases, decreases, and rates of change (the slope of each segment).
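The multi-series use case can be sketched with Matplotlib as follows. This is a minimal sketch, not a definitive implementation: the sales figures are hypothetical values invented for illustration, and the names `MONTHS`, `SALES`, `month_over_month_change`, and `plot_sales_comparison` are assumptions introduced here.

```python
# Hypothetical monthly unit sales for two product lines (illustrative values only).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
SALES = {
    "Product A": [120, 135, 150, 160, 172, 190],
    "Product B": [200, 195, 185, 180, 170, 165],
}


def month_over_month_change(values):
    """Return the difference between each pair of consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def plot_sales_comparison(months=MONTHS, sales=SALES):
    """Draw one line per product on shared axes and return the Figure."""
    # Imported inside the function so the helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, values in sales.items():
        # Markers show where the actual observations are.
        ax.plot(months, values, marker="o", label=label)
    ax.set_xlabel("Month")
    ax.set_ylabel("Units sold")
    ax.set_title("Monthly sales by product line (hypothetical data)")
    ax.legend()
    return fig
```

Calling `fig = plot_sales_comparison()` followed by `fig.savefig("sales.png")` writes the chart to disk. The helper `month_over_month_change` returns the per-segment changes that the slopes of the lines represent visually.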

### Potential Pitfalls
1. **Misleading Scales:** The choice of scale on the y-axis can dramatically alter the perception of the trend. A truncated y-axis (not starting at zero) can exaggerate changes, while an overly wide y-axis range can minimize them. It's crucial to choose scales thoughtfully and ethically. Starting at zero is a safe default; if you use a non-zero baseline (for example, because small changes around a large value are what matters), make sure the axis labels make this obvious.
2. **Overplotting (Too Many Lines):** If you plot too many lines on the same graph, it can become cluttered and difficult to interpret.  Consider using separate plots, small multiples, or interactive features (like tooltips or toggles) to handle many series.
3. **Interpolation Issues:** The straight lines connecting data points *imply* a continuous trend, even if the underlying data is only collected at discrete intervals.  Be cautious about interpreting values *between* the plotted points, especially if the data is sparse or the underlying phenomenon isn't truly continuous.
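4. **Ignoring Irregular Intervals:** If the data points are *not* evenly spaced along the x-axis (e.g., irregular time intervals) but are drawn at equal spacing (for instance, because the x values are treated as category labels or row positions), the plot visually distorts the rate of change. Plot against the true x values, add markers so the actual observations are visible, or explicitly indicate the uneven intervals.
5. **Extrapolation:** Extending a trend line beyond the observed range assumes the pattern continues unchanged. Treat any projected segment as a guess and distinguish it visually (e.g., with a dashed line) from the observed data.
6. **Causation vs. Correlation:** When two lines move together, it is easy to assume that one causes the other. A shared trend alone does not establish causation; both series may be driven by a third factor or may coincide by chance.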
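The irregular-interval pitfall can be illustrated with a small sketch. The measurements below are hypothetical and chosen so that the true rate of change is constant; `segment_slopes` and `plot_spacing_comparison` are names assumed here for illustration.

```python
# Hypothetical measurements taken on unevenly spaced days (illustrative values only).
DAYS = [0, 1, 2, 10, 30]
VALUES = [10.0, 12.0, 14.0, 30.0, 70.0]


def segment_slopes(x, y):
    """Rate of change per unit of x between each pair of consecutive points."""
    return [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])]


def plot_spacing_comparison(days=DAYS, values=VALUES):
    """Plot the same data at equal spacing and at true spacing; return the Figure."""
    # Imported inside the function so the helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, (ax_index, ax_true) = plt.subplots(1, 2, figsize=(9, 3.5))

    # Left: points placed at positions 0, 1, 2, ... regardless of the real gaps.
    positions = list(range(len(days)))
    ax_index.plot(positions, values, marker="o")
    ax_index.set_xticks(positions)
    ax_index.set_xticklabels([str(d) for d in days])
    ax_index.set_title("Equal spacing (distorted)")

    # Right: points placed at their true day values.
    ax_true.plot(days, values, marker="o")
    ax_true.set_title("True spacing")

    for ax in (ax_index, ax_true):
        ax.set_xlabel("Day")
        ax.set_ylabel("Value")
    return fig
```

With the true day values, every segment has the same slope (2.0 per day), so the right-hand panel is a straight line. Treating the points as equally spaced makes the last two segments look far steeper, which suggests an acceleration that is not in the data.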

### Example (Conceptual)
Imagine plotting the monthly average temperature in a city over several years. The x-axis would represent time (months and years), and the y-axis would represent temperature. The line plot would clearly show the seasonal temperature cycle, any long-term warming or cooling trends, and potentially any unusual temperature spikes or dips.
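A minimal sketch of this example follows. No real measurements are used: the temperatures are synthetic, generated from a cosine-shaped seasonal cycle plus a small linear trend, and every parameter value (`mean`, `amplitude`, `trend_per_year`) as well as the function names are assumptions made for illustration.

```python
import math


def synthetic_monthly_temperatures(n_years=3, mean=12.0, amplitude=10.0,
                                   trend_per_year=0.2):
    """Generate synthetic monthly averages: seasonal cycle plus linear trend."""
    temps = []
    for month_index in range(n_years * 12):
        # Cosine cycle with a 12-month period; coldest at month index 0.
        seasonal = -amplitude * math.cos(2 * math.pi * month_index / 12)
        trend = trend_per_year * month_index / 12
        temps.append(mean + seasonal + trend)
    return temps


def plot_temperatures(temps):
    """Plot monthly temperatures as a line and return the Figure."""
    # Imported inside the function so the generator above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(range(len(temps)), temps)
    # Label the start of each year instead of every month to avoid clutter.
    year_starts = list(range(0, len(temps), 12))
    ax.set_xticks(year_starts)
    ax.set_xticklabels([f"Year {i + 1}" for i in range(len(year_starts))])
    ax.set_xlabel("Time")
    ax.set_ylabel("Average temperature (°C)")
    ax.set_title("Monthly average temperature (synthetic data)")
    return fig
```

Calling `plot_temperatures(synthetic_monthly_temperatures())` produces a line with a repeating seasonal wave that drifts slightly upward, which is the kind of pattern the paragraph above describes.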

## 2.2. Bar Chart (Bar Graph)

A bar chart, also known as a bar graph, uses rectangular bars to represent the numerical value associated with each category of a categorical variable. The length (or height) of each bar is proportional to the value it represents. Bar charts are excellent for comparing values across different categories or groups.

### Suitable Variable Types
* **X-axis (typically):** Represents the *categories* being compared. This is usually a categorical (nominal or ordinal) variable.
* **Y-axis (typically):** Represents the *numerical value* associated with each category. This is usually a numerical variable (interval or ratio).
* **Note:** It's also possible to have the axes swapped (horizontal bar chart), in which case the variable types would also be swapped.

### Use Cases
1. **Comparing Values Across Categories:** This is the primary use case. Examples include:
    * Comparing sales figures for different product lines.
    * Showing the population of different countries.
    * Displaying the number of students enrolled in different courses.
    * Comparing average income across different professions.
2. **Displaying Frequencies or Counts:** Showing how many items fall into each category.  For example:
    * The number of respondents who selected each answer choice in a survey.
    * The frequency of different types of errors in a system.
3. **Tracking Changes Over Time (Limited Time Points):** While line graphs are generally preferred for time series data, bar charts can be used effectively if you have only a *few* distinct time points. For example, comparing sales figures for Q1, Q2, Q3, and Q4 of a single year. *Avoid* using bar charts for many time points; use a line graph instead.
4. **Ranking:** Displaying ranked values (e.g., top 10 products by sales).
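The frequency and ranking use cases can be combined in one sketch: count how often each category occurs, sort by count, and draw horizontal bars. The error log below is hypothetical, and `ERROR_LOG`, `ranked_counts`, and `plot_error_frequencies` are names assumed for illustration.

```python
from collections import Counter

# Hypothetical log of error types (illustrative values only).
ERROR_LOG = [
    "timeout", "auth", "timeout", "disk", "timeout",
    "auth", "timeout", "disk", "auth", "timeout",
]


def ranked_counts(items):
    """Return (category, count) pairs sorted from most to least frequent."""
    return Counter(items).most_common()


def plot_error_frequencies(items=ERROR_LOG):
    """Draw a ranked horizontal bar chart of category counts; return the Figure."""
    # Imported inside the function so the helper above works without matplotlib.
    import matplotlib.pyplot as plt

    ranked = ranked_counts(items)
    labels = [label for label, _ in ranked]
    counts = [count for _, count in ranked]

    fig, ax = plt.subplots()
    ax.barh(labels, counts)
    ax.invert_yaxis()  # put the most frequent category at the top
    ax.set_xlabel("Number of occurrences")
    ax.set_title("Error frequency by type (hypothetical data)")
    return fig
```

Horizontal bars are used here because category labels stay readable as the number of categories grows; a vertical chart with `ax.bar` would work equally well for three categories.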

### Potential Pitfalls
1. **Misleading Scales:** Similar to line plots, the y-axis scale can significantly impact the visual impression. A truncated y-axis (not starting at zero) can exaggerate differences between bars. Always consider starting the y-axis at zero, especially when representing counts or frequencies.  If you *must* use a truncated axis, clearly indicate this to the viewer.
2. **Too Many Categories:** If you have a very large number of categories, a bar chart can become cluttered and difficult to read. Consider grouping categories, using a horizontal bar chart (which often handles many categories better), or using a different visualization type.
3. **Ordering of Categories:** For nominal categories (no inherent order), consider ordering the bars by value (e.g., descending order of frequency) to make comparisons easier. For ordinal categories, maintain the logical order.
4. **3D Bar Charts:** Avoid 3D bar charts. They add no extra information and often distort the data, making it harder to accurately compare bar lengths.
5. **Overlapping Bars:** Avoid overlapping bars, which can make it difficult to read the values.  Use grouped or stacked bar charts (discussed separately) instead.
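6. **Comparing Groups of Unequal Size:** Be careful when making direct bar-to-bar comparisons between groups with different sample sizes. Raw counts favor larger groups, so consider plotting rates or percentages instead, and keep in mind that averages computed from small samples are less reliable.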
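The effect of a truncated value axis can be quantified with a short sketch. The two values (100 and 105) and the truncated baseline (95) are hypothetical, and `apparent_height_ratio` and `plot_baseline_comparison` are names assumed for illustration.

```python
def apparent_height_ratio(larger, smaller, baseline=0.0):
    """Ratio of the visible bar heights when the value axis starts at `baseline`."""
    if smaller <= baseline:
        raise ValueError("baseline must be below both values")
    return (larger - baseline) / (smaller - baseline)


def plot_baseline_comparison(values=(100, 105), labels=("Group A", "Group B"),
                             truncated_min=95):
    """Draw the same bars with a zero baseline and a truncated one; return the Figure."""
    # Imported inside the function so the helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, (ax_zero, ax_cut) = plt.subplots(1, 2, figsize=(8, 3.5))
    for ax in (ax_zero, ax_cut):
        ax.bar(list(labels), list(values))

    upper = max(values) * 1.05  # small headroom above the tallest bar
    ax_zero.set_ylim(0, upper)
    ax_zero.set_title("Axis starts at 0")
    ax_cut.set_ylim(truncated_min, upper)
    ax_cut.set_title(f"Axis starts at {truncated_min} (exaggerated)")
    return fig
```

With a zero baseline, the taller bar is 1.05 times the height of the shorter one, matching the real 5% difference. With the axis starting at 95, the visible heights are 10 and 5, so the same difference appears as a factor of 2.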

### Example (Conceptual)
Imagine you want to compare the number of students enrolled in different academic departments (e.g., Biology, Chemistry, Physics, Mathematics).  A bar chart would be ideal. The x-axis would list the departments (categories), and the y-axis would represent the number of students (numerical value). Each department would have a bar, and the height of the bar would directly correspond to the enrollment number, making comparisons easy.
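A minimal sketch of this example, assuming made-up enrollment numbers; `ENROLLMENT`, `bar_data`, and `plot_enrollment` are hypothetical names introduced here.

```python
# Hypothetical enrollment per department (illustrative values only).
ENROLLMENT = {
    "Biology": 400,
    "Chemistry": 250,
    "Physics": 150,
    "Mathematics": 200,
}


def bar_data(enrollment):
    """Split a {category: value} mapping into parallel label and value lists."""
    return list(enrollment.keys()), list(enrollment.values())


def plot_enrollment(enrollment=ENROLLMENT):
    """Draw one vertical bar per department and return the Figure."""
    # Imported inside the function so the helper above works without matplotlib.
    import matplotlib.pyplot as plt

    departments, counts = bar_data(enrollment)
    fig, ax = plt.subplots()
    ax.bar(departments, counts)
    ax.set_ylim(bottom=0)  # counts should be read against a zero baseline
    ax.set_xlabel("Department")
    ax.set_ylabel("Number of students")
    ax.set_title("Enrollment by department (hypothetical data)")
    return fig
```

Because departments are nominal categories, sorting the dictionary by value before plotting would make the ranking easier to read.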