{{< include _include_d3.qmd >}}

In [None]:
#| eval: true
#| echo: false
#| output: false

import pandas as pd
df = pd.read_csv('/home/sol-nhl/rnd/d/cca-cce/csv/iris.tsv', sep='\t')

## Frequency diagrams

The frequency diagram, or histogram, shows the distribution of the "sepal_length" variable in the `df` DataFrame. In the example below, we use Seaborn's `histplot` function with 20 equal-width bins and overlay a kernel density estimate (KDE) curve, which gives a smoother view of the distribution. The x-axis covers the range of "sepal_length" values, and the y-axis shows how many observations fall into each bin.

Here's the code snippet to generate the frequency diagram:

In [None]:
#| eval: true
#| echo: true
#| output: true

import seaborn as sns
import matplotlib.pyplot as plt

# Create a frequency diagram for 'sepal_length'
sns.histplot(df['sepal_length'], bins=20, kde=True)

# Add labels and title
plt.xlabel('Sepal Length')
plt.ylabel('Frequency')
plt.title('Frequency Diagram of Sepal Length')

# Show the plot
plt.show()

This visualization allows you to quickly grasp the shape, center, and spread of the "sepal_length" data.
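
To make the binning concrete, the sketch below computes per-bin counts directly with NumPy's `np.histogram`. It is an illustration only: the `sample` values are a small stand-in rather than the contents of `df`, and 4 bins are used instead of 20 to keep the output short.

```python
import numpy as np

# Hypothetical stand-in values for df['sepal_length'] (not read from the document's file)
sample = np.array([5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1])

# Split the data range into 4 equal-width bins and count the values in each bin
counts, edges = np.histogram(sample, bins=4)

# Print each bin's lower and upper edge next to its count
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```

With `df` loaded, passing `df['sepal_length']` and `bins=20` instead should give counts that correspond to the bar heights in the plot.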

## Bar plots

Bar plots are useful for comparing a numerical variable across the levels of a categorical variable. In the example below, we use Seaborn's `barplot` function, which plots the mean of each group by default, to show the average "sepal_length" for each species in the `df` DataFrame. The x-axis lists the species, and the y-axis shows the average "sepal_length" for each one.

Here's the code snippet to generate the bar plot:

In [None]:
#| eval: true
#| echo: true
#| output: true

import seaborn as sns
import matplotlib.pyplot as plt

# Create a barplot for the 'species' column showing the average 'sepal_length'
sns.barplot(x='species', y='sepal_length', data=df, ci=None)

# Add labels and title
plt.xlabel('Species')
plt.ylabel('Average Sepal Length')
plt.title('Average Sepal Length by Species')

# Show the plot
plt.show()

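In this case, the `ci=None` parameter removes the confidence-interval error bars so that only the mean values are shown. In Seaborn 0.12 and later, `ci` is deprecated in favor of `errorbar=None`, so newer versions will emit a warning for this call. The plot provides a quick way to compare the average "sepal_length" across species.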
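
The bar heights are group means, so the same numbers can be checked with a pandas `groupby`. The sketch below is a minimal illustration: `sample` is a hypothetical stand-in that only borrows the column names of `df`, not its data.

```python
import pandas as pd

# Hypothetical stand-in rows with the same column names as df
sample = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
})

# Mean sepal_length per species: the quantity each bar height represents
species_means = sample.groupby("species")["sepal_length"].mean()
print(species_means)
```

Running the same `groupby` on `df` is a quick way to read off exact values that are hard to judge from the bars alone.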

## Scatter plots

Scatter plots are well suited to visualizing the relationship between two numerical variables. In the example below, we use Seaborn's `scatterplot` function to plot "sepal_width" (y-axis) against "sepal_length" (x-axis) from the `df` DataFrame. Each point is colored by its "species", which adds a third, categorical dimension to the plot.

Here's the code snippet to generate the scatter plot:

In [None]:
#| eval: true
#| echo: true
#| output: true

import seaborn as sns
import matplotlib.pyplot as plt

# Create a scatter plot for 'sepal_length' and 'sepal_width' colored by 'species'
sns.scatterplot(x='sepal_length', y='sepal_width', hue='species', data=df)

# Add labels and title
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Scatter Plot of Sepal Dimensions by Species')

# Show the plot
plt.show()

This scatter plot lets you look for patterns between "sepal_length" and "sepal_width", both across the whole dataset and within each species, for example whether the species form separate clusters.
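
A visual impression from a scatter plot can be backed up with a number. The sketch below computes the Pearson correlation between the two columns, once over all rows and once per species. The `sample` DataFrame is a hypothetical stand-in for `df`, so the resulting values say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in rows with the same column names as df
sample = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
})

# Pearson correlation over all rows, ignoring species
overall_r = sample["sepal_length"].corr(sample["sepal_width"])

# The same correlation computed separately within each species
per_species_r = {
    name: group["sepal_length"].corr(group["sepal_width"])
    for name, group in sample.groupby("species")
}

print(f"overall: {overall_r:.2f}")
for name, r in per_species_r.items():
    print(f"{name}: {r:.2f}")
```

Comparing the overall value with the per-species values is worthwhile because a trend across the whole dataset can differ from the trend inside each group.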

## Heatmaps

Heatmaps display a matrix of numbers as a grid of colored cells, which makes them a good fit for visualizing pairwise relationships between numerical variables. In Python, the Seaborn library provides an easy-to-use `heatmap` function for this purpose. For instance, you can create a heatmap of the correlation matrix of the numerical features in the `df` DataFrame. The color of each cell encodes the strength and direction of the correlation, making it easier to spot strongly or weakly correlated pairs of variables.

Here's the code snippet to generate the heatmap:

In [None]:
#| eval: true
#| echo: true
#| output: true

import seaborn as sns
import matplotlib.pyplot as plt

# Drop the 'species' column to only keep numerical columns
numerical_df = df.drop('species', axis=1)

# Calculate the correlation matrix for the numerical columns
correlation_matrix = numerical_df.corr()

# Create a heatmap to visualize the correlation matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')

# Add title
plt.title('Heatmap of Feature Correlations')

# Show the plot
plt.show()

This heatmap makes it easier to understand the relationships between the numerical features. For example, pairs of highly correlated features may be redundant, which is useful to know during feature selection and further data analysis.
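
The strongest relationship in a correlation matrix can also be picked out programmatically. The sketch below is one possible approach, not part of the original example. It selects numeric columns with `select_dtypes` (an alternative to dropping "species" by name) and then scans each pair of columns once. The `sample` DataFrame is a hypothetical stand-in, and the "petal_length" and "petal_width" column names are assumptions about what `df` contains.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for df; the petal_* column names are assumed
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "petal_width": [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep only numeric columns, then compute the Pearson correlation matrix
corr = sample.select_dtypes(include="number").corr()

# Visit each unordered pair of columns once, skipping the diagonal
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}

# The pair with the largest absolute correlation, positive or negative
strongest = max(pairs, key=lambda pair: abs(pairs[pair]))
print(strongest, round(pairs[strongest], 2))
```

Taking the absolute value matters here, since a strong negative correlation is as informative as a strong positive one.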
