# Tutorial 3: Analyzing and Visualizing Data</font> <a id='home'></a>

Welcome to this tutorial, where you'll learn to analyze and visualize your Excel data.

Excel isn’t just a tool for basic data entry: it’s a powerful platform for calculations, data-entry restrictions, pivot tables, and visual analytics. Whether you’re optimizing budgets for ad campaigns, examining audience engagement metrics, or preparing sales reports, this notebook will guide you through hands-on examples. You’ll build essential skills to efficiently analyze and present your data.

1. [Calculate using functions](#calculate) ([quizzes](#quiz1-3) #1 #2 #3)
2. [Restrict data entry](#restrict) ([quizzes](#quiz4-6) #4 #5 #6)
3. [Create pivot tables](#pivot) ([quizzes](#quiz7-9) #7 #8 #9)
4. [Generate charts](#charts) ([quizzes](#quiz10-12) #10 #11 #12)
5. [Practice data processing](#practice) ([quizzes](#quiz13-15) #13 #14 #15)
6. [Evaluate yourself!](#evaluate) ([quizzes](#quiz16-18) #16 #17 #18)


This tutorial lets you interact by:
- making changes in cells for calculations and restrictions,
- taking quizzes to test your understanding, and
- applying examples directly in your own spreadsheet.



## Calculate using functions <a id="calculate"></a> ([top](#home))

In the first section of this tutorial, we'll learn to use functions for essential calculations, such as sum totals, averages, and complex conditional calculations. The data is the same as that used at the end of the previous tutorial, describing online advertising campaigns for a music streaming platform. The file contains data on 74 online advertising campaigns with the following variables:
- **Campaign_ID:** Unique identifier for each campaign
- **Campaign_Name:** Name of each campaign
- **Genre:** Genre targeted by the campaign
- **Target_Audience:** Primary audience
- **Platform:** Platform where the ad was run
- **Month:** Month the ad was run
- **Impressions:** Number of times the ad was shown
- **Clicks:** Number of clicks the ad received
- **Click_Through_Rate (%):** Percentage of clicks relative to impressions, calculated as: $$\frac{\text{Clicks}}{\text{Impressions}} \times 100$$
- **Ad_Spend (USD):** Amount spent on each campaign
- **Conversion_Rate (%):** Estimated percentage of clicks that led to subscriptions
- **Engagement_Score:** Custom score for engagement level based on impressions, clicks, and conversions (e.g., a weighted score calculated based on clicks and conversion rate).
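The following minimal Python sketch shows the CTR formula at work by recomputing it from *Clicks* and *Impressions*. It uses a small invented DataFrame with the same column names as the campaign file, so the numbers are illustrative only. With the real data loaded as `df`, the same expression applies to the whole column.

```python
import pandas as pd

# Hypothetical sample rows using the campaign file's column names (values invented)
sample = pd.DataFrame({
    'Impressions': [10000, 25000, 8000],
    'Clicks': [500, 750, 400],
})

# CTR (%) = Clicks / Impressions * 100, computed for every row at once
# (multiplying first keeps the arithmetic exact for these sample values)
sample['Click_Through_Rate (%)'] = sample['Clicks'] * 100 / sample['Impressions']
print(sample)
```

With the real file, you can run a quick consistency check by comparing `df['Clicks'] * 100 / df['Impressions']` with the stored `Click_Through_Rate (%)` column. Keep in mind that the stored values may be rounded.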

Download the data file: [Tutorial2xls4-adcampaigns.xlsx](https://github.com/matgithub-picardy/ExcelPythonCourse/raw/refs/heads/main/Tutorial_Excel_3/Tutorial2xls4-adcampaigns_v2.xlsx).

We'll use the campaign data to calculate meaningful indicators and identify the best-performing campaigns. Let's get started!
 
 
 ### Calculating total impressions for the quarter: Function `SUM`

To analyze the reach of all campaigns, let’s calculate the total number of *Impressions* generated across campaigns. This will help us understand the potential audience reach.


To create a sum in Excel, follow these steps:
- Click on the cell where you want the sum to appear (e.g., `G76`, just below the *Impressions* column).
- Type `=SUM(` to begin the formula.
- Click on the starting cell (`G2`), then hold down the Shift key and click on the ending cell (`G75`). This selects the range `G2:G75`.
- Press Enter to complete the formula.

You should now see the total number of *Impressions* across all campaigns. You may have noticed that you don't even need to type the closing parenthesis: Excel adds it for you. Alternatively, instead of using the Shift key, you can hold down the left mouse button and drag to select the cells from `G2` to `G75`.

The video below shows how to use the function `SUM` in Excel (source: YouTube channel "Microsoft 365"):

<div style="margin-left: auto; margin-right: auto; width: 50%">
    <figure style="margin: 10px;">
        <video controls src="Tutorial_Excel_3/Use_SUM_function_Excel.mp4" width="500" />
        <figcaption style="text-align: center;">Microsoft Excel</figcaption>
    </figure>
</div>

You can also access the `SUM` function through the tabs and menus. However, this is less practical, and finding a function there can be tedious. In what follows, we'll learn new functions as we need them and type them directly into a cell. As you type, Excel suggests matching function names and then displays the arguments to enter. This approach is simpler and faster.


### Calculating the average Click-Through Rate (CTR): Function `AVERAGE`

To understand the engagement level of the campaigns, let’s calculate the average Click-Through Rate (%):
1. Select the cell where you want the result (e.g., `I76`).
2. Type `=AVERAGE(`
3. Select the range of CTR values (`I2:I75`).
4. Press Enter.

This gives you the average CTR across all campaigns, which you can use as a benchmark for individual campaign performance. It is good practice to label each result in an adjacent cell (e.g., "Total impressions" or "Avg. CTR").


### Calculating bonus payouts for top campaigns: Function `IF`

The music streaming service wants to reward high-performing campaigns with a bonus. If a campaign’s Engagement Score is above 70, it receives a &#36;500 bonus; otherwise, it receives no bonus:
1. Select the cell for the bonus calculation (e.g., `M2`).
2. Type `=IF(`. Note that a help box appears to remind you of the function's syntax.
3. Set up the formula as follows. Separate the arguments with a comma in Excel or a semicolon in LibreOffice; the expected separator can also depend on your regional settings.
   - **Logical Test**: `L2 > 70` (L2 being the Engagement Score)
   - **Value if True**: `500`
   - **Value if False**: `0`
4. Press Enter. In Excel, the cell now contains `=IF(L2>70, 500, 0)`. Then drag the formula down to apply it to all campaigns.

<div style="display: flex; justify-content: center; align-items: center;">
    <figure style="margin: 10px;">
        <img src="Tutorial_Excel_3/function_excel_1_XL.jpeg" alt="IF function in Microsoft Excel" width="1000"/>
        <figcaption style="text-align: center;">Microsoft Excel</figcaption>
    </figure>
    <figure style="margin: 10px;">
        <img src="Tutorial_Excel_3/function_excel_1_LO.jpeg" alt="IF function in LibreOffice" width="1000"/>
        <figcaption style="text-align: center;">LibreOffice</figcaption>
    </figure>
</div>


### Campaign metrics summary

At this point, you can use various Excel functions to create a summary of key metrics from the ad campaigns data.

1. **Total ad spend:** Use the `SUM` function on the Ad Spend (USD) column.
2. **Highest conversion rate:** Use the `MAX` function on the Conversion Rate (\%) column to find the best conversion rate achieved by any campaign.
3. **Count of high-engagement campaigns:** Use the `COUNTIF` function to count campaigns with an Engagement Score above 70, e.g., `=COUNTIF(range, ">70")`, where `range` is the Engagement Score column.

Summarize these key metrics in a dedicated section at the bottom of the campaign data, creating a quick-reference dashboard for future analysis.


### Check your results with Python
We can use Python to check the results obtained in Excel. First, let's load your data from the Excel file. Here's a sample setup to import the data and display the first few rows.

In [None]:
import pandas as pd

# Load the data from Excel
df = pd.read_excel('Tutorial_Excel_3/Tutorial2xls4-adcampaigns_v2.xlsx')

# Display the first few rows
df.head()

### Python equivalent for `SUM`, `AVERAGE`, and `IF`
Below, we calculate:
- the total impressions across all campaigns
- the average click-through rate (CTR)

We also create a new column that assigns a &#36;500 bonus to campaigns with an Engagement Score above 70, and 0 otherwise.

In [None]:
# Calculate the total number of impressions
total_impressions = df['Impressions'].sum()
print(f"Total Impressions: {total_impressions}")

# Calculate the average click-through rate
average_ctr = df['Click_Through_Rate (%)'].mean()
print(f"Average Click-Through Rate (%): {average_ctr:.2f}")

# Apply the IF logic using a lambda function
df['Bonus'] = df['Engagement_Score'].apply(lambda x: 500 if x > 70 else 0)

# Display the updated dataframe with the Bonus column
df[['Campaign_Name', 'Engagement_Score', 'Bonus']].head()

### Code explanation
- `total_impressions = df['Impressions'].sum()`: We calculate the total number of impressions across all campaigns by summing the values in the *Impressions* column of `df`.
- `average_ctr = df['Click_Through_Rate (%)'].mean()`: We calculate the average click-through rate by taking the mean of the *Click_Through_Rate (%)* column.
- `{average_ctr:.2f}` formats the output to two decimal places to keep it clean and concise.
- `df['Bonus'] = df['Engagement_Score'].apply(lambda x: 500 if x > 70 else 0)`: This line uses a lambda function to create a new column, "Bonus", based on a condition applied to the "Engagement_Score" column. The lambda function checks whether each Engagement_Score is above 70. If so, it assigns a value of 500; otherwise, it assigns 0. This conditional assignment is similar to the `IF` function in Excel.
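`apply` with a lambda calls the function once per row. A common alternative is NumPy's `np.where(condition, value_if_true, value_if_false)`, sketched below on invented scores. Its three arguments map directly onto Excel's `IF(logical_test, value_if_true, value_if_false)`, and it evaluates the whole column at once. Both approaches give the same result here; the vectorized one is usually faster on large tables.

```python
import numpy as np
import pandas as pd

# Hypothetical engagement scores (invented values) to illustrate the technique
scores = pd.DataFrame({
    'Campaign_Name': ['Campaign 1', 'Campaign 2', 'Campaign 3'],
    'Engagement_Score': [85.0, 70.0, 42.5],
})

# np.where(test, value_if_true, value_if_false) mirrors Excel's IF(test, true, false)
# and is applied to the whole column in one call instead of row by row
scores['Bonus'] = np.where(scores['Engagement_Score'] > 70, 500, 0)
print(scores)
```

A score of exactly 70 gets no bonus, because the condition is strictly "above 70", as in the Excel formula.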

### Additional example: Filtering top campaigns
To keep only the high-engagement campaigns (those with an Engagement Score above 70):

In [None]:
# Filter high-engagement campaigns
high_engagement_campaigns = df[df['Engagement_Score'] > 70]
print(high_engagement_campaigns[['Campaign_Name', 'Engagement_Score']])

This snippet will output a list of campaigns with an Engagement Score over 70, helping to identify top performers.
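The filter above keeps the campaigns in their original order. To rank the top performers, you can sort the filtered rows or use `nlargest`, as in this sketch. The campaign names and scores are invented for illustration.

```python
import pandas as pd

# Hypothetical campaigns (invented values), using the tutorial's column names
campaigns = pd.DataFrame({
    'Campaign_Name': ['Spring Pop', 'Jazz Nights', 'Rock Revival', 'Indie Wave'],
    'Engagement_Score': [72.0, 91.5, 64.0, 88.0],
})

# Keep campaigns above the threshold, then rank them from highest to lowest score
top = (campaigns[campaigns['Engagement_Score'] > 70]
       .sort_values('Engagement_Score', ascending=False))
print(top[['Campaign_Name', 'Engagement_Score']])

# nlargest(n, column) returns the n rows with the highest values in that column
top_two = campaigns.nlargest(2, 'Engagement_Score')
print(top_two)
```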

### Final Python code
Here's a comprehensive example that calculates key metrics from your ad campaign dataset, similar to the campaign metrics summary you built in Excel.

In [None]:
# Calculate summary metrics
summary_metrics = {
    "Total ad spend (USD)": df['Ad_Spend (USD)'].sum(),
    "Average CTR (%)": df['Click_Through_Rate (%)'].mean(),
    "Highest conversion rate (%)": df['Conversion_Rate (%)'].max(),
    "High engagement campaigns count": df[df['Engagement_Score'] > 70].shape[0]
}

# Display the summary metrics
for metric, value in summary_metrics.items():
    print(f"{metric}: {value}")


This code generates key summary metrics for your ad campaigns, similar to the dashboard you would create in Excel. You can extend it with other metrics, such as those calculated earlier in this section.
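`MAX` in Excel and `.max()` in pandas return the highest conversion rate, but not the campaign that achieved it. The sketch below uses `idxmax()` to find the row holding the maximum and then reads its campaign name. It runs on an invented three-row sample, where `df_sample` stands in for the real `df`. In Excel, this kind of lookup is typically done by combining `INDEX` and `MATCH` with `MAX`.

```python
import pandas as pd

# Invented sample with the tutorial's column names, standing in for the real file
df_sample = pd.DataFrame({
    'Campaign_Name': ['Summer Beats', 'Classical Focus', 'Hip-Hop Drop'],
    'Conversion_Rate (%)': [3.2, 5.8, 4.1],
})

# idxmax() returns the index label of the (first) row holding the maximum value
best_row = df_sample['Conversion_Rate (%)'].idxmax()
best_campaign = df_sample.loc[best_row, 'Campaign_Name']
best_rate = df_sample.loc[best_row, 'Conversion_Rate (%)']
print(f"Best conversion rate: {best_campaign} ({best_rate}%)")
```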

### Quiz #1 <a id="quiz1-3"></a> ([top](#home))
**You want to calculate the total ad spend across all campaigns. Which function would you use?**

1. SUM
2. AVERAGE
3. MIN
4. COUNT


<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 1. SUM

</details>

### Quiz #2
**To understand campaign effectiveness, you want to find the average Conversion Rate (%) across all campaigns. Which function should you use?**

1. MAX
2. AVERAGE
3. COUNTIF
4. MIN

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 2. AVERAGE

</details>

### Quiz #3
**You want to mark campaigns as "High Engagement" if the Clicks are above 5,000, otherwise "Low Engagement." Which formula would work?**

1. `IF(H2 = 5000, "High Engagement", "Low Engagement")`
2. `IF(H2 < 5000, "Low Engagement", "High Engagement")`
3. `IF(H2 = "High Engagement", 5000, "Low Engagement")`
4. `IF(H2 > 5000, "High Engagement", "Low Engagement")`

<details>
<summary>Click here to see the answer</summary>

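**The correct answer is:** 4. `IF(H2 > 5000, "High Engagement", "Low Engagement")`

Option 2 is close, but it labels a campaign with exactly 5,000 clicks as "High Engagement", whereas the requirement is strictly *above* 5,000.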

</details>

## Restrict data entry<a id="restrict"></a> ([top](#home))
Here we'll look at how to control the data that can be entered, to avoid errors and ensure data consistency. Data validation allows you to specify the types of data that can be entered into the cells of a spreadsheet, in order to guarantee the consistency and accuracy of the data entered. This feature is particularly useful for datasets whose consistency is crucial to meaningful analysis.

For example, we want the entries of the column “Platform” to be restricted to "Facebook", “Instagram”, “TikTok”, or "YouTube". If a user enters “Tik Tok” or “Youtube”, our dataset could show inconsistencies, complicating filtering or analysis by category. To ensure consistent data entry, we can configure data validation to allow only authorized entries.

### Setting up data validation
In Excel, you can apply data validation to a column by selecting the cells, clicking on the "Data" tab, and choosing "Data Validation" in the "Data Tools" group. In LibreOffice, choose "Data" > "Validity...". You can then specify validation criteria in the dialog box that appears.

For example, if we want to limit input to the values "Facebook", “Instagram”, “TikTok”, or "YouTube":
1. Select the cells in the *Platform* column.
2. Go to "Data" and "Data Validation" (Excel) / "Validity..." (LibreOffice).
   - The dialog box opens on the criteria settings: the "Settings" tab in Excel, the "Criteria" tab in LibreOffice.
<div style="display: flex; justify-content: center; align-items: center;">
    <figure style="margin: 10px;">
        <img src="Tutorial_Excel_3/function_excel_2_XL.jpeg" alt="Data validation menu in Microsoft Excel" width="1000"/>
        <figcaption style="text-align: center;">Microsoft Excel</figcaption>
    </figure>
    <figure style="margin: 10px;">
        <img src="Tutorial_Excel_3/function_excel_2_LO.jpeg" alt="Validity menu in LibreOffice" width="1000"/>
        <figcaption style="text-align: center;">LibreOffice</figcaption>
    </figure>
</div>

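3. In the "Allow" drop-down, choose “List”, then add the desired entries. In Excel, type them in the "Source" box, separated by commas (e.g., `Facebook,Instagram,TikTok,YouTube`); some regional settings use semicolons instead. In LibreOffice, type one entry per line in the "Entries" box.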

<div style="display: flex; justify-content: center; align-items: center;">
    <figure style="margin: 10px;">
        <img src="Tutorial_Excel_3/function_excel_3_XL.jpeg" alt="List validation settings in Microsoft Excel" width="1000"/>
        <figcaption style="text-align: center;">Microsoft Excel</figcaption>
    </figure>
    <figure style="margin: 10px;">
        <img src="Tutorial_Excel_3/function_excel_3_LO.jpeg" alt="List validation settings in LibreOffice" width="1000"/>
        <figcaption style="text-align: center;">LibreOffice</figcaption>
    </figure>
</div>

Now test one of the restricted cells: only "Facebook", “Instagram”, “TikTok”, or "YouTube" will be accepted in this column.

Data validation can also display a message when users select a cell, guiding them on the type of data to enter (the "Input Message" tab in Excel, "Input Help" in LibreOffice). This is helpful when data entry rules are complex. For instance, a message might remind users which streaming platforms are allowed. Another feature, the "Error Alert" tab, displays an error message when an invalid value is entered.
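Data validation only checks values entered after the rule is set; entries already in the column are left as they are. The sketch below shows a quick way to audit existing data in pandas: `isin` flags every value that is not in the allowed list. The entries are invented and include two typos of the kind mentioned above. `isin` compares values exactly, so it is case-sensitive.

```python
import pandas as pd

# Allowed entries, matching the list configured in the data validation rule
ALLOWED_PLATFORMS = ['Facebook', 'Instagram', 'TikTok', 'YouTube']

# Invented entries, including two typical typos that validation would block
entries = pd.DataFrame({'Platform': ['Facebook', 'Tik Tok', 'YouTube', 'Youtube', 'Instagram']})

# isin() returns True for allowed values; ~ negates it to keep only invalid rows
invalid = entries[~entries['Platform'].isin(ALLOWED_PLATFORMS)]
print(invalid)
```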

<div style="margin-left: auto; margin-right: auto; width: 50%">
    <figure style="margin: 10px;">
        <video controls src="Tutorial_Excel_3/Data_validation_Excel.mp4" width="500" />
        <figcaption style="text-align: center;">Microsoft Excel</figcaption>
    </figure>
</div>

### Quiz #4<a id="quiz4-6"></a> ([top](#home))
**Which of the following is a primary benefit of using data validation in a digital media dataset?**

1. To allow more flexibility in user data entry
2. To ensure consistent data entries for analysis
3. To increase the size of the dataset
4. To make data entry faster

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 2. To ensure consistent data entries for analysis
</details>

### Quiz #5
**What does an error alert do in data validation?**

1. It changes the value of invalid entries
2. It allows only numbers to be entered
3. It displays a message when an invalid entry is made
4. It prevents users from entering any data

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 3. It displays a message when an invalid entry is made

</details>

### Quiz #6

**If you’re creating a column in a dataset for "User Status" that should only contain the values "Active" or "Inactive," which data validation setting would be best?**

1. Date
2. List
3. Decimal
4. Any Value

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 2. List

</details>

## Create pivot tables<a id="pivot"></a> ([top](#home))
Discover the potential of pivot tables to easily summarize and analyze large data sets.

Pivot tables (referred to as TCD for "Tableau Croisé Dynamique" in French) are a highly useful feature in Excel, commonly used across industries for data analysis. A pivot table allows you to summarize large datasets and perform calculations based on specified criteria.

Let's explain what a pivot table is, based on the literal meaning of each word in its French name:
 - **Table:** In Excel, a table is a set of data organized in rows and columns, as in a database. Each column represents a different variable (month, salesperson, product family, etc.), and each row a record (an entry in the dataset), for example, sales in a specific month for a specific salesperson.
 - **Crossed:** Variables are said to be crossed because the data is analyzed by combining categories. You may want to see sales data broken down both by month and by salesperson, for example, total sales for each salesperson for each month. This “crossover” view allows you to see interactions between two or more variables in the same table. So, instead of looking at sales data only by month or by salesperson, we cross-reference them for a more detailed analysis.
 - **Dynamic:** The table can be rearranged at any time by moving fields around, and it can be refreshed to reflect changes in the original data.

Pivot tables can help you analyze metrics such as monthly revenue, broken down by content type or by creator. This section walks through creating a pivot table from revenue data and explores ways to use it for insights.


### Monthly revenue analysis in a media company

Imagine a media company tracking monthly revenue for different content categories. The dataset records:

- **Month**: January to December
- **Content Creator**: Various creators (e.g., A, B, C, D)
- **Content Type**: Types like Video, Audio, Article, Image
- **Revenue**: Monthly revenue generated


| Month    | Content Creator | Content Type | Revenue |
|----------|-----------------|--------------|---------|
| January  | Creator A       | Video        | 1200    |
| January  | Creator B       | Audio        | 1500    |
| February | Creator A       | Video        | 1700    |
| February | Creator B       | Article      | 1300    |
| March    | Creator C       | Audio        | 1600    |
| March    | Creator D       | Image        | 1400    |
| April    | Creator C       | Video        | 1500    |
| April    | Creator D       | Article      | 1800    |

The pivot table will allow us to calculate total revenue by month, creator, or content type, and apply filters to understand revenue for specific periods or content categories.

The Python code below creates a DataFrame with the data:

In [None]:
# Importing necessary library
import pandas as pd

# Creating sample data to use in the pivot table
data = {
    'Month': ['January', 'January', 'February', 'February', 'March', 'March', 'April', 'April'],
    'Content Creator': ['Creator A', 'Creator B', 'Creator A', 'Creator B', 'Creator C', 'Creator D', 'Creator C', 'Creator D'],
    'Content Type': ['Video', 'Audio', 'Video', 'Article', 'Audio', 'Image', 'Video', 'Article'],
    'Revenue': [1200, 1500, 1700, 1300, 1600, 1400, 1500, 1800]}

# Convert to DataFrame
df = pd.DataFrame(data)

# Display the dataset
df

### Steps to create a pivot table

Let’s create a pivot table to calculate total revenue by month and by content type. To create a pivot table in Excel, follow these steps:

1. Copy and paste the data above into an Excel sheet.
2. Ensure your data is clean, with no empty rows or columns, and each column has a title.
3. Go to the **Insert** tab in Excel and click **Pivot Table**.
4. Select the range of your data.
5. Choose where to place the pivot table (usually in a new worksheet).
6. Use the **Fields Pane** to build your table by dragging columns (fields) into Rows, Columns, Values, or Filters.

When building a pivot table, you can specify fields as:

- **Rows**: Categories like months or creators (used to group the data into rows).
- **Columns**: Fields you want separated in columns, such as content type.
- **Values**: The numerical data you want to analyze, e.g., total revenue.
- **Filters**: Optional filters that let you refine data display, such as filtering by a specific content type or creator.

To get total revenue by month and content type, drag **Month** to Rows, **Content Type** to Columns, and **Revenue** to Values.

In [None]:
# Using pandas to create a pivot table similar to Excel
pivot_table = df.pivot_table(values='Revenue', index='Month', columns='Content Type', aggfunc='sum')
pivot_table

### Code explanation

- `pivot_table = df.pivot_table(...)`: Creates a pivot table from DataFrame `df`.
- `values='Revenue'`: Specifies the data column ("Revenue") that we want to aggregate. This will be the metric we’re analyzing.
- `index='Month'`: Defines the rows of our pivot table. Each unique value in the "Month" column of `df` will become a row in the pivot table.
- `columns='Content Type'`: Defines the columns of the pivot table. Each unique value in the "Content Type" column of `df` will become a column in the pivot table.
- `aggfunc='sum'`: Specifies the aggregation function we want to apply. Here, we’re summing up the Revenue values for each combination of "Month" and "Content Type".
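Two details of the result are worth noting. First, combinations with no data (for example, January has no Article) appear as `NaN`. Second, pandas sorts month names alphabetically. The sketch below works on the same sample data. It uses the `fill_value` and `margins` parameters of `pivot_table` (the latter adds totals, comparable to Excel's grand totals) and then `reindex` to restore calendar order. The label `'Grand Total'` is a name chosen here to mirror Excel, not a pandas default.

```python
import pandas as pd

# Same sample revenue data as in the cell above
revenue = pd.DataFrame({
    'Month': ['January', 'January', 'February', 'February', 'March', 'March', 'April', 'April'],
    'Content Creator': ['Creator A', 'Creator B', 'Creator A', 'Creator B',
                        'Creator C', 'Creator D', 'Creator C', 'Creator D'],
    'Content Type': ['Video', 'Audio', 'Video', 'Article', 'Audio', 'Image', 'Video', 'Article'],
    'Revenue': [1200, 1500, 1700, 1300, 1600, 1400, 1500, 1800],
})

# fill_value=0 replaces empty month/type combinations (NaN) with 0;
# margins=True adds a total row and a total column
pivot = revenue.pivot_table(values='Revenue', index='Month', columns='Content Type',
                            aggfunc='sum', fill_value=0,
                            margins=True, margins_name='Grand Total')

# pandas sorts month names alphabetically; reindex puts them back in calendar order
month_order = ['January', 'February', 'March', 'April', 'Grand Total']
pivot = pivot.reindex(month_order)
print(pivot)
```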


### Updating a pivot table

In Excel, changes to the original data do not automatically update the pivot table. To refresh:
1. Click inside the pivot table.
2. Go to the **PivotTable Analyze** tab.
3. Click **Refresh** in the Data group.

This will sync your pivot table with the latest changes in your data. In LibreOffice, right-click inside the pivot table and choose **Refresh**.
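A pandas pivot table behaves in a similar way: it is computed once, when the cell runs, and does not follow later changes to the source DataFrame. This sketch makes that visible on a tiny invented dataset. `build_pivot` is a hypothetical helper name, and calling it again plays the role of Excel's **Refresh**.

```python
import pandas as pd

# Tiny invented dataset
sales = pd.DataFrame({'Month': ['January', 'February'], 'Revenue': [1200, 1700]})

def build_pivot(data):
    """Hypothetical helper: total revenue per month."""
    return data.pivot_table(values='Revenue', index='Month', aggfunc='sum')

pivot = build_pivot(sales)

# Edit the source data: January revenue is corrected
sales.loc[sales['Month'] == 'January', 'Revenue'] = 1250

stale_total = pivot.loc['January', 'Revenue']   # the existing pivot still shows the old total
pivot = build_pivot(sales)                      # "refresh" by recomputing the pivot
fresh_total = pivot.loc['January', 'Revenue']   # now reflects the corrected value
print(stale_total, fresh_total)
```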


### Why pivot tables matter in economics

Pivot tables are very useful in economics because they allow analysts to **break down and summarise data** across several dimensions. This **reveals patterns** and information that are often hidden in the raw data, particularly when dealing with large datasets with many variables.

1. **Simplifying complex relationships:** in economic analysis, it is often necessary to understand how different factors interact with each other and influence results. Pivot tables allow us to quickly summarise data in order to observe these interactions. This quick summary helps economists discover patterns without complex coding or in-depth statistical analysis.

For example:
- Monthly sales by region and product type: An analysis of sales data by month, region and product type can help identify seasonal trends and regional preferences.
- Employment by sector and year: By breaking down employment data by sector and year, we can identify trends in employment growth or decline in certain sectors over time.

2. **Proposing a preliminary approach prior to econometric analysis**
Econometrics focuses on quantifying relationships between variables and testing hypotheses. Pivot tables are useful in this respect because they allow us to:
- Find raw trends and associations: Before moving on to advanced econometric models, pivot tables let us see raw associations, for example, how income levels vary with education level.
- Identify interaction effects: Grouping data into categories allows us to analyse possible interaction effects, such as how two variables combined (for example, age and education) influence a third variable (income). This makes it possible to determine which variables should be included in further econometric modelling.

3. **A practical example: Examining the impact of a policy**
Economists frequently use pivot tables to examine the impact of policies. For example, after an increase in the minimum wage, a pivot table showing employment levels by sector and by region could provide a quick indication of whether the number of jobs has changed significantly in sectors that rely heavily on minimum wage labour. This type of breakdown makes it easy to analyse whether the effects vary according to region or industry before implementing a more complex model.
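The policy example can be sketched with pandas on purely invented employment figures, which do not come from any real study. Sectors and regions go in the rows, the periods before and after the policy go in the columns, and a percentage change is added. As the text stresses, such a table only gives a first descriptive indication; it does not establish causality.

```python
import pandas as pd

# Purely invented employment figures (thousands of jobs), for illustration only
jobs = pd.DataFrame({
    'Sector': ['Retail', 'Retail', 'Hospitality', 'Hospitality',
               'Retail', 'Retail', 'Hospitality', 'Hospitality'],
    'Region': ['North', 'North', 'North', 'North', 'South', 'South', 'South', 'South'],
    'Period': ['Before', 'After', 'Before', 'After', 'Before', 'After', 'Before', 'After'],
    'Employment': [120, 118, 80, 74, 150, 151, 95, 93],
})

# Rows: sector x region; columns: period before/after the policy
table = jobs.pivot_table(values='Employment', index=['Sector', 'Region'],
                         columns='Period', aggfunc='sum')

# Percentage change between the two periods: a descriptive first look, not a causal estimate
table['Change (%)'] = (table['After'] - table['Before']) / table['Before'] * 100
print(table.round(1))
```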

### Quiz #7<a id="quiz7-9"></a> ([top](#home))
**Which of the following best describes a pivot table?**

1. A table that only displays raw data in Excel.
2. A summary table that allows calculations based on selected criteria.
3. A method of creating graphs from raw data.
4. A tool only used for data cleaning.

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 2. A summary table that allows calculations based on selected criteria.
</details>

### Quiz #8

**If you want to display total revenue for each content type per month, where should you place "Content Type" in a pivot table?**

1. Rows
2. Values
3. Filters
4. Columns

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 4. Columns

</details>

### Quiz #9
**After modifying the original dataset, what must you do to update the pivot table in Excel?**

1. Recreate the pivot table from scratch.
2. Click on the pivot table and hit the "Enter" key.
3. Click inside the pivot table and select "Refresh."
4. None of the above.

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 3. Click inside the pivot table and select "Refresh."

</details>

### Activity: Analyzing digital revenue data

1. Try replacing `index='Month'` in the code above with `index=['Content Creator', 'Month']`, then with `index=['Month', 'Content Creator']`. What do you get? Describe the resulting tables.

2. Using the sample data below, create a pivot table in Excel to answer the following questions (you can copy it to Excel or recreate it if needed):

| Month     | Content Creator | Content Type | Revenue |
|-----------|------------------|--------------|---------|
| January   | Creator A       | Video        | 1200    |
| January   | Creator B       | Audio        | 1500    |
| February  | Creator A       | Video        | 1700    |
| February  | Creator B       | Article      | 1300    |
| March     | Creator C       | Audio        | 1600    |
| March     | Creator D       | Image        | 1400    |
| April     | Creator C       | Video        | 1500    |
| April     | Creator D       | Article      | 1800    |

Once your pivot table is set up, try experimenting with different arrangements of rows, columns, and filters to gain insights from the data.

- What is the total revenue for each content type (Video, Audio, etc.)?
- Which content creator generated the most revenue?
- What is the total monthly revenue for each month?

3. Think about which variables in the campaign dataset *Tutorial2xls4-adcampaigns.xlsx* might influence or impact key outcomes. For example:
- What factors might influence "Clicks"? 
- Do certain "Platforms" drive more engagement?
- How does "Target Audience" affect "Conversion Rate"?

Try creating a pivot table to examine some of these relationships.


## Generate charts<a id="charts"></a> ([top](#home))
Data visualizations help us interpret data more effectively by presenting it in a clear, graphical format. Excel makes this straightforward, offering charts such as histograms, line graphs, scatter plots, and pie charts, which update automatically when data changes. Python, with libraries like `pandas` and `matplotlib`, allows similar visualizations that are flexible and customizable.

In this section, we'll use the dataset from the advertising campaigns we've already analysed to create some key graphs (*Tutorial2xls4-adcampaigns.xlsx*). We'll learn how to select the data to plot, choose the appropriate chart type and customise the appearance of the charts.

With the file open in Excel, load it in Python as well using the code below:

In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
# The Excel file contains the campaign columns described in the first section
df = pd.read_excel('Tutorial_Excel_3/Tutorial2xls4-adcampaigns_v2.xlsx')
df.head()

### Selecting data

In Excel, you can use your mouse to select the cells you want to view directly. In our example, let's say we want to create a chart to show total advertising spend by month and by platform. This will enable us to see which months and which platforms have seen the highest levels of advertising investment.

In Python, we select the relevant columns and aggregate them into the table we want to plot.

In [None]:
# Aggregate data by month and platform to calculate total ad spend
ad_spend_data = df.pivot_table(values='Ad_Spend (USD)', index='Month', columns='Platform', aggfunc='sum')
ad_spend_data.plot(kind='bar', stacked=True)
plt.title('Ad Spend by Month and Platform')
plt.xlabel('Month')
plt.ylabel('Ad Spend (USD)')
plt.legend(title='Platform')
plt.show()

### Choosing chart type

Choosing the right chart is essential for effective data visualization:
- **Bar Charts** are useful for comparing data across categories.
- **Line Charts** are ideal for showing trends over time.
- **Pie Charts** show how a total is divided into parts (the proportion each category contributes).

In Excel, charts are created from the **Insert** tab (Charts group); in LibreOffice, select `Insert > Chart`. In Python, we use the `plot()` method of `pandas` objects (built on `matplotlib`) or `matplotlib` functions directly.

In [None]:
# Calculate total clicks per month
clicks_data = df.groupby('Month')['Clicks'].sum()

# Plot a line chart
clicks_data.plot(kind='line', marker='o')
plt.title('Monthly Clicks')
plt.xlabel('Month')
plt.ylabel('Clicks')
plt.show()
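If the *Month* column holds month names, `groupby` sorts them alphabetically (April, August, December, ...), so the line above would not follow the calendar. One way to fix this, assuming the column contains full English month names, is to convert it to an ordered categorical before grouping. The sketch below does this on invented data:

```python
import pandas as pd

# Calendar order of month names (assumes the Month column holds full English month names)
MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

# Invented sample in the campaign file's format
clicks = pd.DataFrame({
    'Month': ['March', 'January', 'February', 'January', 'March'],
    'Clicks': [300, 120, 250, 80, 100],
})

# An ordered categorical makes groupby sort months chronologically;
# observed=True drops months that do not appear in the data
clicks['Month'] = pd.Categorical(clicks['Month'], categories=MONTHS, ordered=True)
monthly = clicks.groupby('Month', observed=True)['Clicks'].sum()
print(monthly)
```

Calling `monthly.plot(kind='line', marker='o')` then draws the line in calendar order. If *Month* holds numbers or dates instead, the default sort is already chronological and this step is not needed.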

### Customizing the chart

Charts are more informative when titles, labels, and legends are clearly specified:
- **Title**: A concise description of what the chart shows
- **Axis labels**: Describe what each axis represents
- **Legend**: Indicates categories in the chart

In Python, `matplotlib` allows customization of these elements. Let's apply them to a chart of the average conversion rate by genre. Because averages are not parts of a whole, a bar chart suits them better than a pie chart.

In [None]:
# Calculate average conversion rate per genre
conversion_data = df.groupby('Genre')['Conversion_Rate (%)'].mean()

# Plot a bar chart with a title and labeled axes
conversion_data.plot(kind='bar')
plt.title('Average Conversion Rate by Genre')
plt.xlabel('Genre')
plt.ylabel('Average conversion rate (%)')
plt.show()

### Activity: Analyze Engagement Score by Target Audience and Month

For this activity, create a pivot table that shows the average engagement score by target audience and month. Then, use this table to create a heatmap to visualize how engagement varies across months and audiences. This can give insights into which audiences are most engaged at different times.

### Instructions:
1. Aggregate the data to calculate the average engagement score by `Target_Audience` and `Month`.
2. Use a heatmap to visualize this aggregated data, where darker colors represent higher engagement scores.

Here's some starter code to help:


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Aggregate engagement score by target audience and month
engagement_data = df.pivot_table(values='Engagement_Score', index='Month', columns='Target_Audience', aggfunc='mean')

# Plot a heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(engagement_data, annot=True, cmap="YlGnBu", fmt=".1f")
plt.title('Engagement Score by Target Audience and Month')
plt.xlabel('Target Audience')
plt.ylabel('Month')
plt.show()

### Quiz #10<a id="quiz10-12"></a> ([top](#home))
**When selecting data for a chart, which of the following should you consider?**

1. Including both the labels (e.g., months) and values for clarity
2. Only numerical values, ignoring labels
3. Using only the smallest possible data sample
4. Ignoring categorical values like platforms

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 1. Including both the labels (e.g., months) and values for clarity
</details>

### Quiz #11
**Which chart type is best for showing data that changes over time?**

1. Bar chart
2. Pie chart
3. Scatter plot
4. Line chart

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 4. Line chart

</details>

### Quiz #12

**When customizing a chart, what should be included to improve clarity?**

1. Only a title
2. Title, axis labels, and legend
3. Just the data points
4. Axis labels and nothing else

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 2. Title, axis labels, and legend

</details>

## Practice data processing<a id="practice"></a> ([top](#home))
Get hands-on experience with a variety of data processing techniques to improve your analytical workflow.

### Quiz #13<a id="quiz13-15"></a> ([top](#home))
**

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 
</details>

### Quiz #14
**

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 

</details>

### Quiz #15

**

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 

</details>

## Evaluate yourself!<a id="evaluate"></a> ([top](#home))
Test your knowledge by applying all of the skills learned in this tutorial to a comprehensive data set.

### Quiz #16<a id="quiz16-18"></a> ([top](#home))
**

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 
</details>

### Quiz #17
**

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 

</details>

### Quiz #18

**

<details>
<summary>Click here to see the answer</summary>

**The correct answer is:** 

</details>