# 1 Storytelling with Data

Let's start with the importance of data storytelling and the elements you need to tell stories with data. You'll learn best practices to influence how decisions are made before learning how to translate technical results into stories for non-technical stakeholders.

## 1.1 Fundamentals of storytelling

## 1.2 The story begins

You recently started working as a data scientist at a company named *Communicatb*. For your first project, you and your team need to analyze customer churn data for a cell phone company. The goal is to predict customer behavior and help develop a program to retain customers.

Your team lead knows you are an expert on storytelling. She asks you to explain to the team why crafting a compelling story is important when delivering results. You write down a list of reasons to be prepared.

---

One of the statements you wrote is **false**. Can you select which one it is?

### 1.2.1 Answer the question

### Possible Answers

- [ ] It will be easier for the audience to remember an anecdote on why customers churn than the correlation coefficients between customer traits.

- [ ] Your findings will be better aligned with change-averse stakeholder expectations. They will be more likely to implement the program to retain customers.

- [x] Even if your data do not reveal a distinct customer behavior, storytelling might influence stakeholders to create the retention program.

- [ ] The marketing team will have a better understanding of the impact of your model. It is central since they are creating the retention program.

## 1.3 Building a story

You nailed it! Your explanation of why stories are effective when conveying insights went very well! Now, your team lead would like you to give a short presentation. You're going to explain the different steps involved in telling a story with data to the team. It will be the starting point for delivering the results of the churn project when ready.

You know it is an important task. To prepare your talk, you look for your notes on the storytelling course you took, but realize that some parts are erased. So you need to remember the dos and don'ts of data storytelling.

Which of the following statements about effective data storytelling are true, and which are false?

### 1.3.1 Instructions

Correctly classify the statements as either true or false.

| True                                                                                                                       | False                                                                                            |
|:--------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------:|
| To drive change, center your story around how the company's profits will increase if the retention program is implemented. | For adding value, include all customer traits analyzed to understand why customers churn.        |
| Present one supporting customer data point after another so they naturally reach a conclusion.                             | To drive change, include data from a successful retention program launched in a similar company. |
| To create a compelling narrative, connect the most important findings to actions of the retention programs.                | To be clear and concise, build one story and present it to the managers and your technical team. |

## 1.4 Translating technical results

## 1.5 A non-tech story

The exploratory data analysis on the churn project is finished! It's now time for the monthly update meeting. You will have to explain your results to the operations specialist and the program director. Because this is a non-technical audience, you want to adapt your presentation so that your message gets across.

You write down some statements you could use to explain your work, but you believe some of them are more suitable for a non-technical story, while others are too technical to include.

Can you select which sentences you should use in this case?

### 1.5.1 Instructions

Correctly classify the examples as more suitable for either a tech or a non-tech story.

| Tech story                                                                                                                                | Non-tech story                                                                                                                                   |
|:-----------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------:|
| Churn and no-churn customers show a different probability density distribution of the number of months a customer has subscribed for.     | Imagine that it rains and you have to go to an event. What factors will you consider to go or stay at home? That's how feature importance works. |
| The ANOVA showed that payment methods affect churn rate even though SEM is very high.                                                     | The churn customers have subscribed for fewer months than customers that did not churn.                                                          |
| After several iterations, the elbow method showed that 4 was the optimal number of clusters to run K-means.                               | The clustering analysis, a model to group customers based on their similarities, showed four types of customers to target with the program.      |
| To understand which features had the most predictive relevance, permutation feature importance was used.                                  | A customer who pays by credit card tends to churn less than a customer who pays by mailed check.                                                 |

## 1.6 Be aware

The meeting was a success! The program director asks you to send your results to the business specialists. You need to write a report and send it by email by the end of the week. You have never met them, so you ask for their background and goals.

After you gather data, you realize you will be communicating your results to a different audience. You want them to understand your results.

---

Can you select which of the following examples are **best practices** to translate your results?

### 1.6.1 Instructions

Correctly classify the examples as either best practice or bad practice.

| Best practice                                                                                                                                  | Bad practice                                                                                                                                |
|:----------------------------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------:|
| Explain that the predictions you made will help target specific customers. It will save money for the company in marketing campaigns.          | Adjust your content to include some explanation on why you chose your variables for analysis, from a business perspective.                  |
| At the end of the report, include a page with all the terminology definitions and the acronyms you could not avoid.                            | When sending the report, anticipate any questions the business team will have. Answer them thoroughly so they know you are willing to help. |
|                                                                                                                                                | Do not include analogies to explain a concept with simple terms as it could be confusing and disengaging.                                   |

## 1.7 Impacting the decision-making process

## 1.8 Is it a true story?

You have done an amazing job explaining your exploratory data analysis on the churn project. Now, it's time to run the model to predict customer churn. You know that you will have to craft an effective story to present these results.

You want to be prepared. So you read your notes on how to build a compelling narrative. But you realize that one of your notes is not accurate.

---

Which of the following statements is **false**?

### 1.8.1 Answer the question

### Possible Answers

- [ ] A compelling narrative is key to presenting relevant insights to your target audience in a meaningful and impactful way.

- [ ] Because you should shape the narrative to your target audience, showing only key points or findings is a good practice.

- [x] Unless you have a great data groundwork to support your central insight, your findings will need a well-formed and compelling narrative to drive action and change.

## 1.9 Structured to impact

Your project on customer churn is done. You analyzed the data and built your model. You followed the steps for storytelling. Now, it's time to structure your story to have an impact at the decision-making level. You want stakeholders to follow your recommendations.

You like to write things down. So you take a pen and paper, and write down the different things you want to say in order on sticky notes. The window suddenly opens, throwing all of your notes on the floor.

Can you organize the steps for telling a story with data that is solid enough to influence the decision-makers?

### 1.9.1 Instructions

Order the steps chronologically: the first step should be on top and the last step at the bottom.

1. Explain with a line plot that the company usually has a churn rate of 5% but last year that rate suddenly increased to 15%.

2. Using boxplots, show that the percentage of churn customers with more than one dependent in their household has increased, affecting the total rate.

3. Add further evidence by showing that a higher percentage of customers with more than one dependent in their household with DSL service churn.

4. Show a barplot that reveals that monthly charges are the most important predictor of customer churn.

5. Recommend implementing promotional prices for customers who intend to churn, and show with a barplot that this will result in 10% more earnings.

## 1.10 A story to compare

Great job organizing your narrative structure! The next step is to think about how you will present your insights.

You start reading and discover there are several ways to present data stories. You can compare your data, show correlation, cluster your data…

You are curious to know what type of data story would be a good fit for your data. You write down the central finding, your insights and the supporting evidence.

Can you classify your findings into the following categories?

### 1.10.1 Instructions

Correctly classify the following examples as comparison, correlation or clustering.

| Comparison                                                                                                                         | Correlation                                                                                                 | Clustering                                                                                                                           |
|:----------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------:|
| 50% of the churning customers have a pay-as-you-go contract, while 90% of non-churning customers are in a 2-year or 3-year contract. | The number of times a churned customer streamed movies is higher if they pay a higher monthly charge.       | There are customers with low monthly charges and low streaming time and customers with high monthly charges and high streaming time. |
| About 50% of the churning customers are married, while only 30% of the non-churning customers have dependents.                     | The monthly charges decrease as the number of months a non-churning customer has subscribed for increases.  |                                                                                                                                      |
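
To make the comparison and correlation categories concrete, the sketch below computes one example of each. It is only an illustration: the records, the field names (`monthly_charge`, `streaming_hours`), and the helper functions are hypothetical and do not come from the churn dataset.

```python
from math import sqrt

# Hypothetical customer records (made-up values for illustration only).
customers = [
    {"churned": True, "monthly_charge": 90.0, "streaming_hours": 40.0},
    {"churned": True, "monthly_charge": 70.0, "streaming_hours": 30.0},
    {"churned": False, "monthly_charge": 50.0, "streaming_hours": 20.0},
    {"churned": False, "monthly_charge": 30.0, "streaming_hours": 10.0},
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def group_mean(records, field, churned):
    """Comparison: average of `field` within one churn group."""
    return mean([r[field] for r in records if r["churned"] == churned])


def pearson(xs, ys):
    """Correlation: Pearson coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Comparison story: contrast one summary value across two groups.
churn_avg = group_mean(customers, "monthly_charge", True)
stay_avg = group_mean(customers, "monthly_charge", False)

# Correlation story: describe how two variables move together.
r = pearson(
    [c["monthly_charge"] for c in customers],
    [c["streaming_hours"] for c in customers],
)

print(f"Average monthly charge: churned {churn_avg:.0f}, retained {stay_avg:.0f}")
print(f"Correlation between charge and streaming hours: {r:.2f}")
```

A comparison contrasts a summary value across groups, while a correlation describes how two variables move together across all customers. Clustering would additionally require a grouping algorithm and is left out of this sketch.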

# 2 Preparing to communicate the data

Deepen your storytelling knowledge. Learn how to avoid common mistakes when telling stories with data by tailoring your presentations to your audience. Then learn best practices for including visualizations and choosing between oral or written formats to make sure your presentations pack a punch!

## 2.1 Selecting the right data

## 2.2 The truth about salaries

Your predictive model for customer churn, which you worked on in Chapter 1, has been deployed. Your project manager asks you to work on a new internal project. The goal is to analyze a database with employee salaries in San Francisco, USA.

After doing an exhaustive exploratory data analysis, you have to present your findings to the human resources team. They want to compare salary growth in San Francisco with salary growth at the company; they need to understand how to forecast salaries for the next year. You are about to copy the graphs from your analysis. Your manager reminds you to select the right data for your stakeholders.

You start by writing down what you believe can help you choose the proper findings.

---

One of the statements you wrote is **false**. Can you select which one it is?

### 2.2.1 Answer the question

### Possible Answers

- [ ] The human resources team would likely be interested in knowing how the average salary has been increasing in the last 10 years in San Francisco.

- [ ] The human resources team has no knowledge of data analysis techniques, so code shouldn't be included when listing the top 5 job titles.

- [ ] Select categorical data, such as the salaries at the top 10 rated companies in the industry the company operates in, that provides context to support the idea of the increased salaries.

- [x] Select all collected numerical data about San Francisco salaries and show them in a big dashboard so it helps understand in detail why salaries have been increasing.

## 2.3 Earning interests

Well done! Your presentation to the human resources department was a success. Your team lead asks you to show your data analysis results to different stakeholders. Before you dive into preparing the presentation or the report, you want to make sure that you are aligned with their interests.

With that goal in mind, you define several personas. It will help you select the suitable data later. You write down the personas, their knowledge, and their interest in this project.

Can you classify your notes into the following audience personas?

### 2.3.1 Instructions

Correctly classify the following examples as Human Resources Director, technical supervisor, or marketing staff.

| Human Resources Director                                                                                                      | Technical supervisor                                                                                  | Marketing staff                                                                                                                   |
|:-----------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------:|
| Basic technical knowledge on data analysis. Wants to raise the salary of the company employees based on actual data.          | Show the variance, mean, and distribution of numerical variables, such as base pay and total benefits. | Select data demonstrating that better company benefits impact the employee performance to help attract talent on the career page. |
| Select data comparing employee satisfaction in best-paying companies compared to others to support employee salary increases. | Expert knowledge on statistical methods. Wants to analyze the salary of different European countries. | General knowledge on data analysis. Wants to understand how salary impacts work-life balance to advertise it on the career page.  |

## 2.4 Showing relevant statistics

## 2.5 Salary variation

You have selected suitable data for your story on San Francisco salaries. Now, you evaluate which metrics you should use.

You want to convey the following message to the human resources team: "*The total pay of employees has increased or decreased according to their job title from 2017 to 2018*."

You prepared the two visualizations below, but you are unsure which one is best.

![1.png](attachment:1.png)

One of the following options is **true**. Can you select which one it is?

### 2.5.1 Answer the question

### Possible Answers

- [ ] Both graphs clearly show how the salary changed for the three different jobs. Both are meaningful ways of expressing the same insight and convey the message clearly.

- [ ] The graph on the left, showing total values, is the most suitable one. The changes in salary can be observed. No adjustment is needed, as you'd rather show raw data than transformed data.

- [x] The graph on the right is the best way to convey the message. With percentage change, the magnitude of the salary change depending on the job is more evident.
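
The difference between the two graphs can be sketched in a few lines of Python. The totals below are made up for illustration (the real figures are only shown in the image above), and `percentage_change` is a hypothetical helper.

```python
# Hypothetical total pay per job title (made-up numbers, not course data).
total_pay = {
    "Engineer": {2017: 100_000, 2018: 150_000},
    "Business analyst": {2017: 80_000, 2018: 104_000},
    "Technician": {2017: 60_000, 2018: 54_000},
}


def percentage_change(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) * 100 / old


# One percentage per job title, ready to plot instead of the raw totals.
changes = {
    title: percentage_change(pay[2017], pay[2018])
    for title, pay in total_pay.items()
}

for title, change in changes.items():
    print(f"{title}: {change:+.0f}%")
```

On a chart of totals, raises of 24,000 and 50,000 are hard to compare because the base salaries differ. Expressed as percentages, the relative magnitude of the change for each job title can be read directly.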

## 2.6 On a payroll

Good job on selecting the most impactful visualization! Your insight made an impact. The human resources team lead asked you to show more findings. You go back to your exploratory data analysis and select some data.

But you want to explore different variants of the same data to discover the best one for explaining your distinct insights to the human resources team.

Can you decide what data variant would be best suited depending on the finding you want to show?

### 2.6.1 Instructions

Correctly classify the following examples as total values, change or ratio.

| Total                                                                                                                     | Change                                                                                                                                | Ratio                                                                              |
|:-------------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------:|
| You want to highlight how much total overtime a company paid for employees working as electronic maintenance technicians. | An important finding is the fact that engineers experienced a 50% increase in their salary in 2018, while business analysts only 30%. | The average total pay per worker did not significantly increase from 2014 to 2015. |
| It would be interesting to show the total monthly benefits employees will have over the next six months.                  | The number of people working in the private sector increased by 100k from 2017 to 2018.                                               |                                                                                    |
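
As a rough illustration of the three variants, the sketch below derives a total, a change, and a ratio. All figures and variable names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical yearly figures (made-up values for illustration only).
overtime_paid = [1_200.0, 800.0, 1_500.0]  # overtime paid to each technician
workers = {2017: 400_000, 2018: 500_000}   # private-sector headcount
total_payroll = {2014: 9_000_000.0, 2015: 9_090_000.0}
headcount = {2014: 100, 2015: 101}

# Total: a single aggregated amount.
total_overtime = sum(overtime_paid)

# Change: the difference between two points in time.
worker_change = workers[2018] - workers[2017]

# Ratio: one quantity divided by another, here average pay per worker.
pay_per_worker = {
    year: total_payroll[year] / headcount[year] for year in total_payroll
}

print(f"Total overtime paid: {total_overtime:,.0f}")
print(f"Change in workers 2017 to 2018: {worker_change:+,}")
for year, pay in pay_per_worker.items():
    print(f"Average pay per worker in {year}: {pay:,.0f}")
```

In this made-up example the payroll total grows by 1% while the pay per worker stays flat, which is why a ratio suits a finding about pay per worker better than a total does.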

## 2.7 It's not significant

You have a big deadline ahead. You need to submit a report on the data analysis for the project on San Francisco salaries to your technical lead. He will read it and report your results to the senior management team.

You have a story, and you select data to support it. You want to show comparisons of average pay for people with different job titles.

You are hesitant to show p-values. You know that there are a lot of misconceptions around them. You decide to use them anyway. However, you plan to clarify any confusion about p-values, so your audience understands their meaning and trusts your results.

Can you classify these statements as **good use** or **misuse** of the p-value?

### 2.7.1 Instructions

Correctly classify the following examples as either a good use or a misuse.

| Good use                                                                                                                                           | Misuse                                                                                                                                               |
|:--------------------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------:|
| The base pay salary is higher (p-value: 0.03) for a physician than for a special nurse. You will report it but interpret this result with caution. | The mean base pay is higher for an accountant than a lawyer. P-value is the strongest evidence. There is no other interesting alternative to check.  |
| The mean overtime pay is higher (p-value: 0.04) for a department head than for a medical examiner. You don't consider this strong evidence.        | From evaluating the p-value, you consider there is enough proof that a nurse and a doctor have the same base pay.                                    |
|                                                                                                                                                    | The mean total pay is lower (p-value: 0.002) for firefighters than for department chiefs. It's a bigger difference than for captains (p-value: 0.01). |
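
The distinction between a p-value and the size of a difference can be illustrated with a small one-sided permutation test written with the standard library only. This is a sketch under stated assumptions, not the analysis behind the statements above: the pay samples are invented, and `permutation_p_value` is a hypothetical helper.

```python
import random
from statistics import mean

# Hypothetical base pay samples in thousands of dollars (made-up values).
physicians = [210, 195, 230, 205, 220, 215, 200, 225]
nurses = [180, 190, 175, 200, 185, 195, 170, 205]


def permutation_p_value(a, b, n_permutations=5_000, seed=0):
    """One-sided permutation test for mean(a) > mean(b).

    Returns the share of random relabelings whose mean difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[: len(a)]) - mean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


effect = mean(physicians) - mean(nurses)  # size of the difference
p_value = permutation_p_value(physicians, nurses)
print(f"Observed difference: {effect:.1f}k, p-value: {p_value:.4f}")
```

The p-value only describes how surprising the observed difference would be if job title had no effect. The size of the difference (`effect`) has to be reported separately, a smaller p-value does not by itself mean a bigger difference, and a large p-value would not prove that two groups are paid the same.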

## 2.8 Visualizations for different audiences

## 2.9 Salary development

You are presenting your data analysis on San Francisco salaries to the business development department.

You have your story, and you select data and metrics to support it. But choosing the visualizations is still an ongoing task. You decide to speed it up by getting hands-on. You want to follow the best practices.

The message you want to convey is: "San Francisco salaries have been constantly going up in the last 4 years. The percentage variation is 10% annually. The number of people working in the private sector, such as software or biotechnology, has increased by 100k."

Can you classify if these practices would be **good** or **bad** when presenting to the *business* department?

### 2.9.1 Instructions

Correctly classify the following statements as either a good practice or a bad practice for choosing an effective visualization.

| Good practice                                                                                                             | Bad practice                                                                                                                        |
|:-------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------:|
| Aggregate the average salary in the public sector into one bar.                                                           | Include a line plot that shows the average salary in the public health sector from 2017 to 2021 to compare with the private sector. |
| Instead of labeling the visualization with complex terms, use a terminology that is accessible to a business stakeholder. | Include a detailed feature importance graph, with coefficient, to show which factor has influenced the most.                        |
| Include a barplot that shows the average salary in the private software and biotechnology sector from 2017 to 2021.       |                                                                                                                                     |

## 2.10 Salary on demand

You have selected visualizations for the business development department.

It's time to include them in your report and present them. You are aware of the steps you should follow to include and present visualizations effectively, and you want to do your best and impress your business coworkers. So you ask a colleague to help you organize your thoughts.

Can you **order** the steps for including and **presenting visual data** effectively?

### 2.10.1 Instructions

Order the steps chronologically: the first step should be on top and the last step at the bottom.

1. Announce the visualization "Yearly total pay for private sector from 2017-2020"

2. Explain that the data was obtained from a public survey made to 100k employees

3. Affirm that the average salary in the private sector has increased from 2017 to 2021

4. Show that particularly in software and biotechnology, the average total pay has increased by 50%

5. Tell the audience that this is important because the company could attract and retain employees with a competitive salary.

## 2.11 Choosing the appropriate format

## 2.12 A communication problem

Your coworker has been working on a project on price predictions. He asks you to help him choose the most suitable format to deliver his results to the executive board as well as to his team.

You give him some advice and rules of thumb, so he can make an informed decision. When you arrive home, you realize that you made one mistake.

---

Which of the following pieces of advice should you not have provided?

### 2.12.1 Answer the question

### Possible Answers

- [ ] The amount of time the CEO can dedicate to getting up to speed with your analysis is an important factor in your choice of delivery format.

- [x] If a software engineer in your team wants to continue your project with new data, the central piece of information to include in your meeting is the project conclusions.

- [ ] If your project manager, located in a different time zone, needs your results to communicate them to customers, a written report would be ideal.

## 2.13 Should we meet?

It's Friday. Your project manager comes by your desk. She asks you about the status of your project on salaries for San Francisco employees. She tells you that you need to close the project. Fortunately, you have finished building the model.

But to close it, you need to communicate the results to the different stakeholders. After she talks with the people involved, you start to receive emails asking about the results. You need to decide if you are going to use an oral presentation or a written report.

Can you decide what type of format would be best suited depending on the situation and requirements?

### 2.13.1 Instructions

Correctly classify the following inquiries as suitable for an oral or written format.

| Oral format                                                                                                            | Written format                                                                                                                                      |
|:----------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|
| Your program manager wants to close the project and understand what went well and what can be improved.                | A colleague wants to understand and digest your methodology so that she can replicate it on a new dataset regarding churn for New York employees.   |
| Your manager wants you to present the results to the finance team, so they can ask follow-up questions about salaries. | The software engineering team needs to understand your model to integrate it with an internal application on the backend.                           |
| An investor wants the monthly update on your employee salaries project to understand your conclusions.                 | A founder wants to have the San Francisco salaries analysis to support his position at the board meeting.                                           |

## 2.14 When in doubt

You manage to deliver the results to almost all the stakeholders. You are about to start writing the report for the founder when you get an email. Your founder is coming by the office the following Friday. Your manager wants to know if presenting the project during a meeting would be better.

You have second thoughts about changing the format. So you decide to write down beneficial and unfavorable aspects of an oral presentation.

Can you classify your thoughts correctly?

### 2.14.1 Instructions

Correctly classify examples where an oral presentation is either beneficial or unfavorable.

| Beneficial                                                                        | Unfavorable                                                                                             |
|:---------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------:|
| You are very eager to know the opinion of the founder about your recommendations. | You want to communicate a lengthy message, so you want all the results to receive the proper attention. |
| Your conclusions highlight that a decision is needed immediately.                 | The founder will share your project with a large audience, so she would like to study your conclusions. |
| Your body language will contribute to the impact your result will have.           |                                                                                                         |

# 3 Structuring written reports

Now that you understand how to prepare for communicating findings, it’s time to learn how to structure your reports. You'll also learn the importance of reproducibility (work smarter, not harder) and how to get to the point when describing your findings. You’ll then get to apply all you’ve learned to a real-world use case as you create a compelling report on credit risk.

## 3.1 Types of reports

## 3.2 Something to report

You need to present a report regarding your findings about customer churn and the predictive model you used, which you worked on in Chapter 1. Your project manager asks you to write it according to industry standards. You're aware that this requires you to follow a strict structure. Your manager also specifies that the report will be shared with technical stakeholders.

First, you write the sections separately: it's easier to handle that way. Then comes the time to bring all the sections together.

Can you organize the sections you wrote for the report in the correct order?

### 3.2.1 Instructions

Order the report sections so that the first section is on top and the last is at the bottom.

1. The purpose of this report is to describe the results obtained from a model that predicts and identifies customers that will likely churn.

2. The data, gathered from the website, contains categorical variables such as gender and internet service. These were converted to 0/1 columns.

3. The dataset was split into train (70%) and test (30%) sets. A K-Nearest Neighbors model was trained and model performance was evaluated.

4. As you can see in the graph, this report analyzes the importance of different features such as monthly charges, contract type and phone service.

5. The model has an accuracy of 92% in predicting customer churn. The internet service type and discounts correlate with customer churn.

6. In summary, discounts on premium phone services should be implemented in order to retain customers.
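The data preparation and modeling steps described in sections 2 and 3 of the report can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's actual code: the rows in `ROWS` are invented toy values, the helper names (`one_hot`, `encode`, `train_test_split`, `knn_predict`) are hypothetical, and the seed and `k=3` are arbitrary choices. In practice a library such as scikit-learn would typically handle the split and the model.

```python
import math
import random
from collections import Counter

# Hypothetical toy rows, invented for illustration only (not real churn data):
# (gender, internet_service, churned)
ROWS = [
    ("F", "fiber", 1), ("M", "fiber", 1), ("F", "dsl", 0), ("M", "dsl", 0),
    ("F", "fiber", 1), ("M", "none", 0), ("F", "none", 0), ("M", "fiber", 1),
    ("F", "dsl", 0), ("M", "dsl", 0),
]


def one_hot(value, categories):
    """Turn one categorical value into a list of 0/1 indicator columns."""
    return [1 if value == c else 0 for c in categories]


def encode(rows):
    """Encode the categorical fields as 0/1 columns; return (features, labels)."""
    genders = sorted({r[0] for r in rows})
    services = sorted({r[1] for r in rows})
    X = [one_hot(g, genders) + one_hot(s, services) for g, s, _ in rows]
    y = [label for _, _, label in rows]
    return X, y


def train_test_split(X, y, test_fraction=0.3, seed=42):
    """Shuffle row indices with a fixed seed, then hold out test_fraction of them."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


X, y = encode(ROWS)
X_train, y_train, X_test, y_test = train_test_split(X, y)
predictions = [knn_predict(X_train, y_train, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"train={len(X_train)} test={len(X_test)} accuracy={accuracy:.2f}")
```

Fixing the seed inside the split means anyone rerunning the sketch gets the same train and test rows, which is what makes a reported accuracy figure checkable.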

## 3.3 In summary

Well done ordering the sections in your technical stakeholder report. Your project lead now asks you to write a report for the board of directors, who are non-technical stakeholders. You will revisit your previous report and tailor it into a summary report. But first, you want to refresh your memory of how a final report and a summary report differ.

Can you correctly classify the statements as characteristics of a final or summary report?

### 3.3.1 Instructions

Correctly classify the examples as a feature of either a final or summary report.

| Final report                                                                                    | Summary report                                                                                                |
|:-----------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------:|
| It should be written thinking about an audience that requires details.                          | Your report is about four pages long.                                                                         |
| The report includes detailed findings and results showing how you performed your data analysis. | You might include a link to the main document in case the stakeholder wants to inspect more detailed content. |
| Your report is about twelve pages long.                                                         | In the results section, you only include three graphs that simply explain the key findings.                   |
|                                                                                                 | It should be written thinking about an audience that doesn't require details.                                 |

## 3.4 Reproducibility and references

## 3.5 Replicate me

Your manager asks you to write a report on your customer churn project for your peers at the New York office. She mentions that the team wants to replicate your work. After wrapping up the report, you add a link to your code repository. She looks confused and asks you why you did that.

You explain: **If the New York team wants to replicate my work, then they should have access to the same ___ and the same ___ I used. However, if they want to achieve ___, they can use their own set of tools**.

Fill in the blank spaces by choosing the correct word combination from the options.

### 3.5.1 Answer the question

### Possible Answers

- [ ] data, code, replicability

- [ ] team, code, reproducibility

- [x] data, code, reproducibility

- [ ] team, data, replicability

## 3.6 Same results

Your manager is very interested in learning more about reproducibility. She asks you to give a short presentation at the weekly meeting. You're going to introduce the best practices for creating reproducible data science results.

You prepare a slide presenting bad practices, and another one highlighting best practices.

Which of the following statements are considered best practices in reproducibility, and which should be avoided?

### 3.6.1 Instructions

Correctly classify the statements as best or bad practices.

| Bad practice                                                                                                                           | Best practice                                                                                                     |
|:--------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------:|
| For your machine learning model, do not set the random seed argument as it will always return the same result, eliminating randomness. | Use reference management software so you can correctly cite the sources you consult during the project work.      |
| Before you start analyzing the data, open the dataset in a text editor to delete all unwanted columns.                                 | Use a Jupyter notebook, so you can explain your thought and decision-making process at each step of the analysis. |
| Erase intermediate files created from your machine to avoid confusion.                                                                 | Create an extra file where you write down the Python packages and versions you use for the analysis.              |
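
Two of the practices in the table, fixing the random seed and recording package versions, are easy to show in code. The sketch below uses only the Python standard library; the helper names `shuffled` and `pinned_versions`, the seed value, and the package list are assumptions chosen for illustration, not part of the original project.

```python
import random
import sys
from importlib import metadata


def shuffled(items, seed):
    """Return a shuffled copy of items; the same seed gives the same order."""
    rng = random.Random(seed)
    copy = list(items)
    rng.shuffle(copy)
    return copy


def pinned_versions(packages):
    """Build 'name==version' lines for the packages installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible note rather than silently dropping the package.
            lines.append(f"# {name} is not installed in this environment")
    return lines


customers = list(range(10))
# Two runs with the same seed produce the same order.
print(shuffled(customers, seed=42) == shuffled(customers, seed=42))  # True

# Hypothetical package list; replace it with what the analysis actually imports.
print(f"# Python {sys.version_info.major}.{sys.version_info.minor}")
print("\n".join(pinned_versions(["pandas", "scikit-learn"])))
```

The printed lines could be saved to a separate file kept alongside the report (a file named `requirements.txt` is a common convention) so that peers can rebuild the same environment.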

## 3.7 Write precise and clear reports