# What is data literacy?

## Introduction

For much of human history, most of the population didn't know how to read or write. Therefore, virtually all of the writing that remains from ancient civilizations was produced by a tiny minority of highly educated scribes (people who specialized in writing and reading).

Scribes were very important people in society. There were even statues made of them! Here's a statuette of a famous Egyptian scribe from 3,500 years ago. This man was called Minnakht, and you can see him reading hieroglyphs from the papyrus on his lap:

![Egyptian scribe](data/images/minnakht.jpg)

Fast-forward to the twenty-first century and you have [a world in which](https://en.wikipedia.org/wiki/List_of_countries_by_literacy_rate) 86% of all people over 15 know how to read and write in their native language. The figure is even higher (nearly 100%) in most developed countries. Imagine if we had to hire a scribe in order to write an email, make a grocery shopping list, or understand TV ads!

**Data literacy** is becoming the new literacy. Increases in computer processing and storage power are adding to the value of data, and those who know how to work with data are the new scribes. Today, understanding how to interpret data is becoming as important as traditional literacy. 

Not many people can reliably interpret a bar chart, a line chart, or a frequency table, yet being able to do so would make them much more attuned to the world around them. People with data literacy skills can find better jobs, make better health decisions, plan their financial lives more judiciously, and be more informed citizens. Many companies hire highly skilled data analysts and data scientists to help them make sense of all the data they routinely gather, but sometimes these people struggle to make the wider organization understand what they do and how the insights they produce can be useful for the business. In some sense, data workers are modern scribes.

It doesn't have to be this way. Although data scientists and analysts are still necessary, the modern economy demands people who are confident about their data skills and can help disseminate and take action on data-derived knowledge across their organizations. The good news is that just as we learned how to read and write in our native human languages in school, we can also learn how to read and write in the language of data. That is what this program is all about!

But what *is* data literacy? We define it as *the ability to read, write, translate, and think about data*:

<img src="data/images/data_literacy_diagram.svg" alt="Data literacy diagram" style="width:620px">

Let's break down what we mean by this.

## Reading data

![Read](data/images/read.png)

In order to become proficient at reading data, you need to develop at least two fundamental skills:

1. Recognizing the symbols used to represent data. When reading text, we need to recognize letters and words - in data, we need to understand what each chart type, table, statistic, etc. means.
2. Understanding the relationships between those symbols so that they can be connected to create coherent narratives or stories (just like how words are strung together to create articles and books).

In that sense, data analyses are like stories that need to be read correctly to be fully understood, much like a story in the news. In a proper data analysis, you will have charts, tables, and statistical quantities that, when considered together, can effectively answer the questions that the analyst needs to answer. Below are some examples of *dashboards* (organized, visually appealing pages consisting of charts, tables, and statistical quantities). Don't worry if you don't understand them yet - by the end of this program you will be reading (and *creating*!) beautiful, professional dashboards like these.

### Johns Hopkins University coronavirus dashboard

When the COVID-19 global pandemic broke out in early 2020, Johns Hopkins University began collecting health data from the Internet and using it to create visualizations and tables that have helped governments all around the world make better public health decisions. Click on the image below to access the dashboard:

<a href="https://coronavirus.jhu.edu/map.html"><img src="data/images/covid.png" alt="COVID dashboard" style="width:620px"></a>

### Discussion 1

1. From the map alone, which countries or continents seem to have the fewest confirmed cases of COVID-19? (In the map, the red bubbles represent the number of confirmed cases - the larger the size of the bubble, the larger the number of cases in that particular location.)
2. Now look at the table in the left sidebar. Do the figures there agree with your initial impressions?
3. Repeat steps 1 and 2 but with the U.S. map (access it by clicking on the "U.S. Map" tab at the top of the dashboard). In this map, instead of bubbles you have tiles that get darker as the number of cases gets larger. Were your guesses more accurate this time?

### Web analytics dashboard

Have you ever wondered how Amazon knows what the most viewed products are on their website? The answer is data analytics, of course! More specifically, web analytics. Web analysts look at dashboards that tell them in real time what the most clicked-on pages are on a website, how much time users spend looking at their content, and even how many users made a purchase during their visit. The industry standard tool for web analytics is Google Analytics. Click on the image to access a Google Analytics demo account with data from a real e-shop (it requires a Google account):

**Note:** If you are redirected to a Google Support page, just scroll down until you find a link that says `ACCESS DEMO ACCOUNT` or `Universal Analytics property: Google Merchandise Store (web data)` and click on it.

<a href="https://analytics.google.com/analytics/web/demoAccount"><img src="data/images/ganalytics.png" alt="Google Analytics dashboard" style="width:620px"></a>

This dashboard has data from the Google Merchandise Store, which is a website that sells Google swag. The Google Analytics home is a bit daunting at first sight because of its seemingly endless assortment of menus and options, but that is exactly what power users like about it! To get an idea about the kind of information that Google Analytics can offer, let's find out what products drive the most revenue on the website.

Go to `Conversions`, `E-commerce`, and then `Product performance` on the left sidebar (alternatively, follow the steps shown in the animation below):

![Product performance tutorial](data/images/product_performance_tutorial.gif)

If you scroll down a bit to the table, you will see that the first column has the names of the products sold in the store, and the second column tells you the revenue they have generated in the associated time period. Which products have generated the most revenue?

### NASA's Earth Observation Dashboard

Satellite imagery has become a very important source of data for scientists in recent decades, and its huge potential for many applications is only starting to be realized. One institution that has produced a large number of satellite images over the years is NASA (you can view some of them [here](https://earthobservatory.nasa.gov/map)). What is even more interesting is that NASA has a publicly accessible dashboard - the Earth Observation (EO) Dashboard - that aggregates data from multiple sources and combines them with its own satellite data to create amazing analyses. Click on the image below to access NASA's EO Dashboard. It will open a global map of CO<sub>2</sub> air pollution:

<a href="https://eodashboard.org/?poi=W4-N2&indicator=N2"><img src="data/images/nasa.png" alt="NASA dashboard" style="width:620px"></a>

Once you're on the website, click on the "Full Screen" button at the top right of the map to access the map in full screen mode.

![Full screen map](data/images/full_screen_button.png)

### Discussion 2

On the map, the redder the area, the higher the concentration of CO<sub>2</sub>, and the bluer the area, the lower the concentration. White areas have intermediate CO<sub>2</sub> concentrations. Can you detect any interesting patterns just from looking at the map?

### U.S. Census Bureau statistical profiles

The U.S. Census Bureau is the government body in charge of providing data about the nation's people and economy. It routinely conducts surveys and occasional censuses (typically every ten years) to get snapshots of many important demographic and economic variables that policymakers can use to make informed decisions. There are many ways to consult the U.S. Census Bureau data, but one of the most popular is via its interactive statistical profiles. Click on the image below to access the United States' statistical profile. This is a dashboard that contains figures and charts created with data from surveys conducted in 2012, 2016, and 2019:

<a href="https://data.census.gov/cedsci/profile?g=0100000US"><img src="data/images/us_census.png" alt="US statistical profile" style="width:620px"></a>

### Discussion 3

1. Read the figures just below the map at the top of the profile. Do you find them surprising? Why?
2. There is a treasure trove of additional information in this profile. Scroll down and read the sub-headers to get a sense of all the categories on this dashboard that have data.
3. If you scroll all the way down, you will find links to state profiles. Choose your state and see what interesting facts you can find about it!

## Writing data

![Write](data/images/write.png)

Once you become familiar with the art of reading data and interpreting dashboards, you can begin learning how those dashboards and analyses are created. Here is a sneak peek of some of the tools that you will be using to "write" data in the coming weeks. You can think of them as your data "pen and paper."

### Excel

Many data analyses start with an Excel spreadsheet. Some of the key factors which make Excel such a useful tool are:

* It is easy to learn
* It is very intuitive and the user interface just "makes sense"
* You can calculate fairly sophisticated statistical quantities with relatively little effort
* Many data analytics tools can work directly with Excel files

One of the most helpful things about Excel is that you can actually "look" at your dataset as soon as you open it. Here is an example:

![Looking at a dataset in Excel](data/images/excel.gif)

All of the data in this dataset is immediately visible by navigating the Excel user interface. This is NOT the case with most other data tools.

### Looker Studio

Looker Studio, formerly Google Data Studio, is the dashboarding tool of choice for this program. Some reasons for this are its ease of use, its wide range of visualization options, and its highly customizable aesthetics. Whenever you stumble upon a beautiful dashboard on the Internet, there is a good chance that it was made using Looker Studio or a commercial dashboarding tool (e.g. Tableau, Power BI).

### Discussion 4

Click the image below to go to Looker Studio's Public gallery:

<a href="https://datastudio.google.com/gallery"><img src="data/images/gds.png" alt="Looker Studio Public Gallery" style="width:620px"></a>

Pick a dashboard that catches your eye. Share it with the class. What made you choose it?

### SQL and Python

Some complex tasks can be difficult to accomplish using Excel alone. In those cases, you can use other tools like SQL and Python. *SQL (Structured Query Language)* is a programming language that uses basic English words (such as `SELECT`, `FROM`, and `WHERE`) to look up or calculate specific values in your datasets. To give you an idea of what working with SQL looks like, we have recorded a GIF of a data scientist querying a larger database consisting of hundreds of thousands of data points:

![SQL demo](data/images/sql.gif)

Larger databases like this one quickly become impractical to manage using Excel.
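
As a rough illustration of the "basic English words" idea, the sketch below runs a small SQL query from Python using the built-in `sqlite3` module. The `sales` table, its columns, and all the numbers in it are made up for this example; they are not the database shown in the GIF above.

```python
import sqlite3

# Create a temporary in-memory database (nothing is saved to disk).
conn = sqlite3.connect(":memory:")

# Hypothetical table and values, invented purely for illustration.
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales (product, revenue) VALUES (?, ?)",
    [
        ("T-shirt", 20.0),
        ("Mug", 8.0),
        ("T-shirt", 20.0),
        ("Backpack", 45.0),
        ("Mug", 8.0),
    ],
)

# The query reads almost like an English sentence:
# "select each product and its total revenue from sales,
#  grouped by product, ordered from highest to lowest revenue".
query = """
    SELECT product, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

# Print one line per product, highest revenue first.
for product, total_revenue in rows:
    print(product, total_revenue)
```

The same four keywords (`SELECT`, `FROM`, `GROUP BY`, `ORDER BY`) work whether the table has five rows, as here, or hundreds of thousands.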
 
Python is another tool that lets you perform very complex analyses on your data. Many *AI (artificial intelligence)* applications, like self-driving cars or stock market prediction models, are developed using Python. Although most of this program will focus on teaching you Excel and dashboarding, we will also introduce you to SQL and show you a little of what Python can do.
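
To give a flavor of what Python code looks like, here is a minimal sketch that builds a frequency table (a count of how often each value appears) using only Python's standard library. The list of survey answers is hypothetical and exists only for this illustration.

```python
from collections import Counter

# Hypothetical survey answers, invented for illustration.
answers = ["yes", "no", "yes", "maybe", "yes", "no"]

# Counter tallies how many times each distinct value appears.
frequency = Counter(answers)

# most_common() lists the values from most to least frequent.
for answer, count in frequency.most_common():
    print(f"{answer}: {count}")
```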

## Translating your data

![Translate](data/images/translate.png)

As you progress as a data professional, you will find that not everyone around you will be as data-savvy as you. In many situations, you will be asked to take data analyses and their associated charts, graphs, tables, and/or numbers and explain them to a non-technical audience. A typical scenario would be in a business meeting, in which you'd have to present and explain the significance of metrics to stakeholders who are not very familiar with data. Your analyses may seem like Egyptian hieroglyphs to them, so you will be expected to translate! We will teach you in later cases how to do this effectively.

### Discussion 5

Swedish scholar [Hans Rosling](https://en.wikipedia.org/wiki/Hans_Rosling) was renowned for his excellent skills as a data translator. Watch the first 5 minutes of this lecture he delivered back in 2006, then discuss with your classmates what you found interesting or noteworthy (the video will open in a new tab):

<a href="https://www.youtube.com/watch?v=hVimVzgtD6w"><img src="data/images/rosling.png" alt="Hans Rosling TED talk" style="width:620px"></a>

## Thinking about data

![Think](data/images/think.png)

Data literacy is about reading, writing, and translating data *with a purpose*. That purpose is usually made explicit in the form of a "research question" or "business question." Coming up with questions that are feasible, pertinent, and likely to produce actionable insights is the art of "thinking about data," and it is probably the most important skill you will need to develop to become truly data literate.

Thinking about data involves:

* Defining a good research or business question
* Determining what data needs to be gathered in order to answer the question
* Avoiding common pitfalls when interpreting statistical quantities
* Being aware of what can and can't be inferred from a limited number of data points
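
One pitfall of this kind can be sketched in a few lines of Python: a single extreme value can pull the average (mean) far away from what is typical, while the median barely moves. The salary figures below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical annual salaries; the last one is an extreme outlier.
salaries = [40_000, 42_000, 45_000, 47_000, 1_000_000]

# The mean adds everything up and divides by the count,
# so the outlier drags it upward.
average_salary = mean(salaries)

# The median is the middle value once the list is sorted,
# so the outlier has little effect on it.
typical_salary = median(salaries)

print("Mean:", average_salary)
print("Median:", typical_salary)
```

Reporting only the mean here would be technically correct but misleading, since four of the five people in this made-up group earn less than 50,000.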

### Discussion 6

Earlier, we looked at a demo Google Analytics dashboard and tried to answer a very simple business question: "What products drive the most revenue on the Google Merchandise Store website?" Now come up with other business questions that you think would be interesting to the store's management. Share them with the class.

## Conclusions & Takeaways

Data literacy is one of the most important skills to have in the 21st century. Data-literate people are able to make better financial decisions, are more employable, and are generally more informed citizens. We can group data literacy skills into four main pillars:

1. "Reading" data - being able to read data representations such as charts, tables, and statistical figures, which involves understanding not only what they mean by themselves but also how they form part of a coherent narrative
2. "Writing" data - being able to produce charts, tables, statistical quantities, and dashboards for other people to read
3. "Translating" data - communicating data analyses to non-technical audiences
4. "Thinking" about data - understanding how your analyses relate to the business questions that matter to your stakeholders

You will encounter many interesting concepts and tools along your journey as a data professional. Some of them will be challenging, but we are all here to help you and make sure you learn as much as possible and enjoy the journey!

## Attribution

[Scribe Statue of Min-nakht, Walters Art Museum, Creative Commons Attribution-Share Alike 3.0 Unported](https://commons.wikimedia.org/wiki/File:Egyptian_-_Scribe_Statue_of_Min-nakht_-_Walters_22230_-_Three_Quarter.jpg)