# Share Data Through the Art of Visualization

Notes from this course: https://www.coursera.org/learn/visualize-data/

## Module 1: Visualize data

### Learning log

#### Understand data visualization
- Rule for creating data visualization
    - Your audience should know exactly what they're looking at within the first five seconds of seeing it
    - This means the visual should be clear and easy to follow. In the five seconds after that, your audience should understand the conclusion your visualization is making
- They might not agree with your conclusion, and that's okay. You can always use their feedback to adjust your visualization and go back to the data to do further analysis
- Four elements of successful visualization
    - Information (data)
    - Story (concept)
    - Goal (function)
    - Visual form (metaphor)
- Frameworks for organizing your thoughts about visualization
    - Frameworks help organize your thoughts about data visualization and give you a useful checklist to reference as you plan and evaluate your data visualization
- [The McCandless method](https://informationisbeautiful.net/visualizations/what-makes-a-good-data-visualization/)
    - Four elements:
        - Information: the data with which you’re working
        - Story: a clear and compelling narrative or concept
        - Goal: a specific objective or function for the visual
        - Visual form: an effective use of metaphor or visual expression
    - Provides terminology that isolates the specific elements of a graphic, so the person making a visual can evaluate how well each of those criteria has been met
    - Visualizations that fail to incorporate all four elements can be ineffective at communicating insights in various ways
        - Visual form without a goal, story, or data could be a sketch or even art
        - Data in visual form without a goal or function is just a pretty picture
        - Data with a goal but no story or visual form can be boring
- [Kaiser Fung’s Junk Charts trifecta checkup](https://junkcharts.typepad.com/junk_charts/junk-charts-trifecta-checkup-the-definitive-guide.html)
    - This approach is a set of questions that can help consumers of data visualization critique what they are consuming and determine how effective it is
    - Questions to determine if your data visualization is effective
        - What is the practical question?
        - What does the data say?
        - What does the visual say?
- Pre-attentive attributes
    - Creating effective visuals means leveraging what is known about how the brain works, and then using specific visual elements to communicate the information effectively
    - Pre-attentive attributes are the elements of a data visualization that people recognize automatically and without conscious effort
    - The essential, basic building blocks that make visuals immediately understandable are called marks and channels
    - Marks
        - Are basic visual objects such as points, lines, and shapes
        - Every mark can be broken down into four qualities
            - Position
                - Where is a specific mark in space relative to a scale or to other marks?
                - For example, if you’re looking at two different trends, position allows you to compare the pattern of one element relative to another
            - Size
                - How big, small, long, or tall is a mark?
                - The comparison of object sizes can be an easy visual interpretation for humans
                - This can be very useful for conveying the relationship between categories or data points
                - However, this also presents a potential problem: the human eye can read meaning into size comparisons that were never intended to convey any. For example, objects sometimes appear to be the same size when they are not. Controlling the scale of a visual is important even when comparative sizes are not intended to offer information.
            - Shape
                - Does the shape of a specific object communicate something about it?
                - Rather than using simple dots or lines, a bit of creativity can enhance how quickly people are able to interpret a visual by using shapes that align with a given application
            - Color
                - What color is a mark?
                - Colors can be used both as a simple differentiator of groupings or as a way to communicate other concepts such as profitable versus unprofitable, or hot versus cold
    - Channels
        - Are visual aspects or variables that represent characteristics of the data in a visualization
        - They are basically specialized marks that have been used to visualize data
        - It’s important to understand that channels vary in terms of how effective they are at communicating data based on three elements:
            - Accuracy
                - Are the channels helpful in accurately estimating the values being represented?
                - For example, color is very accurate when communicating categorical differences, such as apples and oranges. But it is much less effective when distinguishing quantitative data, such as 5 from 5.5
            - Popout
                - How easy is it to distinguish certain values from others?
                - There are many ways of drawing attention to specific parts of a visual, and lots of them leverage pre-attentive attributes including line length, size, line width, shape, enclosure, hue, and intensity
            - Grouping
                - How effective is a channel at communicating groups that exist in the data?
                - Consider the proximity, similarity, enclosure, connectedness, and continuity of the channel
- Remember: The more you emphasize one single thing, the more that emphasis counts. Emphasis diminishes with each additional item you emphasize because the items begin to compete with one another
- Bar graph / Column chart
    - Use size contrast to compare two or more values
    - X-axis (horizontal) is used to represent categories, time periods, or other variables
    - Y-axis (vertical) is a scale of values for the variables
    - Bar charts with horizontal bars effectively show data that are ranked with bars arranged in ascending or descending order
    - Bar charts should be ranked by value unless the data has a natural order, such as age or time
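
The ranking rule above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration rather than a plotting recipe: the function names `rank_bars` and `text_bar_chart` and the sales figures are hypothetical, invented for this example.

```python
def rank_bars(values, natural_order=None):
    """Return (label, value) pairs in the order the bars should be drawn.

    values: dict mapping a category label to a numeric value.
    natural_order: optional list of labels (for example age bands or months);
    when given, that order wins over ranking by value.
    """
    if natural_order is not None:
        return [(label, values[label]) for label in natural_order]
    # No natural order: rank from largest to smallest value.
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def text_bar_chart(pairs, width=30):
    """Render horizontal bars as text, scaled so the largest bar fills `width`."""
    largest = max(value for _, value in pairs)
    lines = []
    for label, value in pairs:
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<10} {bar} {value}")
    return "\n".join(lines)


# Hypothetical sales figures, invented for illustration.
sales = {"North": 120, "South": 340, "East": 95, "West": 210}
ranked = rank_bars(sales)
print(text_bar_chart(ranked))
```

Passing `natural_order` (for example a list of months) keeps the bars in that order instead of ranking them by value.
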
- Line graph / Line chart
    - Help your audience understand shifts or changes in your data
    - Used to track changes through a period of time
- Pie chart
    - Show how much each part of something makes up the whole
- Maps
    - Help organize data geographically
    - Can hold location-based information
- Histogram
    - A chart that shows how often data values fall into certain ranges
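
As a rough sketch of what a histogram counts, the snippet below groups values into equal-width ranges using only the standard library. The helper name `histogram_counts`, the bin width, and the height data are assumptions made up for this example.

```python
def histogram_counts(values, bin_width, start):
    """Count how many values fall into each [lower, lower + bin_width) range."""
    counts = {}
    for value in values:
        # Index of the bin this value belongs to, counting from `start`.
        index = int((value - start) // bin_width)
        lower = start + index * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical student heights in centimetres, invented for illustration.
heights = [151, 158, 160, 162, 165, 167, 168, 171, 174, 183]
counts = histogram_counts(heights, bin_width=10, start=150)
for lower, count in counts.items():
    print(f"{lower}-{lower + 9} cm: {'#' * count}")
```

Each printed row is one bar of the histogram: the range on the left and one `#` per value that falls inside it.
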
- Correlation charts
    - Show relationships among data
    - Should be used with caution because they might lead viewers to think that the data shows causation
    - Causation occurs when an action directly leads to an outcome
- Heatmap
    - Use color to compare categories in a data set
    - They are mainly used to show relationships between two variables and use a system of color-coding to represent different values
- Scatterplot
    - Show relationships between different variables
    - Typically used for two variables for a set of data, although additional variables can be displayed
    - For example, you might want to show data of the relationship between temperature changes and ice cream sales
- Distribution graph
    - Displays the spread of various outcomes in a dataset
    - Example: To account for its supplies, a brand new coffee shop owner wants to measure how many cups of coffee their customers consume, and they want to know if that information is dependent on the days and times of the week
- Meaningful patterns can take many forms, such as:
    - Change
        - This is a trend or instance of observations that become different over time. A great way to measure change in data is through a line or column chart
    - Clustering
        - A collection of data points with similar or different values. This is best represented through a distribution graph
    - Relativity
        - These are observations considered in relation or in proportion to something else. You have probably seen examples of relativity data in a pie chart
    - Ranking
        - This is a position in a scale of achievement or status. Data that requires ranking is best represented by a column chart
    - Correlation
        - This shows a mutual relationship or connection between two or more things. A scatterplot is an excellent way to represent this type of data pattern
    
- List of resources for inspiration
    - [The data visualization catalogue](https://datavizcatalogue.com/#google_vignette)
        - This catalogue features a range of different diagrams, charts, and graphs to help you find the best fit for your project.
    - [The 25 best data visualizations](https://visme.co/blog/best-data-visualizations/)
        - In this collection of images, explore the best examples of data that gets made into a stunning visual.
    - [10 data visualization blogs](https://www.tableau.com/learn/articles/best-data-visualization-blogs)
        - Each link will lead to a blog that is a fountain of information on everything from data storytelling to graphic data
    - [Information is beautiful](https://informationisbeautiful.net/wdvp/gallery-2019/)
        - Founded by David McCandless, this gallery is dedicated to helping you make clearer, more informed visual decisions based on facts and data
    - [Data studio gallery](https://lookerstudio.google.com/gallery?category=visualization)
        - Information is vital, but information presented in a digestible way is even more useful. Browse through this interactive gallery and find examples of different types of data communicated visually. You can even use the data studio tool to create your own data-driven visual.
- One of the biggest considerations when creating data visualization is where you'd like your audience to focus
- As a general rule, as long as it's not misleading, you should visually represent only the data that your audience needs in order to understand your findings
- Correlation and causation
    - Correlation
        - In statistics, is the measure of the degree to which two variables move in relationship to each other
        - An example of correlation is the idea that “As the temperature goes up, ice cream sales also go up.”
        - It is important to remember that correlation doesn’t mean that one event causes another. But, it does indicate that they have a pattern with or a relationship to each other
        - If one variable goes up and the other variable also goes up, it is a positive correlation
        - If one variable goes up and the other variable goes down, it is a negative or inverse correlation
        - If one variable goes up and the other variable stays about the same, there is no correlation
    - Causation
        - Refers to the idea that an event leads to a specific outcome
        - For example, when lightning strikes, we hear the thunder (sound wave) caused by the air heating and cooling from the lightning strike. Lightning causes thunder.
    - Why is differentiating between correlation and causation important?
        - When you make conclusions from data analysis, you need to make sure that you don’t assume a causal relationship between elements of your data when there is only a correlation
        - When your data shows that outdoor temperature and ice cream consumption both go up at the same time, it might be tempting to conclude that hot weather causes people to eat ice cream. But, a closer examination of the data would reveal that not every change in temperature leads to a change in ice cream purchases. In addition, there might have been a sale on ice cream at the same time that the data was collected, which might not have been considered in your analysis.
        - Knowing the difference between correlation and causation is important when you make conclusions from your data since the stakes could be high.
        - The next two examples illustrate the high stakes to health and human services
            - Cause of disease
                - For example, pellagra is a disease with symptoms of dizziness, sores, vomiting, and diarrhea. In the early 1900s, people thought that the disease was caused by unsanitary living conditions. Most people who got pellagra also lived in unsanitary environments. But, a closer examination of the data showed that pellagra was the result of a lack of niacin (Vitamin B3). Unsanitary conditions were related to pellagra because most people who couldn’t afford to purchase niacin-rich foods also couldn’t afford to live in more sanitary conditions. But, dirty living conditions turned out to be a correlation only
            - Distribution of aid
                - Here is another example. Suppose you are working for a government agency that provides SNAP benefits. You noticed from the agency’s Google Analytics that people who qualify for the benefits are browsing the official website, but they are leaving the site without signing up for benefits. You think that the people visiting the site are leaving because they aren’t finding the information they need to sign up for SNAP benefits. Google Analytics can help you find clues (correlations), like the same people coming back many times or how quickly people leave the page. One of those correlations might lead you to the actual cause, but you will need to collect additional data, like in a survey, to know exactly why people coming to the site aren’t signing up for SNAP benefits. Only then can you figure out how to increase the sign-up rate
        - Key takeaways
            - Critically analyze any correlations that you find
            - Examine the data’s context to determine if a causation makes sense (and can be supported by all of the data)
            - Understand the limitations of the tools that you use for analysis
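
The three cases above (positive, negative, and no correlation) can be checked numerically. The sketch below uses Pearson's correlation coefficient, which is one common measure and an assumption here, since these notes do not name a specific one. The data, the `describe` helper, and its `0.3` threshold are all hypothetical. Returning `0.0` for a variable that never changes is a simplification; the coefficient is formally undefined in that case.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Simplification: a variable that never changes is reported as 0.0,
        # although the coefficient is formally undefined here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def describe(r, threshold=0.3):
    """Label the direction of a correlation; `threshold` is an arbitrary cutoff."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none or weak"


# Hypothetical daily figures, invented for illustration.
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [110, 135, 160, 190, 220, 260]
heating_use = [90, 75, 60, 40, 25, 10]
bus_fare = [50, 50, 50, 50, 50, 50]

for name, series in [("ice cream sales", ice_cream_sales),
                     ("heating use", heating_use),
                     ("bus fare", bus_fare)]:
    r = pearson_r(temperature, series)
    print(f"temperature vs {name}: r = {r:.2f} ({describe(r)})")
```

A coefficient close to 1 or -1 only says the two series move together. It says nothing about whether one causes the other, which is why the takeaways above still apply.
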
- Static visualization
    - Do not change over time unless they're edited
- Dynamic visualization
    - Visualizations that are interactive or change over time
- Tableau
    - A business intelligence and analytics platform that helps people see, understand, and make decisions with data
- Decision tree
    - Decision-making tool that allows you, the data analyst, to make decisions based on key questions that you can ask yourself
    - Each question in the visualization decision tree will help you make a decision about critical features for your visualization
    - Example:
        - Which story would you like to tell?
            - Does your data have only one numeric variable?
                - Histogram
                - Density plot
            - Are there multiple datasets?
                - Line chart
                - Pie chart
            - Are you measuring changes over time?
                - Bar chart
            - Do relationships between the data need to be shown?
                - Scatter plot
                - Heatmap
    - Start off by evaluating the type of data you have and go through a series of questions to determine the best visual source
        - Does your data have only one numeric variable? 
            - If your data has one continuous numerical variable, a histogram or density plot is the best way to plot it
            - For example, if you have data on the heights of a group of students, use a histogram to show how many students fall into each height range. Depending on your type of data, a bar chart can also be appropriate in this case
        - Are there multiple datasets?
            - For cases dealing with more than one set of data, consider a line or pie chart for accurate representation of your data
            - A line chart connects the points of each data set with a continuous line, showing how the numbers have changed over time
            - A pie chart is good for dividing a whole into multiple categories or parts
            - An example of this is when you are measuring quarterly sales figures of your company
        - Are you measuring changes over time?
            - A line chart is usually adequate for plotting trends over time
            - However, when the changes are larger, a bar chart is the better option
        - Do relationships between the data need to be shown?
            - When you have two variables for one set of data, it is important to point out how one affects the other
            - Variables that pair well together are best plotted on a scatterplot
            - However, if there are too many data points, the relationship between variables can be obscured, so a heat map can be a better representation in that case
            - If you are measuring the population of people across all 50 states in the United States, you would have millions of data points, so you would use a heat map
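
The decision tree above can also be written as a small function. This is only a sketch of the four questions in these notes, asked in the same order. The function name `suggest_charts` and its flags, including `large_changes` and `many_points`, are hypothetical names chosen for this example.

```python
def suggest_charts(one_numeric_variable=False, multiple_datasets=False,
                   changes_over_time=False, show_relationships=False,
                   large_changes=False, many_points=False):
    """Return chart suggestions by asking the four questions in order."""
    if one_numeric_variable:
        return ["histogram", "density plot"]
    if multiple_datasets:
        return ["line chart", "pie chart"]
    if changes_over_time:
        # A line chart is the usual choice; a bar chart suits larger changes.
        return ["bar chart"] if large_changes else ["line chart"]
    if show_relationships:
        # Too many points can hide the relationship on a scatterplot.
        return ["heatmap"] if many_points else ["scatterplot"]
    return []


print(suggest_charts(one_numeric_variable=True))
print(suggest_charts(changes_over_time=True))
print(suggest_charts(show_relationships=True, many_points=True))
```

Because the questions are checked in order, the first one answered "yes" decides the suggestion. A larger tree, such as the one linked under further reading, would branch on more properties of the data.
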

##### Further reading
- [The beauty of data visualization](https://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization?language=en#t-150183)
- [‘The McCandless Method’ of data presentation](https://artscience.blog/home/the-mccandless-method-of-data-presentation)
- [Information is beautiful](https://informationisbeautiful.net/)
- [Beautiful news](https://informationisbeautiful.net/beautifulnews/)
- [The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures](https://www.amazon.com/Street-Journal-Guide-Information-Graphics/dp/0393072959)
- [Correlation is not causation](https://towardsdatascience.com/correlation-is-not-causation-ae05d03c1f53?gi=a144ac47d077)
    - This article describes the impact to a business when correlation and causation are confused
- [Correlation and causation](https://www.khanacademy.org/test-prep/praxis-math/praxis-math-lessons/gtp--praxis-math--lessons--statistics-and-probability/a/gtp--praxis-math--article--correlation-and-causation--lesson)
    - This lesson describes correlation and causation along with a working example
- [From data to visualization](https://www.data-to-viz.com/)
    - This is an excellent analysis of a larger decision tree. With this comprehensive selection, you can search based on the kind of data you have or click on each graphic example for a definition and proper usage
- [Selecting the best chart](https://www.youtube.com/watch?v=C07k0euBpr8)
    - This two-part YouTube video can help take the guesswork out of data chart selection. Depending on the type of data you are aiming to illustrate, you will be guided through when to use, when to avoid, and several examples of best practices. [Part 2](https://www.youtube.com/watch?v=qGaIB-bRn-A) of this video provides even more examples of different charts, ensuring that there is a chart for every type of data out there
    
#### Design data visualizations
- 

#### Visualization considerations
- 


## Module 2: Create data visualizations with Tableau

### Learning log

#### Topic

## Module 3: Craft data stories

### Learning log

#### Topic

## Module 4: Develop presentations and slideshows

### Learning log

#### Topic