# DATA VISUALIZATION WITH PYTHON

## LESSON 1: INTRODUCTION TO DATA VISUALIZATION
Data visualization refers to the process of creating graphical representations of data in order to effectively communicate information and insights. This can include charts, graphs, maps, and other types of visualizations that make it easier to understand patterns, trends, and relationships within large sets of data.

There are two main reasons for creating visuals using data:

- **Exploratory analysis** is done when you are searching for insights. These visualizations don't need to be polished or aesthetically appealing. You are the consumer of these plots, and what matters is that they help you answer your own questions about the data.
  
- **Explanatory analysis** is done when you are presenting your results to others. These visualizations need to emphasize the points that convey your message, and they should be accurate, insightful, and visually appealing.

There are five steps to **Data Analysis**:

1. **Gathering Data** - Collecting data from a variety of sources, including databases, CSV files, and web pages.
2. **Cleaning Data** - Fixing errors, removing duplicates, and filling in missing data.
3. **Exploring Data** - Finding patterns, anomalies, and outliers.
4. **Analyzing Data** - Using statistical methods to answer questions. Here, we can use either **exploratory** or **explanatory** visuals.
5. **Sharing Results** - Presenting your **explanatory** visuals to others.
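The first three steps can be sketched with pandas. This is a minimal illustration, not a complete workflow: the in-memory CSV stands in for a real file or database, and the column names and the choice to fill missing ages with the median are assumptions made for the example.

```python
import io

import pandas as pd

# 1. Gather: read a small CSV (an in-memory string stands in for a real file).
raw = io.StringIO(
    "name,age,city\n"
    "Ana,34,Lima\n"
    "Ben,,Oslo\n"
    "Ana,34,Lima\n"
    "Cara,29,Oslo\n"
)
df = pd.read_csv(raw)

# 2. Clean: remove duplicate rows, then fill missing ages with the median age.
df = df.drop_duplicates()
df["age"] = df["age"].fillna(df["age"].median())

# 3. Explore: quick summaries to look for patterns, anomalies, and outliers.
print(df.describe())
print(df["city"].value_counts())
```

Analyzing and sharing build on a cleaned table like this one, usually with plots produced by the libraries introduced next.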

### Python Data Visualization Libraries
In this course, you will make use of the following libraries for creating data visualizations:

- **Matplotlib**: a versatile library for visualizations, but it can take some coding effort to put together common visualizations.
- **Seaborn**: built on top of matplotlib; adds a number of functions that make common statistical visualizations easier to generate.
- **pandas**: includes some convenient plotting methods that hook into matplotlib, but we'll mainly use it for its primary purpose as a general tool for working with data.
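A minimal sketch of how these libraries typically divide the work, using hypothetical data. It uses the conventional aliases `pd` and `plt` and the non-interactive `Agg` backend so it runs without a display. Seaborn is left out to keep the example small; it is conventionally imported as `sns` and offers higher-level functions such as `sns.countplot` for this kind of chart.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; in practice this might come from pd.read_csv(...).
df = pd.DataFrame({"genre": ["drama", "comedy", "drama", "horror", "comedy", "drama"]})

# pandas: general data wrangling (here, counting rows per category).
counts = df["genre"].value_counts()

# matplotlib: full control over the figure, at the cost of more code.
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("Genre")
ax.set_ylabel("Count")

# pandas: a thin plotting wrapper that draws with matplotlib on a given Axes.
fig2, ax2 = plt.subplots()
counts.plot(kind="bar", ax=ax2)
```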

## LESSON 2: DESIGN OF VISUALIZATIONS
Before getting into the actual creation of visualizations later in the course, this lesson introduces design principles that will be useful in both exploratory and explanatory analysis. You will learn about different data types and ways of encoding data. You will also learn about properties of visualizations that can affect both the clarity of their message and their accuracy.

In this lesson, you'll learn about the following topics related to the design of data visualizations:

- What makes a bad visual?
- Levels of measurement and types of data
- Continuous vs. discrete data
- Identifying data types
- What experts say about visual encodings
- Chart junk
- Data-to-ink ratio
- Design integrity
- Using color and designing for color blindness
- Shape, size, and other tools

Visuals can be bad if they:

- Don't convey the desired message.
- Are misleading.

### The Four Levels of Measurement
There are four levels of measurement that can be used to describe data, grouped into two types.

**Qualitative or categorical types** (non-numeric):

1. **Nominal data**: pure labels without inherent order (no label is intrinsically greater or less than any other). Examples of nominal data include:
   1. Gender
   2. Type of a fruit
   3. Nationality
   4. Genre of a movie
2. **Ordinal data**: labels with an intrinsic order or ranking (comparisons can be made between values, but the magnitude of the differences is not well-defined). Examples of ordinal data include:
   1. Size of a shirt
   2. Rating of a restaurant
   3. Level of education
   4. Letter grade in a class (A, B, C, D, F)
   
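**Quantitative or numeric types**:

1. **Interval data**: numeric values where absolute differences are meaningful (addition and subtraction can be performed). For example, temperature in degrees Celsius: 20 °C is 10 degrees warmer than 10 °C, but it is not "twice as hot".
2. **Ratio data**: numeric values where relative differences are meaningful (multiplication and division can be performed), because zero means "none of the quantity". For example, weight: 80 kg is twice as heavy as 40 kg.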
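One way to carry these levels into code is through pandas dtypes. The sketch below is an illustration under assumptions: the column names and values are hypothetical, nominal data is stored as an unordered `Categorical`, and ordinal data as an ordered `CategoricalDtype` so that comparisons follow the declared ranking rather than alphabetical order.

```python
import pandas as pd

# Hypothetical example data; column names are illustrative only.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "apple", "cherry"],  # nominal
    "shirt_size": ["M", "S", "L", "M"],               # ordinal
    "temp_c": [21.5, 18.0, 25.2, 19.8],               # interval
    "weight_kg": [70.2, 55.1, 82.4, 61.0],            # ratio
})

# Nominal: categories with no order.
df["fruit"] = pd.Categorical(df["fruit"])

# Ordinal: categories with an explicit order, so comparisons are meaningful.
size_type = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)
df["shirt_size"] = df["shirt_size"].astype(size_type)

print(df.dtypes)
print(df["shirt_size"].max())          # largest size under the declared order
print((df["shirt_size"] > "S").sum())  # number of shirts larger than S

# Interval: differences are meaningful.
print(df["temp_c"].max() - df["temp_c"].min())
# Ratio: quotients are meaningful as well.
print(df["weight_kg"].max() / df["weight_kg"].min())
```

Declaring the order explicitly also matters for plotting, because ordered categories are typically drawn in their declared order instead of alphabetically.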

All quantitative-type variables also come in one of two varieties: **discrete** and **continuous**.

- **Discrete** quantitative variables can only take on a specific set of values at some maximum level of precision. Examples include:
  - Number of children in a family
  - Number of times a person has been to the doctor
  - Number of pages in a book
  - Number of students in a class
- **Continuous** quantitative variables can (hypothetically) take on values to any level of precision. Examples include:
  - Height of a person
  - Weight of a person
  - Temperature
  - Amount of money in a bank account (recorded to the cent, but usually treated as continuous)
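The distinction often decides the plot type. This sketch, using hypothetical values, draws a discrete variable as one bar per observed value and a continuous variable as a histogram that groups values into bins. The choice of five bins and the `Agg` backend are assumptions for the example.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data for illustration.
children = pd.Series([0, 1, 2, 2, 3, 1, 0, 2, 4, 2])          # discrete
heights_cm = pd.Series([158.2, 171.5, 165.0, 180.3, 174.9,
                        162.7, 169.4, 177.1, 155.8, 183.6])   # continuous

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Discrete: one bar per observed value.
counts = children.value_counts().sort_index()
ax1.bar(counts.index, counts.values)
ax1.set_xlabel("Number of children")
ax1.set_ylabel("Families")

# Continuous: group values into bins with a histogram.
ax2.hist(heights_cm, bins=5)
ax2.set_xlabel("Height (cm)")
ax2.set_ylabel("People")

fig.tight_layout()
fig.savefig("discrete_vs_continuous.png")
```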

### Chart Junk
Chart junk is any visual element that is not necessary for conveying the message of the visualization. Chart junk can include:
- Heavy or unnecessary gridlines
- 3D effects
- Drop shadows
- Unnecessary text
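A minimal matplotlib sketch of removing chart junk from a simple bar chart, with hypothetical data. The specific choices here (hiding three spines, labeling bars directly with `bar_label`, available in matplotlib 3.4 and later, and then hiding the redundant y-axis) are one reasonable set of decisions, not a fixed rule.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical data for illustration.
categories = ["A", "B", "C", "D"]
values = [23, 17, 35, 29]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color="gray")

ax.bar_label(bars)           # label each bar with its value
ax.yaxis.set_visible(False)  # direct labels make the y-axis redundant

# Remove the box lines around the plot; they carry no data.
for side in ("top", "right", "left"):
    ax.spines[side].set_visible(False)

ax.grid(False)  # make sure no gridlines are drawn (some styles enable them)
ax.set_title("Values by category")
```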

### Data-to-Ink Ratio
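The data-to-ink ratio is the proportion of a visual's ink that is used to display the data itself, as opposed to non-data elements such as borders, backgrounds, and decoration. A higher ratio generally indicates a cleaner visual with less chart junk.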
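Edward Tufte, who popularized the idea, expresses it as a simple fraction. In the second form, erasable ink is whatever could be removed without losing any information about the data:

$$
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}} = 1 - \text{proportion of the graphic that can be erased without loss of data information}
$$

For a hypothetical chart where 40% of the ink goes to a heavy frame, shading, and gridlines that could all be erased without losing information, the ratio is $1 - 0.4 = 0.6$.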

### Design Integrity
Design integrity is the idea that a visual should represent the data accurately, without distorting it, while conveying its message as clearly as possible. In practice, this means keeping the visual as simple as possible while still conveying the message. This can include:

- Removing unnecessary elements
- Using color and shape judiciously
- Using a consistent style
- Using a consistent color scheme
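A common integrity problem is a bar chart whose value axis does not start at zero, which exaggerates differences. This stdlib sketch, with a hypothetical helper `apparent_ratio`, compares the ratio of two bar lengths as drawn against the ratio of the underlying values.

```python
def apparent_ratio(value_a: float, value_b: float, baseline: float = 0.0) -> float:
    """Ratio of bar b's drawn length to bar a's when the axis starts at `baseline`."""
    if min(value_a, value_b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (value_b - baseline) / (value_a - baseline)


# Hypothetical values: 99 is only about 2% larger than 97.
actual = apparent_ratio(97, 99)                   # axis starts at zero
truncated = apparent_ratio(97, 99, baseline=95)   # axis starts at 95
print(f"ratio with zero baseline:      {actual:.2f}")
print(f"ratio with truncated baseline: {truncated:.2f}")
```

With the axis starting at 95, the second bar is drawn twice as long as the first even though the values differ by about 2%. In matplotlib, `ax.set_ylim(bottom=0)` keeps a bar chart's baseline at zero.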

### Color
Color can both help and hurt a data visualization.

Three tips for using color effectively:
- Before adding color to a visualization, start with black and white.
- When using color, choose less intense colors rather than every color of the rainbow, which is the default in many software applications.
- Use color to communicate. Use it to highlight your message and to separate groups of interest. Don't add color just to have color in your visualization.
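The "color for communication" tip can be sketched as muted gray for context plus a single accent color for the group that carries the message. The data, the highlighted group, and the specific colors (`lightgray`, `tab:orange`) are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Hypothetical data; "C" is the group we want the reader to notice.
categories = ["A", "B", "C", "D", "E"]
values = [12, 15, 27, 14, 11]
highlight = "C"

# Muted gray for context, a single accent color for the message.
colors = ["tab:orange" if c == highlight else "lightgray" for c in categories]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color=colors)
ax.set_title("Group C stands out")
```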

### Color Blindness
Color blindness (color vision deficiency) is a condition in which a person has difficulty distinguishing between certain colors. Common forms affect the ability to tell apart:
- Red and green (the most common form)
- Blue and yellow
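One practical approach in matplotlib is to draw with a palette designed for color vision deficiency and to add a second cue, such as marker shape, so the groups stay distinguishable without relying on color alone. This sketch assumes matplotlib's built-in `tableau-colorblind10` style sheet and uses hypothetical data.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical data: three groups measured at five time points.
x = [1, 2, 3, 4, 5]
series = {
    "Group 1": [3, 4, 6, 7, 9],
    "Group 2": [2, 3, 3, 5, 6],
    "Group 3": [1, 2, 4, 4, 5],
}
markers = ["o", "s", "^"]  # shape as a second cue besides color

# Apply the colorblind-friendly color cycle only inside this block.
with plt.style.context("tableau-colorblind10"):
    fig, ax = plt.subplots()
    for (label, y), marker in zip(series.items(), markers):
        ax.plot(x, y, marker=marker, label=label)
    ax.legend()
```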

### Shape, Size, and Other Tools
In addition to color, there are other tools that can be used to convey information in a visualization. These include:
- Shape
- Size
- Orientation
- Texture
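A minimal sketch of encoding information with shape and size instead of color, using a hypothetical dataset. Marker shape distinguishes the two groups, and marker size encodes a third variable. In `ax.scatter`, `s` sets the marker area in points squared, so scaling `s` linearly scales area, not radius.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset for illustration.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 6, 5, 7],
    "group": ["a", "b", "a", "b", "a", "b"],
    "population": [10, 40, 25, 60, 15, 80],
})
markers = {"a": "o", "b": "^"}

fig, ax = plt.subplots()
for name, sub in df.groupby("group"):
    ax.scatter(
        sub["x"], sub["y"],
        s=sub["population"] * 5,  # marker area scales with population
        marker=markers[name],     # shape distinguishes the groups
        color="gray",             # a single color, so shape and size carry the encoding
        label=name,
    )
ax.legend(title="group")
```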