## What Is Data Visualization  
Data visualization is the set of techniques used to communicate data or information by encoding it as visual objects.  
**It turns data into images that are easy to understand.**  

## Why Data Visualization  
You will never be an expert on the data you are working with, and you will always need to explore the variables in depth before you can move on to building a model or doing anything else with the data.  
**Visualization helps us understand the data better, so that we can design better models and solve problems more precisely.**  

## Content Of Kaggle Data Visualization Tutorial  
1. Plot with pandas  
2. Plot using seaborn  
3. Plot using matplotlib  

## Univariate plotting with pandas  

### Types of charts  
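1. Bar chart  
`df.plot.bar` - good for nominal and small ordinal categorical data  
2. Line chart  
`df.plot.line` - good for ordinal categorical and interval data  
3. Area chart  
`df.plot.area` - good for ordinal categorical and interval data  
4. Histogram  
`df.plot.hist` - good for interval data  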

### Bar chart and categorical data  
**nominal categories: "pure" categories that don't make a lot of sense to order (like countries, ZIP codes, types of cheese, and lunar landers)**  
**ordinal categories: things that do make sense to compare, like earthquake magnitudes, housing complexes with certain numbers of apartments, and the sizes of bags of chips at your local deli**  
```python  
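# Top 10 provinces by number of reviews
reviews['province'].value_counts().head(10).plot.bar()
# Same chart, expressed as a share of all reviews
(reviews['province'].value_counts().head(10) / len(reviews)).plot.bar()
# Ordinal data: sort the bars by score rather than by frequency
reviews['points'].value_counts().sort_index().plot.bar()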
```  
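
To make the difference between the three calls above concrete, here is a minimal, self-contained sketch. The tiny `reviews` DataFrame is a hypothetical stand-in for the tutorial's dataset, and the headless `Agg` backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import pandas as pd

# Hypothetical stand-in for the tutorial's `reviews` DataFrame.
reviews = pd.DataFrame({
    "province": ["California", "California", "Tuscany",
                 "Oregon", "California", "Tuscany"],
    "points": [87, 90, 87, 88, 90, 87],
})

# Nominal data: value_counts() orders categories by frequency.
province_counts = reviews["province"].value_counts()

# The same counts expressed as a share of all reviews.
province_share = province_counts / len(reviews)

# Ordinal data: sort by the category value so the x-axis keeps its order.
points_counts = reviews["points"].value_counts().sort_index()

ax = province_share.plot.bar()
ax.set_ylabel("share of reviews")
```

Dividing by `len(reviews)` only rescales the bars; the ordering and relative heights stay the same.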

### Line chart  
**the tool of first choice for distributions with many unique values or categories**  
**weakness: unlike bar charts, they're not appropriate for nominal categorical data**  
```python
reviews['points'].value_counts().sort_index().plot.line()
```

### Area Chart  
**just line charts, but with the bottom shaded in**  
```python  
reviews['points'].value_counts().sort_index().plot.area()
```  
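
As a rough illustration of how the line and area charts relate, the sketch below draws the same sorted counts both ways, side by side. The `points` values are made up, and the `Agg` backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical review scores standing in for reviews['points'].
points = pd.Series([85, 87, 87, 88, 88, 88, 90, 90, 92], name="points")

# sort_index() keeps the x-axis in score order; without it the line
# would connect the scores in order of frequency instead.
points_counts = points.value_counts().sort_index()

fig, (ax_line, ax_area) = plt.subplots(1, 2, figsize=(8, 3))
points_counts.plot.line(ax=ax_line, title="line")
points_counts.plot.area(ax=ax_area, title="area")
```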
#### What is interval data  
Examples of interval variables are the wind speed in a hurricane, shear strength in concrete, and the temperature of the sun. An interval variable goes beyond an ordinal categorical variable: it has a meaningful order, and in addition the difference between two entries can be quantified and is itself an interval variable.  


### Histogram
A histogram looks, trivially, like a bar plot. And it basically is! In fact, a histogram is a special kind of bar plot that splits your data into even intervals and displays how many rows are in each interval with bars. The only analytical difference is that instead of each bar representing a single value, it represents a range of values.  
**A special kind of bar chart in which each bar shows the data within a range of values**  
**weakness: they don't deal very well with skewed data**  

```python  
reviews[reviews['price'] < 200]['price'].plot.hist()
```  
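
The binning a histogram performs, and the reason skewed data is a problem, can be sketched with `numpy.histogram`, which does roughly the counting that the plot displays. The `prices` values below are invented, with one deliberate outlier; the choice of 5 bins is also just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

# Hypothetical prices with a single large outlier (500).
prices = pd.Series([10, 12, 15, 18, 20, 25, 30, 45, 60, 500], name="price")

# Equal-width bins over the full range: the outlier stretches the bins
# so far that almost every row lands in the first one.
counts_all, edges_all = np.histogram(prices, bins=5)

# Filtering out the long tail, as the notes do with price < 200,
# gives bins that actually show the shape of the distribution.
trimmed = prices[prices < 200]
counts_trimmed, edges_trimmed = np.histogram(trimmed, bins=5)

ax = trimmed.plot.hist(bins=5)
```

Comparing `counts_all` with `counts_trimmed` shows why the price filter is applied before plotting.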

## Bivariate plotting with pandas  
**Many pandas multivariate plots expect input data in a cross-tabulated format, with one categorical variable in the columns, one categorical variable in the rows, and counts of their intersections in the entries.**  
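
These notes use a `wine_counts` table in the stacked and line examples without showing how it is built. One possible way to get a table in the format described above is `pd.crosstab`; the sketch below assumes a hypothetical `reviews` frame with `points` and `variety` columns, which may not match the original tutorial's construction.

```python
import pandas as pd

# Hypothetical long-format data: one row per review.
reviews = pd.DataFrame({
    "points": [87, 87, 88, 88, 88, 90],
    "variety": ["Pinot Noir", "Chardonnay", "Pinot Noir",
                "Pinot Noir", "Chardonnay", "Chardonnay"],
})

# Rows: one categorical variable (points).
# Columns: the other categorical variable (variety).
# Entries: how many reviews fall in each (points, variety) cell.
wine_counts = pd.crosstab(reviews["points"], reviews["variety"])
print(wine_counts)
```

Combinations that never occur are filled with 0, so every row has a value for every column.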

### Types of charts  
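1. Scatter plot  
`df.plot.scatter` - good for continuous data and some nominal (unordered) categorical data  
2. Hex plot  
`df.plot.hexbin` - good for continuous data and some nominal (unordered) categorical data  
3. Stacked bar chart  
`df.plot.bar(stacked=True)` - good for continuous data and ordinal categorical data  
4. Bivariate line chart  
`df.plot.line()` - good for continuous data and ordinal categorical data  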

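### Scatter plot  
A scatter plot maps each pair of values of the two variables of interest to a point in two-dimensional space.  
**When the data is so dense that the points pile on top of each other (overplotting), we usually need to downsample.**  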

```python  
reviews[reviews['price'] < 100].sample(100).plot.scatter(x='price', y='points')  
```  
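
A self-contained version of the downsampling step might look like the following. The synthetic `reviews` data, the seed, and the `Agg` backend are all assumptions made so the sketch runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

# Synthetic stand-in for the tutorial's `reviews` DataFrame.
rng = np.random.default_rng(0)
reviews = pd.DataFrame({
    "price": rng.integers(5, 300, size=5000),
    "points": rng.integers(80, 101, size=5000),
})

# Keep the dense low-price region, then draw 100 random rows from it.
# random_state makes the sample (and the picture) reproducible.
cheap = reviews[reviews["price"] < 100]
sample = cheap.sample(n=100, random_state=0)

ax = sample.plot.scatter(x="price", y="points")
```

Without `random_state`, each run plots a different subset, which is fine for exploration but awkward when the chart needs to be reproduced.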

### Hex plot  
A hex plot aggregates points in space into hexagons, and then colors each hexagon by the number of points it contains.  
```python
reviews[reviews['price'] < 100].plot.hexbin(x='price', y='points', gridsize=15)
```  
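
Unlike the scatter plot, a hex plot can take every row, because the aggregation handles the density. The sketch below uses synthetic data (an assumption) and reads the per-hexagon values back from the matplotlib collection that the plot creates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd

# Synthetic stand-in for the tutorial's `reviews` DataFrame.
rng = np.random.default_rng(1)
reviews = pd.DataFrame({
    "price": rng.integers(5, 300, size=5000),
    "points": rng.integers(80, 101, size=5000),
})
cheap = reviews[reviews["price"] < 100]

# gridsize sets the number of hexagons along the x direction:
# a smaller value gives bigger hexagons and coarser aggregation.
ax = cheap.plot.hexbin(x="price", y="points", gridsize=15)

# The hexagons are stored as one collection; its array holds the
# value used to color each hexagon (here, a count of points).
counts = np.asarray(ax.collections[0].get_array())
```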

### Stacked plot  
A stacked plot draws the variables one on top of the other.  
**weaknesses:**  
1. the second variable in a stacked plot must be a variable with a very limited number of possible values (probably an ordinal categorical, as here)  
2. interpretability: only the bottom series sits on a common baseline, so it is hard to read concrete values for the others
```python  
wine_counts.plot.bar(stacked=True)  
wine_counts.plot.area()  
```  
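
To see what stacking does to the numbers, the sketch below builds a small hypothetical `wine_counts` table and computes where the top of each segment ends up. The column names and values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd

# Hypothetical counts: rows are scores, columns are wine varieties.
wine_counts = pd.DataFrame(
    {"Chardonnay": [1, 3, 2], "Pinot Noir": [2, 1, 4], "Riesling": [0, 2, 1]},
    index=pd.Index([87, 88, 89], name="points"),
)

ax = wine_counts.plot.bar(stacked=True)

# Each segment starts where the previous column's segment ended, so the
# top of a segment is the running total across the columns.
stack_tops = wine_counts.cumsum(axis=1)

# The full bar height is therefore the row total.
row_totals = wine_counts.sum(axis=1)
```

Only the first column's segments start at zero, which is why values for the other columns are hard to read off a stacked chart.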

### Bivariate line chart  
```python
wine_counts.plot.line()  
```
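
With the same kind of table, `plot.line()` draws one line per column against the shared index, so every series keeps a zero baseline. A minimal sketch, using a hypothetical `wine_counts` table:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import pandas as pd

# Hypothetical counts: rows are scores, columns are wine varieties.
wine_counts = pd.DataFrame(
    {"Chardonnay": [1, 3, 2], "Pinot Noir": [2, 1, 4], "Riesling": [0, 2, 1]},
    index=pd.Index([87, 88, 89], name="points"),
)

# One line per column; the index (points) becomes the x-axis.
ax = wine_counts.plot.line()
n_lines = len(ax.get_lines())
```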