# DS104 Data Wrangling and Visualization: Lesson Five Companion Notebook

### Table of Contents <a class="anchor" id="DS104L5_toc"></a>

* [Table of Contents](#DS104L5_toc)
    * [Page 1 - Introduction](#DS104L5_page_1)
    * [Page 2 - Heat Maps](#DS104L5_page_2)
    * [Page 3 - Issues with Heat Maps](#DS104L5_page_3)
    * [Page 4 - Heat Map Variations](#DS104L5_page_4)
    * [Page 5 - Heat Maps in R](#DS104L5_page_5)
    * [Page 6 - Tree Maps](#DS104L5_page_6)
    * [Page 7 - Tree Maps in R](#DS104L5_page_7)
    * [Page 8 - Mosaic Plots](#DS104L5_page_8)
    * [Page 9 - Mosaic Plots in R](#DS104L5_page_9)
    * [Page 10 - Key Terms](#DS104L5_page_10)    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Introduction<a class="anchor" id="DS104L5_page_1"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

```python
from IPython.display import VimeoVideo
# Tutorial Video Name: Histogram and Bar Charts
VimeoVideo('241240314', width=720, height=480)
```

# Introduction

This lesson explores the tools available for displaying qualitative data, including heat maps, tree maps, and mosaic plots. 

By the end of the lesson, you will be able to create: 

* Heat maps
* Tree maps
* Mosaic plots

This lesson will culminate in a brief exam on the difference between these three qualitative data visualization tools.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Heat Maps<a class="anchor" id="DS104L5_page_2"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Heat Maps

Heat maps are pretty intuitive. As a child, you were taught that anything hot is dangerous (fire, stove, etc.), so you probably associate red with danger and green with safety. Even today, when you use the GPS on your phone to map out a route to your destination, the roads shown in red are crowded and should be avoided, whereas the roads shown in green indicate that traffic is flowing freely. Another example is traffic lights: red means stop (danger), and green means go (safe). This thought process makes heat maps easy to interpret!

Check out this demonstration, and see if it makes sense to you:

![A graph that is based on colors represents the comparison of date against a list of country. The list of countries are placed on the left side of the graph and is represented in five different groups. The first group contains Canada, Mexico, and the United States. The second group contains China, India, Japan, Singapore, South Korea, and Taiwan. The third group contains Australia, Brazil, and South Africa. The fourth group contains Austria, France, Germany, Netherlands, Switzerland, United Kingdom, Greece, Italy, Spain, and the Euro Area. The fifth group contains the Czech Republic, Hungary, Poland, Russia, and Turkey. The last row represents Global.](Media/L07-01.png)

The title was purposely left off. Along the top, it shows every month from June 2008 to May 2013. Along the left edge, the countries are listed.

This heat map shows each country's Purchasing Managers' Index (PMI), which is supposed to be a good preliminary indicator of overall economic health, with an emphasis on manufacturing. The worldwide economy was a mess in 2008. Note that, on a macro level, the colors make the exact numbers almost beside the point. It is easy to see that from Oct 08 to about Jun 09, things were not great, since the cells are all red. However, certain regions showed improvement from Mar 10 to Apr 11, particularly the US and Europe, although Greece, Italy, and Spain continued to struggle for most of the next 5 or 6 years.

The point here is this: with hardly any prior knowledge, even the novice observer would immediately be drawn to the big red blob near the left edge, and just a bit more than a cursory glance would draw someone to the green area. This bit of visualization does its job.

---

## Inputs for Heat Maps

So, what kind of inputs are needed for a heat map? There should be a quantitative dependent variable. In the example above, it is the PMI for a particular country in a particular month. There should also be two variables that indicate the row and column where each data point will be found. These are categorical or at least discrete, like the country and the month in the example above.

When a factor has no natural order of its own (unlike months, which run in time order), it is customary to impose some sort of ordering. For instance, in the heat map shown above, the countries are more or less grouped by region. Countries could also be ordered by population, GDP or some other economic indicator, or just about anything. Listing countries alphabetically doesn't really make much sense though, because alphabetical order says nothing about the data. This is a case where the ordering of a categorical variable should help illustrate some sort of narrative.
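The inputs described above can be sketched in R as follows. This is a minimal illustration, not part of the lesson's data: the `region`, `month`, and `score` names and every value are made up. A long table with two categorical keys and one numeric value is pivoted into a matrix, and the rows are then reordered by their average so that the row order means something.

```{r}
# Hypothetical long-format data: two categorical keys and one numeric value.
scores <- data.frame(
  region = rep(c("North", "South", "East"), each = 3),
  month  = rep(c("Jan", "Feb", "Mar"), times = 3),
  score  = c(52, 55, 58, 41, 44, 40, 60, 63, 61)
)

# Keep months in calendar order rather than alphabetical order.
scores$month <- factor(scores$month, levels = c("Jan", "Feb", "Mar"))

# Pivot to a matrix: one row per region, one column per month.
grid <- tapply(scores$score, list(scores$region, scores$month), sum)

# Reorder the rows from lowest to highest average score.
grid_ordered <- grid[order(rowMeans(grid)), ]
grid_ordered
```

Ordering by the row average is only one choice. Any ordering that supports the story you want to tell (region, population, and so on) can be applied the same way by changing what is passed to `order()`.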

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Issues with Heat Maps<a class="anchor" id="DS104L5_page_3"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Issues with Heat Maps

One of the caveats about heat maps is this: most tools that automatically create a heat map take the entire range of numbers and assign the reddest shading to the lowest value and the greenest shading to the highest. So, shading is a helpful tool, but you should look closely at the actual numbers before drawing solid conclusions. Here are a few examples:

![A 10 by 10 grid box is represented in colors. The column headings are labeled period 1, period 2, period 3, period 4, period 5, period 6, period 7, period 8, period 9, and period 10. The row headings are labeled location 1, location 2, location 3, location 4, location 5, location 6, location 7, location 8, location 9, and location 10.](Media/L07-03.png)

This is a typical heat map, with made-up numbers that don't mean anything. Note the region of low numbers (red), and the region of high numbers (green).

Compare that heat map with the following one:

![A 10 by 10 grid box is represented in colors. The column headings are labeled period 1, period 2, period 3, period 4, period 5, period 6, period 7, period 8, period 9, and period 10. The row headings are labeled location 1, location 2, location 3, location 4, location 5, location 6, location 7, location 8, location 9, and location 10.](Media/L07-04.png)

The numbers are different, but the shading is identical. Each cell in this second spreadsheet is exactly four times the number in the corresponding cell of the first spreadsheet, plus twenty-five. The point is this: it is dangerous to compare two different heat maps to each other, because the 'red' cells in one heat map can reflect values that differ greatly from the 'red' cells in another. Never compare the colors of two heat maps unless you know they share the same scale.
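To see why the shading cannot change, here is a small sketch in R. The values are randomly generated stand-ins (the lesson's actual spreadsheet numbers are not reproduced here), and `rescale01` is a helper name chosen for this illustration. It computes where each cell sits between the sheet's minimum and maximum, which is what an automatic color scale uses.

```{r}
# Made-up values standing in for the first heat map (not the lesson's numbers).
set.seed(1)
first <- matrix(sample(63:88, 100, replace = TRUE), nrow = 10)

# The second sheet: four times each cell, plus twenty-five.
second <- 4 * first + 25

# Position of each cell between the sheet's minimum (0) and maximum (1).
rescale01 <- function(m) (m - min(m)) / (max(m) - min(m))

# The raw numbers differ, but the positions (and so the colors) match.
range(first)
range(second)
isTRUE(all.equal(rescale01(first), rescale01(second)))
```

Any transformation of the form "multiply by a positive number, then add a constant" leaves these positions untouched, so the two sheets get the same colors even though their values are far apart.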

Now look at this heat map:

![A 10 by 10 grid box is represented in colors. The column headings are labeled period 1, period 2, period 3, period 4, period 5, period 6, period 7, period 8, period 9, and period 10. The row headings are labeled location 1, location 2, location 3, location 4, location 5, location 6, location 7, location 8, location 9, and location 10. The first box in the top left corner and the last box in the bottom right corner are shaded.](Media/L07-05.png)

This is the same spreadsheet snippet from the first example, with two changes. The upper left hand corner cell was changed from 77 to 257, and the lower right hand corner cell was changed from 70 to 14. All of the remaining cells are between 63 and 88.

Creating the heat map automatically sets the smallest number to 'red,' and the largest number to 'green.' Everything in between those numbers is scaled using some sort of color gradient. In this case, it is red - orange - yellow - yellow/green - green. The extremes cause every other cell to be yellow or slightly orange, but the differences are subtle. The total range for all cells but the two corner cells is only 25, whereas the range for the entire sheet is 243.
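The arithmetic above can be checked with a short R sketch. It uses only the four numbers quoted in the text (14, 257, 63, and 88); `rescale01` is a helper name chosen for this illustration.

```{r}
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# The two changed corner cells plus the stated limits of the remaining cells.
typical  <- c(63, 88)
outliers <- c(14, 257)

# Where each of these values lands on a color scale running from 0 to 1.
positions <- rescale01(c(outliers, typical))
names(positions) <- c("14", "257", "63", "88")
round(positions, 3)

# Share of the full color scale left for all of the ordinary cells.
band <- (88 - 63) / (257 - 14)
round(band, 3)
```

All of the ordinary cells are squeezed into roughly a tenth of the color scale, which is why they look nearly the same shade.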

The moral of this story is that heat maps are **EXTREMELY** sensitive to outliers, so sensitive that the outliers tend to wash out any other information the heat map could provide. In other words, heat maps are best for situations where the data are fairly well-behaved and the user is looking for clusters of high or low numbers. Your eye is much better at spotting these clusters in colors than in raw numbers.

![A 10 by 10 grid box represented in colors. The column headings are labeled period 1, period 2, period 3, period 4, period 5, period 6, period 7, period 8, period 9, and period 10. The row headings are labeled location 1, location 2, location 3, location 4, location 5, location 6, location 7, location 8, location 9, and location 10.](Media/L07-06.png)

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Heat Map Variations<a class="anchor" id="DS104L5_page_4"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Heat Map Variations

Below, you will learn about a few more heat map variations, including those with negative numbers, those eschewing the red-green color scheme, and those that are actually placed upon maps.

---

## Heat Map with Negative Numbers

Heat maps work fine with negative numbers. This heat map is again identical to the first one in its shading. This time, the numbers are scaled from -140.4 to 159.6, but each cell occupies the same 'location' on that scale as it did in the first heat map.

If you are now getting a sense of a heat map basically being a color coding of where a cell resides on the scale of all possible values in a spreadsheet, you have a thorough understanding of it.
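That idea can be written down as a tiny R function. This is a sketch under stated assumptions: `value_to_color` is a hypothetical helper, the four values are illustrative, and the rescaling `12 * x - 896.4` is simply one linear transformation that produces the range quoted above (the lesson does not say how its numbers were generated).

```{r}
# Map each value to a color by its position between the minimum and maximum.
value_to_color <- function(x, colors = c("red", "yellow", "green"), n = 101) {
  ramp <- colorRampPalette(colors)(n)
  position <- (x - min(x)) / (max(x) - min(x))  # 0 = lowest, 1 = highest
  ramp[1 + round(position * (n - 1))]
}

original <- c(63, 70, 77, 88)      # illustrative positive values
shifted  <- 12 * original - 896.4  # same cells on a scale from -140.4 to 159.6

range(shifted)
identical(value_to_color(original), value_to_color(shifted))
```

The sign of the numbers never enters the calculation. Only each value's position between the smallest and largest value matters.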

---

## Heat Maps with Other Colors

Before going into a few more examples of heat maps, you should also know that most software packages and commands that create heat maps also allow the creator some options for color scaling. It doesn't always have to be "largest numbers are green, smallest numbers are red." Check out the heat map below:

![A box with 19 columns and 37 rows. The column headings are labeled E217S299R783, E217S299R784, E217S300R787, E217S300R786, E217S300R785, E217S299R782, E111S150R567, E202S185R546, E202S196R585, E202S188R552, E111S150R568, E202S185R545, E199S255R449, E199S255R448, E199S255R450, E202S192R557, E202S190R553, E111S150R566, E144S184R737, E202S186R548, E202S186R549, E202S186R547, E202S192R556, E202S188R550, E202S192R558, E202S185R544, E202S188R551, E202S196R563, E144S184R738, E202S196R582, E202S194R560, E202S190R554, E202S190R555, E202S194R559, E189S232R190, E189S232R386.](Media/L07-07.png)

This heat map has two categorical predictors, and no numbers shown. This is obviously a case where the values being mapped are not terribly important, whereas the shading is. This is a heat map of gene expression values in several conditions.

The takeaways here are:

* Heat maps can just show colors without data and still be meaningful.
* Heat maps can use categorical variables on both the horizontal and vertical axes.
* The scales can be on top or on bottom, or on the left or the right.

The heat map below has two categorical predictors. Again, there are no numbers in the cells. The scaling at the bottom clearly shows that faster response times are green, while slower response times are red. What you don't know is if the difference between the fastest and slowest response times is measured in milliseconds or hours.

The takeaway here is this: when using heat maps, you need to have integrity with your data and your story. Armed with this heat map, it would be easy to bully someone. Maybe you want to point out to the manager that their response times in the middle of the day are horrible, without acknowledging the fact that most customers are asleep at 2:00 AM. As with most everything in statistics and visualization, data and pictures can be (knowingly or unknowingly) abused to misrepresent things.

![A table labeled six weeks of aggregate average response time data, by day of week and hour of a day. The table has seven columns labeled Sun, Mon, Tue, Wed, Thu, Fri, and Sat. The row headings are labeled Midnight, 01:00 AM PT, 02:00 AM PT, 03:00 AM PT, 04:00 AM PT, 05:00 AM PT, 06:00 AM PT, 07:00 AM PT, 08:00 AM PT, 09:00 AM PT, 10:00 AM PT, 11:00 AM PT, Noon, 01:00 PM PT, 02:00 PM PT, 03:00 PM PT, 04:00 PM PT, 05:00 PM PT, 06:00 PM PT, 07:00 PM PT, 08:00 PM PT, 09:00 PM PT, 10:00 PM PT, and 11:00 PM PT.](Media/L07-08.png)

---

## Literal Heat Maps

Heat maps often are literally maps, and literally show heat. The takeaway here is that Sicily, Sardinia, and Tunisia are lousy vacation spots in August.

![A map depicting the locations of Rome, Tirana, Athens, Split, and Podgorica. The map is labeled 7 August 2017 at the bottom left corner of the map.](Media/L07-09.png)

The next heat map shows regional sales for a large company.

![A map depicting the cities of the United States. Four different regions are displayed in four different colors.](Media/L07-10.png)

Note that the heat maps that are overlaid onto literal maps use some sort of mapping lookup or similar device to convert the city name (for instance) into a location on the map. You don't have to come up with hundreds of accurate latitude and longitude values for each data point. The software will usually take care of that work for you.

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Heat Maps in R<a class="anchor" id="DS104L5_page_5"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Heat Maps in R

In order to create heat maps in R, you can use the function ```heatmap()``` from base R.  You can use **[this stock data](https://repo.exeterlms.com/documents/V2/DataScience/Data-Wrang-Visual/stockdata.zip)**. 

In order to use the ```heatmap``` function, you must have your data formatted as a matrix, and all of your data must be numeric. 

Start by removing the date column from this data:

```{r}
# Keep columns 2 through 11 (the stocks) and drop the date in column 1
stockdata1 <- stockdata[, 2:11]
```

Then use the function ```as.matrix()``` to reformat the data as a matrix, rather than as a data frame: 

```{r}
stockdata2 <- as.matrix(stockdata1)
```
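The reason for dropping the date column first can be seen in a small sketch. The `demo` data frame below is a hypothetical stand-in for the stock data, with made-up tickers and prices. When a date column is included, `as.matrix()` coerces every value to text, and the result is no longer numeric.

```{r}
# Hypothetical stand-in for the stock data: one date column plus numeric columns.
demo <- data.frame(
  date = as.Date("2020-01-01") + 0:2,
  AAA  = c(10.0, 10.5, 11.0),
  BBB  = c(20.0, 19.5, 21.0)
)

# With the date column included, every value is coerced to text.
is.numeric(as.matrix(demo))         # FALSE

# Dropping the date column first keeps the matrix numeric.
is.numeric(as.matrix(demo[, -1]))   # TRUE
```

Running `is.numeric()` on your own matrix is a quick check to make before plotting.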

And lastly, run the ```heatmap()``` function: 

```{r}
heatmap(stockdata2)
```

And here is the resulting graphic: 

![A box with ten columns labeled AXP, CAT, WMT, VZ, PFE, MMM, INTC, TRV, HD, and IBM. The columns are grouped and connected to each other.](Media/qual1.png)
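By default, ```heatmap()``` reorders the rows and columns using hierarchical clustering and draws the resulting tree diagrams (dendrograms) along the edges. It also scales the values within each row before coloring them. The sketch below shows how those defaults can be changed. The tickers and prices are randomly generated placeholders, not the lesson's stock data.

```{r}
# Made-up prices for three hypothetical tickers (rows are trading days).
set.seed(42)
prices <- cbind(
  AAA = 50 + cumsum(rnorm(20)),
  BBB = 120 + cumsum(rnorm(20)),
  CCC = 15 + cumsum(rnorm(20))
)

# Keep the original row and column order, and scale each column separately
# so a stock's colors reflect its own highs and lows.
heatmap(prices, Rowv = NA, Colv = NA, scale = "column",
        col = heat.colors(25), main = "Column-scaled, no clustering")
```

Setting `Rowv = NA` and `Colv = NA` turns off the reordering and the dendrograms, and `scale = "column"` may be the more natural choice when each column is a separate stock with its own price level. Whether that fits your data is a judgment call.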

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Tree Maps<a class="anchor" id="DS104L5_page_6"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Tree Maps

Heat maps and tree maps have some similarities:

* The layout is usually rectangular.
* Colors are used strategically.

But they also have some differences. As opposed to heat maps, tree maps:

* Rely on a 'nested' perspective.
* Use cell size as a measurement (usually how large or small something is). 

Take a look at this tree map:

![A box is divided into thirteen parts. Each part is shaded in different colors. A few boxes are labeled 3.3-percent monolithic integrated circuits digital, 2.6-percent turbo-jet engines of a thrust less than 25 KN, 2.5-percent penicillins and streptomycins, derivs, in, 1.4-percent Antisera and other blood fractions.](Media/L07-11.png)

![Thirty-four boxes shaded in thirty-four different colors and they are labeled machinery, electronics, aircraft, boilers, ships, metal products, construction equipment and materials, home and office material, pulp and paper, beer, spirits and cigarettes, food processing, petrochemicals, inorganic salts and acids, other chemicals, agrochemicals, chemicals and health-related products, coal, mining, oil, precious stones, garments, cereals and vegetable oils, cotton, rice, soy beans and others, tropical treetops and flowers, tobacco, fruit, miscellaneous agriculture, fish and seafood, meat and eggs, animal fibers, milk and cheese, leather, and not classified.](Media/L07-12.png)

This tree map shows U.S. exports in 2012. The table below the tree map indicates which color represents which segment of industry. Each color is broken up into several smaller rectangles, each representing a subset of the segment.

Several of the rectangles have writing within their borders. For the bigger rectangles, both the overall percentage and subset specifics are spelled out. This is common with tree maps. Sometimes creators make the mistake of cramming in as much info as possible until the font size is unreadable. This kind of defeats the purpose of visualization in the first place, but the temptation to include as much info as possible is strong.

Another common problem with tree maps is too many categories. As you look at this tree map, you can see 13 distinct colors for the rectangles on the tree map. On the other hand, the legend has 34 different colors, many of which are pretty indistinguishable from each other. For instance, look at the colors for ```Electronics``` and ```Boilers```. On the legend, their difference is subtle, but if they were on the tree map, it would be difficult to tell them apart.

Once again, the size is relative to some sort of metric. It might be the relative proportion of the size of the subset, or it might be some other metric.

Take a look at this tree map:

![A box is divided into nineteen parts. The parts are labeled Germany, United Kingdom, France, Italy, Romania, Netherlands, Belgium, Bulgaria, Austria, Sweden, Portugal, Czech Republic, Hungary, Denmark, Southern and Eastern Ireland Slovakia, Spain, and Poland. Each part is subdivided into a few parts that are labeled with the city names.](Media/L07-13.png)

The big blocks are obviously countries in Europe. The sub-blocks are locations within each country, probably the equivalent of states. The size of each block represents the number of elderly people (65 and over) in that country or location. The colors show the proportion of elderly people within each locale: the red locales have the highest proportion of elderly residents, whereas the blue locales have the lowest.

As with heat maps, the colors are relative, and without a key, the tree map is good for relative comparisons, but not absolute comparisons. For instance, you can definitively say that, overall, a larger portion of Germany's population is elderly than Poland's, but by how much? It is impossible to say.

---


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Tree Maps in R<a class="anchor" id="DS104L5_page_7"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Tree Maps in R

Here is **[the data you will be using](https://repo.exeterlms.com/documents/V2/DataScience/Data-Wrang-Visual/datascience_posts.zip)** for tree maps.  It shows data science web posts and the categories they fall into and the number of views and comments they received. 

In order to create tree maps in R, you will need to install and load two different libraries: ```treemap``` and ```scales```.  Then you can go ahead and use the ```treemap()``` function, as follows: 

```{r}
library(treemap)  # provides treemap()
treemap(datascience_posts, index=c("category"), vSize="views", type="index")
```

```datascience_posts``` is your data. The ```index=``` argument specifies a vector of one or more grouping columns, in this case ```category```, whose values become the labels on your boxes. The argument ```vSize=``` specifies that the number of ```views``` will determine the size of the boxes. Lastly, the argument ```type="index"``` tells ```treemap()``` to color the boxes according to the index variable, so each category gets its own color. The result is a visually pleasing, relatively easy to read tree map!

![A box labeled views is divided into twenty different parts of different sizes. The parts are labeled featured, mapping, visualization, infographics, artistic visualization, data sources, network visualization, data design tips, miscellaneous visualization, statistical visualization, mistaken data, projects, self-surveillance, statistics, ugly visualization, software, miscellaneous, tutorials, and quotes.](Media/qual2.png)
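To make the link between ```vSize``` and box size concrete, here is a small sketch with made-up data. The `posts` data frame is hypothetical and only borrows the `category` and `views` column names used above. As far as the package's default aggregation goes, ```treemap()``` adds up the ```vSize``` values within each category, so each box's area should be proportional to the shares computed below.

```{r}
# Hypothetical posts: the category and views columns mirror the names used above.
posts <- data.frame(
  category = c("Mapping", "Mapping", "Statistics", "Software", "Statistics"),
  views    = c(1200, 800, 500, 300, 200)
)

# Total views per category, and each category's share of all views.
totals <- aggregate(views ~ category, data = posts, FUN = sum)
totals$share <- totals$views / sum(totals$views)
totals

# Draw the tree map only if the treemap package is installed.
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(posts, index = "category", vSize = "views", type = "index")
}
```

Checking the totals first is a handy way to confirm that the biggest box on the plot is the category you expect.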

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 8 - Mosaic Plots<a class="anchor" id="DS104L5_page_8"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Mosaic Plots

A mosaic plot is a graphical method of displaying two or more categorical variables. It gives a quick and dirty overview of the data, and makes it possible to recognize relationships between different variables. For example, when two variables are independent, every column is split in the same proportions, so the boxes line up across categories.

Here is a typical mosaic plot:

![A graph depicts the plot of industry against nation. The x-axis represents four industries and is labeled automobiles, electronics, food, and oil. The y-axis represents nation they are labeled Britain, France, Germany, Japan, and U S.](Media/L07-14.png)

Sometimes mosaic plots require a bit of scrutiny, but they can convey a bunch of fascinating information. For example, in the mosaic plot above, production in four different industries is presented for five different nations. Each nation is represented by a different color, and each vertical stack represents a different industry. Take a closer look:

* The purple boxes are for U.S. production. As a proportion of the total production of these five countries, the U.S. produces a lot more food and oil than autos. This is shown by the fact that the food and oil portions for the U.S. are much taller than the autos portion. You might say that the production of food and oil is disproportionately large for the U.S.
* Food production for Germany is negligible. In fact, the biggest contribution for production from Germany is autos.
* Britain's largest contribution is food.
* The width of the columns is sized to be relative to production. Therefore, the greatest production is in the food industry.

There aren't too many options for enhancing a mosaic plot. Some mosaic plots show a scale on either the horizontal axis or the vertical axis (or both) to give an idea of the numbers or percentages being represented.

---

## Data Input for a Mosaic Plot

The input data for a mosaic plot typically consists of a table with three columns. The first two columns are the categorical variables, and the third column is usually a count, or a percentage.
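As a sketch of that three-column layout, the example below uses `HairEyeColor`, a data set that ships with R, rather than any data from this lesson. It builds a table with two categorical columns and a count, converts it to a two-way table with ```xtabs()```, and draws it with base R's ```mosaicplot()```.

```{r}
# Built-in counts by hair color, eye color, and sex, reshaped to long format.
long <- as.data.frame(HairEyeColor)

# Collapse to three columns: two categorical variables and a count.
three_col <- aggregate(Freq ~ Hair + Eye, data = long, FUN = sum)
head(three_col)

# Turn the three columns into a two-way table and plot it.
hair_eye <- xtabs(Freq ~ Hair + Eye, data = three_col)
mosaicplot(hair_eye, main = "Hair color by eye color", color = TRUE)
```

The formula `Freq ~ Hair + Eye` reads as "sum the counts for every combination of the two categories," which is the step that turns the long table into something a mosaic plot can draw.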

---

## Mosaic Plots with a Third Categorical Variable

It is possible, but a little clumsy, to add a third categorical variable into a mosaic plot. For example, suppose you wanted to graphically represent the passengers on the Titanic. Suppose you have three categorical variables: class (1st, 2nd, 3rd, and crew), gender (male and female), and survival (yes and no). These are clearly three different categorical variables, but two of them can be combined into one.

To do this, consider the gender and survival variables. They can easily be combined to have four different levels:

* male-yes
* male-no
* female-yes
* female-no

Taking these data in a table format, they look like this:

![A table has six columns and four rows. The columns are labeled gender, survived, first class, second class, third class, and crew. The first two rows are labeled male and the second two rows are labeled female. The row entries are as follows. Row 1, no, 118, 154, 422, 670. Row 2, yes, 62, 25, 88, 192. Row 3, no, 4, 13, 106, 3. Row 4, yes, 141, 93, 90, 20.](Media/L07-15.png)

And the mosaic plot looks like this:

![A graph depicts the plot of four classes in titanic. The classes are labeled first, second, third, and crew and they are represented on the x-axis. The y-axis represents sex and they are labeled female yes, no, yes, male, and no.](Media/L07-16.png)

You might notice that the column width integrity is lost a bit here. For example, the female first class section is much wider than the male first class section, but that is not because there were three times as many women in first class as there were men. What it means is that the women on board were concentrated among the first, second, and third class passengers. There were very few female crew members. The table above shows that there were only 23 of them, out of the 470 women on board.
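The combining step can be sketched in R with the `Titanic` data set that ships with R, whose counts match the table above once age is summed out. The `SexSurvived` column name is an assumption made for this illustration, and base R's ```mosaicplot()``` is used here as a stand-in for whatever tool produced the plot above.

```{r}
# Built-in Titanic counts, reshaped to one row per combination.
titanic_long <- as.data.frame(Titanic)

# Merge sex and survival into a single four-level variable.
titanic_long$SexSurvived <- interaction(titanic_long$Sex, titanic_long$Survived, sep = "-")

# Class by combined sex-survival table, summed over age.
combined <- xtabs(Freq ~ Class + SexSurvived, data = titanic_long)
combined

# Check the female crew and total female counts quoted above.
female <- combined[, c("Female-No", "Female-Yes")]
sum(female["Crew", ])  # 23
sum(female)            # 470

mosaicplot(combined, main = "Titanic: class by sex and survival", color = TRUE)
```

Combining two variables this way keeps the plot two-dimensional, at the cost of the longer labels and the less intuitive widths discussed above.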

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 9 - Mosaic Plots in R<a class="anchor" id="DS104L5_page_9"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Mosaic Plots in R

Mosaic plots can be made in R with the library ```vcd```.  Here is **[the data you will be using](https://repo.exeterlms.com/documents/V2/DataScience/Data-Wrang-Visual/defects.zip)** for mosaic plots. 

In order to make use of the function ```mosaic()```, you will need to have your data formatted as a table.  In order to do that, you will first use the ```attach()``` function, and then will use the ```table()``` function, like this: 

```{r}
attach(defects)
defects2 <- table(Region, Defect)
defects2
```

When you call the table, ```defects2```, this is what you'll receive: 

```text
             Defect
Region        dead part old model runs hot wrong frequency wrong size
  Americas          148        11       20             148         38
  Asia               41        60       56              88         32
  Europe             23        45       51              40         71
  Middle East         6        13       16              43          9
```
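As an aside, the same kind of table can be built without ```attach()```, which some R users prefer to avoid because it leaves the data frame's columns on the search path. The sketch below uses a tiny made-up data frame (`defects_demo`, which borrows only the `Region` and `Defect` column names) to show two alternatives that give the same counts.

```{r}
# Hypothetical rows: one row per defective item.
defects_demo <- data.frame(
  Region = c("Asia", "Asia", "Europe", "Americas", "Europe", "Asia"),
  Defect = c("runs hot", "old model", "runs hot", "dead part", "wrong size", "runs hot")
)

# Option 1: evaluate table() inside the data frame with with().
tab_with <- with(defects_demo, table(Region, Defect))

# Option 2: describe the cross-tabulation with a formula.
tab_xtabs <- xtabs(~ Region + Defect, data = defects_demo)

tab_with
all(tab_with == tab_xtabs)
```

Either result can be passed to a plotting function in the same way as a table built after ```attach()```.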

Then you can proceed to plotting: 

```{r}
library(vcd)  # provides mosaic()
mosaic(defects2, shade=TRUE, legend=TRUE)
```

In the ```mosaic()``` function, you'll list your data table as the first argument, then use the argument ```shade=TRUE``` to color each tile according to how far its count departs from what independence between the two variables would predict, and ```legend=TRUE``` to display a key for those colors.

Here is the resultant plot: 

![Eight graphs are presented in eight rows and eight columns. The x-axis of a few of the graphs depict satisfaction level, last evaluation, number project, average monthly hours, time spend company, work accident, left, and promotion last five years.](Media/quant3.png)
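The shading in a ```vcd``` mosaic plot is, as far as its documentation describes the default, based on Pearson residuals from a model in which the two variables are independent. The sketch below computes those residuals by hand in base R, using the counts copied from the ```defects2``` table printed earlier. The variable names `expected` and `pearson` are chosen for this illustration.

```{r}
# Counts copied from the defects2 table printed above.
defects2 <- as.table(matrix(
  c(148, 11, 20, 148, 38,
     41, 60, 56,  88, 32,
     23, 45, 51,  40, 71,
      6, 13, 16,  43,  9),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Region = c("Americas", "Asia", "Europe", "Middle East"),
    Defect = c("dead part", "old model", "runs hot", "wrong frequency", "wrong size")
  )
))

# Counts expected if Region and Defect were independent.
expected <- outer(rowSums(defects2), colSums(defects2)) / sum(defects2)

# Pearson residuals: positive means more defects than independence predicts.
pearson <- (defects2 - expected) / sqrt(expected)
round(pearson, 1)
```

Tiles with large positive residuals (for example, dead parts in the Americas) are the combinations that occur more often than independence would suggest, and large negative residuals mark combinations that occur less often.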

---

## Summary

* Heat maps use color and location to help the user quickly identify 'hot spots' and 'cold spots.'
* Heat maps are often created on actual geographical maps.
* Tree maps are used to organize huge piles of data for easy visual categorization.
* Mosaic plots are great visual tools to plot two categorical variables on a single graph.

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 10 - Key Terms<a class="anchor" id="DS104L5_page_10"></a>

[Back to Top](#DS104L5_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Key Terms 

Below is a list and short description of the important keywords learned in this lesson. Please read through and go back and review any concepts you do not fully understand. Great work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Heat map</td>
        <td>A visualization that shows the "hot spots" in data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Tree map</td>
        <td>A visualization that uses cell size as a measure of how large something is.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Mosaic plot</td>
        <td>A graph displaying two or more categorical variables.</td>
    </tr>
</table>

---

## Key R Code

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>heatmap()</td>
        <td>Creates a heat map from a numeric matrix.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>treemap()</td>
        <td>A function to create a tree map.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>index=</td>
        <td>An argument to treemap() that names the column(s) used to group and label the boxes.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>vSize=</td>
        <td>An argument to treemap() that sets a particular variable as influencing the size of the squares on the plot.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>type="index"</td>
        <td>An argument to treemap() that colors the boxes according to the index variable.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>mosaic()</td>
        <td>A function that creates a mosaic plot from a table.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>shade=TRUE</td>
        <td>An argument to mosaic() that shades each tile by how far its count departs from independence.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>legend=TRUE</td>
        <td>An argument to mosaic() that provides an interpretive legend.</td>
    </tr>
</table>

---

## Key R Libraries

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>treemap</td>
        <td>Used for creating tree maps.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>scales</td>
        <td>Loaded alongside treemap in this lesson; provides scaling and label-formatting helpers for plots.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>vcd</td>
        <td>Used for creating mosaic plots.</td>
    </tr>
</table>
