# Lecture 12 - Graphical polish

While graphical basics and Tufte's rules get you most of the way there, there are a few additional techniques and definitions, both in and out of R, that you'll need to know to deliver top-level graphics for reports and publications. 

## `par()` specifics*

*Cribbed from Trevor Branch [here](https://canvas.uw.edu/courses/1220990/assignments/syllabus)

The `par()` function controls nearly every aspect of how a figure looks, and mastering it is therefore essential to producing excellent graphics. Its arguments fall into three classes:

### 1. High‐level settings
These can be queried by `par()` and passed within most other plotting functions (`plot()` etc).

 Argument | Operation | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |Argument | Operation 
 :------- | :-------- | | :-------- | :--------
adj | Justification of text  | | font.sub | Font of subheading text
ann | Draw plot labels and titles  | | **lab** | Approximate number of ticks on axes
bg | Background color | | **las** | Rotation of text in margins
bty | Type of box drawn by box()  | | lend | Line end style (round, square)
**cex** | Size of text (multiplier)  | | ljoin | Line join style
cex.axis | Size of axis text | | lmitre | Line width limit for mitre (pointy)
cex.lab | Size of label text | | **lty** | Line type (solid, dashed)
cex.main | Size of main caption text  | | **lwd** | Line width
cex.sub | Size of subheading text  | | mgp | Placement of axis ticks and tick labels
**col** | Color of lines and symbols  | | **pch** | Data symbol types
col.axis | Color of axis text | | srt | Rotation of text in plot region
col.lab | Color of label text  | | tck | Length of axis ticks (rel to plot size)
col.main | Color of main caption text  | | tcl | Length of axis ticks (rel to text size)
col.sub | Color of subheading text  | | xaxp | Number of ticks on x‐axis
family | Font family for text | | **xaxs** | Calculation of scale range on x‐axis
fg | Foreground color | | xaxt | x‐axis style (standard, none)
font | Font face (bold, italic)  | | xpd | Clipping region (plot outside region)
font.axis|  Font of axis text | | yaxp | Number of ticks on y‐axis
font.lab | Font of label text  | | **yaxs** | Calculation of scale range on y‐axis
font.main | Font of main caption text | | yaxt | y‐axis style (standard, none)



### 2. **Low-level settings**
These can only be queried and set through `par()`; they cannot be passed to other plotting functions.

 Argument | Operation | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |Argument | Operation 
 :------- | :-------- | | :-------- | :--------
fig |  Location of figure region (normalized)  | | **oma**  | Size of outer margins (lines of text)
fin |  Size of figure region (inches) | | [omd](https://www.youtube.com/watch?v=EtYre1GABao) |  Location of inner region (normalized)
lheight |  Line spacing (multiplier) | | omi |  Size of outer margins (inches)
mai |  Size of figure margins (inches) | | pin |  Size of plot region (inches)
**mar** |  Size of figure margins (lines of text)  | | plt |  Location of plot region (normalized)
mex |  Line spacing in margins | | ps  | Size of text (points)
mfcol  | Multiple figure layout, fill by columns  | | **pty** |  Aspect ratio of plot region
mfg |  Which figure is used next | | usr |  Range of scales on axes
**mfrow**  | Multiple figure layout, fill by rows  | | xlog |  Logarithmic scale on x‐axis
new  | Has a new plot been started? | | ylog |  Logarithmic scale on y‐axis



### 3. **Read-only settings** 
These can be seen by calling `par()` but not changed.

 Argument | Operation | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |Argument | Operation 
 :------- | :-------- | | :-------- | :--------
cin |  Size of character (inches) | | cra  | Size of character (pixels)
cxy  | Size of character (user coordinates) | | din |  Size of graphics device (inches)
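
The three classes of settings can be sketched in a few lines. This is a minimal illustration rather than a recipe: the particular values (margins, colours, layout) are arbitrary choices, and `op` and `char_size_in` are names made up for the example.

```r
# Save the current (changeable) settings so they can be restored afterwards
op <- par(no.readonly = TRUE)

# Low-level settings: these go through par()
par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))

# High-level settings: these can be passed straight to plot()
plot(1:10, pch = 19, col = "steelblue", las = 1, cex = 1.2)
plot(1:10, type = "l", lty = 2, lwd = 2, las = 1)

# Read-only settings: these can be queried but not changed
char_size_in <- par("cin")

# Restore the saved settings
par(op)
```

Saving and restoring the settings keeps a change made for one figure from leaking into the next.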

 







 




### Points and lines

Points and lines are the grist of plotting - they have a wide range of colours and thicknesses that can be manipulated:

In [None]:
# Re-scale plot dimensions for Jupyter
options(repr.plot.width=7, repr.plot.height=3)

In [None]:
# Plot symbols and their corresponding numbers
plot(c(1:13,1:12),c(rep(2,13),rep(1,12)),pch=1:25,ylim=c(0,4),ylab="",axes=F,xlab="")
text(c(1:13,1:12)+.3,c(rep(2,13),rep(1,12)),labels=c(1:25),cex=.7)

But these symbols are just the beginning - because R recognizes [Unicode](https://unicode.org), we can plot many different [shapes](http://www.alanwood.net/unicode/geometric_shapes.html) and [symbols](http://www.alanwood.net/unicode/miscellaneous_symbols.html). This opens up quite a lot:

In [None]:
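# Plot a grid of Unicode symbols for the code points between start and end.
# Character inputs are read as hexadecimal; numeric inputs are used as given.
TestUnicode <- function(start="25a0", end="25ff", ...)
  {
    nstart <- as.hexmode(start)
    nend <- as.hexmode(end)
    r <- nstart:nend
    # Side length of the smallest square grid that fits every symbol
    s <- ceiling(sqrt(length(r)))
    par(pty="s")
    plot(c(-1,(s)), c(-1,(s)), type="n", xlab="", ylab="",
         xaxs="i", yaxs="i")
    grid(s+1, s+1, lty=1)
    for(i in seq_along(r)) {
      # A negative pch plots the Unicode character with that code point
      try(points(i%%s, i%/%s, pch=-1*r[i],...))
    }
  }

TestUnicode()
TestUnicode(9500,9900)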

Depending on your machine and setup, some or all of these symbols will get plotted. They are therefore not universally useful, but they do give some additional options.

In [None]:
# Unicode symbols are coloured with col; bg only fills pch 21 to 25
plot(1,1,pch=-as.hexmode('25C0'), cex=2, col=rgb(0,255,0, maxColorValue=255))

In [None]:
# Plot lines and their corresponding numbers
plot(1,1,type='n',xlim=c(0,8),ylim=c(0,8),axes=F,xlab="",ylab="")

# Line types
llab=c("solid","dashed","dotted","dotdash","longdash","twodash")
for (i in 1:6){
    lines(c(1,3),c(i,i),lty=i)
    text(3.05,i,llab[i],cex=.7,pos=4)
    text(.95, i, i, cex=.7,pos=2)
}

# Line widths
lwdx=c(.2,.5,1,2,3,5)
for (i in 1:6){
    lines(c(5,7),c(i,i),lwd=lwdx[i])
    text(7.05,i,lwdx[i],cex=.7,pos=4)
}

text(2,7,'lty')
text(6,7,'lwd')

In [None]:
# Import MLB 2017 Batting data
mlbdata = na.omit(read.csv('mlb2017_batting.txt'))
head(mlbdata)

---
# Task 1
---

Use the `par()` arguments above to plot On Base Percentage vs Batting Average, with circles for the National League and up triangles for the American League. Also apply Tufte's rules 4 and 5.

In [None]:
# Your answer here (feel free to add cells to complete your answer)


## Scaling

Sizes of points and lines are a critical tool you can use to improve your graphics - we used line thicknesses in Lecture 11 to plot 50 and 95% CIs around a point, for example. Similarly, point sizes can be used to convey another dimension in a graphic, increasing the data-ink ratio. Point sizes are scaled through the `cex` argument in a plot.
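
As a rough sketch of the idea, the cell below scales point sizes by a third variable. The data are simulated and the names (`z`, `max_cex`, `pt_cex`) are made up for illustration. Scaling `cex` by the square root of the variable is one reasonable choice, so that symbol area rather than symbol width tracks the value.

```r
# Hypothetical data: x and y positions plus a third variable z
set.seed(42)
n <- 30
x <- runif(n)
y <- runif(n)
z <- runif(n, min = 10, max = 100)

# Scale cex with the square root of z so that symbol area, rather than
# symbol width, is proportional to z; max_cex is an arbitrary choice
max_cex <- 3
pt_cex <- max_cex * sqrt(z / max(z))

plot(x, y, pch = 21, bg = "grey80", cex = pt_cex, las = 1)
```

The upper limit on `cex` is a judgement call: large enough that differences are visible, small enough that points do not swamp each other.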

---
# Task 2
---

Use your plot from Task 1 and add a third dimension by scaling point sizes in proportion to the percentage of games played during the year (of 162), using your judgement to do it in such a way that the plot looks good.

In [None]:
# Your answer here (feel free to add cells to complete your answer)


# Colours

Colour is critical for conveying information and doing so well takes a lot of trial and error, as well as knowledge about perception. Information-based use of colour can be divided into either categorical or gradient-based categories. 

## Categorical colours

Finding good categorical colours takes some reflection on what you're plotting; content matters, and the colours you choose can themselves convey meaning. In maps, for example, we typically think of greens as related to vegetation, tans as grasslands or deserts, greys as urban areas, and blues as water. Selecting good colours, however, is also subjective. Depending on the decade you're from this might look great or awful:

In [None]:
nx = 100
plot(rnorm(nx,0,1),rnorm(nx,2,2),pch=21,bg=sample(c('#ff48c4','#2bd1fc','#f3ea5f','#c04df9','#ff3f3f'),size=nx,replace=T),xlab="N(0,1)",ylab="N(2,2)")

As alluded to in Lecture 11, we can make our own colour palettes with the Adobe colour wheel. Here we can upload a photo and make pleasing categorical colour palettes to suit our tastes - and the subject. But to do this we need to know a little about colours:

### Colour specification

Colours can be specified in three standard ways:
1. CMYK
2. RGB
3. Hex

### CMYK = Cyan-Magenta-Yellow-Key (black)

[CMYK colour](https://en.wikipedia.org/wiki/CMYK_color_model) specification is used in colour printing to blend dots into a wide range of colours. The CMYK colour model assumes light is reflected off a surface, and colours are achieved by printing dots of different sizes (rather than of different intensities). If you have something very specific to print, CMYK is your colour model, but expect it to look different on the printed page than on-screen; this is why monitors need to be calibrated to specific printers to get appropriate results. If you ever buy colour printer cartridges you'll recognize these colours (plus black). This is a **subtractive** colour model in that it starts with white, and uses levels of the printed colours to subtract percentages of the printed space from white.

### RGB = Red-Green-Blue 

Screens of any type are, close up, a mix of emitted light in red, green, and blue that is **added** together. By mixing these three colours nearly all colours can be specified. In RGB specification each colour gets an intensity between 0 and 255 (256 levels, including zero), with a 0-0-0 mix being black and 255-255-255 creating white.

In [None]:
plot(1,1,pch=21, cex=2, bg=rgb(255,255,255, maxColorValue=255))

In [None]:
plot(1,1,pch=21, cex=2, bg=rgb(0,0,0, maxColorValue=255))

In [None]:
plot(1,1,pch=21, cex=2, bg=rgb(255,0,0, maxColorValue=255))

In [None]:
plot(1,1,pch=21, cex=2, bg=rgb(0,255,0, maxColorValue=255))

In [None]:
rgb(0,0,255, maxColorValue=255)

In [None]:
plot(1,1,pch=21, cex=2, bg=rgb(0,0,255, maxColorValue=255))

### HEX = hexadecimal

Hexadecimal specification is just a translation of RGB coding, where numbers and letters are used in pairs to correspond to Red, Green, and Blue:

    0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
    
Which translates as:

    #056A30 = 0*16+5, 6*16+10, 3*16+0 = 5,106,48
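
The same translation can be checked in R. The sketch below is illustrative only; the variable names are made up, while `strtoi()`, `col2rgb()`, and `rgb()` are base R functions.

```r
hex <- "#056A30"

# Split the code into its three two-character pairs: "05", "6A", "30"
pairs <- substring(hex, c(2, 4, 6), c(3, 5, 7))

# Convert each base-16 pair to a decimal value
rgb_by_hand <- strtoi(pairs, base = 16L)

# The built-in equivalent returns the same red, green, and blue values
rgb_builtin <- col2rgb(hex)[, 1]

# And rgb() goes back the other way, from decimal values to a hex code
hex_again <- rgb(5, 106, 48, maxColorValue = 255)
```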


In [None]:
plot(1,1,pch=21, cex=2, bg='#056A30')

In [None]:
plot(1,1,pch=21, cex=2, bg=rgb(5,106,48, maxColorValue=255))

As the most compact format, hex is often the way to go when selecting colours to use. The [Adobe colour wheel](https://color.adobe.com/create/color-wheel/) gives these in its output.

## Sampling colour values from images

Any decent image manipulation or drawing program can sample colours using its eyedropper tool (e.g. Adobe Photoshop, Gimp etc.). We can also sample them from the [Adobe colour wheel](https://color.adobe.com/create/color-wheel/), using our own image.

**EXAMPLE**

---
# Task 3
---

Use your plot from Task 2 and colour the background of the points, using the official hexadecimal colours of MLB to make the National League points red and the American League points blue.

In [None]:
# Your answer here (feel free to add cells to complete your answer)


# Gradient colours

Unlike categorical colours, colour gradients convey numerical information directly. If only it were so simple...

## Single-colour gradients

By scaling numerical information between its minimum and maximum, we can use the intensity of the colour to add an additional dimension to a plot. Often this is well accomplished in greyscale, with white and black marking the minimum and maximum values and greys falling in between. A mix of three equal RGB values will give grey:


In [None]:
plot(0:10, rep(1,11), pch=21, bg=rgb(255*c(10:0)/10, 255*c(10:0)/10, 255*c(10:0)/10,maxColorValue=255))

R also has a useful shorthand for greyscale:

In [None]:
plot(1:3, rep(1,3), pch=21, cex=2, bg=c('grey10','grey66','grey88') )
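
To tie a greyscale gradient to actual data, the values first need rescaling to run from 0 to 1. A minimal sketch follows, using made-up values in `z` and the base R `grey()` function, which takes a level between 0 (black) and 1 (white):

```r
# Hypothetical values to encode as shades of grey
z <- c(3, 8, 15, 21, 30)

# Rescale to run from 0 (minimum) to 1 (maximum)
z01 <- (z - min(z)) / (max(z) - min(z))

# grey() takes 0 = black and 1 = white, so flip the scale to make
# larger values darker
shade <- grey(1 - z01)

plot(seq_along(z), rep(1, length(z)), pch = 21, cex = 2, bg = shade)
```

Whether the maximum should be the darkest or the lightest shade depends on the plot; here darker is assumed to mean more.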

For a gradient between two colours (one of which could be white), you can use the very handy `colorRampPalette()` function. You give it the colours at the extremes and it returns a new palette function; calling that function with a number returns that many colours stepping between them. Note this gives evenly-spaced steps, which is good only for binned data.

In [None]:
# Create blue-white colour palette
blue.pal = colorRampPalette(c("blue", "white"))

In [None]:
# Plot blue palette
plot(0:10, rep(1,11), pch=21, bg=blue.pal(11))
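
Because the palette steps are evenly spaced, one way to use them with data is to bin the values first and then look up one colour per bin. The sketch below assumes made-up values in `z` and five bins between 0 and 5; `pal`, `n_bins`, and `cols` are names chosen for the example.

```r
# A white-to-blue palette function
pal <- colorRampPalette(c("white", "blue"))

# Hypothetical values to be coloured, and the number of bins to use
z <- c(0.2, 1.7, 2.4, 3.9, 4.8)
n_bins <- 5

# Assign each value to one of n_bins equal-width bins between 0 and 5;
# labels = FALSE returns the bin number rather than a factor label
bin <- cut(z, breaks = seq(0, 5, length.out = n_bins + 1),
           labels = FALSE, include.lowest = TRUE)

# Look up one palette colour per bin
cols <- pal(n_bins)[bin]

plot(seq_along(z), rep(1, length(z)), pch = 21, cex = 2, bg = cols)
```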

## Transparency

Slightly easier than specifying blends of colours is to use transparency (alpha), which ranges from fully clear to fully opaque. It can be specified in two ways:

1. HEX specification - add a two-digit hexadecimal alpha value, from `00` (clear) to `FF` (opaque), to the end of any `hex` code. These digits are a fraction of 255, not a percentage, so `33` is 51/255, or about 20% opaque.
2. Using `adjustcolor()` - here any colour can be passed and its alpha value changed through `alpha.f`, a multiplier between 0 and 1 (alpha is the typical name for transparency in most graphics platforms):

In [None]:
# HEX spec (alpha of 33 and 66 in hex is about 20% and 40% opaque)
plot(1:3,rep(1,3),pch=21, cex=2, bg=c('#056A3033','#056A3066','#056A30') )

In [None]:
# Adjustcolor

col1 = adjustcolor('#056A30', alpha.f=0.33)
col2 = adjustcolor('#056A30', alpha.f=0.66)

plot(1:3,rep(1,3),pch=21, cex=2, bg=c(col1,col2,'#056A30') )
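
The two routes can be compared directly by reading back the alpha channel with `col2rgb(..., alpha = TRUE)`. This is only a sketch with made-up variable names, but it shows that the trailing hex digits are a value out of 255 and not a percentage:

```r
base_col <- "#056A30"

# adjustcolor() multiplies the alpha channel by alpha.f (0 = clear, 1 = opaque)
col_a <- adjustcolor(base_col, alpha.f = 0.33)

# The same alpha written by hand: convert the 0-1 fraction to a 0-255 value,
# then to two hexadecimal digits appended to the colour code
alpha_255 <- as.integer(round(0.33 * 255))
col_b <- paste0(base_col, sprintf("%02X", alpha_255))

# Read back the alpha channel (0-255) of each colour
alpha_a <- col2rgb(col_a, alpha = TRUE)["alpha", 1]
alpha_b <- col2rgb(col_b, alpha = TRUE)["alpha", 1]

# Appending the digits "33" gives a different, more transparent colour
alpha_33 <- col2rgb("#056A3033", alpha = TRUE)["alpha", 1]
```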

---
# Task 4
---

Use your plot from Task 3 to scale the intensities of the red and blue points according to each player's slugging average, with full saturation corresponding to full intensity.

In [None]:
# Your answer here (feel free to add cells to complete your answer)


Colour scales can do more than just use intensity gradients; they can also reflect diverging colour palettes (i.e. where white is the midpoint between two extremes), including those from [ColorBrewer](http://colorbrewer2.org) that can be installed with the `RColorBrewer` library. In fact there is a [range of palettes](https://moderndata.plot.ly/create-colorful-graphs-in-r-with-rcolorbrewer-and-plotly/) available that includes pre-formatted categorical and sequential versions.

In [None]:
library(RColorBrewer)

In [None]:
plot(0:10, rep(1,11), pch=21, bg=brewer.pal(11,'RdBu'))

Base R graphics also has its own colour palettes, including `rainbow()`, `heat.colors()`, `terrain.colors()`, `topo.colors()`, and `cm.colors()`. They are used by specifying the number of values to return, and the level of alpha (transparency):

In [None]:
plot(0:10, rep(1,11), pch=21, bg=heat.colors(11,.9))

# Graphical outputs

There are many, many formats for computer graphics, and knowing a bit about the major ones will help steer you toward what you need for various contexts. The big divide is between **raster** and **vector** graphics. 

### Raster graphics

Raster graphics are the typical format for computer images, particularly photos, which are captured (and stored) as a grid of coloured squares. Our eye can perceive down to about 0.1 mm; beyond that everything gets smoothed together. The resolution of raster graphics is typically specified in units of `dpi` (dots-per-inch) or points. In all cases, however, these formats are ways of storing information to tell a computer how to render a mesh of different colours to create an image.

The full list of raster graphics includes: ANI ANIM APNG ART BMP BPG BSAVE CAL CIN CPC CPT DDS DPX ECW EXR FITS FLIC FLIF FPX GIF HDRi HEVC ICER ICNS ICO / CUR ICS ILBM JBIG JBIG2 JNG JPEG JPEG-LS JPEG 2000 JPEG XR JPEG XT JPEG-HDR KRA MNG MIFF NRRD ORA PAM PBM / PGM / PPM / PNM PCX PGF PICtor PNG PSD / PSB PSP QTVR RAS RGBE Logluv TIFF SGI TGA TIFF TIFF/EP TIFF/IT UFO/ UFP WBMP WebP XBM XCF XPM XWD.

Typical raster formats include:

Format | Definition
:------ | :------
jpg, jpeg | Joint Photographic Experts Group standard - a lossy 10:1 compression; typically used in cameras
bmp | Bitmap format - Microsoft standard format that typically is uncompressed (i.e. big file sizes)
gif | Graphics interchange format - 8 bit, lossless compression suitable for things with hard angles (but not high-res)
png | Portable network graphic - a lossless compression format made for the internet; doesn't support CMYK
tiff | Tagged image file format - most common for high-res graphics, can optionally use compression


### Vector graphics

Rather than relying on individually coloured squares of RGB light, vector graphics use mathematics to define the edges of graphical objects, making them independent of resolution constraints. By combining arcs, lines and circles, graphics are built up that remain sharp at any size, and can be passed as a series of objects that can be edited in another program, such as Adobe Illustrator. 

The full list of vector graphics includes: AI CDR CGM DXF EVA EMF Gerber HVIF IGES PGML SVG VML WMF Xar

Typical vector graphics include:

Format | Definition
:------ | :------
ai | Adobe Illustrator format - Industry standard for creating non-data graphics
SVG | Scalable vector graphics - [XML](https://en.wikipedia.org/wiki/XML) based format for web use


### Compound graphics

In addition to vector and raster graphics, compound graphics can include elements of both and have several important formats.

The full list of compound graphics includes: CDF DjVu EPS PDF PICT PS SWF XAML

Typical compound graphics include:

Format | Definition
:------ | :------
eps | Encapsulated postscript - PostScript file that includes a preview stage
pdf | Portable document format - Adobe-created now open standard file that includes text, fonts, vector graphics, raster images
ps | PostScript - Adobe-created now open basis of all compound graphics, where everything is specified in terms of straight lines and cubic [Bézier curves](https://en.wikipedia.org/wiki/Bézier_curve)

### In most applications you should save your graphic as a pdf, with the `useDingbats=FALSE` argument included

In [None]:
# Example export
pdf('test.pdf',useDingbats=FALSE)
plot(1,1,xlab='text')
dev.off()

# Embed fonts so you can edit them elsewhere
embedFonts('test.pdf')

In addition to the use of vector graphics, you often need to export files according to specific dimensions and point sizes. To do this, specify the figure dimensions (in inches) in your `pdf()` call, using `oma` and `mar` to fiddle with the margins. Text sizes are then set as multiples of the default 12pt font, so `cex.lab=6/12` gives 6pt axis labels:

In [None]:
# Example export to specific size
pdf('test.pdf',useDingbats=FALSE, width=7, height=5, family='Helvetica')
plot(1,1,xlab='text',cex.lab=6/12)
dev.off()
# Embed fonts
embedFonts('test.pdf')
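
If a journal asks for a different base font size, the `pointsize` argument of `pdf()` changes the default, and the same ratio logic applies. The sketch below is one way to do it: `pt_to_cex()` is a hypothetical helper written for this example, and the file name and dimensions are arbitrary.

```r
# Hypothetical helper: cex multiplier needed to reach a target size in points
pt_to_cex <- function(target_pt, base_pt = 12) target_pt / base_pt

# Hypothetical output file, written to the session's temporary directory
out_file <- file.path(tempdir(), "sized_figure.pdf")

# A 3.5 x 2.5 inch figure with an 8pt base font
pdf(out_file, useDingbats = FALSE, width = 3.5, height = 2.5, pointsize = 8)
par(mar = c(4, 4, 1, 1))
# Axis labels at 6pt: 6/8 of the 8pt base size
plot(1, 1, xlab = "text", ylab = "", cex.lab = pt_to_cex(6, base_pt = 8))
dev.off()
```

Keeping the arithmetic in a small helper makes it harder to forget which base size a `cex` value is relative to.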

---
# Task 5
---

Use your plot from Task 4 and apply as many Tufte rules as you can, then export the graphic as a pdf.

In [None]:
# Your answer here (feel free to add cells to complete your answer)
out = character(2)  # pre-allocate a character vector for the two colours

for (i in 1:2){
    a = c(.2,.4)[i] 
    out[i] = adjustcolor('#041A88',alpha.f=a)
}

out


# What have you learned and what's next?

The point of today's lab was to outline the key elements that help make excellent graphics.

**You should at this point be comfortable:**
 1. With high and low level settings in `par()`
 2. Scaling point sizes
 3. Selecting categorical colours
 4. Colour gradients
 5. Graphics formats

That's it - all that's left is Assignment 3.

---
# **Thanks and best of luck on your exams!**