Using the R ggplot2 library, I created a set of functions that take variables of interest and output the most suitable visualizations. Each function takes a data frame and the names of the variables of interest as arguments. The key idea is that the best visualization depends on the data types and the number of variables of interest.
If...
- only one variable is given,
  a. and it is categorical, the function outputs a bar chart, pie chart, waffle chart, and dot plot.
  b. and it is numeric, the function outputs a histogram and density plot.
- two variables are given,
  a. and both are numeric, the function outputs a scatterplot with loess smoothing, a scatterplot with marginal histograms/boxplots, a contour plot, and a jitter plot.
  b. and one is categorical while the other is numeric, the function outputs a boxplot, lollipop plot, dot plot, violin plot, dot+box plot, box+violin plot, histogram colored by group, and density plot colored by group.
  c. and both are categorical, the function outputs a jitter plot, column chart, and mosaic chart.
- more than two variables are given (up to four),
  a. the function adds color/size variation to the plots above or uses new facets.
  b. and all of them are numeric, the function also outputs a perspective plot and contour plot.
- the data is a time series, the function outputs line/area plots and a heatmap (e.g. US GDP growth).
- the data is a matrix, the function outputs a heatmap and pairwise graphs (e.g. a correlation matrix).
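
The rules for one and two variables can be sketched as a small dispatcher. This is only a minimal illustration under stated assumptions, not the actual implementation: `classify_var` and `suggest_plots` are hypothetical names, the sketch returns plot names instead of drawing anything with ggplot2, and it treats every non-numeric column as categorical. The cases with three or four variables, time series, and matrices are left out.

```r
# Hypothetical helper: label a column as "numeric" or "categorical".
# Assumption: anything that is not numeric (factor, character, logical)
# counts as categorical.
classify_var <- function(x) {
  if (is.numeric(x)) "numeric" else "categorical"
}

# Hypothetical dispatcher: returns the names of the plots from the list
# above for one or two variables of a data frame.
suggest_plots <- function(df, vars) {
  stopifnot(is.data.frame(df),
            length(vars) >= 1, length(vars) <= 2,
            all(vars %in% names(df)))

  types <- vapply(df[vars], classify_var, character(1))
  n_numeric <- sum(types == "numeric")

  if (length(vars) == 1) {
    if (n_numeric == 1) {
      c("histogram", "density plot")
    } else {
      c("bar chart", "pie chart", "waffle chart", "dot plot")
    }
  } else if (n_numeric == 2) {
    c("scatterplot with loess smoothing",
      "scatterplot with marginal histograms/boxplots",
      "contour plot", "jitter plot")
  } else if (n_numeric == 1) {
    c("boxplot", "lollipop plot", "dot plot", "violin plot",
      "dot+box plot", "box+violin plot",
      "histogram colored by group", "density plot colored by group")
  } else {
    c("jitter plot", "column chart", "mosaic chart")
  }
}

# Usage with the built-in iris data set
suggest_plots(iris, "Species")
suggest_plots(iris, c("Sepal.Length", "Species"))
```

Separating the type check from the plotting keeps the selection logic easy to test; each returned name could then be mapped to a ggplot2 plotting function.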