R Package repository for the STAT545B course.

functionsnashid

During data exploration, developers often write the same code repeatedly to summarize a dataset by different categories. This package aims to remove that boilerplate: it counts the number of observations per category in a given dataset and returns the top categories by count.
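To make the idea concrete, here is a minimal sketch of the kind of dplyr boilerplate the package is meant to replace. This is not the package's actual implementation: the function name count_by_category_sketch, its argument names, and the toy pets data are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for count_by_category(); names and defaults are assumptions.
count_by_category_sketch <- function(data, category, n, descending = TRUE) {
  # Count the rows for each distinct value of the chosen column
  counts <- count(data, {{ category }}, name = "count")

  # Order by count, largest first unless descending = FALSE
  counts <- if (descending) {
    arrange(counts, desc(.data$count))
  } else {
    arrange(counts, .data$count)
  }

  # Keep at most the first n categories
  slice_head(counts, n = n)
}

# Toy data, invented for this example
pets <- tibble(species = c("cat", "dog", "cat", "bird", "cat", "dog"))
top_pets <- count_by_category_sketch(pets, species, 2)
```

With this toy data, top_pets holds two rows: cat with a count of 3 and dog with a count of 2.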

Installation

This package is not on CRAN yet. You can install the development version of functionsnashid from the GitHub repository with:

devtools::install_github("stat545ubc-2021/functionsnashid")

Basic Example

Run ?count_by_category for a more detailed explanation of the function. The example below demonstrates basic usage by counting the games per genre in the steam_games dataset.

  1. Results in descending order by default:
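# Load the tidyverse, the course datasets, and this package
suppressMessages(library(tidyverse))
suppressMessages(library(datateachr))
library(functionsnashid)

# Optional: one row per individual genre. Note that the calls below use
# steam_games directly, so comma-separated genre combinations are counted as-is.
games <- steam_games %>%
  select(id, name, genre, publisher, developer, original_price, release_date, all_reviews) %>%
  separate_rows(genre, sep = ",", convert = TRUE)

# Top 5 genre values by number of games
count_by_category(steam_games, genre, 5)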
#> # A tibble: 5 × 2
#>   genre                  count
#>   <chr>                  <int>
#> 1 Action                  2386
#> 2 Action,Indie            2129
#> 3 Casual,Indie            1732
#> 4 Action,Adventure,Indie  1585
#> 5 Adventure,Indie         1520
  2. Results in ascending order:
count_by_category(steam_games, genre, 5, FALSE)
#> # A tibble: 5 × 2
#>   genre                                                                    count
#>   <chr>                                                                    <int>
#> 1 Accounting,Animation & Modeling,Audio Production,Design & Illustration,…     1
#> 2 Accounting,Education,Software Training,Utilities,Early Access                1
#> 3 Action,Adventure,Casual,Early Access                                         1
#> 4 Action,Adventure,Casual,Free to Play                                         1
#> 5 Action,Adventure,Casual,Free to Play,Early Access                            1
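The calls above pass steam_games, so each comma-separated combination such as "Action,Indie" is counted as its own category. To count individual genres instead, the rows would first be split with separate_rows, as the games step does. The sketch below shows that idea on a small invented tibble (toy_games is hypothetical and not part of any dataset), using plain dplyr and tidyr rather than count_by_category:

```r
library(dplyr)
library(tidyr)

# Toy data, invented for this example
toy_games <- tibble(
  name  = c("A", "B", "C"),
  genre = c("Action,Indie", "Action", "Casual,Indie")
)

per_genre <- toy_games %>%
  # One row per individual genre instead of one row per game
  separate_rows(genre, sep = ",") %>%
  # Count the rows for each genre, most frequent first
  count(genre, name = "count", sort = TRUE)
```

Here per_genre has one row per individual genre: Action and Indie each appear twice and Casual once. Passing the split games tibble to count_by_category would presumably give the analogous result for the real data.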

More Examples with Different Datasets

Here we demonstrate how to use count_by_category to explore different datasets:

Get the count of trees per genus in the vancouver_trees dataset.

The output shows that Acer, the genus of maple trees, is the most common genus in Vancouver.

count_by_category(vancouver_trees, genus_name, 5)
#> # A tibble: 5 × 2
#>   genus_name count
#>   <chr>      <int>
#> 1 ACER       36062
#> 2 PRUNUS     30683
#> 3 FRAXINUS    7381
#> 4 TILIA       6773
#> 5 QUERCUS     6119

Get the count of apartment buildings per property type in the apt_buildings dataset.

count_by_category(apt_buildings, property_type, 5)
#> # A tibble: 3 × 2
#>   property_type  count
#>   <chr>          <int>
#> 1 PRIVATE         2888
#> 2 TCHC             327
#> 3 SOCIAL HOUSING   240

Which heating types are most common in the apt_buildings dataset?

count_by_category(apt_buildings, heating_type, 5)
#> # A tibble: 3 × 2
#>   heating_type   count
#>   <chr>          <int>
#> 1 HOT WATER       2789
#> 2 FORCED AIR GAS   315
#> 3 ELECTRIC         265