dsEssex is an R package containing data examples and helpful tools
for teaching Data Science to beginner learners.
This R package provides datasets, case studies, functions, and exercises for teaching Data Science to students with little or no background in statistics or programming. It was originally created to support the delivery of the MSc Applied Data Science and the MSc Data Science and its Applications at the University of Essex, United Kingdom. However, it can also be used for teaching R programming and data science to undergraduate and postgraduate students in other Data Science programmes.
The dsEssex R package should install and run smoothly on most
standard computers.
The dsEssex R package is supported on Windows, Linux and macOS.
The package has been tested in R under the following systems:

+ Linux: Ubuntu 16.04 (R 3.6.1)
+ macOS: Mojave 10.14.6 (R 3.6.1)
+ Windows: 10 (R 3.6.3)
The dsEssex R package includes a variety of data examples,
case studies, R package dependencies and practical sheets that can
facilitate teaching data science in lectures, labs, workshops, and
classes. The easiest way to install the dsEssex R package is to
run the following lines of code in your R session:
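# required only once per machine!
# install the 'remotes' helper package if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# install dsEssex from its GitHub repository
remotes::install_github("statcourses/dsEssex")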
This software requires R (>= 3.5.0). If you have an older version
of R installed on your machine, you may need to install the latest
version of R.
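To check whether your R installation meets this requirement before installing, a minimal sketch using base R is shown below. The variable names `required_version` and `meets_requirement` are illustrative only and are not part of dsEssex.

```r
# Minimum R version stated in this README
required_version <- "3.5.0"

# getRversion() returns the running R version in a form that can be compared
meets_requirement <- getRversion() >= required_version

if (meets_requirement) {
  message("R ", getRversion(), " meets the requirement (>= ", required_version, ").")
} else {
  message("R ", getRversion(), " is older than ", required_version,
          "; please update R before installing dsEssex.")
}
```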
Installing the dsEssex R package will automatically install the
following dependencies that are required for most Data Science labs,
classes and workshops:
+ tidyverse
+ dslabs
+ dplyr
+ stringr
+ ggplot2
+ tidytext
+ textdata
+ english
+ tidyr
+ jsonlite
+ lubridate
+ scales
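If you want to confirm that these dependencies are present after installation, the following sketch compares the list above against the packages installed in your R library. It uses only base R; the names `deps` and `missing_deps` are hypothetical helpers for this example.

```r
# Dependencies listed in this README
deps <- c("tidyverse", "dslabs", "dplyr", "stringr", "ggplot2", "tidytext",
          "textdata", "english", "tidyr", "jsonlite", "lubridate", "scales")

# Names of all packages currently installed in your R libraries
installed <- rownames(installed.packages())

# Dependencies that are not installed yet (empty if everything is in place)
missing_deps <- setdiff(deps, installed)

if (length(missing_deps) == 0) {
  message("All listed dependencies are installed.")
} else {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
}
```

Any package reported as missing can be installed with `install.packages()`.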
Get started with loading a few data sets by running the following:
# load the package into your R session
require(dsEssex)
# load data of Donald Trump's twitter account from 2009 to 2021
data(Trump_tweets)
# display the first few rows of the data
head(Trump_tweets)
# display description of the data
help(Trump_tweets)
# load the index page that lists all the components of the package
help(package = dsEssex)
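As a possible next step, the sketch below lists the data sets shipped with the package and inspects the structure of `Trump_tweets` without assuming anything about its columns. It is guarded so that it does nothing if dsEssex is not installed; the variable `datasets` is introduced here for illustration only.

```r
# Names of the data sets found in dsEssex (stays empty if the package is absent)
datasets <- character(0)

if (requireNamespace("dsEssex", quietly = TRUE)) {
  # data(package = ...) returns a table of the data sets a package provides
  datasets <- data(package = "dsEssex")$results[, "Item"]
  print(datasets)

  # load one data set and inspect its size and column types
  data("Trump_tweets", package = "dsEssex")
  print(dim(Trump_tweets))
  str(Trump_tweets)
}
```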
For simple string processing and text analytics exercises, you may load
the daily mortality data for Puerto Rico, a US territory, extracted
for the month of October over several years (2015-2018) from a PDF
file. This file was obtained from the dslabs R package by Rafael A.
Irizarry.
# load the package into your R session
require(dsEssex)
# load the raw daily mortality data for October extracted from the pdf file
data(PR_Oct_Deaths)
# display the data
PR_Oct_Deaths
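To give a flavour of the kind of string processing such raw text invites, here is a minimal sketch in base R. It does not use `PR_Oct_Deaths` itself: the `raw` string below is a made-up toy with invented numbers, and its layout (one line per day, one column per year) is an assumption for illustration rather than the actual structure of the data.

```r
# Toy text with INVENTED values; the layout is an assumption, not the real data
raw <- "DAY 2015 2016 2017 2018\n  1   75   81   95   80\n  2   77   85  101   79\n  3   67   79   98   83"

# split the text into lines and remove leading/trailing white space
lines <- trimws(strsplit(raw, "\n", fixed = TRUE)[[1]])

# the first line holds the column labels, the remaining lines hold the counts
header <- strsplit(lines[1], "\\s+")[[1]]
rows <- strsplit(lines[-1], "\\s+")

# convert each row to numbers and stack the rows into a data frame
deaths <- as.data.frame(do.call(rbind, lapply(rows, as.numeric)))
names(deaths) <- c("day", paste0("y", header[-1]))

deaths
```

The same steps can be written with the stringr functions `str_split()` and `str_trim()`, which are installed as dependencies of dsEssex.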
This project is covered under the GNU General Public License, version 3.0 (GPL-3.0).
This project is developed by Dr. Osama Mahmoud, Department of Mathematical Sciences, University of Essex, United Kingdom. For bug reports, feature requests, and technical questions about using the
dsEssex
R package, please open an Issue. If you would like to contact the author, please email him at o.mahmoud@essex.ac.uk.