This repo contains the working files, dataset, lecture notes, and R scripts for a workshop organized by Algoritma Data Science Education Center. By walking through a real-life scenario of crunching Twitter data for Lebaran-related insights, the session aims to help students get started with R programming and with handling data retrieved programmatically through an API.
The workshop is part of a series that serves as an introduction to data science and machine learning; its intended audience is beginners as well as junior professionals in the field. Algoritma is a data science education center that conducts public programming bootcamps and corporate workshops centered on AI, machine learning, and technologies for working with data.
The workshop covers:
- Understanding API
  - GET / fetch requests
  - Twitter's API format
  - JSON and `data.frame` in R
- Tweet retrieval from a web service endpoint
  - From Twitter to R
  - Converting JSON to data frames
  - Just enough regex
  - Text Mining Practices
- Data cleansing and content transformation techniques
  - Comparing pre- and post-processed data
  - Text Mining Practices II
  - Bahasa Indonesia NLP
  - Data cleansing
- Insight discovery
  - Visualizing tweets
  - Wordcloud
  - Find Frequent Terms
  - "Suggested Tags"
  - Saving and reading data
- Insight discovery II
  - R programming tips: `POSIXct`
  - Histograms
  - Density plots
  - Lattice plots
  - Bar charts
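As a taste of the regex cleansing, frequent-terms, and `POSIXct` steps listed above, here is a minimal base-R sketch. It is not the workshop's own code: the helper names (`clean_tweet`, `frequent_terms`, `parse_twitter_time`), the made-up example tweets and timestamps, and the tiny Bahasa Indonesia stopword vector are all hypothetical. It assumes the tweet text has already been pulled into a character vector and that timestamps use Twitter's classic `created_at` format (e.g. `Fri Jun 15 04:30:00 +0000 2018`).

```r
# Minimal base-R sketch; all helper names and data below are hypothetical.

# Lowercase, drop URLs, the leading "RT" marker, @mentions and the "#" symbol
# (keeping the hashtag word), then keep ASCII letters and spaces only.
clean_tweet <- function(text) {
  text <- tolower(text)
  text <- gsub("https?://\\S+", " ", text, perl = TRUE)  # URLs
  text <- gsub("^rt\\s+", " ", text, perl = TRUE)        # retweet marker
  text <- gsub("@\\w+", " ", text, perl = TRUE)          # @mentions
  text <- gsub("#", " ", text, fixed = TRUE)             # "#lebaran" -> "lebaran"
  text <- gsub("[^a-z\\s]", " ", text, perl = TRUE)      # digits, punctuation, emoji
  text <- gsub("\\s+", " ", text, perl = TRUE)           # collapse whitespace
  trimws(text)
}

# Count words across cleaned tweets, drop stopwords, keep terms seen >= min_freq.
frequent_terms <- function(texts, stopwords = character(0), min_freq = 2) {
  words <- unlist(strsplit(texts, " ", fixed = TRUE))
  words <- words[nzchar(words) & !(words %in% stopwords)]
  counts <- table(words)
  counts <- setNames(as.integer(counts), names(counts))
  counts <- sort(counts, decreasing = TRUE)
  counts[counts >= min_freq]
}

# Parse Twitter's created_at strings into POSIXct (UTC). %a and %b are
# locale-dependent, so LC_TIME is switched to "C" temporarily and restored.
parse_twitter_time <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
}

# Made-up example input, for illustration only.
tweets <- c(
  "RT @teman: Selamat Hari Raya Idul Fitri! Mohon maaf lahir dan batin #lebaran https://t.co/abc",
  "Mudik lebaran tahun ini macet di tol #mudik #lebaran",
  "Selamat lebaran, mohon maaf lahir batin untuk semua"
)
cleaned <- clean_tweet(tweets)
id_stopwords <- c("dan", "di", "ini", "ke", "untuk", "yang")  # tiny illustrative list
top <- frequent_terms(cleaned, stopwords = id_stopwords)
print(top)

times <- parse_twitter_time(c("Fri Jun 15 04:30:00 +0000 2018",
                              "Fri Jun 15 09:05:00 +0000 2018"))
hour_of_day <- as.integer(format(times, "%H", tz = "UTC"))
print(hour_of_day)
```

The `hour_of_day` vector is the kind of input that can then be passed to `hist()` or `lattice::histogram()` for the plotting topics above; a real stopword list for Bahasa Indonesia would be much longer than the six words used here.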
The project was completed in an R Notebook and exported to both HTML and PDF. All three files (Rmd, HTML, and PDF) are available in this repo.