Wikipedia Big Data Analysis

This project uses big data tools to answer a series of analysis questions about Wikipedia datasets. The questions are answered with Hive and MapReduce, and the tool for each is chosen based on the question's context. The output of the analysis includes MapReduce .jar files and .hql scripts, so the analysis is a repeatable process that works on larger datasets, not just an ad hoc calculation.

Technologies Used

  1. Scala
  2. sbt
  3. HDFS
  4. YARN
  5. MapReduce
  6. Apache Hadoop
  7. Apache Hive
  8. DBeaver

Features

  1. Find, organize, and format pageviews on any given day (see the sketch after this list).
  2. Follow clickstreams to find relative frequencies of different pages.
  3. Determine relative popularity of page access methods.
  4. Compare yearly popularity of pages.
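
As an illustration of the first feature, the sketch below shows how a day's pageviews might be ranked in HQL. The table name wiki_pageviews, its location, and its column names are assumptions for illustration, based on the space-delimited format of the Wikipedia pageviews dumps (domain code, page title, view count, response size); they are not the repository's actual schema.

    -- Hypothetical external table over one day's raw pageviews dump.
    -- Column names are assumed; the dump itself is space-delimited.
    CREATE EXTERNAL TABLE IF NOT EXISTS wiki_pageviews (
        domain_code   STRING,
        page_title    STRING,
        view_count    BIGINT,
        response_size BIGINT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ' '
    LOCATION '/user/hive/wikipedia/pageviews';

    -- Ten most-viewed English Wikipedia pages in the loaded day's data.
    SELECT page_title, SUM(view_count) AS total_views
    FROM wiki_pageviews
    WHERE domain_code = 'en'
    GROUP BY page_title
    ORDER BY total_views DESC
    LIMIT 10;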

Getting Started

Most of the code is HQL, written against Hive through the DBeaver GUI.

  1. Download DBeaver Community Edition
  2. Install Hive on your machine or virtual machine
  3. Clone my code: git clone https://github.com/samye760/Wikipedia-Big-Data-Analysis.git
  4. Set up a Hive connection in DBeaver, import my script, and start querying the data (a sample query follows below).
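
Once connected, a first query might look like the following. This is a minimal sketch of the second feature (clickstream analysis), assuming a table over a Wikipedia clickstream dump, whose rows are tab-separated (prev, curr, type, n); the table name, HDFS location, and target page are illustrative, not part of this repository.

    -- Hypothetical external table over a Wikipedia clickstream dump
    -- (tab-separated fields: referrer, target, link type, click count).
    CREATE EXTERNAL TABLE IF NOT EXISTS wiki_clickstream (
        prev STRING,
        curr STRING,
        type STRING,
        n    BIGINT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LOCATION '/user/hive/wikipedia/clickstream';

    -- Relative frequency of each referrer for a single target page.
    SELECT prev,
           n / SUM(n) OVER (PARTITION BY curr) AS relative_frequency
    FROM wiki_clickstream
    WHERE curr = 'Hadoop'
    ORDER BY relative_frequency DESC;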

Usage

  1. The HQL commands can be used on similar large datasets, particularly those found in the Wikipedia Dumps (see the access-method sketch below).
  2. These scripts were designed to answer a broad range of questions about big data, not just the specific ones posed in this analysis.
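
As one example, the third feature (relative popularity of page access methods) can be sketched against the hypothetical wiki_pageviews table from earlier, since the dumps' domain code distinguishes desktop ('en') from mobile-web ('en.m') traffic. Again, the table and column names are assumptions for illustration, not the repository's actual schema.

    -- Share of English Wikipedia views by access method
    -- (reuses the assumed wiki_pageviews table sketched earlier).
    SELECT access_method,
           views,
           views / SUM(views) OVER () AS share
    FROM (
        SELECT CASE domain_code WHEN 'en' THEN 'desktop' ELSE 'mobile web' END AS access_method,
               SUM(view_count) AS views
        FROM wiki_pageviews
        WHERE domain_code IN ('en', 'en.m')
        GROUP BY domain_code
    ) t;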