hridayns/Big-Data-Apache-server-logs-analysis-using-Pig-and-Python

Project Details

This repository contains code to analyze Apache server logs and find the most visited pages, using an Apache Pig script (running on Hadoop) extended with Python user-defined functions (UDFs). It was run on an Ubuntu virtual machine in Oracle VM VirtualBox, provisioned with Vagrant.
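The UDF in shareFiles/script.py is not reproduced here. The following is a minimal sketch of what a Jython UDF for Pig might look like, assuming the logs use the Apache Common/Combined Log Format. The function name `parse_log_line`, the regular expression, and the output schema are hypothetical, and the no-op `outputSchema` fallback exists only so the file can also be tested outside Pig.

```python
import re

# Pig's Jython UDF runtime injects `outputSchema` into the module namespace.
# Outside Pig (assumption: plain Python for testing) fall back to a no-op decorator.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# Common/Combined Log Format: host ident user [time] "METHOD URL PROTOCOL" status size ...
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)

# Hypothetical schema: one tuple per log line.
@outputSchema("log:(host:chararray, time:chararray, method:chararray, url:chararray, status:int, size:long)")
def parse_log_line(line):
    """Split one access-log line into fields; return None if it does not match."""
    if line is None:
        return None
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, time, method, url, status, size = match.groups()
    # A size of "-" means no response body was sent.
    nbytes = 0 if size == "-" else int(size)
    return (host, time, method, url, int(status), nbytes)
```

In a Pig script, a file like this is typically registered with `REGISTER 'script.py' USING jython AS myudfs;` and then called inside a `FOREACH ... GENERATE` as `myudfs.parse_log_line(line)` (the alias `myudfs` is illustrative). Returning `None` for malformed lines lets the Pig script drop them with a `FILTER ... BY ... IS NOT NULL`.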

Contents

  • shareFiles/pig_script.py contains code to compute the page hits and store them.
  • shareFiles/script.py contains the Python UDF to parse the sample Apache logs.
  • shareFiles/sample_log contains the sample logs on which the scripts are run.
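To sanity-check the page-hit counts that the Pig script produces on shareFiles/sample_log, a small plain-Python reference can compute the same group-by-URL count locally. This is a sketch under stated assumptions: it assumes Common/Combined Log Format lines, counts every request regardless of status code, and the helper names `extract_url`, `count_page_hits`, and `most_visited` are hypothetical, not part of the repository.

```python
from collections import Counter


def extract_url(line):
    """Return the requested URL from an Apache access-log line, or None if malformed."""
    # The request ("METHOD URL PROTOCOL") is the first double-quoted field.
    parts = line.split('"')
    if len(parts) < 3:
        return None
    request = parts[1].split()
    if len(request) < 2:
        return None
    return request[1]


def count_page_hits(lines):
    """Count hits per URL, the local equivalent of GROUP BY url plus COUNT in Pig."""
    hits = Counter()
    for line in lines:
        url = extract_url(line)
        if url is not None:
            hits[url] += 1
    return hits


def most_visited(lines, n=1):
    """Return the n URLs with the most hits as (url, count) pairs."""
    return count_page_hits(lines).most_common(n)
```

For example, `most_visited(open("shareFiles/sample_log"), n=10)` lists the ten most requested URLs, which can then be compared with the output stored by the Pig script.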

About

Big Data – Apache server logs analysis using Pig and Python