This project demonstrates how to use PySpark for performing various join operations on employee data. It includes examples of inner join, left outer join, total salary calculation, broadcast join, and accumulator usage.
Install PySpark:
pip install pyspark
Prepare Input Data:
- Place your input text files (emp1.txt and emp2.txt) in the same directory as your scripts.
- Performs an inner join on two employee datasets to find common entries based on employee IDs.
- Performs a left outer join on two employee datasets to include all entries from the left dataset and matching entries from the right dataset.
- Calculates the total salary of employees by joining employee details with their respective salary and hours worked.
- Demonstrates the use of broadcast variables to efficiently join small datasets with large datasets.
- Uses an accumulator to sum values in an RDD.
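The join semantics listed above can be sketched in plain Python before bringing in Spark. The records and field names below are hypothetical sample data, and the helper functions mirror what PySpark's join, leftOuterJoin, and accumulators produce on (key, value) pair RDDs:

```python
# Hypothetical (employee_id, value) pairs, standing in for pair RDDs.
emp_names = [(1, "Alice"), (2, "Bob"), (3, "Cara")]
emp_salaries = [(1, 50000), (2, 60000)]

def inner_join(left, right):
    # Like rdd_left.join(rdd_right): keep only keys present on both sides.
    rmap = dict(right)
    return [(k, (v, rmap[k])) for k, v in left if k in rmap]

def left_outer_join(left, right):
    # Like rdd_left.leftOuterJoin(rdd_right): keep every left-side key,
    # pairing keys with no right-side match with None.
    rmap = dict(right)
    return [(k, (v, rmap.get(k))) for k, v in left]

print(inner_join(emp_names, emp_salaries))
# [(1, ('Alice', 50000)), (2, ('Bob', 60000))]
print(left_outer_join(emp_names, emp_salaries))
# [(1, ('Alice', 50000)), (2, ('Bob', 60000)), (3, ('Cara', None))]

# An accumulator sums values contributed across an RDD; locally that is just:
total_salary = sum(s for _, s in emp_salaries)  # 110000
```

A broadcast join follows the same pattern as inner_join above: the small side (here, the dict built from emp_salaries) is shipped to every worker so the large side never has to be shuffled.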
Start a PySpark Session:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Employee Join Example").setMaster("local[*]")
sc = SparkContext(conf=conf)
Load and Process the Data:
- Load the employee datasets using textFile.
- Perform the join operations and transformations described in the analysis sections above.
Show Results:
- Display the results of each join operation using the collect() method.
Each operation prints its respective results, such as lists of joined employee records and the calculated total salaries.