# **Salaries Differences**

Calculate the difference between the highest salaries in the marketing and engineering departments. Output just the absolute difference in salaries.

### **Solution Walkthrough**
This walkthrough explains the code below, which calculates the absolute difference between the highest salaries in the marketing and engineering departments. The code uses the pandas library to manipulate and analyze the data.

### **Understanding The Data**
The code snippet assumes the presence of two dataframes: `db_employee` and `db_dept`.

- **`db_employee`** contains information about employees, including columns `department_id` and `salary`.

- **`db_dept`** contains information about departments, including columns `id` and `department`.
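To make the shape of the two tables concrete, here is a minimal sketch that builds small stand-in dataframes. The rows are hypothetical sample data, and the sketch assumes the key column of `db_dept` is named `id` (as the merge in the solution implies) and that `db_employee` also has its own `id` column.

```python
import pandas as pd

# Hypothetical sample rows; the real tables may contain more columns.
db_dept = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["engineering", "marketing", "sales"],
})
db_employee = pd.DataFrame({
    "id": [10, 11, 12, 13, 14],         # employee id (assumed column)
    "department_id": [1, 1, 2, 2, 3],   # refers to db_dept["id"]
    "salary": [5000, 7000, 4000, 6500, 3000],
})

print(db_employee.head())
print(db_dept.head())
```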

### **The Problem**
The objective is to calculate the absolute difference between the highest salaries in the marketing and engineering departments. The code below performs the following steps:

- Merges `db_employee` and `db_dept` with the `merge()` method, matching `department_id` from `db_employee` to `id` from `db_dept`. No `how` argument is passed, so pandas performs its default inner join.

- Drops the duplicate `id_y` column produced by the merge and renames `id_x` back to `id`.

- Ranks salaries within each department using `groupby('department')` and `rank(method='dense', ascending=False)`, so the highest salary in each department receives rank 1. The result is stored in a new `rank` column.

- Filters the merged dataframe (`merged_df`) to create `marketing_df`, containing only rows where the department is marketing.

- Filters the merged dataframe (`merged_df`) to create `engineering_df`, containing only rows where the department is engineering.

- Extracts the rank-1 salary from each filtered dataframe as `first_mar` and `first_eng`.

- Subtracts `first_eng` from `first_mar` and stores the result in `difference`. The task asks for the absolute value of this difference.
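The per-department maximum can also be computed directly with `groupby()` and `max()`. The following is a minimal sketch on hypothetical sample data, not the notebook's own code; the `suffixes` argument and the names `merged`, `max_by_dept`, and `salary_difference` are choices made for this illustration.

```python
import pandas as pd

# Hypothetical sample data with the columns described above.
db_dept = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["engineering", "marketing", "sales"],
})
db_employee = pd.DataFrame({
    "id": [10, 11, 12, 13, 14],
    "department_id": [1, 1, 2, 2, 3],
    "salary": [5000, 7000, 4000, 6500, 3000],
})

# Join each employee to their department; keep the employee "id" unsuffixed.
merged = db_employee.merge(
    db_dept, left_on="department_id", right_on="id", suffixes=("", "_dept")
)

# Highest salary per department, indexed by department name.
max_by_dept = merged.groupby("department")["salary"].max()

# Absolute difference between the two departments of interest.
salary_difference = abs(max_by_dept["marketing"] - max_by_dept["engineering"])
print(salary_difference)
```

On this sample data the marketing maximum is 6500 and the engineering maximum is 7000, so the printed difference is 500.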

In [None]:
# Import
import pandas as pd

db_employee.head()

# merge employee and dept dfs and clean it
merged_df = db_employee.merge(db_dept, left_on='department_id', right_on='id')
merged_df = merged_df.copy()
merged_df = merged_df.drop(columns=['id_y']).rename(columns={"id_x":'id'})

# rank the salaries in each department
merged_df['rank'] = merged_df.groupby('department')['salary'].rank(method='dense', ascending=False)
marketing_df = merged_df[merged_df['department'].astype(str) == 'marketing']
engineering_df = merged_df[merged_df['department'].astype(str) == 'engineering']
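# take the top-ranked salary; .iloc[0] returns a single value even when several rows tie at rank 1
first_eng = int(engineering_df.loc[engineering_df["rank"] == 1, "salary"].iloc[0])
first_mar = int(marketing_df.loc[marketing_df["rank"] == 1, "salary"].iloc[0])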

# calculate the absolute difference
difference = abs(first_mar - first_eng)
difference
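
One thing to watch with the rank-based approach: with `method='dense'`, employees who tie for the top salary all receive rank 1, so filtering on `rank == 1` can return more than one row. The sketch below uses hypothetical data to show this; the names `salaries`, `top_eng`, and `top_salary` are illustrative only.

```python
import pandas as pd

# Hypothetical data: two engineers share the top salary.
salaries = pd.DataFrame({
    "department": ["engineering", "engineering", "engineering", "marketing"],
    "salary": [7000, 7000, 5000, 6500],
})

# Dense rank within each department; the highest salary gets rank 1.
salaries["rank"] = salaries.groupby("department")["salary"].rank(
    method="dense", ascending=False
)

# Both tied engineers appear in the rank-1 filter.
top_eng = salaries[(salaries["department"] == "engineering") & (salaries["rank"] == 1)]
print(len(top_eng))

# Taking the first element gives a single scalar regardless of ties.
top_salary = int(top_eng["salary"].iloc[0])
print(top_salary)
```

Because every tied row holds the same salary, selecting the first one (or calling `.max()`) gives the same value.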