# Data Management with Pandas 

## Hands-on Exercises

This notebook contains the assignments to be completed as part of the hands-on session for the subject "Python for Data Science Part I".

**Note:** Complete the assignments in the order in which they appear in this document to finish the workshop.

### Data Sources

**Two sample datasets are provided:** ```EMP``` and ```DEPT```

You may access the relevant CSV data from:

```
    Datasets/Data_EMP.csv
    Datasets/Data_DEPT.csv 
```

The columns of the **EMP** table are described below:

Column name | Data type | Description
--------------------|--------------------|--------------------
EMPNO | Number | Employee number
ENAME | String | Employee name
JOB | String | Designation
MGR | Number | Manager’s employee number
HIREDATE | Date | Date of joining
SAL | Number | Basic salary
COMM | Number | Commission
DEPTNO | Number | Department number


The columns of the **DEPT** table are described below:

Column name | Data type | Description
--------------------|--------------------|--------------------
DEPTNO | Number | Department number
DNAME | String | Department name
LOC | String | Location of department


                                             


### Assignment 1: Pandas Basics (Reading Data Files, DataFrames, Data Selection)

**Step #1:** Read the data files into DataFrames

  --  **```pandas.read_csv()```**
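As a minimal sketch of how ```pandas.read_csv()``` behaves, the example below parses a small inline CSV instead of a file on disk. The text, the column names and the variable ```df_demo``` are made up for illustration and are not part of the workshop datasets; with a real file you would pass its path instead of the ```io.StringIO``` object.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a file on disk (not the workshop data)
csv_text = """ID,NAME,SCORE
1,Ann,81.5
2,Bob,
3,Cy,67.0
"""

# read_csv accepts a file path or any file-like object
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)   # (number of rows, number of columns)
print(df_demo.dtypes)  # data type inferred for each column
print(df_demo.head())  # first rows, for a quick visual check
```

Note that the empty SCORE field for Bob is read as ```NaN```, which is how pandas marks missing values.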

In [None]:
# Command to import the pandas library to your notebook
import pandas as pd

# Read the provided datasets into DataFrames, e.g., df_EMP and df_DEPT
# ADD YOUR CODE HERE


In [None]:
# Check whether the data loaded correctly


In [None]:
# Describe the maximum, minimum and average salary in the company.
# ADD YOUR CODE HERE


**Step #2:** Perform the queries on the EMP and DEPT DataFrames

* To select rows and columns, use ```df.loc[]``` or ```df[]```
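The two selection styles can be sketched on a toy DataFrame as follows. The data and the names ```df_demo```, ```NAME```, ```CITY``` and ```SCORE``` are assumptions made for this illustration only.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
df_demo = pd.DataFrame({
    "NAME": ["Ann", "Bob", "Cy"],
    "CITY": ["Oslo", "Rome", "Lima"],
    "SCORE": [81.5, 72.0, 67.0],
})

# df[] with a single column name returns a Series
names = df_demo["NAME"]

# df[] with a list of column names returns a DataFrame
subset = df_demo[["NAME", "SCORE"]]

# df.loc[rows, columns] selects by label; a label slice includes its end point
first_two = df_demo.loc[0:1, ["NAME", "CITY"]]

print(names)
print(subset)
print(first_two)
```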

In [None]:
# List the names of the employees
# ADD YOUR CODE HERE


### Assignment 2: Sort DataFrame Values

* To sort a pandas DataFrame by the values of one or more columns, use the **.sort_values()** method.
* Sorting is ascending by default; pass the argument **ascending=False** to sort in descending order.
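A short sketch of both cases, sorting by one column and by several columns with a different direction for each. The toy data and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
df_demo = pd.DataFrame({
    "NAME": ["Ann", "Bob", "Cy", "Dee"],
    "TEAM": ["B", "A", "B", "A"],
    "SCORE": [81.5, 72.0, 67.0, 90.0],
})

# Single column, highest score first
by_score = df_demo.sort_values("SCORE", ascending=False)

# Several columns: TEAM ascending, then SCORE descending within each team
by_team_then_score = df_demo.sort_values(["TEAM", "SCORE"], ascending=[True, False])

print(by_score)
print(by_team_then_score)
```

```.sort_values()``` returns a new DataFrame and leaves ```df_demo``` unchanged, so assign the result to a variable if you need it later.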

In [None]:
# List the top three employees in terms of basic salary
# ADD YOUR CODE HERE
df_EMP.sort_values('SAL', ascending = False).head(3)

### Assignment 3: Query with conditions

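* To query the values **WHERE** certain conditions are met, use *boolean indexing*.

The following example passes a Series of True/False values to the DataFrame, which returns only the rows marked True. The column ```star_rating``` is used purely for illustration and is not part of the EMP or DEPT tables.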

```
   df[df['star_rating'] == 4].head(5)
```
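To make the mechanics explicit, the sketch below builds the True/False mask separately and then shows how conditions can be combined. The toy data and the ```TITLE``` and ```GENRE``` columns are assumptions for illustration; only ```star_rating``` comes from the example above.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
df_demo = pd.DataFrame({
    "TITLE": ["A", "B", "C", "D"],
    "GENRE": ["Drama", "Comedy", "Drama", "Action"],
    "star_rating": [4, 5, 3, 4],
})

# The comparison yields a boolean Series, one value per row
mask = df_demo["star_rating"] == 4
four_star = df_demo[mask]

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses
drama_and_4 = df_demo[(df_demo["GENRE"] == "Drama") & (df_demo["star_rating"] >= 4)]

# .isin() tests membership in a list of values; ~ negates the mask
in_list = df_demo[df_demo["GENRE"].isin(["Comedy", "Action"])]
not_in_list = df_demo[~df_demo["GENRE"].isin(["Comedy", "Action"])]

print(four_star)
print(drama_and_4)
print(in_list)
print(not_in_list)
```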

In [None]:
# List the names of analysts and salesmen
# ADD YOUR CODE HERE


* To query values of a DATE type, first convert them to datetime via

    --  **```pd.to_datetime('30/09/1981', format='%d/%m/%Y')```**
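A minimal sketch of the conversion followed by a date comparison. The toy data, the ```JOINED``` column and the assumption that dates are stored as day/month/year strings are illustrative; check the actual format in your CSV before choosing ```format```.

```python
import pandas as pd

# Hypothetical toy data; dates are assumed to be day/month/year strings
df_demo = pd.DataFrame({
    "NAME": ["Ann", "Bob", "Cy"],
    "JOINED": ["17/12/1980", "28/09/1981", "03/12/1981"],
})

# Convert the whole column from strings to datetime values
df_demo["JOINED"] = pd.to_datetime(df_demo["JOINED"], format="%d/%m/%Y")

# Convert the cut-off date the same way, then compare
cutoff = pd.to_datetime("30/09/1981", format="%d/%m/%Y")
before = df_demo[df_demo["JOINED"] < cutoff]

print(before)
```

Comparing the original strings directly would sort them character by character rather than chronologically, which is why the conversion comes first.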

In [None]:
# List the details of employees who joined before 30 Sep 1981
# ADD YOUR CODE HERE


In [None]:
# List the names of employees who are not managers
# ADD YOUR CODE HERE


In [None]:
# Find the name of the employee whose employee number is 7369
# ADD YOUR CODE HERE


In [None]:
# List employees not belonging to department 30 or 10.
# ADD YOUR CODE HERE


* To query the unique values 

    --  **```.unique()```**
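As a small illustration on a made-up Series, ```.unique()``` returns the distinct values, while the related methods ```.nunique()``` and ```.value_counts()``` count them.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
colours = pd.Series(["red", "blue", "red", "green", "blue"])

distinct = colours.unique()       # distinct values, in order of first appearance
n_distinct = colours.nunique()    # how many distinct values there are
counts = colours.value_counts()   # occurrences of each value, most frequent first

print(distinct)
print(n_distinct)
print(counts)
```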

In [None]:
# List the different designations in the company.
# ADD YOUR CODE HERE
df_EMP['JOB'].unique()

* To query the **NaN** (or **null**) values 

    --  **```.isnull()```**
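The sketch below, on a made-up Series, shows why ```.isnull()``` is needed: comparing with ```np.nan``` using ```==``` never matches, because NaN is not equal to anything, including itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one genuine zero
bonus = pd.Series([300.0, np.nan, 0.0])

# An equality test against NaN is False for every row
eq_nan = bonus == np.nan

# .isnull() flags the missing value; note that 0.0 is not treated as missing
is_missing = bonus.isnull()

print(eq_nan)
print(is_missing)
```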

In [None]:
# List the names of employees who are not eligible for commission.
# ADD YOUR CODE HERE


* To query the **not null** values 

    --  **```.notnull()```**

In [None]:
# List the names of employees who are eligible for commission.
# ADD YOUR CODE HERE


In [None]:
# List the name and designation of the employee who does not report to anybody.
# ADD YOUR CODE HERE


In [None]:
# List employees whose names either start or end with “S”.
# ADD YOUR CODE HERE


In [None]:
# List names of employees whose names have “i” as the second character.
# ADD YOUR CODE HERE


In [None]:
# List the number of employees working for the company.
# ADD YOUR CODE HERE


In [None]:
# List the number of designations available in the EMP table
# ADD YOUR CODE HERE


### Assignment 4: Adding new column to existing DataFrame

* By adding a new calculated field;
* By declaring a new list as a column;
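Both approaches can be sketched on a toy DataFrame as follows. The data and the column names ```ITEM```, ```PRICE```, ```QTY```, ```TOTAL``` and ```IN_STOCK``` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
df_demo = pd.DataFrame({
    "ITEM": ["pen", "ink", "pad"],
    "PRICE": [2.0, 5.5, 3.0],
    "QTY": [10, 2, 4],
})

# New calculated field: arithmetic on columns is applied row by row
df_demo["TOTAL"] = df_demo["PRICE"] * df_demo["QTY"]

# New column from a list: its length must match the number of rows
df_demo["IN_STOCK"] = [True, False, True]

print(df_demo)
```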

In [None]:
# Add a column to EMP DataFrame with the total salaries paid to the employees.

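# Replace all the NaN values in COMM with zero.
# Assign the result back: calling fillna(..., inplace=True) on a selected column
# is chained assignment, which recent pandas versions warn about or ignore.
df_EMP['COMM'] = df_EMP['COMM'].fillna(0)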

# ADD YOUR CODE HERE


In [None]:
# List the maximum salary paid to a salesman. 
# ADD YOUR CODE HERE


### Assignment 5: GROUP BY conditions

In pandas, SQL’s ```GROUP BY``` operations are performed using the similarly named ```.groupby()``` method. ```.groupby()``` typically refers to a process in which we split a dataset into groups, apply a function (typically an **aggregation**) to each group, and then combine the results.
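The split-apply-combine idea can be sketched on a toy DataFrame as below. The data and the ```TEAM``` and ```SCORE``` columns are assumptions for illustration; the final filter plays the role of SQL's ```HAVING``` clause.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
df_demo = pd.DataFrame({
    "TEAM": ["A", "A", "B", "B", "B"],
    "SCORE": [90.0, 72.0, 81.0, 67.0, 77.0],
})

# Split by TEAM, apply mean() to SCORE, combine into one Series indexed by TEAM
mean_by_team = df_demo.groupby("TEAM")["SCORE"].mean()

# Several aggregations at once with .agg(); one result column per function
summary = df_demo.groupby("TEAM")["SCORE"].agg(["count", "sum", "min", "max", "mean"])

# Filter the aggregated result with ordinary boolean indexing
high = summary[summary["mean"] > 80]

print(mean_by_team)
print(summary)
print(high)
```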

In [None]:
# Import the numpy package
import numpy as np

# List the number of employees and average salary for employees in department 20.
# ADD YOUR CODE HERE


In [None]:
# List the department number and total salary payable in each department.
# ADD YOUR CODE HERE


In [None]:
# List the jobs and number of employees in each job. The result should be in the descending order of the number of employees.
# ADD YOUR CODE HERE


In [None]:
# List the total salary, maximum and minimum salary and average salary of the employees jobwise.
# ADD YOUR CODE HERE


In [None]:
# List the total salary, maximum and minimum salary and average salary of the employees, for department 20.
# ADD YOUR CODE HERE


In [None]:
# List the average salary of the employees jobwise, for department 20, and
# display only those rows having an average salary > 1000
# ADD YOUR CODE HERE


### Assignment 6: JOIN -- Table Merge

* To merge two tables/DataFrames on a key column, choosing the type of join:

  -- **```pd.merge```**```(df_left, df_right, left_on='key', right_on='key', how='inner')```, where ```how``` can be ```'inner'```, ```'outer'```, ```'left'``` or ```'right'```
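The effect of the join type can be sketched with two toy DataFrames that share a key column. The data and the names ```left```, ```right```, ```TEAM_ID``` and ```TEAM_NAME``` are assumptions for illustration; because the key has the same name on both sides, ```on``` is used here in place of ```left_on``` and ```right_on```.

```python
import pandas as pd

# Hypothetical toy data (not the workshop datasets)
left = pd.DataFrame({"NAME": ["Ann", "Bob", "Cy"], "TEAM_ID": [1, 2, 3]})
right = pd.DataFrame({"TEAM_ID": [1, 2, 4], "TEAM_NAME": ["Red", "Blue", "Gold"]})

# inner: only keys present in both tables
inner = pd.merge(left, right, on="TEAM_ID", how="inner")

# left: every row of the left table; unmatched rows get NaN in the right-hand columns
left_join = pd.merge(left, right, on="TEAM_ID", how="left")

# outer: every key from either table
outer = pd.merge(left, right, on="TEAM_ID", how="outer")

print(inner)
print(left_join)
print(outer)
```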


In [None]:
# Find the name of the department in which the President works
# ADD YOUR CODE HERE


In [None]:
# Find the locations where the managers work
# ADD YOUR CODE HERE
