In this challenge we will walk you through a question from the previous [Subsetting and Descriptive Stats lab](../../lab-subsetting-and-descriptive-stats/your-code/main.ipynb), then build on it with a follow-up problem. In the walkthrough, you'll see how a pro would think through the problem. Try to understand that thinking process and apply it to the follow-up problem.

## Import all libraries that are necessary

In [2]:
import numpy as np
import pandas as pd

## Import and overview data

First import `Employee.csv` from the "subsetting" lab folder and print the head to get an overview of the data:

In [3]:
employee = pd.read_csv("../../lab-subsetting-and-descriptive-stats/your-code/Employee.csv")

employee.head()

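| | Name | Department | Education | Gender | Title | Years | Salary |
|---|---|---|---|---|---|---|---|
| 0 | Jose | IT | Bachelor | M | analyst | 1 | 35 |
| 1 | Maria | IT | Master | F | analyst | 2 | 30 |
| 2 | David | HR | Master | M | analyst | 2 | 30 |
| 3 | Sonia | HR | Bachelor | F | analyst | 4 | 35 |
| 4 | Samuel | Sales | Master | M | associate | 3 | 55 |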


Printing the head is not a useless routine. You should really look at the data set and understand what it contains. No data analyst can successfully analyze data without an in-depth understanding of what each column is about. As we progress in this course, the data sets become increasingly complex, which requires you to inspect the data at the beginning and then as needed throughout the problem-solving process.
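Beyond `head()`, a few quick calls give a broader overview. The sketch below is only an illustration: `sample` is a stand-in DataFrame built from the five rows shown above, not the full `Employee.csv`, and the variable names are our own.

```python
import pandas as pd

# Stand-in data: only the five rows printed by employee.head(), not the full file.
sample = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel"],
    "Department": ["IT", "IT", "HR", "HR", "Sales"],
    "Education": ["Bachelor", "Master", "Master", "Bachelor", "Master"],
    "Gender": ["M", "F", "M", "F", "M"],
    "Title": ["analyst", "analyst", "analyst", "analyst", "associate"],
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

shape = sample.shape                                # (number of rows, number of columns)
dept_counts = sample["Department"].value_counts()   # how many rows each department has
numeric_summary = sample.describe()                 # summary statistics of the numeric columns

print(shape)
print(dept_counts)
print(numeric_summary)
```

Running the same three calls on `employee` would show how many rows the real file has and how they are spread across departments.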

One question in the previous lab is:

**Find the minimum, mean, and maximum of all numeric columns for each Department.**

We will walk you through how to solve this question using the workflow discussed in the [Data Analysis Iteration video](https://www.youtube.com/watch?v=xOomNicqbkk).

## Main Problem - Setting Expectations

We want to break down the problem into several sub problems:

**Sub Problem 1 - How to extract numeric columns from a data set?**

**Sub Problem 2 - How to calculate minimum, mean, and maximum?**

**Sub Problem 3 - How to perform calculations for each Department?**

If we figure out each of the sub problems above, we have found the solution for our main problem.

Next, let's tackle each sub problem.

## Main Problem - Collecting Information

This step is the problem-solving process of the main problem in which we will solve each of the three sub problems.

### Sub Problem 1

#### Setting Expectations

**Define problem: How to extract numeric columns from a data set?**

#### Collecting Information

This was already covered in the lesson by using `dtypes`. So let's print the data type of every column to find the numeric ones:

In [35]:
# enter your code here

employee.dtypes

Name          object
Department    object
Education     object
Gender        object
Title         object
Years          int64
Salary         int64
dtype: object

You should have seen:
    
```
Name          object
Department    object
Education     object
Gender        object
Title         object
Years          int64
Salary         int64
dtype: object
```

#### Reacting to Data

You found that `Years` and `Salary` are the numeric columns we need to extract. So we can potentially use `employee[["Years", "Salary"]]` to extract these columns:

In [5]:
employee[["Years", "Salary"]]

| | Years | Salary |
|---|---|---|
| 0 | 1 | 35 |
| 1 | 2 | 30 |
| 2 | 2 | 30 |
| 3 | 4 | 35 |
| 4 | 3 | 55 |
| 5 | 2 | 55 |
| 6 | 8 | 70 |
| 7 | 7 | 60 |
| 8 | 8 | 70 |


But instead of hardcoding the column names in the solution, a better approach is to define a Python function that dynamically returns all numeric columns. You will be able to reuse this function in your future work. Also, if the data set is huge and contains hundreds of numeric columns, selecting them manually is impractical.

#### Revising Expectations

**Define new problem: How to *dynamically* extract numeric columns from a data set?**

#### Collecting Information

This was not covered in the lesson. So we need to [google the answer](https://www.google.com/search?q=pandas+dataframe+get+all+numeric+columns).

After finding the answer, write the function below.

In [37]:
def get_numeric_cols(df):
    # write your code below.
    return df.select_dtypes(include=[np.number])

#### Reacting to Data

Now test your function:

In [38]:
get_numeric_cols(employee)

| | Years | Salary |
|---|---|---|
| 0 | 1 | 35 |
| 1 | 2 | 30 |
| 2 | 2 | 30 |
| 3 | 4 | 35 |
| 4 | 3 | 55 |
| 5 | 2 | 55 |
| 6 | 8 | 70 |
| 7 | 7 | 60 |
| 8 | 8 | 70 |


You should have seen:

```
   Years  Salary
0      1      35
1      2      30
2      2      30
3      4      35
4      3      55
5      2      55
6      8      70
7      7      60
8      8      70
```



Yes, this is exactly what we want!
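As an aside, the same idea can be expressed by checking each column's dtype one by one. This is a sketch of an alternative, not the lab's solution: `get_numeric_col_names` is a hypothetical helper of our own, and `sample` is a stand-in built from rows shown earlier rather than the real file.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Stand-in data based on the first rows of the employee data set.
sample = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel"],
    "Department": ["IT", "IT", "HR", "HR", "Sales"],
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

def get_numeric_col_names(df):
    """Hypothetical helper: return the names of the numeric columns of df."""
    return [col for col in df.columns if is_numeric_dtype(df[col])]

numeric_names = get_numeric_col_names(sample)
print(numeric_names)           # ['Years', 'Salary']
print(sample[numeric_names])   # same columns that select_dtypes would return here
```

Returning names instead of a DataFrame can be handy when the list of columns is needed later, for example to subset a grouped object.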

Now we move to the next sub problem.

### Sub Problem 2

#### Setting Expectations

**Define problem: How to calculate minimum, mean, and maximum?**

#### Collecting Information

That's easy. Reviewing the *Descriptive Statistics With Pandas* lesson, we find that Pandas dataframes already provide functions to calculate the minimum, mean, and maximum. We'll build on the solution we found in Sub Problem 1 and run the calculations on the numeric columns:

In [39]:
numeric_cols = get_numeric_cols(employee)

print('PRINTING MIN:')

# enter your code here
print(numeric_cols.min())

print('\n---\n')
print('PRINTING MEAN:')

# enter your code here
print(numeric_cols.mean())

print('\n---\n')
print('PRINTING MAX:')

# enter your code here
print(numeric_cols.max())

PRINTING MIN:
Years      1
Salary    30
dtype: int64

---

PRINTING MEAN:
Years      4.111111
Salary    48.888889
dtype: float64

---

PRINTING MAX:
Years      8
Salary    70
dtype: int64


After inspecting the output we find there is no revision required. So we move to the next sub problem.
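The three statistics can also be collected in a single table with `agg`. The sketch below assumes a stand-in `numeric_cols` built from the first five rows of the data, so its numbers differ from the full-file output above.

```python
import pandas as pd

# Stand-in for get_numeric_cols(employee): only the first five rows of the data.
numeric_cols = pd.DataFrame({
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

# One call, one row per statistic, one column per numeric column.
summary = numeric_cols.agg(["min", "mean", "max"])
print(summary)
```

The result is a DataFrame indexed by `min`, `mean`, and `max`, which is easier to compare at a glance than three separate printouts.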

### Sub Problem 3

#### Setting Expectations

**Define problem: How to perform calculations for each Department?**

#### Collecting Information

This is covered in the *Data Aggregations and Summarization* lesson. What we need is to group the data by Department. Assign the grouped data to a new variable called `employee_by_department`:

In [40]:
# enter your code here
employee_by_department = employee.groupby(['Department'])

#### Reacting to Data

Test it by calculating the mean for each department:

In [41]:
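# numeric_only=True is required in pandas 2.0+, where text columns are no longer skipped silently
employee_by_department.mean(numeric_only=True)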

| Department | Years | Salary |
|---|---|---|
| HR | 4.666667 | 45.0 |
| IT | 4.5 | 48.75 |
| Sales | 2.5 | 55.0 |


This is what we expect for this sub problem. Now we are ready to combine the solutions of all three sub problems in order to solve the main problem.
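To see what a grouped object actually holds before aggregating it, we can inspect the groups directly. This is a small sketch on stand-in data (the five rows printed by `head()`), so the group sizes differ from the full file.

```python
import pandas as pd

# Stand-in data: the five rows shown by employee.head().
sample = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel"],
    "Department": ["IT", "IT", "HR", "HR", "Sales"],
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

grouped = sample.groupby("Department")

n_groups = grouped.ngroups          # number of distinct departments
sizes = grouped.size()              # number of rows in each department
it_rows = grouped.get_group("IT")   # the rows that belong to one group

print(n_groups)
print(sizes)
print(it_rows)
```

A `groupby` object is lazy: nothing is computed until an aggregation such as `mean()` or `size()` is called on it.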

## Main Problem - Reacting to Data / Revising Expectations

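It turns out that older Pandas releases (before 2.0) are smart enough to perform `mean` calculations on numeric columns only, even if the data set contains non-numeric fields; in Pandas 2.0 and later you get the same behavior by passing `numeric_only=True`. Either way, we can choose to revise our solution, because it is not really necessary to extract the numeric columns (Sub Problem 1) ourselves. In this case we simply combine the solutions for Sub Problems 2 and 3. Note that `min` and `max` are also defined for text columns, so their output includes every column. Write your code below: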

In [42]:
# enter your codes here to print department MIN
print('PRINTING DEPARTMENT MIN:')
print(employee_by_department.min())

# enter your codes here to print department MEAN
print('\n---\n')
print('PRINTING DEPARTMENT MEAN:')
print(employee_by_department.mean(numeric_only=True))

# enter your codes here to print department MAX
print('\n---\n')
print('PRINTING DEPARTMENT MAX:')
print(employee_by_department.max())

PRINTING DEPARTMENT MIN:
              Name Education Gender      Title  Years  Salary
Department                                                   
HR             Ana  Bachelor      F         VP      2      30
IT          Carlos  Bachelor      F         VP      1      30
Sales          Eva  Bachelor      F  associate      2      55

---

PRINTING DEPARTMENT MEAN:
               Years  Salary
Department                  
HR          4.666667   45.00
IT          4.500000   48.75
Sales       2.500000   55.00

---

PRINTING DEPARTMENT MAX:
              Name Education Gender      Title  Years  Salary
Department                                                   
HR           Sonia    Master      M    analyst      8      70
IT           Pedro       Phd      M  associate      8      70
Sales       Samuel    Master      M  associate      3      55
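If only the numeric columns are wanted in all three tables, one option is to select them on the grouped object and aggregate once. The sketch below is an assumption-laden illustration: `sample` holds just the five rows shown by `head()`, and `numeric_names` and `by_dept` are names of our own.

```python
import pandas as pd

# Stand-in data: the five rows shown by employee.head().
sample = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel"],
    "Department": ["IT", "IT", "HR", "HR", "Sales"],
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

# Find the numeric column names dynamically, then restrict the groupby to them.
numeric_names = sample.select_dtypes(include="number").columns.tolist()
by_dept = sample.groupby("Department")[numeric_names].agg(["min", "mean", "max"])

print(by_dept)
```

The result has one row per department and a two-level column header (column name, then statistic), so a single value can be read with, for example, `by_dept.loc["IT", ("Salary", "min")]`.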


Alternatively, we can choose to stick to our original solution that combines all three sub problems. This gives us more control over what we do with the data. What if the goal is not to compute MIN, MEAN, and MAX? What if the question is to apply a custom function you wrote, one that cannot automatically pick out the numeric columns? It is good to figure out how to do this.
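As an illustration of that kind of custom function, the sketch below computes the spread (maximum minus minimum) of each numeric column per department. Everything here is hypothetical: `value_range` is our own example function, and `sample` is a stand-in built from the five rows shown by `head()`.

```python
import pandas as pd

# Stand-in data: the five rows shown by employee.head().
sample = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel"],
    "Department": ["IT", "IT", "HR", "HR", "Sales"],
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

def value_range(col):
    """Hypothetical custom statistic: spread of a numeric column."""
    return col.max() - col.min()

# Keep only numeric columns first, then group them by the Department values.
numeric = sample.select_dtypes(include="number")
ranges = numeric.groupby(sample["Department"]).apply(lambda g: g.apply(value_range))

print(ranges)
```

Because the text columns are removed before grouping, `value_range` never sees a column it cannot subtract, which is the control the original three-step solution buys us.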

Write your code below that uses one line of code to perform MIN/MEAN/MAX respectively.

*Hint: use `apply` and `lambda`*

In [43]:
# enter your code here

mins = employee.groupby('Department').apply(lambda a : get_numeric_cols(a)).min()

means = employee.groupby('Department').apply(lambda a : get_numeric_cols(a)).mean()

maxes = employee.groupby('Department').apply(lambda a : get_numeric_cols(a)).max()

Test your code and check whether you get output similar to the following:

```
PRINTING DEPARTMENT MIN:

Years      1
Salary    30
dtype: int64

---

PRINTING DEPARTMENT MEAN:

Years      4.111111
Salary    48.888889
dtype: float64

---

PRINTING DEPARTMENT MAX:

Years      8
Salary    70
dtype: int64
```

If you don't see the correct output, check your code and revise it.
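Note that the expected output above contains one value per column rather than one per department: when `.min()` is called after `apply` has returned the rows of every group, the reduction runs over all rows again. If a per-department result is wanted, the reduction can be moved inside the `lambda`. The sketch below shows the difference on stand-in data (the five rows shown by `head()`); the variable names are our own.

```python
import pandas as pd

# Stand-in data: the five rows shown by employee.head().
sample = pd.DataFrame({
    "Name": ["Jose", "Maria", "David", "Sonia", "Samuel"],
    "Department": ["IT", "IT", "HR", "HR", "Sales"],
    "Years": [1, 2, 2, 4, 3],
    "Salary": [35, 30, 30, 35, 55],
})

numeric = sample.select_dtypes(include="number")

# Reducing outside any group: one value per column for the whole data set.
overall_min = numeric.min()

# Reducing inside the lambda: one row per department.
dept_min = numeric.groupby(sample["Department"]).apply(lambda g: g.min())

print(overall_min)
print(dept_min)
```

The same pattern works for `mean()` and `max()` by swapping the method called inside the `lambda`.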