# Sort, Count, Group By and Aggregate
In this notebook we will go over how sorting, counting, grouping and aggregating work in Pandas.



We will start by importing ```pandas``` and a function (```create_pd_table3```) that creates the pandas table we will be using.

```python
# Import pandas
import pandas as pd

# Import the helper function that builds the example table
from help_pd_functions import create_pd_table3
```

In the block of code below we create the table ```data``` and show its first 5 rows. This table has 20 rows and 3 columns: **id**, **team**, and **score**. The values each column can take are listed below.

* id: Integers between 1 and 5
* team: Integers between 1 and 3
* score: Integers between 0 and 100

```python
data = create_pd_table3()
data.head(5)
```

|   | id | team | score |
|---|----|------|-------|
| 0 | 2  | 1    | 100   |
| 1 | 4  | 2    | 1     |
| 2 | 4  | 3    | 99    |
| 3 | 5  | 3    | 54    |
| 4 | 2  | 3    | 11    |
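
The module ```help_pd_functions``` appears to be a local helper rather than part of pandas. If you do not have it, the sketch below builds a table of the same shape with NumPy so you can follow along. The function name ```make_demo_table``` is hypothetical, the value ranges are assumptions read off the outputs in this notebook, and since the values are random they will not match the table above.

```python
import numpy as np
import pandas as pd

def make_demo_table(n_rows=20, seed=0):
    """Hypothetical stand-in for create_pd_table3 (ranges are assumptions)."""
    rng = np.random.default_rng(seed)
    # Generator.integers excludes the upper bound, hence 6, 4 and 101
    return pd.DataFrame({
        "id": rng.integers(1, 6, size=n_rows),       # 1..5
        "team": rng.integers(1, 4, size=n_rows),     # 1..3
        "score": rng.integers(0, 101, size=n_rows),  # 0..100
    })

demo = make_demo_table()
print(demo.shape)  # (20, 3)
print(demo.head(5))
```

Fixing ```seed``` makes the table reproducible, so every run of the notebook sees the same rows.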


# Sort
You can easily sort a table by the values of one of its columns using the ```sort_values``` method. In the block of code below we create a new table called ```sorted_scores```, in which the rows of ```data``` are sorted by the values of the column **score**.


```python
sorted_scores = data.sort_values("score")
sorted_scores
```

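|    | id | team | score |
|----|----|------|-------|
| 18 | 3  | 3    | 0     |
| 1  | 4  | 2    | 1     |
| 4  | 2  | 3    | 11    |
| 16 | 4  | 2    | 12    |
| 17 | 4  | 1    | 13    |
| 6  | 3  | 1    | 21    |
| 19 | 4  | 2    | 28    |
| 10 | 5  | 1    | 35    |
| 15 | 2  | 3    | 37    |
| 7  | 1  | 1    | 38    |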
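
Notice that the leftmost column of the sorted output keeps the original row labels (18, 1, 4, ...): ```sort_values``` reorders the rows but does not renumber them. The small sketch below, on a made-up table rather than ```data```, shows this and uses the ```ignore_index=True``` option (available in pandas 1.0 and later) to get fresh labels 0 to n-1 instead.

```python
import pandas as pd

# A tiny made-up table whose sorted order is easy to check by hand
df = pd.DataFrame({"id": [1, 2, 3, 4], "score": [70, 15, 90, 40]})

by_score = df.sort_values("score")
print(by_score.index.tolist())  # [1, 3, 0, 2]: original labels travel with the rows

# ignore_index=True relabels the sorted rows 0..n-1
renumbered = df.sort_values("score", ignore_index=True)
print(renumbered.index.tolist())     # [0, 1, 2, 3]
print(renumbered["score"].tolist())  # [15, 40, 70, 90]
```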


As you can see, ```sort_values``` sorts in ascending order by default. To sort in descending order instead, set ```ascending=False```. See below:

```python
sorted_scores_desc = data.sort_values("score", ascending=False)
sorted_scores_desc
```

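|    | id | team | score |
|----|----|------|-------|
| 0  | 2  | 1    | 100   |
| 2  | 4  | 3    | 99    |
| 11 | 3  | 3    | 86    |
| 9  | 4  | 1    | 83    |
| 5  | 1  | 1    | 58    |
| 3  | 5  | 3    | 54    |
| 12 | 2  | 3    | 51    |
| 14 | 5  | 3    | 49    |
| 13 | 3  | 1    | 44    |
| 8  | 2  | 2    | 40    |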


# Counting

Counting the number of times a value shows up in a column can be very useful. We can find out how many times each value in a column shows up by using the methods ```groupby``` and ```size```. As an example, in the block of code below we count how many times each team (1, 2, or 3) shows up in the column **team**.

```python
teams_counts = data.groupby("team").size().reset_index(name='counts')
teams_counts
```

|   | team | counts |
|---|------|--------|
| 0 | 1    | 8      |
| 1 | 2    | 4      |
| 2 | 3    | 8      |
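
For a single column, pandas also offers ```Series.value_counts```, which returns the same counts but ordered from most to least frequent rather than by team. The sketch below uses a small made-up Series instead of ```data```; calling ```sort_index``` restores the team order that ```groupby("team").size()``` produces.

```python
import pandas as pd

teams = pd.Series([1, 3, 1, 2, 3, 1], name="team")

# Default ordering: most common value first
print(teams.value_counts())

# Sort by the team label to match groupby("team").size()
counts = teams.value_counts().sort_index()
print(counts.tolist())  # [3, 1, 2]
```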


Note that the last part of the ```teams_counts``` line, ```.reset_index(name='counts')```, turns the result of ```size``` (a Series indexed by team) back into a table: **team** becomes a regular column and the counts are stored in a new column named **counts**.
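
To see what ```reset_index(name=...)``` changes, the sketch below inspects the intermediate result on a small made-up table. Before the call, the teams live in the index of a Series. Afterwards they form a regular column next to the named count column.

```python
import pandas as pd

df = pd.DataFrame({"team": [1, 2, 1, 3, 1]})

sizes = df.groupby("team").size()
print(type(sizes).__name__)  # Series
print(sizes.index.tolist())  # [1, 2, 3]: the teams are the index

table = sizes.reset_index(name="counts")
print(table.columns.tolist())  # ['team', 'counts']
print(table)
```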

# Aggregation

In the table ```data``` the same id shows up in several rows, each with its own score. Now, let's say we want some basic statistics: for example, the average score for each id. In the block of code below we show you how to do it using the methods ```groupby``` and ```agg```.

```python
score_mean = data.groupby(["id"]).agg({'score': "mean",}).reset_index().rename({'score':'score mean'}, axis=1)
score_mean
```

|   | id | score mean |
|---|----|------------|
| 0 | 1  | 48.0       |
| 1 | 2  | 47.8       |
| 2 | 3  | 37.75      |
| 3 | 4  | 39.333333  |
| 4 | 5  | 46.0       |
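
Because ```data``` is produced by a helper function, its averages are hard to verify by eye. The sketch below applies the same ```groupby``` and ```agg``` call to a small made-up table whose means can be checked by hand.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2, 2], "score": [40, 10, 60, 20, 30]})

means = df.groupby("id").agg({"score": "mean"})
print(means)
# id 1: (40 + 60) / 2 = 50.0
# id 2: (10 + 20 + 30) / 3 = 20.0
```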


Ok, that was a very long one-liner that needs to be broken down. Below we have a description of the purpose of each part:

* ```data.groupby(["id"])```: We group the rows by id.
* ```.agg({'score': "mean",})```: For each id, we aggregate all the values in the column **score** and calculate their mean. The result is stored in a column that is still called **score**, and the ids become the index.
* ```.reset_index().rename({'score':'score mean'}, axis=1)```: ```reset_index``` turns the ids back into a regular column **id**, and ```rename``` changes the name of the column **score** to **score mean**.
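
As an alternative to the ```agg``` plus ```rename``` combination, pandas (0.25 and later) supports named aggregation, where each keyword is ```new_column_name=(source_column, function)```. The sketch below, on a small made-up table, produces a **score mean** column directly. Because that name contains a space, it is passed by unpacking a dictionary, an approach the pandas user guide describes for names that are not valid Python keywords.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2, 2], "score": [40, 10, 60, 20, 30]})

# Named aggregation: the output column is named in the same step,
# so no separate rename is needed; reset_index still moves id out of the index
score_mean = (
    df.groupby("id")
      .agg(**{"score mean": ("score", "mean")})
      .reset_index()
)
print(score_mean)
```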


## Final Words

We have gone over the basic tools to sort, aggregate and count using Pandas. Now it is time for you to start coding. Start with the following:

* Find out how many times each id shows up in the table.
* For each id, obtain the median, min, max, and sum of its scores. Hint: just replace ```"mean"``` with ```"sum"```, ```"median"```, ```"min"```, and ```"max"``` in the last example. You might also want to rename the columns accordingly.
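
Once you have tried the statistics one at a time, note that ```agg``` also accepts a list of function names and computes all of them in a single call. The sketch below uses a small made-up table rather than ```data```, so it does not give away the exercise answers. It selects the **score** column first, so the result has one plain column per statistic.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2, 2], "score": [40, 10, 60, 20, 30]})

# One column per statistic, one row per id
stats = df.groupby("id")["score"].agg(["median", "min", "max", "sum"])
print(stats)
```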