# Subsetting and Descriptive Stats

## Before you start:
   - Remember that you just need to do one of the challenges.
   - Keep in mind that you need to use some of the functions you learned in the previous lessons.
   - All datasets are provided in this lab's data folder.
   - Explain your code and outputs in as much detail as you can.
   - Try your best to answer the questions and complete the tasks and most importantly: enjoy the process!
   
#### Import all the necessary libraries here:

In [1]:
# import libraries here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# [ONLY ONE MANDATORY] Challenge 1
#### In this challenge we will use the `Temp_States` dataset.

#### First import it into a dataframe called `temp`.

In [2]:
temp = pd.read_csv('../data/Temp_States.csv', sep=';')
temp

,City,State,Temperature,Unnamed: 3
0,NYC,New York,19.444444,
1,Albany,New York,9.444444,
2,Buffalo,New York,3.333333,
3,Hartford,Connecticut,17.222222,
4,Bridgeport,Connecticut,14.444444,
5,Treton,New Jersey,22.222222,
6,Newark,New Jersey,20.0,


In [3]:
# Removing 'Unnamed' column
temp = temp.loc[:,~temp.columns.str.match('Unnamed')]
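The `Unnamed` column comes from a trailing `;` at the end of every line of the CSV. The sketch below reproduces that on a small, made-up CSV string (assumption: the lab files end each line with a separator, as the `Unnamed: 3` column suggests) and shows how `columns.str.match` builds the boolean mask used above.

```python
import io

import pandas as pd

# Hypothetical CSV text: every line ends with ';', as Temp_States.csv appears to
raw = "City;State;Temperature;\nNYC;New York;19.44;\nAlbany;New York;9.44;\n"
df = pd.read_csv(io.StringIO(raw), sep=';')
print(df.columns.tolist())  # ['City', 'State', 'Temperature', 'Unnamed: 3']

# columns.str.match() checks each column label against a regex anchored at its start,
# producing one boolean per column; ~ flips it so only the named columns are kept
mask = df.columns.str.match('Unnamed')
clean = df.loc[:, ~mask]
print(clean.columns.tolist())  # ['City', 'State', 'Temperature']
```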

#### Print `temp`.

In [4]:
temp

,City,State,Temperature
0,NYC,New York,19.444444
1,Albany,New York,9.444444
2,Buffalo,New York,3.333333
3,Hartford,Connecticut,17.222222
4,Bridgeport,Connecticut,14.444444
5,Treton,New Jersey,22.222222
6,Newark,New Jersey,20.0


#### Explore the data types of the *temp* dataframe. What types of data do we have? Comment your result.

In [5]:
temp.dtypes

City            object
State           object
Temperature    float64
dtype: object

In [6]:
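# City and State are text columns (pandas stores strings with the generic `object` dtype);
# they hold categorical labels that we can filter and group by.
# Temperature is the only numeric column (float64, in degrees Celsius),
# so it is the one we can average, compare and aggregate.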

#### Select the rows where state is New York.

In [7]:
temp.loc[temp['State']=='New York',:]

,City,State,Temperature
0,NYC,New York,19.444444
1,Albany,New York,9.444444
2,Buffalo,New York,3.333333


#### What is the average temperature of cities in New York?

In [8]:
temp.loc[temp['State']=='New York',:].Temperature.mean()

10.740740739000001

#### Which states and cities have a temperature above 15 degrees Celsius?

In [9]:
temp.loc[temp['Temperature'] > 15,:]

,City,State,Temperature
0,NYC,New York,19.444444
3,Hartford,Connecticut,17.222222
5,Treton,New Jersey,22.222222
6,Newark,New Jersey,20.0


#### Now, return only the cities that have a temperature above 15 degrees Celsius.

In [10]:
temp[['City', 'Temperature']].loc[temp['Temperature'] > 15,:]

,City,Temperature
0,NYC,19.444444
3,Hartford,17.222222
5,Treton,22.222222
6,Newark,20.0
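The question asks for the cities only. A minimal sketch, using the values printed above typed in by hand, passes a single column label to `.loc` so the result is a Series of city names rather than a two-column DataFrame:

```python
import pandas as pd

# Values copied from the temp table above (rounded to 6 decimals)
temp = pd.DataFrame({
    'City': ['NYC', 'Albany', 'Buffalo', 'Hartford', 'Bridgeport', 'Treton', 'Newark'],
    'State': ['New York', 'New York', 'New York', 'Connecticut', 'Connecticut',
              'New Jersey', 'New Jersey'],
    'Temperature': [19.444444, 9.444444, 3.333333, 17.222222, 14.444444, 22.222222, 20.0],
})

# Row mask first, then a single column label: the result is a Series, not a DataFrame
warm_cities = temp.loc[temp['Temperature'] > 15, 'City']
print(warm_cities.tolist())  # ['NYC', 'Hartford', 'Treton', 'Newark']
```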


#### Which cities have a temperature above 15 degrees Celsius and below 20 degrees Celsius?

**Hint**: First, write the condition. Then, select the rows.

In [11]:
temp[['City', 'Temperature']].loc[(temp['Temperature'] > 15) & (temp['Temperature'] < 20),:]

,City,Temperature
0,NYC,19.444444
3,Hartford,17.222222
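Following the hint, the sketch below first stores the condition in a variable and then selects the rows. It also shows `Series.between` as an equivalent shortcut; the `inclusive='neither'` argument assumes pandas 1.3 or later. The data are the values printed above, retyped by hand.

```python
import pandas as pd

temp = pd.DataFrame({
    'City': ['NYC', 'Albany', 'Buffalo', 'Hartford', 'Bridgeport', 'Treton', 'Newark'],
    'Temperature': [19.444444, 9.444444, 3.333333, 17.222222, 14.444444, 22.222222, 20.0],
})

# Step 1: write the condition. Each comparison needs its own parentheses because
# & binds more tightly than > and <.
condition = (temp['Temperature'] > 15) & (temp['Temperature'] < 20)

# Equivalent shortcut: between() with both bounds excluded
shortcut = temp['Temperature'].between(15, 20, inclusive='neither')
assert condition.equals(shortcut)

# Step 2: select the rows
print(temp.loc[condition, 'City'].tolist())  # ['NYC', 'Hartford']
```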


#### Find the mean and standard deviation of the temperature of each state.

In [12]:
temp.groupby(by='State', as_index=False).agg(Temp_avg=('Temperature', 'mean'),
                                             Temp_std=('Temperature', 'std'))

,State,Temp_avg,Temp_std
0,Connecticut,15.833333
1,New Jersey,21.111111
2,New York,10.740741
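Named aggregation (`new_name=(column, function)`) is what the cell above uses. This sketch passes the function names as strings, which recent pandas versions prefer over NumPy callables. Note that `'std'` is the sample standard deviation (`ddof=1`), so a state with a single city would get `NaN`. Data are retyped from the table above.

```python
import pandas as pd

temp = pd.DataFrame({
    'State': ['New York', 'New York', 'New York', 'Connecticut', 'Connecticut',
              'New Jersey', 'New Jersey'],
    'Temperature': [19.444444, 9.444444, 3.333333, 17.222222, 14.444444, 22.222222, 20.0],
})

# Named aggregation: new_column=(source_column, aggregation name)
stats = temp.groupby('State').agg(
    Temp_avg=('Temperature', 'mean'),
    Temp_std=('Temperature', 'std'),  # sample standard deviation (ddof=1)
)
print(stats)
```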


# [ONLY ONE MANDATORY] Challenge 2

#### Load the `employees` dataset into a dataframe. Call the dataframe `employees`.

In [13]:
employees = pd.read_csv('../data/employees.csv', sep=';')
employees

,Name,Department,Education,Gender,Title,Years,Salary,Unnamed: 7
0,Jose,IT,Bachelor,M,analyst,1,35,
1,Maria,IT,Master,F,analyst,2,30,
2,David,HR,Master,M,analyst,2,30,
3,Sonia,HR,Bachelor,F,analyst,4,35,
4,Samuel,Sales,Master,M,associate,3,55,
5,Eva,Sales,Bachelor,F,associate,2,55,
6,Carlos,IT,Master,M,VP,8,70,
7,Pedro,IT,Phd,M,associate,7,60,
8,Ana,HR,Master,F,VP,8,70,


In [14]:
# Removing unnamed columns
employees = employees.loc[:,~employees.columns.str.match('Unnamed')]
employees

,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30
3,Sonia,HR,Bachelor,F,analyst,4,35
4,Samuel,Sales,Master,M,associate,3,55
5,Eva,Sales,Bachelor,F,associate,2,55
6,Carlos,IT,Master,M,VP,8,70
7,Pedro,IT,Phd,M,associate,7,60
8,Ana,HR,Master,F,VP,8,70


#### Explore the data types of the `employees` dataframe. Comment your results.

In [15]:
employees.dtypes

Name          object
Department    object
Education     object
Gender        object
Title         object
Years          int64
Salary         int64
dtype: object

In [16]:
# Name, Department, Education, Gender and Title are text (`object`) columns holding
# categorical labels, useful for filtering and grouping.
# Years and Salary are int64, the only numeric columns we can aggregate.

#### What's the average salary in this company?

In [17]:
employees.Salary.mean()

48.888888888888886

#### What's the highest salary?

In [18]:
employees.Salary.max()

70

#### What's the lowest salary?

In [19]:
employees.Salary.min()

30

#### Who are the employees with the lowest salary?

In [20]:
mask = employees['Salary'] == employees.Salary.min()

employees.loc[mask,:]

,Name,Department,Education,Gender,Title,Years,Salary
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30


#### Find all the information about an employee called David.

In [21]:
employees.loc[employees.Name == 'David',:]

,Name,Department,Education,Gender,Title,Years,Salary
2,David,HR,Master,M,analyst,2,30


#### Could you return only David's salary?

In [22]:
employees.loc[employees.Name == 'David',:][['Name','Salary']]

,Name,Salary
2,David,30
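The cell above returns a small DataFrame. If you need David's salary as a plain number, one option (sketched on a hand-typed subset of the table) is to select the single `Salary` column and call `Series.item()`, which raises an error if the filter matches zero or several rows:

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Jose', 'Maria', 'David'],
    'Salary': [35, 30, 30],
})

# A single column label in .loc returns a Series with one value per matching row ...
david_salary = employees.loc[employees['Name'] == 'David', 'Salary']
# ... and .item() unwraps it to a plain scalar (ValueError unless exactly one row matched)
print(david_salary.item())  # 30
```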


#### Print all the rows where job title is associate.

In [23]:
employees.loc[employees.Title=='associate', :]

,Name,Department,Education,Gender,Title,Years,Salary
4,Samuel,Sales,Master,M,associate,3,55
5,Eva,Sales,Bachelor,F,associate,2,55
7,Pedro,IT,Phd,M,associate,7,60


#### Print the first 3 rows of your dataframe.
**Tip**: There are 2 ways to do it. Do it both ways.

In [24]:
# Method 1
employees.head(3)

,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30


In [25]:
# Method 2
employees.iloc[:3]

,Name,Department,Education,Gender,Title,Years,Salary
0,Jose,IT,Bachelor,M,analyst,1,35
1,Maria,IT,Master,F,analyst,2,30
2,David,HR,Master,M,analyst,2,30


#### Find the employees whose title is associate and whose salary is above 55.

In [26]:
employees.loc[(employees.Title=='associate') & (employees.Salary > 55)]

,Name,Department,Education,Gender,Title,Years,Salary
7,Pedro,IT,Phd,M,associate,7,60


#### Group the employees by number of years of employment. What are the average salaries in each group?

In [27]:
# Select Salary before averaging: mean() on text columns raises TypeError in pandas >= 2.0
employees.groupby(by='Years')[['Salary']].mean().reset_index()

,Years,Salary
0,1,35.0
1,2,38.333333
2,3,55.0
3,4,35.0
4,7,60.0
5,8,70.0
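Selecting the column before aggregating, as in `groupby('Years')['Salary'].mean()`, averages only what you need. This matters in pandas 2.0 and later, where calling `mean()` on a grouped frame that still contains text columns raises a `TypeError`. A sketch on the employees table, retyped by hand:

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Jose', 'Maria', 'David', 'Sonia', 'Samuel', 'Eva', 'Carlos', 'Pedro', 'Ana'],
    'Title': ['analyst', 'analyst', 'analyst', 'analyst', 'associate', 'associate',
              'VP', 'associate', 'VP'],
    'Years': [1, 2, 2, 4, 3, 2, 8, 7, 8],
    'Salary': [35, 30, 30, 35, 55, 55, 70, 60, 70],
})

# Only the Salary column reaches mean(), so the text columns never cause an error
by_years = employees.groupby('Years')['Salary'].mean()
by_title = employees.groupby('Title')['Salary'].mean()
print(by_years)
print(by_title)
```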


#### What is the average salary per title?

In [28]:
employees.groupby(by='Title')[['Salary']].mean().reset_index()

,Title,Salary
0,VP,70.0
1,analyst,32.5
2,associate,56.666667


#### Find the salary quartiles.


In [29]:
employees[['Salary']].describe().T

,count,mean,std,min,25%,50%,75%,max
Salary,9.0,48.888889,16.541194,30.0,35.0,55.0,60.0,70.0


In [30]:
employees[['Salary']].describe().T[['25%','75%']]

,25%,75%
Salary,35.0,60.0


In [31]:
# another way to find the quartiles
employees[['Salary']].quantile([0.25, 0.75])

,Salary
0.25,35.0
0.75,60.0
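The quartiles above can be checked by hand. With pandas' default linear interpolation, the quantile `q` of `n` sorted values sits at position `q * (n - 1)`. For the 9 salaries, that is positions 2, 4 and 6 (0-based) of `30, 30, 35, 35, 55, 55, 60, 70, 70`, i.e. 35, 55 and 60. The sketch also derives the interquartile range:

```python
import pandas as pd

salary = pd.Series([35, 30, 30, 35, 55, 55, 70, 60, 70], name='Salary')

# Linear interpolation (the default): position q * (n - 1) in the sorted values
quartiles = salary.quantile([0.25, 0.5, 0.75])
print(quartiles.tolist())  # [35.0, 55.0, 60.0]

# Interquartile range: the spread of the middle half of the salaries
iqr = quartiles[0.75] - quartiles[0.25]
print(iqr)  # 25.0
```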


#### Is the mean salary different per gender?

In [32]:
employees.groupby(by='Gender')[['Salary']].mean().reset_index()

,Gender,Salary
0,F,47.5
1,M,50.0


#### Find the minimum, mean and maximum of all numeric columns for each company department.



In [33]:
# numeric_only=True restricts the result to the numeric columns (Years, Salary)
employees.groupby(by='Department').min(numeric_only=True)

Department,Years,Salary
HR,2,30
IT,1,30
Sales,2,55


In [34]:
employees.groupby(by='Department').max(numeric_only=True)

Department,Years,Salary
HR,8,70
IT,8,70
Sales,3,55


In [35]:
# numeric_only=True is required in pandas >= 2.0, where mean() fails on text columns
employees.groupby(by='Department').mean(numeric_only=True)

Department,Years,Salary
HR,4.666667,45.0
IT,4.5,48.75
Sales,2.5,55.0
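The three cells above can be collapsed into a single `agg` call with a list of function names. A sketch on the retyped employees table; the result has two-level columns such as `('Salary', 'max')`:

```python
import pandas as pd

employees = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR', 'Sales', 'Sales', 'IT', 'IT', 'HR'],
    'Years': [1, 2, 2, 4, 3, 2, 8, 7, 8],
    'Salary': [35, 30, 30, 35, 55, 55, 70, 60, 70],
})

# Pick the numeric columns once, then apply all three aggregations per department
numeric_cols = list(employees.select_dtypes(include='number').columns)
summary = employees.groupby('Department')[numeric_cols].agg(['min', 'mean', 'max'])
print(summary)
```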


#### Bonus Question: for each department, compute the difference between the maximum and the minimum salary.
**Hint**: try using `agg` or `apply` combined with `lambda` functions.

In [36]:
pd.DataFrame(employees.groupby(by='Department').apply(lambda x : x.Salary.max() - x.Salary.min()))

Department,0
HR,40
IT,40
Sales,0


In [37]:
employees.groupby(by='Department')[['Salary']].apply(lambda x : x.max() - x.min())

Department,Salary
HR,40
IT,40
Sales,0
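Following the hint, `agg` with a `lambda` can compute the range per department directly on the `Salary` column. A minimal sketch on the retyped table:

```python
import pandas as pd

employees = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR', 'Sales', 'Sales', 'IT', 'IT', 'HR'],
    'Salary': [35, 30, 30, 35, 55, 55, 70, 60, 70],
})

# agg() calls the lambda once per department, passing that group's salaries as a Series
salary_range = employees.groupby('Department')['Salary'].agg(lambda s: s.max() - s.min())
print(salary_range)
```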


# [ONLY ONE MANDATORY] Challenge 3
#### Open the `Orders` dataset. Name your dataset `orders`.

In [38]:
# Read Orders.csv directly from the zip archive
import zipfile

with zipfile.ZipFile('../data/Orders.zip') as zf:
    orders = pd.read_csv(zf.open('Orders.csv'), index_col=0)
orders

,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30
1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00
3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
...,...,...,...,...,...,...,...,...,...,...,...,...,...
541904,581587,22613,2011,12,5,12,pack of 20 spaceboy napkins,12,2011-12-09 12:50:00,0.85,12680,France,10.20
541905,581587,22899,2011,12,5,12,children's apron dolly girl,6,2011-12-09 12:50:00,2.10,12680,France,12.60
541906,581587,23254,2011,12,5,12,childrens cutlery dolly girl,4,2011-12-09 12:50:00,4.15,12680,France,16.60
541907,581587,23255,2011,12,5,12,childrens cutlery circus parade,4,2011-12-09 12:50:00,4.15,12680,France,16.60


#### Explore your dataset by looking at the data types and summary statistics. Comment your results.

In [39]:
orders.dtypes

InvoiceNo         int64
StockCode        object
year              int64
month             int64
day               int64
hour              int64
Description      object
Quantity          int64
InvoiceDate      object
UnitPrice       float64
CustomerID        int64
Country          object
amount_spent    float64
dtype: object

In [40]:
orders.describe()

,InvoiceNo,year,month,day,hour,Quantity,UnitPrice,CustomerID,amount_spent
count,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0,397924.0
mean,560617.126645,2010.934259,7.612537,3.614555,12.728247,13.021823,3.116174,15294.315171,22.394749
std,13106.167695,0.247829,3.416527,1.928274,2.273535,180.42021,22.096788,1713.169877,309.055588
min,536365.0,2010.0,1.0,1.0,6.0,1.0,0.0,12346.0,0.0
25%,549234.0,2011.0,5.0,2.0,11.0,2.0,1.25,13969.0,4.68
50%,561893.0,2011.0,8.0,3.0,13.0,6.0,1.95,15159.0,11.8
75%,572090.0,2011.0,11.0,5.0,14.0,12.0,3.75,16795.0,19.8
max,581587.0,2011.0,12.0,7.0,20.0,80995.0,8142.75,18287.0,168469.6


In [41]:
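# - Every numeric column has 397,924 values, so no numeric data is missing.
# - InvoiceNo and CustomerID are integer identifiers; their mean and std are meaningless.
# - InvoiceDate is stored as text (object) and needs pd.to_datetime for date logic.
# - year covers 2010-2011 (mostly 2011, mean 2010.93); day ranges from 1 to 7, so it looks
#   like a weekday rather than a day of the month; hour ranges from 6 to 20.
# - Quantity, UnitPrice and amount_spent are heavily right-skewed: their medians are
#   6, 1.95 and 11.8, while their maxima are 80,995, 8,142.75 and 168,469.6.
# - Quantity starts at 1 (no returns), but UnitPrice and amount_spent have a minimum of 0,
#   so some items were given away free.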
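Since `InvoiceDate` is stored as text, converting it with `pd.to_datetime` gives access to the `.dt` accessor. The sketch below uses two timestamps from the table above and rebuilds the `year`, `month`, `day` and `hour` columns. Treating `day` as the weekday with Monday = 1 is an assumption inferred from these rows, not something the dataset documents.

```python
import pandas as pd

# Two InvoiceDate strings taken from the orders table shown above
orders = pd.DataFrame({'InvoiceDate': ['2010-12-01 08:26:00', '2011-12-09 12:50:00']})

# Convert the text column to real timestamps
orders['InvoiceDate'] = pd.to_datetime(orders['InvoiceDate'])

# Rebuild the dataset's date columns; dayofweek counts Monday as 0, so add 1
# to match the assumed Monday = 1 convention of the `day` column (3 and 5 above)
parts = pd.DataFrame({
    'year': orders['InvoiceDate'].dt.year,
    'month': orders['InvoiceDate'].dt.month,
    'day': orders['InvoiceDate'].dt.dayofweek + 1,
    'hour': orders['InvoiceDate'].dt.hour,
})
print(parts)
```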

#### What is the average purchase price?

In [42]:
orders['amount_spent'].mean()

22.39474850474768

#### What are the highest and lowest purchase prices? 

In [43]:
orders['amount_spent'].max()

168469.6

In [44]:
orders['amount_spent'].min()

0.0

#### Select all the customers from Spain.
**Hint**: Remember that you are not asked to find orders from Spain but customers. A customer might have more than one order associated. 

In [45]:
# Lists every Spanish order line; a customer with several lines appears several times
orders.loc[orders['Country']=='Spain',:]

,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
6421,536944,22383,2010,12,5,12,lunch bag suki design,70,2010-12-03 12:20:00,1.65,12557,Spain,115.50
6422,536944,22384,2010,12,5,12,lunch bag pink polkadot,100,2010-12-03 12:20:00,1.45,12557,Spain,145.00
6423,536944,20727,2010,12,5,12,lunch bag black skull.,60,2010-12-03 12:20:00,1.65,12557,Spain,99.00
6424,536944,20725,2010,12,5,12,lunch bag red retrospot,70,2010-12-03 12:20:00,1.65,12557,Spain,115.50
6425,536944,20728,2010,12,5,12,lunch bag cars blue,100,2010-12-03 12:20:00,1.45,12557,Spain,145.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
535271,581193,23291,2011,12,3,17,dolly girl childrens cup,2,2011-12-07 17:05:00,1.25,17097,Spain,2.50
535272,581193,85232D,2011,12,3,17,set/3 decoupage stacking tins,1,2011-12-07 17:05:00,4.95,17097,Spain,4.95
535273,581193,22721,2011,12,3,17,set of 3 cake tins sketchbook,2,2011-12-07 17:05:00,1.95,17097,Spain,3.90
535274,581193,23241,2011,12,3,17,treasure tin gymkhana design,1,2011-12-07 17:05:00,2.08,17097,Spain,2.08


#### How many customers do we have in Spain?

In [46]:
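# Note: count() counts the Spanish order lines, not distinct customers.
# The number of distinct customers is
# orders.loc[orders['Country']=='Spain', 'CustomerID'].nunique()
orders.loc[orders['Country']=='Spain',:].Country.count()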

2485
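The difference between order lines and customers is easy to see on a tiny, hypothetical order log (customer 12347 and the row counts are made up; 12557 and 17097 are IDs from the Spanish rows above). `count()` counts rows, while `nunique()` counts distinct customer IDs:

```python
import pandas as pd

# Hypothetical mini order log: customer 12557 has three lines, 17097 has one
orders = pd.DataFrame({
    'CustomerID': [12557, 12557, 12557, 17097, 12347],
    'Country': ['Spain', 'Spain', 'Spain', 'Spain', 'Iceland'],
})

spain = orders.loc[orders['Country'] == 'Spain']
print(spain['Country'].count())       # 4 (order lines, not customers)
print(spain['CustomerID'].unique())   # the distinct Spanish customers
print(spain['CustomerID'].nunique())  # 2 (number of customers)
```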

#### Select all the customers who have bought more than 50 items.
**Hint**: Remember that you are not asked to find orders with more than 50 items but customers who bought more than 50 items. A customer with two orders of 30 items each should appear in the selection.

In [47]:
orders_qtd_per_customer = orders.groupby(by='CustomerID', as_index=False)[['Quantity']].sum()

In [48]:
orders_qtd_per_customer.loc[orders_qtd_per_customer['Quantity'] > 50, :].sort_values(by='Quantity')

,CustomerID,Quantity
1864,14890,51
2502,15748,51
193,12587,51
2207,15350,51
2642,15945,52
...,...,...
0,12346,74215
55,12415,77670
1880,14911,80515
3009,16446,80997
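Why the cells above aggregate before filtering: on a hypothetical four-line order log (all IDs and quantities made up), filtering individual lines misses the customer who bought 30 + 30 items, while summing per customer first catches them.

```python
import pandas as pd

# Hypothetical order lines; customer 1 has two lines of 30 items each (the hint's case)
orders = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3],
    'Quantity': [30, 30, 45, 80],
})

# Filtering lines first misses customer 1, because no single line exceeds 50 items
per_line = orders.loc[orders['Quantity'] > 50, 'CustomerID'].unique()

# Summing per customer first, then filtering, gives the intended answer
per_customer = orders.groupby('CustomerID')['Quantity'].sum()
big_buyers = per_customer[per_customer > 50]
print(big_buyers)
```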


#### Select orders from Spain that include more than 50 items.

In [49]:
orders.loc[(orders['Country']=='Spain') & (orders['Quantity'] > 50),:].sort_values(by='Quantity')

,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
6423,536944,20727,2010,12,5,12,lunch bag black skull.,60,2010-12-03 12:20:00,1.65,12557,Spain,99.0
315702,564734,84826,2011,8,7,13,asstd design 3d paper stickers,60,2011-08-28 13:32:00,0.85,12484,Spain,51.0
6421,536944,22383,2010,12,5,12,lunch bag suki design,70,2010-12-03 12:20:00,1.65,12557,Spain,115.5
6424,536944,20725,2010,12,5,12,lunch bag red retrospot,70,2010-12-03 12:20:00,1.65,12557,Spain,115.5
495740,578321,84997B,2011,11,3,16,childrens cutlery retrospot red,72,2011-11-23 16:59:00,3.75,12557,Spain,270.0
426667,573362,22599,2011,10,7,13,christmas musical zinc star,72,2011-10-30 13:06:00,0.29,12597,Spain,20.88
426666,573362,22597,2011,10,7,13,musical zinc heart decoration,72,2011-10-30 13:06:00,0.29,12597,Spain,20.88
426665,573362,22598,2011,10,7,13,christmas musical zinc tree,72,2011-10-30 13:06:00,0.29,12597,Spain,20.88
398631,571255,82482,2011,10,5,17,wooden picture frame white finish,72,2011-10-14 17:13:00,2.55,12454,Spain,183.6
398626,571255,82494L,2011,10,5,17,wooden frame antique white,72,2011-10-14 17:13:00,2.55,12454,Spain,183.6


#### Select all free orders.

In [50]:
orders.loc[orders['UnitPrice'] == 0, :]

,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
9302,537197,22841,2010,12,7,14,round cake tin vintage green,1,2010-12-05 14:02:00,0.0,12647,Germany,0.0
33576,539263,22580,2010,12,4,14,advent calendar gingham sack,4,2010-12-16 14:36:00,0.0,16560,United Kingdom,0.0
40089,539722,22423,2010,12,2,13,regency cakestand 3 tier,10,2010-12-21 13:45:00,0.0,14911,EIRE,0.0
47068,540372,22090,2011,1,4,16,paper bunting retrospot,24,2011-01-06 16:41:00,0.0,13081,United Kingdom,0.0
47070,540372,22553,2011,1,4,16,plasters in tin skulls,24,2011-01-06 16:41:00,0.0,13081,United Kingdom,0.0
56674,541109,22168,2011,1,4,15,organiser wood antique white,1,2011-01-13 15:10:00,0.0,15107,United Kingdom,0.0
86789,543599,84535B,2011,2,4,13,fairy cakes notebook a6 size,16,2011-02-10 13:08:00,0.0,17560,United Kingdom,0.0
130188,547417,22062,2011,3,3,10,ceramic bowl with love heart design,36,2011-03-23 10:25:00,0.0,13239,United Kingdom,0.0
139453,548318,22055,2011,3,3,12,mini cake stand hanging strawbery,5,2011-03-30 12:45:00,0.0,13113,United Kingdom,0.0
145208,548871,22162,2011,4,1,14,heart garland rustic padded,2,2011-04-04 14:42:00,0.0,14410,United Kingdom,0.0


#### Select all orders whose description starts with `lunch bag`.
**Hint**: use string functions.

In [51]:
# startswith() matches the prefix only; contains() would also match 'lunch bag' mid-text
orders_lunch_bag = orders.loc[orders['Description'].str.startswith('lunch bag'), :]
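The question asks for descriptions that *start* with `lunch bag`, whereas `str.contains` also matches the phrase anywhere in the text. A sketch with hypothetical descriptions (the last one is made up to show the difference):

```python
import pandas as pd

# Hypothetical descriptions: the last one mentions a lunch bag but does not start with it
desc = pd.Series([
    'lunch bag red retrospot',
    'lunch bag woodland',
    'jumbo bag red retrospot',
    'set of 2 lunch bag hooks',
])

contains = desc.str.contains('lunch bag')    # substring anywhere in the text
starts = desc.str.startswith('lunch bag')    # prefix only, as the question asks
print(contains.tolist())  # [True, True, False, True]
print(starts.tolist())    # [True, True, False, False]
```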

#### Select all `lunch bag` orders made in 2011.

In [52]:
orders_lunch_bag.loc[orders_lunch_bag['year']==2011, :]

,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
42678,540015,20725,2011,1,2,11,lunch bag red retrospot,10,2011-01-04 11:40:00,1.65,13319,United Kingdom,16.50
42679,540015,20726,2011,1,2,11,lunch bag woodland,10,2011-01-04 11:40:00,1.65,13319,United Kingdom,16.50
42851,540023,22382,2011,1,2,12,lunch bag spaceboy design,2,2011-01-04 12:58:00,1.65,15039,United Kingdom,3.30
42852,540023,20726,2011,1,2,12,lunch bag woodland,1,2011-01-04 12:58:00,1.65,15039,United Kingdom,1.65
43616,540098,22384,2011,1,2,15,lunch bag pink polkadot,1,2011-01-04 15:50:00,1.65,16241,United Kingdom,1.65
...,...,...,...,...,...,...,...,...,...,...,...,...,...
540436,581486,23207,2011,12,5,9,lunch bag alphabet design,10,2011-12-09 09:38:00,1.65,17001,United Kingdom,16.50
541695,581538,20727,2011,12,5,11,lunch bag black skull.,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
541696,581538,20725,2011,12,5,11,lunch bag red retrospot,1,2011-12-09 11:34:00,1.65,14446,United Kingdom,1.65
541862,581581,23681,2011,12,5,12,lunch bag red vintage doily,10,2011-12-09 12:20:00,1.65,17581,United Kingdom,16.50


#### Show the frequency distribution of the amount spent in Spain.

In [53]:
orders.loc[orders['Country']=='Spain',:]['amount_spent'].value_counts()

15.00     186
17.70     122
19.80      99
17.40      86
10.20      76
         ... 
29.85       1
7.56        1
280.00      1
360.00      1
4.74        1
Name: amount_spent, Length: 316, dtype: int64
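With a continuous variable like `amount_spent`, `value_counts()` lists each distinct amount (316 values for Spain). Grouping amounts into ranges with `pd.cut` usually gives a more readable distribution. A sketch on ten amounts taken from the Spanish output above, with bin edges chosen arbitrarily for illustration:

```python
import pandas as pd

# A few amount_spent values from the Spanish orders shown above
amounts = pd.Series([115.50, 145.00, 99.00, 2.50, 4.95, 3.90, 2.08, 15.00, 15.00, 17.70])

# Illustrative bin edges; pd.cut builds right-closed intervals such as (10, 20]
bins = [0, 10, 20, 50, 200]
distribution = pd.cut(amounts, bins=bins).value_counts().sort_index()
print(distribution)
```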

#### Select all orders made in the month of August.

In [54]:
orders_Aug = orders.loc[orders['month']==8,:]
orders_Aug

,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
285421,561904,22075,2011,8,1,8,6 ribbons elegant christmas,96,2011-08-01 08:30:00,1.45,17941,United Kingdom,139.20
285422,561904,85049E,2011,8,1,8,scandinavian reds ribbons,156,2011-08-01 08:30:00,1.06,17941,United Kingdom,165.36
285423,561905,21385,2011,8,1,9,ivory hanging decoration heart,24,2011-08-01 09:31:00,0.85,14947,United Kingdom,20.40
285424,561905,84970L,2011,8,1,9,single heart zinc t-light holder,12,2011-08-01 09:31:00,0.95,14947,United Kingdom,11.40
285425,561905,84970S,2011,8,1,9,hanging heart zinc t-light holder,12,2011-08-01 09:31:00,0.85,14947,United Kingdom,10.20
...,...,...,...,...,...,...,...,...,...,...,...,...,...
320688,565067,22644,2011,8,3,17,ceramic cherry cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90
320689,565067,22645,2011,8,3,17,ceramic heart fairy cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90
320690,565067,22637,2011,8,3,17,piggy bank retrospot,2,2011-08-31 17:16:00,2.55,15856,United Kingdom,5.10
320691,565067,22646,2011,8,3,17,ceramic strawberry cake money bank,2,2011-08-31 17:16:00,1.45,15856,United Kingdom,2.90


#### Find the number of orders made by each country in the month of August.
**Hint**: Use value_counts().

In [55]:
orders_Aug['Country'].value_counts()

United Kingdom     23105
Germany              795
EIRE                 593
France               569
Netherlands          280
Switzerland          267
Spain                252
Belgium              194
Israel               171
Channel Islands      140
Australia            107
Italy                 95
Austria               88
Norway                77
Finland               61
Malta                 55
Portugal              41
Sweden                40
Unspecified           23
Iceland               22
Poland                17
Denmark               16
Canada                 5
Name: Country, dtype: int64
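`value_counts()` counts rows, and each row in `orders` is one product line of an invoice. If an "order" should mean an invoice, count distinct `InvoiceNo` values per country instead. A sketch on hypothetical data (invoice 561906 and its country are made up):

```python
import pandas as pd

# Hypothetical August order lines: invoice 561904 has two product lines
aug = pd.DataFrame({
    'InvoiceNo': [561904, 561904, 561905, 561906],
    'Country': ['United Kingdom', 'United Kingdom', 'United Kingdom', 'Germany'],
})

# Rows (product lines) per country
lines = aug['Country'].value_counts()
# Distinct invoices per country
invoices = aug.groupby('Country')['InvoiceNo'].nunique()
print(lines)
print(invoices)
```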

#### What's the average amount of money spent by country?

In [56]:
orders.groupby(by='Country',as_index=False)[['amount_spent']].mean()

,Country,amount_spent
0,Australia,116.89562
1,Austria,25.624824
2,Bahrain,32.258824
3,Belgium,20.283772
4,Brazil,35.7375
5,Canada,24.280662
6,Channel Islands,27.34016
7,Cyprus,22.134169
8,Czech Republic,33.0696
9,Denmark,49.882474


#### What's the most expensive item?

In [57]:
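# Caution: DataFrame.max() works column by column, so the Description below is simply
# the alphabetically last description, not necessarily the item priced at 8142.75.
# The row of the most expensive item is
# orders.loc[orders['UnitPrice'].idxmax(), ['Description', 'UnitPrice']]
orders[['Description','UnitPrice']].max()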

Description    zinc wire sweetheart letter tray
UnitPrice                               8142.75
dtype: object
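A small hypothetical catalogue (prices made up) shows why column-wise `max()` can pair a description with a price from a different row, and how `idxmax()` fetches the row of the most expensive item instead:

```python
import pandas as pd

# Hypothetical catalogue: the alphabetically last description is NOT the priciest item
items = pd.DataFrame({
    'Description': ['cream cupid hearts coat hanger', 'regency cakestand 3 tier',
                    'zinc wire sweetheart letter tray'],
    'UnitPrice': [2.75, 12.75, 3.75],
})

# Column-wise max: each value comes from a different row
mixed = items[['Description', 'UnitPrice']].max()
print(mixed)

# idxmax() returns the index label of the highest price; .loc then fetches that row
priciest = items.loc[items['UnitPrice'].idxmax(), ['Description', 'UnitPrice']]
print(priciest)
```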

#### What is the average amount spent per year?

In [58]:
orders.groupby(by='year', as_index=False)[['amount_spent']].mean()

,year,amount_spent
0,2010,21.892733
1,2011,22.430074


In [59]:
orders['year'].unique()

array([2010, 2011], dtype=int64)