## Import the Pandas library and create a DataFrame

Before doing anything else, you'll need to import Pandas and get some data to work with.

In [1]:
import pandas as pd

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Claire', 'David', 'Emma', 'Alice', 'David'],
    'Age': [25, 30, 22, 28, 35, 25, 28],
    'Department': ['IT', 'HR', 'Finance', 'IT', 'Marketing', 'HR', 'IT'],
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male'],
    'Salary': [50000, 60000, 45000, 70000, 80000, 52000, 72000],
    'Rating': [4.5, 3.8, 4.2, 4.0, 4.7, 4.3, 3.9]
}

df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,Department,Gender,Salary,Rating
0,Alice,25,IT,Female,50000,4.5
1,Bob,30,HR,Male,60000,3.8
2,Claire,22,Finance,Female,45000,4.2
3,David,28,IT,Male,70000,4.0
4,Emma,35,Marketing,Female,80000,4.7
5,Alice,25,HR,Female,52000,4.3
6,David,28,IT,Male,72000,3.9


## 1. Grouping

When working with data, you'll often care less about how each individual entry looks than about how entries look in aggregate.
- You can't know *each customer's* favorite time to shop, but you can know when *customers in different age groups* shop most often.
- You may not wish to know how *every product* sells, but rather how *products in different departments* are selling right now.

Grouping data by some feature allows you to collect those entries sharing some common feature and evaluate them collectively.

### 1.1 `.groupby()`

The first step in grouping with Pandas is to choose which feature (i.e. which column) will be used to determine the groups.

In [None]:
df.groupby("Department")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7d62f815b610>

**What is this strange output?**

By itself, `.groupby()` only lays the groundwork for grouping: it returns a `DataFrameGroupBy` object rather than a new DataFrame. To get any useful information, you'll need to specify some sort of aggregate, a way to describe many entries at once.
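
To make the grouped object less mysterious, the sketch below inspects it directly. It rebuilds a trimmed copy of `df` (only three of its columns) so that it runs on its own; the variable names `grouped`, `group_sizes` and `hr_rows` are chosen here for illustration.

```python
import pandas as pd

# Trimmed copy of the tutorial's df: three columns are enough here
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire', 'David', 'Emma', 'Alice', 'David'],
    'Department': ['IT', 'HR', 'Finance', 'IT', 'Marketing', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 70000, 80000, 52000, 72000],
})

grouped = df.groupby('Department')

# Number of rows that fell into each group
group_sizes = grouped.size()
print(group_sizes)

# The rows belonging to a single group
hr_rows = grouped.get_group('HR')
print(hr_rows)

# Iterating yields (group label, sub-DataFrame) pairs
for department, rows in grouped:
    print(department, list(rows['Name']))
```

Nothing has been aggregated yet: the object only records which rows belong to which department.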

### 1.2 Aggregate functions
From your work with statistics, you will already know several aggregate functions, such as mean, median or standard deviation.  

Pandas provides several built-in functions to summarize data, some of which you'll find listed below. While many of these methods can be applied to a Series or DataFrame as a whole, today we'll see how they can be used to describe grouped data.

| Aggregate function | Description                                    |
|---------------------|------------------------------------------------|
| [count()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.count.html) | Returns the number of non-null values in each group |
| [size()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.size.html?highlight=size#pandas.DataFrame.size) | Returns the number of rows in each group, nulls included |
| [sum()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html) | Returns the sum of the values in each group |
| [mean()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html) | Returns the mean (average) of each group |
| [std()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.std.html) | Returns the standard deviation of each group |
| [var()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.var.html) | Returns the variance of each group |
| [describe()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html) | Returns several descriptive statistics (count, mean, std, min, quartiles, max) for each group |
| [min()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.min.html) | Returns the minimum value of each group |
| [max()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.max.html#pandas.DataFrame.max) | Returns the maximum value of each group |
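
The difference between `count()` and `size()` only shows up when values are missing. The sketch below uses a small hypothetical DataFrame (`ratings`, not part of the tutorial data) with one missing rating to illustrate it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one IT rating is missing
ratings = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR'],
    'Rating': [4.5, np.nan, 3.8, 4.3],
})

by_department = ratings.groupby('Department')['Rating']

row_counts = by_department.size()        # every row, missing or not
non_null_counts = by_department.count()  # only non-null values

print(row_counts)
print(non_null_counts)
```

For IT, `size()` reports two rows while `count()` reports one value; for HR both agree.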

Applied to a DataFrame, `.mean()` returns the numerical mean of each column. Since this calculation can only be performed on numbers, recent versions of Pandas raise an error on a DataFrame containing non-numerical columns unless you pass `numeric_only=True`.

In [None]:
df.mean(numeric_only=True)

Age          27.571429
Salary    61285.714286
Rating        4.200000
dtype: float64

The same method can be used on a single Series to get its mean.

In [None]:
df['Age'].mean()

27.571428571428573

The method `.count()` will simply count the number of non-null entries.

In [None]:
df['Age'].count()

7

Now let's see what happens if we use `.mean()` after grouping:

In [None]:
df.groupby('Department').mean(numeric_only=True)

Unnamed: 0_level_0,Age,Salary,Rating
Department,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Finance,22.0,45000.0,4.2
HR,27.5,56000.0,4.05
IT,27.0,64000.0,4.133333
Marketing,35.0,80000.0,4.7


Let's break down what we see here:
- The index has changed. Each **unique** value from the "Department" column is now an index label.
- Each column now represents the mean of that feature *with respect to the __group__*.
- Since only one person works in Finance, we see the same numbers as in the row with index 2 of `df` (Claire).
- HR has two members — earning 52,000 and 60,000 per year — so the grouped DataFrame shows a mean earning of 56,000 in HR.
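
The HR figure can be checked by hand. The following sketch filters the HR rows, computes their mean salary manually, and compares it with the grouped result; it rebuilds only the two columns it needs, and the names `manual_mean` and `grouped_mean` are illustrative.

```python
import pandas as pd

# Only the columns needed for the check
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'Marketing', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 70000, 80000, 52000, 72000],
})

# Manual route: select the HR salaries, then average them
hr_salaries = df.loc[df['Department'] == 'HR', 'Salary']
manual_mean = hr_salaries.sum() / len(hr_salaries)

# Grouped route: average every department, then look up HR
grouped_mean = df.groupby('Department')['Salary'].mean().loc['HR']

print(manual_mean, grouped_mean)
```

Both routes give the same value; `.groupby()` simply performs the filter-and-aggregate step for every department at once.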

#### 1.2.1 Aggregating on one column

To see aggregate data about a single column, you can select that column with the indexing operator after performing the grouping.

In [None]:
df.groupby('Gender')['Age'].mean()

Gender
Female    26.750000
Male      28.666667
Name: Age, dtype: float64

#### 1.2.2 Aggregating on two or more columns
To apply one aggregate function to several columns, pass a list of column labels to the indexing operator.

In [None]:
df.groupby('Gender')[['Age','Salary']].mean()

Unnamed: 0_level_0,Age,Salary
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,26.75,56750.0
Male,28.666667,67333.333333


#### 1.2.3 Multiple aggregate functions with `.agg()`
A wonderfully flexible method for applying multiple aggregate functions, `.agg()` accepts a variety of input formats to fit your needs.

To calculate multiple aggregates on one column, use a list of aggregate functions as strings.

In [None]:
df.groupby('Department')['Salary'].agg(['mean', 'median'])

Unnamed: 0_level_0,mean,median
Department,Unnamed: 1_level_1,Unnamed: 2_level_1
Finance,45000.0,45000.0
HR,56000.0,56000.0
IT,64000.0,70000.0
Marketing,80000.0,80000.0


Aggregates can be applied to specific columns by using a dictionary of columns as keys and aggregate functions as values.

In [None]:
df.groupby('Department').agg( {'Age':'max', 'Salary': 'mean'})

Unnamed: 0_level_0,Age,Salary
Department,Unnamed: 1_level_1,Unnamed: 2_level_1
Finance,22,45000.0
HR,30,56000.0
IT,28,64000.0
Marketing,35,80000.0


You can also give a list of functions as a value in your dictionary to perform more than one aggregate function on the same column.

In [None]:
df.groupby('Department').agg( {'Age':'max', 'Salary': ['mean','max']})

Unnamed: 0_level_0,Age,Salary,Salary
Unnamed: 0_level_1,max,mean,max
Department,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
Finance,22,45000.0,45000
HR,30,56000.0,60000
IT,28,64000.0,72000
Marketing,35,80000.0,80000


Output columns can even be given custom names with named aggregation: pass each new column name as a keyword argument whose value is a `(column, function)` tuple.

In [None]:
df.groupby('Department').agg( age_max=('Age', 'max'), salary_mean=('Salary', 'mean'))

Unnamed: 0_level_0,age_max,salary_mean
Department,Unnamed: 1_level_1,Unnamed: 2_level_1
Finance,22,45000.0
HR,30,56000.0
IT,28,64000.0
Marketing,35,80000.0
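
The function in each tuple does not have to be a string. As a sketch, the example below passes a user-defined function alongside a built-in one; `value_range` is a hypothetical helper written for this illustration, not something Pandas provides.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'Marketing', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 70000, 80000, 52000, 72000],
})

def value_range(values):
    """Hypothetical helper: spread between the largest and smallest value."""
    return values.max() - values.min()

summary = df.groupby('Department').agg(
    salary_mean=('Salary', 'mean'),          # built-in aggregate, given by name
    salary_range=('Salary', value_range),    # custom aggregate, given as a function
)
print(summary)
```

The custom function receives each group's `Salary` values as a Series and must return a single value.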


### 1.3 Grouping on multiple features

`.groupby()` accepts several columns as a list, which creates groups with subgroups.  

Let's find the average age and salary of each gender inside each department:

In [None]:
df.groupby(['Department','Gender'])[['Age','Salary']].mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,Age,Salary
Department,Gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Finance,Female,22.0,45000.0
HR,Female,25.0,52000.0
HR,Male,30.0,60000.0
IT,Female,25.0,50000.0
IT,Male,28.0,71000.0
Marketing,Female,35.0,80000.0
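
Grouping on two columns produces a two-level index. The sketch below shows one way to look up a single subgroup in that index and two ways to turn the levels back into ordinary columns; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 22, 28, 35, 25, 28],
    'Department': ['IT', 'HR', 'Finance', 'IT', 'Marketing', 'HR', 'IT'],
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male'],
    'Salary': [50000, 60000, 45000, 70000, 80000, 52000, 72000],
})

result = df.groupby(['Department', 'Gender'])[['Age', 'Salary']].mean()

# Look up one subgroup with a (Department, Gender) tuple
it_male = result.loc[('IT', 'Male')]
print(it_male)

# Move both index levels back into regular columns
flat = result.reset_index()

# Or skip the group index from the start
flat_direct = df.groupby(['Department', 'Gender'], as_index=False)[['Age', 'Salary']].mean()
print(flat)
```

A flat result is often easier to merge, filter or export than one with a multi-level index.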


## 2. Merging

In [None]:
new_data = {
    'Name': ['Alice', 'Bob', 'Claire', 'David', 'Emma', 'Alice', 'David', 'Erick'],
    'Age': [25, 30, 22, 28, 35, 25, 28, 21],
    'Department': ['IT', 'HR', 'Finance', 'IT', 'Marketing', 'HR', 'IT', 'Temp'],
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Male'],
    'Salary': [50000, 60000, 45000, 70000, 80000, 52000, 72000, 31000],
    'Rating': [4.5, 3.8, 4.2, 4.0, 4.7, 4.3, 3.9, 4.1]
}
new_df = pd.DataFrame(new_data)

departments_data = {
    'Department': ['HR', 'Finance', 'Engineering', 'Marketing', 'IT'],
    'Location' : ['2nd floor', '2nd floor', '1st floor', '3rd floor', '1st floor'],
    'EstablishedYear': [2000, 1998, 2005, 2010,2002]
}
departments_df = pd.DataFrame(departments_data)
departments_df

Unnamed: 0,Department,Location,EstablishedYear
0,HR,2nd floor,2000
1,Finance,2nd floor,1998
2,Engineering,1st floor,2005
3,Marketing,3rd floor,2010
4,IT,1st floor,2002


In [None]:
new_df

Unnamed: 0,Name,Age,Department,Gender,Salary,Rating
0,Alice,25,IT,Female,50000,4.5
1,Bob,30,HR,Male,60000,3.8
2,Claire,22,Finance,Female,45000,4.2
3,David,28,IT,Male,70000,4.0
4,Emma,35,Marketing,Female,80000,4.7
5,Alice,25,HR,Female,52000,4.3
6,David,28,IT,Male,72000,3.9
7,Erick,21,Temp,Male,31000,4.1


### 2.1 `.merge()`


Data from multiple sources often needs to be considered together. Merging ensures that the specified information is *aligned*, so that relevant pieces from the two DataFrames end up on the same row.

#### 2.1.1 Syntax

The `.merge()` method takes several arguments, so let's explore them:
```
df.merge(df_2,
        left_on='df_column',      
        right_on='df_2_column',     
        how='inner')
```  
- `df` is the starting, or "left" DataFrame   
- The first argument, `df_2`, is the DataFrame to merge with, also known as the "right" DataFrame  
- `left_on` represents the column in the left DataFrame to join on  
- `right_on` represents the column in the right DataFrame to join on  
    - These columns should have shared information — values found in both DataFrames.
    - When both DataFrames have a suitable column with the same name, you can use the single parameter `on` instead of `left_on` and `right_on`  
- `how` represents the type of merge (`'left'`, `'right'`, `'outer'`, `'inner'`, `'cross'`). The default type is `'inner'`.
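
As a sketch of when `left_on` and `right_on` are actually needed, the example below merges two small hypothetical DataFrames whose key columns have different names (`Department` and `Dept`). Neither DataFrame is part of the tutorial data.

```python
import pandas as pd

# Hypothetical data: the key column is named differently on each side
employees = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Department': ['IT', 'HR'],
})
locations = pd.DataFrame({
    'Dept': ['IT', 'HR'],
    'Location': ['1st floor', '2nd floor'],
})

merged = employees.merge(
    locations,
    left_on='Department',   # key column in the left DataFrame
    right_on='Dept',        # key column in the right DataFrame
    how='inner',
)

# Both key columns are kept, so drop the redundant one
merged = merged.drop(columns='Dept')
print(merged)
```

When the key columns share a name, `on='Department'` does the same job and leaves only one key column in the result.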

#### 2.1.2 Inner merge  
An inner merge will return only those rows for which matching information is present in *both* DataFrames.


In [None]:
new_df.merge(departments_df, left_on='Department',right_on='Department',how='inner')

Unnamed: 0,Name,Age,Department,Gender,Salary,Rating,Location,EstablishedYear
0,Alice,25,IT,Female,50000,4.5,1st floor,2002
1,David,28,IT,Male,70000,4.0,1st floor,2002
2,David,28,IT,Male,72000,3.9,1st floor,2002
3,Bob,30,HR,Male,60000,3.8,2nd floor,2000
4,Alice,25,HR,Female,52000,4.3,2nd floor,2000
5,Claire,22,Finance,Female,45000,4.2,2nd floor,1998
6,Emma,35,Marketing,Female,80000,4.7,3rd floor,2010


Notice that Erick does not appear in the merged DataFrame, because his department, Temp, is not listed in `departments_df`.

Likewise, there is nothing here about the Engineering department, because no employee is listed as working there.

#### 2.1.3 Left merge

A left merge will maintain *all* data from the left DataFrame, even if it does not align with a row of the right DataFrame.

In [None]:
new_df.merge(departments_df, on='Department',how='left')

Unnamed: 0,Name,Age,Department,Gender,Salary,Rating,Location,EstablishedYear
0,Alice,25,IT,Female,50000,4.5,1st floor,2002.0
1,Bob,30,HR,Male,60000,3.8,2nd floor,2000.0
2,Claire,22,Finance,Female,45000,4.2,2nd floor,1998.0
3,David,28,IT,Male,70000,4.0,1st floor,2002.0
4,Emma,35,Marketing,Female,80000,4.7,3rd floor,2010.0
5,Alice,25,HR,Female,52000,4.3,2nd floor,2000.0
6,David,28,IT,Male,72000,3.9,1st floor,2002.0
7,Erick,21,Temp,Male,31000,4.1,,


This time, Erick made it into the merged DataFrame, and his row shows null values in location and established year.

Again, nothing about Engineering, because nobody seems to work there.

#### 2.1.4 Right merge

A right merge will maintain *all* data from the right DataFrame, even if it does not align with a row of the left DataFrame.

In [None]:
new_df.merge(departments_df, on='Department',how='right')

Unnamed: 0,Name,Age,Department,Gender,Salary,Rating,Location,EstablishedYear
0,Bob,30.0,HR,Male,60000.0,3.8,2nd floor,2000
1,Alice,25.0,HR,Female,52000.0,4.3,2nd floor,2000
2,Claire,22.0,Finance,Female,45000.0,4.2,2nd floor,1998
3,,,Engineering,,,,1st floor,2005
4,Emma,35.0,Marketing,Female,80000.0,4.7,3rd floor,2010
5,Alice,25.0,IT,Female,50000.0,4.5,1st floor,2002
6,David,28.0,IT,Male,70000.0,4.0,1st floor,2002
7,David,28.0,IT,Male,72000.0,3.9,1st floor,2002


This time, a row in the merged DataFrame includes Engineering, but all columns with employee information contain null values.

Erick has been left out again, because his department, Temp, has no entry in `departments_df`.
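
The remaining common option from the `how` list, `'outer'`, keeps unmatched rows from both sides. The sketch below uses trimmed-down stand-ins for `new_df` and `departments_df` (named `employees` and `departments` here) and adds `indicator=True`, which appends a `_merge` column showing where each row came from.

```python
import pandas as pd

# Trimmed-down stand-ins for new_df and departments_df
employees = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Erick'],
    'Department': ['IT', 'HR', 'Temp'],
})
departments = pd.DataFrame({
    'Department': ['HR', 'Engineering', 'IT'],
    'Location': ['2nd floor', '1st floor', '1st floor'],
})

outer = employees.merge(departments, on='Department', how='outer', indicator=True)
print(outer)

# _merge is 'both', 'left_only' or 'right_only' for each row
unmatched = outer[outer['_merge'] != 'both']
print(unmatched[['Department', '_merge']])
```

Here both Erick's Temp row and the empty Engineering row survive, each with null values in the columns that come from the other DataFrame.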

## Challenges

In [27]:
data1 = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'San Francisco', 'London', 'Paris', 'Berlin', 'Rome', 'Tokyo'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'France', 'Germany', 'Italy', 'Japan'],
    'Population (Millions)': [8.4, 3.9, 2.7, 2.3, 0.9, 8.9, 2.1, 3.7, 2.8, 13.9],
    'Area (km2)': [468.9, 502.8, 227.6, 1, 121.4, 1572, 105.4, 891.8, 1285, 2187],
    'Language': ['English', 'English', 'English', 'English', 'English', 'English', 'French', 'German', 'Italian', 'Japanese'],
    'Currency': ['USD', 'USD', 'USD', 'USD', 'USD', 'GBP', 'EUR', 'EUR', 'EUR', 'JPY'],
    'Continent': ['North America', 'North America', 'North America', 'North America', 'North America', 'Europe', 'Europe', 'Europe', 'Europe', 'Asia'],
    'Is_Capital': [False, False, False, False, False, True, True, True, True, True]
}

cities_info = pd.DataFrame(data1)
cities_info

Unnamed: 0,City,Country,Population (Millions),Area (km2),Language,Currency,Continent,Is_Capital
0,New York,USA,8.4,468.9,English,USD,North America,False
1,Los Angeles,USA,3.9,502.8,English,USD,North America,False
2,Chicago,USA,2.7,227.6,English,USD,North America,False
3,Houston,USA,2.3,1.0,English,USD,North America,False
4,San Francisco,USA,0.9,121.4,English,USD,North America,False
5,London,UK,8.9,1572.0,English,GBP,Europe,True
6,Paris,France,2.1,105.4,French,EUR,Europe,True
7,Berlin,Germany,3.7,891.8,German,EUR,Europe,True
8,Rome,Italy,2.8,1285.0,Italian,EUR,Europe,True
9,Tokyo,Japan,13.9,2187.0,Japanese,JPY,Asia,True


In [28]:
data2 = {
    'City': ['New York', 'Chicago', 'Houston', 'London', 'Paris', 'Berlin', 'Rome', 'Tokyo', 'Lagos'],
    'Temperature (Celsius)': [18, 21, 30, 15, 19, 20, 24, 29, 34],
    'Humidity (%)': [65, 70, 75, 75, 80, 68, 72, 74, 83],
    'Rainfall (mm)': [50, 75, 20, 60, 40, 45, 35, 10, 122],
}

cities_weather = pd.DataFrame(data2)
cities_weather

Unnamed: 0,City,Temperature (Celsius),Humidity (%),Rainfall (mm)
0,New York,18,65,50
1,Chicago,21,70,75
2,Houston,30,75,20
3,London,15,75,60
4,Paris,19,80,40
5,Berlin,20,68,45
6,Rome,24,72,35
7,Tokyo,29,74,10
8,Lagos,34,83,122


### Challenge 1
Calculate the average population of cities in each continent using the groupby method.

In [5]:
#Helmut
continent_avg_pop = cities_info.groupby('Continent')['Population (Millions)'].mean()
continent_avg_pop

Unnamed: 0_level_0,Population (Millions)
Continent,Unnamed: 1_level_1
Asia,13.9
Europe,4.375
North America,3.64


In [6]:
#Uliana
cities_info.groupby('Continent').agg(avg_population=('Population (Millions)','mean'))

Unnamed: 0_level_0,avg_population
Continent,Unnamed: 1_level_1
Asia,13.9
Europe,4.375
North America,3.64


In [8]:
#Bernard
cities_info.groupby('Continent')['Population (Millions)'].agg('mean')

Unnamed: 0_level_0,Population (Millions)
Continent,Unnamed: 1_level_1
Asia,13.9
Europe,4.375
North America,3.64


In [9]:
#alt
cities_info.groupby('Continent').agg({'Population (Millions)': 'mean'})

Unnamed: 0_level_0,Population (Millions)
Continent,Unnamed: 1_level_1
Asia,13.9
Europe,4.375
North America,3.64


### Challenge 2
Find the city with the highest population in each country using the groupby method and agg function.

In [12]:
#Helmut
city_max_pop = cities_info.groupby('City')['Population (Millions)'].max()
city_max_pop

Unnamed: 0_level_0,Population (Millions)
City,Unnamed: 1_level_1
Berlin,3.7
Chicago,2.7
Houston,2.3
London,8.9
Los Angeles,3.9
New York,8.4
Paris,2.1
Rome,2.8
San Francisco,0.9
Tokyo,13.9


In [11]:
#Nikhil
cities_info.groupby('Country')['Population (Millions)'].agg('max')

Unnamed: 0_level_0,Population (Millions)
Country,Unnamed: 1_level_1
France,2.1
Germany,3.7
Italy,2.8
Japan,13.9
UK,8.9
USA,8.4


In [10]:
#Nora
cities_info.groupby('Country').agg({'City': 'max', 'Population (Millions)': 'max'})

Unnamed: 0_level_0,City,Population (Millions)
Country,Unnamed: 1_level_1,Unnamed: 2_level_1
France,Paris,2.1
Germany,Berlin,3.7
Italy,Rome,2.8
Japan,Tokyo,13.9
UK,London,8.9
USA,San Francisco,8.4


In [20]:
#alt
cities_info.groupby('Country').agg({'City': 'first', 'Population (Millions)': 'max'}).sort_values(by='Population (Millions)', ascending=False)

Unnamed: 0_level_0,City,Population (Millions)
Country,Unnamed: 1_level_1,Unnamed: 2_level_1
Japan,Tokyo,13.9
UK,London,8.9
USA,New York,8.4
Germany,Berlin,3.7
Italy,Rome,2.8
France,Paris,2.1


In [18]:
cities_info.groupby(['Country', 'City'])['Population (Millions)'].agg('max').sort_values(ascending=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,Population (Millions)
Country,City,Unnamed: 2_level_1
Japan,Tokyo,13.9
UK,London,8.9
USA,New York,8.4
USA,Los Angeles,3.9
Germany,Berlin,3.7
Italy,Rome,2.8
USA,Chicago,2.7
USA,Houston,2.3
France,Paris,2.1
USA,San Francisco,0.9
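
Several of the attempts above return the right population but not reliably the right city: `'max'` on the `City` column picks the alphabetically last name (San Francisco for the USA), and `'first'` only works because the largest city happens to be listed first in each country. One possible alternative, sketched here under the assumption that a row lookup is acceptable in place of `.agg()`, uses `.idxmax()` to find the index label of each country's most populous row.

```python
import pandas as pd

# The three relevant columns of cities_info
cities = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'San Francisco',
             'London', 'Paris', 'Berlin', 'Rome', 'Tokyo'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA',
                'UK', 'France', 'Germany', 'Italy', 'Japan'],
    'Population (Millions)': [8.4, 3.9, 2.7, 2.3, 0.9, 8.9, 2.1, 3.7, 2.8, 13.9],
})

# Index label of the most populous row within each country
largest_idx = cities.groupby('Country')['Population (Millions)'].idxmax()

# Use those labels to pull the complete rows
largest = cities.loc[largest_idx, ['Country', 'City', 'Population (Millions)']]
print(largest)
```

Because the lookup returns whole rows, the city name and the population are guaranteed to belong together regardless of row order.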


### Challenge 3
Merge the cities_info and cities_weather DataFrames using a right join to keep only the cities for which weather data is available.

In [23]:
cities_weather

Unnamed: 0,City,Temperature (Celsius),Humidity (%),Rainfall (mm)
0,New York,18,65,50
1,Chicago,21,70,75
2,Houston,30,75,20
3,London,15,75,60
4,Paris,19,80,40
5,Berlin,20,68,45
6,Rome,24,72,35
7,Tokyo,29,74,10
8,Lagos,34,83,122


In [24]:
cities_info

Unnamed: 0,City,Country,Population (Millions),Area (km2),Language,Currency,Continent,Is_Capital
0,New York,USA,8.4,468.9,English,USD,North America,False
1,Los Angeles,USA,3.9,502.8,English,USD,North America,False
2,Chicago,USA,2.7,227.6,English,USD,North America,False
3,Houston,USA,2.3,1.0,English,USD,North America,False
4,San Francisco,USA,0.9,121.4,English,USD,North America,False
5,London,UK,8.9,1572.0,English,GBP,Europe,True
6,Paris,France,2.1,105.4,French,EUR,Europe,True
7,Berlin,Germany,3.7,891.8,German,EUR,Europe,True
8,Rome,Italy,2.8,1285.0,Italian,EUR,Europe,True
9,Tokyo,Japan,13.9,2187.0,Japanese,JPY,Asia,True


In [31]:
#Monika
new_df = cities_info.merge(cities_weather, on='City',how='right')
new_df

Unnamed: 0,City,Country,Population (Millions),Area (km2),Language,Currency,Continent,Is_Capital,Temperature (Celsius),Humidity (%),Rainfall (mm)
0,New York,USA,8.4,468.9,English,USD,North America,False,18,65,50
1,Chicago,USA,2.7,227.6,English,USD,North America,False,21,70,75
2,Houston,USA,2.3,1.0,English,USD,North America,False,30,75,20
3,London,UK,8.9,1572.0,English,GBP,Europe,True,15,75,60
4,Paris,France,2.1,105.4,French,EUR,Europe,True,19,80,40
5,Berlin,Germany,3.7,891.8,German,EUR,Europe,True,20,68,45
6,Rome,Italy,2.8,1285.0,Italian,EUR,Europe,True,24,72,35
7,Tokyo,Japan,13.9,2187.0,Japanese,JPY,Asia,True,29,74,10
8,Lagos,,,,,,,,34,83,122


### Challenge 4
Group the cities by their 'Language' and find the total area and total rainfall for each language group.

In [29]:
#Uliana
cities_weather_merged = cities_info.merge(cities_weather, on='City', how='left')
cities_weather_merged.groupby('Language').agg(total_area=('Area (km2)','sum'),total_rainfall=('Rainfall (mm)','sum'))

Unnamed: 0_level_0,total_area,total_rainfall
Language,Unnamed: 1_level_1,Unnamed: 2_level_1
English,2893.7,205.0
French,105.4,40.0
German,891.8,45.0
Italian,1285.0,35.0
Japanese,2187.0,10.0


In [30]:
#alt
(
    cities_info
    .merge(cities_weather, on='City', how='left')
    .groupby('Language')
    [['Area (km2)', 'Rainfall (mm)']]
    .sum()
)

Unnamed: 0_level_0,Area (km2),Rainfall (mm)
Language,Unnamed: 1_level_1,Unnamed: 2_level_1
English,2893.7,205.0
French,105.4,40.0
German,891.8,45.0
Italian,1285.0,35.0
Japanese,2187.0,10.0


### Challenge 5
Calculate the sum of rainfall in each country using the groupby method.

In [32]:
#Nora
pd.merge(cities_info, cities_weather, on='City', how='left').groupby('Country')['Rainfall (mm)'].sum()

Unnamed: 0_level_0,Rainfall (mm)
Country,Unnamed: 1_level_1
France,40.0
Germany,45.0
Italy,35.0
Japan,10.0
UK,60.0
USA,145.0
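
Note that the USA total covers only the cities with weather data: after the left merge, Los Angeles and San Francisco have null rainfall, and `.sum()` skips nulls. The sketch below uses a small hypothetical dataset (Sydney is invented for illustration) to show how a group with no data at all sums to 0, and how `min_count=1` turns that into a null instead.

```python
import pandas as pd

# Hypothetical data: only New York has a rainfall measurement
info = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Sydney'],
    'Country': ['USA', 'USA', 'Australia'],
})
weather = pd.DataFrame({
    'City': ['New York'],
    'Rainfall (mm)': [50],
})

merged = info.merge(weather, on='City', how='left')
rain_by_country = merged.groupby('Country')['Rainfall (mm)']

totals = rain_by_country.sum()             # nulls are skipped; an all-null group gives 0
strict = rain_by_country.sum(min_count=1)  # needs at least one value, else null
measured = rain_by_country.count()         # how many cities actually contributed

print(totals)
print(strict)
print(measured)
```

Reporting the count next to the sum makes it clear how many cities each total is based on.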


### Challenge 6
Group the cities by their 'Continent' and find the average temperature and humidity for each continent.

In [33]:
#Nora
pd.merge(cities_info, cities_weather, on='City', how='left').groupby('Continent').agg({'Temperature (Celsius)': 'mean','Humidity (%)': 'mean'})

Unnamed: 0_level_0,Temperature (Celsius),Humidity (%)
Continent,Unnamed: 1_level_1,Unnamed: 2_level_1
Asia,29.0,74.0
Europe,19.5,73.75
North America,23.0,70.0


In [34]:
#Monika
pd.merge(cities_info, cities_weather, on='City', how='right').groupby(['Continent','City'])[['Temperature (Celsius)','Humidity (%)']].mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,Temperature (Celsius),Humidity (%)
Continent,City,Unnamed: 2_level_1,Unnamed: 3_level_1
Asia,Tokyo,29.0,74.0
Europe,Berlin,20.0,68.0
Europe,London,15.0,75.0
Europe,Paris,19.0,80.0
Europe,Rome,24.0,72.0
North America,Chicago,21.0,70.0
North America,Houston,30.0,75.0
North America,New York,18.0,65.0
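
Lagos is in the right-merged data but missing from this result, because `.groupby()` drops rows whose group key is null by default and Lagos has no continent. As a sketch, the example below reproduces this on a three-city subset and shows the `dropna=False` option, which keeps null keys as their own group.

```python
import pandas as pd

# Small subset: Lagos has weather data but no entry in the info table
info = pd.DataFrame({
    'City': ['Tokyo', 'London'],
    'Continent': ['Asia', 'Europe'],
})
weather = pd.DataFrame({
    'City': ['Tokyo', 'London', 'Lagos'],
    'Temperature (Celsius)': [29, 15, 34],
})

merged = info.merge(weather, on='City', how='right')

# Default behaviour: the row with a null Continent is silently dropped
default_groups = merged.groupby('Continent')['Temperature (Celsius)'].mean()

# dropna=False keeps the null key as a group of its own
with_missing = merged.groupby('Continent', dropna=False)['Temperature (Celsius)'].mean()

print(default_groups)
print(with_missing)
```

Whether the null group should be kept depends on the question being asked; the point is that the default drops it without warning.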


### Challenge 7
Group the cities by their 'Is_Capital' and 'Language' columns and find the maximum population and minimum area for each combination.

In [36]:
cities_info

Unnamed: 0,City,Country,Population (Millions),Area (km2),Language,Currency,Continent,Is_Capital
0,New York,USA,8.4,468.9,English,USD,North America,False
1,Los Angeles,USA,3.9,502.8,English,USD,North America,False
2,Chicago,USA,2.7,227.6,English,USD,North America,False
3,Houston,USA,2.3,1.0,English,USD,North America,False
4,San Francisco,USA,0.9,121.4,English,USD,North America,False
5,London,UK,8.9,1572.0,English,GBP,Europe,True
6,Paris,France,2.1,105.4,French,EUR,Europe,True
7,Berlin,Germany,3.7,891.8,German,EUR,Europe,True
8,Rome,Italy,2.8,1285.0,Italian,EUR,Europe,True
9,Tokyo,Japan,13.9,2187.0,Japanese,JPY,Asia,True


In [37]:
#Nora
pd.merge(cities_info, cities_weather, on='City', how='left').groupby(['Is_Capital', 'Language']).agg({'Population (Millions)': 'max', 'Area (km2)': 'min'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Population (Millions),Area (km2)
Is_Capital,Language,Unnamed: 2_level_1,Unnamed: 3_level_1
False,English,8.4,1.0
True,English,8.9,1572.0
True,French,2.1,105.4
True,German,3.7,891.8
True,Italian,2.8,1285.0
True,Japanese,13.9,2187.0


In [38]:
#Uliana
cities_info.groupby(['Is_Capital','Language']).agg(max_population=('Population (Millions)','max'), min_area=('Area (km2)','min'))

Unnamed: 0_level_0,Unnamed: 1_level_0,max_population,min_area
Is_Capital,Language,Unnamed: 2_level_1,Unnamed: 3_level_1
False,English,8.4,1.0
True,English,8.9,1572.0
True,French,2.1,105.4
True,German,3.7,891.8
True,Italian,2.8,1285.0
True,Japanese,13.9,2187.0


In [39]:
#alt
cities_info.groupby(['Is_Capital', 'Language']).agg({'Country': 'first', 'Population (Millions)': 'max', 'Area (km2)': 'min'})

Unnamed: 0_level_0,Unnamed: 1_level_0,Country,Population (Millions),Area (km2)
Is_Capital,Language,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
False,English,USA,8.4,1.0
True,English,UK,8.9,1572.0
True,French,France,2.1,105.4
True,German,Germany,3.7,891.8
True,Italian,Italy,2.8,1285.0
True,Japanese,Japan,13.9,2187.0


In [40]:
#Bernard
cities_info.groupby(['Is_Capital', 'Language'])[['Population (Millions)', 'Area (km2)']].agg(['max', 'min'])

Unnamed: 0_level_0,Unnamed: 1_level_0,Population (Millions),Population (Millions),Area (km2),Area (km2)
Unnamed: 0_level_1,Unnamed: 1_level_1,max,min,max,min
Is_Capital,Language,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
False,English,8.4,0.9,502.8,1.0
True,English,8.9,8.9,1572.0,1572.0
True,French,2.1,2.1,105.4,105.4
True,German,3.7,3.7,891.8,891.8
True,Italian,2.8,2.8,1285.0,1285.0
True,Japanese,13.9,13.9,2187.0,2187.0
