### Question

"Find the second heaviest animal in the table for each zoo. In addition to the animal information, return the zoo name as well."

### Pandas

The classic approach to answering the question is to read all the data into Pandas and then do the work there. That works well enough here. In real-world situations it might not be an option, either because of the network latency of pulling the data from the database, or because of hardware constraints on holding everything in memory and processing it.
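If memory is the constraint, one option is to read the table in chunks and keep only the rows that can still matter. The sketch below is an illustration under assumptions, not part of the original notebook: it uses an in-memory SQLite database as a stand-in for `zoo_animals.db`, the row order of the insert and the helper name `top_two_per_zoo` are made up, and the chunk size of 4 is arbitrary.

```python
import sqlite3

import pandas

# Hypothetical in-memory stand-in for zoo_animals.db, filled with the rows shown in this notebook
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (id integer, weight integer, zoo_id integer)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [(1117, 1000, 9), (1410, 690, 9), (1411, 690, 12), (1415, 695, 9),
     (9821, 750, 11), (9992, 1040, 12), (99929, 1090, 12),
     (12993, 2190, 11), (96673, 580, 11)],
)


def top_two_per_zoo(frame):
    """Keep only the two heaviest rows of each zoo, heaviest first."""
    ordered = frame.sort_values("weight", ascending=False).reset_index(drop=True)
    return ordered[ordered.groupby("zoo_id").cumcount() < 2]


running = None
# With chunksize set, read_sql yields DataFrames of at most 4 rows each
for chunk in pandas.read_sql("select * from animals", conn, chunksize=4):
    combined = chunk if running is None else pandas.concat([running, chunk])
    running = top_two_per_zoo(combined)

# Within each zoo, position 1 (zero-based) of the heaviest-first rows is the second heaviest
second = running[running.groupby("zoo_id").cumcount() == 1]
second_by_zoo = {int(z): int(w) for z, w in zip(second["zoo_id"], second["weight"])}
print(second_by_zoo)
```

At most two rows per zoo plus one chunk are held at any time, so memory use depends on the number of zoos rather than the number of animals. It does nothing for network latency, since every row still crosses the wire.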

In [1]:
from sqlalchemy import create_engine

engine = create_engine("sqlite:///../zoo_animals.db", echo=True)
engine

Engine(sqlite:///../zoo_animals.db)

In [2]:
import pandas

dfa = pandas.read_sql("select * from animals", engine)
dfa.head()

2021-04-30 21:11:59,068 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2021-04-30 21:11:59,069 INFO sqlalchemy.engine.base.Engine ()
2021-04-30 21:11:59,070 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2021-04-30 21:11:59,071 INFO sqlalchemy.engine.base.Engine ()
2021-04-30 21:11:59,073 INFO sqlalchemy.engine.base.Engine PRAGMA main.table_info("select * from animals")
2021-04-30 21:11:59,074 INFO sqlalchemy.engine.base.Engine ()
2021-04-30 21:11:59,076 INFO sqlalchemy.engine.base.Engine PRAGMA temp.table_info("select * from animals")
2021-04-30 21:11:59,077 INFO sqlalchemy.engine.base.Engine ()
2021-04-30 21:11:59,079 INFO sqlalchemy.engine.base.OptionEngine select * from animals
2021-04-30 21:11:59,080 INFO sqlalchemy.engine.base.OptionEngine ()


Unnamed: 0,id,weight,zoo_id
0,1117,1000,9
1,1410,690,9
2,1411,690,12
3,1415,695,9
4,9821,750,11


In [3]:
dfz = pandas.read_sql("select * from zoos", engine)
dfz.head()

2021-04-30 21:11:59,097 INFO sqlalchemy.engine.base.Engine PRAGMA main.table_info("select * from zoos")
2021-04-30 21:11:59,098 INFO sqlalchemy.engine.base.Engine ()
2021-04-30 21:11:59,100 INFO sqlalchemy.engine.base.Engine PRAGMA temp.table_info("select * from zoos")
2021-04-30 21:11:59,101 INFO sqlalchemy.engine.base.Engine ()
2021-04-30 21:11:59,104 INFO sqlalchemy.engine.base.OptionEngine select * from zoos
2021-04-30 21:11:59,106 INFO sqlalchemy.engine.base.OptionEngine ()


Unnamed: 0,id,name
0,9,NY
1,11,LA
2,12,Detroit


In [4]:
# add_prefix renames dfz's id -> zoo_id (the merge key) and name -> zoo_name
df = dfa.merge(dfz.add_prefix('zoo_'), on='zoo_id')
df

Unnamed: 0,id,weight,zoo_id,zoo_name
0,1117,1000,9,NY
1,1410,690,9,NY
2,1415,695,9,NY
3,1411,690,12,Detroit
4,9992,1040,12,Detroit
5,99929,1090,12,Detroit
6,9821,750,11,LA
7,12993,2190,11,LA
8,96673,580,11,LA
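To see why the merge lines up, it helps to look at what `add_prefix` does on its own. The snippet below is a small sketch with hand-typed frames (`dfa_demo` and `dfz_demo` are hypothetical names holding a few of the rows shown above), so it runs without the database.

```python
import pandas

# Hand-typed copies of a few rows from the animals and zoos tables shown above
dfa_demo = pandas.DataFrame(
    {"id": [1117, 1411, 9821], "weight": [1000, 690, 750], "zoo_id": [9, 12, 11]}
)
dfz_demo = pandas.DataFrame({"id": [9, 11, 12], "name": ["NY", "LA", "Detroit"]})

# Every column gets the prefix: id -> zoo_id, name -> zoo_name
prefixed = dfz_demo.add_prefix("zoo_")
print(list(prefixed.columns))

# zoo_id now exists on both sides, so it can serve as the join key
merged = dfa_demo.merge(prefixed, on="zoo_id")
print(merged)
```

Prefixing also avoids a clash between the two `id` columns, which would otherwise come out of the merge with `_x` and `_y` suffixes.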


In [5]:
df.sort_values(['zoo_id', 'weight'])

Unnamed: 0,id,weight,zoo_id,zoo_name
1,1410,690,9,NY
2,1415,695,9,NY
0,1117,1000,9,NY
8,96673,580,11,LA
6,9821,750,11,LA
7,12993,2190,11,LA
3,1411,690,12,Detroit
4,9992,1040,12,Detroit
5,99929,1090,12,Detroit


### Solution

In [6]:
# groupby preserves the order of rows within each group, so sort heaviest first;
# nth(1) is zero-based and then picks the second heaviest animal of each zoo
df.sort_values('weight', ascending=False).groupby('zoo_id').nth(1)

zoo_id,id,weight,zoo_name
9,1415,695,NY
11,9821,750,LA
12,9992,1040,Detroit
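Two caveats are worth a sketch. First, as far as I know, `nth` returns a differently shaped result from pandas 2.0 onward (the original rows, with `zoo_id` kept as a column), so a position mask built with `cumcount` may be the more portable spelling. Second, picking by position breaks ties arbitrarily, while a dense rank picks by distinct weight. The frame `df_demo` below is a hand-typed copy of the merged table, and both variants are illustrations rather than the notebook's own method.

```python
import pandas

# Hand-typed copy of the merged table shown earlier in this notebook
df_demo = pandas.DataFrame({
    "id": [1117, 1410, 1415, 1411, 9992, 99929, 9821, 12993, 96673],
    "weight": [1000, 690, 695, 690, 1040, 1090, 750, 2190, 580],
    "zoo_id": [9, 9, 9, 12, 12, 12, 11, 11, 11],
    "zoo_name": ["NY"] * 3 + ["Detroit"] * 3 + ["LA"] * 3,
})

# Position-based: number the rows of each zoo from the heaviest (0) downwards
ordered = df_demo.sort_values("weight", ascending=False)
by_position = ordered[ordered.groupby("zoo_id").cumcount() == 1]

# Value-based: dense rank gives tied weights the same rank, so rank 2 is the
# second heaviest distinct weight in each zoo
dense = df_demo.groupby("zoo_id")["weight"].rank(method="dense", ascending=False)
by_rank = df_demo[dense == 2]

print(by_position.sort_values("zoo_id"))
print(by_rank.sort_values("zoo_id"))
```

On this data both variants select the same three animals. They only diverge when a zoo has two animals tied for heaviest: the position mask returns one of the tied pair, while the dense rank returns the next weight down.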
