# SELECT within SELECT

In [1]:
# Prerequesites
from pyhive import hive
eng = hive.connect(host='192.168.31.31', port=10000, auth='NOSASL', 
                   database='sqlzoo', username='jupyter')

%load_ext sql
%sql eng
%config SqlMagic.displaylimit = 50

## 1. Bigger than Russia

**List each country name where the population is larger than that of 'Russia'.**

```
world(name, continent, area, population, gdp)
```

In [2]:
%%sql
SELECT /*+ MAPJOIN (t2) */
    t1.name FROM
    (SELECT name, population, 0 AS aux FROM world) AS t1
    LEFT JOIN 
    (SELECT population, 0 AS aux FROM world WHERE name='Russia') AS t2
    ON (t1.aux = t2.aux)
    WHERE t1.population > t2.population

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


t1.name
Bangladesh
Brazil
China
India
Indonesia
Nigeria
Pakistan
United States


## 2. Richer than UK

**Show the countries in Europe with a per capita GDP greater than 'United Kingdom'.**

> _Per Capita GDP_   
> The per capita GDP is the gdp/population

In [3]:
%%sql
SELECT /*+ MAPJOIN (t2) */
    name FROM 
    (SELECT name, continent, gdp/population AS pcgdp, 0 AS aux 
     FROM world WHERE continent = 'Europe') AS t1
    LEFT JOIN
    (SELECT gdp/population AS pcgdp, 0 AS aux 
     FROM world WHERE name='United Kingdom') AS t2
    ON (t1.aux = t2.aux)
    WHERE (t1.pcgdp > t2.pcgdp)

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


name
Andorra
Austria
Belgium
Denmark
Finland
Germany
Iceland
Ireland
Liechtenstein
Luxembourg


## 3. Neighbours of Argentina and Australia


List the name and continent of countries in the continents containing either Argentina or Australia. Order by name of the country.

In [4]:
%%sql
SELECT name, continent FROM world
    WHERE continent IN 
    (SELECT continent FROM world WHERE name IN ('Argentina', 'Australia'))
    ORDER BY name

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


name,continent
Argentina,South America
Australia,Oceania
Bolivia,South America
Brazil,South America
Chile,South America
Colombia,South America
Ecuador,South America
Fiji,Oceania
Guyana,South America
Kiribati,Oceania


## 4. Between Canada and Poland

Which country has a population that is more than Canada but less than Poland? Show the name and the population.

In [5]:
%%sql
SELECT /*+ MAPJOIN (t2, t3) */
    t1.name, t1.population FROM
    (
        SELECT name, population, 0 AS aux FROM world
    ) AS t1
    LEFT JOIN
    (
        SELECT 0 AS aux, population 
        FROM world WHERE name = 'Canada'
    ) AS t2
    LEFT JOIN
    (
        SELECT 0 AS aux, population
        FROM world WHERE name = 'Poland'
    ) AS t3
    ON (t1.aux = t2.aux) AND (t1.aux = t3.aux)
    WHERE (t1.population > t2.population)
        AND (t1.population < t3.population)

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


t1.name,t1.population


## 5. Percentages of Germany

Germany (population 80 million) has the largest population of the countries in Europe. Austria (population 8.5 million) has 11% of the population of Germany.

**Show the name and the population of each country in Europe. Show the population as a percentage of the population of Germany.**

The format should be Name, Percentage for example:

name	| percentage
--------|-----------
Albania	| 3%
Andorra	| 0%
Austria	| 11%
...	| ...

> _Decimal places_   
> You can use the function ROUND to remove the decimal places.

> _Percent symbol %_   
> You can use the function CONCAT to add the percentage symbol.

[To get a well rounded view of the important features of SQL you should move on to the next tutorial concerning aggregates.](https://sqlzoo.net/wiki/SUM_and_COUNT)

To gain an absurdly detailed view of one insignificant feature of the language, read on.

We can use the word `ALL` to allow >= or > or < or <=to act over a list. For example, you can find the largest country in the world, by population with this query:

```sql
SELECT name
  FROM world
 WHERE population >= ALL(SELECT population
                           FROM world
                          WHERE population>0)
```

You need the condition **population>0** in the sub-query as some countries have **null** for population.

In [6]:
%%sql
SELECT /*+ MAPJOIN (t2) */
    t1.name, CONCAT(CAST(ROUND(100*t1.population/t2.population, 0) AS int), '%') AS pct FROM
    (
        SELECT name, population, 0 AS aux FROM world
        WHERE continent = 'Europe'
    ) AS t1
    LEFT JOIN
    (
        SELECT population, 0 AS aux FROM world
        WHERE name = 'Germany'
    ) AS t2
    ON (t1.aux = t2.aux)

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


t1.name,pct
Albania,3%
Andorra,0%
Austria,11%
Belarus,11%
Belgium,14%
Bosnia and Herzegovina,4%
Bulgaria,8%
Croatia,5%
Czech Republic,13%
Denmark,7%


## 6. Bigger than every country in Europe

Which countries have a GDP greater than every country in Europe? [Give the name only.] (Some countries may have NULL gdp values)

We can refer to values in the outer SELECT within the inner SELECT. We can name the tables so that we can tell the difference between the inner and outer versions.

In [7]:
%%sql
SELECT /*+ MAPJOIN (t2) */ 
    t1.name FROM 
    (
        SELECT name, gdp, 0 aux FROM world
    ) t1
    LEFT JOIN
    (
        SELECT MAX(gdp) gdp, 0 aux FROM world
        WHERE (continent = 'Europe' AND gdp>0)
    ) t2
    ON t1.aux=t2.aux
    WHERE t1.gdp > t2.gdp

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


t1.name
China
Japan
United States


## 7. Largest in each continent

**Find the largest country (by area) in each continent, show the continent, the name and the area:**

```sql
SELECT continent, name, population FROM world x
  WHERE population >= ALL
    (SELECT population FROM world y
        WHERE y.continent=x.continent
          AND population>0)
```

> __The above example is known as a correlated or synchronized sub-query.__   
> 
> Using correlated subqueries
> A correlated subquery works like a nested loop: the subquery only has access to rows related to a single > record at a time in the outer query. The technique relies on table aliases to identify two different uses of the same table, one in the outer query and the other in the subquery.
> 
> One way to interpret the line in the **WHERE** clause that references the two table is _“… where the correlated values are the same”._
> 
> In the example provided, you would say _“select the country details from world where the population is greater than or equal to the population of all countries where the continent is the same”._

In [8]:
%%sql
SELECT /*+ MAPJOIN (t2) */
    t1.continent, t1.name, t1.area FROM
    (
        SELECT continent, name, area from world
    ) t1
    JOIN
    (
        SELECT continent, MAX(area) area
        FROM world GROUP BY continent
    ) t2
    ON (t1.continent = t2.continent AND
        t1.area = t2.area)

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


t1.continent,t1.name,t1.area
"Federated States of,Oceania",Micronesia,702.0
Africa,Algeria,2381741.0
Asia,China,9596961.0
Caribbean,Cuba,109884.0
Eurasia,Russia,17125242.0
Europe,Kazakhstan,2724900.0
North America,Canada,9984670.0
Oceania,Australia,7692024.0
South America,Brazil,8515767.0


## 8. First country of each continent (alphabetically)

**List each continent and the name of the country that comes first alphabetically.**

In [9]:
%%sql
SELECT continent, MIN(name) name FROM world 
    GROUP BY continent

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


continent,name
"Federated States of,Oceania",Micronesia
Africa,Algeria
Asia,Afghanistan
Caribbean,Antigua and Barbuda
Eurasia,Armenia
Europe,Albania
North America,Belize
Oceania,Australia
South America,Argentina


## 9. Difficult Questions That Utilize Techniques Not Covered In Prior Sections

**Find the continents where all countries have a population <= 25000000. Then find the names of the countries associated with these continents. Show name, continent and population.**

In [10]:
%%sql
SELECT name, continent, population
FROM world x
WHERE continent NOT IN 
  (SELECT DISTINCT continent FROM world WHERE population > 25000000)

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


name,continent,population
Antigua and Barbuda,Caribbean,96453.0
Bahamas,Caribbean,385340.0
Barbados,Caribbean,287025.0
Cuba,Caribbean,11209628.0
Dominica,Caribbean,71808.0
Dominican Republic,Caribbean,10358320.0
Grenada,Caribbean,112003.0
Haiti,Caribbean,11577779.0
Jamaica,Caribbean,2726667.0
Micronesia,"Federated States of,Oceania",101351.0


## 10.

**Some countries have populations more than three times that of any of their neighbours (in the same continent). Give the countries and continents.**

In [11]:
%%sql
WITH t1 AS (
    SELECT name, continent, area, 
    row_number() OVER (PARTITION BY continent ORDER BY area DESC) rn
    FROM world
), t2 AS (
    SELECT continent, 
    MAX(CASE WHEN rn=1 THEN area ELSE 0 END) area1,
    MAX(CASE WHEN rn=2 THEN area ELSE 0 END) area2 
    FROM t1
    WHERE rn <= 2
    GROUP BY continent
)
SELECT world.name, world.continent FROM world
    JOIN t2 ON 
    (world.continent = t2.continent AND world.area = t2.area1)
    WHERE t2.area1 > 3*t2.area2

*  '<pyhive.hive.Connection object at 0x7f799053ff10>'
Done.


world.name,world.continent
Micronesia,"Federated States of,Oceania"
China,Asia
Russia,Eurasia
Kazakhstan,Europe
Australia,Oceania
Brazil,South America


In [12]:
eng.close()