SQL++ Queries by Mason Howes: these queries manipulate semi-structured JSON data using a NoSQL database system. They are written over the semi-structured data model implemented in AsterixDB, an Apache project that builds a DBMS over data stored in JSON or ADM files.

The queries use the Mondial database, a geographical dataset aggregated from multiple sources, which can be found [here](https://www.dbis.informatik.uni-goettingen.de/Mondial/).

Please note that Google Colab only runs Python code, so these SQL++ queries will not run here as-is. To try them out, copy and paste them into an environment that supports SQL++, such as an AsterixDB instance.

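**IMPORTANT:** To run these queries, you need AsterixDB installed and set up on your device. The installation steps are not covered here; only the raw SQL++ queries are listed below. Every query reads the Mondial data from the `world` dataset in the `geo` dataverse (`geo.world`), so the data must be loaded under those names first.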

Retrieves the names of all cities located in Peru, sorted alphabetically.

In [None]:
SELECT C.name AS city
FROM geo.world AS X,
     X.mondial.country AS Y,
     Y.province AS Z,
     (CASE WHEN is_array(Z.city)
           THEN Z.city
           ELSE [Z.city] END) AS C
WHERE Y.name = 'Peru'
ORDER BY C.name;
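The comma-separated FROM items behave like nested loops, and the CASE expression wraps a lone city object in a one-element array so it can be iterated like a list of cities. A minimal plain-Python sketch of that logic (not AsterixDB), assuming a hand-made sample document in the same nested shape; `as_list` and `cities_in` are hypothetical helper names, and the sample is illustrative rather than real Mondial data:

```python
# Plain-Python model of the query above: nested loops instead of AsterixDB.

def as_list(value):
    """Mirror CASE WHEN is_array(v) THEN v ELSE [v] END."""
    return value if isinstance(value, list) else [value]


def cities_in(world, country_name):
    """Return the sorted city names for one country (hypothetical helper)."""
    names = []
    for country in world["mondial"]["country"]:      # X.mondial.country AS Y
        if country["name"] != country_name:          # WHERE Y.name = 'Peru'
            continue
        for province in country["province"]:         # Y.province AS Z
            for city in as_list(province["city"]):   # (CASE ... END) AS C
                names.append(city["name"])
    return sorted(names)                             # ORDER BY C.name


# Made-up sample in the nested shape the query expects.
sample = {
    "mondial": {
        "country": [
            {"name": "Peru",
             "province": [
                 {"name": "Piura", "city": [{"name": "Piura"}, {"name": "Sullana"}]},
                 {"name": "Cusco", "city": {"name": "Cusco"}},  # single object, not an array
             ]},
            {"name": "Chile",
             "province": [{"name": "Santiago", "city": {"name": "Santiago"}}]},
        ]
    }
}

print(cities_in(sample, "Peru"))  # ['Cusco', 'Piura', 'Sullana']
```

Without the wrapping step, the single-object `Cusco` province would be iterated field by field instead of as one city, which is exactly what the CASE expression guards against.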

For each country, returns its name, its population, and its number of religions, sorted alphabetically by country name. Countries with no listed religions report 0.

In [None]:
SELECT Y.name AS country,
       Y.population AS population,
       array_count(R) AS num_religions
FROM geo.world AS X,
     X.mondial.country AS Y
     LET R = (CASE WHEN Y.religions IS MISSING
                   THEN []
                   WHEN is_array(Y.religions)
                   THEN Y.religions
                   ELSE [Y.religions] END)
ORDER BY Y.name;
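The LET clause distinguishes three shapes of `religions`: absent, an array, or a single object. A minimal Python sketch of that normalization and the per-country count, assuming Python dicts stand in for JSON objects and an absent key stands in for MISSING; the countries and values are made up, and `as_list_or_empty` / `religion_counts` are hypothetical names:

```python
def as_list_or_empty(doc, field):
    """Mirror CASE WHEN v IS MISSING THEN [] WHEN is_array(v) THEN v ELSE [v] END."""
    if field not in doc:                      # IS MISSING: the key is absent
        return []
    value = doc[field]
    return value if isinstance(value, list) else [value]


def religion_counts(world):
    """One row per country with its religion count (hypothetical helper)."""
    rows = []
    for country in world["mondial"]["country"]:
        religions = as_list_or_empty(country, "religions")   # LET R = ...
        rows.append({
            "country": country["name"],
            "population": country["population"],
            "num_religions": len(religions),
        })
    return sorted(rows, key=lambda row: row["country"])      # ORDER BY Y.name


# Made-up countries and values, shaped like the Mondial JSON fields used above.
sample = {"mondial": {"country": [
    {"name": "Elbonia", "population": "500"},                 # no religions field
    {"name": "Atlantis", "population": "1000",
     "religions": [{"#text": "Christian"}, {"#text": "Muslim"}]},
    {"name": "Borduria", "population": "2000",
     "religions": {"#text": "Muslim"}},                       # single object
]}}

for row in religion_counts(sample):
    print(row["country"], row["num_religions"])
# Atlantis 2
# Borduria 1
# Elbonia 0
```

The empty-list branch is what makes Elbonia report 0 instead of dropping out of the result.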

For each religion, returns the number of countries in which it occurs, ordered by decreasing number of countries.

In [None]:
SELECT R.religion AS religion,
       COUNT(*) AS num_countries
FROM (SELECT (SELECT rel.`#text` AS religion
              FROM Z AS rel) AS religions
      FROM geo.world AS X,
           X.mondial.country AS Y
      LET Z = (CASE WHEN Y.religions IS MISSING
                    THEN []
                    WHEN is_array(Y.religions)
                    THEN Y.religions
                    ELSE [Y.religions] END)) AS C
UNNEST C.religions AS R
GROUP BY R.religion
ORDER BY COUNT(*) DESC;
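Unnesting each country's religion list and grouping by religion name is essentially a frequency count. A hedged Python sketch using `collections.Counter`, assuming the same made-up sample shape as before; `countries_per_religion` is a hypothetical helper. Note that `Counter.most_common` orders ties by first appearance, whereas the query leaves the order of ties unspecified:

```python
from collections import Counter


def as_list_or_empty(doc, field):
    """Same CASE normalization as in the query's LET clause."""
    if field not in doc:
        return []
    value = doc[field]
    return value if isinstance(value, list) else [value]


def countries_per_religion(world):
    """Count (country, religion) rows per religion, most frequent first."""
    counts = Counter()
    for country in world["mondial"]["country"]:
        # Unnest this country's religions, one row per religion.
        for religion in as_list_or_empty(country, "religions"):
            counts[religion["#text"]] += 1       # GROUP BY religion + COUNT(*)
    return counts.most_common()                  # ORDER BY COUNT(*) DESC


# Made-up sample data.
sample = {"mondial": {"country": [
    {"name": "Atlantis", "religions": [{"#text": "Christian"}, {"#text": "Muslim"}]},
    {"name": "Borduria", "religions": {"#text": "Muslim"}},
    {"name": "Elbonia"},                         # contributes no rows
]}}

print(countries_per_religion(sample))  # [('Muslim', 2), ('Christian', 1)]
```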

For each ethnic group, returns the number of countries where it occurs, as well as the group's total worldwide population (each country's population multiplied by the group's percentage share, summed over all countries).

In [None]:
WITH country AS (
    SELECT
        Y.population AS P,
        (
            SELECT
                eth.`#text` AS N,
                eth.`-percentage` AS PCT
            FROM Z AS eth
        ) AS E
    FROM geo.world AS X,
         X.mondial.country AS Y
    LET Z = (
        CASE
            WHEN Y.ethnicgroups IS MISSING THEN []
            WHEN IS_ARRAY(Y.ethnicgroups) THEN Y.ethnicgroups
            ELSE [Y.ethnicgroups]
        END
    )
),
arr AS (
    SELECT
        eth.N AS ethnicities,
        CAST(C.P AS FLOAT) * 0.01 * CAST(eth.PCT AS FLOAT) AS product
    FROM country AS C
    UNNEST C.E AS eth
)
SELECT
    A.ethnicities AS ethnic_group,
    COUNT(A.ethnicities) AS num_countries,
    SUM(A.product) AS total_population
FROM arr AS A
GROUP BY A.ethnicities;
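Each `product` row is a country's population scaled by the group's percentage, and the final GROUP BY sums those rows per group. A minimal Python sketch of that arithmetic, assuming population and percentage arrive as numeric strings (as the CASTs above suggest); the sample countries, populations, and percentages are invented, and `ethnic_totals` is a hypothetical helper:

```python
from collections import defaultdict


def as_list_or_empty(doc, field):
    """Same CASE normalization as in the query's LET clause."""
    if field not in doc:
        return []
    value = doc[field]
    return value if isinstance(value, list) else [value]


def ethnic_totals(world):
    """Per ethnic group: number of countries and summed population (hypothetical helper)."""
    totals = defaultdict(lambda: {"num_countries": 0, "total_population": 0.0})
    for country in world["mondial"]["country"]:
        population = float(country["population"])               # CAST(C.P AS FLOAT)
        for group in as_list_or_empty(country, "ethnicgroups"):
            pct = float(group["-percentage"])                    # CAST(eth.PCT AS FLOAT)
            entry = totals[group["#text"]]                       # GROUP BY A.ethnicities
            entry["num_countries"] += 1                          # COUNT(A.ethnicities)
            entry["total_population"] += population * 0.01 * pct  # SUM(A.product)
    return dict(totals)


# Invented sample data.
sample = {"mondial": {"country": [
    {"name": "Atlantis", "population": "1000",
     "ethnicgroups": [{"#text": "Atlantean", "-percentage": "90"},
                      {"#text": "Bordurian", "-percentage": "10"}]},
    {"name": "Borduria", "population": "2000",
     "ethnicgroups": {"#text": "Bordurian", "-percentage": "100"}},
    {"name": "Elbonia", "population": "500"},                   # no ethnicgroups field
]}}

for name, entry in sorted(ethnic_totals(sample).items()):
    print(name, entry["num_countries"], round(entry["total_population"], 2))
```

Here the Bordurian total combines 10% of 1000 from one country with 100% of 2000 from another, which is the per-country scaling followed by a sum that the `arr` CTE and the outer SUM perform.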

Finds all countries bordering two or more seas by joining the "sea" collection with the "country" collection. For each such country, returns its code, its name, and the list of seas it borders, in decreasing order of the number of seas.

In [None]:
WITH cCodes AS
    (SELECT A AS code
     FROM geo.world AS X,
          X.mondial.sea AS Y,
          split(Y.`-country`, ' ') AS A
     GROUP BY A
     HAVING count(*) > 1),

sNames AS
    (SELECT C.code AS code,
            Y.name AS name
     FROM cCodes AS C,
          geo.world AS X,
          X.mondial.sea AS Y,
          split(Y.`-country`, ' ') AS A
     WHERE C.code = A),

sList AS
    (SELECT S.code AS code,
        (SELECT N.name
         FROM sNames AS N
         WHERE N.code = S.code) AS seas
     FROM sNames AS S
     GROUP BY S.code)

SELECT L.code AS country_code,
       Y.name AS country_name,
       L.seas AS seas
FROM sList AS L,
     geo.world AS X,
     X.mondial.country AS Y
WHERE L.code = Y.`-car_code`
ORDER BY array_count(L.seas) DESC;
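The three CTEs amount to: split each sea's `-country` string into country codes, keep codes that appear for more than one sea, collect those seas, and join on `-car_code`. A hedged single-pass Python sketch of the same idea, assuming invented seas and country codes; `multi_sea_countries` is a hypothetical helper. It sorts by the number of seas in descending order, as the description asks, and returns sea names as plain strings rather than the `{"name": ...}` objects the SQL++ query produces:

```python
from collections import defaultdict


def multi_sea_countries(world):
    """Countries bordering more than one sea, most seas first (hypothetical helper)."""
    seas_by_code = defaultdict(list)
    for sea in world["mondial"]["sea"]:
        for code in sea["-country"].split(" "):          # split(Y.`-country`, ' ')
            seas_by_code[code].append(sea["name"])
    names = {c["-car_code"]: c["name"] for c in world["mondial"]["country"]}
    rows = [
        {"country_code": code, "country_name": names[code], "seas": seas}
        for code, seas in seas_by_code.items()
        if len(seas) > 1 and code in names                # HAVING count(*) > 1, plus the join
    ]
    return sorted(rows, key=lambda row: len(row["seas"]), reverse=True)


# Invented sample data.
sample = {"mondial": {
    "sea": [
        {"name": "Mare Nostrum", "-country": "ATL BOR"},
        {"name": "Northern Sea", "-country": "ATL"},
        {"name": "Eastern Sea", "-country": "ATL BOR ELB"},
    ],
    "country": [
        {"-car_code": "ATL", "name": "Atlantis"},
        {"-car_code": "BOR", "name": "Borduria"},
        {"-car_code": "ELB", "name": "Elbonia"},
    ],
}}

for row in multi_sea_countries(sample):
    print(row["country_code"], row["country_name"], row["seas"])
# ATL Atlantis ['Mare Nostrum', 'Northern Sea', 'Eastern Sea']
# BOR Borduria ['Mare Nostrum', 'Eastern Sea']
```

Building the code-to-seas dictionary once replaces the correlated subquery in `sList`, which rescans `sNames` for every country code.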