In [2]:
import pandas as pd
import sqlalchemy as sa
import psycopg2 as ps
from sqlalchemy import create_engine

In [3]:
%load_ext sql
%sql postgresql://postgres:lingga28@localhost:2828/datacamp
conn = create_engine('postgresql://postgres:lingga28@localhost:2828/datacamp')

# 1. Your first join
### Exercises
Throughout this course, you'll be working with the countries database, which contains information about the most populous world cities and countries, and provides country-level economic data, population data, and geographic data. The database also contains information on languages spoken in each country.

You can see the different tables in this database to get a sense of what they contain by clicking on the corresponding tabs. Click through them and familiarize yourself with the fields that seem to be shared across tables before you continue with the course.

In this exercise, you'll use the cities and countries tables to build your first inner join. You'll start off by selecting all columns in step 1, performing your join in step 2, and then refining your join to choose specific columns in step 3.
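
Before running the real queries, it can help to see what an inner join keeps and what it drops. The sketch below is a minimal illustration that uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the PostgreSQL `datacamp` database. The tables are tiny toy copies, and the city `Atlantis` with code `XXX` is a made-up row with no matching country.

```python
import sqlite3

# In-memory SQLite database: an assumed stand-in for the PostgreSQL countries database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE cities (name TEXT, country_code TEXT);
    INSERT INTO countries VALUES
        ('AFG', 'Afghanistan'), ('DZA', 'Algeria'), ('NLD', 'Netherlands');
    -- 'Atlantis' / 'XXX' is a hypothetical row that matches no country.
    INSERT INTO cities VALUES
        ('Kabul', 'AFG'), ('Oran', 'DZA'), ('Algiers', 'DZA'), ('Atlantis', 'XXX');
""")

# Inner join: only rows whose country code appears in BOTH tables survive.
inner_rows = conn.execute("""
    SELECT cities.name AS city, countries.name AS country
    FROM cities
    INNER JOIN countries
    ON cities.country_code = countries.code
    ORDER BY city
""").fetchall()

for row in inner_rows:
    print(row)
```

In this toy data, `Atlantis` is dropped because no country has code `XXX`, and `Netherlands` is dropped because no city row points at `NLD`.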

### task 1
### Instruction
Begin by selecting all columns from the cities table, using the SQL shortcut that selects all.

In [4]:
%%sql

SELECT * 
FROM countries.cities
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


name,country_code,city_proper_pop,metroarea_pop,urbanarea_pop
Abidjan,CIV,4765000,,4765000
Abu Dhabi,ARE,1145000,,1145000
Abuja,NGA,1235880,6000000.0,1235880


### task 2
### Instruction
- Perform an inner join with the cities table on the left and the countries table on the right; do not alias tables here or in the next step.
- To perform your join, identify the column in each table that holds the country code by inspecting the tabs in the console.

In [5]:
%%sql

SELECT * 
FROM countries.cities
-- Inner join to countries
INNER JOIN countries.countries
-- Match on country codes
ON countries.cities.country_code = countries.countries.code
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


name,country_code,city_proper_pop,metroarea_pop,urbanarea_pop,code,name_1,continent,region,surface_area,indep_year,local_name,gov_frorm,capital,cap_long,cap_lat
Kabul,AFG,3414100,,3414100,AFG,Afghanistan,Asia,Southern and Central Asia,652090,1919,Afganistan/Afqanestan,Islamic Emirate,Kabul,691761.0,345228.0
Oran,DZA,1560329,3454078.0,1560329,DZA,Algeria,Africa,Northern Africa,2381740,1962,Al-Jazair/Algerie,Republic,Algiers,305097.0,367397.0
Algiers,DZA,3415811,5000000.0,3415811,DZA,Algeria,Africa,Northern Africa,2381740,1962,Al-Jazair/Algerie,Republic,Algiers,305097.0,367397.0


### task 3
### Instruction
- Complete the SELECT statement to keep only the name of the city, the name of the country, and the region the country is located in (in the order specified).
- Alias the name of the city AS city and the name of the country AS country.

In [6]:
%%sql

-- Select name fields (with alias) and region 
SELECT countries.cities.name AS city, countries.countries.name AS country, countries.countries.region
FROM countries.cities
INNER JOIN countries.countries
ON countries.cities.country_code = countries.countries.code
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


city,country,region
Kabul,Afghanistan,Southern and Central Asia
Oran,Algeria,Northern Africa
Algiers,Algeria,Northern Africa


# 2. Joining with aliased tables
### Exercises
Recall from the video that instead of writing full table names in queries, you can use table aliasing as a shortcut. The alias can be used in other parts of your query, such as the SELECT statement!

You also learned that when you SELECT fields, a field can be ambiguous. For example, imagine two tables, apples and oranges, both containing a column called color. You need to use the syntax apples.color or oranges.color in your SELECT statement to point SQL to the correct table. Without this, you would get the following error:

  column reference "color" is ambiguous

In this exercise, you'll practice joining with aliased tables. You'll use data from both the countries and economies tables to examine the inflation rate in 2010 and 2015.

When writing joins, many SQL users prefer to write the SELECT statement after writing the join code, in case the SELECT statement requires using table aliases.
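
The apples and oranges example above can be reproduced with a small sketch. It uses Python's built-in `sqlite3` module rather than PostgreSQL, so the error wording differs slightly from the message quoted above. The `id` column and the inserted rows are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: both contain a column called color.
    CREATE TABLE apples (id INTEGER, color TEXT);
    CREATE TABLE oranges (id INTEGER, color TEXT);
    INSERT INTO apples VALUES (1, 'red');
    INSERT INTO oranges VALUES (1, 'orange');
""")

# Unqualified column: the database cannot tell which table's color is meant.
try:
    conn.execute(
        "SELECT color FROM apples AS a INNER JOIN oranges AS o ON a.id = o.id"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Qualifying each column with its table alias removes the ambiguity.
qualified = conn.execute(
    "SELECT a.color, o.color FROM apples AS a INNER JOIN oranges AS o ON a.id = o.id"
).fetchall()
print(qualified)
```

The aliases `a` and `o` are defined in the FROM and JOIN clauses but used in SELECT, which is one reason to write the join first and the SELECT list last.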

### Instructions
- Start with your inner join in line 5; join the tables countries AS c (left) with economies (right), aliasing economies AS e.
- Next, use code as your joining field in line 7; do not use the USING command here.
- Lastly, select the following columns in order in line 2: code from the countries table (aliased as country_code), name, year, and inflation_rate.

In [7]:
%%sql

-- Select fields with aliases
SELECT c.code AS country_code, c.name, e.year, e.inflation_rate
FROM countries.countries AS c
-- Join to economies (alias e)
INNER JOIN countries.economies AS e
-- Match on code field using table aliases
ON c.code = e.code
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country_code,name,year,inflation_rate
AFG,Afghanistan,2015,-1.549
AFG,Afghanistan,2010,2.179
NLD,Netherlands,2015,0.22


# 3. USING in action
### Exercises
In the previous exercises, you performed your joins using the ON keyword. Recall that when both the field names being joined on are the same, you can take advantage of the USING clause.

You'll now explore the languages table from our database. Which languages are official languages, and which ones are unofficial?

You'll employ USING to simplify your query as you explore this question.
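
As a quick check of the claim that USING is shorthand for an ON condition over same-named columns, here is a minimal sketch. It runs against an in-memory SQLite database (an assumption; the course uses PostgreSQL) loaded with the three Afghanistan language rows that appear in the output further down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE languages (code TEXT, name TEXT, official INTEGER);
    INSERT INTO countries VALUES ('AFG', 'Afghanistan');
    INSERT INTO languages VALUES
        ('AFG', 'Dari', 1), ('AFG', 'Pashto', 1), ('AFG', 'Turkic', 0);
""")

select_part = """
    SELECT c.name AS country, l.name AS language, official
    FROM countries AS c
    INNER JOIN languages AS l
"""

# Same join written two ways: explicit ON condition versus USING.
on_rows = conn.execute(select_part + "ON c.code = l.code ORDER BY language").fetchall()
using_rows = conn.execute(select_part + "USING (code) ORDER BY language").fetchall()
same_result = on_rows == using_rows
print(same_result)

# With SELECT *, USING keeps a single copy of the join column; ON keeps both.
star = "SELECT * FROM countries AS c INNER JOIN languages AS l "
on_column_count = len(conn.execute(star + "ON c.code = l.code").description)
using_column_count = len(conn.execute(star + "USING (code)").description)
print(on_column_count, using_column_count)
```

Both forms return the same rows here; the only visible difference is how many `code` columns `SELECT *` produces.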

### Instructions
Use the country code field to complete the INNER JOIN with USING; do not change any alias names.

In [10]:
%%sql

SELECT c.name AS country, l.name AS language, official
FROM countries AS c
INNER JOIN languages AS l
-- Match using the code column
USING (code)
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country,language,official
Afghanistan,Dari,True
Afghanistan,Pashto,True
Afghanistan,Turkic,False


# 4. Relationships in our database
Now that you know more about the different types of relationships that can exist between tables, it's time to examine a few relationships in the countries database!

To answer questions about table relationships, you can explore the tables displayed as tabs in your console.
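
Besides browsing the tabs, you can check a relationship with a query: count how many rows share each key value on each side of the join. The sketch below is one possible way to do that, using an in-memory SQLite database with a few toy rows as a stand-in for the real tables. The helper `max_rows_per_key` is a hypothetical name introduced here, not part of the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE cities (name TEXT, country_code TEXT);
    INSERT INTO countries VALUES ('AFG', 'Afghanistan'), ('DZA', 'Algeria');
    INSERT INTO cities VALUES ('Kabul', 'AFG'), ('Oran', 'DZA'), ('Algiers', 'DZA');
""")


def max_rows_per_key(table, column):
    """Return the largest number of rows that share one value of `column`.

    Hypothetical helper. The identifiers are interpolated directly because
    they are hard-coded below; do not pass untrusted input to it.
    """
    query = (
        f"SELECT MAX(n) FROM "
        f"(SELECT COUNT(*) AS n FROM {table} GROUP BY {column}) AS counts"
    )
    return conn.execute(query).fetchone()[0]


countries_side = max_rows_per_key("countries", "code")
cities_side = max_rows_per_key("cities", "country_code")
print(countries_side, cities_side)
```

A maximum of 1 on one side and more than 1 on the other suggests one-to-many; more than 1 on both sides suggests many-to-many; 1 on both sides suggests one-to-one.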

### Instruction 1
### Question
What best describes the relationship between code in the countries table and country_code in the cities table?

### Possible Answers
- A. This is a many-to-many relationship.
- B. This is a one-to-one relationship.
- C. This is a one-to-many relationship.

Answer: C

### Instruction 2
### Question
Which of these options best describes the relationship between the countries table and the languages table?

### Possible Answers
- A. This is a one-to-many relationship.
- B. This is a many-to-many relationship.
- C. This is a one-to-one relationship.

Answer: B

# 5. Inspecting a relationship
### Exercises
You've just identified that the countries table has a many-to-many relationship with the languages table. That is, many languages can be spoken in a country, and a language can be spoken in many countries.

This exercise looks at each of these in turn. First, what is the best way to query all the different languages spoken in a country? And second, how is this different from the best way to query all the countries that speak each language?

Recall that when writing joins, many users prefer to write SQL code out of order by writing the join first (along with any table aliases), and writing the SELECT statement at the end.
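
The two directions of a many-to-many relationship can be made concrete by grouping the same join in two ways. This is a minimal sketch on an in-memory SQLite database; the country and language names (`Country A`, `Lang X`, and so on) are placeholders invented for illustration, not rows from the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE languages (code TEXT, name TEXT);
    -- Placeholder data: Lang X is spoken in both countries,
    -- and Country A has two languages.
    INSERT INTO countries VALUES ('AAA', 'Country A'), ('BBB', 'Country B');
    INSERT INTO languages VALUES
        ('AAA', 'Lang X'), ('AAA', 'Lang Y'), ('BBB', 'Lang X');
""")

# Direction 1: how many languages are spoken in each country?
languages_per_country = conn.execute("""
    SELECT c.name AS country, COUNT(*) AS n_languages
    FROM countries AS c
    INNER JOIN languages AS l
    USING (code)
    GROUP BY c.name
    ORDER BY country
""").fetchall()

# Direction 2: in how many countries is each language spoken?
countries_per_language = conn.execute("""
    SELECT l.name AS language, COUNT(*) AS n_countries
    FROM countries AS c
    INNER JOIN languages AS l
    USING (code)
    GROUP BY l.name
    ORDER BY language
""").fetchall()

print(languages_per_country)
print(countries_per_language)
```

The join is identical in both queries; only the grouping column changes, which mirrors how the exercises below reuse one join and simply reorder or re-sort the selected columns.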

### task 1
### Instructions
- Start with the join statement in line 6; perform an inner join with the countries table as c on the left with the languages table as l on the right.
- Make use of the USING keyword to join on code in line 8.
- Lastly, in line 2, select the country name, aliased as country, and the language name, aliased as language.

In [15]:
%%sql

-- Select country and language names, aliased
SELECT c.name AS country, l.name AS language
-- From countries (aliased)
FROM countries AS c
-- Join to languages (aliased)
INNER JOIN languages AS l
-- Use code as the joining field with the USING keyword
USING(code)
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country,language
Afghanistan,Dari
Afghanistan,Pashto
Afghanistan,Turkic


### task 2
### Instruction
- Rearrange the SELECT statement so that the language column appears on the left and the country column on the right.
- Sort the results by language.

In [17]:
%%sql

-- Rearrange SELECT statement, keeping aliases
SELECT l.name AS language, c.name AS country
FROM countries AS c
INNER JOIN languages AS l
USING(code)
-- Order the results by language
ORDER BY language DESC
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


language,country
unspecified,Albania
unspecified,Aruba
unspecified,Australia


### task 3
### Question
Select the incorrect answer from the following options.

The query you generated in step 1 is provided below. Run this query (or the amendment you made in step 2) in the console to find the answer to the question.

In [18]:
%%sql

SELECT c.name AS country, l.name AS language
FROM countries AS c
INNER JOIN languages AS l
USING(code)
ORDER BY country
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


country,language
Afghanistan,Pashto
Afghanistan,Turkic
Afghanistan,Dari


### Possible Answers
- A. There are at least three languages spoken in Armenia.
- B. Alsatian is spoken in more than one country.
- C. Bhojpuri is spoken in two countries.

Answer: B

# 6. Joining multiple tables
### Exercises
You've seen that the ability to combine multiple joins using a single query is a powerful feature of SQL.

Suppose you are interested in the relationship between fertility and unemployment rates. Your task in this exercise is to join tables to return the country name, year, fertility rate, and unemployment rate in a single result from the countries, populations and economies tables.

### task 1
### Instruction
- Perform an inner join of countries AS c (left) with populations AS p (right), on code.
- Select name, year and fertility_rate.

In [20]:
%%sql

-- Select relevant fields
SELECT name, year, fertility_rate
FROM countries AS c
-- Inner join countries and populations, aliased, on code
INNER JOIN populations AS p
ON c.code = p.country_code
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


name,year,fertility_rate
Aruba,2010,1.704
Aruba,2015,1.647
Afghanistan,2010,5.746


### task 2
### Instruction
- Chain another inner join to your query with the economies table AS e, using code.
- Select name, and using table aliases, select year and unemployment_rate from economies.

In [24]:
%%sql

-- Select fields
SELECT name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code
USING (code)
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


name,year,fertility_rate,unemployment_rate
Afghanistan,2010,5.746,
Afghanistan,2010,4.653,
Afghanistan,2015,5.746,


# 7. Checking multi-table joins
### Exercises
Have a look at the results for Albania from the previous query below. You can see that the 2015 fertility_rate has been paired with 2010 unemployment_rate, and vice versa.

| name | year | fertility_rate | unemployment_rate |
|---|---|---|---|
| Albania | 2015 | 1.663 | 17.1 |
| Albania | 2010 | 1.663 | 14 |
| Albania | 2015 | 1.793 | 17.1 |
| Albania | 2010 | 1.793 | 14 |

Instead of four records, the query should return two: one for each year. The last join was performed on c.code = e.code, without also joining on year. Your task in this exercise is to fix your query by explicitly stating that both the country code and year should match!
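
The row multiplication described above is easy to reproduce on a toy copy of the Albania rows. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL. The four rates are taken from the results shown above, but which fertility rate belongs to which year is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE populations (country_code TEXT, year INTEGER, fertility_rate REAL);
    CREATE TABLE economies (code TEXT, year INTEGER, unemployment_rate REAL);
    INSERT INTO countries VALUES ('ALB', 'Albania');
    -- Assumed pairing of fertility rates to years.
    INSERT INTO populations VALUES ('ALB', 2010, 1.663), ('ALB', 2015, 1.793);
    INSERT INTO economies VALUES ('ALB', 2010, 14.0), ('ALB', 2015, 17.1);
""")

base_query = """
    SELECT name, e.year, fertility_rate, unemployment_rate
    FROM countries AS c
    INNER JOIN populations AS p
    ON c.code = p.country_code
    INNER JOIN economies AS e
    ON e.code = p.country_code
"""

# Joining on code only: every populations row pairs with every economies row.
code_only = conn.execute(base_query).fetchall()

# Joining on code AND year: each populations row pairs with its own year only.
code_and_year = conn.execute(
    base_query + " AND e.year = p.year ORDER BY e.year"
).fetchall()

print(len(code_only))
for row in code_and_year:
    print(row)
```

With two years per country in each table, the code-only join yields 2 × 2 = 4 rows per country, while adding the year condition brings it back to one row per country and year.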

### Instruction
Modify your query so that you are joining to economies on year as well as code.

In [30]:
%%sql

SELECT name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
INNER JOIN economies AS e
ON e.code = p.country_code
-- Add an additional joining condition such that you are also joining on year
	AND e.year = p.year
LIMIT 3; --just an addition, so that the table is not elongated

 * postgresql://postgres:***@localhost:2828/datacamp
3 rows affected.


name,year,fertility_rate,unemployment_rate
Afghanistan,2010,5.746,
Afghanistan,2015,4.653,
Angola,2010,6.416,
