In [None]:
! whoami

In [None]:
! hostname

In [None]:
! date

## Preparation

### Install PostgreSQL

In [None]:
! sudo apt-get -y -qq update

In [None]:
! sudo apt-get -y -qq install postgresql

In [None]:
! sudo service postgresql start

### Create User and Database

In [None]:
! sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'postgres';"

In [None]:
! sudo -u postgres psql -U postgres -c "DROP DATABASE IF EXISTS training;"

In [None]:
! sudo -u postgres psql -U postgres -c 'CREATE DATABASE training;'

### Create Table

In [None]:
%env DATABASE_URL=postgresql://postgres:postgres@localhost:5432/training

In [None]:
%load_ext sql

In [None]:
%reload_ext sql

In [None]:
%%sql
select * from information_schema.columns;

In [None]:
%%sql

DROP TABLE IF EXISTS fortune500

In [None]:
%%sql
CREATE TABLE IF NOT EXISTS fortune500 (
  rank INTEGER,
  title VARCHAR(100),
  name VARCHAR(100),
  ticker VARCHAR(100),
  url VARCHAR(255),
  hq VARCHAR(100),
  sector VARCHAR(50),
  industry VARCHAR(50),
  employees INTEGER,
  revenues INTEGER,
  revenues_change REAL,
  profits NUMERIC,
  profits_change REAL,
  assets NUMERIC,
  equity NUMERIC

);

### Load Dataset

In [None]:
! wget https://www.dropbox.com/s/l3rgaxvdmg0m3ld/fortune500.csv

In [None]:
%%sql
COPY fortune500
FROM '/content/fortune500.csv' DELIMITER ',' NULL 'NA' CSV HEADER;
 

In [None]:
%%sql
select * from fortune500 
where rank = 22
limit 5;

In [None]:
%%sql

select distinct(ticker) 
from fortune500 

## Complete with SQL

How many records are in the fortune500 table?

First, figure out how many rows are in fortune500 by counting them

In [None]:
%%sql

-- Count the total number of rows
SELECT count(*)
  FROM fortune500;

Subtract the count of the non-NULL ticker values from the total number of rows; alias the difference as missing

In [None]:
%%sql

-- Subtract the count of non-NULL tickers from the total number of rows
SELECT count(*) - count(ticker) AS missing
  FROM fortune500
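Why subtract `count(ticker)` and not `count(DISTINCT ticker)`? The two counts differ whenever a non-NULL value repeats. The sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. That is an assumption, but it holds here because both engines skip NULLs in `count(column)`. The `demo` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (hypothetical toy data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT, ticker TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("A Corp", "AAA"), ("B Corp", "BBB"), ("B Corp class B", "BBB"),
     ("C Corp", None), ("D Corp", None)],
)

# count(*) counts rows, count(col) skips NULLs, count(DISTINCT col) also collapses repeats
total, non_null, distinct = conn.execute(
    "SELECT count(*), count(ticker), count(DISTINCT ticker) FROM demo"
).fetchone()

missing = conn.execute(
    "SELECT count(*) - count(ticker) AS missing FROM demo"
).fetchone()[0]

print(total, non_null, distinct, missing)
```

With these hypothetical rows, `count(*) - count(DISTINCT ticker)` would report 3 missing tickers instead of 2, because the repeated `BBB` is counted only once.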

Repeat for the profits_change column

In [None]:
%%sql

-- Select the count of profits_change, 
-- subtract from total number of rows, and alias as missing
SELECT count(*) - count(profits_change) AS missing
  FROM fortune500

Repeat for the industry column

In [None]:
%%sql

-- Select the count of industry, 
-- subtract from total number of rows, and alias as missing
SELECT count(*) - count(industry) AS missing
  FROM fortune500

### Join tables

Part of exploring a database is figuring out how tables relate to each other. The company and fortune500 tables don't have a formal relationship between them in the database, but this doesn't prevent you from joining them.

To join the tables, you need to find a column that they have in common where the values are consistent across the tables. Remember: just because two tables have a column with the same name, it doesn't mean those columns necessarily contain compatible data. If you find more than one pair of columns with similar data, you may need to try joining with each in turn to see if you get the same number of results.

Reference the entity relationship diagram if needed.
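One way to follow that advice is to run the same INNER JOIN on each candidate column and compare the row counts. The sketch below does this in an in-memory SQLite database standing in for PostgreSQL. The tables `company` and `f500` and their rows are hypothetical. They are chosen so that the `name` columns spell company names differently while the tickers agree.

```python
import sqlite3

# Hypothetical mini tables; SQLite stands in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (name TEXT, ticker TEXT);
CREATE TABLE f500 (name TEXT, ticker TEXT);
INSERT INTO company VALUES
  ('Apple Incorporated', 'AAPL'),
  ('Microsoft Corp.', 'MSFT'),
  ('Stripe', NULL);
INSERT INTO f500 VALUES
  ('Apple', 'AAPL'),
  ('Microsoft', 'MSFT'),
  ('Walmart', 'WMT');
""")

def join_count(column: str) -> int:
    """Count the rows an INNER JOIN on `column` produces (column names are trusted literals)."""
    sql = f"SELECT count(*) FROM company JOIN f500 ON company.{column} = f500.{column}"
    return conn.execute(sql).fetchone()[0]

by_name = join_count("name")
by_ticker = join_count("ticker")
print("join on name:", by_name, "| join on ticker:", by_ticker)
```

A count of 0 on `name` does not mean the columns are unrelated. It only shows that their values are not spelled consistently, so inspecting a few rows of each table is still worthwhile.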

**Create the company table**

In [None]:
%%sql

create table company (
  id int primary key,
  exchange varchar(10),
  ticker char(5) unique,
  name varchar not null,
  parent_id int references company(id)
);

In [None]:
%%sql

insert into company values 
(1, 'nasdaq', 'PYPL', 'PayPal Holdings Incorporated', NULL),
(2, 'nasdaq', 'AMZN', 'Amazon.com Inc', NULL),
(3, 'nasdaq', 'MSFT', 'Microsoft Corp.', NULL),
(4, 'nasdaq', 'MDB', 'MongoDB', NULL),
(5, 'nasdaq', 'DBX', 'Dropbox', NULL),
(6, 'nasdaq', 'AAPL', 'Apple Incorporated', NULL),
(7, 'nasdaq', 'CTXS', 'Citrix Systems', NULL),
(8, 'nasdaq', 'GOOGL', 'Alphabet', NULL),
(9, 'nyse', 'IBM', 'International Business Machines Corporation', NULL),
(10, 'nasdaq', 'ADBE', 'Adobe Systems Incorporated', NULL),
(11, NULL, NULL, 'Stripe', NULL),
(12, NULL, NULL, 'Amazon Web Services', 2),
(13, NULL, NULL, 'Google LLC', 8),
(14, 'nasdaq', 'EBAY', 'eBay, Inc.', NULL);

**Instruction**

1. Look at the contents of the company and fortune500 tables. Find a column that they have in common where the values for each company are the same in both tables.
2. Join the company and fortune500 tables with an INNER JOIN.
3. Select only company.name for companies that appear in both tables.



In [None]:
%%sql

SELECT company.name
-- Table(s) to select from
  FROM company
       INNER JOIN fortune500
       ON company.ticker=fortune500.ticker

In [None]:
%%sql

-- Count the number of companies in each sector
SELECT sector, count(*) AS count
FROM fortune500
-- Group by sector to get one count per sector
GROUP BY sector
-- Order the results with the most common
-- sectors listed first
ORDER BY count(*) DESC;

In [None]:
%%sql

-- Count the number of companies in each industry
SELECT industry, count(*) AS count
FROM fortune500
-- Group by industry to get one count per industry
GROUP BY industry
-- Order the results with the most common
-- industries listed first
ORDER BY count(*) DESC;

### Coalesce

The coalesce() function can be useful for specifying a default or backup value when a column contains NULL values.

coalesce() checks arguments in order and returns the first non-NULL value, if one exists.

* coalesce(NULL, 1, 2) = 1
* coalesce(NULL, NULL) = NULL
* coalesce(2, 3, NULL) = 2
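The rule "return the first non-NULL argument" can be written out directly. This is a plain-Python sketch in which `None` plays the role of NULL. It is not how PostgreSQL implements `coalesce()`.

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def coalesce(*args: Optional[T]) -> Optional[T]:
    """Return the first argument that is not None (SQL NULL), or None if every argument is None."""
    for value in args:
        if value is not None:
            return value
    return None

print(coalesce(None, 1, 2))   # first non-NULL value
print(coalesce(None, None))   # all NULL, so NULL
print(coalesce(2, 3, None))   # first argument already non-NULL
```

Only NULL is skipped: `coalesce(0, 5)` returns 0. The sketch reproduces this because it tests `is not None` rather than truthiness.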

In the fortune500 data, industry contains some missing values. Use coalesce() to use the value of sector as the industry when industry is NULL. Then find the most common industry.


**Instruction**

* Use coalesce() to select the first non-NULL value from industry, sector, or 'Unknown' as a fallback value.
* Alias the result of the call to coalesce() as industry2.
* Count the number of rows with each industry2 value.
* Find the most common value of industry2.
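To see the fallback chain and the grouping together, here is a sketch on a hypothetical seven-row table in SQLite, standing in for PostgreSQL. Rows with no industry fall back to their sector, and the row with neither falls back to 'Unknown'. As a result, sector names end up counted alongside industry names.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f500 (industry TEXT, sector TEXT)")
conn.executemany(
    "INSERT INTO f500 VALUES (?, ?)",
    [("Banks", "Financials"), (None, "Financials"), (None, "Financials"),
     (None, "Financials"), ("Banks", "Financials"), (None, None),
     ("Insurance", "Financials")],
)

# industry if present, else sector, else 'Unknown'; then count each resulting value
rows = conn.execute("""
    SELECT coalesce(industry, sector, 'Unknown') AS industry2, count(*) AS n
      FROM f500
     GROUP BY industry2
     ORDER BY n DESC
""").fetchall()

print("most common:", rows[0])
print(rows)
```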


In [None]:
%%sql

-- Use coalesce
SELECT coalesce(industry, sector, 'Unknown') AS industry2,
       -- Don't forget to count!
       count(*) AS count
FROM fortune500 
-- Group by what? (What are you counting by?)
GROUP BY industry2
-- Order results to see most common first
ORDER BY count DESC
-- Limit results to get just the one value you want
LIMIT 1;

### Coalesce with a self-join

You previously joined the company and fortune500 tables to find out which companies are in both tables. Now, also include companies from company that are subsidiaries of Fortune 500 companies.

To include subsidiaries, you will need to join company to itself to associate a subsidiary with its parent company's information. To do this self-join, use two different aliases for company.

coalesce() will help you combine the two ticker columns from the self-join into a single ticker to match against fortune500.

**Instruction**

* Join company to itself to add information about a company's parent to the original company's information.
* Use coalesce to get the parent company ticker if available and the original company ticker otherwise.
* INNER JOIN to fortune500 using the ticker.
* Select original company name, fortune500 title and rank.
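The same self-join can be tried on a tiny hypothetical copy of the data in SQLite, standing in for PostgreSQL. The ranks here are made up. Subsidiaries pick up their parent's ticker through `coalesce()`. Stripe, which has neither its own ticker nor a parent, is dropped by the INNER JOIN.

```python
import sqlite3

# Hypothetical mini versions of company and fortune500 (ranks are made up)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, ticker TEXT, name TEXT, parent_id INTEGER);
CREATE TABLE f500 (ticker TEXT, title TEXT, rank INTEGER);
INSERT INTO company VALUES
  (2,  'AMZN',  'Amazon.com Inc',      NULL),
  (8,  'GOOGL', 'Alphabet',            NULL),
  (11, NULL,    'Stripe',              NULL),
  (12, NULL,    'Amazon Web Services', 2),
  (13, NULL,    'Google LLC',          8);
INSERT INTO f500 VALUES ('AMZN', 'Amazon.com', 10), ('GOOGL', 'Alphabet', 20);
""")

rows = conn.execute("""
    SELECT c.name, f.title, f.rank
      FROM company AS c
      -- attach parent information (NULL columns when there is no parent)
      LEFT JOIN company AS p ON c.parent_id = p.id
      -- prefer the parent's ticker, fall back to the company's own
     INNER JOIN f500 AS f ON coalesce(p.ticker, c.ticker) = f.ticker
     ORDER BY f.rank, c.name
""").fetchall()

for row in rows:
    print(row)
```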


In [None]:
%%sql

SELECT company_original.name, title, rank
-- Start with original company information
FROM company AS company_original
-- Join to another copy of company with parent
-- company information
LEFT JOIN company AS company_parent
    ON company_original.parent_id = company_parent.id 
-- Join to fortune500, only keep rows that match
INNER JOIN fortune500 
-- Use parent ticker if there is one, 
-- otherwise original ticker
   ON coalesce(company_parent.ticker, company_original.ticker) = fortune500.ticker
-- For clarity, order by rank
ORDER BY rank; 

### Effects of casting

When you cast data from one type to another, information can be lost or changed. See how the casting changes values and practice casting data using the CAST() function and the :: syntax.

    SELECT CAST(value AS new_type);
    SELECT value::new_type;


**Instruction 1**

* Select profits_change and profits_change cast as integer from fortune500.
* Look at how the values were converted.
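In PostgreSQL, casting a `real` to `integer` rounds to the nearest whole number, but not every engine behaves the same way. The sketch below contrasts two approaches:

* SQLite's `CAST(... AS INTEGER)`, which truncates toward zero.
* Python's `round()`, used here as a model of PostgreSQL's rounding.

The sample values are hypothetical and deliberately avoid exact .5 ties, so the sketch makes no claim about tie-breaking.

```python
import sqlite3

# Hypothetical profits_change values
sample = [12.6, -3.4, 0.4, -27.8]

conn = sqlite3.connect(":memory:")
# SQLite's CAST(... AS INTEGER) drops the fractional part (truncates toward zero)
truncated = [conn.execute("SELECT CAST(? AS INTEGER)", (v,)).fetchone()[0] for v in sample]
# round() models PostgreSQL's real -> integer cast, which rounds to the nearest integer
rounded = [round(v) for v in sample]

for v, t, r in zip(sample, truncated, rounded):
    print(f"{v:>6}  truncated={t:>4}  rounded={r:>4}")
```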


In [None]:
%%sql

-- Select the original value
SELECT profits_change, 
       -- Cast profits_change
       CAST(profits_change AS integer) AS profits_change_int
FROM fortune500
LIMIT 15

**Instruction 2**

* Compare the result of dividing the integer value 10 by 3 with the result of dividing the numeric value 10 by 3.
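Integer division discards the remainder, so `10/3` gives 3. The Python sketch below models this. Python's own `//` floors rather than truncating, so the helper `int_div_trunc` (hypothetical, written for this example) mimics PostgreSQL's truncation toward zero. `Decimal` stands in for the `numeric` type.

```python
from decimal import Decimal

def int_div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero, like PostgreSQL's integer / integer.

    Python's // floors instead, which differs when exactly one operand is negative.
    """
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

print(int_div_trunc(10, 3))           # integer division: remainder discarded
print(Decimal(10) / 3)                # exact-decimal division, kept to Python's default precision
print(int_div_trunc(-7, 2), -7 // 2)  # truncation vs. Python's flooring
```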

In [None]:
%%sql

-- Divide 10 by 3
SELECT 10/3, 
       -- Cast 10 as numeric and divide by 3
       10::numeric/3;

**Instruction 3**

* Now cast numbers that appear as text as numeric.
* Note: 1e3 is scientific notation.
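Python's `decimal.Decimal` accepts the same textual spellings, which makes it a convenient way to check what each string should become. Leading zeros disappear, and scientific notation denotes an ordinary value (1e3 is 1000). This is an illustration in Python, not PostgreSQL's own parser.

```python
from decimal import Decimal

texts = ["3.2", "-123", "1e3", "1e-3", "02314", "0002"]
values = [Decimal(t) for t in texts]

for t, v in zip(texts, values):
    # Decimal may print in exponent form (1e3 -> 1E+3), but the numeric value is the same
    print(f"{t:>6} -> {v}")
```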


In [None]:
%%sql

SELECT '3.2'::numeric,
       '-123'::numeric,
       '1e3'::numeric,
       '1e-3'::numeric,
       '02314'::numeric,
       '0002'::numeric;

### Summarize the distribution of numeric values

Was 2017 a good or bad year for revenue of Fortune 500 companies? Examine how revenue changed from 2016 to 2017 by first looking at the distribution of revenues_change and then counting companies whose revenue increased.

**Instruction 1**

* Use GROUP BY and count() to examine the values of revenues_change.
* Order the results by revenues_change to see the distribution.

In [None]:
%%sql

-- Select the count of each value of revenues_change
SELECT revenues_change, count(*)
FROM fortune500
GROUP BY revenues_change
-- order by the values of revenues_change
ORDER BY revenues_change;

**Instruction 2**

* Repeat step 1, but this time, cast revenues_change as an integer to reduce the number of different values.
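Casting to integer reduces the number of distinct values because nearby values collapse onto the same whole number. The minimal Python sketch below shows that effect. It uses `round()` as a stand-in for PostgreSQL's `::integer` cast, applied to a handful of hypothetical `revenues_change` values.

```python
from collections import Counter

# Hypothetical revenues_change values (percent change)
revenues_change = [-2.3, 0.4, 0.6, 1.2, 1.4, 5.9, 6.1, 12.0]

# Each exact value is its own group, like GROUP BY revenues_change
exact = Counter(revenues_change)
# round() stands in for revenues_change::integer, so nearby values share a group
by_int = Counter(round(v) for v in revenues_change)

print(len(exact), "groups before casting,", len(by_int), "after")
for value, n in sorted(by_int.items()):
    print(value, n)
```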

In [None]:
%%sql

-- Select the count of each revenues_change integer value
SELECT revenues_change::integer, count(*)
FROM fortune500
GROUP BY revenues_change::integer
-- order by the values of revenues_change
ORDER BY revenues_change
LIMIT 10;

**Instruction 3**

* How many of the Fortune 500 companies had revenues increase in 2017 compared to 2016? To find out, count the rows of fortune500 where revenues_change indicates an increase.

In [None]:
%%sql

-- Count rows 
SELECT count(*)
FROM fortune500
 -- Where...
 WHERE revenues_change > 0;

### Division

Compute the average revenue per employee for Fortune 500 companies by sector.

**Instruction**


* Compute revenue per employee by dividing revenues by employees; use casting to produce a numeric result.
* Take the average of revenue per employee with avg(); alias this as avg_rev_employee.
* Group by sector.
* Order by the average revenue per employee.
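The cast matters because integer-by-integer division truncates. The sketch below uses SQLite, standing in for PostgreSQL, on a hypothetical four-row table to compare averages with and without the cast. SQLite has no separate `numeric` division, so `CAST(... AS REAL)` plays the role of `::numeric`. The row with a NULL `employees` produces a NULL ratio, which `avg()` ignores.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f500 (sector TEXT, revenues INTEGER, employees INTEGER)")
conn.executemany(
    "INSERT INTO f500 VALUES (?, ?, ?)",
    [("Tech", 10, 4), ("Tech", 7, 2), ("Retail", 9, 10), ("Retail", 3, None)],
)

rows = conn.execute("""
    SELECT sector,
           -- integer / integer truncates each ratio before averaging
           avg(revenues / employees)               AS avg_int_div,
           -- casting one operand keeps the fractional part
           avg(revenues / CAST(employees AS REAL)) AS avg_rev_employee
      FROM f500
     GROUP BY sector
     ORDER BY avg_rev_employee
""").fetchall()

for row in rows:
    print(row)
```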


In [None]:
%%sql

-- Select average revenue per employee by sector
SELECT sector, 
       avg(revenues/employees::numeric) AS avg_rev_employee
FROM fortune500
GROUP BY sector
-- Use the column alias to order the results
ORDER BY avg_rev_employee;
