# Explore table sizes

Let's start by exploring five related tables:

- `stackoverflow`: questions asked on Stack Overflow with certain tags
- `company`: information on companies related to tags in `stackoverflow`
- `tag_company`: links `stackoverflow` to `company`
- `tag_type`: type categories applied to tags in `stackoverflow`
- `fortune500`: information on top US companies

Count the number of rows in a table with

`SELECT count(*) FROM tablename;`

Count the number of columns in a table by selecting a few rows and manually counting the columns in the result.
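
As an alternative to counting by hand, PostgreSQL exposes column metadata through the standard `information_schema.columns` view. The query below is a minimal sketch, not part of the exercise; it assumes only one schema in the database contains a table named `fortune500` (otherwise add a filter on `table_schema`).

```sql
-- Count the columns of one table from the catalog.
-- Assumption: the table name is unique across schemas.
SELECT COUNT(*) AS n_columns
  FROM information_schema.columns
 WHERE table_name = 'fortune500';
```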

Which table has the most rows? Which table has the most columns?

- `stackoverflow` has the most rows; `fortune500` has the most columns

# Count missing values

Which column of `fortune500` has the most missing values? To find out, you'll need to check each column individually, although here we'll check just three.

Course Note: While you're unlikely to encounter this issue during this exercise, note that if you run a query that takes more than a few seconds to execute, your session may expire or you may be disconnected from the server. You will not have this issue with any of the exercise solutions, so if your session expires or disconnects, there's an error with your query.

```
-- Select the count of the number of rows
SELECT COUNT(*)
  FROM fortune500;


-- Select the count of ticker, 
-- subtract from the total number of rows, 
-- and alias as missing
SELECT COUNT(*) - COUNT(ticker) AS missing
  FROM fortune500;

-- Select the count of profits_change, 
-- subtract from total number of rows, and alias as missing
SELECT COUNT(*) - COUNT(profits_change) AS missing
  FROM fortune500;

-- Select the count of industry, 
-- subtract from total number of rows, and alias as missing
SELECT COUNT(*) - COUNT(industry) AS missing
  FROM fortune500;
```
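
The three checks can also be combined into a single pass over the table. This is a sketch of the same technique: `COUNT(column)` skips `NULL` values while `COUNT(*)` counts every row. The `*_missing` aliases are illustrative names, not part of the exercise.

```sql
-- One scan of fortune500, one missing-value count per column
SELECT COUNT(*) - COUNT(ticker)         AS ticker_missing,
       COUNT(*) - COUNT(profits_change) AS profits_change_missing,
       COUNT(*) - COUNT(industry)       AS industry_missing
  FROM fortune500;
```

Putting the counts side by side makes it easier to see which column has the most missing values.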

# Join tables

Part of exploring a database is figuring out how tables relate to each other. The `company` and `fortune500` tables don't have a formal relationship between them in the database, but this doesn't prevent you from joining them.

To join the tables, you need to find a column that they have in common where the values are consistent across the tables. Remember: just because two tables have a column with the same name, it doesn't mean those columns necessarily contain compatible data. If you find more than one pair of columns with similar data, you may need to try joining with each in turn to see if you get the same number of results.

Reference the entity relationship diagram if needed.
<center><img src="images/01.02.jpg"  style="width: 400px; height: 300px;"/></center>


```
SELECT company.name
-- Table(s) to select from
  FROM company
       INNER JOIN fortune500
       ON company.ticker=fortune500.ticker;
```
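
To judge how consistent a candidate join column is, it can help to compare the number of rows in `company` with the number that find a match. The following is a rough sketch, assuming each `ticker` value appears at most once in `fortune500`; the aliases `n_company` and `n_matched` are illustrative.

```sql
-- LEFT JOIN keeps every company row; COUNT(fortune500.ticker)
-- counts only the rows where a match was found.
SELECT COUNT(*)                 AS n_company,
       COUNT(fortune500.ticker) AS n_matched
  FROM company
       LEFT JOIN fortune500
       ON company.ticker = fortune500.ticker;
```

If `n_matched` is much smaller than `n_company`, the column may hold incompatible data, or many companies may simply not be in `fortune500`.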

# Foreign keys

Recall that foreign keys reference another row in the database via a unique ID. Values in a foreign key column are restricted to values in the referenced column OR `NULL`.

Using what you know about foreign keys, why can't the `tag` column in the `tag_type` table be a foreign key that references the `tag` column in the `stackoverflow` table?

Remember, you can reference the slides using the icon in the upper right of the screen to review the requirements for a foreign key.

- `stackoverflow.tag` contains duplicate values
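
A foreign key must reference a column with a primary key or unique constraint, so duplicates in the referenced column rule it out. One way to confirm the duplicates, sketched here with an illustrative alias `n_rows`:

```sql
-- Tags that appear on more than one row of stackoverflow
SELECT tag, COUNT(*) AS n_rows
  FROM stackoverflow
 GROUP BY tag
HAVING COUNT(*) > 1
 ORDER BY n_rows DESC;
```

Any row returned means `tag` is not unique in `stackoverflow`.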

# Read an entity relationship diagram

The information you need is sometimes split across multiple tables in the database.

What is the most common `stackoverflow` `tag_type`? What companies have a `tag` of that `type`?

To generate a list of such companies, you'll need to join three tables together.

Reference the entity relationship diagram as needed when determining which columns to use when joining tables.

```
-- Count the number of tags with each type
SELECT type, COUNT(tag) AS count
  FROM tag_type
 -- To get the count for each type, what do you need to do?
 GROUP BY type
 -- Order the results with the most common
 -- tag types listed first
 ORDER BY count DESC;


-- Select the 3 columns desired
SELECT company.name, tag_type.tag, tag_type.type
  FROM company
       -- Join to the tag_company table
       INNER JOIN tag_company
       ON company.id = tag_company.company_id
       -- Join to the tag_type table
       INNER JOIN tag_type
       ON tag_company.tag = tag_type.tag
  -- Filter to most common type
  WHERE tag_type.type = 'cloud';
```

# Coalesce

The `coalesce()` function can be useful for specifying a default or backup value when a column contains `NULL` values.

`coalesce()` checks arguments in order and returns the first non-NULL value, if one exists.

- `coalesce(NULL, 1, 2) = 1`
- `coalesce(NULL, NULL) = NULL`
- `coalesce(2, 3, NULL) = 2`
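
The three cases above can be checked directly in PostgreSQL without any table. The column aliases are illustrative.

```sql
-- Each call returns its first non-NULL argument, or NULL if there is none
SELECT coalesce(NULL, 1, 2) AS first_example,   -- 1
       coalesce(NULL, NULL) AS second_example,  -- NULL
       coalesce(2, 3, NULL) AS third_example;   -- 2
```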

In the `fortune500` data, `industry` contains some missing values. Use `coalesce()` to use the value of `sector` as the `industry` when `industry` is `NULL`. Then find the most common `industry`.

```
-- Use coalesce
SELECT COALESCE(industry, sector, 'Unknown') AS industry2,
       -- Don't forget to count!
       COUNT(*) AS count
  FROM fortune500
-- Group by what? (What are you counting by?)
 GROUP BY industry2
-- Order results to see most common first
 ORDER BY count DESC
-- Limit results to get just the one value you want
 LIMIT 1;
```

# Coalesce with a self-join

You previously joined the `company` and `fortune500` tables to find out which companies are in both tables. Now, also include companies from `company` that are subsidiaries of Fortune 500 companies.

To include subsidiaries, you will need to join `company` to itself to associate a subsidiary with its parent company's information. To do this self-join, use two different aliases for `company`.

`coalesce` will help you combine the two `ticker` columns in the result of the self-join to join to `fortune500`.

```
SELECT company_original.name, title, rank
  -- Start with original company information
  FROM company AS company_original
       -- Join to another copy of company with parent
       -- company information
       LEFT JOIN company AS company_parent
       ON company_original.parent_id = company_parent.id
       -- Join to fortune500, only keep rows that match
       INNER JOIN fortune500
       -- Use parent ticker if there is one,
       -- otherwise original ticker
       ON coalesce(company_parent.ticker,
                   company_original.ticker) =
             fortune500.ticker
 -- For clarity, order by rank
 ORDER BY rank;
```
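
To see what the self-join and `coalesce()` produce before joining to `fortune500`, here is a small self-contained sketch. The three rows are hypothetical and are not taken from the course database; the CTE only borrows the column names `id`, `parent_id`, `name`, and `ticker`, and `join_ticker` is an illustrative alias.

```sql
-- Hypothetical rows standing in for the company table
WITH company (id, parent_id, name, ticker) AS (
  VALUES (1, NULL, 'Parent Co', 'PRNT'),
         (2, 1,    'Child Co',  'CHLD'),
         (3, NULL, 'Solo Co',   'SOLO')
)
SELECT c.name,
       p.name AS parent_name,
       -- Parent ticker if a parent exists, otherwise the company's own
       coalesce(p.ticker, c.ticker) AS join_ticker
  FROM company AS c
       LEFT JOIN company AS p
       ON c.parent_id = p.id
 ORDER BY c.id;
```

Here `Child Co` gets `PRNT` as its `join_ticker`, while the two companies without a parent keep their own tickers.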

# Effects of casting

When you cast data from one type to another, information can be lost or changed. See how casting changes values, and practice casting data with the `CAST()` function and the `::` syntax.
```
SELECT CAST(value AS new_type);

SELECT value::new_type;
```

```
-- Select the original value
SELECT profits_change, 
       -- Cast profits_change
       CAST(profits_change AS INTEGER) AS profits_change_int
  FROM fortune500;

-- Divide 10 by 3
SELECT 10/3, 
       -- Cast 10 as numeric and divide by 3
       10::numeric/3;


-- Cast strings in several numeric formats to numeric
SELECT '3.2'::numeric,
       '-123'::numeric,
       '1e3'::numeric,
       '1e-3'::numeric,
       '02314'::numeric,
       '0002'::numeric;


```
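
As a quick illustration of how information changes under a cast, the sketch below uses literal values only. In PostgreSQL, casting a `numeric` value to `integer` rounds to the nearest whole number, and dividing two integers discards the fractional part. The aliases are illustrative.

```sql
SELECT CAST(3.7 AS integer) AS rounded_up,    -- 4
       CAST(3.2 AS integer) AS rounded_down,  -- 3
       10 / 3               AS integer_div,   -- 3
       10 / 3.0             AS numeric_div;   -- fractional part kept
```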

# Summarize the distribution of numeric values

Was 2017 a good or bad year for revenue of Fortune 500 companies? Examine how revenue changed from 2016 to 2017 by first looking at the distribution of `revenues_change` and then counting companies whose revenue increased.

```
-- Select the count of each value of revenues_change
SELECT revenues_change, COUNT(*)
  FROM fortune500
 GROUP BY revenues_change
 -- order by the values of revenues_change
 ORDER BY revenues_change;


-- Select the count of each revenues_change integer value
SELECT revenues_change::integer, COUNT(*)
  FROM fortune500
 GROUP BY revenues_change::integer
 -- order by the values of revenues_change
 ORDER BY revenues_change::integer;

-- Count rows 
SELECT COUNT(*)
  FROM fortune500
 -- Where...
 WHERE revenues_change > 0;
```