# OLTP vs OLAP

In this exercise, you are given a list of cards, each describing a specific approach, and you categorize each card as OLAP or OLTP.

<img src="images/01.02.jpg" style="width:600px;height:300px;">

# Which is better?

The city of Chicago receives many 311 service requests throughout the day. 311 service requests are non-urgent community requests, ranging from graffiti removal to street light outages. Chicago maintains a data repository of all these requests, organized by request type. In this exercise, Potholes has been loaded as an example of a table in this repository. It contains pothole reports made by Chicago residents in the past week.

Explore the dataset. What data processing approach is this larger repository most likely using?

- OLTP because this table's structure appears to require frequent updates (every week).
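
To make the distinction concrete, here is a minimal sketch of the two workload styles using Python's built-in `sqlite3` module. The `potholes` table, its columns, and the sample rows are hypothetical stand-ins and are not taken from the actual Chicago dataset.

```python
import sqlite3

# In-memory SQLite database standing in for the city's repository (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE potholes (
        request_id INTEGER PRIMARY KEY,
        status     TEXT NOT NULL,
        zip_code   TEXT NOT NULL
    )
    """
)

# OLTP-style work: many small writes that record or update single requests.
conn.execute("INSERT INTO potholes VALUES (1, 'Open', '60617')")
conn.execute("INSERT INTO potholes VALUES (2, 'Open', '60617')")
conn.execute("INSERT INTO potholes VALUES (3, 'Open', '60609')")
conn.execute("UPDATE potholes SET status = 'Completed' WHERE request_id = 2")

# OLAP-style work: a read-only aggregate over many rows for analysis.
open_by_zip = conn.execute(
    """
    SELECT zip_code, COUNT(*)
    FROM potholes
    WHERE status = 'Open'
    GROUP BY zip_code
    ORDER BY zip_code
    """
).fetchall()
print(open_by_zip)
```

The inserts and the update touch one row at a time, which is the pattern an OLTP system is tuned for. The grouped count scans the whole table, which is closer to what an OLAP system would serve.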

# Name that data type!

Structured data is the easiest to analyze because it follows a schema and is organized and cleaned. Unstructured data, on the other hand, is schemaless but scales well. Semi-structured data covers everything in between.

<img src="images/01.05.jpg" style="width:800px;height:300px;">

# Ordering ETL Tasks

You have been hired to manage data at a small online clothing store. Their system is quite outdated: their only data repository is a traditional database that records transactions.

You decide to upgrade their system to a data warehouse after hearing that different departments would like to run their own business analytics. You reason that an ELT approach is unnecessary because there is relatively little data (< 50 GB).

<img src="images/01.06.jpg" style="width:600px;height:300px;">
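
As a rough illustration of the extract, transform, load order, the sketch below moves rows from a transactional table into a warehouse table. It assumes two in-memory SQLite databases; the `transactions` and `sales` tables, their columns, and the sample rows are all hypothetical and do not come from the exercise.

```python
import sqlite3


def extract(source):
    """Extract: read raw transaction rows from the operational database."""
    return source.execute(
        "SELECT item, price, quantity FROM transactions"
    ).fetchall()


def transform(rows):
    """Transform: drop incomplete rows and compute revenue per transaction."""
    return [
        (item, price * quantity)
        for item, price, quantity in rows
        if item is not None and price is not None and quantity is not None
    ]


def load(warehouse, rows):
    """Load: write the cleaned rows into the warehouse table."""
    warehouse.executemany("INSERT INTO sales (item, revenue) VALUES (?, ?)", rows)
    warehouse.commit()


# Hypothetical operational database with one incomplete row.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE transactions (item TEXT, price REAL, quantity INTEGER)")
source.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("shirt", 20.0, 2), ("jeans", 45.0, 1), (None, 10.0, 1)],
)

# Hypothetical warehouse with a predefined schema.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (item TEXT, revenue REAL)")

# ETL: the transformation happens before the data reaches the warehouse.
load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT item, revenue FROM sales ORDER BY item").fetchall()
print(loaded)
```

In an ELT approach, the raw rows would be loaded first and transformed inside the destination instead.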

# Recommend a storage solution

When should you choose a data warehouse over a data lake?

- To create accessible and isolated data repositories for other analysts

# Classifying data models

There are three different levels of data models: conceptual, logical, and physical.

<img src="images/01.09.jpg" style="width:800px;height:300px;">

# Deciding fact and dimension tables

Suppose you record your runs in one table called Runs with the following schema:

```
runs
---------------------
duration_mins - float
week - int
month - varchar(160)
year - int
park_name - varchar(160)
city_name - varchar(160)
distance_km - float
route_name - varchar(160)
```
After learning about dimensional modeling, you decide to restructure the schema for the database.

What would be the best way to organize the fact table and dimension tables?

- A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively.

```
-- Create a route dimension table
CREATE TABLE route(
    route_id INTEGER PRIMARY KEY,
    route_name VARCHAR(160) NOT NULL,
    city_name VARCHAR(160) NOT NULL,
    distance_km FLOAT NOT NULL,
    park_name VARCHAR(160) NOT NULL
);
-- Create a week dimension table
CREATE TABLE week(
    week_id INTEGER PRIMARY KEY,
    week INTEGER NOT NULL,
    month VARCHAR(160) NOT NULL,
    year INTEGER NOT NULL
);
```
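
The two dimension tables still need a fact table that points at them. The sketch below shows one possible shape, run against in-memory SQLite through Python's `sqlite3` module. The fact table name `runs_fact`, its `run_id` key, and the sample rows are assumptions for illustration, and SQLite's type handling differs from the database used in the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE route(
        route_id INTEGER PRIMARY KEY,
        route_name VARCHAR(160) NOT NULL,
        city_name VARCHAR(160) NOT NULL,
        distance_km FLOAT NOT NULL,
        park_name VARCHAR(160) NOT NULL
    );
    CREATE TABLE week(
        week_id INTEGER PRIMARY KEY,
        week INTEGER NOT NULL,
        month VARCHAR(160) NOT NULL,
        year INTEGER NOT NULL
    );
    -- Hypothetical fact table: one row per run, with the measure
    -- duration_mins and a foreign key to each dimension table.
    CREATE TABLE runs_fact(
        run_id INTEGER PRIMARY KEY,
        duration_mins FLOAT NOT NULL,
        route_id INTEGER NOT NULL REFERENCES route (route_id),
        week_id INTEGER NOT NULL REFERENCES week (week_id)
    );
    """
)

# Made-up sample rows.
conn.execute("INSERT INTO route VALUES (1, 'Lakefront loop', 'Chicago', 5.0, 'Grant Park')")
conn.execute("INSERT INTO week VALUES (1, 27, 'July', 2019)")
conn.execute("INSERT INTO runs_fact VALUES (1, 30.0, 1, 1)")

# A run that references a route_id missing from the dimension is rejected.
try:
    conn.execute("INSERT INTO runs_fact VALUES (2, 25.0, 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Keeping route and week details out of the fact table means each route or week is described once, and the fact table stays narrow.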

# Querying the dimensional model

The schema has been reorganized using the dimensional model:

<img src="images/01.11.png" style="width:800px;height:300px;">

Let's run a query based on this schema. How about we try to find the number of minutes we ran in July, 2019? We'll break this up into two steps.

- First, we'll get the total number of minutes recorded in the database.
- Second, we'll narrow down that query to week_id's from July, 2019.

```
SELECT
    -- Get the total duration of all runs
    SUM(duration_mins)
FROM
    runs_fact
-- Join to week_dim to keep only the week_id's from July, 2019
INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
-- year is an integer column, so compare it to an integer
WHERE month = 'July' AND year = 2019;
```
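
To see both steps of the query produce a result, here is a small sketch that runs them against in-memory SQLite using Python's `sqlite3` module. Only the columns the query needs are created, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE week_dim(
        week_id INTEGER PRIMARY KEY,
        week INTEGER NOT NULL,
        month VARCHAR(160) NOT NULL,
        year INTEGER NOT NULL
    );
    -- Reduced fact table: only the columns used by the query.
    CREATE TABLE runs_fact(
        duration_mins FLOAT NOT NULL,
        week_id INTEGER NOT NULL REFERENCES week_dim (week_id)
    );
    -- Made-up sample data: one week in June and two in July, 2019.
    INSERT INTO week_dim VALUES
        (1, 26, 'June', 2019),
        (2, 27, 'July', 2019),
        (3, 28, 'July', 2019);
    INSERT INTO runs_fact VALUES (20.0, 1), (30.0, 2), (45.0, 3);
    """
)

# Step 1: total number of minutes recorded in the database.
total_mins = conn.execute("SELECT SUM(duration_mins) FROM runs_fact").fetchone()[0]

# Step 2: narrow the same sum down to week_id's from July, 2019.
july_mins = conn.execute(
    """
    SELECT SUM(duration_mins)
    FROM runs_fact
    INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
    WHERE month = 'July' AND year = 2019
    """
).fetchone()[0]
print(total_mins, july_mins)
```

With these three sample runs, the first query sums every row (95.0 minutes) and the second keeps only the two July runs (75.0 minutes).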