# AdventureWorks Relational Postgres Lab

### Introduction

In this lesson, we will work with the AdventureWorks database in Postgres. Let's get started.

### Loading our data


We can begin by making sure the Postgres application is running (on a Mac, look for the elephant icon). Once it is running, we'll follow the instructions in the [AdventureWorks for Postgres](https://github.com/lorint/AdventureWorks-for-Postgres) repo.

Move into the `install-script` directory.  Then run the following:

```bash
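# Quote the name so Postgres keeps the capital A, matching the \c "Adventureworks"
# and psycopg2 connections used later in this lab
psql -c "CREATE DATABASE \"Adventureworks\";"
psql -d Adventureworks < install.sql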
```

One confusing thing is that typing `\dt` to display our tables lists none of them. With no pattern, `\dt` only shows tables in schemas on the current `search_path` (by default, `public`), and the AdventureWorks tables live in other schemas. We can list them by first connecting to the database and then passing a schema pattern to `\dt`:

```sql
\c "Adventureworks"
\dt (humanresources|person|production|purchasing|sales).*
```
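You can also get this listing from Python by querying the standard `information_schema.tables` view. This is handy for checking how many tables each schema holds. The sketch below is minimal and assumes an open psycopg2 connection. `TABLES_BY_SCHEMA_SQL`, `group_tables_by_schema`, and `list_tables` are hypothetical names, not part of the AdventureWorks repo.

```python
from collections import defaultdict

# Catalog query: one row per ordinary table, skipping Postgres' own schemas.
# Filtering on BASE TABLE leaves views out of the listing.
TABLES_BY_SCHEMA_SQL = """
    select table_schema, table_name
    from information_schema.tables
    where table_type = 'BASE TABLE'
      and table_schema not in ('pg_catalog', 'information_schema')
    order by table_schema, table_name
"""


def group_tables_by_schema(rows):
    """Group (schema, table) pairs into {schema: [table, ...]}."""
    grouped = defaultdict(list)
    for schema, table in rows:
        grouped[schema].append(table)
    return dict(grouped)


def list_tables(conn):
    """Run the catalog query on an open psycopg2 connection and group the result."""
    with conn.cursor() as cur:
        cur.execute(TABLES_BY_SCHEMA_SQL)
        return group_tables_by_schema(cur.fetchall())
```

With the `conn` created later in this lab, `list_tables(conn)["sales"]` would return the table names in the `sales` schema.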

As we can see, there are quite a few tables in our database. We can query any of them by prefixing the table name with its schema:

```sql
select * from person.address limit 1;
```
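Typing the schema prefix every time can get tedious. PostgreSQL resolves unqualified table names through the `search_path` setting. libpq lets a client set that value at connect time through its `options` connection parameter, for example `-c search_path=sales,public`. The sketch below is minimal and assumes psycopg2. The helper names are hypothetical, and the name check is a deliberate simplification.

```python
def search_path_options(schemas):
    """Build a libpq `options` value that sets search_path for the session.

    Simplification: only plain lowercase names are accepted, so nothing
    needs quoting.
    """
    for name in schemas:
        if not name.isidentifier() or name != name.lower():
            raise ValueError(f"unsupported schema name: {name!r}")
    return "-c search_path=" + ",".join(schemas)


def connect_with_search_path(schemas, **connect_kwargs):
    """Open a psycopg2 connection where unqualified table names resolve via `schemas`."""
    import psycopg2  # lazy import so search_path_options works without psycopg2

    return psycopg2.connect(options=search_path_options(schemas), **connect_kwargs)
```

After `conn = connect_with_search_path(["sales", "public"], dbname="Adventureworks", user="postgres")`, a query such as `select * from customer limit 1` would read `sales.customer`. Inside an existing session, `set search_path to sales, public;` has the same effect.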

The database is fairly complex; you can see all of the relations in this [ERD](https://i0.wp.com/improveandrepeat.com/wp-content/uploads/2018/12/AdvWorksOLTPSchemaVisio.png?ssl=1). We will stick to the tables in the `sales` schema.



### Beginning our queries

We can connect to our database from Python using the `psycopg2` library.

```python
import warnings
warnings.filterwarnings('ignore')
```


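```python
import pandas as pd
import psycopg2

# Connect to the database created above (the database name is case-sensitive)
conn = psycopg2.connect(database="Adventureworks", user="postgres")
```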

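Recent pandas versions emit a `UserWarning` when `read_sql` receives a raw DBAPI connection such as a psycopg2 one instead of a SQLAlchemy connectable. This is likely one of the warnings the `filterwarnings('ignore')` call above silences. The sketch below passes pandas a SQLAlchemy engine instead, so no warning needs hiding. It assumes SQLAlchemy is installed, that the names need no URL escaping, and that the server accepts the `postgres` user on `localhost:5432` without a password. `postgres_url` and `make_engine` are hypothetical helpers.

```python
def postgres_url(database, user="postgres", host="localhost", port=5432):
    """Build a SQLAlchemy URL for the psycopg2 driver.

    Assumes simple names (no characters needing URL escaping) and a server
    that accepts this user on host:port without a password.
    """
    return f"postgresql+psycopg2://{user}@{host}:{port}/{database}"


def make_engine(database, **url_kwargs):
    """Create a SQLAlchemy engine that pandas.read_sql accepts as a connectable."""
    from sqlalchemy import create_engine  # lazy import: only needed when connecting

    return create_engine(postgres_url(database, **url_kwargs))
```

Usage: `engine = make_engine("Adventureworks")`, then `pd.read_sql("select * from sales.customer limit 5", engine)`. `create_engine` does not open a connection until the first query runs.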
From there, we can use pandas to read query results into a DataFrame. Let's begin with the customer table.

```python
pd.read_sql("select * from sales.Customer limit 5", conn)
```

| | customerid | personid | storeid | territoryid | rowguid | modifieddate |
|---|---|---|---|---|---|---|
| 0 | 1 | | 934 | 1 | 3f5ae95e-b87d-4aed-95b4-c3797afcb74f | 2014-09-12 11:15:07.263 |
| 1 | 2 | | 1028 | 1 | e552f657-a9af-4a7d-a645-c429d6e02491 | 2014-09-12 11:15:07.263 |
| 2 | 3 | | 642 | 4 | 130774b1-db21-4ef3-98c8-c104bcd6ed6d | 2014-09-12 11:15:07.263 |
| 3 | 4 | | 932 | 4 | ff862851-1daa-4044-be7c-3e85583c054d | 2014-09-12 11:15:07.263 |
| 4 | 5 | | 1026 | 4 | 83905bdc-6f5e-4f71-b162-c98da069f38a | 2014-09-12 11:15:07.263 |

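When a query depends on a value such as a territory, pass that value through `read_sql`'s `params` argument rather than formatting it into the SQL string. psycopg2 uses `%s` placeholders and quotes the values itself. The sketch below is minimal, and the helper name `customers_in_territory` is hypothetical.

```python
def customers_in_territory(territory_id, limit=5):
    """Return (sql, params) for pd.read_sql; psycopg2 substitutes each %s safely."""
    sql = """
        select customerid, personid, storeid, territoryid
        from sales.customer
        where territoryid = %s
        order by customerid
        limit %s
    """
    return sql, (territory_id, limit)
```

Usage: `sql, params = customers_in_territory(4)`, then `pd.read_sql(sql, conn, params=params)`.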

Next, let's view some of the data in the `SalesOrderHeader` table.

> The `.T` at the end of the line below transposes the DataFrame, turning each column into a row. With wide tables, this often makes all of the columns easier to see.

```python
pd.read_sql("select * from sales.SalesOrderHeader limit 2", conn).T
```

| | 0 | 1 |
|---|---|---|
| salesorderid | 43659 | 43660 |
| revisionnumber | 8 | 8 |
| orderdate | 2011-05-31 00:00:00 | 2011-05-31 00:00:00 |
| duedate | 2011-06-12 00:00:00 | 2011-06-12 00:00:00 |
| shipdate | 2011-06-07 00:00:00 | 2011-06-07 00:00:00 |
| status | 5 | 5 |
| onlineorderflag | False | False |
| purchaseordernumber | PO522145787 | PO18850127500 |
| accountnumber | 10-4020-000676 | 10-4020-000117 |
| customerid | 29825 | 29672 |

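Conceptually, `.T` swaps rows and columns. On plain Python lists, the same reshaping is `zip(*rows)`. The rough sketch below uses values from the output above. It ignores the index and dtype handling that pandas also performs, and `transpose` is a hypothetical helper.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


# Two orders with three fields each, taken from the output above.
header = ["salesorderid", "status", "customerid"]
orders = [[43659, 5, 29825], [43660, 5, 29672]]

# After transposing, each field becomes one row and each order one column.
for name, values in zip(header, transpose(orders)):
    print(name, values)
```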

We can see that `SalesOrderHeader` has several foreign keys, including `customerid`.

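The queries in this lab write `customerId`, `SalesOrderHeader`, and `sales.Customer`, yet the DataFrame columns come back as `customerid`, `salesorderid`, and so on. PostgreSQL folds unquoted identifiers to lower case, while double-quoted identifiers such as `"Adventureworks"` keep their exact case. The hypothetical `fold_identifier` below approximates that rule for a single name. It ignores escaped quotes inside names and non-ASCII subtleties.

```python
def fold_identifier(name):
    """Approximate how PostgreSQL normalizes one identifier.

    Unquoted identifiers are folded to lower case; double-quoted ones keep
    their case. Escaped quotes ("") inside a name are not handled.
    """
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.lower()
```

This is why `sales.Customer` and `sales.customer` refer to the same table, while `"Adventureworks"` and `adventureworks` name two different databases.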
Let's begin by finding the total amount spent by each customer and returning the top five totals.

```python
query = """
select sum(totaldue) as total_amount
from sales.SalesOrderHeader
group by customerId
order by total_amount desc
limit 5;
"""
pd.read_sql(query, conn)
```

| | total_amount |
|---|---|
| 0 | 989184.082 |
| 1 | 961675.8596 |
| 2 | 954021.9235 |
| 3 | 919801.8188 |
| 4 | 901346.856 |

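The query above returns the totals but not which customer each total belongs to. Adding `customerid` to the `select` list fixes that. The pure-Python function below spells out what `group by` / `order by ... desc` / `limit` compute, using toy data. `TOP_CUSTOMERS_SQL` and `top_totals` are hypothetical names, and the limit is passed as a psycopg2 parameter.

```python
from collections import defaultdict

# Same aggregation as above, but also returning the customer for each total.
TOP_CUSTOMERS_SQL = """
    select customerid, sum(totaldue) as total_amount
    from sales.salesorderheader
    group by customerid
    order by total_amount desc
    limit %s
"""


def top_totals(orders, n=5):
    """Pure-Python equivalent of GROUP BY customerid / ORDER BY sum DESC / LIMIT n.

    `orders` is an iterable of (customerid, totaldue) pairs.
    """
    totals = defaultdict(float)
    for customer_id, total_due in orders:
        totals[customer_id] += total_due  # GROUP BY + SUM
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)  # ORDER BY DESC
    return ranked[:n]  # LIMIT
```

Usage: `pd.read_sql(TOP_CUSTOMERS_SQL, conn, params=(5,))`.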

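Now let's find the names of the top five products that brought in the most revenue, along with the revenue for each. As a first pass, the query below sums `unitprice` for each `productid`. It returns product IDs rather than names, and because it ignores order quantity and discounts, it only approximates revenue.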

```python
query = """
select productid, sum(unitprice) as total_price
from sales.salesOrderDetail
group by productid
order by total_price desc
limit 5;
"""
pd.read_sql(query, conn)
```

| | productid | total_price |
|---|---|---|
| 0 | 782 | 2166146.0 |
| 1 | 783 | 2090729.0 |
| 2 | 779 | 1990662.0 |
| 3 | 784 | 1944555.0 |
| 4 | 781 | 1919479.0 |

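Revenue for an order line is its quantity times its unit price, net of the line's discount (`unitpricediscount` is a fraction, so `0.05` means 5%). Product names live in `production.product`. The sketch below joins the two tables in SQL. It also gives a pure-Python version of the same computation on toy data, with made-up product names. `TOP_PRODUCTS_SQL`, `line_revenue`, and `top_products` are hypothetical names, and no results from the real database are implied.

```python
from collections import defaultdict

# Revenue per product: quantity x unit price, net of the per-line discount,
# joined to production.product to get each product's name.
TOP_PRODUCTS_SQL = """
    select p.name,
           sum(d.orderqty * d.unitprice * (1 - d.unitpricediscount)) as revenue
    from sales.salesorderdetail as d
    join production.product as p on p.productid = d.productid
    group by p.productid, p.name
    order by revenue desc
    limit %s
"""


def line_revenue(order_qty, unit_price, discount):
    """Revenue for one order line; discount is a fraction (0.1 means 10%)."""
    return order_qty * unit_price * (1 - discount)


def top_products(lines, names, n=5):
    """Pure-Python version of the query; lines are (productid, qty, price, discount)."""
    revenue = defaultdict(float)
    for product_id, qty, price, discount in lines:
        revenue[product_id] += line_revenue(qty, price, discount)
    ranked = sorted(revenue.items(), key=lambda item: item[1], reverse=True)[:n]
    return [(names[pid], total) for pid, total in ranked]
```

Usage: `pd.read_sql(TOP_PRODUCTS_SQL, conn, params=(5,))`.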

Next, let's list the salespeople by their `businessentityid`:

```python
import pandas as pd
pd.read_sql("select businessEntityId from sales.SalesPerson;", conn)
```

| | businessentityid |
|---|---|
| 0 | 274 |
| 1 | 275 |
| 2 | 276 |
| 3 | 277 |
| 4 | 278 |
| 5 | 279 |
| 6 | 280 |
| 7 | 281 |
| 8 | 282 |
| 9 | 283 |


Then let's compare them with the salesperson IDs that appear on orders:

```python
pd.read_sql("select distinct SalesPersonId as numbers from sales.SalesOrderHeader order by numbers;", conn)
```

| | numbers |
|---|---|
| 0 | 274.0 |
| 1 | 275.0 |
| 2 | 276.0 |
| 3 | 277.0 |
| 4 | 278.0 |
| 5 | 279.0 |
| 6 | 280.0 |
| 7 | 281.0 |
| 8 | 282.0 |
| 9 | 283.0 |

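The IDs in this second result come back as floats (`274.0`). This most likely happens because `salespersonid` is nullable. Orders placed without a salesperson store NULL, and pandas represents an integer column that contains missing values as floating point. To find salespeople who never appear on an order, the database can do the comparison directly. The sketch below does that and also gives a pure-Python equivalent that drops missing values before taking a set difference. The names are hypothetical.

```python
# Salespeople with no orders. The NOT EXISTS comparison is never true for a
# NULL salespersonid, so orders without a salesperson are ignored.
SALESPEOPLE_WITHOUT_ORDERS_SQL = """
    select sp.businessentityid
    from sales.salesperson as sp
    where not exists (
        select 1 from sales.salesorderheader as soh
        where soh.salespersonid = sp.businessentityid
    )
    order by sp.businessentityid
"""


def ids_without_orders(salesperson_ids, order_salesperson_ids):
    """Pure-Python equivalent: drop None/NaN, cast floats back to int, diff the sets."""
    # `x == x` is False only for NaN, which is how pandas shows a NULL here.
    seen = {int(x) for x in order_salesperson_ids if x is not None and x == x}
    return sorted(set(salesperson_ids) - seen)
```

Usage: `pd.read_sql(SALESPEOPLE_WITHOUT_ORDERS_SQL, conn)`.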

Finally, let's look at the full customer table:

```python
pd.read_sql("select * from sales.Customer;", conn)
```

| | customerid | personid | storeid | territoryid | rowguid | modifieddate |
|---|---|---|---|---|---|---|
| 0 | 1 | | 934.0 | 1 | 3f5ae95e-b87d-4aed-95b4-c3797afcb74f | 2014-09-12 11:15:07.263 |
| 1 | 2 | | 1028.0 | 1 | e552f657-a9af-4a7d-a645-c429d6e02491 | 2014-09-12 11:15:07.263 |
| 2 | 3 | | 642.0 | 4 | 130774b1-db21-4ef3-98c8-c104bcd6ed6d | 2014-09-12 11:15:07.263 |
| 3 | 4 | | 932.0 | 4 | ff862851-1daa-4044-be7c-3e85583c054d | 2014-09-12 11:15:07.263 |
| 4 | 5 | | 1026.0 | 4 | 83905bdc-6f5e-4f71-b162-c98da069f38a | 2014-09-12 11:15:07.263 |
| ... | ... | ... | ... | ... | ... | ... |
| 19815 | 30114 | 1985.0 | 1986.0 | 7 | 97154f3d-28f1-4b15-ae03-9518b781ece3 | 2014-09-12 11:15:07.263 |
| 19816 | 30115 | 1987.0 | 1988.0 | 6 | e4cf8fd5-30a4-4b8e-8fd8-47032e255778 | 2014-09-12 11:15:07.263 |
| 19817 | 30116 | 1989.0 | 1990.0 | 4 | ec409609-d25d-41b8-9d15-a1aa6e89fc77 | 2014-09-12 11:15:07.263 |
| 19818 | 30117 | 1991.0 | 1992.0 | 4 | 6f08e2fb-1cd3-4f6e-a2e6-385669598b19 | 2014-09-12 11:15:07.263 |

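Unlike the five-row query earlier, the full table shows `storeid` as floats (`934.0`). This suggests that `storeid`, like `personid` (note its empty cells), contains NULLs somewhere in the table. pandas' nullable integer dtype, for example `df["storeid"].astype("Int64")`, keeps such columns as integers. Rather than pulling nearly 20,000 rows to inspect the gaps, the database can count them, because `count(column)` skips NULLs. The query and the pure-Python `count_missing` below are a hypothetical sketch, and the test rows are toy data.

```python
# count(column) ignores NULLs, so count(*) - count(column) is the number of NULLs.
CUSTOMER_NULLS_SQL = """
    select count(*) as customers,
           count(*) - count(personid) as missing_personid,
           count(*) - count(storeid) as missing_storeid
    from sales.customer
"""


def count_missing(rows, column_index):
    """Pure-Python equivalent for one column of fetched rows (None stands for NULL)."""
    return sum(1 for row in rows if row[column_index] is None)
```

Usage: `pd.read_sql(CUSTOMER_NULLS_SQL, conn)` returns a single row with the three counts.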