# Lab | PostgreSQL queries

- Open pgAdmin and connect to your PostgreSQL server.
- Create a database called **applestore** and a table called **apple_table**. Use the code from `applesotre_DATABASE.sql` to create the table and insert the data.

Column descriptions, for reference:

- `id` : App ID
- `track_name`: App Name
- `size_bytes`: Size (in Bytes)
- `currency`: Currency Type
- `price`: Price amount
- `rating_count_tot`: User rating count (all versions)
- `rating_count_ver`: User rating count (current version)
- `user_rating`: Average user rating (all versions)
- `user_rating_ver`: Average user rating (current version)
- `ver`: Latest version code
- `cont_rating`: Content Rating
- `prime_genre`: Primary Genre
- `sup_devicesnum`: Number of supporting devices
- `ipadSc_urlsnum`: Number of screenshots shown for display
- `langnum`: Number of supported languages
- `vpp_lic`: Vpp Device Based Licensing Enabled

In [1]:
#importing libraries
import pandas as pd
import sqlalchemy as db

In [2]:
#creating connection with database

db_server = 'postgresql'
db_user = 'postgres'
db_password = 'admin'
db_host = 'localhost'
db_database = 'LAB-M01-L18'

# create the engine

engine = db.create_engine(f'{db_server}://{db_user}:{db_password}@{db_host}/{db_database}')

# open the connection

conn = engine.connect()
conn

# Close the connection
#conn.close()

<sqlalchemy.engine.base.Connection at 0x178cad8e820>
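
The f-string URL above works for the simple credentials used in this lab. If the password contained characters such as `@` or `/`, the URL would be parsed incorrectly. A minimal sketch of escaping it with the standard library is shown below; the password is made up for illustration and is not the one used here.

```python
from urllib.parse import quote_plus

db_server = 'postgresql'
db_user = 'postgres'
db_password = 'p@ss/word'  # hypothetical password, only for this illustration
db_host = 'localhost'
db_database = 'LAB-M01-L18'

# Percent-encode the password so '@' and '/' are not read as URL delimiters
safe_password = quote_plus(db_password)

url = f'{db_server}://{db_user}:{safe_password}@{db_host}/{db_database}'
print(url)
```

The resulting string can be passed to `db.create_engine` in place of the inline f-string.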

Answer the following questions using the **apple_table**:

**1. What are the different genres?**  

- Use `prime_genre` column.

```SQL
-- ---------------
-- query01
-- ---------------

SELECT
	distinct prime_genre
FROM
	apple_table
```

In [42]:
query01 = '''
SELECT
	distinct prime_genre
FROM
	apple_table
'''

print(list(pd.read_sql(query01, conn).values[:, 0]))

['Shopping', 'Games', 'Education', 'Reference', 'Business', 'Social Networking', 'Food & Drink', 'Sports', 'Catalogs', 'Weather', 'Book', 'Music', 'Entertainment', 'Medical', 'Utilities', 'Travel', 'Navigation', 'Photo & Video', 'Finance', 'Health & Fitness', 'News', 'Productivity', 'Lifestyle']


**2. Which is the genre with the highest number of ratings?**

- To sum the ratings, use the `rating_count_tot` column.

```SQL
-- ---------------
-- query02
-- ---------------
SELECT
	prime_genre,
	sum(rating_count_tot) as rating_count
FROM
	apple_table
GROUP BY
	prime_genre
ORDER BY rating_count DESC
LIMIT 1
```

In [3]:
query02 = '''
SELECT
	prime_genre,
	sum(rating_count_tot) as rating_count
FROM
	apple_table
GROUP BY
	prime_genre
ORDER BY rating_count DESC
LIMIT 1;

'''

pd.read_sql(query02, conn)

| | prime_genre | rating_count |
|---|---|---|
| 0 | Games | 52878491 |


**3. Which is the genre with most apps?**

- Use `prime_genre` column.

```SQL
-- ---------------
-- query03
-- ---------------
SELECT
	prime_genre,
	COUNT(id)
FROM
	apple_table
GROUP BY
	prime_genre
ORDER BY COUNT(id) DESC
LIMIT 1
```

In [4]:
query03 = '''
SELECT
    prime_genre,
    COUNT(id)
FROM
    apple_table
GROUP BY
    prime_genre
ORDER BY COUNT(id) DESC
LIMIT 1
'''

pd.read_sql(query03, conn)

| | prime_genre | count |
|---|---|---|
| 0 | Games | 3862 |
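
`LIMIT 1` returns a single row even if two genres were tied for first place. The sketch below shows one way to return every tied genre. It runs against an in-memory SQLite table with made-up rows so that it works without the PostgreSQL server; the name `demo_conn` and the data are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for apple_table with made-up rows (not the lab data)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE apple_table (id INTEGER, prime_genre TEXT)')
demo_conn.executemany(
    'INSERT INTO apple_table VALUES (?, ?)',
    [(1, 'Games'), (2, 'Games'), (3, 'Music'), (4, 'Music'), (5, 'Weather')],
)

# Keep every genre whose app count equals the maximum count
query_ties = '''
SELECT
    prime_genre,
    COUNT(id) AS app_count
FROM
    apple_table
GROUP BY
    prime_genre
HAVING COUNT(id) = (
    SELECT MAX(c)
    FROM (SELECT COUNT(id) AS c FROM apple_table GROUP BY prime_genre) AS counts
)
ORDER BY prime_genre
'''

tied_genres = demo_conn.execute(query_ties).fetchall()
print(tied_genres)  # [('Games', 2), ('Music', 2)]
```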


**4. Which is the genre with the fewest apps?**

- Use `prime_genre` column.

```SQL
-- ---------------
-- query04
-- ---------------
SELECT
	prime_genre,
	COUNT(id)
FROM
	apple_table
GROUP BY
	prime_genre
ORDER BY COUNT(id)
LIMIT 1
```

In [5]:
query04 = '''
SELECT
    prime_genre,
    COUNT(id)
FROM
    apple_table
GROUP BY
    prime_genre
ORDER BY COUNT(id)
LIMIT 1
'''

pd.read_sql(query04, conn)

| | prime_genre | count |
|---|---|---|
| 0 | Catalogs | 10 |


**5. Find the 10 most-rated apps.**

- Use the `track_name` and `rating_count_tot` columns.

```SQL
-- ---------------
-- query05
-- ---------------
SELECT
	track_name,
	rating_count_tot
FROM
	apple_table
ORDER BY rating_count_tot DESC
LIMIT 10
```

In [6]:
query05 = '''
SELECT
    track_name,
    rating_count_tot
FROM
    apple_table
ORDER BY rating_count_tot DESC
LIMIT 10
'''

pd.read_sql(query05, conn)

| | track_name | rating_count_tot |
|---|---|---|
| 0 | Facebook | 2974676 |
| 1 | Instagram | 2161558 |
| 2 | Clash of Clans | 2130805 |
| 3 | Temple Run | 1724546 |
| 4 | Pandora - Music & Radio | 1126879 |
| 5 | Pinterest | 1061624 |
| 6 | Bible | 985920 |
| 7 | Candy Crush Saga | 961794 |
| 8 | Spotify Music | 878563 |
| 9 | Angry Birds | 824451 |


**6. Find the 10 apps best rated by users.**

- Use the `track_name` and `user_rating` columns.

```SQL
-- ---------------
-- query06
-- ---------------
SELECT
	track_name,
	user_rating
FROM
	apple_table
ORDER BY user_rating DESC
LIMIT 10;
```

In [7]:
query06 = '''
SELECT
    track_name,
    user_rating
FROM
    apple_table
ORDER BY user_rating DESC
LIMIT 10
'''

pd.read_sql(query06, conn)

| | track_name | user_rating |
|---|---|---|
| 0 | Plants vs. Zombies HD | 5.0 |
| 1 | Flashlight Òã | 5.0 |
| 2 | TurboScanã¢ Pro - document & receipt scanner:... | 5.0 |
| 3 | Learn to Speak Spanish Fast With MosaLingua | 5.0 |
| 4 | The Photographer's Ephemeris | 5.0 |
| 5 | ÐÈSudoku + | 5.0 |
| 6 | :) Sudoku + | 5.0 |
| 7 | King of Dragon Pass | 5.0 |
| 8 | Plants vs. Zombies | 5.0 |
| 9 | Infinity Blade | 5.0 |


**7. Using the query from the previous exercise, add the `rating_count_tot` column.**

- You'll notice that some of the top-rated apps have very few reviews.
- Use the `track_name`, `user_rating` and `rating_count_tot` columns.

```SQL
-- ---------------
-- query07
-- ---------------
SELECT
	track_name,
	user_rating,
	rating_count_tot
FROM
	apple_table
ORDER BY user_rating DESC
LIMIT 10
```

In [8]:
query07 = '''
SELECT
    track_name,
    user_rating,
    rating_count_tot
FROM
    apple_table
ORDER BY user_rating DESC
LIMIT 10
'''

pd.read_sql(query07, conn)

| | track_name | user_rating | rating_count_tot |
|---|---|---|---|
| 0 | Plants vs. Zombies HD | 5.0 | 163598 |
| 1 | Flashlight Òã | 5.0 | 130450 |
| 2 | TurboScanã¢ Pro - document & receipt scanner:... | 5.0 | 28388 |
| 3 | Learn to Speak Spanish Fast With MosaLingua | 5.0 | 9 |
| 4 | The Photographer's Ephemeris | 5.0 | 663 |
| 5 | ÐÈSudoku + | 5.0 | 5397 |
| 6 | :) Sudoku + | 5.0 | 11447 |
| 7 | King of Dragon Pass | 5.0 | 882 |
| 8 | Plants vs. Zombies | 5.0 | 426463 |
| 9 | Infinity Blade | 5.0 | 326482 |
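
Since a 5.0 average built on a handful of reviews says little, one option is to require a minimum number of ratings before ranking. The sketch below illustrates the idea on an in-memory SQLite table with made-up rows; the threshold of 1000 and the name `demo_conn` are arbitrary assumptions, not part of the lab.

```python
import sqlite3

# In-memory stand-in for apple_table with made-up rows (not the lab data)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    'CREATE TABLE apple_table '
    '(track_name TEXT, user_rating REAL, rating_count_tot INTEGER)'
)
demo_conn.executemany(
    'INSERT INTO apple_table VALUES (?, ?, ?)',
    [
        ('App A', 5.0, 9),
        ('App B', 5.0, 12000),
        ('App C', 4.5, 300000),
        ('App D', 5.0, 700),
    ],
)

# Ignore apps with fewer than 1000 ratings (arbitrary cut-off for illustration)
query_min_reviews = '''
SELECT
    track_name,
    user_rating,
    rating_count_tot
FROM
    apple_table
WHERE
    rating_count_tot >= 1000
ORDER BY user_rating DESC
LIMIT 10
'''

well_reviewed = demo_conn.execute(query_min_reviews).fetchall()
print(well_reviewed)  # [('App B', 5.0, 12000), ('App C', 4.5, 300000)]
```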


**8. Now, find the top 10, ordering by rating and then by number of votes.**

- Use the `track_name`, `user_rating` and `rating_count_tot` columns.

```SQL
-- ---------------
-- query08
-- ---------------
SELECT
	track_name,
	user_rating,
	rating_count_tot
FROM
	apple_table
ORDER BY user_rating DESC, rating_count_tot DESC
LIMIT 10
```

In [9]:
query08 = '''
SELECT
    track_name,
    user_rating,
    rating_count_tot
FROM
    apple_table
ORDER BY user_rating DESC, rating_count_tot DESC
LIMIT 10
'''

pd.read_sql(query08, conn)

| | track_name | user_rating | rating_count_tot |
|---|---|---|---|
| 0 | Head Soccer | 5.0 | 481564 |
| 1 | Plants vs. Zombies | 5.0 | 426463 |
| 2 | Sniper 3D Assassin: Shoot to Kill Gun Game | 5.0 | 386521 |
| 3 | Geometry Dash Lite | 5.0 | 370370 |
| 4 | Infinity Blade | 5.0 | 326482 |
| 5 | Geometry Dash | 5.0 | 266440 |
| 6 | Domino's Pizza USA | 5.0 | 258624 |
| 7 | CSR Racing 2 | 5.0 | 257100 |
| 8 | Pictoword: Fun 2 Pics Guess What's the Word Tr... | 5.0 | 186089 |
| 9 | Plants vs. Zombies HD | 5.0 | 163598 |


**9. Find the total number of games available in more than 1 language.**

- Use the `track_name`, `langnum` and `prime_genre` columns.

```SQL
-- ---------------
-- query09
-- ---------------
SELECT
	COUNT(track_name)
FROM
    apple_table
WHERE
	prime_genre = 'Games' and
	langnum > 1
```

In [11]:
query09 = '''
SELECT
    COUNT(track_name)
FROM
    apple_table
WHERE
    prime_genre = 'Games' and
    langnum > 1
'''

pd.read_sql(query09, conn)

| | count |
|---|---|
| 0 | 1660 |


**10. Find the number of free vs paid apps.**

- Use the `price` column.
- You can use `CASE WHEN` to label each app as free or paid.

```SQL
-- ---------------
-- query10
-- ---------------
SELECT
	CASE WHEN
		price = 0 
			then 'free'
			else 'paid'
			end as price_policy,
	COUNT(id)
FROM
    apple_table
GROUP BY price_policy
```

In [12]:
query10 = '''
SELECT
    CASE WHEN
        price = 0 
            then 'free'
            else 'paid'
            end as price_policy,
    COUNT(id)
FROM
    apple_table
GROUP BY price_policy
'''

pd.read_sql(query10, conn)

| | price_policy | count |
|---|---|---|
| 0 | paid | 3141 |
| 1 | free | 4056 |
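
To see how the `CASE WHEN` label drives the grouping, here is a small sketch on an in-memory SQLite table with made-up prices. The name `demo_conn` and the rows are assumptions for illustration. An `ORDER BY` is added because, without one, the order of the groups is not guaranteed.

```python
import sqlite3

# In-memory stand-in for apple_table with made-up rows (not the lab data)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE apple_table (id INTEGER, price REAL)')
demo_conn.executemany(
    'INSERT INTO apple_table VALUES (?, ?)',
    [(1, 0.0), (2, 0.0), (3, 0.99), (4, 4.99), (5, 0.0)],
)

# Label each row first, then count the rows that share a label
query_policy = '''
SELECT
    CASE WHEN price = 0 THEN 'free' ELSE 'paid' END AS price_policy,
    COUNT(id) AS app_count
FROM
    apple_table
GROUP BY price_policy
ORDER BY price_policy
'''

policy_counts = demo_conn.execute(query_policy).fetchall()
print(policy_counts)  # [('free', 3), ('paid', 2)]
```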


**11. Find the number of free vs paid apps for each genre.**

- Use the `price` and `prime_genre` columns.
- You can use `CASE WHEN` to label each app as free or paid.

```SQL
-- ---------------
-- query11
-- ---------------
SELECT
	prime_genre,
	COUNT(CASE WHEN
			price = 0
			then id
			else null
			end) as free_app,
	COUNT(CASE WHEN
			price != 0
			then id
			else null
			end) as paid_app,
	COUNT(id)
FROM
    apple_table
GROUP BY prime_genre
ORDER BY COUNT(id) DESC;
```

In [13]:
query11 = '''
SELECT
    prime_genre,
    COUNT(CASE WHEN
            price = 0
            then id
            else null
            end) as free_app,
    COUNT(CASE WHEN
            price != 0
            then id
            else null
            end) as paid_app,
    COUNT(id)
FROM
    apple_table
GROUP BY prime_genre
ORDER BY COUNT(id) DESC;
'''

pd.read_sql(query11, conn)

| | prime_genre | free_app | paid_app | count |
|---|---|---|---|---|
| 0 | Games | 2257 | 1605 | 3862 |
| 1 | Entertainment | 334 | 201 | 535 |
| 2 | Education | 132 | 321 | 453 |
| 3 | Photo & Video | 167 | 182 | 349 |
| 4 | Utilities | 109 | 139 | 248 |
| 5 | Health & Fitness | 76 | 104 | 180 |
| 6 | Productivity | 62 | 116 | 178 |
| 7 | Social Networking | 143 | 24 | 167 |
| 8 | Lifestyle | 94 | 50 | 144 |
| 9 | Music | 67 | 71 | 138 |
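
The two conditional counts work because `COUNT` skips `NULL`: a `CASE` expression with no matching branch yields `NULL`, so each count only sees the rows that meet its condition. The sketch below checks this on an in-memory SQLite table with made-up rows; `demo_conn`, the alias `total_app` and the data are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for apple_table with made-up rows (not the lab data)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute(
    'CREATE TABLE apple_table (id INTEGER, prime_genre TEXT, price REAL)'
)
demo_conn.executemany(
    'INSERT INTO apple_table VALUES (?, ?, ?)',
    [
        (1, 'Games', 0.0),
        (2, 'Games', 1.99),
        (3, 'Games', 0.0),
        (4, 'Music', 0.99),
        (5, 'Music', 0.0),
    ],
)

# A CASE without ELSE returns NULL when the condition fails, and COUNT ignores NULL
query_by_genre = '''
SELECT
    prime_genre,
    COUNT(CASE WHEN price = 0 THEN id END) AS free_app,
    COUNT(CASE WHEN price != 0 THEN id END) AS paid_app,
    COUNT(id) AS total_app
FROM
    apple_table
GROUP BY prime_genre
ORDER BY total_app DESC
'''

genre_counts = demo_conn.execute(query_by_genre).fetchall()
print(genre_counts)  # [('Games', 2, 1, 3), ('Music', 1, 1, 2)]

# Sanity check: free and paid apps add up to the total in every genre
for genre, free_app, paid_app, total_app in genre_counts:
    assert free_app + paid_app == total_app
```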
