# SQL analysis part 2

Download this Jupyter Notebook and solve the tasks by inserting the SQL queries (the tasks cannot be solved in Colab).

Example query (we include `df_example` at the end of the code cell to print the result):

```Python
df_example = pd.read_sql("""
    SELECT *
    FROM ecommerce_data;
""", engine)

df_example
```

## Setup

```Python
import os
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv
```

## Data

Connect to your MySQL database `db_ecommerce` (make sure your `.env` file defines `DB_URL`).

```Python
load_dotenv()  # take environment variables from .env

engine = create_engine(
    "mysql+pymysql://" + os.environ['DB_URL'] + "/db_ecommerce",
    pool_pre_ping=True,
    pool_recycle=300,
)
```
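The connection code expects `DB_URL` in `.env` to hold the part of the SQLAlchemy URL between `mysql+pymysql://` and `/db_ecommerce`, i.e. something of the form `user:password@host:port`. If the password contains characters such as `@` or `:`, they must be percent-encoded. Otherwise the URL is parsed incorrectly. Here is a minimal sketch using the standard library's `urllib.parse.quote_plus`. All credentials below are made up for illustration:

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only; in the notebook these
# belong in the .env file, not in code.
user = "analyst"
password = "p@ss:word"  # contains "@" and ":", which must be URL-encoded
host = "localhost:3306"

# Percent-encode the password so "@" and ":" are not read as URL delimiters.
db_url = f"{user}:{quote_plus(password)}@{host}"

# Same string concatenation as in the create_engine call of the notebook.
connection_string = "mysql+pymysql://" + db_url + "/db_ecommerce"
print(connection_string)
# mysql+pymysql://analyst:p%40ss%3Aword@localhost:3306/db_ecommerce
```

In practice you would store the encoded `db_url` value as `DB_URL` in `.env` rather than hard-coding credentials.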

```Python
# Load the CSV and use pandas' to_sql to write it to the ecommerce_data table
# queried in the tasks (if_exists='replace' drops and recreates an existing table)
df = pd.read_csv('https://raw.githubusercontent.com/kirenz/lab-competitive/main/code/ecommerce.csv')
df.to_sql('ecommerce_data', engine, if_exists='replace')
```

108

## Task 1

Get the count of E-shops that have an average rating more than 6.5. Use the alias `eshop_count`.


```Python
df1 = pd.read_sql("""
    SELECT COUNT(DISTINCT eshop_name) as eshop_count
    FROM ecommerce_data
    WHERE average_rating > 6.5;
""", engine)

df1
```



|   | eshop_count |
|---|---|
| 0 | 2 |
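`COUNT(DISTINCT eshop_name)` matters here because the table contains many rows per e-shop. The `WHERE` clause filters individual rows, so an e-shop is counted if at least one of its rows has an `average_rating` above 6.5. The following minimal sketch uses Python's built-in `sqlite3` module as a stand-in for the MySQL database, with a few made-up rows, to contrast `COUNT(*)` with `COUNT(DISTINCT ...)`:

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; the rows are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ecommerce_data (eshop_name TEXT, average_rating REAL)")
con.executemany(
    "INSERT INTO ecommerce_data VALUES (?, ?)",
    [("E-ShopA", 7.0), ("E-ShopA", 8.1), ("E-ShopB", 6.9), ("E-ShopC", 5.2)],
)

# COUNT(*) counts matching rows; COUNT(DISTINCT ...) counts each shop once.
rows, shops = con.execute(
    """
    SELECT COUNT(*) AS row_count, COUNT(DISTINCT eshop_name) AS eshop_count
    FROM ecommerce_data
    WHERE average_rating > 6.5
    """
).fetchone()
print(rows, shops)  # 3 2
con.close()
```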


## Task 2

Get the sum of annual revenue grouped by eshop_name. Use the alias `total_revenue`.

```Python
df2 = pd.read_sql("""
    SELECT eshop_name, SUM(annual_revenue) as total_revenue
    FROM ecommerce_data
    GROUP BY eshop_name;
""", engine)

df2
```


|   | eshop_name | total_revenue |
|---|---|---|
| 0 | E-ShopA | 1205.4 |
| 1 | E-ShopB | 1052.02 |
| 2 | E-ShopC | 1102.5 |
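`GROUP BY` collapses all rows that share the same `eshop_name` into a single output row, and `SUM()` is evaluated separately for each group. The following minimal sketch uses Python's built-in `sqlite3` module, with a few made-up rows standing in for the MySQL table:

```python
import sqlite3

# In-memory stand-in table with hypothetical revenue values.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ecommerce_data (eshop_name TEXT, annual_revenue REAL)")
con.executemany(
    "INSERT INTO ecommerce_data VALUES (?, ?)",
    [("E-ShopA", 10.0), ("E-ShopA", 20.0), ("E-ShopB", 5.0)],
)

# One output row per shop; SUM() adds up the rows within each group.
result = con.execute(
    """
    SELECT eshop_name, SUM(annual_revenue) AS total_revenue
    FROM ecommerce_data
    GROUP BY eshop_name
    ORDER BY eshop_name
    """
).fetchall()
print(result)  # [('E-ShopA', 30.0), ('E-ShopB', 5.0)]
con.close()
```

The `ORDER BY` is included because SQL does not guarantee the order of grouped rows without one.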


## Task 3

Get the maximum and minimum number of social_media_followers by eshop_name. Use the aliases `max_followers` and `min_followers`.

```Python
df3 = pd.read_sql("""
    SELECT eshop_name, MAX(social_media_followers) as max_followers, MIN(social_media_followers) as min_followers
    FROM ecommerce_data
    GROUP BY eshop_name;
""", engine)

df3
```


|   | eshop_name | max_followers | min_followers |
|---|---|---|---|
| 0 | E-ShopA | 1417.39 | 52.69 |
| 1 | E-ShopB | 1392.46 | 50.92 |
| 2 | E-ShopC | 1529.19 | 49.89 |
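`MAX()` and `MIN()` can be computed in the same `SELECT` because both are evaluated over the same groups. The following sketch again uses `sqlite3` with hypothetical rows. It also shows that for a group containing only one row, the maximum and minimum are identical:

```python
import sqlite3

# In-memory stand-in table with made-up follower counts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ecommerce_data (eshop_name TEXT, social_media_followers REAL)")
con.executemany(
    "INSERT INTO ecommerce_data VALUES (?, ?)",
    [("E-ShopA", 120.5), ("E-ShopA", 80.0), ("E-ShopA", 99.9), ("E-ShopB", 60.0)],
)

# Both aggregates are evaluated per eshop_name group in a single pass.
result = con.execute(
    """
    SELECT eshop_name,
           MAX(social_media_followers) AS max_followers,
           MIN(social_media_followers) AS min_followers
    FROM ecommerce_data
    GROUP BY eshop_name
    ORDER BY eshop_name
    """
).fetchall()
print(result)  # [('E-ShopA', 120.5, 80.0), ('E-ShopB', 60.0, 60.0)]
con.close()
```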


## Task 4

Get the average annual revenue by eshop_name and include only groups whose average annual revenue exceeds 30000. Use the alias `average_revenue`. Show the average revenue with all digits (e.g. 30000 instead of 30), since `annual_revenue` is recorded in thousands.


```Python
df4 = pd.read_sql("""
    SELECT eshop_name, AVG(annual_revenue)*1000 as average_revenue
    FROM ecommerce_data
    GROUP BY eshop_name
    HAVING average_revenue > 30000;
""", engine)

df4
```



|   | eshop_name | average_revenue |
|---|---|---|
| 0 | E-ShopA | 33483.333333 |
| 1 | E-ShopC | 30625.0 |
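`WHERE` filters individual rows before grouping, while `HAVING` filters whole groups after aggregation. Referencing the alias `average_revenue` inside `HAVING`, as the query for this task does, is a MySQL extension and not standard SQL. The portable form repeats the aggregate expression. The following minimal sketch uses `sqlite3` and made-up revenue values (in thousands):

```python
import sqlite3

# In-memory stand-in table; annual_revenue values are hypothetical, in thousands.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ecommerce_data (eshop_name TEXT, annual_revenue REAL)")
con.executemany(
    "INSERT INTO ecommerce_data VALUES (?, ?)",
    [("E-ShopA", 32.0), ("E-ShopA", 34.0), ("E-ShopB", 20.0), ("E-ShopB", 28.0)],
)

# Scale the average by 1000 and keep only groups above 30000.
# The aggregate is repeated in HAVING instead of using the alias.
result = con.execute(
    """
    SELECT eshop_name, AVG(annual_revenue) * 1000 AS average_revenue
    FROM ecommerce_data
    GROUP BY eshop_name
    HAVING AVG(annual_revenue) * 1000 > 30000
    """
).fetchall()
print(result)  # [('E-ShopA', 33000.0)]
con.close()
```

E-ShopB averages 24.0 (24000 after scaling), so the `HAVING` clause drops its group.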


## Task 5

Get the total annual revenue and social media followers by eshop_name, including rollup. Use the aliases `total_revenue` and `total_followers`.


```Python
df5 = pd.read_sql("""
    SELECT eshop_name, SUM(annual_revenue) as total_revenue, SUM(social_media_followers) as total_followers
    FROM ecommerce_data
    GROUP BY eshop_name WITH ROLLUP;
""", engine)

df5
```


|   | eshop_name | total_revenue | total_followers |
|---|---|---|---|
| 0 | E-ShopA | 1205.4 | 21938.24 |
| 1 | E-ShopB | 1052.02 | 19253.83 |
| 2 | E-ShopC | 1102.5 | 20042.07 |
| 3 | None | 3359.92 | 61234.14 |
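`WITH ROLLUP` is a MySQL modifier for `GROUP BY` that appends super-aggregate rows. With a single grouping column, this adds one grand-total row whose `eshop_name` is `NULL` (the last row of the output, shown as `None` by pandas). Not every database supports this syntax. SQLite, for example, does not. The following minimal sketch emulates the same result with `UNION ALL` in Python's `sqlite3`, using hypothetical rows. The helper column `is_total` is an illustrative addition used only for sorting:

```python
import sqlite3

# In-memory stand-in table with made-up values.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ecommerce_data "
    "(eshop_name TEXT, annual_revenue REAL, social_media_followers REAL)"
)
con.executemany(
    "INSERT INTO ecommerce_data VALUES (?, ?, ?)",
    [("E-ShopA", 10.0, 100.0), ("E-ShopA", 20.0, 50.0), ("E-ShopB", 5.0, 25.0)],
)

# Per-shop totals UNION ALL one grand-total row with NULL as eshop_name,
# mirroring what MySQL's WITH ROLLUP produces for one grouping column.
result = con.execute(
    """
    SELECT eshop_name, total_revenue, total_followers FROM (
        SELECT eshop_name,
               SUM(annual_revenue) AS total_revenue,
               SUM(social_media_followers) AS total_followers,
               0 AS is_total
        FROM ecommerce_data
        GROUP BY eshop_name
        UNION ALL
        SELECT NULL, SUM(annual_revenue), SUM(social_media_followers), 1
        FROM ecommerce_data
    )
    ORDER BY is_total, eshop_name
    """
).fetchall()
print(result)
# [('E-ShopA', 30.0, 150.0), ('E-ShopB', 5.0, 25.0), (None, 35.0, 175.0)]
con.close()
```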


## Close the connection

```Python
# Dispose of the engine's connection pool, closing its database connections
engine.dispose()
```