# Assignment SQL

### Problem Statement

You’ve just joined the Data Analytics & Strategy Team at Retail Insights Inc., a fast-growing global retail company headquartered in New York City. The company sells a wide variety of products, from electronics and office supplies to furniture and home goods, across North America, Europe, and Asia-Pacific.

But lately, something’s not quite right.

The executive team is concerned that sales are dropping in certain regions, and customer loyalty seems to be slipping. With a big strategy meeting coming up in two weeks, senior leadership needs insights fast.

Your team has been asked to conduct a deep dive using SQL and Python notebooks to answer the following big questions:
- What are the top 5 best-selling products in the last year?
- Which customer segment brings the most revenue?
- Which regions are underperforming based on total sales?
- What is the average shipping time, and does it vary by region?
- Which customers have placed more than 10 orders?
- Are there shipping issues causing customer churn?
- What product lines are driving (or dragging) revenue?
- Are there any bottlenecks in the supply chain or fulfillment pipeline?
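To show how these questions translate into SQL, here is a minimal sketch for the segment and region questions. It assumes the column names that appear in the `df.head()` preview of the Superstore CSV (`Segment`, `Market2`, `Sales`) and treats `Market2` as the region field, which is an assumption. The sample rows are made up, so the totals say nothing about the real data.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows mimicking three Superstore columns from the preview.
# Assumption: Market2 is used as the "region" dimension.
sample = pd.DataFrame({
    "Segment": ["Consumer", "Consumer", "Corporate", "Home Office", "Corporate"],
    "Market2": ["North America", "Europe", "Asia Pacific", "North America", "Europe"],
    "Sales": [19, 111, 250, 40, 60],
})

conn = sqlite3.connect(":memory:")
sample.to_sql("superstore", conn, index=False)

# Which customer segment brings the most revenue? Highest total first.
segment_revenue = pd.read_sql_query(
    """
    SELECT Segment, SUM(Sales) AS total_sales
    FROM superstore
    GROUP BY Segment
    ORDER BY total_sales DESC
    """,
    conn,
)

# Which regions are underperforming? Ascending order puts the weakest first.
region_revenue = pd.read_sql_query(
    """
    SELECT Market2 AS region, SUM(Sales) AS total_sales
    FROM superstore
    GROUP BY Market2
    ORDER BY total_sales ASC
    """,
    conn,
)

print(segment_revenue)
print(region_revenue)
```

The same two queries run unchanged against the full `superstore` table; only the connection changes.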



### Your Role

You’re a Junior Data Analyst, and this is your first major project. You’ll work alongside a data scientist, a business analyst, and the head of operations.

You’ve been given access to the company’s internal database system, and you’ll be using a Jupyter Notebook connected to a SQL database (either SQLite or MySQL) to explore, analyze, and visualize the data.


### Tools & Tech Stack
- Database: SQLite or MySQL (preloaded)
- Python Notebook Environment: Jupyter via Anaconda
- SQL Libraries: ipython-sql, pymysql, sqlite3
- Dataset: Global Superstore dataset (Orders, Customers, Products, Regions, etc.)


### Assignment Instructions

You’ll be expected to:
1. Complete the Jupyter Notebook
2. Connect it to the database using SQL magic commands
3. Run SQL queries to explore the dataset
4. Document your process using markdown cells
5. Draw conclusions from your findings, explaining them as if preparing a report for the executive team


### Deliverables
1. Clean, well-documented Jupyter Notebook
2. SQL queries with markdown explanations
3. Final markdown section titled: “Insights for the Executive Team” – Summarize your top 3 findings



In [10]:
import pandas as pd
import sqlite3

In [11]:
df = pd.read_csv("superstore.csv")
df.head()

| Unnamed: 0 | Category | City | Country | Customer.ID | Customer.Name | Discount | Market | 记录数 | Order.Date | Order.ID | ... | Sales | Segment | Ship.Date | Ship.Mode | Shipping.Cost | State | Sub.Category | Year | Market2 | weeknum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Office Supplies | Los Angeles | United States | LS-172304 | Lycoris Saunders | 0.0 | US | 1 | 2011-01-07 00:00:00.000 | CA-2011-130813 | ... | 19 | Consumer | 2011-01-09 00:00:00.000 | Second Class | 4.37 | California | Paper | 2011 | North America | 2 |
| 1 | Office Supplies | Los Angeles | United States | MV-174854 | Mark Van Huff | 0.0 | US | 1 | 2011-01-21 00:00:00.000 | CA-2011-148614 | ... | 19 | Consumer | 2011-01-26 00:00:00.000 | Standard Class | 0.94 | California | Paper | 2011 | North America | 4 |
| 2 | Office Supplies | Los Angeles | United States | CS-121304 | Chad Sievert | 0.0 | US | 1 | 2011-08-05 00:00:00.000 | CA-2011-118962 | ... | 21 | Consumer | 2011-08-09 00:00:00.000 | Standard Class | 1.81 | California | Paper | 2011 | North America | 32 |
| 3 | Office Supplies | Los Angeles | United States | CS-121304 | Chad Sievert | 0.0 | US | 1 | 2011-08-05 00:00:00.000 | CA-2011-118962 | ... | 111 | Consumer | 2011-08-09 00:00:00.000 | Standard Class | 4.59 | California | Paper | 2011 | North America | 32 |
| 4 | Office Supplies | Los Angeles | United States | AP-109154 | Arthur Prichep | 0.0 | US | 1 | 2011-09-29 00:00:00.000 | CA-2011-146969 | ... | 6 | Consumer | 2011-10-03 00:00:00.000 | Standard Class | 1.32 | California | Paper | 2011 | North America | 40 |
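Rows 2 and 3 of the preview share `Order.ID` CA-2011-118962, which suggests each row is an order line item rather than a whole order. If that holds, the "more than 10 orders" question should count distinct `Order.ID` values per customer, not rows. A minimal sketch on hypothetical data (the customer and order IDs are invented):

```python
import sqlite3

import pandas as pd

# Hypothetical line items: customer C-1 has 12 distinct orders (one of them
# has two line items); customer C-2 has 11 line items spread over 6 orders.
rows = [("C-1", f"O-{i}") for i in range(12)] + [("C-1", "O-0")]
rows += [("C-2", f"P-{i % 6}") for i in range(11)]
items = pd.DataFrame(rows, columns=["Customer.ID", "Order.ID"])

conn = sqlite3.connect(":memory:")
items.to_sql("superstore", conn, index=False)

# Column names containing dots must be double-quoted in SQLite.
# COUNT(DISTINCT ...) counts orders; COUNT(*) would count line items.
frequent = pd.read_sql_query(
    """
    SELECT "Customer.ID" AS customer_id,
           COUNT(DISTINCT "Order.ID") AS n_orders,
           COUNT(*) AS n_line_items
    FROM superstore
    GROUP BY "Customer.ID"
    HAVING COUNT(DISTINCT "Order.ID") > 10
    ORDER BY n_orders DESC
    """,
    conn,
)
print(frequent)
```

Customer C-2 would pass a naive `COUNT(*) > 10` filter with 11 line items, yet has only 6 distinct orders, so it is correctly excluded here.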


In [12]:
conn = sqlite3.connect(":memory:")  # or use 'superstore.db' to persist
df.to_sql("superstore", conn, index=False, if_exists="replace")

51290
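`to_sql` reports the number of rows written (51290 here). With the table loaded, queries can also be run through `pd.read_sql_query` on the same `conn`. Below is a sketch for the shipping-time question. It assumes `Order.Date` and `Ship.Date` stay as text in the `YYYY-MM-DD HH:MM:SS.SSS` form shown in the preview, which SQLite's `julianday()` accepts, and again uses `Market2` as a stand-in for region. The rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical rows in the preview's date format; read_csv leaves these as text.
orders = pd.DataFrame({
    "Market2": ["North America", "North America", "Europe"],
    "Order.Date": ["2011-01-07 00:00:00.000", "2011-01-21 00:00:00.000",
                   "2011-08-05 00:00:00.000"],
    "Ship.Date": ["2011-01-09 00:00:00.000", "2011-01-26 00:00:00.000",
                  "2011-08-09 00:00:00.000"],
})

conn = sqlite3.connect(":memory:")
orders.to_sql("superstore", conn, index=False)

# julianday() converts each timestamp to a (fractional) day number,
# so the difference is the shipping time in days.
ship_times = pd.read_sql_query(
    """
    SELECT Market2 AS region,
           AVG(julianday("Ship.Date") - julianday("Order.Date")) AS avg_ship_days,
           COUNT(*) AS n_rows
    FROM superstore
    GROUP BY Market2
    ORDER BY avg_ship_days DESC
    """,
    conn,
)
print(ship_times)
```

Because the table appears to hold one row per line item, this average weights multi-item orders more heavily; averaging over distinct `Order.ID` values first would give a per-order figure.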

In [None]:
# Load the SQL magic extension (provided by the ipython-sql package) so that
# %sql and %%sql can run queries directly in the notebook.
# If this fails, install ipython-sql (plus pymysql if connecting to MySQL).

%load_ext sql

The sql module is not an IPython extension.
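This message indicates the `sql` extension did not load, so `%sql` is unavailable at this point. As a fallback that needs only pandas and sqlite3, each query can go through a small wrapper; `run_sql` below is a hypothetical helper, not part of ipython-sql. The example ranks product lines for the most recent year in the data. The product-name column falls in the part of the preview elided by `...`, so this sketch uses `Sub.Category` as a stand-in and reads "last year" as the latest `Year` value; both are assumptions, and the rows are invented.

```python
import sqlite3

import pandas as pd


def run_sql(query: str, conn: sqlite3.Connection) -> pd.DataFrame:
    """Hypothetical stand-in for the %sql magic: run a query, return a DataFrame."""
    return pd.read_sql_query(query, conn)


# Hypothetical rows using the Category, Sub.Category, Sales and Year columns.
sample = pd.DataFrame({
    "Category": ["Office Supplies", "Office Supplies", "Furniture",
                 "Technology", "Furniture"],
    "Sub.Category": ["Paper", "Binders", "Chairs", "Phones", "Chairs"],
    "Sales": [19, 300, 120, 500, 80],
    "Year": [2014, 2014, 2014, 2013, 2014],
})
conn = sqlite3.connect(":memory:")
sample.to_sql("superstore", conn, index=False)

# Top product lines by revenue, restricted to the latest year in the table.
top_lines = run_sql(
    """
    SELECT Category, "Sub.Category" AS sub_category, SUM(Sales) AS total_sales
    FROM superstore
    WHERE Year = (SELECT MAX(Year) FROM superstore)
    GROUP BY Category, "Sub.Category"
    ORDER BY total_sales DESC
    LIMIT 5
    """,
    conn,
)
print(top_lines)
```

Reversing the sort order (`ASC`) lists the product lines dragging revenue instead.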


In [15]:
%pip install pymysql ipython-sql

Collecting pymysql
  Downloading PyMySQL-1.1.1-py3-none-any.whl.metadata (4.4 kB)
Collecting ipython-sql
  Downloading ipython_sql-0.5.0-py3-none-any.whl.metadata (17 kB)
Collecting prettytable (from ipython-sql)
  Downloading prettytable-3.16.0-py3-none-any.whl.metadata (33 kB)
Collecting sqlalchemy>=2.0 (from ipython-sql)
  Downloading sqlalchemy-2.0.41-cp311-cp311-macosx_11_0_arm64.whl.metadata (9.6 kB)
Collecting sqlparse (from ipython-sql)
  Using cached sqlparse-0.5.3-py3-none-any.whl.metadata (3.9 kB)
Collecting ipython-genutils (from ipython-sql)
  Downloading ipython_genutils-0.2.0-py2.py3-none-any.whl.metadata (755 bytes)
Downloading PyMySQL-1.1.1-py3-none-any.whl (44 kB)
Downloading ipython_sql-0.5.0-py3-none-any.whl (20 kB)
Downloading sqlalchemy-2.0.41-cp311-cp311-macosx_11_0_arm64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 1.6 MB/s eta 0:00:00
Downloading ipython_genutils-0.2.0-py2.py