
# Practice SQL with Pandas pt. 2


---

We've learned about relational databases and SQL, the language most of them use for queries.

In this lab we'll get more practice loading data into a SQL database, querying it, and then analyzing the results with Python.

In [1]:
```python
# Necessary Libraries
import pandas as pd
import sqlite3
from pandas.io import sql
```

#### 1.  Read in the EuroMart CSV Data.
- 'EuroMart-ListOfOrders.csv'
- 'EuroMart-OrderBreakdown.csv'
- 'EuroMart-SalesTargets.csv'

In [180]:
```python
# Reading CSV to Dataframe
orders = pd.read_csv('./datasets/csv/EuroMart-ListOfOrders.csv', encoding='utf-8')
obd = pd.read_csv('./datasets/csv/EuroMart-OrderBreakdown.csv', encoding='utf-8')
sales_targets = pd.read_csv('./datasets/csv/EuroMart-SalesTargets.csv', encoding='utf-8')
```

In [181]:
```python
orders.head(3)
```

| | Order ID | Order Date | Customer Name | City | Country | Region | Segment | Ship Date | Ship Mode | State |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | 1/1/2011 | Ruby Patel | Stockholm | Sweden | North | Home Office | 1/5/2011 | Economy Plus | Stockholm |
| 1 | AZ-2011-9050313 | 1/3/2011 | Summer Hayward | Southport | United Kingdom | North | Consumer | 1/7/2011 | Economy | England |
| 2 | AZ-2011-6674300 | 1/4/2011 | Devin Huddleston | Valence | France | Central | Consumer | 1/8/2011 | Economy | Auvergne-Rhône-Alpes |


In [182]:
```python
obd.head(3)
```

| | Order ID | Product Name | Discount | Sales | Profit | Quantity | Category | Sub-Category |
|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | Enermax Note Cards, Premium | 0.5 | $45.00 | -$26.00 | 3 | Office Supplies | Paper |
| 1 | AZ-2011-9050313 | Dania Corner Shelving, Traditional | 0.0 | $854.00 | $290.00 | 7 | Furniture | Bookcases |
| 2 | AZ-2011-6674300 | Binney & Smith Sketch Pad, Easy-Erase | 0.0 | $140.00 | $21.00 | 3 | Office Supplies | Art |


In [183]:
```python
sales_targets.head(3)
```

| | Month of Order Date | Category | Target |
|---|---|---|---|
| 0 | Jan-11 | Furniture | $10,000.00 |
| 1 | Feb-11 | Furniture | $10,100.00 |
| 2 | Mar-11 | Furniture | $10,300.00 |


#### 2. Rename columns to remove any spaces.

In [184]:
```python
obd.columns
```
```
Index([u'Order ID', u'Product Name', u'Discount', u'Sales', u'Profit',
       u'Quantity', u'Category', u'Sub-Category'],
      dtype='object')
```

In [185]:
```python
# A:
# Replace the spaces in every column name with underscores
for df in (orders, obd, sales_targets):
    df.columns = [col.replace(' ', '_') for col in df.columns]
```

In [186]:
```python
obd.head(1)
```

| | Order_ID | Product_Name | Discount | Sales | Profit | Quantity | Category | Sub-Category |
|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | Enermax Note Cards, Premium | 0.5 | $45.00 | -$26.00 | 3 | Office Supplies | Paper |


#### 3. Remove dollar signs from sales and profit columns in the order breakdown dataframe.

Convert the columns to float.

In [187]:
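```python
# A:
# Strip "$" and thousands separators (",") before converting to float;
# the leading minus sign in values such as "-$26.00" is kept
obd['Sales'] = obd['Sales'].apply(lambda x: float(x.replace('$', '').replace(',', '')))
```

In [188]:
```python
obd['Sales'].dtypes
```
```
dtype('float64')
```

In [189]:
```python
obd['Profit'] = obd['Profit'].apply(lambda x: float(x.replace('$', '').replace(',', '')))
```

In [190]:
```python
obd['Profit'].dtypes
```
```
dtype('float64')
```

In [191]:
```python
obd.dtypes
```
```
Order_ID         object
Product_Name     object
Discount        float64
Sales           float64
Profit          float64
Quantity          int64
Category         object
Sub-Category     object
dtype: object
```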
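
The `apply` with a `lambda` above works one value at a time. A vectorized alternative removes both characters with a single regular expression and then casts the whole column. Here is a minimal sketch on a few hypothetical values formatted like the `Sales` and `Profit` columns:

```python
import pandas as pd

# Hypothetical currency strings in the same format as the EuroMart Sales/Profit columns
money = pd.Series(['$45.00', '-$26.00', '$1,234.50'])

# Remove every "$" and "," in one vectorized pass, then convert to float64
parsed = money.str.replace(r'[$,]', '', regex=True).astype(float)
print(parsed.tolist())  # [45.0, -26.0, 1234.5]
```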

#### 4. Create a SQL Database called 'EuroMart' and save the three dataframes as SQL tables. 

In [192]:
```python
# Establishing Local DB connection
db_connection = sqlite3.connect('./datasets/sql/EuroMart.db.sqlite')
```

In [193]:
```python
# A:
orders.to_sql(name='orders', con=db_connection, if_exists='replace', index=False)
obd.to_sql(name='obd', con=db_connection, if_exists='replace', index=False)
sales_targets.to_sql(name='sales_targets', con=db_connection, if_exists='replace', index=False)
```

In [194]:
```python
def Q(query, db=db_connection):
    """Run a SQL query against the EuroMart database and return the result as a DataFrame."""
    return sql.read_sql(query, db)
```
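
`Q` sends the query text exactly as written. When a filter value such as a country name comes from a Python variable, passing it as a query parameter is safer than building the string by hand. `pd.read_sql` forwards `params` to the database driver, and `sqlite3` fills in `?` placeholders. The sketch below uses a hypothetical `Q_params` helper and a tiny in-memory table standing in for `orders`:

```python
import sqlite3

import pandas as pd

# Tiny in-memory stand-in for the EuroMart orders table (hypothetical rows)
conn = sqlite3.connect(':memory:')
pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
              'Country': ['Sweden', 'France', 'Sweden']}).to_sql('orders', conn, index=False)


def Q_params(query, params=None, db=conn):
    """Hypothetical variant of Q that forwards query parameters to the driver."""
    return pd.read_sql(query, db, params=params)


# sqlite3 substitutes the "?" placeholder; no string formatting is involved
sweden_orders = Q_params('SELECT "Order_ID" FROM orders WHERE "Country" = ? ORDER BY "Order_ID"',
                         params=('Sweden',))
print(sweden_orders['Order_ID'].tolist())  # ['A-1', 'A-3']
```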

#### 5. How many orders has each Customer placed? 

In [60]:
```python
# A:
Q('SELECT "Customer_Name", COUNT("Customer_Name") AS "number_of_orders" FROM "orders" GROUP BY "Customer_Name"').head()
```

| | Customer_Name | number_of_orders |
|---|---|---|
| 0 | Aaron Bootman | 11 |
| 1 | Aaron Cunningham | 8 |
| 2 | Aaron Davey | 4 |
| 3 | Aaron Macrossan | 1 |
| 4 | Abbie Perry | 4 |


> *If you doubt your output, check it using Pandas.*
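
One way to run that check is to repeat the count with `groupby` and compare it to the SQL result. Here is a minimal sketch on hypothetical rows, using an in-memory SQLite database so it runs on its own:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the orders table
orders_demo = pd.DataFrame({
    'Order_ID': ['A-1', 'A-2', 'A-3', 'A-4'],
    'Customer_Name': ['Ruby Patel', 'Mary Parker', 'Ruby Patel', 'Ruby Patel'],
})
conn = sqlite3.connect(':memory:')
orders_demo.to_sql('orders', conn, index=False)

# SQL count, sorted so it lines up with the (sorted) groupby result
sql_counts = pd.read_sql(
    'SELECT "Customer_Name", COUNT("Customer_Name") AS "number_of_orders" '
    'FROM "orders" GROUP BY "Customer_Name" ORDER BY "Customer_Name"', conn)

# Pandas count: size() returns the number of rows in each group
pandas_counts = (orders_demo.groupby('Customer_Name').size()
                 .rename('number_of_orders')
                 .reset_index())

# The two results should match row for row
assert sql_counts.values.tolist() == pandas_counts.values.tolist()
print(pandas_counts)
```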

#### 6. Create a query that returns only the geographic features from the List of Orders table.

In [68]:
```python
# A:
Q('SELECT "City", "Country", "Region", "State" FROM "orders"').head()
```

| | City | Country | Region | State |
|---|---|---|---|---|
| 0 | Stockholm | Sweden | North | Stockholm |
| 1 | Southport | United Kingdom | North | England |
| 2 | Valence | France | Central | Auvergne-Rhône-Alpes |
| 3 | Birmingham | United Kingdom | North | England |
| 4 | Echirolles | France | Central | Auvergne-Rhône-Alpes |


#### 7. Create a query that returns all of the orders with a negative profit from the Order Breakdown table.

In [71]:
```python
# A:
Q('SELECT "Order_ID", "Profit" FROM "obd" WHERE "Profit" < 0').head(3)
```

| | Order_ID | Profit |
|---|---|---|
| 0 | BN-2011-7407039 | -26.0 |
| 1 | BN-2011-2819714 | -22.0 |
| 2 | BN-2011-2819714 | -1.0 |


#### 8. Construct a query to return a table with the Customer Name and Product Name.  

> **Note:** This will require a join!

In [73]:
Q('SELECT orders."Customer_Name", obd."Product_Name" '
 'FROM orders '
 'INNER JOIN obd '
 'ON orders."Order_ID" = obd."Order_ID"').head()

Unnamed: 0,Customer_Name,Product_Name
0,Ruby Patel,"Enermax Note Cards, Premium"
1,Summer Hayward,"Dania Corner Shelving, Traditional"
2,Devin Huddleston,"Binney & Smith Sketch Pad, Easy-Erase"
3,Mary Parker,"Boston Markers, Easy-Erase"
4,Mary Parker,"Eldon Folders, Single Width"
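
For comparison, the same inner join can be done in Pandas with `DataFrame.merge`. There, `how='inner'` keeps only the `Order_ID` values present in both tables. Here is a sketch on hypothetical miniature tables:

```python
import pandas as pd

# Hypothetical miniature versions of the orders and order-breakdown tables
orders_demo = pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
                            'Customer_Name': ['Ruby Patel', 'Mary Parker', 'Devin Huddleston']})
obd_demo = pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-2', 'A-9'],
                         'Product_Name': ['Note Cards', 'Markers', 'Folders', 'Stapler']})

# Equivalent of: SELECT ... FROM orders INNER JOIN obd ON orders.Order_ID = obd.Order_ID
joined = orders_demo.merge(obd_demo, on='Order_ID', how='inner')[['Customer_Name', 'Product_Name']]
print(joined)
```

A-3 (no line items) and A-9 (no matching order) drop out. Mary Parker appears twice because her order has two line items, which is the same pattern as in the SQL output.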



#### 9.  How many orders for "Office Supplies" (Category) has Sweden made?

> **Note:** From this point on you'll probably be combining SQL and Pandas: use SQL queries to gather the relevant information, then use Pandas to analyze it.

In [121]:
```python
# A:
# Count Office Supplies line items per country with SQL, then look up Sweden with Pandas
swed_office = Q('''SELECT orders."Country", COUNT(obd."Category") AS "office_supplies_orders"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID"
WHERE obd."Category" = 'Office Supplies' GROUP BY orders."Country"''')

sweden = swed_office.loc[swed_office['Country'] == 'Sweden', 'office_supplies_orders'].iloc[0]
print("Sweden has made {} 'Office Supplies' orders".format(sweden))
swed_office
```
```
Sweden has made 133 'Office Supplies' orders
```


| | Country | office_supplies_orders |
|---|---|---|
| 0 | Austria | 177 |
| 1 | Belgium | 96 |
| 2 | Denmark | 39 |
| 3 | Finland | 38 |
| 4 | France | 1231 |
| 5 | Germany | 1065 |
| 6 | Ireland | 67 |
| 7 | Italy | 656 |
| 8 | Netherlands | 242 |
| 9 | Norway | 46 |
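
`COUNT(obd."Category")` counts rows of the order breakdown, which are line items. An order containing three Office Supplies products is therefore counted three times. If the question is read as "distinct orders", `COUNT(DISTINCT ...)` on the order ID gives that count instead. Here is a sketch on hypothetical rows that shows the difference:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
# Hypothetical data: order A-1 contains two Office Supplies line items
pd.DataFrame({'Order_ID': ['A-1', 'A-2'],
              'Country': ['Sweden', 'Sweden']}).to_sql('orders', conn, index=False)
pd.DataFrame({'Order_ID': ['A-1', 'A-1', 'A-2'],
              'Category': ['Office Supplies', 'Office Supplies', 'Furniture']}).to_sql('obd', conn, index=False)

query = '''SELECT orders."Country",
       COUNT(obd."Category") AS "line_items",
       COUNT(DISTINCT orders."Order_ID") AS "distinct_orders"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID"
WHERE obd."Category" = 'Office Supplies'
GROUP BY orders."Country"'''
result = pd.read_sql(query, conn)
print(result)  # Sweden: 2 line items, but only 1 distinct order
```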


#### 10. What were the total sales for products that have been discounted?

In [95]:
```python
# A:
Q('SELECT SUM("Sales") FROM "obd" WHERE "Discount" > 0')
```

| | SUM("Sales") |
|---|---|
| 0 | 1115614.0 |


#### 11. What is the total quantity of objects sold for each country?

In [96]:
```python
# A:
Q('''SELECT orders."Country", SUM("Quantity") AS "Quantity"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID" GROUP BY "Country"''')
```

| | Country | Quantity |
|---|---|---|
| 0 | Austria | 973 |
| 1 | Belgium | 532 |
| 2 | Denmark | 204 |
| 3 | Finland | 201 |
| 4 | France | 7329 |
| 5 | Germany | 6179 |
| 6 | Ireland | 392 |
| 7 | Italy | 3612 |
| 8 | Netherlands | 1526 |
| 9 | Norway | 261 |


#### 12. In what Countries are profits lowest? (Report lowest 5-10)

In [117]:
```python
# A:
Q('''SELECT orders."Country", SUM("Profit") AS "Profit"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID"
GROUP BY "Country" ORDER BY "Profit" ASC''').head()
```

| | Country | Profit |
|---|---|---|
| 0 | Netherlands | -37188.0 |
| 1 | Sweden | -17524.0 |
| 2 | Portugal | -8704.0 |
| 3 | Ireland | -6886.0 |
| 4 | Denmark | -3608.0 |


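#### 13. Which countries have the best and worst Sales to Profit Ratios?
(Total Sales divided by Total Profits.)
This ratio gives the dollars of sales needed to earn one dollar of profit, so among profitable countries a *lower* ratio is better. Its reciprocal, total profit divided by total sales (the profit margin), answers "for every dollar of product sold, how much is profit?". For countries with a net loss the ratio is negative, so sorting on it does not rank countries from best to worst.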

In [115]:
```python
# A:
ratios = Q('''SELECT orders."Country", SUM("Sales") / SUM("Profit") AS "Ratios"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID"
GROUP BY orders."Country" ORDER BY "Ratios" DESC''')

# Rows are sorted from the highest to the lowest Sales/Profit ratio
print("{} has the highest Sales to Profit Ratio: {:.4f}".format(ratios['Country'].iloc[0], ratios['Ratios'].iloc[0]))
print("{} has the lowest Sales to Profit Ratio: {:.4f}".format(ratios['Country'].iloc[-1], ratios['Ratios'].iloc[-1]))
ratios
```
```
Italy has the highest Sales to Profit Ratio: 15.9943
Ireland has the lowest Sales to Profit Ratio: -2.3233
```


| | Country | Ratios |
|---|---|---|
| 0 | Italy | 15.994305 |
| 1 | France | 8.701429 |
| 2 | Germany | 5.663962 |
| 3 | Spain | 5.298872 |
| 4 | Finland | 5.297339 |
| 5 | United Kingdom | 4.652442 |
| 6 | Belgium | 4.269572 |
| 7 | Norway | 3.973099 |
| 8 | Austria | 3.721264 |
| 9 | Switzerland | 3.438485 |
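
The Sales/Profit ratio changes sign for loss-making countries. Ranking on the profit margin (total profit divided by total sales) is simpler, because higher is better across the whole range and losses sort to the bottom. In SQL that would mean selecting `SUM("Profit") / SUM("Sales")` and sorting it in descending order. Here is a sketch in Pandas on hypothetical per-country totals (not the EuroMart figures):

```python
import pandas as pd

# Hypothetical per-country totals
totals = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Sales':   [1000.0, 1000.0, 1000.0, 1000.0],
    'Profit':  [50.0, 300.0, -100.0, -600.0],
})

totals['Sales_to_Profit'] = totals['Sales'] / totals['Profit']
totals['Margin'] = totals['Profit'] / totals['Sales']

# Sorting by margin puts the most profitable country first and the largest relative loss last
ranked = totals.sort_values('Margin', ascending=False)
print(ranked[['Country', 'Sales_to_Profit', 'Margin']])
```

Sorting these totals by the ratio in descending order would list A first and C last. By margin, however, B is the most profitable country and D has the largest relative loss.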


#### 14. What Shipping method is most common for 'Bookcases' (Sub Category)?

In [135]:
```python
# A:
bookcases_ship = Q('''SELECT orders."Ship_Mode", COUNT(obd."Sub-Category") AS "Bookcases"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID"
WHERE obd."Sub-Category" = 'Bookcases'
GROUP BY orders."Ship_Mode"''')

# GROUP BY does not sort by count, so look up the row with the largest count explicitly
top_mode = bookcases_ship.loc[bookcases_ship['Bookcases'].idxmax(), 'Ship_Mode']
print("The most common Shipping method for 'Bookcases' (Sub Category) is: {}".format(top_mode))
bookcases_ship
```
```
The most common Shipping method for 'Bookcases' (Sub Category) is: Economy
```


| | Ship_Mode | Bookcases |
|---|---|---|
| 0 | Economy | 234 |
| 1 | Economy Plus | 76 |
| 2 | Immediate | 22 |
| 3 | Priority | 59 |


#### 15. What city in the Orders table generated the highest net sales? (List all the cities and countries in descending order by net sales.)

In [141]:
```python
# A:
city_sales = Q('''SELECT orders."City", orders."Country", SUM("Sales") AS "Sales"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID"
GROUP BY "City" ORDER BY "Sales" DESC''')

print("The city of {} generated the highest net sales: {}".format(city_sales['City'].iloc[0], city_sales['Sales'].iloc[0]))
city_sales.head()
```
```
The city of London generated the highest net sales: 69230.0
```


| | City | Country | Sales |
|---|---|---|---|
| 0 | London | United Kingdom | 69230.0 |
| 1 | Berlin | Germany | 52555.0 |
| 2 | Vienna | Austria | 51844.0 |
| 3 | Madrid | Spain | 44981.0 |
| 4 | Paris | France | 42245.0 |
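
The query for question 15 groups by `City` alone while also selecting `Country`. SQLite allows this, but it fills `Country` from an arbitrary row in each group. Two cities with the same name in different countries would also be merged into one total. Grouping by both columns avoids this. Here is a sketch with hypothetical sales for two towns named Newport:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
# Hypothetical sales rows: two different towns share the name "Newport"
pd.DataFrame({
    'City':    ['Newport', 'Newport', 'Newport'],
    'Country': ['United Kingdom', 'United Kingdom', 'Ireland'],
    'Sales':   [100.0, 50.0, 30.0],
}).to_sql('city_sales_demo', conn, index=False)

# Grouping by City alone merges both towns into a single row
by_city = pd.read_sql('SELECT "City", SUM("Sales") AS "Sales" '
                      'FROM city_sales_demo GROUP BY "City"', conn)

# Grouping by City and Country keeps them apart
by_city_country = pd.read_sql('SELECT "City", "Country", SUM("Sales") AS "Sales" '
                              'FROM city_sales_demo GROUP BY "City", "Country" '
                              'ORDER BY "Sales" DESC', conn)
print(by_city)
print(by_city_country)
```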


#### BONUS: Create a Column called 'Shipping Delay' on the 'orders' table, which is the difference in days between 'Order Date' and 'Ship Date'.

In [195]:
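```python
# Subtract the parsed dates to get a Timedelta for each order
orders['Shipping Delay'] = pd.to_datetime(orders['Ship_Date']) - pd.to_datetime(orders['Order_Date'])
```

In [196]:
```python
# Keep only the whole number of days
orders['Shipping Delay'] = orders['Shipping Delay'].dt.days
```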


In [201]:
```python
orders.head(1)
```

| | Order_ID | Order_Date | Customer_Name | City | Country | Region | Segment | Ship_Date | Ship_Mode | State | Shipping Delay |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | 1/1/2011 | Ruby Patel | Stockholm | Sweden | North | Home Office | 1/5/2011 | Economy Plus | Stockholm | 4 |


#### BONUS: Update your Orders table in your Sqlite DB to include the 'Shipping Delay' feature.

In [198]:
```python
# A:
# Overwrite the stored table so it includes the new column
orders.to_sql(name='orders', con=db_connection, if_exists='replace', index=False)
```


#### BONUS: Which Product Category has the highest average 'Shipping Delay'?

In [200]:
```python
# A:
Q('''SELECT obd."Category", AVG(orders."Shipping Delay") AS "avg_shipping_delay"
FROM orders INNER JOIN obd ON orders."Order_ID" = obd."Order_ID" GROUP BY "Category"''')
```

| | Category | avg_shipping_delay |
|---|---|---|
| 0 | Furniture | 4.0 |
| 1 | Office Supplies | 3.975028 |
| 2 | Technology | 4.12541 |


### Challenge problem:   
**In what months and Categories were Sales Targets Exceeded?**

---

This may require a considerable amount of data processing.

In [22]:
```python
# A:
```
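
One possible approach, sketched under assumptions, has five steps:

1. Convert each order date to the same `Mon-YY` label used in `Month of Order Date`.
2. Total Sales by month and category.
3. Strip the currency formatting from `Target`.
4. Merge the two tables.
5. Compare actual sales with the target.

The miniature DataFrames below are hypothetical. The column names assume the underscore renaming from step 2, and the date format assumes the `M/D/YYYY` strings seen in `Order_Date`.

```python
import pandas as pd

# Hypothetical miniature tables, using the column names produced in step 2
orders_demo = pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
                            'Order_Date': ['1/3/2011', '1/20/2011', '2/5/2011']})
obd_demo = pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
                         'Category': ['Furniture', 'Furniture', 'Furniture'],
                         'Sales': [6000.0, 5000.0, 9000.0]})
targets_demo = pd.DataFrame({'Month_of_Order_Date': ['Jan-11', 'Feb-11'],
                             'Category': ['Furniture', 'Furniture'],
                             'Target': ['$10,000.00', '$10,100.00']})

# Label each order with its month in the same "Jan-11" format as the targets table
merged = orders_demo.merge(obd_demo, on='Order_ID')
merged['Month_of_Order_Date'] = (pd.to_datetime(merged['Order_Date'], format='%m/%d/%Y')
                                 .dt.strftime('%b-%y'))

# Total sales per month and category
monthly = merged.groupby(['Month_of_Order_Date', 'Category'], as_index=False)['Sales'].sum()

# Parse the currency-formatted targets into floats
targets = targets_demo.copy()
targets['Target'] = targets['Target'].str.replace(r'[$,]', '', regex=True).astype(float)

# Compare actual sales with the target for each month/category pair
comparison = monthly.merge(targets, on=['Month_of_Order_Date', 'Category'])
comparison['Exceeded'] = comparison['Sales'] > comparison['Target']
print(comparison[comparison['Exceeded']])
```

With the real tables, the month labels produced by `strftime` must match the `Month_of_Order_Date` strings exactly for the merge to pair them up. Because the labels are strings, listing results chronologically would also need a real date column rather than a sort on the label.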