# SQL with Python and Pandas Lab
Today we learned about relational databases and SQL, the language most of them use for queries. In this lab we will practice loading data into a SQL database, querying it, and then analyzing the results with Python and Pandas.

## Read in the EuroMart CSV Data
- 'EuroMart-ListOfOrders.csv'
- 'EuroMart-OrderBreakdown.csv'
- 'EuroMart-SalesTargets.csv'

In [142]:
import pandas as pd
import sqlite3
from pandas.io import sql


# Open one connection and reuse it for every table, rather than opening a new one per file.
connection = sqlite3.connect('data/sql/EuroMart.db.sqlite')

for csv in ['ListOfOrders', 'OrderBreakdown', 'SalesTargets']:
    pd_df = pd.read_csv('data/csv/EuroMart-' + csv + '.csv', encoding='utf-8')
    # Replace spaces and hyphens so column names can be used in SQL without quoting.
    pd_df.columns = [x.replace(' ', '_').replace('-', '_') for x in pd_df.columns]
    pd_df.to_sql(name=csv, con=connection, if_exists='replace', index=False)
    

## converting $$ columns to numbers
c = connection.cursor()
c.execute('update OrderBreakdown set Sales = cast(replace(Sales,"$","") as number);')
c.execute('update OrderBreakdown set Profit = cast(replace(Profit,"$","") as number);')
c.execute('update SalesTargets set Target = cast(replace(replace(Target,"$",""),",","") as number);')
connection.commit()


DatabaseError: Execution failed on sql 'DROP TABLE "ListOfOrders"': database is locked
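
A `database is locked` error like the one above usually means another connection (for example, one left open by an earlier run of the cell) still holds a lock on the file. A minimal sketch of one way to reduce the risk: open a single connection, reuse it for every write, and close it when done. The sample rows and the in-memory database are assumptions for illustration, not the EuroMart data, and cleaning the currency column in pandas is an alternative to the `update` statements used in this lab.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for one of the CSV files (not the real EuroMart data).
sample = pd.DataFrame({
    'Order ID': ['A-1', 'A-2'],
    'Sub-Category': ['Paper', 'Art'],
    'Sales': ['$45', '$1,027'],
})

# Normalize column names the same way the lab cell does.
sample.columns = [c.replace(' ', '_').replace('-', '_') for c in sample.columns]

# Strip currency formatting in pandas before writing, so the column is numeric.
sample['Sales'] = (sample['Sales']
                   .str.replace('$', '', regex=False)
                   .str.replace(',', '', regex=False)
                   .astype(int))

# One connection, reused for every statement and closed at the end.
conn = sqlite3.connect(':memory:')
try:
    sample.to_sql('OrderBreakdown', conn, if_exists='replace', index=False)
    total = conn.execute('select sum(Sales) from OrderBreakdown').fetchone()[0]
finally:
    conn.close()

print(total)
```

If the lock persists on a file-backed database, restarting the notebook kernel releases any connections still held by earlier cell runs.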

## Create a SQL Database called 'EuroMart' and save the three dataframes as SQL tables. 

In [84]:
## included above

### 1. How many orders has each Customer placed? 

In [11]:
sql.read_sql_query('select customer_name, count(*) num_orders from ListOfOrders group by customer_name',connection)


Unnamed: 0,Customer_Name,num_orders
0,Aaron Bootman,11
1,Aaron Cunningham,8
2,Aaron Davey,4
3,Aaron Macrossan,1
4,Abbie Perry,4
5,Abby Colebe,3
6,Abby Mei,6
7,Abby Muramats,5
8,Abigail Humffray,5
9,Ada Dalton,4


##### If you're doubting your output, check it using Pandas
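
One way such a check might look: the same count computed directly in Pandas with `groupby`. The rows below are hypothetical stand-ins for the `ListOfOrders` table; in the lab you would run this on the dataframe read from the CSV.

```python
import pandas as pd

# Hypothetical sample rows standing in for the ListOfOrders table.
orders = pd.DataFrame({
    'Order_ID': ['A-1', 'A-2', 'A-3', 'A-4'],
    'Customer_Name': ['Ada Dalton', 'Abby Mei', 'Ada Dalton', 'Ada Dalton'],
})

# Pandas equivalent of:
#   select customer_name, count(*) num_orders from ListOfOrders group by customer_name
num_orders = (orders.groupby('Customer_Name')
                    .size()
                    .reset_index(name='num_orders'))

print(num_orders)
```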

### 2. Create a Query to return a table of only Geographic features from the List of Orders Table.

In [13]:
sql.read_sql_query('select city, country, region, state from ListOfOrders',connection)


Unnamed: 0,City,Country,Region,State
0,Stockholm,Sweden,North,Stockholm
1,Southport,United Kingdom,North,England
2,Valence,France,Central,Auvergne-Rhône-Alpes
3,Birmingham,United Kingdom,North,England
4,Echirolles,France,Central,Auvergne-Rhône-Alpes
5,La Seyne-sur-Mer,France,Central,Provence-Alpes-Côte d'Azur
6,Toulouse,France,Central,Languedoc-Roussillon-Midi-Pyrénées
7,Genoa,Italy,South,Liguria
8,Vienna,Austria,Central,Vienna
9,Murcia,Spain,South,Murcia


### 3. Create a Query to return a table with all of the orders that had a negative profit from the Order Breakdown Table.

In [80]:
sql.read_sql_query('select * from OrderBreakdown where profit < 0 ',connection)


Unnamed: 0,Order_ID,Product_Name,Discount,Sales,Profit,Quantity,Category,Sub-Category
0,BN-2011-7407039,"Enermax Note Cards, Premium",0.50,45,-26,3,Office Supplies,Paper
1,BN-2011-2819714,"Boston Markers, Easy-Erase",0.50,27,-22,2,Office Supplies,Art
2,BN-2011-2819714,"Eldon Folders, Single Width",0.50,17,-1,2,Office Supplies,Storage
3,BN-2011-3248724,"Ikea Classic Bookcase, Metal",0.60,987,-1,6,Furniture,Bookcases
4,BN-2011-3248724,"Binney & Smith Sketch Pad, Blue",0.50,116,-56,5,Office Supplies,Art
5,AZ-2011-6439906,"Bevis Training Table, with Bottom Storage",0.60,268,-342,2,Furniture,Tables
6,AZ-2011-2222024,"Green Bar Note Cards, Multicolor",0.50,34,-6,2,Office Supplies,Paper
7,BN-2011-4913858,"Wilson Jones Hole Reinforcements, Durable",0.50,9,-3,3,Office Supplies,Binders
8,BN-2011-4913858,"Harbour Creations Legal Exhibit Labels, Laser ...",0.50,22,-12,4,Office Supplies,Labels
9,BN-2011-4913858,"Green Bar Cards & Envelopes, Multicolor",0.50,50,-38,2,Office Supplies,Paper


### 4. Construct a query to return a table with the Customer Name and Product Name.  
(This will require a Join)

In [16]:
sql.read_sql_query('select customer_name, product_name from ListOfOrders LOO join OrderBreakdown OB on LOO.Order_ID = OB.Order_ID',connection)


Unnamed: 0,Customer_Name,Product_Name
0,Ruby Patel,"Enermax Note Cards, Premium"
1,Summer Hayward,"Dania Corner Shelving, Traditional"
2,Devin Huddleston,"Binney & Smith Sketch Pad, Easy-Erase"
3,Mary Parker,"Boston Markers, Easy-Erase"
4,Mary Parker,"Eldon Folders, Single Width"
5,Daniel Burke,"Binney & Smith Pencil Sharpener, Water Color"
6,Daniel Burke,"Sanford Canvas, Fluorescent"
7,Fredrick Beveridge,"Accos Thumb Tacks, Assorted Sizes"
8,Fredrick Beveridge,"Bush Floating Shelf Set, Pine"
9,Fredrick Beveridge,"Smead Lockers, Industrial"


##### From this point on you'll probably be combining SQL and Pandas: use SQL queries to gather the relevant information, then use Pandas to analyze it.

### 5. How many orders for "Office Supplies" (Category) has Sweden made?

In [21]:
sql.read_sql_query('select count(*) num_orders from ListOfOrders LOO join OrderBreakdown OB on LOO.Order_ID = OB.Order_ID where Country = "Sweden" and Category = "Office Supplies"',connection)


Unnamed: 0,num_orders
0,133


### 6. What were the total sales for products that have been discounted?

In [81]:
sql.read_sql_query('select sum(Sales) total_discounted_sales from OrderBreakdown where Discount > 0',connection)


Unnamed: 0,total_discounted_sales
0,661672


### 7. What is the total quantity of objects sold for each country?

In [51]:
sql.read_sql_query('select country, count(*) num_orders from ListOfOrders LOO join OrderBreakdown OB on LOO.Order_ID = OB.Order_ID group by Country',connection)


Unnamed: 0,Country,num_orders
0,Austria,264
1,Belgium,135
2,Denmark,60
3,Finland,64
4,France,1916
5,Germany,1640
6,Ireland,100
7,Italy,979
8,Netherlands,393
9,Norway,70
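
Note that `count(*)` counts joined rows (order lines), not units. If the question is read as units sold, summing the `Quantity` column is one option. A minimal sketch on hypothetical rows, using an in-memory database rather than the EuroMart file:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')

# Hypothetical sample rows; table and column names follow the lab's schema.
pd.DataFrame({'Order_ID': ['A-1', 'A-2'],
              'Country': ['Sweden', 'France']}
             ).to_sql('ListOfOrders', conn, index=False)
pd.DataFrame({'Order_ID': ['A-1', 'A-1', 'A-2'],
              'Quantity': [3, 2, 5]}
             ).to_sql('OrderBreakdown', conn, index=False)

# count(*) gives the number of order lines; sum(Quantity) gives units sold.
query = '''
select Country, count(*) as num_rows, sum(Quantity) as total_quantity
from ListOfOrders LOO
join OrderBreakdown OB on LOO.Order_ID = OB.Order_ID
group by Country
order by Country
'''
per_country = pd.read_sql_query(query, conn)
conn.close()

print(per_country)
```

In this sample, Sweden has two order lines but five units, so the two columns can differ noticeably.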


### 8. In what Countries are profits lowest? (Report lowest 5-10)

In [83]:
sql.read_sql_query('select country, sum(Profit) profits from OrderBreakdown OB join ListOfOrders LOO on OB.Order_ID = LOO.Order_ID group by Country order by sum(Profit) limit 10',connection)


Unnamed: 0,Country,profits
0,Netherlands,-30893
1,Sweden,-12124
2,Ireland,-6886
3,Portugal,-5647
4,Denmark,-3608
5,Finland,3908
6,Norway,5167
7,Switzerland,6174
8,Belgium,9912
9,Italy,14148


### 9. What Countries have the best and worst Sales to Profit Ratios?
(Total Sales divided by Total Profits.)
Essentially, this asks how sales compare to profit for each country: for every dollar of product sold, how much is profit?

In [88]:

print(sql.read_sql_query('select country,sum(Sales) / sum(Profit) best_sales_to_profits_ratio from OrderBreakdown OB join ListOfOrders LOO on OB.Order_ID = LOO.Order_ID group by Country order by sum(Sales) / sum(Profit) desc limit 10',connection))
print("")
print(sql.read_sql_query('select country,sum(Sales) / sum(Profit) worst_sales_to_profits_ratio from OrderBreakdown OB join ListOfOrders LOO on OB.Order_ID = LOO.Order_ID group by Country order by sum(Sales) / sum(Profit) limit 10',connection))


          Country  best_sales_to_profits_ratio
0           Italy                           11
1          France                            5
2         Finland                            3
3         Germany                            3
4           Spain                            3
5  United Kingdom                            3
6         Austria                            2
7         Belgium                            2
8          Norway                            2
9     Switzerland                            2

       Country  worst_sales_to_profits_ratio
0      Denmark                            -2
1      Ireland                            -2
2  Netherlands                            -1
3     Portugal                            -1
4       Sweden                            -1
5      Austria                             2
6      Belgium                             2
7       Norway                             2
8  Switzerland                             2
9      Finland                  
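
The whole-number ratios in this output are consistent with integer division, which SQLite applies when both operands of `/` are integers. Whether that is what happened here depends on how the `Sales` and `Profit` values are stored, so treat this as an assumption. A small sketch on hypothetical values shows the effect and one common workaround, multiplying by `1.0`:

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Hypothetical integer-valued sales and profit rows.
conn.execute('create table OrderBreakdown (Sales integer, Profit integer)')
conn.executemany('insert into OrderBreakdown values (?, ?)',
                 [(45, 10), (27, 10)])

# Integer / integer truncates the result.
int_ratio = conn.execute(
    'select sum(Sales) / sum(Profit) from OrderBreakdown').fetchone()[0]

# Multiplying by 1.0 forces floating-point division.
real_ratio = conn.execute(
    'select sum(Sales) * 1.0 / sum(Profit) from OrderBreakdown').fetchone()[0]

conn.close()
print(int_ratio, real_ratio)
```

With fractional ratios, ties between countries in the ranking would be much less likely.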

### 10. What Shipping method is most common for 'Bookcases' (Sub Category)?

In [101]:
sql.read_sql_query('select Ship_Mode from ListOfOrders LOO join OrderBreakdown OB on LOO.Order_ID = OB.Order_ID where Sub_Category = "Bookcases" group by ship_mode order by count(*) desc limit 1' ,connection)

Unnamed: 0,Ship_Mode
0,Economy


### 11. What city in the Orders table generated the highest net sales?  (List all the cities and countries in descending order by net sales.)

In [112]:
sql.read_sql_query('select city, country, sum(Sales) net_sales from ListOfOrders LOO join OrderBreakdown OB on LOO.Order_ID = OB.Order_ID group by city, country order by sum(Sales) desc' ,connection)

Unnamed: 0,City,Country,net_sales
0,London,United Kingdom,35572
1,Berlin,Germany,33682
2,Vienna,Austria,33218
3,Paris,France,26146
4,Madrid,Spain,25864
5,Rome,Italy,18267
6,Milan,Italy,15243
7,Barcelona,Spain,14212
8,Hamburg,Germany,13789
9,Munich,Germany,13419


### 12.1 Create a Column called 'Shipping Delay' on the 'orders' table, which is the difference in days between 'Order Date' and 'Ship Date'.

In [3]:
c.execute('alter table ListOfOrders add column Shipping_Delay text')
connection.commit()

### 12.2 Update your Orders table in your Sqlite DB to include the 'Shipping Delay' feature.

In [112]:
c.execute('alter table ListOfOrders add column ship_new text')
c.execute('alter table ListOfOrders add column order_new text')

c.execute('update ListOfOrders set ship_new = ship_date')
connection.commit()

c.execute('update ListOfOrders set ship_new = '
          '(case when ship_date like "_/_/____" then "0" || substr(ship_date,0,3) || "0" || substr(ship_date,3) '
                'when ship_date like "_/__/____" then "0" || substr(ship_date,0) '
                'when ship_date like "__/_/____" then substr(ship_date,0,4) || "0" || substr(ship_date,4)' 
                'else ship_date end) ')
connection.commit()

c.execute('update ListOfOrders set order_new = '
          '(case when order_date like "_/_/____" then "0" || substr(order_date,0,3) || "0" || substr(order_date,3) '
                'when order_date like "_/__/____" then "0" || substr(order_date,0) '
                'when order_date like "__/_/____" then substr(order_date,0,4) || "0" || substr(order_date,4)' 
                'else order_date end)')
connection.commit()

c.execute('update ListOfOrders set order_new = replace(order_new,"/","-")')
c.execute('update ListOfOrders set ship_new = replace(ship_new,"/","-")')
c.execute('update ListOfOrders set ship_new = substr(ship_new,7)||"-"||substr(ship_new,0,3)||substr(ship_new,3,3)')
c.execute('update ListOfOrders set order_new = substr(order_new,7)||"-"||substr(order_new,0,3)||substr(order_new,3,3)')
connection.commit()
c.execute('update ListOfOrders set Shipping_Delay = julianday(ship_new)-julianday(order_new)')
connection.commit()

Unnamed: 0,Shipping_Delay
0,4.0
1,4.0
2,4.0
3,5.0
4,2.0
5,1.0
6,6.0
7,5.0
8,4.0
9,4.0
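
The string manipulation in SQL works, but the same delay could also be computed in Pandas, which parses unpadded dates directly. A hedged sketch on hypothetical rows; the day/month/year order in `fmt` is an assumption, so swap it to `'%m/%d/%Y'` if the CSV uses month/day/year.

```python
import pandas as pd

# Hypothetical rows standing in for the ListOfOrders table.
orders = pd.DataFrame({
    'Order_Date': ['1/2/2011', '12/11/2011'],
    'Ship_Date': ['5/2/2011', '15/11/2011'],
})

# Assumed date layout: day/month/year, without zero padding.
fmt = '%d/%m/%Y'
order_dt = pd.to_datetime(orders['Order_Date'], format=fmt)
ship_dt = pd.to_datetime(orders['Ship_Date'], format=fmt)

# Difference between the two dates, in whole days.
orders['Shipping_Delay'] = (ship_dt - order_dt).dt.days

print(orders)
```

The resulting dataframe could then be written back with `to_sql(..., if_exists='replace')` to store the new column in the SQLite table.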


### 12.3 Which Product Category has the highest average 'Shipping Delay'?

In [120]:


sql.read_sql_query('select category, avg(shipping_delay) avg_shipping_delay from OrderBreakdown OB join ListOfOrders LOO on OB.Order_ID = LOO.Order_ID group by category order by avg(shipping_delay) desc limit 1' ,connection)

Unnamed: 0,Category,avg_shipping_delay
0,Technology,4.12541


# Hard Mode:

### 13. In what months and Categories were Sales Targets Exceeded?


In [141]:
# where cast(replace(replace(Target,"$",""),",","") as number) < sum(sales) 

sql.read_sql_query('select Month_of_order_date, st.category,sum(target), sum(sales) from SalesTargets st join orderbreakdown loo on st.category = loo.category group by month_of_order_date, st.category' ,connection)






Unnamed: 0,Month_of_Order_Date,Category,sum(target),sum(sales)
0,Apr-11,Furniture,0.0,341603
1,Apr-11,Office Supplies,0.0,613687
2,Apr-11,Technology,0.0,474066
3,Apr-12,Furniture,0.0,341603
4,Apr-12,Office Supplies,0.0,613687
5,Apr-12,Technology,0.0,474066
6,Apr-13,Furniture,0.0,341603
7,Apr-13,Office Supplies,0.0,613687
8,Apr-13,Technology,0.0,474066
9,Apr-14,Furniture,0.0,341603


### 14. In what months and Categories did Sales fail to exceed their targets?

In [126]:
c.execute('update SalesTargets set Target = cast(replace(Target,"$","") as number);')

<sqlite3.Cursor at 0x11956e960>