## 1 Methodology
The purpose of this notebook is to demonstrate how to import CSV files into a MySQL database table. I can either create a new table from the CSV file or import the data into an existing table, in a new or existing schema. The SQL to create a table for the 'Bike Sales.csv' file is:

In [None]:
CREATE TABLE bike_sales
(
Date DATE,
Day INT,
Month TEXT,
Year INT,
Customer_Age INT,
Age_Group TEXT,
Customer_Gender TEXT,
Country TEXT,
State TEXT,
Product_Category TEXT,
Sub_Category TEXT,
Product TEXT,
Order_Quantity INT,
Unit_Cost INT,
Unit_Price INT,
Profit INT,
Cost INT,
Revenue INT
);


Then the data from the CSV file needs to be inserted into the table that has just been created.

In [None]:
INSERT INTO bike_sales
(
`Date`,
`Day`,
`Month`,
`Year`,
`Customer_Age`,
`Age_Group`,
`Customer_Gender`,
`Country`,
`State`,
`Product_Category`,
`Sub_Category`,
`Product`,
`Order_Quantity`,
`Unit_Cost`,
`Unit_Price`,
`Profit`,
`Cost`,
`Revenue`
) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
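The placeholder form above is meant to be executed from a client program that binds one row of values per execution. A minimal sketch of that idea follows, assuming an in-memory SQLite database as a stand-in for MySQL (so it runs without a server) and a reduced, four-column version of the table. MySQL drivers for Python generally use `%s` rather than `?` as the placeholder, so the statement text would need adjusting there.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bike_sales "
    "(Date TEXT, Country TEXT, Order_Quantity INTEGER, Revenue INTEGER)"
)

# Two sample rows from the head of 'Bike Sales.csv', reduced to four columns.
rows = [
    ("2013-11-26", "Canada", 8, 950),
    ("2014-03-23", "Australia", 23, 2401),
]

# One placeholder per column; the driver binds each tuple to the placeholders.
conn.executemany(
    "INSERT INTO bike_sales (Date, Country, Order_Quantity, Revenue) "
    "VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM bike_sales").fetchone()[0]
print(inserted)
```

Binding values through placeholders leaves quoting and escaping to the driver instead of to hand-built SQL strings.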

In [None]:
CREATE TABLE sales_summary
(
AccountManager VARCHAR(255),
Qtr1 DOUBLE,
Qtr2 DOUBLE,
Qtr3 DOUBLE,
Qtr4 DOUBLE
);

In [None]:
SELECT * FROM sales_summary;

For small datasets or CSV files, the values can be inserted directly as follows:

In [None]:
INSERT INTO sales_summary Values
('Aanya Zhang','5187.90','7627.17','28867.26','742.53'),
('Charlie Bui','24271.31','130.78','116.61','355.15'),
('Connor Betts','854.08','20123.65','3050.18','4373.98'),
('Leighton Forrest','815.58','1129.69','327.02','16169.12'),
('Mihael Khan','425.78','981.27','596.70','470.74'),
('Natasha Song','5080.74','6259.31','4265.86','4956.43'),
('Nicholas Fernandes','21787.86','1533.62','2191.42','2384.04'),
('Phoebe Gour','5117.84','12156.60','351.06','15653.93'),
('Preston Senome','1326.07','1415.98','2314.11','2817.60'),
('Radhya Staples','0.00','3.32','10373.59','206.16'),
('Samantha Chairs','2233.62','2005.70','1542.68','4921.92'),
('Stevie Bacata','0.00','91.10','0.00','0.00'),
('Tina Carlton','17247.36','2512.24','7003.82','2952.73'),
('Yvette Biti','2252.16','1476.92','3293.39','7731.78');

For larger datasets with hundreds or thousands of rows it becomes more efficient to use Python.

## 2 Data Preparation and Export

### Import the Pandas Library

In [1]:
import pandas as pd

### Read the CSV File
Load the CSV file using the read_csv function and save it as a DataFrame (df).

In [2]:
df = pd.read_csv('Bike Sales.csv')
df.head()

| | Date | Day | Month | Year | Customer_Age | Age_Group | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-11-26 | 26 | November | 2013 | 19 | Youth (<25) | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
| 1 | 2015-11-26 | 26 | November | 2015 | 19 | Youth (<25) | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
| 2 | 2014-03-23 | 23 | March | 2014 | 49 | Adults (35-64) | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 23 | 45 | 120 | 1366 | 1035 | 2401 |
| 3 | 2016-03-23 | 23 | March | 2016 | 49 | Adults (35-64) | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 20 | 45 | 120 | 1188 | 900 | 2088 |
| 4 | 2014-05-15 | 15 | May | 2014 | 47 | Adults (35-64) | F | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 4 | 45 | 120 | 238 | 180 | 418 |


### Data Cleaning
Drop the columns that aren't required.

In [3]:
cols = ['Day','Month','Year','Age_Group']

df.drop(cols, inplace=True, axis = 1)
df.head()

| | Date | Customer_Age | Customer_Gender | Country | State | Product_Category | Sub_Category | Product | Order_Quantity | Unit_Cost | Unit_Price | Profit | Cost | Revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-11-26 | 19 | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
| 1 | 2015-11-26 | 19 | M | Canada | British Columbia | Accessories | Bike Racks | Hitch Rack - 4-Bike | 8 | 45 | 120 | 590 | 360 | 950 |
| 2 | 2014-03-23 | 49 | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 23 | 45 | 120 | 1366 | 1035 | 2401 |
| 3 | 2016-03-23 | 49 | M | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 20 | 45 | 120 | 1188 | 900 | 2088 |
| 4 | 2014-05-15 | 47 | F | Australia | New South Wales | Accessories | Bike Racks | Hitch Rack - 4-Bike | 4 | 45 | 120 | 238 | 180 | 418 |
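The same drop can be written without inplace=True by using the `columns` keyword, which returns a new DataFrame and leaves the original untouched. The sketch below uses a hypothetical toy frame (`df_demo`) holding only a few of the notebook's columns.

```python
import pandas as pd

# Hypothetical toy frame with a subset of the notebook's columns.
df_demo = pd.DataFrame({
    "Date": ["2013-11-26", "2015-11-26"],
    "Day": [26, 26],
    "Month": ["November", "November"],
    "Year": [2013, 2015],
    "Revenue": [950, 950],
})

# drop(columns=...) returns a new frame; df_demo itself is not modified.
trimmed = df_demo.drop(columns=["Day", "Month", "Year"])
print(list(trimmed.columns))
```

Reassigning the result (for example `df = df.drop(columns=cols)`) makes it easier to re-run a cell without hitting an error about columns that have already been removed.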


Check for missing values.

In [4]:
df.isnull().sum()

Date                0
Customer_Age        0
Customer_Gender     0
Country             0
State               0
Product_Category    0
Sub_Category        0
Product             0
Order_Quantity      0
Unit_Cost           0
Unit_Price          0
Profit              0
Cost                0
Revenue             0
dtype: int64
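Every count above is zero, so nothing needs fixing here. For comparison, this small sketch shows what the same check reports when values are missing; the frame `demo` and its contents are made up for illustration.

```python
import pandas as pd

# Made-up frame with one missing value in each column.
demo = pd.DataFrame({
    "Country": ["Canada", None, "Australia"],
    "Revenue": [950, 2401, None],
})

# isnull() marks missing cells; sum() counts them per column.
missing_per_column = demo.isnull().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print(total_missing)
```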

Inspect the data types.

In [5]:
df.dtypes

Date                object
Customer_Age         int64
Customer_Gender     object
Country             object
State               object
Product_Category    object
Sub_Category        object
Product             object
Order_Quantity       int64
Unit_Cost            int64
Unit_Price           int64
Profit               int64
Cost                 int64
Revenue              int64
dtype: object

We need to fix the data type of the Date column by converting it from 'object' to 'datetime64[ns]'.

In [6]:
df['Date'] = pd.to_datetime(df['Date'])

We also need to convert the last five columns from type 'int64' to 'float64'.

In [8]:
df = df.astype({'Unit_Cost':'float','Unit_Price':'float','Profit':'float','Cost':'float','Revenue':'float'})

In [9]:
df.dtypes

Date                datetime64[ns]
Customer_Age                 int64
Customer_Gender             object
Country                     object
State                       object
Product_Category            object
Sub_Category                object
Product                     object
Order_Quantity               int64
Unit_Cost                  float64
Unit_Price                 float64
Profit                     float64
Cost                       float64
Revenue                    float64
dtype: object
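The two conversions can be tried in isolation on a small frame. This is a sketch with made-up data in a hypothetical frame `demo`, using the same `pd.to_datetime` and `astype` calls as the cells above.

```python
import pandas as pd

# Made-up frame: dates arrive as strings, amounts as integers.
demo = pd.DataFrame({
    "Date": ["2013-11-26", "2014-03-23"],
    "Unit_Cost": [45, 45],
    "Revenue": [950, 2401],
})

# Parse the strings into datetime values.
demo["Date"] = pd.to_datetime(demo["Date"])

# Cast several columns at once by passing a column-to-dtype mapping.
demo = demo.astype({"Unit_Cost": "float", "Revenue": "float"})
print(demo.dtypes)
```

Once the column holds datetime values, parts such as the year are available through the `.dt` accessor, for example `demo["Date"].dt.year`.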

## 3 DataFrame Export
Next, save the cleaned data as a text file. First, convert all rows into a list of tuples.

In [10]:
y = []

for i in range(len(df)):
    x = tuple(df.iloc[i])
    y.append(x)
    
y

[(Timestamp('2013-11-26 00:00:00'),
  19,
  'M',
  'Canada',
  'British Columbia',
  'Accessories',
  'Bike Racks',
  'Hitch Rack - 4-Bike',
  8,
  45.0,
  120.0,
  590.0,
  360.0,
  950.0),
 (Timestamp('2015-11-26 00:00:00'),
  19,
  'M',
  'Canada',
  'British Columbia',
  'Accessories',
  'Bike Racks',
  'Hitch Rack - 4-Bike',
  8,
  45.0,
  120.0,
  590.0,
  360.0,
  950.0),
 (Timestamp('2014-03-23 00:00:00'),
  49,
  'M',
  'Australia',
  'New South Wales',
  'Accessories',
  'Bike Racks',
  'Hitch Rack - 4-Bike',
  23,
  45.0,
  120.0,
  1366.0,
  1035.0,
  2401.0),
 (Timestamp('2016-03-23 00:00:00'),
  49,
  'M',
  'Australia',
  'New South Wales',
  'Accessories',
  'Bike Racks',
  'Hitch Rack - 4-Bike',
  20,
  45.0,
  120.0,
  1188.0,
  900.0,
  2088.0),
 (Timestamp('2014-05-15 00:00:00'),
  47,
  'F',
  'Australia',
  'New South Wales',
  'Accessories',
  'Bike Racks',
  'Hitch Rack - 4-Bike',
  4,
  45.0,
  120.0,
  238.0,
  180.0,
  418.0),
 (Timestamp('2016-05-15 00:00:00
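The row-by-row loop can also be written with `DataFrame.itertuples`, which tends to be faster on large frames because it avoids building a Series for every row. A minimal sketch on a made-up three-column frame:

```python
import pandas as pd

# Made-up frame with three of the notebook's columns.
demo = pd.DataFrame({
    "Country": ["Canada", "Australia"],
    "Order_Quantity": [8, 23],
    "Revenue": [950.0, 2401.0],
})

# index=False leaves out the row index; name=None yields plain tuples
# instead of namedtuples.
rows = list(demo.itertuples(index=False, name=None))
print(rows)
```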

Save the list as a text file.

In [11]:
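# Write one tuple per line, each followed by a comma.
# The with block closes the file automatically; the loop variable is named
# 'row' so it does not shadow the built-in 'tuple'.
with open('Sales.txt', 'w') as file:
    for row in y:
        file.write(str(row) + ',' + '\n')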
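A variant of this export is to render each value as a SQL literal in Python, so that dates appear as quoted strings and the block already ends with a semicolon. The helpers `sql_literal` and `values_block` below are hypothetical names introduced for this sketch, and hand-built literals like these are only reasonable for trusted data such as this file.

```python
import datetime as dt


def sql_literal(value):
    """Render one Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    # pandas Timestamp objects are datetime subclasses, so they match here too.
    if isinstance(value, (dt.date, dt.datetime)):
        return "'" + value.strftime("%Y-%m-%d") + "'"
    if isinstance(value, str):
        # Double any single quote so it cannot end the string early.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def values_block(rows):
    """Join rows into 'VALUES' lines: commas between rows, semicolon at the end."""
    lines = ["(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows]
    return ",\n".join(lines) + ";"


sample = [
    (dt.date(2013, 11, 26), "Canada", 8, 950.0),
    (dt.date(2014, 3, 23), "Australia", 23, 2401.0),
]
text = values_block(sample)
print(text)
```

Writing `text` to 'Sales.txt' instead of the raw tuple strings would give a file that can be pasted after the VALUES keyword without further editing.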

## 4 Data Import
Import the data into a MySQL database. Open the 'Sales.txt' file, click inside it, press CTRL + A to select all the lines and CTRL + C to copy them. In MySQL, create a new table called 'sales' whose columns match the cleaned DataFrame, then start an INSERT statement:

In [None]:
INSERT INTO sales Values

Then paste all the lines copied from the text file 'Sales.txt' after it. On the last entry, change the comma at the very end to a semicolon (;).

Run the query!

The query should import about 100,000 rows in just a few seconds.
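Depending on the server configuration, a single statement carrying this many rows may exceed MySQL's `max_allowed_packet` limit. One way around that is to split the rows into smaller batches and issue one INSERT per batch. The helper `chunked` and the batch size below are assumptions made for this sketch, not part of the original workflow.

```python
def chunked(rows, size):
    """Yield consecutive slices of rows, each holding at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Five stand-in rows split into batches of two.
batches = list(chunked([0, 1, 2, 3, 4], 2))
print(batches)
```

Each batch would then be written out, or executed, as its own `INSERT INTO sales VALUES ...;` statement.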