# Task 1 - Exploratory Data Analysis (EDA)
- This notebook presents the findings of my analysis, together with a detailed walkthrough of the steps taken to derive them.

# Data Pre-processing
## (a) Import libraries required

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb
import sqlite3

## (b) Use **sqlite3** to read **'cruise_post.db'** and **'cruise_pre.db'** into Python

### (bi) Define a 'create database connection' function

In [2]:
# Define a function that returns a database connection object

def get_db_connection(url):
    conn = sqlite3.connect(url)        # open a connection to the database file at `url`
    conn.row_factory = sqlite3.Row     # return rows as sqlite3.Row objects (name-based access, convertible to dict) instead of tuples
    return conn
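
As a quick check of what `row_factory = sqlite3.Row` changes, the sketch below runs the same function against a throwaway in-memory database. The table `demo` and its columns are hypothetical and are not part of the cruise data. It shows that each returned row supports access by column name and converts cleanly to a `dict`.

```python
import sqlite3
from contextlib import closing


def get_db_connection(url):
    conn = sqlite3.connect(url)
    conn.row_factory = sqlite3.Row
    return conn


# ":memory:" creates a temporary database; 'demo' is a hypothetical table
with closing(get_db_connection(":memory:")) as conn:
    conn.execute("CREATE TABLE demo (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO demo VALUES (1, 'Lapras')")
    row = conn.execute("SELECT * FROM demo").fetchone()

    by_name = row["name"]      # access by column name
    by_position = row[1]       # positional access still works
    as_dict = dict(row)        # sqlite3.Row exposes keys(), so dict() works
```

Wrapping the connection in `contextlib.closing` ensures it is closed even if a query raises.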

### (bii) Read **cruise_pre.db** and **cruise_post.db** into Python

In [3]:
# Read cruise_pre.db and cruise_post.db into lists of dictionaries

# Define required variables
table_name = ['cruise_pre','cruise_post']
cruise_pre_data = []
cruise_post_data = []

# Iterate through the 2 provided data sets: cruise_pre.db and cruise_post.db
for table in table_name:

    # Create connection to database
    conn = get_db_connection('data/' + table + '.db')
    # Execute query to fetch all rows and columns
    data = conn.execute('SELECT * FROM {}'.format(table)).fetchall()
    # Close this connection before moving on to the next database
    conn.close()

    # Append each row, converted to a dictionary, to its container
    if table == 'cruise_pre':
        for i in data:
            cruise_pre_data.append(dict(i))
    else:
        for i in data:
            cruise_post_data.append(dict(i))

# Output: cruise_pre_data and cruise_post_data now each contain a list of dictionaries

OperationalError: unable to open database file
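
SQLite reports `unable to open database file` when it cannot open or create the file at the given path. A common cause is that the `data/` directory does not exist relative to the notebook's working directory. A minimal sketch, assuming that is the cause here: check the path before connecting so that the failure names the missing file. The helper `open_existing_db` is hypothetical and not part of the original notebook.

```python
import sqlite3
from pathlib import Path


def open_existing_db(path):
    """Connect only if the file exists, so a wrong path fails with a clear message."""
    db_path = Path(path)
    if not db_path.is_file():
        raise FileNotFoundError(
            f"{db_path} not found (working directory: {Path.cwd()})"
        )
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    return conn


try:
    conn = open_existing_db("data/cruise_pre.db")
except FileNotFoundError as err:
    # Report the path that was tried instead of SQLite's generic error
    print(err)
else:
    conn.close()
```

If the message shows an unexpected working directory, starting the notebook from the project root or passing an absolute path are two possible fixes.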

### (biii) Convert lists of dictionaries to Pandas DataFrames
- Converting the data to Pandas DataFrame objects gives us access to many data manipulation and analysis tools
- One main advantage is that the data can be displayed as neatly formatted tables
- Examples are shown under 'Pre-cruise Customer Survey' and 'Post-cruise Customer Data'

In [None]:
# Read each list of dictionaries into a Pandas DataFrame
# Set the 'index' column as the DataFrame index
cruise_post_df = pd.DataFrame(data=cruise_post_data).set_index('index')
cruise_pre_df = pd.DataFrame(data=cruise_pre_data).set_index('index')
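
To illustrate the conversion on a small scale, the sketch below applies the same `pd.DataFrame(...).set_index('index')` call to two rows in the list-of-dictionaries shape. The values are copied from the post-cruise output shown in this notebook, with only three of the columns kept for brevity.

```python
import pandas as pd

# Two rows in the list-of-dicts shape; values are taken from the
# post-cruise output in this notebook, restricted to three columns
rows = [
    {"index": 133743, "Cruise Name": "Blastoise", "Ticket Type": "Luxury"},
    {"index": 133745, "Cruise Name": "Lapras", "Ticket Type": "Standard"},
]

# Dictionary keys become column names; 'index' is then promoted to the row index
df = pd.DataFrame(data=rows).set_index("index")

# Rows can now be looked up by their 'index' value
print(df.loc[133745, "Cruise Name"])
```

Because `set_index` removes `index` from the regular columns, only `Cruise Name` and `Ticket Type` remain as data columns.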

### Pre-cruise Customer Survey
- This is a pre-purchase survey conducted to give ShipSail insight into what its customers prefer, or consider important, for an enjoyable cruise journey.

In [None]:
# Display part of cruise pre dataset
cruise_pre_df.tail(3)

| index | Gender | Date of Birth | Source of Traffic | Onboard Wifi Service | Embarkation/Disembarkation time convenient | Ease of Online booking | Gate location | Logging | Onboard Dining Service | Online Check-in | Cabin Comfort | Onboard Entertainment | Cabin service | Baggage handling | Port Check-in Service | Onboard Service | Cleanliness | Ext_Intcode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 133743 | Male | 23/10/2012 | Direct - Email Marketing | Extremely important | 5.0 | 5.0 | 5.0 | 31/08/2023 23:41 | Extremely important | 5.0 | 5.0 | Extremely important | 4.0 | 5.0 | 4.0 | 4.0 | 5.0 | BL343MAXXIT |
| 133744 | Female |  | Indirect - Search Engine | A little important | 1.0 | 1.0 | 4.0 | 31/08/2023 23:43 |  | 2.0 | 4.0 | Very important | 5.0 | 4.0 | 3.0 | 5.0 | 4.0 | LB957GHIRBD |
| 133745 | Male | 07/09/1996 | Direct - Company Website |  |  | 0.0 | 1.0 | 31/08/2023 23:44 |  | 0.0 | 2.0 | Extremely important | 2.0 | 1.0 |  | 1.0 | 5.0 | LB539JAJHXJ |


### Post-cruise Customer Data
- This is post-trip data collected by ShipSail to provide insight into the services and products that customers actually chose.

In [None]:
# Display part of cruise post dataset
cruise_post_df.tail(3)

| index | Cruise Name | Ticket Type | Cruise Distance | Ext_Intcode | WiFi | Dining | Entertainment |
|---|---|---|---|---|---|---|---|
| 133743 | Blastoise | Luxury | -1947 KM | BL343MAXXIT | 1.0 | 0 | 1.0 |
| 133744 | Blastoise | Standard | 1506 KM | LB957GHIRBD |  | 1 |  |
| 133745 | Lapras | Standard | 80 KM | LB539JAJHXJ |  | 0 |  |
