# eBay Laptop Analysis

This is a data exploration and analysis project that:
- Utilizes a comprehensive dataset sourced from Kaggle
- Aims to delve into the world of laptop listings on eBay
- Investigates various attributes such as brand, price, ratings, and specifications
- Employs data visualization, statistical analysis, and potentially predictive modeling
- Seeks to uncover insights into trends, patterns, and factors influencing laptop sales and prices

I invite you to follow along as we examine this dataset to gain a deeper understanding of the dynamics within the laptop market.

## Step 1: Data Understanding and Exploration

- Importing the necessary libraries

In [1]:
import os
import pandas as pd

- Reading the Data. There are two files to open and explore.

In [2]:
# Directory path
directory = 'data'

# List files in the directory
files = os.listdir(directory)

# Print the list of files
print("Files in 'data' folder:")
for file in files:
    print(file)

Files in 'data' folder:
EbayPcLaptopsAndNetbooksClean.csv
EbayPcLaptopsAndNetbooksUnclean.csv
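
Since the folder holds two CSV files, it can be convenient to load them in one pass and compare their shapes. The following is a minimal sketch, not part of the original notebook; the helper name `load_csv_folder` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(directory):
    """Read every .csv file in `directory` into a dict keyed by file name.

    Hypothetical helper for illustration; non-CSV files are ignored.
    """
    frames = {}
    for path in sorted(Path(directory).glob("*.csv")):
        frames[path.name] = pd.read_csv(path)
    return frames
```

With the folder used here, `load_csv_folder('data')` would return one DataFrame per file, so `frame.shape` can be compared between the clean and unclean versions.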


- Opening the first file and loading it into a DataFrame

In [3]:
# File path
file_path = 'data/EbayPcLaptopsAndNetbooksClean.csv'

# Load the CSV file into a DataFrame
df = pd.read_csv(file_path)

# Display the first few rows of the DataFrame
print(df.head())

   Brand   Price Currency  Color  \
0  other  303.80        $   gray   
1   dell  400.00        $  black   
2   dell  175.00        $  black   
3     hp   85.00        $  black   
4   dell  101.25        $  other   

                                            Features                Condition  \
0  Backlit Keyboard,  Built-in Microphone,  Built...                      New   
1  Backlit Keyboard, Bluetooth, Built-in Micropho...  Very Good - Refurbished   
2  10/100 LAN Card, Backlit Keyboard, Bluetooth, ...                     Used   
3  Bluetooth, Built-in Microphone, Built-in Webca...       Good - Refurbished   
4  10/100 LAN Card, Built-in Microphone, Built-in...       Good - Refurbished   

                               Condition Description  \
0  A brand-new, unused, unopened, undamaged item ...   
1  The item shows minimal wear and is backed by a...   
2  An item that has been used previously. The ite...   
3  The item shows moderate wear and is backed by ...   
4  The item show

Printing the first few rows with the .head() method gives us a glimpse of the data's structure and the values in each column. Here's a brief overview of the columns in the DataFrame:

- <b>Brand:</b> The manufacturer or company that produces the laptop.
- <b>Price:</b> The cost of the laptop in USD.
- <b>Currency:</b> The currency symbol (always '$' in this case).
- <b>Color:</b> The color of the laptop.
- <b>Features:</b> Additional functionalities or specifications of the laptop.
- <b>Condition:</b> The state of the laptop (e.g., New, Used, Refurbished).
- <b>Condition Description:</b> Description of the laptop's condition.
- <b>Seller Note:</b> Additional information or notes provided by the seller regarding the laptop.
- <b>GPU:</b> The graphics processing unit responsible for rendering images and videos.
- <b>Processor:</b> The central processing unit (CPU) that powers the laptop.
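- <b>Processor Speed:</b> The clock speed of the processor, in the unit given by Processor Speed Unit.
- <b>Processor Speed Unit:</b> Unit of measurement for the processor speed.
- <b>Type:</b> The product type stated in the listing.
- <b>Width of the Display:</b> The width of the display, presumably its horizontal resolution in pixels.
- <b>Height of the Display:</b> The height of the display, presumably its vertical resolution in pixels.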
- <b>OS:</b> The operating system installed on the laptop.
- <b>Storage Type:</b> The type of storage technology used (e.g., SSD, HDD, eMMC).
- <b>Hard Drive Capacity:</b> The storage capacity of the traditional hard disk drive (HDD) in gigabytes (GB).
- <b>Hard Drive Capacity Unit:</b> Unit of measurement for HDD capacity (always 'gb' in this case).
- <b>SSD Capacity:</b> The storage capacity of the solid-state drive (SSD), in the unit given by SSD Capacity Unit.
- <b>SSD Capacity Unit:</b> Unit of measurement for SSD capacity (can be 'tb' or 'gb').
- <b>Screen Size (inch):</b> The diagonal measurement of the laptop screen in inches.
- <b>Ram Size:</b> The amount of random access memory (RAM) in gigabytes (GB).
- <b>Ram Size Unit:</b> Unit of measurement for RAM size (always 'gb' in this case).
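
Several descriptions above say a column "always" holds one value. Claims like that are easy to check rather than assume. The sketch below uses a hypothetical helper, `constant_columns`, and a small toy DataFrame with made-up values standing in for the real data.

```python
import pandas as pd


def constant_columns(df):
    """Return {column: value} for columns with exactly one distinct non-null value."""
    constants = {}
    for col in df.columns:
        values = df[col].dropna().unique()
        if len(values) == 1:
            constants[col] = values[0]
    return constants


# Toy frame for illustration only; these are not rows from the dataset
toy = pd.DataFrame({
    "Currency": ["$", "$", "$"],
    "SSD Capacity Unit": ["gb", "tb", None],
    "Ram Size Unit": ["gb", None, "gb"],
})

print(constant_columns(toy))
```

Running `constant_columns(df)` on the real DataFrame would list every single-valued column, which are also natural candidates to drop before modeling.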

## Check for Data Types

Ensure that each column has the correct data type. For example, numerical columns should be stored as integers or floats, and categorical columns should be stored as strings or categories.

In [4]:
# Check the data types of each column
print(df.dtypes)

Brand                        object
Price                       float64
Currency                     object
Color                        object
Features                     object
Condition                    object
Condition Description        object
Seller Note                  object
GPU                          object
Processor                    object
Processor Speed              object
Processor Speed Unit         object
Type                         object
Width of the Display        float64
Height of the Display       float64
OS                           object
Storage Type                 object
Hard Drive Capacity         float64
Hard Drive Capacity Unit     object
SSD Capacity                float64
SSD Capacity Unit            object
Screen Size (inch)           object
Ram Size                    float64
Ram Size Unit                object
dtype: object
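
Reading the dtype listing by eye works for a couple dozen columns, but the same check can be automated. The sketch below flags non-numeric columns whose values mostly parse as numbers, which makes them candidates for conversion. The helper name `numeric_like_columns`, the `threshold` parameter, and the toy values are assumptions for illustration.

```python
import pandas as pd


def numeric_like_columns(df, threshold=0.9):
    """Return {column: share} for non-numeric columns whose non-null
    values parse as numbers at least `threshold` of the time."""
    result = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # already numeric, nothing to convert
        non_null = df[col].dropna()
        if non_null.empty:
            continue
        parsed = pd.to_numeric(non_null, errors="coerce")
        share = float(parsed.notna().mean())
        if share >= threshold:
            result[col] = share
    return result


# Toy frame for illustration only; these are not rows from the dataset
toy = pd.DataFrame({
    "Brand": ["dell", "hp", "dell", "other"],
    "Screen Size (inch)": ["15.6", "14", "13.3", "unknown"],
    "Processor Speed": ["2.4", "1.8", "3.0", "2.1"],
    "Price": [303.8, 400.0, 175.0, 85.0],
})

print(numeric_like_columns(toy, threshold=0.7))
```

The threshold is a judgment call: a lower value surfaces columns with more stray text, at the cost of more values lost when they are later coerced.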


Now we will convert the 'Screen Size (inch)' and 'Processor Speed' columns from object to float64. 'Ram Size' is already float64, so passing it through pd.to_numeric leaves its dtype unchanged.

In [5]:
# Convert 'Screen Size (inch)' column to numeric, coerce errors to NaN
# df['Screen Size (inch)'] = pd.to_numeric(df['Screen Size (inch)'], errors='coerce')

# Identify rows with NaN values in 'Screen Size (inch)'
# problematic_rows = df[df['Screen Size (inch)'].isna()]

# Display the problematic rows
# print(problematic_rows[['Screen Size (inch)']])
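
One caveat about the commented-out cell above: if it were run as written, the column would be overwritten before the problematic rows are selected, so they would display only NaN. A minimal sketch that keeps the original text visible follows. The helper name `unparseable_values` and the toy values are hypothetical.

```python
import pandas as pd


def unparseable_values(series):
    """Count the original values that pd.to_numeric cannot parse.

    Missing values are excluded: only entries that hold text but
    fail to parse are reported.
    """
    parsed = pd.to_numeric(series, errors="coerce")
    failed = parsed.isna() & series.notna()
    return series[failed].value_counts()


# Toy series for illustration only; these are not values from the dataset
toy = pd.Series(
    ["15.6", "14", "15.6 in", None, "unknown", "15.6 in"],
    name="Screen Size (inch)",
)

print(unparseable_values(toy))
```

Calling `unparseable_values(df['Screen Size (inch)'])` before the conversion shows what `errors='coerce'` is about to turn into NaN, and whether those values could be cleaned instead of discarded.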

In [8]:
# Convert 'Screen Size (inch)' column to float64; unparseable values become NaN
df['Screen Size (inch)'] = pd.to_numeric(df['Screen Size (inch)'], errors='coerce')

In [7]:
# Convert 'Processor Speed' column to float64; unparseable values become NaN
df['Processor Speed'] = pd.to_numeric(df['Processor Speed'], errors='coerce')

# Pass 'Ram Size' through to_numeric as well (already float64, so the dtype is unchanged)
df['Ram Size'] = pd.to_numeric(df['Ram Size'], errors='coerce')

# Verify the data types after conversion
print(df.dtypes)

Brand                        object
Price                       float64
Currency                     object
Color                        object
Features                     object
Condition                    object
Condition Description        object
Seller Note                  object
GPU                          object
Processor                    object
Processor Speed             float64
Processor Speed Unit         object
Type                         object
Width of the Display        float64
Height of the Display       float64
OS                           object
Storage Type                 object
Hard Drive Capacity         float64
Hard Drive Capacity Unit     object
SSD Capacity                float64
SSD Capacity Unit            object
Screen Size (inch)          float64
Ram Size                    float64
Ram Size Unit                object
dtype: object
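
The remaining object columns are text. As noted earlier, categorical columns can be stored as categories, which can reduce memory use and makes grouping explicit. The sketch below converts low-cardinality text columns only. The helper name `to_category` and the `max_unique` cutoff are assumptions, not part of the original analysis.

```python
import pandas as pd


def to_category(df, max_unique=20):
    """Return a copy in which non-numeric columns with at most
    `max_unique` distinct values are stored as 'category'."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # leave numeric columns untouched
        if out[col].nunique(dropna=True) <= max_unique:
            out[col] = out[col].astype("category")
    return out


# Toy frame for illustration only; these are not rows from the dataset
toy = pd.DataFrame({
    "Brand": ["dell", "hp", "dell"],
    "Condition": ["New", "Used", "Used"],
    "Price": [400.0, 85.0, 175.0],
})

print(to_category(toy).dtypes)
```

Free-text columns such as 'Features' or 'Seller Note' would likely exceed any sensible cutoff and stay as they are, which is the intent.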
