# EDA Laptops specs

![](https://image-cdn.hypb.st/https%3A%2F%2Fhypebeast.com%2Fimage%2F2020%2F05%2Falienware-area-51m-laptop-upgrade-2020-001.jpg?q=90&w=1400&cbr=1&fit=max)

# Introduction

This dataset contains the technical specifications of different laptop models.

# Objective

This exploratory data analysis (EDA) project aims to understand the characteristics and trends present in this dataset.

# Hypotheses

This EDA seeks to answer questions such as:

* Which hardware configurations are most common? (Processor, RAM, storage)
* Is there a relationship between price and technical specifications?
* Which laptop brands are most popular, and what are their distinctive characteristics?
* How have laptop specifications evolved over time?

|Feature|Description |Data type |Unit of measurement |
|---|--- |---- |--- |
|``laptop_ID``| Laptop index |int |index |
|``Company``| Manufacturer name |obj | |
|``TypeName``| Commercial type name of the laptop |obj | |
|``Inches``| Screen size |obj | inches |
|``ScreenResolution``| Screen resolution |obj | pixels |
|``Cpu``| CPU manufacturer, model, and frequency |obj | GHz |
|``Ram``| Installed RAM |obj | GB |
|``Memory``| Storage capacity and drive type |obj | GB |
|``Gpu``| GPU manufacturer and model |obj | |
|``OpSys``| Installed operating system |obj | |
|``Weight``| Laptop weight |obj | kg |
|``Price_euros``| Laptop price |float | Euros |

# Importing libraries

In [1]:
import os
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

In [2]:
# pd.set_option('display.max_rows', None)

In [3]:
root_path = Path(os.getcwd()).resolve().parent
sys.path.append(str(root_path))

In [4]:
root_path

WindowsPath('C:/Users/Felipe/Desktop/THE-BRIDGE/THEBRIDGE-BOOTCAMP-GITHUB/DS102024/0.2-Mis_ejer/laptops_dataset copy')

In [5]:
from scripts.auto_importer_laptops import AutoImporter, DataFrameDescriber, CompleteDescribeTable

# Loading the data

In [6]:
data = pd.read_csv(r'../data/raw/test.csv', index_col=False)

In [7]:
data

Unnamed: 0,laptop_ID,Company,Product,TypeName,Inches,ScreenResolution,Cpu,Ram,Memory,Gpu,OpSys,Weight
0,539,Asus,Zenbook UX510UW-FI095T,Notebook,15.6,IPS Panel 4K Ultra HD 3840x2160,Intel Core i7 7500U 2.7GHz,8GB,256GB SSD + 1TB HDD,Nvidia GeForce GTX 960M,Windows 10,2kg
1,327,Asus,ZenBook UX410UA-GV183T,Notebook,14.0,Full HD 1920x1080,Intel Core i7 7500U 2.7GHz,8GB,256GB SSD,Intel HD Graphics 620,Windows 10,2kg
2,563,Mediacom,SmartBook 130,Notebook,13.3,IPS Panel Full HD 1920x1080,Intel Atom x5-Z8350 1.44GHz,4GB,32GB Flash Storage,Intel HD Graphics,Windows 10,1.35kg
3,13,Apple,MacBook Pro,Ultrabook,15.4,IPS Panel Retina Display 2880x1800,Intel Core i7 2.8GHz,16GB,256GB SSD,AMD Radeon Pro 555,macOS,1.83kg
4,935,HP,EliteBook 850,Ultrabook,15.6,Full HD 1920x1080,Intel Core i7 6500U 2.5GHz,8GB,256GB SSD,AMD Radeon R7 M365X,Windows 10,1.84kg
...,...,...,...,...,...,...,...,...,...,...,...,...
386,742,Lenovo,ThinkPad 13,Notebook,13.3,IPS Panel Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8GB,256GB SSD,Intel HD Graphics 620,Windows 10,1.44kg
387,660,Dell,XPS 13,Ultrabook,13.3,Full HD 1920x1080,Intel Core i5 8250U 1.6GHz,8GB,256GB SSD,Intel UHD Graphics 620,Windows 10,1.2kg
388,983,Lenovo,IdeaPad 310-15IKB,Notebook,15.6,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,6GB,256GB SSD,Nvidia GeForce 920MX,Windows 10,2.4kg
389,1137,HP,EliteBook 1040,Notebook,14.0,Full HD 1920x1080,Intel Core i5 6200U 2.3GHz,8GB,256GB SSD,Intel HD Graphics 520,Windows 7,1.43kg


# First exploration

In [8]:
inspeccion_inicial = AutoImporter(data)

In [9]:
inspeccion_inicial.inspeccion_inicial()

=== TAMAÑO Y ESTRUCTURA DE LOS DATOS ===
Número total de registros: 391
Número de columnas: 12
Uso de memoria: 36.79 KB


=== TIPOS DE DATOS Y NOMBRES DE COLUMNAS ===
laptop_ID             int64
Company              object
Product              object
TypeName             object
Inches              float64
ScreenResolution     object
Cpu                  object
Ram                  object
Memory               object
Gpu                  object
OpSys                object
Weight               object
dtype: object


Información detallada del DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 391 entries, 0 to 390
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   laptop_ID         391 non-null    int64  
 1   Company           391 non-null    object 
 2   Product           391 non-null    object 
 3   TypeName          391 non-null    object 
 4   Inches            391 non-null    float64
 5   Scree

In [10]:
df_descripcion = DataFrameDescriber(data)

In [11]:
df_descripcion.describe_numerico()

Unnamed: 0,laptop_ID,Inches
count,391.0,391.0
mean,658.493606,15.157545
std,378.687395,1.412449
min,3.0,10.1
25%,335.0,14.0
50%,660.0,15.6
75%,983.5,15.6
max,1317.0,17.3


In [12]:
df_descripcion.describe_categorico()

Unnamed: 0,Company,Product,TypeName,ScreenResolution,Cpu,Ram,Memory,Gpu,OpSys,Weight
count,391,391,391,391,391,391,391,391,391,391
unique,16,261,6,28,66,8,26,69,9,125
top,Dell,XPS 13,Notebook,Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8GB,256GB SSD,Intel HD Graphics 620,Windows 10,2.2kg
freq,86,9,205,157,56,188,123,79,317,31


# Cleaning and preprocessing

In [13]:
# Strip the unit and keep only the numeric value so the column can be converted from object to float

data['Weight'] = data['Weight'].str.replace('kg','').astype(float)

In [14]:
# Strip the unit and keep only the numeric value so the column can be converted from object to integer

data['Ram'] = data['Ram'].str.replace('GB','').astype(int)
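
The two conversions above assume that every value carries the expected unit and is otherwise a valid number. As a minimal sketch (the helper name `strip_unit` and the sample values are illustrative assumptions, not part of the original notebook), `pd.to_numeric` with `errors='coerce'` turns anything unparsable into `NaN` instead of raising:

```python
import pandas as pd

def strip_unit(series: pd.Series, unit: str) -> pd.Series:
    """Remove a unit string and convert the remainder to numbers.

    Hypothetical helper: values that cannot be parsed become NaN.
    """
    cleaned = series.str.replace(unit, '', regex=False).str.strip()
    return pd.to_numeric(cleaned, errors='coerce')

# Made-up sample values, including one malformed entry
weights = pd.Series(['2kg', '1.35kg', '?'])
print(strip_unit(weights, 'kg'))
```

Rows that come back as `NaN` can then be inspected before deciding whether to drop or impute them.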

In [15]:
# Convert the Inches column to float (a no-op when pandas has already parsed it as float64)

data['Inches'] = data['Inches'].astype(float)

In [16]:
# Split each Cpu string into manufacturer, model, and clock frequency

# Manufacturer: first token
cpu_brand_list = [cpu.split()[0] for cpu in data['Cpu']]

# Model: every token between the manufacturer and the frequency
cpu_model_list = [' '.join(cpu.split()[1:-1]) for cpu in data['Cpu']]

# Frequency: last token, without the 'GHz' unit
cpu_Frec_list = [cpu.split()[-1].replace('GHz', '') for cpu in data['Cpu']]

# Insert the new columns into the DataFrame

data.insert(7,'CPU_Brand', cpu_brand_list)
data.insert(8,'CPU_Model', cpu_model_list)
data.insert(9,'CPU_Frec', cpu_Frec_list)

# Convert the frequency from string to float (the unit was already stripped above)

data['CPU_Frec'] = data['CPU_Frec'].astype(float)

# Drop the original Cpu column, now replaced by three more specific columns

data = data.drop(columns='Cpu')
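
The same split can also be expressed as a single regular expression. The sketch below is an alternative, not the notebook's method; the function name `parse_cpu` is an assumption, and it presumes every string ends in a frequency written like `2.7GHz`:

```python
import re

# brand = first token, model = everything in between, freq = trailing "<number>GHz"
CPU_PATTERN = re.compile(
    r'^(?P<brand>\S+)\s+(?P<model>.*?)\s*(?P<freq>\d+(?:\.\d+)?)GHz$'
)

def parse_cpu(cpu: str):
    """Return (brand, model, frequency_ghz), or None if the string does not match."""
    match = CPU_PATTERN.match(cpu.strip())
    if match is None:
        return None
    return match['brand'], match['model'], float(match['freq'])

print(parse_cpu('Intel Core i7 7500U 2.7GHz'))
print(parse_cpu('Intel Atom x5-Z8350 1.44GHz'))
```

Returning `None` for strings that do not fit the pattern makes malformed rows visible instead of silently producing a wrong model or frequency.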

In [17]:
# Create the columns that hold the screen resolution (width and height in pixels)

data[['Resolution_X', 'Resolution_Y']] = data['ScreenResolution'].str.extract(r'(\d+)x(\d+)').astype(int)

# Move the columns to their intended positions

data.insert(5, 'Resolution_X', data.pop('Resolution_X'))
data.insert(6, 'Resolution_Y', data.pop('Resolution_Y'))

In [18]:
# Create the column that will hold the categorical screen resolution
data['cat_resolution'] = data['Resolution_X'].astype(str)+'x'+data['Resolution_Y'].astype(str)

# Iterate over the rows to replace the numeric "WIDTHxHEIGHT" string with a category label
for index, n in enumerate(data['cat_resolution']):
    width, height = map(int, n.split('x'))

    if width >= 3840 and height >= 2160:
        data.loc[index, 'cat_resolution'] = "4K UHD"
    elif width >= 2560 and height >= 1440:
        data.loc[index, 'cat_resolution'] = "QHD"
    elif width >= 1920 and height >= 1080:
        data.loc[index, 'cat_resolution'] = "Full HD"
    elif width >= 1280 and height >= 720:
        data.loc[index, 'cat_resolution'] = "HD"
    else:
        data.loc[index, 'cat_resolution'] = "SD"

# Move the new column to its intended position
data.insert(7, 'cat_resolution', data.pop('cat_resolution'))
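
The row-by-row loop above can also be written as a vectorized operation. A minimal sketch using `np.select` with the same thresholds; the small DataFrame is made-up sample data:

```python
import numpy as np
import pandas as pd

# Made-up sample resolutions for illustration
df = pd.DataFrame({'Resolution_X': [3840, 2880, 1920, 1366, 800],
                   'Resolution_Y': [2160, 1800, 1080, 768, 600]})

w, h = df['Resolution_X'], df['Resolution_Y']

# Conditions are evaluated in order; the first one that holds wins
conditions = [
    (w >= 3840) & (h >= 2160),
    (w >= 2560) & (h >= 1440),
    (w >= 1920) & (h >= 1080),
    (w >= 1280) & (h >= 720),
]
labels = ['4K UHD', 'QHD', 'Full HD', 'HD']

# Rows matching no condition receive the default label
df['cat_resolution'] = np.select(conditions, labels, default='SD')
print(df)
```

Because it works on the integer columns directly, this version does not need the intermediate `"WIDTHxHEIGHT"` string and does not depend on the index being a default `RangeIndex`.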

In [19]:
# Create the Is_touch column: 1 for laptops with a touchscreen, 0 otherwise

data['Is_touch'] = data['ScreenResolution'].str.contains('Touch', case=False).astype(int)

# Move the column to its intended position

data.insert(8, 'Is_touch', data.pop('Is_touch'))

# Drop the original ScreenResolution column, now replaced by the newly created columns

data = data.drop(columns='ScreenResolution')

In [20]:
# Rename the TypeName value for 2-in-1 laptops to 'Convertible'

data.loc[data['TypeName'].str.contains('Convertible', case=False, na=False), 'TypeName'] = 'Convertible'

In [21]:
# Function to convert TB values to GB
def convert_tb_to_gb(value):
    value = re.sub(r'(\d+(?:\.\d+)?)TB', lambda x: str(int(float(x.group(1)) * 1024)) + 'GB', value)
    return value

In [22]:
# Apply the function to normalize the values in the Memory column
data['Memory'] = data['Memory'].apply(convert_tb_to_gb)
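
As a quick check of the conversion, the sketch below repeats the function so that it runs on its own, then applies it to one value taken from the dataset and one made-up decimal value:

```python
import re

def convert_tb_to_gb(value):
    # Replace every "<number>TB" with the equivalent "<number>GB" (1 TB = 1024 GB)
    return re.sub(r'(\d+(?:\.\d+)?)TB',
                  lambda m: str(int(float(m.group(1)) * 1024)) + 'GB',
                  value)

print(convert_tb_to_gb('256GB SSD + 1TB HDD'))  # 256GB SSD + 1024GB HDD
print(convert_tb_to_gb('1.5TB HDD'))            # made-up value: 1536GB HDD
```

Note that `int()` truncates, so a TB value that does not map to a whole number of GB would lose its fractional part.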

In [23]:
# Extract the capacity in GB for one drive type from the Memory string.
# Only the literal labels "SSD" and "HDD" are matched below, so "Flash Storage" and "Hybrid" drives end up as 0 in both columns.
def extract_storage(text, drive_type):
    match = re.search(r"(\d+)GB " + drive_type, text)
    if match:
        # TB values were already converted to GB by convert_tb_to_gb, so no further scaling is needed
        return int(match.group(1))
    return None

# Apply the function to split the storage into one column per drive type
data['SSD'] = data['Memory'].apply(lambda x: extract_storage(x, "SSD"))
data['HDD'] = data['Memory'].apply(lambda x: extract_storage(x, "HDD"))

data['SSD'] = data['SSD'].fillna(0)
data['HDD'] = data['HDD'].fillna(0)

data.insert(11, 'SSD', data.pop('SSD').astype(int))
data.insert(12, 'HDD', data.pop('HDD').astype(int))

# Drop the original column that the new ones were derived from
data = data.drop(columns='Memory')
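
Memory values labelled `Flash Storage` or `Hybrid` are not matched by the `SSD` and `HDD` patterns used above. One possible extension is sketched below, under the assumptions that flash and hybrid capacities should count toward the `SSD` column and that all sizes are already expressed in GB; the names `DRIVE_GROUPS` and `storage_by_group` are hypothetical:

```python
import re

# Hypothetical grouping: which labels in the Memory string count toward which column
DRIVE_GROUPS = {
    'SSD': ('SSD', 'Flash Storage', 'Hybrid'),
    'HDD': ('HDD',),
}

def storage_by_group(text: str) -> dict:
    """Sum the capacities (in GB) found in a Memory string, per column."""
    totals = {group: 0 for group in DRIVE_GROUPS}
    for size, label in re.findall(r'(\d+)GB\s+(SSD|HDD|Flash Storage|Hybrid)', text):
        for group, labels in DRIVE_GROUPS.items():
            if label in labels:
                totals[group] += int(size)
    return totals

print(storage_by_group('256GB SSD + 1024GB HDD'))  # {'SSD': 256, 'HDD': 1024}
print(storage_by_group('32GB Flash Storage'))      # {'SSD': 32, 'HDD': 0}
```

Unlike a single `re.search`, `re.findall` picks up every drive in the string, so a laptop with two drives of the same type has both capacities added together.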

In [24]:
data

Unnamed: 0,laptop_ID,Company,Product,TypeName,Inches,Resolution_X,Resolution_Y,cat_resolution,Is_touch,CPU_Brand,CPU_Model,SSD,HDD,CPU_Frec,Ram,Gpu,OpSys,Weight
0,539,Asus,Zenbook UX510UW-FI095T,Notebook,15.6,3840,2160,4K UHD,0,Intel,Core i7 7500U,256,1024,2.70,8,Nvidia GeForce GTX 960M,Windows 10,2.00
1,327,Asus,ZenBook UX410UA-GV183T,Notebook,14.0,1920,1080,Full HD,0,Intel,Core i7 7500U,256,0,2.70,8,Intel HD Graphics 620,Windows 10,2.00
2,563,Mediacom,SmartBook 130,Notebook,13.3,1920,1080,Full HD,0,Intel,Atom x5-Z8350,0,0,1.44,4,Intel HD Graphics,Windows 10,1.35
3,13,Apple,MacBook Pro,Ultrabook,15.4,2880,1800,QHD,0,Intel,Core i7,256,0,2.80,16,AMD Radeon Pro 555,macOS,1.83
4,935,HP,EliteBook 850,Ultrabook,15.6,1920,1080,Full HD,0,Intel,Core i7 6500U,256,0,2.50,8,AMD Radeon R7 M365X,Windows 10,1.84
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
386,742,Lenovo,ThinkPad 13,Notebook,13.3,1920,1080,Full HD,0,Intel,Core i5 7200U,256,0,2.50,8,Intel HD Graphics 620,Windows 10,1.44
387,660,Dell,XPS 13,Ultrabook,13.3,1920,1080,Full HD,0,Intel,Core i5 8250U,256,0,1.60,8,Intel UHD Graphics 620,Windows 10,1.20
388,983,Lenovo,IdeaPad 310-15IKB,Notebook,15.6,1920,1080,Full HD,0,Intel,Core i5 7200U,256,0,2.50,6,Nvidia GeForce 920MX,Windows 10,2.40
389,1137,HP,EliteBook 1040,Notebook,14.0,1920,1080,Full HD,0,Intel,Core i5 6200U,256,0,2.30,8,Intel HD Graphics 520,Windows 7,1.43


In [25]:
# Build the list of GPU manufacturer names (first token)

gpu_brand = [gpu.split()[0] for gpu in data['Gpu']]

# Build the list of GPU model names (remaining tokens)
gpu_model_list = [' '.join(gpu.split()[1:]) for gpu in data['Gpu']]

# Insert the new columns at their intended positions

data.insert(16,'GPU_Brand', gpu_brand)
data.insert(17,'GPU_Model', gpu_model_list)

# Drop the original Gpu column, now replaced by the newly created columns

data = data.drop(columns='Gpu')

In [26]:
# Create and position the column that flags whether the GPU is dedicated (1) or integrated (0); any non-Intel brand is treated as dedicated

data['Is_GPU_dedicated'] = np.where(data['GPU_Brand'].str.contains('Intel', case=False),0,1).astype(int)

data.insert(17, 'Is_GPU_dedicated', data.pop('Is_GPU_dedicated'))

In [27]:
# Normalize the OpSys column: drop version details and keep only the operating system family

def categorize_os(os_name):
    if 'Windows' in os_name:
        return 'Windows'
    elif 'macOS' in os_name or 'Mac OS' in os_name:
        return 'macOS'
    elif 'Linux' in os_name:
        return 'Linux'
    else:
        # Every remaining value falls into this bucket
        return 'Chrome OS'

data['OpSys'] = data['OpSys'].apply(categorize_os)
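
In `categorize_os`, any value that is not Windows, macOS, or Linux is labelled `Chrome OS`. If the column contains other values, an explicit lookup with a separate fallback keeps them distinguishable. A sketch, where the `'Other'` label and the raw value `'No OS'` are assumptions for illustration:

```python
# Ordered (substring, family) pairs; the first substring found in the value wins
OS_FAMILIES = [
    ('Windows', 'Windows'),
    ('macOS', 'macOS'),
    ('Mac OS', 'macOS'),
    ('Linux', 'Linux'),
    ('Chrome', 'Chrome OS'),
]

def categorize_os_explicit(os_name: str, fallback: str = 'Other') -> str:
    """Map a raw OpSys value to its family; unknown values get the fallback label."""
    for needle, family in OS_FAMILIES:
        if needle in os_name:
            return family
    return fallback

print(categorize_os_explicit('Windows 10'))  # Windows
print(categorize_os_explicit('Mac OS X'))    # macOS
print(categorize_os_explicit('No OS'))       # Other (hypothetical raw value)
```

Checking `data['OpSys'].unique()` before the mapping shows which raw values would land in the fallback bucket.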

In [28]:
data

Unnamed: 0,laptop_ID,Company,Product,TypeName,Inches,Resolution_X,Resolution_Y,cat_resolution,Is_touch,CPU_Brand,CPU_Model,SSD,HDD,CPU_Frec,Ram,GPU_Brand,GPU_Model,Is_GPU_dedicated,OpSys,Weight
0,539,Asus,Zenbook UX510UW-FI095T,Notebook,15.6,3840,2160,4K UHD,0,Intel,Core i7 7500U,256,1024,2.70,8,Nvidia,GeForce GTX 960M,1,Windows,2.00
1,327,Asus,ZenBook UX410UA-GV183T,Notebook,14.0,1920,1080,Full HD,0,Intel,Core i7 7500U,256,0,2.70,8,Intel,HD Graphics 620,0,Windows,2.00
2,563,Mediacom,SmartBook 130,Notebook,13.3,1920,1080,Full HD,0,Intel,Atom x5-Z8350,0,0,1.44,4,Intel,HD Graphics,0,Windows,1.35
3,13,Apple,MacBook Pro,Ultrabook,15.4,2880,1800,QHD,0,Intel,Core i7,256,0,2.80,16,AMD,Radeon Pro 555,1,macOS,1.83
4,935,HP,EliteBook 850,Ultrabook,15.6,1920,1080,Full HD,0,Intel,Core i7 6500U,256,0,2.50,8,AMD,Radeon R7 M365X,1,Windows,1.84
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
386,742,Lenovo,ThinkPad 13,Notebook,13.3,1920,1080,Full HD,0,Intel,Core i5 7200U,256,0,2.50,8,Intel,HD Graphics 620,0,Windows,1.44
387,660,Dell,XPS 13,Ultrabook,13.3,1920,1080,Full HD,0,Intel,Core i5 8250U,256,0,1.60,8,Intel,UHD Graphics 620,0,Windows,1.20
388,983,Lenovo,IdeaPad 310-15IKB,Notebook,15.6,1920,1080,Full HD,0,Intel,Core i5 7200U,256,0,2.50,6,Nvidia,GeForce 920MX,1,Windows,2.40
389,1137,HP,EliteBook 1040,Notebook,14.0,1920,1080,Full HD,0,Intel,Core i5 6200U,256,0,2.30,8,Intel,HD Graphics 520,0,Windows,1.43


In [29]:
data.to_csv('../data/processed/test.csv', index=False)

----------------------

In [30]:
from sklearn.preprocessing import OneHotEncoder

In [31]:
type_temp = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

type_temp.fit(data[['TypeName']])

data_Type = type_temp.transform(data[['TypeName']])

data_Type = pd.DataFrame(data_Type, columns=type_temp.get_feature_names_out(['TypeName']))

data = pd.concat([data, data_Type], axis=1)

data = data.drop(columns='TypeName')

In [32]:
cpu_brand_temp = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

cpu_brand_temp.fit(data[['CPU_Brand']])

data_cpu_brand = cpu_brand_temp.transform(data[['CPU_Brand']])

# data_cpu_brand = pd.DataFrame(data_cpu_brand, columns=cpu_brand_temp.categories_[0].ravel())
data_cpu_brand = pd.DataFrame(data_cpu_brand, columns=cpu_brand_temp.get_feature_names_out(['CPU_Brand']))

data = pd.concat([data, data_cpu_brand], axis=1)

data.drop(columns='CPU_Brand', inplace=True)
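
Each encoder above is fitted on the file being processed, so the resulting columns depend on which categories happen to appear in it. If the training data is encoded separately, the two column sets may differ. A pandas-only sketch of one way to align them follows; it swaps `OneHotEncoder` for `pd.get_dummies` plus `reindex`, and the two small frames are made-up stand-ins:

```python
import pandas as pd

# Made-up stand-ins for the training and test data
train_df = pd.DataFrame({'CPU_Brand': ['Intel', 'AMD', 'Intel']})
test_df = pd.DataFrame({'CPU_Brand': ['Intel', 'Samsung']})

# Encode the training data and remember its dummy columns
train_dummies = pd.get_dummies(train_df['CPU_Brand'], prefix='CPU_Brand', dtype=float)

# Encode the test data, then force it onto the training columns:
# categories unseen in training are dropped, missing ones are filled with 0
test_dummies = (
    pd.get_dummies(test_df['CPU_Brand'], prefix='CPU_Brand', dtype=float)
    .reindex(columns=train_dummies.columns, fill_value=0.0)
)

print(test_dummies)
```

With scikit-learn, the equivalent approach is to fit the encoder once on the training data and call only `transform` on the test data, which is the situation `handle_unknown='ignore'` is designed for.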

In [33]:
# gpu_base_clocks = {
#     'Radeon R7 M445': 0.780,
#     'HD Graphics 620': 0.300,
#     'HD Graphics 615': 0.300,
#     'UHD Graphics 620': 0.300,
#     'GeForce GTX 1050 Ti': 1.354,
#     'GeForce 920MX': 0.965,
#     'GeForce GTX 1060': 1.404,
#     'GeForce 940MX': 1.004,
#     'GeForce GTX 1050': 1.354,
#     'HD Graphics 520': 0.300,
#     'Radeon R5 M330': 1.030,
#     'GeForce 930MX': 0.928,
#     'HD Graphics 400': 0.300,
#     'Radeon R4': 0.800,
#     'Radeon 520': 0.650,
#     'GeForce GTX 1070': 1.442,
#     'GeForce GTX 1050M': 1.354,
#     'Quadro M1200': 1.093,
#     'GeForce GTX 950M': 0.914,
#     'GeForce 150MX': 1.468,
#     'HD Graphics 500': 0.300,
#     'Radeon R5 M430': 1.030,
#     'Radeon R5 520': 0.650,
#     'HD Graphics 515': 0.300,
#     'GeForce GTX 965M': 0.944,
#     'Radeon 530': 0.730,
#     'GeForce GT 940MX': 1.004,
#     'GeForce GTX 940MX': 1.004,
#     'HD Graphics': 0.300,
#     'Radeon RX 540': 1.100,
#     'Radeon R5': 0.800,
#     'GeForce GTX 960M': 1.097,
#     'GeForce 920': 0.954,
#     'GeForce GTX 980': 1.126,
#     'HD Graphics 505': 0.300,
#     'Radeon RX 560': 1.175,
#     'Iris Pro Graphics': 0.300,
#     'Quadro M1000M': 1.029,
#     'GeForce MX130': 1.122,
#     'Quadro M620': 1.071,
#     'Radeon R2': 0.500,
#     'Quadro M520M': 0.965,
#     'Radeon R5 M420': 1.030,
#     'GeForce MX150': 1.468,
#     'Radeon R4 Graphics': 0.800,
#     'GTX 980 SLI': 1.126,
#     'Graphics 620': 0.300,
#     'HD Graphics 405': 0.300,
#     'GeForce GTX1050 Ti': 1.354,
#     'Iris Graphics 540': 0.300,
#     'GeForce 930M': 0.928,
#     'R17M-M1-70': 0.900,
#     'FirePro W4190M': 0.825,
#     'Radeon R7 M440': 0.780,
#     'Radeon Pro 560': 0.907,
#     'Quadro M2200': 1.187,
#     'Quadro M2000M': 1.013,
#     'GeForce GTX 980M': 1.038,
#     'Iris Plus Graphics 640': 0.300,
#     'Quadro M2200M': 1.187,
#     'HD Graphics 5300': 0.300,
#     'GeForce GTX 1080': 1.607,
#     'Quadro M500M': 0.965,
#     'Radeon RX 580': 1.257,
#     'HD Graphics 6000': 0.300,
#     'Radeon R5 M420X': 1.030,
#     'Iris Graphics 550': 0.300,
#     'Radeon R3': 0.800,
#     'Radeon RX 550': 1.100,
#     'Iris Plus Graphics 650': 0.300,
#     'Radeon R2 Graphics': 0.500,
#     'GeForce GTX1060': 1.404,
#     'GeForce 940M': 1.072,
#     'GeForce GTX1080': 1.607,
#     'Radeon R7 M460': 0.780,
#     'Quadro M620M': 1.071,
#     'Radeon R5 M315': 0.925,
#     'HD Graphics 630': 0.300,
#     'GeForce 920M': 0.954,
#     'Radeon R7 M360': 1.120,
#     'Quadro M3000M': 1.013,
#     'FirePro W6150M': 0.715,
#     'HD Graphics 530': 0.300,
#     'Radeon R5 430': 0.720,
#     'FirePro W5130M': 0.925,
#     'GeForce GTX 940M': 1.072,
#     'GeForce GTX 1070M': 1.442
# }

In [34]:
# data['GPU_Frec'] = data['GPU_Model'].map(gpu_base_clocks)

In [35]:
data

Unnamed: 0,laptop_ID,Company,Product,Inches,Resolution_X,Resolution_Y,cat_resolution,Is_touch,CPU_Model,SSD,...,Weight,TypeName_Convertible,TypeName_Gaming,TypeName_Netbook,TypeName_Notebook,TypeName_Ultrabook,TypeName_Workstation,CPU_Brand_AMD,CPU_Brand_Intel,CPU_Brand_Samsung
0,539,Asus,Zenbook UX510UW-FI095T,15.6,3840,2160,4K UHD,0,Core i7 7500U,256,...,2.00,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
1,327,Asus,ZenBook UX410UA-GV183T,14.0,1920,1080,Full HD,0,Core i7 7500U,256,...,2.00,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
2,563,Mediacom,SmartBook 130,13.3,1920,1080,Full HD,0,Atom x5-Z8350,0,...,1.35,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
3,13,Apple,MacBook Pro,15.4,2880,1800,QHD,0,Core i7,256,...,1.83,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
4,935,HP,EliteBook 850,15.6,1920,1080,Full HD,0,Core i7 6500U,256,...,1.84,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
386,742,Lenovo,ThinkPad 13,13.3,1920,1080,Full HD,0,Core i5 7200U,256,...,1.44,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
387,660,Dell,XPS 13,13.3,1920,1080,Full HD,0,Core i5 8250U,256,...,1.20,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
388,983,Lenovo,IdeaPad 310-15IKB,15.6,1920,1080,Full HD,0,Core i5 7200U,256,...,2.40,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
389,1137,HP,EliteBook 1040,14.0,1920,1080,Full HD,0,Core i5 6200U,256,...,1.43,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0


In [36]:
gpu_brand_temp = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

gpu_brand_temp.fit(data[['GPU_Brand']])

data_gpu_brand = gpu_brand_temp.transform(data[['GPU_Brand']])

# data_cpu_brand = pd.DataFrame(data_gpu_brand, columns=cpu_brand_temp.categories_[0].ravel())
data_gpu_brand = pd.DataFrame(data_gpu_brand, columns=gpu_brand_temp.get_feature_names_out(['GPU_Brand']))

data = pd.concat([data, data_gpu_brand], axis=1)

data.drop(columns='GPU_Brand', inplace=True)

In [37]:
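# test.csv has no Price_euros column (the unguarded pop raised KeyError: 'Price_euros'),
# so only reposition the target column when it is present
if 'Price_euros' in data.columns:
    data.insert(27, 'Price_euros', data.pop('Price_euros'))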

In [34]:
data

Unnamed: 0,laptop_ID,Company,Product,Inches,Resolution_X,Resolution_Y,Is_touch,CPU_Model,CPU_Frec,SSD,...,TypeName_Notebook,TypeName_Ultrabook,TypeName_Workstation,CPU_Brand_AMD,CPU_Brand_Intel,CPU_Brand_Samsung,GPU_Brand_AMD,GPU_Brand_ARM,GPU_Brand_Intel,GPU_Brand_Nvidia
0,539,Asus,Zenbook UX510UW-FI095T,15.6,3840,2160,0,Core i7 7500U,2.70,256,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
1,327,Asus,ZenBook UX410UA-GV183T,14.0,1920,1080,0,Core i7 7500U,2.70,256,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
2,563,Mediacom,SmartBook 130,13.3,1920,1080,0,Atom x5-Z8350,1.44,0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
3,13,Apple,MacBook Pro,15.4,2880,1800,0,Core i7,2.80,256,...,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
4,935,HP,EliteBook 850,15.6,1920,1080,0,Core i7 6500U,2.50,256,...,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
386,742,Lenovo,ThinkPad 13,13.3,1920,1080,0,Core i5 7200U,2.50,256,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
387,660,Dell,XPS 13,13.3,1920,1080,0,Core i5 8250U,1.60,256,...,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
388,983,Lenovo,IdeaPad 310-15IKB,15.6,1920,1080,0,Core i5 7200U,2.50,256,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
389,1137,HP,EliteBook 1040,14.0,1920,1080,0,Core i5 6200U,2.30,256,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0


In [35]:
data.to_csv('../data/processed/test.csv', index=False)

submission_1 = train[['Inches','Ram', 'SSD', 'HDD', 'Is_GPU_dedicated', 'Weight', 'Price_euros']]  
submission_2 = train[['Resolution_X','Resolution_Y', 'Ram', 'CPU_Frec', 'SSD', 'Price_euros']]  
submission_3 = train[['Ram', 'CPU_Frec', 'SSD', 'Price_euros']]  
submission_4 = train[['Resolution_X', 'Resolution_Y', 'Ram', 'CPU_Frec', 'HDD', 'Price_euros']]  
submission_5 = train[['Resolution_X', 'Resolution_Y', 'Ram', 'CPU_Frec', 'SSD', 'CPU_Brand_Intel', 'TypeName_Gaming', 'Price_euros']]  
submission_6 = train[['Resolution_X','Resolution_Y','Ram','SSD','Price_euros']]