 <a name='ind'/>

## <ins>Index</ins>:

#### 0- [Imports](#imp)
#### 1- [Exploration](#exp)
#### 2- [Cleaning](#clean)
#### 3- [Loading](#load)
***

In this notebook we prepare the [dataset](../data/train.csv) we were given so that we can apply a Machine Learning model that predicts the price of a computer.

<a name='imp'/>

### Imports
***

In [76]:
import pandas as pd
import numpy as np
pd.options.plotting.backend = "plotly"
pd.set_option('display.max_columns', None) 

import re

<a name='exp'/>

###### ⬆️ [Index](#ind)

### Exploration
***

Let's review the structure of the data and its types, and draw some first conclusions before starting the cleaning.

In [61]:
data = pd.read_csv('../data/train.csv')

In [62]:
data.head()

Unnamed: 0,Manufacturer,Model Name,Category,Screen Size,Screen,CPU,RAM,Storage,GPU,Operating System,Operating System Version,Weight,Price
0,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8GB,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37kg,11912523.48
1,Apple,Macbook Air,Ultrabook,"13.3""",1440x900,Intel Core i5 1.8GHz,8GB,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34kg,7993374.48
2,HP,250 G6,Notebook,"15.6""",Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8GB,256GB SSD,Intel HD Graphics 620,No OS,,1.86kg,5112900.0
3,Apple,MacBook Pro,Ultrabook,"15.4""",IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16GB,512GB SSD,AMD Radeon Pro 455,macOS,,1.83kg,22563005.4
4,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8GB,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37kg,16037611.2


In [63]:
# Normalize column names: strip surrounding whitespace and replace spaces with underscores
data.columns = data.columns.str.strip().str.replace(' ', '_', regex=False)

In [64]:
data.head()

Unnamed: 0,Manufacturer,Model_Name,Category,Screen_Size,Screen,CPU,RAM,Storage,GPU,Operating_System,Operating_System_Version,Weight,Price
0,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8GB,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37kg,11912523.48
1,Apple,Macbook Air,Ultrabook,"13.3""",1440x900,Intel Core i5 1.8GHz,8GB,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34kg,7993374.48
2,HP,250 G6,Notebook,"15.6""",Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8GB,256GB SSD,Intel HD Graphics 620,No OS,,1.86kg,5112900.0
3,Apple,MacBook Pro,Ultrabook,"15.4""",IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16GB,512GB SSD,AMD Radeon Pro 455,macOS,,1.83kg,22563005.4
4,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8GB,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37kg,16037611.2


In [65]:
data.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 977 entries, 0 to 976
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Manufacturer              977 non-null    object 
 1   Model_Name                977 non-null    object 
 2   Category                  977 non-null    object 
 3   Screen_Size               977 non-null    object 
 4   Screen                    977 non-null    object 
 5   CPU                       977 non-null    object 
 6   RAM                       977 non-null    object 
 7   Storage                   977 non-null    object 
 8   GPU                       977 non-null    object 
 9   Operating_System          977 non-null    object 
 10  Operating_System_Version  841 non-null    object 
 11  Weight                    977 non-null    object 
 12  Price                     977 non-null    float64
dtypes: float64(1), object(12)
memory usage: 780.0 KB


In [66]:
data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Price,977.0,10018990.0,6306430.0,1706374.8,5326308.0,8527428.0,13115700.0,54232308.0


In [67]:
data.describe(include='object').T

Unnamed: 0,count,unique,top,freq
Manufacturer,977,19,Dell,232
Model_Name,977,488,XPS 13,22
Category,977,6,Notebook,549
Screen_Size,977,18,"15.6""",494
Screen,977,38,Full HD 1920x1080,397
CPU,977,106,Intel Core i5 7200U 2.5GHz,151
RAM,977,8,8GB,483
Storage,977,36,256GB SSD,318
GPU,977,98,Intel HD Graphics 620,225
Operating_System,977,7,Windows,837


### <ins>First impressions</ins>:

- The only nulls are in the Operating_System_Version column (841 of 977 rows are non-null).
- We will convert Weight to float, after checking that every weight is in the same unit.
- RAM and Storage can be converted to int if all values share the same unit (GB or MB); otherwise we need the appropriate conversion.
- Price varies a lot (std ≈ 6.3M around a mean of ≈ 10M, with a maximum of ≈ 54M). Outliers?
- CPU can be split into brand, number of cores, clock speed (GHz) and model.
- Category is also suitable for one-hot encoding: it has only 6 categories.
- Model Name is trickier (488 unique values): if we one-hot encode it, reducing the dimensionality would be advisable.
- Manufacturer has 19 values and is nominal: no manufacturer should be treated as more important than another.
- It would be interesting to see how the model behaves if we first apply clustering and then a regression model.
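
The "Outliers?" question can be checked with Tukey's interquartile-range (IQR) rule. This is a common convention assumed here, not a method the notebook commits to, and `iqr_outlier_mask` is a hypothetical helper name. It flags values that fall outside `[Q1 - k*IQR, Q3 + k*IQR]`.

```python
import pandas as pd

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (s < lower) | (s > upper)

# Usage on the notebook's data (assumes `data` is loaded as above):
# mask = iqr_outlier_mask(data.Price)
# data[mask].Price.describe()   # inspect the flagged prices before deciding to drop them
```

Whether flagged prices should be removed or kept is a modelling decision. Expensive workstations can be genuine data rather than errors.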

<a name='clean'/>

###### ⬆️ [Index](#ind)

### Cleaning
***

We will handle null values and change the data type of each column.

#### Columns that can be converted to numeric

In [68]:
# Weight column. Every value appears to be in kg
data.Weight.unique()

array(['1.37kg', '1.34kg', '1.86kg', '1.83kg', '2.1kg', '2.04kg', '1.3kg',
       '1.6kg', '2.2kg', '0.92kg', '1.22kg', '0.98kg', '2.5kg', '1.62kg',
       '1.91kg', '2.3kg', '1.35kg', '1.88kg', '1.89kg', '1.65kg',
       '2.71kg', '1.2kg', '1.44kg', '2.8kg', '2kg', '2.65kg', '2.77kg',
       '3.2kg', '0.69kg', '1.49kg', '2.4kg', '2.13kg', '2.43kg', '1.7kg',
       '1.4kg', '1.8kg', '1.9kg', '3kg', '1.252kg', '2.7kg', '2.02kg',
       '1.63kg', '1.96kg', '1.21kg', '2.45kg', '1.25kg', '1.5kg',
       '2.62kg', '1.38kg', '1.58kg', '1.85kg', '1.23kg', '1.26kg',
       '2.16kg', '2.36kg', '2.05kg', '1.32kg', '1.75kg', '0.97kg',
       '2.9kg', '2.56kg', '1.48kg', '1.74kg', '1.1kg', '1.56kg', '2.03kg',
       '1.05kg', '4.4kg', '1.90kg', '1.29kg', '2.0kg', '1.95kg', '2.06kg',
       '1.12kg', '1.42kg', '3.49kg', '3.35kg', '2.23kg', '4.42kg',
       '2.69kg', '2.37kg', '4.7kg', '3.6kg', '2.08kg', '4.3kg', '1.68kg',
       '1.41kg', '4.14kg', '2.18kg', '2.24kg', '2.67kg', '2.14kg',
       '1.

In [69]:
# Number of values that do not end in 'kg'
data.Weight.apply(lambda x: x[-2:] != 'kg').sum()

0

In [70]:
# Drop the 'kg' suffix and convert to float
data.Weight = data.Weight.apply(lambda x: float(x[:-2]))
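
The same conversion can also be written with pandas' vectorized `.str` accessor. The sketch below folds the unit check from `In [69]` into the conversion so that any weight not in kg raises an error instead of being silently mis-parsed. `weight_to_kg` is a hypothetical helper name.

```python
import pandas as pd

def weight_to_kg(s: pd.Series) -> pd.Series:
    # Fail loudly if any value is not expressed in kg
    if not s.str.endswith('kg').all():
        raise ValueError("found weights not expressed in kg")
    # Drop the 2-character 'kg' suffix and parse the remainder as float
    return s.str[:-2].astype(float)

# Usage (on the original string column, before it is converted):
# data.Weight = weight_to_kg(data.Weight)
```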

In [71]:
# RAM column
data.RAM.unique()

array(['8GB', '16GB', '4GB', '2GB', '12GB', '6GB', '32GB', '24GB'],
      dtype=object)

In [72]:
# Every value ends in 'GB': drop the suffix and store as int8 (the largest value, 32, fits easily)
data.RAM = data.RAM.apply(lambda x: np.int8(x[:-2]))
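
In this dataset every RAM value is in GB, so stripping the suffix is enough. The first impressions, however, anticipated mixed units. If MB or TB values ever appear, a unit-aware parser like the hypothetical `to_gb` below could be used instead. It assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB).

```python
import re

# Conversion factors to GB, assuming binary units
UNIT_TO_GB = {'MB': 1 / 1024, 'GB': 1, 'TB': 1024}

def to_gb(text: str) -> float:
    # Accept an integer or decimal number followed by MB, GB or TB, e.g. '8GB', '1.0TB'
    match = re.fullmatch(r'\s*(\d+(?:\.\d+)?)\s*(MB|GB|TB)\s*', text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_TO_GB[unit]

# Usage (on the original string column): data.RAM.apply(to_gb)
```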

In [100]:
# Storage column: values with more than one drive (e.g. SSD + HDD)

data[data.Storage.str.contains(r'+', regex=False)]['Storage'].unique()

array(['128GB SSD +  1TB HDD', '256GB SSD +  256GB SSD',
       '256GB SSD +  1TB HDD', '256GB SSD +  2TB HDD',
       '512GB SSD +  1TB HDD', '256GB SSD +  500GB HDD',
       '128GB SSD +  2TB HDD', '512GB SSD +  512GB SSD',
       '512GB SSD +  256GB SSD', '512GB SSD +  2TB HDD',
       '64GB Flash Storage +  1TB HDD', '1TB HDD +  1TB HDD',
       '1TB SSD +  1TB HDD'], dtype=object)

Some values in this column need careful handling. We can create one column with the total storage capacity and another with the storage type. Note that sizes are given in both GB and TB.

In [84]:
# Helper to extract the numeric part of a single drive, e.g. '128GB SSD' -> 128.0, '1.0TB Hybrid' -> 1.0
def get_numeric(text):
    # Match an integer or decimal number; r'\d+' alone would split '1.0' into '1' and '0'
    match = re.search(r'\d+(?:\.\d+)?', text)
    return float(match.group()) if match else 0.0

In [102]:
# Helper to express the total capacity of all drives in GB (1 TB = 1024 GB)
def get_storage(x):
    groups = x.split(' + ')
    storage = 0.0
    for group in groups:
        if 'GB' in group:
            storage += get_numeric(group)
        elif 'TB' in group:
            storage += get_numeric(group) * 1024
    return int(storage)

In [104]:
data['Storage_Capacity'] = data.Storage.apply(get_storage)
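
As a cross-check of `get_storage`, the same totals can be computed without a Python-level loop by using pandas' `Series.str.extractall`. This method returns one row per regex match, indexed by (original row, match number). It is an alternative sketch, not the method used in this notebook, and `storage_capacity_gb` is a hypothetical name.

```python
import pandas as pd

def storage_capacity_gb(storage: pd.Series) -> pd.Series:
    # One row per drive: capture size and unit, e.g. '128GB SSD +  1TB HDD' -> (128, GB), (1, TB)
    parts = storage.str.extractall(r'(?P<size>\d+(?:\.\d+)?)(?P<unit>GB|TB)')
    # Convert every drive to GB (1 TB = 1024 GB)
    gb = parts['size'].astype(float) * parts['unit'].map({'GB': 1, 'TB': 1024})
    # Sum the drives back per original row; rows without any match get 0
    return gb.groupby(level=0).sum().reindex(storage.index, fill_value=0)

# Usage: (storage_capacity_gb(data.Storage) == data.Storage_Capacity).all()
```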

In [124]:
# We also want the drive type(s) of each computer
def get_disk_type(x):
    groups = x.split(' + ')
    patron = r"(?<=B\s).+"      # Everything after the unit ('GB ' or 'TB '), e.g. '1TB HDD' -> 'HDD'
    tipo = []
    for group in groups:
        tipo.append(re.findall(patron, group)[0])
    return ' + '.join(tipo)

In [126]:
data['Storage_Type'] = data.Storage.apply(get_disk_type)

In [127]:
data.Storage_Type.unique()

array(['SSD', 'Flash Storage', 'HDD', 'SSD + HDD', 'SSD + SSD', 'Hybrid',
       'Flash Storage + HDD', 'HDD + HDD'], dtype=object)

In [128]:
# Merge redundant combinations: two drives of the same type are reported as that single type

data.loc[data.Storage_Type == 'SSD + SSD', 'Storage_Type'] = 'SSD'
data.loc[data.Storage_Type == 'HDD + HDD', 'Storage_Type'] = 'HDD'

In [129]:
data.Storage_Type.unique()

array(['SSD', 'Flash Storage', 'HDD', 'SSD + HDD', 'Hybrid',
       'Flash Storage + HDD'], dtype=object)

In [130]:
data.head()

Unnamed: 0,Manufacturer,Model_Name,Category,Screen_Size,Screen,CPU,RAM,Storage,GPU,Operating_System,Operating_System_Version,Weight,Price,Storage_Capacity,Storage_Type
0,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37,11912523.48,128,SSD
1,Apple,Macbook Air,Ultrabook,"13.3""",1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34,7993374.48,128,Flash Storage
2,HP,250 G6,Notebook,"15.6""",Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,,1.86,5112900.0,256,SSD
3,Apple,MacBook Pro,Ultrabook,"15.4""",IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,,1.83,22563005.4,512,SSD
4,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37,16037611.2,256,SSD


In [165]:
# CPU column
data.CPU.unique()

array(['Intel Core i5 2.3GHz', 'Intel Core i5 1.8GHz',
       'Intel Core i5 7200U 2.5GHz', 'Intel Core i7 2.7GHz',
       'Intel Core i5 3.1GHz', 'AMD A9-Series 9420 3GHz',
       'Intel Core i7 2.2GHz', 'Intel Core i7 8550U 1.8GHz',
       'Intel Core i5 8250U 1.6GHz', 'Intel Core i3 6006U 2GHz',
       'Intel Core i7 2.8GHz', 'Intel Core M m3 1.2GHz',
       'Intel Core i7 7500U 2.7GHz', 'Intel Core i7 2.9GHz',
       'Intel Core i3 7100U 2.4GHz', 'Intel Atom x5-Z8350 1.44GHz',
       'Intel Core i5 7300HQ 2.5GHz', 'AMD E-Series E2-9000e 1.5GHz',
       'Intel Core i5 1.6GHz', 'Intel Core i7 8650U 1.9GHz',
       'Intel Atom x5-Z8300 1.44GHz', 'AMD E-Series E2-6110 1.5GHz',
       'AMD A6-Series 9220 2.5GHz',
       'Intel Celeron Dual Core N3350 1.1GHz',
       'Intel Core i3 7130U 2.7GHz', 'Intel Core i7 7700HQ 2.8GHz',
       'Intel Core i5 2.0GHz', 'AMD Ryzen 1700 3GHz',
       'Intel Pentium Quad Core N4200 1.1GHz',
       'Intel Atom x5-Z8550 1.44GHz',
       'Intel Celeron Du

In [141]:
# Most CPUs are Intel; list the ones that are not
data[~data.CPU.str.contains('Intel')]['CPU'].unique()

array(['AMD A9-Series 9420 3GHz', 'AMD E-Series E2-9000e 1.5GHz',
       'AMD E-Series E2-6110 1.5GHz', 'AMD A6-Series 9220 2.5GHz',
       'AMD Ryzen 1700 3GHz', 'AMD FX 9830P 3GHz',
       'AMD E-Series 6110 1.5GHz', 'AMD A6-Series 9220 2.9GHz',
       'AMD E-Series 9000e 1.5GHz', 'AMD A10-Series A10-9620P 2.5GHz',
       'AMD A6-Series A6-9220 2.5GHz', 'AMD A10-Series 9600P 2.4GHz',
       'AMD A8-Series 7410 2.2GHz', 'AMD A12-Series 9720P 2.7GHz',
       'AMD A12-Series 9720P 3.6GHz', 'AMD Ryzen 1600 3.2GHz',
       'AMD A10-Series 9620P 2.5GHz', 'AMD E-Series 7110 1.8GHz',
       'AMD A9-Series A9-9420 3GHz', 'AMD E-Series E2-9000 2.2GHz',
       'AMD A9-Series 9420 2.9GHz', 'AMD A6-Series 7310 2GHz',
       'AMD A12-Series 9700P 2.5GHz', 'AMD A4-Series 7210 2.2GHz',
       'AMD FX 8800P 2.1GHz'], dtype=object)

There are two CPU manufacturers: AMD and Intel.

In [145]:
# The brand is the first word of the CPU string
data['CPU_brand'] = data.CPU.apply(lambda x: x.split()[0])

In [158]:
# Function that returns the CPU clock frequency in GHz

def get_GHz(x):
    patron = r"(\d+(?:\.\d+)?)GHz"
    return float(re.findall(patron,x)[0])

In [160]:
data['CPU_freq'] = data.CPU.apply(get_GHz)

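It would be useful to have the number of cores of each CPU. We asked ChatGPT for this information and built a dictionary that maps each processor string to its core count. These values have not been checked against the manufacturers' specifications, so some entries may be wrong. CPUs listed without a model number (for example `Intel Core i5 2.3GHz`) are especially ambiguous.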

In [162]:
cpu_cores = {
'Intel Core i5 2.3GHz': 4,
'Intel Core i5 1.8GHz': 4,
'Intel Core i5 7200U 2.5GHz': 2,
'Intel Core i7 2.7GHz': 4,
'Intel Core i5 3.1GHz': 4,
'AMD A9-Series 9420 3GHz': 2,
'Intel Core i7 2.2GHz': 4,
'Intel Core i7 8550U 1.8GHz': 4,
'Intel Core i5 8250U 1.6GHz': 4,
'Intel Core i3 6006U 2GHz': 2,
'Intel Core i7 2.8GHz': 4,
'Intel Core M m3 1.2GHz': 2,
'Intel Core i7 7500U 2.7GHz': 2,
'Intel Core i7 2.9GHz': 4,
'Intel Core i3 7100U 2.4GHz': 2,
'Intel Atom x5-Z8350 1.44GHz': 4,
'Intel Core i5 7300HQ 2.5GHz': 4,
'AMD E-Series E2-9000e 1.5GHz': 2,
'Intel Core i5 1.6GHz': 4,
'Intel Core i7 8650U 1.9GHz': 4,
'Intel Atom x5-Z8300 1.44GHz': 4,
'AMD E-Series E2-6110 1.5GHz': 4,
'AMD A6-Series 9220 2.5GHz': 2,
'Intel Celeron Dual Core N3350 1.1GHz': 2,
'Intel Core i3 7130U 2.7GHz': 2,
'Intel Core i7 7700HQ 2.8GHz': 4,
'Intel Core i5 2.0GHz': 4,
'AMD Ryzen 1700 3GHz': 8,
'Intel Pentium Quad Core N4200 1.1GHz': 4,
'Intel Atom x5-Z8550 1.44GHz': 4,
'Intel Celeron Dual Core N3060 1.6GHz': 2,
'Intel Core i5 1.3GHz': 4,
'AMD FX 9830P 3GHz': 4,
'Intel Core i7 7560U 2.4GHz': 2,
'AMD E-Series 6110 1.5GHz': 4,
'Intel Core i5 6200U 2.3GHz': 2,
'Intel Core M 6Y75 1.2GHz': 2,
'Intel Core i5 7500U 2.7GHz': 2,
'Intel Core i3 6006U 2.2GHz': 2,
'AMD A6-Series 9220 2.9GHz': 2,
'Intel Core i7 6920HQ 2.9GHz': 4,
'Intel Core i5 7Y54 1.2GHz': 2,
'Intel Core i7 7820HK 2.9GHz': 4,
'Intel Xeon E3-1505M V6 3GHz': 4,
'Intel Core i7 6500U 2.5GHz': 2,
'AMD E-Series 9000e 1.5GHz': 2,
'AMD A10-Series A10-9620P 2.5GHz': 4,
'AMD A6-Series A6-9220 2.5GHz': 2,
'Intel Core i5 2.9GHz': 4,
'Intel Core i7 6600U 2.6GHz': 2,
'Intel Core i3 6006U 2.0GHz': 2,
'Intel Celeron Dual Core 3205U 1.5GHz': 2,
'Intel Core i7 7820HQ 2.9GHz': 4,
'AMD A10-Series 9600P 2.4GHz': 4,
'Intel Core i7 7600U 2.8GHz': 2,
'AMD A8-Series 7410 2.2GHz': 4,
'Intel Celeron Dual Core 3855U 1.6GHz': 2,
'Intel Pentium Quad Core N3710 1.6GHz': 4,
'AMD A12-Series 9720P 2.7GHz': 4,
'Intel Core i5 7300U 2.6GHz': 2,
'AMD A12-Series 9720P 3.6GHz': 4,
'Intel Celeron Quad Core N3450 1.1GHz': 4,
'Intel Celeron Dual Core N3060 1.60GHz': 2,
'Intel Core i5 6440HQ 2.6GHz': 4,
'Intel Core i7 6820HQ 2.7GHz': 4,
'AMD Ryzen 1600 3.2GHz': 6,
'Intel Core i7 7Y75 1.3GHz': 2,
'Intel Core i5 7440HQ 2.8GHz': 4,
'Intel Core i7 7660U 2.5GHz': 2,
'Intel Core i7 7700HQ 2.7GHz': 4,
'Intel Core M m3-7Y30 2.2GHz': 2,
'Intel Core i5 7Y57 1.2GHz': 2,
'Intel Core i7 6700HQ 2.6GHz': 4,
'Intel Core i3 6100U 2.3GHz': 2,
'AMD A10-Series 9620P 2.5GHz': 4,
'AMD E-Series 7110 1.8GHz': 4,
'Intel Celeron Dual Core N3350 2.0GHz': 2,
'AMD A9-Series A9-9420 3GHz': 2,
'Intel Core i7 6820HK 2.7GHz': 4,
'Intel Core M 7Y30 1.0GHz': 2,
'Intel Xeon E3-1535M v6 3.1GHz': 4,
'Intel Celeron Quad Core N3160 1.6GHz': 4,
'Intel Core i5 6300U 2.4GHz': 2,
'Intel Core i3 6100U 2.1GHz': 2,
'AMD E-Series E2-9000 2.2GHz': 2,
'Intel Celeron Dual Core N3050 1.6GHz': 2,
'Intel Core M M3-6Y30 0.9GHz': 2,
'AMD A9-Series 9420 2.9GHz': 2,
'Intel Core i5 6300HQ 2.3GHz': 4,
'AMD A6-Series 7310 2GHz': 4,
'Intel Atom Z8350 1.92GHz': 4,
'Intel Xeon E3-1535M v5 2.9GHz': 4,
'Intel Core i5 6260U 1.8GHz': 2,
'Intel Pentium Dual Core N4200 1.1GHz': 4,
'Intel Celeron Quad Core N3710 1.6GHz': 4,
'Intel Core M 1.2GHz': 2,
'AMD A12-Series 9700P 2.5GHz': 4,
'Intel Core i7 7500U 2.5GHz': 2,
'Intel Pentium Dual Core 4405U 2.1GHz': 2,
'AMD A4-Series 7210 2.2GHz': 2,
'Intel Core i7 6560U 2.2GHz': 2,
'Intel Core M m7-6Y75 1.2GHz': 2,
'AMD FX 8800P 2.1GHz': 4,
'Intel Core M M7-6Y75 1.2GHz': 2,
'Intel Core i5 7200U 2.50GHz': 2,
'Intel Core i5 7200U 2.70GHz': 2
}
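
Because `cpu_cores[x]` raises `KeyError` for any CPU string that is not in the dictionary (for example, one that only appears in a test set), it is worth checking coverage before applying it. The helper name `missing_cpus` is hypothetical.

```python
def missing_cpus(cpus, core_map):
    """Return, sorted, the CPU strings that have no entry in core_map."""
    return sorted(set(cpus) - set(core_map))

# Usage: missing_cpus(data.CPU.unique(), cpu_cores) should be [] before building CPU_cores
```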

In [167]:
data['CPU_cores'] = data.CPU.apply(lambda x: cpu_cores[x])

In [168]:
data.head()

Unnamed: 0,Manufacturer,Model_Name,Category,Screen_Size,Screen,CPU,RAM,Storage,GPU,Operating_System,Operating_System_Version,Weight,Price,Storage_Capacity,Storage_Type,CPU_brand,CPU_freq,CPU_cores
0,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37,11912523.48,128,SSD,Intel,2.3,4
1,Apple,Macbook Air,Ultrabook,"13.3""",1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34,7993374.48,128,Flash Storage,Intel,1.8,4
2,HP,250 G6,Notebook,"15.6""",Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,,1.86,5112900.0,256,SSD,Intel,2.5,2
3,Apple,MacBook Pro,Ultrabook,"15.4""",IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,,1.83,22563005.4,512,SSD,Intel,2.7,4
4,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37,16037611.2,256,SSD,Intel,3.1,4


In [169]:

re.findall(patron,'Intel Celeron Dual Core N3050 1.6GHz')

['Celeron', 'Dual', 'Core', 'N3050']

In [173]:
def get_cpu_model(x):
    # Match every whole word except the brand (AMD/Intel) and the frequency token (e.g. 2.3GHz)
    patron = r"(?!(?:AMD|Intel|\d+(?:\.\d+)?GHz)\b)\b\w+\b"
    modelo = re.findall(patron, x)
    return ' '.join(modelo)
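
Because `\w` does not match `-`, the regex in `get_cpu_model` splits hyphenated names: `AMD A9-Series 9420 3GHz` becomes `A9 Series 9420`. If the hyphens should be preserved, a token-based variant is a possible alternative. It assumes, as `CPU_brand` does, that the brand is always the first word. The name `cpu_model_tokens` is hypothetical.

```python
import re

def cpu_model_tokens(cpu: str) -> str:
    # Drop the brand (first token) and any frequency token such as '2.3GHz'; keep hyphens intact
    tokens = cpu.split()
    return ' '.join(t for t in tokens[1:] if not re.fullmatch(r'\d+(?:\.\d+)?GHz', t))

# Usage: data.CPU.apply(cpu_model_tokens)
```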

In [176]:
data['CPU_model'] = data.CPU.apply(get_cpu_model)

In [177]:
data.head()

Unnamed: 0,Manufacturer,Model_Name,Category,Screen_Size,Screen,CPU,RAM,Storage,GPU,Operating_System,Operating_System_Version,Weight,Price,Storage_Capacity,Storage_Type,CPU_brand,CPU_freq,CPU_cores,CPU_model
0,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 2.3GHz,8,128GB SSD,Intel Iris Plus Graphics 640,macOS,,1.37,11912523.48,128,SSD,Intel,2.3,4,Core i5
1,Apple,Macbook Air,Ultrabook,"13.3""",1440x900,Intel Core i5 1.8GHz,8,128GB Flash Storage,Intel HD Graphics 6000,macOS,,1.34,7993374.48,128,Flash Storage,Intel,1.8,4,Core i5
2,HP,250 G6,Notebook,"15.6""",Full HD 1920x1080,Intel Core i5 7200U 2.5GHz,8,256GB SSD,Intel HD Graphics 620,No OS,,1.86,5112900.0,256,SSD,Intel,2.5,2,Core i5 7200U
3,Apple,MacBook Pro,Ultrabook,"15.4""",IPS Panel Retina Display 2880x1800,Intel Core i7 2.7GHz,16,512GB SSD,AMD Radeon Pro 455,macOS,,1.83,22563005.4,512,SSD,Intel,2.7,4,Core i7
4,Apple,MacBook Pro,Ultrabook,"13.3""",IPS Panel Retina Display 2560x1600,Intel Core i5 3.1GHz,8,256GB SSD,Intel Iris Plus Graphics 650,macOS,,1.37,16037611.2,256,SSD,Intel,3.1,4,Core i5
