
# LapPick: A Laptop Recommendation System Using NLP
### Laskar AI Capstone Project

**Date Created:** 20 May 2025

**Team:** LAI25-SM035  
**Members:**  
- A533YBM071 – ARLIYANDI – STIKOM EL RAHMA  
- A006YBF160 – FATHUR RAHMAN AL FARIZY – Universitas Brawijaya  
- A245YBF227 – IRFAN FAJAR MUTTAQIN – Universitas Kristen Satya Wacana Salatiga  
- A011XBF457 – SHOFURA TSABITAH RAHMAH – Universitas Padjadjaran  

---

## Project Description
LapPick is a laptop recommendation system based on Natural Language Processing (NLP). It helps prospective buyers choose a laptop according to their needs (gaming, graphic design, office work, etc.) and their budget.


In [23]:
# Data Processing and Numerical Operations
import pandas as pd
import numpy as np

# Natural Language Processing (NLP)
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from sklearn.feature_extraction.text import TfidfVectorizer
import nltk
nltk.download('punkt')

# Machine Learning and Model Evaluation
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Web Scraping (if needed)
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# Visualization (if needed)
import matplotlib.pyplot as plt

# Optional: TensorFlow (for a more complex recommendation model)
import tensorflow as tf

# General Settings
import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Development\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!



## 1. Data Collection

### Objective
Collect laptop specification data from several sources (e-commerce sites and public datasets).

### Steps
1. Scrape laptop listings from e-commerce sites.
2. Download datasets from Kaggle (if available).
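
As a rough illustration of step 1, the sketch below parses product listings out of an HTML snippet with BeautifulSoup. The HTML structure, the CSS class names (`product-card`, `product-name`, `product-price`), the `parse_listings` helper, and the sample products are all hypothetical. A real e-commerce page will use different markup, and its terms of service should be checked before scraping.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; real sites use their own structure and class names.
SAMPLE_HTML = """
<div class="product-card">
  <span class="product-name">Example Laptop A</span>
  <span class="product-price">Rp 8.500.000</span>
</div>
<div class="product-card">
  <span class="product-name">Example Laptop B</span>
  <span class="product-price">Rp 15.250.000</span>
</div>
"""

def parse_listings(html):
    """Extract the name and price (in rupiah) from each product card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product-card"):
        name = card.select_one(".product-name").get_text(strip=True)
        price_text = card.select_one(".product-price").get_text(strip=True)
        # Keep digits only: "Rp 8.500.000" -> 8500000
        price = int("".join(ch for ch in price_text if ch.isdigit()))
        rows.append({"name": name, "price_idr": price})
    return rows

listings = parse_listings(SAMPLE_HTML)
for row in listings:
    print(row)
```

Parsing a fixed string keeps the example offline; in practice the HTML would come from an HTTP response or a browser session.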

### Implementation


In [24]:

# Load the three laptop datasets from local CSV files
df_a = pd.read_csv('datasets/laptop_data.csv')
df_b = pd.read_csv('datasets/LaptopPricePrediction.csv')
df_c = pd.read_csv('datasets/laptops.csv')


In [25]:
df_a.head()
df_a.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        1303 non-null   int64  
 1   Company           1303 non-null   object 
 2   TypeName          1303 non-null   object 
 3   Inches            1303 non-null   float64
 4   ScreenResolution  1303 non-null   object 
 5   Cpu               1303 non-null   object 
 6   Ram               1303 non-null   object 
 7   Memory            1303 non-null   object 
 8   Gpu               1303 non-null   object 
 9   OpSys             1303 non-null   object 
 10  Weight            1303 non-null   object 
 11  Price             1303 non-null   float64
dtypes: float64(2), int64(1), object(9)
memory usage: 122.3+ KB


In [26]:
df_b.head()
df_b.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550 entries, 0 to 549
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        550 non-null    int64  
 1   Name              550 non-null    object 
 2   Processor         550 non-null    object 
 3   RAM               550 non-null    object 
 4   Operating System  550 non-null    object 
 5   Storage           550 non-null    object 
 6   Display           550 non-null    object 
 7   Warranty          550 non-null    object 
 8   Price             550 non-null    object 
 9   rating            550 non-null    float64
dtypes: float64(1), int64(1), object(8)
memory usage: 43.1+ KB


In [27]:
df_c.head()
df_c.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2160 entries, 0 to 2159
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Laptop        2160 non-null   object 
 1   Status        2160 non-null   object 
 2   Brand         2160 non-null   object 
 3   Model         2160 non-null   object 
 4   CPU           2160 non-null   object 
 5   RAM           2160 non-null   int64  
 6   Storage       2160 non-null   int64  
 7   Storage type  2118 non-null   object 
 8   GPU           789 non-null    object 
 9   Screen        2156 non-null   float64
 10  Touch         2160 non-null   object 
 11  Final Price   2160 non-null   float64
dtypes: float64(2), int64(2), object(8)
memory usage: 202.6+ KB
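
The three files describe similar attributes under different column names (`Company` vs `Brand`, `Cpu` vs `CPU`, `Price` vs `Final Price`). If they were to be combined, one possible approach is to rename the columns to a shared schema before concatenating. The sketch below uses tiny stand-in frames rather than the real files. The target column names, the sample rows, and the `"8GB"` text format are assumptions, and other differences (such as the currency of each price column) would still need to be checked.

```python
import pandas as pd

# Stand-in rows that mimic the column names reported by df_a.info() and df_c.info().
df_a_demo = pd.DataFrame({
    "Company": ["BrandX"], "Cpu": ["Intel Core i5"], "Ram": ["8GB"], "Price": [1000.0],
})
df_c_demo = pd.DataFrame({
    "Brand": ["BrandY"], "CPU": ["Intel Core i7"], "RAM": [16], "Final Price": [1200.0],
})

# Hypothetical shared schema: map each source's column names onto it.
RENAME_A = {"Company": "brand", "Cpu": "cpu", "Ram": "ram_gb", "Price": "price"}
RENAME_C = {"Brand": "brand", "CPU": "cpu", "RAM": "ram_gb", "Final Price": "price"}

a = df_a_demo.rename(columns=RENAME_A)
c = df_c_demo.rename(columns=RENAME_C)

# "8GB" -> 8 so that ram_gb is numeric in both frames.
a["ram_gb"] = a["ram_gb"].str.replace("GB", "", regex=False).astype(int)

combined = pd.concat([a, c], ignore_index=True)
print(combined)
```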



## 2. Data Cleaning and Preprocessing

### Objective
Clean the data so that it is consistent and ready to be processed by the model.

### Steps
1. Remove duplicates.
2. Handle missing values.
3. Normalize text (for example, convert it to lowercase).
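
The three steps above can be sketched on a small made-up frame. The column names mirror `laptops.csv`, but the rows are invented for illustration, and treating a missing GPU as `'Integrated'` is an assumption rather than something the data guarantees.

```python
import pandas as pd

# Invented rows; only the column names follow laptops.csv.
raw = pd.DataFrame({
    "Brand": ["Asus", "Asus", "HP", "MSI"],
    "GPU": ["RTX 3050", "RTX 3050", None, None],
    "Screen": [15.6, 15.6, None, 14.0],
    "Touch": [" No", " No", "Yes ", "no"],
})

# 1. Remove exact duplicate rows (rows 0 and 1 are identical).
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Handle missing values: a label for text, the median for numbers.
clean["GPU"] = clean["GPU"].fillna("Integrated")
clean["Screen"] = clean["Screen"].fillna(clean["Screen"].median())

# 3. Normalize text: trim whitespace and lowercase.
clean["Touch"] = clean["Touch"].str.strip().str.lower()

print(clean)
```

Assigning the result of `fillna` back to the column, instead of calling it with `inplace=True` on a selected column, avoids the chained-assignment warnings that recent pandas versions raise.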

### Implementation


In [28]:

# Example: cleaning the data

df = pd.read_csv('datasets/laptops.csv')

df = df.drop(columns=['Laptop'])  # Drop the descriptive name; the specification columns carry the same information

# Handle missing values (assign back instead of inplace=True on a column selection)
df['GPU'] = df['GPU'].fillna('Integrated')
df['Storage type'] = df['Storage type'].fillna('Unknown')
df['Screen'] = df['Screen'].fillna(df['Screen'].median())  # Numeric imputation
df = df.dropna()  # Drop any rows that still contain nulls

print(df.head())
print(df.info())


  Status   Brand       Model            CPU  RAM  Storage Storage type  \
0    New    Asus  ExpertBook  Intel Core i5    8      512          SSD   
1    New  Alurin          Go  Intel Celeron    8      256          SSD   
2    New    Asus  ExpertBook  Intel Core i3    8      256          SSD   
3    New     MSI      Katana  Intel Core i7   16     1000          SSD   
4    New      HP         15S  Intel Core i5   16      512          SSD   

          GPU  Screen Touch  Final Price  
0  Integrated    15.6    No      1009.00  
1  Integrated    15.6    No       299.00  
2  Integrated    15.6    No       789.00  
3    RTX 3050    15.6    No      1199.00  
4  Integrated    15.6    No       669.01  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2160 entries, 0 to 2159
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Status        2160 non-null   object 
 1   Brand         2160 non-null   object 
 2   Model        

In [None]:
# Convert prices to Indonesian rupiah (a fixed rate of 16,500 IDR per USD is assumed)
df['Final Price'] = df['Final Price'] * 16500  # USD to IDR

# Make sure the RAM and Storage columns are numeric
df['RAM'] = df['RAM'].astype(int)
df['Storage'] = df['Storage'].astype(int)

# Normalize text, then encode Touch as 1 (yes) / 0 (no)
df['Touch'] = df['Touch'].str.strip().str.lower()
df['Touch'] = df['Touch'].map({'yes': 1, 'no': 0})

# Preview
df.head()


| | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | Integrated | 15.6 | 0 | 16648500.0 |
| 1 | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | Integrated | 15.6 | 0 | 4933500.0 |
| 2 | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | Integrated | 15.6 | 0 | 13018500.0 |
| 3 | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | 0 | 19783500.0 |
| 4 | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | Integrated | 15.6 | 0 | 11038665.0 |


In [33]:
df.to_csv('cleaned_dataset.csv', index=False)


## 3. Understanding User Needs (NLP)

### Objective
Interpret the user's text input to understand what they need.

### Steps
1. Tokenization and stemming.
2. TF-IDF computation.
3. Building the requirement vector.
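
Before any vectorization, a free-text request can be reduced to a small structured need: a use case and a budget. The sketch below does this with plain keyword matching and a regular expression. The keyword lists, the `parse_request` helper, and the assumption that budgets are written as "<number> juta" (million rupiah) are all hypothetical; they are not taken from the project's actual implementation.

```python
import re

# Hypothetical keyword lists per use case (Indonesian and English terms).
USE_CASE_KEYWORDS = {
    "gaming": ["gaming", "game"],
    "design": ["desain", "grafis", "design", "editing"],
    "office": ["kantor", "perkantoran", "office", "kerja"],
}

def parse_request(text):
    """Return the detected use cases and budget (in rupiah) from a request."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    use_cases = [
        name for name, words in USE_CASE_KEYWORDS.items()
        if any(word in tokens for word in words)
    ]
    # Match "15 juta" or "12,5 juta" (juta = million).
    match = re.search(r"(\d+(?:[.,]\d+)?)\s*juta", lowered)
    budget = None
    if match:
        budget = int(float(match.group(1).replace(",", ".")) * 1_000_000)
    return {"use_cases": use_cases, "budget_idr": budget}

print(parse_request("Laptop gaming budget 15 juta"))
print(parse_request("Cari laptop desain grafis, maksimal 12,5 juta"))
```

The structured result could be used to filter the catalog by price before any text similarity is computed.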

### Implementation


In [30]:

# Example: NLP with TF-IDF

# Create the stemmer
factory = StemmerFactory()
stemmer = factory.create_stemmer()

# Stem each text; tokenization is handled by TfidfVectorizer below
texts = ['Laptop gaming terbaik', 'Laptop desain grafis murah']
processed_texts = [stemmer.stem(text) for text in texts]

# TF-IDF Vectorizer
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(processed_texts)

print('TF-IDF Matrix:')
print(tfidf_matrix.toarray())


TF-IDF Matrix:
[[0.6316672  0.         0.6316672  0.         0.44943642 0.        ]
 [0.         0.53404633 0.         0.53404633 0.37997836 0.53404633]]
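
The printed weights can be reproduced by hand, which also shows which column belongs to which term. The sketch below assumes scikit-learn's default `TfidfVectorizer` settings (smoothed IDF, L2 normalization, alphabetically sorted vocabulary). It also assumes that the stemmer reduces "terbaik" to "baik" and leaves the other words unchanged, which is consistent with the column pattern in the output above; the stemmed tokens are therefore hard-coded.

```python
import math

# Stemmed documents, hard-coded (assumes "terbaik" -> "baik").
docs = [["laptop", "gaming", "baik"], ["laptop", "desain", "grafis", "murah"]]
vocab = sorted({term for doc in docs for term in doc})
n_docs = len(docs)

def idf(term):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    df = sum(term in doc for doc in docs)
    return math.log((1 + n_docs) / (1 + df)) + 1

def tfidf_row(doc):
    """Raw term count times IDF, then scaled to unit L2 length."""
    weights = [doc.count(term) * idf(term) for term in vocab]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

print(vocab)
for doc in docs:
    print([round(w, 4) for w in tfidf_row(doc)])
```

"laptop" appears in both texts, so it gets the lowest IDF and therefore the smallest weight in each row.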



## 4. Recommendation System

### Objective
Recommend laptops according to how well they match the user's input.

### Steps
1. Compute vector similarity (cosine similarity).
2. Show the laptops with the highest scores.
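
One way these two steps might fit together: turn each laptop's specification columns into a short text, fit TF-IDF on those texts, project the user's request into the same space, and rank by cosine similarity. The catalog rows, the `recommend` helper, and the choice of columns to concatenate are assumptions for illustration; stemming is omitted to keep the example short.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented catalog; column names follow the cleaned dataset.
catalog = pd.DataFrame({
    "Brand": ["MSI", "Asus", "HP"],
    "Model": ["Katana", "ExpertBook", "15S"],
    "CPU": ["Intel Core i7", "Intel Core i5", "Intel Core i5"],
    "GPU": ["RTX 3050", "Integrated", "Integrated"],
})

# Build one text document per laptop from its specification columns.
spec_text = catalog[["Brand", "Model", "CPU", "GPU"]].agg(" ".join, axis=1)

vectorizer = TfidfVectorizer()
laptop_vectors = vectorizer.fit_transform(spec_text)

def recommend(query, top_k=2):
    """Return the top_k laptops ranked by cosine similarity to the query."""
    query_vector = vectorizer.transform([query])  # reuse the fitted vocabulary
    scores = cosine_similarity(query_vector, laptop_vectors).ravel()
    ranked = catalog.assign(score=scores).sort_values("score", ascending=False)
    return ranked.head(top_k)

print(recommend("msi rtx 3050"))
```

A request such as "laptop gaming" shares no terms with raw specification text, so it would score zero against every row here. A working system would likely need a mapping from use cases to specifications, or descriptive text per laptop.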

### Implementation


In [31]:

# Example: compute pairwise similarity between the TF-IDF vectors
similarity = cosine_similarity(tfidf_matrix)
print('Cosine Similarity:')
print(similarity)


Cosine Similarity:
[[1.         0.17077611]
 [0.17077611 1.        ]]
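
The off-diagonal value can be checked by hand. Both TF-IDF rows already have unit length, so their cosine similarity is essentially the dot product, and the only column where both rows are non-zero is the fifth. The sketch below redoes the calculation from the rounded values printed in the TF-IDF matrix; the `cosine` helper is written out here only for illustration.

```python
import math

# Rows copied from the printed TF-IDF matrix (rounded to 8 decimals).
row_a = [0.6316672, 0.0, 0.6316672, 0.0, 0.44943642, 0.0]
row_b = [0.0, 0.53404633, 0.0, 0.53404633, 0.37997836, 0.53404633]

def cosine(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    length_u = math.sqrt(sum(x * x for x in u))
    length_v = math.sqrt(sum(y * y for y in v))
    return dot / (length_u * length_v)

print(round(cosine(row_a, row_b), 4))  # about 0.1708
```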



## 5. Model Evaluation

### Objective
Measure the accuracy and effectiveness of the recommendation model.

### Methods
- Precision, Recall, F1-Score
- Cosine Similarity
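
For a recommender, precision and recall are often computed over the top-k results rather than over class labels. The sketch below shows that variant. The `precision_recall_at_k` helper, the laptop IDs, and the notion of a "relevant" set (laptops a user judged suitable) are hypothetical, since the notebook does not define a ground truth.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical laptop IDs: ranked system output vs. laptops the user found suitable.
recommended = ["L3", "L7", "L1", "L9", "L4"]
relevant = {"L3", "L1", "L8"}

p, r = precision_recall_at_k(recommended, relevant, k=3)
f1 = 2 * p * r / (p + r) if (p + r) else 0.0
print(f"precision@3={p:.2f} recall@3={r:.2f} f1={f1:.2f}")
```

Two of the top three recommendations are relevant, and two of the three relevant laptops were found, so both metrics come out at 2/3.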

### Implementation


In [32]:

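# Example evaluation on hard-coded labels
true_labels = [1, 0]
predictions = [1, 1]
# zero_division=0 reports 0.00 for class 0, which is never predicted
print(classification_report(true_labels, predictions, zero_division=0))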


              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.50      1.00      0.67         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2

