# Knowledge Graph & Fashion Recommendation System

1.  **Build a Knowledge Graph** from the survey responses (an Excel export of the form) using an LLM (LLM Graph Builder).
2.  **Create a Recommendation Query Generator** that lets you ask for fashion suggestions in natural language. The LLM turns each request into a (possibly complex) query against the graph.


### Preparing Dependencies & Data Processing

In [None]:
%pip install pandas neo4j google-generativeai langchain langchain-google-genai langchain-community



In [None]:
# 1. Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# 2. Path to the survey file on Google Drive (pandas reads it directly from the mounted Drive)
drive_path = "/content/drive/MyDrive/Fashion_Data-Points_-Form-responses-1.xlsx"  # ← change to match your file name


In [None]:
# 3. Read the file and show the first 5 rows (head)
import pandas as pd

df = pd.read_excel(drive_path)
df.head()


Unnamed: 0,Timestamp,1.Age Group,2.Gender,3.Profession,Section 2: Style Preferences\n4. How would you describe your go-to daily outfit? (Select one),5. What’s your favorite color palette for clothing?,6. Do you prioritize functionality or aesthetics in your outfits?,7.Which of these best describes your wardrobe?,Section 3: Shopping Habits\n8. How often do you shop for new clothes?,9.What influences your clothing purchases the most?,10. Where do you typically shop for clothes? (Select all that apply),Section 4: Lifestyle\n11. How often do you attend formal events?,12.Do you often experiment with new styles or stick to what you know?,13. What kind of footwear do you wear most often?,14. How active is your daily lifestyle?,Section 5: Personal Preferences\n15. How important is comfort in your clothing choices\n,"16.If you had to choose, would you prefer timeless pieces or trendy items?",17. From scale 1-10 how much do you think your clothing style reflects about your personality?
0,14/01/2025 19:33:18,18–24,Female,Student,"Chic (e.g., tailored, stylish)","Pastels (soft pink, baby blue)",Slightly prefer aesthetics,Mix-and-match (varied styles),Rarely,Comfort,Local boutiques,- Occasionally (a few times a year),Sometimes experiment,Sneakers,Mostly sedentary,- Extremely important,"Mostly trendy, some timeless",8.0
1,21/01/2025 22:13:55,18–24,Female,Student,"Casual (e.g., jeans, t-shirts)","Dark tones (navy, maroon)",Slightly prefer functionality,Mix-and-match (varied styles),Every few months,Sustainability,Thrift stores,- Occasionally (a few times a year),Sometimes experiment,Sandals/Flats,Moderately active,- Somewhat important,"Mostly timeless, some trendy",6.0
2,21/01/2025 22:24:31,18–24,Female,Student,"Casual (e.g., jeans, t-shirts)","Neutral (black, white, beige)",Slightly prefer functionality,Minimalist (few versatile pieces),Monthly,Comfort,Local boutiques,- Occasionally (a few times a year),Rarely experiment,Sneakers,Moderately active,- Extremely important,"Mostly timeless, some trendy",9.0
3,21/01/2025 22:25:53,18–24,Male,Student,"Casual (e.g., jeans, t-shirts)","Neutral (black, white, beige)",Equal balance of both,Specialized (specific to one style),Every few months,Comfort,Thrift stores,- Rarely (less than once a year),Sometimes experiment,Sneakers,"Very active (e.g., gym, outdoor activities)",- Somewhat important,Always timeless,6.0
4,21/01/2025 22:38:46,18–24,Male,Student,"Casual (e.g., jeans, t-shirts)","Neutral (black, white, beige)",Slightly prefer aesthetics,Minimalist (few versatile pieces),Rarely,Comfort,Thrift stores,- Occasionally (a few times a year),Rarely experiment,Sneakers,"Very active (e.g., gym, outdoor activities)",- Somewhat important,"Mostly timeless, some trendy",6.0
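Several headers carry a section prefix and an embedded newline (for example `Section 2: Style Preferences\n4. How would you describe...`), and the graph builder later looks columns up by their exact text with `row.get(..., 'N/A')`, so a header that differs by a single character silently turns into `'N/A'`. A minimal sketch of one way to avoid that, assuming every question number appears only once; `build_column_map` is a hypothetical helper, not part of the original notebook:

```python
import re

# A question number such as "4." either at the very start of the header
# or right after an embedded newline ("Section 2: ...\n4. How would ...").
_QUESTION_NUMBER = re.compile(r"(?:^|\n)\s*(\d+)\s*\.")


def build_column_map(columns):
    """Map survey question numbers to the exact column names in the file.

    Headers without a question number (e.g. "Timestamp") are skipped.
    """
    mapping = {}
    for col in columns:
        match = _QUESTION_NUMBER.search(str(col))
        if match:
            mapping[int(match.group(1))] = col
    return mapping


headers = [
    "Unnamed: 0",
    "Timestamp",
    "1.Age Group",
    "Section 2: Style Preferences\n4. How would you describe your go-to daily outfit? (Select one)",
    "Section 5: Personal Preferences\n15. How important is comfort in your clothing choices\n",
]
column_map = build_column_map(headers)
print(sorted(column_map))
```

On the real data this would be `build_column_map(df.columns)`, after which a lookup such as `row.get(column_map[4], 'N/A')` no longer depends on retyping the full header.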


In [None]:
df.isnull().sum()


Unnamed: 0,0
Timestamp,0
1.Age Group,0
2.Gender,0
3.Profession,0
Section 2: Style Preferences\n4. How would you describe your go-to daily outfit? (Select one),1
5. What’s your favorite color palette for clothing?,0
6. Do you prioritize functionality or aesthetics in your outfits?,1
7.Which of these best describes your wardrobe?,2
Section 3: Shopping Habits\n8. How often do you shop for new clothes?,1
9.What influences your clothing purchases the most?,2


In [None]:
# Fill every column that has NaN values with that column's mode.
# Assigning the result back (instead of chained `fillna(..., inplace=True)`)
# avoids the pattern pandas warns will stop working in pandas 3.0.
for col in df.columns:
    if df[col].isnull().sum() > 0:
        mode_val = df[col].mode()[0]
        df[col] = df[col].fillna(mode_val)
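Mode imputation can be checked on a tiny made-up DataFrame before trusting it on the survey. This sketch wraps the same per-column logic in a hypothetical `fill_with_mode` helper that returns a copy. It also skips columns that are entirely empty, where `mode()` returns an empty Series and `mode()[0]` would raise:

```python
import pandas as pd


def fill_with_mode(frame):
    """Return a copy of `frame` with each column's NaNs replaced by that column's mode."""
    filled = frame.copy()
    for col in filled.columns:
        if filled[col].isnull().any():
            modes = filled[col].mode()  # sorted; several values when there is a tie
            if not modes.empty:  # an all-NaN column has no mode
                filled[col] = filled[col].fillna(modes[0])
    return filled


# Illustrative toy data, not taken from the survey
toy = pd.DataFrame({
    "gender": ["Female", "Female", None, "Male"],
    "score": [8.0, None, 6.0, 6.0],
})
print(fill_with_mode(toy))
```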


In [None]:
# Show the remaining null counts after imputation
print(df.isnull().sum())


Timestamp                                                                                          0
  1.Age Group                                                                                      0
  2.Gender                                                                                         0
  3.Profession                                                                                     0
Section 2: Style Preferences\n4. How would you describe your go-to daily outfit? (Select one)      0
 5. What’s your favorite color palette for clothing?                                               0
 6. Do you prioritize functionality or aesthetics in your outfits?                                 0
  7.Which of these best describes your wardrobe?                                                   0
Section 3: Shopping Habits\n8. How often do you shop for new clothes?                              0
  9.What influences your clothing purchases the most?                                      

### Credential and Connection Configuration

In the next cell we import all the libraries and set the credentials for Neo4j and the LLM API (Google Gemini).
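The configuration cell only works inside Colab, because `google.colab.userdata` does not exist elsewhere. A minimal sketch, assuming you want the same secret names to fall back to environment variables when running outside Colab; `load_credentials` and `REQUIRED_KEYS` are hypothetical names:

```python
import os

REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GOOGLE_API_KEY")


def load_credentials(keys=REQUIRED_KEYS):
    """Read each key from Colab Secrets when available, else from environment variables."""
    try:
        from google.colab import userdata  # only importable inside Google Colab
    except ImportError:
        userdata = None

    creds = {}
    for key in keys:
        value = None
        if userdata is not None:
            try:
                value = userdata.get(key)
            except Exception:
                # e.g. the secret is not defined or notebook access is not granted
                value = None
        creds[key] = value or os.environ.get(key)

    missing = [k for k, v in creds.items() if not v]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return creds
```

Usage would be `creds = load_credentials()` followed by `NEO4J_URI = creds["NEO4J_URI"]` and so on; failing loudly on a missing key is easier to debug than a `None` URI reaching the Neo4j driver.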

In [8]:
# ===================================================================
# Final version: configuration using Google Colab Secrets
# ===================================================================

import pandas as pd
import os
import json
import re
from neo4j import GraphDatabase
import google.generativeai as genai
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.graphs import Neo4jGraph
from langchain_core.prompts import PromptTemplate

# Colab-specific module for reading Colab Secrets
from google.colab import userdata

# --- READ CREDENTIALS FROM COLAB SECRETS ---
# Each value is read from the Colab secret with the same name.
try:
    NEO4J_URI = userdata.get('NEO4J_URI')
    NEO4J_USER = userdata.get('NEO4J_USER')
    NEO4J_PASSWORD = userdata.get('NEO4J_PASSWORD')
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    print("✅ Neo4j and Google credentials loaded from Colab Secrets.")

    # Initialize the Gemini client
    genai.configure(api_key=GOOGLE_API_KEY)
    gemini_model = genai.GenerativeModel('gemini-2.0-flash-lite')  # lightweight Gemini model
    print("✅ Google Gemini client configured.")

except Exception as e:
    print(f"❌ Failed to read credentials from Secrets or configure the client. Check that the secret names are correct and that 'Notebook access' is enabled. Error: {e}")

✅ Neo4j and Google credentials loaded from Colab Secrets.
✅ Google Gemini client configured.



### LLM Graph Builder: Graph Construction Logic

In [None]:
import time

# ====== STEP 4: Gemini prompt template ======
GRAPH_BUILDER_PROMPT = """
You are a data engineering expert tasked with extracting information from a fashion survey for a knowledge graph.
Based on the input text, extract entities and relationships according to the schema below.
IMPORTANT: Return ONLY a valid JSON object, with no extra text, explanation, or markdown markers such as ```json.

GRAPH SCHEMA:
- Node labels: AgeGroup, Gender, Profession, StylePreferences, ColorPalette, FunctionalityAesthetics, Wardrobe, Lifestyle, ComfortImportance, TimelessTrendy, ClothingPersonalityReflection.
- Properties: id, nilai (the answer value from the survey).

OUTPUT FORMAT:
A flat JSON object whose keys are the node labels above and whose values are the survey answers as strings, copied verbatim (use "N/A" when an answer is missing).

Input text:
{text_input}

JSON output:
"""

def clean_json_output(text):
    # Strip a ```json ... ``` fence if the model adds one anyway
    match = re.search(r"```json\s*([\s\S]*?)\s*```", text)
    return match.group(1) if match else text

def call_gemini_for_graph_build(text_input):
    prompt = GRAPH_BUILDER_PROMPT.format(text_input=text_input)
    response = gemini_model.generate_content(prompt)
    cleaned_text = clean_json_output(response.text)
    return cleaned_text

# ====== STEP 5: Write to Neo4j ======
def write_to_neo4j_gemini_version(driver, graph_data, respondent_id):
    # Relationship type from the respondent to each attribute node label
    LABEL_TO_RELATIONSHIP = {
        "AgeGroup": "HAS_AGE_GROUP",
        "Gender": "HAS_GENDER",
        "Profession": "HAS_PROFESSION",
        "StylePreferences": "PREFERS_STYLE",
        "ColorPalette": "PREFERS_COLOR",
        "FunctionalityAesthetics": "PREFERS_FUNCTIONALITY_AESTHETICS",
        "Wardrobe": "HAS_WARDROBE",
        "Lifestyle": "HAS_LIFESTYLE",
        "ComfortImportance": "VALUES_COMFORT",
        "TimelessTrendy": "PREFERS_TIMELESS_TRENDY",
        "ClothingPersonalityReflection": "REFLECTS_PERSONALITY"
    }

    with driver.session() as session:
        # Create (or update) the respondent node
        respondent_props = graph_data.get("Responden", {})
        if not isinstance(respondent_props, dict):
            respondent_props = {}
        respondent_props['id'] = respondent_id
        session.run("MERGE (r:Responden {id: $props.id}) SET r += $props", props=respondent_props)

        for label, rel_type in LABEL_TO_RELATIONSHIP.items():
            value = graph_data.get(label)
            # Numeric answers (e.g. the 1-10 personality score) may come back as numbers
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                value = str(value)
            if not isinstance(value, str) or not value.strip() or value.strip() == "N/A":
                continue
            # Store the whole answer as one node: splitting on commas would break
            # single answers such as "Chic (e.g., tailored, stylish)" into fragments
            session.run(
                f"""
                MERGE (t:{label} {{nilai: $prop_val}})
                WITH t
                MATCH (r:Responden {{id: $resp_id}})
                MERGE (r)-[:`{rel_type}`]->(t)
                """,
                prop_val=value.strip(),
                resp_id=respondent_id
            )

# ====== STEP 6: Build the graph from the DataFrame ======
def build_the_graph(df):
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    print("🚀 Starting knowledge graph construction...")

    for index, row in df.iterrows():
        respondent_id = f"responden_{index + 1}"

        usia = row.get('1.Age Group', 'N/A')
        gender = row.get('2.Gender', 'N/A')
        profesi = row.get('3.Profession', 'N/A')
        gaya_busana = row.get('Section 2: Style Preferences\n4. How would you describe your go-to daily outfit? (Select one)', 'N/A')
        warna = row.get('5. What’s your favorite color palette for clothing?', 'N/A')
        fungsi_aestetik = row.get('6. Do you prioritize functionality or aesthetics in your outfits?', 'N/A')
        wardrobe = row.get('7.Which of these best describes your wardrobe?', 'N/A')
        lifestyle = row.get('14. How active is your daily lifestyle?', 'N/A')
        comfort = row.get('Section 5: Personal Preferences\n15. How important is comfort in your clothing choices\n', 'N/A')
        timeless_trendy = row.get('16.If you had to choose, would you prefer timeless pieces or trendy items?', 'N/A')
        clothing_personality = row.get('17. From scale 1-10 how much do you think your clothing style reflects about your personality?', 'N/A')

        text_input = (
            f"Data for {respondent_id}: age {usia}, gender {gender}, profession {profesi}. "
            f"Daily outfit style {gaya_busana}, favorite color palette {warna}, functionality vs aesthetics preference {fungsi_aestetik}, "
            f"wardrobe {wardrobe}, lifestyle {lifestyle}, importance of comfort {comfort}, "
            f"timeless vs trendy {timeless_trendy}, and personality reflection {clothing_personality}."
        )

        print(f"🛠️ Processing {respondent_id}...")

        llm_output = None  # defined up front so the except block can always print it
        try:
            llm_output = call_gemini_for_graph_build(text_input)
            print(f"Gemini output for {respondent_id}:\n{llm_output}\n")
            graph_data = json.loads(llm_output)
            write_to_neo4j_gemini_version(driver, graph_data, respondent_id)
            print(f"✅ Saved {respondent_id}")
        except Exception as e:
            print(f"⚠️ Failed to process {respondent_id}: {e}")
            print(f"   LLM output received: {llm_output}")

        # Wait 2 seconds between calls to avoid HTTP 429 (rate limit) errors
        time.sleep(2)

    driver.close()
    print("🎉 Finished building the knowledge graph!")
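`clean_json_output` only unwraps a ```` ```json ```` fence, so a reply wrapped in a bare ```` ``` ```` fence or in a sentence of prose would still fail `json.loads`, and numeric answers such as the 1-10 personality score may arrive as numbers rather than strings. A more forgiving sketch, assuming the reply contains exactly one JSON object; `parse_llm_json` and `normalize_value` are hypothetical helpers:

```python
import json
import re

# A fenced block with or without a "json" tag
_FENCE = re.compile(r"```(?:json)?\s*([\s\S]*?)\s*```", re.IGNORECASE)


def parse_llm_json(text):
    """Parse a JSON object from an LLM reply that may be fenced or wrapped in prose."""
    fenced = _FENCE.search(text)
    if fenced:
        text = fenced.group(1)
    # Keep only the outermost {...} span in case extra text surrounds the object
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start:end + 1])


def normalize_value(value):
    """Turn a scalar answer into a stripped string, or None for missing / "N/A"."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text in ("", "N/A") else text


reply = 'Here you go:\n```json\n{"AgeGroup": "18–24", "ClothingPersonalityReflection": 8}\n```'
data = parse_llm_json(reply)
print({k: normalize_value(v) for k, v in data.items()})
```

Inside `build_the_graph`, `parse_llm_json(response_text)` could stand in for the `clean_json_output` plus `json.loads` pair; malformed replies still raise (`json.JSONDecodeError` is a `ValueError`), so the existing `except` branch keeps reporting them.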


In [None]:
# RUN THIS CELL ONLY ONCE to populate your database.
# Make sure you have run Cell 4 and entered your credentials correctly.
build_the_graph(df)

🚀 Starting knowledge graph construction...
🛠️ Processing responden_1...
✅ Saved responden_1
🛠️ Processing responden_2...
✅ Saved responden_2
🛠️ Processing responden_3...
✅ Saved responden_3
🛠️ Processing responden_4...
✅ Saved responden_4
🛠️ Processing responden_5...
✅ Saved responden_5
🛠️ Processing responden_6...
✅ Saved responden_6
🛠️ Processing responden_7...
✅ Saved responden_7
🛠️ Processing responden_8...
✅ Saved responden_8
🛠️ Processing responden_9...
✅ Saved responden_9
🛠️ Processing responden_10...
✅ Saved responden_10
🛠️ Processing responden_11...
✅ Saved responden_11
🛠️ Processing responden_12...
✅ Saved responden_12
🛠️ Processing responden_13...
✅ Saved responden_13
🛠️ Processing responden_14...
✅ Saved responden_14
🛠️ Processing responden_15...
✅ Saved responden_15
🛠️ Processing responden_16
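The builder waits a fixed 2 seconds after every respondent to stay under Gemini's rate limit, which is slow when the quota is generous and still fails when it is not. A common alternative is exponential backoff with jitter: retry only on rate-limit errors, doubling the wait each time. This is a sketch; `call_with_backoff` is hypothetical, and detecting a 429 by looking for "429" in the error message is an assumption (the Gemini SDK typically surfaces it as `google.api_core.exceptions.ResourceExhausted`, which you could test for instead, but check your installed version):

```python
import random
import time


def _looks_like_rate_limit(exc):
    # Assumed heuristic: treat errors whose message mentions HTTP 429 as rate limits
    return "429" in str(exc)


def call_with_backoff(fn, *args, max_retries=5, base_delay=2.0,
                      is_retryable=_looks_like_rate_limit, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs). On a retryable error, wait base_delay * 2**attempt
    seconds plus up to 1 s of random jitter and try again, at most max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise  # out of retries, or not a rate-limit error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

In the loop this would read `llm_output = call_with_backoff(call_gemini_for_graph_build, text_input)`, and the fixed `time.sleep(2)` could then be shortened or dropped. The `sleep` parameter exists only so the waiting can be replaced in tests.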

In [None]:
import os

# Reuse the credentials loaded from Colab Secrets (`uri`, `user`, `password` were never defined)
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

def export_nodes_and_relationships():
    with driver.session() as session:
        # Fetch every node together with its labels
        nodes_query = """
        MATCH (n)
        RETURN labels(n) AS labels, properties(n) AS props
        """
        result_nodes = session.run(nodes_query)
        nodes = []
        for record in result_nodes:
            labels = record["labels"]
            props = dict(record["props"])
            props["labels"] = ";".join(labels)
            nodes.append(props)
        df_nodes = pd.DataFrame(nodes)

        # Fetch every relationship. Attribute nodes (AgeGroup, Gender, ...) have no
        # `id` property, only `nilai`, so fall back to it for the endpoint key.
        rel_query = """
        MATCH (a)-[r]->(b)
        RETURN coalesce(a.id, a.nilai) AS `from`, type(r) AS rel_type, coalesce(b.id, b.nilai) AS `to`
        """
        result_rels = session.run(rel_query)
        df_rels = pd.DataFrame([r.data() for r in result_rels])

        return df_nodes, df_rels

# Run the export, then release the connection
nodes_df, rels_df = export_nodes_and_relationships()
driver.close()

# Save to Google Drive (create the Graf folder if it does not exist yet)
export_dir = '/content/drive/MyDrive/Graf'
os.makedirs(export_dir, exist_ok=True)
nodes_df.to_csv(f'{export_dir}/nodes_export.csv', index=False)
rels_df.to_csv(f'{export_dir}/relationships_export.csv', index=False)

print("✅ Export finished. Files saved to Google Drive (/Graf)")

### LLM Query Generator for Recommendations with Gemini: Recommendation System Logic

LangChain makes switching to Gemini straightforward here: we only need to swap the LLM used inside `GraphCypherQAChain`.

Now that our *knowledge graph* is populated, we will build the logic for interacting with it in natural language.

In this section we use **LangChain** to connect:
1.  The **Gemini language model** as the "translator brain".
2.  Our **Neo4j graph database** as the knowledge source.
3.  A **custom prompt template** designed to teach Gemini how to write advanced recommendation queries.
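The recommendation prompt asks Gemini for "collaborative filtering" Cypher: find respondents ("twins") who share a preference with the target, then count what else those twins prefer that the target does not. The same scoring can be sketched in plain Python over (respondent, value) pairs. This is illustrative only: `recommend` is a hypothetical helper and the edge lists below are made up, not exported from the graph. Passing the same list twice reproduces the single-relationship pattern; passing two different lists finds twins through one attribute and recommends values of another:

```python
from collections import Counter, defaultdict


def recommend(similarity_edges, candidate_edges, target, k=5):
    """Rank candidate values for `target` by how often its twins hold them.

    Twins are other users sharing at least one value with the target in
    `similarity_edges`. Each (twin, shared value, candidate) combination adds 1,
    like count(*) over the matched paths in a Cypher collaborative-filtering query.
    """
    shared_by_user = defaultdict(set)
    for user, value in similarity_edges:
        shared_by_user[user].add(value)
    candidates_by_user = defaultdict(set)
    for user, value in candidate_edges:
        candidates_by_user[user].add(value)

    target_shared = shared_by_user.get(target, set())
    target_has = candidates_by_user.get(target, set())
    scores = Counter()
    for user, values in shared_by_user.items():
        if user == target:
            continue
        overlap = len(values & target_shared)
        if overlap == 0:
            continue  # not a twin
        for candidate in candidates_by_user.get(user, set()) - target_has:
            scores[candidate] += overlap

    # Highest score first; ties broken alphabetically so the order is deterministic
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


styles = [("responden_1", "Casual"), ("responden_2", "Casual"),
          ("responden_3", "Casual"), ("responden_4", "Chic")]
colors = [("responden_1", "Neutral"), ("responden_2", "Dark tones"),
          ("responden_3", "Dark tones"), ("responden_4", "Pastels")]
print(recommend(styles, colors, "responden_1"))
```

Because every survey question allows a single answer, matching and recommending on the same attribute would return nothing (every twin already holds the target's only value), which is why the example finds twins by style and recommends colors.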

In [None]:
# Recommendation prompt. The few-shot example uses labels and relationship types that
# exist in the graph built above (answer values are stored in `nilai`). Each respondent
# has one answer per question, so the example finds twins via a shared style and
# recommends color palettes.
RECOMMENDATION_PROMPT_TEMPLATE = """
You are a Neo4j Cypher expert who writes queries for a fashion recommendation system.
Your task is to turn a natural-language question into a valid Cypher query based on the given graph schema.
Focus on 'collaborative filtering' logic: find other users with similar tastes, then see what else they like.

Graph schema:
{schema}

Example Cypher query generated for the question "recommend color palettes for responden_1":
```cypher
MATCH (target:Responden {{id: 'responden_1'}})-[:PREFERS_STYLE]->(shared_style:StylePreferences)
MATCH (twin:Responden)-[:PREFERS_STYLE]->(shared_style)
WHERE target <> twin
MATCH (twin)-[:PREFERS_COLOR]->(recommendation:ColorPalette)
WHERE NOT (target)-[:PREFERS_COLOR]->(recommendation)
RETURN recommendation.nilai AS Recommendation, count(*) AS MatchScore
ORDER BY MatchScore DESC LIMIT 5
```

Now convert the following question into a Cypher query. Return only the Cypher query, with no other explanation.
Question: {question}
"""

def setup_recommendation_chain():
    """Initialize the LangChain QA chain for recommendations with Gemini."""
    graph = Neo4jGraph(
        url=NEO4J_URI,
        username=NEO4J_USER,
        password=NEO4J_PASSWORD
    )

    prompt = PromptTemplate(
        template=RECOMMENDATION_PROMPT_TEMPLATE,
        input_variables=["schema", "question"]
    )

    chain = GraphCypherQAChain.from_llm(
        cypher_prompt=prompt,
        llm=ChatGoogleGenerativeAI(temperature=0, model="gemini-2.0-flash-lite", google_api_key=GOOGLE_API_KEY),
        graph=graph,
        verbose=True,
        return_intermediate_steps=True,
        # Required by recent langchain-community releases: the chain executes LLM-generated Cypher
        allow_dangerous_requests=True
    )
    return chain


def ask_for_recommendation(chain, question):
    """Run the recommendation chain for one question and print the result."""
    print("===================================================================")
    print(f"🔎 Asking: {question}")
    print("===================================================================")
    try:
        response = chain.invoke({"query": question})

        print("\n--- Recommendation Result ---")
        print(f"Answer: {response['result']}")

        if response.get('intermediate_steps'):
            print("\n--- Cypher Query Used ---")
            print(response['intermediate_steps'][0]['query'])

    except Exception as e:
        print(f"An error occurred: {e}")
    print("\n\n")


print("Preparing the recommendation chain...")
try:
    recommendation_chain = setup_recommendation_chain()
    print("✅ Recommendation chain ready.")
    print("-" * 50)

    # The survey has no brand or city data, so the questions use attributes stored in the graph
    ask_for_recommendation(recommendation_chain, "suggest 5 color palettes for responden_3")

    ask_for_recommendation(recommendation_chain, "recommend daily outfit styles for people who prefer the Neutral (black, white, beige) color palette")

    ask_for_recommendation(recommendation_chain, "which respondents prefer a Casual style and are also Students?")
except NameError as e:
    print(f"❌ ERROR: Make sure you have run the credentials cell (API key & Neo4j). Error message: {e}")
except Exception as e:
    print(f"❌ An error occurred during initialization: {e}")
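`GraphCypherQAChain` runs whatever Cypher Gemini writes, which is why the chain asks for `allow_dangerous_requests=True`. The dependable safeguard is to connect with a read-only Neo4j user. A cheap extra check, for example when you generate Cypher in a separate step and run it yourself with `graph.query(...)`, is to reject queries containing write clauses. The sketch below is a hypothetical keyword heuristic, not a complete security boundary: it blanks out string literals first so an answer value such as `'Set of basics'` does not trigger a false positive, and its keyword list is an assumption that does not cover every write procedure:

```python
import re

# Single- or double-quoted Cypher string literals (with backslash escapes)
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")

# Clauses that modify data or schema; a non-exhaustive, assumed list
_WRITE_PATTERN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV"
    r"|CALL\s+apoc\.(?:create|merge|refactor|periodic))\b",
    re.IGNORECASE,
)


def is_read_only_cypher(query):
    """Heuristically reject LLM-generated Cypher that contains write clauses."""
    without_strings = _STRING_LITERAL.sub("''", query)
    return _WRITE_PATTERN.search(without_strings) is None


print(is_read_only_cypher("MATCH (r:Responden {id: 'responden_1'}) RETURN r LIMIT 5"))
print(is_read_only_cypher("MATCH (n) DETACH DELETE n"))
```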