# **Knowledge Representation in RAG methods**

Contributors:
* Szymon Pająk
* Tomasz Ogiołda

## Temporary notes

### Plan

1. Introduction
2. Background
  - What is RAG? Why is it used?
  - What kinds of knowledge representations can RAG use?
    - Vectorized embeddings
    - Knowledge graph
    - Combination of both
    - Comparison https://neo4j.com/blog/genai/graphrag-manifesto/

  - Explain the dataflow for both knowledge representations (the whole process, from raw data to querying the knowledge database)
3. Demo
  - Tools to be used:
    - LangChain?
    - Neo4j

4. Resources

- https://neo4j.com/blog/genai/graphrag-manifesto/
- https://neo4j.com/blog/developer/langchain4j-graphrag-vector-stores-retrievers/
- https://neo4j.com/blog/genai/what-is-retrieval-augmented-generation-rag/
- https://neo4j.com/blog/developer/knowledge-graph-rag-application/
- https://neo4j.com/blog/news/graphrag-ecosystem-tools/

## **RAG quickstart & Motivation**

Some text

In [21]:
!pip install neo4j google-generativeai



In [5]:
from google.colab import userdata

NEO4J_URI = userdata.get('NEO4J_URI')
NEO4J_PASS = userdata.get('NEO4J_PASS')
NEO4J_DB_USER = userdata.get('NEO4J_DB_USER')
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

In [20]:
from neo4j import GraphDatabase
import google.generativeai as genai

genai.configure(api_key=GOOGLE_API_KEY)

# Embedding models are called through genai.embed_content(model=..., content=...),
# not through GenerativeModel, so only the model name is kept here.
EMBEDDING_MODEL = 'models/text-embedding-004'
generative_llm = genai.GenerativeModel('gemini-1.5-flash-latest')

# Create the driver without a `with` block: leaving `with` would close the
# connection before later cells could use it. The URI comes from the NEO4J_URI secret.
db = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_DB_USER, NEO4J_PASS))
db.verify_connectivity()

## **Data preparation & Indexing**

In [24]:
import kagglehub

path = kagglehub.dataset_download("devdope/900k-spotify")

Downloading from https://www.kaggle.com/api/v1/datasets/download/devdope/900k-spotify?dataset_version_number=3...

100%|██████████| 1.00G/1.00G [00:25<00:00, 41.8MB/s]

Extracting files...

In [44]:
import numpy as np
import pandas as pd

songs_csv = path + '/spotify_dataset.csv'

full_df = pd.read_csv(songs_csv)

In [56]:
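# Seed NumPy's global RNG so both samples below are reproducible.
np.random.seed(12)

# Work on a 20,000-song subset of the full dataset.
df = full_df.sample(20000)

# Keep only the columns used later; .copy() avoids pandas' SettingWithCopyWarning on assignment.
score_cols = ['Energy', 'Popularity', 'Danceability', 'Positiveness', 'Liveness']
df = df[['Artist(s)', 'song', 'text', 'emotion', 'Length', 'Tempo', 'Album', 'Genre', 'Release Date', 'Explicit'] + score_cols].copy()

# Rescale the integer percentage scores (e.g. 59) to fractions (0.59).
df[score_cols] = df[score_cols].astype(int) / 100

df.sample(10)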

| | Artist(s) | song | text | emotion | Length | Tempo | Album | Genre | Release Date | Explicit | Energy | Popularity | Danceability | Positiveness | Liveness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 478214 | The Main Squeeze | Ill take another | Give me one more One more give it to me Give m... | sadness | 09:39 | 77 | The Main Squeeze | rock,pop,dance | 6th February 2012 | No | 0.59 | 0.27 | 0.37 | 0.29 | 0.12 |
| 370562 | Pi’erre Bourne,Sharc | All Night | [Chorus: Pi'erre Bourne] Baby just end this sh... | joy | 04:59 | 127 | The Life Of Pi'erre 5 | hip hop | 11th June 2021 | Yes | 0.49 | 0.66 | 0.78 | 0.09 | 0.17 |
| 27999 | Any Trouble | Romance | Your even sweating In the shade Your three pie... | joy | 04:03 | 93 | Where Are All The Nice Girls? | hip hop | 12th February 2007 | No | 0.86 | 0.01 | 0.54 | 0.88 | 0.03 |
| 302960 | Marc Broussard | Lonely Night in Georgia | Stoplights turn into skylines And my mind turn... | sadness | 06:20 | 123 | Carencro | soul,pop,rock | 3rd August 2004 | No | 0.79 | 0.3 | 0.59 | 0.64 | 0.2 |
| 177297 | George Harrison | True Love | [Verse] You give to me and I give to you True ... | joy | 02:44 | 124 | Thirty Three & 1/3 | electronic,pop,folk | 19th November 1976 | No | 0.89 | 0.23 | 0.56 | 0.86 | 0.15 |
| 7126 | A$AP Ferg,Pharrell Williams,The Neptunes | Paper Plates | [Intro: Pharrell Williams] Yeah [Chorus: Phar... | anger | 02:54 | 120 | Green Juice (feat. Pharrell Williams) | hip hop | 28th October 2021 | Yes | 0.6 | 0.38 | 0.64 | 0.14 | 0.18 |
| 440740 | Steely Dan | Fire In The Hole | I decline To walk the line They tell me ... | sadness | 03:29 | 73 | Can't Buy A Thrill | pop rock,jazz,rock | 1st November 1972 | No | 0.31 | 0.53 | 0.56 | 0.47 | 0.05 |
| 264006 | Kontinuum | Two Moons | Life flows Through the fingers and away from y... | sadness | 05:43 | 163 | No Need to Reason | hip hop | 6th July 2018 | No | 0.53 | 0.13 | 0.43 | 0.12 | 0.12 |
| 13154 | Aimee Mann | Build That Wall | [Verse 1] She's been a long time on the phone ... | love | 04:24 | 109 | Magnolia (Music from the Motion Picture) | alternative rock,rock | 7th December 1999 | No | 0.52 | 0.27 | 0.69 | 0.77 | 0.12 |
| 39700 | Bahamas | Never Again | Loosen your mind Open back up to me We would b... | joy | 04:30 | 125 | Barchords | folk | 1st January 2012 | No | 0.43 | 0.21 | 0.4 | 0.18 | 0.11 |
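The data-preparation cells above stop before the indexing step. Below is a minimal sketch of how one song row could feed both knowledge representations: a flat text passage for the vector index, and query parameters for a graph write. The helper names (`song_to_document`, `split_list_field`, `song_to_graph_params`), the `SONG_MERGE_QUERY` Cypher statement and the `Song`/`Artist`/`Genre` schema are all hypothetical assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical helpers (not from the original notebook); column names match df above.


def song_to_document(row: pd.Series) -> str:
    """Flatten one song row into a single text passage to embed (vector side)."""
    return (
        f"Song: {row['song']}\n"
        f"Artist(s): {row['Artist(s)']}\n"
        f"Album: {row['Album']}\n"
        f"Genre: {row['Genre']}\n"
        f"Emotion: {row['emotion']}\n"
        f"Lyrics: {row['text']}"
    )


def split_list_field(value) -> list:
    """Split a comma-separated field such as 'rock,pop,dance' into clean names."""
    if pd.isna(value):
        return []
    return [part.strip() for part in str(value).split(",") if part.strip()]


# Assumed graph schema: (:Artist)-[:PERFORMED]->(:Song)-[:HAS_GENRE]->(:Genre).
# FOREACH is used instead of UNWIND so an empty artist or genre list
# does not stop the rest of the statement.
SONG_MERGE_QUERY = """
MERGE (s:Song {title: $title, album: $album})
SET s.emotion = $emotion, s.energy = $energy
FOREACH (artist_name IN $artists |
    MERGE (a:Artist {name: artist_name})
    MERGE (a)-[:PERFORMED]->(s))
FOREACH (genre_name IN $genres |
    MERGE (g:Genre {name: genre_name})
    MERGE (s)-[:HAS_GENRE]->(g))
"""


def song_to_graph_params(row: pd.Series) -> dict:
    """Build the parameters for SONG_MERGE_QUERY from one song row (graph side)."""
    return {
        "title": row["song"],
        "album": row["Album"],
        "emotion": row["emotion"],
        "energy": float(row["Energy"]),
        "artists": split_list_field(row["Artist(s)"]),
        "genres": split_list_field(row["Genre"]),
    }


# Example row copied from the sample output above.
example = pd.Series({
    "Artist(s)": "Pi’erre Bourne,Sharc",
    "song": "All Night",
    "text": "[Chorus: Pi'erre Bourne] Baby just end this sh...",
    "emotion": "joy",
    "Album": "The Life Of Pi'erre 5",
    "Genre": "hip hop",
    "Energy": 0.49,
})
print(song_to_document(example))
print(song_to_graph_params(example))
```

Applied to the notebook, `df.apply(song_to_document, axis=1)` would give the passages to embed, and with an open Neo4j driver `db` each row could be written as `db.execute_query(SONG_MERGE_QUERY, song_to_graph_params(row))`. Splitting `Artist(s)` on commas is an approximation: an artist whose own name contains a comma would end up as two `Artist` nodes.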


## **Retrieval**

Some text

In [None]:
# Some code
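As a minimal sketch of the vector-side retrieval step, assume the song passages have already been embedded into a NumPy matrix with one row per song. Retrieval then reduces to a cosine-similarity top-k search. `top_k_cosine` and the toy vectors are illustrative assumptions. In the notebook, the query vector would come from the embedding API, e.g. `genai.embed_content(model='models/text-embedding-004', content=question, task_type='retrieval_query')['embedding']`. With Neo4j, the same search could instead run server-side through a vector index queried with the Cypher procedure `db.index.vector.queryNodes`.

```python
import numpy as np


def top_k_cosine(query_vec, doc_matrix, k=3):
    """Return (indices, scores) of the k rows of doc_matrix most similar to query_vec."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # L2-normalise both sides so the dot product equals cosine similarity
    # (assumes no all-zero vectors).
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    k = min(k, len(scores))
    # Sort by descending similarity and keep the k best rows.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy 2-D "embeddings"; real text-embedding-004 vectors are much longer.
toy_docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
indices, scores = top_k_cosine([1.0, 0.1], toy_docs, k=2)
print(indices, scores)
```

Normalising once and using a single matrix product keeps the scan vectorised. For a 20,000-song sample a brute-force search like this should be manageable. A database vector index becomes the more practical choice as the collection grows.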

## **Generation**

Some text

In [None]:
# Some code
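Below is a minimal sketch of the generation step. It assumes the retrieved passages are plain strings (for example, one text passage per retrieved song) and packs them into a grounded prompt for the Gemini model. `build_rag_prompt`, its instruction wording and the `max_chars` budget are illustrative assumptions. With the `generative_llm` model created in the setup cell, the call would be `generative_llm.generate_content(prompt).text`.

```python
def build_rag_prompt(question: str, passages: list, max_chars: int = 4000) -> str:
    """Assemble a grounded prompt: numbered context passages followed by the question."""
    context_lines = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break  # stop adding passages once the rough size budget is exceeded
        context_lines.append(entry)
        used += len(entry)
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "Which song is by Steely Dan?",
    ["Song: Fire In The Hole\nArtist(s): Steely Dan"],
)
print(prompt)
```

Numbering the passages lets the model (and the reader) point back to the source of each claim. Telling the model to admit missing context is a common way to reduce answers that are not supported by the retrieved data.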

## **Challenges & Future Development**

Some text

In [None]:
# Some code