In [1]:
from utils.download_pdfs_from_url import download_papers
from utils import icml_parser 
from pathlib import Path
import pickle

## Project Parameters

In [2]:
PROJECT_DIR = './examples/icml_2024'

paper_pdf_dir = Path(PROJECT_DIR, 'paper_pdfs')
paper_parsed_dir = Path(PROJECT_DIR, 'paper_parsed.pkl')

## Download and Parse all ICML papers

In [3]:
_ = download_papers('https://proceedings.mlr.press/v235/', paper_pdf_dir)

all_papers = icml_parser.parse_folder(paper_pdf_dir)
with open(paper_parsed_dir, 'wb') as f:
    pickle.dump(all_papers, f)
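
The download-and-parse cell above is expensive to re-run, so it can help to wrap the pickle step in a small cache helper. The following is a minimal sketch, not part of the project's `utils` package; `load_or_build` is a hypothetical name introduced here for illustration.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_path: Path, build: Callable[[], Any]) -> Any:
    """Return the pickled object at cache_path, building and saving it if missing."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip the expensive build step entirely.
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    # Cache miss: build the object, then persist it for the next run.
    result = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, 'wb') as f:
        pickle.dump(result, f)
    return result
```

In this notebook it could be called as `all_papers = load_or_build(paper_parsed_dir, lambda: icml_parser.parse_folder(paper_pdf_dir))`, after the PDFs have been downloaded. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.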

## Load ICML papers from pkl

In [3]:
with open(paper_parsed_dir, 'rb') as f:
    all_papers = pickle.load(f)

In [4]:
len(all_papers)

2610
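
Beyond the paper count, a quick look at the parsed structure can catch parsing failures early. The sketch below assumes each paper is a dict with a `'sections'` list whose items carry a `'section_content'` string (the two keys used later in this notebook). `section_stats` and `sample_papers` are hypothetical names, and the sample data is a stand-in, not real parser output.

```python
def section_stats(papers):
    """Count sections and total characters of section text for each paper."""
    stats = []
    for paper in papers:
        sections = paper.get('sections', [])
        n_chars = sum(len(s.get('section_content', '')) for s in sections)
        stats.append({'n_sections': len(sections), 'n_chars': n_chars})
    return stats


# Hypothetical stand-in for all_papers; real entries may carry more keys.
sample_papers = [
    {'sections': [{'section_content': 'Abstract text.'},
                  {'section_content': 'Introduction text.'}]},
    {'sections': []},
]
print(section_stats(sample_papers))
```

Papers that report zero sections are good candidates for manual inspection of the source PDF.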

## Test single paper summary

In [9]:
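# Google AI Studio parameters
from utils.paper_ontology import *
import os
from IPython.display import Markdown
import google.generativeai as genai  # explicit import; the original cell relied on the star import for this name

genai.configure(api_key='<>')  # replace '<>' with your Google AI Studio API key
flash = genai.GenerativeModel('gemini-1.5-flash')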

In [17]:
# Test single paper summary
po = PaperOntology(all_papers[1297], flash)
ps = po.create_summary()
Markdown(ps)

## Genie: Generative Interactive Environments – An Overview for Business Stakeholders

This paper introduces Genie, a groundbreaking AI model that generates and allows interaction within virtual worlds.  Instead of simply creating static images or videos, Genie creates fully interactive environments controllable by the user, opening up exciting new possibilities for several businesses.

**Problem Statement:**

Current generative AI excels at producing individual images or videos, but lacks the ability to create truly interactive and engaging experiences like those found in video games.  Existing methods for creating interactive virtual environments are either labor-intensive (requiring manual design and programming) or limited in scope (only able to generate variations of pre-defined environments).  Genie addresses this gap by aiming to create rich, interactive environments directly from simple user prompts.

**Use Cases Impacted:**

Genie's potential impacts several business sectors:

* **Video Game Development:** Genie can drastically reduce the time and cost of creating new game worlds. Designers could quickly prototype environments from text descriptions or sketches, accelerating the development process.
* **Entertainment and Media:**  Interactive storytelling and virtual experiences could be revolutionized.  Imagine creating personalized interactive movies or immersive virtual tours based on simple prompts.
* **Training and Simulation:**  Genie could generate realistic simulations for training purposes, such as flight simulators, surgical training, or military exercises.  The ability to control the environment offers a valuable advantage.
* **Marketing and Advertising:**  Interactive advertisements and product demonstrations could be created, offering a more engaging and memorable experience for customers.


**Proposed Approach:**

Genie's key innovation lies in its ability to learn to generate interactive environments from *unlabeled* internet videos.  Unlike previous methods which require meticulously labeled data specifying actions and their effects, Genie learns these relationships automatically.  This unsupervised learning is a significant advantage, reducing the need for expensive data annotation.

**Fundamental Techniques:**

Genie leverages several key techniques:

* **Unsupervised Learning:**  Genie learns from a massive dataset of unlabeled gaming videos, eliminating the need for manual labeling of actions.
* **Spatiotemporal Transformers:**  These advanced neural networks allow Genie to process and understand the temporal relationships within video data, essential for generating dynamic interactions.
* **Latent Action Model:**  Genie learns a "latent action space," effectively a set of hidden controls that users can employ to interact with the generated environment.  These actions are inferred directly from the video data, again eliminating the need for manual annotation.
* **Autoregressive Prediction:** The model predicts the next frame in a video sequence based on previous frames and the user's chosen action.

**Existing Methods Used:**

Genie utilizes existing techniques like:

* **VQ-VAE (Vector Quantized Variational Autoencoder):** Used for compressing video frames into discrete tokens, making processing more efficient.
* **MaskGIT:** A transformer-based approach employed for video generation, allowing for the controlled generation of frames.
* **Vision Transformers (ViT):** A fundamental building block of the model's architecture for processing image data.

**Benchmarks and Metrics:**

The researchers tested Genie on two datasets: one comprised of 2D platformer game videos and another containing robot manipulation videos.  They reported:

* **Frechet Video Distance (FVD):** Measures the quality of generated videos.  Lower FVD indicates higher quality.
* **ΔtPSNR (Delta Peak Signal-to-Noise Ratio):** Measures the controllability of the environment, evaluating how much the generated video changes based on user actions.  Higher ΔtPSNR suggests better controllability.

Genie outperformed existing approaches in terms of both video fidelity (FVD) and controllability (ΔtPSNR) by significantly reducing the error rates in generating frames accurately, especially when operating in unsupervised mode. It also showed robustness against inputs that deviate significantly from its training data.


**Main Conclusion and Impact:**

Genie demonstrates the feasibility of generating interactive environments directly from unlabeled video data, opening new avenues for interactive content creation. It outperforms existing methods by efficiently learning from unlabeled video and maintaining controllability. Its unsupervised nature dramatically reduces data annotation costs.  This could revolutionize various industries, as mentioned above, by allowing for faster, cheaper, and more creative content generation.  Future research will focus on improving the model's efficiency, expanding its capabilities, and exploring applications in agent training and other fields.  For businesses, Genie represents a powerful new tool capable of transforming content creation and simulation.


## Test Single Ontology Creation

In [None]:
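# Neo4j installation: see https://neo4j.com/docs/python-manual/current/install/
# and the comments in paper_ontology.py
from neo4j import GraphDatabase  # explicit import; the original cell relied on the star import for this name

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "secretgraph")
# Open a driver, confirm the server is reachable, then close it again.
with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    print("Connection established.")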
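
To keep the database password out of the notebook, the connection settings could be read from environment variables instead. This is a minimal sketch: the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` and the helper `neo4j_settings` are assumptions made for illustration, and the fallbacks are the values used in the cell above.

```python
import os


def neo4j_settings(env=os.environ):
    """Read Neo4j connection settings, falling back to the notebook's defaults."""
    # The environment variable names below are hypothetical; pick your own.
    uri = env.get('NEO4J_URI', 'neo4j://localhost:7687')
    user = env.get('NEO4J_USER', 'neo4j')
    password = env.get('NEO4J_PASSWORD', 'secretgraph')
    return uri, (user, password)


URI, AUTH = neo4j_settings()
```

The returned pair has the same shape as the hard-coded `URI` and `AUTH`, so the rest of the notebook can stay unchanged.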

In [12]:
#Test single ontology creation
pj = po.create_ontology_json()

In [11]:
kg = OntologyKG(URI, AUTH[0], AUTH[1])

kg.clean()
kg.insert(pj)

# Go to http://localhost:7474 and click * to check the graph

In [13]:
po.create_ontology_str()
print(po.ontology_str)

::Model::VideoPoet
---->HAS_ARCHITECTURE::Architecture::Decoder-only transformer
---->PROCESSES::Modality::Images
---->PROCESSES::Modality::Videos
---->PROCESSES::Modality::Text
---->PROCESSES::Modality::Audio
---->IS_A::Model::Large Language Model (LLM)
---->UNDERGOES::Training Stage::Pretraining
---->---->USES::Objective::Multimodal generative objectives
---->UNDERGOES::Training Stage::Task-specific adaptation
---->PERFORMS::Task::Zero-shot video generation
---->PERFORMS::Task::Text-to-video
---->PERFORMS::Task::Image-to-video
---->PERFORMS::Task::Video editing
---->PERFORMS::Task::Video-to-video stylization
---->USES::Tokenizer::MAGVIT-v2
---->USES::Tokenizer::SoundStream
---->HAS_MODULE::Module::Super-resolution module
---->EVALUATED_ON::Dataset::MSR-VTT
---->EVALUATED_ON::Dataset::UCF-101
---->EVALUATED_ON::Dataset::Kinetics 600 (K600)
---->EVALUATED_ON::Dataset::Something-Something V2 (SSv2)
---->EVALUATED_WITH::Metric::Fréchet Video Distance (FVD)
---->EVALUATED_WITH::Metric::C
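
The printed ontology uses one `---->` prefix per nesting level, followed by `RELATION::Type::Name`. The sketch below turns such text into (parent, relation, child) triples. It is inferred only from the output shown above, not from `paper_ontology.py`, and `parse_ontology_str` is a hypothetical helper name.

```python
def parse_ontology_str(text):
    """Parse '---->REL::Type::Name' lines into (parent, relation, child) triples."""
    arrow = '---->'
    triples = []
    stack = []  # stack[d] holds the most recent node name seen at depth d
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each leading arrow adds one level of nesting.
        depth = 0
        while line.startswith(arrow):
            line = line[len(arrow):]
            depth += 1
        parts = line.split('::', 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        relation, _node_type, name = parts
        del stack[depth:]  # drop nodes at this depth or deeper
        if len(stack) < depth:
            continue  # no parent at the previous depth; skip orphan lines
        if depth > 0:
            triples.append((stack[depth - 1], relation, name))
        stack.append(name)
    return triples


sample = """::Model::VideoPoet
---->UNDERGOES::Training Stage::Pretraining
---->---->USES::Objective::Multimodal generative objectives
---->USES::Tokenizer::MAGVIT-v2"""

for triple in parse_ontology_str(sample):
    print(triple)
```

The root line has an empty relation and produces no triple of its own; it only serves as the parent for depth-one lines.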

## Test QA Engine using neo4j

In [6]:
#RAG Demo https://streamlit-aicamp-1069753422075.us-central1.run.app
from utils.paper_QA import *

#kg.clean()

docs = paper2doc([all_papers[0]])
len(docs)

10

In [7]:
db = EmbeddingDB("neo4j://localhost:7687", "neo4j", "secretgraph", docs)

In [8]:
query = all_papers[0]['sections'][0]['section_content']
_ = db.query(query)

--------------------------------------------------------------------------------
Score:  0.9593460559844971
Creation of nanomaterials with specific morphol-
ogy remains a complex experimental process,
even though there is a growing demand for these
materials in various industry sectors. This study
explores the potential of AI to predict the morphol-
ogy of nanoparticles within the data availability
constraints. For that, we first generated a new
multi-modal dataset that is double the size of anal-
ogous studies. Then, we systematically evaluated
performance of classical machine learning and
large language models in prediction of nanoma-
terial shapes and sizes. Finally, we prototyped
a text-to-image system, discussed the obtained
empirical results, as well as the limitations and
promises of existing approaches.

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score:  0.9230