To run GraphRAG, follow the installation instructions here. To run in CLI mode with a customized configuration, install GraphRAG via pip:
pip install graphrag
Then create an input folder to store your customized data.
mkdir -p ./GraphRAG/test/input
Then put some text files inside the input folder you just created. You can use multiple files, or multiple folders each containing multiple files.
We provide a script, collect_data.py, to source articles from Wikipedia given a list of prompts. Save the prompts as a CSV file with the header "Prompt", then change the corresponding path in the script to collect the data. The script leverages LangChain, so you need to install the following packages before running it:
pip install -qU langchain_community wikipedia
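For reference, here is a minimal sketch of what such a collection script can look like, assuming the prompts CSV sits at ./prompts.csv and the articles are written into the input folder created above (paths, file naming, and error handling in the actual collect_data.py may differ):

from pathlib import Path

import pandas as pd
from langchain_community.document_loaders import WikipediaLoader

PROMPTS_CSV = "./prompts.csv"               # CSV with a "Prompt" header (assumed path)
OUTPUT_DIR = Path("./GraphRAG/test/input")  # GraphRAG input folder created above
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

prompts = pd.read_csv(PROMPTS_CSV)["Prompt"]

for i, prompt in enumerate(prompts):
    # Fetch the top Wikipedia article matching the prompt
    docs = WikipediaLoader(query=prompt, load_max_docs=1).load()
    if not docs:
        continue
    # Store each article as a plain-text file for GraphRAG to index
    out_name = f"{i}_{prompt[:30].replace(' ', '_')}.txt"
    (OUTPUT_DIR / out_name).write_text(docs[0].page_content, encoding="utf-8")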
Run the following command to initialize a folder as GraphRAG's base:
graphrag init --root ./GraphRAG/test
Two files will then be generated for you in the GraphRAG/test folder: an environment file, .env, and a settings file, settings.yaml.
You will need to specify your OpenAI API key in .env: GRAPHRAG_API_KEY=<YOUR_API_KEY>. Additionally, you can change settings in settings.yaml; the most common change is the language model that drives the pipeline. Inside settings.yaml, under the first llm block, set model to any OpenAI model that fits your budget. For a complete list of models, check here.
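For illustration, that block looks roughly like this in recent 0.x releases (the exact keys depend on your GraphRAG version, so treat this as a sketch rather than the definitive schema):

llm:
  api_key: ${GRAPHRAG_API_KEY}   # read from .env
  type: openai_chat
  model: gpt-4o-mini             # any OpenAI chat model that fits your budget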
To form a hierarchical tree out of the text corpus you just stored, run:
graphrag index --root ./GraphRAG/test
A complete list of arguments can be found here.
If you add new files to the input corpus, you can run:
graphrag update --root ./GraphRAG/test
to update the graph.
For a global understanding of the knowledge graph you just built, run:
graphrag query --root ./GraphRAG/test --method global --query "<YOUR QUESTION HERE>"
In our T2I task, getting a useful answer usually requires local details. For example, after generating a graph that includes the entities you want to paint, run:
graphrag query --root ./GraphRAG/test --method local --query "Provide a detailed prompt for image generation of <YOUR IMAGE PROMPT>. It must be possible to directly input your answer into an image generation model for an accurate image."
The evaluation directory contains the prompts.csv file, the generated images, and the pseudo-groundtruth images:
├── evaluation
│ ├── prompts.csv
│ ├── images
│ │ ├── 2gen_img_LLAMAgraph
│ │ │ ├── 200_[prompt].jpg
│ │ │ ├── 201_[prompt].jpg
│ │ │ ├── 202_[prompt].jpg
│ │ │ ├── ...
│ │ │ ├── 399_[prompt].jpg
│ │ ├── 2gen_img_LLAMAgraphnoneigh
│ │ │ ├── ...
│ │ ├── 2gen_img_LLAMAuserprompt
│ │ │ ├── ...
│ │ ├── 2gen_img_graph
│ │ │ ├── ...
│ │ ├── 2gen_img_graphnoneigh
│ │ │ ├── ...
│ │ ├── 2gen_img_userprompt
│ │ │ ├── ...
│ ├── gt_images
│ │ ├── 200
│ │ ├── 201
│ │ ├── 202
│ │ ├── ...
│ │ ├── 399

The evaluation script requires torch, clip, dreamsim, and other regular Python libraries, including pandas and PIL, to run.
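If these packages are not already in your environment, an install along these lines should cover them (the exact CLIP package used by the notebooks is an assumption; the second command installs the original OpenAI implementation):

pip install torch pandas pillow dreamsim
pip install git+https://github.com/openai/CLIP.git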
remove_empty_images.ipynb is used to remove broken image files from the generation/crawling pipeline.
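The core check is essentially the following (a minimal sketch rather than the notebook's exact code; the directory path is a placeholder): open each image with PIL and delete any file that fails verification.

from pathlib import Path

from PIL import Image, UnidentifiedImageError

IMAGE_DIR = Path("evaluation/images/2gen_img_graph")  # placeholder: run per image subfolder

for path in IMAGE_DIR.glob("*.jpg"):
    try:
        with Image.open(path) as img:
            img.verify()  # raises if the file is truncated or not a valid image
    except (UnidentifiedImageError, OSError):
        print(f"Removing broken image: {path}")
        path.unlink()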
The CLIPScore is implemented in CLIPScore.ipynb and the visual similarity is implemented in visual_similarity.ipynb.
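As a rough sketch of what those notebooks compute (the model variants, preprocessing, and file names below are assumptions, not the notebooks' exact settings): CLIPScore is taken as the scaled cosine similarity between CLIP image and text embeddings, and visual similarity as the DreamSim distance between a generated image and its pseudo-groundtruth counterpart.

import clip
import torch
from PIL import Image
from dreamsim import dreamsim

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIPScore: scaled cosine similarity between image and text embeddings
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)
image = clip_preprocess(Image.open("evaluation/images/2gen_img_graph/200_example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a detailed prompt describing the image"]).to(device)
with torch.no_grad():
    img_emb = clip_model.encode_image(image)
    txt_emb = clip_model.encode_text(text)
clip_score = 100 * torch.cosine_similarity(img_emb, txt_emb).item()

# Visual similarity: DreamSim distance between generated and pseudo-groundtruth images
ds_model, ds_preprocess = dreamsim(pretrained=True, device=device)
gen = ds_preprocess(Image.open("evaluation/images/2gen_img_graph/200_example.jpg")).to(device)
gt = ds_preprocess(Image.open("evaluation/gt_images/200/ref.jpg")).to(device)
with torch.no_grad():
    distance = ds_model(gen, gt).item()  # lower distance = more visually similar

print(f"CLIPScore: {clip_score:.2f}, DreamSim distance: {distance:.4f}")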
All details of the Wikipedia graph processing, community detection, LLAMA prompt generation, Flux image generation, and web scraping are in Wikiraph-LLAMA-Flux-Scraping.
The dataset itself is not included, as it is 7 GB plus 2 GB of supplemental files generated by the code.
Code file names describe what they do.
The scraped images are in first101scraped.zip and next99scraped.zip, and the generated images are in all_FLux_gen_images.zip. The from_graph_prompts_LLAMA_graph series stores the node texts and LLAMA prompts for the first 76 nodes, and 200-230_withprompts.csv stores the same for nodes 200 to 229. 200-240.csv stores the first six words (punctuation and numbers removed) as well as the full text of nodes 200-399.