# GraphReasoning: Scientific Discovery through Knowledge Extraction and Multimodal Graph-based Representation and Reasoning

Markus J. Buehler, MIT, 2024 mbuehler@MIT.EDU

### Example: GraphReasoning: Loading graph and graph analysis

In [9]:
import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device='cuda'

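In [10]:
# time, nx and the transformers classes are used below; import them explicitly
# instead of relying on the wildcard import from GraphReasoning
import time
import networkx as nx
from transformers import AutoTokenizer, AutoModel
from tqdm.notebook import tqdm
from IPython.display import display, Markdown
from huggingface_hub import hf_hub_download
from GraphReasoning import *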

### Load graph and embeddings 

In [12]:
# Hugging Face repo hosting the graph and the precomputed embeddings
repository_id = "lamm-mit/GraphReasoning"
data_dir = './GRAPHDATA'

data_dir_output = './GRAPHDATA_OUTPUT/'

graph_name = 'BioGraph.graphml'

make_dir_if_needed(data_dir)
make_dir_if_needed(data_dir_output)

# Embedding model used to embed node names and query keywords
tokenizer_model = "BAAI/bge-large-en-v1.5"

embedding_tokenizer = AutoTokenizer.from_pretrained(tokenizer_model)
embedding_model = AutoModel.from_pretrained(tokenizer_model)

# Download the graph from the Hub into ./GRAPHDATA/
filename = f"{data_dir}/{graph_name}"
file_path = hf_hub_download(repo_id=repository_id, filename=filename, local_dir='./')
print(f"File downloaded at: {file_path}")

# Load the knowledge graph as a NetworkX graph
graph_name = f'{data_dir}/{graph_name}'
G = nx.read_graphml(graph_name)

File downloaded at: ././GRAPHDATA/BioGraph.graphml


In [13]:
# Precomputed node embeddings (file name as stored in the Hugging Face repo)
embedding_file = 'BioGraph_embeddings_ge-large-en-v1.5.pkl'

# Set to True to recompute the embeddings locally instead of downloading them
generate_new_embeddings = False
if generate_new_embeddings:
    node_embeddings = generate_node_embeddings(G, embedding_tokenizer, embedding_model)
    save_embeddings(node_embeddings, f'{data_dir}/{embedding_file}')
else:
    filename = f"{data_dir}/{embedding_file}"
    file_path = hf_hub_download(repo_id=repository_id, filename=filename, local_dir='./')
    print(f"File downloaded at: {file_path}")

    node_embeddings = load_embeddings(f'{data_dir}/{embedding_file}')
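The node embeddings are what let a free-text keyword be matched to a graph node (the reasoning run below reports, for example, `'collagen'` matched to node `'collagen'` with similarity 1). A minimal, stdlib-only sketch of such a lookup, assuming the embeddings are a dict mapping node name to a vector and that the score is cosine similarity; `best_matching_node` and the toy vectors are hypothetical and not the GraphReasoning API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_matching_node(query_vec, node_embeddings):
    """Return (node, similarity) for the node whose embedding is closest to query_vec.

    Hypothetical helper: assumes node_embeddings maps node name -> vector.
    """
    best_node, best_sim = None, -1.0
    for node, vec in node_embeddings.items():
        sim = cosine_similarity(query_vec, vec)
        if sim > best_sim:
            best_node, best_sim = node, sim
    return best_node, best_sim

# Toy 3-d vectors standing in for the 1024-d bge-large embeddings
toy_embeddings = {
    "collagen": [0.9, 0.1, 0.0],
    "copper": [0.0, 0.2, 0.95],
    "spider silk": [0.7, 0.6, 0.1],
}
node, sim = best_matching_node([0.9, 0.1, 0.0], toy_embeddings)
print(node, round(sim, 6))  # collagen 1.0
```

A query whose embedding coincides with a node's embedding scores 1, which is why exact keyword matches show similarity 1 in the log.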

File downloaded at: ././GRAPHDATA/BioGraph_embeddings_ge-large-en-v1.5.pkl


### Load LLM: BioMixtral

In [17]:
from llama_cpp import Llama
import llama_cpp

# Download the quantized (Q5_K_M) GGUF weights of BioinspiredMixtral
repository_id = 'lamm-mit/BioinspiredMixtral'
filename = 'ggml-model-q5_K_M.gguf'
file_path = hf_hub_download(repo_id=repository_id, filename=filename, local_dir='./models/')

chat_format = "mistral-instruct"

llm = Llama(model_path=file_path,
            n_gpu_layers=-1,   # offload all layers to the GPU
            verbose=True,      # set to False to silence loader and timing logs
            n_ctx=10000,       # context window shared by prompt and generated tokens
            main_gpu=0,
            chat_format=chat_format,
            )

llama_model_loader: loaded meta data with 24 key-value pairs and 995 tensors from ./models/ggml-model-q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32         

In [16]:
file_path

'./models/ggml-model-q5_K_M.gguf'
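The GGUF metadata reports `llama.context_length = 32768`, but the model is loaded with `n_ctx=10000`, and the prompt and the generated tokens share that window. A small sketch of the budgeting arithmetic, using the 4772-token prompt reported for the graph-reasoning call further down; `max_new_tokens` is a hypothetical helper, not part of llama-cpp-python:

```python
def max_new_tokens(n_ctx, prompt_tokens, requested):
    """Upper bound on generated tokens when prompt and output share one context window.

    Hypothetical helper for illustration; returns 0 if the prompt alone fills the window.
    """
    remaining = max(n_ctx - prompt_tokens, 0)
    return min(requested, remaining)

# n_ctx=10000 as configured above; 4772 prompt tokens as in the graph-reasoning run
print(max_new_tokens(10000, 4772, 10000))  # 5228
print(max_new_tokens(10000, 4772, 1024))   # 1024
```

With a prompt of that size, at most 5228 tokens can be generated, so a large `max_tokens` value cannot be used in full; the 1024 used in the graphene example fits comfortably.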

In [18]:
def generate_BioMixtral(system_prompt='You are a biomaterials scientist.',
                        prompt="What is spider silk?", temperature=0.333,
                        max_tokens=10000,
                        ):
    """Return the BioMixtral (`llm`) chat completion for `prompt`."""
    # Add a system message only if a system prompt is given
    if system_prompt is None:
        messages = [
            {"role": "user", "content": prompt},
        ]
    else:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]

    result = llm.create_chat_completion(
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return result['choices'][0]['message']['content']
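`create_chat_completion` takes an OpenAI-style list of `{"role": ..., "content": ...}` messages, and `chat_format` selects the prompt template used to render that list for the model. The message-building branch above can be isolated into a pure function and checked without loading the model; `build_messages` is a hypothetical name used only for this sketch:

```python
def build_messages(prompt, system_prompt=None):
    """Build an OpenAI-style chat message list, with an optional system message first."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages

print(build_messages("What is graphene?", system_prompt="You design materials."))
print(build_messages("What is spider silk?"))
```

Testing against `None` rather than truthiness keeps an intentionally empty system prompt `''` as a system message, consistent with the `None` check in the function above.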
     

In [20]:
q = '''What is graphene?'''
start_time = time.time()
res = generate_BioMixtral(system_prompt='You design materials.',
                          prompt=q, max_tokens=1024, temperature=0.3)

print(res)
deltat = time.time() - start_time
print("--- %s seconds ---" % deltat)
display(Markdown(res))

Llama.generate: prefix-match hit

llama_print_timings:        load time =    5546.25 ms
llama_print_timings:      sample time =      70.73 ms /   152 runs   (    0.47 ms per token,  2149.02 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    8868.12 ms /   152 runs   (   58.34 ms per token,    17.14 tokens per second)
llama_print_timings:       total time =    9294.25 ms /   153 tokens


 Graphene is a single layer of carbon atoms arranged in a hexagonal lattice, with each atom bonded to three neighboring atoms through strong covalent bonds. It has unique mechanical, electrical, and thermal properties that make it an attractive material for various applications. Graphene can be synthesized using different methods, such as chemical vapor deposition (CVD), epitaxial growth on silicon carbide (SiC), and reduction of graphene oxide (GO). The mechanical properties of graphene are influenced by the presence of defects, such as vacancies, grain boundaries, and functional groups. These defects can affect the strength, toughness, and conductivity of graphene.
--- 9.303527116775513 seconds ---

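The `llama_print_timings` throughput figures are simply tokens divided by elapsed seconds (152 tokens in 8.868 s gives about 17.14 tokens/s). A small parser sketch, assuming the line layout printed above; the regex is tailored to these lines and may not match other llama.cpp versions. The 0.00 ms prompt-eval line follows a `prefix-match hit`, apparently meaning the prompt was reused from cache, so the sketch returns infinity there, as the log does:

```python
import re

# Matches e.g. "=    8868.12 ms /   152 runs" or "=   33531.03 ms /  4772 tokens"
TIMING_RE = re.compile(r"=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*(?:runs|tokens)")

def tokens_per_second(line):
    """Return tokens/second parsed from a llama_print_timings line, or None if no match."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    ms, n = float(m.group(1)), int(m.group(2))
    if ms == 0:
        return float("inf")  # zero elapsed time, reported as "inf" in the log
    return n / (ms / 1000.0)

gen = "llama_print_timings:        eval time =    8868.12 ms /   152 runs   (   58.34 ms per token,    17.14 tokens per second)"
prompt = "llama_print_timings: prompt eval time =   33531.03 ms /  4772 tokens (    7.03 ms per token,   142.32 tokens per second)"
print(round(tokens_per_second(gen), 2))     # 17.14
print(round(tokens_per_second(prompt), 2))  # 142.32
```

Prompt evaluation (about 142 tokens/s) is far faster per token than generation (about 14 to 17 tokens/s), so long graph-derived prompts mainly add a fixed up-front cost.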

In [24]:
response, (best_node_1, best_similarity_1, best_node_2, best_similarity_2), path, path_graph, shortest_path_length, fname, graph_GraphML = find_path_and_reason(
    G, 
    node_embeddings,
    embedding_tokenizer, 
    embedding_model, 
    generate_BioMixtral, 
    data_dir=data_dir_output,
    verbatim=True,
    include_keywords_as_nodes=True,  # Include keywords in the graph analysis
    keyword_1="collagen",
    keyword_2="copper",
    N_limit=9999,  # The limit for keywords, triplets, etc.
    instruction='Develop a new research idea around collagen and copper.',
    keywords_separator=', ',
    graph_analysis_type='nodes and relations',
    temperature=0.3, 
    inst_prepend='### ',  # Instruction prepend text
    prepend='''You are given a set of information from a graph that describes the relationship 
               between materials, structure, properties, and properties. You analyze these logically 
               through reasoning.\n\n''',  # Prepend text for analysis
    visualize_paths_as_graph=True,  # Whether to visualize paths as a graph
    display_graph=True,  # Whether to display the graph
)
display(Markdown(response))
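`find_path_and_reason` matches each keyword to its closest node, finds a shortest path between the two (it returns `path` and `shortest_path_length`), and hands the relations to the LLM in the `node_1, relationship, node_2` format shown in the echoed prompt below. The following is not the GraphReasoning implementation; it is a stdlib-only illustration of those two steps on a made-up toy graph, using breadth-first search (BFS) for an unweighted shortest path. The helper names and toy triples are hypothetical:

```python
from collections import deque

def shortest_path(triples, source, target):
    """Unweighted shortest path via BFS; edges are traversed in both directions.

    Returns the (node_1, relationship, node_2) triples along the path in their
    original orientation, or None if target is unreachable.
    """
    # Adjacency: node -> list of (neighbor, triple connecting them)
    adj = {}
    for n1, rel, n2 in triples:
        adj.setdefault(n1, []).append((n2, (n1, rel, n2)))
        adj.setdefault(n2, []).append((n1, (n1, rel, n2)))

    parent = {source: None}  # node -> (previous node, connecting triple)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr, triple in adj.get(node, []):
            if nbr not in parent:
                parent[nbr] = (node, triple)
                queue.append(nbr)

    if target not in parent:
        return None
    # Walk back from target to source, collecting the connecting triples
    path = []
    node = target
    while parent[node] is not None:
        prev, triple = parent[node]
        path.append(triple)
        node = prev
    return list(reversed(path))

def format_triples(triples):
    """Render triples one per line as 'node_1, relationship, node_2'."""
    return "\n".join(f"{n1}, {rel}, {n2}" for n1, rel, n2 in triples)

# Toy graph with placeholder nodes and relations (not BioGraph data)
toy = [
    ("collagen", "rel_a", "n1"),
    ("n1", "rel_b", "copper"),
    ("collagen", "rel_c", "n2"),
    ("n2", "rel_d", "n3"),
    ("n3", "rel_e", "copper"),
]
toy_path = shortest_path(toy, "collagen", "copper")
print(format_triples(toy_path))
```

BFS returns the two-edge route via `n1` rather than the three-edge route via `n2` and `n3`, because it expands nodes in order of hop distance from the source.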

0nth best fitting node for 'collagen': 'collagen' with similarity: 1
0nth best fitting node for 'copper': 'copper' with similarity: 1
./GRAPHDATA_OUTPUT//shortest_path_2hops_collagen_copper.html
HTML visualization: ./GRAPHDATA_OUTPUT//shortest_path_2hops_collagen_copper.html
GraphML file: ./GRAPHDATA_OUTPUT//shortestpath_2hops_collagen_copper.graphml
You are given a set of information from a graph that describes the relationship 
               between materials, structure, properties, and properties. You analyze these logically 
               through reasoning.

### Consider this list of nodes and relations in a knowledge graph:

Format: node_1, relationship, node_2

The data is:

biomimetic injectable hydrogels, can be processed into, collagen
highly porous collagen strands, promote, collagen
poisson's ratio, has, collagen
complex hierarchical microstructure, Has, collagen
collagen's mechanical properties, are physiologically relevant due to its role in providing structural support 

Llama.generate: prefix-match hit

llama_print_timings:        load time =    5546.25 ms
llama_print_timings:      sample time =     157.28 ms /   323 runs   (    0.49 ms per token,  2053.73 tokens per second)
llama_print_timings: prompt eval time =   33531.03 ms /  4772 tokens (    7.03 ms per token,   142.32 tokens per second)
llama_print_timings:        eval time =   23368.99 ms /   322 runs   (   72.57 ms per token,    13.78 tokens per second)
llama_print_timings:       total time =   57923.38 ms /  5094 tokens


**Response:**  Based on the given information, a potential research idea could be to explore the use of copper-collagen nanoparticles for bone tissue engineering applications. The knowledge graph provides evidence that collagen has been used as a scaffold material in biomedical applications, while copper has shown promise as an antimicrobial agent and osteogenic material. By combining these two materials, it may be possible to create a novel nanocomposite with improved mechanical properties and enhanced biological activity for bone tissue engineering.

The research could involve synthesizing and characterizing the copper-collagen nanoparticles using various techniques such as sol-gel process, post-calcinations, and surface modification methods. The study could also investigate the effects of different ratios of collagen to copper on the properties of the nanocomposite, as well as its biocompatibility and potential toxicity.

The research idea is based on the knowledge graph's description of how collagen can be processed into various forms such as membranes, films, and hydrogels, while copper has been used in a variety of applications including antimicrobial agents and osteogenic materials. The study could also draw inspiration from the knowledge graph's mention of the use of nanoparticles to enhance mechanical properties and improve biocompatibility in composite materials.

Overall, this research idea aims to explore the potential of copper-collagen nanoparticles as a novel material for bone tissue engineering applications, with a focus on improving both mechanical properties and biological activity.

In [25]:
response, (best_node_1, best_similarity_1, best_node_2, best_similarity_2), path

(" Based on the given information, a potential research idea could be to explore the use of copper-collagen nanoparticles for bone tissue engineering applications. The knowledge graph provides evidence that collagen has been used as a scaffold material in biomedical applications, while copper has shown promise as an antimicrobial agent and osteogenic material. By combining these two materials, it may be possible to create a novel nanocomposite with improved mechanical properties and enhanced biological activity for bone tissue engineering.\n\nThe research could involve synthesizing and characterizing the copper-collagen nanoparticles using various techniques such as sol-gel process, post-calcinations, and surface modification methods. The study could also investigate the effects of different ratios of collagen to copper on the properties of the nanocomposite, as well as its biocompatibility and potential toxicity.\n\nThe research idea is based on the knowledge graph's description of ho