**Gemini Lab**
NASA Space Apps Challenge

**Challenge: Visualize Space Science**

**Tool: Text-To-Diagram**

This notebook demonstrates a proof-of-concept tool that turns text descriptions of space experiment workflows into diagrams.

This tool aims to:
- Automatically infer relationships between nodes based on the input text.
- Use Natural Language Processing to analyze the text and extract the entities (nodes) and their relationships (edges).

**How does it work?**


- Take a text as input.
- Automatically detect the entities in the text and the relationships between them.
- Generate a diagram from the extracted information.
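
The three steps above can be sketched without any dependencies. This is only an illustration under stated assumptions: the hand-tagged `TAGGED` list stands in for the part-of-speech tags that spaCy would produce, the helper names `extract` and `to_dot` are hypothetical, and the DOT text is assembled by hand rather than through the `graphviz` package.

```python
# Step 1: the input. Hand-tagged (word, part-of-speech) pairs stand in
# for real NLP output here; the tags are an assumption for illustration.
TAGGED = [
    ("mice", "NOUN"), ("were", "AUX"), ("flown", "VERB"),
    ("on", "ADP"), ("station", "NOUN"),
]


def extract(tagged):
    """Step 2: nouns become nodes, verbs become relation labels."""
    nodes = [word for word, pos in tagged if pos in ("NOUN", "PROPN")]
    relations = [word for word, pos in tagged if pos == "VERB"]
    return nodes, relations


def to_dot(nodes, relations):
    """Step 3: link consecutive nodes, labelling each edge with a verb.

    Assumes the words contain no double quotes, so no escaping is done.
    """
    lines = ["digraph {"]
    # zip stops at the shortest sequence, so no index runs past the end.
    for src, dst, verb in zip(nodes, nodes[1:], relations):
        lines.append(f'    "{src}" -> "{dst}" [label="{verb}"]')
    lines.append("}")
    return "\n".join(lines)


nodes, relations = extract(TAGGED)
dot_source = to_dot(nodes, relations)
print(dot_source)
```

The printed text is Graphviz DOT source with a single edge from `mice` to `station` labelled `flown`.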

**Install and Import Libraries:**

In [None]:
!pip install spacy graphviz
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 48.9 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [None]:
import spacy
import graphviz
import re

**Loading spaCy English Model**

In [None]:
nlp = spacy.load('en_core_web_sm')

**Defining function to create diagram**

In [None]:
# Function to process text input and automatically create a diagram
def automatic_text_to_diagram(input_text):
    # Parse the input text with spaCy NLP
    doc = nlp(input_text)

    # Create a new directed graph for the diagram
    dot = graphviz.Digraph()

    # Find entities (nouns) and relations (verbs)
    nodes = []
    relations = []

    # Loop through each token in the parsed sentence
    for token in doc:
        if token.pos_ in ['NOUN', 'PROPN']:  # Use proper nouns and common nouns as nodes
            nodes.append(token.text)
        if token.pos_ == 'VERB':  # Use verbs as relationships
            relations.append(token.text)

    # Connect consecutive nodes, labelling each edge with a verb.
    # The loop is capped so that nodes[i + 1] never runs past the end of
    # the list when the text has as many verbs as nouns (or more).
    for i in range(min(len(relations), len(nodes) - 1)):
        node1 = nodes[i]  # First node
        node2 = nodes[i + 1]  # Next node
        relation = relations[i]  # Relationship (verb)
        # Add nodes and edges to the diagram
        dot.node(node1, node1)
        dot.node(node2, node2)
        dot.edge(node1, node2, relation)

    # Return the graph object; it is rendered in a later cell
    return dot
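
The function pairs the i-th verb with the i-th and (i+1)-th nouns across the whole text, so an edge can join nouns from two different sentences. One possible refinement, sketched here as an assumption rather than as part of the tool, is to pair nouns and verbs one sentence at a time. The helper `edges_per_sentence` is hypothetical and takes plain `(text, pos)` tuples; with spaCy, such input could be built with `[[(t.text, t.pos_) for t in sent] for sent in doc.sents]`.

```python
def edges_per_sentence(sentences):
    """Return (source, target, label) edges without crossing sentences.

    `sentences` is a list of sentences, each a list of (text, pos) tuples.
    """
    edges = []
    for sentence in sentences:
        nodes = [text for text, pos in sentence if pos in ("NOUN", "PROPN")]
        verbs = [text for text, pos in sentence if pos == "VERB"]
        # zip stops at the shortest sequence, so a sentence with too few
        # nouns or verbs simply contributes fewer (or no) edges.
        edges.extend(zip(nodes, nodes[1:], verbs))
    return edges


# Hand-tagged sample input; the tags are assumed for illustration.
sentences = [
    [("mice", "NOUN"), ("were", "AUX"), ("flown", "VERB"),
     ("on", "ADP"), ("ISS", "PROPN")],
    [("samples", "NOUN"), ("include", "VERB"), ("Mix", "PROPN")],
]

edges = edges_per_sentence(sentences)
print(edges)
```

On this sample, pairing over the whole text would link `ISS` to `samples` with the label `include`, while the per-sentence version links `samples` to `Mix` instead.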

**Input text:**

- The input text below is taken directly from the description section of the OSD-379 experiment in the NASA Open Science Data Repository (OSDR).

- Link to reference: https://osdr.nasa.gov/bio/repo/data/studies/OSD-379

**Note:** You can modify the input text to generate different diagrams. Keep in mind that this is our initial model, so the output may not always be accurate.

In [None]:
input_text = " In the Rodent Research Reference Mission (RRRM-1), forty female BALB/cAnNTac mice were flown on the International Space Station. To assess differences in outcomes due to age, twenty 10-12 week-old and twenty 32 week-old mice were flown, respectively. To directly assess spaceflight effects, half of the young and old mice (10 old, 10 young) were sacrificed on-orbit after 22-23 days (ISS Terminal, ISS-T), while the other half (10 old, 10 young) were returned live to Earth after 40 days and allowed to recover for 2 days (Live Animal Return, LAR) before sacrifice. Both the ISS-T and LAR animals had independent ground controls (10 mice housed in flight hardware in matched environmental conditions), basal controls (10 mice sacrificed 1 day after launch), and vivarium controls (10 mice housed within standard vivarium habitats). Thus RRRM-1 included a total of 160 mice. This datasets features ribodepleted total RNA-seq data from livers dissected from all groups in RRRM-1. Data from 7-10 livers per group are included. All samples include either Mix 1 or Mix 2 of the ERCC spike-in control."

**Diagram creation**

In [None]:
diagram = automatic_text_to_diagram(input_text)

**Output diagram saved as a PNG image, then opened in a viewer as a PDF (the default format used by `view()`).**

In [None]:
diagram.render('auto-diagram-output', format='png')
diagram.view()

'auto-diagram-output.pdf'