# SSH Tunnel instructions
Use these flags with your ssh command to create tunnels for Neo4j (and, optionally, the Jupyter notebook server). Before connecting, add your public key to the instance's list of SSH keys, and pass the path to your private key in the SSH command with `-i`. Note: the server's IP address changes, so update it in the command before you connect. <br>
Port descriptions:
* 7474 - HTTP browser interface for Neo4j
* 7687 - Bolt protocol, which enables API connections and Cypher execution against Neo4j
* 8888 - Optional access to the Jupyter notebook server, which must be started separately

In [None]:
ssh -i PathToPrivateKey -L 7474:localhost:7474 -L 7687:localhost:7687 -L 8888:localhost:8888 UserName@CurrentInstanceIP

Here is an example of what the command should look like:

In [None]:
ssh -i C:\Users\apruf\.ssh\gc-ssh -L 7474:localhost:7474 -L 7687:localhost:7687 -L 8888:localhost:8888 a_prufrock@34.174.51.185
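
The command above repeats the same `-L port:localhost:port` pattern for each port. As a minimal sketch, the argument list could be assembled in Python like this. The function name `build_tunnel_command` and the constant `DEFAULT_PORTS` are hypothetical names chosen for illustration, and the values passed in are the same placeholders used in the template command.

```python
import shlex

# Ports from the list above: Neo4j HTTP, Neo4j Bolt, optional Jupyter.
DEFAULT_PORTS = (7474, 7687, 8888)

def build_tunnel_command(key_path, user, host, ports=DEFAULT_PORTS):
    """Return an ssh argument list that forwards each local port to the same port on the server."""
    cmd = ["ssh", "-i", key_path]
    for port in ports:
        cmd += ["-L", f"{port}:localhost:{port}"]
    cmd.append(f"{user}@{host}")
    return cmd

if __name__ == "__main__":
    # Placeholder values, as in the template command above.
    print(shlex.join(build_tunnel_command("PathToPrivateKey", "UserName", "CurrentInstanceIP")))
```

Keeping the instance IP as a parameter makes it easier to update when the address changes.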

# Neo4j browser
To access the Neo4j server, open a browser and enter the address localhost:7474 <br>
At this time Neo4j does not require authentication. Change the authentication type to "No authentication" and press the Connect button.
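
If the browser page does not load, it can help to check whether the forwarded ports accept connections at all. The following is a rough sketch using only the standard library; `port_is_open` is a hypothetical helper name. Note that a successful connection only shows that the local end of the tunnel is listening, not that Neo4j itself is healthy on the server.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Neo4j HTTP and Bolt ports from the tunnel command.
    for port in (7474, 7687):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"localhost:{port} is {state}")
```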

# Jupyter notebook server
To access the Jupyter notebook server, open a browser and enter the address localhost:8888 <br>
You must first start the Jupyter notebook server with the following command in a terminal window. Note: the terminal window must stay open while you use the Jupyter notebook hosted on the server, and the window will not be available for other commands. <br>
Once you start the server, the terminal window displays your session token at the end of several URLs. Supply this token after navigating to localhost:8888.

In [None]:
/home/a_prufrock/.local/bin/jupyter notebook --no-browser --port=8888
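
The session token appears as a `token` query parameter at the end of the URLs printed in the terminal. As a small sketch, it can be pulled out of a copied URL with the standard library. `extract_token` is a hypothetical helper name, and the URL and token in the example are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def extract_token(url):
    """Return the value of the token query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("token")
    return values[0] if values else None

if __name__ == "__main__":
    # Made-up URL and token, for illustration only.
    sample = "http://localhost:8888/?token=0123456789abcdef"
    print(extract_token(sample))
```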

# Joern CPG creation

# Step 1: joern-parse
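This parses code into a Joern-compatible .bin file. Here is joern-parse's help information (running it without an input path prints the usage text followed by an `Input path required` error):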

In [5]:
!joern-parse 

Usage: joern-parse [options] [input]

  input                  source file or directory containing source files
  -o, --output <value>   output filename
  --language <value>     source language
  --list-languages       list available language options
  --namespaces <value>   namespaces to include: comma separated string
Overlay application stage
  --nooverlays           do not apply default overlays
  --overlaysonly         Only apply default overlays
  --max-num-def <value>  Maximum number of definitions in per-method data flow calculation
Misc
  --help                 display this help message
Args specified after the --frontend-args separator will be passed to the front-end verbatim
java.lang.AssertionError: Input path required
	at io.joern.joerncli.JoernParse$.checkInputPath$$anonfun$1(JoernParse.scala:98)
	at io.joern.joerncli.JoernParse$.checkInputPath$$anonfun$adapted$1(JoernParse.scala:102)
	at scala.util.Try$.apply(Try.scala:210)
	at io.joern.joerncli.JoernParse$.checkInputPat

# Example of joern-parse command 
This parses an example exploit from ExploitDB, 47015.c, into a Joern-compatible representation, 47015.bin.

In [1]:
!joern-parse --output ~/jp_run1/47015.bin /home/a_prufrock/joerndemo/47015.c

Parsing code at: /home/a_prufrock/joerndemo/47015.c - language: `NEWC`
[+] Running language frontend
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /home/mattjtrevi/bin/joern/joern-cli/c2cpg.sh -J-Xmx3998m /home/a_prufrock/joerndemo/47015.c --output /home/a_prufrock/jp_run1
3) start joern, import the cpg: `importCpg("path/to/cpg")`

[+] Applying default overlays
Successfully wrote graph to: /home/a_prufrock/jp_run1
To load the graph, type `joern /home/a_prufrock/jp_run1`


# joern-export
Takes the Joern-compatible .bin file and outputs neo4jcsv files (data, headers, and a Cypher import script) in the specified graph representation (`--repr=all` selects all representations here).

In [6]:
!joern-export --repr=all --format=neo4jcsv --out ~/je_out6 ~/jp_run1/47015.bin

exported 345 nodes, 1791 edges into /home/a_prufrock/je_out6
Instructions on how to import the exported files into neo4j:
Prerequisite: ensure you have neo4j community server running (enterprise and desktop may work too)
e.g. download from https://neo4j.com/download-center/#community and start via `bin/neo4j console`

Then, in a new terminal:
```
cd <neo4j_root>

# if you have a fresh instance, you must first change the initial password
bin/cypher-shell -u neo4j -p neo4j
# exit the cypher shell

# copy the data files to the `import` directory, where neo4j will find them
cp /home/a_prufrock/je_out6/*_data.csv import

find /home/a_prufrock/je_out6 -name 'nodes_*_cypher.csv' -exec bin/cypher-shell -u neo4j -p <password> --file {} \;
find /home/a_prufrock/je_out6 -name 'edges_*_cypher.csv' -exec bin/cypher-shell -u neo4j -p <password> --file {} \;
```
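
The import steps printed above (copy the `*_data.csv` files, then run each `nodes_*_cypher.csv` and `edges_*_cypher.csv` script through `cypher-shell`) could be planned from Python roughly as follows. This is a sketch under assumptions: `plan_neo4j_import` is a hypothetical helper, it only builds the file list and command lines without running anything, and the password stays a placeholder exactly as in the printed instructions.

```python
from pathlib import Path

def plan_neo4j_import(export_dir, password_placeholder="<password>"):
    """Return (data_files, commands) mirroring the instructions printed by joern-export."""
    export_dir = Path(export_dir)
    # Data files that must be copied into Neo4j's import directory.
    data_files = sorted(export_dir.glob("*_data.csv"))
    commands = []
    # Nodes first, then edges, in the same order as the printed instructions.
    for pattern in ("nodes_*_cypher.csv", "edges_*_cypher.csv"):
        for script in sorted(export_dir.rglob(pattern)):
            commands.append([
                "bin/cypher-shell", "-u", "neo4j",
                "-p", password_placeholder,
                "--file", str(script),
            ])
    return data_files, commands

if __name__ == "__main__":
    # Export directory from the joern-export run above.
    data_files, commands = plan_neo4j_import("/home/a_prufrock/je_out6")
    print(f"{len(data_files)} data files to copy, {len(commands)} scripts to run")
```

The commands are meant to be run from the Neo4j root directory, as the printed instructions assume.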



# Example implementation
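This is the implementation used to process the example in our slide presentation. It runs joern-parse on one source file, exports the resulting CPG to DOT and neo4jcsv, and converts the DOT file to JSON.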

In [None]:
import os
import subprocess
import json
from pathlib import Path
import shutil
import re

def expand_path(path):
    """Expand the path to handle both ~ and relative paths"""
    try:
        expanded_path = os.path.expanduser(os.path.expandvars(path))
        return os.path.abspath(expanded_path)
    except Exception as e:
        print(f"Error expanding path: {str(e)}")
        return path

def setup_directories():
    """Create the output directories, deleting any existing ones (and their contents) first"""
    home = str(Path.home())
    output_dir = os.path.join(home, "cpg_graphs")
    cpg_dir = os.path.join(output_dir, "cpg_bins")
    dot_dir = os.path.join(output_dir, "cpg_dot")
    json_dir = os.path.join(output_dir, "cpg_json")
    csv_dir = os.path.join(output_dir, "cpg_csv")  # New CSV directory
    
    for directory in [output_dir, cpg_dir, dot_dir, json_dir, csv_dir]:
        if os.path.exists(directory):
            shutil.rmtree(directory)
        os.makedirs(directory, exist_ok=True)
        print(f"Directory ready: {directory}")
    return output_dir, cpg_dir, dot_dir, json_dir, csv_dir

def run_joern_parse(code_path, cpg_dir):
    """Run joern-parse on the input file"""
    try:
        base_name = os.path.splitext(os.path.basename(code_path))[0]
        cpg_output = os.path.join(cpg_dir, f"{base_name}.bin")
        cmd = ['joern-parse', code_path, '--output', cpg_output]
        print(f"Running joern-parse...")
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"joern-parse output: {result.stdout}")
        return cpg_output if os.path.exists(cpg_output) else None
    except subprocess.CalledProcessError as e:
        print(f"Error running joern-parse: {e.stderr}")
        return None

def handle_csv_export(cpg_path, csv_dir, base_name):
    """Handle the CSV file export"""
    try:
        # Create export directory for CSV
        export_name = f"{base_name}_csv"
        export_dir = os.path.join(csv_dir, export_name)
        
        cmd = ['joern-export', '--repr=all', '--format=neo4jcsv', '--out', export_dir, cpg_path]
        print("Running joern-export for CSV files...")
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"joern-export output: {result.stdout}")
        
        # Check if export was successful
        if os.path.exists(export_dir):
            print(f"Successfully created CSV files in: {export_dir}")
            return export_dir
        else:
            print(f"Error: CSV export directory not created at {export_dir}")
            return None
    except subprocess.CalledProcessError as e:
        print(f"Error running joern-export for CSV: {e.stderr}")
        return None
    except Exception as e:
        print(f"Error handling CSV export: {str(e)}")
        return None

def handle_dot_export(cpg_path, dot_dir, base_name):
    """Handle the dot file export"""
    try:
        export_name = f"{base_name}_dot"
        export_dir = os.path.join(dot_dir, export_name)
        
        cmd = ['joern-export', '--repr=all', '--format=dot', '--out', export_dir, cpg_path]
        print("Running joern-export for DOT file...")
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"joern-export output: {result.stdout}")

        source_dot = os.path.join(export_dir, "export.dot")
        final_dot = os.path.join(dot_dir, f"{base_name}.dot")
        
        if os.path.exists(source_dot):
            shutil.copy2(source_dot, final_dot)
            shutil.rmtree(export_dir)
            print(f"Successfully created DOT file: {final_dot}")
            return final_dot
        else:
            print(f"Error: export.dot not found in {export_dir}")
            return None
    except subprocess.CalledProcessError as e:
        print(f"Error running joern-export: {e.stderr}")
        return None
    except Exception as e:
        print(f"Error handling dot export: {str(e)}")
        return None

def parse_dot_file(dot_file):
    """Parse the DOT file and return nodes and edges"""
    nodes = {}
    edges = []
    
    with open(dot_file, 'r') as f:
        content = f.read()
    
    node_pattern = re.compile(r'(\d+)\[label=(.*?)\]')
    for match in node_pattern.finditer(content):
        node_id, label = match.groups()
        nodes[node_id] = label.strip('"')
    
    edge_pattern = re.compile(r'(\d+)\s*->\s*(\d+)')
    for match in edge_pattern.finditer(content):
        source, target = match.groups()
        edges.append((source, target))
    
    return nodes, edges

def dot_to_json(dot_file, json_dir, base_name):
    """Convert DOT file to JSON format"""
    try:
        if not os.path.exists(dot_file):
            print(f"Error: DOT file not found at {dot_file}")
            return False

        print(f"Parsing DOT file: {dot_file}")
        nodes, edges = parse_dot_file(dot_file)
        
        graph_data = {
            "nodes": [{"id": node_id, "label": label} for node_id, label in nodes.items()],
            "links": [{"source": source, "target": target} for source, target in edges]
        }
        
        json_file = os.path.join(json_dir, f"{base_name}.json")
        with open(json_file, 'w') as f:
            json.dump(graph_data, f, indent=2)
        
        print(f"Generated JSON file: {json_file}")
        print(f"Number of nodes: {len(nodes)}")
        print(f"Number of edges: {len(edges)}")
        return True
    except Exception as e:
        print(f"Error converting DOT to JSON: {str(e)}")
        print(f"Error type: {type(e)}")
        return False

def process_file(code_path, output_dir, cpg_dir, dot_dir, json_dir, csv_dir):
    """Process a single file to generate CPG and related outputs"""
    try:
        code_path = expand_path(code_path)
        if not os.path.exists(code_path):
            print(f"Error: Input file does not exist: {code_path}")
            return False

        print(f"\nProcessing file: {code_path}")
        base_name = os.path.splitext(os.path.basename(code_path))[0]

        # Run joern-parse
        cpg_path = run_joern_parse(code_path, cpg_dir)
        if not cpg_path:
            return False

        # Generate dot file
        dot_file = handle_dot_export(cpg_path, dot_dir, base_name)
        if not dot_file:
            return False

        # Generate JSON
        if not dot_to_json(dot_file, json_dir, base_name):
            return False

        # Generate CSV
        csv_export_dir = handle_csv_export(cpg_path, csv_dir, base_name)
        if not csv_export_dir:
            return False

        return True
            
    except Exception as e:
        print(f"Error processing {code_path}: {str(e)}")
        return False

if __name__ == "__main__":
    # Setup directories
    output_dir, cpg_dir, dot_dir, json_dir, csv_dir = setup_directories()

    # Process the specific file
    home = str(Path.home())
    file_path = os.path.join(home, "exploitdb", "shellcodes", "linux_x86", "13310.c")
    
    print(f"Starting CPG generation script...")
    print(f"Input file: {file_path}")
    
    if process_file(file_path, output_dir, cpg_dir, dot_dir, json_dir, csv_dir):
        print(f"\nProcessing complete. Successfully processed {file_path}")
        print(f"\nOutput locations:")
        print(f"- CPG binary files: {cpg_dir}")
        print(f"- DOT files: {dot_dir}")
        print(f"- JSON files: {json_dir}")
        print(f"- CSV files: {csv_dir}")
    else:
        print(f"\nProcessing failed for {file_path}")

    # Print contents of output directories
    print("\nContents of output directories:")
    for dir_name, dir_path in [
        ("CPG binaries", cpg_dir),
        ("DOT files", dot_dir),
        ("JSON files", json_dir),
        ("CSV files", csv_dir)
    ]:
        print(f"\n{dir_name} directory ({dir_path}):")
        if os.path.exists(dir_path):
            for root, dirs, files in os.walk(dir_path):
                level = root.replace(dir_path, '').count(os.sep)
                indent = ' ' * 4 * level
                print(f"{indent}{os.path.basename(root)}/")
                sub_indent = ' ' * 4 * (level + 1)
                for f in files:
                    print(f"{sub_indent}{f}")
        else:
            print("  Directory not found!")
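
The JSON written by `dot_to_json` has a `nodes` list (each entry with `id` and `label`) and a `links` list (each entry with `source` and `target`). As a minimal sketch of how that structure might be consumed, the following counts outgoing links per node. `out_degree` is a hypothetical helper, and the sample graph is hand-written for illustration rather than taken from a real export.

```python
import json
from collections import Counter

def out_degree(graph):
    """Count outgoing links per node id in a {"nodes": [...], "links": [...]} graph."""
    # Start every node at zero so nodes without outgoing links are still listed.
    counts = Counter({node["id"]: 0 for node in graph["nodes"]})
    counts.update(link["source"] for link in graph["links"])
    return counts

if __name__ == "__main__":
    # Tiny hand-written graph in the same shape as dot_to_json's output.
    sample = json.loads(
        '{"nodes": [{"id": "1", "label": "a"}, {"id": "2", "label": "b"}],'
        ' "links": [{"source": "1", "target": "2"}]}'
    )
    for node_id, degree in out_degree(sample).most_common():
        print(node_id, degree)
```

To use it on a real file, load the JSON from the `cpg_json` directory with `json.load` and pass the result to the function.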