# Hackathon Query Notebook

Welcome to the hackathon! This notebook is your starting point for querying the main dataset.

It's designed to connect to a **remote Tentris server** that is already running and loaded with all the data. You do not need to run or install Tentris yourself.

**Your only task:** Follow the setup steps, run the cells, and start writing your own queries!

### Step 1: Environment Setup with `uv` (One-Time Setup)

This notebook uses [`uv`](https://github.com/astral-sh/uv) for fast and simple Python environment setup. `uv` is a modern replacement for `venv` and `pip`.

**1. Install `uv` (from your terminal):**
Before you can run the cell below, you need to install `uv` on your system. Run **one** of the following commands in your regular terminal (not in the notebook).

*On macOS / Linux:*
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

*On Windows (PowerShell) or if the above fails:*
```bash
# You may need to run PowerShell as Administrator
pip install uv
```
*(You may need to restart VS Code for it to find the new `uv` command)*

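**2. Run the cell below to create the environment:**
The cell uses `uv` to:
1.  Create a local virtual environment in a folder named `.venv`, deleting any existing one first.
2.  Install all the packages defined in `../tentris-quickstart/pyproject.toml` (including `ipykernel`).
3.  Register the environment as a Jupyter kernel named **Python (Tentris)**.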

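**3. Set your VS Code kernel:**
To run the setup cell, click **Select Kernel** in the top-right corner and choose to create a new virtual environment. This temporary root-level `.venv` is only needed to run the setup cell. Once the cell has finished, switch the kernel to **Python (Tentris)**, the environment the cell just registered, and use it for the rest of the notebook.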

```python
# 1. Clean up any old environment
!rm -rf .venv

# 2. Create a new, empty virtual environment
!uv venv

# 3. Install ipykernel so Jupyter can use this venv
!uv add --dev ipykernel

# 4. Sync your project dependencies from the toml file
!uv pip sync ../tentris-quickstart/pyproject.toml

# 5. Register this venv as a Jupyter kernel
#    (on Windows the interpreter is .venv\Scripts\python instead of .venv/bin/python)
!.venv/bin/python -m ipykernel install --user --name="tentris-env" --display-name="Python (Tentris)"
```


Using CPython [36m3.13.2[39m
Creating virtual environment at: [36m.venv[39m
Activate with: [32msource .venv/bin/activate[39m
[2mResolved [1m41 packages[0m [2min 1ms[0m[0m
[2K[2mInstalled [1m36 packages[0m [2min 98ms[0m[0m                               [0m
 [32m+[39m [1masttokens[0m[2m==3.0.0[0m
 [32m+[39m [1mcomm[0m[2m==0.2.3[0m
 [32m+[39m [1mdebugpy[0m[2m==1.8.17[0m
 [32m+[39m [1mdecorator[0m[2m==5.2.1[0m
 [32m+[39m [1mexecuting[0m[2m==2.2.1[0m
 [32m+[39m [1mipykernel[0m[2m==7.0.1[0m
 [32m+[39m [1mipython[0m[2m==9.6.0[0m
 [32m+[39m [1mipython-pygments-lexers[0m[2m==1.1.1[0m
 [32m+[39m [1mjedi[0m[2m==0.19.2[0m
 [32m+[39m [1mjupyter-client[0m[2m==8.6.3[0m
 [32m+[39m [1mjupyter-core[0m[2m==5.9.1[0m
 [32m+[39m [1mmatplotlib-inline[0m[2m==0.1.7[0m
 [32m+[39m [1mnest-asyncio[0m[2m==1.6.0[0m
 [32m+[39m [1mnumpy[0m[2m==2.3.4[0m
 [32m+[39m [1mpackaging[0m[2m==25.0[0m
 [32m+[39m [1mp

### Step 2: Imports and Configuration

This cell imports the libraries and sets up our connection variables.

```python
import rdflib
import pandas as pd
import tentris  # Required to register the Tentris store
from tentris import TentrisHTTPStore
from IPython.display import display, Markdown

# --- 💡 IMPORTANT 💡 ---
# This is the ONLY line you need to change.
# Set this to the server URL (IP address and port) provided by the hackathon organizers.
ENDPOINT_URL = "http://128.178.219.51:7502"
# ------------------------

# We will create our main 'graph' object in the next step
graph = None
```
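Because `ENDPOINT_URL` is the one value you edit by hand, a quick sanity check on its shape can catch typos before you try to connect. This is a hypothetical helper built on `urllib.parse`; the rules it applies (http/https scheme, a host, an explicit port, no path) are assumptions drawn from the example URL above and from the Step 3 comment that the client finds the `/sparql` and `/stream` endpoints on its own.

```python
from urllib.parse import urlparse


def check_endpoint_url(url):
    """Return a list of problems with an endpoint URL (an empty list means it looks fine)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL should start with http:// or https://")
    if not parsed.hostname:
        problems.append("URL has no host (IP address or name)")
    try:
        port = parsed.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("no explicit port (the example URL includes one)")
    # The client is given the base URL and locates /sparql itself.
    if parsed.path not in ("", "/"):
        problems.append("pass the base URL only, without a path such as /sparql")
    return problems
```

In the notebook, `check_endpoint_url(ENDPOINT_URL)` should return an empty list for the URL shown above.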

### Step 3: Connect to the Server

This cell creates the `rdflib.Graph` object. It uses the `TentrisHTTPStore`, which is optimized to work with the Tentris server.

**Note:** This cell also runs a test query (`ASK { ?s ?p ?o }`) to make sure the server is reachable. If this cell fails, please double-check the `ENDPOINT_URL` you set in Step 2.

```python
display(Markdown(f"🚀 Connecting to server at `{ENDPOINT_URL}`..."))

try:
    # Initialize the TentrisHTTPStore with the base endpoint URL.
    # The client derives the /sparql and /stream endpoints from this base URL.
    store = TentrisHTTPStore(ENDPOINT_URL)

    # Create the graph object
    graph = rdflib.Graph(store)

    # Run a simple test query to confirm the connection.
    # This raises an error if the server is unreachable.
    graph.query("ASK { ?s ?p ?o }")

    display(Markdown("✅ **Connection successful!** The database is ready to be queried."))

except Exception as e:
    display(Markdown(f"❌ **Connection Failed:** Could not connect to the server.\n\n*Details: {e}*\n\nPlease check the `ENDPOINT_URL` in Step 2 and ensure the server is running."))
    graph = None  # Ensure graph is None if connection fails
```

🚀 Connecting to server at `http://128.178.219.51:7502`...

✅ **Connection successful!** The database is ready to be queried.
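If the connection cell fails and you want to rule out the client library as the cause, you can send the same `ASK` query over plain HTTP. The sketch below assumes the server speaks the standard SPARQL 1.1 Protocol at `<ENDPOINT_URL>/sparql` (the path mentioned in the Step 3 comments) and can return SPARQL JSON results; neither detail is confirmed by this notebook, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, query):
    """Build a SPARQL 1.1 Protocol GET URL, assuming the endpoint lives at <base>/sparql."""
    return base_url.rstrip("/") + "/sparql?" + urllib.parse.urlencode({"query": query})


def parse_ask(body):
    """Extract the boolean from a SPARQL JSON result for an ASK query."""
    return bool(json.loads(body)["boolean"])


def ask(base_url, query="ASK { ?s ?p ?o }", timeout=10):
    """Send an ASK query over plain HTTP and return True or False."""
    request = urllib.request.Request(
        build_query_url(base_url, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ask(response.read().decode("utf-8"))
```

`ask(ENDPOINT_URL)` should return `True` when the server answers and holds at least one triple; a network problem raises an exception, typically `urllib.error.URLError`.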

### Step 4: Run Example Queries

Now you're ready to go! The `graph` object is your gateway to the database.

Here are a few examples to get you started. You can (and should!) modify these and create new cells to write your own.

```python
# Example 1: Count all triples in the database
# This is a good way to see how much data you're working with.

# Test for None explicitly rather than relying on the Graph's truthiness.
if graph is not None:
    query_str_count = "SELECT (COUNT(*) AS ?totalTriples) WHERE { ?s ?p ?o }"

    display(Markdown("**Running query:** Counting all triples..."))

    results = graph.query(query_str_count)

    for row in results:
        display(Markdown(f"Total Triples in Database: **{row.totalTriples}**"))
else:
    display(Markdown("⚠️ Skipping query: Database not connected."))
```

**Running query:** Counting all triples...

Total Triples in Database: **2000002**
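A natural follow-up to counting all triples is to see which predicates those triples use. The sketch below groups by predicate and keeps the ten most frequent. It assumes the Tentris server supports `GROUP BY` and `ORDER BY` on an aggregate, and `predicate_counts` is a hypothetical helper that works with any object exposing an rdflib-style `query` method, such as `graph`.

```python
# Count triples per predicate, most frequent first (kept small with LIMIT).
PREDICATE_COUNT_QUERY = """
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
LIMIT 10
"""


def predicate_counts(graph):
    """Run the query on an rdflib-style graph and return [(predicate, count), ...]."""
    # str() gives the predicate URI; int() converts the COUNT literal to a Python int.
    return [(str(row.p), int(row.n)) for row in graph.query(PREDICATE_COUNT_QUERY)]
```

`predicate_counts(graph)` returns a list of tuples that you can pass to `pd.DataFrame(..., columns=["predicate", "count"])` for display.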

```python
# Example 2: Show 10 triples as a table
# We can use pandas to display the results in a nice table.

# Test for None explicitly rather than relying on the Graph's truthiness.
if graph is not None:
    query_str_limit = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"

    display(Markdown("**Running query:** Getting 10 triples..."))

    # Run the query
    results = graph.query(query_str_limit)

    # Convert results to a list of dictionaries
    results_list = [row.asdict() for row in results]

    # Load into a pandas DataFrame and display
    df = pd.DataFrame(results_list)
    display(df)

else:
    display(Markdown("⚠️ Skipping query: Database not connected."))
```

**Running query:** Getting 10 triples...

|   | s | p | o |
|---|---|---|---|
| 0 | http://example.org/software/81257 | http://schema.org/contentUrl | http://example.org/downloads/software81257.zip |
| 1 | http://example.org/software/81257 | http://schema.org/datePublished | 2007-06-02 |
| 2 | http://example.org/software/81257 | http://schema.org/url | http://example.org/software/81257/ |
| 3 | http://example.org/software/81257 | http://schema.org/dateCreated | 2007-06-02 |
| 4 | http://example.org/software/81257 | http://schema.org/license | https://spdx.org/licenses/MIT.html |
| 5 | http://example.org/software/81257 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://schema.org/SoftwareSourceCode |
| 6 | http://example.org/software/81257 | http://schema.org/name | Example Software 81257 |
| 7 | http://example.org/software/81257 | https://open-pulse.epfl.ch/ontology#repository... | https://open-pulse.epfl.ch/ontology#Software |
| 8 | http://example.org/software/81257 | http://schema.org/codeRepository | http://github.com/exampleorg/repo81257 |
| 9 | http://example.org/software/81257 | http://schema.org/programmingLanguage | R |
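Full URIs make the table hard to scan. One option is to shorten known namespaces to prefixes before displaying the DataFrame. The namespaces below are taken from the sample rows above; the short labels `pulse:` and `ex:` are made up for display purposes and are not prefixes defined by the dataset, and `shorten` is a hypothetical helper.

```python
# Namespaces seen in the sample rows; the short labels are display-only choices.
PREFIXES = {
    "http://schema.org/": "schema:",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf:",
    "https://open-pulse.epfl.ch/ontology#": "pulse:",
    "http://example.org/": "ex:",
}


def shorten(value, prefixes=PREFIXES):
    """Replace a known namespace at the start of a value with its short label."""
    text = str(value)  # works for plain strings as well as rdflib terms
    for namespace, prefix in prefixes.items():
        if text.startswith(namespace):
            return prefix + text[len(namespace):]
    return text  # literals and unknown namespaces are returned unchanged
```

With pandas 2.1 or newer, `display(df.map(shorten))` applies it to every cell; older pandas versions call the same operation `DataFrame.applymap`.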


### Step 5: Your Turn! (Happy Hacking)

This is your canvas. Create new code cells below this one and start building your project.

**Tip:** Don't forget to add `LIMIT 10` to your queries while you are exploring, so you don't accidentally try to print a million rows!
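As a starting point for your own queries, here is a small query builder that combines a `LIMIT` with predicates visible in the Example 2 output (`rdf:type schema:SoftwareSourceCode`, `schema:programmingLanguage`, `schema:name`, `schema:license`). It is a sketch: it assumes programming languages are stored as plain string literals such as `"R"`, as the sample row suggests, and `software_by_language` is a hypothetical helper.

```python
def software_by_language(language, limit=10):
    """Build a SELECT query for software whose schema:programmingLanguage equals `language`."""
    # Escape backslashes and double quotes so the value is a valid SPARQL string literal.
    escaped = language.replace("\\", "\\\\").replace('"', '\\"')
    # Doubled braces {{ }} produce literal braces in the f-string.
    return f"""PREFIX schema: <http://schema.org/>
SELECT ?software ?name ?license
WHERE {{
  ?software a schema:SoftwareSourceCode ;
            schema:programmingLanguage "{escaped}" ;
            schema:name ?name .
  OPTIONAL {{ ?software schema:license ?license }}
}}
LIMIT {int(limit)}"""
```

In the notebook you could run it with `pd.DataFrame([row.asdict() for row in graph.query(software_by_language("R"))])`.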