# Opteryx - Query Local Files

In this notebook we will be querying data from a local folder.

We're going to use a dataset of space missions from 1957 to 2022, available on [Kaggle](https://www.kaggle.com/datasets/mysarahmadbhat/space-missions/versions/1?resource=download). The snapshot we are using was published on 2022-08-10 by Mysar Ahmad Bhat under a Public Domain licence.

> **Note**   
> Because we are reading from local storage, the folder where Jupyter is started matters. This notebook assumes you started Jupyter in the 'opteryx/notebooks' folder. If it was started elsewhere, you will get errors about the dataset not being found. If this happens, either of the following should fix it:
> 1) Kill the Jupyter instance, browse to the 'opteryx/notebooks' folder and restart Jupyter.
> 2) Copy the 'space_missions' folder from 'opteryx/notebooks' to the root folder of Jupyter.
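
Before running any queries, it can help to confirm that the dataset folder is visible from the current working directory. The snippet below is a minimal sketch using only the standard library; the helper name `dataset_folder_available` is our own and is not part of Opteryx.

```python
from pathlib import Path


def dataset_folder_available(folder_name="space_missions", base=None):
    """Return True if `folder_name` is a directory under `base`.

    `base` defaults to the current working directory, which is the
    folder Jupyter was started in unless it has been changed since.
    This helper is illustrative only and is not part of Opteryx.
    """
    base = Path(base) if base is not None else Path.cwd()
    return (base / folder_name).is_dir()


if not dataset_folder_available():
    print(
        f"'space_missions' was not found in {Path.cwd()}; "
        "restart Jupyter from 'opteryx/notebooks' or copy the folder here."
    )
```

If the message is printed, apply one of the two remediations above before continuing.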

In [None]:
try:
    import opteryx
except ImportError:
    print('opteryx could not be imported. Install it by running `pip install opteryx`, then restart this notebook and try again.')

To access data on disk, we need to load the `DiskStorage` connector and register it so that when a dataset name starts with 'space_missions' (our folder), Opteryx looks for it on disk.

Other connectors are available, including `GcsStorage` and `MinIoStorage`, for accessing data held on those storage systems.

In [None]:
from opteryx.connectors import DiskStorage

opteryx.register_store('space_missions', DiskStorage)

Now we can query the data held in the 'space_missions' folder as if it were a database table.

In [None]:
# define our SQL statement
sql_statement = "SELECT * FROM space_missions;"

# Create a connection
conn = opteryx.connect()

# create a database cursor
cursor = conn.cursor()

# execute our SQL statement
cursor.execute(sql_statement)

# display the first 10 results
cursor.head(10)
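
The connect, cursor, execute, and head steps above are the same for every query, so they could be wrapped in a small helper. This is a sketch only: `run_query` and its `connect` parameter are names we have made up, and the helper relies solely on the calls already used in this notebook (`opteryx.connect()`, `conn.cursor()`, `cursor.execute()`, and `cursor.head()`).

```python
def run_query(sql, rows=10, connect=None):
    """Run `sql` and return the first `rows` results.

    `connect` is a hypothetical hook: any callable returning a
    connection object. When it is omitted, `opteryx.connect` is used,
    exactly as in the cell above.
    """
    if connect is None:
        # imported here so the helper can be defined before opteryx is installed
        import opteryx

        connect = opteryx.connect

    conn = connect()          # create a connection
    cursor = conn.cursor()    # create a database cursor
    cursor.execute(sql)       # execute the SQL statement
    return cursor.head(rows)  # return the first `rows` results
```

With this in place, the cell above would reduce to `run_query("SELECT * FROM space_missions;", rows=10)`, assuming the store has already been registered.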

We can query this dataset using standard SQL. For example, to find the five companies responsible for the most space missions, we can run the query below:

In [None]:
# define our SQL statement
sql_statement = "SELECT COUNT(*) AS Missions, Company FROM space_missions GROUP BY Company ORDER BY Missions DESC LIMIT 5;"

# Create a connection
conn = opteryx.connect()

# create a database cursor
cursor = conn.cursor()

# execute our SQL statement
cursor.execute(sql_statement)

# display the results - the query already limits them to the top 5
cursor.head(5)
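
To make the `GROUP BY`, `ORDER BY ... DESC`, and `LIMIT` steps of that query concrete, here is roughly the same logic in plain Python. It is an illustrative sketch, not how Opteryx executes the query, and the rows are made-up placeholders rather than values from the real dataset.

```python
from collections import Counter

# Hypothetical rows standing in for the dataset; only the Company
# column matters for this query. These are not real dataset values.
sample_rows = [
    {"Company": "Alpha"},
    {"Company": "Beta"},
    {"Company": "Alpha"},
    {"Company": "Gamma"},
    {"Company": "Alpha"},
    {"Company": "Beta"},
]


def top_companies(rows, limit=5):
    """Count rows per company and return the `limit` largest counts.

    Mirrors: SELECT COUNT(*) AS Missions, Company ... GROUP BY Company
             ORDER BY Missions DESC LIMIT <limit>
    """
    counts = Counter(row["Company"] for row in rows)  # GROUP BY + COUNT(*)
    # most_common sorts by count descending and truncates (ORDER BY + LIMIT)
    return [(missions, company) for company, missions in counts.most_common(limit)]


print(top_companies(sample_rows, limit=2))
```

Each tuple is ordered as (Missions, Company) to match the column order in the SQL statement.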