# MAGIC Gamma Telescope

In this notebook, we will load data simulating the registration of high-energy gamma particles in an atmospheric Cherenkov telescope. <a href="https://archive.ics.uci.edu/dataset/159/magic+gamma+telescope" target="_blank" rel="noopener">Follow this link</a> to get details about this dataset.

To execute queries and upload data to the Exasol database, we will use the <a href="https://github.com/exasol/pyexasol" target="_blank" rel="noopener">`pyexasol`</a> module.

## Prerequisites

Before using this notebook, complete the following step:
1. [Configure the sandbox](../sandbox_config.ipynb).

## Setup

### Access configuration

In [2]:
%run ../access_store_ui.ipynb
display(get_access_store_ui('../'))

## Download data

First, we will load the data into a pandas DataFrame. Each data column represents one of the features and is named accordingly; see the section "Additional Variable Information" in the dataset description. We will name the DataFrame columns as per the variable description.
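If the data file has no header line (an assumption worth checking against the raw file), `pd.read_csv` with its default settings treats the first record as column names, and that record is lost once the columns are renamed. The sketch below shows one way to keep every record by passing `header=None` together with `names=`. The two-record sample is made up for illustration and is not taken from the dataset.

```python
import io

import pandas as pd

# Made-up sample in the same layout: 10 numeric features plus a class label.
SAMPLE = (
    "1.0,2.0,3.0,0.4,0.2,5.0,6.0,-7.0,8.0,9.0,g\n"
    "1.5,2.5,3.5,0.5,0.3,5.5,6.5,-7.5,8.5,9.5,h\n"
)

column_names = [
    'fLength', 'fWidth', 'fSize', 'fConc', 'fConc1', 'fAsym',
    'fM3Long', 'fM3Trans', 'fAlpha', 'fDist', 'class',
]

# Default behaviour: the first line is consumed as the header.
df_default = pd.read_csv(io.StringIO(SAMPLE))

# header=None keeps every line as data; names= assigns the column names.
df_named = pd.read_csv(io.StringIO(SAMPLE), header=None, names=column_names)

print(len(df_default), len(df_named))  # prints "1 2"
```

Comparing the resulting row count with the number of instances stated in the dataset description is a quick way to confirm that no record was dropped.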

In [3]:
from urllib.request import urlopen
import tempfile
from zipfile import ZipFile
from contextlib import ExitStack
import pandas as pd
from stopwatch import Stopwatch

stopwatch = Stopwatch()

DATA_URL = "https://archive.ics.uci.edu/static/public/159/magic+gamma+telescope.zip"
DATA_FILE = "magic04.data"

with ExitStack() as stack:
    # Download the archive into a temporary file
    resp = stack.enter_context(urlopen(DATA_URL))
    tmp = stack.enter_context(tempfile.TemporaryFile())
    tmp.write(resp.read())
    print(f"Downloading the data took: {stopwatch}")

    # Open the archive and read the data file into a DataFrame
    tmp.seek(0)
    z = stack.enter_context(ZipFile(tmp))
    f = stack.enter_context(z.open(DATA_FILE, "r"))
    df = pd.read_csv(f)

column_names = [
    'fLength',   # major axis of ellipse [mm]
    'fWidth',    # minor axis of ellipse [mm] 
    'fSize',     # 10-log of the sum of the content of all pixels [in #phot]
    'fConc',     # ratio of the sum of two highest pixels over fSize  [ratio]
    'fConc1',    # ratio of highest pixel over fSize  [ratio]
    'fAsym',     # distance from the highest pixel to center, projected onto major axis [mm]
    'fM3Long',   # 3rd root of the third moment along major axis  [mm] 
    'fM3Trans',  # 3rd root of the third moment along minor axis  [mm]
    'fAlpha',    # angle of major axis with vector to origin [deg]
    'fDist',     # distance from the origin to the center of ellipse [mm]
    'class'      # g,h - gamma (signal), hadron (background)
]
df.columns = column_names

print(df.head())

Downloading the data took: 4.71s
    fLength    fWidth   fSize   fConc  fConc1     fAsym  fM3Long  fM3Trans  \
0   31.6036   11.7235  2.5185  0.5303  0.3773   26.2722  23.8238   -9.9574   
1  162.0520  136.0310  4.0612  0.0374  0.0187  116.7410 -64.8580  -45.2160   
2   23.8172    9.5728  2.3385  0.6147  0.3922   27.2107  -6.4633   -7.1513   
3   75.1362   30.9205  3.1611  0.3168  0.1832   -5.5277  28.5525   21.8393   
4   51.6240   21.1502  2.9085  0.2420  0.1340   50.8761  43.1887    9.8145   

    fAlpha    fDist class  
0   6.3609  205.261     g  
1  76.9600  256.788     g  
2  10.4490  116.737     g  
3   4.6480  356.462     g  
4   3.6130  238.098     g  


## Upload data into DB

Let's split the data randomly into a training set (80%) and a test set (20%). We will then create two tables, TELESCOPE_TRAIN and TELESCOPE_TEST, and load the datasets into them.
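The table definition is derived from the column names: names starting with `f` become numeric feature columns, and anything else is treated as the single-character class label. As a minimal sketch, the same rule can be wrapped in small helpers so the generated statement can be inspected before it is sent to the database. The helper names `column_ddl` and `create_table_sql` and the schema name `MY_SCHEMA` are hypothetical and not part of `pyexasol`.

```python
def column_ddl(name: str) -> str:
    """Return a column definition: 'f*' columns are numeric, others CHAR(1)."""
    sql_type = "DECIMAL(18,4)" if name.startswith("f") else "CHAR(1)"
    return f"{name} {sql_type}"


def create_table_sql(schema: str, table: str, columns: list) -> str:
    """Build a CREATE OR REPLACE TABLE statement for the given columns."""
    cols = ", ".join(column_ddl(c) for c in columns)
    return f'CREATE OR REPLACE TABLE "{schema}"."{table}"({cols})'


# MY_SCHEMA is a placeholder; the notebook takes the schema from its configuration.
print(create_table_sql("MY_SCHEMA", "TELESCOPE_TRAIN", ["fLength", "fAlpha", "class"]))
```

Printing the statement first makes it easy to spot a mistyped column before any table is replaced.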

In [4]:
from sklearn.model_selection import train_test_split
import pyexasol

# Split the data into train and test sets
df_train, df_test = train_test_split(df, test_size=0.2)

train_table = 'TELESCOPE_TRAIN'
test_table = 'TELESCOPE_TEST'
column_desc = [f'{c} {("DECIMAL(18,4)" if c.startswith("f") else "CHAR(1)")}' for c in column_names]

stopwatch = Stopwatch()

# Create an Exasol connection
dsn = f'{sb_config.EXTERNAL_HOST_NAME}:{sb_config.HOST_PORT}'
with pyexasol.connect(dsn=dsn, user=sb_config.USER, password=sb_config.PASSWORD, compression=True) as conn:

    # Create tables
    sql = f'CREATE OR REPLACE TABLE "{sb_config.SCHEMA}"."{train_table}"({", ".join(column_desc)})'
    conn.execute(query=sql)
    sql = f'CREATE OR REPLACE TABLE "{sb_config.SCHEMA}"."{test_table}" LIKE "{sb_config.SCHEMA}"."{train_table}"'
    conn.execute(query=sql)

    # Import data into Exasol
    conn.import_from_pandas(df_train, (sb_config.SCHEMA, train_table))
    print(f"Imported {conn.last_statement().rowcount()} rows into {train_table}.")
    conn.import_from_pandas(df_test, (sb_config.SCHEMA, test_table))
    print(f"Imported {conn.last_statement().rowcount()} rows into {test_table}.")

print(f"Importing the data took: {stopwatch}")


Imported 15215 rows into TELESCOPE_TRAIN.
Imported 3804 rows into TELESCOPE_TEST.
Importing the data took: 717.22ms
