# Transforming Scouting Data from SQL to Pandas in Jupyter

### 1. References
* [Psycopg2 Documentation](https://www.psycopg.org/docs/usage.html)
* [Pandas read_sql Documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql)
* [Python pickle Documentation](https://docs.python.org/3/library/pickle.html#examples)

### 2. Imports
We need several Python modules to work with scouting data in a Jupyter notebook:

In [1]:
import pickle

import pandas as pd
import psycopg2 as pg2

import server.model.connection as smc
import server.model.event as sme

1. We'll use the first imported package, *pickle*, to save Python objects to files on the hard drive and to read them back.
2. Pandas and Psycopg are external packages. We use Pandas to manipulate tabular data. We use Psycopg to exchange data with the SQL server.
3. Finally, server.model.connection and server.model.event are modules that are part of the IRS's scouting system. We use these modules to get a database connection and to set the current event within the scouting system.

#### Note:
The order of the imports is not accidental.
* We always import packages from the Python Standard Library first (packages that are included with the basic Python installation), and follow those imports with a blank line.
* Next we import external packages like Pandas and Psycopg, followed by another blank line. By external packages, I mean packages that are publicly available, but must be installed separately from Python (e.g., `conda install pandas`).
* Finally we import custom packages that we wrote ourselves.

The imports for `server.model.connection` and `server.model.event` won't work unless the *irsScouting2017* folder is added to your `PYTHONPATH` environment variable. Alternatively, you can run the following lines of code before the import statements:
```python
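import sys
# This path will be different on your machine. Please change it accordingly.
# The r prefix makes this a raw string, so the backslashes are not treated as
# escape sequences (without it, '\U' is a syntax error in Python 3).
project_path = r'C:\Users\stacy\OneDrive\Projects\scouting17\irsScouting2017'
sys.path.append(project_path)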
```
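
Re-running a notebook cell that calls `sys.path.append()` adds the same folder again each time. The sketch below is one way to avoid that. The folder location is a placeholder assumption, not the real project path, so change it to match your machine.

```python
import sys
from pathlib import Path

# Placeholder path (an assumption): point this at your own irsScouting2017 folder.
project_path = Path.home() / "Projects" / "irsScouting2017"

# Only add the folder when it is missing, so re-running the cell is harmless.
if str(project_path) not in sys.path:
    sys.path.append(str(project_path))
```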

### 3. Getting a Database Connection
The `server.model.connection` module, which we've aliased to `smc`, contains an easy way to get a psycopg2 connection to the scouting database.

In [2]:
pg_pool = smc.set_pool()
conn = pg_pool.getconn()

### 4. Constructing a Pandas Dataframe from SQL Data
Once we have a psycopg2 connection, we can create a Pandas dataframe from a SQL statement:

In [3]:
sql = """
    SELECT * FROM events;
"""
events = pd.read_sql(sql, conn)
events.head()

|   | id | name | state | type | season |
|---|---|---|---|---|---|
| 0 | 15697 | orwil |  |  | 2017 |
| 1 | 1417 | wasno |  |  | 2017 |
| 2 | 1 | waamv |  |  | 2017 |
| 3 | 9073 | orore |  |  | 2017 |
| 4 | 18193 | turing |  |  | 2017 |


Note how I placed the SQL statement within triple quotes. That's not a big deal for short SQL statements that fit on one line, but triple quotes are very convenient for long, multi-line statements. See how we filter the SQL output to a single event below.

In [4]:
sql = """
    SELECT * FROM events
    WHERE name = 'test_event_2';
"""
test_event2 = pd.read_sql(sql, conn)
test_event2

|   | id | name | state | type | season |
|---|---|---|---|---|---|
| 0 | 25394 | test_event_2 |  |  | 2020 |
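
When the filter value comes from a variable, it is usually safer to pass it as a query parameter than to paste it into the SQL text. The sketch below shows the idea with an in-memory SQLite database from the standard library, used here only as a stand-in so the code runs without the scouting server. The table holds a few made-up columns of the events table. With psycopg2 the placeholder is `%s` rather than `?`, and `pd.read_sql` accepts the values through its `params` argument.

```python
import sqlite3

# Stand-in database (an assumption): an in-memory SQLite table that mimics a
# few columns of the events table, so the sketch runs without the scouting server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT, season INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "waamv", 2017), (18193, "turing", 2017), (25394, "test_event_2", 2020)],
)

# The value is passed separately from the SQL text, so the driver handles quoting.
# sqlite3 uses ? as the placeholder; psycopg2 uses %s instead.
sql = """
    SELECT id, name, season FROM events
    WHERE name = ?;
"""
rows = conn.execute(sql, ("test_event_2",)).fetchall()
print(rows)
conn.close()
```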


### 5. Setting the Current Event in the Scouting System
We want to tell the scouting system to return only the test data we randomly generated during the 2020 build season. The `server.model.event` module, aliased as `sme`, has an `EventDal` class with a `set_current_event()` method that will do this:

In [5]:
sme.EventDal.set_current_event('test_event_2', 2020)

25394

### 6. Getting the Scouting Data
Use the *vw_measures* view, which is defined within the scouting database, to get an easy-to-read table of scouting data for the current event:

In [6]:
sql = """
    SELECT * FROM vw_measures;
"""
measures = pd.read_sql(sql, conn)
measures.head()

|   | date | event | season | level | match | alliance | team | station | actor | task | measuretype | phase | attempt | reason | capability | successes | attempts | cycle_times | last_match | num_matches |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | movedAuto | boolean | auto | summary | na |  | 1 | 1 | 0 | 1 | 3 |
| 1 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | pickupPowerCellsG | count | teleop | summary | na |  | 2 | 2 | 0 | 1 | 3 |
| 2 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | crossOpponentSector | boolean | auto | summary | na |  | 0 | 0 | 0 | 1 | 3 |
| 3 | 2019-03-30T11:07:00 | test_event_2 | 2020 | qual | 002-q | blue | 1318 | 1 | robot | startingPosition | enum | auto | summary | na | Load | 0 | 0 | 0 | 3 | 3 |
| 4 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | disabled | boolean | finish | summary | na |  | 0 | 0 | 0 | 1 | 3 |


While we're at it, let's get the event schedule. Unfortunately, there is a glitch in the database: the schedule for this test event was added twice, so we need to delete the duplicate rows first.

In [7]:
sql = """
DELETE FROM schedules
WHERE event_id = 25394 AND id >= 205358;
"""
# We have to get a cursor object to run the SQL DELETE query
cur = conn.cursor()
cur.execute(sql)
# We have to run the commit() method to save the deletions to the database.
conn.commit()
# We're not going to use the cursor again, so let's close it.
cur.close()
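
A committed `DELETE` cannot be undone, so it can be worth checking how many rows were affected before calling `commit()`. The sketch below illustrates that pattern with an in-memory SQLite database as a stand-in. The table contents and the expected count are made up for illustration. Cursors in both sqlite3 and psycopg2 expose the `rowcount` attribute used here.

```python
import sqlite3

# Stand-in database (an assumption): four rows, the last two being duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedules (id INTEGER, event_id INTEGER, team TEXT)")
conn.executemany(
    "INSERT INTO schedules VALUES (?, ?, ?)",
    [(1, 10, "1318"), (2, 10, "2046"), (3, 10, "1318"), (4, 10, "2046")],
)
conn.commit()

expected_deletions = 2  # how many duplicate rows we believe exist
cur = conn.cursor()
cur.execute("DELETE FROM schedules WHERE event_id = ? AND id >= ?", (10, 3))

# Keep the deletion only if it removed exactly what we expected.
if cur.rowcount == expected_deletions:
    conn.commit()
else:
    conn.rollback()
cur.close()

remaining = conn.execute("SELECT id FROM schedules ORDER BY id").fetchall()
print(remaining)
conn.close()
```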

Now that we've deleted the excess rows from the schedule, we can get the schedule dataframe.

In [8]:
sql = """
SELECT * FROM schedules
WHERE event_id = 25394
ORDER BY match, alliance;
"""
schedule = pd.read_sql(sql, conn)
schedule

|   | id | date | level | match | alliance | team | station | event_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 204952 | 2019-03-30T11:00:00 | qual | 001-q | blue | 3684 | 3 | 25394 |
| 1 | 205043 | 2019-03-30T11:00:00 | qual | 001-q | blue | 4911 | 2 | 25394 |
| 2 | 204951 | 2019-03-30T11:00:00 | qual | 001-q | blue | 3070 | 1 | 25394 |
| 3 | 205042 | 2019-03-30T11:00:00 | qual | 001-q | red | 4089 | 3 | 25394 |
| 4 | 204950 | 2019-03-30T11:00:00 | qual | 001-q | red | 2046 | 2 | 25394 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 404 | 205356 | 2019-03-31T10:47:00 | qual | 068-q | blue | 2980 | 1 | 25394 |
| 405 | 205340 | 2019-03-31T10:47:00 | qual | 068-q | red | 2046 | 3 | 25394 |
| 406 | 205355 | 2019-03-31T10:47:00 | qual | 068-q | red | 948 | 2 | 25394 |
| 407 | 205339 | 2019-03-31T10:47:00 | qual | 068-q | red | 4461 | 1 | 25394 |


Let's get the teams as well.

In [9]:
sql = """
SELECT * FROM teams
WHERE teams.name IN (SELECT team FROM schedules WHERE event_id = 25394);
"""
teams = pd.read_sql(sql, conn)
teams.head()

|   | id | name | long_name | city | state | region | year_founded |
|---|---|---|---|---|---|---|---|
| 0 | 5 | 1318 | Issaquah Robotics Society | Issaquah | Washington |  | 2004 |
| 1 | 7254 | 2926 | Robo Sparks | Wapato | Washington |  | 2009 |
| 2 | 1435 | 3070 | Team Pronto | Seattle | Washington |  | 2009 |
| 3 | 5134 | 2990 | Hotwire | Turner | Oregon |  | 2009 |
| 4 | 23 | 4461 | Ramen | Seattle | Washington |  | 2013 |


Finally, it's a good practice to give your database connection back to the pool when you are done with it, so someone else can use it:

In [10]:
pg_pool.putconn(conn)
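
If a cell raises an error before `putconn()` runs, the connection never goes back to the pool. One way to guard against that is a small context manager, sketched below. `FakePool` and `pooled_connection` are hypothetical names introduced for this example. `FakePool` only imitates the `getconn()` and `putconn()` methods used above so the sketch runs without a database, and with the real pool you would pass `pg_pool` instead.

```python
from contextlib import contextmanager


class FakePool:
    """Hypothetical stand-in with the same getconn/putconn methods as pg_pool."""

    def __init__(self):
        self.checked_out = 0

    def getconn(self):
        self.checked_out += 1
        return object()  # placeholder for a real connection

    def putconn(self, conn):
        self.checked_out -= 1


@contextmanager
def pooled_connection(pool):
    """Borrow a connection and always return it, even if the body raises."""
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)


pool = FakePool()
try:
    with pooled_connection(pool) as conn:
        raise RuntimeError("simulated failure while querying")
except RuntimeError:
    pass

# Number of connections still checked out after the failure.
print(pool.checked_out)
```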

### 7. Saving the Dataframe to a File
Wouldn't it be nice if we could just save the dataframe to a file, so we wouldn't have to connect to the database every time we wanted to look at the scouting data? Good news! The *pickle* package from the Python Standard Library will allow us to do exactly that. See below for how to save any Python object to a file using *pickle*:

In [11]:
scouting_data = {'schedule': schedule, 'teams': teams, 'measures': measures}

In [12]:
with open('test_evt2.pickle', 'wb') as file:
    pickle.dump(scouting_data, file)

We can run command-prompt commands directly from Jupyter by prefacing them with an exclamation point. Below, we run `dir` to verify that our file was created:

In [13]:
!dir *.pickle

 Volume in drive C is Windows
 Volume Serial Number is 3870-B62A

 Directory of C:\Users\stacy\OneDrive\projects\PythonClass\Useful Notebooks

02/01/2020  04:14 PM           372,042 test_evt2.pickle
               1 File(s)        372,042 bytes
               0 Dir(s)  112,545,525,760 bytes free
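
`dir` is a Windows command, so the cell above will not work as written on macOS or Linux. As an alternative, the sketch below lists pickle files with `pathlib` from the standard library, which behaves the same on every operating system. It writes a small example file into a temporary folder (an assumption made so the sketch is self-contained). In the notebook you would glob the current folder with `Path('.')` instead.

```python
import pickle
import tempfile
from pathlib import Path

# Work in a temporary folder so the sketch does not touch your real files.
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "test_evt2.pickle"
    with open(pickle_path, "wb") as file:
        pickle.dump({"example": [1, 2, 3]}, file)

    # Path.glob plays the role of `dir *.pickle`: collect each name and size.
    found = [(p.name, p.stat().st_size) for p in Path(tmp).glob("*.pickle")]

print(found)
```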


The code to open a pickle file and convert its contents back into a Python object is also very simple:

In [14]:
with open('test_evt2.pickle', 'rb') as file:
    scouting_data_from_file = pickle.load(file)
scouting_data_from_file['measures'].head()

|   | date | event | season | level | match | alliance | team | station | actor | task | measuretype | phase | attempt | reason | capability | successes | attempts | cycle_times | last_match | num_matches |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | movedAuto | boolean | auto | summary | na |  | 1 | 1 | 0 | 1 | 3 |
| 1 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | pickupPowerCellsG | count | teleop | summary | na |  | 2 | 2 | 0 | 1 | 3 |
| 2 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | crossOpponentSector | boolean | auto | summary | na |  | 0 | 0 | 0 | 1 | 3 |
| 3 | 2019-03-30T11:07:00 | test_event_2 | 2020 | qual | 002-q | blue | 1318 | 1 | robot | startingPosition | enum | auto | summary | na | Load | 0 | 0 | 0 | 3 | 3 |
| 4 | 2019-03-30T12:46:00 | test_event_2 | 2020 | qual | 013-q | blue | 1318 | 1 | robot | disabled | boolean | finish | summary | na |  | 0 | 0 | 0 | 1 | 3 |
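
Two things are worth knowing about what `pickle.load()` gives back. First, it returns a new object with the same contents, not the original object itself. Second, loading a pickle file can run arbitrary code, so only open pickle files that come from a source you trust. The sketch below demonstrates the first point. It uses plain Python containers with made-up values in place of the dataframes (an assumption, so it runs without pandas or the database) and `pickle.dumps()`/`pickle.loads()`, which work in memory instead of through a file.

```python
import pickle

# Plain Python containers stand in for the dataframes (an assumption made so
# the sketch runs without pandas or the database).
scouting_data = {
    "teams": ["1318", "2046", "4911"],
    "measures": [{"team": "1318", "task": "movedAuto", "successes": 1}],
}

# dumps/loads work like dump/load but use bytes in memory instead of a file.
blob = pickle.dumps(scouting_data)
restored = pickle.loads(blob)

print(restored == scouting_data)  # compares contents
print(restored is scouting_data)  # compares object identity
```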
