# Daily Log to SQL

This notebook helps clean up data from the daily logs and insert it into the Limblab MySQL database. To use it you will need the database password, and you must either be connected to the VPN or be running this remotely on Shrek or Donkey.

In [177]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

dbName = "staging_db"
userName = "limblab"
password = input("enter password") # this isn't secure, but it's better than allowing it to go up to git

enter password
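Since `input()` shows the typed text on screen, one alternative is the standard-library `getpass` module, which hides it. The following is a minimal sketch, not the lab's established practice: the helper name `read_password` and the environment variable `LIMBLAB_DB_PASSWORD` are hypothetical.

```python
import os
from getpass import getpass


def read_password(env_var="LIMBLAB_DB_PASSWORD"):
    """Return the database password without echoing it to the screen.

    env_var is a hypothetical variable name; if it is set, its value is
    used and no prompt is shown.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass prompts like input() but hides the typed characters
    return getpass("enter password: ")


# In the notebook this would replace the input() call:
# password = read_password()
```

Either way, the password never needs to be written into the notebook itself, so it still stays out of git.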


### Make sure to update the filename and monkey name below:

Run either the Google Sheet cell **or** the Excel cell below, not both.

In [139]:
monkeyName = "Rocket"
ccmID = "19L1"

#### For a Google Sheet

In [140]:
# Using a Google Sheet
sheetName = "DailyLog"
# file_id is the portion of the URL between "/d/" and the next "/"
file_id = "1ICGCMKkMShzQpq1FKOBGjxaMFoDY9_mtydJxAm6Bv3U"
googleURL = f"https://docs.google.com/spreadsheets/d/{file_id}/export?gid=0&format=csv&sheet={sheetName}"

print(googleURL)

log = pd.read_csv(googleURL)

https://docs.google.com/spreadsheets/d/1ICGCMKkMShzQpq1FKOBGjxaMFoDY9_mtydJxAm6Bv3U/export?gid=0&format=csv&sheet=DailyLog
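To avoid copying the file ID by hand, it could be pulled out of the sheet's address with a regular expression. This is a sketch under assumptions: the helper names `sheet_file_id` and `csv_export_url` are made up for illustration, and the input is assumed to look like a typical browser URL containing `/spreadsheets/d/<id>/`.

```python
import re


def sheet_file_id(url):
    """Return the file ID from a Google Sheets URL, or raise ValueError."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"no file ID found in {url!r}")
    return match.group(1)


def csv_export_url(file_id, sheet_name):
    # Same URL pattern as the googleURL f-string in the cell above
    return (f"https://docs.google.com/spreadsheets/d/{file_id}"
            f"/export?gid=0&format=csv&sheet={sheet_name}")
```

The result of `csv_export_url` can be passed to `pd.read_csv` exactly as `googleURL` is.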


#### For an Excel file

You can use forward slashes even on Windows. Otherwise, replace every backslash with `\\`, since Python treats a single `\` as an escape character.

In [66]:
# Using an Excel file
sheetName = "DailyLog"
fileName = "C:/Users/17204/Downloads/Rocket.xlsx"
log = pd.read_excel(fileName, sheet_name=sheetName)
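If the path is pasted straight from Windows Explorer, a raw string is a third option besides forward slashes and doubled backslashes. A small sketch, reusing the example path from the cell above, with `PureWindowsPath` from the standard library doing the conversion:

```python
from pathlib import PureWindowsPath

# A raw string (r"...") keeps each backslash literal, so sequences such as
# "\U" are not interpreted as escapes.
raw_path = r"C:\Users\17204\Downloads\Rocket.xlsx"

# as_posix() rewrites the same path with forward slashes
fileName = PureWindowsPath(raw_path).as_posix()
print(fileName)
```

Both `raw_path` and `fileName` refer to the same file, so either form can be given to `pd.read_excel`.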


### Let's inspect the logs

Most likely we'll just remove any dates that have no useful information filled in, but double-check that nothing weird is going on.

In [141]:
log.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 683 entries, 0 to 682
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Day                   683 non-null    object 
 1   Date                  683 non-null    object 
 2   Weight                103 non-null    float64
 3   Start time            99 non-null     object 
 4   End time              84 non-null     object 
 5   H2O (lab)             164 non-null    float64
 6   H20 (bottle)          54 non-null     float64
 7   H2O (total)           683 non-null    int64  
 8   Required Daily        676 non-null    float64
 9   Avg H2O intake        676 non-null    float64
 10  Required average H2)  676 non-null    float64
 11  Supplementary Treats  66 non-null     object 
 12  Pulse size            41 non-null     object 
 13  Reward                37 non-null     object 
 14  Abort                 20 non-null     float64
 15  Fail                  2

### Remove unneeded fields

The fields for the daily logs are:

| Field | Datatype |
| :-: | :-: |
| **rec_date** | date |
| **monkey_id** | varchar(10) |
| **weight** | int |
| **start_time** | time |
| **end_time** | time |
| **h2o_lab** | int |
| **h2o_home** | int |
| **treats** | varchar(40) |
| **lab_num** | varchar(10) |
| **num_reward** | int |
| **num_abort** | int |
| **num_fail** | int |
| **num_incomplete** | int |
| **behavior_notes** | varchar(1000) |
| **behavior_quality** | enum: 'bad','ok','good' |
| **health_notes** | varchar(1000) |
| **cleaned** | bool/tinyint(1) |
| **other_notes** | varchar(1000) |
| **day_key** | int |
| **experiment** | varchar(1000) |
| **experimentor** | varchar(50) |


Drop any fields that don't correspond to these, then rename the remaining ones to match.

In [142]:
# list of columns to drop. You will need to change these to match the current dataframe columns
dropCols = ['Day', 'H2O (total)', 'Required Daily', 'Avg H2O intake', 'Required average H2)',
            'Pulse size', 'Time doing task']

log.drop(columns = dropCols, inplace=True)

# rename remaining columns to match the database names
# should be a dictionary of {old_name:new_name}
renameCols = {'Date':'rec_date',
             'Weight':'weight',
             'Start time':'start_time',
             'End time':'end_time',
             'H2O (lab)': 'h2o_lab',
             'H20 (bottle)': 'h2o_home',
             'Supplementary Treats':'treats',
             'Lab no.':'lab_num',
             'Reward':'num_reward',
             'Abort':'num_abort',
             'Fail':'num_fail',
             'Incompl':'num_incomplete',
             'Behavioral Notes':'behavior_notes',
             'Health Notes':'health_notes',
             'Cleaned':'cleaned',
             'Other Notes':'other_notes',
             'Person Working':'experimentor'}
log.rename(columns = renameCols, inplace=True)
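After dropping and renaming, it can help to confirm that every remaining column is a real database field before going further. The check below is a sketch: the field names are copied from the field table above, while the helper name `unexpected_columns` is hypothetical.

```python
# Field names copied from the table of daily-log fields above
DB_FIELDS = {
    'rec_date', 'monkey_id', 'weight', 'start_time', 'end_time', 'h2o_lab',
    'h2o_home', 'treats', 'lab_num', 'num_reward', 'num_abort', 'num_fail',
    'num_incomplete', 'behavior_notes', 'behavior_quality', 'health_notes',
    'cleaned', 'other_notes', 'day_key', 'experiment', 'experimentor',
}


def unexpected_columns(columns, db_fields=DB_FIELDS):
    """Return the column names, in order, that are not database fields."""
    return [col for col in columns if col not in db_fields]


# In the notebook: unexpected_columns(log.columns) should return []
```

Anything the function returns is a column that was either missed in `dropCols` or is spelled differently in this sheet than in `renameCols`.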



### Remove invalid days

We don't want entries from days when we didn't record. To that end, we remove any row that is missing weight, start time, and lab H2O all at once. This is a boolean AND: if a row has any one of those three values, we keep it just to be safe.

In [143]:
# drop a row only when all three key fields are missing
keyCols = ['weight', 'start_time', 'h2o_lab']
log.dropna(subset=keyCols, how='all', inplace=True)

log.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 169 entries, 7 to 618
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   rec_date        169 non-null    object 
 1   weight          103 non-null    float64
 2   start_time      99 non-null     object 
 3   end_time        84 non-null     object 
 4   h2o_lab         164 non-null    float64
 5   h2o_home        16 non-null     float64
 6   treats          66 non-null     object 
 7   num_reward      36 non-null     object 
 8   num_abort       20 non-null     float64
 9   num_fail        20 non-null     float64
 10  num_incomplete  20 non-null     float64
 11  lab_num         48 non-null     object 
 12  experimentor    67 non-null     object 
 13  behavior_notes  32 non-null     object 
 14  health_notes    11 non-null     object 
 15  cleaned         11 non-null     object 
 16  other_notes     3 non-null      object 
dtypes: float64(6), object(11)
memory us
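The keep-if-any rule described above can be checked on a small made-up table. This sketch uses `DataFrame.dropna` with `how='all'`, which is one way to express that rule; the toy values are invented for illustration.

```python
import pandas as pd

# Toy data: row 0 is a full recording day, row 1 has only lab water,
# row 2 has none of the three key fields.
toy = pd.DataFrame({
    'weight':     [7.7,     None, None],
    'start_time': ['11:00', None, None],
    'h2o_lab':    [200,     300,  None],
})

# how='all' drops a row only if every column in subset is missing
kept = toy.dropna(subset=['weight', 'start_time', 'h2o_lab'], how='all')
print(kept.index.tolist())
```

Only the row with all three values missing is removed; the water-only row survives, which matches the 164 non-null `h2o_lab` entries among the 169 kept rows above.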

### Change datatypes according to what is needed

Convert each column to the type given in the field table above.

In [144]:
log['rec_date'] = pd.to_datetime(log['rec_date'])
log['rec_date'] # it's good to do some sanity checking to make sure these worked alright

7     2020-02-11
8     2020-02-12
9     2020-02-13
10    2020-02-14
11    2020-02-15
         ...    
610   2021-10-05
611   2021-10-06
613   2021-10-08
617   2021-10-12
618   2021-10-13
Name: rec_date, Length: 169, dtype: datetime64[ns]
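One possible sanity check is to look for entries that fail to parse at all. The sketch below uses made-up sample strings (the actual date format in the sheet is not assumed) and the `errors='coerce'` option of `pd.to_datetime`, which marks failures as `NaT` rather than raising.

```python
import pandas as pd

# Made-up sample; the second entry is deliberately not a date
dates = pd.Series(['2020-02-11', 'not a date', '2020-02-13'])

# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(dates, errors='coerce')

# Show the original text of every entry that failed to parse
bad = dates[parsed.isna()]
print(bad.tolist())
```

Run against the raw `Date` column before conversion, an empty `bad` list suggests every date was understood.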

In [145]:
# add the monkeyID
log['monkey_id'] = ccmID

In [146]:
# cleaning up the 'cleaned' property: 'Yes'/'Y' become True, everything else (including blanks) becomes False
YESs = ['Yes','Y']
log['cleaned'] = log['cleaned'].isin(YESs)
log['cleaned'].value_counts(dropna=False)

False    166
True       3
Name: cleaned, dtype: int64
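The exact-match list only catches `'Yes'` and `'Y'`, so an entry such as `'yes'` or `'Y '` would count as not cleaned. If the sheet might contain such variants (an assumption, not something observed here), a more forgiving test could look like this; `is_yes` is a hypothetical helper name.

```python
def is_yes(value):
    """True for 'Yes'/'Y' in any capitalization, ignoring surrounding spaces."""
    # str() also handles blanks: a missing value becomes 'nan', which is not a yes
    return str(value).strip().lower() in {'yes', 'y'}
```

It could be applied with `log['cleaned'].map(is_yes)` in place of the `isin` call.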

In [147]:
log['h2o_lab'] = log['h2o_lab'].astype(pd.Int64Dtype())
log['h2o_home'] = log['h2o_home'].astype(pd.Int64Dtype())
# log['num_reward'] = log['num_reward'].astype(pd.Int64Dtype())
log['num_abort'] = log['num_abort'].astype(pd.Int64Dtype())
log['num_fail'] = log['num_fail'].astype(pd.Int64Dtype())
log['num_incomplete'] = log['num_incomplete'].astype(pd.Int64Dtype())

In [148]:
# some counts were logged as an estimate ('~75'); store them as plain numbers
log['num_reward'] = log['num_reward'].replace('~75', '75')
log['num_reward'] = pd.to_numeric(log['num_reward']).astype(pd.Int64Dtype())
log['num_reward']

7      <NA>
8       171
9      <NA>
10     <NA>
11     <NA>
       ... 
610      63
611      26
613    <NA>
617     270
618      50
Name: num_reward, Length: 169, dtype: Int64
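Replacing `'~75'` by hand works for this sheet, but other logs may contain other approximate entries. A more general sketch, assuming the first run of digits in a cell is the intended count; the helper name `parse_count` is hypothetical.

```python
import re


def parse_count(value):
    """Return the first whole number found in value, or None if there is none."""
    if value is None:
        return None
    match = re.search(r"\d+", str(value))
    return int(match.group()) if match else None
```

For example, `log['num_reward'].map(parse_count)` would turn `'~75'` into `75` and leave blanks as missing values. Check the results by eye, since a note like `'75 or 80'` would silently become `75`.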


### Export to MySQL database

You will need to create an SSH tunnel first, using:

<code>ssh -N -L 3306:localhost:3306 {Username}@{hostname}</code>
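The connection string in the next cell places the password directly inside a URL, so a password containing characters like `@`, `/`, or `:` would be misread as part of the address. A hedged sketch of one fix, percent-encoding the password with the standard library's `quote_plus`; the credentials shown are invented for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical credentials, for illustration only
userName = "limblab"
password = "p@ss/word:1"
dbName = "staging_db"

# quote_plus percent-encodes characters that would otherwise be read as
# URL delimiters, such as "@", "/" and ":"
safe_password = quote_plus(password)
url = f"mysql+pymysql://{userName}:{safe_password}@127.0.0.1:3306/{dbName}"
print(url)
```

The resulting `url` can be passed to `create_engine` in place of the f-string that embeds the raw password.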

In [151]:
# this is set up using an SSH tunnel
engine = create_engine(f"mysql+pymysql://{userName}:{password}@127.0.0.1:3306/{dbName}")

log.to_sql('days', engine, index=False, if_exists="append")

DataError: (pymysql.err.DataError) (1406, "Data too long for column 'lab_num' at row 118")
[SQL: INSERT INTO days (rec_date, weight, start_time, end_time, h2o_lab, h2o_home, treats, num_reward, num_abort, num_fail, num_incomplete, lab_num, experimentor, behavior_notes, health_notes, cleaned, other_notes, monkey_id) VALUES (%(rec_date)s, %(weight)s, %(start_time)s, %(end_time)s, %(h2o_lab)s, %(h2o_home)s, %(treats)s, %(num_reward)s, %(num_abort)s, %(num_fail)s, %(num_incomplete)s, %(lab_num)s, %(experimentor)s, %(behavior_notes)s, %(health_notes)s, %(cleaned)s, %(other_notes)s, %(monkey_id)s)]
[parameters: ({'rec_date': datetime.datetime(2020, 2, 11, 0, 0), 'weight': 7.7, 'start_time': '11:00', 'end_time': '11:30', 'h2o_lab': 200, 'h2o_home': 0, 'treats': 'Tons of craisins', 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': '6', 'experimentor': 'CV/BS', 'behavior_notes': 'Well behaved. Starting to direct handle to targets', 'health_notes': 'Healthy', 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2020, 2, 12, 0, 0), 'weight': 7.7, 'start_time': '11:20', 'end_time': '12:00', 'h2o_lab': 200, 'h2o_home': 0, 'treats': None, 'num_reward': 171, 'num_abort': 46, 'num_fail': 0, 'num_incomplete': 0, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2020, 2, 13, 0, 0), 'weight': None, 'start_time': None, 'end_time': None, 'h2o_lab': 300, 'h2o_home': None, 'treats': None, 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2020, 2, 14, 0, 0), 'weight': None, 'start_time': None, 'end_time': None, 'h2o_lab': 300, 'h2o_home': None, 'treats': None, 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2020, 2, 15, 0, 0), 'weight': None, 'start_time': None, 'end_time': None, 'h2o_lab': 300, 'h2o_home': None, 'treats': None, 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': 
datetime.datetime(2020, 2, 16, 0, 0), 'weight': None, 'start_time': None, 'end_time': None, 'h2o_lab': 300, 'h2o_home': None, 'treats': None, 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2020, 2, 17, 0, 0), 'weight': None, 'start_time': None, 'end_time': None, 'h2o_lab': 300, 'h2o_home': None, 'treats': None, 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2020, 2, 18, 0, 0), 'weight': 7.6, 'start_time': None, 'end_time': None, 'h2o_lab': 150, 'h2o_home': None, 'treats': None, 'num_reward': None, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': None, 'experimentor': None, 'behavior_notes': None, 'health_notes': None, 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}  ... displaying 10 of 169 total bound parameter sets ...  {'rec_date': datetime.datetime(2021, 10, 12, 0, 0), 'weight': 10.6, 'start_time': '16:30', 'end_time': '18:00', 'h2o_lab': 250, 'h2o_home': 0, 'treats': 'craisins', 'num_reward': 270, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': '1', 'experimentor': 'EA', 'behavior_notes': 'Only did touchpad. He still has the shaking behavior but at least does the task this time. I tried introducing the task to him but he immediately started shaking it and did not stop. When the task was removed, he kept doing the thouchpad.', 'health_notes': 'Head looks dry and clean. 
Vasectomy site has no puss.', 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'}, {'rec_date': datetime.datetime(2021, 10, 13, 0, 0), 'weight': 10.6, 'start_time': '16:00', 'end_time': '17:00', 'h2o_lab': 200, 'h2o_home': 50, 'treats': 'craisins', 'num_reward': 50, 'num_abort': None, 'num_fail': None, 'num_incomplete': None, 'lab_num': '1', 'experimentor': 'EA', 'behavior_notes': 'Did not really want to work in the lab today. Even with the touchpad task he tried to escape instead of trying to do the simple task.', 'health_notes': 'Head looks dry and clean. Vasectomy site has no puss.', 'cleaned': 0, 'other_notes': None, 'monkey_id': '19L1'})]
(Background on this error at: https://sqlalche.me/e/14/9h9h)
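The `DataError` above means at least one `lab_num` value is longer than the column's `varchar(10)` limit. Rather than finding such rows one failed insert at a time, the text columns could be checked against their limits beforehand. This is a sketch: the limits are copied from the field table above, and the helper name `too_long` and the toy data are made up.

```python
import pandas as pd

# varchar limits copied from the field table above
VARCHAR_LIMITS = {
    'monkey_id': 10, 'treats': 40, 'lab_num': 10, 'behavior_notes': 1000,
    'health_notes': 1000, 'other_notes': 1000, 'experiment': 1000,
    'experimentor': 50,
}


def too_long(frame, limits=VARCHAR_LIMITS):
    """Return {column: Series of offending values} for over-length text."""
    problems = {}
    for col, limit in limits.items():
        if col not in frame.columns:
            continue
        # Compare lengths of the non-missing values, as text
        values = frame[col].dropna().astype(str)
        over = values[values.str.len() > limit]
        if not over.empty:
            problems[col] = over
    return problems


# Made-up example: the second lab_num is longer than 10 characters
toy = pd.DataFrame({'lab_num': ['6', 'lab 1 then lab 6', None]})
print(too_long(toy))
```

Calling `too_long(log)` before `log.to_sql(...)` lists the offending values with their index labels, so they can be shortened in the sheet or the column widened in the database.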