# Daily Log to SQL

This notebook cleans up data from the daily logs and inserts it into the Limblab MySQL database. To use it, you will need the database passwords, and you must either be connected to the VPN or be running this remotely on Shrek or Donkey.

In [63]:
import pandas as pd
import numpy as np
from sqlalchemy import engine

dbName = "staging_db"
userName = "LL"
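A minimal sketch of how `dbName` and `userName` might later be combined into a connection URL. The host name, password, port, and the `mysql+pymysql` driver are all placeholder assumptions, not values from this notebook, and `build_db_url` is a hypothetical helper. Escaping the credentials matters because characters such as `@` or `/` in a password would otherwise break the URL.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, db_name, driver="mysql+pymysql", port=3306):
    """Assemble a SQLAlchemy-style connection URL, escaping special characters.

    All arguments here are illustrative; substitute the real lab settings.
    """
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db_name}"


# Hypothetical host and password, for illustration only
url = build_db_url("LL", "p@ss/word", "db.example.org", "staging_db")
print(url)

# To actually connect (requires sqlalchemy and a MySQL driver such as pymysql):
# from sqlalchemy import create_engine
# db_engine = create_engine(url)
```

In practice the password would be read interactively (for example with `getpass.getpass()`) rather than typed into the notebook.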

### Make sure to update the filename and monkey name below:

Run either the Google Sheet cell **or** the Excel cell below, not both.

In [64]:
monkeyName = "Rocket"
ccmID = "19L1"

#### For a Google Sheet

In [65]:
# Using a google sheet
sheetName = "DailyLog"
# file_id is the portion of the URL between "/d/" and the next "/"
file_id = "1ICGCMKkMShzQpq1FKOBGjxaMFoDY9_mtydJxAm6Bv3U"
googleURL = f"https://docs.google.com/spreadsheets/d/{file_id}/export?gid=0&format=csv&sheet={sheetName}"

print(googleURL)

log = pd.read_csv(googleURL)

https://docs.google.com/spreadsheets/d/1ICGCMKkMShzQpq1FKOBGjxaMFoDY9_mtydJxAm6Bv3U/export?gid=0&format=csv&sheet=DailyLog
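Rather than copying the ID by hand, it could be pulled out of the sheet's address with a regular expression. This is a sketch only: `sheet_file_id` is a hypothetical helper, and the `/edit#gid=0` suffix on the example link is an assumption about what a typical browser URL looks like.

```python
import re


def sheet_file_id(share_url):
    """Return the ID that sits between '/d/' and the next '/' in a Google Sheets URL."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"no spreadsheet ID found in {share_url!r}")
    return match.group(1)


# Example browser URL (the suffix after the ID is assumed for illustration)
share_url = (
    "https://docs.google.com/spreadsheets/d/"
    "1ICGCMKkMShzQpq1FKOBGjxaMFoDY9_mtydJxAm6Bv3U/edit#gid=0"
)
file_id = sheet_file_id(share_url)
print(file_id)
```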


#### For an excel file

You can use forward slashes even on Windows. Otherwise, replace every backslash with "\\", because Python treats a single "\" as the start of an escape sequence.

In [66]:
# Using an excel file
sheetName = "DailyLog"
fileName = "C:/Users/17204/Downloads/Rocket.xlsx" 
log = pd.read_excel(fileName,sheet_name=sheetName)
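To illustrate the note about backslashes, here is a small sketch comparing three ways of writing the same Windows path. The raw-string form (`r"..."`) is an addition not used in the cell above. Writing the path with single backslashes in an ordinary string would fail here, since `\U` begins a Unicode escape.

```python
from pathlib import PureWindowsPath

forward = "C:/Users/17204/Downloads/Rocket.xlsx"          # forward slashes
escaped = "C:\\Users\\17204\\Downloads\\Rocket.xlsx"      # doubled backslashes
raw = r"C:\Users\17204\Downloads\Rocket.xlsx"             # raw string, no escaping

# The escaped and raw spellings are the identical string
same_string = escaped == raw

# Windows treats both separators alike, so all three name the same file
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)
print(same_string, same_path)
```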


### Let's inspect the logs

Most likely we'll just remove the dates that have no useful information filled in, but double-check that nothing odd is going on first, such as misspelled column names or unexpected dtypes.

In [67]:
log.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 683 entries, 0 to 682
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Day                   683 non-null    object        
 1   Date                  683 non-null    datetime64[ns]
 2   Weight                103 non-null    float64       
 3   Start time            99 non-null     object        
 4   End time              84 non-null     object        
 5   H2O (lab)             164 non-null    float64       
 6   H20 (bottle)          54 non-null     float64       
 7   H2O (total)           683 non-null    int64         
 8   Required Daily        676 non-null    float64       
 9   Avg H2O intake        676 non-null    float64       
 10  Required average H2)  676 non-null    float64       
 11  Supplementary Treats  66 non-null     object        
 12  Pulse size            41 non-null     object        
 13  Reward              
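Beyond `log.info()`, two quick checks can help spot oddities before cleaning. The sketch below uses a small made-up dataframe (not the real log) to count how many fields are filled in per row and to flag dates that appear more than once.

```python
import pandas as pd

# Toy stand-in for the daily log; values are invented for illustration
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-03"]),
    "Weight": [10.2, None, None, 10.4],
    "Start time": ["09:00", None, None, None],
    "H2O (lab)": [150.0, None, 120.0, None],
})

# Number of non-null fields per row, ignoring the always-present Date column
filled_per_row = toy.drop(columns=["Date"]).notna().sum(axis=1)

# Rows with nothing filled in at all
empty_rows = toy[filled_per_row == 0]

# Every row whose date occurs more than once
duplicate_dates = toy[toy["Date"].duplicated(keep=False)]

print(filled_per_row.tolist())
print(list(empty_rows.index), list(duplicate_dates.index))
```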

### Remove unneeded fields

The fields for the daily logs are:

| Field | Datatype |
| :-: | :-: |
| **rec_date** | date |
| **monkey_id** | varchar(10) |
| **weight** | int |
| **start_time** | time |
| **end_time** | time |
| **h2o_lab** | int |
| **h2o_home** | int |
| **treats** | varchar(40) |
| **lab_num** | varchar(10) |
| **num_reward** | int |
| **num_abort** | int |
| **num_fail** | int |
| **num_incomplete** | int |
| **behavior_notes** | varchar(1000) |
| **behavior_quality** | enum: 'bad','ok','good' |
| **health_notes** | varchar(1000) |
| **cleaned** | bool/tinyint(1) |
| **other_notes** | varchar(1000) |
| **day_key** | int |
| **experiment** | varchar(1000) |
| **experimentor** | varchar(50) |
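One way to check a dataframe against the field list above is to compare column names directly. This is a sketch, with `compare_columns` as a hypothetical helper; fields such as `monkey_id` that are not columns in the spreadsheet will show up as missing until they are added.

```python
# Field names copied from the table above
DAILY_LOG_FIELDS = [
    "rec_date", "monkey_id", "weight", "start_time", "end_time",
    "h2o_lab", "h2o_home", "treats", "lab_num", "num_reward",
    "num_abort", "num_fail", "num_incomplete", "behavior_notes",
    "behavior_quality", "health_notes", "cleaned", "other_notes",
    "day_key", "experiment", "experimentor",
]


def compare_columns(columns, expected=DAILY_LOG_FIELDS):
    """Return (missing, unexpected) column names relative to the expected fields."""
    columns = list(columns)
    missing = [field for field in expected if field not in columns]
    unexpected = [col for col in columns if col not in expected]
    return missing, unexpected


# Example with a made-up column list; with the real data, pass log.columns
missing, unexpected = compare_columns(["rec_date", "weight", "Pulse size"])
print(len(missing), unexpected)
```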


Drop any fields that don't align with these, then rename the remaining ones to match the database.

In [97]:
# List of columns to drop. You will need to change these to match the current dataframe columns.
# Note: drop() raises a KeyError if any listed column is missing, e.g. when this cell is re-run.
dropCols = ['Day', 'H2O (total)', 'Required Daily', 'Avg H2O intake', 'Required average H2)',
              'Pulse size', 'Time doing task']

log.drop(columns = dropCols, inplace=True)

# rename remaining columns to match the database names
# should be a dictionary of {old_name:new_name}
renameCols = {'Date':'date',
             'Weight':'weight',
             'Start time':'start_time',
             'End time':'end_time',
             'H2O (lab)': 'h2o_lab',
             'H20 (bottle)': 'h2o_home',
             'Supplementary Treats':'treats',
             'Lab no.':'lab_num',
             'Reward':'num_reward',
             'Abort':'num_abort',
             'Fail':'num_fail',
             'Incompl':'num_incomplete',
             'Behavioral Notes':'behavior_notes',
             'Health Notes':'health_notes',
             'Cleaned':'cleaned',
             'Other Notes':'other_notes',
             'Person Working':'experimentor'}
log.rename(columns = renameCols, inplace=True)



log.info()

KeyError: "['Day' 'H2O (total)' 'Required Daily' 'Avg H2O intake'\n 'Required average H2)' 'Pulse size' 'Pulse size' 'Time doing task'] not found in axis"
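The `KeyError` above is what pandas raises when `drop` is asked to remove columns that are not in the dataframe, which most likely means the cell was run a second time after the columns were already gone. A sketch of a re-runnable variant, shown on a made-up dataframe, passes `errors="ignore"` to `drop`; `rename` already skips keys it cannot find. The `tidy` helper is hypothetical.

```python
import pandas as pd

# Toy stand-in for the log; values are invented for illustration
toy = pd.DataFrame({"Day": ["Mon"], "Weight": [10.2], "Pulse size": ["0.3"]})

drop_cols = ["Day", "Pulse size", "Time doing task"]  # last one is absent on purpose
rename_cols = {"Weight": "weight", "Lab no.": "lab_num"}


def tidy(frame):
    """Drop and rename columns without failing on names that are not present."""
    return frame.drop(columns=drop_cols, errors="ignore").rename(columns=rename_cols)


once = tidy(toy)
twice = tidy(once)  # running it again changes nothing and raises no error
print(list(once.columns), once.equals(twice))
```

The trade-off is that `errors="ignore"` also hides typos in `drop_cols`, so the strict default is still useful on the first run.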

### Remove invalid days

We don't want entries from days where we didn't record. To that end, we remove a row only when weight, start time, and lab H2O are all missing. If any one of the three is present, we keep the row just to be safe.
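The rule can be sketched on a small made-up dataframe. The index deliberately starts at 7 rather than 0 to show that the selection works on index labels, not row positions. `dropna` with `how="all"` and a `subset` is an equivalent one-liner.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the log; values are invented for illustration
toy = pd.DataFrame(
    {
        "weight": [10.2, np.nan, np.nan, np.nan],
        "start_time": ["09:00", None, None, None],
        "h2o_lab": [150.0, np.nan, 120.0, np.nan],
    },
    index=[7, 8, 9, 10],
)

key_cols = ["weight", "start_time", "h2o_lab"]

# True only where all three key fields are missing
all_missing = toy[key_cols].isnull().all(axis=1)
kept = toy[~all_missing]

# Equivalent built-in: drop rows where every column in the subset is NaN
same = toy.dropna(subset=key_cols, how="all")

print(list(kept.index), kept.equals(same))
```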

In [138]:
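# Boolean mask: True where weight, start_time, and h2o_lab are all missing
allMissing = log[['weight', 'start_time', 'h2o_lab']].isnull().all(axis=1)
# Select by index label (not position) so this also works on an already-filtered dataframe
dropRows = log.index[allMissing]

log.drop(index = dropRows, inplace=True)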

log.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 169 entries, 7 to 618
Data columns (total 17 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   date            169 non-null    datetime64[ns]
 1   weight          103 non-null    float64       
 2   start_time      99 non-null     object        
 3   end_time        84 non-null     object        
 4   h2o_lab         164 non-null    float64       
 5   h2o_home        16 non-null     float64       
 6   treats          66 non-null     object        
 7   num_reward      36 non-null     object        
 8   num_abort       20 non-null     float64       
 9   num_fail        20 non-null     float64       
 10  num_incomplete  20 non-null     float64       
 11  lab_num         48 non-null     object        
 12  experimentor    67 non-null     object        
 13  behavior_notes  32 non-null     object        
 14  health_notes    11 non-null     object        
 15  cleane