## Matching location to mood for the Mood-Map

In [None]:
import pandas as pd

In [None]:
location = pd.read_csv('processed_location_city.csv')
diaries = pd.read_csv('timediaries.csv')

In [None]:
location = location.drop(["Unnamed: 0", "Unnamed: 0.1"], axis=1)
location

In [None]:
diaries = diaries.drop(["Unnamed: 0", "sleep_quality", "pred_day", "daily_mood", "problem", "solution", "yn_food", "food"], axis=1)
diaries

Time-diary data are collected every 30 minutes, whereas location is collected every minute.
Thus, to associate a mood with each location, which is needed to build the mood-map, every location whose timestamp has minutes (the last two digits) between 00 and 29 (both included) is matched to the mood recorded at minute 30 of the same hour.
Locations whose timestamps end with minutes between 30 and 59 (both included) are instead matched to the mood recorded at minute 00 of the following hour.

The same rule is used to attach the "with who", "where", and "what" attributes.
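As a quick check of the rule above, here is a minimal standalone sketch, assuming timestamps are strings formatted `YYYYMMDDHHMM` (the notebook's slicing of the first 8 and last 4 characters suggests this, but does not state it). The helper name `diary_slot` is hypothetical. It maps a location timestamp to the timestamp of the diary entry whose mood it inherits, including the 23:30-23:59 case that crosses midnight.

```python
from datetime import datetime, timedelta


def diary_slot(location_ts: str) -> str:
    """Return the diary timestamp whose mood covers a location timestamp.

    Assumes timestamps are strings formatted 'YYYYMMDDHHMM' (hypothetical format).
    Minutes 00-29 map to HH:30 of the same hour; minutes 30-59 map to
    minute 00 of the following hour, rolling over to the next day if needed.
    """
    t = datetime.strptime(location_ts, '%Y%m%d%H%M')
    if t.minute < 30:
        slot = t.replace(minute=30)
    else:
        slot = t.replace(minute=0) + timedelta(hours=1)
    return slot.strftime('%Y%m%d%H%M')


print(diary_slot('202001011010'))  # 202001011030
print(diary_slot('202001011045'))  # 202001011100
print(diary_slot('202001012359'))  # 202001020000 (next day)
```

Using `datetime` arithmetic instead of string slicing means the hour never becomes "24" and the date advances automatically at midnight.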

In [None]:
diaries.timestamp = diaries.timestamp.astype(str)
location.timestamp = location.timestamp.astype(str)

In [None]:
# drop diary entries with no mood recorded (num_mood is NaN)
diaries.dropna(subset = ["num_mood"], inplace=True)

In [None]:
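# sort by timestamp, then renumber the rows 0..n-1 so that row labels match
# row positions (the lists built below are assigned to new columns by position)
diaries = diaries.sort_values(by=['timestamp']).reset_index(drop=True)
location = location.sort_values(by=['timestamp']).reset_index(drop=True)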

In [None]:
from datetime import datetime, timedelta

mood = []
who = []
where_lst = []
what_lst = []

for row in range(len(location)):
    tl = location.timestamp.iloc[row]
    day, hour, minute = tl[:8], tl[-4:-2], int(tl[-2:])

    if minute >= 30:
        # minutes 30-59 -> diary entry at minute 00 of the next hour;
        # datetime arithmetic rolls 23:30-23:59 over to 00:00 of the next day
        nxt = datetime.strptime(day + hour, '%Y%m%d%H') + timedelta(hours=1)
        day, ending_d = nxt.strftime('%Y%m%d'), nxt.strftime('%H') + '00'
    else:
        # minutes 00-29 -> diary entry at minute 30 of the same hour;
        # the hour slice keeps its leading zero, so 00:10 looks for '0030', not '030'
        ending_d = hour + '30'

    for d in range(len(diaries)):
        td = diaries.timestamp.iloc[d]
        if td.endswith(ending_d) and td.startswith(day):  # same time slot + same day
            mood.append(diaries.num_mood.iloc[d])
            who.append(diaries.with_who.iloc[d])
            where_lst.append(diaries['where'].iloc[d])
            what_lst.append(diaries.what.iloc[d])
            break
    else:
        # no diary entry for this slot (e.g. an expired diary whose mood was NaN
        # and was dropped above), so the location row gets no mood or 4W attributes
        mood.append('NaN')
        who.append('NaN')
        where_lst.append('NaN')
        what_lst.append('NaN')

location['mood'] = mood
location['with_who'] = who
location['what'] = what_lst
location['place'] = where_lst
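The nested loop above compares every location row with every diary row. A possible vectorized alternative, sketched here under the assumption that both `timestamp` columns are strings formatted `YYYYMMDDHHMM`, computes each location row's diary slot with `Series.dt.floor` and then performs a left merge. `match_diaries` is a hypothetical helper, not part of the original notebook, and the toy data are made up; the column names follow the ones used above.

```python
import pandas as pd


def match_diaries(location: pd.DataFrame, diaries: pd.DataFrame) -> pd.DataFrame:
    """Left-join each location row to the diary entry covering its half-hour slot.

    Assumes both 'timestamp' columns are strings formatted 'YYYYMMDDHHMM'.
    """
    t = pd.to_datetime(location['timestamp'], format='%Y%m%d%H%M')
    # minutes 00-29 -> HH:30 of the same hour; minutes 30-59 -> 00 of the next hour
    slot = t.dt.floor('30min') + pd.Timedelta(minutes=30)
    out = location.assign(slot=slot.dt.strftime('%Y%m%d%H%M'))

    # keep only the needed diary columns, renamed to match the notebook's output
    cols = (
        diaries[['timestamp', 'num_mood', 'with_who', 'where', 'what']]
        .rename(columns={'timestamp': 'slot', 'num_mood': 'mood', 'where': 'place'})
        .drop_duplicates(subset='slot')  # first match wins, like the loop's break
    )
    # a left merge keeps every location row, in its original order
    return out.merge(cols, on='slot', how='left').drop(columns='slot')


# toy data (hypothetical values)
loc = pd.DataFrame({'timestamp': ['202001011010', '202001011045', '202001012345']})
dia = pd.DataFrame({
    'timestamp': ['202001011030', '202001011100'],
    'num_mood': [3, 5],
    'with_who': ['alone', 'friends'],
    'where': ['home', 'uni'],
    'what': ['study', 'lunch'],
})
res = match_diaries(loc, dia)
print(res)
```

Unlike the loop, unmatched rows get real NaN values rather than the string 'NaN', which pandas treats as missing without any extra parsing.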

In [None]:
location

## Integrating Bluetooth data

In [None]:
b = pd.read_csv('processed_bluetooth.csv')
b = b.drop(["Unnamed: 0"], axis=1)
b = b.sort_values(by=['timestamp'])
b

Bluetooth data are collected every minute, but this dataframe only contains records for the minutes in which at least one device was found nearby.

Bluetooth records are added to the location dataframe only when the timestamps match exactly; location rows without a matching record get 'NaN'.
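The exact-match rule can be sketched as a single dictionary-style lookup instead of a nested loop. This is a minimal example on made-up data: `loc_demo` and `bt_demo` are hypothetical stand-ins for `location` and `b`, and the toy `devices` values are invented, since the real column holds whatever `processed_bluetooth.csv` stores.

```python
import pandas as pd

# toy stand-ins for the location and processed Bluetooth frames (hypothetical values)
loc_demo = pd.DataFrame({'timestamp': ['202001011000', '202001011001', '202001011002']})
bt_demo = pd.DataFrame({'timestamp': ['202001011001'], 'devices': [2]})

# timestamp -> devices lookup; Series.map needs unique index values,
# so duplicated timestamps are dropped first (keeping the first one)
devices_by_ts = bt_demo.drop_duplicates(subset='timestamp').set_index('timestamp')['devices']

# minutes without a Bluetooth record are absent from the lookup and become NaN
loc_demo['bluetooth'] = loc_demo['timestamp'].map(devices_by_ts)
print(loc_demo)
```

Because `map` walks `loc_demo['timestamp']` in row order, the new column stays aligned with the location rows regardless of how either frame is indexed.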

In [None]:
b.timestamp = b.timestamp.astype(str)

In [None]:
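bluetooth = []
for row in range(len(location)):
    tl = location.timestamp.iloc[row]
    for row2 in range(len(b)):
        if b.timestamp.iloc[row2] == tl:  # exact timestamp match
            bluetooth.append(b.devices.iloc[row2])
            break  # keep exactly one value per location row
    else:
        # no Bluetooth record for that minute (no device found nearby)
        bluetooth.append('NaN')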

In [None]:
location['bluetooth'] = bluetooth

In [None]:
location

## Save new dataset

In [None]:
# save
# location.to_csv('location_4Ws_bluetooth.csv')

In [None]:
# to save this file, rerun the notebook with the diaries.dropna() line commented out
# location.to_csv('full_match.csv')


# This file contains many missing values, so it is not suited for the BBN or the mood-map,
# but it is useful for showing the full picture without missing locations.
# We considered substituting the missing mood values with the mood reported at the end of the day,
# when users evaluate their day. However, that mood is most likely either influenced by the day's
# latest event or constant throughout the whole day: in the first case the substitution would
# distort the data, and in the second it would make no difference.