# Joe Zoll
# 30 Days of Meditation - Naive Bayes Classifier (Multinomial)
## Intention: Classify meditation instances as > or < 20 minutes based on the frequency of certain types of meditation (i.e. Mindfulness of breathing, Metta, Body scanning, etc.)

In [476]:
import pandas as pd

In [477]:
df = pd.read_csv('data/meditation-sit-log.csv')
meditations = df.copy()

In [478]:
df

Unnamed: 0,Name,Date & Time ⏰,Tags,Length (Minutes),Guided
0,Metta Return 1/7,"October 4, 2022 11:02 AM","Body / Grounding Awareness, Metta",40.0,
1,Quick in car,"October 3, 2022 12:13 PM","Body / Grounding Awareness, Mindfulness of Bre...",5.0,
2,What do I need to change for this semester?,"October 2, 2022 9:47 PM",Contemplation,30.0,
3,Cloudy,"October 2, 2022 3:44 PM",,11.0,
4,Calm cleaning,"October 1, 2022 2:17 PM",Body / Grounding Awareness,17.0,
...,...,...,...,...,...
115,First SP Metta Session | Opening up... somewhe...,"July 8, 2021 2:32 PM","MIDL 03/52, MIDL Metta Loved One",47.0,
116,Sitting With Pain,"July 7, 2021 3:58 PM","Doing Nothing, Stillness",51.0,
117,More Courageous Meditation. Not mine,"July 1, 2021 3:34 PM","MIDL 03/52, MIDL Forgiveness",45.0,
118,The Bravest Meditation Of My Life,"June 29, 2021 3:04 PM","MIDL 03/52, MIDL Forgiveness",65.0,


# Data Cleaning
- Only take meditation sits from September 6, 2022 => October 6, 2022
- remove Date & Time
- remove Name
- remove Guided

- Rename Length (Minutes) => >20 minutes and set values to boolean
- Rename Tags => Practice
- fill nan values

In [479]:
meditations = meditations.loc[:22]

In [480]:
meditations = meditations.drop(['Name', 'Date & Time ⏰', 'Guided'], axis=1)

In [481]:
meditations = meditations.rename(columns={'Length (Minutes)': '+20min'})

In [482]:
meditations = meditations.rename(columns={'Tags': 'practice'})

In [483]:
meditations['+20min'] = meditations['+20min'] >= 20

In [484]:
meditations.head(3)

Unnamed: 0,practice,+20min
0,"Body / Grounding Awareness, Metta",True
1,"Body / Grounding Awareness, Mindfulness of Bre...",False
2,Contemplation,True


practice => dict of counts for all 6 meditation practices => mapped to columns in meditations df

In [485]:
def countPracticeTypes(practiceStr, count = defaultCount):
    currPractices = practiceStr.split(',')
    for practice in currPractices:
        if practice in count:
            count[practice] += 1
        else:
            count[practice] = 1
    return count

In [486]:
def cleanPracticeStr(practiceStr):
    return practiceStr.split(',')[0]

In [487]:
meditations['practice'] = meditations['practice'].fillna(value='Body / Grounding Awareness')

# Frequencies => Features
- Replace all null values in practice
- Clean practice string values (limit to 1 type of practice per instance)
- For each practice, count the frequencies of each time it appears in the dataset and make it's count into its own column

In [490]:
practices = set(meditations['practice'].apply(cleanPracticeStr).unique())
practices.add('Metta')
practices

{'Body / Grounding Awareness',
 'Contemplation',
 'Doing Nothing',
 'Metta',
 'Mindfulness of Breathing',
 'Mindfulness of Fingers Touching',
 'Stillness'}

In [492]:
# I want to add new columns to the DF, all with a value == 0
meditations[list(practices)] = 0
meditations.head(1)

Unnamed: 0,practice,+20min,Metta,Body / Grounding Awareness,Mindfulness of Fingers Touching,Doing Nothing,Mindfulness of Breathing,Contemplation,Stillness
0,"Body / Grounding Awareness, Metta",True,0,0,0,0,0,0,0


In [493]:
# for each row
#    split practiceStr
#    for each str in split:
#        increment corresponding column
for index, row in meditations.iterrows():
    practiceStr = row['practice']
    currPractices = practiceStr.split(',')
    for p in currPractices:
        p = p.strip()
        if p in practices:
            meditations[p][index] += 1

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  meditations[p][index] += 1


In [494]:
meditations.insert(len(meditations.columns)-1, '+20min', meditations.pop('+20min'))

In [495]:
meditations.head(7)

Unnamed: 0,practice,Metta,Body / Grounding Awareness,Mindfulness of Fingers Touching,Doing Nothing,Mindfulness of Breathing,Contemplation,Stillness,+20min
0,"Body / Grounding Awareness, Metta",1,1,0,0,0,0,0,True
1,"Body / Grounding Awareness, Mindfulness of Bre...",0,1,0,0,1,0,0,False
2,Contemplation,0,0,0,0,0,1,0,True
3,Body / Grounding Awareness,0,1,0,0,0,0,0,False
4,Body / Grounding Awareness,0,1,0,0,0,0,0,False
5,Body / Grounding Awareness,0,1,0,0,0,0,0,False
6,Mindfulness of Breathing,0,0,0,0,1,0,0,True


# So now that I have the frequency columns for every type of meditation IN every sit I have done for the past 30 days, I now can proceed with the Naive Bayes Classifier, construcuting it, and then inputting some instance to check and see if it worked.

# NEXT TIME, we do the math :D

### Notes
- This is not a great model for NB classifier, as there can never be a sit with a practice type that occurs 2 times
- What is the best classifier for when features are True / False?

# Joe's Meditation Classifier App:
### "What type of meditation did you practice today?" ____ "You practice +/- 20 minutes during that sit!"

In [113]:
practice = input("What type of meditation did you practice today?: ")

What type of meditation did you practice today?: 3


plug into classifier and spit out an answer, over or under 20 minutes

# Other

In [410]:
x = countPracticeTypes(meditations['practice'][0])
x
df_count = pd.DataFrame(x, index=['i',])
#meditations['practice'][:1].apply(countPracticeTypes)

Unnamed: 0,Body / Grounding Awareness,Contemplation,Mindfulness of Breathing,Stillness,Doing Nothing,Mindfulness of Fingers Touching,Metta
i,1,0,0,0,0,0,1
