<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 3: NLP on Intermittent Fasting and Keto Diet

---

# Part 1

# Problem Statement

We work for a nutrition and weight-loss company and are helping stakeholders identify which current trends to use when promoting weight-loss strategies. The study has two goals:
1.	Use Pushshift's API to collect posts from the intermittent fasting and keto diet subreddits.
2.	Use NLP to train a classifier that predicts which subreddit a given post came from.


# Background

Weight loss is a current trend: it not only helps people maintain a good figure but is also associated with health benefits. Intermittent fasting and the keto diet are both popular weight loss options that boast plenty of success stories. Some people have seen fantastic results with keto, while others advocate intermittent fasting.

The keto diet is a high-fat, low-carbohydrate diet with moderate protein that can be too restrictive for some people. Intermittent fasting is an eating pattern that alternates periods of not eating with periods of regular food consumption.

When it comes to intermittent fasting vs keto for weight loss, both prove effective. They differ, however, in the long-term effects on weight lost and in the health advantages associated with each.


[Read the review article on the advantages and disadvantages of the ketogenic diet.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480775/)

[Read the study on intermittent fasting and metabolic health.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8839325/)

## Contents

Part 1: 
- Importing Libraries & Reddit Scraping

Part 2:
- Loading the data set & Exploratory Data Analysis (EDA)
- EDA before Text pre-processing
- Text pre-processing
- EDA after Text pre-processing
- Sentiment Analysis
- Export Dataset for Modeling

Part 3: 
- Baseline Model
- Modelling
- Evaluation and Conceptual Understanding
- Conclusion and Recommendations

# Importing Libraries & Reddit Scraping

In [1]:
#pip install pmaw pandas
import pandas as pd
from pmaw import PushshiftAPI
api = PushshiftAPI()
# this setting widens how many characters pandas will display in a column:
pd.options.display.max_colwidth = 400
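The `max_colwidth` setting above applies to the whole session. As a small sketch that is not part of the original notebook, `pd.option_context` applies the same option only inside a `with` block, which may be handy when a single preview needs wider text. The sample Series is hypothetical.

```python
import pandas as pd

# Width in effect before the temporary override
default_width = pd.get_option("display.max_colwidth")

# Hypothetical long text, standing in for a post body
long_text = pd.Series(["intermittent fasting " * 10])

# The wider setting is active only inside this block
with pd.option_context("display.max_colwidth", 400):
    inside_width = pd.get_option("display.max_colwidth")
    print(long_text)

# After the block, the previous width is restored automatically
restored_width = pd.get_option("display.max_colwidth")
```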

In [2]:
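# search_submissions returns submissions (posts), not comments, despite the variable name
# after=1609459200 is 2021-01-01 00:00:00 UTC
comments = api.search_submissions(subreddit='intermittentfasting', limit=10000, after=1609459200)
print(f'Retrieved {len(comments)} comments from Pushshift')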

Retrieved 10000 comments from Pushshift
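The `after=1609459200` argument in the search call is a Unix timestamp in seconds. The sketch below uses only the standard library to convert it in both directions, so the cut-off date can be checked or changed. The helper names `epoch_to_utc` and `utc_to_epoch` are hypothetical and not part of pmaw.

```python
from datetime import datetime, timezone

# Epoch value passed as `after` in the search call
AFTER_EPOCH = 1609459200

def epoch_to_utc(ts: int) -> datetime:
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

def utc_to_epoch(year: int, month: int, day: int) -> int:
    """Return the Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

print(epoch_to_utc(AFTER_EPOCH).isoformat())  # 2021-01-01T00:00:00+00:00
```

The same conversion applies to the `created_utc` column kept later in the notebook.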


In [3]:
df = pd.DataFrame(comments)
# preview the submissions data
df.head()

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,...,author_flair_text_color,author_is_blocked,media,media_embed,secure_media,secure_media_embed,author_flair_background_color,banned_by,edited,gilded
0,[],False,mayday_dnaenae,,[],,text,t2_2uv8y28u,False,False,...,,,,,,,,,,
1,[],False,cristiano77th,,[],,text,t2_64exqslw,False,False,...,,,,,,,,,,
2,[],False,Funny-Record-6039,,[],,text,t2_c1cw13mx,False,False,...,,,,,,,,,,
3,[],False,AnxiousHeadOfLettuce,,[],,text,t2_bfomj07b,False,False,...,,,,,,,,,,
4,[],False,valkaress,,[],,text,t2_4ckw169q,False,False,...,,,,,,,,,,


In [4]:
df['selftext'].head()

0    I (27F) have been intermittent fasting for about 2 months. My starting weight was about 145lbs. The first week I did 14:8 and I lost 5 lbs. I didn't change any other habits. I hit a plateau partway through the 2nd week. I've been stuck around 135 lbs. I increased my IF to 18:6 and amped up my workout plan by adding longer, more intense cardio sessions and mixing in some strength training. It's...
1                                                                                                                                                                                                                                                                                                                                  I am healthy and i walk alot in a day. Should i stop walking while water fasting?
2                                                                                                                                                                                             

In [5]:
df['selftext'].isnull().sum()

19

In [6]:
df['text_length'] = df['selftext'].str.len()
df['text_length'].head()

0    660.0
1     81.0
2      0.0
3     70.0
4    679.0
Name: text_length, dtype: float64
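The lengths print as floats because `.str.len()` returns `NaN` for missing bodies. The sketch below reproduces this on a hypothetical three-row frame and shows one optional alternative: filling missing bodies with an empty string first. The notebook itself does not do this; it is only an illustration.

```python
import pandas as pd

# Hypothetical toy data: a normal body, an empty body, and a missing body
toy = pd.DataFrame({"selftext": ["a short post", "", None]})

# Same expression as in the notebook: a missing body yields NaN
toy["text_length"] = toy["selftext"].str.len()

# Optional alternative: treat a missing body as an empty string
safe_length = toy["selftext"].fillna("").str.len()

# NaN never compares greater than a number, so missing bodies fail a "> 100" filter
passes_filter = toy["text_length"] > 100

print(toy["text_length"].isna().tolist())
print(safe_length.tolist())
```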

In [7]:
(df['text_length']>100).value_counts()

False    5312
True     4688
Name: text_length, dtype: int64

In [8]:
df[df['text_length']>100]['selftext']

0       I (27F) have been intermittent fasting for about 2 months. My starting weight was about 145lbs. The first week I did 14:8 and I lost 5 lbs. I didn't change any other habits. I hit a plateau partway through the 2nd week. I've been stuck around 135 lbs. I increased my IF to 18:6 and amped up my workout plan by adding longer, more intense cardio sessions and mixing in some strength training. It's...
4       I started lifting 4x/week (about 40 minutes), as well as taking a lot of martial arts classes. On monday for example I had nearly 3 hours of exercising in total (2 MA + lift).\n\nI've only just started, and so far I've been sticking to my IF/OMAD routine where I skip breakfast and only eat lunch. Which means I was on a long fast for some of those exercises. And I still felt fine.\n\nSo my ques...
5       Hello everyone,\n\n  I am a mostly lurker here in reddit. First I'd like to say that I started my IF journey almost a year ago, and saw incredible results, It is the first time

In [9]:
# keep posts whose body is longer than 100 characters and show only the four columns we need
display(df.loc[(df['text_length']>100), ['title','selftext','subreddit','created_utc']].head())

Unnamed: 0,title,selftext,subreddit,created_utc
0,Plateau sruggles,"I (27F) have been intermittent fasting for about 2 months. My starting weight was about 145lbs. The first week I did 14:8 and I lost 5 lbs. I didn't change any other habits. I hit a plateau partway through the 2nd week. I've been stuck around 135 lbs. I increased my IF to 18:6 and amped up my workout plan by adding longer, more intense cardio sessions and mixing in some strength training. It's...",intermittentfasting,1625589667
4,Can I still do IF/OMAD now that I started exercising intensely?,"I started lifting 4x/week (about 40 minutes), as well as taking a lot of martial arts classes. On monday for example I had nearly 3 hours of exercising in total (2 MA + lift).\n\nI've only just started, and so far I've been sticking to my IF/OMAD routine where I skip breakfast and only eat lunch. Which means I was on a long fast for some of those exercises. And I still felt fine.\n\nSo my ques...",intermittentfasting,1625586042
5,A new mindset,"Hello everyone,\n\n I am a mostly lurker here in reddit. First I'd like to say that I started my IF journey almost a year ago, and saw incredible results, It is the first time I am able to, without external influence (ie. injury, diet imposition by SO, etc.) to lose weight in a consistent fashion.\n\n Ok, on to the topic. I saw a fat person waking by and surprised myself by thinking somethin...",intermittentfasting,1625584307
6,Weekend habits are making it difficult to loose weight.,"Hi everyone,\n\nI have been doing IF (16:8) for almost 3 years. I have remained consistent in terms of weight (138-141lbs) in these 3 years. I am 5' 5.5"". Recently I started 20:4 to push my metabolism a little. I lost my weight from 141-140 in about a week ish. I have strated to eat low-carb diet to help me with fasting. I am fairly good on the weekdays. However I want to enjoy my weekend and...",intermittentfasting,1625582039
7,Are these times acceptable for IF?,"So, due to loss of employment, family has taken me in. Not to be a bother I am trying to use as little as I can before getting back on my feet. My brother makes lunch at 1 PM and the family eats together at 8 PM. So to my understanding of how Intermittent fasting works is, if I wake up at 9 AM, eat first meal at 1 PM and the second meal no longer than 8 hours after the first meal, this will me...",intermittentfasting,1625582007


In [10]:
fasting = df.loc[(df['text_length']>100), ['title','selftext','subreddit','created_utc']]

In [11]:
fasting.isnull().sum()

title          0
selftext       0
subreddit      0
created_utc    0
dtype: int64
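The length filter and column selection above are repeated later for the keto subreddit. As a sketch, the same steps can be wrapped in one function and reused. The names `keep_long_posts` and `KEEP_COLS` are hypothetical, and the toy frame stands in for the scraped data.

```python
import pandas as pd

# Columns kept in the notebook for Part 2
KEEP_COLS = ["title", "selftext", "subreddit", "created_utc"]

def keep_long_posts(raw: pd.DataFrame, min_len: int = 100) -> pd.DataFrame:
    """Return rows whose selftext is longer than min_len characters, with KEEP_COLS only."""
    lengths = raw["selftext"].str.len()   # NaN for missing bodies
    mask = lengths > min_len              # NaN compares as False, so those rows are dropped
    return raw.loc[mask, KEEP_COLS].copy()

# Hypothetical toy frame with one long, one short, and one missing body
toy = pd.DataFrame({
    "title": ["long", "short", "missing"],
    "selftext": ["x" * 150, "too short", None],
    "subreddit": ["intermittentfasting"] * 3,
    "created_utc": [1625589667, 1625586042, 1625584307],
    "author": ["a", "b", "c"],
})

filtered = keep_long_posts(toy)
print(filtered.shape)
```

Because rows with a missing body never pass the mask, the result has no nulls in `selftext`, which matches the `isnull().sum()` check above.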

In [13]:
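# search_submissions returns submissions (posts), not comments, despite the variable name
# after=1609459200 is 2021-01-01 00:00:00 UTC
comments_2 = api.search_submissions(subreddit='keto', limit=10000, after=1609459200)
print(f'Retrieved {len(comments_2)} comments from Pushshift')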

Retrieved 10000 comments from Pushshift


In [14]:
df_2 = pd.DataFrame(comments_2)
# preview the submissions data
df_2.head()

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,...,url_overridden_by_dest,gilded,is_created_from_ads_ui,author_cakeday,author_is_blocked,poll_data,live_audio,thumbnail_height,thumbnail_width,distinguished
0,[],False,__scruffycat__,,[],,text,t2_kjj28,False,False,...,,,,,,,,,,
1,[],False,Smart-Refrigerator37,,[],,text,t2_9kisv4ns,False,False,...,,,,,,,,,,
2,[],False,[deleted],,,,,,,,...,,,,,,,,,,
3,[],False,[deleted],,,,,,,,...,,,,,,,,,,
4,[],False,iAm4uJL,,[],,text,t2_9qxu0mc6,False,False,...,,,,,,,,,,


In [15]:
df_2['text_length'] = df_2['selftext'].str.len()
df_2['text_length'].head()

0    247.0
1    358.0
2      NaN
3      9.0
4      9.0
Name: text_length, dtype: float64
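Several rows above have a length of exactly 9.0. One plausible explanation, which is an assumption and not verified against the scraped data, is that these bodies are Reddit placeholders such as `[removed]` or `[deleted]`, each of which is 9 characters long. The sketch below uses hypothetical data to show that a 100-character threshold filters out such placeholders along with missing bodies.

```python
import pandas as pd

# Assumed placeholder strings; both are 9 characters long
PLACEHOLDERS = {"[removed]", "[deleted]"}

# Hypothetical bodies: two placeholders, one missing value, one real post
bodies = pd.Series(["[removed]", "[deleted]", None, "x" * 120])

lengths = bodies.str.len()
is_placeholder = bodies.isin(PLACEHOLDERS)

# Same threshold as the notebook: only bodies longer than 100 characters survive
survivors = bodies[lengths > 100]

print(lengths.tolist())
print(int(is_placeholder.sum()))
```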

In [16]:
(df_2['text_length']>100).value_counts()

True     5602
False    4398
Name: text_length, dtype: int64

In [17]:
df_2[df_2['text_length']>100]['selftext']

0                                                                                                                                                               I know that it conjunction with keto we need to be in a calorie deficit to actually lose weight. My question was when you calculate your macros, you can choose to go lower than the recommendation. Is there a better % of low in anyone’s experience.
1                                            What your guys favorite keto-friendly noodles? I’m new to keto and looking to try some new things! There are just so many different brands / options, idk where to start. \n\nI love pasta (especially mac n cheese) and would love to find something to substitute the regular noodles.   \n\nAlso, feel free to share your favorite dish that you make with your noodles
5                                                                                                                                                                                   The 

In [18]:
# keep posts whose body is longer than 100 characters and show only the four columns we need
display(df_2.loc[(df_2['text_length']>100), ['title','selftext','subreddit','created_utc']].head())

Unnamed: 0,title,selftext,subreddit,created_utc
0,Deficit calories - how much is recommended,"I know that it conjunction with keto we need to be in a calorie deficit to actually lose weight. My question was when you calculate your macros, you can choose to go lower than the recommendation. Is there a better % of low in anyone’s experience.",keto,1614835856
1,Share your favorite Keto-Friendly noodles/pasta!,"What your guys favorite keto-friendly noodles? I’m new to keto and looking to try some new things! There are just so many different brands / options, idk where to start. \n\nI love pasta (especially mac n cheese) and would love to find something to substitute the regular noodles. \n\nAlso, feel free to share your favorite dish that you make with your noodles",keto,1614834758
5,Is there anything wrong with having popcorn?,"The Act II butter lovers box says a cup popped is 3 carbs. The Cronometer app says it’s 3.4, and If I have 14 carbs left today, I could eat 2 cups for 6.7. Is there something wrong with this, or am I in luck as a popcorn lover?",keto,1614832038
7,Having a horrible face fat issue,"When I was in my early teens I gained an extreme amount of weight, and had an extremely fat face, and only recently I have dropped back down to 145lbs. I am a 5 9 male and I think this is extremely skinny for my height, but my face is still extremely fat around my cheeks. I’ve been extremely depressed lately and I don’t even want to go out or speak to people and I feel like this is not who I a...",keto,1614830826
8,"NSV I fit into my ""goal clothes""","Gosh please never do that to yourself. Don't buy goal clothes. Goal clothes are so passive aggressive to yourself.\n\nBUT, I did. I did this to myself seven times no less. Goal shirts from concerts. Many Pokemon themed goal shirts for some reason lol. I have a too small shirt that I bought five years ago, because I was ""losing weight"". I know because it is a 20th anniversary Pokemon shirt, and...",keto,1614829213


In [19]:
keto = df_2.loc[(df_2['text_length']>100), ['title','selftext','subreddit','created_utc']]

In [20]:
keto.isnull().sum()

title          0
selftext       0
subreddit      0
created_utc    0
dtype: int64

In [22]:
df_posts = pd.concat([fasting, keto], axis=0)

In [23]:
df_posts.head()

Unnamed: 0,title,selftext,subreddit,created_utc
0,Plateau sruggles,"I (27F) have been intermittent fasting for about 2 months. My starting weight was about 145lbs. The first week I did 14:8 and I lost 5 lbs. I didn't change any other habits. I hit a plateau partway through the 2nd week. I've been stuck around 135 lbs. I increased my IF to 18:6 and amped up my workout plan by adding longer, more intense cardio sessions and mixing in some strength training. It's...",intermittentfasting,1625589667
4,Can I still do IF/OMAD now that I started exercising intensely?,"I started lifting 4x/week (about 40 minutes), as well as taking a lot of martial arts classes. On monday for example I had nearly 3 hours of exercising in total (2 MA + lift).\n\nI've only just started, and so far I've been sticking to my IF/OMAD routine where I skip breakfast and only eat lunch. Which means I was on a long fast for some of those exercises. And I still felt fine.\n\nSo my ques...",intermittentfasting,1625586042
5,A new mindset,"Hello everyone,\n\n I am a mostly lurker here in reddit. First I'd like to say that I started my IF journey almost a year ago, and saw incredible results, It is the first time I am able to, without external influence (ie. injury, diet imposition by SO, etc.) to lose weight in a consistent fashion.\n\n Ok, on to the topic. I saw a fat person waking by and surprised myself by thinking somethin...",intermittentfasting,1625584307
6,Weekend habits are making it difficult to loose weight.,"Hi everyone,\n\nI have been doing IF (16:8) for almost 3 years. I have remained consistent in terms of weight (138-141lbs) in these 3 years. I am 5' 5.5"". Recently I started 20:4 to push my metabolism a little. I lost my weight from 141-140 in about a week ish. I have strated to eat low-carb diet to help me with fasting. I am fairly good on the weekdays. However I want to enjoy my weekend and...",intermittentfasting,1625582039
7,Are these times acceptable for IF?,"So, due to loss of employment, family has taken me in. Not to be a bother I am trying to use as little as I can before getting back on my feet. My brother makes lunch at 1 PM and the family eats together at 8 PM. So to my understanding of how Intermittent fasting works is, if I wake up at 9 AM, eat first meal at 1 PM and the second meal no longer than 8 hours after the first meal, this will me...",intermittentfasting,1625582007


In [24]:
df_posts.shape

(10290, 4)

In [25]:
df_posts.isnull().sum()

title          0
selftext       0
subreddit      0
created_utc    0
dtype: int64

In [26]:
df_posts['subreddit'].value_counts()

keto                   5602
intermittentfasting    4688
Name: subreddit, dtype: int64
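The two classes are not perfectly balanced, which matters for the baseline model in Part 3. As a quick worked example that uses only the counts printed above, the sketch below computes each class share and the accuracy of always predicting the majority class.

```python
import pandas as pd

# Counts taken from the value_counts() output above
counts = pd.Series({"keto": 5602, "intermittentfasting": 4688})

# Share of each subreddit in the combined data
shares = counts / counts.sum()

# Accuracy of a classifier that always predicts the majority class
baseline_accuracy = shares.max()

print(shares.round(4))
print(round(baseline_accuracy, 4))
```

Any classifier trained later should beat roughly 54% accuracy to improve on this naive baseline.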

In [27]:
df_posts.to_csv('./data/subreddit_data.csv')
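Two details of the export are worth noting. `pd.concat` keeps each frame's original row labels, so the combined index can contain duplicates. `to_csv` also writes that index by default, and it comes back as an `Unnamed: 0` column on reload. The sketch below demonstrates both on hypothetical toy frames, using an in-memory buffer instead of a file, and shows `ignore_index=True` with `index=False` as one possible alternative. The notebook itself keeps the defaults.

```python
import io
import pandas as pd

# Hypothetical toy frames with overlapping row labels, as after the length filter
fasting = pd.DataFrame({"title": ["a", "b"], "subreddit": ["intermittentfasting"] * 2}, index=[0, 4])
keto = pd.DataFrame({"title": ["c", "d"], "subreddit": ["keto"] * 2}, index=[0, 1])

# Default concat keeps the original labels, so label 0 appears twice
combined = pd.concat([fasting, keto], axis=0)
has_duplicate_labels = combined.index.has_duplicates

# Default to_csv writes the index; read_csv then names that column "Unnamed: 0"
buf = io.StringIO()
combined.to_csv(buf)
buf.seek(0)
reloaded = pd.read_csv(buf)

# Alternative: renumber rows on concat and skip the index on export
clean = pd.concat([fasting, keto], ignore_index=True)
buf_clean = io.StringIO()
clean.to_csv(buf_clean, index=False)
buf_clean.seek(0)
reloaded_clean = pd.read_csv(buf_clean)

print(list(reloaded.columns))
print(list(reloaded_clean.columns))
```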

We extracted a total of 10290 posts with a body longer than 100 characters: 5602 from the keto subreddit and 4688 from the intermittent fasting subreddit. Out of 80 columns, we exported 4 ('title', 'selftext', 'subreddit', 'created_utc') for further EDA and text pre-processing in Part 2.