# <font color='violet'> Parsing & Exploration </font>
Using data wrangled here: https://github.com/fractaldatalearning/psychedelic_efficacy/blob/main/notebooks/1-kl-wrangle-tabular.ipynb

In [1]:
# ! pip install tqdm 
# !{sys.executable} -m pip install contractions

In [2]:
import pandas as pd
import sys
import contractions
# from tqdm import tqdm # for putting a progress bar on loops

In [3]:
# prepare to add local python functions; import modules from src directory
src = '../src'
sys.path.append(src)

# import local functions
from nlp.parse import remove_accented_chars

In [4]:
df = pd.read_csv('../data/interim/studies_clean.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50652 entries, 0 to 50651
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  50652 non-null  int64  
 1   drug        50652 non-null  object 
 2   rating      50652 non-null  float64
 3   condition   50652 non-null  object 
 4   review      50652 non-null  object 
 5   date        50652 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 2.3+ MB


The review column contains narratives in which patients describe their experience with a prescription psychiatric medication. Language features need to be extracted or engineered from that column once the strings have been cleaned. The plan is to explore the language used throughout the texts first, then do whatever preparation is needed for sentiment analysis. I'll be drawing quite a bit from the following resource: 
    - https://towardsdatascience.com/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72

In [5]:
df.review[0]

'I had began taking 20mg of Vyvanse for three months and was surprised to find that such a small dose affected my mood so effectively.  When it came to school work though I found that I needed the 30mg to increase my level of focus (and have been on it for a month since).  I had not experienced decreased appetite until about a month into taking the 20mg.  I find that the greatest benefit of Vyvanse for me is that it tends to stabalize my mood on a daily basis and lessens any bouts of anxiety and depression that i used to face before I was perscribed. a few experiences of nausiea, heavy moodswings on the days I do not take it, decreased appetite, and some negative affect on my short-term memory. My mood has noticably improved, I have more energy, experience better sleep and digestion.'

In [6]:
# Check whether any review contains the acute-accent character (´)
df[df['review'].str.contains("´", regex=False)].head(1)

Unnamed: 0.1,Unnamed: 0,drug,rating,condition,review,date
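
The search above checks for one specific character only. A broader check, sketched here with the standard library, counts every non-ASCII character across a collection of strings. The helper name `count_non_ascii` and the sample strings are hypothetical, made up for illustration; they are not part of this project.

```python
import re
from collections import Counter

# Matches any single character outside the 7-bit ASCII range
NON_ASCII = re.compile(r'[^\x00-\x7F]')

def count_non_ascii(texts):
    """Count every non-ASCII character found across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(NON_ASCII.findall(text))
    return counts

# Hypothetical sample strings, not taken from the dataset
sample = ["I was perscribed 20mg.", "caf\u00e9 visits helped", "na\u00efve at first"]
counts = count_non_ascii(sample)
print(counts)
```

Since iterating over a pandas Series yields its values, something like `count_non_ascii(df['review'])` should work on the real column, and an empty result would suggest there is nothing for the accent-removal step to do.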


In [7]:
# Replace accented characters with plain equivalents, if there are any. I haven't been able
# to find anything like é or ä in the data, but doing it just in case.
# remove_accented_chars passes its tests in the test suite.
df['review'] = df['review'].apply(remove_accented_chars)
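
The body of `remove_accented_chars` lives in `src/nlp/parse.py` and is not shown in this notebook. As a rough sketch of one common way such a function is written (an assumption, not necessarily what the project does), Unicode NFKD normalization splits an accented letter into a base letter plus a combining mark, and an ASCII encode with `errors='ignore'` then drops the mark. The name `strip_accents_sketch` is hypothetical.

```python
import unicodedata

def strip_accents_sketch(text):
    """Hypothetical stand-in for remove_accented_chars; the real one may differ."""
    # NFKD decomposes 'é' into 'e' followed by a combining acute accent
    normalized = unicodedata.normalize('NFKD', text)
    # Encoding to ASCII with errors='ignore' discards anything non-ASCII,
    # including the combining marks left behind by the decomposition
    return normalized.encode('ascii', 'ignore').decode('ascii')

print(strip_accents_sketch('caf\u00e9 na\u00efve'))
print(strip_accents_sketch('don\u2019t'))
```

One design consequence worth noting: this approach also silently deletes any character that has no ASCII decomposition. A curly apostrophe (U+2019) is one example, so a word written with it would lose its apostrophe before contraction expansion ever sees it.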

In [8]:
# Expand contractions. First find a review containing an apostrophe so the result can be checked.
df[df['review'].str.find("'")!=-1].head(1)

Unnamed: 0.1,Unnamed: 0,drug,rating,condition,review,date
9,9,concerta,8.0,adhd,The treatment details were pretty basic. I ju...,0


In [9]:
df.review[9]

"The treatment details were pretty basic.  I just took the medication in the morning at the same time of day every day.  I was allowed to skip it on the weekends or less busy times if I so desired.  I really don't have anything else to put for this section of the survey.  I'm trying to fill up the 50 word requirement though.  You shouldn't really have that requirement on this form.  It seems kind of silly to be writing with nothing left to say. Some of the side affects that I had were:  my stomach would hurt sometimes and other times it was difficult to eat without getting queezy.  I didn't have much of an appetite.  When the medicine wore off, it seemed to have a strong rebound effect and things became difficult for me during that last several hours of my day. I liked taking the medication, I just wish it would have lasted longer during the day.  By 3p-5pm it had wore off and that seemed to by my busiest time of day with the kids, dinner and house stuff all at once. I did not have any

In [10]:
df['review'] = df['review'].apply(contractions.fix)
df.review[9]

'The treatment details were pretty basic.  I just took the medication in the morning at the same time of day every day.  I was allowed to skip it on the weekends or less busy times if I so desired.  I really do not have anything else to put for this section of the survey.  I am trying to fill up the 50 word requirement though.  You should not really have that requirement on this form.  It seems kind of silly to be writing with nothing left to say. Some of the side affects that I had were:  my stomach would hurt sometimes and other times it was difficult to eat without getting queezy.  I did not have much of an appetite.  When the medicine wore off, it seemed to have a strong rebound effect and things became difficult for me during that last several hours of my day. I liked taking the medication, I just wish it would have lasted longer during the day.  By 3p-5pm it had wore off and that seemed to by my busiest time of day with the kids, dinner and house stuff all at once. I did not have

In [None]:
# "don't" was changed to "do not", so contraction expansion worked.
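
The check above inspects a single review by eye. A broader spot check, sketched below with the standard library, lists any common contraction endings that remain in a collection of strings. The pattern and the helper name `remaining_contractions` are assumptions made for illustration; the pattern deliberately skips `'s` so that possessives are not flagged, and it only matches straight apostrophes.

```python
import re

# Common contraction endings; 's is left out on purpose to avoid possessives
CONTRACTION = re.compile(r"\b\w+(?:n't|'re|'ve|'ll|'d|'m)\b", re.IGNORECASE)

def remaining_contractions(texts):
    """Return every contraction-like token found across an iterable of strings."""
    found = []
    for text in texts:
        found.extend(CONTRACTION.findall(text))
    return found

# Sentences adapted from the review shown above, before and after expansion
before = ["I really don't have anything else.", "I'm trying to fill up the form."]
after = ["I really do not have anything else.", "I am trying to fill up the form."]

print(remaining_contractions(before))
print(remaining_contractions(after))
```

Run against the full column, for example `remaining_contractions(df['review'])`, an empty list would suggest the expansion covered these endings across the dataset, though it says nothing about forms the pattern does not include.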