# <font color='violet'> Continue Language Processing, Continue Deeper EDA
    
Using prescription drug review data analyzed and parsed here: https://github.com/fractaldatalearning/psychedelic_efficacy/blob/main/notebooks/3-kl-studies-early-eda-parse.ipynb

In [1]:
# ! pip install spacy
# ! python -m spacy download en_core_web_sm

In [2]:
import pandas as pd
import spacy
import re
from tqdm import tqdm

In [3]:
df = pd.read_csv('../data/interim/studies_early_parsing.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31559 entries, 0 to 31558
Data columns (total 23 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Unnamed: 0     31559 non-null  int64  
 1   rating         31559 non-null  float64
 2   condition      31559 non-null  object 
 3   review         31559 non-null  object 
 4   date           31451 non-null  object 
 5   drug0          31559 non-null  object 
 6   drug1          18992 non-null  object 
 7   drug2          32 non-null     object 
 8   drug3          23 non-null     object 
 9   drug4          12 non-null     object 
 10  drug5          11 non-null     object 
 11  drug6          7 non-null      object 
 12  drug7          5 non-null      object 
 13  drug8          3 non-null      object 
 14  drug9          2 non-null      object 
 15  drug10         2 non-null      object 
 16  drug11         2 non-null      object 
 17  drug12         2 non-null      object 
 18  drug13

In [4]:
# Delete unnamed column and columns I'd used for eda previously but won't need here.
df = df.drop(columns = ['Unnamed: 0', 'ratings_count', 'count_by_date'])
df.head(3)

Unnamed: 0,rating,condition,review,date,drug0,drug1,drug2,drug3,drug4,drug5,drug6,drug7,drug8,drug9,drug10,drug11,drug12,drug13,drug14,drug15
0,9.0,add,I had began taking 20mg of Vyvanse for three m...,,vyvanse,,,,,,,,,,,,,,,
1,8.0,add,Switched from Adderall to Dexedrine to compare...,,dextroamphetamine,,,,,,,,,,,,,,,
2,8.0,adhd,I have only been on Vyvanse for 2 weeks I sta...,,vyvanse,,,,,,,,,,,,,,,


It's unclear whether it is a good idea to lemmatize the text and remove stopwords. 

With stopword removal during sentiment analysis, it is important to inspect the stopword list carefully and refrain from removing words that indicate negation, such as *no* and *not*. One option is to use a word list that excludes these words, so that important sentiment information is retained. 

As for lemmatization, it can be less than ideal for sentiment analysis because it often maps words with nuanced meanings to the same lemma, e.g. bad and worse both becoming bad, which loses the intensity of worse. But in my context, since I am hoping to apply this model to new narratives that may or may not contain words similar to those in my current dataset, lemmatization could help with normalization. 

For this reason, I am going to first strip the last remaining tricky symbols from the reviews, and then the last cleaning step can be lemmatization and stopword removal *with a wordlist excluding words that impact sentiment*.
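
As a rough sketch of what negation-preserving stopword removal could look like: the tiny `STOPWORDS` set and the `NEGATIONS` set below are illustrative placeholders, not the lists used in this project. A fuller stopword list (for example the one shipped with spaCy) could be substituted in the same way.

```python
# Illustrative only: a deliberately tiny stopword list. A real pipeline would
# start from a fuller list and subtract the same negation words.
STOPWORDS = {"a", "an", "the", "is", "it", "i", "to", "of", "and",
             "no", "not", "never"}

# Words that flip or carry sentiment, so they should survive stopword removal.
NEGATIONS = {"no", "not", "never", "nor", "cannot"}

# The list actually used for filtering: stopwords minus negations.
SAFE_STOPWORDS = STOPWORDS - NEGATIONS


def remove_stopwords(text):
    """Drop stopwords from whitespace-separated text, keeping negations."""
    kept = [word for word in text.split() if word.lower() not in SAFE_STOPWORDS]
    return " ".join(kept)


print(remove_stopwords("It is not a good tablet"))
```

With this filter, "not" stays next to "good", so the negative sentiment of the phrase is preserved.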

<font color='violet'> Deal with remaining punctuation

In [5]:
# Collect every non-alphabetic character (symbols, digits, whitespace) still in the reviews.
punctuation = {char for char in df.review.str.cat(sep=' ') if 
               not char.isalpha()}
punctuation

{'\t',
 '\n',
 '\r',
 ' ',
 '!',
 '#',
 '$',
 '%',
 '(',
 ')',
 '+',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 ';',
 '=',
 '?',
 '\\',
 '`',
 '\x7f'}
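
The set above shows which characters occur but not how often. A minimal sketch of a frequency count, assuming the reviews are available as any iterable of strings (the `sample` list is a hypothetical stand-in for `df.review`):

```python
from collections import Counter


def count_non_alpha(texts):
    """Count every non-letter character across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(char for char in text if not char.isalpha())
    return counts


# Hypothetical sample standing in for df.review
sample = ["I paid $160 for 30 pills!", "Very unpleasant !!!"]
counts = count_non_alpha(sample)
print(counts.most_common(3))
```

Knowing the counts could help prioritize which symbols deserve a closer look.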

In [6]:
# Identify obviously unwanted symbols. Some of the symbols above still carry meaning.
replacements = ['\t', '\n', '\r', '\\', '`', '\x7f']

# Find where some of these occur, to check whether removing them works.
df[df.review.str.find('\t')!=-1].head(1)

Unnamed: 0,rating,condition,review,date,drug0,drug1,drug2,drug3,drug4,drug5,drug6,drug7,drug8,drug9,drug10,drug11,drug12,drug13,drug14,drug15
1600,1.0,depression,bull \t19 April 2016\r\r\n\r\r\nBegan initial ...,2016-04-22,duloxetine,,,,,,,,,,,,,,,


In [8]:
df[df.review.str.find('\\')!=-1].head(1)

Unnamed: 0,rating,condition,review,date,drug0,drug1,drug2,drug3,drug4,drug5,drug6,drug7,drug8,drug9,drug10,drug11,drug12,drug13,drug14,drug15
85,9.0,add,Took 20mg 3x a day Immediate release worked be...,,adderall,,,,,,,,,,,,,,,


In [9]:
df.review[85]

'Took 20mg 3x a day Immediate release worked better than the extended version because I had more control of the med to taper it to my needsschedule during the day Dry mouth leading to some dental cavities Desire to smoke is increased Jawhandfoot clenching Loss of appetite great for keeping thin sometimes have to force myself to eatdrink Ability to focus motivation energy stamina all increased Feeling of wellbeing after being in the black hole of depression My sleep is improved have no trouble going to sleep quells disturbing\\busy dreams that used to leave me exhausted in the morning I do not lie awake at night trying to slow down or calm my mind'

In [10]:
df[df.review.str.find('`')!=-1].head(1)

Unnamed: 0,rating,condition,review,date,drug0,drug1,drug2,drug3,drug4,drug5,drug6,drug7,drug8,drug9,drug10,drug11,drug12,drug13,drug14,drug15
2535,10.0,other,I`ve had Ulcerative Colitis YOUC since I was...,2008-06-09,alprazolam,,,,,,,,,,,,,,,


In [11]:
df[df.review.str.find('\x7f')!=-1].head(1)

Unnamed: 0,rating,condition,review,date,drug0,drug1,drug2,drug3,drug4,drug5,drug6,drug7,drug8,drug9,drug10,drug11,drug12,drug13,drug14,drug15
25327,1.0,schizophrenia,I was told that Latuda is a better tablet than...,2016-10-06,lurasidone,latuda,,,,,,,,,,,,,,


In [12]:
df.review[25327]

'I was told that Latuda is a better tablet than taking my older tablet which is stelazine 10ml a day only side affect was that it made me tired all the time and I was sleeping that#039 s all So I#039 m now on 60ml a day I had side effects bad ones and worse thing happened tonight i have never in my life been so aggressive against\x7f my daughter and my husband never had I been like that on Stelazine never\r\r\nMy husband was in tears tonight because of my attitude and my aggressive behavior I didn#039 t feel any remorse what so ever definitely not a good tablet for me I have both Schizophrenia and Bipolar and depression I don#039 t care how hard it is to get Stelazine but I#039 m going to go back on the stelazine its so much better for me and my family    '

In [14]:
# Remove these tokens wherever they are
for symbol in replacements:
    df['review'] = df.review.str.replace(symbol, ' ', regex=False)

df.review[1600]

'bull  19 April 2016      Began initial dose at 2230 hours Felt the medicine working within a frac12; hour Was in a good mood as I had been taken off of Warfarin this date Before drifting off to sleep I glanced at the clock It was approx 2300 hrs I had an odd feeling in my throat that was possibly closing up I remember worrying what is going on here? As the feeling in my throat persisted I also felt my ldquo Adamrsquo s applerdquo  fluttering and an elevated heart rate I soon fell asleep No other meds taken except for Atorvastatin      bull  20 April 2016        I awoke a 0600 hours to go to the bathroom Upon arising I felt a damp spot at the back of my underwear Pulled the sheets back a discovered I had Shit the bed in my sleep Felt real dizzy and drowsy Thought to myself ldquo how amp  why?rdquo  this happened Went to shower      As the morning continued on at 0630 the side effects were evident    1 ldquo Hot flashesrdquo  Absolutely miserable Could not get  stay comfortable Firstly 

In [15]:
df.review[85]

'Took 20mg 3x a day Immediate release worked better than the extended version because I had more control of the med to taper it to my needsschedule during the day Dry mouth leading to some dental cavities Desire to smoke is increased Jawhandfoot clenching Loss of appetite great for keeping thin sometimes have to force myself to eatdrink Ability to focus motivation energy stamina all increased Feeling of wellbeing after being in the black hole of depression My sleep is improved have no trouble going to sleep quells disturbing busy dreams that used to leave me exhausted in the morning I do not lie awake at night trying to slow down or calm my mind'

In [16]:
df.review[2535]

'I ve had Ulcerative Colitis  YOUC  since I was 17 At the age of 25 I thought I was having a heart attack and went to the emergency room I was diagnosed with a panic attack they gave me a XANAX (1mg  No more heart attack Told my doctor the story and with my history of YOUC he prescribed 2mg a day I#039 m now 37 years young and XANAX freed the tyranny of two debilitating conditions    '

In [17]:
df.review[25327]

'I was told that Latuda is a better tablet than taking my older tablet which is stelazine 10ml a day only side affect was that it made me tired all the time and I was sleeping that#039 s all So I#039 m now on 60ml a day I had side effects bad ones and worse thing happened tonight i have never in my life been so aggressive against  my daughter and my husband never had I been like that on Stelazine never   My husband was in tears tonight because of my attitude and my aggressive behavior I didn#039 t feel any remorse what so ever definitely not a good tablet for me I have both Schizophrenia and Bipolar and depression I don#039 t care how hard it is to get Stelazine but I#039 m going to go back on the stelazine its so much better for me and my family    '

That worked to get rid of the most obviously meaningless symbols. Non-alphabetic characters yet to deal with are: ! # $ ( ) + : ; = ? % 0-9

I'm going to keep ! wherever it appears because it's so strongly indicative of sentiment.
For the rest, I'd like to check each one individually and see where exactly it appears. 
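
Reading whole reviews to find one symbol gets tedious. A small helper like the following (a hypothetical `show_context` function, not part of the original notebook) could print just the text around each occurrence:

```python
import re


def show_context(text, symbol, width=20):
    """Return the text surrounding each occurrence of `symbol` in `text`."""
    snippets = []
    # re.escape makes symbols such as "+", "$" or "(" match literally.
    for match in re.finditer(re.escape(symbol), text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end])
    return snippets


review = "I smoked for 50+ years  Took it for one week and that was it"
print(show_context(review, "+", width=10))
```

On the notebook's data this could be called as `show_context(df.review[112], '+')`.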

<font color='violet'> Inspect context of characters that may or may not impact basic meaning and sentiment

In [34]:
df[df.review.str.find('#')!=-1].review.head()

108    I#039 ve tried a few antidepressants over the ...
109    I Have been on Methadone for over ten years an...
112    I smoked for 50+ years  Took it for one week a...
114    If I could give it a 0 I would absolutely do s...
120    Been a heavy drinker for over 6 years since a ...
Name: review, dtype: object

In [20]:
len(df[df.review.str.find('#')!=-1])

19822

In [21]:
# the # symbol appears frequently. Does it provide meaningful content?
df.review[109]

'I Have been on Methadone for over ten years and currentlyI am trying to get off of this drug I Have been decreasing my does 2 mgs per month for over a year I am at 3 mgs and really starting to feel the withdrawI don#039 t plan to get my next 30 dosesbecause its almost rediculous how little it does for me I have 3 does doses of 3 mg and I Am terrified Can anyone give me some truthful encouragement?    '

In [22]:
df.review[112]

'I smoked for 50+ years  Took it for one week and that was it  I didn#039 t think it was possible for me to quit  It has been 6 years now  Great product    '

In all of the first few reviews where it appears, the # is just a glitch: #039 is what remains of the HTML entity &#039;, which stands for an apostrophe. Any value the # might add in the rare review where somebody used a hashtag for emphasis isn't worth keeping it. Get rid of all the #.
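
One alternative to deleting # outright, offered only as a suggestion that goes a step beyond the decision above, is to turn #039 back into an apostrophe first and then drop any leftover #. The list of contraction endings in the pattern is an assumption and may need extending.

```python
import re

# "#039 " followed by a common contraction ending, as in "don#039 t".
CONTRACTION = re.compile(r"#039 (?=(?:s|t|m|d|ve|re|ll)\b)")


def restore_apostrophes(text):
    """Rebuild apostrophes from #039 remnants, then remove any other #."""
    text = CONTRACTION.sub("'", text)   # "don#039 t"  -> "don't"
    text = text.replace("#039;", "'")   # "#039;11"    -> "'11"
    text = text.replace("#039", "'")    # any remaining remnant
    return text.replace("#", "")        # stray hashes carry no meaning here


print(restore_apostrophes("I didn#039 t think it was possible"))
```

Restoring contractions such as "didn't" may matter later, since they carry negation.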

Next, look into $

In [33]:
df[df.review.str.find('$')!=-1].review.head()

22     Unfortunately I had a doctor who was uneducate...
33     Unpleasant !   By the way is it normal to pay ...
37     Vyvanse is a new medication used to treat ADHD...
93     When i first started going to the methadone cl...
160    Horrible side effects! I tried Ciprelex  escit...
Name: review, dtype: object

In [24]:
len(df[df.review.str.find('$')!=-1])

329

In [25]:
df.review[22]



In [26]:
df.review[33]

'Unpleasant !   By the way is it normal to pay to see the psychiatrist around 150 $ for monthly renewal and says he does not take insurance How do other people handle this   I invite you to answer me at fionahopes to hotmail NOW day 3 up to now  I feel mostly the side effects such as nausea physical numbness and irritated and NO effect of increase on attention   I am paying attention as to know if I should continue or try something else or live without it as I always did Very unpleasant !!! Women seem to do great on Aderall The 2 first day we are fantastic I was easy going task did not birding me and I would overall say that my focus was good However my well being was so great that I was mostly in my mind with great Joy'

In [27]:
df.review[37]

'Vyvanse is a new medication used to treat ADHD  It is a timereleased capsule  It is easy to swallow  no bad taste like pressed pills  and does not upset your stomach  I took one 70mg pill in the morning  The pill releases the drug slowly so you are not hit with all of it at one time  This allows for the medicine to be equally effective all day not just for a few hours I did NOT have any of the normal ADHD med side effects such as  rapid heart rate anxiety jittery feeling  The only downside to this medication is the cost  Since Vyvanse is new there is no generic for it  I paid $160month for 30 pills! I was selfpay though so I am sure having insurance would be ideal! Although it is very expensive it is well worth it I had no side effects with this drug!  I loved it Sustained focus improvement in attention span and greater achievement of daily tasks  Easy to take just one pill a day!'

So far, every time '$' appears, somebody is just talking about how much something costs, which seems like meaningful content. Keep the dollar sign intact. 

Find all the (

My assumption has been that this might appear as part of an emoji. Does that assumption even hold true?

In [29]:
len(df[df.review.str.find(':(')!=-1])

66
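
To check more than one emoticon at a time, a regular expression could tally several shapes at once. This is a sketch under an assumed, deliberately narrow pattern (eyes `:`, `;` or `=`, an optional `-` nose, and a few mouth characters); the `sample` list is a hypothetical stand-in for `df.review`.

```python
import re
from collections import Counter

# Assumed pattern: eyes (: ; =), optional nose (-), then a mouth character.
EMOTICON = re.compile(r"[:;=]-?[()DPp]")


def count_emoticons(texts):
    """Tally each emoticon found across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(EMOTICON.findall(text))
    return counts


sample = ["not complaining =) lol", "so tired :( and sad :-(", "at 5:30 pm"]
emoticon_counts = count_emoticons(sample)
print(emoticon_counts)
```

Times such as 5:30 are not counted, because a digit is not one of the mouth characters.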

In [32]:
# Yes, there are 66 :( emoji alone. Find where else the ( symbol still appears.
df[df.review.str.find('(')!=-1].review.head()

13    With or without this medication I am a philoso...
22    Unfortunately I had a doctor who was uneducate...
31    On 91811 I was prescribed (30) 30mg Vyvanse fo...
71    I take a 30 mg pill shortly after waking and w...
74    I seemed to metabolized the drug quickly Onset...
Name: review, dtype: object

In [35]:
df.review[13]

'With or without this medication I am a philosopher So I will be explain how the drug works on me from a philosophical assessment      Background    I had taken Strattera for three years (20052007) where it helped control ADD where I had difficulty doing so It helped me focus stay organized manage relations and most or all of the benefits I listed prior I dropped the drug for 3 years so that I may learn how to control my ADD without the help of medication During those three years I gained remarkable self control in overcoming ADD However I got to the point where I could not win over ADD which was losing focus randomly During college lectures that I found very intriguing I would often zone outspace out when I could not have been more interested in what was being professed The frustration of this led me to conclude that medical assistance was the best choice      On the drug    I read and understand that the  generally  biggest conside effectto this drug is that there is some sort of dep

The ( and ) were previously removed when adjacent to letters but are still present around numbers. Work to delete them from those contexts as well, possibly around all numbers, possibly not.
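
A minimal sketch of the "around all numbers" option, using a hypothetical `strip_number_parens` helper: it removes a ( that directly precedes a digit and a ) that directly follows one, and leaves emoticons such as :( and =) alone.

```python
import re

# "(" directly before a digit, or ")" directly after a digit.
NUMBER_PARENS = re.compile(r"\((?=\d)|(?<=\d)\)")


def strip_number_parens(text):
    """Delete parentheses that touch a digit; keep all other parentheses."""
    return NUMBER_PARENS.sub("", text)


print(strip_number_parens("I was prescribed (30) 30mg Vyvanse"))
print(strip_number_parens("not complaining =) but tired :("))
```

One caveat: an emoticon written with a digit, such as 8), would lose its parenthesis under this rule.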

Check :

In [36]:
df[df.review.str.find(':')!=-1].review.head()

64    1 pill 30 mg Vyvanse as needed I never take mo...
81    My prescriber and I went through about 3 diffe...
91    i was prescribed 60 tablets of 20 mg he told m...
93    When i first started going to the methadone cl...
96    One pill daily as soon as I wake up Decreased ...
Name: review, dtype: object

In [37]:
df.review[64]

'1 pill 30 mg Vyvanse as needed I never take more than the prescribed amount  not trying to get addicted here  This treatment is concurrent with my ongoing treatment for chronic depression A little bit of sleeplessness at first I take Vyvanse in the morning at around 5:30 to fix this I just set my alarm take the pill with a glass of water and go back to sleep for an hour I then wake up around 7 without any problems Although I had mild sleep problems the first two days of taking Vyvanse I had virtually no side effects Granted my dose was low but I had a hard time concentrating and summoning up the energy to do basic things like paying bills etc Vyvanse is no wonder drug but I live pretty much like everyone else now which is great My mother is stunned She says this is the first time she is ever been able to ask me where things are in my apartment and have me know  and be right  I take a medication holiday from Vyvanse most weekends and actual work holidays when I do not need to be able t

In [38]:
df.review[81]

'My prescriber and I went through about 3 different medications before Adderall and those did have unpleasant side effects such as sweaty palms  all the time  dizziness and heart palpitations  Once I found that Adderall did not because side effects for me we gradually increased the dosage from 5 mg to 10mg and then back down to 75 mg which works best for me I did not notice any side effects from Adderall  However it is metabolized very quickly and only stays effective for about 3 to 4 hours  Forgetting to take the second dose around 1:00pm makes for a confusing afternoon until I remember to take it Adderall greatly improved my focus and concentration  Before my diagnosis of adult ADD I had a very difficult time focusing and staying on task  I also experienced hyperfocusing which prevented me from getting things done at work in a timely manner  It has also helped improved my memory'

In [39]:
df.review[91]

'i was prescribed 60 tablets of 20 mg he told me to experiment myself as to what dosage worked for me not exceeding 2 whole pills daily normally i take one whole pill in the morning at 6:307  12 at 12 and 12 before work at 4 pm sometimes on the weekend i will only take 1 because i try to have extras in case needed at another time  i do not want to build up tolerance dry mouth sometimes slight headaches but not completely sure if its just from medicatation or being tired as i do have a very busy schedule ability to stay awake focus do daily household chores run needed errands  helped my organizational skills as far as bills'

All of the first few : appear as part of a time of day, which it makes sense to keep. There could be other occurrences that aren't at all meaningful, such as a stray : that is adjacent to neither a letter (already taken care of in the previous notebook), nor a number (keep it in that case), nor another symbol (also keep it). Check this out. 

In [43]:
df[df.review.str.find(' : ')!=-1].review.head()

591     I was on this firstly for social anxiety disor...
850     It caused severe anxiety attacks along with ot...
1212    Hi there I#039 m 35 years old male have strong...
1383    Took my first dose last night before dinner I ...
2083    Pristiq has significantly helped my depression...
Name: review, dtype: object

In [44]:
df.review[591]

'I was on this firstly for social anxiety disorder  after 1 month I felt pleased happy and had interest in doing any thing then I concluded that I had mild depression but it has done nothing for my SAD  but it helped by giving me the desire to socialize  my ratings : 10 for depression and 6 for SAD    '

Delete the symbol : if it appears surrounded by spaces. Now, see where ; appears.
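
The rule for : can be sketched with a plain string replacement (the helper name `drop_stray_colons` is hypothetical). Only a colon with a space on both sides is removed, so times and emoticons are untouched.

```python
def drop_stray_colons(text):
    """Replace a colon surrounded by spaces with a single space."""
    return text.replace(" : ", " ")


print(drop_stray_colons("my ratings : 10 for depression and 6 for SAD"))
print(drop_stray_colons("I take Vyvanse in the morning at around 5:30"))
```

On the DataFrame the same thing could be done with `df.review.str.replace(' : ', ' ', regex=False)`, matching the pattern used earlier in this notebook.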

In [41]:
# All of the : in the first few examples are part of a time. Keep those. Check out ;
df[df.review.str.find(';')!=-1].review.head()

6      So far the throwing up has stopped and the hea...
163    I suffer from OCD and intrusivecompulsive thou...
193    Wellbutrin works well for the type of depressi...
242    Today is my 3rd day on medication I was prescr...
279    First of all I did not begin taking this or an...
Name: review, dtype: object

In [45]:
df.review[163]

'I suffer from OCD and intrusivecompulsive thoughts that create almost unbearable anxiety In Oct #039;11 I was prescribed Zoloft and found it to work incredibly On 100mg my OCD is 8590% gone and I can easily rationalize and even blow off my intrusivecompulsive thoughts about 95% of the time I felt better within a week but it did take 34 weeks for me to notice its full effect I#039 d describe the feeling it gives as light airy and happydrifty I haven#039 t experienced any side effects and haven#039 t gained any weight On a side note I also found that one huge factor contributing to my anxiety is too much caffeine even when taking Zoloft  I still have one cup of coffee to wake up in the morning but I#039 ve found that#039 s my strict limit     '

In [46]:
df.review[193]

'Wellbutrin works well for the type of depression recently crippling me  That #039 I Don#039 t Want to Get Out of Bed and Face Yet Another Horrible Day  I Just Want to Give Up#039; type of depression Wellbutrin for me erases those bad feelings and does so very quickly Within 4 hours of my first dose my confidence came surging back The rapidity of action is due to the unique way it works  it isn#039 t the typical antidepressant      Wellbutrin has side effects but none harsh enough to stop me from taking it regular The side effects I experience is  a  a tense feeling  physically  and  b  kind of a #039 chemical taste#039; in my mouth But there#039 s no blurred vision no zombie feeling and no terribly dry mouth The payoff is worth it    '

In [48]:
df.review[242]

'Today is my 3rd day on medication I was prescribed it for social anxiety  despite being a beautiful 6#039;2 male with no reason to be anxious about appearance  generalized anxiety and panic attacks Talking to people in malls convenience stores and being around large groups of people frighten me I have not noticed much of a difference other than lack of sleep crazy amazing lucid dreams for the first two nights and my jaw clenching So far no effect on anxiety and took a benzo because I Am feeling restless but still feeling restless right now I pray this works as I Am unable to work now due to my anxiety I Have always had anxiety but after having mono for 2 months it has spiraled completely out of control I will update this soon Please work!    '

It seems that apart from where it appears in emoji, the ; has no meaningful context. It's already been deleted from next to letters. It should definitely be deleted where it appears between two other characters. In one example it follows a number, while in emoji it usually precedes another symbol or number, as in ;) or ;0, so delete it wherever it just follows another character as well. What about where it precedes a number, as in ;0? Is that even a common occurrence, or when it happens is there just more nonsense going on?


In [50]:
df[df.review.str.find(';0')!=-1].review

761      I currently have my 2nd Vantas Implant and am ...
3511     I was prescribed this a few months ago getting...
4406     My mom died when I was 4 and last semester I a...
12047    I took one 3mg Lunesta at 9;00 pm IT IS NOW 4:...
14480    I was waking up and depressed until around 10;...
17334    Prescribed only by psychiatrists Was weaned of...
22827    In #039;97 I was suffering from depression so ...
25229    I can#039 t speak to the effectiveness of the ...
31270    I started taking Celexa in Nov #039;08 and con...
Name: review, dtype: object

In [51]:
df.review[761]

'I currently have my 2nd Vantas Implant and am very discourged with what i believe are negative side effects I had a Radical protetecomy in June #039;04 nerve sparing  then went through 40 radiation treatments due to the fact my PSA continued to elevate ( ever so slightly ) My Dr recommended the implant it has in my opinion dramatically decreased my sex drive and created a serious problem of erectile dysfunction due to the fact that it reduces your level of testosterone dramatically    '

In [52]:
df.review[3511]

'I was prescribed this a few months ago getting anxiety while driving The prescription is #039;05 mg once a day as neededquot  I have taken it about 35 times a week a few times I have taken it twice a day I have been having family issues recently and I first thought that was the reason my occasional anxiety has progressed to debilitating  flying off the handle depressive dying thoughts chest pains  but  after research I#039 ve concluded that I#039 m actually going through withdrawal when I don#039 t take it everyday Now it#039 s 2:36 AM I#039 m scared mind won#039 t shut off  don#039 t know how to wean off of this properly I#039 m  tired will be calling into work in the morning for lack of sleep Please do not take this and just get counselling for your anxiety first    '

In [53]:
df.review[4406]

'My mom died when I was 4 and last semester I always had anxiety in the background and now at 15 spring semester of #039;08 I had some mild panic attacks and worry nervousness sadness and excessive shyness all day long This medication is a life saver and I love it They say 8 weeks for the full effect and sure it might rebalance your chemicals in 8 weeks but give it a chance as it still takes a while after that for you to break out of your she will    '

These are nonsense. The ; should be deleted anyplace it is adjacent to any number or hanging out by itself. It should only be kept where it appears adjacent to another symbol. 
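
One way to express this rule as a regular expression, as a sketch with a hypothetical `drop_semicolons` helper: a ; is deleted when it touches a digit, or when neither neighbor is a symbol. Here "symbol" is assumed to mean any character that is neither a word character nor whitespace.

```python
import re

SEMICOLON = re.compile(
    r"(?<=\d);"                    # ; right after a digit
    r"|;(?=\d)"                    # ; right before a digit
    r"|(?<![^\w\s]);(?![^\w\s])"   # ; with no symbol on either side
)


def drop_semicolons(text):
    """Delete semicolons unless they sit next to another symbol, as in ;)"""
    return SEMICOLON.sub("", text)


print(drop_semicolons("Felt it working within a frac12; hour"))
print(drop_semicolons("feeling fine ;) thanks"))
```

Note that this also removes the ; in a mistyped time such as 9;00, which follows the rule as stated even though a : was probably intended there.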

<font color='violet'> Remaining to investigate: + = ? % 0-9

In [54]:
df[df.review.str.find('+')!=-1].review.head()

98     Started off taking concerta for add related is...
112    I smoked for 50+ years  Took it for one week a...
116    This medication should not be being prescribed...
189    I was prescribed Ambien then later Ambien CR f...
316    Dealing with PTSD for 20+ years has been a blu...
Name: review, dtype: object

In [55]:
df.review[98]

'Started off taking concerta for add related issues but had become dependent and tolerant to the dosage then went on strattera for a few months but had severe sideeffects now on wellbutrin xl for 6 months with relative success magnified anxiety over future stressfull tasks possible hair loss Increased learning capability + focus happier'

In [56]:
df.review[112]

'I smoked for 50+ years  Took it for one week and that was it  I didn#039 t think it was possible for me to quit  It has been 6 years now  Great product    '

In [57]:
df.review[116]

'This medication should not be being prescribed for Bipolar 2 a milder form of a different illness called Biploar 1 for which this medication works very well Folks Doctors NEED to be using SCIENCE to treat illnesses not giving out random prescriptions for 2+ medications for 1 illness If your Doctor is not doing monthly blood tests AT LEAST to see how the medications are affecting your body chemistry and serotonin levels then you need to fire the Doctor and find one who does do hisher job the right way and who is good at the science part of the job Every medication you take counteracts and interacts with the other medications and the foods and other things that you put into your body each day messing with the chemistry of your body    '

In [58]:
# So far all the + seem to add meaning to text. Check out =
df[df.review.str.find('=')!=-1].review.head()

122     Coming from a very problematic childhood I#039...
828     Paxil is good at first but the quot goodquot  ...
1791    I agree w Triple Dee This drug=the worst It  w...
2017    Citalopram has literally given me my life back...
2031    I Have always been skinny but not to a point w...
Name: review, dtype: object

In [59]:
df.review[122]

'Coming from a very problematic childhood I#039 ve been labeled everything from A to Z by psychiatrists and given a slew of meds Wellbutrin in my experience has been beyond beneficial At first for the first week or two I had slight tremors irregular heartbeat major insomnia and headaches but dismissed I still have slight insomnia which is why I take them when I wake up But reading comprehension multitasking thought focus and moods overall have dramatically improved! I#039 m one of those people that may have to take them for the rest of my life but I#039 m not complaining =) One thing that does bother me though is dry mouth from wellbutrin but I take biotene which helps a bit Enough to make it worth it anyways lol    '

In [60]:
df.review[828]

'Paxil is good at first but the quot goodquot  influence will wear out as soon as your body get used to it While on it I felt confident but only on the surface  the med alleviates the physical aspects of SA  On the inside however I#039 m filled with self doubt negative thoughts jealousy of naturally confident people etc It also made me  careless  not give a damn about socializing so I#039 m less interested in social situations = more isolated = fear people#039 s views of me Anyways please do not consider taking it as it#039 s only a temporary fix I was dumb enough to believe the success stories written here and tried it I now have trouble quitting because of the nasty withdrawal symptom    '

In [61]:
df.review[1791]

'I agree w Triple Dee This drug=the worst It  was suppose to help psychotics It turned me into a zombie Also turned me into a compulsive shopper which can be accomplished from the couch I became glued to due to my zombie state With no motivation my house was a mess And the eating!!! OMG I gained a pound a day I have clinical depression amp  massive anxiety I take 40mg of Viibryd Tough tough drug to get on in the beginning but I stuck to it amp  it#039 s been a blessing No sexual or eating side effects I went on 450 WellButrin to survive Midwest winters I stay on it year round AnxietyI take Xanaxfour 5 pills a day3  bedtime amp  I split the 4th12 in the morning  12 at dinner Don#039 t want all these drugs but they keep me going amp  productive    '

In [62]:
# Keep '='. Check out ?
df[df.review.str.find('?')!=-1].review.head()

49     I returned to school midcareer when I saw the ...
62     Afterbreaking for 3 months and receiving antib...
80     already described above  I do not believe in p...
100    I was diagnosed with ADHD in college and have ...
109    I Have been on Methadone for over ten years an...
Name: review, dtype: object

In [63]:
df.review[49]

'I returned to school midcareer when I saw the economy tanking I was unable to grasp the rudiments  law rigorous despite many hours of studying and reading During that time my father died I was depressed and close to failing out I went through therapy and asked about ADD meds I took a test showing attention and memory lapses Doc prescribed 60 mg I could tell an immediate difference My grades improved dramatically which was specifically the result I wanted I now take 70 mg and intend to until graduating and taking the big test in October 11 After that I hope to stop taking it I am not certain that it is effective over 6 months It seems that one might build up a tolerance to the effectiveness This most recent semester my memory seemed to be where it was before I started taking it but I have not found out yet Slow production overconcentration?)   Heart palpitations  occasional esp with caffeine    Pharmacy information says may be habit forming and not recommended for people with a history

In [64]:
df.review[62]

'Afterbreaking for 3 months and receiving antibiotics treatment for the chest problem resumed strattera 60mg till current however currently though with earlier benefits I have a feeling of getting stuck and hypoerfocussing on things that interest me like browsing listening to music etc and lack of the initial kick to complete tasts and look at things in details Can someone help adviseshould my dosage be increased to 80mg or more? I weigh 80kg My dosing has not been basd on weight ED ejaculation when urinating dry mouth stomach upsets sweatingincreased temperatures Chills and Fevers  difficulty breathing at 9th month problems with right lung side and had to request Dr to stop Suffered Effusion on the right sideDry Eyes Started with Fluoxetine for 2 year befoe starting Strattera Initial benefits  Feeling of urgency to complete work Morefocus on details Less emotional labilityFeeling in control Less forgettingLess anxiety panic stopped completely depressive moods disappeared appearing occ

In [65]:
df.review[80]

'already described above  I do not believe in pharmacuticals for neurological disorders      Supplements can provide benefits but the process is long and expensive  There is not standard cure   for any central nervous condition  It takes time and reading and research and taking a pill is only a short term solution  I had allergies and my homeopathic doctor cleared my pathways that were not getting nutrients to my brain  I am now dieting and staying away from gluten and lactose and processed food      I am doing detoxing and find with all the natural changes I have increased energy and do not have to depend on drugs      Who know what the side effects are?  I could have aged my liver when my doctor increase dexedrine to 30 mg I got speed but not the focus and I became OCDseriously   It kept me up all night      Effexor which i took for 10 years numbs you out  I was on 75 mg for add and depression  I did cure depression but not really the add Dexedrine as well as effexor I found for my a

The ? seems to be used mostly where there is an actual question. I'm unsure what's best here: somebody asking a question could affect sentiment, but I'm not sure to what degree. The ? doesn't hold literal meaning the way + = $ do. But since I'm keeping some special symbols anyway, and since ? could be meaningful, I'll keep it.

Check out %

In [66]:
df[df.review.str.find('%')!=-1].review.head()

34     recd speed but no focus I took the dexerine in...
43     I have been on antidepression meds for 15 year...
118    I am a 25 year old female I was diagnosed with...
137    I lost my sexuality from the first pill overni...
163    I suffer from OCD and intrusivecompulsive thou...
Name: review, dtype: object

In [67]:
df.review[34]

'recd speed but no focus I took the dexerine in morning and it wears off at   night  I lost weight for the first time as the metabolism was great sadly   i gained 15 lbs off the same diet insomnia I got obsessive complusive and had to stop  I also think I have a bit   of liver damage of taking meds over 10 years  I only took dexedrine for a few months weight loss right away  I felt it working in brain right away but lost the efficacy when I built my tolerance  My MD gave me more and the same thing happened      Meds do not cure add 100% so now I take supplements'

In [68]:
df.review[43]

'I have been on antidepression meds for 15 years but my depression persisted regardless of the treatment  I used to sleep 1216 hours per day and was always tired and fatigued when awake  I was diagnosed with sleep apnea 5 years ago but even sleeping with CPAP did not help with my daytime fatigue  After seeing MANY family practitioners I finally broke down and INVESTED in a very well educated Psychiatrist who diagnoed me with ADD since childhood  His decision to prescribe Vyvanse was brilliant!  I feel better than ever and am totally clear headed and able to consentrate  For the first time in as long as I can remember I can think normally!  No more racing thoughts in my head and my depression and anxiety has all but vanashed!  I do still take an antidepressive but I am CERTAIN that the Vyvanse gave the antidepression meds the ability to work  I can FINALLY sleep a normal 7 to 9 hours and still be awake and alert throughout my 12hour work days  I was worried that the Vyvanse would elevat

In [69]:
df.review[118]

'I am a 25 year old female I was diagnosed with bipolar II disorder about 5 years ago I have been taking 150mg of lamotrigine for over 2 years Thus far I have experienced significant improvements in controlling my bipolar II disorder I recently paired 100mg of sertraline to improve the lows Also I experience rapid cycling I rated this drug 70% as I feel I still have a long way to go in recovery But the drug has definitely allowed me to be a highly functioning individual    '

Here, the % signs mostly aren't part of nonsense strings and sit next to numbers, as would be expected. Keep them.
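One way to check this impression beyond a handful of reviews is to count how many % signs come directly after a digit. The sketch below is a minimal standard-library version. The helper name `percent_follows_digit_share` is hypothetical, the first two sample strings are shortened from the reviews printed above, and the third is made up to show a stray %.

```python
import re

def percent_follows_digit_share(texts):
    """Return the fraction of % signs that come right after a digit."""
    total = 0
    after_digit = 0
    for text in texts:
        total += text.count('%')
        # The lookbehind counts only the % signs preceded by a digit
        after_digit += len(re.findall(r'(?<=[0-9])%', text))
    return after_digit / total if total else float('nan')

samples = [
    'Meds do not cure add 100% so now I take supplements',
    'I rated this drug 70% as I feel I still have a long way to go',
    'odd % floating alone',  # invented example of a stray %
]
share = percent_follows_digit_share(samples)
print(share)
```

The same helper should accept the full column, e.g. `percent_follows_digit_share(df.review)`, since iterating over a Series yields its string values. A share close to 1 would support keeping % as is.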

All that's left now is numbers. They should be kept in some contexts, such as timestamps or statements like '9 days ago', but I've already seen that they also appear in nonsense contexts, and I'm not sure there's a simple, clear way to tease these apart. One thing I'll do is, when deleting symbols, not always replace them with a space. Some symbols often sit in the midst of nonsense that includes numbers, and I might want to pull that nonsense out in full later rather than subdivide it into smaller parts that are harder to isolate and analyze.
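To make the trade-off concrete, here is a small sketch comparing the two options on an invented string. The token `12;34:56` is hypothetical and stands in for the kind of number-and-symbol nonsense seen in some reviews.

```python
import re

# Hypothetical nonsense token with symbols embedded among digits
text = 'took it at 12;34:56 then stopped'

# Replacing each symbol with a space splits the token into three pieces
with_space = re.sub(r'[;:]', ' ', text)

# Deleting each symbol keeps the token in one piece for later inspection
deleted = re.sub(r'[;:]', '', text)

print(with_space)  # took it at 12 34 56 then stopped
print(deleted)     # took it at 123456 then stopped
```

With the symbols deleted outright, the long digit run can still be found later with a single pattern; with spaces inserted, it looks like three ordinary short numbers.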

<font color='violet'> Summary so far of what to do with symbols:
- Leave for now and investigate further later: 0-9
- Keep where they exist: ! $ + = ? %
- Delete everywhere: # 
- Delete if surrounded by spaces or adjacent to numbers rather than adjacent to another symbol: ( ) : ; 
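The rules in this summary could be sketched as one function. This is only one possible reading of the last rule, not necessarily the approach the notebook ends up using. It assumes that a "symbol" means any character that is neither alphanumeric nor whitespace, that the start and end of a review count as spaces, and that a ( ) : ; next to a letter is left alone. The names `is_symbol` and `apply_symbol_rules` are hypothetical.

```python
SYMBOLS_TO_CHECK = set('():;')

def is_symbol(ch):
    """True for characters that are neither alphanumeric nor whitespace."""
    return not ch.isalnum() and not ch.isspace()

def apply_symbol_rules(text):
    # Rule: delete # everywhere
    text = text.replace('#', '')
    kept = []
    for i, ch in enumerate(text):
        if ch in SYMBOLS_TO_CHECK:
            # Assumption: treat the string boundaries as spaces
            before = text[i - 1] if i > 0 else ' '
            after = text[i + 1] if i + 1 < len(text) else ' '
            surrounded_by_spaces = before.isspace() and after.isspace()
            next_to_digit = before.isdigit() or after.isdigit()
            next_to_symbol = is_symbol(before) or is_symbol(after)
            # Rule: delete ( ) : ; if surrounded by spaces, or adjacent to
            # a number and not adjacent to another symbol
            if surrounded_by_spaces or (next_to_digit and not next_to_symbol):
                continue
        # Everything else, including ! $ + = ? % and 0-9, is kept
        kept.append(ch)
    return ''.join(kept)

print(apply_symbol_rules('dose #2 at 12:30 ; fine'))
print(apply_symbol_rules('smile :) stays'))
```

Neighbors are read from the text before any ( ) : ; is removed, so a run such as `:)` stays intact rather than being eaten one character at a time. It could be applied to the column with `df.review.apply(apply_symbol_rules)`.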

In [70]:
df['review'] = df.review.str.replace('#', '')
df[df.review.str.find('#')!=-1]

Unnamed: 0,rating,condition,review,date,drug0,drug1,drug2,drug3,drug4,drug5,drug6,drug7,drug8,drug9,drug10,drug11,drug12,drug13,drug14,drug15


In [None]:
# How many rows have ; before trying to strip them from between numbers?
len(df[df.review.str.find(';')!=-1])

<font color='violet'> Pick up here: continue the work of removing symbols that are surrounded by numbers or spaces

In [None]:
for row in tqdm(range(len(df))):
    # Isolate each review
    string_to_strip = df.loc[row,'review']
    # Delete every ; that sits directly between two digits. The lookarounds
    # leave the digits in place and also catch chains such as 1;2;3
    string_to_strip = re.sub(r'(?<=[0-9]);(?=[0-9])', '', string_to_strip)
    # Put the new, stripped string back in place
    df.loc[row,'review'] = string_to_strip

# How many rows still have ;
df[df.review.str.find(';')!=-1]

In [None]:
# Do lemmatization and stopword removal after text is otherwise very clean. 
# nlp = spacy.load('en_core_web_sm')

# TODO: change this stopword list
# stopwords = spacy.lang.en.stop_words.STOP_WORDS        

# df['no_stops_lemm'] = df.review.apply(lambda text: " ".join(token.lemma_ for token in nlp(text) if not token.is_stop))

#df['no_stops_lemm'].head()

Resources with tips for effective EDA visualization with NLP:

https://medium.com/plotly/nlp-visualisations-for-clear-immediate-insights-into-text-data-and-outputs-9ebfab168d5b
    
https://www.numpyninja.com/post/nlp-text-data-visualization
    
https://www.kaggle.com/code/sainathkrothapalli/nlp-visualisation-guide
    
https://medium.com/acing-ai/visualizations-in-natural-language-processing-2ca60dd34ce
    
https://towardsdatascience.com/a-complete-exploratory-data-analysis-and-visualization-for-text-data-29fb1b96fb6a
    
https://towardsdatascience.com/getting-started-with-text-nlp-visualization-9dcb54bc91dd
    
https://www.kaggle.com/code/mitramir5/nlp-visualization-eda-glove
    
https://medium.com/analytics-vidhya/how-to-begin-performing-eda-on-nlp-ffdef92bedf6
    
https://inside-machinelearning.com/en/eda-nlp/
    
https://towardsdatascience.com/fundamental-eda-techniques-for-nlp-f81a93696a75
    
https://neptune.ai/blog/exploratory-data-analysis-natural-language-processing-tools
    
https://www.kdnuggets.com/2019/05/complete-exploratory-data-analysis-visualization-text-data.html
    
