## _Analyzing the Disease and Instrument Name with the Action Performed_
***
DESCRIPTION

This notebook shows how to find disease and instrument names in free text describing the action performed, using FuzzyWuzzy string matching.

There are many ways to compare strings in Python. Some of the main ones are:

- Using regex
- Simple comparison
- Using difflib

Compared to these, one of the easiest methods is the fuzzywuzzy library, which returns a similarity score from 0 to 100, where 100 means the two strings are identical.
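
To make the comparison between these approaches concrete, here is a minimal sketch using only the standard library. The example strings and the `similarity` helper are illustrative assumptions, not part of the original notebook; the helper scales difflib's ratio to the same 0 to 100 range that fuzzywuzzy reports.

```python
import re
from difflib import SequenceMatcher

a = "Urinal Tract Infection"
b = "urinal tract infection"

# Simple compare: exact equality only, so a case difference already fails.
exact_match = (a == b)
case_insensitive_match = (a.lower() == b.lower())

# Regex: finds a known pattern inside longer text (case-insensitive here),
# but only answers yes/no and does not tolerate typos.
text = "Recurrent urinal tract infection was reported."
regex_hit = re.search(r"urinal\s+tract\s+infection", text, flags=re.IGNORECASE) is not None

# difflib: a similarity ratio in [0, 1], scaled to 0-100 for readability.
# `similarity` is a hypothetical helper name used only in this sketch.
def similarity(s1, s2):
    return round(100 * SequenceMatcher(None, s1, s2).ratio())

# A one-letter typo still scores high, which exact comparison cannot express.
typo_score = similarity("urinal tract infection", "urinal tract infecton")

print(exact_match, case_insensitive_match, regex_hit, typo_score)
```

Exact comparison and regex give a yes/no answer, while a similarity score degrades gradually with typos, which is the property the rest of this notebook relies on.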

In [2]:
#import the libraries
import os 
import sys
assert sys.version_info >= (3,5)
#data manipulation
import pandas as pd
import numpy as np
#visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#consistent sized plot
from pylab import rcParams
rcParams['figure.figsize'] = 12,5
rcParams['xtick.labelsize'] = 12
rcParams['ytick.labelsize'] = 12
rcParams['axes.labelsize'] = 12
#display options for dataframe
pd.options.display.max_columns = None
#text processing
import nltk
from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize
from nltk.stem import SnowballStemmer
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
#text feature engineering
from sklearn.feature_extraction.text import TfidfVectorizer
#regular expressions
import re
#string operations
import string
#ignore warnings
import warnings
warnings.filterwarnings(action='ignore',message='')
#compute articles similarity
from sklearn.metrics.pairwise import cosine_similarity
#extract files
from zipfile import ZipFile


In [7]:
filename = '1574332581_analysingthediseaseandinstrumentnamewiththeactionperformed_lessson3.zip'

with ZipFile(filename,'r') as zf:
    #print the contents of the zip file 
    zf.printdir()
    #extract all the contents of the zip file
    print('extracting contents, please wait ... ')
    zf.extractall()
    print('Done, you can continue to work with the file')

File Name                                             Modified             Size
Analysing the disease and instrument name with the action performed_Lessson3.ipynb 2019-11-21 16:04:54         3555
extracting contents, please wait ... 
Done, you can continue to work with the file


In [9]:
from fuzzywuzzy import process
from fuzzywuzzy import fuzz

In [13]:
diseases = ['Heart Attack', 'Brain Tumor', 'Lung Cancer', 'Fever', 'Throat Infection','Urinal Tract Infection']
instruments = [' electrocardiograph' , ' Sterilizers', ' Defibrillators', 'Thermometer', 'ICU','culture']

In [20]:
q= " An electrocardiogram (ECG) is a medical test that detects cardiac (heart) \
     abnormalities by measuring the electrical activity generated by the heart as it contracts.\
     The machine that records the patient’s ECG is called an electrocardiograph."

u = 'Urinal tract infection can become serios if it is not treated early and can lead to surgery. The cases of prostrate surgery \
     and recurrent urinal tract infection is reported as indicated from the urine culture reports. Sterilizers are used in ICU.'

In [21]:
process.extract(q,diseases)

[('Heart Attack', 57),
 ('Fever', 34),
 ('Urinal Tract Infection', 28),
 ('Lung Cancer', 27),
 ('Brain Tumor', 26)]

In [22]:
process.extract(u,diseases)

[('Urinal Tract Infection', 60),
 ('Throat Infection', 57),
 ('Lung Cancer', 31),
 ('Brain Tumor', 26),
 ('Heart Attack', 24)]

In [18]:
process.extract(u,instruments)

[('culture', 57),
 ('ICU', 38),
 (' Sterilizers', 26),
 ('Thermometer', 26),
 (' Defibrillators', 22)]
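
Each `process.extract` call above returns `(choice, score)` pairs sorted by score. Only five of the six choices appear because the default is `limit=5`. The ranking idea can be sketched with the standard library alone. This is a simplified stand-in under stated assumptions, not fuzzywuzzy's actual scorer (which defaults to `fuzz.WRatio`), so its scores will differ from the outputs above. The names `partial_score` and `rank`, and the shortened example sentence, are hypothetical.

```python
from difflib import SequenceMatcher

def partial_score(query, choice):
    """Best 0-100 similarity between `choice` and any word window of
    `query` that has the same number of words as `choice`."""
    q_words = query.lower().split()
    c_words = choice.lower().split()
    n = len(c_words)
    if n == 0 or len(q_words) < n:
        # Nothing to slide over: compare the whole strings instead.
        whole = SequenceMatcher(None, query.lower(), choice.lower()).ratio()
        return round(100 * whole)
    target = " ".join(c_words)
    best = 0.0
    for i in range(len(q_words) - n + 1):
        window = " ".join(q_words[i:i + n])
        best = max(best, SequenceMatcher(None, window, target).ratio())
    return round(100 * best)

def rank(query, choices, limit=5):
    """Score every choice, then return the top `limit` pairs, best first."""
    scored = [(c, partial_score(query, c)) for c in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]

diseases = ['Heart Attack', 'Brain Tumor', 'Lung Cancer', 'Fever',
            'Throat Infection', 'Urinal Tract Infection']
# Shortened example text, assumed for illustration.
u = "Urinal tract infection can become serious if it is not treated early."

ranked = rank(u, diseases)
for name, score in ranked:
    print(name, score)
```

Scoring against word windows rather than the whole sentence keeps a short name from being penalized just because the surrounding text is long.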

In [19]:
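# Note: fuzz.ratio expects two strings. Passing the whole list most likely
# compares q against the list's string representation, so the single score
# below is not a per-disease match.
fuzz.ratio(q,diseases)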

14
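
Since `fuzz.ratio` compares one pair of strings, a per-disease score needs one call per name. The sketch below assumes a shortened version of `q` and wraps the call in a hypothetical `ratio` helper that falls back to a difflib-based stand-in when fuzzywuzzy is not installed. The fallback is an assumption made so the example runs anywhere, and its scores may differ from fuzzywuzzy's.

```python
from difflib import SequenceMatcher

try:
    # Preferred: the library used throughout this notebook.
    from fuzzywuzzy import fuzz

    def ratio(a, b):
        return fuzz.ratio(a, b)
except ImportError:
    # Fallback (assumption): difflib stand-in, scaled to 0-100.
    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

diseases = ['Heart Attack', 'Brain Tumor', 'Lung Cancer', 'Fever',
            'Throat Infection', 'Urinal Tract Infection']
# Shortened example text, assumed for illustration.
q = "An electrocardiogram (ECG) is a medical test that detects cardiac (heart) abnormalities."

# Score the text against each disease name separately.
per_disease = {d: ratio(q, d) for d in diseases}
best = max(per_disease, key=per_disease.get)

for name, score in per_disease.items():
    print(name, score)
```

A plain ratio stays low when a long text is compared with a short name, so for this kind of lookup `process.extract` or `fuzz.partial_ratio` is usually the better fit.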