# Text Mining Example - KeywordsExtractor

## Installation

### Package

```python
# Install the library (uncomment to run inside a notebook)
# !pip install textmining-module==0.1.3  # https://pypi.org/project/textmining-module/
```

```python
# Load the library
from textmining_module import TextMiner
from textmining_module import KeywordsExtractor
```


### Data Exploration

```python
# Sample data
import pandas as pd

# Path to the CSV on the author's machine; point this at your own copy of rows.csv
data = pd.read_csv("C:/Users/kwadw.DESKTOP-T9BSTPE/OneDrive/Desktop/My Python Packages/backwardsreg/rows.csv")
```


```python
# Top rows of the data
data.head(10)
```

| Unnamed: 0 | Topic | Question | VariableName | Responses | Year | Type | DisplayOrder |
|---|---|---|---|---|---|---|---|
| 0 | Health Status/Healthy Days | Would you say that in general your health is--- | GENHLTH | 1=Excellent 2=Very good 3=Good 4=Fair 5=Po... | 2013 | Core Question | 1 |
| 1 | Health Status/Healthy Days | Now thinking about your physical health, which... | PHYSHLTH | __ __=Number of days 88=None 77=DK/NS 99=Re... | 2013 | Core Question | 2 |
| 2 | Health Status/Healthy Days | Now thinking about your mental health, which i... | MENTHLTH | __ __=Number of days 88=None 77=DK/NS 99=Re... | 2013 | Core Question | 3 |
| 3 | Health Status/Healthy Days | During the past 30 days, for about how many da... | POORHLTH | __ __=Number of days 88=None 77=DK/NS 99=Re... | 2013 | Core Question | 4 |
| 4 | Health Care Access | Do you have any kind of health care coverage, ... | HLTHPLN1 | 1=Yes 2=No 7=DK/NS 9=Refused | 2013 | Core Question | 5 |
| 5 | Health Care Coverage/Access | Do you have one person you think of as your pe... | PERSDOC2 | 1=Yes, only one 2=More than one 3=No 7=DK/N... | 2013 | Core Question | 6 |
| 6 | Health Care Coverage/Access | Was there a time during the past 12 months whe... | MEDCOST | 1=Yes 2=No 7=DK/NS 9=Refused | 2013 | Core Question | 7 |
| 7 | Health Care Coverage/Access | About how long has it been since you last visi... | CHECKUP1 | 1=Within past year (anytime less than 12 month... | 2013 | Core Question | 8 |
| 8 | Inadequate Sleep | On average, how many hours of sleep do you get... | SLEPTIM1 | 1-24=Number of hours [1-24] 77=DK/NS 99=Refused | 2013 | Core Question | 9 |
| 9 | Hypertension (Awareness) | Have you ever been told by a doctor, nurse, or... | BPHIGH4 | 1=Yes 2=Yes, but female told only during preg... | 2013 | Core Question | 10 |


### Syntax - KeywordsExtractor

```python
keywords_df = KeywordsExtractor(data,
                                text_column='text_column',
                                method='yake',
                                n=3,
                                stopword_language='english')
```

`KeywordsExtractor` extracts keywords from text stored in a `pandas`
DataFrame and returns the DataFrame with an added `Keywords` column.
Its arguments are:

-   `data` : The `pandas` DataFrame containing the text from which you
    want to extract keywords. It must contain the column named by
    `text_column`.
-   `text_column` : (str) The name of the column in `data` that holds
    the text to extract keywords from.
-   `method` : (str) The keyword-extraction method. The following
    methods are supported:
    -   `frequency` : Extracts keywords based on word frequency,
        excluding common stopwords.
    -   `yake` : Uses YAKE (Yet Another Keyword Extractor), an
        unsupervised method that considers word frequency and
        position.
    -   `tf-idf` : Uses Term Frequency-Inverse Document Frequency,
        which favours words that are frequent in a given text but rare
        across the whole collection, and therefore indicative of that
        text's content.
    -   `pos` : Uses part-of-speech tagging, typically selecting nouns
        as keywords.
    -   `ner` : Uses Named Entity Recognition to identify and extract
        entities (e.g., people, organizations) as keywords.
-   `n` : (int) The number of keywords to extract from each text.
-   `stopword_language` : (str) The language of the stopword list used
    to filter out common words during extraction. This matters for
    methods that remove common words to focus on more meaningful
    content.
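
To make the `frequency` and `tf-idf` descriptions above more concrete, here is a minimal pure-Python sketch of both ideas. It is not the library's implementation: the regex tokenizer, the tiny `STOPWORDS` set and the helper names `frequency_keywords` and `tfidf_keywords` are hypothetical, and the IDF uses the plain `log(N / df)` variant, so scores and rankings will differ from what `KeywordsExtractor` returns.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only);
# a real extractor would use a full list for the chosen language.
STOPWORDS = {"a", "about", "an", "and", "are", "do", "for", "how", "in",
             "is", "much", "of", "or", "that", "the", "to", "without",
             "would", "you", "your"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep runs of letters and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def frequency_keywords(text: str, n: int = 3) -> list[str]:
    """Return the n most frequent non-stopword tokens of a single text."""
    return [word for word, _ in Counter(tokenize(text)).most_common(n)]


def tfidf_keywords(texts: list[str], n: int = 3) -> list[list[str]]:
    """Return the n highest-scoring TF-IDF tokens for each text."""
    docs = [tokenize(t) for t in texts]
    # Document frequency: in how many texts each token appears.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    n_docs = len(docs)
    results = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in term_freq.items()
        }
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        results.append(ranked[:n])
    return results
```

For example, `tfidf_keywords(["health care coverage", "health insurance"], n=1)` ranks `health` last in both texts, because a word that appears in every text gets an IDF of zero. The `frequency` sketch, by contrast, scores each text in isolation and would happily return `health`.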

### Example Usage

```python
# Extract three YAKE keywords from each survey question
keywords_df = KeywordsExtractor(data,
                                text_column='Question',
                                method='yake',
                                n=3,
                                stopword_language='english')
```

``` text
[nltk_data] Downloading package punkt to C:\Users\kwadw.DESKTOP-T9BSTPE\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\kwadw.DESKTOP-T9BSTPE\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to C:\Users\kwadw.DESKTOP-T9BSTPE\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-date!
```


```python
keywords_df
```

| Unnamed: 0 | Topic | Question | VariableName | Responses | Year | Type | DisplayOrder | Keywords |
|---|---|---|---|---|---|---|---|---|
| 0 | Health Status/Healthy Days | Would you say that in general your health is--- | GENHLTH | 1=Excellent 2=Very good 3=Good 4=Fair 5=Po... | 2013 | Core Question | 1 | [general your health, general, health] |
| 1 | Health Status/Healthy Days | Now thinking about your physical health, which... | PHYSHLTH | __ __=Number of days 88=None 77=DK/NS 99=Re... | 2013 | Core Question | 2 | [includes physical illness, physical health, i... |
| 2 | Health Status/Healthy Days | Now thinking about your mental health, which i... | MENTHLTH | __ __=Number of days 88=None 77=DK/NS 99=Re... | 2013 | Core Question | 3 | [mental health, includes stress, problems with... |
| 3 | Health Status/Healthy Days | During the past 30 days, for about how many da... | POORHLTH | __ __=Number of days 88=None 77=DK/NS 99=Re... | 2013 | Core Question | 4 | [usual activities, poor physical, physical or ... |
| 4 | Health Care Access | Do you have any kind of health care coverage, ... | HLTHPLN1 | 1=Yes 2=No 7=DK/NS 9=Refused | 2013 | Core Question | 5 | [Indian Health Service, including health insur... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5529 | Demographics | Have you ever served on active duty in the Uni... | VETERAN3 | 1=Yes 2=No 7=Don’t know/Not Sure 9=Refused | 2014 | Core Question | 30 | [States Armed Forces, United States Armed, mil... |
| 5530 | Demographics | About how much do you weigh without shoes? | WEIGHT2 | 50-0999=Weight (pounds) 9000-9998=Weight (kilo... | 2014 | Core Question | 36 | [weigh without shoes, shoes, weigh] |
| 5531 | HIV/AIDS | Where did you have your last HIV test — at a p... | WHRTST10 | 1=Private doctor or HMO 2=Counseling and testi... | 2014 | Core Question | 92 | [HIV test, HMO office, doctor or HMO] |
| 5532 | Sodium or Salt-Related Behavior | Are you currently watching or reducing your so... | WTCHSALT | 1=Yes 2=No 7=Don’t know/Not Sure 9=Refused | 2014 | Module Question | 123 | [salt intake, watching or reducing, reducing y... |
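
The `Keywords` column holds one list of keywords per question, so a natural follow-up is to count keywords across the whole survey. The sketch below assumes each cell is a Python list of strings, as the bracketed output suggests, and lowercases keywords so that case variants are counted together; the helper name `top_keywords` is hypothetical and not part of `textmining_module`.

```python
from collections import Counter
from collections.abc import Iterable


def top_keywords(keyword_lists: Iterable, k: int = 10) -> list[tuple[str, int]]:
    """Count keywords across rows and return the k most common.

    Assumes each row is a list (or tuple) of keyword strings; any other
    value, such as None or NaN for an empty text, is skipped.
    """
    counts = Counter()
    for keywords in keyword_lists:
        if isinstance(keywords, (list, tuple)):
            # Lowercase so "Health" and "health" count as one keyword.
            counts.update(kw.lower() for kw in keywords)
    # Ties keep the order in which keywords were first seen.
    return counts.most_common(k)
```

Applied to the result above, this would be called as `top_keywords(keywords_df["Keywords"], k=10)`. Counting whole phrases (rather than splitting them into words) keeps multi-word keywords such as `salt intake` intact.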
