### Gensim
+ Gensim is an open-source Python NLP library best known for topic modeling and document similarity
+ It can stream documents one at a time, so it handles large text files without loading the entire file into memory.
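Streaming works because Gensim accepts any iterable of documents as a corpus. A minimal sketch, assuming a hypothetical plain-text file with one document per line (`StreamedCorpus` is an illustrative name, not a Gensim class):

```python
from typing import Iterator, List


class StreamedCorpus:
    """Yield one tokenized document per line of a text file.

    The file is re-opened on every iteration, so the corpus can be iterated
    more than once (multi-pass training needs this), yet only one line is
    held in memory at a time.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.lower().split()
                if tokens:  # skip blank lines
                    yield tokens
```

Assuming such a file exists (say, a hypothetical `docs.txt`), `corpora.Dictionary(StreamedCorpus("docs.txt"))` would build a dictionary without reading the whole file into memory.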


#### Topic Modeling
+ Topic modeling is a technique for extracting the underlying topics from large volumes of text.
+ The topics are hidden (latent): they are not labeled in the text, so the model infers them from words that tend to occur together.


#### Installation
+ pip install gensim

#### API Overview

![](gensim_img01.png)
+ corpora
+ models
+ similarities
+ summarization (removed in Gensim 4.0)
+ topic_coherence
+ parsing
+ etc.

### Overview and Workflow
+ Document: a piece of text, such as a sentence or a paragraph.
+ Corpus: a collection of documents.
+ Dictionary: a mapping between each unique token and an integer ID, used to build bag-of-words vectors.
+ Vector: a numeric, mathematically convenient representation of a document (for example, a bag of words).
+ Model: an algorithm for transforming vectors from one representation to another.

![](gensim_img02.png)

    
### Workflow
![](gensim_img03.png)
+ text ==> list of tokens ==> dictionary (token, ID pairs) ==> doc2bow() maps existing or new documents to bag-of-words vectors ==> model (trained on the bag-of-words corpus; transforms those vectors into another representation, e.g. TF-IDF weights or topics)

In [1]:
mytext = """AI has a variety of applications that are anything from super-human to super-machine. Analytics, big data and automation are often the first places the mind of a marketer, brand manager, or support leader goes when thinking about how they can apply AI to their work, but Conversational AI offers a whole new category of capabilities that business leaders need to consider when they serve their customers and stakeholders.
The nativity of Jesus, nativity of Christ, birth of Christ or birth of Jesus is described in the Biblical gospels of Luke and Matthew. The two accounts agree that Jesus was born in Bethlehem in Judea, his mother Mary was betrothed to a man named Joseph, who was descended from King David and was not his biological father, and that his birth was caused by divine intervention.

The nativity is the basis for the Christian holiday of Christmas on December 25, and plays a major role in the Christian liturgical year. Many Christians traditionally display small manger scenes depicting the nativity in their homes, or attend Nativity Plays or Christmas pageants focusing on the nativity cycle in the Bible. Elaborate nativity displays called "creche scenes", featuring life-sized statues, are a tradition in many continental European countries during the Christmas season.

Christian congregations of the Western tradition (including the Catholic Church, the Western Rite Orthodox, the Anglican Communion, and many other Protestants, such as the Moravian Church) begin observing the season of Advent four Sundays before Christmas. Christians of the Eastern Orthodox Church and Oriental Orthodox Church observe a similar season, sometimes called Advent but also called the "Nativity Fast", which begins forty days before Christmas. 
Conversational AI is any machine that a person can talk to. This could be a chatbot on a website or social messaging app, a voice assistant or voice-enabled device, or any other interactive messaging-enabled interface. These solutions allow people to ask questions, get opinions or recommendations, execute transactions, find support or otherwise achieve a context-dependent goal through conversation."""

In [2]:
# Load the Pkg
import gensim

In [3]:
# Methods/Attrib
dir(gensim)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_matutils',
 'corpora',
 'interfaces',
 'logger',
 'logging',
 'matutils',
 'models',
 'parsing',
 'similarities',
 'topic_coherence',
 'utils']

In [4]:
# Preview
print(mytext)

AI has a variety of applications that are anything from super-human to super-machine. Analytics, big data and automation are often the first places the mind of a marketer, brand manager, or support leader goes when thinking about how they can apply AI to their work, but Conversational AI offers a whole new category of capabilities that business leaders need to consider when they serve their customers and stakeholders.
The nativity of Jesus, nativity of Christ, birth of Christ or birth of Jesus is described in the Biblical gospels of Luke and Matthew. The two accounts agree that Jesus was born in Bethlehem in Judea, his mother Mary was betrothed to a man named Joseph, who was descended from King David and was not his biological father, and that his birth was caused by divine intervention.

The nativity is the basis for the Christian holiday of Christmas on December 25, and plays a major role in the Christian liturgical year. Many Christians traditionally display small manger scenes depi

In [5]:
# Remove stopwords
import neattext.functions as nfx  # third-party package: pip install neattext

In [6]:
docx = nfx.remove_stopwords(mytext)

In [7]:
docx

'AI variety applications super-human super-machine. Analytics, big data automation places mind marketer, brand manager, support leader goes thinking apply AI work, Conversational AI offers new category capabilities business leaders need consider serve customers stakeholders. nativity Jesus, nativity Christ, birth Christ birth Jesus described Biblical gospels Luke Matthew. accounts agree Jesus born Bethlehem Judea, mother Mary betrothed man named Joseph, descended King David biological father, birth caused divine intervention. nativity basis Christian holiday Christmas December 25, plays major role Christian liturgical year. Christians traditionally display small manger scenes depicting nativity homes, attend Nativity Plays Christmas pageants focusing nativity cycle Bible. Elaborate nativity displays called "creche scenes", featuring life-sized statues, tradition continental European countries Christmas season. Christian congregations Western tradition (including Catholic Church, Wester

In [9]:
# Step 1: Tokenization
# A List of tokens
# Method 1
for sent in docx.split('.'):
    print(sent)
    for token in sent.split():
        print(token)

AI variety applications super-human super-machine
AI
variety
applications
super-human
super-machine
 Analytics, big data automation places mind marketer, brand manager, support leader goes thinking apply AI work, Conversational AI offers new category capabilities business leaders need consider serve customers stakeholders
Analytics,
big
data
automation
places
mind
marketer,
brand
manager,
support
leader
goes
thinking
apply
AI
work,
Conversational
AI
offers
new
category
capabilities
business
leaders
need
consider
serve
customers
stakeholders
 nativity Jesus, nativity Christ, birth Christ birth Jesus described Biblical gospels Luke Matthew
nativity
Jesus,
nativity
Christ,
birth
Christ
birth
Jesus
described
Biblical
gospels
Luke
Matthew
 accounts agree Jesus born Bethlehem Judea, mother Mary betrothed man named Joseph, descended King David biological father, birth caused divine intervention
accounts
agree
Jesus
born
Bethlehem
Judea,
mother
Mary
betrothed
man
named
Joseph,
descended
King
D

In [10]:
# One list of tokens per sentence (sentences split on '.')
list_of_tokens = [sent.split() for sent in docx.split('.')]

In [11]:
print(list_of_tokens)

[['AI', 'variety', 'applications', 'super-human', 'super-machine'], ['Analytics,', 'big', 'data', 'automation', 'places', 'mind', 'marketer,', 'brand', 'manager,', 'support', 'leader', 'goes', 'thinking', 'apply', 'AI', 'work,', 'Conversational', 'AI', 'offers', 'new', 'category', 'capabilities', 'business', 'leaders', 'need', 'consider', 'serve', 'customers', 'stakeholders'], ['nativity', 'Jesus,', 'nativity', 'Christ,', 'birth', 'Christ', 'birth', 'Jesus', 'described', 'Biblical', 'gospels', 'Luke', 'Matthew'], ['accounts', 'agree', 'Jesus', 'born', 'Bethlehem', 'Judea,', 'mother', 'Mary', 'betrothed', 'man', 'named', 'Joseph,', 'descended', 'King', 'David', 'biological', 'father,', 'birth', 'caused', 'divine', 'intervention'], ['nativity', 'basis', 'Christian', 'holiday', 'Christmas', 'December', '25,', 'plays', 'major', 'role', 'Christian', 'liturgical', 'year'], ['Christians', 'traditionally', 'display', 'small', 'manger', 'scenes', 'depicting', 'nativity', 'homes,', 'attend', 'Na

In [12]:
# Method 2: Tokenization using simple_preprocess
from gensim.utils import simple_preprocess

In [13]:
# Tokenize our text (the whole string at once, so the result is one flat list)
list_of_tokens2 = simple_preprocess(docx)

In [14]:
list_of_tokens2

['ai',
 'variety',
 'applications',
 'super',
 'human',
 'super',
 'machine',
 'analytics',
 'big',
 'data',
 'automation',
 'places',
 'mind',
 'marketer',
 'brand',
 'manager',
 'support',
 'leader',
 'goes',
 'thinking',
 'apply',
 'ai',
 'work',
 'conversational',
 'ai',
 'offers',
 'new',
 'category',
 'capabilities',
 'business',
 'leaders',
 'need',
 'consider',
 'serve',
 'customers',
 'stakeholders',
 'nativity',
 'jesus',
 'nativity',
 'christ',
 'birth',
 'christ',
 'birth',
 'jesus',
 'described',
 'biblical',
 'gospels',
 'luke',
 'matthew',
 'accounts',
 'agree',
 'jesus',
 'born',
 'bethlehem',
 'judea',
 'mother',
 'mary',
 'betrothed',
 'man',
 'named',
 'joseph',
 'descended',
 'king',
 'david',
 'biological',
 'father',
 'birth',
 'caused',
 'divine',
 'intervention',
 'nativity',
 'basis',
 'christian',
 'holiday',
 'christmas',
 'december',
 'plays',
 'major',
 'role',
 'christian',
 'liturgical',
 'year',
 'christians',
 'traditionally',
 'display',
 'small',
 'ma
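Compared with Method 1, `simple_preprocess` lowercases the text, keeps only alphabetic tokens (so punctuation and numbers such as `25` disappear, and `super-human` becomes `super` and `human`), and by default drops tokens shorter than 2 or longer than 15 characters. Because the whole string was passed in, the sentence boundaries are lost. The stdlib sketch below approximates those rules (it is not Gensim's code and skips the optional accent stripping) and applies them sentence by sentence; with Gensim itself, `[simple_preprocess(sent) for sent in docx.split('.')]` gives a per-sentence result.

```python
import re
from typing import List

# Approximation of gensim.utils.simple_preprocess (assumed rules: lowercase,
# alphabetic runs only, 2 <= len(token) <= 15, no leading underscore).
ALPHABETIC = re.compile(r"(?:(?!\d)\w)+")


def simple_tokens(text: str, min_len: int = 2, max_len: int = 15) -> List[str]:
    """Lowercase, keep alphabetic runs, and filter by length."""
    tokens = ALPHABETIC.findall(text.lower())
    return [t for t in tokens if min_len <= len(t) <= max_len and not t.startswith("_")]


def tokenize_sentences(text: str) -> List[List[str]]:
    """Split on '.' and tokenize each sentence, skipping empty ones."""
    sentences = (simple_tokens(sent) for sent in text.split("."))
    return [tokens for tokens in sentences if tokens]
```

Skipping empty sentences also avoids the empty last "document" that `docx.split('.')` produces when the text ends with a period.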

In [None]:
# Step 2: Map our tokens to IDs with a Dictionary
# corpora.Dictionary()

In [15]:
from gensim import corpora

In [16]:
# Methods of Corpora
dir(corpora)

['BleiCorpus',
 'Dictionary',
 'HashDictionary',
 'IndexedCorpus',
 'LowCorpus',
 'MalletCorpus',
 'MmCorpus',
 'OpinosisCorpus',
 'SvmLightCorpus',
 'TextCorpus',
 'TextDirectoryCorpus',
 'UciCorpus',
 'WikiCorpus',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_mmreader',
 'bleicorpus',
 'dictionary',
 'hashdictionary',
 'indexedcorpus',
 'lowcorpus',
 'malletcorpus',
 'mmcorpus',
 'opinosiscorpus',
 'svmlightcorpus',
 'textcorpus',
 'ucicorpus',
 'wikicorpus']

In [17]:
my_dict = corpora.Dictionary(list_of_tokens)

In [18]:
print(my_dict)

Dictionary(158 unique tokens: ['AI', 'applications', 'super-human', 'super-machine', 'variety']...)


In [19]:
# How to see each token and its ID
my_dict.token2id

{'AI': 0,
 'applications': 1,
 'super-human': 2,
 'super-machine': 3,
 'variety': 4,
 'Analytics,': 5,
 'Conversational': 6,
 'apply': 7,
 'automation': 8,
 'big': 9,
 'brand': 10,
 'business': 11,
 'capabilities': 12,
 'category': 13,
 'consider': 14,
 'customers': 15,
 'data': 16,
 'goes': 17,
 'leader': 18,
 'leaders': 19,
 'manager,': 20,
 'marketer,': 21,
 'mind': 22,
 'need': 23,
 'new': 24,
 'offers': 25,
 'places': 26,
 'serve': 27,
 'stakeholders': 28,
 'support': 29,
 'thinking': 30,
 'work,': 31,
 'Biblical': 32,
 'Christ': 33,
 'Christ,': 34,
 'Jesus': 35,
 'Jesus,': 36,
 'Luke': 37,
 'Matthew': 38,
 'birth': 39,
 'described': 40,
 'gospels': 41,
 'nativity': 42,
 'Bethlehem': 43,
 'David': 44,
 'Joseph,': 45,
 'Judea,': 46,
 'King': 47,
 'Mary': 48,
 'accounts': 49,
 'agree': 50,
 'betrothed': 51,
 'biological': 52,
 'born': 53,
 'caused': 54,
 'descended': 55,
 'divine': 56,
 'father,': 57,
 'intervention': 58,
 'man': 59,
 'mother': 60,
 'named': 61,
 '25,': 62,
 'Christ

In [20]:
# Check for the type
type(my_dict.token2id)

dict

In [21]:
# How to get the id of a token
my_dict.token2id['applications']

1

In [None]:
### Using Pure Python
# + list of tokens
# + assign ID(number) via enumeration

In [22]:
list_of_tokens[0]

['AI', 'variety', 'applications', 'super-human', 'super-machine']

In [24]:
for i,token in enumerate(list_of_tokens[0]):
    print(token,i)

AI 0
variety 1
applications 2
super-human 3
super-machine 4


In [25]:
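def custom_dictionary_maker(list_of_tokens):
    # set() drops duplicate tokens, but the iteration order of a set of strings
    # can change between Python sessions, so the IDs assigned here are arbitrary
    dict_map = {token: i for i, token in enumerate(set(list_of_tokens))}
    return dict_map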

In [26]:
custom_dictionary_maker(list_of_tokens[0])

{'AI': 0,
 'applications': 1,
 'variety': 2,
 'super-machine': 3,
 'super-human': 4}

In [27]:
# Find the ID of a token
cust_dict = custom_dictionary_maker(list_of_tokens[0])

In [28]:
cust_dict['applications']

1
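Because `custom_dictionary_maker` enumerates a `set`, its IDs can differ from one session to the next. The `token2id` output of `corpora.Dictionary` above suggests a deterministic rule: each document's new tokens receive the next free IDs in sorted order (`'AI': 0, 'applications': 1, ..., 'variety': 4`, then `'Analytics,': 5, 'Conversational': 6, ...`). A pure-Python sketch of that rule, which can also extend an existing mapping in the spirit of `add_documents()` (`build_token2id` is a hypothetical helper, not Gensim's implementation):

```python
from typing import Dict, Iterable, List, Optional


def build_token2id(documents: Iterable[List[str]],
                   token2id: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Assign integer IDs to tokens, document by document.

    Tokens not seen before get the next free IDs, in sorted order within each
    document, so the result is the same on every run. Passing an existing
    mapping (assumed to use IDs 0..n-1) extends a copy of it.
    """
    mapping = dict(token2id) if token2id else {}
    for doc in documents:
        new_tokens = sorted({token for token in doc if token not in mapping})
        for token in new_tokens:
            mapping[token] = len(mapping)  # next free ID
    return mapping
```

For example, adding `['natural', 'language', 'processing']` to a five-token mapping assigns `language`, `natural`, `processing` the IDs 5, 6 and 7.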

### Workflow
![](gensim_img03.png)

In [47]:
my_dict.token2id['language']

KeyError: 'language'

In [48]:
# How to Add More Words/Tokens to your dictionary
ex2 = "natural language processing"

In [49]:
# Step 1: Convert the text to a list of tokens
ex2.split()

['natural', 'language', 'processing']

In [50]:
my_dict.add_documents([ex2.split()])

In [51]:
my_dict.token2id

{'AI': 0,
 'anything': 1,
 'applications': 2,
 'super-human': 3,
 'super-machine': 4,
 'variety': 5,
 'Analytics,': 6,
 'Conversational': 7,
 'apply': 8,
 'automation': 9,
 'big': 10,
 'brand': 11,
 'business': 12,
 'capabilities': 13,
 'category': 14,
 'consider': 15,
 'customers': 16,
 'data': 17,
 'first': 18,
 'goes': 19,
 'leader': 20,
 'leaders': 21,
 'manager,': 22,
 'marketer,': 23,
 'mind': 24,
 'need': 25,
 'new': 26,
 'offers': 27,
 'often': 28,
 'places': 29,
 'serve': 30,
 'stakeholders': 31,
 'support': 32,
 'thinking': 33,
 'whole': 34,
 'work,': 35,
 'Biblical': 36,
 'Christ': 37,
 'Christ,': 38,
 'Jesus': 39,
 'Jesus,': 40,
 'Luke': 41,
 'Matthew': 42,
 'birth': 43,
 'described': 44,
 'gospels': 45,
 'nativity': 46,
 'Bethlehem': 47,
 'David': 48,
 'Joseph,': 49,
 'Judea,': 50,
 'King': 51,
 'Mary': 52,
 'accounts': 53,
 'agree': 54,
 'betrothed': 55,
 'biological': 56,
 'born': 57,
 'caused': 58,
 'descended': 59,
 'divine': 60,
 'father,': 61,
 'intervention': 62,
 '

In [52]:
my_dict.token2id['language']

172

In [55]:
# Lowercase first so that 'AI' and 'ai' map to the same token
docx = nfx.remove_stopwords(mytext.lower())

In [56]:
docx

'ai variety applications anything super-human super-machine. analytics, big data automation often first places mind marketer, brand manager, support leader goes thinking apply ai work, conversational ai offers whole new category capabilities business leaders need consider serve customers stakeholders. nativity jesus, nativity christ, birth christ birth jesus described biblical gospels luke matthew. two accounts agree jesus born bethlehem judea, mother mary betrothed man named joseph, descended king david biological father, birth caused divine intervention. nativity basis christian holiday christmas december 25, plays major role christian liturgical year. many christians traditionally display small manger scenes depicting nativity homes, attend nativity plays christmas pageants focusing nativity cycle bible. elaborate nativity displays called "creche scenes", featuring life-sized statues, tradition many continental european countries christmas season. christian congregations western tra

In [57]:
# One list of tokens per sentence (sentences split on '.')
list_of_tokens = [sent.split() for sent in docx.split('.')]

In [58]:
list_of_tokens

[['ai', 'variety', 'applications', 'anything', 'super-human', 'super-machine'],
 ['analytics,',
  'big',
  'data',
  'automation',
  'often',
  'first',
  'places',
  'mind',
  'marketer,',
  'brand',
  'manager,',
  'support',
  'leader',
  'goes',
  'thinking',
  'apply',
  'ai',
  'work,',
  'conversational',
  'ai',
  'offers',
  'whole',
  'new',
  'category',
  'capabilities',
  'business',
  'leaders',
  'need',
  'consider',
  'serve',
  'customers',
  'stakeholders'],
 ['nativity',
  'jesus,',
  'nativity',
  'christ,',
  'birth',
  'christ',
  'birth',
  'jesus',
  'described',
  'biblical',
  'gospels',
  'luke',
  'matthew'],
 ['two',
  'accounts',
  'agree',
  'jesus',
  'born',
  'bethlehem',
  'judea,',
  'mother',
  'mary',
  'betrothed',
  'man',
  'named',
  'joseph,',
  'descended',
  'king',
  'david',
  'biological',
  'father,',
  'birth',
  'caused',
  'divine',
  'intervention'],
 ['nativity',
  'basis',
  'christian',
  'holiday',
  'christmas',
  'december',
 

In [60]:
my_dict = corpora.Dictionary(list_of_tokens)

In [61]:
print(my_dict)

Dictionary(169 unique tokens: ['ai', 'anything', 'applications', 'super-human', 'super-machine']...)


### Workflow
![](gensim_img03.png)

#### Mapping Our Documents with our Dictionary to Create a Bag of Words with Counts/Frequencies
+ .doc2bow()

In [62]:
# Tokens
print(list_of_tokens)

[['ai', 'variety', 'applications', 'anything', 'super-human', 'super-machine'], ['analytics,', 'big', 'data', 'automation', 'often', 'first', 'places', 'mind', 'marketer,', 'brand', 'manager,', 'support', 'leader', 'goes', 'thinking', 'apply', 'ai', 'work,', 'conversational', 'ai', 'offers', 'whole', 'new', 'category', 'capabilities', 'business', 'leaders', 'need', 'consider', 'serve', 'customers', 'stakeholders'], ['nativity', 'jesus,', 'nativity', 'christ,', 'birth', 'christ', 'birth', 'jesus', 'described', 'biblical', 'gospels', 'luke', 'matthew'], ['two', 'accounts', 'agree', 'jesus', 'born', 'bethlehem', 'judea,', 'mother', 'mary', 'betrothed', 'man', 'named', 'joseph,', 'descended', 'king', 'david', 'biological', 'father,', 'birth', 'caused', 'divine', 'intervention'], ['nativity', 'basis', 'christian', 'holiday', 'christmas', 'december', '25,', 'plays', 'major', 'role', 'christian', 'liturgical', 'year'], ['many', 'christians', 'traditionally', 'display', 'small', 'manger', 'sce

In [63]:
# Dictionary 
print(my_dict)

Dictionary(169 unique tokens: ['ai', 'anything', 'applications', 'super-human', 'super-machine']...)


In [64]:
# Original Document
docx

'ai variety applications anything super-human super-machine. analytics, big data automation often first places mind marketer, brand manager, support leader goes thinking apply ai work, conversational ai offers whole new category capabilities business leaders need consider serve customers stakeholders. nativity jesus, nativity christ, birth christ birth jesus described biblical gospels luke matthew. two accounts agree jesus born bethlehem judea, mother mary betrothed man named joseph, descended king david biological father, birth caused divine intervention. nativity basis christian holiday christmas december 25, plays major role christian liturgical year. many christians traditionally display small manger scenes depicting nativity homes, attend nativity plays christmas pageants focusing nativity cycle bible. elaborate nativity displays called "creche scenes", featuring life-sized statues, tradition many continental european countries christmas season. christian congregations western tra

In [65]:
# One bag-of-words vector per document (sentence)
bow_corpus = [my_dict.doc2bow(tokens) for tokens in list_of_tokens]

In [66]:
bow_corpus

[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)],
 [(0, 2),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 1),
  (15, 1),
  (16, 1),
  (17, 1),
  (18, 1),
  (19, 1),
  (20, 1),
  (21, 1),
  (22, 1),
  (23, 1),
  (24, 1),
  (25, 1),
  (26, 1),
  (27, 1),
  (28, 1),
  (29, 1),
  (30, 1),
  (31, 1),
  (32, 1),
  (33, 1),
  (34, 1),
  (35, 1)],
 [(36, 1),
  (37, 2),
  (38, 1),
  (39, 1),
  (40, 1),
  (41, 1),
  (42, 1),
  (43, 1),
  (44, 1),
  (45, 1),
  (46, 2)],
 [(37, 1),
  (42, 1),
  (47, 1),
  (48, 1),
  (49, 1),
  (50, 1),
  (51, 1),
  (52, 1),
  (53, 1),
  (54, 1),
  (55, 1),
  (56, 1),
  (57, 1),
  (58, 1),
  (59, 1),
  (60, 1),
  (61, 1),
  (62, 1),
  (63, 1),
  (64, 1),
  (65, 1),
  (66, 1)],
 [(46, 1),
  (67, 1),
  (68, 1),
  (69, 2),
  (70, 1),
  (71, 1),
  (72, 1),
  (73, 1),
  (74, 1),
  (75, 1),
  (76, 1),
  (77, 1)],
 [(46, 3),
  (70, 1),
  (75, 1),
  (78, 1),
  (79, 1),
  (80, 1),
  (81, 1),
  (82, 1),
  (83, 1),
  (84, 1),
  (8

In [None]:
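# What do the numbers mean?
+ Each pair is (token_id, count): the token's ID in the dictionary and how many times the token occurs in that document
+ The count serves as the feature value for that token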
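Conceptually, `doc2bow()` counts the tokens of one document, replaces each token with its dictionary ID, and returns the pairs sorted by ID; tokens missing from the dictionary are ignored by default. A stdlib sketch of that idea (`doc_to_bow` is a hypothetical stand-in, not Gensim's code):

```python
from collections import Counter
from typing import Dict, List, Tuple


def doc_to_bow(tokens: List[str], token2id: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (token_id, count) pairs sorted by token_id; unknown tokens are skipped."""
    counts = Counter(token for token in tokens if token in token2id)
    return sorted((token2id[token], count) for token, count in counts.items())
```

With `token2id = {'ai': 0, 'apply': 1, 'work': 2}`, the document `['ai', 'apply', 'ai', 'robots']` becomes `[(0, 2), (1, 1)]`: `ai` appears twice and the unknown `robots` is dropped.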

In [None]:
# How to Make it Human Readable (How to get the token and the count)


In [88]:
# Get the token using id
my_dict[1]

'anything'

In [89]:
# Get the id using the token
my_dict.token2id["anything"]

1

In [93]:
for word_vec in bow_corpus:
    for token_id, count in word_vec:
        print(my_dict[token_id], count)

ai 1
anything 1
applications 1
super-human 1
super-machine 1
variety 1
ai 2
analytics, 1
apply 1
automation 1
big 1
brand 1
business 1
capabilities 1
category 1
consider 1
conversational 1
customers 1
data 1
first 1
goes 1
leader 1
leaders 1
manager, 1
marketer, 1
mind 1
need 1
new 1
offers 1
often 1
places 1
serve 1
stakeholders 1
support 1
thinking 1
whole 1
work, 1
biblical 1
birth 2
christ 1
christ, 1
described 1
gospels 1
jesus 1
jesus, 1
luke 1
matthew 1
nativity 2
birth 1
jesus 1
accounts 1
agree 1
bethlehem 1
betrothed 1
biological 1
born 1
caused 1
david 1
descended 1
divine 1
father, 1
intervention 1
joseph, 1
judea, 1
king 1
man 1
mary 1
mother 1
named 1
two 1
nativity 1
25, 1
basis 1
christian 2
christmas 1
december 1
holiday 1
liturgical 1
major 1
plays 1
role 1
year 1
nativity 3
christmas 1
plays 1
attend 1
bible 1
christians 1
cycle 1
depicting 1
display 1
focusing 1
homes, 1
manger 1
many 1
pageants 1
scenes 1
small 1
traditionally 1
nativity 1
christmas 1
many 1
"crech

In [94]:
bow_corpus

[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)],
 [(0, 2),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 1),
  (15, 1),
  (16, 1),
  (17, 1),
  (18, 1),
  (19, 1),
  (20, 1),
  (21, 1),
  (22, 1),
  (23, 1),
  (24, 1),
  (25, 1),
  (26, 1),
  (27, 1),
  (28, 1),
  (29, 1),
  (30, 1),
  (31, 1),
  (32, 1),
  (33, 1),
  (34, 1),
  (35, 1)],
 [(36, 1),
  (37, 2),
  (38, 1),
  (39, 1),
  (40, 1),
  (41, 1),
  (42, 1),
  (43, 1),
  (44, 1),
  (45, 1),
  (46, 2)],
 [(37, 1),
  (42, 1),
  (47, 1),
  (48, 1),
  (49, 1),
  (50, 1),
  (51, 1),
  (52, 1),
  (53, 1),
  (54, 1),
  (55, 1),
  (56, 1),
  (57, 1),
  (58, 1),
  (59, 1),
  (60, 1),
  (61, 1),
  (62, 1),
  (63, 1),
  (64, 1),
  (65, 1),
  (66, 1)],
 [(46, 1),
  (67, 1),
  (68, 1),
  (69, 2),
  (70, 1),
  (71, 1),
  (72, 1),
  (73, 1),
  (74, 1),
  (75, 1),
  (76, 1),
  (77, 1)],
 [(46, 3),
  (70, 1),
  (75, 1),
  (78, 1),
  (79, 1),
  (80, 1),
  (81, 1),
  (82, 1),
  (83, 1),
  (84, 1),
  (8

In [97]:
word_count_hr = [[(my_dict[i],c) for i,c in word_vec] for word_vec in bow_corpus]

In [98]:
print(word_count_hr)

[[('ai', 1), ('anything', 1), ('applications', 1), ('super-human', 1), ('super-machine', 1), ('variety', 1)], [('ai', 2), ('analytics,', 1), ('apply', 1), ('automation', 1), ('big', 1), ('brand', 1), ('business', 1), ('capabilities', 1), ('category', 1), ('consider', 1), ('conversational', 1), ('customers', 1), ('data', 1), ('first', 1), ('goes', 1), ('leader', 1), ('leaders', 1), ('manager,', 1), ('marketer,', 1), ('mind', 1), ('need', 1), ('new', 1), ('offers', 1), ('often', 1), ('places', 1), ('serve', 1), ('stakeholders', 1), ('support', 1), ('thinking', 1), ('whole', 1), ('work,', 1)], [('biblical', 1), ('birth', 2), ('christ', 1), ('christ,', 1), ('described', 1), ('gospels', 1), ('jesus', 1), ('jesus,', 1), ('luke', 1), ('matthew', 1), ('nativity', 2)], [('birth', 1), ('jesus', 1), ('accounts', 1), ('agree', 1), ('bethlehem', 1), ('betrothed', 1), ('biological', 1), ('born', 1), ('caused', 1), ('david', 1), ('descended', 1), ('divine', 1), ('father,', 1), ('intervention', 1), ('

In [99]:
# Method 2: Bag-of-words counts using collections.Counter
from collections import Counter

In [100]:
list_of_tokens

[['ai', 'variety', 'applications', 'anything', 'super-human', 'super-machine'],
 ['analytics,',
  'big',
  'data',
  'automation',
  'often',
  'first',
  'places',
  'mind',
  'marketer,',
  'brand',
  'manager,',
  'support',
  'leader',
  'goes',
  'thinking',
  'apply',
  'ai',
  'work,',
  'conversational',
  'ai',
  'offers',
  'whole',
  'new',
  'category',
  'capabilities',
  'business',
  'leaders',
  'need',
  'consider',
  'serve',
  'customers',
  'stakeholders'],
 ['nativity',
  'jesus,',
  'nativity',
  'christ,',
  'birth',
  'christ',
  'birth',
  'jesus',
  'described',
  'biblical',
  'gospels',
  'luke',
  'matthew'],
 ['two',
  'accounts',
  'agree',
  'jesus',
  'born',
  'bethlehem',
  'judea,',
  'mother',
  'mary',
  'betrothed',
  'man',
  'named',
  'joseph,',
  'descended',
  'king',
  'david',
  'biological',
  'father,',
  'birth',
  'caused',
  'divine',
  'intervention'],
 ['nativity',
  'basis',
  'christian',
  'holiday',
  'christmas',
  'december',
 

In [102]:
for line in list_of_tokens:
    print(line)
    for token in line:
        print(token)

['ai', 'variety', 'applications', 'anything', 'super-human', 'super-machine']
ai
variety
applications
anything
super-human
super-machine
['analytics,', 'big', 'data', 'automation', 'often', 'first', 'places', 'mind', 'marketer,', 'brand', 'manager,', 'support', 'leader', 'goes', 'thinking', 'apply', 'ai', 'work,', 'conversational', 'ai', 'offers', 'whole', 'new', 'category', 'capabilities', 'business', 'leaders', 'need', 'consider', 'serve', 'customers', 'stakeholders']
analytics,
big
data
automation
often
first
places
mind
marketer,
brand
manager,
support
leader
goes
thinking
apply
ai
work,
conversational
ai
offers
whole
new
category
capabilities
business
leaders
need
consider
serve
customers
stakeholders
['nativity', 'jesus,', 'nativity', 'christ,', 'birth', 'christ', 'birth', 'jesus', 'described', 'biblical', 'gospels', 'luke', 'matthew']
nativity
jesus,
nativity
christ,
birth
christ
birth
jesus
described
biblical
gospels
luke
matthew
['two', 'accounts', 'agree', 'jesus', 'born', 'b

In [104]:
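# Flatten the per-sentence lists into one list (so the counts are for the whole text, not per document)
token_list = [token for line in list_of_tokens for token in line]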

In [105]:
word_count_2 = Counter(token_list)

In [106]:
word_count_2

Counter({'ai': 4,
         'variety': 1,
         'applications': 1,
         'anything': 1,
         'super-human': 1,
         'super-machine': 1,
         'analytics,': 1,
         'big': 1,
         'data': 1,
         'automation': 1,
         'often': 1,
         'first': 1,
         'places': 1,
         'mind': 1,
         'marketer,': 1,
         'brand': 1,
         'manager,': 1,
         'support': 2,
         'leader': 1,
         'goes': 1,
         'thinking': 1,
         'apply': 1,
         'work,': 1,
         'conversational': 2,
         'offers': 1,
         'whole': 1,
         'new': 1,
         'category': 1,
         'capabilities': 1,
         'business': 1,
         'leaders': 1,
         'need': 1,
         'consider': 1,
         'serve': 1,
         'customers': 1,
         'stakeholders': 1,
         'nativity': 7,
         'jesus,': 1,
         'christ,': 1,
         'birth': 3,
         'christ': 1,
         'jesus': 2,
         'described': 1,
        
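The flattened `Counter` gives corpus-wide totals (`'ai': 4`), whereas each `doc2bow()` vector counts tokens within one sentence. A third quantity, the number of documents that contain a token (its document frequency), is what TF-IDF relies on. A small sketch contrasting the two counts on made-up documents (function names are illustrative):

```python
from collections import Counter
from typing import Iterable, List


def corpus_counts(documents: Iterable[List[str]]) -> Counter:
    """Total occurrences of each token across all documents (like Counter(token_list))."""
    return Counter(token for doc in documents for token in doc)


def document_frequency(documents: Iterable[List[str]]) -> Counter:
    """Number of documents each token appears in; set() counts a token once per document."""
    return Counter(token for doc in documents for token in set(doc))


docs = [["ai", "ai", "data"], ["ai", "nativity"], ["nativity"]]
print(corpus_counts(docs)["ai"])       # 3
print(document_frequency(docs)["ai"])  # 2
```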

### Workflow
![](gensim_img03.png)

#### Using a TF-IDF (Term Frequency-Inverse Document Frequency) Model
+ to transform our bag-of-words counts into TF-IDF weights
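With its default settings, Gensim's `TfidfModel` weights each term as `count * log2(N / df)`, where N is the number of documents and df is the number of documents containing the term, and then scales each document vector to unit (L2) length. The sketch below reimplements that default in plain Python as an illustration, not as Gensim's code. On a toy corpus shaped like ours (13 documents, token 0 in 3 of them, tokens 1 to 5 only in the first) it reproduces the first document's weights printed below, 0.2477 and 0.4333.

```python
import math
from collections import Counter
from typing import List, Tuple

Bow = List[Tuple[int, int]]  # (token_id, count) pairs, as produced by doc2bow()


def tfidf_weights(corpus: List[Bow]) -> List[List[Tuple[int, float]]]:
    """TF-IDF with raw term counts, log2 IDF and L2 normalization per document."""
    n_docs = len(corpus)
    # Document frequency: each token ID appears at most once per bag of words
    df = Counter(token_id for doc in corpus for token_id, _ in doc)
    result = []
    for doc in corpus:
        weighted = [(tid, count * math.log2(n_docs / df[tid])) for tid, count in doc]
        norm = math.sqrt(sum(w * w for _, w in weighted))
        # Scale to unit length; drop (near-)zero weights, e.g. terms found in every document
        result.append([(tid, w / norm) for tid, w in weighted
                       if norm > 0 and abs(w / norm) > 1e-12])
    return result
```

If the sketch matches Gensim's defaults, `tfidf_weights(bow_corpus)` should agree with `list(tfidf[bow_corpus])` up to floating-point rounding.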

In [107]:
from gensim import models

In [108]:
# Attrib/Method
dir(models)

['AuthorTopicModel',
 'BackMappingTranslationMatrix',
 'CoherenceModel',
 'Doc2Vec',
 'FastText',
 'HdpModel',
 'KeyedVectors',
 'LdaModel',
 'LdaMulticore',
 'LdaSeqModel',
 'LogEntropyModel',
 'LsiModel',
 'NormModel',
 'Phrases',
 'RpModel',
 'TfidfModel',
 'TranslationMatrix',
 'VocabTransform',
 'Word2Vec',
 'WordEmbeddingSimilarityIndex',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_fasttext_bin',
 '_utils_any2vec',
 'atmodel',
 'base_any2vec',
 'basemodel',
 'callbacks',
 'coherencemodel',
 'deprecated',
 'doc2vec',
 'doc2vec_corpusfile',
 'doc2vec_inner',
 'fasttext',
 'fasttext_corpusfile',
 'fasttext_inner',
 'hdpmodel',
 'interfaces',
 'keyedvectors',
 'ldamodel',
 'ldamulticore',
 'ldaseqmodel',
 'logentropy_model',
 'lsimodel',
 'normmodel',
 'phrases',
 'rpmodel',
 'tfidfmodel',
 'translation_matrix',
 'utils',
 'utils_any2vec',
 'word2vec',
 'word2vec_corpusfile',
 'word2vec_inner',
 'wr

In [109]:
bow_corpus

[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)],
 [(0, 2),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 1),
  (15, 1),
  (16, 1),
  (17, 1),
  (18, 1),
  (19, 1),
  (20, 1),
  (21, 1),
  (22, 1),
  (23, 1),
  (24, 1),
  (25, 1),
  (26, 1),
  (27, 1),
  (28, 1),
  (29, 1),
  (30, 1),
  (31, 1),
  (32, 1),
  (33, 1),
  (34, 1),
  (35, 1)],
 [(36, 1),
  (37, 2),
  (38, 1),
  (39, 1),
  (40, 1),
  (41, 1),
  (42, 1),
  (43, 1),
  (44, 1),
  (45, 1),
  (46, 2)],
 [(37, 1),
  (42, 1),
  (47, 1),
  (48, 1),
  (49, 1),
  (50, 1),
  (51, 1),
  (52, 1),
  (53, 1),
  (54, 1),
  (55, 1),
  (56, 1),
  (57, 1),
  (58, 1),
  (59, 1),
  (60, 1),
  (61, 1),
  (62, 1),
  (63, 1),
  (64, 1),
  (65, 1),
  (66, 1)],
 [(46, 1),
  (67, 1),
  (68, 1),
  (69, 2),
  (70, 1),
  (71, 1),
  (72, 1),
  (73, 1),
  (74, 1),
  (75, 1),
  (76, 1),
  (77, 1)],
 [(46, 3),
  (70, 1),
  (75, 1),
  (78, 1),
  (79, 1),
  (80, 1),
  (81, 1),
  (82, 1),
  (83, 1),
  (84, 1),
  (8

In [110]:
# Train (fit) a TF-IDF model on the bag-of-words corpus
tfidf = models.TfidfModel(bow_corpus)

In [111]:
print(tfidf)

TfidfModel(num_docs=13, num_nnz=191)


In [114]:
# Apply the model to our corpus: each (token_id, count) becomes (token_id, tfidf_weight)
for line in tfidf[bow_corpus]:
    for token_id, weight in line:
        print(token_id, weight)

0 0.2476971243586835
1 0.43327730948768567
2 0.43327730948768567
3 0.43327730948768567
4 0.43327730948768567
5 0.43327730948768567
0 0.2074653332816665
6 0.1814514836395096
7 0.1814514836395096
8 0.1814514836395096
9 0.1814514836395096
10 0.1814514836395096
11 0.1814514836395096
12 0.1814514836395096
13 0.1814514836395096
14 0.1814514836395096
15 0.13241636958266953
16 0.1814514836395096
17 0.1814514836395096
18 0.1814514836395096
19 0.1814514836395096
20 0.1814514836395096
21 0.1814514836395096
22 0.1814514836395096
23 0.1814514836395096
24 0.1814514836395096
25 0.1814514836395096
26 0.1814514836395096
27 0.1814514836395096
28 0.1814514836395096
29 0.1814514836395096
30 0.1814514836395096
31 0.1814514836395096
32 0.13241636958266953
33 0.1814514836395096
34 0.1814514836395096
35 0.1814514836395096
36 0.294788956250531
37 0.4302514655356631
38 0.294788956250531
39 0.294788956250531
40 0.294788956250531
41 0.294788956250531
42 0.21512573276783156
43 0.294788956250531
44 0.29478895625053

In [116]:
# Get each token and its TF-IDF weight
for line in tfidf[bow_corpus]:
    for token_id, weight in line:
        print(f'{my_dict[token_id]}: {weight}')

ai: 0.2476971243586835
anything: 0.43327730948768567
applications: 0.43327730948768567
super-human: 0.43327730948768567
super-machine: 0.43327730948768567
variety: 0.43327730948768567
ai: 0.2074653332816665
analytics,: 0.1814514836395096
apply: 0.1814514836395096
automation: 0.1814514836395096
big: 0.1814514836395096
brand: 0.1814514836395096
business: 0.1814514836395096
capabilities: 0.1814514836395096
category: 0.1814514836395096
consider: 0.1814514836395096
conversational: 0.13241636958266953
customers: 0.1814514836395096
data: 0.1814514836395096
first: 0.1814514836395096
goes: 0.1814514836395096
leader: 0.1814514836395096
leaders: 0.1814514836395096
manager,: 0.1814514836395096
marketer,: 0.1814514836395096
mind: 0.1814514836395096
need: 0.1814514836395096
new: 0.1814514836395096
offers: 0.1814514836395096
often: 0.1814514836395096
places: 0.1814514836395096
serve: 0.1814514836395096
stakeholders: 0.1814514836395096
support: 0.13241636958266953
thinking: 0.1814514836395096
whole: 0

In [None]:
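#### Using an LDA Model to Find Topics in our Bag of Words
+ LDA: Latent Dirichlet Allocation
+ Input: the bag-of-words corpus
+ Input: the dictionary (to map token IDs back to words)
+ Output: a trained model whose topics are weighted lists of words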

In [118]:
# Dictionary
print(my_dict)

Dictionary(169 unique tokens: ['ai', 'anything', 'applications', 'super-human', 'super-machine']...)


In [119]:
# Bag-of-words corpus: (token_id, count) pairs per document
bow_corpus

[[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)],
 [(0, 2),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 1),
  (15, 1),
  (16, 1),
  (17, 1),
  (18, 1),
  (19, 1),
  (20, 1),
  (21, 1),
  (22, 1),
  (23, 1),
  (24, 1),
  (25, 1),
  (26, 1),
  (27, 1),
  (28, 1),
  (29, 1),
  (30, 1),
  (31, 1),
  (32, 1),
  (33, 1),
  (34, 1),
  (35, 1)],
 [(36, 1),
  (37, 2),
  (38, 1),
  (39, 1),
  (40, 1),
  (41, 1),
  (42, 1),
  (43, 1),
  (44, 1),
  (45, 1),
  (46, 2)],
 [(37, 1),
  (42, 1),
  (47, 1),
  (48, 1),
  (49, 1),
  (50, 1),
  (51, 1),
  (52, 1),
  (53, 1),
  (54, 1),
  (55, 1),
  (56, 1),
  (57, 1),
  (58, 1),
  (59, 1),
  (60, 1),
  (61, 1),
  (62, 1),
  (63, 1),
  (64, 1),
  (65, 1),
  (66, 1)],
 [(46, 1),
  (67, 1),
  (68, 1),
  (69, 2),
  (70, 1),
  (71, 1),
  (72, 1),
  (73, 1),
  (74, 1),
  (75, 1),
  (76, 1),
  (77, 1)],
 [(46, 3),
  (70, 1),
  (75, 1),
  (78, 1),
  (79, 1),
  (80, 1),
  (81, 1),
  (82, 1),
  (83, 1),
  (84, 1),
  (8

In [120]:
# Load our Model
from gensim import models

In [123]:
# Train LDA with 2 topics; random_state makes the run reproducible
lda_model = models.LdaModel(corpus=bow_corpus, id2word=my_dict, random_state=42, num_topics=2)

In [124]:
# Print the top words and their weights for each topic
print(lda_model.print_topics())

[(0, '0.021*"nativity" + 0.017*"christmas" + 0.015*"ai" + 0.015*"called" + 0.014*"christian" + 0.014*"birth" + 0.013*"church" + 0.013*"orthodox" + 0.011*"advent" + 0.009*"western"'), (1, '0.022*"nativity" + 0.015*"christmas" + 0.013*"many" + 0.012*"ai" + 0.009*"support" + 0.009*"plays" + 0.009*"conversational" + 0.008*"tradition" + 0.008*"season" + 0.008*"birth"')]


In [125]:
for topic in lda_model.print_topics():
    print(topic)

(0, '0.021*"nativity" + 0.017*"christmas" + 0.015*"ai" + 0.015*"called" + 0.014*"christian" + 0.014*"birth" + 0.013*"church" + 0.013*"orthodox" + 0.011*"advent" + 0.009*"western"')
(1, '0.022*"nativity" + 0.015*"christmas" + 0.013*"many" + 0.012*"ai" + 0.009*"support" + 0.009*"plays" + 0.009*"conversational" + 0.008*"tradition" + 0.008*"season" + 0.008*"birth"')
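`print_topics()` returns each topic as a formatted string such as `0.021*"nativity" + 0.017*"christmas"`. Gensim's `show_topic(topicid, topn=10)` already returns `(word, probability)` pairs, so prefer it while the model is available; if only the printed strings remain (for example, in a saved log), a small regex can recover the pairs. A sketch (`parse_topic` is a hypothetical helper):

```python
import re
from typing import List, Tuple

# One 'weight*"word"' term; LDA weights are non-negative, so no sign is matched.
TERM = re.compile(r'([\d.]+)\*"([^"]+)"')


def parse_topic(topic_string: str) -> List[Tuple[str, float]]:
    """Turn '0.021*"nativity" + 0.017*"christmas"' into [('nativity', 0.021), ('christmas', 0.017)]."""
    return [(word, float(weight)) for weight, word in TERM.findall(topic_string)]
```

Applied to topic 0 above, `parse_topic(lda_model.print_topics()[0][1])` would start with `('nativity', 0.021)`.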


In [126]:
# Get the topic distribution of the first document
for topic in lda_model[bow_corpus[0]]:
    print("Topic",topic[0])

Topic 0
Topic 1


In [128]:
# Get each topic ID and its weight (probability) for the first document
for topic in lda_model[bow_corpus[0]]:
    print("Topic",topic[0])
    print("Weight",topic[1])

Topic 0
Weight 0.9205521
Topic 1
Weight 0.07944793
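`lda_model[bow]` returns `(topic_id, probability)` pairs, and topics below a small probability threshold may be left out. To label every document with its most likely topic and group documents by topic, a stdlib sketch (`dominant_topics` is a hypothetical helper; with the model above, its input would be `[lda_model[bow] for bow in bow_corpus]`):

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

TopicDist = Sequence[Tuple[int, float]]  # (topic_id, probability) pairs


def dominant_topics(distributions: List[TopicDist]) -> Dict[int, List[int]]:
    """Map each topic ID to the indices of the documents where it is most probable."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for doc_index, dist in enumerate(distributions):
        if not dist:  # guard against an empty list of topics
            continue
        topic_id, _ = max(dist, key=lambda pair: pair[1])
        groups[topic_id].append(doc_index)
    return dict(groups)
```

For the first document, the weights above (0.92 for topic 0, 0.08 for topic 1) would place it under topic 0.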
