# Starting Code for Exercise 6

### Import Modules and Download Data

In [None]:
import re
import requests
from io import StringIO
import pandas as pd
import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
url_data = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTxbA16lnYbtH-j6PPrPogc6ft03gp0y5mmo1Nq3l-Pxnb05nP1C-mOxUYvTciA2gq5nkwAqz9Y7Imi/pub?gid=646892609&single=true&output=tsv'

In [None]:
def load_dataset(url):
    r = requests.get(url)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    data = r.content.decode('utf8')
    df = pd.read_csv(StringIO(data), sep='\t')
    return df
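
`load_dataset` downloads the sheet as TSV text and parses it with `pd.read_csv(..., sep='\t')`. The parsing step can be tried offline on a small inline string. The two rows below are made up for illustration and only mirror the real column names.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row TSV payload with the same columns as the real sheet
tsv_text = (
    "name\tdescription\tcountry\tfounding_date\trelevancy\n"
    "Acme Roads\tBuilds asphalt roads.\tUnited States\t1990-01-01\t0\n"
    "Data Drive\tUses machine learning for drivers.\tIndia\t2016-01-01\t1\n"
)

# Same parsing call as in load_dataset: tab-separated text -> DataFrame
sample_df = pd.read_csv(StringIO(tsv_text), sep="\t")
print(sample_df.shape)  # (2, 5)
print(sample_df[["name", "relevancy"]])
```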

In [None]:
df = load_dataset(url_data)

### Inspect the Dataset

In [None]:
df.head(15)

| | name | description | country | founding_date | relevancy |
|---|---|---|---|---|---|
| 0 | Pandora Car Rental | Welcome to Pandora Car Rental, Car Hire and Ai... | United Kingdom | 2011-04-05 | 0 |
| 1 | SurplusMatch | SurplusMatch is an online marketplace for cont... | United Kingdom | 2008-01-01 | 2 |
| 2 | Gimenez Ganga | Giménez Ganga is a company that has been provi... | Switzerland | 1959-01-01 | 0 |
| 3 | SMC3 | Freight shippers, motor carriers, logistics se... | United States | 1935-01-01 | 0 |
| 4 | Much Asphalt | Much Asphalt is southern Africa’s commercial s... | South Africa | 1965-01-01 | 0 |
| 5 | The Hisey Company | The Hisey Company provides quality arbor care ... | United States | 2011-02-19 | 0 |
| 6 | FREIGHTALIA, LTD. | #1 Automatic quoting system ever created for F... | United Kingdom | 2015-09-26 | 0 |
| 7 | Instant Access Au | Instant Access is a provider of Access equipme... | Australia | 1968-01-01 | 1 |
| 8 | CANOR International | CANOR International provides project managemen... | Hungary | 1993-01-01 | 0 |
| 9 | LISUTO | LISUTO is a Multi-language batch exhibition sy... | Japan | 2016-11-01 | 1 |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   name           2000 non-null   object
 1   description    2000 non-null   object
 2   country        2000 non-null   object
 3   founding_date  2000 non-null   object
 4   relevancy      2000 non-null   int64 
dtypes: int64(1), object(4)
memory usage: 78.2+ KB


In [None]:
# Check for null values
df.isnull().values.any()

False

### Preprocess the Data

In [None]:
nltk.download('stopwords')
stopwords = set(nltk.corpus.stopwords.words('english'))  # a set makes the `in` checks fast
nltk.download('punkt')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
stemmer = PorterStemmer()

def prep_process_tokenize(text):
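    # replace URLs (http/https/www) and e-mail addresses with a space
    text = re.sub(r"((\S+)?(http(s)?)(\S+))|((\S+)?(www)(\S+))|((\S+)?(\@)(\S+)?)", " ", text)
    # keep only ASCII letters and spaces: digits, punctuation, accented letters
    # and line breaks are removed (a removed line break glues adjacent words together)
    text = re.sub(r"[^a-zA-Z ]", "", text)
    text = text.lower() # lower case the text
    text = nltk.word_tokenize(text)
    # remove English stopwords
    text = [word for word in text if word not in stopwords]
    # stem every token, then drop one-character stems
    try:
        text = [stemmer.stem(word) for word in text]
        text = [word for word in text if len(word) > 1]
    except IndexError:
        # if stemming fails, fall back to the unstemmed tokens
        pass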
    return text


def pre_process(text):
    return " ".join(prep_process_tokenize(text))
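
The two regular expressions do most of the cleaning. The sketch below applies the same patterns (written as raw strings) to a made-up sentence and uses `str.split` in place of `nltk.word_tokenize`, so it runs without any NLTK data; stopword removal and stemming are left out.

```python
import re

# Same patterns as in prep_process_tokenize, written as raw strings
URL_OR_EMAIL = r"((\S+)?(http(s)?)(\S+))|((\S+)?(www)(\S+))|((\S+)?(\@)(\S+)?)"
NON_LETTERS = r"[^a-zA-Z ]"

sample = "Visit https://example.com or mail info@example.com for 2 Cars!"

step1 = re.sub(URL_OR_EMAIL, " ", sample)  # URLs and e-mail addresses -> space
step2 = re.sub(NON_LETTERS, "", step1)     # drop digits and punctuation
tokens = step2.lower().split()             # whitespace split instead of word_tokenize
print(tokens)  # ['visit', 'or', 'mail', 'for', 'cars']
```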

### Tf-Idf Based Approach (Vector Space Modeling)

This cell runs the description texts (see the dataframe above) through `pre_process` and passes the result to `TfidfVectorizer`, which represents each description as a vector of tf-idf weights.
The pre-processing steps are defined in the cell above.

In [None]:
tfidf = TfidfVectorizer(preprocessor=pre_process).fit_transform(df.description)
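
Passing `preprocessor=` replaces scikit-learn's own preprocessing step (including its lowercasing), but the vectorizer still splits the returned string with its default `token_pattern`, which ignores single-character tokens, and by default L2-normalises every row. A sketch on a made-up three-document corpus; `toy_pre_process` is a hypothetical, simplified stand-in for `pre_process`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def toy_pre_process(text):
    # hypothetical stand-in for pre_process: lowercase and keep words longer than 1 char
    return " ".join(w for w in text.lower().split() if len(w) > 1)

docs = [
    "Asphalt road construction",
    "Road construction and asphalt paving",
    "Machine learning for safer drivers",
]

vectorizer = TfidfVectorizer(preprocessor=toy_pre_process)
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document, one column per term

print(X.shape)  # (3, 10)
# default norm="l2": every row has unit Euclidean length
print(np.linalg.norm(X.toarray(), axis=1))
```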

The variables in the cell below:

*   `doc_index_to_compare`: the index of the row whose `name` is the company given on the right-hand side (here: Vahanalytics). The boolean filter returns the matching index labels, `tolist()` turns them into a list, and `[0]` takes the first match. Because the df has a default `RangeIndex`, this label is also the row position in `tfidf`.
*   `top_k`: an int giving the number of most similar companies to extract.
*   `cosine_similarities`: the cosine similarity between the tf-idf vector of the selected description and the tf-idf vectors of all descriptions (including itself). `cosine_similarity` returns a 2D array of shape `(1, n_documents)`, so it is flattened to 1D.
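
Because `TfidfVectorizer` L2-normalises its rows by default, the cosine similarity between two rows equals their plain dot product. A quick check on a made-up corpus (not the company data), which also shows the shapes before and after `flatten()`:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["asphalt road construction", "road construction paving", "machine learning drivers"]
X = TfidfVectorizer().fit_transform(docs)

query_index = 0
# slicing keeps a 2D (1, n_terms) row, so the result has shape (1, n_docs)
sims_2d = cosine_similarity(X[query_index:query_index + 1], X)
sims = sims_2d.flatten()  # -> shape (n_docs,)

# with unit-length rows, cosine similarity equals the dot product
dot = (X[query_index] @ X.T).toarray().flatten()
print(sims_2d.shape, sims.shape)  # (1, 3) (3,)
print(np.allclose(sims, dot))     # True
```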



In [None]:
doc_index_to_compare = df.index[df['name'] == "Vahanalytics"].tolist()[0]
top_k = 5
cosine_similarities = cosine_similarity(
    tfidf[doc_index_to_compare:doc_index_to_compare + 1], 
    tfidf
    ).flatten()

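The cosine similarity array is sorted and the 5 entries with the highest scores are extracted. `argsort` returns the indices that would sort the array in ascending order rather than the sorted values, so the results are the row indices of the descriptions in the df. The slice `[:-top_k - 1:-1]` takes the last `top_k` of these indices and reverses them, so the highest score comes first.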
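
A toy illustration of the `argsort` slicing with made-up scores and `top_k = 3`; `np.argsort(-scores)[:top_k]` is an equivalent way to write it.

```python
import numpy as np

scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5])  # toy similarity scores
top_k = 3

ascending = scores.argsort()        # indices ordered from lowest to highest score
top = ascending[:-top_k - 1:-1]     # last top_k entries, reversed -> highest first
print(ascending)    # [0 2 4 3 1]
print(top)          # [1 3 4]
print(scores[top])  # [0.9 0.7 0.5]
```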

In [None]:
related_docs_indices = cosine_similarities.argsort()[:-top_k - 1:-1]

Based on the indices from the cell above, we can now extract the rows at the respective indices from the original dataframe.

In [None]:
tfidf_result_df = df[df.index.isin(related_docs_indices)]

This is a new df containing only the five most similar entries in the entire dataset. The query company itself is one of them, since the cosine similarity of a vector with itself is 1.
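
Filtering with `isin` returns the matching rows in index order, so the ranking from `argsort` is lost (the table below is sorted by index, not by similarity). A sketch of an alternative on a toy frame, assuming a default `RangeIndex` as in this notebook: `iloc` keeps the rank order, and the scores can be attached as a column (the `similarity` column name is illustrative).

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({"name": ["A", "B", "C", "D"]})
scores = np.array([0.2, 1.0, 0.1, 0.6])   # toy similarity scores
top = scores.argsort()[:-3 - 1:-1]        # top 3, highest first -> [1, 3, 0]

by_index = toy_df[toy_df.index.isin(top)]                 # rows come back in index order
ranked = toy_df.iloc[top].assign(similarity=scores[top])  # rows keep the rank order

print(by_index["name"].tolist())  # ['A', 'B', 'D']
print(ranked["name"].tolist())    # ['B', 'D', 'A']
```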

In [None]:
tfidf_result_df

| | name | description | country | founding_date | relevancy |
|---|---|---|---|---|---|
| 93 | Ship Supplies Direct | We aim to use digital technology to transform the marine logistics industry | Singapore | 2018-05-14 | 1 |
| 656 | BISAF | BISAF is a technological company for the construction industry. We specialise in cutting edge solutions that make building easier, safer and environmentally friendly. | United Kingdom | 2006-05-01 | 1 |
| 695 | Vahanalytics | Vahanalytics aims to create better drivers and safer roads by using cutting edge big data and machine learning techniques. | India | 2016-01-01 | 1 |
| 1542 | GeoSpock | GeoSpock brings together their expertise of big data engineering to unlock the hidden value of data silos in your organization. Their solution enables you to manage extreme amounts of data at speed enabling your organization to react to key insights in a timely manner for future business success. The technology enables a range of capabilities from data analytics, visualization of spatial data, cutting edge data indexing, custom querying of data sets, and data intelligence. To ensure that their customers get the maximum impact using the GeoSpock solution they work with them on a one to one basis as they understand that each organization approaches their data problems in a bespoke manner, this ensures that you get maximum business impact. In bringing together multiple datasets this enables the cost of data generation to be amortized over many applications, opening up new business models and monetization opportunities, therefore, bringing value to your business. They work across a number of markets including smart cities, automotive, mobile networks, IoT, enterprise, AdTech, asset management, and logistics. | United Kingdom | 2013-01-01 | 1 |
| 1982 | Axenda | Axenda is a cloud-based software platform for construction management industry. The software platform is used by constructors and architects to manage day-to-day tasks and grow their businesses. The company's patent-pending algorithm uses machine learning to estimate materials & resources. It aims to predict project's estimates & completion deadlines. In addition, the platform also translates the data into 3D virtual models which give visual feedback of project's progress to clients. | Mexico | 2017-01-01 | 2 |


#### Extended code for "Much Asphalt"

I copied the code from above and only changed the company name.
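
Instead of copying the cells, the three steps could be wrapped in a small helper. `find_similar_tfidf` is a hypothetical name; the sketch assumes, as in this notebook, a df with a default `RangeIndex` and a tf-idf matrix whose rows line up with it, and it is shown on toy data. On the notebook's data it would be called as `find_similar_tfidf(df, tfidf, "Much Asphalt")`, returning the rows ranked by similarity rather than in index order.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_similar_tfidf(data, tfidf_matrix, company_name, top_k=5):
    """Return the top_k rows most similar to company_name, highest similarity first."""
    # index of the (first) row with this name; assumes a default RangeIndex
    query_index = data.index[data["name"] == company_name].tolist()[0]
    sims = cosine_similarity(tfidf_matrix[query_index:query_index + 1], tfidf_matrix).flatten()
    top = sims.argsort()[:-top_k - 1:-1]
    return data.iloc[top].assign(similarity=sims[top])

toy = pd.DataFrame({
    "name": ["Road Co", "Paving Ltd", "Data Drive"],
    "description": ["asphalt road construction", "asphalt paving road", "machine learning drivers"],
})
toy_tfidf = TfidfVectorizer().fit_transform(toy["description"])

result = find_similar_tfidf(toy, toy_tfidf, "Road Co", top_k=2)
print(result[["name", "similarity"]])
```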

In [None]:
doc_index_to_compare2 = df.index[df['name'] == "Much Asphalt"].tolist()[0]
top_k = 5
cosine_similarities2 = cosine_similarity(tfidf[doc_index_to_compare2:doc_index_to_compare2 + 1], tfidf).flatten()

In [None]:
related_docs_indices2 = cosine_similarities2.argsort()[:-top_k - 1:-1]

In [None]:
tfidf_result_df2 = df[df.index.isin(related_docs_indices2)]

In [None]:
pd.set_option('display.max_colwidth', None)  # show full descriptions instead of truncating them
tfidf_result_df2

| | name | description | country | founding_date | relevancy |
|---|---|---|---|---|---|
| 4 | Much Asphalt | Much Asphalt is southern Africa’s commercial supplier of an extensive range of hot and cold asphalt products to the road construction economy. Much Asphalt owns and operates 15 static plants in the major centres of South Africa and is the majority shareholder in East Coast Asphalt which operates two more in East London and Mthatha. | South Africa | 1965-01-01 | 0 |
| 57 | Sunland Asphalt | Sunland Asphalt, a commercial asphalt paving company in Phoenix, provides commercial asphalt paving service at competitive price. | United States | 1979-01-01 | 0 |
| 618 | Central-Allied Enterprises | Central States Construction was founded in 1929 by Ernest W. Hallett to produce sand and gravel and construct concrete highways in Minnesota. The business was successful, and in the early 1940s, operations expanded to western Ohio. In the 1940s, the company was heavily involved in the wartime expansion of Wright-Patterson Air Force Base and the post-war construction of the Ohio Turnpike. By the early 1950s, Ohio operations had expanded to include production of sand, gravel, asphalt, and concrete. The Ohio-based portion of the business became known as Allied Enterprises, and it made its permanent presence in Northeastern Ohio by the end of the 50s. Today, Central-Allied Enterprises is one of northeastern Ohio's leading producers of sand, gravel, asphalt, and paved asphalt surfaces. | United States | 1929-01-01 | 0 |
| 862 | FAST FELT | The patented product FAST FELT®, with its plastic tabs pre-affixed to the asphalt saturated felt (commonly called "tar paper") is the only significant improvement in the recent history of the asphalt saturated felt underlayment products market. | United States | 2007-01-01 | 0 |
| 1443 | Saldus Celinieks | Saldus Celinieks is specialising in road construction, extraction of aggregates and asphalt production. | Latvia | 1991-01-01 | 1 |


***

# Topic Modeling Using LDA

In [None]:
from gensim import models, corpora, similarities
from nltk import FreqDist
import numpy as np
from scipy.stats import entropy
from collections import Counter


### 1. Apply the pre_process function to the description-column to create a new column called `tokenized`. This is the column we plan to use for training the LDA-algorithm.

In [None]:
pre_processed = [prep_process_tokenize(item) for item in df.description]

In [None]:
# the pre-processed text of the first entry
np.array(pre_processed[0])

array(['welcom', 'pandora', 'car', 'rental', 'car', 'hire', 'airport',
       'transfer', 'base', 'dalaman', 'turkey', 'wide', 'rang', 'car',
       'suit', 'budget', 'deliv', 'car', 'free', 'anytim', 'day', 'night',
       'within', 'dalaman', 'local', 'reason', 'book', 'car', 'pandora',
       'car', 'rental', 'unlimit', 'milag', 'vat', 'local', 'tax',
       'airport', 'servic', 'charg', 'applic', 'hour', 'road', 'servic',
       'third', 'parti', 'insur', 'excess', 'theft', 'insur', 'excess',
       'fire', 'insur', 'excess', 'fdw', 'insur', 'excess', 'cdw',
       'collis', 'damag', 'waiver', 'excess', 'twh', 'tyre', 'windscreen',
       'headlight', 'insur', 'excess', 'addit', 'driver', 'childbabi',
       'seat', 'must', 'order', 'hidden', 'extra', 'address', 'hadrian',
       'flat', 'number', 'wellington', 'telford', 'pin', 'code', 'tfrq',
       'tel', 'websit'], dtype='<U10')

### 2. Using this new column `tokenized`, find the 5000 most common tokens.




> I use Python's `Counter` class to count the tokens across all pre-processed descriptions and then take the 5000 most frequent ones.
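
A toy version of the counting step (made-up token lists): `update` adds one per occurrence, and `most_common(n)` returns `(token, count)` pairs, most frequent first.

```python
from collections import Counter

docs = [["car", "rental", "car"], ["road", "car"], ["road", "asphalt"]]  # toy tokenized docs

counter = Counter()
for tokens in docs:
    counter.update(tokens)   # adds 1 per occurrence of each token

top2 = counter.most_common(2)            # (token, count) pairs, most frequent first
top2_words = [word for word, _ in top2]  # keep only the tokens
print(top2)          # [('car', 3), ('road', 2)]
print(top2_words)    # ['car', 'road']
print(len(counter))  # 4 distinct tokens
```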





In [None]:
counter = Counter()

for tokens in pre_processed:
    counter.update(tokens)

most_common = counter.most_common(5000) # most common 5000 words

print(most_common[:5]) # the 5 most common words in the descriptions
print(len(counter)) # the vocabulary size of the descriptions (unique pre-processed tokens)

[('servic', 1434), ('compani', 1180), ('provid', 1087), ('construct', 893), ('manag', 843)]
10364




> Extracting only the most common words (without their counts) and saving them to a list



In [None]:
most_common_5000 = [word for (word, number) in most_common]

print(len(most_common_5000))
most_common_5000[:5]

5000


['servic', 'compani', 'provid', 'construct', 'manag']

### 3. Remove all tokens that are not in the 5000 most common tokens from the column `tokenized`. 



> To do that, I slightly changed the pre-processing function, as shown below (see the added comment on the stopword-removal line).




In [None]:
stemmer = PorterStemmer()

def prep_process_tokenize_lda(text, most_common):
    # replace URLs (http/https/www) and e-mail addresses with a space
    text = re.sub(r"((\S+)?(http(s)?)(\S+))|((\S+)?(www)(\S+))|((\S+)?(\@)(\S+)?)", " ", text)
    text = re.sub(r"[^a-zA-Z ]", "", text)  # keep only ASCII letters and spaces
    text = text.lower() # lower case the text
    text = nltk.word_tokenize(text)
    #removing stopwords
    text = [word for word in text if word not in stopwords and word in most_common] # only adds the word if it is in the most common list which is passed as an argument
    #stemming
    try:
        text = [stemmer.stem(word) for word in text]
        text = [word for word in text if len(word) > 1]
    except IndexError:
        pass
    return text
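
Note that `most_common_5000` holds *stemmed* tokens, while `prep_process_tokenize_lda` checks membership *before* stemming. A word like "services" is therefore compared with the stem 'servic' and dropped, so only words whose surface form happens to equal a frequent stem survive (which is why, for example, 'servic' never appears in the `tokenized` column below). A sketch of the reordering, with a set for fast lookups; `filter_after_stemming` and the suffix-stripping `toy_stem` are hypothetical stand-ins, and in the notebook `stemmer.stem` would take the place of `toy_stem`.

```python
def toy_stem(word):
    # hypothetical stand-in for PorterStemmer().stem: strip one trailing "s"
    return word[:-1] if word.endswith("s") else word

def filter_after_stemming(tokens, stem, vocab):
    """Stem first, then keep only stems that are in vocab (and longer than 1 char)."""
    vocab = set(vocab)  # O(1) membership checks instead of scanning a 5000-item list
    stems = [stem(word) for word in tokens]
    return [s for s in stems if s in vocab and len(s) > 1]

vocab = ["service", "car"]                # toy "most common" list of stems
tokens = ["services", "cars", "airport"]

before = [toy_stem(w) for w in tokens if w in vocab]  # order used in prep_process_tokenize_lda
after = filter_after_stemming(tokens, toy_stem, vocab)
print(before)  # []
print(after)   # ['service', 'car']
```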



> Finally, the descriptions are pre-processed using the new function above and then saved to the 'tokenized' column.



In [None]:
pre_processed_only_most_common = [prep_process_tokenize_lda(text, most_common_5000) for text in df.description]
df['tokenized'] = pre_processed_only_most_common

np.array(df.iloc[0]['tokenized']) # displaying the same entry (the first one) as in the first step. The list is much shorter, so the tokens must have been removed.

array(['pandora', 'car', 'rental', 'car', 'hire', 'airport', 'dalaman',
       'turkey', 'wide', 'suit', 'car', 'free', 'day', 'night', 'within',
       'dalaman', 'book', 'car', 'pandora', 'car', 'rental', 'vat',
       'local', 'airport', 'road', 'third', 'excess', 'theft', 'excess',
       'fire', 'excess', 'fdw', 'excess', 'cdw', 'waiver', 'excess',
       'twh', 'tyre', 'windscreen', 'headlight', 'excess', 'seat', 'must',
       'hidden', 'address', 'hadrian', 'number', 'wellington', 'telford',
       'pin', 'code', 'tfrq', 'tel'], dtype='<U10')

### 4. Implement and execute the `train_lda`-function.


In [None]:
def train_lda(data, num_topics, chunksize):
    """Train LDA.
    Args:
        data: dataframe, the company data (must contain a `tokenized` column)
        num_topics: int, the number of topics
        chunksize: int, the number of documents used in each training chunk
    Returns:
        lda: gensim.models.ldamodel.LdaModel, the trained LDA model
        corpus: list of bag-of-words documents, i.e. lists of (token-id, count) tuples
    """

    # extract the tokenized descriptions as a list of token lists
    tokenized_corpus = data.tokenized.tolist()

    # map every token to an integer id
    id2word = corpora.Dictionary(tokenized_corpus)

    # represent each document as a bag of words: a list of (token-id, count) tuples
    corpus = [id2word.doc2bow(text) for text in tokenized_corpus]

    # train the LDA model
    lda = models.ldamodel.LdaModel(num_topics=num_topics, corpus=corpus, chunksize=chunksize, id2word=id2word)

    return lda, corpus
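
`doc2bow` turns each token list into sparse `(token_id, count)` pairs. A pure-Python sketch of the idea on toy documents; `build_vocab` and `to_bow` are hypothetical helpers, and gensim's `Dictionary` may assign the ids in a different order.

```python
from collections import Counter

def build_vocab(tokenized_docs):
    # hypothetical: assign ids in first-seen order (gensim's own id order may differ)
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id

def to_bow(doc, token2id):
    # sparse bag of words: (token_id, count) pairs, unknown tokens ignored
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

docs = [["car", "rental", "car"], ["road", "asphalt", "road", "car"]]
token2id = build_vocab(docs)
bows = [to_bow(d, token2id) for d in docs]
print(token2id)  # {'car': 0, 'rental': 1, 'road': 2, 'asphalt': 3}
print(bows)      # [[(0, 2), (1, 1)], [(0, 1), (2, 2), (3, 1)]]
```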

In [None]:
# setting the number of topics and documents
num_topics = 10
num_documents = len(df)  # 2000 descriptions

In [None]:
lda, corpus = train_lda(df, num_topics, 10)

### 5. Use the `show_topic`-method to inspect the resulting topics.


In [None]:
for i in range(num_topics):
    print(f'Topic {i + 1}:\t{lda.show_topic(i)}')

Topic 1:	[('us', 0.08441924), ('product', 0.05906464), ('per', 0.041050255), ('use', 0.034521285), ('need', 0.03449909), ('get', 0.026101355), ('user', 0.024941608), ('oil', 0.022503605), ('model', 0.022414615), ('month', 0.020604176)]
Topic 2:	[('new', 0.064639285), ('one', 0.033055004), ('market', 0.030082464), ('ltd', 0.027648075), ('north', 0.025136624), ('brand', 0.02405977), ('retail', 0.022389904), ('across', 0.021522088), ('largest', 0.019195445), ('app', 0.018886173)]
Topic 3:	[('also', 0.09852923), ('project', 0.0884902), ('design', 0.072434366), ('drone', 0.04401816), ('space', 0.03269703), ('control', 0.02280834), ('fire', 0.022779142), ('cad', 0.021886786), ('supplier', 0.020904863), ('two', 0.01955086)]
Topic 4:	[('platform', 0.088589504), ('first', 0.06644707), ('offer', 0.045008965), ('access', 0.03368234), ('top', 0.028817791), ('search', 0.026314778), ('tool', 0.01902847), ('track', 0.017303605), ('region', 0.016465008), ('launch', 0.015878765)]
Topic 5:	[('car', 0.06

In [None]:
lda.show_topic(8)

[('system', 0.045638025),
 ('network', 0.039421316),
 ('chain', 0.034769062),
 ('cost', 0.030962704),
 ('data', 0.02939287),
 ('time', 0.028372979),
 ('price', 0.027522666),
 ('within', 0.025265358),
 ('well', 0.023312286),
 ('high', 0.022485912)]

### 6. Convert the LDA-results to a 2D array to use as a document-matrix.


In [None]:
document_matrix = np.array([[prob for (topic, prob) in lda.get_document_topics(bow, minimum_probability=0)] for bow in corpus])



> The matrix is 2D: each row represents one document by its probabilities for each of the topics.
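
The list comprehension above assumes every document gets a probability for every topic, in topic-id order. A slightly more defensive sketch writes each `(topic_id, probability)` pair into a zero-initialised array, so a topic missing from a document's list (gensim may omit very small probabilities) leaves a 0 instead of producing ragged rows. `topics_to_dense` is a hypothetical helper and the topic lists are made up; in the notebook it would be called with `[lda.get_document_topics(bow, minimum_probability=0) for bow in corpus]` and `num_topics`.

```python
import numpy as np

def topics_to_dense(doc_topics, num_topics):
    """doc_topics: one list of (topic_id, probability) pairs per document."""
    dense = np.zeros((len(doc_topics), num_topics))
    for row, pairs in enumerate(doc_topics):
        for topic_id, prob in pairs:
            dense[row, topic_id] = prob
    return dense

toy_doc_topics = [
    [(0, 0.7), (2, 0.3)],            # topic 1 missing -> stays 0
    [(0, 0.1), (1, 0.6), (2, 0.3)],
]
matrix = topics_to_dense(toy_doc_topics, num_topics=3)
print(matrix.shape)        # (2, 3)
print(matrix.sum(axis=1))  # each row sums to 1 (up to rounding)
```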



In [None]:
document_matrix.shape

(2000, 10)

In [None]:
document_matrix[0] # the representation of the first document

array([0.00185279, 0.19335058, 0.05452616, 0.00185207, 0.35391006,
       0.10978115, 0.00185213, 0.08203942, 0.19898348, 0.00185214],
      dtype=float32)

### 7. Extract the LDA-results for `Much Asphalt` and `Vahanalytics` and use them as query vectors to extract the 5 closest matches using `get_top_k_similar_docs`.

In [None]:
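def jensen_shannon(query, matrix):
    """Jensen-Shannon distance between a topic distribution and every row of a matrix.

    Args:
        query: 1D array of shape (num_topics,)
        matrix: 2D array of shape (num_documents, num_topics)
    Returns:
        1D array of shape (num_documents,) with one distance per document
    """
    # repeat the query as a column for every document: shape (num_topics, num_documents)
    p = np.repeat(query[:, None], matrix.shape[0], axis=1)
    q = matrix.T
    m = 0.5 * (p + q)
    # scipy.stats.entropy(p, m) computes KL(p || m) column-wise (axis=0)
    return np.sqrt(0.5 * (entropy(p, m) + entropy(q, m)))

def get_top_k_similar_docs(query, matrix, k=10):
    """Get the <k> most similar documents (represented by <matrix>) given a <query>.

    Args:
        query: 1D array
        matrix: 2D array
        k: int
    Returns:
        indices of the k documents with the smallest Jensen-Shannon distance, closest first
    """
    sims = jensen_shannon(query, matrix)
    return sims.argsort()[:k]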
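
`jensen_shannon` computes the Jensen-Shannon *distance* (the square root of the divergence, natural log) between the query and every document at once. As a sanity check, a loop over SciPy's `scipy.spatial.distance.jensenshannon` should give the same values. The sketch below is a self-contained restatement of the same computation (`js_distance_to_all` is a hypothetical name) on made-up topic distributions.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import entropy

def js_distance_to_all(query, matrix):
    # query: (num_topics,), matrix: (num_documents, num_topics)
    p = np.repeat(query[:, None], matrix.shape[0], axis=1)  # (num_topics, num_documents)
    q = matrix.T
    m = 0.5 * (p + q)
    # entropy(p, m) computes KL(p || m) column by column (axis=0)
    return np.sqrt(0.5 * (entropy(p, m) + entropy(q, m)))

matrix = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1]])
query = matrix[0]

vectorised = js_distance_to_all(query, matrix)
reference = np.array([jensenshannon(query, row) for row in matrix])
print(np.allclose(vectorised, reference))  # True
print(vectorised.argsort()[:2])            # [0 2]: the query itself, then its closest neighbour
```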

#### Results for Much Asphalt

In [None]:
doc_index_much_asphalt = df.index[df['name'] == "Much Asphalt"].tolist()[0]
probabilities_asphalt = document_matrix[doc_index_much_asphalt]

# 0-based index of the most probable topic (the topic list printed above numbers topics from 1)
probabilities_asphalt.argmax()

2



> Looking at the top words of the most probable topic for the 'Much Asphalt' company



In [None]:
lda.show_topic(probabilities_asphalt.argmax())

[('also', 0.09852923),
 ('project', 0.0884902),
 ('design', 0.072434366),
 ('drone', 0.04401816),
 ('space', 0.03269703),
 ('control', 0.02280834),
 ('fire', 0.022779142),
 ('cad', 0.021886786),
 ('supplier', 0.020904863),
 ('two', 0.01955086)]



> Extracting the most similar documents for 'Much Asphalt' (the query company itself is included, with distance 0)



In [None]:
most_similar_docs_asphalt = df[df.index.isin(get_top_k_similar_docs(probabilities_asphalt, document_matrix, 5))]
most_similar_docs_asphalt

| | name | description | country | founding_date | relevancy | tokenized |
|---|---|---|---|---|---|---|
| 4 | Much Asphalt | Much Asphalt is southern Africa’s commercial supplier of an extensive range of hot and cold asphalt products to the road construction economy. Much Asphalt owns and operates 15 static plants in the major centres of South Africa and is the majority shareholder in East Coast Asphalt which operates two more in East London and Mthatha. | South Africa | 1965-01-01 | 0 | [much, asphalt, southern, supplier, hot, cold, asphalt, road, much, asphalt, static, major, south, africa, east, coast, asphalt, two, east, london, mthatha] |
| 312 | ALSO Holding AG | ALSO Holding AG bundles logistics, financial, supply, solution, digital, and IT services together into individual service packages. The company offers services at all levels of the ICT value chain from a single source. It also provides customized services in the logistics, finance, information technology, and digital services sectors, as well as traditional distribution services. The company serves corporate resellers, value added resellers, small and medium-sized business resellers, retailers, and retailer channels. ALSO Holding AG is a Switzerland-based company that was founded in 1984. | Switzerland | 1984-01-01 | 0 | [also, ag, chain, also, well, small, also, ag] |
| 338 | VolkerWessels | VolkerWessels is a construction firm that is focused on the design, development, realisation and management of construction projects. The Company's primary activities include real estate development, construction, civil engineering, railway construction, road construction, urban mobility objects construction, and energy networks construction, among others. | Netherlands | 1854-01-01 | 1 | [firm, design, real, civil, railway, road, urban, among] |
| 1443 | Saldus Celinieks | Saldus Celinieks is specialising in road construction, extraction of aggregates and asphalt production. | Latvia | 1991-01-01 | 1 | [road, asphalt] |
| 1706 | Blumatica | Blumatica develops software solutions for the construction industry. Its solutions are also targeted at security, design and public sector companies. It offers solutions for creating BIM models, creating 2D/3D designs, designing scaffoldings & other structures, estimate security costs on a construction site, design construction site layouts, and more. | Italy | 1996-01-01 | 2 | [also, design, public, sector, bim, site, design, site] |


#### Results for Vahanalytics

In [None]:
doc_index_vahanalytics = df.index[df['name'] == "Vahanalytics"].tolist()[0]
probabilities_vahana = document_matrix[doc_index_vahanalytics]

# 0-based index of the most probable topic (the topic list printed above numbers topics from 1)
probabilities_vahana.argmax()

8

> Looking at the top words of the most probable topic for the 'Vahanalytics' company

In [None]:
lda.show_topic(probabilities_vahana.argmax())

[('system', 0.045638025),
 ('network', 0.039421316),
 ('chain', 0.034769062),
 ('cost', 0.030962704),
 ('data', 0.02939287),
 ('time', 0.028372979),
 ('price', 0.027522666),
 ('within', 0.025265358),
 ('well', 0.023312286),
 ('high', 0.022485912)]



> Extracting the most similar documents for 'Vahanalytics'



In [None]:
most_similar_docs_vahana = df[df.index.isin(get_top_k_similar_docs(probabilities_vahana, document_matrix, 5))]
most_similar_docs_vahana

| | name | description | country | founding_date | relevancy | tokenized |
|---|---|---|---|---|---|---|
| 130 | Vanguard Logistics Services | Vanguard Logistics Services is the neutral freight consolidation service, offering forwarders and customers of all sizes the world’s largest owned LCL (Less Than Container Load) end-to-end network, unparalleled schedule integrity, and industry-leading information technology applications. | United States | 2001-01-01 | 0 | [neutral, freight, largest, lcl, less, load, endtoend, network] |
| 254 | Paradise Exteriors | Paradise Exteriors is a family owned and operated business. their advantage is the ability to give high quality windows and low prices! | United States | 2007-01-01 | 0 | [give, high, low] |
| 695 | Vahanalytics | Vahanalytics aims to create better drivers and safer roads by using cutting edge big data and machine learning techniques. | India | 2016-01-01 | 1 | [better, safer, big, data] |
| 1262 | Reach | Reach Promoters Pvt. Ltd. is a development Company promoted by the professionals having over 20 years of real estate domain expertise in consulting and advisory, empowered by transparency, consistency and commitment. Reach brings to life modern concepts & technology aligned with an Indian touch showcasing new thoughts & ideology to real estate development in this country thus by awarding luxury of urban scene in their upcoming high street retail destination clubbed with the state-of-art office spaces together managing a spread of close to a million sq. ft., along with more projects in the pipeline. Reach Promoters’ firm belief is in a well conceptualized and thought provoked world of Retail and Commercial Office suites rather than static concrete structures. Professionals at Reach stand committed to create life & experience in all their endeavors, today and forever to provide volume and velocity of the footfall. Our projects shall bring happiness through our expertise and well laid down thoughts taking care of not only great concepts and designs but also executing them with finesse which shall stand the test of all times. Reach shall deliver high quality projects with the commitment of on time delivery while showcasing the Class and Quality in all their Developments. | India | 2015-03-01 | 2 | [reach, pvt, ltd, real, domain, reach, life, modern, indian, touch, new, real, urban, scene, high, street, retail, spread, close, million, sq, ft, along, reach, firm, well, thought, world, retail, rather, static, reach, stand, life, today, shall, bring, well, laid, care, great, also, shall, stand, test, reach, shall, high, time, class] |
| 1350 | Cryo Storage Solutions | Cryo Storage Solutions offers cryogenic and -80°c storage, disaster recovery and product/risk management services. They are knowledgeable about the nitrogen and passionate about the paperwork. They know about the cold stuff, letting you get on with the smart stuff. They hold the exclusive distribution territories of the UK and Eire for cryotherm of Germany, and can therefore supply you with high quality and highly efficient bio-banks, dewars, and SiVL pipework systems. Cryo Storage Solutions can offer a supply only arrangement or full turnkey installation services too. | United Kingdom | 2017-01-01 | 0 | [cryo, nitrogen, paperwork, know, cold, stuff, get, smart, stuff, hold, uk, high, cryo, offer, full, turnkey] |




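> Write both results to Excel files so they can easily be included in the lab report (`to_excel` needs an Excel engine such as `openpyxl` to be installed).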



In [None]:
most_similar_docs_vahana.to_excel("vahan.xlsx")
most_similar_docs_asphalt.to_excel("asphalt.xlsx")