spaCy

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


| Item                           | Description                                                                                                        |
|--------------------------------|--------------------------------------------------------------------------------------------------------------------|
| Tokenization                   | Segmenting text into words, punctuation etc.                                                                       |
| Lemmatization                  | Assigning the base forms of words, for example: "was" → "be" or "rats" → "rat".                                    |
| Named Entity Recognition (NER) | Labeling named "real-world" objects, like persons, companies or locations.                                         |



Stemming works by stripping a suffix or prefix from a word using fixed rules, so the result is not always a real word. Lemmatization is a more powerful operation because it takes the morphological analysis of the word into account.

Lemmatization returns the lemma, which is the base (dictionary) form shared by all inflected forms of a word.
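The difference shows up on a few words taken from this notebook. The sketch below is a toy illustration, not spaCy's implementation: `naive_stem` is a hypothetical rule-based suffix stripper, and `TOY_LEMMAS` is a small hand-written lookup table standing in for real morphological analysis.

```python
# Toy comparison of stemming and lemmatization (illustrative assumptions only).

def naive_stem(word, suffixes=("ing", "ies", "es", "s", "ed")):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hand-written lemma table; a real lemmatizer derives these from
# part-of-speech tags and morphological rules.
TOY_LEMMAS = {
    "was": "be",
    "rats": "rat",
    "responsibilities": "responsibility",
    "technologies": "technology",
}

def toy_lemmatize(word):
    """Look the word up in the toy table, falling back to the lowercased word."""
    return TOY_LEMMAS.get(word.lower(), word.lower())

for w in ["was", "rats", "responsibilities", "technologies"]:
    print(w, "| stem:", naive_stem(w), "| lemma:", toy_lemmatize(w))
```

The stemmer leaves "was" untouched and cuts "responsibilities" down to a non-word, while the lookup maps both to their dictionary forms.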

In [2]:
import en_core_web_sm
nlp = en_core_web_sm.load()

doc = nlp(u"""
Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering
""")

lemma_word1 = [] 
for token in doc:
    lemma_word1.append(token.lemma_)
lemma_word1

['\n',
 'Design',
 'and',
 'develop',
 'and',
 'improve',
 '-PRON-',
 'ios',
 'app',
 'with',
 'teammate',
 'and',
 'backend',
 'team',
 '\n',
 'take',
 'responsibility',
 'of',
 'user',
 'feature',
 'development',
 'from',
 'design',
 'to',
 'release',
 '\n',
 'collaborate',
 'closely',
 'with',
 'product',
 'and',
 'business',
 'team',
 'to',
 'deliver',
 'good',
 'experience',
 'and',
 'feature',
 'for',
 'user',
 '\n',
 'Deal',
 'with',
 'complex',
 'and',
 'challenge',
 'technology',
 'stack',
 '\n',
 'Mentor',
 'junior',
 'software',
 'engineer',
 'and',
 'code',
 'review',
 '\n',
 'be',
 'responsible',
 'for',
 'the',
 'code',
 'quality',
 'in',
 'the',
 'team',
 '\n',
 'learn',
 'and',
 'get',
 'well',
 'at',
 'what',
 '-PRON-',
 'do',
 '\n',
 'Skill',
 '&',
 'requirement',
 '\n',
 'enjoy',
 'programming',
 'and',
 'care',
 'about',
 'user',
 'experience',
 'and',
 'software',
 'craftsmanship',
 '\n',
 'strong',
 'knowledge',
 'of',
 'swift',
 ',',
 'Objective',
 '-',
 'c',
 'a

Extract the most important keywords ("hot words") from a chunk of text.
The keyword extraction code is wrapped in a function, which is more convenient because it can be called whenever keywords need to be extracted from a large chunk of text. It accepts a string as its input parameter.

In [3]:
import spacy
from collections import Counter
from string import punctuation
nlp = spacy.load("en_core_web_sm")
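def get_hotwords(text):
    """Return the proper nouns, adjectives and nouns found in text, lowercased."""
    result = []
    pos_tag = ['PROPN', 'ADJ', 'NOUN']  # parts of speech to keep
    doc = nlp(text.lower())
    for token in doc:
        # skip stop words and punctuation
        if token.text in nlp.Defaults.stop_words or token.text in punctuation:
            continue
        if token.pos_ in pos_tag:
            result.append(token.text)
    return result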
new_text = """

Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering

"""
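# set() removes duplicates, so every hotword reaches Counter with a count of 1;
# most_common(10) therefore returns 10 hotwords in arbitrary order, not the 10 most frequent.
output = set(get_hotwords(new_text))
most_common_list = Counter(output).most_common(10)
for item in most_common_list:
    print(item[0])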

user
challenging
patterns
junior
stack
attitude
voice
skill
quality
related
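
Because the hotwords in the cell above pass through set() before reaching Counter, every count is 1 and the printed list is not ranked by frequency. A minimal sketch of the difference, using a short hypothetical token list in place of the get_hotwords output:

```python
from collections import Counter

# Hypothetical hotword list standing in for get_hotwords(new_text)
hotwords = ["ios", "team", "ios", "user", "team", "ios", "swift"]

# Counting the raw list keeps frequencies, so most_common ranks by them
by_frequency = Counter(hotwords).most_common(2)

# Counting a set gives every word a count of exactly 1
deduplicated = Counter(set(hotwords))

print(by_frequency)
print(sorted(deduplicated.values()))
```

Dropping the set() call is enough to get a frequency ranking; whether that is preferable depends on whether repeated words should count for more.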


In [4]:
from spacy.lang.en import English

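# Create a blank English pipeline: English() provides only the tokenizer and language data
# (such as stop words); it does not load a tagger, parser, NER or word vectors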
nlp = English()

text = """Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering
"""

#  "nlp" Object is used to create documents with linguistic annotations.
my_doc = nlp(text)

# Create list of word tokens
token_list = []
for token in my_doc:
    token_list.append(token.text)

from spacy.lang.en.stop_words import STOP_WORDS

# Create list of word tokens after removing stopwords
filtered_sentence = []

for word in token_list:
    lexeme = nlp.vocab[word]
    if not lexeme.is_stop:
        filtered_sentence.append(word)
print(token_list)
print(filtered_sentence)

['Design', 'and', 'Develop', 'and', 'improve', 'our', 'iOS', 'apps', 'with', 'teammates', 'and', 'backend', 'team', '\n', 'Take', 'responsibilities', 'of', 'user', 'feature', 'development', 'from', 'design', 'to', 'release', '\n', 'Collaborate', 'closely', 'with', 'product', 'and', 'business', 'team', 'to', 'deliver', 'best', 'experiences', 'and', 'features', 'for', 'user', '\n', 'Deal', 'with', 'complex', 'and', 'challenging', 'technologies', 'stack', '\n', 'Mentor', 'junior', 'software', 'engineer', 'and', 'code', 'review', '\n', 'Be', 'responsible', 'for', 'the', 'code', 'quality', 'in', 'the', 'team', '\n', 'Learn', 'and', 'get', 'better', 'at', 'what', 'you', 'do', '\n', 'Skill', '&', 'Requirements', '\n', 'Enjoy', 'programming', 'and', 'care', 'about', 'user', 'experiences', 'and', 'software', 'craftsmanship', '\n', 'Strong', 'knowledge', 'of', 'Swift', ',', 'Objective', '-', 'C', 'and', 'C', '(', 'including', 'C', 'toolchain', '-', 'gcc', ',', 'ld', ',', 'Makefile', ',', 'etc', 

KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.



In [5]:
pip install KeyBERT

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting KeyBERT
  Downloading keybert-0.5.1.tar.gz (19 kB)
Collecting sentence-transformers>=0.3.8
  Downloading sentence-transformers-2.2.0.tar.gz (79 kB)
Collecting rich>=10.4.0
  Downloading rich-12.4.4-py3-none-any.whl (232 kB)
Collecting commonmark<0.10.0,>=0.9.0
  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
Collecting transformers<5.0.0,>=4.6.0
  Downloading transformers-4.19.2-py3-none-any.whl (4.2 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting huggingface-hub
  Download

First, document embeddings are extracted with BERT to get a document-level representation. Then, word embeddings are extracted for N-gram words/phrases. Finally, we use cosine similarity to find the words/phrases that are the most similar to the document. The most similar words could then be identified as the words that best describe the entire document.
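
The three steps can be sketched without downloading a model. The code below is not KeyBERT: it swaps the BERT embedding for a bag-of-words count vector (`toy_embed`, a hypothetical stand-in) so that only the candidate generation and cosine ranking are illustrated. All function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in for a BERT embedding: a bag-of-words count vector (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def candidate_ngrams(text, n_min=1, n_max=2):
    """All word n-grams of length n_min..n_max, deduplicated."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(words) - n + 1)
    }

def rank_keywords(text, top_n=3):
    """Score every candidate against the whole document and keep the best."""
    doc_vec = toy_embed(text)                      # step 1: document embedding
    scored = [(c, cosine(toy_embed(c), doc_vec))   # step 2: candidate embeddings
              for c in candidate_ngrams(text)]
    # step 3: sort by similarity (ties broken alphabetically)
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]

sample = "iOS app development and iOS app release"
ranked = rank_keywords(sample)
print(ranked)
```

With real sentence embeddings the scores reflect meaning rather than word overlap, but the ranking logic has the same shape.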



In [6]:
from keybert import KeyBERT

doc = """
        
Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering
"""
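kw_model = KeyBERT()
# First call: allow keyphrases of 1 to 3 words and keep stop words.
# Its result is overwritten by the next call, so it never reaches print().
keywords = kw_model.extract_keywords(doc, keyphrase_ngram_range=(1, 3), stop_words=None)

# Second call: default settings (single-word keywords);
# highlight=True also asks KeyBERT to print the document with the keywords highlighted.
keywords = kw_model.extract_keywords(doc, highlight=True)

print(keywords)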

[('ios', 0.4708), ('swift', 0.3806), ('develop', 0.379), ('skills', 0.3636), ('development', 0.354)]


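Set keyphrase_ngram_range to control the length, in words, of the resulting keywords/keyphrases. The two cells below request three-word phrases and enable Maximal Marginal Relevance (use_mmr=True); a higher diversity value yields phrases that are less similar to each other: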



In [7]:
kw_model.extract_keywords(doc, keyphrase_ngram_range=(3, 3), stop_words='english',
                              use_mmr=True, diversity=0.7)


[('ios application development', 0.6185),
 ('good communication skills', 0.4251),
 ('science equivalent field', 0.0869),
 ('makefile strong knowledge', 0.1537),
 ('mentor junior software', 0.3969)]

In [8]:
kw_model.extract_keywords(doc, keyphrase_ngram_range=(3, 3), stop_words='english',
                              use_mmr=True, diversity=0.2)

[('ios application development', 0.6185),
 ('develop improve ios', 0.6153),
 ('junior software engineer', 0.4842),
 ('skills participate ios', 0.5487),
 ('strong knowledge ios', 0.5077)]
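
The two results above differ only in the diversity value: at 0.7 the phrases cover different topics, while at 0.2 most of them repeat "ios". A minimal sketch of Maximal Marginal Relevance selection, assuming precomputed similarity scores; `mmr_select` and the numbers are hypothetical and this is not KeyBERT's own code:

```python
def mmr_select(doc_sims, pair_sims, top_n, diversity):
    """Pick top_n candidate indices, trading relevance against redundancy.

    doc_sims[i]     : similarity of candidate i to the document
    pair_sims[i][j] : similarity between candidates i and j
    diversity       : 0 favors relevance only, 1 favors novelty only
    """
    # Start with the candidate most similar to the document
    selected = [max(range(len(doc_sims)), key=lambda i: doc_sims[i])]
    remaining = [i for i in range(len(doc_sims)) if i != selected[0]]

    while remaining and len(selected) < top_n:
        def score(i):
            # Redundancy is the closest match among already selected candidates
            redundancy = max(pair_sims[i][j] for j in selected)
            return (1 - diversity) * doc_sims[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different
doc_sims = [0.90, 0.85, 0.50]
pair_sims = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.10],
    [0.10, 0.10, 1.00],
]

low_diversity = mmr_select(doc_sims, pair_sims, top_n=2, diversity=0.2)
high_diversity = mmr_select(doc_sims, pair_sims, top_n=2, diversity=0.7)
print(low_diversity, high_diversity)
```

With low diversity the near-duplicate is kept because it is highly relevant; with high diversity the redundancy penalty outweighs its relevance and the different candidate is chosen instead.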

In [9]:
doc = """Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering
      """

kw_model = KeyBERT()
seed_keywords = ["information"]
keywords = kw_model.extract_keywords(doc, use_mmr=True, diversity=0.1, seed_keywords=seed_keywords)
print(keywords)

[('ios', 0.4999), ('skills', 0.4522), ('develop', 0.4306), ('requirements', 0.4051), ('swift', 0.4239)]


In [10]:
jd = """
        
1. Embedded software development for the Automotive Industry will be preferred but not a must
2. Develop Real Time Embedded Systems,
3. Software Coding, design, development, review, test, project deployment
4. Carry out System Integration of independent embedded modules into a coherent system
5. On-site installation, support of customer issues, site testing and trial, commissioning
6. Generate software specifications, documentation
7. Perform software configuration management
Requirements
1. Bachelor in Software/Computer/Electrical Engineering or Computer Science
2. Minimum 3 years of relevant working experience as a Software Engineer or related positions
3. Experience in full software development life cycle for embedded system,
4. Proficient in object-oriented programming and design in c/c++/c#/Java/Python/Linux shell scripts under Windows/Linux Environments.
5. Experiences in Wireless communication system / protocols - RF/WIFI/BT/DSRC communication will be added advantage
6. Experience in Serial Bus communication such as CAN, RS422/RS485, RS232, SPI, I2C, etc
"""
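kw_model = KeyBERT()
# First call: allow keyphrases of 1 to 3 words and keep stop words.
# Its result is overwritten by the next call, so it never reaches print().
keywords = kw_model.extract_keywords(jd, keyphrase_ngram_range=(1, 3), stop_words=None)

# Second call: default settings (single-word keywords);
# highlight=True also asks KeyBERT to print the document with the keywords highlighted.
keywords = kw_model.extract_keywords(jd, highlight=True)

print(keywords)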

[('automotive', 0.3856), ('programming', 0.3487), ('embedded', 0.3445), ('linux', 0.3112), ('requirements', 0.3106)]


https://www.analyticsvidhya.com/blog/2019/08/how-to-remove-stopwords-text-normalization-nltk-spacy-gensim-python/?

In [11]:
pip install nltk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [12]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
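# 'word_tokenize' is a function in nltk.tokenize, not a downloadable package,
# so this call fails and returns False (see the error in the output below)
nltk.download('word_tokenize')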

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Error loading word_tokenize: Package 'word_tokenize' not
[nltk_data]     found in index


False

In [13]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [14]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
set(stopwords.words('english'))

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


{'a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 'her',
 'here',
 'hers',
 'herself',
 'him',
 'himself',
 'his',
 'how',
 'i',
 'if',
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it's",
 'its',
 'itself',
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'only',
 'or',
 'other',
 'our',
 'ours',
 'ourselves',
 'out',
 'over',
 'own',
 'r

In [15]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

# sample sentence
text = """

Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering
"""

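# set of stop words (stopwords was imported from nltk.corpus in an earlier cell)
stop_words = set(stopwords.words('english'))

# tokens of words
word_tokens = word_tokenize(text)

# keep tokens that are not stop words; compare in lowercase because the
# NLTK stop word list is all lowercase (otherwise "Be" would be kept)
filtered_sentence = []

for w in word_tokens:
    if w.lower() not in stop_words:
        filtered_sentence.append(w)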



print("\n\nOriginal Sentence \n\n")
print(" ".join(word_tokens)) 

print("\n\nFiltered Sentence \n\n")
print(" ".join(filtered_sentence)) 


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Original Sentence 


Design and Develop and improve our iOS apps with teammates and backend team Take responsibilities of user feature development from design to release Collaborate closely with product and business team to deliver best experiences and features for user Deal with complex and challenging technologies stack Mentor junior software engineer and code review Be responsible for the code quality in the team Learn and get better at what you do Skill & Requirements Enjoy programming and care about user experiences and software craftsmanship Strong knowledge of Swift , Objective-C and C ( including C toolchain - gcc , ld , Makefile , etc . ) Strong knowledge of iOS platform and familiar with related projects ( Cocoapods , Carthage , Alamofire , etc . ) Familiar with Git and git workflow Good understanding of iOS app architecture and use of design patterns Good understa

In [16]:
pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [17]:
text = """
Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering
"""

In [18]:
to_tokenize = text


In [19]:
import transformers
from transformers import pipeline
summarizer = pipeline("summarization")
summarized = summarizer(to_tokenize, min_length=75, max_length=300)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


Your max_length is set to 300, but you input_length is only 257. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=128)


In [20]:
print(summarized)


[{'summary_text': ' Employers need a strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.) \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc. \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0Bachelors degree in Computer Science or equivalent in the field of software engineering. \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0\xa0Participate in iOS application development at least a few app lifecycles (from development to release) \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0 \xa0:\xa0Experience with messaging and voice/video call application development is a plus .'}]


In [21]:

# Program to measure the similarity between 
# two sentences using cosine similarity.
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
  
#X = input("Enter first string: ").lower()
#Y = input("Enter second string: ").lower()
X ="""Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering

"""
Y ="""Senior Software Engineers deliver high-quality functional applications and websites that satisfy the business needs and requirements of our clients. You will actively participate in the software development lifecycle, debug applications, and configure existing systems. We like people who love to code and have a history of building their own programs or applications. You should actively research creative solutions to problems.
Responsibilities
•        Develop and deploy fully functional applications based on business and technical specifications
•        Analyze overall development work and plan out tasks for maximum efficiency while reducing risk
•        Write clean, testable code including unit tests.
•        Debug applications and websites
•        Oversee your team’s technical deliverables, reviewing their code and coaching them to improve their skills
•        Work with multiple client partner teams to implement cohesive end-to-end experiences
•        Document development and operational procedures
Requirements
•        Degree in Computer Science, Engineering or equivalent
•        5+ years of professional software development experience
•        Demonstrable ability to build an end-to-end solution using enterprise technologies (E.g. .NET, Java, Salesforce, etc.)
•        Experience creating enterprise-level applications
•        Familiarity with agile environments
•        Excellent troubleshooting and problem-solving skills
•        Good communication skills


"""
  
# tokenization
X_list = word_tokenize(X)
Y_list = word_tokenize(Y)

# sw contains the set of stopwords (set membership tests are fast)
sw = set(stopwords.words('english'))

# remove stop words from each token list
X_set = {w for w in X_list if w not in sw}
Y_set = {w for w in Y_list if w not in sw}

# form the shared vocabulary and build binary (presence/absence) vectors
rvector = X_set.union(Y_set)
l1 = [1 if w in X_set else 0 for w in rvector]
l2 = [1 if w in Y_set else 0 for w in rvector]

# cosine formula: dot product divided by the product of the vector norms
# (for binary vectors the squared norm equals the sum of the entries)
c = sum(a * b for a, b in zip(l1, l2))
cosine = c / float((sum(l1) * sum(l2)) ** 0.5)
print("similarity: ", cosine)

similarity:  0.23229104921485555
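
Because the vectors in the cell above contain only 0s and 1s, the same score can be computed directly from the two token sets: the dot product is the size of their intersection, and each norm is the square root of the set size. A minimal sketch, where `binary_cosine` is a hypothetical helper name:

```python
import math

def binary_cosine(tokens_a, tokens_b):
    """Cosine similarity of presence/absence vectors, computed on sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0  # avoid dividing by zero for empty inputs
    shared = len(set_a & set_b)
    return shared / math.sqrt(len(set_a) * len(set_b))

# One shared token out of 3 and 4 distinct tokens: 1 / sqrt(12)
score = binary_cosine(["a", "b", "c"], ["b", "x", "y", "z"])
print(score)
```

This form ignores how often a token occurs, which is why it can differ from a count-based or TF-IDF cosine on real text.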


In [22]:
from collections import Counter
from sklearn.metrics.pairwise import cosine_similarity

a_file = ['a', 'b', 'c']
b_file = ['b', 'x', 'y', 'z']

# count word occurrences
a_vals = Counter(a_file)
b_vals = Counter(b_file)

# convert to word-vectors
words  = list(a_vals.keys() | b_vals.keys())
a_vect = [a_vals.get(word, 0) for word in words]       
b_vect = [b_vals.get(word, 0) for word in words]        

# find cosine
len_a  = sum(av*av for av in a_vect) ** 0.5             
len_b  = sum(bv*bv for bv in b_vect) ** 0.5             
dot    = sum(av*bv for av,bv in zip(a_vect, b_vect))   
cosine = dot / (len_a * len_b) 

print(cosine)
print(cosine_similarity([a_vect], [b_vect]))

0.2886751345948129
[[0.28867513]]


In [23]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()

Document1 = """Degree in Information Technology, Computer Science, Biotechnology, Science or equivalent.
At least 5 years of research experience in the field of computer science, Data Preparation, Mathematical modelling and Simulation, Data Analytics and Optimization.
Knowledge of machine learning and data mining techniques in one or more areas of statistical modeling methods, time series, natural language, image/video text mining, optimization, information retrieval.
Proficient in Python, Excel, Scala, R, Matlab.
Proficient in Data science algorithms and strong understanding of the field.
Have experience in web development (HTML, CSS, JS).
Familarity with Linux.
Able to work independently in develop & operate the app in Hadoop environment.
Familiar with Spark, HBase, Hive, Hadoop and Big Data related technologies.
Experience in RHEL Linux, RDBMS and SQL is essential.
Experience in Cloud computing technologies (Azure/AWS).
Knowledge of Windows Server 2008 & 2012 and C# is an added advantage.
Good understanding of building a scalable system.
Practical, hands-on experience with data platforms and tool.
Strong critical thinking, problem-solving, programming skills and computer science knowledge."""

Document2 = """Design and Develop and improve our iOS apps with teammates and backend team
Take responsibilities of user feature development from design to release
Collaborate closely with product and business team to deliver best experiences and features for user
Deal with complex and challenging technologies stack
Mentor junior software engineer and code review
Be responsible for the code quality in the team
Learn and get better at what you do
Skill & Requirements
Enjoy programming and care about user experiences and software craftsmanship
Strong knowledge of Swift, Objective-C and C (including C toolchain - gcc, ld, Makefile, etc.)
Strong knowledge of iOS platform and familiar with related projects (Cocoapods, Carthage, Alamofire, etc.)
Familiar with Git and git workflow
Good understanding of iOS app architecture and use of design patterns
Good understanding of stack from UI to back-end, including web services and server-side integration (Experience with messaging and voice/video call application development is a plus)
Excellent analytical skills and proactive attitude
Good communication skills
Participate in iOS application development at least a few app lifecycles (from development to release)
Bachelors Degree in Computer Science or equivalent in the field of software engineering"""

corpus = [Document1,Document2]

X_train_counts = count_vect.fit_transform(corpus)

pd.DataFrame(X_train_counts.toarray(), columns=count_vect.get_feature_names_out(), index=['Document 1', 'Document 2'])


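|            | 2008 | 2012 | able | about | added | advantage | alamofire | algorithms | an | analytical | ... | video | voice | web | what | windows | with | work | workflow | years | you |
|------------|------|------|------|-------|-------|-----------|-----------|------------|----|------------|-----|-------|-------|-----|------|---------|------|------|----------|-------|-----|
| Document 1 | 1    | 1    | 1    | 0     | 1     | 1         | 0         | 1          | 1  | 0          | ... | 1     | 0     | 1   | 0    | 1       | 3    | 1    | 0        | 1     | 0   |
| Document 2 | 0    | 0    | 0    | 1     | 0     | 0         | 1         | 0          | 0  | 1          | ... | 1     | 1     | 1   | 1    | 0       | 6    | 0    | 1        | 0     | 1   |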


In [24]:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()

trsfm=vectorizer.fit_transform(corpus)
pd.DataFrame(trsfm.toarray(), columns=vectorizer.get_feature_names_out(), index=['Document 1', 'Document 2'])

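|            | 2008     | 2012     | able     | about    | added    | advantage | alamofire | algorithms | an       | analytical | ... | video    | voice    | web      | what     | windows  | with     | work     | workflow | years    | you      |
|------------|----------|----------|----------|----------|----------|-----------|-----------|------------|----------|------------|-----|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| Document 1 | 0.056776 | 0.056776 | 0.056776 | 0.0      | 0.056776 | 0.056776  | 0.0       | 0.056776   | 0.056776 | 0.0        | ... | 0.040396 | 0.0      | 0.040396 | 0.0      | 0.056776 | 0.121189 | 0.056776 | 0.0      | 0.056776 | 0.0      |
| Document 2 | 0.0      | 0.0      | 0.0      | 0.049146 | 0.0      | 0.0       | 0.049146  | 0.0        | 0.0      | 0.049146   | ... | 0.034968 | 0.049146 | 0.034968 | 0.049146 | 0.0      | 0.209807 | 0.0      | 0.049146 | 0.0      | 0.049146 |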
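
The weights in the table above are consistent with scikit-learn's default TfidfVectorizer settings: a smoothed idf of ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row. With two documents, a term found in only one of them has idf ln(3/2) + 1 ≈ 1.405 and a term found in both has idf 1, which matches the ratio 0.056776 / 0.040396 in the Document 1 row. A minimal pure-Python sketch of that computation, assuming pre-tokenized documents; `tfidf_rows` is a hypothetical helper, not scikit-learn's code:

```python
import math

def tfidf_rows(token_docs):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights."""
    n_docs = len(token_docs)
    vocab = sorted({t for doc in token_docs for t in doc})
    # document frequency: number of documents containing each term
    df = {t: sum(t in doc for doc in token_docs) for t in vocab}
    # smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for doc in token_docs:
        raw = [doc.count(t) * idf[t] for t in vocab]   # tf * idf
        norm = math.sqrt(sum(x * x for x in raw))      # L2 norm of the row
        rows.append([x / norm if norm else 0.0 for x in raw])
    return vocab, rows

# Tiny hypothetical corpus: "science" is shared, the other terms are not
docs = [["data", "science", "data"], ["ios", "science"]]
vocab, rows = tfidf_rows(docs)
for row in rows:
    print(dict(zip(vocab, row)))
```

Terms that appear in every document keep the lowest idf, so shared vocabulary such as "with" contributes less per occurrence than a term unique to one job description.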


In [25]:
from sklearn.metrics.pairwise import cosine_similarity

cosine_similarity(trsfm[0:1], trsfm)

array([[1.        , 0.46897458]])

In [26]:

Document3 = """
Responsible for software development, implementation and support of applications, including:
Gather systems requirements from customers (internal and/or external) and ensure users sign-off requirements specification
Involved in design and development stage.
Develop test cases and conduct testing and tune the performance of systems to meet SLA.
Deploy system to production and provide UAT support.
Any other duties as and when assigned.
Defines site objectives by analyzing user requirements, envisioning system features and functionality.
Design and develop user interface by using MS Visual Studio .NET.
Write well designed, testable, efficient code by using best software development practices.
Integrate and test data from various back-end services and databases (SQL).
Able to manage development delivery schedule on time."""

corpus = [Document1,Document3]

X_train_counts = count_vect.fit_transform(corpus)

pd.DataFrame(X_train_counts.toarray(), columns=count_vect.get_feature_names_out(), index=['Document 1', 'Document 3'])

vectorizer = TfidfVectorizer()

trsfm=vectorizer.fit_transform(corpus)
pd.DataFrame(trsfm.toarray(), columns=vectorizer.get_feature_names_out(), index=['Document 1', 'Document 3'])

cosine_similarity(trsfm[0:1], trsfm)

array([[1.        , 0.29912728]])

# Glove