## **PARSING IBM VACANCIES**

In [None]:
############################################
###     Research Trending Vacancies      ###
###     Sber Dep. Research&Innovation    ### 
###   Ivanov Arseny, Sergey Bratchikov   ###
###       A. Efimov, D. Asonov           ###
############################################

In [1]:
import time
import requests
from bs4 import BeautifulSoup
import json
import re
import faker
import pandas as pd
from tqdm import tqdm
from datetime import datetime
from dateutil import parser

In [179]:
from collections import Counter

In [3]:
fake = faker.Faker(locale='en')

In [185]:
ibm_headers = {
    'User-Agent': fake.chrome(),
    'accept-language': 'en-US,en;q=0.9',
    'pragma': 'no-cache',
    'origin': 'https://www.ibm.com',
    'referer': 'https://www.ibm.com/'
}

In [186]:
API_URL = "http://jobsapi-internal.m-cloud.io/api/stjobbulk"

In [187]:
search_payload = {
    'organization': 2242,
    'limitkey': '4A8B5EF8-AA98-4A8B-907D-C21723FE4C6B',
    'facet': 'publish_to_cws:true',
    'fields': 'title,id,update_date,primary_country,description,primary_category,level,url,brand'
}

In [188]:
result = requests.get(API_URL, headers=ibm_headers, params=search_payload)
result

<Response [200]>

In [189]:
result.json()['totalHits'], len(result.json()['queryResult'])

(14930, 14930)

In [190]:
result.json()['queryResult'][0].keys()

dict_keys(['id', 'title', 'primary_country', 'description', 'primary_category', 'level', 'brand', 'url', 'update_date'])

In [191]:
Counter([j['primary_category'] for j in result.json()['queryResult']]).most_common(10)

[('Technical Specialist', 5832),
 ('Consultant', 2769),
 ('Software Development & Support', 1539),
 ('Finance', 1200),
 ('Sales', 891),
 ('Architect', 666),
 ('Project Management', 524),
 ('Human Resources', 262),
 ('Enterprise Operations', 225),
 ('Other', 199)]

In [192]:
Counter([j['brand'] for j in result.json()['queryResult']]).most_common(15)

[('(0063) IBM India Private Limited', 8043),
 ('(0147) International Business Machines Corporation', 1318),
 ('(1072) IBM Dalian Global Delivery Company Limited', 576),
 ('(0022) IBM Brasil-Industria, Maquinas e Servicos Limitada', 466),
 ('(7600) IBM Japan, Ltd.', 453),
 ('(0026) IBM Canada Limited - IBM Canada Limitee', 308),
 ('(7240) IBM Deutschland GmbH', 223),
 ('(0390) IBM de Mexico Comercializacion y Servicios', 222),
 ('(0007) IBM Argentina Sociedad de Responsabilidad Limitada', 221),
 ('(8660) IBM United Kingdom Limited', 192),
 ('(0563) IBM Services Talent Delivery Pte. Ltd.', 187),
 ('(0856) IBM Business Services', 178),
 ('(0112) IBM Romania Srl', 172),
 ('(0648) IBM Japan Digital Services Co.., Ltd - JPM', 118),
 ('(0891) IBM Solutions Delivery, Inc.', 107)]

In [193]:
def clear_string(x):  # strip HTML tags, turn bullets into newlines, collapse repeated spaces
    return re.sub(' +', ' ', re.sub('<.*?>', '', x).replace('•', '\n')).strip()
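
Because the tags are removed without any replacement character, text from adjacent HTML elements is joined directly, so a heading runs into the paragraph that follows it. The sketch below contrasts that behaviour with a hypothetical variant, `clear_string_spaced` (not part of the notebook), which substitutes a space for each tag. The sample HTML is invented for illustration.

```python
import re


def clear_string_glued(x):
    # Same steps as the notebook's clear_string: tags are dropped with no replacement.
    return re.sub(' +', ' ', re.sub('<.*?>', '', x).replace('•', '\n')).strip()


def clear_string_spaced(x):
    # Hypothetical variant: each tag becomes a space before repeated spaces are collapsed.
    return re.sub(' +', ' ', re.sub('<.*?>', ' ', x).replace('•', '\n')).strip()


sample = '<h3>Introduction</h3><p>At IBM, work is more than a job.</p>'
glued = clear_string_glued(sample)
spaced = clear_string_spaced(sample)
print(glued)   # IntroductionAt IBM, work is more than a job.
print(spaced)  # Introduction At IBM, work is more than a job.
```

The section patterns used later match on the heading text itself, so they should work with either variant; the spaced version mainly makes the extracted text easier to read.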

In [194]:
about_pattern = re.compile(r"Introduction(.+?)Your Role and Responsibilities", flags=re.DOTALL|re.IGNORECASE)
responsibilities_pattern = re.compile(r"Your Role and Responsibilities(.+?)Required Technical and Professional Expertise", flags=re.DOTALL|re.IGNORECASE)
qualifications_pattern = re.compile(r"Required Technical and Professional Expertise(.+?)Preferred Technical and Professional Expertise", flags=re.DOTALL|re.IGNORECASE)
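
Each pattern needs both its opening heading and the next heading to be present, so a posting without a "Preferred Technical and Professional Expertise" section yields no qualifications match. As a rough alternative, the sketch below locates whichever headings are present and cuts the text between them. `extract_sections` and `HEADINGS` are hypothetical names, and the helper uses the first occurrence of each heading, which is an assumption about how the postings are laid out.

```python
import re

# Heading texts taken from the patterns above; the dictionary keys are illustrative.
HEADINGS = [
    ('description', 'Introduction'),
    ('responsibilities', 'Your Role and Responsibilities'),
    ('qualifications', 'Required Technical and Professional Expertise'),
    ('preferred', 'Preferred Technical and Professional Expertise'),
]


def extract_sections(text):
    """Split text at the known headings; a missing section is returned as None."""
    found = []
    for key, heading in HEADINGS:
        match = re.search(re.escape(heading), text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), match.end(), key))
    found.sort()

    sections = {key: None for key, _ in HEADINGS}
    # Each section runs from the end of its heading to the start of the next heading found.
    for i, (_, end, key) in enumerate(found):
        next_start = found[i + 1][0] if i + 1 < len(found) else len(text)
        sections[key] = text[end:next_start].strip()
    return sections


example = ('IntroductionAbout us.'
           'Your Role and ResponsibilitiesDo things.'
           'Required Technical and Professional ExpertisePython.')
sections = extract_sections(example)
print(sections)
```

With this approach a posting that lacks the "Preferred" heading still returns its qualifications, at the cost of slightly different matching rules than the three regular expressions.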

In [196]:
job_dicts = []
for job_info in tqdm(result.json()['queryResult']):
    full_description = clear_string(job_info['description'])

    try:
        about = about_pattern.search(full_description).group(1)
        responsibilities = responsibilities_pattern.search(full_description).group(1)
        qualifications = qualifications_pattern.search(full_description).group(1)
    except AttributeError:
        # search() returned None: a section heading is missing, so skip this posting
        # print(f'Error while reading {job_info["url"]}')
        continue

    job_dict = {
        'title': job_info['title'],
        'internal_id' : job_info['id'],
        'url': job_info['url'],
        'description': about,
        'responsibilities': responsibilities,
        'qualifications': qualifications,
        'company': 'IBM',
        'grade': job_info['level'],
        'category': job_info['primary_category'],
        'publish_date': parser.parse(job_info['update_date'])
    }
    job_dicts.append(job_dict)
len(job_dicts)

100%|██████████| 14930/14930 [00:07<00:00, 2005.18it/s]


14205
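
Of the 14930 postings returned, 14205 were parsed, so 14930 − 14205 = 725 were skipped because at least one pattern did not match. A minimal sketch for finding out which section is usually missing is given below. The patterns are copied from the cell above, while `missing_sections` and the three sample strings are made up for illustration.

```python
import re
from collections import Counter

FLAGS = re.DOTALL | re.IGNORECASE
patterns = {
    'about': re.compile(
        r"Introduction(.+?)Your Role and Responsibilities", FLAGS),
    'responsibilities': re.compile(
        r"Your Role and Responsibilities(.+?)Required Technical and Professional Expertise", FLAGS),
    'qualifications': re.compile(
        r"Required Technical and Professional Expertise(.+?)Preferred Technical and Professional Expertise", FLAGS),
}


def missing_sections(text):
    # Names of the patterns that fail to match the cleaned description.
    return [name for name, pattern in patterns.items() if pattern.search(text) is None]


samples = [
    # All four headings present: nothing is missing.
    'IntroductionA.Your Role and ResponsibilitiesB.'
    'Required Technical and Professional ExpertiseC.'
    'Preferred Technical and Professional ExpertiseD.',
    # No "Preferred" heading: only the qualifications pattern fails.
    'IntroductionA.Your Role and ResponsibilitiesB.'
    'Required Technical and Professional ExpertiseC.',
    # Only one heading: all three patterns fail.
    'Your Role and ResponsibilitiesB.',
]

failures = Counter()
for text in samples:
    failures.update(missing_sections(text))
print(failures.most_common())
```

Run over the real descriptions instead of `samples`, the counter would show whether the skipped postings mostly lack one particular heading.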

In [197]:
snapshot = pd.DataFrame(job_dicts)
print(len(snapshot))
snapshot.sample(5)

14205


| | title | internal_id | url | description | responsibilities | qualifications | company | grade | category | publish_date |
|---|---|---|---|---|---|---|---|---|---|---|
| 7325 | Application Developer: Microservices | 15748571 | https://careers.ibm.com/job/15748571/applicati... | As an Application Developer, you will lead IBM... | As an Application Developer, you will lead IBM... | Minimum 8+ years of experience in Core Java pr... | IBM | Professional | Technical Specialist | 2022-06-24 15:43:13+00:00 |
| 10254 | Oracle Cloud ERP Financials Consultant | 15546422 | https://careers.ibm.com/job/15546422/oracle-cl... | As a Package Consultant at IBM, get ready to t... | Role and Responsibilities : IBM Global Busines... | At least 3 years experience working with at le... | IBM | Professional | Consultant | 2022-06-24 19:18:23+00:00 |
| 2580 | Procurement Sourcing Buyer | 15500517 | https://careers.ibm.com/job/15500517/procureme... | At IBM, work is more than a job - it's a calli... | A sneak peek into this role:As a self-driven P... | What you will bring to the team:Bachelor’s deg... | IBM | Professional | Supply Chain | 2022-04-22 18:16:46+00:00 |
| 8879 | Front-end Developer | 16052752 | https://careers.ibm.com/job/16052752/front-end... | We are looking for a DevOps Engineer who would... | Are you a person who would like to join a team... | We are expecting:Experience with JavaScript, N... | IBM | Intern | Other | 2022-06-24 19:18:41+00:00 |
| 395 | Data Engineer-Python/Java/Scala&Cloud | 15677004 | https://careers.ibm.com/job/15677004/data-engi... | At IBM, work is more than a job - it's a calli... | \n Participate (design & development) in the m... | \n Minimum 5 years experience developing with ... | IBM | Professional | Technical Specialist | 2022-05-07 14:14:54+00:00 |


In [198]:
snapshot.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14205 entries, 0 to 14204
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype                  
---  ------            --------------  -----                  
 0   title             14205 non-null  object                 
 1   internal_id       14205 non-null  int64                  
 2   url               14205 non-null  object                 
 3   description       14205 non-null  object                 
 4   responsibilities  14205 non-null  object                 
 5   qualifications    14205 non-null  object                 
 6   company           14205 non-null  object                 
 7   grade             14205 non-null  object                 
 8   category          14205 non-null  object                 
 9   publish_date      14205 non-null  datetime64[ns, tzutc()]
dtypes: datetime64[ns, tzutc()](1), int64(1), object(8)
memory usage: 1.1+ MB


In [199]:
current_date = datetime.now().strftime('%d-%m-%Y')
current_date

'25-06-2022'

In [200]:
snapshot.to_csv(f'../data/ibm/{current_date}.csv')
snapshot.to_csv(f'../data/ibm/{current_date}.tsv', sep='\t')
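
One thing to keep in mind when several snapshots accumulate: file names in `%d-%m-%Y` form do not sort chronologically, whereas ISO 8601 names (`%Y-%m-%d`) do. The sketch below compares the two on made-up snapshot dates. It is only an illustration of the naming choice, not a change to the notebook's output paths.

```python
from datetime import date

# Hypothetical snapshot dates, for illustration only.
snapshot_days = [date(2022, 6, 25), date(2022, 7, 1), date(2021, 12, 31)]

# Day-first names, as produced by strftime('%d-%m-%Y').
day_first = sorted(d.strftime('%d-%m-%Y') for d in snapshot_days)

# ISO 8601 names: lexicographic order equals chronological order.
iso_names = sorted(d.isoformat() for d in snapshot_days)

print(day_first)  # ['01-07-2022', '25-06-2022', '31-12-2021']
print(iso_names)  # ['2021-12-31', '2022-06-25', '2022-07-01']
```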

#### Check on a single vacancy

In [125]:
full_description = clear_string(result.json()['queryResult'][0]['description'])
full_description

"IntroductionAt IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, lets talk.Your Role and ResponsibilitiesAbout the RoleThe IBM Security Product Management team is seeking an experienced Product Manager who is technical, collaborative, and truly excited about building great endpoint security products. In this role, you will bring in-depth knowledge of the endpoint and security analytics market to lead the evolution of IBM Security’s visibility, detection, and prevention technologies for QRadar XDR portfolio. You should be able to translate a big picture vision into an execution strategy backed with market validation and customer insights. You will work closely with oth

In [126]:
re.search(about_pattern, full_description).group(1)

"At IBM, work is more than a job - it's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, lets talk."

In [127]:
re.search(responsibilities_pattern, full_description).group(1)

'About the RoleThe IBM Security Product Management team is seeking an experienced Product Manager who is technical, collaborative, and truly excited about building great endpoint security products. In this role, you will bring in-depth knowledge of the endpoint and security analytics market to lead the evolution of IBM Security’s visibility, detection, and prevention technologies for QRadar XDR portfolio. You should be able to translate a big picture vision into an execution strategy backed with market validation and customer insights. You will work closely with other product managers, engineering, research, product marketing, sales, service and support. The successful candidate will have the ability to influence cross-functional teams in the company.Responsibilities\n Act as the product leader for initiatives that enhance Endpoint Detection and Response (EDR) visibility, detection, and prevention for Windows, Linux and macOS.\n Utilize strategic insight and organizational skills to id

In [128]:
re.search(qualifications_pattern, full_description).group(1)

'Demonstrated experience of product management in EDR and/or AV; previous work in malware and attack analysis, research, investigation, and response highly desirable\n Hands-on threat or investigation analyst experience in a SOC/SOAR highly desirable\n Curious about new technologies, systems, and tools\n Excellent communication skills, both verbal and written, with the ability to properly translate and articulate positioning and technology\n Demonstrated ability to collaborate with peers in research, engineering, and product marketing\n Ability to prioritize numerous simultaneous tasks\n Proven ability to work effectively with both local and remote teams\n This position requires up to 25% travel to customer and IBM locations worldwide'