# Job Word Embeddings

Here are the goals I want to accomplish in this notebook:
 - clean our granted patent dataset (g_patent) so that it contains only utility patents and only the patent abstract, patent date, and patent id columns (the id is kept in case we want to join with g_cpc_current)
 - clean our Kaggle job posting dataset by removing stop words to keep the relevant words, then tokenize and get word embeddings using gensim
 - clean g_patent similarly
 - start using bag of words and TF-IDF to measure similarity between the job postings and the g_patent abstracts from nearby years (+/- 2 or 3 years)
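
As a rough illustration of the stop-word and tokenization step in the list above, here is a minimal sketch using only the standard library. The `tokenize` helper and the tiny `STOP_WORDS` set are my own placeholders (assumptions, not part of the notebook); the real run would use the NLTK English list loaded in the imports cell.

```python
import re

# Tiny stand-in stop list (assumption); the notebook itself uses NLTK's English list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "with"}

def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, keep runs of letters, drop stop words and 1-character tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words and len(t) > 1]

# Hypothetical snippet shaped like a job post from the Kaggle dataset.
sample = "JOB TITLE: Software Developer\r\n- Good knowledge of English."
tokens = tokenize(sample)
print(tokens)
```

Token lists in this shape (one list of strings per document) are what gensim's `Word2Vec` expects as its sentences input, so the same helper could feed the embedding step later.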

## Imports and Reading in Files

In [1]:
# Import pandas, gensim, NLTK, and the plotting libraries; the g_patent and g_cpc_current DataFrames are loaded in the cells below
import pandas as pd
import gensim
import nltk
from nltk.corpus import stopwords 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
sns.set( style = 'white' )

# Maximum width of the strings shown in DataFrame columns. It is set here, near the top,
# because it may need to be changed often.
pd.options.display.max_colwidth = 1000

# this is the english stop words list
eng_stp_wrds = stopwords.words('english')

In [2]:
# here is the granted patent dataset that we will use alongside the job posting abstracts
df_patent = pd.read_csv("../g_patent.tsv", delimiter='\t', dtype={'patent_id': str, 
                                                        'patent_type': str, 
                                                        'patent_title': str,
                                                        'patent_abstract': str,
                                                        'wipo_kind': str,
                                                        'num_claims': int,
                                                        'withdrawn': int,
                                                        'filename': str}, parse_dates=[2])
df_patent.drop(df_patent[df_patent['patent_type'] != 'utility'].index, inplace=True)
df_patent.drop(axis=1, columns=["patent_type", "patent_title", "wipo_kind","num_claims", "withdrawn", "filename"], inplace=True)
df_patent.head(20) 

Unnamed: 0,patent_id,patent_date,patent_abstract
0,10000000,2018-06-19,"A frequency modulated (coherent) laser detection and ranging system includes a read-out integrated circuit formed with a two-dimensional array of detector elements each including a photosensitive region receiving both return light reflected from a target and light from a local oscillator, and local processing circuitry sampling the output of the photosensitive region four times during each sample period clock cycle to obtain quadrature components. A data bus coupled to one or more outputs of each of the detector elements receives the quadrature components from each of the detector elements for each sample period and serializes the received quadrature components. A processor coupled to the data bus receives the serialized quadrature components and determines an amplitude and a phase for at least one interfering frequency corresponding to interference between the return light and the local oscillator light using the quadrature components."
1,10000001,2018-06-19,"The injection molding machine includes a fixed platen, a moveable platen moving forward and backward by a toggle link, a base plate supporting the toggle link, a driving part for mold clamping to operate the toggle link, a driving part for mold thickness adjustment to adjust a mold thickness, and a control unit to calculate a movement distance gap before a clamping process by controlling the driving part for mold thickness adjustment to move the base plate backward and then move the base plate forward to a target movement position based on a fold amount of the toggle link, and control the driving part for mold thickness adjustment using a value obtained by deducting the movement distance gap from the fold amount of the toggle link when producing a clamp force."
2,10000002,2018-06-19,"The present invention relates to: a method for manufacturing a polymer film, the method including a base film forming step for co-extruding a first resin containing a polyamide-based resin and a second resin containing a copolymer including polyamide-based segments and polyether-based segments; a co-extruded film including a base film including a first resin layer containing a polyamide-based resin, and a second resin layer containing a copolymer having polyamide-based segments and polyether-based segments; to a co-extruded film including a base film including a first resin layer and a second resin layer, which have different melting points; and to a method for manufacturing a polymer film, the method including a base film forming step including a step of co-extruding a first resin and a second resin, which have different melting points."
3,10000003,2018-06-19,"The invention relates to a method for producing a container (2) from a thermoplastic, having at least one surround (4), provided in the container wall (1), for a container opening. The surround (4) comprises a structure behind which parts of the container wall (1) extend and/or which is penetrated by said parts. The method is carried out using a multi-part blow mold that has at least two mold parts, each having at least one cavity, wherein the surround is placed as an insert in the cavity (10) of the blow mold (7). The method comprises pressing the preform that has been forced into the cavity (10) into the structure of the surround (4) by means of a tool which is brought to bear on the preform (12) on the side of the preform facing away from the cavity (10)."
4,10000004,2018-06-19,"The present invention relates to provides a double-oriented film, co-extrude, and of low thickness, with a layered composition that gives the property of being of high barrier to gases and manufactured by the process of co-extrusion of 3 bubbles, which gives the property of when being thermoformed, ensure the distribution of uniform thickness in the walls, base, folds, and corners of the formed tray saving a minimum of 50% of plastic without diminishing its gas barrier and its resistance to puncture."
5,10000005,2018-06-19,"A vacuum forming apparatus is provided that forms an article having a covering bonded to the surface of a substrate in a molding space using a first mold and a second mold. The vacuum forming apparatus is provided with clamps for grasping the covering between the first and second molds arranged at the open positions. The clamps are movable between an interfering position, at which the clamps are located in the movement ranges of the first and second molds, and standby positions, at which the clamps are outside the movement ranges. After the covering is heated, the clamps grasping the covering move to the standby positions and stretch the covering. The first and second molds move to the closed positions and the article is molded between the first and second molds so that the stretched covering and the substrate are bonded to each other."
6,10000006,2018-06-19,"A thermoforming mold device (1) providing a piece with a thin wall starting with a sheet of thermoplastic material is provided. At least one (3) of two parts of the mold (3, 3′) comprises at least one means (4) of local deformation of a sheet (2′) in the mold (3, 3′) in its closed state, the at least one means (4) comprises a piece of hollow molding with a peripheral edge, which can be connected selectively to a source of suction and can be displaced between a folded position, in which the molding piece is situated in close proximity with the wall of the thermoformed piece, and a deployed position, in which the molding piece is applied under pressure with its peripheral edge against the wall of the thermoformed piece upholding the other part of the mold."
7,10000007,2018-06-19,"An expanding tool comprising: an actuator comprising a cylindrical housing that defines an actuator housing cavity; a primary ram disposed within the actuator housing cavity, the primary ram defining an internal primary ram cavity; a secondary ram disposed within the internal primary ram cavity; a cam roller carrier coupled to a distal end of the secondary ram; a drive collar positioned within a distal end of the actuator housing cavity; a roller clutch disposed within an internal cavity defined by an inner surface of the drive collar; a shuttle cam positioned between the roller clutch and a distal end of the primary ram; an expander cone coupled to the primary ram; and an expander head operably coupled to the drive collar."
8,10000008,2018-06-19,"A decorated strip of coated, heat-shrinkable, plastic sheet material is placed in a spiral slot formed in a silicone rubber mold. The spiral slot is defined by a spiral wall having a uniform wall thickness. Upon heating in an oven, the material shrinks, forming a resiliently expansible arc-shaped band that can be worn as a bracelet or wristband."
9,10000009,2018-06-19,"In sterile, additive manufacturing wherein one lamella is successively built upon an underlying lamella until an object is completed, a sterile manufacturing environment is provided. A major chamber large enough to accommodate the manufactured object has sterile accordion pleated sidewalls and a sterile top closed with flap valves. A minor chamber for supporting the nozzles positioned above the major chamber has similar valves in corresponding positions. Nozzles for material deposition penetrate the pair of valves to block air and particles from entry into the major chamber where the nozzles make layer by layer deposition of the object using XY areawise nozzle motion relative to the object as well as Z nozzle vertical motion with the major chamber expanding as the object is formed."


In [3]:
df_patent['patent_abstract'] = df_patent['patent_abstract'].fillna("filled spot for NaN") # some abstracts are NaN, so fill them with a placeholder string

In [4]:
df_cpc = pd.read_table("../g_cpc_current.tsv", delimiter="\t", dtype={"patent_id": str,
                                                               "cpc_sequence": int,
                                                               "cpc_section": str,
                                                               "cpc_subclass": str,
                                                               "cpc_group": str,
                                                               "cpc_type": str,
                                                               "cpc_symbol_position": str})
df_cpc.head(20) # this may not be used in this notebook, but it is loaded in case we want to check the type of each patent

Unnamed: 0,patent_id,cpc_sequence,cpc_section,cpc_class,cpc_subclass,cpc_group,cpc_type
0,5664589,0,A,A45,A45D,A45D2/20,inventional
1,10439720,3,G,G02,G02B,G02B6/3652,inventional
2,9840937,2,F,F01,F01D,F01D25/30,inventional
3,7407213,1,B,B62,B62D,B62D33/03,inventional
4,11014297,4,B,B29,B29C,B29C64/245,inventional
5,6959012,0,H,H04,H04J,H04J3/0685,inventional
6,6725745,0,H,H01,H01H,H01H85/0208,inventional
7,8625669,7,H,H04,H04N,H04N19/13,inventional
8,11011577,16,H,H10,H10B,H10B63/20,inventional
9,5087721,1,A,A61,A61K,A61K31/685,inventional


In [5]:
online_df = pd.read_csv('../onlinejobpostings.csv') # here we have the online job posting dataset that was on kaggle
online_df['date'] = pd.to_datetime(online_df['date'], errors = 'coerce')
online_df.drop(axis=1, inplace=True, columns=online_df.columns[4:])
online_df

Unnamed: 0,jobpost,date,Title,Company
0,"AMERIA Investment Consulting Company\r\nJOB TITLE: Chief Financial Officer\r\nPOSITION LOCATION: Yerevan, Armenia\r\nJOB DESCRIPTION: AMERIA Investment Consulting Company is seeking a\r\nChief Financial Officer. This position manages the company's fiscal and\r\nadministrative functions, provides highly responsible and technically\r\ncomplex staff assistance to the Executive Director. The work performed\r\nrequires a high level of technical proficiency in financial management\r\nand investment management, as well as management, supervisory, and\r\nadministrative skills.\r\nJOB RESPONSIBILITIES: \r\n- Supervises financial management and administrative staff, including\r\nassigning responsibilities, reviewing employees' work processes and\r\nproducts, counseling employees, giving performance evaluations, and\r\nrecommending disciplinary action;\r\n- Serves as member of management team participating in both strategic\r\nand operational planning for the company;\r\n- Directs and ove...",2004-01-05,Chief Financial Officer,AMERIA Investment Consulting Company
1,"International Research & Exchanges Board (IREX)\r\nTITLE: Full-time Community Connections Intern (paid internship)\r\nDURATION: 3 months\r\nLOCATION: IREX Armenia Main Office; Yerevan, Armenia \r\nDESCRIPTION: IREX currently seeks to fill the position of a paid\r\nIntern for the Community Connections (CC) Program. The position is based\r\nin the Yerevan office however applicants must be willing to travel\r\nthroughout Armenia as necessary. This position reports directly to the\r\nCC Program Manager.\r\nRESPONSIBILITIES: \r\n- Presenting the CC program to interested parties; \r\n- Assisting in planning and scheduling of programmatic meetings and\r\nevents (this includes coordinating logistics for CC staff, visitors and\r\nparticipants);\r\n- Assisting the Program Staff;\r\n- Translation/Interpretation from Armenian to English and vice versa;\r\n- Helping create, maintain and update the CC filing system and\r\ndatabases;\r\n- Completing general administrative tasks for the CC...",2004-01-07,Full-time Community Connections Intern (paid internship),International Research & Exchanges Board (IREX)
2,"Caucasus Environmental NGO Network (CENN)\r\nJOB TITLE: Country Coordinator\r\nPOSITION DURATION: Renewable annual contract\r\nPOSITION LOCATION: Yerevan, Armenia\r\nJOB DESCRIPTION: Public outreach and strengthening of a growing\r\nnetwork of environmental NGOs, businesses, international organizations\r\nand public agencies. Will serve as primary contact between CENN and\r\npublic. This is a full-time position.\r\nJOB RESPONSIBILITIES: \r\n- Working with the Country Director to provide environmental information\r\nto the general public via regular electronic communications and serving\r\nas the primary local contact to Armenian NGOs and businesses and the\r\nArmenian offices of international organizations and agencies;\r\n- Helping to organize and prepare CENN seminars/ workshops;\r\n- Participating in defining the strategy and policy of CENN in Armenia,\r\nthe Caucasus region and abroad.\r\nREQUIRED QUALIFICATIONS: \r\n- Degree in environmentally related field, or 5 years ...",2004-01-07,Country Coordinator,Caucasus Environmental NGO Network (CENN)
3,"Manoff Group\r\nJOB TITLE: BCC Specialist\r\nPOSITION LOCATION: Manila, Philippines\r\nJOB DESCRIPTION: The LEAD (Local Enhancement and Development for\r\nHealth) BCC Specialist will apply state-of-the-art approaches in working\r\nwith LGUs (Local Government Units) and NGOs to help them to identify and\r\naddress provider-caused barriers to service provision as well as to\r\nidentify and address supports for good service delivery by developing\r\ntools that may be adapted to each LGU's needs. S/he will work with LEAD\r\nstaff across all components to support quality service delivery and will\r\nalso monitor implementation of improved service delivery in LGUs, and\r\nwill provide additional assistance to LGUs and NGOs, as needed. S/he\r\nwill collect all relevant published and grey literature documents,\r\nidentify gaps in knowledge, and work with NGOs and consultants to fill\r\nin the gaps. S/he will establish training for NGOs and LGU\r\nadministration staff pursuing service enh...",2004-01-07,BCC Specialist,Manoff Group
4,"Yerevan Brandy Company\r\nJOB TITLE: Software Developer\r\nPOSITION LOCATION: Yerevan, Armenia\r\nJOB RESPONSIBILITIES: \r\n- Rendering technical assistance to Database Management Systems;\r\n- Realization of SQL servers maintenance activities: back-up and\r\nreplication;\r\n- Participation in designing of software development projects.\r\nREQUIRED QUALIFICATIONS: \r\n- University degree; economical background is a plus;\r\n- Excellent knowledge of Windows 2000 Server, Networking TCP/ IP\r\ntechnologies, MS SQL 2000 Server, Visual Basic 6;\r\n- At least 2 years of experience in database software development;\r\n- Good knowledge of English.\r\nREMUNERATION: Will be commensurate with the norms accepted in the\r\nCompany.\r\nAPPLICATION PROCEDURES: Successful candidates should submit\r\n- CV; \r\n- 2 relevant Recommendation Letters (from previous employers);\r\n- Copy (-ies) of Diploma (-s) and relevant certificates (if available);\r\n- 1 color photo (3x4)\r\neither to: 2 Isakov ...",2004-01-10,Software Developer,Yerevan Brandy Company
...,...,...,...,...
18996,"Technolinguistics NGO\r\n\r\n\r\nTITLE: Senior Creative UX/ UI Designer\r\n\r\n\r\nTERM: Full-time\r\n\r\n\r\nDURATION: Long-term\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: A tech startup of Technolinguistics based in New York\r\nis seeking to add a Senior Creative UX/ UI Designer for its platform\r\ndevelopment team in Yerevan. Technolinguistics is looking for a driven\r\nself-starter, detail-oriented designer who is eager to help the company\r\nachieve its mission.\r\nThe incumbent should love working with a small and global team guided by\r\nwell-defined iterative process to design great user experiences. He/ she\r\nwill directly work with the founders and the advisory team to understand\r\nand own the vision of the company. The incumbent's designs will be used\r\nby the business team for strategy and product meetings, and by the\r\ndevelopment team to illustrate the platform requirements.\r\n\r\n\r\nJOB RESPONSIBILITIES:\r\n- Work closely with produc...",2015-12-28,Senior Creative UX/ UI Designer,Technolinguistics NGO
18997,"""Coca-Cola Hellenic Bottling Company Armenia"" CJSC\r\n\r\n\r\nTITLE: Category Development Manager\r\n\r\n\r\nTERM: Full-time\r\n\r\n\r\nOPEN TO/ ELIGIBILITY CRITERIA: All interested professionals.\r\n\r\n\r\nSTART DATE/ TIME: ASAP\r\n\r\n\r\nDURATION: Long-term with a probation period of 3 months.\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: N/A\r\n\r\n\r\nJOB RESPONSIBILITIES:\r\n- Establish and manage Category Management development and shopper\r\ntoolkit generation;\r\n- Support country sales, key account and marketing teams to deliver\r\nBeverage World and Key Category Projects to customers;\r\n- Develop the joint Business Plan approach with customers in line with\r\n18-month category led brand plans with routines;\r\n- Support category leadership activities working with the stakeholders\r\nand the Commercial team to deliver effective plans;\r\n- Ensure a category driven commercial capability approach is in place\r\nacross the commercial teams.\r\n\r...",2015-12-30,Category Development Manager,"""Coca-Cola Hellenic Bottling Company Armenia"" CJSC"
18998,"""Coca-Cola Hellenic Bottling Company Armenia"" CJSC\r\n\r\n\r\nTITLE: Operational Marketing Manager\r\n\r\n\r\nTERM: Full-time\r\n\r\n\r\nOPEN TO/ ELIGIBILITY CRITERIA: All interested professionals.\r\n\r\n\r\nSTART DATE/ TIME: ASAP\r\n\r\n\r\nDURATION: Long-term with a probation period of 3 months.\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: N/A\r\n\r\n\r\nJOB RESPONSIBILITIES: \r\n- Develop, establish and maintain marketing strategies to meet\r\norganizational objectives;\r\n- Develop the annual marketing plan in conjunction with the sales\r\ndepartment, which details activities to follow during a fiscal year, and\r\nwhich will focus on meeting organizational objectives;\r\n- Manage the entire product line life cycle from strategic planning to\r\ntactical activities;\r\n- Develop and implement a company-wide go-to-market plan, working with\r\nall related departments to execute;\r\n- Manage and coordinate all marketing, advertising and promotional staf...",2015-12-30,Operational Marketing Manager,"""Coca-Cola Hellenic Bottling Company Armenia"" CJSC"
18999,"San Lazzaro LLC\r\n\r\n\r\nTITLE: Head of Online Sales Department\r\n\r\n\r\nDURATION: Long-term\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: San Lazzaro LLC is looking for a well-experienced\r\nindividual to work as a Head of Online Sales Department and to lead the\r\nteam of the startup project of a new online store.\r\n\r\n\r\nJOB RESPONSIBILITIES: \r\n- Handle the project activites of the online store from the start;\r\n- Make sure that the online store continuously runs smoothly;\r\n- Manage the right way of posting/ displaying goods;\r\n- Responsible for updating the database;\r\n- Keep in touch with all the departments of the company as well as the\r\nsuppliers and manufacturers of the products;\r\n- Keep the database of the inventory.\r\n\r\n\r\nREQUIRED QUALIFICATIONS:\r\n- At least 1 year of experience in online sales management in retail;\r\n- Excellent knowledge of the English language (both oral and written);\r\n- Advanced knowledge of MS Ex...",2015-12-30,Head of Online Sales Department,San Lazzaro LLC


## Word Embedding Code

In [6]:
# We use sklearn to build TF-IDF vectors and to compare them with cosine similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
vectorizer = TfidfVectorizer()

In [8]:
remote_jobs = pd.concat([online_df.loc[online_df['jobpost'].str.contains("remote", case=True)],
                        online_df.loc[online_df['jobpost'].str.contains("WFH", case=True)],
                        online_df.loc[online_df['jobpost'].str.contains("work from home", case=True)],
                        online_df.loc[online_df['jobpost'].str.contains("mobile work", case=True)],
                        online_df.loc[online_df['jobpost'].str.contains("virtual", case=True)],
                        online_df.loc[online_df['jobpost'].str.contains("online meeting", case=True)],
                        online_df.loc[online_df['jobpost'].str.contains("distributed work", case=True)]], ignore_index=True) # keep only job posts that mention a remote-work keyword (case-sensitive; a post matching several keywords appears once per match)
remote_jobs

Unnamed: 0,jobpost,date,Title,Company
0,"World Vision Armenia\r\nJOB TITLE: Project Assistant\r\nPOSITION LOCATION: Yerevan, Armenia\r\nJOB DESCRIPTION: World Vision Armenia announces a full-time position\r\nfor Project Assistant for the implementation of a Mobile Medical Teams\r\nand Primary Health care project. The position is based in World Vision\r\nArmenia' National office, Yerevan with extensive countrywide travel.\r\nCandidates must be flexible team players willing to travel extensively\r\nto field locations. \r\nThe Project Assistant will support the Yerevan based MMT staff with\r\nmiscellaneous administrative and project implementation duties.\r\nJOB RESPONSIBILITIES: \r\n- Provide daily administrative and technical support to the MMT Program\r\ncoordinator and Health Program Manager in implementation of the MMT\r\nProgram Activities in the sites;\r\n- Provide minor procurement, registration of drugs and other medical\r\nsupplies, customs clearance and additional support to field staff as\r\nrequired;\r\n...",2004-02-22,Project Assistant,World Vision Armenia
1,"World Vision Armenia\r\nJOB TITLE: MMT Project Manager\r\nPOSITION LOCATION: Yerevan, Armenia\r\nJOB DESCRIPTION: World Vision Armenia announces a full-time position\r\nfor MMT Project Manager for the implementation of a Mobile Medical Teams\r\nand Primary Health care project. The position is based in World Vision\r\nArmenia' National office, Yerevan with extensive countrywide travel.\r\nCandidates must be flexible team players willing to travel extensively\r\nto field locations. \r\nMMT project Manager will lead and work with other members of the MMT\r\nteam. This position is responsible for immediate Management and\r\noversight of program implementation, monitoring and evaluation,\r\nreporting.\r\nJOB RESPONSIBILITIES: As a senior member of the MMT and reporting to\r\nthe Health Program Manager, the MMT Project Manager will manage a team\r\nof three people (two health coordinators and an assistant) for the first\r\nyear of program implementation that will gradually expand...",2004-02-22,MMT Project Manager,World Vision Armenia
2,"World Vision Armenia\r\nJOB TITLE: Health Coordinators (two positions are open)\r\nJOB DESCRIPTION: World Vision Armenia announces full-time positions\r\nfor Health Coordinators for the implementation of a Mobile Medical Teams\r\nand Primary Health care project. The positions are based in World Vision\r\nArmenia' National office, Yerevan with extensive countrywide travel.\r\nCandidates must be flexible team players willing to travel extensively\r\nto field locations. \r\nMMT Health Coordinators will be responsible for direct coordination,\r\nsupervision and technical monitoring of the program success and\r\nconstrains in Lori and Gegharkunik. \r\nJOB RESPONSIBILITIES: As part of MMT team, each Health Coordinator\r\nwill work collaboratively with sites they are responsible for and local\r\npartners and will report to the MMT Project Manager. The essential\r\nresponsibilities include:\r\n- Coordinate the obtaining and/or development/ adaptation of MMT related\r\nguides and prot...",2004-02-22,Health Coordinators (two positions are open),World Vision Armenia
3,"World Vision Armenia\r\nTITLE: Health Coordinator\r\nTERM: Full-time\r\nSTART DATE/ TIME: This position starts in May 2004\r\nLOCATION: World Vision Armenia' National office, Yerevan with extensive\r\ncountrywide travel. Candidate must be flexible team player willing to\r\ntravel extensively to field locations\r\nJOB DESCRIPTION: WORLD VISION ARMENIA announces full-time position of\r\nHealth Coordinator for the implementation of a Mobile Medical Teams and\r\nPrimary Health care project.\r\nMMT Health Coordinator will be responsible for direct coordination,\r\nsupervision and technical monitoring of the program success and\r\nconstrains in Lori.\r\nJOB RESPONSIBILITIES:\r\n- Coordinate the obtaining and/or development/adaptation of MMT related\r\nguides and protocols during the start-up phase. \r\n- Developing, pre-testing and applying new training materials strategies\r\nand plans for increasing and promoting overall program effectiveness and\r\nefficiency\r\n- Support the MMT...",2004-04-23,Health Coordinator,World Vision Armenia
4,"World Vision\r\nTITLE: Project Manager, Sustainable Livelihoods Program\r\nOPEN TO/ ELIGIBILITY CRITERIA: Expatriates\r\nSTART DATE/ TIME: Estimated start date of employment 2nd Quarter 2004\r\nDURATION: 36 months\r\nLOCATION: Tavush Province, Armenia\r\nJOB DESCRIPTION: REPORTS TO: Operations Director - with close\r\ncollaboration with Tavush ADP Manager\r\nGRADE LEVEL: 12\r\nThe purpose of this position is to facilitate the efficient and\r\neffective implementation of the project entitled ""Building Sustainable\r\nRural Livelihoods In Tavush Region - Armenia - building on, integrated\r\nin, and expanding WV Armenia's long-term development activities in\r\nArmenia.\r\nJOB RESPONSIBILITIES: All tasks and responsibilities to be carried out\r\nin close co-ordination with the Operations Director of WV Armenia, the\r\nManager of the Tavush ADP, and relevant support teams in WV Armenia's\r\nNational Office.\r\n- Arrange for a structured project start including office establishment...",NaT,"Project Manager, Sustainable Livelihoods Program",World Vision
...,...,...,...,...
439,"SFL LLC\r\n\r\n\r\nTITLE: Senior System Administrator\r\n\r\n\r\nANNOUNCEMENT CODE: 12100\r\n\r\n\r\nTERM: Full-time\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: SFL LLC is looking for a top-notch, talented, driven\r\nSenior System Administrator.\r\n\r\n\r\nJOB RESPONSIBILITIES:\r\n- Install and configure Windows/ Linux based servers;\r\n- Upgrade and configure the system software that supports the clients'\r\ninfrastructure applications;\r\n- Maintain operational, configuration and other procedures;\r\n- Troubleshoot all OS and server related issues.\r\n\r\n\r\nREQUIRED QUALIFICATIONS:\r\n- University degree in Computer Science or a related field;\r\n- At least 4 years of work experience with Windows servers;\r\n- At least 3 years of work experience in virtual infrastructure (Hyper-V\r\nand Vmware ESXi);\r\n- Strong knowledge of DNS, Active Directory and Group Policy;\r\n- Excellent knowledge of TCP/ IP protocol, firewalls, and network\r\nsecurity in gene...",2015-11-17,Senior System Administrator,SFL LLC
440,"Sourcio CJSC\r\n\r\n\r\nTITLE: Social Media Specialist\r\n\r\n\r\nTERM: Full-time\r\n\r\n\r\nSTART DATE/ TIME: ASAP\r\n\r\n\r\nDURATION: Long-term\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: Sourcio is looking for a driven Social Media Specialist\r\nto enlarge targeted virtual communities, interact with network users and\r\ngrow company visibility for its core product Eye Care Plus.\r\n\r\n\r\nJOB RESPONSIBILITIES:\r\n- Build and execute a social media strategy through competitive research,\r\nplatform determination, messaging and audience identification; \r\n- Generate, edit, publish and share the daily optimized content (original\r\ntexts, articles, images, video or HTML) to establish connections and\r\nencourage community members to take action;\r\n- Set up and optimize company pages across different social outlets\r\n(Facebook, Twitter, LinkedIn, Pinterest) to increase the visibility of\r\nthe company's social content;\r\n- Moderate all user-generate...",2015-11-26,Social Media Specialist,Sourcio CJSC
441,"Orange Armenia CJSC\r\n\r\n\r\nTITLE: Senior System Engineer\r\n\r\n\r\nSTART DATE/ TIME: ASAP\r\n\r\n\r\nDURATION: Permanent\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: The Senior System Engineer will build out, maintain and\r\ntroubleshoot the IT production infrastructure based on Windows and UNIX/\r\nLinux systems.\r\n\r\n\r\nJOB RESPONSIBILITIES:\r\n- Manage and monitor all installed systems and the infrastructure;\r\n- Install, configure, test and maintain operating systems, application\r\nsoftware and system management tools;\r\n- Install, configure, test and maintain SAN infrastructure (FC switches,\r\ndisk arrays and tape libraries);\r\n- Proactively ensure the highest level of systems and infrastructure\r\navailability;\r\n- Monitor and test system performance for potential bottlenecks; identify\r\npossible solutions and work with vendors to implement those fixes;\r\n- Maintain security, backup and redundancy strategies;\r\n- Write and maintain c...",2015-12-02,Senior System Engineer,Orange Armenia CJSC
442,"Dasaran.am\r\n\r\n\r\nTITLE: Senior Web Developer\r\n\r\n\r\nTERM: Full-time\r\n\r\n\r\nOPEN TO/ ELIGIBILITY CRITERIA: All interested candidates.\r\n\r\n\r\nSTART DATE/ TIME: ASAP\r\n\r\n\r\nLOCATION: Yerevan, Armenia\r\n\r\n\r\nJOB DESCRIPTION: Dasaran.am is looking for a Senior Web Developer who is\r\nmotivated to work in a fast-paced environment and apply modern\r\nprogramming practices for the best user experiences. The responsibilities\r\nwill include translation of UI/ UX design wireframes to an actual code\r\nthat will produce visual elements of the application. The ideal candidate\r\nwill work closely with the UI/ UX Designer(s) and ensure technical\r\nimplementation taking an active role in defining how the application\r\nworks.\r\n\r\n\r\nJOB RESPONSIBILITIES: \r\n- Write a well designed, testable, efficient code by using the best\r\nsoftware development practices;\r\n- Ensure the technical feasibility of UI/ UX designs;\r\n- Integrate JavaScript with the front-end...",2015-12-22,Senior Web Developer,Dasaran.am
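
Because the filter above concatenates one case-sensitive match per keyword, a post that mentions two keywords shows up twice, and "Remote" with a capital R is missed. A possible alternative is a single case-insensitive regular expression, sketched below on a toy DataFrame. The `KEYWORDS` list mirrors the strings used above, while `posts`, `pattern`, and `remote` are names I made up for the example. Note that this would change which rows are selected, so the row count would no longer match the table above.

```python
import re
import pandas as pd

KEYWORDS = ["remote", "WFH", "work from home", "mobile work",
            "virtual", "online meeting", "distributed work"]
# One alternation pattern; re.escape keeps the keywords literal.
pattern = "|".join(re.escape(k) for k in KEYWORDS)

# Toy stand-in for online_df (hypothetical rows).
posts = pd.DataFrame({"jobpost": [
    "Remote position with a virtual team",   # matches two keywords, kept once
    "Based in the Yerevan office",
    None,                                     # missing text is treated as no match
    "Option to work from home on Fridays",
]})

mask = posts["jobpost"].str.contains(pattern, case=False, regex=True, na=False)
remote = posts.loc[mask].reset_index(drop=True)
print(remote)
```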


In [12]:
cos_count = { "count":  [0] * 12}
cos_count = pd.DataFrame(cos_count)
cos_count.index = range(2004, 2016, 1)
cos_count # we are making a dataframe to be able to display and plot our results

Unnamed: 0,count
2004,0
2005,0
2006,0
2007,0
2008,0
2009,0
2010,0
2011,0
2012,0
2013,0


In [15]:
for year in range(2004, 2016):
    year_df = remote_jobs.loc[remote_jobs['date'].dt.year == year]
    print(year_df['jobpost'])        

0      World Vision Armenia\r\nJOB TITLE:   Project Assistant\r\nPOSITION LOCATION:   Yerevan, Armenia\r\nJOB DESCRIPTION:   World Vision Armenia announces a full-time position\r\nfor Project Assistant for the implementation of a Mobile Medical Teams\r\nand Primary Health care project. The position is based in World Vision\r\nArmenia' National office, Yerevan with extensive countrywide travel.\r\nCandidates must be flexible team players willing to travel extensively\r\nto field locations. \r\nThe Project Assistant will support the Yerevan based MMT staff with\r\nmiscellaneous administrative and project implementation duties.\r\nJOB RESPONSIBILITIES:   \r\n- Provide daily administrative and technical support to the MMT Program\r\ncoordinator and Health Program Manager in implementation of the MMT\r\nProgram Activities in the sites;\r\n- Provide minor procurement, registration of drugs and other medical\r\nsupplies, customs clearance and additional support to field staff as\r\nrequired;\

In [29]:
for i in range(len(cos_sim)):
    if not cos_sim[i]: continue
    year = int(remote_jobs['date'].loc[i].year)
    cos_count.loc[year, 'count'] += len(cos_sim[i])
cos_count # update the "count" column with, per posting year, the number of patents whose cosine similarity with a remote job post is above 0.6

ValueError: cannot convert float NaN to integer
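
My reading of this error: some job posts have a date that could not be parsed (`errors = 'coerce'` turns those into `NaT`), the `.year` of `NaT` is NaN, and `int()` cannot convert NaN. Below is a small sketch of one way to guard the tally by skipping posts without a date. The toy `jobs`, `matches`, and `counts` objects are hypothetical stand-ins for `remote_jobs`, `cos_sim`, and `cos_count`.

```python
import pandas as pd

# Toy stand-ins (hypothetical values); the second post has no parsable date.
jobs = pd.DataFrame({"date": pd.to_datetime(["2004-02-22", None, "2005-03-01"])})
matches = [[10, 30], [70], [50]]   # matched patent rows, one list per job post
counts = pd.DataFrame({"count": [0, 0]}, index=[2004, 2005])

for i, group in enumerate(matches):
    date = jobs["date"].iloc[i]
    if pd.isna(date) or not group:
        continue                    # no year to attribute the matches to
    counts.loc[date.year, "count"] += len(group)

print(counts)
```

Whether undated posts should be dropped or assigned a year some other way is a separate decision; the sketch only shows how to avoid the crash.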

In [23]:
cos_sim = []
for row in range(len(remote_jobs)):
    print("We are in the" + str(row) + "row")
    group = []
    curr = remote_jobs.loc[row]
    curr_patent_df = df_patent[df_patent['patent_date'].dt.year.eq(curr.date.year)]
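    if len(curr_patent_df) == 0:
        cos_sim.append(group)  # keep cos_sim aligned with the rows of remote_jobs (e.g. posts with a NaT date)
        continue
    # sample every 20th patent row of that year
    for p_row in range(curr_patent_df.index[0], curr_patent_df.index[-1], 20):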
        doc1 = curr['jobpost']
        doc2 = curr_patent_df['patent_abstract'].loc[p_row]
        
        docs = [doc1, doc2]
        tfidf_vector = vectorizer.fit_transform(docs)
        similarities = cosine_similarity(tfidf_vector)
        if similarities[0][1] > 0.6:
            group.append(p_row)
            
    cos_sim.append(group)

We are in the0row
We are in the1row
We are in the2row
We are in the3row
We are in the4row
We are in the5row
We are in the6row
We are in the7row
We are in the8row
We are in the9row
We are in the10row
We are in the11row
We are in the12row
We are in the13row
We are in the14row
We are in the15row
We are in the16row
We are in the17row
We are in the18row
We are in the19row
We are in the20row
We are in the21row
We are in the22row
We are in the23row
We are in the24row
We are in the25row
We are in the26row
We are in the27row
We are in the28row
We are in the29row
We are in the30row
We are in the31row
We are in the32row
We are in the33row
We are in the34row
We are in the35row
We are in the36row
We are in the37row
We are in the38row
We are in the39row
We are in the40row
We are in the41row
We are in the42row
We are in the43row
We are in the44row
We are in the45row
We are in the46row
We are in the47row
We are in the48row
We are in the49row
We are in the50row
We are in the51row
We are in the52row
We 
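
The loop in the cell above refits the vectorizer on every two-document pair, so the IDF weights are estimated from only two documents each time. A possible alternative, sketched here under my own assumptions, is to fit once on the job post plus all candidate abstracts and score them in a single `cosine_similarity` call. The example texts, the `stop_words="english"` setting, and the variable names are illustrative choices and not what the notebook ran.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical job post and patent abstracts for one year.
job_post = "senior system administrator for virtual server infrastructure"
abstracts = [
    "a system for managing virtual server infrastructure in a data center",
    "a vacuum forming apparatus with a first mold and a second mold",
    "an injection molding machine with a moveable platen",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform([job_post] + abstracts)   # row 0 is the job post

# One row of scores: the job post against every abstract.
scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

threshold = 0.6                                      # same cut-off as the loop above
matched = [i for i, s in enumerate(scores) if s > threshold]
best = int(scores.argmax())
print(scores, matched, best)
```

Scores from a shared vocabulary are not directly comparable with the pairwise scores above, so the 0.6 threshold would probably need to be revisited.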

In [None]:
ax = cos_count.plot(lw=2, colormap='jet', marker='.', markersize=10, title='Remote job posts: matched patents per year')
ax.set(xlabel="Year", ylabel="Patents with cosine similarity > 0.6")
ax