## Monitoring Job Listings

The goal of this notebook is to show you how the api_crawler library works.

For this example, let's focus on monitoring job posts. We will gather data from three major job boards: LinkedIn, Glassdoor, and Indeed. First, we need to import the corresponding classes from the api_crawler library.

While we are at it, let's also set a job title to monitor and create the API instances.

In [1]:
from api_crawler import LinkedInAPI, Glassdoor, IndeedAPI


job_role_to_monitor = 'Data Analyst'


linked_in_api = LinkedInAPI()

glassdoor_api = Glassdoor()

indeed_api = IndeedAPI()

Now, if we want to scrape the job listings, all we have to do is call the get_job_postings_data method of the appropriate object. It returns a list of job ads, each one a dictionary of fields such as the title, location, and link.

Just as in the examples below:

In [2]:
linked_in_jobs_data = linked_in_api.get_job_postings_data(job_role_to_monitor)
linked_in_jobs_data[0]

{'title': 'Data Analyst, Lyft Media',
 'subtitle': 'Lyft',
 'location': 'New York, NY',
 'link': 'https://www.linkedin.com/jobs/view/data-analyst-lyft-media-at-lyft-3941227696?position=1&pageNum=0&refId=LcCyONwTMBZKZ53egbJo5w%3D%3D&trackingId=iavJSyRMx5e5dRtxyJmAuw%3D%3D&trk=public_jobs_jserp-result_search-card',
 'list_date': '2024-06-03'}

In [3]:
glassdoor_jobs_data = glassdoor_api.get_job_postings_data(job_role_to_monitor, close=False)
glassdoor_jobs_data[0]

{'company_name': 'Grammarly, Inc.',
 'title': 'People Consultant',
 'location': 'United States',
 'salary': '$169K\xa0(Employer est.)',
 'snippet': '401(k) and RRSP matching. Our People Partner team provides strategic business partnerships and coaching and develops people-related solutions to meet critical……\n\nSkills: Management\n      \n',
 'date': '2d',
 'link': 'https://www.glassdoor.com/job-listing/people-consultant-grammarly-inc-JV_KO0,17_KE18,31.htm?jl=1009312643258'}

In [4]:
indeed_jobs_data = indeed_api.get_job_postings_data(job_role_to_monitor, close=False)
indeed_jobs_data[0]

{'job_title': 'Data Analyst - Remote',
 'company_name': 'Hallmark Cards',
 'job_location': 'Remote in Missouri',
 'snippet': '\nWe focus on using data to understand performance trends and our consumers, leveraging a vast number of data sources and types.\n41 CFR 60-1.35(c).\n',
 'date': 'PostedToday',
 'link': 'https://www.indeed.com/rc/clk?jk=8e131462b2aa86b9&bb=1NNTJSvrikyTsSghr8ZdTU5UU3AEsv1AAD29SiT7AE4G5oEcyH4Q6IcDd4xxne-UQzMgV17dSqs-IEdvpS2mEahOAcLYToyyl9W6ZJQsXIB1Uw9FUu8-aTUQJRghFuuu&xkcb=SoBo67M3A802-CSKBx0LbzkdCdPP&fccid=f6b7f1c44b44197c&vjs=3'}
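
As the three sample records show, each board uses its own key names (for example `title` versus `job_title`, and `subtitle` versus `company_name`). If you want to combine the results, one option is to map them onto a shared set of keys first. The sketch below is only an illustration: `FIELD_ALIASES` and `normalize_posting` are hypothetical names that are not part of api_crawler, and the mapping is inferred solely from the sample outputs above.

```python
# Hypothetical key mapping, inferred only from the sample records shown above.
FIELD_ALIASES = {
    "title": ("title", "job_title"),
    "company": ("subtitle", "company_name"),
    "location": ("location", "job_location"),
    "date": ("list_date", "date"),
    "link": ("link",),
}


def normalize_posting(record, source):
    """Map one raw posting onto shared keys; fields that are absent become None."""
    normalized = {"source": source}
    for target, aliases in FIELD_ALIASES.items():
        # Take the first alias that exists in the record, if any.
        normalized[target] = next(
            (record[key] for key in aliases if key in record), None
        )
    return normalized


# Placeholder record using the Indeed-style keys; the link is a dummy value.
sample = {
    "job_title": "Data Analyst - Remote",
    "company_name": "Hallmark Cards",
    "job_location": "Remote in Missouri",
    "date": "PostedToday",
    "link": "https://example.com/job/1",
}
print(normalize_posting(sample, "indeed"))
```

Note that the date values are left untouched: the samples mix absolute dates ('2024-06-03') with relative ones ('2d', 'PostedToday'), so they would need separate handling before they can be compared.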

## Creating Your Own Data Lakes

Now, in addition to fetching data, the api_crawler library also stores the results in a data lake. You can find this data lake in the folder specified by the LAKES_BASE_DIR variable in the .env file.


The goal here is to help you keep all the data you gather and create your own databases of external data. Feel free to use this data to create or fine-tune your own AI models, or to monitor specific information outside your organization.

Furthermore, the JSON lakes record detailed information about each request. This includes the time it was made, the arguments passed to it, and any other data that might be useful for you later, so you can explore the data as you wish.
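
To explore what has been stored, you could read the lake files back with the standard library. This is a minimal sketch under stated assumptions: it assumes the lake consists of `.json` files somewhere under the LAKES_BASE_DIR folder and that the variable is available in the environment. The actual file layout and record structure are not described here, so `load_lake_records` is a hypothetical helper and not part of api_crawler.

```python
import json
import os
from pathlib import Path


def load_lake_records(base_dir):
    """Return the parsed content of every .json file found under base_dir."""
    base = Path(base_dir)
    if not base.is_dir():
        return []  # nothing stored yet, or the folder is somewhere else
    records = []
    for path in sorted(base.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records.append(json.load(handle))
    return records


# Assumption: LAKES_BASE_DIR has been exported to the environment.
# The "lakes" fallback is a hypothetical default used only for this sketch.
lake_dir = os.environ.get("LAKES_BASE_DIR", "lakes")
print(f"Loaded {len(load_lake_records(lake_dir))} lake file(s) from {lake_dir}")
```

If the variable only lives in the .env file, you would need to load that file into the environment first, or pass the folder path to the helper directly.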


----


In the example above, we monitored only one job role.

But if you want to keep an eye on several roles at once, you can easily do so by looping over a list of job roles, as in the example below:

In [5]:
job_roles_to_monitor = ['Data Analyst', 'Data Scientist', 'Data Engineer']

glassdoor_api = Glassdoor()
indeed_api = IndeedAPI()

linked_in_jobs_data = []
glassdoor_jobs_data = []
indeed_jobs_data = []


for job_role in job_roles_to_monitor:
    linked_in_jobs_data.extend(linked_in_api.get_job_postings_data(job_role))
    glassdoor_jobs_data.extend(glassdoor_api.get_job_postings_data(job_role, close=False))
    indeed_jobs_data.extend(indeed_api.get_job_postings_data(job_role, close=False))

glassdoor_api.close()
indeed_api.close()
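
Searches for related roles can plausibly return the same ad more than once. Whether api_crawler already filters such repeats is not stated here, so the sketch below shows one simple way to drop them yourself by comparing the `link` field. `dedupe_by_link` is a hypothetical helper, not a library function.

```python
def dedupe_by_link(postings):
    """Keep the first posting seen for each 'link' value, preserving order."""
    seen = set()
    unique = []
    for posting in postings:
        link = posting.get("link")
        if link is not None:
            if link in seen:
                continue  # same link already kept earlier
            seen.add(link)
        # Postings without a link cannot be compared, so they are always kept.
        unique.append(posting)
    return unique


# Placeholder records standing in for the lists built above.
sample_jobs = [
    {"title": "Data Analyst", "link": "https://example.com/job/1"},
    {"title": "Data Scientist", "link": "https://example.com/job/2"},
    {"title": "Data Analyst", "link": "https://example.com/job/1"},
]
print(len(dedupe_by_link(sample_jobs)))
```

Exact matching is a deliberate simplification. The LinkedIn sample link above carries query parameters such as `trackingId`, which may differ between requests, so you may want to strip the query string before comparing.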

## Summarize With AI

One of the most common uses for fetched and scraped data is to have AI summarize it, so you can keep track of what is happening without reading every record yourself.

To do this with the api_crawler library, you simply add an extra step that passes the information to an AI model. You can do this in two ways: either right after fetching the data or using the data stored in the data lakes.

As you can see in the example below, the AI summarization step is probably the easiest part.

In [6]:
# Suggest skills to land the highest-paying jobs

prompt = '''
Based on the job postings below, please suggest the skills that are most important for landing the highest-paying jobs.

####

{data}

####

Begin!
'''


data = "##### \n\n".join([f"Title: {job['title']}; Snippet: {job['snippet']}" for job in glassdoor_jobs_data[:10]])

In [7]:
from openai import OpenAI
client = OpenAI(api_key='YOUR_OPENAI_API_KEY')


response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": prompt.format(data=data)}
  ]
)


ai_response = response.choices[0].message.content

print(f'AI response:\n\n {ai_response}')

AI response:

 Based on the job postings provided, the skills that appear to be most important for landing high-paying jobs are:

1. **Research Skills:** Research skills are mentioned in multiple job postings, such as quantitative research, veterinary talent research, and data analysis. Being proficient in research methodologies and tools like Stata, Boolean searches, and data analysis skills is valuable.

2. **Communication Skills:** Strong communication skills are crucial for roles like telemarketer, order filler, and accounts payable clerk. Being able to communicate effectively with customers, vendors, and team members is a highly sought-after skill.

3. **Microsoft Excel:** Proficiency in Microsoft Excel is mentioned in several job postings like business development representative, data entry coordinator, and part-time remote research assistant. Excel skills are often essential for data analysis, reporting, and organizational tasks.

4. **Customer Service:** Customer service skills

## Conclusion

In this notebook, we demonstrated how to use the `api_crawler` library to monitor job listings from top job boards like LinkedIn, Glassdoor, and Indeed. We covered the steps to set up API instances, fetch job postings, and store the data in a data lake for further analysis. Additionally, we explored how to use AI to summarize job data and extract valuable insights.

By leveraging these tools, you can efficiently track job market trends and make data-driven decisions to enhance your career or business strategies.

Please go ahead and explore other examples and experiment with the `api_crawler` library to unlock its full potential.