# Goal

Find, capture, and print in a custom format information about recently posted job openings on the timesjobs.com website that list Python as a requirement.

# Code

In [1]:
from bs4 import BeautifulSoup
import requests # library used to request content from websites

In [2]:
html_text = requests.get('https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&searchTextSrc=&searchTextText=&txtKeywords=python&txtLocation=')
html_text

<Response [200]>

The "get" method sent an HTTP request to the URL passed as an argument and downloaded the page. However, what it returns is a "Response" object, not the text itself. Its representation, "<Response [200]>", shows the HTTP status code 200, which means the request succeeded. To get the downloaded HTML as a string, we use the "text" attribute of the object returned by "get":
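The long URL passed to `get` is the search page address followed by a query string of `key=value` pairs. The same URL can be assembled from a dict with the standard library's `urllib.parse.urlencode`, which makes it easier to change the search keyword. This is a minimal sketch: the helper name `build_search_url` is hypothetical, and the parameter names are simply copied from the URL above.

```python
from urllib.parse import urlencode

# Base address of the search page, taken from the URL used above
BASE_URL = 'https://www.timesjobs.com/candidate/job-search.html'


def build_search_url(keyword, location=''):
    """Hypothetical helper: rebuilds the search URL for a given keyword."""
    params = {
        'searchType': 'personalizedSearch',
        'from': 'submit',
        'searchTextSrc': '',
        'searchTextText': '',
        'txtKeywords': keyword,
        'txtLocation': location,
    }
    # urlencode joins the pairs with '&' and escapes special characters
    # (a space becomes '+')
    return f'{BASE_URL}?{urlencode(params)}'


print(build_search_url('python'))
```

`requests.get` can also take such a dict directly through its `params` argument. Calling `response.raise_for_status()` raises an exception for error status codes instead of silently continuing with an error page.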

In [3]:
html_text = requests.get('https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&searchTextSrc=&searchTextText=&txtKeywords=python&txtLocation=').text

In [6]:
soup = BeautifulSoup(html_text, 'lxml')
# creates a BeautifulSoup instance from the content of html_text, using the lxml parser
# (requires the lxml package; the built-in 'html.parser' is an alternative)

In [9]:
job = soup.find('li', class_ = 'clearfix job-bx wht-shd-bx')
# finds the first "li" element whose class attribute matches the string passed as the second argument
# (a string containing spaces is compared with the exact value of the class attribute)
# "li" stands for "list item":
# an item of an unordered list, which is defined by the "ul" tag (unordered list)

In [12]:
company_name = job.find('h3', class_ = 'joblist-comp-name').text.replace(' ', '')
# returns the first h3 element inside job with the given class,
# takes its text content and binds it to the variable company_name,
# then replaces every space with nothing (the raw text has extra padding);
# note that this also removes the spaces inside the name itself
# in this case, the text is the name of the company offering the job
print(company_name)


INFINITYGROUP
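Because `replace(' ', '')` deletes every space, multi-word names come out glued together. An alternative is `' '.join(text.split())`, which collapses any run of whitespace into a single space and drops it from both ends. This sketch assumes the raw text is the name padded with spaces and newlines; the sample string below is hypothetical, not scraped.

```python
def normalize_spaces(text):
    # split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace
    return ' '.join(text.split())


raw = '\n   INFINITY   GROUP  \n'  # hypothetical raw text from the page
print(repr(raw.replace(' ', '')))   # '\nINFINITYGROUP\n'
print(repr(raw.strip()))            # 'INFINITY   GROUP'
print(repr(normalize_spaces(raw)))  # 'INFINITY GROUP'
```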




In [18]:
skills = job.find('span', class_ = 'srp-skills').text.replace(' ', '')
# repeats the same process, this time capturing the key skills required for the job
print(skills)


python,css,django,html,bootstrap




In [19]:
print(f'''
Company name: {company_name}
Skills: {skills}
''')


Company name: 
INFINITYGROUP


Skills: 
python,css,django,html,bootstrap





In [30]:
jobs = soup.find_all('li', class_ = 'clearfix job-bx wht-shd-bx')
for job in jobs: # iterates over all job postings on the page
    published_date = job.find('span', class_ = 'sim-posted').span.text
    # captures the text of the tag containing the job's posting date
    if 'few' in published_date: # only enters the block if the job was posted recently
        # repeats the previous code
        company_name = job.find('h3', class_ = 'joblist-comp-name').text.replace(' ', '')
        skills = job.find('span', class_ = 'srp-skills').text.replace(' ', '')
        print(f'''
        Company name: {company_name}
        Skills: {skills}
        ''')


        Company name: 
WingGlobalITServices


        Skills: 

springboot,python,java,django,jpa,hibernate


        

        Company name: 
eastindiasecuritiesltd.


        Skills: 
python,hadoop,machinelearning


        

        Company name: 
RootInfoSolutions


        Skills: 
python2,python3,django,javascript,html,postgresql/mysql,restapi,celeryrabbitmq,redis.,css,oop


        

        Company name: 
arttechnologyandsoftwareindiapvtltd


        Skills: 

rest,python,database,django,api


        

        Company name: 
destinyhrgroupservices


        Skills: 

fundamentals,python,css,javascript,jquery,database,django,java,json,html


        

        Company name: 
day1technologies


        Skills: 

rest,python,django,git,postgresql,sql,docker


        

        Company name: 
DREAMAJAXTECHNOLOGIES


        Skills: 
python,django,api,sql,nosql


        

        Company name: 
xoniertechnologiespvtltd


        Skills: 
python,django,testingtools,debugging,storag
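The selection rule in the loop above does not depend on BeautifulSoup. It can be exercised on plain Python data, as in the sketch below. The `postings` records are hypothetical examples. The assumption that recent postings contain the word `few` in their date text comes from the check in the code above.

```python
# Hypothetical sample data standing in for the scraped postings
postings = [
    {'company': 'Company A', 'posted': 'Posted few days ago', 'skills': 'python,django'},
    {'company': 'Company B', 'posted': 'Posted 10 days ago', 'skills': 'python,sql'},
    {'company': 'Company C', 'posted': 'Posted few days ago', 'skills': 'python,hadoop'},
]


def recent_postings(records):
    # keeps only the records whose date text contains 'few', like the loop above
    return [record for record in records if 'few' in record['posted']]


for record in recent_postings(postings):
    print(f"Company name: {record['company']}")
    print(f"Skills: {record['skills']}")
    print("")
```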

## Refining the algorithm

In [44]:
unfamiliar_skill = input("Put some skill that you're unfamiliar with: ")
print(f"Filtering out {unfamiliar_skill}")
print("")

jobs = soup.find_all('li', class_ = 'clearfix job-bx wht-shd-bx')
for job in jobs: # iterates over all job postings on the page
    published_date = job.find('span', class_ = 'sim-posted').span.text
    # captures the text of the tag containing the job's posting date
    skills = job.find('span', class_ = 'srp-skills').text.replace(' ', '')
    if 'few' in published_date and unfamiliar_skill not in skills:
        # only enters the block for recent postings that do not list unfamiliar_skill as a requirement
        # repeats the previous code
        job_url = job.a['href']
        # captures the URL of the job's page
        # an alternative is job.a.get('href'), which returns None instead of raising KeyError if the attribute is missing
        company_name = job.find('h3', class_ = 'joblist-comp-name').text.replace(' ', '')
        print(f"Company name: {company_name.strip()}")
        # the strip method removes leading and trailing whitespace (here, the extra newlines)
        print(f"Skills: {skills.strip()}")
        print(f"More information: {job_url}")
        print("") # prints an empty line to add spacing between jobs

Put some skill that you're unfamiliar with:  django


Filtering out django

Company name: eastindiasecuritiesltd.
Skills: python,hadoop,machinelearning
More information: https://www.timesjobs.com/job-detail/python-engineer-east-india-securities-ltd-kolkata-2-to-5-yrs-jobid-KEkE19WqPbFzpSvf__PLUS__uAgZw==&source=srp

Company name: NTTDataVertexSoftwareInc.
Skills: pythonprogramming,nodejs,agilemethodologies,communication,softwaredevelopment,technologyconsulting
More information: https://www.timesjobs.com/job-detail/python-developer-ntt-data-vertex-software-inc-hyderabad-secunderabad-3-to-9-yrs-jobid-xI6N9bREwgpzpSvf__PLUS__uAgZw==&source=srp

Company name: NTTDataVertexSoftwareInc.
Skills: pythonprogramming,nodejs,agilemethodologies,communicationskills,softwaredevelopment,technologyconsulting
More information: https://www.timesjobs.com/job-detail/python-developer-ntt-data-vertex-software-inc-hyderabad-secunderabad-3-to-6-yrs-jobid-oImR6lbVV99zpSvf__PLUS__uAgZw==&source=srp

Company name: magnarustechnologiesprivatelimited
Skills: imageproc
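`unfamiliar_skill not in skills` is a substring test on the whole comma-separated text. As a result, filtering out `java` would also drop postings that only ask for `javascript`, and `sql` would match `postgresql`. A stricter check splits the text into items and compares whole items. This sketch assumes the skills text is always comma-separated, as in the outputs above; the function name `requires_skill` is hypothetical.

```python
def requires_skill(skills_text, skill):
    # splits the comma-separated text into individual skills
    # and compares whole items, ignoring case and surrounding whitespace
    items = {item.strip().lower() for item in skills_text.split(',')}
    return skill.strip().lower() in items


skills = 'python,css,javascript,jquery,database,django,java,json,html'
print('java' in 'python,javascript')                # True: substring match
print(requires_skill('python,javascript', 'java'))  # False: whole-item match
print(requires_skill(skills, 'Django'))             # True
```

The trade-off is that the item test no longer catches variants such as `pythonprogramming` or `python3` when searching for `python`. The substring test is more permissive, while the item test is more precise.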

### Version with a loop to filter out multiple unfamiliar skills:

In [56]:
unfamiliar_skills = []

# reads skills from user input and appends them to the end of the list until the input is Q
while True:
    unfamiliar_skill = input("Put some skill that you're unfamiliar with (enter Q when done): ")
    if unfamiliar_skill == 'Q':
        break
    unfamiliar_skills.append(unfamiliar_skill)

# prints the skills to be filtered out, separated by commas
print(f"Filtering out {', '.join(unfamiliar_skills)}")
print("")


jobs = soup.find_all('li', class_ = 'clearfix job-bx wht-shd-bx')
for job in jobs:
    published_date = job.find('span', class_ = 'sim-posted').span.text
    skills = job.find('span', class_ = 'srp-skills').text.replace(' ', '')
    if 'few' in published_date:
        # skips the job if any of the unfamiliar skills appears in its skills text
        if any(skill in skills for skill in unfamiliar_skills):
            continue
        job_url = job.a['href']
        company_name = job.find('h3', class_ = 'joblist-comp-name').text.replace(' ', '')
        print(f"Company name: {company_name.strip()}")
        print(f"Skills: {skills.strip()}")
        print(f"More information: {job_url}")
        print("")

Put some skill that you're unfamiliar with (enter Q when done):  django
Put some skill that you're unfamiliar with (enter Q when done):  linux
Put some skill that you're unfamiliar with (enter Q when done):  Q


Filtering out django, linux

Company name: eastindiasecuritiesltd.
Skills: python,hadoop,machinelearning
More information: https://www.timesjobs.com/job-detail/python-engineer-east-india-securities-ltd-kolkata-2-to-5-yrs-jobid-KEkE19WqPbFzpSvf__PLUS__uAgZw==&source=srp

Company name: NTTDataVertexSoftwareInc.
Skills: pythonprogramming,nodejs,agilemethodologies,communication,softwaredevelopment,technologyconsulting
More information: https://www.timesjobs.com/job-detail/python-developer-ntt-data-vertex-software-inc-hyderabad-secunderabad-3-to-9-yrs-jobid-xI6N9bREwgpzpSvf__PLUS__uAgZw==&source=srp

Company name: NTTDataVertexSoftwareInc.
Skills: pythonprogramming,nodejs,agilemethodologies,communicationskills,softwaredevelopment,technologyconsulting
More information: https://www.timesjobs.com/job-detail/python-developer-ntt-data-vertex-software-inc-hyderabad-secunderabad-3-to-6-yrs-jobid-oImR6lbVV99zpSvf__PLUS__uAgZw==&source=srp

Company name: magnarustechnologiesprivatelimited
Skills: im
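The input loop has two pitfalls:

- Pressing Enter without typing anything appends an empty string. Since `'' in skills` is always `True`, every posting would then be filtered out.
- `Django` would not match `django`.

The sketch below normalizes the entries. It takes any iterable of lines so it can be tried without `input()`. The names `read_skills` and `prompt_lines` are hypothetical.

```python
def read_skills(lines, stop='Q'):
    # collects skills until the stop word (any case), ignoring blank entries
    # and normalizing case/whitespace so 'Django ' matches 'django'
    skills = []
    for line in lines:
        entry = line.strip()
        if entry.upper() == stop.upper():
            break
        if entry:
            skills.append(entry.lower())
    return skills


def prompt_lines(prompt):
    # in the notebook, this generator would supply the user's answers one by one
    while True:
        yield input(prompt)


# here a list stands in for the user's answers
answers = ['Django ', '', 'linux', 'Q', 'ignored']
skills = read_skills(answers)
print(f"Filtering out {', '.join(skills)}")  # Filtering out django, linux
```

In the notebook, the call would be `read_skills(prompt_lines("Put some skill that you're unfamiliar with (enter Q when done): "))`.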

## Creating a script that runs periodically

In [None]:
# wraps the algorithm in a function
def find_jobs():
    unfamiliar_skills = []

    # reads skills from user input and appends them to the end of the list until the input is Q
    while True:
        unfamiliar_skill = input("Put some skill that you're unfamiliar with (enter Q when done): ")
        if unfamiliar_skill == 'Q':
            break
        unfamiliar_skills.append(unfamiliar_skill)

    # prints the skills to be filtered out, separated by commas
    print(f"Filtering out {', '.join(unfamiliar_skills)}")
    print("")

    # downloads and parses the page on every call, so each periodic run sees the current listings
    # (reusing the global soup would print the same, stale results every time)
    html_text = requests.get('https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&searchTextSrc=&searchTextText=&txtKeywords=python&txtLocation=').text
    soup = BeautifulSoup(html_text, 'lxml')

    jobs = soup.find_all('li', class_ = 'clearfix job-bx wht-shd-bx')
    for job in jobs:
        published_date = job.find('span', class_ = 'sim-posted').span.text
        skills = job.find('span', class_ = 'srp-skills').text.replace(' ', '')
        if 'few' in published_date:
            # skips the job if any of the unfamiliar skills appears in its skills text
            if any(skill in skills for skill in unfamiliar_skills):
                continue
            job_url = job.a['href']
            company_name = job.find('h3', class_ = 'joblist-comp-name').text.replace(' ', '')
            print(f"Company name: {company_name.strip()}")
            print(f"Skills: {skills.strip()}")
            print(f"More information: {job_url}")
            print("")

In [67]:
import time
if __name__ == '__main__': # only runs when the file is executed directly, not when it is imported
    while True: # infinite loop
        find_jobs() # runs the function
        time_wait = 10 # waiting time, in minutes
        print(f'Waiting {time_wait} minutes')
        time.sleep(time_wait * 60) # pauses the loop for 600 seconds (10 minutes)

Put some skill that you're unfamiliar with (enter Q when done):  linux
Put some skill that you're unfamiliar with (enter Q when done):  Q


Filtering out linux

Company name: WingGlobalITServices
Skills: springboot,python,java,django,jpa,hibernate
More information: https://www.timesjobs.com/job-detail/python-developer-wing-global-it-services-panchkula-2-to-5-yrs-jobid-__PLUS__fePw2G3HHZzpSvf__PLUS__uAgZw==&source=srp

Company name: eastindiasecuritiesltd.
Skills: python,hadoop,machinelearning
More information: https://www.timesjobs.com/job-detail/python-engineer-east-india-securities-ltd-kolkata-2-to-5-yrs-jobid-KEkE19WqPbFzpSvf__PLUS__uAgZw==&source=srp

Company name: RootInfoSolutions
Skills: python2,python3,django,javascript,html,postgresql/mysql,restapi,celeryrabbitmq,redis.,css,oop
More information: https://www.timesjobs.com/job-detail/python-developer-root-info-solutions-delhi-delhi-ncr-2-to-3-yrs-jobid-fny4oGqOeQZzpSvf__PLUS__uAgZw==&source=srp

Company name: arttechnologyandsoftwareindiapvtltd
Skills: rest,python,database,django,api
More information: https://www.timesjobs.com/job-detail/python-developer-art-technol

KeyboardInterrupt: 
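The loop above only stops with a `KeyboardInterrupt`, which is why the output ends in an error. The sketch below runs the same schedule but exits cleanly on Ctrl+C. The function name `run_periodically` and its `max_runs` and `sleep` parameters are hypothetical. They were added so the loop can be tested without actually waiting.

```python
import time


def run_periodically(task, wait_minutes=10, max_runs=None, sleep=time.sleep):
    # calls task() repeatedly, pausing wait_minutes between runs;
    # max_runs=None means run forever, and Ctrl+C ends the loop without a traceback
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            task()
            runs += 1
            if max_runs is None or runs < max_runs:
                print(f'Waiting {wait_minutes} minutes')
                sleep(wait_minutes * 60)
    except KeyboardInterrupt:
        print('Stopped by the user')
    return runs


# Demo: a fake sleep records the requested pauses instead of waiting
pauses = []
count = run_periodically(lambda: print('checking jobs'), wait_minutes=10, max_runs=3, sleep=pauses.append)
print(count, pauses)  # 3 [600, 600]
```

In the script, `run_periodically(find_jobs)` would replace the `while True` block inside `if __name__ == '__main__':`.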

## Saving the job information to files

In [70]:
import os

# wraps the algorithm in a function
def find_jobs():
    unfamiliar_skills = []

    # reads skills from user input and appends them to the end of the list until the input is Q
    while True:
        unfamiliar_skill = input("Put some skill that you're unfamiliar with (enter Q when done): ")
        if unfamiliar_skill == 'Q':
            break
        unfamiliar_skills.append(unfamiliar_skill)

    # prints the skills to be filtered out, separated by commas
    print(f"Filtering out {', '.join(unfamiliar_skills)}")
    print("")

    # downloads and parses the page on every call, so each run sees the current listings
    html_text = requests.get('https://www.timesjobs.com/candidate/job-search.html?searchType=personalizedSearch&from=submit&searchTextSrc=&searchTextText=&txtKeywords=python&txtLocation=').text
    soup = BeautifulSoup(html_text, 'lxml')

    # creates the posts folder if it does not exist yet (open() does not create folders)
    os.makedirs('posts', exist_ok=True)

    jobs = soup.find_all('li', class_ = 'clearfix job-bx wht-shd-bx')
    for index, job in enumerate(jobs):
        # enumerate yields tuples containing the element's index and the element itself
        published_date = job.find('span', class_ = 'sim-posted').span.text
        skills = job.find('span', class_ = 'srp-skills').text.replace(' ', '')
        if 'few' in published_date:
            # skips the job if any of the unfamiliar skills appears in its skills text
            if any(skill in skills for skill in unfamiliar_skills):
                continue
            job_url = job.a['href']
            company_name = job.find('h3', class_ = 'joblist-comp-name').text.replace(' ', '')

            # creates (or overwrites) a text file in write mode and binds it to the variable f
            with open(f'posts/{index}.txt', 'w', encoding='utf-8') as f:
                # the write method writes the argument string to the file bound to f
                f.write(f"Company name: {company_name.strip()}\n")
                f.write(f"Skills: {skills.strip()}\n")
                f.write(f"More information: {job_url}")
            print(f'File saved: {index}.txt')
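The saving step can also be isolated and written with `pathlib`, creating the folder before writing. This is a sketch: the function name `save_job` and its parameters are hypothetical. The folder is passed in so the example can write to a temporary directory.

```python
from pathlib import Path
import tempfile


def save_job(folder, index, company_name, skills, job_url):
    # creates the folder (and any missing parents) if needed, then writes one file per job
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f'{index}.txt'
    path.write_text(
        f"Company name: {company_name}\n"
        f"Skills: {skills}\n"
        f"More information: {job_url}",
        encoding='utf-8',
    )
    return path


# Demo in a temporary directory with hypothetical values
with tempfile.TemporaryDirectory() as tmp:
    saved = save_job(Path(tmp) / 'posts', 3, 'Company A', 'python,sql', 'https://example.com/job')
    print(saved.name)  # 3.txt
    print(saved.read_text(encoding='utf-8'))
```

The file name is the posting's position on the page. A later run may therefore overwrite a file with a different posting.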